

Our Courses
Foundations of Sports Analytics: Data, Representation, and Models in Sports
This course provides an introduction to using Python to analyze team performance in sports. Learners will discover a variety of techniques for representing sports data and for extracting narratives from these analytical techniques. The main focus of the introduction is the use of regression analysis to analyze team and player performance data, using examples drawn from the National Football League (NFL), the National Basketball Association (NBA), the National Hockey League (NHL), the English Premier League (EPL, soccer), and the Indian Premier League (IPL, cricket).
Self Paced · 49 hours · English
Data Processing and Manipulation
The "Data Processing and Manipulation" course provides students with a comprehensive understanding of various data processing and manipulation concepts and tools. Participants will learn how to handle missing values, detect outliers, perform sampling and dimension reduction, apply scaling and discretization techniques, and explore data cube and pivot table operations. This course equips students with essential skills for efficiently preparing and transforming data for analysis and decision-making.
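Three of the operations the course names (imputing missing values, flagging outliers, and scaling) can be sketched in a few lines of standard-library Python; the data and the 1.5-standard-deviation threshold below are hypothetical choices for illustration:

```python
from statistics import mean, stdev

raw = [12.0, 15.0, None, 14.0, 99.0, 13.0]  # one missing value, one outlier

# 1. Impute missing values with the mean of the observed values
observed = [x for x in raw if x is not None]
filled = [x if x is not None else mean(observed) for x in raw]

# 2. Flag outliers more than 1.5 standard deviations from the mean
m, s = mean(filled), stdev(filled)
outliers = [x for x in filled if abs(x - m) > 1.5 * s]

# 3. Min-max scale the remaining values into [0, 1]
kept = [x for x in filled if x not in outliers]
lo, hi = min(kept), max(kept)
scaled = [(x - lo) / (hi - lo) for x in kept]
```

Real courses and libraries offer more robust versions of each step (median imputation, IQR-based outlier tests, and so on); the point here is the order: fill gaps first, remove outliers, then scale.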
Self Paced · 29 hours · English
Data Processing Using Python
This course (the English copy of "用Python玩转数据") is mainly for non-computer majors. It starts with the basic syntax of Python, moves on to acquiring data in Python both locally and from the network, then to presenting data, then to conducting basic and advanced statistical analysis and visualization of data, and finally to designing a simple GUI to present and process data, advancing level by level.
Self Paced · 29 hours · English
Advanced Algorithms and Complexity
In previous courses of our online specialization you've learned the basic algorithms, and now you are ready to step into the area of more complex problems and the algorithms that solve them. Advanced algorithms build upon basic ones and use new ideas. We will start with network flows, which are used in typical applications such as optimal matching, finding disjoint paths, and flight scheduling, as well as more surprising ones like image segmentation in computer vision.
Self Paced · 27 hours · English
Introduction to Big Data with Spark and Hadoop
This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark. Bernard Marr defines big data as the digital trace that we are generating in this digital era. You will start the course by understanding what big data is and exploring how insights from big data can be harnessed for a variety of use cases.
Self Paced · 24 hours · English
Fundamentals of Scalable Data Science
Apache Spark is the de facto standard for large-scale data processing. This is the first course of a series of courses towards the IBM Advanced Data Science Specialization. We strongly believe that it is crucial for success to start by learning a scalable data science platform, since memory and CPU constraints are the most limiting factors when it comes to building advanced machine learning models. In this course we teach you the fundamentals of Apache Spark using Python and PySpark.
Self Paced · 22 hours · English
Predictive Modeling and Machine Learning with MATLAB
In this course, you will build on the skills learned in Exploratory Data Analysis with MATLAB and Data Processing and Feature Engineering with MATLAB to increase your ability to harness the power of MATLAB to analyze data relevant to the work you do. These skills are valuable for those who have domain knowledge and some exposure to computational tools, but no programming background.
Self Paced · 22 hours · English
Introduction to Relational Databases (RDBMS)
Are you ready to dive into the world of data engineering? In this beginner level course, you will gain a solid understanding of how data is stored, processed, and accessed in relational databases (RDBMSes). You will work with different types of databases that are appropriate for various data processing requirements. You will begin this course by being introduced to relational database concepts, as well as several industry standard relational databases, including IBM DB2, MySQL, and PostgreSQL.
Self Paced · 19 hours · English
Serverless Data Processing with Dataflow: Develop Pipelines
In this second installment of the Dataflow course series, we are going to dive deeper into developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts. Next, we discuss processing streaming data using windows, watermarks, and triggers. We then cover options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using the State and Timer APIs. We move on to reviewing best practices that help maximize your pipeline performance.
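The core idea behind Beam's fixed (tumbling) windows can be illustrated without Beam at all: assign each event to a window by its timestamp, then aggregate per window. This is a plain-Python sketch with hypothetical events, not Beam's actual API (in Beam you would apply a windowing transform over a timestamped collection):

```python
from collections import defaultdict

# (timestamp_seconds, value) events, possibly arriving out of order
events = [(3, 1), (62, 1), (10, 1), (125, 1), (61, 1)]

def fixed_windows(events, size):
    """Group events into tumbling windows of `size` seconds, keyed by window start."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[(ts // size) * size].append(value)
    return dict(windows)

# Aggregate per 60-second window
counts = {start: sum(vals) for start, vals in fixed_windows(events, 60).items()}
```

Watermarks and triggers, covered in the course, address what this sketch ignores: deciding when a window's result can be emitted even though late events may still arrive.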
Self Paced · 19 hours · English
Big Data Integration and Processing
At the end of the course, you will be able to:
- Retrieve data from example database and big data management systems
- Describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
- Identify when a big data problem needs data integration
- Execute simple big data integration and processing on Hadoop and Spark platforms
This course is for those new to data science. Completion of Intro to Big Data is recommended.
Self Paced · 18 hours · English
Building Modern Node.js Applications on AWS
In modern cloud native application development, the goal is often to build serverless architectures that are scalable, highly available, and fully managed. This means less operational overhead for you and your business, and more focus on the applications and business-specific projects that differentiate you in your marketplace. In this course, we will cover how to build a modern, greenfield serverless backend on AWS.
Self Paced · 17 hours · English
Building Batch Data Pipelines on Google Cloud
Data pipelines typically fall under one of the Extract and Load (EL), Extract, Load and Transform (ELT) or Extract, Transform and Load (ETL) paradigms. This course describes which paradigm should be used and when for batch data. Furthermore, this course covers several technologies on Google Cloud for data transformation including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
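The EL/ELT/ETL distinction comes down to where the transform step happens. Here is a minimal standard-library ETL sketch (hypothetical CSV data, with in-memory SQLite standing in for a warehouse), showing the transform happening before the load, which is what distinguishes ETL from ELT:

```python
import csv
import io
import sqlite3

# Extract: parse rows from a (hypothetical) CSV export
raw = "name,revenue\nacme,1200\nglobex,950\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: normalize names and cast types before loading (the "T" before the "L")
transformed = [(r["name"].upper(), int(r["revenue"])) for r in rows]

# Load: write into the warehouse table (here, an in-memory SQLite stand-in)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, revenue INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
total = db.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
```

In an ELT pipeline the raw rows would be loaded first and the normalization done inside the warehouse (for example, in BigQuery SQL), which is one of the trade-offs this course examines.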
Self Paced · 17 hours · English
Introduction to Business Analytics with R
Nearly every aspect of business is affected by data analytics. For businesses to capitalize on data analytics, they need leaders who understand the business analytic workflow. This course addresses the human skills gap by providing a foundational set of data processing skills that can be applied to many business settings. In this course you will use a data analytic language, R, to efficiently prepare business data for analytic tools such as algorithms and visualizations.
Self Paced · 17 hours · English
Building Modern .NET Applications on AWS
In modern cloud native application development, the goal is often to build serverless architectures that are scalable, highly available, and fully managed. This means less operational overhead for you and your business, and more focus on the applications and business-specific projects that differentiate you in your marketplace. In this course, we will cover how to build a modern, greenfield serverless backend on AWS.
Self Paced · 16 hours · English
Big Data Analysis Deep Dive
The job market for architects, engineers, and analytics professionals with Big Data expertise continues to grow. The Academy's Big Data Career path focuses on the fundamental tools and techniques needed to pursue a career in Big Data. This course includes: data processing with Python, writing and reading SQL queries, transmitting data with MaxCompute, analyzing data with Quick BI, using Hive, Hadoop, and Spark on E-MapReduce, and visualizing data with data dashboards.
Self Paced · 14 hours · English
Social Media Data Analytics
Learner Outcomes: After taking this course, you will be able to:
- Utilize various Application Programming Interface (API) services to collect data from different social media sources such as YouTube, Twitter, and Flickr.
- Process the collected data - primarily structured - using methods involving correlation, regression, and classification to derive insights about the sources and people who generated that data.
- Analyze unstructured data - primarily textual comments - for sentiments expressed in them.
- Use different tools for collecting, analyzing, and exploring social media data for research.
13 hours · English
Introduction to Data Engineering
Start your journey in one of the fastest growing professions today with this beginner-friendly Data Engineering course! You will be introduced to the core concepts, processes, and tools you need to know in order to gain a foundational knowledge of data engineering. You will begin this course by understanding what data engineering is, as well as the roles that Data Engineers, Data Scientists, and Data Analysts play in this exciting field.
Self Paced · 13 hours · English
Introduction to Designing Data Lakes on AWS
In this class, Introduction to Designing Data Lakes on AWS, we will help you understand how to create and operate a data lake in a secure and scalable way, without previous knowledge of data science! Starting with the "WHY" you may want a data lake, we will look at the Data-Lake value proposition, characteristics and components.
Self Paced · 13 hours · English
Deploying Machine Learning Models
In this course we will learn about Recommender Systems (which we will study for the Capstone project), and also look at deployment issues for data products. By the end of this course, you should be able to implement a working recommender system.
Self Paced · 11 hours · English
Basic Data Processing and Visualization
This is the first course in the four-course specialization Python Data Products for Predictive Analytics, introducing the basics of reading and manipulating datasets in Python. In this course, you will learn what a data product is and go through several Python libraries to perform data retrieval, processing, and visualization. This course will introduce you to the field of data science and prepare you for the next three courses in the Specialization: Design Thinking and Predictive Analytics for Data Products, Meaningful Predictive Modeling, and Deploying Machine Learning Models.
Self Paced · 11 hours · English
I/O-efficient algorithms
I/O-efficient algorithms, also known as external memory algorithms, are a class of algorithms designed to efficiently process data that is too large to fit entirely in the main memory (RAM) of a computer. (Cache-oblivious algorithms are a closely related class that achieves I/O efficiency without knowing the memory parameters.) These algorithms are particularly useful when dealing with massive datasets, such as those found in large-scale data processing, database management, and file systems. Operations on data become more expensive the farther the data item sits from the CPU in the memory hierarchy.
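The classic example of this idea is external merge sort: sort memory-sized runs, spill them out, then merge them back with a small heap. The sketch below keeps the runs in memory for simplicity (a real I/O-efficient version writes each sorted run to disk and streams the merge), but the run-then-merge structure is the same:

```python
import heapq

def external_sort(values, chunk_size):
    """Sketch of external merge sort: sort chunk_size-sized runs
    (a real I/O-efficient version spills each run to disk), then
    k-way merge the sorted runs using a small heap."""
    runs = [sorted(values[i:i + chunk_size])
            for i in range(0, len(values), chunk_size)]
    # heapq.merge streams the runs, holding only one element per run in memory
    return list(heapq.merge(*runs))
```

With chunk_size matched to available RAM, each element is read and written only a small constant number of times, which is exactly the cost measure I/O-efficient algorithms optimize.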
Self Paced · 10 hours · English
Serverless Data Processing with Dataflow: Operations
In the last installment of the Dataflow course series, we will introduce the components of the Dataflow operational model. We will examine tools and techniques for troubleshooting and optimizing pipeline performance. We will then review testing, deployment, and reliability best practices for Dataflow pipelines. We will conclude with a review of Templates, which make it easy to scale Dataflow pipelines to organizations with hundreds of users. These lessons will help ensure that your data platform is stable and resilient to unanticipated circumstances.
Self Paced · 10 hours · English
Using Sensors With Your Raspberry Pi
This course on integrating sensors with your Raspberry Pi is course 3 of a Coursera Specialization and can be taken separately or as part of the specialization. Although some material and explanations from the prior two courses are used, this course largely assumes no prior experience with sensors or data processing other than ideas about your own projects and an interest in building projects with sensors. This course focuses on core concepts and techniques in designing and integrating any sensor, rather than overly specific examples to copy.
Self Paced · 9 hours · English
Advanced Data Science Capstone
Completing this project demonstrates a deep understanding of massive parallel data processing, data exploration and visualization, and advanced machine learning and deep learning, as well as the ability to apply that knowledge to a real-world practical use case: justifying architectural decisions, and showing how the characteristics of different algorithms, frameworks, and technologies impact model performance and scalability.
9 hours · English
Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
The Library of Integrative Network-based Cellular Signatures (LINCS) was an NIH Common Fund program that ran for 10 years, from 2012 to 2021. The idea behind the LINCS program was to perturb different types of human cells with many different types of perturbations: drugs and other small molecules; genetic manipulations such as single-gene knockdown, knockout, or overexpression; manipulation of the extracellular microenvironment conditions, for example, growing cells on different surfaces; and more.
Self Paced · 9 hours · English