- Level: Foundation
- Duration: 21 hours
- Course by Johns Hopkins University
About
Developing insights about your organization, business, or research project depends on effective modeling and analysis of the data you collect. Building effective models requires understanding the different types of questions you can ask and how to map those questions to your data. Different modeling approaches can be chosen to detect interesting patterns in the data and identify hidden relationships. This course covers the types of questions you can ask of data and the various modeling approaches that you can apply. Topics covered include hypothesis testing, linear regression, nonlinear modeling, and machine learning. With this collection of tools at your disposal, as well as the techniques learned in the other courses in this specialization, you will be able to make key discoveries from your data for improving decision-making throughout your organization. In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
Modules
Modeling Data Basics
2
Readings
- Course Textbook
- The Purpose of Data Science
Types of Data Science Questions
1
Readings
- Types of Data Science Questions
Data Needs
7
Readings
- Data Needs
- Number of observations is too small
- Dataset does not contain the exact variables you are looking for
- Variables in the dataset are not collected in the same year
- Dataset is not representative of the population that you are interested in
- Some variables in the dataset are measured with error
- Variables are confounded
Descriptive and Exploratory Data Analysis
1
Assignment
- Modeling Data Basics Quiz
6
Readings
- Descriptive and Exploratory Data Analysis
- Missing Values
- Shape
- Identifying Outliers
- Evaluating Variables
- Evaluating Relationships
Inference
1
Assignment
- Inference Quiz
3
Readings
- Inference
- Uncertainty
- Random Sampling
Linear Modeling
1
Assignment
- Linear Regression Quiz
12
Readings
- Linear Regression
- Assumptions
- Association
- Association Testing in R
- Fitting the Model
- Model Diagnostics
- Tree Girth and Height Example
- Interpreting the Model
- Variance Explained
- Using broom
- Correlation Is Not Causation
- Confounding
Multiple Linear Regression
1
Assignment
- Multiple Linear Regression Quiz
1
Readings
- Multiple Linear Regression
Beyond Linear Regression
3
Readings
- Beyond Linear Regression
- Mean Different From Expectation?
- Testing Mean Difference From Expectation in R
More Statistical Tests
1
Readings
- More Statistical Tests
Hypothesis Testing
1
Assignment
- Hypothesis Testing Quiz
2
Readings
- Hypothesis Testing
- The infer Package
Prediction Modeling
1
Assignment
- Prediction and Machine Learning Quiz
12
Readings
- Prediction Modeling
- What is Machine Learning?
- Machine Learning Steps
- Data Splitting
- Train, Test, Validate
- Train
- Test
- Validate
- Variable Selection
- Model Selection
- Regression vs. Classification
- Model Accuracy
The tidymodels Ecosystem
1
Assignment
- tidymodels Quiz
5
Readings
- The tidymodels Ecosystem
- Benefits of tidymodels
- Packages of tidymodels
- Example of Continuous Variable Prediction
- Example of Categorical Variable Prediction
Case Study #1: Predicting Annual Air Pollution
1
Labs
- Case Study #1: Predicting Annual Air Pollution
17
Readings
- Case Study #1: Predicting Annual Air Pollution
- The Data
- Data Import
- Data Exploration and Wrangling
- Evaluate Correlation
- Splitting the Data
- Making a Recipe
- Running Preprocessing
- Specifying the Model
- Assessing the Model Fit
- Model Performance: Getting Predicted Values
- Visualizing Model Performance
- Quantifying Model Performance
- Assessing Model Performance on v-folds Using tune
- Random Forest
- Model Tuning
- Final model performance evaluation
Summary of tidymodels
1
Readings
- Summary of tidymodels
Project
1
Assignment
- Course Project Prediction Quiz
1
Peer Review
- Modeling Data in the Tidyverse Course Project
1
Readings
- Important information before you start the quiz
Auto Summary
The course "Modeling Data in the Tidyverse" is a foundation-level course for people interested in big data and analytics who want to strengthen their data modeling and analysis skills. Offered by Johns Hopkins University on Coursera, it covers the types of questions you can ask of your data and the modeling techniques that fit them, including hypothesis testing, linear regression, nonlinear modeling, and machine learning. Over approximately 21 hours (1,260 minutes), the course shows how to build effective models that detect patterns and uncover hidden relationships in data. By the end, learners will have a set of tools for extracting insights and supporting decisions in their organizations. The course assumes familiarity with the R programming language and is part of a broader specialization, building on techniques taught in its other courses. It is available through Coursera's Starter and Professional subscription plans.

Carrie Wright, PhD

Shannon Ellis, PhD

Stephanie Hicks, PhD

Roger D. Peng, PhD