- Level Foundation
- Duration 12 hours
- Course by University of Michigan
About
By the end of this first course in the Total Data Quality specialization, learners will be able to:
1. Identify the essential differences between designed and gathered data and summarize the key dimensions of the Total Data Quality (TDQ) Framework;
2. Define the three measurement dimensions of the Total Data Quality framework, and describe potential threats to data quality along each of these dimensions for both gathered and designed data;
3. Define the three representation dimensions of the Total Data Quality framework, and describe potential threats to data quality along each of these dimensions for both gathered and designed data; and
4. Describe why data analysis defines an important dimension of the Total Data Quality framework, and summarize potential threats to the overall quality of an analysis plan for designed and/or gathered data.
This specialization as a whole aims to explore the Total Data Quality framework in depth and provide learners with more information about the detailed evaluation of total data quality that needs to happen prior to data analysis. The goal is for learners to incorporate evaluations of data quality into their process as a critical component for all projects. We sincerely hope to disseminate knowledge about total data quality to all learners, such as data scientists and quantitative analysts, who have not had sufficient training in the initial steps of the data science process that focus on data collection and evaluation of data quality. We feel that extensive knowledge of data science techniques and statistical analysis procedures will not help a quantitative research study if the data collected/gathered are not of sufficiently high quality.
This specialization will focus on the essential first steps in any type of scientific investigation using data: either generating or gathering data, understanding where the data come from, evaluating the quality of the data, and taking steps to maximize the quality of the data prior to performing any kind of statistical analysis or applying data science techniques to answer research questions. Given this focus, there will be little material on the analysis of data, which is covered in myriad existing Coursera specializations. The primary focus of this specialization will be on understanding and maximizing data quality prior to analysis.
Modules
Welcome!
1
Videos
- Welcome to the Specialization and Course 1!
3
Readings
- Course Syllabus
- Meet your Instructors
- Course Pre-Survey
Introducing Different Types of Data
6
Videos
- Introduction to Course 1: The Total Data Quality Framework
- What Are Designed Data?
- Example: Developing an Online Survey with SurveyMonkey
- What are Gathered Data?
- Example: Scraping Data from the Web
- Hybrid Data: Designed and Gathered
1
Readings
- File for use in next example
Introducing The Data Quality Framework
1
Assignment
- Measurement and Representation Concepts
2
Videos
- The Total Data Quality Framework
- Interview: Perspectives on the Meaning of Total Data Quality
1
Readings
- Interview Guest Biographies
Validity
1
Assignment
- Understanding Validity
5
Videos
- Defining Validity
- Threats to Validity for Designed Data
- Cognitive Interviewing (Think Aloud)
- Try It Out: Using The Survey Quality Predictor Application
- Threats to Validity for Gathered Data
2
Readings
- Interview Guest Biography
- Case Study: The Google Flu Trends Example
Data Origin
1
Assignment
- Understanding Data Origin
3
Videos
- Defining Data Origin
- Data Origin Threats for Designed Data
- Data Origin Threats for Gathered Data
2
Readings
- Case Study: Suchman and Jordan, and Interviewer Effects
- Case Study: COVID-19 Tracking in the U.S.
Data Processing
1
Assignment
- Understanding Data Processing
5
Videos
- Defining Data Processing
- Data Processing Threats for Designed Data
- Case Study: Between-Coder Variance
- Data Processing Threats for Gathered Data
- Case Study: Author Name Ambiguity in Bibliographic Data
1
Readings
- Case Study Guest Contributor Biographies
Data Access
1
Assignment
- Understanding Data Access
7
Videos
- Defining Data Access
- Defining Target Populations
- Part 1 of 2: Data Access Threats for Gathered Data
- Part 2 of 2: Data Access Threats for Gathered Data
- Case Study: Random Samples from Twitter APIs May Not Be Random
- Data Access Threats for Designed Data
- Case Study: Evaluating Sampling Frames/Commercial Data
3
Readings
- Gathering Twitter Data Using APIs (code and step-by-step instructions)
- Articles for the Case Study (Random Samples from Twitter APIs May Not Be Random)
- Files for use in the following example
Data Source
5
Videos
- Data Source Definition
- Data Source Threats for Designed Data
- Data Source Threats for Gathered Data
- Case Study: How Content and User Characteristics Can Impact Quality of Gathered Data
- Case Study: Who is Missing in Twitter User Data?
Data Missingness
1
Assignment
- Understanding Data Missingness
4
Videos
- Defining Data Missingness
- Data Missingness Threats for Designed Data
- Imputing Missing Values Demo, Before and After Estimates
- Data Missingness Threats for Gathered Data
Data Analysis as an Important Aspect of Total Data Quality
1
Assignment
- Data Analysis Threats
5
Videos
- Why is Data Analysis Part of Total Data Quality?
- Threats to the Quality of Data Analysis for Designed Data
- Demo: Alternative Approaches to Analyzing Survey Data
- Threats Concerning Data Analysis for Gathered Data
- Case Study: Algorithm Bias in Gathered Data
3
Readings
- Case Study: Analytic Error in NCSES Surveys
- Optional Tutorial: Using the Free R Software
- Files for the next Demo
Course Conclusion
3
Readings
- Course Conclusion
- References for The Total Data Quality Framework
- Course Post-Survey
Auto Summary
"The Total Data Quality Framework," a University of Michigan course offered on Coursera, falls under Big Data and Analytics and emphasizes the critical importance of data quality. Over 12 hours (720 minutes), learners explore the key dimensions of the Total Data Quality (TDQ) Framework, identify threats to data quality, and learn the distinctions between designed and gathered data. This foundational course is aimed at data scientists and quantitative analysts who want to strengthen their data evaluation skills. Subscription options include Starter and Professional tiers.

Brady T. West

James Wagner

Jinseok Kim

Trent D Buskirk