- Level Foundation
- Duration 24 hours
- Course by Harvard University
- Total students 14,706 enrolled
About
In this course, part of our Professional Certificate Program in Data Science, we cover several standard steps of the data wrangling process, such as importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.
Very rarely is data easily accessible in a data science project. It's more likely for the data to be in a file, a database, or extracted from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse package. The steps that convert data from its raw form to the tidy form are called data wrangling.
This process is a critical step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover critical insights that would otherwise stay hidden.
What you will learn
- Importing data into R from different file formats
- Web scraping
- How to tidy data using the tidyverse to better facilitate analysis
- String processing with regular expressions (regex)
- Wrangling data using dplyr
- How to work with dates and times as file formats
- Text mining
Auto Summary
Unlock the essentials of data wrangling with the "Data Science: Wrangling" course, a foundation-level offering in the IT & Computer Science domain. This course, offered through edX as part of HarvardX's Professional Certificate Program in Data Science, equips learners with key skills for handling and preparing data for analysis. The curriculum covers importing data into R, tidying datasets, processing strings, parsing HTML, working with dates and times, and mining text. Not every analysis requires all of these steps, but each belongs in a data scientist's toolkit, because raw, unstructured data arrives from many sources such as files, databases, and web pages. Across 24 hours of content, the course walks through the data wrangling process so that you can transform messy data into a clean, analyzable format using the tidyverse package. These skills will help you uncover critical insights and make informed decisions based on your data. Aimed at beginners, "Data Science: Wrangling" offers professional subscription options. Join now and master the art of data wrangling to advance your data science capabilities.

Rafael Irizarry