- Level: Foundation
- Duration: 18 hours
- Offered by: University of California San Diego
About
At the end of the course, you will be able to:
- Retrieve data from example databases and big data management systems
- Describe the connections between data management operations and the big data processing patterns needed to use them in large-scale analytical applications
- Identify when a big data problem needs data integration
- Execute simple big data integration and processing on Hadoop and Spark platforms

This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming experience is needed, although you must be able to install applications and use a virtual machine to complete the hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.

Hardware Requirements: (A) quad-core processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB of free disk space. How to find your hardware information: (Windows) open System by clicking the Start button, right-clicking Computer, and then clicking Properties; (Mac) open Overview by clicking the Apple menu and then clicking "About This Mac." Most computers with 8 GB of RAM purchased in the last 3 years will meet the minimum requirements. You will need a high-speed internet connection because you will be downloading files up to 4 GB in size.

Software Requirements: This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded and installed free of charge (except for data charges from your internet provider). Software requirements include Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+, or CentOS 6+, plus VirtualBox 5+.

Modules
Why Big Data Integration and Processing?
1 Discussion
- Getting to know you: Tell us about yourself and why you are taking this course.
3 Videos
- What is in this Course?
- Summary of Big Data Modeling and Management
- Why is Big Data Processing Different?
1 Reading
- Slides: Summary & Why Is Big Data Processing Different
Hands On: Setting Up Your Software Environment
4 Readings
- Downloading and Installing Docker Desktop Instructions
- Introduction to Jupyter Notebooks
- Downloading Hands-On Materials
- Basic terminal shell commands
Querying Data Part 1
4 Videos
- What is Data Retrieval? Part 1
- What is Data Retrieval? Part 2
- Querying Two Relations
- Subqueries
1 Reading
- Slides: What is Data Retrieval?
Hands On
1 Video
- Querying Relational Data with Postgres
1 Reading
- Querying Relational Data with Postgres
Querying Data Part 2
1 Assignment
- Retrieving Big Data Quiz
1 Discussion
- Let's Discuss: MongoDB
3 Videos
- Querying JSON Data with MongoDB
- Aggregation Functions
- Querying Aerospike
1 Reading
- Slides: Querying Data Part 2
Hands On
1 Assignment
- Postgres, MongoDB, and Pandas
2 Videos
- Querying Documents in MongoDB
- Exploring Pandas DataFrames
2 Readings
- Querying Documents in MongoDB
- Exploring Pandas DataFrames
Information Integration
1 Assignment
- Information Integration - Quiz
1 Discussion
- Let's Discuss: Big Data Integration
3 Videos
- Overview of Information Integration
- A Data Integration Scenario
- Integration for Multichannel Customer Analytics
1 Reading
- Slides: Information Integration
Industry Examples for Big Data Integration and Processing
4 Videos
- Big Data Management and Processing Using Splunk and Datameer
- Why Splunk?
- Connected Cars with Ford's OpenXC and Splunk
- Big Data Management and Processing using Datameer
Hands-On: Big Data Management and Processing Using Splunk
1 Assignment
- Hands-On With Splunk
4 Videos
- Installing Splunk Enterprise on Windows
- Installing Splunk Enterprise on Linux
- Exploring Splunk Queries
- Optional: Creating Pivot Reports in Splunk
3 Readings
- Downloading Splunk Enterprise
- Exploring Splunk Queries
- Optional: Instructions for Splunk Pivot Tutorial
Big Data Pipelines and High-level Operations for Big Data Processing
1 Discussion
- Let's Discuss: Big Data Pipelines in Your World
4 Videos
- Big Data Processing Pipelines
- Some High-Level Processing Operations in Big Data Pipelines
- Aggregation Operations in Big Data Pipelines
- Typical Analytical Operations in Big Data Pipelines
1 Reading
- Big Data Processing Pipelines Slides
Big Data Processing Tools and Systems
1 Assignment
- Pipeline and Tools
1 Discussion
- Let's Discuss: Big Data Processing Systems
4 Videos
- Overview of Big Data Processing Systems
- The Integration and Processing Layer
- Introduction to Apache Spark
- Getting Started with Spark
2 Readings
- Big Data Workflow Management
- Slides for Big Data Processing Tools and Systems
Hands-On: Let's Try Spark
1 Assignment
- WordCount in Spark
1 Discussion
- Let's Discuss: Word Count
1 Video
- WordCount in Spark
1 Reading
- WordCount in Spark
Programming in Spark
3 Videos
- Spark Core: Programming In Spark using RDDs in Pipelines
- Spark Core: Transformations
- Spark Core: Actions
1 Reading
- Slides for Module 5 Lesson 1
Main Modules in the Spark Ecosystem
1 Assignment
- More on Spark
1 Discussion
- Let's Discuss: The Spark Ecosystem
4 Videos
- Spark SQL
- Spark Streaming
- Spark MLLib
- Spark GraphX
1 Reading
- Slides for Module 5 Lesson 2
Hands-on: Data Processing in Spark
1 Assignment
- SparkSQL and Spark Streaming
2 Videos
- Exploring SparkSQL and Spark DataFrames
- Analyzing Sensor Data with Spark Streaming
2 Readings
- Exploring SparkSQL and Spark DataFrames
- Analyzing Sensor Data with Spark Streaming
Assignment: Querying and Exporting from MongoDB
1 Assignment
- Check Your Query Results
3 Readings
- Let's Analyze Soccer Tweets!
- Expressing Analytical Questions as MongoDB Queries
- Exporting Data from MongoDB to a CSV File
Assignment: Analysis using Spark
1 Assignment
- Check Your Analysis Results
1 Reading
- Analyzing Tweets About Countries
Auto Summary
"Big Data Integration and Processing" is a foundation-level Coursera course for aspiring data scientists who want to learn data management and analytics from expert instructors. It teaches you to retrieve data from databases and big data management systems, connect data management operations to large-scale analytical applications, and identify when a big data problem requires data integration. No prior programming experience is required, so the course is accessible to beginners, although completing an introductory course on big data is recommended. You will gain hands-on experience executing data integration and processing on Hadoop and Spark, two platforms widely used in data science. The course takes about 1,080 minutes (18 hours) and is available through a Starter subscription. To complete the hands-on work, your computer needs a quad-core processor, 8 GB of RAM, and 20 GB of free disk space, plus a high-speed internet connection for downloading large files. All required software, including Apache Hadoop, can be downloaded and installed at no cost beyond your internet provider's data charges. Compatible operating systems are Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+, and CentOS 6+, with VirtualBox 5+. The course lays a foundation in big data integration and processing for a career in data science.

Ilkay Altintas

Amarnath Gupta