- Level Foundation
- Duration 26 hours
- Offered by University of California San Diego
About
This course is for novice programmers and business people who want to understand the core tools used to wrangle and analyze big data. No prior experience is required: you will walk through hands-on examples with Hadoop and Spark, two of the most common frameworks in the industry. You will become comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. The assignments guide you through how data scientists apply important concepts and techniques, such as Map-Reduce, to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.
Modules
Lesson 1: Big Data Hadoop Stack
1
Assignment
- Basic Hadoop Stack
5
Videos
- Hadoop Stack Basics
- The Apache Framework: Basic Modules
- Hadoop Distributed File System (HDFS)
- The Hadoop "Zoo"
- Hadoop Ecosystem Major Components
2
Readings
- Apache Hadoop Ecosystem
- Lesson 1 Slides (PDF)
Lesson 2: Hands-On Exploration of the Cloudera VM
2
Videos
- Exploring the Cloudera VM: Hands-On Part 1
- Exploring the Cloudera VM: Hands-On Part 2
2
Readings
- Hardware & Software Requirements
- Lesson 2 Slides - Cloudera VM Tour
Lesson 1: Overview of the Hadoop Stack
1
Assignment
- Overview of Hadoop Stack
3
Videos
- Overview of the Hadoop Stack
- The Hadoop Distributed File System (HDFS) and HDFS2
- MapReduce Framework and YARN
1
Readings
- Hadoop Basics - Lesson 1 Slides
Lesson 2: The Hadoop Execution Environment
1
Assignment
- Hadoop Execution Environment
3
Videos
- The Hadoop Execution Environment
- YARN, Tez, and Spark
- Hadoop Resource Scheduling
1
Readings
- Lesson 2: Hadoop Execution Environment - Slides
Lesson 3: Overview of Hadoop-Based Applications and Services
1
Assignment
- Hadoop Applications
4
Videos
- Hadoop-Based Applications
- Introduction to Apache Pig
- Introduction to Apache HIVE
- Introduction to Apache HBASE
4
Readings
- Lesson 3: Hadoop-Based Applications Overview - All Slides
- Command list for Applications Slides
- Tips to handle service connection errors
- References for Applications
Lesson 1: HDFS Architecture and Configuration
1
Assignment
- HDFS Architecture
3
Videos
- Overview of HDFS Architecture
- The HDFS Performance Envelope
- Read/Write Processes in HDFS
2
Readings
- Lesson 1: Introduction to HDFS - Slides
- HDFS references
Lesson 2: HDFS Performance and Tuning
1
Assignment
- HDFS Performance, Tuning, and Robustness
2
Videos
- HDFS Tuning Parameters
- HDFS Performance and Robustness
1
Readings
- Lesson 2: HDFS Performance and Tuning - Slides
Lesson 3: HDFS Access, Commands, APIs, and Applications
1
Assignment
- Accessing HDFS
4
Videos
- Overview of HDFS Access, APIs, and Applications
- HDFS Commands
- Native Java API for HDFS
- REST API for HDFS
2
Readings
- HDFS Access, APIs
- Lesson 3: HDFS Access, APIs, Applications - Slides
Lesson 1: Introduction to Map/Reduce
- Running Wordcount with Hadoop streaming, using Python code
1
Assignment
- Lesson 1 Review
3
Videos
- Introduction to Map/Reduce
- The Map/Reduce Framework
- A MapReduce Example: Wordcount in detail
2
Readings
- Lesson 1: Introduction to MapReduce - Slides
- A note on debugging map/reduce programs.
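For orientation, the wordcount exercise named in this lesson can be sketched in plain Python. This is a minimal illustration and not the course's own code: the function names `mapper`, `reducer`, and `run_local` are hypothetical, and the sort step stands in for the shuffle that Hadoop performs between the map and reduce phases. Records use the tab-separated key/value text convention of Hadoop streaming.

```python
from itertools import groupby


def record_key(record):
    """Return the key (the part before the first tab) of a 'key\tvalue' record."""
    return record.split("\t", 1)[0]


def mapper(lines):
    """Emit one 'word\t1' record for every word in the input lines."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; records must already be sorted by key."""
    for word, group in groupby(sorted_records, key=record_key):
        total = sum(int(rec.split("\t", 1)[1]) for rec in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> sort (stand-in for the shuffle) -> reduce in one process."""
    shuffled = sorted(mapper(lines), key=record_key)
    return list(reducer(shuffled))


if __name__ == "__main__":
    for record in run_local(["the quick fox", "the lazy dog the end"]):
        print(record)
```

On a real cluster the mapper and reducer would normally be two separate scripts that read from standard input and print to standard output, with Hadoop streaming handling the sort between them. Running everything locally, as here, is mainly a way to check the logic before submitting a job.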
Lesson 2: Map/Reduce Examples and Principles
- Joining Data
6
Videos
- MapReduce: Intro to Examples and Principles
- MapReduce Example: Trending Wordcount
- MapReduce Example: Joining Data
- MapReduce Example: Vector Multiplication
- Computational Costs of Vector Multiplication
- MapReduce Summary
1
Readings
- Lesson 2: MapReduce Examples and Principles - Slides
Lesson 1: Introduction to Apache Spark
1
Assignment
- Spark Lesson 1
2
Videos
- Introduction to Apache Spark
- Architecture of Spark
2
Readings
- Setup PySpark on the Cloudera VM
- Lesson 1: Intro to Apache Spark - Slides
Lesson 2: Resilient Distributed Datasets and Transformations
- Simple Join in Spark
1
Assignment
- Spark Lesson 2
3
Videos
- Resilient Distributed Datasets
- Spark Transformations
- Wide Transformations
1
Readings
- Lesson 2: RDD and Transformations - Slides
Lesson 3: Job scheduling, Actions, Caching and Shared Variables
- Advanced Join in Spark
1
Assignment
- Spark Lesson 3
5
Videos
- Directed Acyclic Graph (DAG) Scheduler
- Actions in Spark
- Memory Caching in Spark
- Broadcast Variables
- Accumulators
1
Readings
- Lesson 3: Scheduling, Actions, Caching - Slides
Auto Summary
"Hadoop Platform and Application Framework" is a foundational course in the Data Science & AI domain, designed for novice programmers and business professionals. It introduces the core tools for managing and analyzing large datasets through hands-on examples with Hadoop and Spark. No prior experience is required. Participants learn to explain the components and processes of the Hadoop architecture, software stack, and execution environment, and guided assignments show how data scientists apply concepts and techniques such as Map-Reduce to fundamental big data problems. By the end of the course, learners can take part in informed discussions about big data and the data analysis process. Offered by Coursera, this beginner-friendly course spans 1560 minutes and is available under the Starter subscription.

Natasha Balac, Ph.D.

Paul Rodriguez

Andrea Zonca