Hadoop Platform and Application Framework

Level Foundation
Duration 26 hours
Offered by University of California San Diego

About

This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience, you will have the opportunity to walk through hands-on examples with Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments you will be guided in how data scientists apply the important concepts and techniques such as Map-Reduce that are used to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.

Modules

Lesson 1: Big Data Hadoop Stack

5 Videos

2 Readings

1 Assignment

1 Assignment

Basic Hadoop Stack

5 Videos

Hadoop Stack Basics
The Apache Framework: Basic Modules
Hadoop Distributed File System (HDFS)
The Hadoop "Zoo"
Hadoop Ecosystem Major Components

2 Readings

Apache Hadoop Ecosystem
Lesson 1 Slides (PDF)

Lesson 2: Hands-On Exploration of the Cloudera VM

2 Videos

2 Readings

2 Videos

Exploring the Cloudera VM: Hands-On Part 1
Exploring the Cloudera VM: Hands-On Part 2

2 Readings

Hardware & Software Requirements
Lesson 2 Slides - Cloudera VM Tour

Lesson 1: Overview of the Hadoop Stack

3 Videos

1 Reading

1 Assignment

1 Assignment

Overview of Hadoop Stack

3 Videos

Overview of the Hadoop Stack
The Hadoop Distributed File System (HDFS) and HDFS2
MapReduce Framework and YARN

1 Reading

Hadoop Basics - Lesson 1 Slides

Lesson 2: The Hadoop Execution Environment

3 Videos

1 Reading

1 Assignment

1 Assignment

Hadoop Execution Environment

3 Videos

The Hadoop Execution Environment
YARN, Tez, and Spark
Hadoop Resource Scheduling

1 Reading

Lesson 2: Hadoop Execution Environment - Slides

Lesson 3: Overview of Hadoop-Based Applications and Services

4 Videos

4 Readings

1 Assignment

1 Assignment

Hadoop Applications

4 Videos

Hadoop-Based Applications
Introduction to Apache Pig
Introduction to Apache HIVE
Introduction to Apache HBASE

4 Readings

Lesson 3: Hadoop-Based Applications Overview - All Slides
Command list for Applications Slides
Tips to handle service connection errors
References for Applications

Lesson 1: HDFS Architecture and Configuration

3 Videos

2 Readings

1 Assignment

1 Assignment

HDFS Architecture

3 Videos

Overview of HDFS Architecture
The HDFS Performance Envelope
Read/Write Processes in HDFS

2 Readings

Lesson 1: Introduction to HDFS - Slides
HDFS references

Lesson 2: HDFS Performance and Tuning

2 Videos

1 Reading

1 Assignment

1 Assignment

HDFS Performance, Tuning, and Robustness

2 Videos

HDFS Tuning Parameters
HDFS Performance and Robustness

1 Reading

Lesson 2: HDFS Performance and Tuning - Slides

Lesson 3: HDFS Access, Commands, APIs, and Applications

4 Videos

2 Readings

1 Assignment

1 Assignment

Accessing HDFS

4 Videos

Overview of HDFS Access, APIs, and Applications
HDFS Commands
Native Java API for HDFS
REST API for HDFS

2 Readings

HDFS Access, APIs
Lesson 3: HDFS Access, APIs, Applications - Slides

Lesson 1: Introduction to Map/Reduce

3 Videos

2 Readings

1 Programming

1 Assignment

1 Programming

Running Wordcount with Hadoop streaming, using Python code

1 Assignment

Lesson 1 Review

3 Videos

Introduction to Map/Reduce
The Map/Reduce Framework
A MapReduce Example: Wordcount in detail

2 Readings

Lesson 1: Introduction to MapReduce - Slides
A Note on Debugging Map/Reduce Programs

Lesson 2: Map/Reduce Examples and Principles

6 Videos

1 Reading

1 Programming

1 Programming

Joining Data

6 Videos

MapReduce: Intro to Examples and Principles
MapReduce Example: Trending Wordcount
MapReduce Example: Joining Data
MapReduce Example: Vector Multiplication
Computational Costs of Vector Multiplication
MapReduce Summary

1 Reading

Lesson 2: MapReduce Examples and Principles - Slides

Lesson 1: Introduction to Apache Spark

2 Videos

2 Readings

1 Assignment

1 Assignment

Spark Lesson 1

2 Videos

Introduction to Apache Spark
Architecture of Spark

2 Readings

Setup PySpark on the Cloudera VM
Lesson 1: Intro to Apache Spark - Slides

Lesson 2: Resilient Distributed Datasets and Transformations

3 Videos

1 Reading

1 Programming

1 Assignment

1 Programming

Simple Join in Spark

1 Assignment

Spark Lesson 2

3 Videos

Resilient Distributed Datasets
Spark Transformations
Wide Transformations

1 Reading

Lesson 2: RDD and Transformations - Slides

Lesson 3: Job Scheduling, Actions, Caching, and Shared Variables

5 Videos

1 Reading

1 Programming

1 Assignment

1 Programming

Advanced Join in Spark

1 Assignment

Spark Lesson 3

5 Videos

Directed Acyclic Graph (DAG) Scheduler
Actions in Spark
Memory Caching in Spark
Broadcast Variables
Accumulators

1 Reading

Lesson 3: Scheduling, Actions, Caching - Slides

Auto Summary

"Hadoop Platform and Application Framework" is a foundational course in the Data Science & AI domain for novice programmers and business professionals. It introduces the core tools used to manage and analyze large datasets through hands-on examples with Hadoop and Spark, and it requires no prior experience. Participants learn to explain the components and processes of the Hadoop architecture, software stack, and execution environment. Guided assignments show how data scientists apply concepts and techniques such as Map-Reduce to fundamental big data problems. By the end of the course, learners can take part in informed discussions about big data and the data analysis process. Offered by Coursera, this beginner-friendly course runs 1560 minutes (26 hours) and is available under the Starter subscription.

Instructors

Natasha Balac, Ph.D.

Paul Rodriguez

Andrea Zonca