- Level Foundation
- Duration 24 hours
- Published by IBM
About
Bernard Marr defines Big Data as the digital trace that we are generating in this digital era. In this course, you will learn about the characteristics of Big Data and its application in Big Data Analytics. You will gain an understanding of the features, benefits, limitations, and applications of some of the Big Data processing tools. You'll explore how Hadoop and Hive help leverage the benefits of Big Data while overcoming some of the challenges it poses. Hadoop is an open-source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hive, a data warehouse software, provides an SQL-like interface to efficiently query and manipulate large data sets residing in various databases and file systems that integrate with Hadoop. Apache Spark is an open-source processing engine, built around speed, ease of use, and analytics, that gives users new ways to store and make use of big data. You will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform and goes into the different components that make up Apache Spark. You will also learn about Resilient Distributed Datasets, or RDDs, which enable parallel processing across the nodes of a Spark cluster.
Modules
Introduction to Big Data
1
Assignment
- Practice Quiz: Introduction to Big Data
8
Videos
- Course Introduction
- What is Big Data?
- Impact of Big Data
- Parallel Processing, Scaling, and Data Parallelism
- Big Data Tools and Ecosystem
- Open Source and Big Data
- Beyond the Hype
- Big Data Use Cases
1
Readings
- Summary and Highlights: Introduction to Big Data
Module 1 Glossary and Graded Quiz
1
Assignment
- Graded Quiz: What Is Big Data?
Introduction to Hadoop
1
Assignment
- Practice Quiz: Introduction to Hadoop
3
External Tool
- Hands-on Lab: Getting Started with Hive
- Hands-on Lab: Hadoop MapReduce
- Hands-on Lab: Hadoop Cluster (Optional)
6
Videos
- Introduction to Hadoop
- Intro to MapReduce
- Hadoop Ecosystem
- HDFS
- HIVE
- HBASE
1
Readings
- Summary and Highlights: Introduction to Hadoop
Module 2 Cheat Sheet, Glossary and Graded Quiz
1
Assignment
- Graded Quiz: Introduction to Hadoop Ecosystem
Introduction to Apache Spark
1
Assignment
- Practice Quiz: Introduction to Apache Spark
2
External Tool
- Practice Lab: Getting Started with Pyspark and Pandas
- Hands-on Lab: Getting Started with Spark using Python
5
Videos
- Why use Apache Spark?
- Functional Programming Basics
- Parallel Programming using Resilient Distributed Datasets
- Scale out / Data Parallelism in Apache Spark
- Dataframes and SparkSQL
1
Readings
- Summary and Highlights: Introduction to Apache Spark
Module 3 Cheat Sheet, Glossary and Graded Quiz
1
Assignment
- Graded Quiz: Apache Spark
Introduction to DataFrames & Spark SQL
1
Assignment
- Practice Quiz: Introduction to DataFrames & Spark SQL
2
External Tool
- Hands-on Lab: Introduction to DataFrames
- Hands-on Lab: Introduction to SparkSQL
5
Videos
- RDDs in Parallel Programming and Spark
- Data-frames and Datasets
- Catalyst and Tungsten
- ETL with DataFrames
- Real-world usage of SparkSQL
1
Readings
- Summary and Highlights: Introduction to DataFrames and Spark SQL
Module 4 Cheat Sheet, Glossary and Graded Quiz
1
Assignment
- Graded Quiz: DataFrames and Spark SQL
Spark Architecture
1
Assignment
- Practice Quiz: Spark Architecture
1
External Tool
- Hands-on Lab: Submit Apache Spark Applications
3
Videos
- Apache Spark Architecture
- Overview of Apache Spark Cluster Modes
- How to Run an Apache Spark Application
1
Readings
- Summary and Highlights: Spark Architecture
Spark Runtime Environments
1
Assignment
- Practice Quiz: Spark Runtime Environments
1
External Tool
- Hands-on Lab: Apache Spark on Kubernetes
3
Videos
- Using Apache Spark on IBM Cloud
- Setting Apache Spark Configuration
- Running Spark on Kubernetes
1
Readings
- Summary and Highlights: Spark Runtime Environments
Module 5 Cheat Sheet, Glossary and Graded Quiz
1
Assignment
- Graded Quiz: Development and Runtime Environment Options
Introduction to Monitoring and Tuning
1
Assignment
- Practice Quiz: Introduction to Monitoring and Tuning
1
External Tool
- Hands-on Lab: Monitoring and Performance Tuning
5
Videos
- The Apache Spark User Interface
- Monitoring Application Progress
- Debugging Apache Spark Application Issues
- Understanding Memory Resources
- Understanding Processor Resources
1
Readings
- Summary and Highlights: Introduction to Monitoring and Tuning
Module 6 Cheat Sheet, Glossary and Graded Quiz
1
Assignment
- Graded Quiz: Monitoring and Tuning
Project: Data Analysis using Apache Spark
2
External Tool
- Practice Project: Data Processing Using Spark
- Final Project: Data Analysis using Spark
Course Final Assessment
1
Assignment
- Final Assessment
1
Readings
- Instructions for the Final Assessment
Course Wrap-Up
2
Readings
- Congratulations and Next Steps
- Thanks from the Course Team
Auto Summary
"Introduction to Big Data with Spark and Hadoop" is a foundation-level course designed by IBM and offered on Coursera. It is aimed at learners interested in IT and Computer Science and covers the essentials of big data and its impact on analytics. Starting from Bernard Marr's definition of big data as the digital trace we generate in this digital era, you will learn the fundamental characteristics of big data and its applications. The course then introduces key concepts such as parallel processing and scaling, along with the tools Apache Hadoop and Apache Spark. The curriculum covers the Hadoop ecosystem, including the Hadoop Distributed File System (HDFS), MapReduce, HBase, and Hive, which offers an SQL-like interface for handling large datasets. You will then move on to Apache Spark, exploring its architecture, DataFrames, SparkSQL, and the Spark application UI for monitoring application progress.

The course is self-paced and runs about 24 hours (1,440 minutes). It includes hands-on labs that use Docker, Kubernetes, Python, and Jupyter Notebooks to build practical skills in big data processing. It is suitable for beginners and is available through several subscription options, including Starter, Professional, and Paid plans. It fits both learners starting a career in big data and those expanding existing knowledge of big data analytics.

Aije Egwaikhide

Romeo Kienzler

Rav Ahuja