Introduction to Big Data with Spark and Hadoop

This self-paced IBM course introduces big data, which Bernard Marr defines as the digital trace that we are generating in this digital era. You will start by understanding what big data is and exploring how insights from it can be harnessed for a variety of use cases. You will become familiar with the characteristics of big data and its application in big data analytics, and you will gain hands-on experience with big data processing tools such as Apache Hadoop and Apache Spark.

Level Foundation
Duration 24 hours
Offered by IBM

About

Bernard Marr defines big data as the digital trace that we are generating in this digital era. In this course, you will learn about the characteristics of big data and its application in big data analytics, and about the features, benefits, limitations, and applications of several big data processing tools. You'll explore how Hadoop and Hive help you capture the benefits of big data while overcoming some of the challenges it poses. Hadoop is an open-source framework for the distributed processing of large data sets across clusters of computers using simple programming models. Hive is data warehouse software that provides an SQL-like interface for querying and manipulating large data sets stored in the databases and file systems that integrate with Hadoop. Apache Spark is an open-source processing engine, built around speed, ease of use, and analytics, that gives users new ways to make use of big data. The course gives an overview of the platform and the components that make up Apache Spark, and shows how to use Spark to deliver reliable insights. You will also learn about Resilient Distributed Datasets (RDDs), which enable parallel processing across the nodes of a Spark cluster.
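The idea of parallel processing across partitions of a data set, mentioned above for Hadoop and for Spark's RDDs, can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: it does not use the Spark or Hadoop APIs, worker threads stand in for cluster nodes, and the helper names (`split_into_partitions`, `count_words`, `merge_counts`, `parallel_word_count`) and the sample lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists (hypothetical helper)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Map step: count the words found in a single partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


def parallel_word_count(lines, num_partitions=3):
    """Count words by processing each partition independently, then merging."""
    partitions = split_into_partitions(lines, num_partitions)
    # Each worker thread stands in for a cluster node handling one partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    sample_lines = [
        "big data needs parallel processing",
        "spark processes big data in memory",
        "hadoop stores big data in hdfs",
    ]
    totals = parallel_word_count(sample_lines)
    print(totals["big"], totals["data"], totals["spark"])  # prints: 3 3 1
```

The same split, map, and merge shape is what a cluster framework applies at much larger scale, with the added concerns of data placement and fault tolerance that this sketch leaves out.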

Modules

Introduction to Big Data

8 Videos

1 Reading

1 Plugin

1 Assignment

1 Assignment

Practice Quiz: Introduction to Big Data

8 Videos

Course Introduction
What is Big Data?
Impact of Big Data
Parallel Processing, Scaling, and Data Parallelism
Big Data Tools and Ecosystem
Open Source and Big Data
Beyond the Hype
Big Data Use Cases

1 Reading

Summary and Highlights: Introduction to Big Data

Module 1 Glossary and Graded Quiz

1 Plugin

1 Assignment

1 Assignment

Graded Quiz: What Is Big Data?

Introduction to Hadoop

6 Videos

1 Reading

3 External Tools

1 Assignment

1 Assignment

Practice Quiz: Introduction to Hadoop

3 External Tools

Hands-on Lab: Getting Started with Hive
Hands-on Lab: Hadoop MapReduce
Hands-on Lab: Hadoop Cluster (Optional)

6 Videos

Introduction to Hadoop
Intro to MapReduce
Hadoop Ecosystem
HDFS
Hive
HBase

1 Reading

Summary and Highlights: Introduction to Hadoop

Module 2 Cheat Sheet, Glossary and Graded Quiz

2 Plugins

1 Assignment

1 Assignment

Graded Quiz: Introduction to Hadoop Ecosystem

Introduction to Apache Spark

5 Videos

1 Reading

2 External Tools

1 Assignment

1 Assignment

Practice Quiz: Introduction to Apache Spark

2 External Tools

Practice Lab: Getting Started with PySpark and Pandas
Hands-on Lab: Getting Started with Spark using Python

5 Videos

Why use Apache Spark?
Functional Programming Basics
Parallel Programming using Resilient Distributed Datasets
Scale out / Data Parallelism in Apache Spark
DataFrames and SparkSQL

1 Reading

Summary and Highlights: Introduction to Apache Spark

Module 3 Cheat Sheet, Glossary and Graded Quiz

2 Plugins

1 Assignment

1 Assignment

Graded Quiz: Apache Spark

Introduction to DataFrames & Spark SQL

5 Videos

1 Reading

2 External Tools

2 Plugins

1 Assignment

1 Assignment

Practice Quiz: Introduction to DataFrames & Spark SQL

2 External Tools

Hands-on Lab: Introduction to DataFrames
Hands-on Lab: Introduction to SparkSQL

5 Videos

RDDs in Parallel Programming and Spark
DataFrames and Datasets
Catalyst and Tungsten
ETL with DataFrames
Real-world usage of SparkSQL

1 Reading

Summary and Highlights: Introduction to DataFrames and Spark SQL

Module 4 Cheat Sheet, Glossary and Graded Quiz

2 Plugins

1 Assignment

1 Assignment

Graded Quiz: DataFrames and Spark SQL

Spark Architecture

3 Videos

1 Reading

1 External Tool

1 Assignment

1 Assignment

Practice Quiz: Spark Architecture

1 External Tool

Hands-on Lab: Submit Apache Spark Applications

3 Videos

Apache Spark Architecture
Overview of Apache Spark Cluster Modes
How to Run an Apache Spark Application

1 Reading

Summary and Highlights: Spark Architecture

Spark Runtime Environments

3 Videos

1 Reading

1 External Tool

2 Plugins

1 Assignment

1 Assignment

Practice Quiz: Spark Runtime Environments

1 External Tool

Hands-on Lab: Apache Spark on Kubernetes

3 Videos

Using Apache Spark on IBM Cloud
Setting Apache Spark Configuration
Running Spark on Kubernetes

1 Reading

Summary and Highlights: Spark Runtime Environments

Module 5 Cheat Sheet, Glossary and Graded Quiz

2 Plugins

1 Assignment

1 Assignment

Graded Quiz: Development and Runtime Environment Options

Introduction to Monitoring and Tuning

5 Videos

1 Reading

1 External Tool

1 Plugin

1 Assignment

1 Assignment

Practice Quiz: Introduction to Monitoring and Tuning

1 External Tool

Hands-on Lab: Monitoring and Performance Tuning

5 Videos

The Apache Spark User Interface
Monitoring Application Progress
Debugging Apache Spark Application Issues
Understanding Memory Resources
Understanding Processor Resources

1 Reading

Summary and Highlights: Introduction to Monitoring and Tuning

Module 6 Cheat Sheet, Glossary and Graded Quiz

2 Plugins

1 Assignment

1 Assignment

Graded Quiz: Monitoring and Tuning

Project: Data Analysis using Apache Spark

2 External Tools

1 Plugin

2 External Tools

Practice Project: Data Processing Using Spark
Final Project: Data Analysis using Spark

Course Final Assessment

1 Reading

1 Assignment

1 Assignment

Final Assessment

1 Reading

Instructions for the Final Assessment

Course Wrap-Up

2 Readings

1 Plugin

2 Readings

Congratulations and Next Steps
Thanks from the Course Team

Auto Summary

Embark on a journey into the world of big data with "Introduction to Big Data with Spark and Hadoop," a foundational course designed by IBM and offered on Coursera. It is tailored for those interested in IT and computer science and covers the essentials of big data and its impact on analytics. Starting from Bernard Marr's definition of big data as the digital trace that we are generating in this digital era, you will begin with the fundamental characteristics of big data and its applications. The course then guides you through key concepts such as parallel processing and scaling, and introduces Apache Hadoop and Apache Spark. The curriculum covers the Hadoop ecosystem, including the Hadoop Distributed File System (HDFS), MapReduce, HBase, and Hive, which offers an SQL-like interface for handling large datasets. You will then move on to Apache Spark, exploring its architecture, DataFrames, Spark SQL, and the Spark application UI for monitoring application progress. The self-paced course runs 1,440 minutes (24 hours) and features hands-on labs that use Docker, Kubernetes, Python, and Jupyter Notebooks to reinforce practical skills in big data processing. Aimed at beginners, it is available through several subscription options, including Starter, Professional, and Paid plans. Whether you want to start a career in big data or expand your existing knowledge, the course offers tools and insights for the field of big data analytics.

Instructors

Aije Egwaikhide
Romeo Kienzler
Rav Ahuja
