- Level Professional
- Duration 17 hours
- Offered by Google Cloud
About
Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform or Extract-Transform-Load paradigms. This course describes which paradigm to use, and when, for batch data. It also covers several technologies on Google Cloud for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
Modules
Introduction
1
Videos
- Course Introduction
EL, ELT, ETL
6
Videos
- Module introduction
- EL, ELT, ETL
- Quality considerations
- How to carry out operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
Quiz
1
Assignment
- Introduction to Building Batch Data Pipelines
The Hadoop Ecosystem
3
Videos
- Module introduction
- The Hadoop ecosystem
- Running Hadoop on Dataproc
Cloud Storage instead of HDFS
1
Videos
- Cloud Storage instead of HDFS
Optimizing Dataproc
4
Videos
- Optimizing Dataproc
- Optimizing Dataproc Storage
- Optimizing Dataproc Templates and Autoscaling
- Optimizing Dataproc Monitoring
Lab
1
External Tool
- Running Apache Spark jobs on Dataproc
2
Videos
- Lab Intro: Running Apache Spark jobs on Dataproc
- Getting Started with Google Cloud and Qwiklabs
Module Summary
1
Videos
- Summary
Quiz
1
Assignment
- Executing Spark on Dataproc
Run batch processing pipelines on Dataflow
6
Videos
- Module introduction
- Introduction to Dataflow
- Why customers value Dataflow
- Building Dataflow Pipelines in code
- Key considerations with designing pipelines
- Transforming data with PTransforms
Lab
2
External Tool
- A Simple Dataflow Pipeline (Python)
- Serverless Data Analysis with Dataflow: A Simple Dataflow Pipeline (Java)
1
Videos
- Lab Intro: Building a Simple Dataflow Pipeline
1
Readings
- Completing Labs in this course
Aggregate with GroupByKey and Combine
1
Videos
- Aggregate with GroupByKey and Combine
Lab
2
External Tool
- MapReduce in Beam (Python)
- Serverless Data Analysis with Beam: MapReduce in Beam (Java)
1
Videos
- Lab Intro: MapReduce in Beam
Side Inputs and Windows
1
Videos
- Side Inputs and Windows of data
Lab
2
External Tool
- Serverless Data Analysis with Dataflow: Side Inputs (Python)
- Serverless Data Analysis with Dataflow: Side Inputs (Java)
1
Videos
- Lab Intro: Serverless Data Analysis with Dataflow: Side Inputs
Dataflow Templates and SQL
1
Videos
- Creating and re-using Pipeline Templates
Module Summary
1
Videos
- Summary
Quiz
1
Assignment
- Serverless Data Processing with Dataflow
Cloud Data Fusion
6
Videos
- Module introduction
- Introduction to Cloud Data Fusion
- Components of Cloud Data Fusion
- Cloud Data Fusion UI
- Build a pipeline
- Explore data using wrangler
Lab
1
External Tool
- Building and Executing a Pipeline Graph with Data Fusion
1
Videos
- Lab Intro: Building and executing a pipeline graph in Cloud Data Fusion
Cloud Composer
5
Videos
- Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow Environment
- DAGs and Operators
- Workflow scheduling
- Monitoring and Logging
Lab
1
External Tool
- Lab: An Introduction to Cloud Composer
1
Videos
- Lab Intro: An Introduction to Cloud Composer
Quiz
1
Assignment
- Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Course Summary
1
Videos
- Course Summary
Auto Summary
"Building Batch Data Pipelines on Google Cloud" is a professional-level IT course on Coursera, designed for those in computer science. It focuses on the EL, ELT, and ETL paradigms for batch data, using Google Cloud technologies such as BigQuery, Dataproc, Cloud Data Fusion, and Dataflow. Learners gain hands-on experience through Qwiklabs. The course lasts 1020 minutes (17 hours) and offers various subscription options, including Starter, Professional, and Paid. It is aimed at professionals who want to improve their data transformation skills on Google Cloud.

Google Cloud Training