- Level: Expert
- Duration: 19 hours
- Course by Google Cloud
About
In this course, we dive into the components and best practices of building high-performing ML systems in production environments. We cover some of the most common considerations behind building these systems, such as static training, dynamic training, static inference, dynamic inference, distributed TensorFlow, and TPUs. This course is devoted to exploring the characteristics that make for a good ML system beyond its ability to make good predictions.
Modules
Welcome to the course
2 videos
- Specialization: Advanced Machine Learning on Google Cloud
- Welcome
2 readings
- How to download course resources
- How to send feedback
Introduction
1 video
- Architecting ML systems
Data extraction, analysis, and preparation
3 videos
- Data extraction, analysis, and preparation
- Model training, evaluation, and validation
- Trained model, prediction service, and performance monitoring
Design Decisions
2 videos
- Training design decisions
- Serving design decisions
Designing an Architecture from Scratch
1 video
- Designing from scratch
Introducing Vertex AI
1 assignment
- Architecting production ML systems
1 external tool
- Lab: Structured data prediction using Vertex AI
3 videos
- Using Vertex AI
- Lab Introduction: Structured data prediction
- Getting Started with Google Cloud and Qwiklabs
1 reading
- Architecting production ML systems
Introduction
1 video
- Introduction
Adapting to Data
2 external tools
- Lab: Introduction to TensorFlow data validation
- Lab: Advanced visualizations with TensorFlow data validation
11 videos
- Adapting to data
- Changing distributions
- Lab: Adapting to data
- Right and wrong decisions
- System failure
- Concept drift
- Actions to mitigate concept drift
- TensorFlow data validation
- Components of TensorFlow data validation
- Lab Introduction: Introduction to TensorFlow data validation
- Lab Introduction: Advanced visualizations with TensorFlow data validation
Mitigating Training-Serving Skew
1 external tool
- Lab: Vertex AI: Training and Serving a Custom Model
1 video
- Mitigating training-serving skew through design
Debugging a Production Model
1 assignment
- Designing adaptable ML systems
1 video
- Diagnosing a production model
1 reading
- Designing adaptable ML systems
Introduction
1 video
- Introduction
Aspects of Performance
2 videos
- Training
- Predictions
Distributed Training
1 external tool
- Lab: Distributed training with Keras
8 videos
- Why distributed training is needed
- Distributed training architectures
- TensorFlow distributed training strategies
- Mirrored strategy
- Multi-worker mirrored strategy
- TPU strategy
- Parameter server strategy
- Lab Introduction: Distributed training with Keras
Faster Input Pipelines
1 external tool
- Lab: TPU-speed data pipelines
2 videos
- Training on large datasets with tf.data API
- Lab Introduction: TPU-speed data pipelines
Inference
1 assignment
- Designing high-performance ML systems
1 video
- Inference
1 reading
- Designing high-performance ML systems
Introduction
2 videos
- Introduction
- Machine Learning on Hybrid Cloud
Kubeflow
1 external tool
- Running Pipelines on Vertex AI
2 videos
- Kubeflow
- Lab Introduction: Kubeflow Pipelines with AI Platform
Optimizing TensorFlow for Mobile
1 assignment
- Hybrid ML systems
3 videos
- TensorFlow Lite
- Optimizing TensorFlow for mobile
- Summary
1 reading
- Hybrid ML systems
Summary
1 video
- Course summary
Wrap Up
2 readings
- Production Machine Learning systems - readings
- All quiz questions and answers
Auto Summary
"Production Machine Learning Systems" is an expert-level course offered by Google Cloud on Coursera, designed for learners who want to master the practice of implementing high-performing machine learning systems in real-world production environments. The program covers essential components and best practices for building robust ML systems whose value goes beyond prediction accuracy. Participants explore static and dynamic training and inference, distributed TensorFlow, and the use of Tensor Processing Units (TPUs). The curriculum aims to build a deep understanding of what makes an ML system effective, with an emphasis on both performance and scalability. With a total duration of 19 hours (1,140 minutes), the course offers in-depth content suited to professionals who want to strengthen their skills in Data Science & AI. It is available through a Starter subscription and is aimed at advanced learners ready to take their knowledge to the next level.

Offered by Google Cloud Training