- Level: Professional
- Duration: 22 hours
- Offered by: University of Alberta

About
In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, by learning from the agent's own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet it can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal difference updates to radically accelerate learning, as sketched in the example below.

By the end of this course you will be able to:

- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo, Dynamic Programming, and TD
- Implement and apply the TD algorithm for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna
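As a concrete taste of these methods, here is a minimal tabular Dyna-Q sketch in Python. It is an illustration under assumptions, not course material: the environment interface (`num_actions`, `reset()`, and `step(action)` returning `(next_state, reward, done)`) is hypothetical, and the course's graded notebooks use their own agent and environment abstractions. With `planning_steps=0` the planning loop disappears and the update reduces to ordinary Q-learning, which shows how Dyna layers model-based planning on top of a TD control method.

```python
# Minimal tabular Dyna-Q sketch (hypothetical interface, for illustration only).
# With planning_steps=0 this reduces to ordinary Q-learning.
import random
from collections import defaultdict

def dyna_q(env, num_episodes=500, alpha=0.1, gamma=0.99,
           epsilon=0.1, planning_steps=10):
    """Q-learning plus planning from a learned sample model.

    `env` is assumed to expose `num_actions`, `reset() -> state`, and
    `step(action) -> (next_state, reward, done)`, a Gym-like interface.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
    model = {}              # (state, action) -> (reward, next_state, done)

    def q_update(s, a, r, s2, done):
        # Off-policy TD target: bootstrap from the greedy action in s2.
        target = r if done else r + gamma * max(
            Q[(s2, b)] for b in range(env.num_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy: explore with probability epsilon.
            if random.random() < epsilon:
                action = random.randrange(env.num_actions)
            else:
                action = max(range(env.num_actions),
                             key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # (a) Direct RL: learn from the real transition.
            q_update(state, action, reward, next_state, done)

            # (b) Model learning: remember what the world did.
            model[(state, action)] = (reward, next_state, done)

            # (c) Planning: replay simulated transitions from the model.
            for _ in range(planning_steps):
                s, a = random.choice(list(model))
                r, s2, d = model[(s, a)]
                q_update(s, a, r, s2, d)

            state = next_state
    return Q
```

Storing only the most recent transition per state-action pair keeps the sketch short; the course's Dyna-Q similarly replays remembered transitions, and Dyna-Q+ (covered in Module 4) additionally adds an exploration bonus for pairs that have not been tried recently.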
Modules

Course Introduction
1 Discussion
- Meet and Greet!
2 Videos
- Course Introduction
- Meet your instructors!
2 Readings
- Reinforcement Learning Textbook
- Read Me: Pre-requisites and Learning Objectives
Introduction to Monte Carlo Methods
1 Discussion
- Comparing on-policy and off-policy learning
2 Videos
- What is Monte Carlo?
- Using Monte Carlo for Prediction
2 Readings
- Module 1 Learning Objectives
- Weekly Reading
Monte Carlo for Control
3 Videos
- Using Monte Carlo for Action Values
- Using Monte Carlo methods for generalized policy iteration
- Solving the Blackjack Example
Exploration Methods for Monte Carlo
1 Video
- Epsilon-soft policies
Off-policy Learning for Prediction
- Blackjack
1 Assignment
- Graded Quiz
5 Videos
- Why does off-policy learning matter?
- Importance Sampling
- Off-Policy Monte Carlo Prediction
- Emma Brunskill: Batch Reinforcement Learning
- Week 1 Summary
1 Reading
- Chapter Summary
Introduction to Temporal Difference Learning
1 Discussion
- Should we care about TD in the brain?
2 Videos
- What is Temporal Difference (TD) learning?
- Rich Sutton: The Importance of TD Learning
2 Readings
- Module 2 Learning Objectives
- Weekly Reading
Advantages of TD
- Policy Evaluation with Temporal Difference Learning
1 Assignment
- Practice Quiz
4 Videos
- The advantages of temporal difference learning
- Comparing TD and Monte Carlo
- Andy Barto and Rich Sutton: More on the History of RL
- Week 2 Summary
TD for Control
2 Videos
- Sarsa: GPI with TD
- Sarsa in the Windy Grid World
2 Readings
- Module 3 Learning Objectives
- Weekly Reading
Off-policy TD Control: Q-learning
3 Videos
- What is Q-learning?
- Q-learning in the Windy Grid World
- How is Q-learning off-policy?
Expected Sarsa
- Q-Learning and Expected SARSA
1 Assignment
- Practice Quiz
1 Discussion
- How can we use off-policy for learning multiple goals?
4 Videos
- Expected Sarsa
- Expected Sarsa in the Cliff World
- Generality of Expected Sarsa
- Week 3 Summary
1 Reading
- Chapter Summary
What is a Model?
2 Videos
- What is a Model?
- Comparing Sample and Distribution Models
2 Readings
- Module 4 Learning Objectives
- Weekly Reading
Planning
1 Discussion
- Compare Planning and Reasoning
1 Video
- Random Tabular Q-planning
Dyna as a formalism for planning
3 Videos
- The Dyna Architecture
- The Dyna Algorithm
- Dyna & Q-learning in a Simple Maze
Dealing with inaccurate models
- Dyna-Q and Dyna-Q+
2 Assignments
- Practice Assessment
- Replacement Practice Assignment
4 Videos
- What if the model is inaccurate?
- In-depth with changing environments
- Drew Bagnell: self-driving, robotics, and Model Based RL
- Week 4 Summary
2 Readings
- Chapter Summary
- Text Book Part 1 Summary
Course Wrap-up
1 Video
- Congratulations!
Auto Summary
Dive into the world of data science and AI with the "Sample-based Learning Methods" course, offered by the University of Alberta on Coursera. Led by expert instructors, this professional-level course explores algorithms that learn near-optimal policies through trial and error, without prior knowledge of the environment's dynamics. Key topics include Monte Carlo methods, temporal difference learning, Q-learning, and Dyna. Over roughly 22 hours, you'll gain practical skills in implementing these techniques and an understanding of their theoretical foundations. Ideal for data science enthusiasts, this course is available via a Starter subscription.

Instructors

- Martha White
- Adam White