- Level: Professional
- Duration: 22 hours
- Offered by: University of Alberta

About
In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, by learning from the agent's own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment's dynamics, yet it can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal difference updates to radically accelerate learning, as sketched in the example below.

By the end of this course you will be able to:

- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo, Dynamic Programming, and TD
- Implement and apply the TD algorithm for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna
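As a concrete taste of these methods, here is a minimal tabular Dyna-Q sketch in Python. It is an illustration under assumptions, not course material: the environment interface (`num_actions`, `reset()`, and `step(action)` returning `(next_state, reward, done)`) is hypothetical, and the course's graded notebooks use their own agent and environment abstractions. With `planning_steps=0` the planning loop disappears and the update reduces to ordinary Q-learning, which shows how Dyna layers model-based planning on top of a TD control method.

```python
# Minimal tabular Dyna-Q sketch (hypothetical interface, for illustration only).
# With planning_steps=0 this reduces to ordinary Q-learning.
import random
from collections import defaultdict

def dyna_q(env, num_episodes=500, alpha=0.1, gamma=0.99,
           epsilon=0.1, planning_steps=10):
    """Q-learning plus planning from a learned sample model.

    `env` is assumed to expose `num_actions`, `reset() -> state`, and
    `step(action) -> (next_state, reward, done)`, a Gym-like interface.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
    model = {}              # (state, action) -> (reward, next_state, done)

    def q_update(s, a, r, s2, done):
        # Off-policy TD target: bootstrap from the greedy action in s2.
        target = r if done else r + gamma * max(
            Q[(s2, b)] for b in range(env.num_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy: explore with probability epsilon.
            if random.random() < epsilon:
                action = random.randrange(env.num_actions)
            else:
                action = max(range(env.num_actions),
                             key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # (a) Direct RL: learn from the real transition.
            q_update(state, action, reward, next_state, done)

            # (b) Model learning: remember what the world did.
            model[(state, action)] = (reward, next_state, done)

            # (c) Planning: replay simulated transitions from the model.
            for _ in range(planning_steps):
                s, a = random.choice(list(model))
                r, s2, d = model[(s, a)]
                q_update(s, a, r, s2, d)

            state = next_state
    return Q
```

Storing only the most recent transition per state-action pair keeps the sketch short; the course's Dyna-Q similarly replays remembered transitions, and Dyna-Q+ (covered in Module 4) additionally adds an exploration bonus for pairs that have not been tried recently.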
Modules

Course Introduction
1 Discussion
- Meet and Greet!
2 Videos
- Course Introduction
- Meet your instructors!
2 Readings
- Reinforcement Learning Textbook
- Read Me: Pre-requisites and Learning Objectives
Introduction to Monte Carlo Methods
1 Discussion
- Comparing on-policy and off-policy learning
2 Videos
- What is Monte Carlo?
- Using Monte Carlo for Prediction
2 Readings
- Module 1 Learning Objectives
- Weekly Reading
Monte Carlo for Control
3 Videos
- Using Monte Carlo for Action Values
- Using Monte Carlo methods for generalized policy iteration
- Solving the Blackjack Example
Exploration Methods for Monte Carlo
1 Video
- Epsilon-soft policies
Off-policy Learning for Prediction
- Blackjack
1 Assignment
- Graded Quiz
5 Videos
- Why does off-policy learning matter?
- Importance Sampling
- Off-Policy Monte Carlo Prediction
- Emma Brunskill: Batch Reinforcement Learning
- Week 1 Summary
1 Reading
- Chapter Summary
Introduction to Temporal Difference Learning
1 Discussion
- Should we care about TD in the brain?
2 Videos
- What is Temporal Difference (TD) learning?
- Rich Sutton: The Importance of TD Learning
2 Readings
- Module 2 Learning Objectives
- Weekly Reading
Advantages of TD
- Policy Evaluation with Temporal Difference Learning
1 Assignment
- Practice Quiz
4 Videos
- The advantages of temporal difference learning
- Comparing TD and Monte Carlo
- Andy Barto and Rich Sutton: More on the History of RL
- Week 2 Summary
TD for Control
2 Videos
- Sarsa: GPI with TD
- Sarsa in the Windy Grid World
2 Readings
- Module 3 Learning Objectives
- Weekly Reading
Off-policy TD Control: Q-learning
3 Videos
- What is Q-learning?
- Q-learning in the Windy Grid World
- How is Q-learning off-policy?
Expected Sarsa
- Q-Learning and Expected SARSA
1 Assignment
- Practice Quiz
1 Discussion
- How can we use off-policy for learning multiple goals?
4 Videos
- Expected Sarsa
- Expected Sarsa in the Cliff World
- Generality of Expected Sarsa
- Week 3 Summary
1 Reading
- Chapter Summary
What is a Model?
2 Videos
- What is a Model?
- Comparing Sample and Distribution Models
2 Readings
- Module 4 Learning Objectives
- Weekly Reading
Planning
1 Discussion
- Compare Planning and Reasoning
1 Video
- Random Tabular Q-planning
Dyna as a formalism for planning
3 Videos
- The Dyna Architecture
- The Dyna Algorithm
- Dyna & Q-learning in a Simple Maze
Dealing with inaccurate models
- Dyna-Q and Dyna-Q+
2 Assignments
- Practice Assessment
- Replacement Practice Assignment
4 Videos
- What if the model is inaccurate?
- In-depth with changing environments
- Drew Bagnell: self-driving, robotics, and Model Based RL
- Week 4 Summary
2 Readings
- Chapter Summary
- Text Book Part 1 Summary
Course Wrap-up
1 Video
- Congratulations!
Auto Summary
Dive into the world of data science and AI with the "Sample-based Learning Methods" course, offered by the University of Alberta on Coursera. Led by expert instructors, this professional-level course explores algorithms that learn near-optimal policies through trial and error, without prior knowledge of the environment's dynamics. Key topics include Monte Carlo methods, temporal difference learning, Q-learning, and Dyna. Over roughly 22 hours, you'll gain practical skills in implementing these techniques and an understanding of their theoretical foundations. Ideal for data science enthusiasts, this course is available via a Starter subscription.

Instructors

- Martha White
- Adam White