- Level: Professional
- Duration: 47 hours
- Offered by Columbia University
About
This course is an introduction to sequential decision making and reinforcement learning. We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making. We first model simple decision problems as multi-armed bandit problems and discuss several approaches to evaluative feedback. We then model decision problems as finite Markov decision processes (MDPs) and discuss their solutions via dynamic programming algorithms. We touch on the notion of partial observability in real problems, modeled by POMDPs and solved by online planning methods. Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning. We conclude the course by noting how the two paradigms lie on a spectrum of n-step temporal difference methods. An emphasis on algorithms and examples is a key part of this course.
Modules
Week 1: Getting Started and Course Overview
1 Discussion
- Introduce Yourself!
2 Videos
- Introduction to Decision Making and Reinforcement Learning
- Course Logistics
4 Readings
- Course Syllabus
- About the Instructor
- Academic Honesty Policy
- Discussion Forum Etiquette
Pre-Course Survey
1 Reading
- Pre-Course Survey
Week 1: Decision Making and Utility Theory
4 Videos
- 1.1 Rational Agents and Utility Theory
- 1.2 Preferences and Axioms of Utility Theory
- 1.3 Uncertain and Multi-Attribute Utilities
- 1.4 Value of Perfect Information
1 Reading
- Week 1 Lesson Materials
Week 1: Apply Your Knowledge
- Utility Theory
1 Assignment
- Utility Theory
Week 1: Discussion Questions
2 Discussions
- Discussion on Utility Theory
- Week 1 Questions and Feedback
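Week 1's central idea can be stated compactly: a rational agent ranks actions by expected utility and picks the maximizer. In standard notation (not taken from the course materials):

    EU(a) = \sum_{s} P(s \mid a)\, U(s), \qquad a^{*} = \arg\max_{a} EU(a)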
Week 2: Bandit Problems
3 Videos
- 2.1 Multi-Armed Bandits and Action Values
- 2.2 Ɛ-Greedy Action Selection
- 2.3 Upper Confidence Bound
1 Reading
- Week 2 Lesson Materials
Week 2: Apply Your Knowledge
- Multi-Armed Bandit Problems
1 Assignment
- Multi-Armed Bandit Problems
Week 2: Discussion Questions
2 Discussions
- Discussion on Multi-Armed Bandits
- Week 2 Questions and Feedback
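The Ɛ-greedy strategy from video 2.2 fits in a few lines of code. A minimal sketch, assuming a pull_arm(a) callback that returns a sampled reward; the three Bernoulli arms in the usage lines are hypothetical, not course materials:

    import random

    def epsilon_greedy_bandit(pull_arm, n_arms, n_steps, epsilon=0.1):
        # Incremental sample-average action-value estimates.
        Q = [0.0] * n_arms   # estimated value of each arm
        N = [0] * n_arms     # number of pulls per arm
        for _ in range(n_steps):
            if random.random() < epsilon:
                a = random.randrange(n_arms)                 # explore
            else:
                a = max(range(n_arms), key=lambda i: Q[i])   # exploit
            r = pull_arm(a)
            N[a] += 1
            Q[a] += (r - Q[a]) / N[a]   # incremental mean update
        return Q

    # Usage with three hypothetical Bernoulli arms:
    arms = [0.2, 0.5, 0.8]
    print(epsilon_greedy_bandit(lambda a: float(random.random() < arms[a]), 3, 10000))

Video 2.3's UCB rule differs only in the action-selection line, adding an exploration bonus that shrinks as an arm's pull count grows.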
Week 3: Markov Decision Processes
6 Videos
- 3.1 Markov Decision Process Framework
- 3.2 Gridworld Example
- 3.3 Rewards, Utilities, and Discounting
- 3.4 Policies and Value Functions
- 3.5 Example: Mini-Gridworld
- 3.6 Bellman Optimality Equations
1 Reading
- Week 3 Lesson Materials
Week 3: Apply Your Knowledge
- Bellman Equations
1 Assignment
- Sequential Decision Problems
Week 3: Discussion Questions
3 Discussions
- Discussion on Sequential Decision Problem - Part 1
- Discussion on Sequential Decision Problem - Part 2
- Week 3 Questions and Feedback
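For reference, the Bellman optimality equation from video 3.6, written in standard notation with transition model P, reward R, and discount factor \gamma:

    V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]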
Week 4: Dynamic Programming
6 Videos
- 4.1 Time-Limited Values
- 4.2 Value Iteration
- 4.3 Value Iteration Implementation
- 4.4 Policy Iteration
- 4.5 Example: Mini-Gridworld
- 4.6 Algorithm Complexity
1 Reading
- Week 4 Lesson Materials
Week 4: Apply Your Knowledge
- Value Iteration
- Policy Iteration
1 Assignment
- Markov Decision Processes
Week 4: Discussion Questions
3 Discussions
- Discussion on Markov Decision Processes
- Discussion on Policy Iteration vs. Value Iteration
- Week 4 Questions and Feedback
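Value iteration (videos 4.2 and 4.3) repeatedly applies the Bellman optimality backup until the values stop changing. A minimal sketch, assuming the MDP arrives as a nested table P[s][a] = list of (prob, next_state, reward) triples and that every state has at least one action; both are illustrative assumptions, not the course's interface:

    def value_iteration(P, gamma=0.9, tol=1e-6):
        # P[s][a] is a list of (prob, next_state, reward) triples.
        V = [0.0] * len(P)
        while True:
            delta = 0.0
            for s in range(len(P)):
                # Bellman optimality backup for state s.
                best = max(
                    sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in range(len(P[s]))
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:   # converged within tolerance
                return V

Policy iteration (video 4.4) reaches the same fixed point by alternating full policy evaluation with greedy policy improvement instead of sweeping value backups.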
Week 5: Partially Observable Markov Decision Processes
5 Videos
- 5.1 Partial Observability and POMDP
- 5.2 Belief States
- 5.3 Belief Transition Model
- 5.4 Policies and Value Functions
- 5.5 Example: Mini-Gridworld
2 Readings
- Week 5 Lesson Materials
- Summary of Weeks 3, 4, and 5
Week 5: Apply Your Knowledge
- POMDPs
1 Assignment
- POMDPs
Week 5: Discussion Questions
3 Discussions
- Discussion on POMDPs - Part 1
- Discussion on POMDPs - Part 2
- Week 5 Questions and Feedback
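The belief update underlying videos 5.2 and 5.3, in standard POMDP notation: after taking action a in belief state b and receiving observation o, with observation model Z and normalizing constant \eta, the new belief is

    b'(s') = \eta\, Z(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s)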
Week 6: Monte Carlo Methods
6 Videos
- 6.1 Monte Carlo Methods
- 6.2 First-Visit MC Prediction
- 6.3 State-Action Values
- 6.4 Ɛ-Greedy On-Policy MC Control
- 6.5 On and Off-Policy MC Control
- 6.6 Example: Mini-Gridworld
2 Readings
- Week 6 Lesson Materials
- Post-Lecture Reading
Week 6: Apply Your Knowledge
- Monte Carlo
1 Assignment
- Monte Carlo RL
Week 6: Discussion Questions
2 Discussions
- Discussion on Monte Carlo RL
- Week 6 Questions and Feedback
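First-visit MC prediction (video 6.2) estimates a state's value as the average return observed after its first visit in each episode. A minimal sketch, assuming episodes arrive as lists of (state, reward) pairs, where the reward is the one received after leaving the state; this format is an assumption, not the course's:

    from collections import defaultdict

    def first_visit_mc_prediction(episodes, gamma=1.0):
        returns_sum = defaultdict(float)
        returns_cnt = defaultdict(int)
        for episode in episodes:
            # Record the first time step at which each state appears.
            first_visit = {}
            for t, (s, _) in enumerate(episode):
                first_visit.setdefault(s, t)
            G = 0.0
            for t in range(len(episode) - 1, -1, -1):
                s, r = episode[t]
                G = gamma * G + r            # return from time t onward
                if first_visit[s] == t:      # count only the first visit
                    returns_sum[s] += G
                    returns_cnt[s] += 1
        # Value estimate = average return following the first visit.
        return {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}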
Week 7: Temporal-Difference Learning
5 Videos
- 7.1 Temporal Difference Learning
- 7.2 Temporal Difference Prediction
- 7.3 Batch Updating
- 7.4 TD Learning for Control
- 7.5 SARSA vs Q-Learning
2 Readings
- Week 7 Lesson Materials
- Post-Lecture Readings
Week 7: Apply Your Knowledge
- Tic-Tac-Toe
- Q-Learning
- SARSA
1 Assignment
- Temporal Difference Learning
Week 7: Discussion Questions
2 Discussions
- Discussion on Temporal Difference RL
- Week 7 Questions and Feedback
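The contrast in video 7.5 comes down to one term in the update: SARSA bootstraps from the action the agent actually takes next, while Q-learning bootstraps from the greedy action. A minimal sketch, with Q assumed to be a table indexed as Q[state][action]:

    def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
        # On-policy: bootstrap from the action a2 actually taken in s2.
        Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

    def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
        # Off-policy: bootstrap from the greedy action value in s2.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

Both updates are typically driven by an Ɛ-greedy behavior policy; only the bootstrap target differs.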
Week 8: Reinforcement Learning - Generalization
4 Videos
- 8.1 n-step Temporal Difference Prediction
- 8.2 n-step SARSA
- 8.3 Model-Based Methods
- 8.4 Function Approximation
2 Readings
- Week 8 Lesson Materials
- Post-Lecture Readings
Week 8: Apply Your Knowledge
- Frozen Lake
1 Assignment
- Generalization of Tabular Methods
Week 8: Discussion Questions
2 Discussions
- Reinforcement Learning in Daily Lives
- Week 8 Questions and Feedback
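The unifying idea of video 8.1: the n-step return interpolates between one-step TD (n = 1) and Monte Carlo (n reaching the end of the episode). In standard notation:

    G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n}),
    \qquad V(S_t) \leftarrow V(S_t) + \alpha\,\bigl[ G_{t:t+n} - V(S_t) \bigr]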
Post-Course Survey
1 Reading
- Post-Course Survey
Auto Summary
"Decision Making and Reinforcement Learning" is an engaging course in Data Science & AI, taught by expert instructors on Coursera. It covers utility theory, multi-armed bandit problems, Markov decision processes, POMDPs, and reinforcement learning with a focus on Monte Carlo methods and temporal difference learning. The course runs for approximately 2820 minutes and offers both Starter and Professional subscription options, making it ideal for professionals seeking in-depth knowledge in sequential decision making and reinforcement learning.

Instructor: Tony Dear