- Level: Professional
- Duration: 47 hours
- Offered by Columbia University
About
This course is an introduction to sequential decision making and reinforcement learning. We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making. We first model simple decision problems as multi-armed bandit problems and discuss several approaches to evaluative feedback. We then model decision problems as finite Markov decision processes (MDPs) and discuss their solutions via dynamic programming algorithms. We touch on the notion of partial observability in real problems, modeled by POMDPs and solved by online planning methods. Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning. We conclude the course by noting how the two paradigms lie on a spectrum of n-step temporal difference methods. An emphasis on algorithms and examples is a key part of this course.
Modules
Week 1: Getting Started and Course Overview
1 Discussion
- Introduce Yourself!
2 Videos
- Introduction to Decision Making and Reinforcement Learning
- Course Logistics
4 Readings
- Course Syllabus
- About the Instructor
- Academic Honesty Policy
- Discussion Forum Etiquette
Pre-Course Survey
1 Reading
- Pre-Course Survey
Week 1: Decision Making and Utility Theory
4 Videos
- 1.1 Rational Agents and Utility Theory
- 1.2 Preferences and Axioms of Utility Theory
- 1.3 Uncertain and Multi-Attribute Utilities
- 1.4 Value of Perfect Information
1 Reading
- Week 1 Lesson Materials
Week 1: Apply Your Knowledge
- Utility Theory
1 Assignment
- Utility Theory
Week 1: Discussion Questions
2 Discussions
- Discussion on Utility Theory
- Week 1 Questions and Feedback
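Week 1's central idea can be stated compactly: a rational agent ranks actions by expected utility and picks the maximizer. In standard notation (not taken from the course materials):

    EU(a) = \sum_{s} P(s \mid a)\, U(s), \qquad a^{*} = \arg\max_{a} EU(a)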
Week 2: Bandit Problems
3 Videos
- 2.1 Multi-Armed Bandits and Action Values
- 2.2 Ɛ-Greedy Action Selection
- 2.3 Upper Confidence Bound
1 Reading
- Week 2 Lesson Materials
Week 2: Apply Your Knowledge
- Multi-Armed Bandit Problems
1 Assignment
- Multi-Armed Bandit Problems
Week 2: Discussion Questions
2 Discussions
- Discussion on Multi-Armed Bandits
- Week 2 Questions and Feedback
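The Ɛ-greedy strategy from video 2.2 fits in a few lines of code. A minimal sketch, assuming a pull_arm(a) callback that returns a sampled reward; the three Bernoulli arms in the usage lines are hypothetical, not course materials:

    import random

    def epsilon_greedy_bandit(pull_arm, n_arms, n_steps, epsilon=0.1):
        # Incremental sample-average action-value estimates.
        Q = [0.0] * n_arms   # estimated value of each arm
        N = [0] * n_arms     # number of pulls per arm
        for _ in range(n_steps):
            if random.random() < epsilon:
                a = random.randrange(n_arms)                 # explore
            else:
                a = max(range(n_arms), key=lambda i: Q[i])   # exploit
            r = pull_arm(a)
            N[a] += 1
            Q[a] += (r - Q[a]) / N[a]   # incremental mean update
        return Q

    # Usage with three hypothetical Bernoulli arms:
    arms = [0.2, 0.5, 0.8]
    print(epsilon_greedy_bandit(lambda a: float(random.random() < arms[a]), 3, 10000))

Video 2.3's UCB rule differs only in the action-selection line, adding an exploration bonus that shrinks as an arm's pull count grows.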
Week 3: Markov Decision Processes
6 Videos
- 3.1 Markov Decision Process Framework
- 3.2 Gridworld Example
- 3.3 Rewards, Utilities, and Discounting
- 3.4 Policies and Value Functions
- 3.5 Example: Mini-Gridworld
- 3.6 Bellman Optimality Equations
1 Reading
- Week 3 Lesson Materials
Week 3: Apply Your Knowledge
- Bellman Equations
1 Assignment
- Sequential Decision Problems
Week 3: Discussion Questions
3 Discussions
- Discussion on Sequential Decision Problem - Part 1
- Discussion on Sequential Decision Problem - Part 2
- Week 3 Questions and Feedback
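For reference, the Bellman optimality equation from video 3.6, written in standard notation with transition model P, reward R, and discount factor \gamma:

    V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]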
Week 4: Dynamic Programming
6 Videos
- 4.1 Time-Limited Values
- 4.2 Value Iteration
- 4.3 Value Iteration Implementation
- 4.4 Policy Iteration
- 4.5 Example: Mini-Gridworld
- 4.6 Algorithm Complexity
1 Reading
- Week 4 Lesson Materials
Week 4: Apply Your Knowledge
- Value Iteration
- Policy Iteration
1 Assignment
- Markov Decision Processes
Week 4: Discussion Questions
3 Discussions
- Discussion on Markov Decision Processes
- Discussion on Policy Iteration vs. Value Iteration
- Week 4 Questions and Feedback
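Value iteration (videos 4.2 and 4.3) repeatedly applies the Bellman optimality backup until the values stop changing. A minimal sketch, assuming the MDP arrives as a nested table P[s][a] = list of (prob, next_state, reward) triples and that every state has at least one action; both are illustrative assumptions, not the course's interface:

    def value_iteration(P, gamma=0.9, tol=1e-6):
        # P[s][a] is a list of (prob, next_state, reward) triples.
        V = [0.0] * len(P)
        while True:
            delta = 0.0
            for s in range(len(P)):
                # Bellman optimality backup for state s.
                best = max(
                    sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in range(len(P[s]))
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:   # converged within tolerance
                return V

Policy iteration (video 4.4) reaches the same fixed point by alternating full policy evaluation with greedy policy improvement instead of sweeping value backups.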
Week 5: Partially Observable Markov Decision Processes
5 Videos
- 5.1 Partial Observability and POMDP
- 5.2 Belief States
- 5.3 Belief Transition Model
- 5.4 Policies and Value Functions
- 5.5 Example: Mini-Gridworld
2 Readings
- Week 5 Lesson Materials
- Summary of Weeks 3, 4, and 5
Week 5: Apply Your Knowledge
- POMDPs
1 Assignment
- POMDPs
Week 5: Discussion Questions
3 Discussions
- Discussion on POMDPs - Part 1
- Discussion on POMDPs - Part 2
- Week 5 Questions and Feedback
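The belief update underlying videos 5.2 and 5.3, in standard POMDP notation: after taking action a in belief state b and receiving observation o, with observation model Z and normalizing constant \eta, the new belief is

    b'(s') = \eta\, Z(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s)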
Week 6: Monte Carlo Methods
6 Videos
- 6.1 Monte Carlo Methods
- 6.2 First-Visit MC Prediction
- 6.3 State-Action Values
- 6.4 Ɛ-Greedy On-Policy MC Control
- 6.5 On and Off-Policy MC Control
- 6.6 Example: Mini-Gridworld
2 Readings
- Week 6 Lesson Materials
- Post-Lecture Reading
Week 6: Apply Your Knowledge
- Monte Carlo
1 Assignment
- Monte Carlo RL
Week 6: Discussion Questions
2 Discussions
- Discussion on Monte Carlo RL
- Week 6 Questions and Feedback
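First-visit MC prediction (video 6.2) estimates a state's value as the average return observed after its first visit in each episode. A minimal sketch, assuming episodes arrive as lists of (state, reward) pairs, where the reward is the one received after leaving the state; this format is an assumption, not the course's:

    from collections import defaultdict

    def first_visit_mc_prediction(episodes, gamma=1.0):
        returns_sum = defaultdict(float)
        returns_cnt = defaultdict(int)
        for episode in episodes:
            # Record the first time step at which each state appears.
            first_visit = {}
            for t, (s, _) in enumerate(episode):
                first_visit.setdefault(s, t)
            G = 0.0
            for t in range(len(episode) - 1, -1, -1):
                s, r = episode[t]
                G = gamma * G + r            # return from time t onward
                if first_visit[s] == t:      # count only the first visit
                    returns_sum[s] += G
                    returns_cnt[s] += 1
        # Value estimate = average return following the first visit.
        return {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}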
Week 7: Temporal-Difference Learning
5 Videos
- 7.1 Temporal Difference Learning
- 7.2 Temporal Difference Prediction
- 7.3 Batch Updating
- 7.4 TD Learning for Control
- 7.5 SARSA vs Q-Learning
2 Readings
- Week 7 Lesson Materials
- Post-Lecture Readings
Week 7: Apply Your Knowledge
- Tic-Tac-Toe
- Q-Learning
- SARSA
1 Assignment
- Temporal Difference Learning
Week 7: Discussion Questions
2 Discussions
- Discussion on Temporal Difference RL
- Week 7 Questions and Feedback
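The contrast in video 7.5 comes down to one term in the update: SARSA bootstraps from the action the agent actually takes next, while Q-learning bootstraps from the greedy action. A minimal sketch, with Q assumed to be a table indexed as Q[state][action]:

    def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
        # On-policy: bootstrap from the action a2 actually taken in s2.
        Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

    def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
        # Off-policy: bootstrap from the greedy action value in s2.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

Both updates are typically driven by an Ɛ-greedy behavior policy; only the bootstrap target differs.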
Week 8: Reinforcement Learning - Generalization
4 Videos
- 8.1 n-step Temporal Difference Prediction
- 8.2 n-step SARSA
- 8.3 Model-Based Methods
- 8.4 Function Approximation
2 Readings
- Week 8 Lesson Materials
- Post-Lecture Readings
Week 8: Apply Your Knowledge
- Frozen Lake
1 Assignment
- Generalization of Tabular Methods
Week 8: Discussion Questions
2 Discussions
- Reinforcement Learning in Daily Lives
- Week 8 Questions and Feedback
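The unifying idea of video 8.1: the n-step return interpolates between one-step TD (n = 1) and Monte Carlo (n reaching the end of the episode). In standard notation:

    G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n}),
    \qquad V(S_t) \leftarrow V(S_t) + \alpha\,\bigl[ G_{t:t+n} - V(S_t) \bigr]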
Post-Course Survey
1 Reading
- Post-Course Survey
Auto Summary
"Decision Making and Reinforcement Learning" is an engaging course in Data Science & AI, taught by expert instructors on Coursera. It covers utility theory, multi-armed bandit problems, Markov decision processes, POMDPs, and reinforcement learning with a focus on Monte Carlo methods and temporal difference learning. The course runs for approximately 2820 minutes and offers both Starter and Professional subscription options, making it ideal for professionals seeking in-depth knowledge in sequential decision making and reinforcement learning.

Instructor: Tony Dear