I am composing a list of questions that can help me refresh my RL knowledge as I study.
Dynamic Programming
- What is DP used for? When should we consider using DP?
- DP uses full backups. What does that mean?
- Vanilla DP needs two arrays for its updates. Describe the in-place algorithm (see the sketch after this list).
- Describe policy iteration.
- Policy evaluation within policy iteration requires multiple sweeps. Describe value iteration, which addresses this problem.
- What is Generalized Policy Iteration (GPI)?
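Below is a minimal sketch, assuming a hypothetical two-state MDP with arbitrary numbers, of in-place value iteration. It illustrates what a full backup looks like (every action and successor state is combined through the model) and how the in-place variant keeps a single value array instead of two.

```python
# In-place value iteration on a tiny, hypothetical tabular MDP.
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.9

# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(n_states)           # a single array: updates happen in place
for _ in range(1000):
    delta = 0.0
    for s in range(n_states):
        # Full backup: every action and every successor state contributes,
        # using the model (P, R) rather than sampled transitions.
        q = R[s] + gamma * P[s] @ V
        v_new = q.max()
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new             # overwrite immediately (in-place sweep)
    if delta < 1e-8:
        break

print("V* ≈", V)
```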
Markov Decision Processes
- Definitions
- What is a Markov process? (S, P)
- What is a Markov reward process? (S, P, R, γ)
- What is a Markov Decision Process? (S, A, P, R, γ)
- What is a value function v(s) (also called the state-value function)? (Use G_t, the total return.) What is an action-value function q(s, a)?
- What is the Bellman Equation for MRPs? Concise version in matrix form? What is the Bellman Expectation Equation? What is the Bellman Optimality Equation? (See the equations after this list.)
- What is a policy π?
- What is an Optimal Value Function? An MDP is “solved” if we know the optimal value fn.
- What is an optimal policy?
- Why are MRPs and MDPs discounted?
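As a reference for the Bellman-equation questions above, the standard forms (in the usual (S, A, P, R, γ) notation, with v the state-value function) are:

```latex
% Bellman equation for an MRP, its matrix form, and the direct solution
v(s) = \mathcal{R}_s + \gamma \sum_{s'} \mathcal{P}_{ss'}\, v(s'),
\qquad v = \mathcal{R} + \gamma \mathcal{P} v
\;\Rightarrow\; v = (I - \gamma \mathcal{P})^{-1} \mathcal{R}

% Bellman expectation equation for an MDP under a policy \pi
v_\pi(s) = \sum_{a} \pi(a \mid s) \Big( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'}\, v_\pi(s') \Big)

% Bellman optimality equation
v_*(s) = \max_{a} \Big( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'}\, v_*(s') \Big)
```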
Planning by Dynamic Programming
- Definitions
- What is the Iterative Policy Evaluation algorithm? (See the sketch after this list.)
- What is the Principle of Optimality? How does it relate to Value Iteration?
- What is Value Iteration? What is the major difference from policy evaluation? VI is equivalent to some form of PE; describe it.
- What is the
- Describe the goals of using DP for prediction and for control.
- Prove that if we act greedily, we are improving the policy. Hint: first prove that q_π(s, π'(s)) ≥ v_π(s) for the greedy policy π', then unroll it to show v_π'(s) ≥ v_π(s).
- Describe some ideas for improving synchronous backup: in-place, prioritised sweeping, and real-time DP.
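A minimal sketch of Iterative Policy Evaluation (prediction), reusing the same kind of hypothetical two-state MDP as the earlier value-iteration snippet and evaluating a uniform-random policy with synchronous Bellman expectation backups:

```python
# Iterative policy evaluation on a tiny, hypothetical tabular MDP.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])   # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                 # R[s, a]
pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform-random policy

V = np.zeros(n_states)
for sweep in range(1000):
    # Synchronous sweep: compute the Bellman expectation backup for every
    # state from the old values, then replace them all at once.
    q = R + gamma * np.einsum('sat,t->sa', P, V)      # q[s, a]
    V_new = (pi * q).sum(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("v_pi ≈", V)
```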
Model-Free Prediction
- Definitions
- What is Monte-Carlo learning? (See the sketch below.)
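A minimal sketch of first-visit Monte-Carlo prediction: estimate v(s) by averaging the return observed after the first visit to s in each sampled episode. The tiny random-walk environment is hypothetical and only there to make the snippet self-contained.

```python
# First-visit Monte-Carlo prediction on a hypothetical episodic random walk.
import random
from collections import defaultdict

gamma = 1.0
TERMINAL = 2

def sample_episode():
    """Random walk on states {0, 1}, reward 1 per step, ends at state 2."""
    s, episode = 0, []
    while s != TERMINAL:
        episode.append((s, 1.0))
        s = random.choice([max(s - 1, 0), s + 1])
    return episode

returns_sum = defaultdict(float)
returns_cnt = defaultdict(int)

for _ in range(5000):
    episode = sample_episode()
    # Compute the return G_t for every step by working backwards.
    G, returns = 0.0, [0.0] * len(episode)
    for t in reversed(range(len(episode))):
        s, r = episode[t]
        G = r + gamma * G
        returns[t] = G
    # First-visit update: only the first occurrence of each state counts.
    seen = set()
    for t, (s, _) in enumerate(episode):
        if s not in seen:
            seen.add(s)
            returns_sum[s] += returns[t]
            returns_cnt[s] += 1

V = {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}
print("First-visit MC estimate of v(s):", V)
```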
- What is