I am composing a list of questions that can help me refresh my RL knowledge as I study.
Dynamic Programming
- What is DP used for? When should we consider using DP?
- DP uses full backups; what does that mean?
- Vanilla DP needs two arrays: one for the old values and one for the new. Describe the in-place algorithm.
- Describe policy iteration (a toy sketch with in-place evaluation follows this list).
- Policy evaluation inside policy iteration requires multiple sweeps to converge. Describe value iteration, which avoids this.
- What is Generalized Policy Iteration (GPI)?
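As a concrete reminder of GPI and in-place evaluation, here is a minimal policy-iteration sketch in Python. The 2-state MDP, its transition table `P`, and every number in it are made up purely for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, made up purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.7, 1, 1.0), (0.3, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
n_states, n_actions, gamma, theta = 2, 2, 0.9, 1e-8

def evaluate(policy, V):
    """In-place iterative policy evaluation: a single value array, full backups."""
    while True:
        delta = 0.0
        for s in range(n_states):
            v_old = V[s]
            V[s] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:
            return V

def improve(V):
    """Greedy policy improvement via a one-step lookahead on V."""
    policy = np.zeros(n_states, dtype=int)
    for s in range(n_states):
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
             for a in range(n_actions)]
        policy[s] = int(np.argmax(q))
    return policy

def policy_iteration():
    """GPI in its simplest form: evaluate, act greedily, repeat until stable."""
    V, policy = np.zeros(n_states), np.zeros(n_states, dtype=int)
    while True:
        V = evaluate(policy, V)
        new_policy = improve(V)
        if np.array_equal(new_policy, policy):  # greedy policy stopped changing
            return policy, V
        policy = new_policy

print(policy_iteration())
```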
Markov Decision Processes
- Definitions
    - What is a Markov process? ⟨S, P⟩
    - What is a Markov reward process? ⟨S, P, R, γ⟩
    - What is a Markov Decision Process? ⟨S, A, P, R, γ⟩
    - What is the Value Function (also called the state value function)? (Use G_t, the total return.) What is an action-value function?
    - What is the Bellman Equation for MRPs? Concise version in matrix form? (Written out after this list.) What is the Bellman Expectation Equation? What is the Bellman Optimality Equation?
    - What is a policy π?
    - What is an Optimal Value Function? An MDP is “solved” if we know the optimal value fn.
    - What is
- Why are MRPs and MDPs discounted?
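For the Bellman-equation questions above, here is the MRP case and its concise matrix form, written out as a reminder of the standard result:

```latex
% Bellman equation for an MRP \langle S, P, R, \gamma \rangle:
v(s) = \mathbb{E}\!\left[ R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s \right]
     = \mathcal{R}_s + \gamma \sum_{s' \in S} \mathcal{P}_{ss'}\, v(s')

% Stacking the states into vectors gives the matrix form, which can be
% solved directly (O(n^3)) for small MRPs:
v = \mathcal{R} + \gamma \mathcal{P} v
\quad\Longrightarrow\quad
v = (I - \gamma \mathcal{P})^{-1} \mathcal{R}
```

For large state spaces the direct inverse is impractical, which is where the iterative methods (DP, and later MC and TD) come in.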
Planning by Dynamic Programming
- Definitions
    - What is the Iterative Policy Evaluation algorithm?
    - What is the Principle of Optimality? How does it relate to Value Iteration?
    - What is Value Iteration? What is the major difference from policy evaluation? VI is equivalent to some form of PE; describe it. (A short code sketch follows this list.)
    - What is the
- Describe the goals of using DP for prediction and for control.
- Prove that if we act greedily, we are improving the policy. Hint: first prove q_π(s, π′(s)) ≥ v_π(s). (A proof sketch follows this list.)
- Describe some ideas for improving on synchronous backups: in-place DP, prioritised sweeping, and real-time DP.
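A minimal value-iteration sweep, reusing the illustrative MDP `P` and the helpers from the policy-iteration sketch earlier (so it assumes that block has already been run). The point to notice is the max over actions inside the backup, which is what separates it from plain policy evaluation:

```python
def value_iteration():
    """Value iteration: each sweep applies the Bellman optimality backup,
    i.e. an evaluation backup with a max over actions folded in (in-place)."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v_old = V[s]
            V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions))
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:
            return improve(V), V  # extract the greedy policy only at the end
```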
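And a sketch of the standard greedy policy improvement argument, for the proof question above:

```latex
% Let \pi'(s) = \operatorname*{arg\,max}_a q_\pi(s, a) be the greedy policy. First,
q_\pi(s, \pi'(s)) = \max_a q_\pi(s, a) \;\ge\; q_\pi(s, \pi(s)) = v_\pi(s).

% Unrolling this inequality step by step gives v_{\pi'} \ge v_\pi:
v_\pi(s) \le q_\pi(s, \pi'(s))
         = \mathbb{E}_{\pi'}\!\left[ R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s \right] \\
         \le \mathbb{E}_{\pi'}\!\left[ R_{t+1} + \gamma\, q_\pi(S_{t+1}, \pi'(S_{t+1})) \mid S_t = s \right] \\
         \le \dots \le \mathbb{E}_{\pi'}\!\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s \right]
         = v_{\pi'}(s).
```

If the improvement ever stops (the inequality becomes an equality everywhere), the Bellman Optimality Equation is satisfied and the policy is optimal.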
Model-Free Prediction
- Definitions
    - What is Monte-Carlo Learning?
    - What is