I am composing a list of questions that can help me refresh my RL knowledge as I study.
Dynamic Programming
- What is DP used for? When should we consider using DP?
- DP uses full backups; what does that mean?
- Vanilla DP needs two arrays (old and new values) for its updates. Describe the in-place algorithm.
- Describe policy iteration (see the sketch after this list).
- Policy evaluation within policy iteration requires multiple sweeps. Describe value iteration, which avoids this.
- What is Generalized Policy Iteration (GPI)?
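
To make the policy-iteration questions concrete, here is a minimal sketch of policy iteration on a small tabular MDP. The MDP itself (two states, two actions, and every number in P and R) is made up purely for illustration; the point is the alternation of evaluation sweeps and greedy improvement.

```python
import numpy as np

# Minimal policy iteration on a tiny tabular MDP (all numbers are assumed,
# purely for illustration).
n_states, n_actions, gamma, theta = 2, 2, 0.9, 1e-8
P = np.zeros((n_states, n_actions, n_states))   # P[s, a, s'] transition probabilities
R = np.zeros((n_states, n_actions))             # expected reward for taking a in s
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.0, 1.0]
R[0] = [1.0, 0.0]
R[1] = [0.0, 2.0]

policy = np.zeros(n_states, dtype=int)           # arbitrary initial deterministic policy
while True:
    # Policy evaluation: sweep v_pi until it (approximately) converges.
    v = np.zeros(n_states)
    while True:
        v_new = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ v
                          for s in range(n_states)])
        delta = np.max(np.abs(v_new - v))
        v = v_new
        if delta < theta:
            break
    # Policy improvement: act greedily w.r.t. the evaluated value function.
    q = R + gamma * P @ v                        # q[s, a] via one-step lookahead
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):       # policy stable => done
        break
    policy = new_policy

print("greedy policy:", policy, "state values:", v)
```

Generalized Policy Iteration is this same evaluation/improvement interplay, with the evaluation step allowed to be only approximate.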
Markov Decision Processes
- Definitions
    - What is a Markov process? (S, P)
- What is a Markov reward process? (S, P, R, γ)
- What is a Markov Decision Process? (S, A, P, R, γ)
- What is the Value Function (also called the state-value function)? (Use G_t, the total return.) What is an action-value function?
- What is the Bellman Equation for MRPs? What is the concise version in matrix form? What is the Bellman Expectation Equation? What is the Bellman Optimality Equation? (See the worked equations and the small solver sketch after this list.)
- What is a policy π?
- What is an Optimal Value Function? An MDP is “solved” if we know the optimal value fn.
 
- What is 
- Why are MRPs and MDPs discounted?
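
As a reference for the Bellman-equation questions above, here is one way to write them out, using the standard notation (G_t for the return, v for state values, P and R for the dynamics). This is a sketch of the usual definitions, not a quote from any particular source.

```latex
% Return and state-value function of an MRP
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots, \qquad
v(s) = \mathbb{E}\left[ G_t \mid S_t = s \right]

% Bellman equation for MRPs, its matrix form, and the closed-form solution
v(s) = \mathcal{R}_s + \gamma \sum_{s'} \mathcal{P}_{ss'}\, v(s'), \qquad
v = \mathcal{R} + \gamma \mathcal{P} v
\;\Rightarrow\;
v = (I - \gamma \mathcal{P})^{-1} \mathcal{R}

% Bellman expectation equation for an MDP under a fixed policy pi
v_\pi(s) = \sum_a \pi(a \mid s) \Big( \mathcal{R}_s^a
           + \gamma \sum_{s'} \mathcal{P}_{ss'}^a\, v_\pi(s') \Big)

% Bellman optimality equation
v_*(s) = \max_a \Big( \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a\, v_*(s') \Big)
```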
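
The matrix form of the MRP Bellman equation can also be checked numerically. The two-state MRP below is made up for illustration; solving it directly only scales to small state spaces, since the inverse is O(n^3).

```python
import numpy as np

# Closed-form solution of the MRP Bellman equation: v = (I - gamma P)^{-1} R.
# The 2-state MRP below is a hypothetical example.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # state-transition matrix (assumed values)
R = np.array([1.0, -1.0])         # expected immediate reward per state (assumed)
gamma = 0.9

v = np.linalg.solve(np.eye(2) - gamma * P, R)
print("MRP state values:", v)
```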
Planning by Dynamic Programming
- Definitions
    - What is the Iterative Policy Evaluation algorithm?
- What is the Principle of Optimality? How does it relate to Value Iteration?
- What is Value Iteration? What is the major difference from policy evaluation? VI is equivalent to some form of policy evaluation; describe it. (See the in-place value-iteration sketch after this list.)
 
- What is the 
- Describe the goals of using DP for prediction and for control.
- Prove that if we act greedily, we are improving the policy. Hint: first prove that q_π(s, π'(s)) ≥ v_π(s) for the greedy policy π', then unroll the expectation.
- Describe some ideas for improving synchronous backup: in-place, prioritised sweeping, and real-time DP.
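
For the value-iteration and in-place-backup questions, here is a minimal sketch of in-place value iteration. It reuses the same made-up two-state MDP as the policy-iteration sketch above, so the two algorithms can be compared side by side.

```python
import numpy as np

# Minimal in-place value iteration on the same hypothetical 2-state, 2-action MDP
# used in the policy-iteration sketch (all numbers assumed for illustration).
n_states, n_actions, gamma, theta = 2, 2, 0.9, 1e-8
P = np.zeros((n_states, n_actions, n_states))   # P[s, a, s'] transition probabilities
R = np.zeros((n_states, n_actions))             # expected reward for taking a in s
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.0, 1.0]
R[0] = [1.0, 0.0]
R[1] = [0.0, 2.0]

v = np.zeros(n_states)                           # single value array, updated in place
while True:
    delta = 0.0
    for s in range(n_states):
        # One-step lookahead over actions, using the *latest* values (in-place backup).
        backup = max(R[s, a] + gamma * P[s, a] @ v for a in range(n_actions))
        delta = max(delta, abs(backup - v[s]))
        v[s] = backup
    if delta < theta:
        break

greedy = (R + gamma * P @ v).argmax(axis=1)      # extract a greedy policy at the end
print("v* ≈", v, "greedy policy:", greedy)
```

Note how each backup is exactly an iterative-policy-evaluation backup with the expectation over a fixed policy replaced by a max over actions; that is the sense in which value iteration is a form of policy evaluation.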
Model-Free Prediction
- Definitions
    - What is Monte-Carlo Learning? (See the sketch after this list.)
 
- What is 
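
For the Monte-Carlo learning question, here is a minimal sketch of first-visit Monte-Carlo prediction. The environment (`sample_episode`) and its two-state dynamics are placeholders invented for illustration; the point is the return-averaging logic.

```python
import random
from collections import defaultdict

def sample_episode():
    """Placeholder environment + policy: returns a list of (state, reward) pairs,
    where reward is the reward received after leaving that state."""
    episode, state = [], 0
    while True:
        next_state = random.choice([0, 1])
        reward = 1.0 if next_state == 1 else 0.0
        episode.append((state, reward))
        if next_state == 1 and random.random() < 0.3:    # random termination
            return episode
        state = next_state

def mc_prediction(n_episodes=10_000, gamma=0.9):
    """First-visit Monte-Carlo prediction: estimate v(s) as the mean return
    observed from the first visit to s in each sampled episode."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        episode = sample_episode()
        g = 0.0
        # Walk the episode backwards so the return G_t can be accumulated incrementally.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            g = gamma * g + reward
            if state not in (s for s, _ in episode[:t]):  # first visit to this state?
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

print(mc_prediction())
```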