I am composing a list of questions that can help me refresh my RL knowledge as I study.
Dynamic Programming
- What is DP used for? When should we consider using DP?
- DP uses full backups. What does that mean?
- Vanilla DP needs two arrays for its updates. Describe the in-place algorithm (see the sketch after this list).
- Describe policy iteration.
- Policy evaluation within policy iteration requires multiple sweeps. Describe value iteration, which addresses this problem.
- What is Generalized Policy Iteration (GPI)?
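Below is a minimal sketch, assuming a hypothetical two-state MDP with arbitrary numbers, of in-place value iteration. It illustrates what a full backup looks like (every action and successor state is combined through the model) and how the in-place variant keeps a single value array instead of two.

```python
# In-place value iteration on a tiny, hypothetical tabular MDP.
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.9

# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(n_states)           # a single array: updates happen in place
for _ in range(1000):
    delta = 0.0
    for s in range(n_states):
        # Full backup: every action and every successor state contributes,
        # using the model (P, R) rather than sampled transitions.
        q = R[s] + gamma * P[s] @ V
        v_new = q.max()
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new             # overwrite immediately (in-place sweep)
    if delta < 1e-8:
        break

print("V* ≈", V)
```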
Markov Decision Processes
- Definitions
- What is a Markov process? (S, P)
- What is a Markov reward process? (S, P, R, γ)
- What is a Markov Decision Process? (S, A, P, R, γ)
- What is a value function v(s) (also called the state-value function)? (Use G_t, the total return.) What is an action-value function q(s, a)?
- What is the Bellman Equation for MRPs? Concise version in matrix form? What is the Bellman Expectation Equation? What is the Bellman Optimality Equation? (See the equations after this list.)
- What is a policy π?
- What is an Optimal Value Function? An MDP is “solved” if we know the optimal value fn.
- What is an optimal policy?
- Why are MRPs and MDPs discounted?
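As a reference for the Bellman-equation questions above, the standard forms (in the usual (S, A, P, R, γ) notation, with v the state-value function) are:

```latex
% Bellman equation for an MRP, its matrix form, and the direct solution
v(s) = \mathcal{R}_s + \gamma \sum_{s'} \mathcal{P}_{ss'}\, v(s'),
\qquad v = \mathcal{R} + \gamma \mathcal{P} v
\;\Rightarrow\; v = (I - \gamma \mathcal{P})^{-1} \mathcal{R}

% Bellman expectation equation for an MDP under a policy \pi
v_\pi(s) = \sum_{a} \pi(a \mid s) \Big( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'}\, v_\pi(s') \Big)

% Bellman optimality equation
v_*(s) = \max_{a} \Big( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'}\, v_*(s') \Big)
```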
Planning by Dynamic Programming
- Definitions
- What is the Iterative Policy Evaluation algorithm? (See the sketch after this list.)
- What is the Principle of Optimality? How does it relate to Value Iteration?
- What is Value Iteration? What is the major difference from policy evaluation? VI is equivalent to some form of PE; describe it.
- What is the
- Describe the goals of using DP for prediction and for control.
- Prove that if we act greedily, we are improving the policy. Hint: first prove that q_π(s, π'(s)) ≥ v_π(s) for the greedy policy π', then unroll it to show v_π'(s) ≥ v_π(s).
- Describe some ideas for improving synchronous backup: in-place, prioritised sweeping, and real-time DP.
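A minimal sketch of Iterative Policy Evaluation (prediction), reusing the same kind of hypothetical two-state MDP as the earlier value-iteration snippet and evaluating a uniform-random policy with synchronous Bellman expectation backups:

```python
# Iterative policy evaluation on a tiny, hypothetical tabular MDP.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])   # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                 # R[s, a]
pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform-random policy

V = np.zeros(n_states)
for sweep in range(1000):
    # Synchronous sweep: compute the Bellman expectation backup for every
    # state from the old values, then replace them all at once.
    q = R + gamma * np.einsum('sat,t->sa', P, V)      # q[s, a]
    V_new = (pi * q).sum(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("v_pi ≈", V)
```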
Model-Free Prediction
- Definitions
- What is Monte-Carlo learning? (See the sketch below.)
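A minimal sketch of first-visit Monte-Carlo prediction: estimate v(s) by averaging the return observed after the first visit to s in each sampled episode. The tiny random-walk environment is hypothetical and only there to make the snippet self-contained.

```python
# First-visit Monte-Carlo prediction on a hypothetical episodic random walk.
import random
from collections import defaultdict

gamma = 1.0
TERMINAL = 2

def sample_episode():
    """Random walk on states {0, 1}, reward 1 per step, ends at state 2."""
    s, episode = 0, []
    while s != TERMINAL:
        episode.append((s, 1.0))
        s = random.choice([max(s - 1, 0), s + 1])
    return episode

returns_sum = defaultdict(float)
returns_cnt = defaultdict(int)

for _ in range(5000):
    episode = sample_episode()
    # Compute the return G_t for every step by working backwards.
    G, returns = 0.0, [0.0] * len(episode)
    for t in reversed(range(len(episode))):
        s, r = episode[t]
        G = r + gamma * G
        returns[t] = G
    # First-visit update: only the first occurrence of each state counts.
    seen = set()
    for t, (s, _) in enumerate(episode):
        if s not in seen:
            seen.add(s)
            returns_sum[s] += returns[t]
            returns_cnt[s] += 1

V = {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}
print("First-visit MC estimate of v(s):", V)
```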
- What is