
Reinforcement Learning

Kai · Apr 12, 2019 · 3 mins read

I am composing a list of questions that can help me refresh my RL knowledge as I study.

Dynamic Programming

  1. What is DP used for? When should we consider using DP?
  2. DP uses full backups. What does that mean?
  3. Vanilla DP needs two arrays for the update. Describe the in-place algorithm.
  4. Describe policy iteration.
  5. Policy evaluation in policy iteration requires multiple sweeps. Describe value iteration, which addresses this problem.
  6. What is Generalized Policy Iteration (GPI)?
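The in-place update from question 3 can be sketched on a toy chain MDP (the four-state chain and its costs below are my own example, not from any particular text). A single value array is overwritten during the sweep, so later states in the same sweep already see fresh values:

```python
# In-place value iteration on a hypothetical 4-state chain MDP.
# States 0..3; state 3 is terminal. Actions: 0 = left, 1 = right.
# Every step costs -1, so optimal values count steps to the goal.

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s2, -1.0

def value_iteration(gamma=1.0, theta=1e-8):
    V = [0.0] * 4           # one array, updated in place (no second copy)
    while True:
        delta = 0.0
        for s in range(3):  # state 3 is terminal, V[3] stays 0
            best = max(
                r + gamma * V[s2]
                for s2, r in (step(s, a) for a in (0, 1))
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best     # in-place: later states use the fresh value
        if delta < theta:
            return V

print(value_iteration())    # → [-3.0, -2.0, -1.0, 0.0]
```

With two arrays, every backup in a sweep would read only the previous sweep's values; the in-place variant typically converges in fewer sweeps.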

Markov Decision Processes

  1. Definitions
    1. What is a Markov process? ($\mathcal{S}$, $\mathcal{P}$)
    2. What is a Markov reward process? ($\mathcal{S}$, $\mathcal{P}$, $\mathcal{R}$, $\gamma$)
    3. What is a Markov Decision Process? (adds $\mathcal{A}$)
    4. What is the Value Function (also called the state-value function)? (Use $G_t$, the total return.) What is an action-value function?
    5. What is the Bellman Equation for MRPs? What is its concise matrix form? What is the Bellman Expectation Equation? What is the Bellman Optimality Equation?
    6. What is a policy $\pi$?
    7. What is an Optimal Value Function? An MDP is “solved” if we know the optimal value function.
  2. Why are MRPs and MDPs discounted?
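The matrix form of the Bellman equation for MRPs (question 5) gives a direct solution, $v = (I - \gamma \mathcal{P})^{-1} \mathcal{R}$. A minimal sketch on a made-up 2-state MRP, solving the 2×2 system by hand rather than with a linear-algebra library:

```python
# Solving a tiny 2-state MRP exactly via the matrix Bellman equation
# v = (I - gamma * P)^{-1} R. The transition matrix and rewards are
# made-up numbers, chosen only for illustration.

gamma = 0.9
P = [[0.9, 0.1],    # P[s][s2] = transition probability s -> s2
     [0.2, 0.8]]
R = [1.0, 2.0]      # expected immediate reward per state

# Build A = I - gamma * P and invert the 2x2 system by hand.
a, b = 1 - gamma * P[0][0], -gamma * P[0][1]
c, d = -gamma * P[1][0], 1 - gamma * P[1][1]
det = a * d - b * c
v = [(d * R[0] - b * R[1]) / det,
     (-c * R[0] + a * R[1]) / det]

# Check the Bellman equation v(s) = R(s) + gamma * sum_s2 P(s,s2) v(s2).
for s in range(2):
    rhs = R[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(2))
    assert abs(v[s] - rhs) < 1e-9
print(v)
```

The direct solve is $O(n^3)$ in the number of states, which is why iterative methods (and later, sampling) matter for large problems.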

Planning by Dynamic Programming

  1. Definitions
    1. What is the Iterative Policy Evaluation algorithm?
    2. What is the Principle of Optimality? How does it relate to Value Iteration?
    3. What is Value Iteration? What is the major difference from policy evaluation? VI is equivalent to some form of policy evaluation; describe it.
  2. Describe the goals of using DP for prediction and for control.
  3. Prove that if we act greedily, we improve the policy. Hint: first prove $v_{\pi}(s) \le q_{\pi}(s, \pi'(s))$.
  4. Describe some ideas for improving synchronous backup: in-place, prioritised sweeping, and real-time DP.
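Question 3's policy-improvement result can also be seen numerically: evaluate a fixed policy, act greedily with respect to its value function, and the new policy's values dominate the old ones everywhere. A sketch on a hypothetical four-state chain (my own example: each step costs −1, state 3 is terminal, $\gamma = 0.9$):

```python
def step(s, a):
    """Deterministic chain: action 0 moves left, 1 moves right; cost -1."""
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s2, -1.0

def evaluate(policy, gamma=0.9, theta=1e-8):
    """Iterative policy evaluation: repeated expectation backups."""
    V = [0.0] * 4
    while True:
        delta = 0.0
        for s in range(3):                     # state 3 is terminal
            s2, r = step(s, policy[s])
            new = r + gamma * V[s2]
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < theta:
            return V

def greedy(V, gamma=0.9):
    """Policy improvement: pick the action maximising r + gamma * V(s')."""
    policy = [max((0, 1), key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
              for s in range(3)]
    return policy + [0]                        # terminal action is arbitrary

V_old = evaluate([0, 0, 0, 0])    # "always left": never reaches the goal
V_new = evaluate(greedy(V_old))
assert all(V_new[s] >= V_old[s] - 1e-6 for s in range(4))  # improvement theorem
```

One greedy step already fixes state 2 (it now moves right into the terminal state); alternating evaluation and improvement like this is exactly policy iteration, and doing it with partial evaluation is the GPI picture.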

Model-Free Prediction

  1. Definitions
    1. What is Monte-Carlo Learning?
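First-visit Monte-Carlo prediction can be sketched in a few lines: estimate $V(s)$ as the average of the returns $G_t$ observed after the first visit to $s$ in each episode. The one-step coin-flip environment below is a placeholder of my own, used only to exercise the estimator:

```python
import random

def mc_prediction(sample_episode, num_episodes=5000, gamma=1.0):
    """First-visit Monte-Carlo: V(s) = mean return after first visiting s."""
    returns = {}                           # state -> list of observed returns
    for _ in range(num_episodes):
        episode = sample_episode()         # list of (state, reward) pairs
        # Compute the return G_t backwards through the episode.
        G, pairs = 0.0, []
        for s, r in reversed(episode):
            G = r + gamma * G
            pairs.append((s, G))
        pairs.reverse()
        seen = set()
        for s, G in pairs:
            if s not in seen:              # count only the first visit
                seen.add(s)
                returns.setdefault(s, []).append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}

random.seed(0)
# Toy episode: start in "A" (no reward), move to "B", then get reward 0 or 1.
episode = lambda: [("A", 0.0), ("B", 1.0 if random.random() < 0.5 else 0.0)]
V = mc_prediction(episode)
print(V)   # both estimates converge toward 0.5, the mean reward
```

Note that no transition model is used anywhere: the estimate comes purely from sampled episodes, which is what makes the method model-free.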

Model-Free Control

Value Function Approximation
