Q Functions
In reinforcement learning, the Q function (also called the action-value function) represents the expected return of taking a specific action in a specific state and then following a particular policy thereafter. It is denoted $Q^{\pi}(s, a)$.
The Q function is formally defined as: $$ Q^{\pi}(s, a) = \mathbb{E}_{\pi} \left[ \sum_{t=0}^{\infty} \gamma^t r_{t} \mid s_0 = s, a_0 = a \right] $$ Where:
- $s$ is the current state
- $a$ is the action taken
- $\pi$ is the policy being followed after taking action $a$
- $\gamma$ is the discount factor
- $r_t$ is the reward at time step $t$
In words: "If I'm in state $s$, take action $a$, and then follow policy $\pi$ afterwards, what total discounted reward can I expect?"
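As a concrete illustration of this definition, the sketch below estimates $Q^{\pi}(s, a)$ by averaging discounted returns from Monte Carlo rollouts. The tiny two-state MDP, its rewards, and the uniform-random policy are all assumptions made up for the example:

```python
import random

# Hypothetical 2-state MDP: transitions[state][action] -> (next_state, reward)
transitions = {
    "s0": {"left": ("s0", 0.0), "right": ("s1", 1.0)},
    "s1": {"left": ("s0", 0.0), "right": ("s1", 2.0)},
}
gamma = 0.9  # discount factor

def policy(state):
    # Example policy pi: choose an action uniformly at random
    return random.choice(list(transitions[state].keys()))

def rollout_return(state, action, horizon=100):
    """Discounted return of one episode that starts with (state, action), then follows pi."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = transitions[state][action]
        total += discount * reward
        discount *= gamma
        action = policy(state)  # every later action is drawn from pi
    return total

def mc_q_estimate(state, action, episodes=5000):
    """Average of sampled returns approximates Q^pi(state, action)."""
    return sum(rollout_return(state, action) for _ in range(episodes)) / episodes

print(mc_q_estimate("s0", "right"))
```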
The Q function is closely related to the state-value function $V^{\pi}(s)$.
The relationship between them is: $$ V^{\pi}(s) = \mathbb{E}_{a \sim \pi} \left[ Q^{\pi}(s, a) \right] $$
In other words, the value of a state is the expected Q-value over all actions that the policy might take in that state.
For a deterministic policy, this simplifies to: $$ V^{\pi}(s) = Q^{\pi}(s, \pi(s)) $$
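A short numerical sketch of this relationship (the Q-values and action probabilities below are made up for illustration): the value of a state is the policy-weighted average of its Q-values, and with a deterministic policy the average collapses to a single term.

```python
# Hypothetical Q-values and action probabilities for a single state s
q_values = {"left": 1.2, "right": 3.4}
pi_probs = {"left": 0.25, "right": 0.75}  # stochastic policy pi(a | s)

# V^pi(s) = E_{a ~ pi} [ Q^pi(s, a) ]
v = sum(pi_probs[a] * q_values[a] for a in q_values)
print(v)  # 0.25 * 1.2 + 0.75 * 3.4 = 2.85

# Deterministic policy: all probability mass on one action, so V^pi(s) = Q^pi(s, pi(s))
det_action = "right"
print(q_values[det_action])  # 3.4
```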
Just like value functions, Q functions satisfy their own Bellman equation: $$ Q^{\pi}(s, a) = \mathbb{E}_{s' \sim P} \left[ r(s, a) + \gamma \, \mathbb{E}_{a' \sim \pi} \left[ Q^{\pi}(s', a') \right] \right] $$
Where:
- $r(s, a)$ is the immediate reward for taking action $a$ in state $s$
- $s'$ is the next state
- $a'$ is the next action according to policy $\pi$
In words: "The Q-value of taking action $a$ in state $s$ equals the immediate reward plus the discounted Q-value of the next state-action pair, where the next action is chosen by the policy."
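This equation can be turned directly into iterative policy evaluation: repeatedly replace each $Q^{\pi}(s, a)$ with the right-hand side until the values stop changing. The deterministic toy model and uniform policy below are illustrative assumptions, not part of the text above:

```python
# Assumed deterministic toy MDP: (state, action) -> (reward, next_state)
dynamics = {
    ("s0", "left"):  (0.0, "s0"),
    ("s0", "right"): (1.0, "s1"),
    ("s1", "left"):  (0.0, "s0"),
    ("s1", "right"): (2.0, "s1"),
}
actions = ["left", "right"]
gamma = 0.9
pi = {s: {a: 0.5 for a in actions} for s in ["s0", "s1"]}  # uniform policy

# Initialize Q^pi arbitrarily and apply Bellman backups until (approximate) convergence
q = {sa: 0.0 for sa in dynamics}
for _ in range(1000):
    new_q = {}
    for (s, a), (r, s_next) in dynamics.items():
        # E_{a' ~ pi}[ Q^pi(s', a') ]
        expected_next = sum(pi[s_next][a2] * q[(s_next, a2)] for a2 in actions)
        new_q[(s, a)] = r + gamma * expected_next
    q = new_q

print(q)
```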
For the optimal Q function $Q^*(s, a)$, the Bellman optimality equation is: $$ Q^*(s, a) = \mathbb{E}_{s'} \left[ r(s, a) + \gamma \max_{a'} Q^*(s', a') \right] $$
This optimal Q function represents the expected return of taking action $a$ in state $s$ and then acting optimally thereafter.
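When the model is known and the state-action space is small, the optimality equation yields Q-value iteration: sweep over state-action pairs and back up with a max over next actions instead of an expectation under the policy. The toy model below is again an assumption for illustration:

```python
# Assumed deterministic toy model: (state, action) -> (reward, next_state)
dynamics = {
    ("s0", "left"):  (0.0, "s0"),
    ("s0", "right"): (1.0, "s1"),
    ("s1", "left"):  (0.0, "s0"),
    ("s1", "right"): (2.0, "s1"),
}
actions = ["left", "right"]
gamma = 0.9

# Q-value iteration: Q*(s, a) <- r(s, a) + gamma * max_{a'} Q*(s', a')
q_star = {sa: 0.0 for sa in dynamics}
for _ in range(1000):
    q_star = {
        (s, a): r + gamma * max(q_star[(s_next, a2)] for a2 in actions)
        for (s, a), (r, s_next) in dynamics.items()
    }

print(q_star)
```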
Q functions are fundamental to many reinforcement learning algorithms:
Q-Learning: Uses the Bellman optimality equation to iteratively learn $Q^*(s, a)$, enabling the agent to derive an optimal policy by always choosing $\arg\max_a Q^*(s, a)$ (a minimal tabular sketch appears after these examples).
Deep Q-Networks (DQN): Uses neural networks to approximate Q functions in high-dimensional state spaces.
Actor-Critic Methods: Use Q functions (or approximations) to evaluate actions taken by the policy, providing lower-variance gradient estimates than methods like REINFORCE.
Advantage Estimation: Q functions are used to compute the advantage $A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)$, which measures how much better an action is than the policy's average behaviour in that state.
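The following is a minimal tabular Q-learning sketch with an epsilon-greedy behaviour policy. The two-state environment, learning rate, exploration rate, and step counts are all illustrative choices, not prescribed by the text:

```python
import random

# Assumed toy environment: step(state, action) -> (next_state, reward)
def step(state, action):
    table = {
        ("s0", "left"):  ("s0", 0.0),
        ("s0", "right"): ("s1", 1.0),
        ("s1", "left"):  ("s0", 0.0),
        ("s1", "right"): ("s1", 2.0),
    }
    return table[(state, action)]

states, actions = ["s0", "s1"], ["left", "right"]
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in states for a in actions}

state = "s0"
for _ in range(20000):
    # Epsilon-greedy action selection from the current Q estimates
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])

    next_state, reward = step(state, action)

    # Q-learning update: move Q(s, a) toward r + gamma * max_{a'} Q(s', a')
    target = reward + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = next_state

print(Q)
```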
Once we know the optimal Q function $Q^*(s, a)$, we can extract the optimal policy trivially:
$$
\pi^*(s) = \arg\max_a Q^*(s, a)
$$
This is one of the key advantages of Q functions: if we can learn $Q^*$, we immediately have the optimal policy without needing to learn the environment's dynamics.
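Assuming a learned Q-table like the ones produced by the sketches above (the numbers here are hypothetical), extracting the greedy policy is a single argmax per state:

```python
# Hypothetical learned Q-table: (state, action) -> value
Q = {
    ("s0", "left"): 16.1, ("s0", "right"): 18.9,
    ("s1", "left"): 17.0, ("s1", "right"): 20.0,
}
states, actions = ["s0", "s1"], ["left", "right"]

# pi*(s) = argmax_a Q*(s, a)
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(policy)  # {'s0': 'right', 's1': 'right'}
```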