Agents#

This library contains the agents used to run the different algorithms.

Available Agents:#

A function is provided to create an agent from a configuration dictionary:

rlib.agents.get_agent(obs_space, action_space, kwargs, q_table=False, ddpg_q_agent=False, ppo_critic=False)#

Global function to get an agent from its type and parameters

This is the function used in the algorithms when kwargs for an agent are given. The agent type (MLP, CNN…) is usually inferred automatically from the environment’s observation and action spaces.

Example:

>>> from rlib.agents import get_agent
>>> import gymnasium as gym
>>>
>>> env = gym.make("CartPole-v1") 
>>> # This environment has a Box(4,) observation space and a Discrete(2,) action space
>>> # Hence the inferred agent type is a MLP with `input_size=4` and `output_size=2`
>>> 
>>> agent = get_agent(env.observation_space, 
                      env.action_space, 
                      {'hidden_sizes': [64, 64], 'activation': 'tanh'})
>>> 
>>> print(agent)

Output:

MLP(
    (layers): Sequential(
        (0): Linear(in_features=4, out_features=64, bias=True)
        (1): Tanh()
        (2): Linear(in_features=64, out_features=64, bias=True)
        (3): Tanh()
        (4): Linear(in_features=64, out_features=2, bias=True)
    )
)

Parameters:
  • obs_space (gym.spaces) – observation space of the environment

  • action_space (gym.spaces) – action space of the environment

  • kwargs (dict) – kwargs for the agent, see rlib.agents for more details.

  • q_table (bool) – whether to use a QTable agent

  • ddpg_q_agent (bool) – if True, the agent is a Q function and returns a scalar value for each state-action pair

  • ppo_critic (bool) – if True, the agent is a critic and returns a scalar value for each state

Returns:

Either an MLP or a QTable, depending on the parameters

This is useful for saving and loading agents automatically from a file.
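
The boolean flags select special-purpose variants of the agent. For instance, here is a hedged sketch of building a PPO critic for the same environment (the exact architecture of the returned network depends on the library internals):

>>> critic = get_agent(env.observation_space,
                       env.action_space,
                       {'hidden_sizes': [64, 64], 'activation': 'tanh'},
                       ppo_critic=True)
>>> # Since a critic returns a scalar value for each state, the resulting
>>> # network should map the 4-dimensional observation to a single output.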

QTable#

class agents.q_table.QTable(env_kwargs, grid_size=10)#

Q-Table class for classic Q-Learning.

The Q-Table is a table of size (grid_size, grid_size, …, grid_size, action_size) where grid_size is the number of discretization bins for each dimension of the state space and action_size is the number of actions. The values given by the QTable are the Q-values for each state-action pair, i.e.:

\[Q(s_t, a_t) = \sum_{k=0}^{T} \gamma^{k} R(s_{t+k+1}, a_{t+k+1}) \]

where \(\gamma\) is the discount factor, \(T\) the end of the episode, \(s_t\) the state at time \(t\) and \(a_t\) the action at time \(t\).

Example:

import numpy as np
import gymnasium as gym
from rlib.agents import QTable

env_kwargs = {"id": "CartPole-v1"}
env = gym.make(**env_kwargs)

q_table = QTable(env_kwargs, grid_size=10)  # 10 discretization bins for each dimension of the state space
state, _ = env.reset()
action = q_table.get_action(state)

q_table.update(state, action, 0.5)  # update the Q-Table with the new value

q_s_a = q_table.sample(state, action)  # sample the Q-Table for the given state-action pair

q_s = q_table.sample(state)  # sample the Q-Table for the given state

best_action = np.argmax(q_s)  # get the best action to take from the given state
best_action = q_table.get_action(state)  # equivalent to the previous line
Variables:
  • grid_size (int) – number of discretization bins for each dimension of the state space.

  • state_size (int) – size of the state space.

  • action_size (int) – size of the action space.

  • q_table (np.ndarray) – the Q-Table.

__init__(env_kwargs, grid_size=10)#

Initialize the QTable class

Parameters:
  • env_kwargs – gymnasium environment kwargs, used to call gym.make(**env_kwargs).

  • grid_size (int, optional) – number of discretization bins for each dimension of the state space. Default is 10.

Raises:

ValueError – if the action space is not discrete

get_action(state)#

Get the action to take from the current state

Parameters:

state (np.array) – current state

Returns:

action to take

Return type:

int

discretize(state)#

Discretize the state, i.e. convert it to a tuple of integers that can be used to index the table.

Parameters:

state (np.array) – current state

Returns:

discretized state

Return type:

tuple
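
For intuition, here is a minimal, hypothetical sketch of how such a discretization is commonly implemented, assuming known per-dimension bounds low and high (this is not the library’s actual code):

import numpy as np

def discretize(state, low, high, grid_size=10):
    # Map each continuous dimension to a bin index in [0, grid_size - 1].
    ratios = (np.asarray(state) - low) / (high - low)
    bins = (ratios * grid_size).astype(int)
    return tuple(np.clip(bins, 0, grid_size - 1))

The resulting tuple, concatenated with an action index, can then be used to index the (grid_size, …, grid_size, action_size) table.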

sample(state, action=None)#

Sample the QTable. If no action is given, return the Q-values for all actions; otherwise, return the Q-value for the given action.

Parameters:
  • state (np.array) – current state

  • action (int, optional) – action to take

Returns:

sampled value

Return type:

float if action is given, np.array otherwise

update(state, action, new_value)#

Update the QTable with a new value for a state-action pair

Parameters:
  • state (np.array) – current state

  • action (int) – action to take

  • new_value (float) – new value to set
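
To illustrate how get_action, sample and update fit together, here is a hedged sketch of a classic tabular Q-learning step, continuing the example above (the learning rate alpha and discount factor gamma are illustrative, not library parameters; episode termination is ignored for brevity):

import numpy as np

alpha, gamma = 0.1, 0.99  # illustrative hyperparameters

state, _ = env.reset()
action = q_table.get_action(state)
next_state, reward, terminated, truncated, _ = env.step(action)

# Temporal-difference target: r + gamma * max_a' Q(s', a')
target = reward + gamma * np.max(q_table.sample(next_state))
q_s_a = q_table.sample(state, action)
q_table.update(state, action, q_s_a + alpha * (target - q_s_a))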

MLP#

class agents.mlp.MLP(input_size, hidden_sizes, output_size, activation='relu', init_weights=None)#

Simple Multi-Layer Perceptron (MLP) class.

Note that, in order to use algorithms such as learning.evolution_strategy.EvolutionStrategy, the gradients of the MLP should be disabled. This can be done by setting the requires_grad attribute of each parameter to False (see the snippet after the example below).

Example:

import torch
from rlib.agents import MLP

agent = MLP(4, [32, 32], 2, activation='relu')  # 4 observations, 2 actions, 2 hidden layers of 32 neurons each
x = torch.randn(4) 
y = agent(x)
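
As mentioned in the note above, gradient tracking can be disabled when the MLP is trained with gradient-free methods such as evolution strategies (assuming the MLP is a standard torch.nn.Module):

for param in agent.parameters():
    param.requires_grad = False  # disable gradient tracking for gradient-free optimization
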
Variables:

layers (torch.nn.Sequential) – A torch.nn.Sequential object containing the layers of the MLP.

__init__(input_size, hidden_sizes, output_size, activation='relu', init_weights=None)#

Initialize the MLP.

Parameters:
  • input_size (int) – The size of the input.

  • hidden_sizes (list) – A list containing the sizes of the hidden layers.

  • output_size (int) – The size of the output.

  • activation (str, optional) – The activation function to use. Should be one of ‘relu’, ‘tanh’ or ‘sigmoid’. Default is ‘relu’.

Raises:

ValueError – If activation is not one of ‘relu’, ‘tanh’ or ‘sigmoid’.