Agents#
This module contains the agents used by the different algorithms in the library.
Available Agents:#
A function is given to create an agent from a configuration dictionary:
- rlib.agents.get_agent(obs_space, action_space, kwargs, q_table=False, ddpg_q_agent=False, ppo_critic=False)#
Global function to get an agent from its type and parameters
This is the function used by the algorithms when kwargs for an agent are given. The agent type (MLP, CNN…) is usually automatically inferred from the environment’s observation and action spaces.
Example:
>>> from rlib.agents import get_agent
>>> import gymnasium as gym
>>>
>>> env = gym.make("CartPole-v1")
>>> # This environment has a Box(4,) observation space and a Discrete(2,) action space
>>> # Hence the inferred agent type is an MLP with `input_size=4` and `output_size=2`
>>>
>>> agent = get_agent(env.observation_space, env.action_space, {'hidden_sizes': [64, 64], 'activation': 'relu'})
>>>
>>> print(agent)
Returns:
>>> MLP(
>>>   (layers): Sequential(
>>>     (0): Linear(in_features=4, out_features=64, bias=True)
>>>     (1): ReLU()
>>>     (2): Linear(in_features=64, out_features=64, bias=True)
>>>     (3): ReLU()
>>>     (4): Linear(in_features=64, out_features=2, bias=True)
>>>   )
>>> )
- Parameters:
obs_space (gym.spaces) – observation space of the environment
action_space (gym.spaces) – action space of the environment
kwargs (dict) – kwargs for the agent, see rlib.agents for more details.
q_table (bool) – whether to use a QTable agent
ddpg_q_agent (bool) – if True, the agent is a Q function and returns a scalar value for each state-action pair
ppo_critic (bool) – if True, the agent is a critic and returns a scalar value for each state
- Returns:
The created agent. Building agents from a configuration dictionary in this way is useful for saving and loading them automatically from file.
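Because the whole architecture is described by the configuration dictionary, it can be persisted and used to rebuild an agent later. A minimal sketch, assuming the kwargs dictionary is JSON-serializable (the file name is arbitrary, and the network weights would still need to be saved separately, e.g. with torch.save):
import json
import gymnasium as gym
from rlib.agents import get_agent

env = gym.make("CartPole-v1")
agent_kwargs = {'hidden_sizes': [64, 64], 'activation': 'relu'}

# Persist only the configuration dictionary...
with open('agent_kwargs.json', 'w') as f:
    json.dump(agent_kwargs, f)

# ...then rebuild an identical architecture from the same spaces and kwargs.
with open('agent_kwargs.json') as f:
    loaded_kwargs = json.load(f)
agent = get_agent(env.observation_space, env.action_space, loaded_kwargs)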
QTable#
- class agents.q_table.QTable(env_kwargs, grid_size=10)#
Q-Table class for classic Q-Learning.
The Q-Table is a table of shape (grid_size, grid_size, …, grid_size, action_size), where grid_size is the number of discretization bins for each dimension of the state space and action_size is the number of actions. The values stored in the QTable are the Q-values for each state-action pair, i.e.:
\[Q(s_t, a_t) = \sum_{k=0}^{T} \gamma^{k} R(s_{t+k+1}, a_{t+k+1})\]
where \(\gamma\) is the discount factor, \(T\) the end of the episode, \(s_t\) the state at time \(t\) and \(a_t\) the action at time \(t\).
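To make the discounting concrete, a small plain-Python computation (the reward values and gamma below are illustrative):
gamma = 0.9
rewards = [1.0, 0.0, 2.0]   # rewards received at steps t+1, t+2, t+3 (illustrative)

# Discounted sum matching the formula above: sum_k gamma**k * R_{t+k+1}
q_estimate = sum(gamma**k * r for k, r in enumerate(rewards))
print(q_estimate)           # 1.0 + 0.9*0.0 + 0.81*2.0 = 2.62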
Example:
import numpy as np
import gymnasium as gym
from rlib.agents import QTable

env_kwargs = {"id": "CartPole-v1"}
env = gym.make(**env_kwargs)
q_table = QTable(env_kwargs, grid_size=10)  # 10 discretization bins for each dimension of the state space

state, _ = env.reset()
action = q_table.get_action(state)
q_table.update(state, action, 0.5)       # update the Q-Table with the new value
q_s_a = q_table.sample(state, action)    # sample the Q-Table for the given state-action pair
q_s = q_table.sample(state)              # sample the Q-Table for the given state
best_action = np.argmax(q_s)             # get the best action to take from the given state
best_action = q_table.get_action(state)  # equivalent to the previous line
- Variables:
grid_size (int) – number of discretization bins for each dimension of the state space.
state_size (int) – size of the state space.
action_size (int) – size of the action space.
q_table (np.ndarray) – the Q-Table.
- __init__(env_kwargs, grid_size=10)#
Initialize the QTable class
- Parameters:
env_kwargs – gymnasium environment kwargs, used to call gym.make(**env_kwargs).
grid_size (int, optional) – number of discretization bins for each dimension of the state space.
- Raises:
ValueError – if the action space is not discrete
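For illustration, a continuous-action environment is expected to be rejected at construction time. A minimal sketch, assuming env_kwargs is a dictionary forwarded to gym.make (the 'id' key below follows gym.make's own signature):
import gymnasium as gym
from rlib.agents import QTable

# Pendulum-v1 has a Box (continuous) action space, so the constructor
# is documented to raise ValueError.
try:
    QTable({"id": "Pendulum-v1"}, grid_size=10)
except ValueError as err:
    print("continuous action space rejected:", err)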
- get_action(state)#
Get the action to take from the current state
- Parameters:
state (np.array) – current state
- Returns:
action to take
- Return type:
int
- discretize(state)#
Discretize the state, i.e. convert it to a tuple of integers allowing sampling in the table.
- Parameters:
state (np.array) – current state
- Returns:
discretized state
- Return type:
tuple
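For intuition, the snippet below shows one plausible way such a discretization could work; it only illustrates the idea (uniform binning inside a bounding box) and is not necessarily the exact binning used by the library:
import numpy as np

def discretize_example(state, low, high, grid_size=10):
    # Clip the state into the [low, high] box, then map each dimension
    # onto one of `grid_size` bins; the resulting tuple indexes the table.
    clipped = np.clip(state, low, high)
    bins = ((clipped - low) / (high - low) * (grid_size - 1)).astype(int)
    return tuple(int(b) for b in bins)

idx = discretize_example(np.array([0.3, 4.2]),
                         low=np.array([-1.0, 0.0]),
                         high=np.array([1.0, 5.0]))
print(idx)  # (5, 7)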
- sample(state, action=None)#
Sample the QTable. If no action is given, return the Q-values for every action; otherwise return the Q-value for the given action.
- Parameters:
state (np.array) – current state
action (int, optional) – action to take
- Returns:
sampled value
- Return type:
float if action is given, np.array otherwise
- update(state, action, new_value)#
Update the QTable with a new value for the given state-action pair
- Parameters:
state (np.array) – current state
action (int) – action to take
new_value (float) – new value to set
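As a usage sketch, here is how update() and sample() fit into the classic Q-learning rule; alpha and gamma are illustrative hyperparameters chosen here, and the env_kwargs dictionary form is assumed from the constructor documentation above:
import numpy as np
import gymnasium as gym
from rlib.agents import QTable

alpha, gamma = 0.1, 0.99                    # illustrative learning rate and discount
env_kwargs = {"id": "CartPole-v1"}
env = gym.make(**env_kwargs)
q_table = QTable(env_kwargs, grid_size=10)

state, _ = env.reset()
action = q_table.get_action(state)
next_state, reward, terminated, truncated, _ = env.step(action)

# Classic tabular Q-learning target, written with the sample()/update() API:
q_sa = q_table.sample(state, action)
target = reward + gamma * np.max(q_table.sample(next_state))
q_table.update(state, action, (1 - alpha) * q_sa + alpha * target)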
MLP#
- class agents.mlp.MLP(input_size, hidden_sizes, output_size, activation='relu', init_weights=None)#
Simple Multi-Layer Perceptron (MLP) class.
Note that, in order to use algorithms such as learning.evolution_strategy.EvolutionStrategy, the gradient of the MLP should be disabled. This can be done by setting the requires_grad attribute of each parameter tensor to False.
Example:
import torch
from rlib.agents import MLP

agent = MLP(4, [32, 32], 2, activation='relu')  # 4 observations, 2 actions, 2 hidden layers of 32 neurons each
x = torch.randn(4)
y = agent(x)
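Following the note above, gradients can be turned off on every parameter before gradient-free training; this sketch assumes the MLP is a torch.nn.Module, so that parameters() is available:
from rlib.agents import MLP

agent = MLP(4, [32, 32], 2, activation='relu')

# Disable gradient tracking on every parameter (useful for evolution strategies).
for p in agent.parameters():
    p.requires_grad_(False)

assert all(not p.requires_grad for p in agent.parameters())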
- Variables:
layers (torch.nn.Sequential) – A torch.nn.Sequential object containing the layers of the MLP.
- __init__(input_size, hidden_sizes, output_size, activation='relu', init_weights=None)#
Initialize the MLP.
- Parameters:
input_size (int) – The size of the input.
hidden_sizes (list) – A list containing the sizes of the hidden layers.
output_size (int) – The size of the output.
activation (str, optional) – The activation function to use. Should be one of ‘relu’, ‘tanh’ or ‘sigmoid’. Default is ‘relu’.
- Raises:
ValueError – If activation is not one of ‘relu’, ‘tanh’ or ‘sigmoid’.
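As a closing usage sketch, the MLP can act as a policy for a discrete-action environment by interpreting its outputs as per-action scores; the greedy argmax choice below is purely illustrative and not an API of the library:
import torch
import gymnasium as gym
from rlib.agents import MLP

env = gym.make("CartPole-v1")
policy = MLP(4, [64, 64], 2, activation='tanh')

obs, _ = env.reset()
with torch.no_grad():
    scores = policy(torch.as_tensor(obs, dtype=torch.float32))
action = int(torch.argmax(scores))          # greedy action (illustrative)
obs, reward, terminated, truncated, _ = env.step(action)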