Agent 


Objectives 

In this RL framework, an Agent is an entity that acts on the Environment (modeled in grid2op as an object of class Environment). In grid2op, such an entity is modeled by the BaseAgent class. Other literature may call it a “bot” or a “controller”.

This module presents a few possible BaseAgent implementations that can serve either as baselines or as examples of how to implement such agents. NB Stronger baselines are defined in another repository.

To perform their actions, agents receive two main signals from the grid2op.Environment:

the grid2op.Reward.BaseReward that states how good the previous action has been

the grid2op.Observation.BaseObservation that is a (partial) view on the state of the Environment.

Both these signals can be used to determine the best action to perform on the grid. This is the main objective of a BaseAgent, and it is done in the BaseAgent.act() method.

To get started coding your agent we encourage you to read the description of the Action to know how to implement your action. Don’t hesitate to have a look at the Easier actions manipulation for an easier / higher level action manipulation.

Once you know how to manipulate a powergrid within the grid2op framework, you can implement an agent by following this example:

import grid2op
from grid2op.Agent import BaseAgent

class MyCustomAgent(BaseAgent):
    def __init__(self, action_space, something_else, and_another_something):
        # define here the constructor of your agent
        # here we say our agent needs "something_else" and "and_another_something"
        # to be built just to demonstrate it does not cause any problem to extend the
        # construction of the base class BaseAgent that only takes "action_space" as a constructor
        BaseAgent.__init__(self, action_space)
        self.something_else = something_else
        self.and_another_something = and_another_something

    def act(self, obs, reward, done=False):
        # this is the only method you need to implement
        # it takes an observation obs (and a reward and a flag)
        # and should return a valid action
        dictionary_describing_the_action = {}  # this can be anything you want that grid2op understands
        my_action = self.action_space(dictionary_describing_the_action)
        return my_action
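
An agent is typically driven by a loop that feeds it the observation and reward returned by the environment. The sketch below illustrates that loop without requiring grid2op: ToyEnv and ToyAgent are hypothetical stand-ins (they are not grid2op classes) that only mimic the reset() / step() / act() calling convention. With grid2op, you would presumably use an environment created by grid2op.make(...) and a BaseAgent subclass instead.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment: the episode ends after `n_steps` steps."""

    def __init__(self, n_steps=3):
        self.n_steps = n_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return {"step": self.t}  # first observation

    def step(self, action):
        self.t += 1
        obs = {"step": self.t}
        reward = 1.0  # constant reward, for illustration only
        done = self.t >= self.n_steps
        info = {}
        return obs, reward, done, info


class ToyAgent:
    """Hypothetical stand-in for an agent: always returns the empty ("do nothing") dictionary."""

    def act(self, obs, reward, done=False):
        return {}


def run_episode(env, agent):
    """Run one episode and return (number of steps, cumulated reward)."""
    obs = env.reset()
    reward, done = 0.0, False
    n_steps, total_reward = 0, 0.0
    while not done:
        action = agent.act(obs, reward, done)
        obs, reward, done, info = env.step(action)
        n_steps += 1
        total_reward += reward
    return n_steps, total_reward


n_steps, total_reward = run_episode(ToyEnv(n_steps=3), ToyAgent())
```

The agent only ever sees what the loop passes to act(), which is why the observation and the reward are the two signals it can rely on.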

Detailed Documentation by class 

Classes:

`AgentWithConverter`(action_space[, ...])	Compared to a regular BaseAgent, these types of Agents are able to deal with a different representation of `grid2op.Action.BaseAction` and `grid2op.Observation.BaseObservation`.
`AlertAgent`(action_space[, grid_controler, ...])	This is an `AlertAgent` example, which will attempt to reconnect powerlines and send alerts on the worst possible attacks: for each disconnected powerline that can be reconnected, it will simulate the effect of reconnecting it.
`BaseAgent`(action_space)	This class represents the base class of a BaseAgent.
`DeltaRedispatchRandomAgent`(action_space[, ...])	INTERNAL
`DoNothingAgent`(action_space)	This is the most basic BaseAgent.
`FromActionsListAgent`(action_space[, action_list])	This type of agent will perform some actions based on a provided list of actions.
`GreedyAgent`(action_space)	This is a class of "Greedy BaseAgent".
`MLAgent`(action_space[, action_space_converter])	This agent allows to handle only vectors.
`OneChangeThenNothing`(action_space)	This is a specific kind of BaseAgent.
`PowerLineSwitch`(action_space)	This is a `GreedyAgent` example, which will attempt to disconnect powerlines.
`RandomAgent`(action_space[, ...])	This agent acts randomly on the powergrid.
`RecoPowerlineAgent`(action_space)	This is a `GreedyAgent` example, which will attempt to reconnect powerlines: for each disconnected powerline that can be reconnected, it will simulate the effect of reconnecting it.
`RecoPowerlinePerArea`(action_space, ...)	This class acts like the `RecoPowerlineAgent` but it is able to reconnect multiple lines at the same steps (one line per area).
`TopologyGreedy`(action_space)	This is a `GreedyAgent` example, which will attempt to reconfigure the substations connectivity.

class grid2op.Agent.AgentWithConverter(action_space, action_space_converter=None, **kwargs_converter)[source]

Compared to a regular BaseAgent, these types of Agents are able to deal with a different representation of grid2op.Action.BaseAction and grid2op.Observation.BaseObservation.

Like any other agent, an AgentWithConverter implements the BaseAgent.act() method, but it works slightly differently.

In this method, it receives an observation as an object (i.e. an instance of grid2op.Observation.BaseObservation). This object can then be converted to any other object with the method AgentWithConverter.convert_obs().

Then, this transformed_observation is passed to the method AgentWithConverter.my_act(), which must be defined for each agent. This function outputs an encoded_act, which can be anything you want.

Finally, the encoded_act is decoded into a proper action, an object of class grid2op.Action.BaseAction, thanks to the method AgentWithConverter.convert_act().

This makes it possible, for example, to represent actions as integers, which makes it easier to train standard discrete control algorithms such as those used to solve Atari games.

NB It is possible either to define AgentWithConverter.convert_obs() and AgentWithConverter.convert_act(), or to define a grid2op.Converters.Converter and feed it to the action_space_converter parameter used to initialise the class. The second option is preferred, as the AgentWithConverter.action_space will then directly be this converter. Such a BaseAgent will really behave as if the actions were encoded the way it wants.
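
The three steps described above (convert the observation, act on the converted observation, decode the result) can be sketched in plain Python. ToyConverterAgent is a hypothetical stand-in, not the grid2op class: the observation is a plain dictionary and the decoded action is a made-up dictionary with a "line_id" key, chosen only to make the data flow visible.

```python
class ToyConverterAgent:
    """Hypothetical stand-in mimicking the three-step pipeline of AgentWithConverter."""

    def __init__(self, n_line):
        self.n_line = n_line

    def convert_obs(self, observation):
        # keep only the relative flows from the (dictionary) observation
        return list(observation["rho"])

    def my_act(self, transformed_observation, reward, done=False):
        # encoded action: index of the most loaded powerline
        return max(range(self.n_line), key=lambda i: transformed_observation[i])

    def convert_act(self, encoded_act):
        # decode the integer into a dictionary describing the action
        return {"line_id": encoded_act}

    def act(self, observation, reward, done=False):
        transformed_observation = self.convert_obs(observation)
        encoded_act = self.my_act(transformed_observation, reward, done)
        return self.convert_act(encoded_act)


agent = ToyConverterAgent(n_line=3)
action = agent.act({"rho": [0.2, 0.9, 0.5]}, reward=0.0)
```

Only my_act() contains decision logic here; the two conversion methods isolate it from the representation used by the environment.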

Examples

For example, imagine a BaseAgent that uses a neural network to make its decisions.

Suppose also that, after some feature engineering, it’s best for the neural network to use only the load active values (grid2op.Observation.BaseObservation.load_p) and the sum of the relative flows (grid2op.Observation.BaseObservation.rho) with the active flow (grid2op.Observation.BaseObservation.p_or) [NB that agent would not make sense a priori, but who knows]

Suppose that this neural network can be accessed with a class AwesomeNN (not available…) that can predict some actions. It can be loaded with the “load” method and make predictions with the “predict” method.

For the sake of the example, we will suppose that this agent only predicts powerline statuses (so 0 or 1), represented as a vector. So we need to take extra care to convert this vector from a numpy array to a valid action.

This is done below:

import numpy as np
import grid2op
from grid2op.Agent import AgentWithConverter
from AwesomeNN import AwesomeNN  # hypothetical package: this does not exist!
# create a simple environment
env = grid2op.make("l2rpn_case14_sandbox")

# define the class above
class AgentCustomObservation(AgentWithConverter):
    def __init__(self, action_space, path):
        AgentWithConverter.__init__(self, action_space)
        self.my_neural_network = AwesomeNN()
        self.my_neural_network.load(path)

    def convert_obs(self, observation):
        # convert the observation
        return np.concatenate((observation.load_p, observation.rho + observation.p_or))

    def convert_act(self, encoded_act):
        # convert back the action, output from the NN "self.my_neural_network"
        # to a valid action
        act = self.action_space({"set_line_status": encoded_act})
        return act

    def my_act(self, transformed_observation, reward, done=False):
        # use the "predict" method of the (hypothetical) neural network
        act_predicted = self.my_neural_network.predict(transformed_observation)
        return act_predicted


# make the agent that behaves as expected.
my_agent = AgentCustomObservation(action_space=env.action_space, path=".")

# this agent is perfectly working :-) You can use it like any other agent.

action_space_converter

The converter that is used to represent the BaseAgent action space. Might be None if not initialized.

Type:: grid2op.Converters.Converter

init_action_space

The initial action space. This corresponds to the action space of the grid2op.Environment.Environment.

Type:: grid2op.Action.ActionSpace

action_space

If a converter is used, then this action space is this converter. The agent will behave as if the action space were directly encoded the way it wants.

Type:: grid2op.Converters.ActionSpace

Methods:

`act`(observation, reward[, done])	Standard method of a `BaseAgent`.
`convert_act`(encoded_act)	This function will convert an "encoded action", which can be of any type, to a valid action that can be ingested by the environment.
`convert_obs`(observation)	This function converts the observation, which is an object of class `grid2op.Observation.BaseObservation`, into a representation understandable by the BaseAgent.
`my_act`(transformed_observation, reward[, done])	This method should be overridden if this class is used.
`seed`(seed)	Seed the agent AND the associated converter if it needs to be seeded.

act(observation, reward, done=False)[source]

Standard method of a BaseAgent. There is no need to overload this function.

Parameters:

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.Action

convert_act(encoded_act)[source]

This function will convert an “encoded action”, which can be of any type, to a valid action that can be ingested by the environment.

Parameters:: encoded_act (object) – Anything that represents an action.
Returns:: act – A valid action, represented as a class, that corresponds to the encoded action given as input.
Return type:: grid2op.BaseAction.BaseAction

convert_obs(observation)[source]

This function converts the observation, which is an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.

For example, an agent could want to look only at the relative flows grid2op.Observation.BaseObservation.rho to make its decision. This is possible by overloading this method.

This method can also be used to scale the observation such that each component has mean 0 and variance 1, for example.

Parameters:: observation (grid2op.Observation.Observation) – Initial observation received by the agent in the BaseAgent.act() method.
Returns:: res – Anything that will be used by the BaseAgent to take decisions.
Return type:: object

abstractmethod my_act(transformed_observation, reward, done=False)[source]

This method should be overridden if this class is used. It is an “abstract” method.

Override it to build an agent that handles a different representation of actions and observations.

Parameters:

transformed_observation (object) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the AgentWithConverter.convert_act() method.

Return type:

object

seed(seed)[source]

Seed the agent AND the associated converter if it needs to be seeded.

See BaseAgent.seed() for a more detailed explanation of seeding.

class grid2op.Agent.AlertAgent(action_space, grid_controler=<class 'grid2op.Agent.recoPowerlineAgent.RecoPowerlineAgent'>, percentage_alert=30, simu_step=1, threshold=0.99)[source]

This is an AlertAgent example, which will attempt to reconnect powerlines and send alerts on the worst possible attacks: for each disconnected powerline that can be reconnected, it will simulate the effect of reconnecting it, and reconnect the one that leads to the highest simulated reward. It will also simulate the effect of a line disconnection on attackable lines and raise alerts for the worst ones.

Methods:

`act`(observation, reward[, done])	This is the main method of a BaseAgent.

act(observation: BaseObservation, reward: float, done: bool = False) → BaseAction[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you overload BaseAgent.seed() accordingly.

Parameters:

observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction

class grid2op.Agent.BaseAgent(action_space: ActionSpace)[source]

This class represents the base class of a BaseAgent. All bots / controllers / agents used in the Grid2Op simulator should derive from this class.

To work properly, it is advised to create the BaseAgent after the grid2op.Environment has been created and to reuse the grid2op.Environment.Environment.action_space to build the BaseAgent.

action_space

It represents the action space, i.e. a tool that can serve to create valid actions. Note that a valid action can be illegal or ambiguous, and so lead to a “game over” or to an error. But at least it will have a proper size.

Type:: grid2op.Action.ActionSpace

Methods:

`act`(observation, reward[, done])	This is the main method of a BaseAgent.
`load_state`(loadstate_path)	An optional method to re-load the internal agent state that was saved with self.save_state.
`reset`(obs)	This method is called at the beginning of a new episode.
`save_state`(savestate_path)	An optional method to save the internal state of your agent.
`seed`(seed)	This function is used to guarantee that the "pseudo random numbers" generated and used by the agent instance will be deterministic.

abstractmethod act(observation: BaseObservation, reward: float, done: bool = False) → BaseAction[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you overload BaseAgent.seed() accordingly.

Parameters:

observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction

load_state(loadstate_path: PathLike)[source]

An optional method to re-load the internal agent state that was saved with self.save_state. This can be useful to re-set your agent to an earlier simulation time step and reproduce past experiments with Grid2Op. Concept developed by Fraunhofer IEE KES.

Notes

First, the internal state of your agent consists of attributes that are contained in the grid2op.Agent.BaseAgent and grid2op.Agent.BaseAgent.action_space. Such attributes can easily be re-set with setattr().

Second, your agent may contain custom attributes, such as e.g. a vector of line indices from a Grid2Op observation. You can re-set them with setattr() as well.

Third, your agent may contain very specific modules such as Tensorflow that do not support the simple setattr(). However, these modules normally have their own methods to re-load an internal state. Examples of such methods are load_weights() that you can integrate in your implementation of self.load_state.

Parameters:: loadstate_path (string) – The path from which your agent state variables should be loaded

reset(obs: BaseObservation)[source]

This method is called at the beginning of a new episode. It is implemented by agents to reset their internal state if needed.

obs

The first observation corresponding to the initial state of the environment.

Type:: grid2op.Observation.BaseObservation

save_state(savestate_path: PathLike)[source]

An optional method to save the internal state of your agent. The saved state can later be re-loaded with self.load_state, e.g. to repeat a Grid2Op time step with exactly the same internal parameterization. This can be useful to repeat Grid2Op experiments and analyze why your agent performed certain actions in past time steps. Concept developed by Fraunhofer IEE KES.

Notes

First, the internal state of your agent consists of attributes that are contained in the grid2op.Agent.BaseAgent and grid2op.Agent.BaseAgent.action_space. Examples are the parameterization and seeds of the random number generator that your agent uses. Such attributes can easily be obtained with getattr() and stored in a common file format, such as .npy.

Second, your agent may contain custom attributes, such as e.g. a vector of line indices from a Grid2Op observation. You could obtain and save them in the same way as explained before.

Third, your agent may contain very specific modules such as Tensorflow that do not support the simple getattr(). However, these modules normally have their own methods to save an internal state. Examples of such methods are save_weights() that you can integrate in your implementation of self.save_state.

Parameters:: savestate_path (string) – The path to which your agent state variables should be saved
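
The getattr() / setattr() approach described for custom attributes can be sketched as follows. This is only an illustration under stated assumptions: StatefulToyAgent, its STATE_ATTRS tuple and its two attributes are hypothetical, the class does not derive from BaseAgent, and JSON is used as the file format simply because it is available in the standard library (the notes above mention .npy as another option).

```python
import json
import os
import tempfile


class StatefulToyAgent:
    """Hypothetical agent whose custom attributes are saved and restored by name."""

    # names of the custom attributes that make up the internal state (assumption)
    STATE_ATTRS = ("last_line_ids", "n_steps")

    def __init__(self):
        self.last_line_ids = []
        self.n_steps = 0

    def save_state(self, savestate_path):
        # collect the attributes with getattr() and write them to a file
        state = {name: getattr(self, name) for name in self.STATE_ATTRS}
        with open(savestate_path, "w", encoding="utf-8") as f:
            json.dump(state, f)

    def load_state(self, loadstate_path):
        # read the file back and re-set each attribute with setattr()
        with open(loadstate_path, "r", encoding="utf-8") as f:
            state = json.load(f)
        for name, value in state.items():
            setattr(self, name, value)


agent = StatefulToyAgent()
agent.last_line_ids = [3, 7]
agent.n_steps = 12

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "agent_state.json")
    agent.save_state(path)
    restored = StatefulToyAgent()
    restored.load_state(path)
```

Modules with their own persistence methods (the third case in the notes) would be saved by calling those methods from save_state and load_state rather than through this attribute loop.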

seed(seed: int) → None[source]

This function is used to guarantee that the “pseudo random numbers” generated and used by the agent instance will be deterministic.

This guarantees that, if the recommendations in BaseAgent.act() are followed, the agent will produce the same set of actions if it faces the same observations in the same order. This is particularly important for random agents.

You can override this function with the method of your choosing, but if you do so, don’t forget to call super().seed(seed).

Parameters:: seed (int) – The seed used
Returns:: seed – a tuple of seed used
Return type:: tuple
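
The override pattern recommended above (seed your own sources of randomness, and still call super().seed(seed)) can be sketched without grid2op. Everything here is a hypothetical stand-in: SeedableToyBase is not BaseAgent, and the standard library random.Random plays the role of the agent's private generator (space_prng). Note that random.Random.randint includes its upper bound, so randint(1, 4) covers the same values as np.random.randint(1, 5).

```python
import random


class SeedableToyBase:
    """Hypothetical stand-in for the seeding part of a base agent."""

    def __init__(self):
        self.space_prng = random.Random()  # private generator, never the global one

    def seed(self, seed):
        self.space_prng.seed(seed)
        return (seed,)


class NoisyToyAgent(SeedableToyBase):
    def __init__(self):
        super().__init__()
        self.extra_prng = random.Random()  # a second, agent-specific source of randomness

    def seed(self, seed):
        res = super().seed(seed)  # do not forget the parent generator
        self.extra_prng.seed(seed + 1)  # derive a seed for the extra source
        return res

    def act(self, observation, reward, done=False):
        # draw only from the private generators
        return (self.space_prng.randint(1, 4), self.extra_prng.random())


first, second = NoisyToyAgent(), NoisyToyAgent()
first.seed(42)
second.seed(42)
draws_first = [first.act(None, 0.0) for _ in range(5)]
draws_second = [second.act(None, 0.0) for _ in range(5)]
```

Because neither agent touches the global random module, two instances seeded identically produce identical sequences regardless of what other code does with global random state.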

class grid2op.Agent.DeltaRedispatchRandomAgent(action_space, n_gens_to_redispatch=2, redispatching_delta=1.0)[source]

INTERNAL

Warning

/!\ Internal, do not use unless you know what you are doing /!\

Used for test. Prefer using a random agent by selecting only the redispatching action that you want.

This agent will perform some redispatch of a given amount among randomly selected dispatchable generators.

Parameters:

action_space (grid2op.Action.ActionSpace) – the Grid2Op action space
n_gens_to_redispatch (int) – The maximum number of dispatchable generators to play with
redispatching_delta (float) – The redispatching MW value used in both directions

Methods:

`act`(observation, reward[, done])	This is the main method of a BaseAgent.

act(observation, reward, done=False)[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you overload BaseAgent.seed() accordingly.

Parameters:

observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction

class grid2op.Agent.DoNothingAgent(action_space)[source]

This is the most basic BaseAgent. It is purely passive, and does absolutely nothing.

As opposed to most reinforcement learning environments, in grid2op, doing nothing is often the best solution.

Methods:

`act`(observation, reward[, done])	Returns the “do nothing” action, as explained in the documentation of grid2op.BaseAction.update() or grid2op.BaseAction.ActionSpace.__call__().

act(observation, reward, done=False)[source]

As explained in more detail in the documentation of grid2op.BaseAction.update() or grid2op.BaseAction.ActionSpace.__call__(), the preferred way to make an object of type action is to call grid2op.BaseAction.ActionSpace.__call__() with the dictionary representing the action. In this case, the action is “do nothing” and it is represented by the empty dictionary.

Parameters:

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.Action

class grid2op.Agent.FromActionsListAgent(action_space, action_list=None)[source]

This type of agent will perform some actions based on a provided list of actions. If no action is provided for a given step (for example because it survives for more steps than the length of the provided action list), it will do nothing.

Notes

No checks are performed to make sure the action types are compatible with the environment. For example, the environment might prevent redispatching, but, at the creation of the agent, we do not ensure that none of the provided actions performs redispatching.
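
The behaviour described above (replay the list, then do nothing once it is exhausted) can be sketched as follows. ToyActionsListAgent is a hypothetical stand-in written for illustration, not the grid2op implementation: actions are kept as plain dictionaries and the empty dictionary stands for “do nothing”.

```python
class ToyActionsListAgent:
    """Hypothetical stand-in replaying a list of actions, then doing nothing."""

    def __init__(self, action_list=None):
        self.action_list = list(action_list) if action_list is not None else []
        self.step = 0

    def act(self, observation, reward, done=False):
        if self.step < len(self.action_list):
            action = self.action_list[self.step]
        else:
            action = {}  # empty dictionary: "do nothing"
        self.step += 1
        return action


agent = ToyActionsListAgent([{"set_line_status": [(0, -1)]}, {"set_line_status": [(0, 1)]}])
played = [agent.act(None, 0.0) for _ in range(4)]
```

As in the note above, nothing in this sketch checks whether the listed actions are allowed by the environment.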

Methods:

`act`(observation, reward[, done])	This is the main method of a BaseAgent.

act(observation, reward, done=False)[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you overload BaseAgent.seed() accordingly.

Parameters:

observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction

class grid2op.Agent.GreedyAgent(action_space)[source]

This is a class of “Greedy BaseAgent”. Greedy agents all execute the same kind of algorithm to take an action:

They simulate, with grid2op.Observation.Observation.simulate(), all actions in a given set

They take the action that maximises the simulated reward among all these actions

This class is an abstract class (objects of this class cannot be created). To create a “GreedyAgent”, one must override this class. Examples are provided with PowerLineSwitch and TopologyGreedy.
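
The greedy selection described above can be sketched in a few lines. This is an illustration under stated assumptions, not the grid2op source: ToyObservation is a hypothetical stand-in whose simulate() returns a made-up reward looked up in a dictionary, using an (observation, reward, done, info) tuple as return value, and greedy_choice is a hypothetical helper name.

```python
class ToyObservation:
    """Hypothetical observation whose simulate() returns a made-up reward per action."""

    def __init__(self, simulated_rewards):
        self.simulated_rewards = simulated_rewards

    def simulate(self, action):
        # assumed return convention: (observation, reward, done, info)
        return self, self.simulated_rewards[action], False, {}


def greedy_choice(observation, tested_actions):
    """Return the tested action with the highest simulated reward."""
    best_action, best_reward = None, float("-inf")
    for action in tested_actions:
        _, simulated_reward, _, _ = observation.simulate(action)
        if simulated_reward > best_reward:
            best_action, best_reward = action, simulated_reward
    return best_action


obs = ToyObservation({"do_nothing": 1.0, "switch_line_0": 2.5, "switch_line_1": 0.3})
chosen = greedy_choice(obs, ["do_nothing", "switch_line_0", "switch_line_1"])
```

Subclasses differ only in the list of tested actions, which is why the candidate list is the single method left to override.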

Methods:

`_get_tested_action`(observation)	Returns the list of all the candidate actions.
`act`(observation, reward[, done])	By definition, all "greedy" agents are acting the same way.

abstractmethod _get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters:: observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
Returns:: res – A list of all candidate grid2op.BaseAction.BaseAction
Return type:: list

act(observation, reward, done=False)[source]

By definition, all “greedy” agents act the same way. The only thing that can differentiate multiple agents is the actions that are tested.

These actions are defined in the method _get_tested_action(). This act() method implements the greedy logic: take the action that maximizes the instantaneous reward on the simulated action.

Parameters:

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.Action

class grid2op.Agent.MLAgent(action_space, action_space_converter=<class 'grid2op.Converter.ToVect.ToVect'>, **kwargs_converter)[source]

This agent handles only vectors. By default, the “my_act” function returns the “do nothing” action (so it needs to be overridden).

In this class, the “my_act” is expected to return a vector that can be directly converted into a valid action.

Methods:

`convert_from_vect`(act)	Helper to convert an action, represented as a numpy array, into a `grid2op.BaseAction` instance.
`my_act`(transformed_observation, reward[, done])	By default this agent returns only the "do nothing" action, unless some smarter implementations are provided for this function.

convert_from_vect(act)[source]

Helper to convert an action, represented as a numpy array, into a grid2op.BaseAction instance.

Parameters:: act (numpy.ndarray) – An action represented as a numpy array.
Returns:: res – The act parameters converted into a proper grid2op.BaseAction.BaseAction object.
Return type:: grid2op.Action.Action

my_act(transformed_observation, reward, done=False)[source]

By default this agent returns only the “do nothing” action, unless some smarter implementations are provided for this function.

Parameters:

transformed_observation (numpy.ndarray, dtype=float) – The observation transformed into a 1d numpy array of float. All components of the observation are kept.
reward (float) – Reward of the previous action
done (bool) – Whether the episode is over or not.

Returns:

res – The action taken represented as a vector.

Return type:

numpy.ndarray, dtype=float

class grid2op.Agent.OneChangeThenNothing(action_space)[source]

This is a specific kind of BaseAgent. It performs a BaseAction (possibly non-empty) at the first time step and then does nothing.

This class is an abstract class and cannot be instantiated (i.e. no object of this class can be created). It must be overridden and the method OneChangeThenNothing._get_dict_act() must be defined. Basically, it must know what action to do.

my_dict

Representation, as a dictionary, of the only action that this Agent will do at the first time step.

Type:: dict (class member)

Examples

We advise using this class as follows:

import grid2op
from grid2op.Agent import OneChangeThenNothing
from grid2op.Runner import Runner
acts_dict_ = [{}, {"set_line_status": [(0,-1)]}]  # list of dictionaries. Each dictionary
# represents a valid action

env = grid2op.make("l2rpn_case14_sandbox")  # create an environment
for act_as_dict in acts_dict_:
    # generate the agent class that will perform the action encoded by act_as_dict
    # at the first time step, and then do nothing
    agent_class = OneChangeThenNothing.gen_next(act_as_dict)

    # start a runner with this agent
    runner = Runner(**env.get_params_for_runner(), agentClass=agent_class)
    # run 2 episodes with it
    res_2 = runner.run(nb_episode=2)

Methods:

`_get_dict_act`()	Function that needs to be overridden to indicate which action to perform.
`act`(observation, reward[, done])	This is the main method of a BaseAgent.
`gen_next`(dict_)	This function allows changing the dictionary of the action that the agent will perform.
`reset`(obs)	This method is called at the beginning of a new episode.

_get_dict_act()[source]

Function that needs to be overridden to indicate which action to perform.

Returns:: res – A dictionary that can be converted into a valid grid2op.BaseAction.BaseAction. See the help of grid2op.BaseAction.ActionSpace.__call__() for more information.
Return type:: dict

act(observation, reward, done=False)[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you overload BaseAgent.seed() accordingly.

Parameters:

observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction
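The seeding advice in the notes above can be illustrated without grid2op. The sketch below is a minimal, dependency-free illustration and not grid2op code: `ToyAgent` is a hypothetical stand-in for an agent, its `space_prng` is a `random.Random` instance (in grid2op, `self.space_prng` is a numpy generator), and `extra_prng` stands in for an external source of randomness such as a deep learning library. It shows two things: every draw goes through a generator owned by the agent rather than through module-level functions, and `seed()` is overridden so that the extra source is seeded as well.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for an agent; NOT the grid2op BaseAgent."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        # Generator owned by the agent (grid2op exposes a numpy one as self.space_prng).
        self.space_prng = random.Random()
        # Stand-in for another randomness source (e.g. a neural network library).
        self.extra_prng = random.Random()

    def seed(self, seed):
        # Seed the agent's own generator...
        self.space_prng.seed(seed)
        # ...and derive a seed for the extra source, so it is reproducible too.
        self.extra_prng.seed(self.space_prng.randrange(2**31))

    def act(self):
        # All draws go through the agent's generators, never module-level functions.
        action_id = self.space_prng.randrange(self.n_actions)
        noise = self.extra_prng.random()
        return action_id, noise


def rollout(seed, steps=5):
    """Seed a fresh toy agent and record what it draws over a few steps."""
    agent = ToyAgent(n_actions=10)
    agent.seed(seed)
    return [agent.act() for _ in range(steps)]
```

With this structure, two rollouts started from the same seed produce the same sequence of draws, which is the property the notes ask for.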

classmethod gen_next(dict_)[source]

This function allows changing the dictionary of the action that the agent will perform.

See the class level documentation for an example on how to use this.

Parameters:: dict_ (dict) – A dictionary representing an action. This dictionary is assumed to be convertible into an action. No check is performed at this stage.

reset(obs)[source]

This method is called at the beginning of a new episode. It is implemented by agents to reset their internal state if needed.

obs

The first observation corresponding to the initial state of the environment.

Type:: grid2op.Observation.BaseObservation

class grid2op.Agent.PowerLineSwitch(action_space)[source]

This is a GreedyAgent example, which will attempt to disconnect powerlines.

It will choose among:

doing nothing

changing the status of one powerline

It picks the action that maximizes the simulated reward. All powerlines are tested at each step: if n is the number of powerlines on the grid, this agent performs n + 1 calls to “simulate” at each step (one for doing nothing, plus one per powerline whose status is changed).
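The n + 1 count can be checked with a small sketch of the greedy selection. This is an illustration under stated assumptions, not the grid2op implementation: `greedy_choice` and `toy_simulate` are hypothetical names, candidates are encoded as `None` (do nothing) or a line index, and the toy simulator returns made-up rewards.

```python
def greedy_choice(n_lines, simulate):
    """Return the candidate with the highest simulated reward.

    Candidates are encoded as None (do nothing) or a line index (change
    the status of that line); this encoding is illustrative only.
    """
    candidates = [None] + list(range(n_lines))  # n + 1 candidates
    rewards = [simulate(candidate) for candidate in candidates]  # n + 1 calls
    best = max(range(len(candidates)), key=rewards.__getitem__)
    return candidates[best], rewards[best]


# Toy stand-in for the simulator: it counts its calls and
# pretends that changing the status of line 2 is the best move.
calls = {"count": 0}


def toy_simulate(candidate):
    calls["count"] += 1
    if candidate is None:
        return 1.0
    return 2.0 if candidate == 2 else 0.5


choice, reward = greedy_choice(n_lines=4, simulate=toy_simulate)
```

With 4 lines, the simulator is called 5 times, so the cost of one step grows linearly with the number of powerlines.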

Methods:

_get_tested_action(observation)

Returns the list of all the candidate actions.

_get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters:: observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
Returns:: res – A list of all candidate grid2op.Action.BaseAction
Return type:: list

class grid2op.Agent.RandomAgent(action_space, action_space_converter=<class 'grid2op.Converter.IdToAct.IdToAct'>, **kwargs_converter)[source]

This agent acts randomly on the powergrid. It uses grid2op.Converter.IdToAct to compute all the possible actions available for the environment, and then chooses one of them at random.

Notes

Actions are drawn uniformly at random among unary actions. For example, if the game rules allow actions that both disconnect a powerline AND modify the topology of a substation, an action that does both will not be sampled by this class.

This agent is not equivalent to calling env.action_space.sample() because the sampling is not done in the same manner. This agent samples uniformly among all unary actions, whereas env.action_space.sample() does not (see grid2op.Action.SerializableActionSpace.sample() for more information about the latter).
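To make "uniformly among unary actions" concrete, here is a minimal sketch that does not use grid2op. Everything in it is an assumption made for illustration: the helper names, the tuple encoding of actions, the inclusion of a do-nothing action in the list, and the use of a seeded `random.Random` instance as a stand-in for the agent's `self.space_prng`.

```python
import random


def unary_actions(n_lines, n_topologies):
    """List do-nothing plus every action that changes exactly one thing."""
    actions = [("do_nothing", None)]
    actions += [("change_line_status", i) for i in range(n_lines)]
    actions += [("change_topology", j) for j in range(n_topologies)]
    return actions


def draw_action(actions, prng):
    # One uniform draw of an index, then a lookup in the list.
    return actions[prng.randrange(len(actions))]


actions = unary_actions(n_lines=3, n_topologies=2)
prng = random.Random(0)  # stand-in for the agent's seeded self.space_prng
sampled = [draw_action(actions, prng) for _ in range(100)]
```

The list holds 1 + 3 + 2 entries. The 3 x 2 combinations that change a line status and a topology at once are never enumerated, so they can never be drawn.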

Methods:

my_act(transformed_observation, reward[, done])

A random agent will "simply" draw a random number between 0 and the number of actions, and return the corresponding action.

my_act(transformed_observation, reward, done=False)[source]

A random agent will “simply” draw a random number between 0 and the number of actions, and return the corresponding action.

This is equivalent to drawing a feasible action uniformly at random.

Notes

To work as intended, it is crucial that this method does not rely on any other source of “pseudo randomness” than grid2op.Space.RandomObject.space_prng.

In particular, you must avoid using np.random.XXX or the random python module. You can replace any call to np.random.XXX by self.space_prng.XXX (e.g. np.random.randint(1,5) can be replaced by self.space_prng.randint(1,5)).

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you override BaseAgent.seed() accordingly.

class grid2op.Agent.RecoPowerlineAgent(action_space)[source]

This is a GreedyAgent example, which will attempt to reconnect powerlines: for each disconnected powerline that can be reconnected, it simulates the effect of reconnecting it, and then reconnects the one that leads to the highest simulated reward.
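The selection described above can be sketched in plain Python. This is not the grid2op code: the helper names are hypothetical, the rewards are made up, and the criterion for "can be reconnected" (the line is disconnected and its cooldown counter is 0) is an assumption used for illustration.

```python
def reconnection_candidates(line_status, cooldown):
    """Indices of lines that are disconnected and whose cooldown has expired."""
    return [
        line_id
        for line_id, (connected, wait) in enumerate(zip(line_status, cooldown))
        if not connected and wait == 0
    ]


def best_reconnection(line_status, cooldown, simulate):
    """Simulate each candidate reconnection and return the best line id, or None."""
    candidates = reconnection_candidates(line_status, cooldown)
    if not candidates:
        return None  # nothing to reconnect
    return max(candidates, key=simulate)


# Toy data: lines 1, 3 and 4 are disconnected, but line 3 is still in cooldown.
status = [True, False, True, False, False]
cooldown = [0, 0, 0, 2, 0]
toy_rewards = {1: 0.4, 4: 0.9}  # made-up simulated rewards
best = best_reconnection(status, cooldown, toy_rewards.__getitem__)
```

Only lines 1 and 4 are simulated here, so the number of simulations per step depends on how many lines are currently reconnectable rather than on the size of the grid.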

Methods:

_get_tested_action(observation)

Returns the list of all the candidate actions.

_get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters:: observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
Returns:: res – A list of all candidate grid2op.Action.BaseAction
Return type:: list

class grid2op.Agent.RecoPowerlinePerArea(action_space: ActionSpace, areas_by_sub_id: dict)[source]

This class acts like the RecoPowerlineAgent, but it is able to reconnect multiple lines in the same step (one line per area).

The “areas” are defined by a list of lists of substation ids provided as input.

Of course, the areas you provide to the agent should be the same as the areas used in the rules of the game. Otherwise, the agent might try to reconnect two powerlines that are “in the same area for the environment”, which will lead to an illegal action.

You can use it like:

import grid2op
from grid2op.Agent import RecoPowerlinePerArea

env_name = "l2rpn_idf_2023" # (or any other env name supporting the feature)
env = grid2op.make(env_name)
agent = RecoPowerlinePerArea(env.action_space, env._game_rules.legal_action.substations_id_by_area)
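The "one line per area" idea can be sketched without grid2op. The code below is an illustration under assumptions, not the agent's implementation: `one_line_per_area` is a hypothetical helper, areas are given as a dict mapping an area id to a list of substation ids, each line is located through a single substation, and lines within an area are ranked by simulated reward (an assumption borrowed from RecoPowerlineAgent).

```python
def one_line_per_area(areas_by_sub_id, line_to_sub, candidates, simulate):
    """Pick, in each area, the candidate line with the highest simulated reward."""
    # Invert the mapping: substation id -> area id.
    sub_to_area = {
        sub_id: area_id
        for area_id, sub_ids in areas_by_sub_id.items()
        for sub_id in sub_ids
    }
    best_per_area = {}
    for line_id in candidates:
        area_id = sub_to_area[line_to_sub[line_id]]
        reward = simulate(line_id)
        if area_id not in best_per_area or reward > best_per_area[area_id][1]:
            best_per_area[area_id] = (line_id, reward)
    return {area_id: line_id for area_id, (line_id, _) in best_per_area.items()}


# Toy grid: two areas, four disconnected lines (all values are made up).
areas = {0: [0, 1, 2], 1: [3, 4]}
line_to_sub = {10: 0, 11: 2, 12: 3, 13: 4}  # line id -> substation used to locate it
toy_rewards = {10: 0.2, 11: 0.7, 12: 0.5, 13: 0.1}
chosen = one_line_per_area(areas, line_to_sub, [10, 11, 12, 13], toy_rewards.__getitem__)
```

Because at most one line is kept per area, the combined reconnection never contains two lines from the same area, which is the condition the paragraph on illegal actions warns about.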

Methods:

act(observation, reward[, done])

This is the main method of a BaseAgent.

act(observation: BaseObservation, reward: float, done: bool = False)[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action the agent chooses to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must NOT use the random python module (which will not be seeded) nor the np.random module, and you should avoid any other source of pseudo-random numbers.

You can adapt your code as follows: instead of using np.random, use self.space_prng.

For example, if you wanted to write np.random.randint(1,5), replace it with self.space_prng.randint(1,5). The same applies to np.random.normal(), which is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you override BaseAgent.seed() accordingly.

Parameters:

observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction

class grid2op.Agent.TopologyGreedy(action_space)[source]

This is a GreedyAgent example, which will attempt to reconfigure the substations connectivity.

It will choose among:

doing nothing

changing the topology of one substation.

To choose, it simulates the outcome of all these actions, and then chooses the action leading to the best simulated reward.

Methods:

_get_tested_action(observation)

Returns the list of all the candidate actions.

_get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters:: observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment
Returns:: res – A list of all candidate grid2op.Action.BaseAction
Return type:: list

