Agent

Objectives

In this RL framework, an Agent is an entity that acts on the Environment (modeled in grid2op as an object of class Environment). In grid2op, such an entity is modeled by the BaseAgent class. Other literature may call it a “bot” or a “controller”.

This module presents a few BaseAgent implementations that can serve either as baselines or as examples of how to implement such agents. NB Stronger baselines are defined in another repository.

To perform their actions, agents receive two main signals from the grid2op.Environment: the current observation and the reward obtained for the previous action.

Both signals can be used to determine the best action to perform on the grid. This is the main objective of a BaseAgent, and it is done in the BaseAgent.act() method.

To get started coding your agent, we encourage you to read the description of the Action to learn how to implement your action. Also have a look at the Easier actions manipulation page for a simpler, higher-level way to manipulate actions.

Once you know how to manipulate a powergrid with the grid2op framework, you can implement an agent by following this example:

import grid2op
from grid2op.Agent import BaseAgent

class MyCustomAgent(BaseAgent):
    def __init__(self, action_space, something_else, and_another_something):
        # define here the constructor of your agent
        # here we say our agent needs "something_else" and "and_another_something"
        # to be built just to demonstrate it does not cause any problem to extend the
        # construction of the base class BaseAgent that only takes "action_space" as a constructor
        BaseAgent.__init__(self, action_space)
        self.something_else = something_else
        self.and_another_something = and_another_something

    def act(self, obs, reward, done=False):
        # this is the only method you need to implement
        # it takes an observation obs (and a reward and a flag)
        # and should return a valid action
        dictionary_describing_the_action = {}  # this can be anything you want that grid2op understands
        # use the action space stored on the agent, not a global "env"
        my_action = self.action_space(dictionary_describing_the_action)
        return my_action

Detailed Documentation by class

Classes:

AgentWithConverter(action_space[, …])

Compared to a regular BaseAgent, these types of Agents are able to deal with a different representation of grid2op.Action.BaseAction and grid2op.Observation.BaseObservation.

BaseAgent(action_space)

This class represents the base class of a BaseAgent.

DeltaRedispatchRandomAgent(action_space[, …])

INTERNAL

DoNothingAgent(action_space)

This is the most basic BaseAgent.

GreedyAgent(action_space)

This is a class of “Greedy BaseAgent”.

MLAgent(action_space[, action_space_converter])

This agent handles only vectors.

OneChangeThenNothing(action_space)

This is a specific kind of BaseAgent.

PowerLineSwitch(action_space)

This is a GreedyAgent example, which will attempt to disconnect powerlines.

RandomAgent(action_space[, …])

This agent acts randomly on the powergrid.

RecoPowerlineAgent(action_space)

This is a GreedyAgent example, which will attempt to reconnect powerlines: for each disconnected powerline that can be reconnected, it will simulate the effect of reconnecting it.

TopologyGreedy(action_space)

This is a GreedyAgent example, which will attempt to reconfigure the substations connectivity.

class grid2op.Agent.AgentWithConverter(action_space, action_space_converter=None, **kwargs_converter)[source]

Compared to a regular BaseAgent, these types of Agents are able to deal with a different representation of grid2op.Action.BaseAction and grid2op.Observation.BaseObservation.

Like any other agent, AgentWithConverter implements the BaseAgent.act() method, but in a slightly different way.

In this method, it receives an observation as an object (i.e. an instance of grid2op.Observation.BaseObservation). This object can then be converted to any other object with the method AgentWithConverter.convert_obs().

This transformed_observation is then passed to the method AgentWithConverter.my_act(), which must be defined by each agent. This function outputs an encoded_act, which can be whatever you want.

Finally, the encoded_act is decoded into a proper action, object of class grid2op.Action.BaseAction, thanks to the method AgentWithConverter.convert_act().

This makes it possible, for example, to represent actions as integers, which makes it easier to train the standard discrete control algorithms used to solve Atari games.
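The flow described above can be sketched without grid2op. The class below is a hypothetical stand-in, not the library's implementation: `ConverterFlowSketch` and `decode_table` are invented names, the observation is a plain dictionary and actions are plain dictionaries.

```python
class ConverterFlowSketch:
    """Hypothetical stand-in mimicking the act() flow of an agent with a converter."""

    def __init__(self, decode_table):
        # assumption: decode_table maps an integer id to an action dictionary
        self.decode_table = decode_table

    def convert_obs(self, observation):
        # keep only the part of the observation this toy agent cares about
        return observation["rho"]

    def my_act(self, transformed_observation, reward, done=False):
        # toy policy: pick id 1 if a line is overloaded, otherwise id 0 (do nothing)
        return 1 if max(transformed_observation) > 1.0 else 0

    def convert_act(self, encoded_act):
        # decode the integer back into an action description
        return self.decode_table[encoded_act]

    def act(self, observation, reward, done=False):
        # the three steps, chained in order
        transformed_observation = self.convert_obs(observation)
        encoded_act = self.my_act(transformed_observation, reward, done)
        return self.convert_act(encoded_act)


agent = ConverterFlowSketch({0: {}, 1: {"set_line_status": [(0, -1)]}})
calm = agent.act({"rho": [0.4, 0.7]}, reward=0.0)
stressed = agent.act({"rho": [0.4, 1.2]}, reward=0.0)
```

Only my_act() deals with the encoded representation here; act() itself never needs to change.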

NB It is possible either to define AgentWithConverter.convert_obs() and AgentWithConverter.convert_act(), or to define a grid2op.Converters.Converter and feed it to the action_space_converter parameter used to initialise the class. The second option is preferred, as the AgentWithConverter.action_space will then directly be this converter. Such a BaseAgent will really behave as if the actions were encoded the way it wants.

Examples

For example, imagine a BaseAgent that uses a neural network to take its decisions.

Suppose also that, after some feature engineering, it is best for the neural network to use only the load active values (grid2op.Observation.BaseObservation.load_p) and the sum of the relative flows (grid2op.Observation.BaseObservation.rho) and the active flows (grid2op.Observation.BaseObservation.p_or). [NB such an agent would not make sense a priori, but who knows.]

Suppose that this neural network can be accessed with a class AwesomeNN (not available…) that can predict some actions. It can be loaded with the “load” method and make predictions with the “predict” method.

For the sake of the example, we will suppose that this agent only predicts powerline statuses (so 0 or 1), represented as a vector. So we need to take extra care to convert this vector from a numpy array to a valid action.

This is done below:

import numpy as np
import grid2op
from grid2op.Agent import AgentWithConverter
from AwesomeNN import AwesomeNN  # hypothetical package: this does not exist!

# create a simple environment
env = grid2op.make()

# define the class above
class AgentCustomObservation(AgentWithConverter):
    def __init__(self, action_space, path):
        AgentWithConverter.__init__(self, action_space)
        self.my_neural_network = AwesomeNN()
        self.my_neural_network.load(path)

    def convert_obs(self, observation):
        # convert the observation
        return np.concatenate((observation.load_p, observation.rho + observation.p_or))

    def convert_act(self, encoded_act):
        # convert back the action, output from the NN "self.my_neural_network"
        # to a valid action
        act = self.action_space({"set_line_status": encoded_act})
        return act

    def my_act(self, transformed_observation, reward, done=False):
        # use the "predict" method of the (hypothetical) neural network
        act_predicted = self.my_neural_network.predict(transformed_observation)
        return act_predicted


# make the agent that behaves as expected.
my_agent = AgentCustomObservation(action_space=env.action_space, path=".")

# this agent is perfectly working :-) You can use it like any other agent.
action_space_converter

The converter that is used to represent the BaseAgent action space. Might be set to None if not initialized.

Type

grid2op.Converters.Converter

init_action_space

The initial action space. This corresponds to the action space of the grid2op.Environment.Environment.

Type

grid2op.Action.ActionSpace

action_space

If a converter is used, then this action space is this converter. The agent will behave as if the action space were directly encoded the way it wants.

Type

grid2op.Converters.ActionSpace

Methods:

act(observation, reward, done=False)[source]

Standard method of a BaseAgent. There is no need to overload this function.

Parameters
  • observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns

res – The action chosen by the bot / controller / agent.

Return type

grid2op.Action.Action

convert_act(encoded_act)[source]

This function converts an “encoded action”, which can be of any type, into a valid action that can be ingested by the environment.

Parameters

encoded_act (object) – Anything that represents an action.

Returns

act – A valid action, represented as a class, that corresponds to the encoded action given as input.

Return type

grid2op.BaseAction.BaseAction

convert_obs(observation)[source]

This function converts the observation, which is an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.

For example, an agent might only want to look at the relative flows grid2op.Observation.BaseObservation.rho to take its decision. This is possible by overloading this method.

This method can also be used to scale the observation such that each component has mean 0 and variance 1, for example.
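As a rough illustration of the scaling mentioned above, the sketch below standardizes each component of a vector using statistics computed on past observations. It is not part of grid2op: `fit_scaler`, `scale` and the toy `history` data are hypothetical, and observations are represented as plain lists of floats.

```python
import statistics


def fit_scaler(samples):
    """Compute the per-component mean and standard deviation of past observations."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    # population standard deviation; use 1.0 for constant components to avoid dividing by 0
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(vector, means, stds):
    """Center and reduce one observation vector, component by component."""
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]


# toy history: three observations with two components each
history = [[10.0, 0.2], [20.0, 0.4], [30.0, 0.6]]
means, stds = fit_scaler(history)
scaled = [scale(v, means, stds) for v in history]
```

A function like `scale` could be called from an overloaded convert_obs(), with `means` and `stds` computed once beforehand.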

Parameters

observation (grid2op.Observation.Observation) – Initial observation received by the agent in the BaseAgent.act() method.

Returns

res – Anything that will be used by the BaseAgent to take decisions.

Return type

object

abstractmethod my_act(transformed_observation, reward, done=False)[source]

This method should be overridden if this class is used. It is an “abstract” method.

Override it to make an agent that handles different kinds of actions and observations.

Parameters
  • transformed_observation (object) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns

res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the AgentWithConverter.convert_act() method.

Return type

object

seed(seed)[source]

Seed the agent AND the associated converter if it needs to be seeded.

See BaseAgent.seed() for a more detailed explanation about seeding.

class grid2op.Agent.BaseAgent(action_space)[source]

This class represents the base class of a BaseAgent. All bots / controllers / agents used in the Grid2Op simulator should derive from this class.

To work properly, it is advised to create the BaseAgent after the grid2op.Environment has been created, and to reuse the grid2op.Environment.Environment.action_space to build the BaseAgent.

action_space

It represents the action space, i.e. a tool that can be used to create valid actions. Note that a valid action can be illegal or ambiguous, and so lead to a “game over” or to an error. But at least it will have a proper size.

Type

grid2op.Action.ActionSpace

Methods:

abstractmethod act(observation, reward, done=False)[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().
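The sketch below illustrates why a per-instance generator helps reproducibility. It is a stand-in, not grid2op code: `SeededDrawsSketch` is a hypothetical class, and it assumes that space_prng behaves like a numpy `RandomState`.

```python
import numpy as np


class SeededDrawsSketch:
    """Hypothetical stand-in showing only the random-number handling of an agent."""

    def __init__(self):
        # assumption: the agent's space_prng behaves like a numpy RandomState
        self.space_prng = np.random.RandomState()

    def seed(self, seed):
        # seed the private generator only; the global np.random state is untouched
        self.space_prng.seed(seed)

    def draw(self):
        # written with self.space_prng instead of np.random.randint(1, 5)
        return int(self.space_prng.randint(1, 5))


first, second = SeededDrawsSketch(), SeededDrawsSketch()
first.seed(42)
second.seed(42)
draws_first = [first.draw() for _ in range(10)]
draws_second = [second.draw() for _ in range(10)]
```

Both instances produce the same sequence because each one draws only from its own seeded generator.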

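If you really need other sources of randomness (for example if you use tensorflow or torch) we strongly recommend you to overload the BaseAgent.seed() accordingly.

Parameters
  • observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns

res – The action chosen by the bot / controller / agent.

Return type

grid2op.Action.PlayableAction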

reset(obs)[source]

This method is called at the beginning of a new episode. It is implemented by agents to reset their internal state if needed.

obs

The first observation corresponding to the initial state of the environment.

Type

grid2op.Observation.BaseObservation

seed(seed)[source]

This function is used to guarantee that the “pseudo random numbers” generated and used by the agent instance will be deterministic.

This guarantees that, if the recommendations in BaseAgent.act() are followed, the agent will produce the same set of actions if it faces the same observations in the same order. This is particularly important for random agents.

You can override this function with the method of your choosing, but if you do so, don’t forget to call super().seed(seed).
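A minimal sketch of such an override follows. None of these classes come from grid2op: `SeedableBaseSketch` stands in for the base agent, `FakeExternalLibrary` stands in for an external source of randomness (a deep learning framework, for instance), and reusing the same seed for both is an arbitrary choice made for the example.

```python
class SeedableBaseSketch:
    """Hypothetical stand-in for a base class exposing seed()."""

    def __init__(self):
        self.seed_used = None

    def seed(self, seed):
        self.seed_used = seed
        return (seed,)


class FakeExternalLibrary:
    """Hypothetical stand-in for a library that has its own seeding function."""

    def __init__(self):
        self.seed_value = None

    def manual_seed(self, seed):
        self.seed_value = seed


class AgentWithExternalRandomness(SeedableBaseSketch):
    def __init__(self, external):
        super().__init__()
        self.external = external

    def seed(self, seed):
        # always seed the base class first...
        seeds = super().seed(seed)
        # ...then every additional source of randomness the agent relies on
        self.external.manual_seed(seed)
        return seeds + (seed,)


library = FakeExternalLibrary()
agent = AgentWithExternalRandomness(library)
seeds = agent.seed(7)
```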

Parameters

seed (int) – The seed used

Returns

seed – a tuple of the seeds used

Return type

tuple

class grid2op.Agent.DeltaRedispatchRandomAgent(action_space, n_gens_to_redispatch=2, redispatching_delta=1.0)[source]

INTERNAL

Warning

/!\ Internal, do not use unless you know what you are doing /!\

Used for test. Prefer using a random agent by selecting only the redispatching action that you want.

This agent will perform some redispatch of a given amount among randomly selected dispatchable generators.

Parameters
  • action_space (grid2op.Action.ActionSpace) – the Grid2Op action space

  • n_gens_to_redispatch (int) – The maximum number of dispatchable generators to play with

  • redispatching_delta (float) – The redispatching MW value used in both directions

Methods:

act(observation, reward, done=False)[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

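If you really need other sources of randomness (for example if you use tensorflow or torch) we strongly recommend you to overload the BaseAgent.seed() accordingly.

Parameters
  • observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns

res – The action chosen by the bot / controller / agent.

Return type

grid2op.Action.PlayableAction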

class grid2op.Agent.DoNothingAgent(action_space)[source]

This is the most basic BaseAgent. It is purely passive, and does absolutely nothing.

As opposed to most reinforcement learning environments, in grid2op, doing nothing is often the best solution.

Methods:

act(observation, reward, done=False)[source]

As explained in more detail in the documentation of grid2op.BaseAction.update() or grid2op.BaseAction.ActionSpace.__call__(), the preferred way to make an object of type action is to call grid2op.BaseAction.ActionSpace.__call__() with the dictionary representing the action. In this case, the action is “do nothing” and it is represented by the empty dictionary.

Parameters
  • observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns

res – The action chosen by the bot / controller / agent.

Return type

grid2op.Action.Action

class grid2op.Agent.GreedyAgent(action_space)[source]

This is a class of “Greedy BaseAgent”. Greedy agents all execute the same kind of algorithm to take an action:

  1. They simulate, with grid2op.Observation.Observation.simulate(), all actions in a given set

  2. They take the action that maximises the simulated reward among all these actions

This class is an abstract class (objects of this class cannot be created). To create a “GreedyAgent”, one must subclass this class. Examples are provided with PowerLineSwitch and TopologyGreedy.
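The two steps above amount to an argmax over simulated rewards. The sketch below shows that logic in isolation. It is not the library's code: `greedy_choice` is a hypothetical helper, the candidate actions are plain strings, and `simulate_reward` stands in for the call to simulate() on the observation.

```python
def greedy_choice(candidate_actions, simulate_reward):
    """Return the candidate with the highest simulated reward, and that reward."""
    best_action, best_reward = None, float("-inf")
    for action in candidate_actions:
        # step 1: simulate the effect of each candidate action
        reward = simulate_reward(action)
        # step 2: remember the best one seen so far
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action, best_reward


# toy data: rewards that a simulation might return for three candidates
candidates = ["do_nothing", "switch_line_0", "switch_line_1"]
toy_rewards = {"do_nothing": 1.0, "switch_line_0": 0.3, "switch_line_1": 1.5}
chosen, reward = greedy_choice(candidates, toy_rewards.get)
```

A concrete greedy agent only has to decide which candidates go into the list; the selection logic stays the same.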

Methods:

abstractmethod _get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

Returns

res – A list of all candidate grid2op.BaseAction.BaseAction

Return type

list

act(observation, reward, done=False)[source]

By definition, all “greedy” agents are acting the same way. The only thing that can differentiate multiple agents is the actions that are tested.

These actions are defined in the method _get_tested_action(). This act() method implements the greedy logic: take the action that maximizes the instantaneous reward on the simulated action.

Parameters
  • observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns

res – The action chosen by the bot / controller / agent.

Return type

grid2op.Action.Action

class grid2op.Agent.MLAgent(action_space, action_space_converter=<class 'grid2op.Converter.ToVect.ToVect'>, **kwargs_converter)[source]

This agent handles only vectors. By default, the “my_act” function returns the “do nothing” action, so it needs to be overridden.

In this class, “my_act” is expected to return a vector that can be directly converted into a valid action.

Methods:

convert_from_vect(act)[source]

Helper to convert an action, represented as a numpy array, into a grid2op.BaseAction instance.

Parameters

act (numpy.ndarray) – An action represented as a numpy array, to be converted into a grid2op.BaseAction.BaseAction instance.

Returns

res – The act parameters converted into a proper grid2op.BaseAction.BaseAction object.

Return type

grid2op.Action.Action

my_act(transformed_observation, reward, done=False)[source]

By default this agent returns only the “do nothing” action, unless some smarter implementations are provided for this function.

Parameters
  • transformed_observation (numpy.ndarray, dtype=float) – The observation transformed into a 1d numpy array of float. All components of the observation are kept.

  • reward (float) – Reward of the previous action

  • done (bool) – Whether the episode is over or not.

Returns

res – The action taken represented as a vector.

Return type

numpy.ndarray, dtype=float

class grid2op.Agent.OneChangeThenNothing(action_space)[source]

This is a specific kind of BaseAgent. It performs one BaseAction (possibly non-empty) at the first time step and then does nothing.

This class is an abstract class and cannot be instantiated (i.e. no object of this class can be created). It must be subclassed, and the method OneChangeThenNothing._get_dict_act() must be defined. Basically, it must know what action to do.
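The behaviour described above can be sketched with a small stand-in class. This is an illustration only, not the library's implementation: `OneChangeThenNothingSketch` and its `has_changed` flag are hypothetical, actions are plain dictionaries, and it assumes that reset() re-arms the first action at the start of each episode.

```python
class OneChangeThenNothingSketch:
    """Hypothetical stand-in: one action at the first step, then do nothing."""

    def __init__(self, first_action):
        self.first_action = first_action
        self.has_changed = False

    def act(self, observation, reward, done=False):
        if not self.has_changed:
            # first time step of the episode: perform the configured action
            self.has_changed = True
            return self.first_action
        # every later step: the empty dictionary encodes "do nothing"
        return {}

    def reset(self, obs):
        # assumption: a new episode re-arms the first action
        self.has_changed = False


agent = OneChangeThenNothingSketch({"set_line_status": [(0, -1)]})
episode = [agent.act(observation=None, reward=0.0) for _ in range(3)]
agent.reset(obs=None)
first_of_next_episode = agent.act(observation=None, reward=0.0)
```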

my_dict

Representation, as a dictionary, of the only action that this Agent will do at the first time step.

Type

dict (class member)

Examples

We advise using this class as follows:

import grid2op
from grid2op.Agent import OneChangeThenNothing
from grid2op.Runner import Runner

# list of dictionaries. Each dictionary represents a valid action
acts_dict_ = [{}, {"set_line_status": [(0, -1)]}]

env = grid2op.make()  # create an environment
for act_as_dict in acts_dict_:
    # generate the agent class that will perform the action encoded by act_as_dict
    agent_class = OneChangeThenNothing.gen_next(act_as_dict)

    # start a runner with this agent
    runner = Runner(**env.get_params_for_runner(), agentClass=agent_class)
    # run 2 episodes with it
    res_2 = runner.run(nb_episode=2)

Methods:

_get_dict_act()[source]

Function that needs to be overridden to indicate which action to perform.

Returns

res – A dictionary that can be converted into a valid grid2op.BaseAction.BaseAction. See the help of grid2op.BaseAction.ActionSpace.__call__() for more information.

Return type

dict

act(observation, reward, done=False)[source]

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the action to perform.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module and avoid any other “sources” of pseudo random numbers.

You can adapt your code the following way. Instead of using np.random use self.space_prng.

For example, if you wanted to write np.random.randint(1,5) replace it by self.space_prng.randint(1,5). It is the same for np.random.normal() that is replaced by self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().

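If you really need other sources of randomness (for example if you use tensorflow or torch) we strongly recommend you to overload the BaseAgent.seed() accordingly.

Parameters
  • observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns

res – The action chosen by the bot / controller / agent.

Return type

grid2op.Action.PlayableAction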

classmethod gen_next(dict_)[source]

This function changes the dictionary of the action that the agent will perform.

See the class level documentation for an example on how to use this.

Parameters

dict_ (dict) – A dictionary representing an action. This dictionary is assumed to be convertible into an action. No check is performed at this stage.

reset(obs)[source]

This method is called at the beginning of a new episode. It is implemented by agents to reset their internal state if needed.

obs

The first observation corresponding to the initial state of the environment.

Type

grid2op.Observation.BaseObservation

class grid2op.Agent.PowerLineSwitch(action_space)[source]

This is a GreedyAgent example, which will attempt to disconnect powerlines.

It will choose among:

  • doing nothing

  • changing the status of one powerline

It chooses the action that maximizes the simulated reward. All powerlines are tested at each step. This means that if n is the number of powerlines on the grid, at each step this agent performs n + 1 calls to “simulate” (one for doing nothing and one for changing the status of each powerline).
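The size of the candidate set described above can be checked with a short sketch. It only builds action dictionaries and does not call grid2op: `candidate_line_actions` is a hypothetical helper, and encoding each candidate as a "change_line_status" dictionary is an assumption made for the illustration.

```python
def candidate_line_actions(n_line):
    """Build the 'do nothing' candidate plus one status change per powerline."""
    candidates = [{}]  # the empty dictionary encodes "do nothing"
    for line_id in range(n_line):
        # one candidate per powerline: change the status of that line only
        candidates.append({"change_line_status": [line_id]})
    return candidates


# toy grid with 5 powerlines: 5 + 1 candidates, hence 6 simulations per step
candidates = candidate_line_actions(5)
```

The cost of one decision therefore grows linearly with the number of powerlines on the grid.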

Methods:

_get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

Returns

res – A list of all candidate grid2op.BaseAction.BaseAction

Return type

list

class grid2op.Agent.RandomAgent(action_space, action_space_converter=<class 'grid2op.Converter.IdToAct.IdToAct'>, **kwargs_converter)[source]

This agent acts randomly on the powergrid. It uses the grid2op.Converters.IdToAct to compute all the possible actions available for the environment, and then chooses a random one among them.

Notes

Actions are taken uniformly at random among unary actions. For example, if the game rules allow actions that both disconnect a powerline AND modify the topology of a substation, an action that does both will not be sampled by this class.

This agent is not equivalent to calling env.action_space.sample(), because the sampling is not done in the same manner. This agent samples uniformly among all unary actions, whereas env.action_space.sample() does not (see grid2op.Action.SerializableActionSpace.sample() for more information about the latter).

Methods:

my_act(transformed_observation, reward, done=False)[source]

A random agent will “simply” draw a random number between 0 and the number of actions, and return the corresponding action.

This is equivalent to drawing a feasible action uniformly at random.
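A minimal sketch of this draw is given below. It does not use grid2op: the `unary_actions` list is a hypothetical stand-in for the actions enumerated by the converter, and `prng` is a numpy `RandomState` assumed to play the role of space_prng.

```python
import numpy as np

# hypothetical list of unary actions, encoded as plain dictionaries
unary_actions = [{}, {"change_line_status": [0]}, {"change_line_status": [1]}]

# assumption: a private, seeded generator plays the role of self.space_prng
prng = np.random.RandomState(0)

# draw an integer id in [0, number of actions) and look up the action
action_id = int(prng.randint(0, len(unary_actions)))
chosen = unary_actions[action_id]

# drawing many ids shows that every action is selected about equally often
many_ids = prng.randint(0, len(unary_actions), size=3000)
counts = np.bincount(many_ids, minlength=len(unary_actions))
```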

Notes

In order to be working as intended, it is crucial that this method does not rely on any other source of “pseudo randomness” than grid2op.Space.RandomObject.space_prng.

In particular, you must avoid using np.random.XXX or the random python module. You can replace any call to np.random.XXX by self.space_prng.XXX (e.g. np.random.randint(1,5) can be replaced by self.space_prng.randint(1,5)).

If you really need other sources of randomness (for example if you use tensorflow or torch) we strongly recommend you to overload the BaseAgent.seed() accordingly.

class grid2op.Agent.RecoPowerlineAgent(action_space)[source]

This is a GreedyAgent example, which will attempt to reconnect powerlines: for each disconnected powerline that can be reconnected, it will simulate the effect of reconnecting it, and reconnect the one that leads to the highest simulated reward.

Methods:

_get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

Returns

res – A list of all candidate grid2op.BaseAction.BaseAction

Return type

list

class grid2op.Agent.TopologyGreedy(action_space)[source]

This is a GreedyAgent example, which will attempt to reconfigure the substations connectivity.

It will choose among:

  • doing nothing

  • changing the topology of one substation.

To choose, it will simulate the outcome of all actions, and then choose the action leading to the best reward.

Methods:

_get_tested_action(observation)[source]

Returns the list of all the candidate actions.

From this list, the one that achieves the best “simulated reward” is used.

Parameters

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

Returns

res – A list of all candidate grid2op.BaseAction.BaseAction

Return type

list
