Agent
This page is organized as follows:
Objectives
In this RL framework, an Agent is an entity that acts on the Environment (modeled in grid2op as an object of class Environment). In grid2op, such an entity is modeled by the BaseAgent class.
It can alternatively be named “bot” or “controller” in other literature.
This module presents a few possible BaseAgent that can serve either as baselines, or as examples of how to implement such agents. NB Stronger baselines are defined in another repository.
To perform their actions, agents receive two main signals from the grid2op.Environment:
the grid2op.Reward.BaseReward that states how good the previous action has been
the grid2op.Observation.BaseObservation that is a (partial) view on the state of the Environment.
Both these signals can be used to determine what is the best action to perform on the grid. This is actually the main objective of a BaseAgent, and this is done in the BaseAgent.act() method.
To get started coding your agent we encourage you to read the description of the Action to know how to implement your action. Don’t hesitate to have a look at the Easier actions manipulation for an easier / higher level action manipulation.
Once you know how to manipulate a powergrid within the grid2op framework, you can easily implement an agent following this example:
import grid2op
from grid2op.Agent import BaseAgent

class MyCustomAgent(BaseAgent):
    def __init__(self, action_space, something_else, and_another_something):
        # define here the constructor of your agent
        # here we say our agent needs "something_else" and "and_another_something"
        # to be built just to demonstrate it does not cause any problem to extend the
        # construction of the base class BaseAgent that only takes "action_space" as a constructor
        BaseAgent.__init__(self, action_space)
        self.something_else = something_else
        self.and_another_something = and_another_something

    def act(self, obs, reward, done=False):
        # this is the only method you need to implement
        # it takes an observation obs (and a reward and a flag)
        # and should return a valid action
        dictionary_describing_the_action = {}  # this can be anything you want that grid2op understands
        my_action = self.action_space(dictionary_describing_the_action)
        return my_action
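Once defined, such an agent can be used in the usual interaction loop with the environment. Below is a minimal, hedged sketch of this loop; the two extra constructor arguments passed to MyCustomAgent are placeholders taken from the example above:

import grid2op

env = grid2op.make("l2rpn_case14_sandbox")
# the two extra constructor arguments are purely illustrative
my_agent = MyCustomAgent(env.action_space, "something_else", "and_another_something")

obs = env.reset()
reward = 0.0  # arbitrary initial reward for the first call to act()
done = False
while not done:
    action = my_agent.act(obs, reward, done)
    obs, reward, done, info = env.step(action)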
Detailed Documentation by class
Classes:
AgentWithConverter – Compared to a regular BaseAgent, these types of Agents are able to deal with a different representation of grid2op.Action.BaseAction and grid2op.Observation.BaseObservation.
AlertAgent – This is an AlertAgent example, which will attempt to reconnect powerlines and send alerts on the worst possible attacks.
BaseAgent – This class represents the base class of a BaseAgent.
DeltaRedispatchRandomAgent – INTERNAL, used for tests.
DoNothingAgent – This is the most basic BaseAgent: it is purely passive and does absolutely nothing.
FromActionsListAgent – This type of agent will perform some actions based on a provided list of actions.
GreedyAgent – This is a class of “Greedy BaseAgent”.
MLAgent – This agent allows to handle only vectors.
OneChangeThenNothing – This agent performs a single action at the first time step and then does nothing (deprecated as of grid2op 1.10.2).
PowerLineSwitch – This is a GreedyAgent example, which will attempt to disconnect powerlines.
RandomAgent – This agent acts randomly on the powergrid.
RecoPowerlineAgent – This is a GreedyAgent example, which will attempt to reconnect powerlines.
RecoPowerlinePerArea – This class acts like the RecoPowerlineAgent but is able to reconnect multiple lines at the same step (one per area).
TopologyGreedy – This is a GreedyAgent example, which will attempt to reconfigure the substations connectivity.
- class grid2op.Agent.AgentWithConverter(action_space, action_space_converter=None, **kwargs_converter)[source]
Compared to a regular BaseAgent, these types of Agents are able to deal with a different representation of grid2op.Action.BaseAction and grid2op.Observation.BaseObservation.
As any other Agents, AgentWithConverter will implement the BaseAgent.act() method. But for them, it’s slightly different.
They receive in this method an observation, as an object (ie an instance of grid2op.Observation.BaseObservation). This object can then be converted to any other object with the method AgentWithConverter.convert_obs().
Then, this transformed_observation is passed to the method AgentWithConverter.my_act() that is supposed to be defined for each agent. This function outputs an encoded_act which can be whatever you want it to be.
Finally, the encoded_act is decoded into a proper action, object of class grid2op.Action.BaseAction, thanks to the method AgentWithConverter.convert_act().
This allows, for instance, to represent actions as integers in order to more easily train standard discrete control algorithms such as those used to solve Atari games.
- NB It is possible to define AgentWithConverter.convert_obs() and AgentWithConverter.convert_act(), or to define a grid2op.Converters.Converter and feed it to the action_space_converter parameter used to initialise the class. The second option is preferred, as the AgentWithConverter.action_space will then directly be this converter. Such a BaseAgent will really behave as if the actions are encoded the way it wants.
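For instance, here is a hedged sketch of the second (preferred) option, feeding the grid2op.Converter.IdToAct converter class to the action_space_converter parameter. With this converter an encoded action is simply an integer; the assumption that index 0 corresponds to the “do nothing” action is ours, not stated above:

import grid2op
from grid2op.Agent import AgentWithConverter
from grid2op.Converter import IdToAct

class MyDiscreteAgent(AgentWithConverter):
    def my_act(self, transformed_observation, reward, done=False):
        # an "encoded action" is an integer id; 0 is assumed to encode "do nothing"
        return 0

env = grid2op.make("l2rpn_case14_sandbox")
agent = MyDiscreteAgent(env.action_space, action_space_converter=IdToAct)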
Examples
For example, imagine a BaseAgent uses a neural network to take its decisions.
Suppose also that, after some feature engineering, it’s best for the neural network to use only the load active values (grid2op.Observation.BaseObservation.load_p) and the sum of the relative flows (grid2op.Observation.BaseObservation.rho) with the active flows (grid2op.Observation.BaseObservation.p_or) [NB that agent would not make sense a priori, but who knows].
Suppose that this neural network can be accessed with a class AwesomeNN (not available…) that can predict some actions. It can be loaded with the “load” method and make predictions with the “predict” method.
For the sake of the example, we will suppose that this agent only predicts powerline statuses (so 0 or 1) that are represented as a vector. So we need to take extra care to convert this vector from a numpy array to a valid action.
This is done below:
import numpy as np
import grid2op
from grid2op.Agent import AgentWithConverter
import AwesomeNN  # this does not exist!

# create a simple environment
env = grid2op.make("l2rpn_case14_sandbox")

# define the class above
class AgentCustomObservation(AgentWithConverter):
    def __init__(self, action_space, path):
        AgentWithConverter.__init__(self, action_space)
        self.my_neural_network = AwesomeNN()
        self.my_neural_network.load(path)

    def convert_obs(self, observation):
        # convert the observation
        return np.concatenate((observation.load_p, observation.rho + observation.p_or))

    def convert_act(self, encoded_act):
        # convert back the action, output from the NN "self.my_neural_network"
        # to a valid action
        act = self.action_space({"set_line_status": encoded_act})
        return act

    def my_act(self, transformed_observation, reward, done=False):
        act_predicted = self.my_neural_network.predict(transformed_observation)
        return act_predicted

# make the agent that behaves as expected.
my_agent = AgentCustomObservation(action_space=env.action_space, path=".")

# this agent is perfectly working :-) You can use it as any other agent.
- action_space_converter
The converter that is used to represent the BaseAgent action space. Might be set to
None
if not initialized- Type:
grid2op.Converters.Converter
- init_action_space
The initial action space. This corresponds to the action space of the
grid2op.Environment.Environment
.
- action_space
If a converter is used, then this action space is this converter. The agent will behave as if the action space is directly encoded the way it wants.
- Type:
grid2op.Converters.ActionSpace
Methods:
act (observation, reward[, done])
Standard method of a BaseAgent.
convert_act (encoded_act)
This function will convert an “encoded action”, that can be of any type, to a valid action that can be ingested by the environment.
convert_obs (observation)
This function converts the observation, that is an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.
my_act (transformed_observation, reward[, done])
This method should be overridden if this class is used.
seed (seed)
Seed the agent AND the associated converter if it needs to be seeded.
- act(observation, reward, done=False)[source]
Standard method of a BaseAgent. There is no need to overload this function.
- Parameters:
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.Action
- convert_act(encoded_act)[source]
This function will convert an “encoded action”, that can be of any type, to a valid action that can be ingested by the environment.
- Parameters:
encoded_act (object) – Anything that represents an action.
- Returns:
act – A valid action, represented as a class, that corresponds to the encoded action given as input.
- Return type:
grid2op.BaseAction.BaseAction
- convert_obs(observation)[source]
This function converts the observation, that is an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.
For example, an agent could only want to look at the relative flows grid2op.Observation.BaseObservation.rho to take its decision. This is possible by overloading this method.
This method can also be used to scale the observation such that each component has mean 0 and variance 1, for example.
- Parameters:
observation (
grid2op.Observation.Observation
) – Initial observation received by the agent in theBaseAgent.act()
method.- Returns:
res – Anything that will be used by the BaseAgent to take decisions.
- Return type:
object
- abstractmethod my_act(transformed_observation, reward, done=False)[source]
This method should be overridden if this class is used. It is an “abstract” method.
It is the method to implement if someone wants to make an agent that handles different kinds of actions and observations.
- Parameters:
transformed_observation (
object
) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the
AgentWithConverter.convert_act()
method.- Return type:
object
- seed(seed)[source]
Seed the agent AND the associated converter if it needs to be seeded.
See a more detailed explanation in
BaseAgent.seed()
for more information about seeding.
- class grid2op.Agent.AlertAgent(action_space, grid_controler=<class 'grid2op.Agent.recoPowerlineAgent.RecoPowerlineAgent'>, percentage_alert=30, simu_step=1, threshold=0.99)[source]
This is an AlertAgent example, which will attempt to reconnect powerlines and send alerts on the worst possible attacks: for each disconnected powerline that can be reconnected, it will simulate the effect of reconnecting it, and reconnect the one that leads to the highest simulated reward. It will also simulate the effect of having a line disconnection on attackable lines and raise alerts for the worst ones.
Methods:
act
(observation, reward[, done])
This is the main method of a BaseAgent.
- act(observation: BaseObservation, reward: float, done: bool = False) BaseAction [source]
This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sends to the agent after the previous action has been implemented), it returns the next action to perform.
Notes
In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module, and avoid any other “sources” of pseudo random numbers.
You can adapt your code the following way. Instead of using np.random, use self.space_prng. For example, if you wanted to write np.random.randint(1,5), replace it by self.space_prng.randint(1,5). It is the same for np.random.normal(), which is replaced by self.space_prng.normal().
You have an example of such usage in RandomAgent.my_act().
If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend you to overload BaseAgent.seed() accordingly. In that case, also seed these extra sources of randomness inside your overloaded seed() method.
observation (
grid2op.Observation.BaseObservation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.PlayableAction
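A minimal AlertAgent usage sketch (hedged: the environment name is only an example, any environment supporting the alert feature should work; the other constructor arguments keep their default values):

import grid2op
from grid2op.Agent import AlertAgent

env = grid2op.make("l2rpn_idf_2023")  # assumed here to support the alert feature
agent = AlertAgent(env.action_space)

obs = env.reset()
action = agent.act(obs, reward=0.0, done=False)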
- class grid2op.Agent.BaseAgent(action_space: ActionSpace)[source]
This class represents the base class of a BaseAgent. All bots / controllers / agents used in the Grid2Op simulator should derive from this class.
To work properly, it is advised to create a BaseAgent after the grid2op.Environment has been created and to reuse the grid2op.Environment.Environment.action_space to build the BaseAgent.
- action_space
It represents the action space, ie a tool that can serve to create valid actions. Note that a valid action can be illegal or ambiguous, and so lead to a “game over” or to an error. But at least it will have a proper size.
Methods:
act
(observation, reward[, done])
This is the main method of a BaseAgent.
load_state
(loadstate_path)An optional method to re-load the internal agent state that was saved with self.save_state.
reset
(obs)This method is called at the beginning of a new episode.
save_state
(savestate_path)An optional method to save the internal state of your agent.
seed
(seed)This function is used to guarantee that the "pseudo random numbers" generated and used by the agent instance will be deterministic.
- abstractmethod act(observation: BaseObservation, reward: float, done: bool = False) BaseAction [source]
This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sends to the agent after the previous action has been implemented), it returns the next action to perform.
Notes
In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module, and avoid any other “sources” of pseudo random numbers.
You can adapt your code the following way. Instead of using np.random, use self.space_prng. For example, if you wanted to write np.random.randint(1,5), replace it by self.space_prng.randint(1,5). It is the same for np.random.normal(), which is replaced by self.space_prng.normal().
You have an example of such usage in RandomAgent.my_act().
If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend you to overload BaseAgent.seed() accordingly. In that case, also seed these extra sources of randomness inside your overloaded seed() method.
observation (
grid2op.Observation.BaseObservation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.PlayableAction
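As an illustration of the note above, here is a hedged sketch of an agent that draws its pseudo random numbers through self.space_prng (so that BaseAgent.seed() makes it reproducible); the agent itself is purely hypothetical:

import grid2op
from grid2op.Agent import BaseAgent

class SometimesDisconnectAgent(BaseAgent):
    # hypothetical agent: with probability ~1/5 it disconnects powerline 0, otherwise it does nothing
    def act(self, observation, reward, done=False):
        draw = self.space_prng.randint(5)  # reproducible once self.seed(...) has been called
        if draw == 0:
            return self.action_space({"set_line_status": [(0, -1)]})
        return self.action_space({})

env = grid2op.make("l2rpn_case14_sandbox")
agent = SometimesDisconnectAgent(env.action_space)
agent.seed(0)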
- load_state(loadstate_path: PathLike)[source]
An optional method to re-load the internal agent state that was saved with self.save_state. This can be useful to re-set your agent to an earlier simulation time step and reproduce past experiments with Grid2Op. Concept developed by Fraunhofer IEE KES.
Notes
First, the internal state of your agent consists of attributes that are contained in the grid2op.Agent.BaseAgent and grid2op.Agent.BaseAgent.action_space. Such attributes can easily be re-set with setattr().
Second, your agent may contain custom attributes, such as e.g. a vector of line indices from a Grid2Op observation. You can re-set them with setattr() as well.
Third, your agent may contain very specific modules such as Tensorflow that do not support the simple setattr(). However, these modules normally have their own methods to re-load an internal state. Examples of such methods are load_weights() that you can integrate in your implementation of self.load_state.
- Parameters:
loadstate_path (string) – The path from which your agent state variables should be loaded
- reset(obs: BaseObservation)[source]
This method is called at the beginning of a new episode. It is implemented by agents to reset their internal state if needed.
- obs
The first observation corresponding to the initial state of the environment.
- save_state(savestate_path: PathLike)[source]
An optional method to save the internal state of your agent. The saved state can later be re-loaded with self.load_state, e.g. to repeat a Grid2Op time step with exactly the same internal parameterization. This can be useful to repeat Grid2Op experiments and analyze why your agent performed certain actions in past time steps. Concept developed by Fraunhofer IEE KES.
Notes
First, the internal state of your agent consists of attributes that are contained in the grid2op.Agent.BaseAgent and grid2op.Agent.BaseAgent.action_space. Examples are the parameterization and seeds of the random number generator that your agent uses. Such attributes can easily be obtained with getattr() and stored in a common file format, such as .npy.
Second, your agent may contain custom attributes, such as e.g. a vector of line indices from a Grid2Op observation. You could obtain and save them in the same way as explained before.
Third, your agent may contain very specific modules such as Tensorflow that do not support the simple getattr(). However, these modules normally have their own methods to save an internal state. Examples of such methods are save_weights() that you can integrate in your implementation of self.save_state.
- Parameters:
savestate_path (string) – The path to which your agent state variables should be saved
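A hedged sketch of how save_state / load_state could be implemented for a simple custom attribute (the attribute self.line_memory and the file name are purely illustrative):

import os
import numpy as np
from grid2op.Agent import BaseAgent

class StatefulAgent(BaseAgent):
    def __init__(self, action_space):
        BaseAgent.__init__(self, action_space)
        self.line_memory = np.zeros(action_space.n_line)  # hypothetical internal state

    def act(self, observation, reward, done=False):
        self.line_memory = 1.0 * observation.rho  # remember the last relative flows
        return self.action_space({})

    def save_state(self, savestate_path):
        np.save(os.path.join(savestate_path, "line_memory.npy"), self.line_memory)

    def load_state(self, loadstate_path):
        self.line_memory = np.load(os.path.join(loadstate_path, "line_memory.npy"))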
- seed(seed: int) None [source]
This function is used to guarantee that the “pseudo random numbers” generated and used by the agent instance will be deterministic.
This guarantees, if the recommendations in BaseAgent.act() are followed, that the agent will produce the same set of actions if it faces the same observations in the same order. This is particularly important for random agents.
You can override this function with the method of your choosing, but if you do so, don’t forget to call super().seed(seed).
- Parameters:
seed (
int
) – The seed used- Returns:
seed – a tuple of seeds used
- Return type:
tuple
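If your agent relies on extra sources of randomness, here is a hedged sketch of how seed() could be overloaded (with an additional numpy generator; the same idea applies to tensorflow or torch):

import numpy as np
from grid2op.Agent import BaseAgent

class MySeededAgent(BaseAgent):
    def act(self, observation, reward, done=False):
        return self.action_space({})

    def seed(self, seed):
        res = super().seed(seed)  # seed the BaseAgent (and its space_prng) first
        self.my_rng = np.random.default_rng(seed)  # hypothetical extra source of randomness
        return res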
- class grid2op.Agent.DeltaRedispatchRandomAgent(action_space, n_gens_to_redispatch=2, redispatching_delta=1.0)[source]
INTERNAL
Warning
/!\ Internal, do not use unless you know what you are doing /!\
Used for tests. Prefer using a random agent and selecting only the redispatching actions that you want.
This agent will perform some redispatch of a given amount among randomly selected dispatchable generators.
- Parameters:
action_space (
grid2op.Action.ActionSpace
) – the Grid2Op action spacen_gens_to_redispatch (int) – The maximum number of dispatchable generators to play with
redispatching_delta (float) – The redispatching MW value used in both directions
Methods:
act
(observation, reward[, done])
This is the main method of a BaseAgent.
- act(observation, reward, done=False)[source]
This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sends to the agent after the previous action has been implemented), it returns the next action to perform.
Notes
In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module, and avoid any other “sources” of pseudo random numbers.
You can adapt your code the following way. Instead of using np.random, use self.space_prng. For example, if you wanted to write np.random.randint(1,5), replace it by self.space_prng.randint(1,5). It is the same for np.random.normal(), which is replaced by self.space_prng.normal().
You have an example of such usage in RandomAgent.my_act().
If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend you to overload BaseAgent.seed() accordingly. In that case, also seed these extra sources of randomness inside your overloaded seed() method.
observation (
grid2op.Observation.BaseObservation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.PlayableAction
- class grid2op.Agent.DoNothingAgent(action_space)[source]
This is the most basic BaseAgent. It is purely passive, and does absolutely nothing.
As opposed to most reinforcement learning environments, in grid2op, doing nothing is often the best solution.
Methods:
act
(observation, reward[, done])As better explained in the document of
grid2op.BaseAction.update()
orgrid2op.BaseAction.ActionSpace.__call__()
.- act(observation, reward, done=False)[source]
As better explained in the document of
grid2op.BaseAction.update()
orgrid2op.BaseAction.ActionSpace.__call__()
.The preferred way to make an object of type action is to call
grid2op.BaseAction.ActionSpace.__call__()
with the dictionary representing the action. In this case, the action is “do nothing” and it is represented by the empty dictionary.- Parameters:
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.Action
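A short usage sketch, counting how many steps the do nothing baseline survives on one episode:

import grid2op
from grid2op.Agent import DoNothingAgent

env = grid2op.make("l2rpn_case14_sandbox")
agent = DoNothingAgent(env.action_space)

obs = env.reset()
reward, done, n_steps = 0.0, False, 0
while not done:
    action = agent.act(obs, reward, done)
    obs, reward, done, info = env.step(action)
    n_steps += 1
print(f"The do nothing agent survived {n_steps} steps")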
- class grid2op.Agent.FromActionsListAgent(action_space, action_list=None)[source]
This type of agent will perform some actions based on a provided list of actions. If no action is provided for a given step (for example because it survives for more steps than the length of the provided action list), it will do nothing.
Notes
No checks are performed to make sure the action types are compatible with the environment. For example, the environment might prevent redispatching, but, at the creation of the agent, we do not ensure that no actions performing redispatching are present in the list. An example of how to build such a list is sketched below.
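A hedged sketch of how such a list could be built and passed to the agent (the actions and their legality are only illustrative; the agent is assumed to play them in order, one per step, then do nothing):

import grid2op
from grid2op.Agent import FromActionsListAgent

env = grid2op.make("l2rpn_case14_sandbox")

# two explicit actions for the two first steps; after that the agent does nothing
action_list = [
    env.action_space({"set_line_status": [(0, -1)]}),  # disconnect powerline 0 at step 1
    env.action_space({}),                               # do nothing at step 2
]
agent = FromActionsListAgent(env.action_space, action_list=action_list)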
Methods:
act
(observation, reward[, done])
This is the main method of a BaseAgent.
- act(observation, reward, done=False)[source]
This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sends to the agent after the previous action has been implemented), it returns the next action to perform.
Notes
In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module, and avoid any other “sources” of pseudo random numbers.
You can adapt your code the following way. Instead of using np.random, use self.space_prng. For example, if you wanted to write np.random.randint(1,5), replace it by self.space_prng.randint(1,5). It is the same for np.random.normal(), which is replaced by self.space_prng.normal().
You have an example of such usage in RandomAgent.my_act().
If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend you to overload BaseAgent.seed() accordingly. In that case, also seed these extra sources of randomness inside your overloaded seed() method.
observation (
grid2op.Observation.BaseObservation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.PlayableAction
- class grid2op.Agent.GreedyAgent(action_space)[source]
This is a class of “Greedy BaseAgent”. Greedy agents all execute the same kind of algorithm to take an action:
they simulate (with grid2op.Observation.Observation.simulate()) all actions in a given set
they take the action that maximises the simulated reward among all these actions
This class is an abstract class (objects of this class cannot be created). To create a “GreedyAgent” one must override this class. Examples are provided with PowerLineSwitch and TopologyGreedy.
Methods:
_get_tested_action
(observation)Returns the list of all the candidate actions.
act
(observation, reward[, done])By definition, all "greedy" agents are acting the same way.
- abstractmethod _get_tested_action(observation)[source]
Returns the list of all the candidate actions.
From this list, the one that achieves the best “simulated reward” is used.
- Parameters:
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
- Returns:
res – A list of all candidate
grid2op.BaseAction.BaseAction
- Return type:
list
- act(observation, reward, done=False)[source]
By definition, all “greedy” agents are acting the same way. The only thing that can differentiate multiple agents is the actions that are tested.
These actions are defined in the method
_get_tested_action()
. This act() method implements the greedy logic: take the action that maximizes the instantaneous reward on the simulated action.
- Parameters:
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.Action
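As an illustration, here is a hedged sketch of a custom greedy agent that only tests two candidate actions, “do nothing” and the disconnection of powerline 0 (a deliberately tiny, illustrative set):

import grid2op
from grid2op.Agent import GreedyAgent

class GreedyLine0(GreedyAgent):
    def _get_tested_action(self, observation):
        # the candidate actions; GreedyAgent.act() simulates each of them
        # and plays the one with the highest simulated reward
        return [
            self.action_space({}),                              # do nothing
            self.action_space({"set_line_status": [(0, -1)]}),  # disconnect powerline 0
        ]

env = grid2op.make("l2rpn_case14_sandbox")
agent = GreedyLine0(env.action_space)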
- class grid2op.Agent.MLAgent(action_space, action_space_converter=<class 'grid2op.Converter.ToVect.ToVect'>, **kwargs_converter)[source]
This agent allows to handle only vectors. The “my_act” function will return the “do nothing” action (so it needs to be overridden).
In this class, the “my_act” is expected to return a vector that can be directly converted into a valid action.
Methods:
convert_from_vect
(act)
Helper to convert an action, represented as a numpy array, into a
grid2op.BaseAction
instance.my_act
(transformed_observation, reward[, done])By default this agent returns only the "do nothing" action, unless some smarter implementations are provided for this function.
- convert_from_vect(act)[source]
Helper to convert an action, represented as a numpy array, into a grid2op.BaseAction instance.
- Parameters:
act (numpy.ndarray) – An action represented as a numpy array, to be converted into a grid2op.BaseAction.BaseAction instance.
- Returns:
res – The act parameters converted into a proper
grid2op.BaseAction.BaseAction
object.- Return type:
grid2op.Action.Action
- my_act(transformed_observation, reward, done=False)[source]
By default this agent returns only the “do nothing” action, unless some smarter implementations are provided for this function.
- Parameters:
transformed_observation (
numpy.ndarray
, dtype=float) – The observation transformed into a 1d numpy array of float. All components of the observation are kept.reward (
float
) – Reward of the previous actiondone (
bool
) – Whether the episode is over or not.
- Returns:
res – The action taken represented as a vector.
- Return type:
numpy.ndarray
, dtype=float
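A hedged sketch of the convert_from_vect() helper in action; the use of to_vect() to serialize an action into a numpy array is an assumption of this example:

import grid2op
from grid2op.Agent import MLAgent

env = grid2op.make("l2rpn_case14_sandbox")
agent = MLAgent(env.action_space)

# encode the "do nothing" action as a numpy vector, then decode it back
do_nothing_vect = env.action_space({}).to_vect()  # assumption: actions can be serialized this way
act = agent.convert_from_vect(do_nothing_vect)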
- class grid2op.Agent.OneChangeThenNothing(action_space)[source]
Warning
As of grid2op 1.10.2, this class has been deprecated. Please use env.reset(options={“init state”: THE_INITIAL_CHANGE}) instead.
This is a specific kind of BaseAgent. It does a BaseAction (possibly non empty) at the first time step and then does nothing.
This class is an abstract class and cannot be instantiated (ie no object of this class can be created). It must be overridden and the method
OneChangeThenNothing._get_dict_act()
be defined. Basically, it must know what action to do.- my_dict
Representation, as a dictionary, of the only action that this Agent will do at the first time step.
- Type:
dict
(class member)
Examples
This class is deprecated in favor of the “init state” reset options. Please avoid using it.
But if you really want to use it… then you can do it with:
# This class has been deprecated, please use the env.reset()
# with proper options instead
# DEPRECATED !
import grid2op
from grid2op.Agent import OneChangeThenNothing
from grid2op.Runner import Runner

acts_dict_ = [{}, {"set_line_status": [(0, -1)]}]  # list of dictionaries. Each dictionary
                                                   # represents a valid action

env = grid2op.make("l2rpn_case14_sandbox")  # create an environment
for act_as_dict in acts_dict_:
    # generate the proper class that will perform the first action (encoded by act_as_dict)
    agent_class = OneChangeThenNothing.gen_next(act_as_dict)

    # start a runner with this agent
    runner = Runner(**env.get_params_for_runner(), agentClass=agent_class)
    # run 2 episodes with it
    res_2 = runner.run(nb_episode=2)
Notes
After grid2op 1.10.2, this class has been deprecated. A cleaner alternative is to set the initial state of the grid when calling env.reset, like this:
import grid2op

env = grid2op.make("l2rpn_case14_sandbox")  # create an environment
dict_act_json = ...  # dict representing an action
obs = env.reset(options={"init state": dict_act_json})
This way of doing offers:
more flexibility: rules are not checked
more flexibility: any type of actions acting on anything can be performed (even if the action would be illegal for the agent)
less trouble: cooldown are not affected
Methods:
Function that needs to be overridden to indicate which action to perform.
act
(observation, reward[, done])
This is the main method of a BaseAgent.
gen_next
(dict_)
This function allows to change the dictionary of the action that the agent will perform.
reset
(obs)This method is called at the beginning of a new episode.
- _get_dict_act()[source]
Function that needs to be overridden to indicate which action to perform.
- Returns:
res – A dictionary that can be converted into a valid
grid2op.BaseAction.BaseAction
. See the help ofgrid2op.BaseAction.ActionSpace.__call__()
for more information.- Return type:
dict
- act(observation, reward, done=False)[source]
This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sends to the agent after the previous action has been implemented), it returns the next action to perform.
Notes
In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module, and avoid any other “sources” of pseudo random numbers.
You can adapt your code the following way. Instead of using np.random, use self.space_prng. For example, if you wanted to write np.random.randint(1,5), replace it by self.space_prng.randint(1,5). It is the same for np.random.normal(), which is replaced by self.space_prng.normal().
You have an example of such usage in RandomAgent.my_act().
If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend you to overload BaseAgent.seed() accordingly. In that case, also seed these extra sources of randomness inside your overloaded seed() method.
observation (
grid2op.Observation.BaseObservation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.PlayableAction
- classmethod gen_next(dict_)[source]
This function allows to change the dictionary of the action that the agent will perform.
See the class level documentation for an example on how to use this.
- Parameters:
dict (
dict
) – A dictionary representing an action. This dictionary is assumed to be convertible into an action. No check is performed at this stage.
- class grid2op.Agent.PowerLineSwitch(action_space)[source]
This is a
GreedyAgent
example, which will attempt to disconnect powerlines. It will choose among:
doing nothing
changing the status of one powerline
the action that will maximize the simulated reward. All powerlines are tested at each step. This means that if n is the number of powerlines on the grid, at each step this agent will perform n + 1 calls to “simulate” (one to do nothing and one that changes the status of each powerline).
Methods:
_get_tested_action
(observation)Returns the list of all the candidate actions.
- _get_tested_action(observation)[source]
Returns the list of all the candidate actions.
From this list, the one that achieves the best “simulated reward” is used.
- Parameters:
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
- Returns:
res – A list of all candidate
grid2op.BaseAction.BaseAction
- Return type:
list
- class grid2op.Agent.RandomAgent(action_space, action_space_converter=<class 'grid2op.Converter.IdToAct.IdToAct'>, **kwargs_converter)[source]
This agent acts randomly on the powergrid. It uses the
grid2op.Converters.IdToAct
to compute all the possible actions available for the environment, and then chooses a random one among all these.
Notes
Actions are taken uniformly at random among unary actions. For example, if the game rules allow actions that can disconnect a powerline AND modify the topology of a substation, an action that does both will not be sampled by this class.
This agent is not equivalent to calling env.action_space.sample(), because the sampling is not done in the same manner. This agent samples uniformly among all unary actions, whereas env.action_space.sample() does not (see grid2op.Action.SerializableActionSpace.sample() for more information about the latter).
Methods:
my_act
(transformed_observation, reward[, done])A random agent will "simply" draw a random number between 0 and the number of action, and return this action.
- my_act(transformed_observation, reward, done=False)[source]
A random agent will “simply” draw a random number between 0 and the number of actions, and return this action.
This is equivalent to drawing uniformly at random a feasible action.
Notes
In order to be working as intended, it is crucial that this method does not rely on any other source of “pseudo randomness” than
grid2op.Space.RandomObject.space_prng
.In particular, you must avoid to use np.random.XXXX or the random python module. You can replace any call to np.random.XXX by self.space_prng.XXX (eg np.random.randint(1,5) can be replaced by self.space_prng.randint(1,5)).
If you really need other sources of randomness (for example if you use tensorflow or torch) we strongly recommend you to overload the
BaseAgent.seed()
accordingly.
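A hedged usage sketch; seeding the agent is what makes the sequence of random actions reproducible:

import grid2op
from grid2op.Agent import RandomAgent

env = grid2op.make("l2rpn_case14_sandbox")
agent = RandomAgent(env.action_space)
agent.seed(42)  # seeds the agent AND its IdToAct converter

obs = env.reset()
action = agent.act(obs, reward=0.0, done=False)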
- class grid2op.Agent.RecoPowerlineAgent(action_space)[source]
This is a
GreedyAgent
example, which will attempt to reconnect powerlines: for each disconnected powerline that can be reconnected, it will simulate the effect of reconnecting it, and reconnect the one that leads to the highest simulated reward.
Methods:
_get_tested_action
(observation)Returns the list of all the candidate actions.
- _get_tested_action(observation)[source]
Returns the list of all the candidate actions.
From this list, the one that achieves the best “simulated reward” is used.
- Parameters:
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
- Returns:
res – A list of all candidate
grid2op.BaseAction.BaseAction
- Return type:
list
- class grid2op.Agent.RecoPowerlinePerArea(action_space: ActionSpace, areas_by_sub_id: dict)[source]
This class acts like the
RecoPowerlineAgent
but it is able to reconnect multiple lines at the same step (one line per area).
The “areas” are defined by a list of lists of substation ids provided as input.
Of course the areas you provide to the agent should be the same as the areas used in the rules of the game. Otherwise, the agent might try to reconnect two powerlines “in the same area for the environment”, which of course will lead to an illegal action.
You can use it like:
import grid2op
from grid2op.Agent import RecoPowerlinePerArea

env_name = "l2rpn_idf_2023"  # (or any other env name supporting the feature)
env = grid2op.make(env_name)

agent = RecoPowerlinePerArea(env.action_space,
                             env._game_rules.legal_action.substations_id_by_area)
Methods:
act
(observation, reward[, done])
This is the main method of a BaseAgent.
- act(observation: BaseObservation, reward: float, done: bool = False)[source]
This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sends to the agent after the previous action has been implemented), it returns the next action to perform.
Notes
In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module, and avoid any other “sources” of pseudo random numbers.
You can adapt your code the following way. Instead of using np.random, use self.space_prng. For example, if you wanted to write np.random.randint(1,5), replace it by self.space_prng.randint(1,5). It is the same for np.random.normal(), which is replaced by self.space_prng.normal().
You have an example of such usage in RandomAgent.my_act().
If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend you to overload BaseAgent.seed() accordingly. In that case, also seed these extra sources of randomness inside your overloaded seed() method.
observation (
grid2op.Observation.BaseObservation
) – The current observation of thegrid2op.Environment.Environment
reward (
float
) – The current reward. This is the reward obtained by the previous actiondone (
bool
) – Whether the episode has ended or not. Used to maintain gym compatibility
- Returns:
res – The action chosen by the bot / controller / agent.
- Return type:
grid2op.Action.PlayableAction
- class grid2op.Agent.TopologyGreedy(action_space)[source]
This is a
GreedyAgent
example, which will attempt to reconfigure the substations connectivity. It will choose among:
doing nothing
changing the topology of one substation.
To choose, it will simulate the outcome of all actions, and then choose the action leading to the best reward.
Methods:
_get_tested_action
(observation)Returns the list of all the candidate actions.
- _get_tested_action(observation)[source]
Returns the list of all the candidate actions.
From this list, the one that achieves the best “simulated reward” is used.
- Parameters:
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
- Returns:
res – A list of all candidate
grid2op.BaseAction.BaseAction
- Return type:
list
Still having trouble finding the information? Do not hesitate to open a github issue about the documentation at this link: Documentation issue template