Compatibility with OpenAI Gym

The gym framework is widely used in reinforcement learning. Starting from version 1.2.0, we improved grid2op's compatibility with this framework.

Before grid2op 1.2.0, only some classes fully implemented the OpenAI Gym interface:

  • the grid2op.Environment (with methods such as env.reset, env.step etc.)

  • the grid2op.Agent (with methods such as agent.act etc.)

  • the creation of pre-defined environments (with grid2op.make)

Starting from 1.2.0, we implemented converters that automatically map the grid2op representation of the action space and the observation space into OpenAI Gym “spaces”. More precisely, these are represented as gym.spaces.Dict.

As of grid2op 1.4.0, we tightened the gap between OpenAI Gym and grid2op by introducing the dedicated module grid2op.gym_compat. Within this module there are lots of functionalities to convert a grid2op environment into a gym environment (one that inherits from gym.Env instead of “simply” implementing the OpenAI Gym interface).

A simple usage is:

import grid2op
from grid2op.gym_compat import GymEnv

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

# check that this is a properly defined gym environment:
import gym
print(f"Is gym_env an OpenAI Gym environment: {isinstance(gym_env, gym.Env)}")
# it shows "Is gym_env an OpenAI Gym environment: True"

Note

To stay as close to grid2op as possible, by default (using the method described above) the action space is encoded as a gym Dict whose keys are the attributes of a grid2op action. This might not be the best representation to perform RL with (some frameworks do not handle it well…)

For more customization on that side, please refer to the section Customizing the action and observation space, into Box or Discrete below


Observation space and action space customization

By default, the action space and observation space are gym.spaces.Dict with the keys being the attribute to modify.

Default Observations space

For example, an observation space will look like:

  • “_shunt_p”: Box(env.n_shunt,) [type: float, low: -inf, high: inf]

  • “_shunt_q”: Box(env.n_shunt,) [type: float, low: -inf, high: inf]

  • “_shunt_v”: Box(env.n_shunt,) [type: float, low: -inf, high: inf]

  • “_shunt_bus”: Box(env.n_shunt,) [type: int, low: -inf, high: inf]

  • “a_ex”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “a_or”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “actual_dispatch”: Box(env.n_gen,)

  • “attention_budget”: Box(1,) [type: float, low: 0, high: inf]

  • “current_step”: Box(1,) [type: int, low: -inf, high: inf]

  • “curtailment”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “curtailment_limit”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “curtailment_limit_effective”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “day”: Discrete(32)

  • “day_of_week”: Discrete(8)

  • “delta_time”: Box(1,) [type: float, low: 0, high: inf]

  • “duration_next_maintenance”: Box(env.n_line,) [type: int, low: -1, high: inf]

  • “gen_p”: Box(env.n_gen,) [type: float, low: env.gen_pmin, high: env.gen_pmax * 1.2]

  • “gen_p_before_curtail”: Box(env.n_gen,) [type: float, low: env.gen_pmin, high: env.gen_pmax * 1.2]

  • “gen_q”: Box(env.n_gen,) [type: float, low: -inf, high: inf]

  • “gen_v”: Box(env.n_gen,) [type: float, low: 0, high: inf]

  • “gen_margin_up”: Box(env.n_gen,) [type: float, low: 0, high: env.gen_max_ramp_up]

  • “gen_margin_down”: Box(env.n_gen,) [type: float, low: 0, high: env.gen_max_ramp_down]

  • “hour_of_day”: Discrete(24)

  • “is_alarm_illegal”: Discrete(2)

  • “line_status”: MultiBinary(env.n_line)

  • “load_p”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “load_q”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “load_v”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “max_step”: Box(1,) [type: int, low: -inf, high: inf]

  • “minute_of_hour”: Discrete(60)

  • “month”: Discrete(13)

  • “p_ex”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “p_or”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “q_ex”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “q_or”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “rho”: Box(env.n_line,) [type: float, low: 0., high: inf]

  • “storage_charge”: Box(env.n_storage,) [type: float, low: 0., high: env.storage_Emax]

  • “storage_power”: Box(env.n_storage,) [type: float, low: -env.storage_max_p_prod, high: env.storage_max_p_absorb]

  • “storage_power_target”: Box(env.n_storage,) [type: float, low: -env.storage_max_p_prod, high: env.storage_max_p_absorb]

  • “target_dispatch”: Box(env.n_gen,)

  • “theta_or”: Box(env.n_line,) [type: float, low: -180., high: 180.]

  • “theta_ex”: Box(env.n_line,) [type: float, low: -180., high: 180.]

  • “load_theta”: Box(env.n_load,) [type: float, low: -180., high: 180.]

  • “gen_theta”: Box(env.n_gen,) [type: float, low: -180., high: 180.]

  • “storage_theta”: Box(env.n_storage,) [type: float, low: -180., high: 180.]

  • “time_before_cooldown_line”: Box(env.n_line,) [type: int, low: 0, high: depending on parameters]

  • “time_before_cooldown_sub”: Box(env.n_sub,) [type: int, low: 0, high: depending on parameters]

  • “time_next_maintenance”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “time_since_last_alarm”: Box(1,) [type: int, low: -1, high: inf]

  • “timestep_overflow”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “thermal_limit”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “topo_vect”: Box(env.dim_topo,) [type: int, low: -1, high: 2]

  • “v_ex”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “v_or”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “was_alarm_used_after_game_over”: Discrete(2)

  • “year”: Discrete(2100)

Each key corresponds to an attribute of the observation. In this example, “line_status”: MultiBinary(20) represents the attribute obs.line_status, a boolean vector where, for each powerline, True encodes “connected” and False “disconnected”. See the chapter Observation for more information about these attributes.
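
To make this mapping concrete, here is a minimal, self-contained sketch (plain numpy, no grid2op required; the sizes and values below are made up for a toy grid) of what such a Dict-style observation looks like:

```python
import numpy as np

# Hypothetical sizes for a tiny grid with 3 powerlines and 2 generators.
n_line, n_gen = 3, 2

# A gym Dict observation is essentially a mapping from attribute name
# to a fixed-shape array, exactly like the keys listed above.
gym_obs = {
    "line_status": np.array([True, True, False]),          # MultiBinary(n_line)
    "rho": np.array([0.45, 0.90, 0.0], dtype=np.float32),  # Box(n_line,), low=0
    "gen_p": np.array([80.0, 35.5], dtype=np.float32),     # Box(n_gen,)
}

# Each key can be checked against its declared shape and bounds:
assert gym_obs["line_status"].shape == (n_line,)
assert gym_obs["gen_p"].shape == (n_gen,)
# "rho" (line flow as a fraction of thermal limit) is bounded below by 0:
assert (gym_obs["rho"] >= 0).all()
```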

Default Action space

The default action space is also a type of gym Dict. As for the observation space above, it is a straight translation from the attribute of the action to the key of the dictionary. This gives:

  • “change_bus”: MultiBinary(env.dim_topo)

  • “change_line_status”: MultiBinary(env.n_line)

  • “curtail”: Box(env.n_gen) [type: float, low=0., high=1.0]

  • “redispatch”: Box(env.n_gen) [type: float, low=-env.gen_max_ramp_down, high=env.gen_max_ramp_up]

  • “set_bus”: Box(env.dim_topo) [type: int, low=-1, high=2]

  • “set_line_status”: Box(env.n_line) [type: int, low=-1, high=1]

  • “storage_power”: Box(env.n_storage) [type: float, low=-env.storage_max_p_prod, high=env.storage_max_p_absorb]
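
As an illustration (a sketch in plain numpy with hypothetical sizes, not grid2op's own validation code), a dict action can be checked against the bounds listed above:

```python
import numpy as np

# Hypothetical dimensions, as would come from the environment.
dim_topo, n_line, n_gen = 8, 3, 2

action = {
    "change_line_status": np.array([0, 1, 0], dtype=np.int8),   # MultiBinary(n_line)
    "set_bus": np.array([0, 1, 2, -1, 0, 0, 1, 2], dtype=int),  # int in [-1, 2]
    "curtail": np.array([0.5, 1.0], dtype=np.float32),          # float in [0, 1]
}

def is_valid(act):
    """Check each key against the bounds given in the list above."""
    return (
        set(np.unique(act["change_line_status"])) <= {0, 1}
        and act["set_bus"].min() >= -1 and act["set_bus"].max() <= 2
        and (0.0 <= act["curtail"]).all() and (act["curtail"] <= 1.0).all()
    )

print(is_valid(action))  # this particular action respects every bound
```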

Customizing the action and observation space

We offer some convenience functions to customize these spaces.

If you want full control over these spaces, you need to implement something like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv
# this of course will not work... Replace "AGymSpace" with a normal gym space, like Dict, Box, MultiDiscrete etc.
from gym.spaces import AGymSpace
gym_env = GymEnv(env)

class MyCustomObservationSpace(AGymSpace):
    def __init__(self, whatever, you, want):
        # do as you please here
        pass
        # don't forget to initialize the base class
        AGymSpace.__init__(self, see, gym, doc, as, to, how, to, initialize, it)
        # eg. Box.__init__(self, low=..., high=..., dtype=float)

    def to_gym(self, observation):
        # this is the function you need to implement
        # it should have this exact name, take only one grid2op observation as input
        # and return a gym object that belongs to your space "AGymSpace"
        return SomethingThatBelongsTo_AGymSpace
        # eg. return np.concatenate((observation.gen_p * 0.1, np.sqrt(observation.load_p)))

gym_env.observation_space = MyCustomObservationSpace(whatever, you, wanted)

And for the action space:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv
# this of course will not work... Replace "AGymSpace" with a normal gym space, like Dict, Box, MultiDiscrete etc.
from gym.spaces import AGymSpace
gym_env = GymEnv(env)

class MyCustomActionSpace(AGymSpace):
    def __init__(self, whatever, you, want):
        # do as you please here
        pass
        # don't forget to initialize the base class
        AGymSpace.__init__(self, see, gym, doc, as, to, how, to, initialize, it)
        # eg. MultiDiscrete.__init__(self, nvec=...)

    def from_gym(self, gym_action):
        # this is the function you need to implement
        # it should have this exact name, take only one action (a member of your gym space) as input
        # and return a grid2op action
        return TheGymAction_ConvertedTo_Grid2op_Action
        # eg. build the grid2op action from the content of gym_action

gym_env.action_space = MyCustomActionSpace(whatever, you, wanted)

Customizing the action and observation space, using Converter

However, if you don’t want to fully customize everything, we encourage you to have a look at the “GymConverter” that we coded to ease this process.

They all work in more or less the same manner. We show here an example of a “converter” that scales the data (it subtracts the value substract from the input data and divides the result by divide):

import grid2op
from grid2op.gym_compat import GymEnv
from grid2op.gym_compat import ScalerAttrConverter

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

ob_space = gym_env.observation_space
ob_space = ob_space.reencode_space("actual_dispatch",
                                   ScalerAttrConverter(substract=0.,
                                                       divide=g2op_env.gen_pmax,
                                                       init_space=ob_space["actual_dispatch"]
                                                       )
                                   )

gym_env.observation_space = ob_space
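
Numerically, the transform applied here is simply (x - substract) / divide. A self-contained sketch of that arithmetic (the gen_pmax and dispatch values below are made up):

```python
import numpy as np

gen_pmax = np.array([150.0, 80.0, 120.0])       # hypothetical env.gen_pmax
actual_dispatch = np.array([75.0, -8.0, 30.0])  # a raw observation attribute

substract, divide = 0.0, gen_pmax
scaled = (actual_dispatch - substract) / divide

# Each generator's dispatch is now expressed as a fraction of its pmax,
# which keeps the values in a range RL frameworks handle well.
print(scaled)
```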

You can also add specific keys to this observation space. For example, say you want to give your agent the log of the loads instead of their direct value. This can be done with:

import grid2op
import numpy as np
from gym.spaces import Box
from grid2op.gym_compat import GymEnv

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

ob_space = gym_env.observation_space
shape_ = (g2op_env.n_load, )
ob_space = ob_space.add_key("log_load",
                            lambda obs: np.log(obs.load_p),
                            Box(shape=shape_,
                                low=np.full(shape_, fill_value=-np.inf, dtype=float),
                                high=np.full(shape_, fill_value=np.inf, dtype=float),
                                dtype=float
                                )
                            )

gym_env.observation_space = ob_space
# and now you will get the key "log_load" as part of your gym observation.

A detailed list of such “converters” is documented in the section “Detailed Documentation by class”. Below we describe some of them (if you notice a converter is missing, do not hesitate to send us a “feature request” for the documentation, thanks in advance).

  • ContinuousToDiscreteConverter: convert a continuous space into a discrete one

  • MultiToTupleConverter: convert a gym MultiBinary into a gym Tuple of gym Binary, and a gym MultiDiscrete into a Tuple of Discrete

  • ScalerAttrConverter: scale an attribute (subtract a value from it and divide it by another)

  • BaseGymSpaceConverter.add_key: compute another “part” of the observation space (add information to the gym space)

  • BaseGymSpaceConverter.keep_only_attr: specify which parts of the action / observation you want to keep

  • BaseGymSpaceConverter.ignore_attr: ignore some attributes of the action / observation (they will not be part of the gym space)
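
For instance, the kind of continuous-to-discrete mapping performed by ContinuousToDiscreteConverter can be sketched with plain numpy binning (the bin edges below are made up, not the ones the converter actually computes):

```python
import numpy as np

# Cut a continuous interval into 5 discrete bins using 4 edges (hypothetical).
edges = np.array([-3.0, -1.0, 1.0, 3.0])

continuous = np.array([-4.2, -0.5, 0.0, 2.7, 4.9])
discrete = np.digitize(continuous, edges)  # integer bin index per value

# Each continuous value is replaced by its bin index in {0, ..., 4},
# which a Discrete / MultiDiscrete space can then represent.
print(discrete)
```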

Note

With the “converters” above, note that the observation space AND the action space still inherit from the gym Dict.

These are complex spaces that are not well handled by some RL frameworks.

These converters only change the keys of these dictionaries!

Customizing the action and observation space, into Box or Discrete

The converters above are convenient if you can work with a gym Dict, but in some cases, or for some frameworks, this is not convenient at all.

To alleviate this problem, we developed a few dedicated gym spaces (one for the observation and several for the action), following the architecture detailed in the subsection Customizing the action and observation space:

  • BoxGymObsSpace: convert the observation space to a single “Box”

  • BoxGymActSpace: convert the action space to a single “Box”

  • MultiDiscreteActSpace: convert the action space to a single “MultiDiscrete”

  • DiscreteActSpace: convert the action space to a single “Discrete”

They can all be used like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, BoxGymObsSpace, MultiDiscreteActSpace
gym_env = GymEnv(env)
gym_env.observation_space = BoxGymObsSpace(gym_env.init_env)
gym_env.action_space = MultiDiscreteActSpace(gym_env.init_env)
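
Conceptually, BoxGymObsSpace turns the Dict observation into one flat vector. A minimal numpy sketch of that idea (not the actual implementation, and with made-up attribute values):

```python
import numpy as np

# A Dict-style observation (hypothetical values, as in the default space above).
dict_obs = {
    "gen_p": np.array([80.0, 35.5]),
    "load_p": np.array([22.0, 51.0, 12.5]),
    "rho": np.array([0.45, 0.90, 0.10]),
}

# Box-style flattening: concatenate the selected attributes, in a fixed
# key order, into a single 1D vector that fits in one gym "Box".
flat = np.concatenate([dict_obs[k] for k in sorted(dict_obs)])

print(flat.shape)  # (8,) = 2 generators + 3 loads + 3 lines
```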

We encourage you to visit the documentation for more information on how to use these classes. Each offers different customization possibilities.

Detailed Documentation by class

Legacy version

If you are interested in this feature, we recommend you proceed like this:

import grid2op
from grid2op.gym_compat import GymActionSpace, GymObservationSpace
from grid2op.Agent import BaseAgent

class MyAgent(BaseAgent):
   def __init__(self, action_space, observation_space):
      BaseAgent.__init__(self, action_space)
      self.gym_obs_space = GymObservationSpace(observation_space)
      self.gym_action_space = GymActionSpace(action_space)

   def act(self, obs, reward, done=False):
      # convert the observation to gym like one:
      gym_obs = self.gym_obs_space.to_gym(obs)

      # do whatever you want, as long as you retrieve a gym-like action
      gym_action = ...
      grid2op_action = self.gym_action_space.from_gym(gym_action)
      # NB advanced usage: if action_space is a grid2op.converter (for example coming from IdToAct)
      # then what's called  "grid2op_action" is in fact an action that can be understood by the converter.
      # to convert it back to grid2op action you need to convert it. See the documentation of GymActionSpace
      # for such purpose.
      return grid2op_action

env = grid2op.make(...)
my_agent = MyAgent(env.action_space, env.observation_space)

# and now do anything you like
# for example
done = False
reward = env.reward_range[0]
obs = env.reset()
while not done:
   action = my_agent.act(obs, reward, done)
   obs, reward, done, info = env.step(action)

We also implemented some “converters” that allow the conversion of some action spaces into more convenient gym.spaces (this is only available if gym is installed, of course). Please check grid2op.gym_compat.GymActionSpace for more information and examples.

Still having trouble finding the information? Do not hesitate to open a github issue about the documentation at this link: Documentation issue template