Compatibility with openAI gym

The gym framework is widely used in reinforcement learning. Starting from version 1.2.0, we improved grid2op's compatibility with this framework.

Before grid2op 1.2.0, only some classes fully implemented the openAI gym interface:

  • the grid2op.Environment (with methods such as env.reset, env.step etc.)

  • the grid2op.Agent (with the agent.act etc.)

  • the creation of pre defined environments (with grid2op.make)

Starting from 1.2.0, we implemented converters that automatically map the grid2op representation of the action space and the observation space into openAI gym “spaces”. More precisely, these are represented as gym.spaces.Dict.

As of grid2op 1.4.0 we tightened the gap between openAI gym and grid2op by introducing the dedicated module grid2op.gym_compat. Within this module there are lots of functionalities to convert a grid2op environment into a gym environment (one that inherits from gym.Env instead of “simply” implementing the openAI gym interface).

A simple usage is:

import grid2op
from grid2op.gym_compat import GymEnv

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

# check that this is a properly defined gym environment:
import gym
print(f"Is gym_env an openAI gym environment: {isinstance(gym_env, gym.Env)}")
# it shows "Is gym_env an openAI gym environment: True"

Note

To stay as close to grid2op as possible, by default (using the method described above) the action space will be encoded as a gym Dict whose keys are the attributes of a grid2op action. This might not be the best representation to perform RL with (some frameworks do not really like it…)

For more customization on that side, please refer to the section Customizing the action and observation space, into Box or Discrete below


Observation space and action space customization

By default, the action space and observation space are gym.spaces.Dict whose keys are the attributes to modify.

Default Observations space

For example, an observation space will look like:

  • “a_ex”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “a_or”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “actual_dispatch”: Box(env.n_gen,)

  • “curtailment”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “curtailment_limit”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “gen_p”: Box(env.n_gen,) [type: float, low: env.gen_pmin, high: env.gen_pmax * 1.2]

  • “gen_q”: Box(env.n_gen,) [type: float, low: -inf, high: inf]

  • “gen_v”: Box(env.n_gen,) [type: float, low: 0, high: inf]

  • “day”: Discrete(32)

  • “day_of_week”: Discrete(8)

  • “duration_next_maintenance”: Box(env.n_line,) [type: int, low: -1, high: inf]

  • “hour_of_day”: Discrete(24)

  • “line_status”: MultiBinary(env.n_line)

  • “load_p”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “load_q”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “load_v”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “minute_of_hour”: Discrete(60)

  • “month”: Discrete(13)

  • “p_ex”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “p_or”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “q_ex”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “q_or”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “rho”: Box(env.n_line,) [type: float, low: 0., high: inf]

  • “storage_charge”: Box(env.n_storage,) [type: float, low: 0., high: env.storage_Emax]

  • “storage_power”: Box(env.n_storage,) [type: float, low: -env.storage_max_p_prod, high: env.storage_max_p_absorb]

  • “storage_power_target”: Box(env.n_storage,) [type: float, low: -env.storage_max_p_prod, high: env.storage_max_p_absorb]

  • “target_dispatch”: Box(env.n_gen,)

  • “time_before_cooldown_line”: Box(env.n_line,) [type: int, low: 0, high: depending on parameters]

  • “time_before_cooldown_sub”: Box(env.n_sub,) [type: int, low: 0, high: depending on parameters]

  • “time_next_maintenance”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “timestep_overflow”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “topo_vect”: Box(env.dim_topo,) [type: int, low: -1, high: 2]

  • “v_ex”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “v_or”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “year”: Discrete(2100)

Each key corresponds to an attribute of the observation. In this example, “line_status”: MultiBinary(20) represents the attribute obs.line_status, which is a boolean vector (for each powerline, True encodes “connected” and False “disconnected”). See the chapter Observation for more information about these attributes.

Default Action space

The default action space is also a gym Dict. As for the observation space above, it is a straight translation from the attributes of the action to the keys of the dictionary. This gives:

  • “change_bus”: MultiBinary(env.dim_topo)

  • “change_line_status”: MultiBinary(env.n_line)

  • “curtail”: Box(env.n_gen) [type: float, low=0., high=1.0]

  • “redispatch”: Box(env.n_gen) [type: float, low=-env.gen_max_ramp_down, high=env.gen_max_ramp_up]

  • “set_bus”: Box(env.dim_topo) [type: int, low=-1, high=2]

  • “set_line_status”: Box(env.n_line) [type: int, low=-1, high=1]

  • “storage_power”: Box(env.n_storage) [type: float, low=-env.storage_max_p_prod, high=env.storage_max_p_absorb]
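To make this structure concrete, here is a hand-built sample of such a Dict action. The dimensions below (n_line, n_gen, dim_topo) are made up for illustration, not taken from a real environment:

```python
import numpy as np

# hypothetical dimensions of a small grid (illustrative only)
n_line, n_gen, dim_topo = 8, 3, 20

# a hand-built element matching the default Dict action space described above
gym_act = {
    "change_bus": np.zeros(dim_topo, dtype=np.int8),        # MultiBinary(dim_topo)
    "change_line_status": np.zeros(n_line, dtype=np.int8),  # MultiBinary(n_line)
    "curtail": np.ones(n_gen, dtype=np.float32),            # Box, floats in [0, 1]
    "redispatch": np.zeros(n_gen, dtype=np.float32),        # Box, ramp-limited floats
    "set_bus": np.zeros(dim_topo, dtype=np.int32),          # Box, ints in [-1, 2]
    "set_line_status": np.zeros(n_line, dtype=np.int32),    # Box, ints in [-1, 1]
    "storage_power": np.zeros(0, dtype=np.float32),         # Box, one entry per storage unit (none here)
}
```

An agent working with the default space produces (or samples) dictionaries of this shape, one numpy array per attribute.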

Customizing the action and observation space

We offer some convenience functions to customize these spaces.

If you want full control over these spaces, you need to implement something like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv
# this of course will not work... Replace "AGymSpace" with a normal gym space, like Dict, Box, MultiDiscrete etc.
from gym.spaces import AGymSpace
gym_env = GymEnv(env)

class MyCustomObservationSpace(AGymSpace):
    def __init__(self, whatever, you, want):
        # do as you please here
        pass
        # don't forget to initialize the base class
        AGymSpace.__init__(self, see, gym, doc, as, to, how, to, initialize, it)
        # eg. Box.__init__(self, low=..., high=..., dtype=float)

    def to_gym(self, observation):
        # this is the very function that you need to implement
        # it should have this exact name, take only one observation (grid2op) as input
        # and return a gym object that belongs to your space "AGymSpace"
        return SomethingThatBelongsTo_AGymSpace
        # eg. return np.concatenate((observation.gen_p * 0.1, np.sqrt(observation.load_p)))

gym_env.observation_space = MyCustomObservationSpace(whatever, you, wanted)

And for the action space:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv
# this of course will not work... Replace "AGymSpace" with a normal gym space, like Dict, Box, MultiDiscrete etc.
from gym.spaces import AGymSpace
gym_env = GymEnv(env)

class MyCustomActionSpace(AGymSpace):
    def __init__(self, whatever, you, want):
        # do as you please here
        pass
        # don't forget to initialize the base class
        AGymSpace.__init__(self, see, gym, doc, as, to, how, to, initialize, it)
        # eg. MultiDiscrete.__init__(self, nvec=...)

    def from_gym(self, gym_action):
        # this is the very function that you need to implement
        # it should have this exact name, take only one action (member of your gym space) as input
        # and return a grid2op action
        return TheGymAction_ConvertedTo_Grid2op_Action
        # eg. build and return a valid grid2op action from the content of `gym_action`

gym_env.action_space = MyCustomActionSpace(whatever, you, wanted)

Customizing the action and observation space, using Converter

However, if you don’t want to fully customize everything, we encourage you to have a look at the “GymConverter” that we coded to ease this process.

They all work in more or less the same manner. We show here an example of a “converter” that will scale the data (subtracting the value in substract, then dividing the input data by divide):

import grid2op
from grid2op.gym_compat import GymEnv
from grid2op.gym_compat import ScalerAttrConverter

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

ob_space = gym_env.observation_space
ob_space = ob_space.reencode_space("actual_dispatch",
                                   ScalerAttrConverter(substract=0.,
                                                       divide=g2op_env.gen_pmax,
                                                       init_space=ob_space["actual_dispatch"]
                                                       )
                                   )

gym_env.observation_space = ob_space

You can also add specific keys to this observation space. For example, say you want to compute the log of the loads instead of giving the direct value to your agent. This can be done with:

import grid2op
import numpy as np
from gym.spaces import Box
from grid2op.gym_compat import GymEnv

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

ob_space = gym_env.observation_space
shape_ = (g2op_env.n_load, )
ob_space = ob_space.add_key("log_load",
                            lambda obs: np.log(obs.load_p),
                            Box(shape=shape_,
                                low=np.full(shape_, fill_value=-np.inf, dtype=float),
                                high=np.full(shape_, fill_value=np.inf, dtype=float),
                                dtype=float
                                )
                            )

gym_env.observation_space = ob_space
# and now you will get the key "log_load" as part of your gym observation.

A detailed list of such “converters” is documented in the section “Detailed Documentation by class”. In the table below we describe some of them (NB: if you notice a converter is not displayed there, do not hesitate to write us a “feature request” for the documentation, thanks in advance).

Converter name — Objective

  • ContinuousToDiscreteConverter — Convert a continuous space into a discrete one

  • MultiToTupleConverter — Convert a gym MultiBinary to a gym Tuple of gym Binary and a gym MultiDiscrete to a Tuple of Discrete

  • ScalerAttrConverter — Allows to scale an attribute (divide it by something and subtract something from it)

  • BaseGymSpaceConverter.add_key — Allows you to compute another “part” of the observation space (you add an information to the gym space)

  • BaseGymSpaceConverter.keep_only_attr — Allows you to specify which parts of the action / observation you want to keep

  • BaseGymSpaceConverter.ignore_attr — Allows you to ignore some attributes of the action / observation (they will not be part of the gym space)

Note

With the “converters” above, note that the observation space AND action space will still inherit from gym Dict.

These are complex spaces that are not well handled by some RL frameworks.

These converters only change the keys of these dictionaries!

Customizing the action and observation space, into Box or Discrete

The use of the converters above is nice if you can work with gym Dict, but in some cases, or for some frameworks, it is not convenient at all.

To alleviate this problem, we developed 3 types of gym action space (plus an observation space counterpart), following the architecture detailed in subsection Customizing the action and observation space.

Converter name — Objective

  • BoxGymObsSpace — Convert the observation space to a single “Box”

  • BoxGymActSpace — Convert the action space to a single “Box”

  • MultiDiscreteActSpace — Convert the action space to a gym “MultiDiscrete”

  • DiscreteActSpace — Convert the action space to a gym “Discrete”

They can all be used like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, BoxGymObsSpace, MultiDiscreteActSpace
gym_env = GymEnv(env)
gym_env.observation_space = BoxGymObsSpace(gym_env.init_env)
gym_env.action_space = MultiDiscreteActSpace(gym_env.init_env)

We encourage you to visit the documentation for more information on how to use these classes. Each offers different customization possibilities.

Detailed Documentation by class

Classes:

BaseGymAttrConverter([space, gym_to_g2op, …])

TODO work in progress !

BoxGymActSpace(grid2op_action_space[, …])

This class allows to convert a grid2op action space into a gym “Box” which is a regular Box in R^d.

BoxGymObsSpace(grid2op_observation_space[, …])

This class allows to convert a grid2op observation space into a gym “Box” which is a regular Box in R^d.

ContinuousToDiscreteConverter(nb_bins[, …])

Some RL algorithms are particularly suited for dealing with discrete action space or observation space.

DiscreteActSpace(grid2op_action_space[, …])

TODO the documentation of this class is in progress.

GymActionSpace(env[, converter, dict_variables])

This class enables the conversion of the action space into a gym “space”.

GymEnv(env_init)

fully implements the openAI gym API by using the GymActionSpace and GymObservationSpace for compliance with openAI gym.

GymObservationSpace(env[, dict_variables])

This class allows to transform the observation space into a gym space.

MultiDiscreteActSpace(grid2op_action_space)

This class allows to convert a grid2op action space into a gym “MultiDiscrete”.

MultiToTupleConverter([init_space])

Some frameworks, for example ray[rllib], do not support MultiBinary nor MultiDiscrete gym action spaces.

ScalerAttrConverter(substract, divide[, …])

This is a scaler that transforms an initial gym space init_space into its scaled version.

class grid2op.gym_compat.BaseGymAttrConverter(space=None, gym_to_g2op=None, g2op_to_gym=None)[source]

TODO work in progress !

Need help if you can :-)

Methods:

g2op_to_gym(g2op_object)

Convert a grid2op object to a gym object

gym_to_g2op(gym_object)

Convert a gym object to a grid2op object

g2op_to_gym(g2op_object)[source]

Convert a grid2op object to a gym object

Parameters

g2op_object – An object (action or observation) represented as a grid2op.Action.BaseAction or grid2op.Observation.BaseObservation

Returns

Return type

The same object, represented as a gym “ordered dictionary”

gym_to_g2op(gym_object)[source]

Convert a gym object to a grid2op object

Parameters

gym_object – An object (action or observation) represented as a gym “ordered dictionary”

Returns

Return type

The same object, represented as a grid2op.Action.BaseAction or grid2op.Observation.BaseObservation.

class grid2op.gym_compat.BoxGymActSpace(grid2op_action_space, attr_to_keep=('set_line_status', 'change_line_status', 'set_bus', 'change_bus', 'redispatch', 'set_storage', 'curtail', 'raise_alarm'), add=None, multiply=None, functs=None)[source]

This class allows to convert a grid2op action space into a gym “Box” which is a regular Box in R^d.

It also allows to customize which part of the action you want to use and offer capacity to center / reduce the data or to use more complex function from the observation.

Note

Though it is possible to use every type of action with this type of action space, be aware that it is not recommended at all to use it for discrete attributes (set_bus, change_bus, set_line_status or change_line_status)!

Basically, when performing actions in gym for these attributes, this converter will involve rounding, and it is definitely not the best representation. Prefer the MultiDiscreteActSpace or the DiscreteActSpace classes.
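To see why, here is a small illustration (not grid2op code; the sample values are made up) of rounding a continuous Box sample into the integer set_bus values {-1, 0, 1, 2}: nearby continuous actions collapse onto the same discrete action, so most of the Box volume is wasted.

```python
import numpy as np

# a continuous sample an agent might output for 3 "set_bus"-like components
gym_sample = np.array([-0.4, 0.6, 1.7])

# rounding to the nearest legal integer value in [-1, 2]
set_bus = np.clip(np.rint(gym_sample), -1, 2).astype(int)
# e.g. -0.4 and +0.4 both map to 0: many distinct gym actions become identical
```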

Examples

If you simply want to use it you can do:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, BoxGymActSpace
gym_env = GymEnv(env)

gym_env.action_space = BoxGymActSpace(env.action_space)

In this case it will extract all the attributes of the action (a detailed list is given in the documentation at Action).

You can select the attribute you want to keep, for example:

gym_env.action_space = BoxGymActSpace(env.action_space,
                                      attr_to_keep=['redispatch', "curtail"])

You can also apply some basic transformation to the attribute of the action. This can be done with:

gym_env.action_space = BoxGymActSpace(env.action_space,
                                      attr_to_keep=['redispatch', "curtail"],
                                      multiply={"redispatch": env.gen_max_ramp_up},
                                      add={"redispatch": 0.5 * env.gen_max_ramp_up})

In the above example, the resulting “redispatch” part of the vector will be given by the following formula: grid2op_act = gym_act * multiply + add

Hint: you can use multiply as the standard deviation and add as the average of the attribute.
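As a purely numeric illustration of this formula (the ramp values below are made up; in a real setup they would come from env.gen_max_ramp_up):

```python
import numpy as np

# hypothetical per-generator ramp limits (stand-in for env.gen_max_ramp_up)
multiply = np.array([5.0, 10.0])
add = 0.5 * multiply

gym_act = np.array([-1.0, 1.0])          # what the agent outputs
grid2op_act = gym_act * multiply + add   # grid2op_act = gym_act * multiply + add
```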

Notes

For more customization, this code is roughly equivalent to something like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv
# this of course will not work... Replace "AGymSpace" with a normal gym space, like Dict, Box, MultiDiscrete etc.
from gym.spaces import AGymSpace
gym_env = GymEnv(env)

class MyCustomActionSpace(AGymSpace):
    def __init__(self, whatever, you, want):
        # do as you please here
        pass
        # don't forget to initialize the base class
        AGymSpace.__init__(self, see, gym, doc, as, to, how, to, initialize, it)
        # eg. MultiDiscrete.__init__(self, nvec=...)

    def from_gym(self, gym_action):
        # this is this very same function that you need to implement
        # it should have this exact name, take only one action (member of your gym space) as input
        # and return a grid2op action
        return TheGymAction_ConvertedTo_Grid2op_Action
        # eg. build and return a valid grid2op action from the content of `gym_action`

gym_env.action_space = MyCustomActionSpace(whatever, you, wanted)

And you can implement pretty much anything in the “from_gym” function.

Methods:

from_gym(gym_act)

This is the function that is called to transform a gym action (in this case a numpy array!) sent by the agent and convert it to a grid2op action that will be sent to the underlying grid2op environment.

from_gym(gym_act)[source]

This is the function that is called to transform a gym action (in this case a numpy array!) sent by the agent and convert it to a grid2op action that will be sent to the underlying grid2op environment.

Parameters

gym_act (numpy.ndarray) – the gym action

Returns

grid2op_act – The corresponding grid2op action.

Return type

grid2op.Action.BaseAction

class grid2op.gym_compat.BoxGymObsSpace(grid2op_observation_space, attr_to_keep=('year', 'month', 'day', 'hour_of_day', 'minute_of_hour', 'day_of_week', 'gen_p', 'gen_q', 'gen_v', 'load_p', 'load_q', 'load_v', 'p_or', 'q_or', 'v_or', 'a_or', 'p_ex', 'q_ex', 'v_ex', 'a_ex', 'rho', 'line_status', 'timestep_overflow', 'topo_vect', 'time_before_cooldown_line', 'time_before_cooldown_sub', 'time_next_maintenance', 'duration_next_maintenance', 'target_dispatch', 'actual_dispatch', 'storage_charge', 'storage_power_target', 'storage_power', 'curtailment', 'curtailment_limit', 'thermal_limit', 'is_alarm_illegal', 'time_since_last_alarm', 'last_alarm', 'attention_budget', 'was_alarm_used_after_game_over'), substract=None, divide=None, functs=None)[source]

This class allows to convert a grid2op observation space into a gym “Box” which is a regular Box in R^d.

It also allows to customize which part of the observation you want to use and offer capacity to center / reduce the data or to use more complex function from the observation.

Examples

If you simply want to use it you can do:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, BoxGymObsSpace
gym_env = GymEnv(env)

gym_env.observation_space = BoxGymObsSpace(env.observation_space)

In this case it will extract all the features of the observation (a detailed list is given in the documentation at Observation).

You can select the attribute you want to keep, for example:

gym_env.observation_space = BoxGymObsSpace(env.observation_space,
                                           attr_to_keep=['load_p', "gen_p", "rho"])

You can also apply some basic transformation to the attribute of the observation before building the resulting gym observation (which in this case is a vector). This can be done with:

gym_env.observation_space = BoxGymObsSpace(env.observation_space,
                                           attr_to_keep=['load_p', "gen_p", "rho"],
                                           divide={"gen_p": env.gen_pmax},
                                           substract={"gen_p": 0.5 * env.gen_pmax})

In the above example, the resulting “gen_p” part of the vector will be given by the following formula: gym_obs = (grid2op_obs - substract) / divide.

Hint: you can use divide as the standard deviation and substract as the average of the attribute over a few episodes, computed for example with grid2op.utils.EpisodeStatistics.
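For instance, with made-up statistics (in practice they could come from grid2op.utils.EpisodeStatistics, as suggested above):

```python
import numpy as np

# made-up per-generator statistics standing in for episode averages / stds
gen_p_mean = np.array([40.0, 80.0])   # would be passed as "substract"
gen_p_std = np.array([10.0, 20.0])    # would be passed as "divide"

grid2op_obs = np.array([50.0, 60.0])  # a raw "gen_p" vector from the environment
gym_obs = (grid2op_obs - gen_p_mean) / gen_p_std  # roughly centered and reduced
```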

Finally, you can also transform attributes of the observation further and add them to your box. This can be done rather easily with the “functs” argument like:

gym_env.observation_space = BoxGymObsSpace(env.observation_space,
                                           attr_to_keep=["connectivity_matrix", "log_load"],
                                           functs={"connectivity_matrix":
                                                      (lambda grid2opobs: grid2opobs.connectivity_matrix().flatten(),
                                                       0., 1.0, None, None),
                                                   "log_load":
                                                    (lambda grid2opobs: np.log(grid2opobs.load_p),
                                                    None, 10., None, None)
                                                }
                                           )

In this case, “functs” should be a dictionary; the “keys” should be strings (also present in the attr_to_keep list) and the values should be tuples of 5 elements (callable, low, high, shape, dtype) with:

  • callable a function taking as input a grid2op observation and returning a numpy array

  • low (optional) (put None if you don’t want to specify it, defaults to -np.inf) the lowest value your numpy array can take. It can be a single number or an array with the same shape as the return value of your function.

  • high (optional) (put None if you don’t want to specify it, defaults to np.inf) the highest value your numpy array can take. It can be a single number or an array with the same shape as the return value of your function.

  • shape (optional) (put None if you don’t want to specify it) the shape of the return value of your function. It should be a tuple (and not a single number). By default it is computed by applying your function to an observation.

  • dtype (optional, put None if you don’t want to change it, defaults to np.float32) the type of the numpy array as output of your function.

Notes

The range of the values for “gen_p” / “prod_p” is not strictly env.gen_pmin and env.gen_pmax. This is due to the “approximation” made when some redispatching is performed (the precision of the algorithm that computes the actual dispatch from the information it receives), and also because sometimes the losses of the grid are really different from the ones anticipated in the “chronics” (yes, env.gen_pmin and env.gen_pmax are not always ensured in grid2op).

Methods:

to_gym(grid2op_observation)

This is the function that is called to transform a grid2Op observation, sent by the grid2op environment and convert it to a numpy array (an element of a gym Box)

to_gym(grid2op_observation)[source]

This is the function that is called to transform a grid2Op observation, sent by the grid2op environment and convert it to a numpy array (an element of a gym Box)

Parameters

grid2op_observation – The grid2op observation (as a grid2op object)

Returns

res – A numpy array compatible with the openAI gym Box that represents the action space.

Return type

numpy.ndarray

class grid2op.gym_compat.ContinuousToDiscreteConverter(nb_bins, init_space=None)[source]

Some RL algorithms are particularly suited for dealing with discrete action space or observation space.

This “AttributeConverter” is responsible for converting a continuous space into a discrete one. It does so by using bins, computed with np.linspace.

We recommend using an odd number of bins (eg 3, 7 or 9 for example).

Examples

If nb_bins is 3 and the original input space is [-10, 10], then the split is the following:

  • 0 encodes all numbers in [-10, -3.33)

  • 1 encodes all numbers in [-3.33, 3.33)

  • 2 encodes all numbers in [3.33, 10.]

And reciprocally, this action with :

  • 0 is understood as -5.0 (middle of the interval -10 / 0)

  • 1 is understood as 0.0 (middle of the interval represented by -10 / 10)

  • 2 is understood as 5.0 (middle of the interval represented by 0 / 10)

If nb_bins is 5 and the original input space is [-10, 10], then the split is the following:

  • 0 encodes all numbers in [-10, -6)

  • 1 encodes all numbers in [-6, -2)

  • 2 encodes all numbers in [-2, 2)

  • 3 encodes all numbers in [2, 6)

  • 4 encodes all numbers in [6, 10]

And reciprocally, this action with :

  • 0 is understood as -6.6666…

  • 1 is understood as -3.333…

  • 2 is understood as 0.

  • 3 is understood as 3.333…

  • 4 is understood as 6.6666…

TODO add example of code on how to use this.
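Meanwhile, here is a standalone sketch (not the class itself; function names are ours) of a binning scheme consistent with the numbers above: bin edges split [low, high] into nb_bins equal-width intervals, while decoded values are the interior points of a slightly finer grid, which reproduces -5 / 0 / 5 for 3 bins and -6.66… / -3.33… / 0 / 3.33… / 6.66… for 5 bins.

```python
import numpy as np

def to_discrete(value, low, high, nb_bins):
    # nb_bins equal-width bins over [low, high]; the interior edges feed np.digitize
    edges = np.linspace(low, high, nb_bins + 1)
    return int(np.digitize(value, edges[1:-1]))

def to_continuous(bin_id, low, high, nb_bins):
    # decoded values: interior points of an (nb_bins + 2)-point grid over [low, high]
    centers = np.linspace(low, high, nb_bins + 2)[1:-1]
    return float(centers[bin_id])
```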

Methods:

g2op_to_gym(g2op_object)

Convert a grid2op object to a gym object

gym_to_g2op(gym_object)

Convert a gym object to a grid2op object

g2op_to_gym(g2op_object)[source]

Convert a grid2op object to a gym object

Parameters

g2op_object – An object (action or observation) represented as a grid2op.Action.BaseAction or grid2op.Observation.BaseObservation

Returns

Return type

The same object, represented as a gym “ordered dictionary”

gym_to_g2op(gym_object)[source]

Convert a gym object to a grid2op object

Parameters

gym_object – An object (action or observation) represented as a gym “ordered dictionary”

Returns

Return type

The same object, represented as a grid2op.Action.BaseAction or grid2op.Observation.BaseObservation.

class grid2op.gym_compat.DiscreteActSpace(grid2op_action_space, attr_to_keep=('set_line_status', 'change_line_status', 'set_bus', 'change_bus', 'redispatch', 'set_storage', 'curtail', 'raise_alarm'), nb_bins=None)[source]

TODO the documentation of this class is in progress.

This class allows to convert a grid2op action space into a gym “Discrete”. This means that the actions are labeled, and instead of describing the action itself, you provide only its ID.

It is related to the MultiDiscreteActSpace but, compared to this other representation, it does not allow to perform “multiple actions”. Typically, if you use the snippets below:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, MultiDiscreteActSpace, DiscreteActSpace
gym_env1 = GymEnv(env)
gym_env2 = GymEnv(env)

gym_env1.action_space = MultiDiscreteActSpace(env.action_space,
                                              attr_to_keep=['redispatch', "curtail", "one_sub_set"])
gym_env2.action_space = DiscreteActSpace(env.action_space,
                                         attr_to_keep=['redispatch', "curtail", "set_bus"])

Then at each step, gym_env1 will allow the agent to perform a redispatching action (on any number of generators), a curtailment action (on any number of generators) __**AND**__ change the topology at one substation. But at each step, the agent has to predict many “numbers”.

On the other hand, at each step, the agent for gym_env2 will have to predict a single integer (which is usually the case in most RL environments), but this action will affect redispatching on a single generator, perform curtailment on a single generator __**OR**__ change the topology at one substation.

The action set is then largely constrained compared to the MultiDiscreteActSpace.
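Conceptually, a Discrete action space is just a pre-built list of grid2op actions indexed by an integer. The sketch below uses plain dictionaries as stand-ins for real grid2op actions (names and contents are illustrative only):

```python
# stand-ins for real grid2op actions (illustrative only)
candidate_actions = [
    {"redispatch": [(0, +1.0)]},   # id 0: redispatch generator 0 by +1 MW
    {"redispatch": [(0, -1.0)]},   # id 1: redispatch generator 0 by -1 MW
    {"set_bus": {"substations_id": [(2, (1, 1, 2, 2))]}},  # id 2: reconfigure one substation
]

def from_gym(gym_act: int):
    # a single integer selects exactly one pre-built action (IdToAct-style mapping)
    return candidate_actions[gym_act]
```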

Note

This class is really closely related to the grid2op.Converter.IdToAct. It basically “maps” this “IdToAct” into a type of gym space, which, in this case, will be a Discrete one.

Examples

We recommend to use it like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, MultiDiscreteActSpace, DiscreteActSpace
gym_env = GymEnv(env)

gym_env.action_space = DiscreteActSpace(env.action_space,
                                        attr_to_keep=['redispatch', "curtail", "set_bus"])

The possible attributes you can provide in “attr_to_keep” are:

  • “set_line_status”

  • “change_line_status”

  • “set_bus”: corresponds to changing the topology using the “set_bus” (equivalent to the “one_sub_set” keyword in the “attr_to_keep” of the MultiDiscreteActSpace)

  • “change_bus”: corresponds to changing the topology using the “change_bus” (equivalent to the “one_sub_change” keyword in the “attr_to_keep” of the MultiDiscreteActSpace)

  • “redispatch”

  • “set_storage”

  • “curtail”

  • “curtail_mw” (same effect as “curtail”)

Methods:

from_gym(gym_act)

This is the function that is called to transform a gym action (in this case a numpy array!) sent by the agent and convert it to a grid2op action that will be sent to the underlying grid2op environment.

from_gym(gym_act)[source]

This is the function that is called to transform a gym action (in this case a numpy array!) sent by the agent and convert it to a grid2op action that will be sent to the underlying grid2op environment.

Parameters

gym_act (int) – the gym action (a single integer for this action space)

Returns

grid2op_act – The corresponding grid2op action.

Return type

grid2op.Action.BaseAction

class grid2op.gym_compat.GymActionSpace(env, converter=None, dict_variables=None)[source]

This class enables the conversion of the action space into a gym “space”.

Resulting action space will be a gym.spaces.Dict.

NB it is NOT recommended to use the sample of the gym action space. Please use the sampling (if available) of the original action space instead [if not available, this means there is no implemented way to generate reliable random actions].

Note that gym space converted with this class should be seeded independently. It is NOT seeded when calling grid2op.Environment.Environment.seed().

Examples

Converting an action space is fairly straightforward, though the resulting gym action space will depend on the original encoding of the action space.

import grid2op
from grid2op.gym_compat import GymActionSpace
env = grid2op.make()

gym_action_space = GymActionSpace(env)
# and now gym_action_space is a `gym.spaces.Dict` representing the action space.
# you can convert action to / from this space to grid2op the following way

grid2op_act = env.action_space(...)
gym_act = gym_action_space.to_gym(grid2op_act)

# and the opposite conversion is also possible:
gym_act = ... # whatever you decide to do
grid2op_act = gym_action_space.from_gym(gym_act)

NB you can use this GymActionSpace to represent actions in the gym format even if these actions come from another converter, such as IdToAct or ToVect. In this case, to get back a grid2op action you NEED to convert the action back from this converter. Here is a complete example of this (more advanced) use case:

import grid2op
from grid2op.Converter import GymActionSpace, IdToAct
env = grid2op.make()

converted_action_space = IdToAct(env)
gym_action_space = GymActionSpace(env=env, converter=converted_action_space)

# and now gym_action_space is a `gym.spaces.Dict` representing the action space.
# you can convert action to / from this space to grid2op the following way

converter_act = ... # whatever action you want
gym_act = gym_action_space.to_gym(converter_act)

# and the opposite conversion is also possible:
gym_act = ... # whatever you decide to do
converter_act = gym_action_space.from_gym(gym_act)

# note that this converter_act only makes sense for the converter. It cannot
# be digested by grid2op directly, so you also need to convert it to a grid2op action
grid2op_act = converted_action_space.convert_act(converter_act)

Methods:

from_gym(gymlike_action)

Transform a gym-like action (such as the output of “sample()”) into a grid2op action

reencode_space(key, fun)

This function is used to reencode the action space.

to_gym(action)

Transform an action (non gym) into an action compatible with the gym Space.

from_gym(gymlike_action: collections.OrderedDict) object[source]

Transform a gym-like action (such as the output of “sample()”) into a grid2op action

Parameters

gymlike_action (gym.spaces.dict.OrderedDict) – The action, represented as a gym action (ordered dict)

Returns

  • An action that can be understood by the given action_space (either a grid2op action if the original action space was used, or a Converter)

reencode_space(key, fun)[source]

This function is used to reencode the action space. For example, it can be used to scale the actions to values close to 0., to encode continuous variables as discrete variables, or the other way around, etc.

Basically, it’s a tool that lets you define your own action space (there is the same for the observation space)

Parameters
  • key (str) – Which part of the action space you want to modify

  • fun (BaseGymAttrConverter) – Put None to deactivate the feature (it will be hidden from the action space). It can also be a BaseGymAttrConverter. See the example for more information.

Returns

The current instance, to be able to chain these calls

Return type

self

Notes

It modifies the action space. We highly recommend to set it up at the beginning of your script and not to modify it afterwards.

‘fun’ should be deep copiable (meaning that calling copy.deepcopy(fun) does not crash).

If an attribute has been ignored, for example by GymEnv.keep_only_obs_attr(), and is now present here, it will be re-added in the final representation.

to_gym(action: object) collections.OrderedDict[source]

Transform an action (non gym) into an action compatible with the gym Space.

Parameters

action – The action (coming from grid2op or understandable by the converter)

Returns

The same action converted as an OrderedDict (the default used by gym when the action space is a Dict)

Return type

gym_action

class grid2op.gym_compat.GymEnv(env_init)[source]

Fully implements the openAI gym API by using GymActionSpace and GymObservationSpace for compliance with openAI gym.

They can handle an action_space_converter or an observation_space_converter to change the representation of the data that will be fed to the agent. #TODO

Notes

The environment passed as input is copied. It is not modified by this “gym environment”

Examples

This can be used like:

import grid2op
from grid2op.gym_compat import GymEnv

env_name = ...
env = grid2op.make(env_name)
gym_env = GymEnv(env)  # is a gym environment properly inheriting from gym.Env !

Methods:

close()

Override close in your subclass to perform any necessary cleanup.

render([mode])

for compatibility with open ai gym render function

reset()

Resets the environment to an initial state and returns an initial observation.

seed([seed])

Sets the seed for this env’s random number generator(s).

step(gym_action)

Run one timestep of the environment’s dynamics.

close()[source]

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(mode='human')[source]

for compatibility with open ai gym render function

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

the initial observation.

Return type

observation (object)

seed(seed=None)[source]

Sets the seed for this env’s random number generator(s).

Note

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns

Returns the list of seeds used in this env’s random

number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.

Return type

list<bigint>

step(gym_action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

  • observation (object) – agent’s observation of the current environment

  • reward (float) – amount of reward returned after the previous action

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple

class grid2op.gym_compat.GymObservationSpace(env, dict_variables=None)[source]

This class allows to transform the observation space into a gym space.

Gym space will be a gym.spaces.Dict with the keys being the different attributes of the grid2op observation. All attributes are used.

Note that gym space converted with this class should be seeded independently. It is NOT seeded when calling grid2op.Environment.Environment.seed().

Examples

Converting an observation space is fairly straightforward:

import grid2op
from grid2op.Converter import GymObservationSpace
env = grid2op.make()

gym_observation_space = GymObservationSpace(env)
# and now gym_observation_space is a `gym.spaces.Dict` representing the observation space

# you can "convert" the grid2op observation to / from this space with:

grid2op_obs = env.reset()
same_gym_obs = gym_observation_space.to_gym(grid2op_obs)

# the conversion from gym_obs to a grid2op obs is feasible, but I don't imagine
# a situation where it is useful. In particular, you will not be able to
# use "obs.simulate" on an observation converted back from this gym representation.

Notes

The range of the values for “gen_p” / “prod_p” is not strictly [env.gen_pmin, env.gen_pmax]. This is due to the “approximation” made when some redispatching is performed (the precision of the algorithm that computes the actual dispatch from the information it receives), and also because the losses of the grid are sometimes really different from the ones anticipated in the “chronics” (yes, env.gen_pmin and env.gen_pmax are not always ensured in grid2op).

Methods:

from_gym(gymlike_observation)

This function converts the gym-like representation of an observation to a grid2op observation.

reencode_space(key, fun)

This function is used to reencode the observation space.

to_gym(grid2op_observation)

Convert a grid2op observation into a gym ordered dict.

from_gym(gymlike_observation: collections.OrderedDict) grid2op.Observation.BaseObservation.BaseObservation[source]

This function converts the gym-like representation of an observation to a grid2op observation.

Parameters

gymlike_observation (gym.spaces.dict.OrderedDict) – The observation represented as a gym ordered dict

Returns

grid2oplike_observation – The corresponding grid2op observation

Return type

grid2op.Observation.BaseObservation

reencode_space(key, fun)[source]

This function is used to reencode the observation space. For example, it can be used to scale the observation into values close to 0., it can also be used to encode continuous variables into discrete variables or the other way around etc.

Basically, it’s a tool that lets you define your own observation space (there is the same for the action space)

Parameters
  • key (str) – Which part of the observation space you want to study

  • fun (BaseGymAttrConverter) – Put None to deactivate the feature (it will be hidden from the observation space). It can also be a BaseGymAttrConverter. See the example for more information.

Returns

The current instance, to be able to chain these calls

Return type

self

Notes

It modifies the observation space. We highly recommend to set it up at the beginning of your script and not to modify it afterwards

‘fun’ should be deep copiable (meaning that calling copy.deepcopy(fun) does not crash).

If an attribute has been ignored, for example by GymEnv.keep_only_obs_attr(), and is now present here, it will be re-added in the final observation.

to_gym(grid2op_observation: grid2op.Observation.BaseObservation.BaseObservation) collections.OrderedDict[source]

Convert a grid2op observation into a gym ordered dict.

Parameters

grid2op_observation (grid2op.Observation.BaseObservation) – The observation represented as a grid2op observation

Returns

gymlike_observation – The corresponding gym ordered dict

Return type

gym.spaces.dict.OrderedDict

class grid2op.gym_compat.MultiDiscreteActSpace(grid2op_action_space, attr_to_keep=('set_line_status', 'change_line_status', 'set_bus', 'change_bus', 'redispatch', 'set_storage', 'curtail', 'raise_alarm'), nb_bins=None)[source]

This class allows to convert a grid2op action space into a gym “MultiDiscrete”. This means that the actions are labeled: instead of describing an action itself, you provide only its ID along each dimension.

Note

This action space is particularly suited to represent discrete actions.

It is possible to represent continuous actions with it. In that case, the continuous actions are “binarized” thanks to the ContinuousToDiscreteConverter. Feel free to consult its documentation for more information.

In this case it will extract all the features in all the action with:

  • “set_line_status”: n_line dimensions, each containing 3 choices “DISCONNECT”, “DONT AFFECT”, “FORCE CONNECTION” and affecting the powerline status (connected / disconnected)

  • “change_line_status”: n_line dimensions, each containing 2 elements “CHANGE”, “DONT CHANGE” and affecting the powerline status (connected / disconnected)

  • “set_bus”: dim_topo dimensions, each containing 4 choices: “DISCONNECT”, “DONT AFFECT”, “CONNECT TO BUSBAR 1”, or “CONNECT TO BUSBAR 2” and affecting to which busbar an object is connected

  • “change_bus”: dim_topo dimensions, each containing 2 choices: “CHANGE”, “DONT CHANGE” and affect to which busbar an element is connected

  • “redispatch”: n_gen dimensions, each containing a certain number of choices depending on the value of the keyword argument nb_bins[“redispatch”] (by default 7), and 1 for non dispatchable generators

  • “curtail”: n_gen dimensions, each containing a certain number of choices depending on the value of the keyword argument nb_bins[“curtail”] (by default 7), and 1 for non renewable generators. This is the “conversion to discrete action” of the curtailment action.

  • “curtail_mw”: completely equivalent to “curtail” for this representation. This is the “conversion to discrete action” of the curtailment action.

  • “set_storage”: n_storage dimensions, each containing a certain number of choices depending on the value of the keyword argument nb_bins[“set_storage”] (by default 7). This is the “conversion to discrete action” of the action on storage units.

We offer some extra customization, with the keywords:

  • “sub_set_bus”: n_sub dimension. This type of representation encodes each different possible combination of elements that are possible at each substation. The choice at each component depends on the element connected at this substation. Only configurations that will not lead to straight game over will be generated.

  • “sub_change_bus”: n_sub dimension. Same comment as for “sub_set_bus”

  • “one_sub_set”: 1 single dimension. This type of representation differs from the previous one only by the fact that each step you can perform only one single action on a single substation (so unlikely to be illegal).

  • “one_sub_change”: 1 single dimension. Same as above.

Warning

We recommend to stick with either the “set” or the “change” way of looking at things: either you want to target a given state (in that case use “sub_set_bus”, “line_set_status”, “one_sub_set”, or “set_bus”) OR you prefer reasoning in terms of “I want to change this or that” (in that case use “sub_change_bus”, “line_change_status”, “one_sub_change” or “change_bus”).

Combining a “set” and a “change” on the same element will most likely lead to an “ambiguous action”. Indeed, what should grid2op do if you both “tell element A to go to bus 1” and “tell element A to go to bus 2 if it was on bus 1, and to bus 1 if it was on bus 2”? It is not clear at all.

No error will be thrown if you mix them; this is your absolute right, but be aware it might not lead to the result you expect.

Warning

The arguments “set_bus”, “sub_set_bus” and “one_sub_set” will all perform “set_bus” actions. The only difference is how you represent this action:

  • In “set_bus” each component represents a single element of the grid. When you sample an action with this keyword you will possibly change all the elements of the grid at once (this is likely to be illegal). Nothing prevents you from performing “weird” stuff, for example disconnecting a load or a generator (which is straight game over) or leaving a load or a generator “alone” on a busbar (which also leads to a straight game over). You can do anything with it, but as always: “with great power comes great responsibility”.

  • In “sub_set_bus” each component represents a substation of the grid. When you sample an action from this, you will possibly change all the elements of the grid at once (because you can act on all the substations at the same time). As opposed to “set_bus”, however, this constrains the action space, in practice, to actions that will not directly lead to a game over.

  • In “one_sub_set”: the single component represents the whole grid. When you sample an action with this, you will sample a single action acting on a single substation. You will not be able to act on multiple substations with this.

For this reason, we recommend providing only one of “set_bus”, “sub_set_bus” and “one_sub_set”. Again, no error will be thrown if you mix them, but be warned that the resulting behaviour might not be what you expect.

Warning

The same as above holds for “change_bus”, “sub_change_bus” and “one_sub_change”: Use only one of these !

Examples

If you simply want to use it you can do:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, MultiDiscreteActSpace
gym_env = GymEnv(env)

gym_env.action_space = MultiDiscreteActSpace(env.action_space)

You can select the attribute you want to keep, for example:

gym_env.action_space = MultiDiscreteActSpace(env.action_space,
                                             attr_to_keep=['redispatch', "curtail", "sub_set_bus"])

You can also apply some basic transformation when you “discretize” continuous action

gym_env.action_space = MultiDiscreteActSpace(env.action_space,
                                             attr_to_keep=['redispatch', "curtail", "sub_set_bus"],
                                             nb_bins={"redispatch": 3, "curtail": 17},
                                             )

By default, each continuous attribute is “discretized” into 7 “bins”. The more bins, the more precise your control can be, but the higher the dimension of the action space.

Methods:

from_gym(gym_act)

This is the function that is called to transform a gym action (in this case a numpy array!) sent by the agent and convert it to a grid2op action that will be sent to the underlying grid2op environment.

from_gym(gym_act)[source]

This is the function that is called to transform a gym action (in this case a numpy array!) sent by the agent and convert it to a grid2op action that will be sent to the underlying grid2op environment.

Parameters

gym_act (numpy.ndarray) – the gym action

Returns

grid2op_act – The corresponding grid2op action.

Return type

grid2op.Action.BaseAction

class grid2op.gym_compat.MultiToTupleConverter(init_space=None)[source]

Some frameworks, for example ray[rllib], support neither MultiBinary nor MultiDiscrete gym action spaces. Apparently this is not going to change in the near future (see https://github.com/ray-project/ray/issues/1519).

In grid2op, some variables are encoded as MultiBinary. This converter makes them easy to manipulate with these frameworks.

MultiBinary spaces are re-encoded as a gym Tuple of gym Discrete variables.

TODO add code example

Methods:

g2op_to_gym(g2op_object)

Convert a grid2op object to a gym object

gym_to_g2op(gym_object)

Convert a gym object to a grid2op object

g2op_to_gym(g2op_object)[source]

Convert a grid2op object to a gym object

Parameters

g2op_object – An object (action or observation) represented as a grid2op.Action.BaseAction or grid2op.Observation.BaseObservation

Returns

Return type

The same object, represented as a gym “ordered dictionary”

gym_to_g2op(gym_object)[source]

Convert a gym object to a grid2op object

Parameters

gym_object – An object (action or observation) represented as a gym “ordered dictionary”

Returns

Return type

The same object, represented as a grid2op.Action.BaseAction or grid2op.Observation.BaseObservation.

class grid2op.gym_compat.ScalerAttrConverter(substract, divide, dtype=None, init_space=None)[source]

This is a scaler that transforms an initial gym space init_space into its scaled version.

It can be used, for example, to scale an observation by subtracting the mean and dividing by the variance.

TODO work in progress !

Need help if you can :-)

class grid2op.gym_compat.gym_space_converter._BaseGymSpaceConverter(dict_gym_space, dict_variables=None)[source]

INTERNAL

Warning

/!\ Internal, do not use unless you know what you are doing /!\

Used as a base class to convert a grid2op state to a gym state (wrapper providing some useful functions for both the action space and the observation space).

Methods:

add_key(key_name, function, return_type)

Allows to add an arbitrary function to the gym representation of the action space or the observation space.

get_dict_encoding()

TODO examples and description

ignore_attr(attr_names)

ignore some attribute names from the space

keep_only_attr(attr_names)

keep only a certain part of the observation

reenc(key, fun)

shorthand for GymObservationSpace.reencode_space() or GymActionSpace.reencode_space()

reencode_space(key, func)

TODO examples and description

seed([seed])

Seed the PRNG of this space.

add_key(key_name, function, return_type)[source]

Allows to add an arbitrary function to the gym representation of the action space or the observation space.

TODO NB: this key is not used when converting back to a grid2op object. As of now, we don’t recommend using it for the action space!

See the example for more information.

Parameters
  • key_name – The name of the key you want to add to the gym space

  • function – A function that takes as input a grid2op object and returns the value for this key

  • return_type – The gym space describing the values returned by function

Examples

In the example below, we explain how to add the “connectivity_matrix” as part of the observation space (when converted to gym). The new key “connectivity matrix” will be added to the gym observation.

# create a grid2op environment
import grid2op
env_name = "l2rpn_case14_sandbox"
env_glop = grid2op.make(env_name)

# convert it to gym
import gym
import numpy as np
from grid2op.gym_compat import GymEnv
env_gym = GymEnv(env_glop)

# default gym environment, the connectivity matrix is not computed
obs_gym = env_gym.reset()
print(f"Is the connectivity matrix part of the observation in gym: {'connectivity_matrix' in obs_gym}")

# add the "connectivity matrix" as part of the observation in gym
from gym.spaces import Box
shape_ = (env_glop.dim_topo, env_glop.dim_topo)
env_gym.observation_space.add_key("connectivity_matrix",
                                  lambda obs: obs.connectivity_matrix(),
                                  Box(shape=shape_,
                                      low=np.zeros(shape_),
                                      high=np.ones(shape_),
                                    )
                                  )

# we highly recommend to "reset" the environment after setting up the observation space

obs_gym = env_gym.reset()
print(f"Is the connectivity matrix part of the observation in gym: {'connectivity_matrix' in obs_gym}")

get_dict_encoding()[source]

TODO examples and description

ignore_attr(attr_names)[source]

ignore some attribute names from the space

keep_only_attr(attr_names)[source]

keep only a certain part of the observation

reenc(key, fun)[source]

shorthand for GymObservationSpace.reencode_space() or GymActionSpace.reencode_space()

reencode_space(key, func)[source]

TODO examples and description

seed(seed=None)[source]

Seed the PRNG of this space. See openAI gym issue https://github.com/openai/gym/issues/2166.

Legacy version

If you are interested by this feature, we recommend you to proceed like this:

import grid2op
from grid2op.gym_compat import GymActionSpace, GymObservationSpace
from grid2op.Agent import BaseAgent

class MyAgent(BaseAgent):
   def __init__(self, action_space, observation_space):
      BaseAgent.__init__(self, action_space)
      self.gym_obs_space = GymObservationSpace(observation_space)
self.gym_action_space = GymActionSpace(action_space)

   def act(self, obs, reward, done=False):
      # convert the observation to gym like one:
      gym_obs = self.gym_obs_space.to_gym(obs)

      # do whatever you want, as long as you retrieve a gym-like action
      gym_action = ...
      grid2op_action = self.gym_action_space.from_gym(gym_action)
      # NB advanced usage: if action_space is a grid2op.converter (for example coming from IdToAct)
      # then what's called  "grid2op_action" is in fact an action that can be understood by the converter.
      # to convert it back to grid2op action you need to convert it. See the documentation of GymActionSpace
      # for such purpose.
      return grid2op_action

env = grid2op.make(...)
my_agent = MyAgent(env.action_space, env.observation_space)

# and now do anything you like
# for example
done = False
reward = env.reward_range[0]
obs = env.reset()
while not done:
   action = my_agent.act(obs, reward, done)
   obs, reward, done, info = env.step(action)

We also implemented some “converter” that allow the conversion of some action space into more convenient gym.spaces (this is only available if gym is installed of course). Please check grid2op.gym_compat.GymActionSpace for more information and examples.


Still having trouble finding the information? Do not hesitate to open a github issue about the documentation at this link: Documentation issue template