Compatibility with OpenAI Gym

The gym framework is widely used in reinforcement learning. Starting from version 1.2.0, we improved grid2op's compatibility with this framework.

Before grid2op 1.2.0, only some classes fully implemented the OpenAI Gym interface:

  • the grid2op.Environment (with methods such as env.reset, env.step etc.)

  • the grid2op.Agent (with methods such as agent.act etc.)

  • the creation of pre-defined environments (with grid2op.make)

Starting from 1.2.0, we implemented converters that automatically map the grid2op representation of the action space and the observation space into OpenAI Gym “spaces”. More precisely, these are represented as gym.spaces.Dict.

As of grid2op 1.4.0, we tightened the gap between OpenAI Gym and grid2op by introducing the dedicated module grid2op.gym_compat. Within this module there are lots of functionalities to convert a grid2op environment into a gym environment (one that inherits from gym.Env instead of “simply” implementing the OpenAI Gym interface).

A simple usage is:

import grid2op
from grid2op.gym_compat import GymEnv

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

# check that this is a properly defined gym environment:
import gym
print(f"Is gym_env an OpenAI Gym environment: {isinstance(gym_env, gym.Env)}")
# it shows "Is gym_env an OpenAI Gym environment: True"

Note

To stay as close to grid2op as possible, by default (using the method described above) the action space is encoded as a gym Dict whose keys are the attributes of a grid2op action. This might not be the best representation to perform RL with (some frameworks do not handle it well…)

For more customization on that side, please refer to the section Customizing the action and observation space, into Box or Discrete below


Observation space and action space customization

By default, the action space and observation space are gym.spaces.Dict with the keys being the attribute to modify.

Default Observations space

For example, an observation space will look like:

  • “_shunt_p”: Box(env.n_shunt,) [type: float, low: -inf, high: inf]

  • “_shunt_q”: Box(env.n_shunt,) [type: float, low: -inf, high: inf]

  • “_shunt_v”: Box(env.n_shunt,) [type: float, low: -inf, high: inf]

  • “_shunt_bus”: Box(env.n_shunt,) [type: int, low: -inf, high: inf]

  • “a_ex”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “a_or”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “actual_dispatch”: Box(env.n_gen,)

  • “attention_budget”: Box(1,) [type: float, low: 0, high: inf]

  • “current_step”: Box(1,) [type: int, low: -inf, high: inf]

  • “curtailment”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “curtailment_limit”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “curtailment_limit_effective”: Box(env.n_gen,) [type: float, low: 0., high: 1.0]

  • “day”: Discrete(32)

  • “day_of_week”: Discrete(8)

  • “delta_time”: Box(1,) [type: float, low: 0, high: inf]

  • “duration_next_maintenance”: Box(env.n_line,) [type: int, low: -1, high: inf]

  • “gen_p”: Box(env.n_gen,) [type: float, low: env.gen_pmin, high: env.gen_pmax * 1.2]

  • “gen_p_before_curtail”: Box(env.n_gen,) [type: float, low: env.gen_pmin, high: env.gen_pmax * 1.2]

  • “gen_q”: Box(env.n_gen,) [type: float, low: -inf, high: inf]

  • “gen_v”: Box(env.n_gen,) [type: float, low: 0, high: inf]

  • “gen_margin_up”: Box(env.n_gen,) [type: float, low: 0, high: env.gen_max_ramp_up]

  • “gen_margin_down”: Box(env.n_gen,) [type: float, low: 0, high: env.gen_max_ramp_down]

  • “hour_of_day”: Discrete(24)

  • “is_alarm_illegal”: Discrete(2)

  • “line_status”: MultiBinary(env.n_line)

  • “load_p”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “load_q”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “load_v”: Box(env.n_load,) [type: float, low: -inf, high: inf]

  • “max_step”: Box(1,) [type: int, low: -inf, high: inf]

  • “minute_of_hour”: Discrete(60)

  • “month”: Discrete(13)

  • “p_ex”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “p_or”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “q_ex”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “q_or”: Box(env.n_line,) [type: float, low: -inf, high: inf]

  • “rho”: Box(env.n_line,) [type: float, low: 0., high: inf]

  • “storage_charge”: Box(env.n_storage,) [type: float, low: 0., high: env.storage_Emax]

  • “storage_power”: Box(env.n_storage,) [type: float, low: -env.storage_max_p_prod, high: env.storage_max_p_absorb]

  • “storage_power_target”: Box(env.n_storage,) [type: float, low: -env.storage_max_p_prod, high: env.storage_max_p_absorb]

  • “target_dispatch”: Box(env.n_gen,)

  • “theta_or”: Box(env.n_line,) [type: float, low: -180., high: 180.]

  • “theta_ex”: Box(env.n_line,) [type: float, low: -180., high: 180.]

  • “load_theta”: Box(env.n_load,) [type: float, low: -180., high: 180.]

  • “gen_theta”: Box(env.n_gen,) [type: float, low: -180., high: 180.]

  • “storage_theta”: Box(env.n_storage,) [type: float, low: -180., high: 180.]

  • “time_before_cooldown_line”: Box(env.n_line,) [type: int, low: 0, high: depending on parameters]

  • “time_before_cooldown_sub”: Box(env.n_sub,) [type: int, low: 0, high: depending on parameters]

  • “time_next_maintenance”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “time_since_last_alarm”: Box(1,) [type: int, low: -1, high: inf]

  • “timestep_overflow”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “thermal_limit”: Box(env.n_line,) [type: int, low: 0, high: inf]

  • “topo_vect”: Box(env.dim_topo,) [type: int, low: -1, high: 2]

  • “v_ex”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “v_or”: Box(env.n_line,) [type: float, low: 0, high: inf]

  • “was_alarm_used_after_game_over”: Discrete(2)

  • “year”: Discrete(2100)

Each key corresponds to an attribute of the observation. In this example, “line_status”: MultiBinary(20) represents the attribute obs.line_status, a boolean vector where, for each powerline, True encodes “connected” and False “disconnected”. See the chapter Observation for more information about these attributes.
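
To make this mapping concrete, here is a minimal, self-contained sketch (plain numpy, no grid2op required; the sizes and values below are made up for a toy grid) of what such a Dict-style observation looks like:

```python
import numpy as np

# Hypothetical sizes for a tiny grid with 3 powerlines and 2 generators.
n_line, n_gen = 3, 2

# A gym Dict observation is essentially a mapping from attribute name
# to a fixed-shape array, exactly like the keys listed above.
gym_obs = {
    "line_status": np.array([True, True, False]),          # MultiBinary(n_line)
    "rho": np.array([0.45, 0.90, 0.0], dtype=np.float32),  # Box(n_line,), low=0
    "gen_p": np.array([80.0, 35.5], dtype=np.float32),     # Box(n_gen,)
}

# Each key can be checked against its declared shape and bounds:
assert gym_obs["line_status"].shape == (n_line,)
assert gym_obs["gen_p"].shape == (n_gen,)
# "rho" (line flow as a fraction of thermal limit) is bounded below by 0:
assert (gym_obs["rho"] >= 0).all()
```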

Default Action space

The default action space is also a type of gym Dict. As for the observation space above, it is a straight translation from the attribute of the action to the key of the dictionary. This gives:

  • “change_bus”: MultiBinary(env.dim_topo)

  • “change_line_status”: MultiBinary(env.n_line)

  • “curtail”: Box(env.n_gen) [type: float, low=0., high=1.0]

  • “redispatch”: Box(env.n_gen) [type: float, low=-env.gen_max_ramp_down, high=env.gen_max_ramp_up]

  • “set_bus”: Box(env.dim_topo) [type: int, low=-1, high=2]

  • “set_line_status”: Box(env.n_line) [type: int, low=-1, high=1]

  • “storage_power”: Box(env.n_storage) [type: float, low=-env.storage_max_p_prod, high=env.storage_max_p_absorb]
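
As an illustration (a sketch in plain numpy with hypothetical sizes, not grid2op's own validation code), a dict action can be checked against the bounds listed above:

```python
import numpy as np

# Hypothetical dimensions, as would come from the environment.
dim_topo, n_line, n_gen = 8, 3, 2

action = {
    "change_line_status": np.array([0, 1, 0], dtype=np.int8),   # MultiBinary(n_line)
    "set_bus": np.array([0, 1, 2, -1, 0, 0, 1, 2], dtype=int),  # int in [-1, 2]
    "curtail": np.array([0.5, 1.0], dtype=np.float32),          # float in [0, 1]
}

def is_valid(act):
    """Check each key against the bounds given in the list above."""
    return (
        set(np.unique(act["change_line_status"])) <= {0, 1}
        and act["set_bus"].min() >= -1 and act["set_bus"].max() <= 2
        and (0.0 <= act["curtail"]).all() and (act["curtail"] <= 1.0).all()
    )

print(is_valid(action))  # this particular action respects every bound
```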

Customizing the action and observation space

We offer some convenience functions to customize these spaces.

If you want full control over these spaces, you need to implement something like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv
# this of course will not work... Replace "AGymSpace" with a normal gym space, like Dict, Box, MultiDiscrete etc.
from gym.spaces import AGymSpace
gym_env = GymEnv(env)

class MyCustomObservationSpace(AGymSpace):
    def __init__(self, whatever, you, want):
        # do as you please here
        pass
        # don't forget to initialize the base class
        AGymSpace.__init__(self, see, gym, doc, as, to, how, to, initialize, it)
        # eg. Box.__init__(self, low=..., high=..., dtype=float)

    def to_gym(self, observation):
        # this is the function you need to implement
        # it should have this exact name, take only one grid2op observation as input
        # and return a gym object that belongs to your space "AGymSpace"
        return SomethingThatBelongsTo_AGymSpace
        # eg. return np.concatenate((observation.gen_p * 0.1, np.sqrt(observation.load_p)))

gym_env.observation_space = MyCustomObservationSpace(whatever, you, wanted)

And for the action space:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv
# this of course will not work... Replace "AGymSpace" with a normal gym space, like Dict, Box, MultiDiscrete etc.
from gym.spaces import AGymSpace
gym_env = GymEnv(env)

class MyCustomActionSpace(AGymSpace):
    def __init__(self, whatever, you, want):
        # do as you please here
        pass
        # don't forget to initialize the base class
        AGymSpace.__init__(self, see, gym, doc, as, to, how, to, initialize, it)
        # eg. MultiDiscrete.__init__(self, nvec=...)

    def from_gym(self, gym_action):
        # this is the function you need to implement
        # it should have this exact name, take only one action (a member of your gym space) as input
        # and return a grid2op action
        return TheGymAction_ConvertedTo_Grid2op_Action
        # eg. build the grid2op action from the content of gym_action

gym_env.action_space = MyCustomActionSpace(whatever, you, wanted)

Customizing the action and observation space, using Converter

However, if you don’t want to fully customize everything, we encourage you to have a look at the “GymConverter” that we coded to ease this process.

They all work in more or less the same manner. We show here an example of a “converter” that scales the data (it subtracts the value substract from the input data and divides the result by divide):

import grid2op
from grid2op.gym_compat import GymEnv
from grid2op.gym_compat import ScalerAttrConverter

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

ob_space = gym_env.observation_space
ob_space = ob_space.reencode_space("actual_dispatch",
                                   ScalerAttrConverter(substract=0.,
                                                       divide=g2op_env.gen_pmax,
                                                       init_space=ob_space["actual_dispatch"]
                                                       )
                                   )

gym_env.observation_space = ob_space
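
Numerically, the transform applied here is simply (x - substract) / divide. A self-contained sketch of that arithmetic (the gen_pmax and dispatch values below are made up):

```python
import numpy as np

gen_pmax = np.array([150.0, 80.0, 120.0])       # hypothetical env.gen_pmax
actual_dispatch = np.array([75.0, -8.0, 30.0])  # a raw observation attribute

substract, divide = 0.0, gen_pmax
scaled = (actual_dispatch - substract) / divide

# Each generator's dispatch is now expressed as a fraction of its pmax,
# which keeps the values in a range RL frameworks handle well.
print(scaled)
```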

You can also add specific keys to this observation space. For example, say you want to give your agent the log of the loads instead of their direct value. This can be done with:

import grid2op
import numpy as np
from gym.spaces import Box
from grid2op.gym_compat import GymEnv

env_name = "l2rpn_case14_sandbox"  # or any other grid2op environment name
g2op_env = grid2op.make(env_name)  # create the grid2op environment

gym_env = GymEnv(g2op_env)  # create the gym environment

ob_space = gym_env.observation_space
shape_ = (g2op_env.n_load, )
ob_space = ob_space.add_key("log_load",
                            lambda obs: np.log(obs.load_p),
                            Box(shape=shape_,
                                low=np.full(shape_, fill_value=-np.inf, dtype=float),
                                high=np.full(shape_, fill_value=np.inf, dtype=float),
                                dtype=float
                                )
                            )

gym_env.observation_space = ob_space
# and now you will get the key "log_load" as part of your gym observation.

A detailed list of such “converters” is documented in the section “Detailed Documentation by class”. Below we describe some of them (if you notice a converter is missing, do not hesitate to send us a “feature request” for the documentation, thanks in advance).

  • ContinuousToDiscreteConverter: convert a continuous space into a discrete one

  • MultiToTupleConverter: convert a gym MultiBinary into a gym Tuple of gym Binary, and a gym MultiDiscrete into a Tuple of Discrete

  • ScalerAttrConverter: scale an attribute (subtract a value from it and divide it by another)

  • BaseGymSpaceConverter.add_key: compute another “part” of the observation space (add information to the gym space)

  • BaseGymSpaceConverter.keep_only_attr: specify which parts of the action / observation you want to keep

  • BaseGymSpaceConverter.ignore_attr: ignore some attributes of the action / observation (they will not be part of the gym space)
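
For instance, the kind of continuous-to-discrete mapping performed by ContinuousToDiscreteConverter can be sketched with plain numpy binning (the bin edges below are made up, not the ones the converter actually computes):

```python
import numpy as np

# Cut a continuous interval into 5 discrete bins using 4 edges (hypothetical).
edges = np.array([-3.0, -1.0, 1.0, 3.0])

continuous = np.array([-4.2, -0.5, 0.0, 2.7, 4.9])
discrete = np.digitize(continuous, edges)  # integer bin index per value

# Each continuous value is replaced by its bin index in {0, ..., 4},
# which a Discrete / MultiDiscrete space can then represent.
print(discrete)
```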

Note

With the “converters” above, note that the observation space AND the action space still inherit from the gym Dict.

These are complex spaces that are not well handled by some RL frameworks.

These converters only change the keys of these dictionaries!

Customizing the action and observation space, into Box or Discrete

The converters above are convenient if you can work with a gym Dict, but in some cases, or for some frameworks, this is not convenient at all.

To alleviate this problem, we developed a few dedicated gym spaces (one for the observation and several for the action), following the architecture detailed in the subsection Customizing the action and observation space:

  • BoxGymObsSpace: convert the observation space to a single “Box”

  • BoxGymActSpace: convert the action space to a single “Box”

  • MultiDiscreteActSpace: convert the action space to a single “MultiDiscrete”

  • DiscreteActSpace: convert the action space to a single “Discrete”

They can all be used like:

import grid2op
env_name = ...
env = grid2op.make(env_name)

from grid2op.gym_compat import GymEnv, BoxGymObsSpace, MultiDiscreteActSpace
gym_env = GymEnv(env)
gym_env.observation_space = BoxGymObsSpace(gym_env.init_env)
gym_env.action_space = MultiDiscreteActSpace(gym_env.init_env)
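
Conceptually, BoxGymObsSpace turns the Dict observation into one flat vector. A minimal numpy sketch of that idea (not the actual implementation, and with made-up attribute values):

```python
import numpy as np

# A Dict-style observation (hypothetical values, as in the default space above).
dict_obs = {
    "gen_p": np.array([80.0, 35.5]),
    "load_p": np.array([22.0, 51.0, 12.5]),
    "rho": np.array([0.45, 0.90, 0.10]),
}

# Box-style flattening: concatenate the selected attributes, in a fixed
# key order, into a single 1D vector that fits in one gym "Box".
flat = np.concatenate([dict_obs[k] for k in sorted(dict_obs)])

print(flat.shape)  # (8,) = 2 generators + 3 loads + 3 lines
```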

We encourage you to visit the documentation for more information on how to use these classes. Each offers different customization possibilities.

Detailed Documentation by class

Legacy version

If you are interested in this feature, we recommend you proceed like this:

import grid2op
from grid2op.gym_compat import GymActionSpace, GymObservationSpace
from grid2op.Agent import BaseAgent

class MyAgent(BaseAgent):
   def __init__(self, action_space, observation_space):
      BaseAgent.__init__(self, action_space)
      self.gym_obs_space = GymObservationSpace(observation_space)
      self.gym_action_space = GymActionSpace(action_space)

   def act(self, obs, reward, done=False):
      # convert the observation to gym like one:
      gym_obs = self.gym_obs_space.to_gym(obs)

      # do whatever you want, as long as you retrieve a gym-like action
      gym_action = ...
      grid2op_action = self.gym_action_space.from_gym(gym_action)
      # NB advanced usage: if action_space is a grid2op.converter (for example coming from IdToAct)
      # then what's called  "grid2op_action" is in fact an action that can be understood by the converter.
      # to convert it back to grid2op action you need to convert it. See the documentation of GymActionSpace
      # for such purpose.
      return grid2op_action

env = grid2op.make(...)
my_agent = MyAgent(env.action_space, env.observation_space)

# and now do anything you like
# for example
done = False
reward = env.reward_range[0]
obs = env.reset()
while not done:
   action = my_agent.act(obs, reward, done)
   obs, reward, done, info = env.step(action)

We also implemented some “converters” that allow the conversion of some action spaces into more convenient gym.spaces (this is only available if gym is installed, of course). Please check grid2op.gym_compat.GymActionSpace for more information and examples.

Still having trouble finding the information? Do not hesitate to open a github issue about the documentation at this link: Documentation issue template