Reward
Objectives
This module implements some utilities to compute rewards given a grid2op.Action,
a grid2op.Environment
and some associated context (for example, whether there has been an error).
It is possible to modify the reward used, to better suit a training scheme or to better take into account
some phenomenon, for instance by simulating the effect of some grid2op.Action
using
grid2op.Observation.BaseObservation.simulate().
Doing so only requires deriving from BaseReward,
and most notably implementing the three abstract methods
BaseReward.__init__(),
BaseReward.initialize()
and BaseReward.__call__()
Customization of the reward
In grid2op you can customize the reward function (or "reward kernel") used by your agent. By default, when you create an environment, a reward has already been specified for you by the creator of the environment and you have nothing to do:
import grid2op
env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name)
obs = env.reset()
an_action = env.action_space()
obs, reward_value, done, info = env.step(an_action)
The value of the reward above is computed by a default function that depends on
the environment you are using. For the example above, the “l2rpn_case14_sandbox” environment
uses the RedispReward.
Using a reward function available in grid2op
If you want to customize your environment by adapting the reward and use a reward already available in grid2op, it is rather simple: you need to specify it in the make command:
import grid2op
from grid2op.Reward import EpisodeDurationReward
env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name, reward_class=EpisodeDurationReward)
obs = env.reset()
an_action = env.action_space()
obs, reward_value, done, info = env.step(an_action)
In this example, reward_value is computed using the formula defined in the EpisodeDurationReward.
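Since this reward is only non zero on the terminal step, you will typically observe its value at the end of an episode. The short sketch below (a do-nothing agent on the same environment, shown purely for illustration) makes this behaviour visible:

import grid2op
from grid2op.Reward import EpisodeDurationReward

env = grid2op.make("l2rpn_case14_sandbox", reward_class=EpisodeDurationReward)
obs = env.reset()
done = False
while not done:
    # keep doing nothing until the episode ends
    obs, reward_value, done, info = env.step(env.action_space())

# reward_value was 0. at every intermediate step; on the terminal step it is the
# fraction of the episode the agent survived (1.0 if it reached the very end)
print(f"final reward: {reward_value}")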
Note
There is no error in the syntax. You need to provide the class and not an object of the class (see next paragraph for more information about that).
At the time of writing, the available reward functions are listed in the "Detailed Documentation by class" section below.
Among the provided rewards, you also have some convenience classes to combine different rewards: CombinedReward and CombinedScaledReward.
Basically, these two classes allow you to combine (sum) different rewards into a single one, as in the sketch below.
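For instance, a minimal sketch (adapted from the CombinedScaledReward example further down this page; the sub rewards and weights are only illustrative):

import grid2op
from grid2op.Reward import GameplayReward, FlatReward, CombinedScaledReward

env = grid2op.make("l2rpn_case14_sandbox", reward_class=CombinedScaledReward)

# register the sub rewards (name, instance, weight) on the reward instance used by the environment
cr = env.get_reward_instance()
cr.addReward("Gameplay", GameplayReward(), 1.0)
cr.addReward("Flat", FlatReward(), 1.0)
cr.initialize(env)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# "reward" is now the (scaled) weighted sum of what GameplayReward and FlatReward would have returned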
Passing an instance instead of a class
On some occasions, it might be easier to work with instances of classes (objects) rather than with classes themselves (especially if you want to customize the implementation used). You can do this without any issue:
import grid2op
from grid2op.Reward import N1Reward
env_name = "l2rpn_case14_sandbox"
n1_l1_reward = N1Reward(l_id=1) # this is an object and not a class.
env = grid2op.make(env_name, reward_class=n1_l1_reward)
obs = env.reset()
an_action = env.action_space()
obs, reward_value, done, info = env.step(an_action)
In this example, reward_value is computed as the maximum flow on all the powerlines after the disconnection of powerline 1 (because we specified l_id=1 at creation). If you want to know the maximum flow after the disconnection of powerline 5, you can do:
import grid2op
from grid2op.Reward import N1Reward
env_name = "l2rpn_case14_sandbox"
n1_l5_reward = N1Reward(l_id=5) # this is an object and not a class.
env = grid2op.make(env_name, reward_class=n1_l5_reward)
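And then use this environment exactly as before (a short continuation of the snippet above):

obs = env.reset()
an_action = env.action_space()
obs, reward_value, done, info = env.step(an_action)
# reward_value is now the maximum flow after the disconnection of powerline 5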
Customizing the reward for the “simulate”
In grid2op, you have the possibility to simulate the impact of an action
on some future steps with the use of obs.simulate(…) (see grid2op.Observation.BaseObservation.simulate())
or obs.get_forecast_env() (see grid2op.Observation.BaseObservation.get_forecast_env()).
These methods also compute rewards. Grid2op lets you customize how these rewards are computed. You can change this in multiple ways:
import grid2op
from grid2op.Reward import EpisodeDurationReward
env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name, reward_class=EpisodeDurationReward)
obs = env.reset()
an_action = env.action_space()
sim_obs, sim_reward, sim_d, sim_i = obs.simulate(an_action)
By default, sim_reward is computed with the same function as the environment, in this
example EpisodeDurationReward.
If for some reason you want to customize the formula used to compute sim_reward and cannot (or do not want to) modify the reward of the environment, you can:
import grid2op
from grid2op.Reward import EpisodeDurationReward
env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name)
obs = env.reset()
env.observation_space.change_reward(EpisodeDurationReward)
an_action = env.action_space()
sim_obs, sim_reward, sim_d, sim_i = obs.simulate(an_action)
next_obs, reward_value, done, info = env.step(an_action)
In this example, sim_reward is computed using the EpisodeDurationReward (on forecast data) and reward_value is computed using the default reward of “l2rpn_case14_sandbox” on the “real” time series data.
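The same customization also applies to the forecast environment returned by obs.get_forecast_env(): its rewards are computed with the class passed to env.observation_space.change_reward(...). A minimal sketch (assuming the environment provides forecasts, as "l2rpn_case14_sandbox" does):

import grid2op
from grid2op.Reward import EpisodeDurationReward

env = grid2op.make("l2rpn_case14_sandbox")
obs = env.reset()
env.observation_space.change_reward(EpisodeDurationReward)

# the forecast environment behaves like a regular environment, but on forecast data
forecast_env = obs.get_forecast_env()
f_obs = forecast_env.reset()
f_obs, f_reward, f_done, f_info = forecast_env.step(env.action_space())
# f_reward is computed with EpisodeDurationReward on the forecast data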
Creating a new reward
If you don't find any suitable reward function in grid2op (or in other packages) you might want to implement one yourself.
To that end, you need to implement a class that derives from BaseReward, like this:
import grid2op
from grid2op.Reward import BaseReward
from grid2op.Action import BaseAction
from grid2op.Environment import BaseEnv
class MyCustomReward(BaseReward):
    def __init__(self, whatever, you, want, logger=None):
        self.whatever = whatever  # store whatever attributes you need
        # some code needed
        ...
        super().__init__(logger)

    def __call__(self,
                 action: BaseAction,
                 env: BaseEnv,
                 has_error: bool,
                 is_done: bool,
                 is_illegal: bool,
                 is_ambiguous: bool) -> float:
        # only method really required.
        # called at each step to compute the reward.
        # this is where you need to code the "formula" of your reward
        ...

    def initialize(self, env: BaseEnv):
        # optional
        # called once, the first time the reward is used
        pass

    def reset(self, env: BaseEnv):
        # optional
        # called by the environment each time it is "reset"
        pass

    def close(self):
        # optional, called once when the environment is deleted
        pass
And then you can use your (custom) reward like any other:
import grid2op
from the_above_script import MyCustomReward
env_name = "l2rpn_case14_sandbox"
custom_reward = MyCustomReward(whatever=1, you=2, want=42)
env = grid2op.make(env_name, reward_class=custom_reward)
obs = env.reset()
an_action = env.action_space()
obs, reward_value, done, info = env.step(an_action)
And now reward_value is computed using the formula you defined in __call__.
Training with multiple rewards
In the standard reinforcement learning framework the reward is unique. In grid2op, we didn't want to modify that.
However, power grids are complex environments with some specific and unusual dynamics. For this reason it can be difficult to compress all these signals into one single scalar. To speed up the learning process, to force the agent to adopt more resilient strategies, etc., it can be useful to look at different aspects, thus using different rewards. Grid2op allows you to do so. At each time step (and also when using the simulate function) it is possible to compute different rewards. These rewards must inherit from BaseReward and be provided at the initialization of the Environment.
This can be done as follows:
import grid2op
from grid2op.Reward import GameplayReward, L2RPNReward
env = grid2op.make("case14_realistic", reward_class=L2RPNReward, other_rewards={"gameplay": GameplayReward})
obs = env.reset()
act = env.action_space() # the do nothing action
obs, reward, done, info = env.step(act) # implement the do nothing action on the environment
In this example, “reward” comes from the L2RPNReward,
and the result of the reward computed with the
GameplayReward
is accessible with info["rewards"]["gameplay"]. For this example we chose to name this other
reward "gameplay", which relates to the name of the class GameplayReward, for convenience. The name
can be absolutely any string you want.
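Concretely, continuing the snippet above, you can read both values like this:

# "reward" holds the main reward (L2RPNReward here)
# the extra reward is reported in the "info" dictionary returned by env.step
gameplay_value = info["rewards"]["gameplay"]
print(f"main reward: {reward}, gameplay reward: {gameplay_value}")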
NB In the case of L2RPN competitions, the reward can be modified by the competitors, and so can the "other_rewards" keyword argument. The only restriction is that the key "__score" will be used by the organizers to compute the score of the agent. Any attempt to modify it will be erased by the score function used by the organizers, without any warning.
What happens in the “reset”
TODO
Detailed Documentation by class
Classes:
AlarmReward: This reward is based on the "alarm feature" where the agent is asked to send information about potential issues on the grid.
AlertReward: This reward is based on the "alert feature" where the agent is asked to send information about potential line overload issues on the grid after unpredictable powerline disconnections (attacks of the opponent).
BaseReward: Base class from which all rewards used in the Grid2Op framework should derive.
BridgeReward: This reward computes a penalty based on how many bridges are present in the grid network.
CloseToOverflowReward: This reward finds all lines close to overflowing.
CombinedReward: This class allows to combine multiple predefined rewards.
CombinedScaledReward: This class allows to combine multiple rewards.
ConstantReward: Most basic implementation of reward: everything has the same value: 0.0.
DistanceReward: This reward computes a penalty based on the distance of the current grid to the grid at time 0 where everything is connected to bus 1.
EconomicReward: This reward computes the marginal cost of the powergrid.
EpisodeDurationReward: This reward will always be 0., unless at the end of an episode where it will return the number of steps made by the agent divided by the total number of steps possible in the episode.
FlatReward: This reward returns a fixed number (if there is no error) or 0 if there is an error.
GameplayReward: This reward is strictly computed based on the game status.
IncreasingFlatReward: This reward just counts the number of timesteps the agent has successfully managed to perform.
L2RPNReward: This is the historical reward of the L2RPN competitions.
LinesCapacityReward: Reward based on lines capacity usage. Returns max reward if no current is flowing in the lines, min reward if all lines are used at max capacity.
LinesReconnectedReward: This reward computes a penalty based on the number of powerlines that could have been reconnected (cooldown at 0.) but are still disconnected.
N1Reward: This class implements a reward that is inspired by the "n-1" criterion widely used in power systems.
RedispReward: This reward can be used for environments where redispatching is available.
A few additional INTERNAL reward classes are also part of the API; they are reserved for grid2op internal use.
- class grid2op.Reward.AlarmReward(logger=None)[source]
This reward is based on the “alarm feature” where the agent is asked to send information about potential issues on the grid.
In this case, when the environment is in a “game over” state (i.e. it’s the end), the reward is computed the following way:
- if the environment has been successfully managed until the end of the chronics, then 1.0 is returned
- if no alarm has been raised, then -1.0 is returned
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import AlarmReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=AlarmReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the AlarmReward class
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env): If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
reset(env): This method is called each time env is reset.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require to have a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast it in "reward_range" which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
  env (grid2op.Environment.Environment) – An environment instance properly initialized.
- reset(env)[source]
This method is called each time env is reset.
It can be useful, for example, if the reward depends on the length of the current chronics.
It does nothing by default.
- Parameters:
  env (grid2op.Environment.Environment) – The current environment.
Danger: this function should not modify self.reward_min nor self.reward_max! It might make it really hard for an agent to learn if you do so.
- class grid2op.Reward.AlertReward(logger=None, reward_min_no_blackout=-1.0, reward_min_blackout=-10.0, reward_max_no_blackout=1.0, reward_max_blackout=2.0, reward_end_episode_bonus=1.0)[source]
Note
DOC IN PROGRESS !
This reward is based on the “alert feature” where the agent is asked to send information about potential line overload issues on the grid after unpredictable powerline disconnections (attacks of the opponent). The alerts are assessed once per attack.
This reward is computed as follows:
- if an attack occurs and the agent survives env.parameters.ALERT_TIME_WINDOW steps, then:
  - if the agent sent an alert BEFORE the attack, the reward returns reward_min_no_blackout (-1 by default)
  - if the agent did not send an alert BEFORE the attack, the reward returns reward_max_no_blackout (1 by default)
- if an attack occurs and the agent "games over" within env.parameters.ALERT_TIME_WINDOW steps, then:
  - if the agent sent an alert BEFORE the attack, the reward returns reward_max_blackout (2 by default)
  - if the agent did not send an alert BEFORE the attack, the reward returns reward_min_blackout (-10 by default)
- whatever the attacks / no attacks / alerts / no alerts, if the scenario is completed until the end, then the agent receives reward_end_episode_bonus (1 by default)
In all other cases, including but not limited to:
- the agent games over but there has been no attack within the previous env.parameters.ALERT_TIME_WINDOW (12) steps
- there is no attack
the reward outputs 0.
This is then a “delayed reward”: you receive the reward (in general) env.parameters.ALERT_TIME_WINDOW steps after having sent the alert.
This is also a “sparse reward”: in the vast majority of cases it is 0. It is only non zero in case of blackout (at most once per episode) and each time an attack occurs (and in general there are relatively few attacks).
TODO explain a bit more in the “multi lines attacked”
See also
Alert feature section of the doc for more information
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import AlertReward

# then you create your environment with it:
# at time of writing, the only env supporting it is "l2rpn_idf_2023"
NAME_OF_THE_ENVIRONMENT = "l2rpn_idf_2023"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=AlertReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the AlertReward class
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger, reward_min_no_blackout, ...]): Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env): If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
reset(env): This method is called each time env is reset.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None, reward_min_no_blackout=-1.0, reward_min_blackout=-10.0, reward_max_no_blackout=1.0, reward_max_blackout=2.0, reward_end_episode_bonus=1.0)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- initialize(env: grid2op.Environment.BaseEnv)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require to have a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast it in "reward_range" which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
  env (grid2op.Environment.Environment) – An environment instance properly initialized.
- reset(env)[source]
This method is called each time env is reset.
It can be useful, for example, if the reward depends on the length of the current chronics.
It does nothing by default.
- Parameters:
  env (grid2op.Environment.Environment) – The current environment.
Danger: this function should not modify self.reward_min nor self.reward_max! It might make it really hard for an agent to learn if you do so.
- class grid2op.Reward.BaseReward(logger: Logger | None = None)[source]
Base class from which all rewards used in the Grid2Op framework should derive.
In reinforcement learning, a reward is a signal sent by the
grid2op.Environment.Environment
to the grid2op.BaseAgent,
indicating how well this agent performs.
One of the goals of Reinforcement Learning is to maximize the (discounted) sum of (expected) rewards over time.
You can create all the rewards you want in grid2op. The only requirement is that all rewards should inherit from this BaseReward.
- reward_min
  The minimum reward a grid2op.BaseAgent can get performing the worst possible grid2op.Action.BaseAction in the worst possible scenario.
  - Type: float
- reward_max
  The maximum reward a grid2op.Agent.BaseAgent can get performing the best possible grid2op.Action.BaseAction in the best possible scenario.
  - Type: float
Examples
If you want the environment to compute a reward that is the sum of the flows (this is not a good reward, but we use it as an example of how to do it) you can achieve it with:
import numpy as np
import grid2op
from grid2op.Reward import BaseReward

# first you create your reward class
class SumOfFlowReward(BaseReward):
    def __init__(self):
        BaseReward.__init__(self)

    def initialize(self, env):
        # this function is used to inform the class instance about the environment specification
        # you can use `env.n_line` or `env.n_load` or `env.get_thermal_limit()` for example
        # do not forget to initialize "reward_min" and "reward_max"
        self.reward_min = 0.
        self.reward_max = np.sum(env.get_thermal_limit())
        # in this case the maximum reward is obtained when computing the sum of the maximum flows
        # on each powerline

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        # this method is called at the end of 'env.step' to compute the reward
        # in our case we just want to sum the flow on each powerline because... why not...
        if has_error:
            # see the "Notes" paragraph for more information
            res = self.reward_min
        else:
            res = np.sum(env.get_obs().a_or)
        return res

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=SumOfFlowReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
assert np.sum(obs.a_or) == reward  # the above should be true
Notes
If the flag has_error is set to True, this indicates there has been an error in the "env.step" function. This might induce some undefined behaviour if using some methods of the environment.
Please make sure to check whether or not this is the case when defining your reward.
This "new" behaviour has been introduced to "fix" the awkward behavior spotted in https://github.com/rte-france/Grid2Op/issues/146

def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if has_error:
        # DO SOMETHING IN THIS CASE
        res = self.reward_min
    else:
        # DO NOT USE `env.get_obs()` (nor any method of the environment `env.XXX`) if the flag `has_error`
        # is set to ``True``
        # This might result in undefined behaviour
        res = np.sum(env.get_obs().a_or)
    return res
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
__iter__(): Implements python iterable to get a dict summary using summary = dict(reward_instance). Can be overloaded by a subclass; the default implementation gives name, reward_min, reward_max.
close(): Override this for certain rewards that might need specific behaviour.
get_range(): Shorthand to retrieve both the minimum and maximum possible rewards in one command.
initialize(env): If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
reset(env): This method is called each time env is reset.
set_range(reward_min, reward_max): Setter function for BaseReward.reward_min and BaseReward.reward_max.
Attributes:
__weakref__: list of weak references to the object (if defined)
- abstractmethod __call__(action: BaseAction, env: BaseEnv, has_error: bool, is_done: bool, is_illegal: bool, is_ambiguous: bool) float [source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger: Logger | None = None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- for ... in __iter__()[source]
Implements python iterable to get a dict summary using summary = dict(reward_instance) Can be overloaded by subclass, default implementation gives name, reward_min, reward_max
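For example, a tiny sketch (using ConstantReward purely as an illustration):

from grid2op.Reward import ConstantReward

reward_instance = ConstantReward()
summary = dict(reward_instance)
# by default the summary contains the reward name, reward_min and reward_max
print(summary)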
- __weakref__
list of weak references to the object (if defined)
- get_range()[source]
Shorthand to retrieve both the minimum and maximum possible rewards in one command.
It is not recommended to override this function.
- Returns:
  - reward_min (float) – The minimum reward, see BaseReward.reward_min
  - reward_max (float) – The maximum reward, see BaseReward.reward_max
- initialize(env: BaseEnv) None [source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require to have a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast it in "reward_range" which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
  env (grid2op.Environment.Environment) – An environment instance properly initialized.
- reset(env: BaseEnv) None [source]
This method is called each time env is reset.
It can be useful, for example, if the reward depends on the length of the current chronics.
It does nothing by default.
- Parameters:
  env (grid2op.Environment.Environment) – The current environment.
Danger: this function should not modify self.reward_min nor self.reward_max! It might make it really hard for an agent to learn if you do so.
- set_range(reward_min: float, reward_max: float)[source]
Setter function for BaseReward.reward_min and BaseReward.reward_max.
It is not recommended to override this function.
- Parameters:
  - reward_min (float) – The minimum reward, see BaseReward.reward_min
  - reward_max (float) – The maximum reward, see BaseReward.reward_max
- class grid2op.Reward.BridgeReward(min_pen_lte=0.0, max_pen_gte=1.0, logger=None)[source]
This reward computes a penalty based on how many bridges are present in the grid network. In graph theory, a bridge is an edge that if removed will cause the graph to be disconnected.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import BridgeReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=BridgeReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with this class (computing the penalty based on the number of "bridges" in the grid)
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([min_pen_lte, max_pen_gte, logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(min_pen_lte=0.0, max_pen_gte=1.0, logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- class grid2op.Reward.CloseToOverflowReward(max_lines=5, logger=None)[source]
This reward finds all lines close to overflowing. It returns the max reward when there is no overflow, the min reward if more than one line is close to overflow, and the mean between max and min reward if exactly one line is close to overflow.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import CloseToOverflowReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=CloseToOverflowReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with this class (computing the penalty based on the number of overflows)
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([max_lines, logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env): If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(max_lines=5, logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require to have a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast it in "reward_range" which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
  env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.CombinedReward(logger=None)[source]
This class allows to combine multiple predefined rewards. The reward it computes will be the sum of all the sub rewards it is made of.
Each sub reward is identified by a key.
It is used a bit differently than the other rewards. See the Examples section for more information.
Examples
import grid2op
from grid2op.Reward import GameplayReward, FlatReward, CombinedReward

env = grid2op.make(..., reward_class=CombinedReward)
cr = env.get_reward_instance()
cr.addReward("Gameplay", GameplayReward(), 1.0)
cr.addReward("Flat", FlatReward(), 1.0)
cr.initialize(env)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# reward here is computed by summing the results of what would have
# given `GameplayReward` and the one from `FlatReward`
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
__iter__(): Implements python iterable to get a dict summary using summary = dict(reward_instance). Can be overloaded by a subclass; the default implementation gives name, reward_min, reward_max.
close(): Override this for certain rewards that might need specific behaviour.
initialize(env): If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- for ... in __iter__()[source]
Implements python iterable to get a dict summary using summary = dict(reward_instance) Can be overloaded by subclass, default implementation gives name, reward_min, reward_max
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require to have a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast it in "reward_range" which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
  env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.CombinedScaledReward(logger=None)[source]
This class allows to combine multiple rewards. It will compute a scaled reward of the weighted sum of the registered rewards. Scaling is done by linearly interpolating the weighted sum, from the range [min_sum; max_sum] to [reward_min; reward_max]
min_sum and max_sum are computed from the weights and ranges of the registered rewards. See Reward.BaseReward for setting the output range.
Examples
import grid2op
from grid2op.Reward import GameplayReward, FlatReward, CombinedScaledReward

env = grid2op.make(..., reward_class=CombinedScaledReward)
cr = env.get_reward_instance()
cr.addReward("Gameplay", GameplayReward(), 1.0)
cr.addReward("Flat", FlatReward(), 1.0)
cr.initialize(env)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# reward here is computed by summing the results of what would have
# given `GameplayReward` and the one from `FlatReward`
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
close(): Override this for certain rewards that might need specific behaviour.
initialize(env): Overloaded initialize from Reward.CombinedReward.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- class grid2op.Reward.ConstantReward(logger=None)[source]
Most basic implementation of reward: everything has the same value: 0.0
Note that this BaseReward subtype is not useful at all, neither to train a BaseAgent nor to assess its performance, of course.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import ConstantReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=ConstantReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is 0., always... Not really useful
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- class grid2op.Reward.DistanceReward(logger=None)[source]
This reward computes a penalty based on the distance of the current grid to the grid at time 0 where everything is connected to bus 1.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import DistanceReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=DistanceReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the DistanceReward class
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- class grid2op.Reward.EconomicReward(logger=None)[source]
This reward computes the marginal cost of the powergrid. As RL is about maximising a reward, while we want to minimize the cost, this class also ensures that:
the reward is positive if there is no game over, no error etc.
the reward is inversely proportional to the cost of the grid (the higher the reward, the lower the economic cost).
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import EconomicReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=EconomicReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the EconomicReward class
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env): If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require to have a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast it in "reward_range" which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
  env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.EpisodeDurationReward(per_timestep=1, logger=None)[source]
This reward will always be 0., unless at the end of an episode where it will return the number of steps made by the agent divided by the total number of steps possible in the episode.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import EpisodeDurationReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=EpisodeDurationReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the EpisodeDurationReward class
Notes
In case of an environment being "fast forwarded" (see grid2op.Environment.BaseEnv.fast_forward_chronics()), the steps "during" the fast forward are counted "as if" they were successful.
This means that if you "fast forward" up until the end of an episode, you are likely to receive a reward of 1.0.
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([per_timestep, logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env): If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
reset(env): This method is called each time env is reset.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(per_timestep=1, logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max.
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require to have a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast it in "reward_range" which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
  env (grid2op.Environment.Environment) – An environment instance properly initialized.
- reset(env)[source]
This method is called each time env is reset.
It can be useful, for example, if the reward depends on the length of the current chronics.
It does nothing by default.
- Parameters:
  env (grid2op.Environment.Environment) – The current environment.
Danger: this function should not modify self.reward_min nor self.reward_max! It might make it really hard for an agent to learn if you do so.
- class grid2op.Reward.FlatReward(per_timestep=1, logger=None)[source]
This reward returns a fixed number (if there is no error) or 0 if there is an error.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import FlatReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=FlatReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the FlatReward class
Methods:
__call__(action, env, has_error, is_done, ...): Method called to compute the reward.
__init__([per_timestep, logger]): Initializes BaseReward.reward_min and BaseReward.reward_max.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
  - action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
  - env (grid2op.Environment.Environment) – An environment instance properly initialized.
  - has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow exception thrown when the action has been implemented in the environment.
  - is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
  - is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
  - is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
  res – The reward associated to the input parameters.
- Return type:
  float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True it means there was an error during the computation of the powerflow, hence a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error: in other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(per_timestep=1, logger=None)[source]
Initializes
BaseReward.reward_min
andBaseReward.reward_max
- class grid2op.Reward.GameplayReward(logger=None)[source]
This reward is computed strictly from the game status. It yields a negative reward in case of game over, half that negative reward on rules infringement, and a positive reward otherwise.
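Read literally, the behaviour described above amounts to something like the following sketch; it is an illustration of the description only, not the actual implementation of GameplayReward:
def gameplay_like_reward(has_error, is_illegal, is_ambiguous,
                         reward_min=-1.0, reward_max=1.0):
    # illustrative logic only: the real class may use different bounds and conditions
    if has_error:
        return reward_min          # game over: full negative reward
    if is_illegal or is_ambiguous:
        return reward_min / 2.0    # rules infringement: half the negative reward
    return reward_max              # otherwise: positive reward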
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import GameplayReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=GameplayReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the GameplayReward class
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([logger]) – Initializes BaseReward.reward_min and BaseReward.reward_max
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- class grid2op.Reward.IncreasingFlatReward(per_timestep=1, logger=None)[source]
This reward simply counts the number of timesteps the agent has successfully managed to perform.
It adds a constant reward for each time step successfully handled.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import IncreasingFlatReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=IncreasingFlatReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the IncreasingFlatReward class
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([per_timestep, logger]) – Initializes BaseReward.reward_min and BaseReward.reward_max
initialize(env) – If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(per_timestep=1, logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into "reward_range", which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.L2RPNReward(logger=None)[source]
This is the historical BaseReward used for the Learning To Run a Power Network competition at WCCI 2019. See L2RPN for more information.
This reward computes the sum of the "squared margin" of each powerline.
The margin is defined, for each powerline, as: margin = (thermal limit - flow in amps) / thermal limit if flow in amps <= thermal limit, else margin = 0.
The reward is then the sum, over all powerlines, of (margin of this powerline) ^ 2.
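For intuition, here is a small numpy sketch of that formula with made-up flow values; it is not the code actually used by L2RPNReward:
import numpy as np

# made-up flows (in amps) and thermal limits for three powerlines
flow_a = np.array([120.0, 250.0, 80.0])
thermal_limit_a = np.array([200.0, 200.0, 100.0])

# margin = (thermal limit - flow) / thermal limit, forced to 0 for overloaded lines
margin = np.clip((thermal_limit_a - flow_a) / thermal_limit_a, 0.0, None)

# the reward is the sum of the squared margins
reward = float(np.sum(margin ** 2))
print(reward)  # 0.4**2 + 0.0**2 + 0.2**2 = 0.2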
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import L2RPNReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=L2RPNReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the L2RPNReward class
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([logger]) – Initializes BaseReward.reward_min and BaseReward.reward_max
initialize(env) – If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into "reward_range", which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.L2RPNSandBoxScore(alpha_redisp=1.0, alpha_loss=1.0, alpha_storage=1.0, alpha_curtailment=1.0, reward_max=1000.0, logger=None)[source]
INTERNAL
Warning
/!\ Internal, do not use unless you know what you are doing /!\ It must not serve as a reward. This score needs to be MINIMIZED, while a reward needs to be maximized! Also, this "reward" is not scaled in any way. Use it at your own risk.
Implemented as a reward to make it easier to use in the context of the L2RPN competitions, this "reward" computes the "grid operation cost". It should not be used to train an agent.
The "reward" closest to this score is given by the RedispReward class.
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([alpha_redisp, alpha_loss, ...]) – Initializes BaseReward.reward_min and BaseReward.reward_max
initialize(env) – If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(alpha_redisp=1.0, alpha_loss=1.0, alpha_storage=1.0, alpha_curtailment=1.0, reward_max=1000.0, logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into "reward_range", which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.L2RPNWCCI2022ScoreFun(storage_cost=10.0, alpha_redisp=1.0, alpha_loss=1.0, alpha_storage=1.0, alpha_curtailment=1.0, reward_max=1000.0, logger=None)[source]
INTERNAL
Warning
/!\ Internal, do not use unless you know what you are doing /!\ It must not serve as a reward. This score needs to be MINIMIZED, while a reward needs to be maximized! Also, this "reward" is not scaled in any way. Use it at your own risk.
Implemented as a reward to make it easier to use in the context of the L2RPN competitions, this "reward" computes the "grid operation cost". It should not be used to train an agent.
The "reward" closest to this score is given by the RedispReward class.
Methods:
__init__([storage_cost, alpha_redisp, ...]) – Initializes BaseReward.reward_min and BaseReward.reward_max
- __init__(storage_cost=10.0, alpha_redisp=1.0, alpha_loss=1.0, alpha_storage=1.0, alpha_curtailment=1.0, reward_max=1000.0, logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- class grid2op.Reward.LinesCapacityReward(logger=None)[source]
Reward based on lines capacity usage. Returns the maximum reward if no current is flowing in the lines and the minimum reward if all lines are used at maximum capacity.
Compared to L2RPNReward, this reward is linear (instead of quadratic) and only considers the capacities of connected lines.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import LinesCapacityReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=LinesCapacityReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the LinesCapacityReward class
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([logger]) – Initializes BaseReward.reward_min and BaseReward.reward_max
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- class grid2op.Reward.LinesReconnectedReward(logger=None)[source]
This reward computes a penalty based on the number of powerlines that could have been reconnected (cooldown at 0) but are still disconnected.
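As a rough illustration, such a count can be obtained from an observation using the standard attributes line_status and time_before_cooldown_line; the exact penalty computed by the class may differ:
import numpy as np
import grid2op

env = grid2op.make("l2rpn_case14_sandbox")
obs = env.reset()

# lines that are disconnected although their cooldown is over
could_be_reconnected = (~obs.line_status) & (obs.time_before_cooldown_line == 0)
print(int(np.sum(could_be_reconnected)))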
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import LinesReconnectedReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=LinesReconnectedReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the LinesReconnectedReward class
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([logger]) – Initializes BaseReward.reward_min and BaseReward.reward_max
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- class grid2op.Reward.N1Reward(l_id=0, logger=None)[source]
This class implements a reward inspired by the "n-1" criterion widely used in power systems.
More specifically, it returns the maximum flow (over all the powerlines) after a given powerline (specified as input) has been disconnected.
Examples
This can be used as:
import grid2op
from grid2op.Reward import N1Reward

L_ID = 0
env = grid2op.make("l2rpn_case14_sandbox",
                   reward_class=N1Reward(l_id=L_ID))
obs = env.reset()
obs, reward, *_ = env.step(env.action_space())
print(f"reward: {reward:.3f}")
print("We can check that it is exactly like 'simulate' on the current step "
      "the disconnection of the same powerline")
obs_n1, *_ = obs.simulate(env.action_space({"set_line_status": [(L_ID, -1)]}),
                          time_step=0)
print(f"max flow after disconnection of line {L_ID}: {obs_n1.rho.max():.3f}")
Notes
It is also possible to use the other_rewards argument to simulate multiple powerline disconnections, for example:
import grid2op
from grid2op.Reward import N1Reward

L_ID = 0
env = grid2op.make("l2rpn_case14_sandbox",
                   other_rewards={f"line_{l_id}": N1Reward(l_id=l_id)
                                  for l_id in [0, 1]})
obs = env.reset()
obs, reward, *_ = env.step(env.action_space())
print(f"reward: {reward:.3f}")
print("We can check that it is exactly like 'simulate' on the current step "
      "the disconnection of the same powerline")
obs_n1, *_ = obs.simulate(env.action_space({"set_line_status": [(L_ID, -1)]}),
                          time_step=0)
print(f"max flow after disconnection of line {L_ID}: {obs_n1.rho.max():.3f}")
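Note that rewards declared through other_rewards are not returned as the main reward of env.step; they are reported in the info dictionary instead (the "rewards" key below matches recent grid2op versions, double check with your release):
obs, reward, done, info = env.step(env.action_space())
# per-line N-1 values declared in other_rewards
print(info["rewards"])  # e.g. {"line_0": ..., "line_1": ...}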
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([l_id, logger]) – Initializes BaseReward.reward_min and BaseReward.reward_max
close() – Override this for rewards that need a specific closing behaviour.
initialize(env) – If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(l_id=0, logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into "reward_range", which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.RedispReward(logger=None)[source]
This reward can be used for environments where redispatching is available. It assigns a cost to redispatching actions and penalizes the losses.
This is the closest reward to the score used for the L2RPN competitions.
Examples
You can use this reward in any environment with:
import grid2op
from grid2op.Reward import RedispReward

# then you create your environment with it:
NAME_OF_THE_ENVIRONMENT = "l2rpn_case14_sandbox"
env = grid2op.make(NAME_OF_THE_ENVIRONMENT, reward_class=RedispReward)

# and do a step with a "do nothing" action
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
# the reward is computed with the RedispReward class
# NB this is the default reward of many environments in the grid2op framework
This class depends on some “meta parameters”. These meta parameters can be changed when the class is created in the following way:
import grid2op
from grid2op.Reward import RedispReward

reward_cls = RedispReward.generate_class_custom_params(alpha_redisph=5,
                                                       min_load_ratio=0.1,
                                                       worst_losses_ratio=0.05,
                                                       min_reward=-10.,
                                                       reward_illegal_ambiguous=0.,
                                                       least_losses_ratio=0.015)
env_name = "l2rpn_case14_sandbox"  # or any other name
env = grid2op.make(env_name, reward_class=reward_cls)
These meta parameters mean:
alpha_redisp: extra cost paid when performing redispatching. For 1 MW of redispatching done, you pay "alpha_redisph"
min_load_ratio: how to compute the minimum load on the grid, based on the total generation (sum of gen_pmax)
worst_losses_ratio: worst loss possible on the grid (5% is an upper bound for a normal grid)
min_reward: the minimum reward of this class (can be parametrized, and is only used when there is a game over)
reward_illegal_ambiguous: reward given when the action is illegal or ambiguous
least_losses_ratio: the minimum loss you can have (1.5% of the total demand should be a lower bound for a real grid)
Notes
On Windows and macOS, due to a compatibility issue with multi-processing, it is not possible to have different "RedispReward" classes with different meta parameters (see the "Examples" section).
Methods:
__call__(action, env, has_error, is_done, ...) – Method called to compute the reward.
__init__([logger]) – Initializes BaseReward.reward_min and BaseReward.reward_max
initialize(env) – If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Method called to compute the reward.
- Parameters:
action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent
env (grid2op.Environment.Environment) – An environment instance properly initialized.
has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerflow thrown when the action has been implemented in the environment.
is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over).
is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by "do nothing" by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.
- Returns:
res – The reward associated to the input parameters.
- Return type:
float
Notes
All the flags can be used to know on which type of situation the reward is computed.
For example, if has_error is True, it means there was an error during the computation of the powerflow. This implies a "game over", so is_done is True in this case.
But if is_done is True while has_error is False, the episode is over without any error. In other words, your agent successfully managed the whole scenario and reached the end of the episode.
- __init__(logger=None)[source]
Initializes BaseReward.reward_min and BaseReward.reward_max
- initialize(env)[source]
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.
NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into "reward_range", which is part of the openAI gym public interface. If you don't define them, some pieces of code might not work as expected.
- Parameters:
env (grid2op.Environment.Environment) – An environment instance properly initialized.
- class grid2op.Reward.RewardHelper(reward_func=<class 'grid2op.Reward.constantReward.ConstantReward'>, logger=None)[source]
INTERNAL
Warning
/!\ Internal, do not use unless you know what you are doing /!\ It is a class internal to the grid2op.Environment.Environment: do not use it outside of its purpose and do not attempt to modify it.
This class aims at making the creation of reward classes more automatic by the grid2op.Environment.
It is not recommended to derive or modify this class. If a different reward needs to be used, it is recommended to build another object of this class and change the RewardHelper.rewardClass attribute.
- rewardClass
Type of reward that will be used by this helper. Note that the type (and not an instance / object of that type) must be given here. It defaults to ConstantReward.
- Type:
type
- template_reward
An object of class RewardHelper.rewardClass used to compute the rewards.
- Type:
Methods:
__call__(action, env, has_error, is_done, ...) – Gives the reward that follows the execution of the grid2op.BaseAction.BaseAction action in the grid2op.Environment.Environment env.
__init__([reward_func, logger])
change_reward(reward_func) – INTERNAL
close() – Close the reward helper (in case there is a specific behaviour for certain rewards).
initialize(env) – This function initializes the template_reward with the environment.
range() – Provides the range of the rewards.
reset(env) – Called each time env.reset is invoked.
Attributes:
__weakref__ – list of weak references to the object (if defined)
- __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]
Gives the reward that follows the execution of the grid2op.BaseAction.BaseAction action in the grid2op.Environment.Environment env.
- Parameters:
action (grid2op.Action.Action) – The action performed by the BaseAgent.
env (grid2op.Environment.Environment) – The current environment.
has_error (bool) – Did the action cause an error, such as a diverging powerflow for example (True = the action caused an error).
is_done (bool) – Is the game over (True = the game is over).
is_illegal (bool) – Is the action legal or not (True = the action was illegal). See grid2op.Exceptions.IllegalAction for more information.
is_ambiguous (bool) – Is the action ambiguous or not (True = the action was ambiguous). See grid2op.Exceptions.AmbiguousAction for more information.
- Returns:
res – The computed reward
- Return type:
float
- __weakref__
list of weak references to the object (if defined)
- change_reward(reward_func)[source]
INTERNAL
Warning
/!\ Internal, do not use unless you know what you are doing /!\
Use env.change_reward instead (grid2op.Environment.BaseEnv.change_reward()).
- initialize(env)[source]
This function initializes the template_reward with the environment. It is used especially for using RewardHelper.range().
- Parameters:
env (grid2op.Environment.Environment) – The currently used environment.
If you still can't find what you're looking for, try one of the following pages:
Still having trouble finding the information? Do not hesitate to open a GitHub issue about the documentation at this link: Documentation issue template