Reward

Objectives

This module implements some utilities to compute rewards given a grid2op.Action, a grid2op.Environment and some associated context (like whether there has been an error, etc.).

It is possible to modify the reward that is used, to better suit a training scheme or to better take into account some phenomenon, for instance by simulating the effect of some grid2op.Action using grid2op.Observation.BaseObservation.simulate().

Doing so only requires deriving from BaseReward and implementing, most notably, the three abstract methods BaseReward.__init__(), BaseReward.initialize() and BaseReward.__call__().

Training with multiple rewards

In the standard reinforcement learning framework the reward is unique. In grid2op, we didn’t want to modify that.

However, power grids are complex environments with some specific and unusual dynamics. For these reasons it can be difficult to compress all these signals into one single scalar. To speed up the learning process, to push the Agent towards more resilient strategies, etc., it can be useful to look at different aspects, and thus use different rewards. Grid2op allows to do so. At each time step (and also when using the simulate function) it is possible to compute different rewards. These rewards must inherit from BaseReward and be provided at the initialization of the Environment.

This can be done as follows:

import grid2op
from grid2op.Reward import GameplayReward, L2RPNReward
env = grid2op.make("case14_realistic", reward_class=L2RPNReward, other_rewards={"gameplay": GameplayReward})
obs = env.reset()
act = env.action_space()  # the do nothing action
obs, reward, done, info = env.step(act)  # implement the do nothing action on the environment

In this example, “reward” comes from the L2RPNReward, and the result of the reward computed with the GameplayReward is accessible with info["rewards"]["gameplay"]. For this example we chose to name this other reward “gameplay”, which relates to the name of the class GameplayReward, for convenience. The name can be absolutely any string you want.
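
For instance, continuing the snippet above, both values can be read after the call to env.step:

# "reward" is the value computed by the L2RPNReward class (the reward_class)
print(reward)
# the value computed by GameplayReward is available under the key chosen in other_rewards
print(info["rewards"]["gameplay"])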

NB In the case of L2RPN competitions, the reward can be modified by the competitors, and so can the “other_rewards” keyword argument. The only restriction is that the key “__score” will be used by the organizers to compute the score of the agent. Any attempt to modify it will be erased by the score function used by the organizers without any warning.

Detailed Documentation by class

Classes:

BaseReward()

Base class from which all rewards used in the Grid2Op framework should derive.

BridgeReward([min_pen_lte, max_pen_gte])

This reward computes a penalty based on how many bridges are present in the grid network.

CloseToOverflowReward([max_lines])

This reward finds all lines close to overflowing.

CombinedReward()

This class allows combining multiple pre-defined rewards.

CombinedScaledReward()

This class allows combining multiple rewards.

ConstantReward()

Most basic implementation of reward: everything has the same value: 0.0

DistanceReward()

This reward computes a penalty based on the distance of the current grid to the grid at time 0 where everything is connected to bus 1.

EconomicReward()

This reward computes the marginal cost of the powergrid.

FlatReward([per_timestep])

This reward returns a fixed number (if there is no error) or 0 if there is an error.

GameplayReward()

This reward is strictly computed based on the game status.

IncreasingFlatReward([per_timestep])

This reward just counts the number of timesteps the agent has successfully managed to perform.

L2RPNReward()

This is the historical BaseReward used for the Learning To Run a Power Network competition on WCCI 2019

L2RPNSandBoxScore([alpha_redisph])

Warning

/!\ Internal, do not use unless you know what you are doing /!\

LinesCapacityReward()

Reward based on lines capacity usage. Returns max reward if no current is flowing in the lines, and min reward if all lines are used at max capacity.

LinesReconnectedReward()

This reward computes a penalty based on the number of powerlines that could have been reconnected (cooldown at 0.) but are still disconnected.

RedispReward([alpha_redisph])

This reward can be used for environments where redispatching is available.

RewardHelper([rewardClass])

Warning

/!\ Internal, do not use unless you know what you are doing /!\

class grid2op.Reward.BaseReward[source]

Base class from which all rewards used in the Grid2Op framework should derive.

In reinforcement learning, a reward is a signal sent by the grid2op.Environment.Environment to the grid2op.BaseAgent indicating how well this agent performs.

One of the goals of Reinforcement Learning is to maximize the (discounted) sum of (expected) rewards over time.

You can create any reward you want in grid2op. The only requirement is that all rewards should inherit from this BaseReward.

reward_min

The minimum reward a grid2op.Agent.BaseAgent can get when performing the worst possible grid2op.Action.BaseAction in the worst possible scenario.

Type

float

reward_max

The maximum reward a grid2op.Agent.BaseAgent can get when performing the best possible grid2op.Action.BaseAction in the best possible scenario.

Type

float

Examples

If you want the environment to compute a reward that is the sum of the flows (this is not a good reward, but we use it as an example of how to do it), you can achieve it with:
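
A minimal sketch (the class name SumOfFlowReward and the choice of reward_min / reward_max are only illustrative):

import grid2op
import numpy as np
from grid2op.Reward import BaseReward

class SumOfFlowReward(BaseReward):
    def __init__(self):
        BaseReward.__init__(self)

    def initialize(self, env):
        # illustrative bounds: 0 A on every line at worst, every line at its thermal limit at best
        self.reward_min = 0.
        self.reward_max = np.sum(env.get_thermal_limit())

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error:
            # do not use env.get_obs() when has_error is True (see the Notes below)
            return self.reward_min
        # sum of the current flows (in amps) on all powerlines
        return np.sum(env.get_obs().a_or)

env = grid2op.make("case14_realistic", reward_class=SumOfFlowReward)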

Notes

If the flag has_error is set to True this indicates there has been an error in the “env.step” function. This might induce some undefined behaviour if you then use some methods of the environment.

Please make sure to check whether or not this is the case when defining your reward.

This “new” behaviour has been introduced to “fix” the awkward behavior spotted in https://github.com/rte-france/Grid2Op/issues/146

def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if has_error:
        # DO SOMETHING IN THIS CASE
        res = self.reward_min
    else:
        # DO NOT USE `env.get_obs()` (nor any method of the environment `env.XXX`)
        # if the flag `has_error` is set to ``True``: this might result in undefined behaviour
        res = np.sum(env.get_obs().a_or)  # assumes numpy has been imported as np
    return res

Methods:

__init__

Initialize self.

abstractmethod __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

abstractmethod __init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

for ... in __iter__()[source]

Implements the Python iterable protocol to get a dict summary, using summary = dict(reward_instance). Can be overloaded by subclasses; the default implementation gives name, reward_min and reward_max.

get_range()[source]

Shorthand to retrieve both the minimum and maximum possible rewards in one command.

It is not recommended to override this function.

Returns

res – The tuple (BaseReward.reward_min, BaseReward.reward_max)

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

set_range(reward_min, reward_max)[source]

Setter function for the BaseReward.reward_min and BaseReward.reward_max.

It is not recommended to override this function.

Parameters
  • reward_min (float) – The new minimum reward.

  • reward_max (float) – The new maximum reward.

class grid2op.Reward.BridgeReward(min_pen_lte=0.0, max_pen_gte=1.0)[source]

This reward computes a penalty based on how many bridges are present in the grid network. In graph theory, a bridge is an edge that if removed will cause the graph to be disconnected.

Examples

You can use this reward in any environment with:
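
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import BridgeReward
env = grid2op.make("case14_realistic", reward_class=BridgeReward)
obs = env.reset()
# the reward returned by env.step is now computed by the BridgeReward class
obs, reward, done, info = env.step(env.action_space())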

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__(min_pen_lte=0.0, max_pen_gte=1.0)[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

class grid2op.Reward.CloseToOverflowReward(max_lines=5)[source]

This reward finds all lines close to overflowing. It returns the max reward when there is no overflow, the min reward if more than one line is close to overflow, and the mean of the max and min rewards if exactly one line is close to overflow.

Examples

You can use this reward in any environment with:
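
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import CloseToOverflowReward
env = grid2op.make("case14_realistic", reward_class=CloseToOverflowReward)
obs = env.reset()
# the reward returned by env.step is now computed by the CloseToOverflowReward class
obs, reward, done, info = env.step(env.action_space())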

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__(max_lines=5)[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

class grid2op.Reward.CombinedReward[source]

This class allows combining multiple pre-defined rewards. The reward it computes will be the sum of all the sub-rewards it is made of.

Each sub reward is identified by a key.

It is used a bit differently than the other rewards. See the Examples section for more information.

Examples

import grid2op
from grid2op.Reward import GameplayReward, FlatReward, CombinedReward

env = grid2op.make(..., reward_class=CombinedReward)
cr = env.get_reward_instance()
cr.addReward("Gameplay", GameplayReward(), 1.0)
cr.addReward("Flat", FlatReward(), 1.0)
cr.initialize(env)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space())

# reward here is computed by summing what `GameplayReward`
# and `FlatReward` would each have given

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

for ... in __iter__()[source]

Implements the Python iterable protocol to get a dict summary, using summary = dict(reward_instance). Can be overloaded by subclasses; the default implementation gives name, reward_min and reward_max.

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

class grid2op.Reward.CombinedScaledReward[source]

This class allows combining multiple rewards. It computes a scaled reward from the weighted sum of the registered rewards. Scaling is done by linearly interpolating the weighted sum from the range [min_sum; max_sum] to [reward_min; reward_max].

min_sum and max_sum are computed from the weights and ranges of registered rewards. See Reward.BaseReward for setting the output range.

Examples

import grid2op
from grid2op.Reward import GameplayReward, FlatReward, CombinedScaledReward

env = grid2op.make(..., reward_class=CombinedScaledReward)
cr = env.get_reward_instance()
cr.addReward("Gameplay", GameplayReward(), 1.0)
cr.addReward("Flat", FlatReward(), 1.0)
cr.initialize(env)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space())

# reward here is the scaled weighted sum of what `GameplayReward`
# and `FlatReward` would each have given

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

initialize(env)[source]

Overloaded initialize from Reward.CombinedReward. This is because it needs to store the ranges internally.

class grid2op.Reward.ConstantReward[source]

Most basic implementation of reward: everything has the same value: 0.0

Note that this BaseReward subtype is not useful at all, neither to train a BaseAgent nor, of course, to assess its performance.

Examples

You can use this reward in any environment with:
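
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import ConstantReward
env = grid2op.make("case14_realistic", reward_class=ConstantReward)
obs = env.reset()
# the reward returned by env.step is now computed by the ConstantReward class
obs, reward, done, info = env.step(env.action_space())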

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

class grid2op.Reward.DistanceReward[source]

This reward computes a penalty based on the distance of the current grid to the grid at time 0 where everything is connected to bus 1.

Examples

You can use this reward in any environment with:
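
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import DistanceReward
env = grid2op.make("case14_realistic", reward_class=DistanceReward)
obs = env.reset()
# the reward returned by env.step is now computed by the DistanceReward class
obs, reward, done, info = env.step(env.action_space())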

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

class grid2op.Reward.EconomicReward[source]

This reward computes the marginal cost of the powergrid. As RL is about maximizing a reward while we want to minimize the cost, this class also ensures that:

  • the reward is positive if there is no game over, no error etc.

  • the reward is inversely proportional to the cost of the grid (the higher the reward, the lower the economic cost).

Examples

You can use this reward in any environment with:
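
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import EconomicReward
env = grid2op.make("case14_realistic", reward_class=EconomicReward)
obs = env.reset()
# the reward returned by env.step is now computed by the EconomicReward class
obs, reward, done, info = env.step(env.action_space())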

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

class grid2op.Reward.FlatReward(per_timestep=1)[source]

This reward returns a fixed number (if there is no error) or 0 if there is an error.

Examples

You can use this reward in any environment with:
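
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import FlatReward
env = grid2op.make("case14_realistic", reward_class=FlatReward)
obs = env.reset()
# the reward returned by env.step is now computed by the FlatReward class
obs, reward, done, info = env.step(env.action_space())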

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__(per_timestep=1)[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

class grid2op.Reward.GameplayReward[source]

This reward is strictly computed based on the game status. It yields a negative reward in case of game over, a half negative reward on rules infringement, and a positive reward otherwise.

Examples

You can use this reward in any environment with:
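
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import GameplayReward
env = grid2op.make("case14_realistic", reward_class=GameplayReward)
obs = env.reset()
# the reward returned by env.step is now computed by the GameplayReward class
obs, reward, done, info = env.step(env.action_space())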

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

class grid2op.Reward.IncreasingFlatReward(per_timestep=1)[source]

This reward just counts the number of timesteps the agent has successfully managed to perform.

It adds a constant reward for each time step successfully handled.

Examples

You can use this reward in any environment with:
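
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import IncreasingFlatReward
env = grid2op.make("case14_realistic", reward_class=IncreasingFlatReward)
obs = env.reset()
# the reward returned by env.step is now computed by the IncreasingFlatReward class
obs, reward, done, info = env.step(env.action_space())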

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__(per_timestep=1)[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

class grid2op.Reward.L2RPNReward[source]

This is the historical BaseReward used for the Learning To Run a Power Network competition on WCCI 2019

See L2RPN for more information.

This reward computes the sum of the “squared margins” over all powerlines.

The margin of a powerline is defined as: margin = (thermal limit - flow in amps) / thermal limit if the flow in amps is <= the thermal limit, and margin = 0 otherwise.

The reward is then: sum over all powerlines of (margin of the powerline) ^ 2.
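
As a rough numerical sketch of that formula (not the actual implementation), with the flows and thermal limits given as numpy arrays:

import numpy as np

def l2rpn_like_reward(flow_a, thermal_limit):
    # margin = (thermal limit - flow) / thermal limit when flow <= limit, 0 otherwise
    margin = np.clip((thermal_limit - flow_a) / thermal_limit, 0.0, None)
    # the reward is the sum of the squared margins over all powerlines
    return float(np.sum(margin ** 2))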

Examples

You can use this reward in any environment with:
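
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import L2RPNReward
env = grid2op.make("case14_realistic", reward_class=L2RPNReward)
obs = env.reset()
# the reward returned by env.step is now computed by the L2RPNReward class
obs, reward, done, info = env.step(env.action_space())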

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

class grid2op.Reward.L2RPNSandBoxScore(alpha_redisph=1.0)[source]

Warning

/!\ Internal, do not use unless you know what you are doing /!\ It must not serve as a reward. This score needs to be minimized, whereas a reward needs to be maximized! Also, this “reward” is not scaled or anything. Use it at your own risk.

Implemented as a reward to make it easier to use in the context of the L2RPN competitions, this “reward” computes the “grid operation cost”. It should not be used to train an agent.

The “reward” the closest to this score is given by the RedispReward class.

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__(alpha_redisph=1.0)[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

class grid2op.Reward.LinesCapacityReward[source]

Reward based on lines capacity usage. Returns max reward if no current is flowing in the lines, and min reward if all lines are used at max capacity.

Compared to L2RPNReward: this reward is linear (instead of quadratic) and only considers the capacities of connected lines.

Examples

You can use this reward in any environment with:
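
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import LinesCapacityReward
env = grid2op.make("case14_realistic", reward_class=LinesCapacityReward)
obs = env.reset()
# the reward returned by env.step is now computed by the LinesCapacityReward class
obs, reward, done, info = env.step(env.action_space())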

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

class grid2op.Reward.LinesReconnectedReward[source]

This reward computes a penalty based on the number of powerlines that could have been reconnected (cooldown at 0.) but are still disconnected.

Examples

You can use this reward in any environment with:
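
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import LinesReconnectedReward
env = grid2op.make("case14_realistic", reward_class=LinesReconnectedReward)
obs = env.reset()
# the reward returned by env.step is now computed by the LinesReconnectedReward class
obs, reward, done, info = env.step(env.action_space())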

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__()[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

class grid2op.Reward.RedispReward(alpha_redisph=5.0)[source]

This reward can be used for environments where redispatching is available. It assigns a cost to redispatching actions and penalizes the losses.

This is the closest reward to the score used for the L2RPN competitions.

Examples

You can use this reward in any environment with:
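
A minimal sketch (any grid2op environment works; “case14_realistic” is only an example):

import grid2op
from grid2op.Reward import RedispReward
env = grid2op.make("case14_realistic", reward_class=RedispReward)
obs = env.reset()
# the reward returned by env.step is now computed by the RedispReward class
obs, reward, done, info = env.step(env.action_space())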

Methods:

__init__

Initialize self.

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Method called to compute the reward.

Parameters
  • action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent

  • env (grid2op.Environment.Environment) – An environment instance properly initialized.

  • has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception being thrown when the action has been implemented in the environment.

  • is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)

  • is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception. In this case it has been replaced by “do nothing” by the environment. NB an illegal action is NOT an ambiguous action. See the description of the Action module: Illegal vs Ambiguous for more details.

  • is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception. In this case it has been replaced by “do nothing” by the environment. NB an ambiguous action is NOT an illegal action. See the description of the Action module: Illegal vs Ambiguous for more details.

Returns

res – The reward associated to the input parameters.

Return type

float

Notes

All the flags can be used to know in which type of situation the reward is computed.

For example, if has_error is True it means there was an error during the computation of the powerflow. This means there is a “game over”, so is_done is True in this case.

But if is_done is True while has_error is False, this means that the episode is over without any error. In other words, your agent successfully managed the whole scenario and got to the end of the episode.

__init__(alpha_redisph=5.0)[source]

Initializes BaseReward.reward_min and BaseReward.reward_max

initialize(env)[source]

If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

NB reward_min and reward_max are used by the environment to compute the maximum and minimum reward and cast them into the “reward_range”, which is part of the openAI gym public interface. If you don’t define them, some pieces of code might not work as expected.

Parameters

env (grid2op.Environment.Environment) – An environment instance properly initialized.

class grid2op.Reward.RewardHelper(rewardClass=<class 'grid2op.Reward.ConstantReward.ConstantReward'>)[source]

Warning

/!\ Internal, do not use unless you know what you are doing /!\ It is a class internal to the grid2op.Environment.Environment; do not use it outside of its purpose and do not attempt to modify it.

This class aims at making the creation of reward classes more automatic for the grid2op.Environment.

It is not recommended to derive from or modify this class. If a different reward needs to be used, it is recommended to build another object of this class and change the RewardHelper.rewardClass attribute.

rewardClass

Type of reward that will be used by this helper. Note that the type (and not an instance / object of that type) must be given here. It defaults to ConstantReward.

Type

type

template_reward

An object of class RewardHelper.rewardClass used to compute the rewards.

Type

BaseReward

Methods:

__init__

Initialize self.

Attributes:

__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]

Gives the reward that follows the execution of the grid2op.BaseAction.BaseAction action in the grid2op.Environment.Environment env;

Parameters
  • action (grid2op.Action.Action) – The action performed by the BaseAgent.

  • env (grid2op.Environment.Environment) – The current environment.

  • has_error (bool) – Did the action cause an error, such as a diverging powerflow for example (True: the action caused an error)

  • is_done (bool) – Is the game over (True = the game is over)

  • is_illegal (bool) – Is the action legal or not (True = the action was illegal). See grid2op.Exceptions.IllegalAction for more information.

  • is_ambiguous (bool) – Is the action ambiguous or not (True = the action was ambiguous). See grid2op.Exceptions.AmbiguousAction for more information.

Returns

res – The computed reward

Return type

float

__init__(rewardClass=<class 'grid2op.Reward.ConstantReward.ConstantReward'>)[source]

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

initialize(env)[source]

This function initializes the template_reward with the environment. It is used especially for using RewardHelper.range().

Parameters

env (grid2op.Environment.Environment) – The current used environment.

range()[source]

Provides the range of the rewards.

Returns

res – The minimum reward per time step (possibly infinity) and the maximum reward per timestep (possibly infinity)

Return type

(float, float)
