.. currentmodule:: grid2op.Environment .. _environment-module: Environment =================================== This page is organized as follow: .. contents:: Table of Contents :depth: 3 Objectives ----------- This module defines the :class:`Environment` the higher level representation of the world with which an :class:`grid2op.Agent.BaseAgent` will interact. The environment receive an :class:`grid2op.Action.BaseAction` from the :class:`grid2op.Agent.BaseAgent` in the :func:`Environment.step` and returns an :class:`grid2op.Observation.BaseObservation` that the :class:`grid2op.Agent.BaseAgent` will use to perform the next action. An environment is better used inside a :class:`grid2op.Runner.Runner`, mainly because runners abstract the interaction between environment and agent, and ensure the environment are properly reset after each episode. .. _environment-module-usage: Usage ------ In this section we present some way to use the :class:`Environment` class. Basic Usage ++++++++++++ This example is adapted from gymnasium documentation available at `gym random_agent.py `_ ): .. code-block:: python import grid2op from grid2op.Agent import RandomAgent env = grid2op.make("l2rpn_case14_sandbox") agent = RandomAgent(env.action_space) env.seed(0) # for reproducible experiments episode_count = 100 # i want to make 100 episodes # i initialize some useful variables reward = 0 done = False total_reward = 0 # and now the loop starts for i in range(episode_count): obs = env.reset() while True: action = agent.act(obs, reward, done) obs, reward, done, info = env.step(action) total_reward += reward if done: # in this case the episode is over break # Close the env and write monitor result info to disk env.close() print("The total reward was {:.2f}".format(total_reward)) What happens here is the following: - `obs = env.reset()` will reset the environment to be usable again. It will load, by default the next "chronics" (you can imagine chronics as the graphics of a video game: it tells where the enemies are located, where are the walls, the ground etc. - each chronics can be thought a different "game level"). - `action = agent.act(obs, reward, done)` will chose an action facing the observation `ob`. This action should be of type :class:`grid2op.Action.BaseAction` (or one of its derivate class). In case of a video game that would be you receiving and observation (usually display on the screen) and action on a controller. For example you could chose to go "left" / "right" / "up" or "down". Of course in the case of the powergrid the actions are more complicated that than. - `obs, reward, done, info = env.step(action)` is the call to go to the next steps. You can imagine it as being a the next "frame". To continue the parallel with video games, at the previous line you asked "pacman" to go left (for example) and then the next frame is displayed (here returned as an new observation `obs`). You might want to customize this general behaviour in multiple way: - you might want to study only one chronics (equivalent to only one level of a video game) see `Study always the same time serie`_ - you might want to loop through the chronics, but not always in the same order. If that is the case you might want to consult the section `Shuffle the chronics order`_ - you might also have spotted some chronics that have bad properties. In this case, you can "remove" them from the environment (they will be ignored). This is explained in `Skipping some chronics`_ - you might also want to select at random, the next chronic you will use. This allows some compromise between all the above solution. Instead of ignoring some chronics you might want to select them less frequently, instead of always using the same one, you can sampling it more often and of course, because the sampling is done randomly it's unlikely that the order will remain the same. To use that you can check the `Sampling the chronics`_ In a different scenarios, you might also want to skip the first time steps of the chronics, that would be equivalent to starting into the "middle" of a video game. If that is the case, the subsection `Skipping some time steps`_ is made for you. Finally, you might have noticed that each call to "env.reset" might take a while. This can dramatically increase the training time, especially at the beginning. This is due to the fact that each time `env.reset` is called, the whole chronics is read from the hard drive. If you want to lower this impact then you might consult the :ref:`environment-module-data-pipeline` page of the doc. Go to the next scenario ++++++++++++++++++++++++ Starting grid2op 1.9.8 we attempt to make an easier user experience in the selection of time series, seed, initial state of the grid, etc. All of the above can be done when calling `env.reset()` function. For customizing the seed, you can for example do: .. code-block:: python import grid2op env_name = "l2rpn_case14_sandbox" env = grid2op.make(env_name) obs = env.reset(seed=0) For customizing the time series id you want to use: .. code-block:: python import grid2op env_name = "l2rpn_case14_sandbox" env = grid2op.make(env_name) obs = env.reset(options={"time serie id": 1}) # time serie by id (sorted alphabetically) # or obs = env.reset(options={"time serie id": "0001"}) # time serie by name (folder name) For customizing the initial state of the grid, for example forcing the powerline 0 to be disconnected in the initial observation: .. code-block:: python import grid2op env_name = "l2rpn_case14_sandbox" env = grid2op.make(env_name) init_state_dict = {"set_line_status": [(0, -1)]} obs = env.reset(options={"init state": init_state_dict}) Feel free to consult the documentation of the :func:`Environment.reset` function for more information (this doc might be outdated, the one of the function should be more up to date with the code). .. note:: In the near future (next few releases) we will also attempt to make the customization of the `parameters` or the `skip number of steps`, `maximum duration of the scenarios` also available in `env.reset()` options. .. _environment-module-chronics-info: Time series Customization ++++++++++++++++++++++++++ Study always the same time serie ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If you spotted a particularly interesting chronics, or if you want, for some reason your agent to see only one chronics, you can do this rather easily with grid2op. All chronics are given a unique persistent ID (it means that as long as the data is not modified the same chronics will have always the same ID each time you load the environment). The environment has a "set_id" method that allows you to use it. Just add "env.set_id(THE\\_ID\\_YOU\\_WANT)" before the call to "env.reset". This gives the following code: .. code-block:: python import grid2op from grid2op.Agent import RandomAgent env = grid2op.make("l2rpn_case14_sandbox") agent = RandomAgent(env.action_space) env.seed(0) # for reproducible experiments episode_count = 100 # i want to make 100 episodes ################################### THE_CHRONIC_ID = 42 ################################### # i initialize some useful variables reward = 0 done = False total_reward = 0 # and now the loop starts for i in range(episode_count): ################################### # with recent grid2op obs = env.reset(options={"time serie id": THE_CHRONIC_ID}) ################################### ################################### # 'old method (oldest grid2op version)' # env.set_id(THE_CHRONIC_ID) # obs = env.reset() ################################### # now play the episode as usual while True: action = agent.act(obs, reward, done) obs, reward, done, info = env.step(action) total_reward += reward if done: # in this case the episode is over break # Close the env and write monitor result info to disk env.close() print("The total reward was {:.2f}".format(total_reward)) (as always added line compared to the base code are highlighted: they are "circle" with `#####`) Shuffle the chronics order ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In some other usecase, you might want to go through the whole set of chronics, and then loop again through them, but in a different order (remember that by default it will always loop in the same order 0, 1, 2, 3, ..., 0, 1, 2, 3, ..., 0, 1, 2, 3, ...). Again, doing so with grid2op is rather easy. To that end you can use the `chronics_handler.shuffle` function that will do exactly that. You can use it like this: .. code-block:: python import numpy as np import grid2op from grid2op.Agent import RandomAgent env = grid2op.make("l2rpn_case14_sandbox") agent = RandomAgent(env.action_space) env.seed(0) # for reproducible experiments episode_count = 10000 # i want to make lots of episode # total number of episode total_episode = len(env.chronics_handler.subpaths) # i initialize some useful variables reward = 0 done = False total_reward = 0 # and now the loop starts for i in range(episode_count): ################################### if i % total_episode == 0: # I shuffle each time i need to env.chronics_handler.shuffle() ################################### obs = env.reset() # now play the episode as usual while True: action = agent.act(obs, reward, done) obs, reward, done, info = env.step(action) total_reward += reward if done: # in this case the episode is over break (as always added line compared to the base code are highlighted: they are "circle" with `#####`) Skipping some chronics ^^^^^^^^^^^^^^^^^^^^^^ Some chronics might be too hard to start a training ("learn to walk before running") and conversely some chronics might be too easy after a while (you can solve them without doing nothing basically). This is why grid2op allows you to have some control about which chronics will be used by the environment. For this purpose you can use the `chronics_handler.set_filter` function. This function takes a "filtering function" as argument. This "filtering function" takes as argument the full path of the chronics and should return ``True`` / ``False`` whether or not you want to keep the There is an example: .. code-block:: python import numpy as np import re import grid2op from grid2op.Agent import RandomAgent env = grid2op.make("l2rpn_case14_sandbox") agent = RandomAgent(env.action_space) env.seed(0) # for reproducible experiments ################################### # this is the only line of code to add # here i select only the chronics that start by "00" env.chronics_handler.set_filter(lambda path: re.match(".*00[0-9].*", path) is not None) kept = env.chronics_handler.reset() # if you don't do that it will not have any effect print(kept) # i print the chronics kept ################################### episode_count = 10000 # i want to make lots of episode # i initialize some useful variables reward = 0 done = False total_reward = 0 # and now the loop starts # it will only used the chronics selected for i in range(episode_count): obs = env.reset() # now play the episode as usual while True: action = agent.act(obs, reward, done) obs, reward, done, info = env.step(action) total_reward += reward if done: # in this case the episode is over break (as always added line compared to the base code are highlighted: they are "circle" with `#####`) Sampling the chronics ^^^^^^^^^^^^^^^^^^^^^^^ Finally, for even more flexibility, you can choose to sample what will be the next used chronics. To achieve that you can call the `chronics_handler.sample_next_chronics` This function takes a vector of probabilities as input (if not provided it assumes all probabilities are equal) and will select an id based on this probability vector. In the following example we assume that the vector of probabilities is always the same and that we want, for some reason oversampling the 10 first chronics, and under sample the last 10: .. code-block:: python import numpy as np import re import grid2op from grid2op.Agent import RandomAgent env = grid2op.make("l2rpn_case14_sandbox") agent = RandomAgent(env.action_space) env.seed(0) # for reproducible experiments episode_count = 10000 # i want to make lots of episode # i initialize some useful variables reward = 0 done = False total_reward = 0 ################################### # total number of episode total_episode = len(env.chronics_handler.subpaths) probas = np.ones(total_episode) # oversample the first 10 episode probas[:10]*= 5 # undersample the last 10 episode probas[-10:] /= 5 ################################### # and now the loop starts # it will only used the chronics selected for i in range(episode_count): ################################### _ = env.chronics_handler.sample_next_chronics(probas) # this is added ################################### obs = env.reset() # now play the episode as usual while True: action = agent.act(obs, reward, done) obs, reward, done, info = env.step(action) total_reward += reward if done: # in this case the episode is over break (as always added line compared to the base code are highlighted: they are "circle" with `#####`) **NB** here we have a constant vector of probabilities, but you might imagine adapting it during the training, for example to oversample scenarios your agent is having trouble to solve during the training. Skipping some time steps ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Another way to customize which data your agent will face is to make as if the chronics started at different date and time. This might be handy in case a scenario is hard at the beginning but less hard at the end, or if you want your agent to learn to start controlling the grid at any date and time (in grid2op most of the chronics data provided start at midnight for example). To achieve this goal, you can use the :func:`BaseEnv.fast_forward_chronics` function. This function skip a given number of steps. In the following example, we always skip the first 42 time steps before starting the episode: .. code-block:: python import numpy as np import re import grid2op from grid2op.Agent import RandomAgent env = grid2op.make("l2rpn_case14_sandbox") agent = RandomAgent(env.action_space) env.seed(0) # for reproducible experiments episode_count = 10000 # i want to make lots of episode # i initialize some useful variables reward = 0 done = False total_reward = 0 # and now the loop starts # it will only used the chronics selected for i in range(episode_count): obs = env.reset() ################################### # below are the two lines added env.fast_forward_chronics(42) obs = env.get_obs() ################################### # now play the episode as usual while True: action = agent.act(obs, reward, done) obs, reward, done, info = env.step(action) total_reward += reward if done: # in this case the episode is over break (as always added line compared to the base code are highlighted: they are "circle" with `#####`) Generating chronics that are always new ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. versionadded:: 1.6.6 This functionality is only available for some environments, for example "l2rpn_wcci_2022" .. warning:: A much better alternative to this class is to have a "process" generate the data, thanks to the :func:`grid2op.Environment.Environment.generate_data` and then to reload the data in a (separate) training script. This is explained in section :ref:`generate_data_flow` of the documentation. Though it is not recommended at all (for performance reasons), you have, starting from grid2op 1.6.6 (and using a compatible environment *eg* "l2rpn_wcci_2022") to generate a possibly infinite amount of data thanks to the :class:`grid2op.Chronics.FromChronix2grid` class. The data generation process is rather slow for different reasons. The main one is that the data need to meet a lot of "constraints" to be realistic, some of them are given in the :ref:`modeled-elements-module` module. On our machines, it takes roughly 40-50 seconds to generate a weekly scenario for the `l2rpn_wcci_2022` environment (usually an agent will fail in 1 or 2s... This is why we do not recommend to use it) To generate data "on the fly" you simply need to create the environment with the right chronics class as follow: .. code-block:: python import grid2op from grid2op.Chronics import FromChronix2grid env_nm = "l2rpn_wcci_2022" # only compatible environment at time of writing env = grid2op.make(env_nm, chronics_class=FromChronix2grid, data_feeding_kwargs={"env_path": os.path.join(grid2op.get_current_local_dir(), env_nm), "with_maintenance": True, # whether to include maintenance (optional) "max_iter": 2 * 288, # duration (in number of steps) of the data generated (optional) } ) And this is it. Each time you call `env.reset()` it will internally call `chronix2grid` package to generate new data for this environment (this is why `env.reset()` will take roughly 50s...). .. warning:: For this class to be available, you need to have the "chronix2grid" package installed and working. Please install it with `pip intall grid2op[chronix2grid]` and make sure to have the `coinor-cbc` solver available on your system (more information at https://github.com/bdonnot/chronix2grid#installation) .. warning:: Because I know from experience warnings are skipped half of the time: **please consult** :ref:`generate_data_flow` **for a better way to generate infinite data** ! .. _generate_data_flow: Generate and use an "infinite" data ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. versionadded:: 1.6.6 .. warning:: For this class to be available, you need to have the "chronix2grid" package installed and working. Please install it with `pip intall grid2op[chronix2grid]` and make sure to have the `coinor-cbc` solver available on your system (more information at https://github.com/bdonnot/chronix2grid#installation) In this section we present a new way to generate possibly an infinite amount of data for training your agent ( in case the data shipped with the environment are too limited). One way to do this is to split the data "generation" process on one python script, and the data "consumption" process (for example by training an agent) on another one. This is much more efficient than using the :class:`grid2op.Chronics.FromChronix2grid` because you will not spend 50s waiting the data to be generated at each call to `env.reset()` after the episode is over. First, create a script to generate all the data that you want. For example in the script "generation.py": .. code-block:: python import grid2op env_name = "l2rpn_wcci_2022" # only compatible with what comes next (at time of writing) env = grid2op.make(env_name) nb_year = 50 # or any "big" number... env.generate_data(nb_year=nb_year) # generates 50 years of data # (takes roughly 50s per week, around 45mins per year, in this case 50 * 45 mins = 37.5 hours) Then create a script to "consume" your data, for example by training an agent (say "train.py") [we demonstrate it with l2rpn baselines but you can use whatever you want]: .. code-block:: python import os import grid2op from lightsim2grid import LightSimBackend # highly recommended for speed ! env_name = "l2rpn_wcci_2022" # only compatible with what comes next (at time of writing) env = grid2op.make(env_name, backend=LightSimBackend()) # now train an agent # see l2rpn_baselines package for more information, for example # l2rpn-baselines.readthedocs.io/ from l2rpn_baselines.PPO_SB3 import train nb_iter = 10000 # train for that many iterations agent_name = "WhaetverIWant" # or any other name agent_path = os.path.expand("~") # or anywhere else on your computer trained_agent = train(env, iterations=nb_iter, name=agent_name, save_path=agent_path) # this agent will be trained only on the data available at the creation of the environment # the training loop will take some time, so more data will be generated when it's over # reload them env.chronics_handler.init_subpath() env.chronics_handler.reset() # and retrain your agent including the data you just generated trained_agent = train(env, iterations=nb_iter, name=agent_name, save_path=agent_path, load_path=agent_path ) # once it's over, more time has passed, and more data are available # reload them env.chronics_handler.init_subpath() env.chronics_handler.reset() # and retrain your agent trained_agent = train(env, iterations=nb_iter, name=agent_name, save_path=agent_path, load_path=agent_path ) # well you got the idea # etc. etc. .. warning:: This way of doing things will always increase the size of the data in your hard drive. We do recommend to somehow delete some of the data from time to time Deleting the data you be done before the `env.chronics_handler.init_subpath()` for example: .. code-block :: python ### delete the folder you want to get rid off names_folder_to_delete = ... # To build `names_folder_to_delete` # you could for examaple: # - remove the `nth` oldest directories # see: https://stackoverflow.com/questions/47739262/find-remove-oldest-file-in-directory # - or keep only the `kth`` most recent directories # - or keep only `k` folder at random among the one in `grid2op.get_current_local_dir()` # - or delete all the oldest files and keep your directory at a fixed size # see: https://gist.github.com/ginz/1ba7de8b911651cfc9c85a82a723f952 # etc. for nm in names_folder_to_delete: shutil.rmtree(os.path.join(grid2op.get_current_local_dir(), nm)) #### # reload the remaining data: env.chronics_handler.init_subpath() env.chronics_handler.reset() # continue normally .. _environment-module-train-val-test: Splitting into raining, validation, test scenarios --------------------------------------------------- In machine learning the "training / validation / test" framework is particularly usefull to avoid overfitting and develop models as performant as possible. Grid2op allows for such usage at the environment level. There is the possibility to "split" an environment into training / validation and test (*ie* using only some chronics for training, some others for validation and some others for testing). This can be done with: .. code-block:: python import grid2op env_name = "l2rpn_case14_sandbox" # or any other... env = grid2op.make(env_name) # extract 1% of the "chronics" to be used in the validation environment. The other 99% will # be used for test nm_env_train, nm_env_val, nm_env_test = env.train_val_split_random(pct_val=1., pct_test=1.) # and now you can use the training set only to train your agent: print(f"The name of the training environment is \\"{nm_env_train}\\"") print(f"The name of the validation environment is \\"{nm_env_val}\\"") print(f"The name of the test environment is \\"{nm_env_test}\\"") env_train = grid2op.make(nm_env_train) You can then use, in the above case: .. code-block:: python import grid2op env_name = "l2rpn_case14_sandbox" # matching above env_train = grid2op.make(env_name+"_train") # to only use the "training chronics" # do whatever you want with env_train And then, at time of validation: .. code-block:: python import grid2op env_name = "l2rpn_case14_sandbox" # matching above env_val = grid2op.make(env_name+"_val") # to only use the "validation chronics" # do whatever you want with env_val # and of course env_test = grid2op.make(env_name+"_test") Customization ------------- Environments can be customized in three major ways: - `Backend`: you change the solver that computes the state of the power more or less faste or be more realistically - `Parameters`: you change the behaviour of the Environment. For example you can prevent the powerline to be disconnected when too much current flows on it etc. - `Rules`: you can affect the operational constraint that your agent must meet. For example you can affect more or less powerlines in the same action etc. You can do these at creation time: .. code-block:: python import grid2op env_name = "l2rpn_case14_sandbox" # or any other name # create the regular environment: env_reg = grid2op.make(env_name) # to change the backend # (here using the lightsim2grid faster backend) from lightsim2grid import LightSimBackend env_faster = grid2op.make(env_name, backend=LightSimBackend()) # to change the parameters, for example # to prevent line disconnect when there is overflow param = env_reg.parameters param.NO_OVERFLOW_DISCONNECTION = True env_easier = grid2op.make(env_name, param=param) Of course you can combine everything. More examples are given in section :ref:`env_cust_makeenv`. Detailed Documentation by class -------------------------------- .. automodule:: grid2op.Environment :members: :autosummary: .. include:: final.rst