Preprocessing
The preprocessing can be used independently as a standalone feature. To start the process, click the green Check scenario button in the scenarios view under Check and run in the navigation bar. The procedure includes comprehensive data checks, data enrichment, availability simulations for components, data aggregations, and the creation of reference and preprocessed datasets.
Check
The data check includes a series of quality gates, duplicate detections, and consistency checks. If issues are found, the frontend displays a list that may require user review to ensure that only valid data is loaded into the simulation.
The data check is automatically performed when initiating a simulation run or a dedicated check. It validates model input data across several quality dimensions, including:
- Configuration consistency
- Unique component naming
- Input file completeness and format compliance
- Verification of data type, validity, completeness, necessity, and consistency
- Consistency, solvability, and suitability of optimization problems
A successful data check ensures that subsequent simulation runs proceed smoothly. If errors are found, the frontend displays a detailed issue list, providing targeted solution proposals. Passing the data check is required to generate reference and preprocessed datasets and to start simulation runs.
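As a loose illustration of two of these quality gates (unique component naming and input completeness), the following Python sketch shows the kind of validation the check performs. The column names and messages are hypothetical and do not reflect the application's actual implementation.

```python
# Hypothetical sketch of two quality gates: unique component naming and
# completeness of required columns. Column names are illustrative only.
import pandas as pd

components = pd.DataFrame({
    "name": ["PlantA", "PlantB", "PlantB"],   # duplicate name on purpose
    "capacity_mw": [400.0, None, 250.0],      # missing value on purpose
})

issues = []

# Unique component naming
for name in components.loc[components["name"].duplicated(), "name"]:
    issues.append(f"Duplicate component name: {name}")

# Completeness of required columns
for column in ["name", "capacity_mw"]:
    if column not in components.columns:
        issues.append(f"Missing required column: {column}")
    elif components[column].isna().any():
        issues.append(f"Empty values in required column: {column}")

print(issues)
```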
Enrichment
During input data enrichment, any empty sub-datasets are filled with assumptions based on look-ups, depending on the available data (best guess). This ensures the highest possible simulation accuracy with the given parameters. The enrichment also facilitates the use of data from various sources with differing coverage, such as varying resolutions.
The enriched dataset can be looked up in the reference dataset.
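The following minimal sketch illustrates the best-guess idea: empty efficiency values are filled from a technology look-up. The column names and default values are assumptions for illustration, not the actual enrichment rules.

```python
# Best-guess enrichment sketch: missing efficiencies are filled from an
# assumed technology look-up table. Values and column names are illustrative.
import pandas as pd

plants = pd.DataFrame({
    "name": ["PlantA", "PlantB", "PlantC"],
    "technology": ["CCGT", "OCGT", "CCGT"],
    "efficiency": [0.58, None, None],          # empty sub-dataset entries
})

technology_defaults = {"CCGT": 0.57, "OCGT": 0.38}   # assumed look-up values

plants["efficiency"] = plants["efficiency"].fillna(
    plants["technology"].map(technology_defaults)
)
print(plants)
```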
Availabilities
Like all components of the electricity utility system, power plants, consumers, storages, and grid elements are subject to temporary unavailability due to unforeseen outages or planned revisions. During these periods, a component’s ability to generate, store, transfer, or consume electricity is reduced or unavailable, affecting unit commitment, exchanges, and prices.
To ensure realistic results in electricity wholesale market simulations, these unavailability events must be considered, as they directly impact the dispatch of a component. The primary causes of unavailability are:
- Revisions: Scheduled maintenance activities, typically planned years in advance, which follow seasonal patterns and, in Europe, occur mainly in summer. These are relatively predictable and have standard durations.
- Outages: Unplanned disruptions caused by operational or technical failures. These can occur unpredictably at any time and may vary significantly in duration.
The overall availability of a component is determined by the combination of revisions and outages. Since long-term simulations may lack precise revision dates, random unavailability events are generated during preprocessing. Timestamps and durations for both revisions and outages are determined using a random process: the start time is based on a uniform distribution, and the duration is drawn from a normal distribution, as illustrated in the following figure.
Relative generation and consumption potentials (as percentages per time range) and event time span distributions (in hours per event) are specified in input files labeled availability. The determination of revisions and outages is carried out during preprocessing in two steps: first, revisions are generated, followed by the determination of outages. Throughout this process, must-runs, revisions, and outages in the input data are preserved. The final event drawings are treated as partial non-availability events to ensure the target availability is met precisely.
To generate revision and outage events, a pseudo seed based on the availability cluster is used. Each cluster is defined by component type, technology, time range, and availability type (revision or outage). This method ensures that revision and outage events can be reproduced with the same input data; reproducibility is maintained as long as the component lists, drawing parameters, and configurations remain unchanged. While the drawing process is stochastic and event collisions may occur, the specified availability for each cluster is guaranteed. Alternatively, a Mersenne Twister 19937 method can be applied, though this approach is not reproducible. To use this method, disable the parameter preprocessing_availability_drawing_reproducible in the project configuration.
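The following Python sketch illustrates the drawing principle under simplified assumptions: the seed is derived from the cluster definition, start times follow a uniform distribution over the horizon, and durations follow a normal distribution. Function and parameter names are illustrative and do not reflect the tool's internal interfaces.

```python
# Reproducible availability drawing sketch: the RNG seed is derived from the
# availability cluster (component type, technology, time range, event type);
# event starts are uniform over the horizon, durations are normally
# distributed. All names and numbers are illustrative.
import hashlib
import numpy as np

def draw_events(cluster_key, n_events, horizon_hours, mean_duration, std_duration):
    # Deterministic pseudo seed from the cluster definition
    digest = hashlib.sha256("|".join(map(str, cluster_key)).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "little"))
    starts = rng.uniform(0, horizon_hours, size=n_events)            # uniform start times
    durations = np.clip(rng.normal(mean_duration, std_duration,
                                   size=n_events), 1, None)          # at least one hour
    return np.column_stack([starts, starts + durations])

events = draw_events(("thermal", "CCGT", "2025", "revision"),
                     n_events=2, horizon_hours=8760,
                     mean_duration=400, std_duration=80)
```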
Aggregation
During the preprocessing, aggregation methods can be applied for the following components:
- Battery storages
- Hydro power networks
- Thermal power plants
- Demand-side response consumers
- Exchange capacities
Built-in aggregations can be used optionally for various use cases, including:
- Fast runs for rough estimates within minutes
- Monte Carlo runs for probabilistic analyses
- Resolving numerical issues in solver algorithms that use interior-point methods
Battery aggregation
Battery aggregation consolidates the batteries in each cluster into a single representative battery. The parameter battery_aggregation_opt in 90_grid_bidding_zones.csv defines the aggregation level for batteries within a bidding zone, allowing for flexible configurations based on the following values:
Value | Description |
---|---|
0 | None (default) |
1 | Batteries with the same technology |
2 | All batteries |
Setting battery_aggregation_opt to 2, for example, results in a single aggregated battery storage in the associated bidding zone.
During aggregation, associated input data such as must-runs, outages, revisions, and states of charge are merged, ensuring that the storage capacity, the charging and discharging capacities, and metrics such as power-weighted average efficiencies and work costs remain as consistent as possible.
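A minimal sketch of this kind of merge, assuming simplified fields: capacities are summed and the efficiency is power-weighted. The field names are hypothetical.

```python
# Merging a battery cluster into one representative storage (sketch):
# capacities are summed, efficiency is power-weighted. Field names are
# hypothetical.
import pandas as pd

batteries = pd.DataFrame({
    "name": ["Bat1", "Bat2"],
    "power_mw": [50.0, 150.0],
    "energy_mwh": [100.0, 600.0],
    "efficiency": [0.90, 0.86],
})

aggregated = {
    "name": "Bat_cluster",
    "power_mw": batteries["power_mw"].sum(),
    "energy_mwh": batteries["energy_mwh"].sum(),
    # power-weighted average efficiency keeps overall losses roughly consistent
    "efficiency": (batteries["efficiency"] * batteries["power_mw"]).sum()
                  / batteries["power_mw"].sum(),
}
print(aggregated)
```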
Hydro aggregation
The hydro power plant aggregation begins by identifying clusters, or sub-networks, of components based on the network topology. Each sub-network is then aggregated, and the resulting replacement components are integrated into the input data model. This iterative process ensures that each aggregation step builds on the previous one, capturing the topology’s details in a reproducible and deterministic way without randomization. Once aggregation is complete, the replacement components are saved to the preprocessed dataset.
The parameter hydro_aggregation_opt in 90_grid_bidding_zones.csv sets the aggregation level for all hydro power plants in a given bidding zone. The aggregation levels for hydro-connected networks are as follows:
Value | Description |
---|---|
0 | None (default) |
1 | Parallel hydro paths |
2 | Serial hydro paths |
3 | Parallel hydro paths with same lower reservoirs |
4 | Parallel hydro paths with same upper reservoirs |
5 | Serial and parallel hydro paths |
For higher levels, hydro-separated networks are merged as follows:
Value | Description |
---|---|
6 | Hydro networks without inflows |
7 | Hydro networks with inflows |
8 | All hydro networks |
Setting hydro_aggregation_opt to 8, for example, results in a single aggregated pumped-storage unit, including inflows. The following figure illustrates an example of hydro network aggregation for each level.
Hydro aggregations also merge linked input data such as reservoir inflows, reservoir filling levels, and plant must-runs, outages, and revisions, while preserving, as far as possible, the total power sums, the possible generation and consumption amounts, and the power-weighted average efficiencies and work costs.
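As a rough illustration of the lowest aggregation level (parallel hydro paths), the sketch below collapses two parallel paths between the same reservoirs into one replacement path. The data structure and field names are simplified assumptions, not the tool's internal model.

```python
# Sketch of aggregating two parallel hydro paths between the same reservoirs:
# capacities and inflows are summed, the efficiency is power-weighted.
# Structures and values are simplified assumptions.
parallel_paths = [
    {"turbine_mw": 120.0, "pump_mw": 100.0, "efficiency": 0.80, "inflow_mw": 15.0},
    {"turbine_mw": 80.0,  "pump_mw": 60.0,  "efficiency": 0.76, "inflow_mw": 10.0},
]

total_turbine = sum(p["turbine_mw"] for p in parallel_paths)
replacement_path = {
    "turbine_mw": total_turbine,
    "pump_mw": sum(p["pump_mw"] for p in parallel_paths),
    "inflow_mw": sum(p["inflow_mw"] for p in parallel_paths),
    # power-weighted efficiency of the merged path
    "efficiency": sum(p["efficiency"] * p["turbine_mw"] for p in parallel_paths) / total_turbine,
}
print(replacement_path)
```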
Thermal aggregation
The thermal power plant aggregation process begins by identifying clusters, or component groups, based on shared plant characteristics. Each group is then aggregated and replaced in the input data model with a single, aggregated component representing the group. This iterative process captures detailed information from the original model at each level, ensuring that aggregation builds consistently and systematically on the previous level. The aggregation is deterministic and reproducible, avoiding randomization by relying solely on specific characteristics. Finally, the aggregated replacement components are written out in the preprocessed dataset.
The thermal_aggregation_opt parameter in 90_grid_bidding_zones.csv defines the level of aggregation for thermal power plants within a bidding zone. Available options for this parameter are:
Value | Description |
---|---|
0 | None (default) |
1 | Thermal power plants with the same combination of technology and fuel type |
2 | Thermal power plants with the same fuel type |
3 | All thermal power plants |
For example, setting thermal_aggregation_opt to 3 aggregates all thermal plants into a single replacement component in the associated bidding zone.
Thermal aggregations also consolidate associated input data such as fuel costs, efficiencies, must-runs, outages, revisions, fuel restrictions, as well as fuel and emission limits, while maintaining power sums, generation capacities, and average generation costs as much as possible.
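To illustrate level 1, the following pandas sketch groups plants by the combination of technology and fuel type and keeps the total capacity and a capacity-weighted efficiency per group. The column names are assumptions for illustration.

```python
# Level-1 style thermal grouping sketch: plants sharing technology and fuel
# are collapsed into one component with summed capacity and capacity-weighted
# efficiency. Column names are illustrative.
import pandas as pd

plants = pd.DataFrame({
    "name": ["GasA", "GasB", "CoalA"],
    "technology": ["CCGT", "CCGT", "steam"],
    "fuel": ["gas", "gas", "coal"],
    "capacity_mw": [400.0, 600.0, 800.0],
    "efficiency": [0.60, 0.55, 0.42],
})

plants["eff_x_cap"] = plants["efficiency"] * plants["capacity_mw"]
groups = plants.groupby(["technology", "fuel"]).agg(
    capacity_mw=("capacity_mw", "sum"),
    eff_x_cap=("eff_x_cap", "sum"),
)
groups["efficiency"] = groups["eff_x_cap"] / groups["capacity_mw"]
print(groups.drop(columns="eff_x_cap"))
```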
DSR aggregation
The DSR aggregation merges clustered demand-side response (DSR) consumers into a single replacement DSR consumer within each cluster.
The dsr_aggregation_opt parameter in 90_grid_bidding_zones.csv controls the aggregation level for DSR consumers in each bidding zone, with the following options:
Value | Description |
---|---|
0 | None (default) |
1 | DSR consumers with identical technology |
2 | All DSR consumers |
For example, setting dsr_aggregation_opt to 2 aggregates all DSR consumers into a single replacement component in the associated bidding zone.
DSR aggregations also integrate associated input data, such as must-runs, outages, revisions, and states of charge, while preserving the total power and the generation and consumption capacities, as well as the power-weighted average efficiencies and operational costs, as far as possible.
Grid aggregation
Grid aggregation applies only to Flow-Based Market Coupling (FBMC) capacities. Unlike generator, consumer, and storage aggregations, grid aggregation is based directly on the accuracy of parameters.
Grid aggregation rounds Power Transfer Distribution Factors (PTDFs) to a specified precision. It then clusters Critical Network Elements and Contingencies (CNECs) with identical rounded PTDF values, assigning each cluster a Remaining Available Margin (RAM) value (by default, the cluster average). Each CNEC cluster is then replaced by one CNEC with the rounded PTDF values and the assigned RAM, and the replacement components are recorded in the preprocessed dataset.
The grid_aggregation_opt parameter in 90_grid_bidding_zones.csv specifies the grid aggregation level, with these options:
Value | Description |
---|---|
0 | None (default) |
1 | PTDFs rounded to 4 decimals |
2 | PTDFs rounded to 3 decimals |
3 | PTDFs rounded to 2 decimals |
4 | PTDFs rounded to 1 decimal |
Setting the grid aggregation level to 4 results in the greatest reduction of CNECs; the PTDFs of all replacement components are then rounded to one decimal place.
Grid aggregations also merge linked Advanced Hybrid Couplings (AHCs) accordingly.
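The rounding and clustering step can be pictured with the following pandas sketch, which rounds the PTDF columns, groups CNECs with identical rounded values, and takes the average RAM per group. The column names and data are illustrative assumptions.

```python
# Sketch of PTDF-based CNEC clustering: round PTDFs to the configured
# precision, group CNECs with identical rounded PTDFs, and assign each group
# the average RAM. Column names and values are illustrative.
import pandas as pd

cnecs = pd.DataFrame({
    "cnec": ["Line1", "Line2", "Line3"],
    "ptdf_DE": [0.1234, 0.1231, -0.0567],
    "ptdf_FR": [-0.0456, -0.0459, 0.2101],
    "ram_mw": [500.0, 450.0, 800.0],
})

decimals = 2  # e.g. aggregation level 3 (PTDFs rounded to 2 decimals)
ptdf_cols = [c for c in cnecs.columns if c.startswith("ptdf_")]
cnecs[ptdf_cols] = cnecs[ptdf_cols].round(decimals)

replacement_cnecs = (
    cnecs.groupby(ptdf_cols, as_index=False)
         .agg(ram_mw=("ram_mw", "mean"), members=("cnec", "size"))
)
print(replacement_cnecs)
```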
Reference
During the check, the application generates a reference dataset containing the loaded and enriched input data. This dataset is saved in the subfolder output/reference of each check folder and is formatted identically to the input files, enabling direct comparison and reimport if needed. The reference dataset represents the data model immediately after input loading.
Preprocessed
During the check, the application also generates an output of the preprocessed input dataset. This dataset reflects applied aggregations, availability simulations, and other optional processing and simplifications based on the project configuration. It is saved in the subfolder output/preprocessed of each check folder, formatted identically to the input files to allow for direct comparison and reimport if needed. This dataset serves as the data model for subsequent optimization processes.