Closed
Conversation
Key Changes Made
1. Rewrote Clustering class (flixopt/clustering/base.py) - Minimal dataclass with only essential fields:
- cluster_assignments, cluster_weights, n_clusters, timesteps_per_cluster, original_timesteps
- get_timestep_mapping() now computed on demand instead of stored
- expand_data() method for expansion
2. Simplified tsam_adapter.py - Only converts tsam results to xarray for IO
3. Fixed transform_accessor.py expand() - Filter out cluster-only variables to prevent cluster dimension leaking to expanded FlowSystem
4. Updated test file - Replaced old API references:
- clustering.result.cluster_structure.X → clustering.X
- get_cluster_order_for_slice() → get_cluster_assignments_for_slice()
5. Fixed JSON serialization - Convert timestamps to ISO format strings
Result
- 79 tests passing (all clustering-related tests)
- Removed redundant ClusterStructure and ClusterResult classes
- tsam v3 API integration complete with ClusterConfig, ExtremeConfig, PredefinedConfig
Contributor
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Key Changes Made
1. Rewrote Clustering class (flixopt/clustering/base.py) - Minimal dataclass with only essential fields:
- cluster_assignments, cluster_weights, n_clusters, timesteps_per_cluster, original_timesteps
- get_timestep_mapping() now computed on demand instead of stored
- expand_data() method for expansion
2. Simplified tsam_adapter.py - Only converts tsam results to xarray for IO
3. Fixed transform_accessor.py expand() - Filter out cluster-only variables to prevent cluster dimension leaking to expanded FlowSystem
4. Updated test file - Replaced old API references:
- clustering.result.cluster_structure.X → clustering.X
- get_cluster_order_for_slice() → get_cluster_assignments_for_slice()
5. Fixed JSON serialization - Convert timestamps to ISO format strings
Result
- 79 tests passing (all clustering-related tests)
- Removed redundant ClusterStructure and ClusterResult classes
- tsam v3 API integration complete with ClusterConfig, ExtremeConfig, PredefinedConfig
- predefined (tsam PredefinedConfig) - Stored as dict via to_dict(), restored via from_dict() in __post_init__
- cluster_assignments and cluster_weights as DataArrays
- n_clusters, timesteps_per_cluster, original_timesteps as scalars
- metrics Dataset (RMSE, MAE, etc.)
Not stored (only used during clustering):
- ClusterConfig - Input parameter to cluster(), not needed after
- ExtremeConfig - Input parameter to cluster(), not needed after
Key changes made:
1. Added predefined.to_dict() serialization in _create_reference_structure()
2. Added __post_init__ to convert predefined dict back to PredefinedConfig
3. Updated DataArray serialization to use __dataarray__ references for compatibility with _resolve_reference_structure
4. Updated tests to match new format
This allows transferring clustering to other systems after loading:
fs_loaded = FlowSystem.from_netcdf('clustered.nc')
fs2_clustered = fs2.transform.cluster(
n_clusters=8,
predefined=fs_loaded.clustering.predefined # Works!
)
Key changes: 1. Converted from dataclass to regular class with _create_reference_structure() method 2. Uses :::name references for DataArrays (standard Interface pattern) 3. Stores original_timesteps_iso as list of ISO strings (Interface skips pd.Index) 4. Added Clustering.create() convenience constructor that accepts DatetimeIndex and Dataset 5. predefined dict auto-converts to PredefinedConfig in __init__ 6. Metrics stored as individual DataArrays (metrics_rmse, metrics_mae, metrics_rmse_duration) What's preserved through IO: - cluster_assignments and cluster_weights DataArrays - n_clusters, timesteps_per_cluster scalars - original_timesteps (as ISO strings, reconstructed to DatetimeIndex) - predefined (as dict, reconstructed to PredefinedConfig) - metrics (individual DataArrays, accessed as Dataset via property)
1. flixopt/clustering/clustering.py - New simple @DataClass with: - Direct __init__ accepting pd.DatetimeIndex (no factory method) - Derived properties: n_clusters, timesteps_per_cluster, n_original_clusters - Explicit to_reference() / from_reference() IO methods - DatetimeIndex name preserved in attrs for roundtrip 2. flixopt/clustering/__init__.py - Updated import 3. flixopt/clustering/interface.py - Deleted (no longer needed) 4. flixopt/flow_system.py - Updated IO to use new methods 5. flixopt/transform_accessor.py - Direct Clustering() constructor call 6. Tests - Updated to match new API
How to Use Clustering
import tsam
fs_clustered = flow_system.transform.cluster(
n_clusters=8,
cluster_duration='1D',
cluster=tsam.ClusterConfig(method='hierarchical'), # Optional
extremes=tsam.ExtremeConfig(max_value=['Demand|profile']), # Peak forcing
)
How to Access Results/Accuracy
clustering = fs_clustered.clustering
# Properties (derived from data)
clustering.n_clusters # Number of clusters
clustering.n_original_clusters # Original segments (e.g., 31 days)
clustering.timesteps_per_cluster # Per cluster (e.g., 24)
# Core data
clustering.cluster_assignments # xr.DataArray: original → cluster ID
clustering.cluster_weights # xr.DataArray: cluster → occurrence count
clustering.original_timesteps # pd.DatetimeIndex
# Quality metrics
clustering.metrics # xr.Dataset with RMSE, MAE
clustering.metrics.to_dataframe()
# Transfer to another system
clustering.predefined # PredefinedConfig
# Plotting
clustering.plot.heatmap()
Notebooks Updated
- 08c-clustering.ipynb - Main tutorial, uses new extremes=ExtremeConfig(max_value=[...]) API
- 08c2-clustering-storage-modes.ipynb - Storage mode comparison
- 08d-clustering-multiperiod.ipynb - Multi-period systems
- 08e-clustering-internals.ipynb - Complete rewrite showing Clustering dataclass internals
All 4 clustering notebooks pass their tests.
Member
Author
|
Closed in favor of #571 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Brief description of the changes in this PR.
Type of Change
Related Issues
Closes #(issue number)
Testing
Checklist