The PyCMOR Paper #179
Hi @pgierz, thanks for this great draft.
See my suggestions and comments.
I think it would also be worth explaining how the CMIP6 standard is implemented and what the strategy is for implementing other standards in the future, for example CMIP7. One idea is to build pycmor with optional extras in pip (`pip install pycmor[<a_external_mip>]`), where the external package simply contains all the classes needed to translate that standard into the language of our tool. We could keep the interfaces for the standards of highest interest (CMIPs, PMIPs, ...) integrated directly into our tool, and integrate community standard interfaces once they are sufficiently tested and robust. Alternatively, we could decide that no standard interfaces are part of the pymor repo (separation of concerns). I am not sure about this point, but in my view we need to mention how communities will contribute new standard interfaces.
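To make the idea of pluggable standard interfaces more concrete, here is a minimal sketch of a registry that built-in and external packages could both use. Everything in it is an assumption made for illustration: `register_standard`, `get_standard`, and the `CMIP6Interface` class are hypothetical names, not part of the tool's current API, and a real extras-based design would more likely discover external packages through packaging entry points than through an in-process decorator.

```python
from typing import Callable, Dict

# Maps a lower-cased standard name (e.g. "cmip6") to the class implementing it.
_STANDARDS: Dict[str, Callable[[], object]] = {}


def register_standard(name: str):
    """Class decorator that registers a standard interface (hypothetical API)."""

    def decorator(cls):
        key = name.lower()
        if key in _STANDARDS:
            raise ValueError(f"standard {name!r} is already registered")
        _STANDARDS[key] = cls
        return cls

    return decorator


def get_standard(name: str):
    """Instantiate the interface registered for `name`, ignoring case."""
    try:
        factory = _STANDARDS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_STANDARDS)) or "none"
        raise LookupError(f"no interface for {name!r}; known: {known}") from None
    return factory()


@register_standard("cmip6")
class CMIP6Interface:
    """Stands in for an interface shipped with the tool itself."""

    name = "cmip6"


# An external package installed as an extra would register its own class
# the same way, without the core package having to know about it in advance.
```

With this layout, the core only ever calls `get_standard(...)`, so whether an interface lives in the main repository or in a separate package becomes a packaging decision rather than a code change.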
equal-contrib: true
affiliation: 1
- name: Miguel Andres-Martinez
orcid: ????
Suggested change:
orcid: 0000-0002-1525-5546
PyMOR is a toolbox for preparing Earth System Model (ESM) data for analysis and
sharing with the community. PyMOR uses a simple command line interface and a
In my view it is not only about "preparing" data but also about standardizing it. Can you add something about standardization? For example:
PyCMOR is a toolbox for the preparation and standardization of Earth System Model (ESM) data, facilitating subsequent analysis and ensuring that the data can be readily shared with the community. PyCMOR uses a simple command line interface and a
We also need to explain the CMOR acronym and relate it to the name of the software somewhere.
sharing with the community. PyMOR uses a simple command line interface and a
clear way to manipulate NetCDF files step by step to add relevant metadata,
transform units, combine variables, regrid, transform geometries, and more. It
runs in parallel using Dask and SLURM, and thus is suitable to handle even
Maybe not here, but somewhere we have to state that it should be fairly easy to use with other batch systems, as long as Dask has a Cluster class defined for them.
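As a rough illustration of this point: `dask_jobqueue` provides one cluster class per scheduler (for example `SLURMCluster`, `PBSCluster`, `SGECluster`, and `LSFCluster`), so supporting another batch system can in principle come down to selecting a class by name. The sketch below assumes `dask_jobqueue` is installed at the moment a cluster is actually created. The helper names `cluster_class_name` and `make_cluster` are hypothetical and do not describe how the tool currently selects its cluster.

```python
import importlib

# Batch-system name -> cluster class provided by dask_jobqueue.
# Only a subset of the classes that dask_jobqueue offers is listed here.
JOBQUEUE_CLUSTERS = {
    "slurm": "SLURMCluster",
    "pbs": "PBSCluster",
    "sge": "SGECluster",
    "lsf": "LSFCluster",
}


def cluster_class_name(batch_system: str) -> str:
    """Return the dask_jobqueue class name for a batch system (hypothetical helper)."""
    key = batch_system.strip().lower()
    if key not in JOBQUEUE_CLUSTERS:
        supported = ", ".join(sorted(JOBQUEUE_CLUSTERS))
        raise ValueError(
            f"unsupported batch system {batch_system!r}; expected one of: {supported}"
        )
    return JOBQUEUE_CLUSTERS[key]


def make_cluster(batch_system: str, **cluster_kwargs):
    """Create the matching cluster; dask_jobqueue is imported only when needed."""
    module = importlib.import_module("dask_jobqueue")
    cluster_cls = getattr(module, cluster_class_name(batch_system))
    return cluster_cls(**cluster_kwargs)
```

Under these assumptions, a call such as `make_cluster("slurm", cores=4, memory="16GB")` would build a `SLURMCluster`, and moving to PBS would only change the first argument.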
Standardization of climate model outputs is crucial for preparing data for IPCC
reports because it ensures that results from different modeling centers worldwide
can be directly compared, combined, and analyzed in a consistent and transparent
manner. The IPCC and associated projects like CMIP require model outputs to be
formatted according to strict technical specifications: data must be provided
in standardized NetCDF files, using common variable names, units, metadata
conventions (such as the CF Metadata Conventions), and grid structures (e.g.,
rectilinear grids for most fields, standard pressure or depth levels for
vertical coordinates). This uniformity allows the Program for Climate Model
Diagnosis and Intercomparison (PCMDI) to centrally archive and distribute model
results, enabling hundreds of researchers to efficiently scrutinize, benchmark,
and synthesize findings across models and scenarios. Without such
standardization, the process of aggregating results for global assessments
would be error-prone, time-consuming, and potentially unreliable, undermining
the scientific basis for the IPCC’s policy-relevant conclusions.
Trying to improve the motivation by covering more than just CMIP and the IPCC reports:
Model Intercomparison Projects ([MIPs](https://wcrp-cmip.org/mips/)) bring
together the international Earth system science community to address key
scientific questions by comparing results across different models and datasets.
To do so, MIPs define standardization protocols so that results from different
modeling centers worldwide can be directly compared, combined, and analyzed
in a consistent and transparent manner. A MIP of particular importance is the
Coupled Model Intercomparison Project (CMIP), whose results the IPCC uses as
estimates of future climates under different scenarios. MIPs require model
outputs and observational data to be
formatted according to strict technical specifications: data must be provided
in standardized NetCDF files, using common variable names, units, metadata
conventions (such as the CF Metadata Conventions), and grid structures (e.g.,
rectilinear grids for most fields, standard pressure or depth levels for
vertical coordinates). This uniformity allows the Program for Climate Model
Diagnosis and Intercomparison (PCMDI) to centrally archive and distribute model
results, enabling hundreds of researchers to efficiently scrutinize, benchmark,
and synthesize findings across models and scenarios. Without such
standardization, the process of aggregating results for global assessments
would be error-prone, time-consuming, and potentially unreliable.
make it challenging to manipulate the files to conform to the requisite metadata
standards and best-practices.

We developed `pymor` to fill the need for a flexible, performant, extensible
Suggested change:
We developed `pycmor` to fill the need for a flexible, performant, extensible

```python
import xarray as xr
from pymor.core.rule import Rule
Suggested change:
from pycmor.core.rule import Rule
- "pymor.core.gather_inputs.load_mfdataset"
- "script://./intpp_recom.py:add_pp_components"
- "pymor.fesom_1p4.nodes_to_levels"
- "script://./intpp_recom.py:vertical_integration"
- "script://./intpp_recom.py:set_pp_units"
- "pymor.std_lib.convert_units"
- "pymor.std_lib.time_average"
- "pymor.std_lib.set_global_attributes"
- "pymor.std_lib.trigger_compute"
- "pymor.std_lib.show_data"
- "pymor.std_lib.files.save_dataset"
Suggested change:
- "pycmor.core.gather_inputs.load_mfdataset"
- "script://./intpp_recom.py:add_pp_components"
- "pycmor.fesom_1p4.nodes_to_levels"
- "script://./intpp_recom.py:vertical_integration"
- "script://./intpp_recom.py:set_pp_units"
- "pycmor.std_lib.convert_units"
- "pycmor.std_lib.time_average"
- "pycmor.std_lib.set_global_attributes"
- "pycmor.std_lib.trigger_compute"
- "pycmor.std_lib.show_data"
- "pycmor.std_lib.files.save_dataset"
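For readers unfamiliar with this kind of step list, the sketch below shows one way such strings could be turned into callables: dotted names are imported as `package.module.function`, and `script://<path>:<function>` entries are loaded from a local file. This is an illustration only, not the tool's actual resolver. The names `resolve_step` and `run_pipeline` are hypothetical, and the sketch assumes each step takes a single value and returns a single value, which may differ from the real step signature.

```python
import importlib
import importlib.util
from pathlib import Path

SCRIPT_PREFIX = "script://"


def resolve_step(spec: str):
    """Turn a step string into a callable (illustrative, hypothetical helper)."""
    if spec.startswith(SCRIPT_PREFIX):
        # "script://./file.py:func" -> load the file, then fetch the function.
        path_str, sep, func_name = spec[len(SCRIPT_PREFIX):].rpartition(":")
        if not sep or not path_str or not func_name:
            raise ValueError(f"expected 'script://<path>:<function>', got {spec!r}")
        path = Path(path_str)
        module_spec = importlib.util.spec_from_file_location(path.stem, path)
        if module_spec is None or module_spec.loader is None:
            raise ImportError(f"cannot load a module from {path}")
        module = importlib.util.module_from_spec(module_spec)
        module_spec.loader.exec_module(module)
    else:
        # "package.module.func" -> import the module, then fetch the function.
        module_name, sep, func_name = spec.rpartition(".")
        if not sep:
            raise ValueError(f"expected a dotted path, got {spec!r}")
        module = importlib.import_module(module_name)
    return getattr(module, func_name)


def run_pipeline(step_specs, data):
    """Apply each resolved step in order, feeding each result to the next step."""
    for spec in step_specs:
        data = resolve_step(spec)(data)
    return data
```

Keeping both forms in one list lets project-specific functions from a local script sit between library steps without being packaged first.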
the user configuration file:

```yaml
pymor:
Suggested change:
pycmor:
functionality once the user has defined the configuration file:

```bash
$ pymor process <path/to/config.yaml>
Suggested change:
$ pycmor process <path/to/config.yaml>
Christian Stepanek for early design testing, as well as the CMIP team for
fruitful discussions and feedback.
Suggested change:
Christian Stepanek for early design testing, as well as the CMIP team and the WCRP ESMO Infrastructure Panel (WIP) for fruitful discussions and feedback.
We also need to acknowledge the DataHub here, which is the funding body for @siligam's position.
I think it would be a good idea for us to write a paper for PyMOR. Target journal is JOSS: https://joss.theoj.org