Conversation

@pgierz
Member

pgierz commented Jul 9, 2025

I think it would be a good idea for us to write a paper for PyMOR.

The target journal is JOSS: https://joss.theoj.org

@pgierz
Member Author

pgierz commented Aug 6, 2025

Needs #185 and #191 before proceeding

@mandresm
Contributor

mandresm commented Aug 6, 2025

TODOs for 13.08.2025

  • @mandresm Review paper and give feedback with PR suggestions
  • @siligam Review paper and give feedback with PR suggestions

@mandresm changed the title from "The PyMOR Paper" to "The PyCMOR Paper" on Sep 8, 2025
Contributor

@mandresm left a comment

Hi @pgierz, thanks for this great draft.

See my suggestions and comments.

I think it would also be worth explaining how the CMIP6 standard is implemented and what the strategy is for future implementations of other standards, for example CMIP7. One idea would be to offer pycmor with optional extras in pip (`pip install pycmor[<a_external_mip>]`), where the external package simply contains all the classes needed to translate that standard into the language of our tool. We could keep the interfaces for the higher-interest standards (CMIPs, PMIPs, ...) integrated directly into our tool, and integrate community standard interfaces once they are sufficiently tested and robust. Alternatively, we could say that none of the standard interfaces live in the pycmor repo (separation of concerns). I am not sure about this point, but in my view we need to mention how communities will add new standard interfaces.
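
To make the extras idea a bit more concrete, here is a minimal sketch of how an external package might register a standard interface with the tool. Everything in it is hypothetical: `STANDARD_REGISTRY`, `register_standard`, `get_standard`, and `CMIP7Interface` are illustrative names rather than existing pycmor code, and the variable mapping is only a placeholder. A real implementation would more likely discover installed packages through `importlib.metadata` entry points than through a module-level dictionary.

```python
from typing import Callable, Dict, Type

# Hypothetical registry mapping a standard's name to its interface class.
STANDARD_REGISTRY: Dict[str, Type] = {}


def register_standard(name: str) -> Callable[[Type], Type]:
    """Class decorator an external package could use to register itself."""

    def decorator(cls: Type) -> Type:
        if name in STANDARD_REGISTRY:
            raise ValueError(f"standard {name!r} is already registered")
        STANDARD_REGISTRY[name] = cls
        return cls

    return decorator


def get_standard(name: str):
    """Instantiate the interface for a standard, or fail with a clear hint."""
    try:
        return STANDARD_REGISTRY[name]()
    except KeyError:
        raise LookupError(
            f"no interface for {name!r}; is the matching extra installed?"
        ) from None


# What a hypothetical external package might contain.
@register_standard("CMIP7")
class CMIP7Interface:
    # Placeholder mapping from a model variable name to a standard name.
    variable_map = {"temp": "thetao"}

    def translate(self, model_variable: str) -> str:
        return self.variable_map[model_variable]


print(get_standard("CMIP7").translate("temp"))
```

With something along these lines, the core tool would only need to know the registry, and each standard could be versioned and tested in its own package.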

equal-contrib: true
affiliation: 1
- name: Miguel Andres-Martinez
orcid: ????
Contributor

Suggested change
orcid: ????
orcid: 0000-0002-1525-5546

Comment on lines +33 to +34
PyMOR is a toolbox for preparing Earth System Model (ESM) data for analysis and
sharing with the community. PyMOR uses a simple command line interface and a
Contributor

In my view, it is not only about "preparing" data but also about standardizing it. Can you add something about standardization? For example:

Suggested change
PyMOR is a toolbox for preparing Earth System Model (ESM) data for analysis and
sharing with the community. PyMOR uses a simple command line interface and a
PyCMOR is a toolbox for the preparation and standardization of Earth System Model (ESM) data, facilitating subsequent analysis and ensuring that the data can be readily shared with the community. PyCMOR uses a simple command line interface and a

Contributor

We also need to explain the CMOR acronym and relate it somewhere to the name of the software

sharing with the community. PyMOR uses a simple command line interface and a
clear way to manipulate NetCDF files step by step to add relevant metadata,
transform units, combine variables, regrid, transform geometries, and more. It
runs in parallel using Dask and SLURM, and thus is suitable to handle even
Contributor

Maybe not here, but somewhere we should state that it should be fairly easy to use with other batch systems, as long as Dask has a cluster class defined for them.
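
As a rough illustration of that point, a scheduler name could be mapped to the matching cluster class. The classes listed are real `dask_jobqueue` and `distributed` classes, but the lookup keys, the `CLUSTER_CLASSES` table, and the two helper functions are hypothetical and do not reflect how pycmor actually selects its cluster.

```python
import importlib

# Dotted paths to cluster classes from dask-jobqueue and distributed.
# The lookup keys on the left are hypothetical names for this sketch.
CLUSTER_CLASSES = {
    "local": "distributed.LocalCluster",
    "slurm": "dask_jobqueue.SLURMCluster",
    "pbs": "dask_jobqueue.PBSCluster",
    "lsf": "dask_jobqueue.LSFCluster",
    "sge": "dask_jobqueue.SGECluster",
}


def import_from_path(dotted_path: str):
    """Import `package.module.Name` and return the object called `Name`."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve_cluster_class(scheduler: str):
    """Return the cluster class for a scheduler name, importing it lazily."""
    try:
        dotted_path = CLUSTER_CLASSES[scheduler.lower()]
    except KeyError:
        known = ", ".join(sorted(CLUSTER_CLASSES))
        raise ValueError(
            f"unknown scheduler {scheduler!r}; known: {known}"
        ) from None
    return import_from_path(dotted_path)
```

With dask-jobqueue installed, `resolve_cluster_class("slurm")` would return `SLURMCluster`, and supporting another batch system would amount to adding one entry to the table.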

Comment on lines +44 to +58
Standardization of climate model outputs is crucial for preparing data for IPCC
reports because it ensures that results from different modeling centers worldwide
can be directly compared, combined, and analyzed in a consistent and transparent
manner. The IPCC and associated projects like CMIP require model outputs to be
formatted according to strict technical specifications: data must be provided
in standardized NetCDF files, using common variable names, units, metadata
conventions (such as the CF Metadata Conventions), and grid structures (e.g.,
rectilinear grids for most fields, standard pressure or depth levels for
vertical coordinates). This uniformity allows the Program for Climate Model
Diagnosis and Intercomparison (PCMDI) to centrally archive and distribute model
results, enabling hundreds of researchers to efficiently scrutinize, benchmark,
and synthesize findings across models and scenarios. Without such
standardization, the process of aggregating results for global assessments
would be error-prone, time-consuming, and potentially unreliable, undermining
the scientific basis for the IPCC’s policy-relevant conclusions.
Contributor

Trying to improve the motivation, adding more than just CMIP and IPCC report.

Suggested change
Model Intercomparison Projects ([MIPs](https://wcrp-cmip.org/mips/)) bring
together the international Earth system science community to address key
scientific questions by comparing results across different models and datasets.
To do that, MIPs offer standardization protocols so that results from different
modeling centers worldwide can be directly compared, combined, and analyzed
in a consistent and transparent manner. A MIP of particular importance is the
Coupled Model Intercomparison Project (CMIP), whose results are used in IPCC
reports as estimates of future climates under different scenarios. MIPs require
model outputs and observational data to be
formatted according to strict technical specifications: data must be provided
in standardized NetCDF files, using common variable names, units, metadata
conventions (such as the CF Metadata Conventions), and grid structures (e.g.,
rectilinear grids for most fields, standard pressure or depth levels for
vertical coordinates). This uniformity allows the Program for Climate Model
Diagnosis and Intercomparison (PCMDI) to centrally archive and distribute model
results, enabling hundreds of researchers to efficiently scrutinize, benchmark,
and synthesize findings across models and scenarios. Without such
standardization, the process of aggregating results for global assessments
would be error-prone, time-consuming, and potentially unreliable.

make it challenging to manipulate the files to conform to the requisite metadata
standards and best-practices.

We developed `pymor` to fill the need for a flexible, performant, extensible
Contributor

Suggested change
We developed `pymor` to fill the need for a flexible, performant, extensible
We developed `pycmor` to fill the need for a flexible, performant, extensible


```python
import xarray as xr
from pymor.core.rule import Rule
```
Contributor

Suggested change
from pymor.core.rule import Rule
from pycmor.core.rule import Rule

Comment on lines +240 to +250
- "pymor.core.gather_inputs.load_mfdataset"
- "script://./intpp_recom.py:add_pp_components"
- "pymor.fesom_1p4.nodes_to_levels"
- "script://./intpp_recom.py:vertical_integration"
- "script://./intpp_recom.py:set_pp_units"
- "pymor.std_lib.convert_units"
- "pymor.std_lib.time_average"
- "pymor.std_lib.set_global_attributes"
- "pymor.std_lib.trigger_compute"
- "pymor.std_lib.show_data"
- "pymor.std_lib.files.save_dataset"
Contributor

Suggested change
- "pycmor.core.gather_inputs.load_mfdataset"
- "script://./intpp_recom.py:add_pp_components"
- "pycmor.fesom_1p4.nodes_to_levels"
- "script://./intpp_recom.py:vertical_integration"
- "script://./intpp_recom.py:set_pp_units"
- "pycmor.std_lib.convert_units"
- "pycmor.std_lib.time_average"
- "pycmor.std_lib.set_global_attributes"
- "pycmor.std_lib.trigger_compute"
- "pycmor.std_lib.show_data"
- "pycmor.std_lib.files.save_dataset"
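
The `script://` entries in this pipeline point at user-defined functions in a local file. A minimal sketch of what such a step could look like follows. The `(data, rule)` signature, the `pp_units` attribute, the unit string, and the `run_steps` helper are assumptions made for illustration; the real `intpp_recom.py` is not shown in this thread.

```python
from types import SimpleNamespace


def set_pp_units(data, rule):
    """Attach units to the data and hand it on to the next step.

    Assumed convention (not confirmed in this thread): each step receives
    the output of the previous step plus the rule being processed, and
    returns the data for the next step.
    """
    # `pp_units` is a hypothetical rule attribute with a placeholder default.
    data.attrs["units"] = getattr(rule, "pp_units", "mol m-2 s-1")
    return data


def run_steps(steps, data, rule):
    """Apply each step in order, feeding the result of one into the next."""
    for step in steps:
        data = step(data, rule)
    return data


# Stand-ins for an xarray object and a rule, so the sketch runs on its own.
fake_data = SimpleNamespace(attrs={})
fake_rule = SimpleNamespace(pp_units="mol m-2 s-1")
result = run_steps([set_pp_units], fake_data, fake_rule)
print(result.attrs["units"])
```

Keeping each step a plain function of this shape is what would let library steps and `script://` steps be mixed freely in one list.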

the user configuration file:

```yaml
pymor:
```
Contributor

Suggested change
pymor:
pycmor:

functionality once the user has defined the configuration file:

```bash
$ pymor process <path/to/config.yaml>
```
Contributor

Suggested change
$ pymor process <path/to/config.yaml>
$ pycmor process <path/to/config.yaml>

Comment on lines +296 to +297
Christian Stepanek for early design testing, as well as the CMIP team for
fruitful discussions and feedback.
Contributor

Suggested change
Christian Stepanek for early design testing, as well as the CMIP team for
fruitful discussions and feedback.
Christian Stepanek for early design testing, as well as the CMIP team and the WCRP ESMO Infrastructure Panel (WIP) for fruitful discussions and feedback.

We also need to acknowledge the DataHub here, which is the funding body for @siligam's position.


3 participants