@siligam commented Aug 30, 2025

This PR fixes an edge case in the frequency inference logic: monthly data with a missing month (e.g., Jan, Feb, Apr) was incorrectly identified as having a '5W' frequency.

Before fix:

>>> import pandas as pd
>>> from pymor.core.infer_freq import infer_frequency
>>> times = pd.to_datetime(["2000-01-31", "2000-02-29", "2000-04-30"])  # March is missing
>>> infer_frequency(times)
'5W'

After fix:

>>> import pandas as pd
>>> from pymor.core.infer_freq import infer_frequency
>>> times = pd.to_datetime(["2000-01-31", "2000-02-29", "2000-04-30"])  # March is missing
>>> infer_frequency(times)
'M'
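
To illustrate why this series is ambiguous when measured in days but still looks monthly on the calendar, here is a minimal standard-library sketch. The helper `month_steps` is hypothetical and is not part of pymor; it is not necessarily how `infer_frequency` reaches its result.

```python
from datetime import date


def month_steps(dates):
    """Return the number of calendar months between consecutive dates."""
    return [
        (b.year - a.year) * 12 + (b.month - a.month)
        for a, b in zip(dates, dates[1:])
    ]


times = [date(2000, 1, 31), date(2000, 2, 29), date(2000, 4, 30)]  # March is missing

# Day-based deltas differ widely because of the gap.
day_deltas = [(b - a).days for a, b in zip(times, times[1:])]
# Calendar-month steps are whole multiples of one month.
steps = month_steps(times)

print(day_deltas)  # [29, 61]
print(steps)       # [1, 2]
```

Every step is a whole number of months and the smallest step is one month, which is consistent with a monthly series that has a gap.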

- Separate merged test functions into distinct test cases
- Add test case for frequency inference with missing months
- Improve test documentation and organization
- Update frequency detection to correctly identify monthly patterns with missing months
- Improve handling of irregular time steps in monthly data
- Ensure consistent behavior across different calendar types
- Filter out zero deltas (duplicates) before frequency inference
- Prevent duplicates from dominating the median calculation
- Correctly infer 'M' for monthly data with duplicates
- Add new status 'all_duplicates' for time series that contain only duplicates
- Include comprehensive regression tests for duplicate scenarios
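
The effect of filtering zero deltas before taking the median can be sketched with the standard library alone. The function `median_delta_days` and its `drop_zero` parameter are assumptions made for illustration, not pymor's actual API.

```python
from datetime import date
from statistics import median


def median_delta_days(dates, drop_zero=True):
    """Median spacing in days, optionally ignoring zero deltas (duplicates)."""
    deltas = [(b - a).days for a, b in zip(dates, dates[1:])]
    if drop_zero:
        deltas = [d for d in deltas if d != 0]
    # No usable deltas left (e.g. every timestamp is a duplicate).
    return median(deltas) if deltas else None


# Monthly data in which the first timestamp appears three times.
times = [
    date(2000, 1, 31),
    date(2000, 1, 31),
    date(2000, 1, 31),
    date(2000, 2, 29),
    date(2000, 3, 31),
]

print(median_delta_days(times, drop_zero=False))  # 14.5
print(median_delta_days(times))                   # 30.0
```

With the duplicates included, the median falls well below one month; with them filtered out, it lands on a month-like spacing.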

Fixes an edge case where duplicate timestamps caused incorrect
frequency inference (e.g., returning '10D' instead of 'M' for
monthly data with duplicates, or 'None' when there were too many duplicates).
- Add detection of duplicate timestamps (zero deltas) in time series data
- Filter out zero deltas before calculating median for frequency inference
- Report 'irregular' status when duplicates are present instead of 'valid'
- Handle edge case where all timestamps are duplicates ('all_duplicates' status)
- Add comprehensive regression tests for duplicate handling scenarios
- Fix is_resolution_fine_enough to consistently return 'status' key in all paths
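
The duplicate-related statuses listed above could be modelled roughly as follows. The function `classify_status` is hypothetical and covers only the duplicate branches; the real code performs further checks that are omitted here.

```python
from datetime import date


def classify_status(dates):
    """Return a dict that always carries a 'status' key (duplicate checks only)."""
    deltas = [(b - a).days for a, b in zip(dates, dates[1:])]
    nonzero = [d for d in deltas if d != 0]

    if deltas and not nonzero:
        # Every consecutive pair is a duplicate.
        return {"status": "all_duplicates"}
    if len(nonzero) < len(deltas):
        # Some duplicates are present, so the series is not reported as 'valid'.
        return {"status": "irregular"}
    # Assumption: other irregularity checks would run here in a real implementation.
    return {"status": "valid"}


same = [date(2000, 1, 31)] * 3
with_dupes = [date(2000, 1, 31), date(2000, 1, 31), date(2000, 2, 29)]
clean = [date(2000, 1, 31), date(2000, 2, 29), date(2000, 3, 31)]

print(classify_status(same))        # {'status': 'all_duplicates'}
print(classify_status(with_dupes))  # {'status': 'irregular'}
print(classify_status(clean))       # {'status': 'valid'}
```

Returning a dict with the same key from every branch means callers can read `result["status"]` without first checking which path was taken.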

This enhancement improves robustness for real-world climate data that often
contains irregularities like duplicate timestamps, providing accurate frequency
detection and clear diagnostic feedback to users.
- Calculate delta_days even when xarray.infer_freq succeeds
- Convert delta_days from numpy.float64 to plain Python float
- Convert is_exact from numpy.bool_ to plain Python bool
- Improves data readability and consistency across inference paths
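
As a rough illustration of the type conversions in the list above, the sketch below casts NumPy scalars to built-in Python types before storing them. The `result` dict and the way `is_exact` is computed are assumptions for the example, not the PR's actual code.

```python
import numpy as np

deltas = np.array([29.0, 31.0])

delta_days = np.median(deltas)          # a numpy.float64 scalar
is_exact = np.all(deltas == deltas[0])  # a numpy.bool_ scalar

# Cast to built-in types so the result prints and serializes plainly.
result = {
    "delta_days": float(delta_days),
    "is_exact": bool(is_exact),
}

print(result)  # {'delta_days': 30.0, 'is_exact': False}
```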