
Conversation

cnhwl
Contributor

@cnhwl cnhwl commented Apr 23, 2025

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #2773

Summary

When a forecasting model uses `output_chunk_shift > 0` and `RegressionEnsembleModel` is given `regression_train_n_points == -1` (or some large value), the forecasting model's prediction fails with the following error:

        if self.output_chunk_shift and is_autoregression:
            raise_log(
                ValueError(
                    "Cannot perform auto-regression `(n > output_chunk_length)` with a model that uses a "
                    "shifted output chunk `(output_chunk_shift > 0)`."
                ),
                logger=logger,
            )

Therefore, this PR limits `regression_train_n_points` of `RegressionEnsembleModel` so that it is no larger than the forecasting model's `output_chunk_length`, and more precisely no larger than the forecasting model's `output_chunk_length` minus its maximum lag.
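The clamping idea can be sketched as follows (a minimal illustration; the function name and the `max_lag` parameter are assumptions for this sketch, not the actual Darts internals):

```python
def clamp_train_n_points(regression_train_n_points, output_chunk_length, max_lag):
    """Illustrative clamp for `regression_train_n_points` (hypothetical helper).

    The training forecasts for the ensemble must fit within a single output
    chunk of the base model; otherwise predict() would need auto-regression
    (n > output_chunk_length), which is forbidden when output_chunk_shift > 0.
    """
    upper = output_chunk_length - max_lag
    if regression_train_n_points == -1 or regression_train_n_points > upper:
        return upper
    return regression_train_n_points
```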


codecov bot commented Apr 23, 2025

Codecov Report

❌ Patch coverage is 68.42105% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.11%. Comparing base (033fafe) to head (4e82626).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ts/models/forecasting/regression_ensemble_model.py | 68.42% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2789      +/-   ##
==========================================
- Coverage   95.22%   95.11%   -0.11%     
==========================================
  Files         146      146              
  Lines       15573    15583      +10     
==========================================
- Hits        14829    14822       -7     
- Misses        744      761      +17     


Collaborator

@dennisbader dennisbader left a comment


Thanks for giving this a go @cnhwl. However, I think we need to adapt the proposed solution.

Here are some points:

  • It should be possible to train an ensemble model on base forecasting models that use an output_chunk_shift. Requirements:
    • All models must use the same output_chunk_shift value.
    • All models must use the same output_chunk_length value.
    • In case of base models using output_chunk_shift, the actual regression_model (the ensemble model) must also use the same output_chunk_shift. In that case we need to check that the future covariates lags for regression_model are {"future": [output_chunk_shift]} (see here)
  • After that: the first predict() call in RegressionEnsembleModel.fit() (see here) should probably not be performed when we use historical forecasts to fit the model. This predict call is anyway only used to validate that all series have the expected time index. Can we find another way to validate that all models have the required time frames? Maybe we can perform a check on the generated historical forecasts.
  • Given all of the above, the model should be able to generate the desired forecasts.
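The requirements above could be validated along these lines (a hedged sketch; the function name is hypothetical and attribute names mirror common Darts conventions, but this is not the actual patch):

```python
from types import SimpleNamespace  # stand-in for real model objects in this sketch


def validate_ensemble_models(forecasting_models, regression_model):
    """Illustrative validation of the ensemble requirements (sketch only)."""
    shifts = {m.output_chunk_shift for m in forecasting_models}
    lengths = {m.output_chunk_length for m in forecasting_models}
    if len(shifts) > 1:
        raise ValueError("All base models must use the same `output_chunk_shift`.")
    if len(lengths) > 1:
        raise ValueError("All base models must use the same `output_chunk_length`.")
    shift = shifts.pop()
    if shift > 0:
        # the ensemble's regression model must use the same shift, and its
        # future covariates lags must be exactly {"future": [shift]}
        if regression_model.output_chunk_shift != shift:
            raise ValueError(
                "`regression_model` must use the same `output_chunk_shift`."
            )
        if regression_model.lags.get("future") != [shift]:
            raise ValueError(
                "`regression_model` future covariates lags must be "
                f'{{"future": [{shift}]}}.'
            )


# demo with stand-in objects: same shift and length on all base models passes
base = [SimpleNamespace(output_chunk_shift=2, output_chunk_length=12)] * 2
reg = SimpleNamespace(output_chunk_shift=2, lags={"future": [2]})
validate_ensemble_models(base, reg)
```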

@cnhwl
Contributor Author

cnhwl commented Apr 30, 2025

> Thanks for giving this a go @cnhwl. However, I think we need to adapt the proposed solution. […requirements quoted in full above…]

Hi @dennisbader! I have implemented the three requirements by checking `output_chunk_shift` and `output_chunk_length`.
I still keep the code that handles the case `self.regression_model.output_chunk_length > self.forecasting_models[0].output_chunk_length - input_shift` to avoid auto-regression. If you have better ideas on series length assignment (`forecast_training` and `regression_target`), please let me know. 🤝
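The condition being kept can be illustrated with a small predicate (names are illustrative stand-ins; the real check lives inside `RegressionEnsembleModel`):

```python
def regression_needs_autoregression(regression_ocl, forecasting_ocl, input_shift):
    """True when the ensemble's regression model would require auto-regression:
    its output chunk is longer than what one (shifted) chunk of the base
    forecasting models can provide. Illustrative sketch only."""
    return regression_ocl > forecasting_ocl - input_shift
```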

Successfully merging this pull request may close these issues.

[BUG] RegressionEnsembleModel fails with base estimators that use output_chunk_shift > 0