12rambau commented Aug 2, 2025

This PR changes the implementation of the sorting algorithm. Sort operations can be chained in GEE, and I expect chaining to perform no worse than running our own bubble sort using computed indices. The only trick is to iterate over the properties in reverse order, because the last sort applied takes priority over the earlier ones.
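To illustrate the reverse-order trick outside of Earth Engine, here is a minimal pure-Python sketch. It assumes each individual sort is stable (ties keep their previous relative order), which is what makes the last sort applied dominate. The `sort_many` helper and the sample records are hypothetical and only mimic the idea; this is not the geetools implementation.

```python
from operator import itemgetter


def sort_many(records, properties):
    """Sort by several keys by chaining single-key stable sorts."""
    result = list(records)
    # Apply the least significant key first: the last sort applied dominates,
    # and stability preserves the order set by the earlier sorts on ties.
    for prop in reversed(properties):
        result = sorted(result, key=itemgetter(prop))
    return result


# Hypothetical records standing in for image properties.
records = [
    {"id": "a", "forecast_time": 2, "creation_time": 1},
    {"id": "b", "forecast_time": 1, "creation_time": 2},
    {"id": "c", "forecast_time": 1, "creation_time": 1},
    {"id": "d", "forecast_time": 2, "creation_time": 0},
]

chained = sort_many(records, ["forecast_time", "creation_time"])
# Reference result: a single sort on the tuple of both keys.
direct = sorted(records, key=itemgetter("forecast_time", "creation_time"))
```

Both `chained` and `direct` order the records by `forecast_time` first and break ties with `creation_time`, which is the behavior the chained approach is meant to preserve.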

I haven't run performance checks yet, but my guess is that the two are equivalent, and the new version is much easier to read.

Copilot AI left a comment

Pull Request Overview

This PR refactors the sortMany method in the ImageCollection class to use a simpler chaining approach with Google Earth Engine's native sort functionality instead of a custom bubble sort implementation using computed indices.

  • Replaces complex position computation logic with iterative chaining of GEE's native sort operations
  • Adds validation to ensure properties and ascending arrays are the same size
  • Simplifies the implementation while maintaining the same functionality for multi-property sorting
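The size validation and per-property direction described in the bullets above can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not the actual `sortMany` code: the `sort_many_with_order` name, the default of ascending for every property, and the error message are all hypothetical.

```python
def sort_many_with_order(records, properties, ascending=None):
    """Chain stable single-key sorts, one direction flag per property."""
    if ascending is None:
        # Assumption: every property is sorted ascending by default.
        ascending = [True] * len(properties)
    if len(properties) != len(ascending):
        raise ValueError("properties and ascending must have the same length")

    result = list(records)
    # Least significant property first, so the first property wins overall.
    for prop, asc in reversed(list(zip(properties, ascending))):
        result = sorted(result, key=lambda r: r[prop], reverse=not asc)
    return result


# Hypothetical records standing in for image properties.
rows = [
    {"id": "a", "forecast_time": 2, "creation_time": 1},
    {"id": "b", "forecast_time": 1, "creation_time": 2},
    {"id": "c", "forecast_time": 1, "creation_time": 1},
    {"id": "d", "forecast_time": 2, "creation_time": 0},
]

# forecast_time ascending, creation_time descending within ties.
mixed = sort_many_with_order(
    rows, ["forecast_time", "creation_time"], [True, False]
)
```

Checking the lengths up front means a mismatched call fails with a clear error instead of silently ignoring the extra properties or flags, which is what `zip` would otherwise do.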

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| geetools/ee_image_collection.py | Main implementation change - refactored sortMany method to use chained sorting |
| tests/test_ImageCollection.py | Updated test to validate error handling for mismatched array sizes |
| Multiple serialized test files | Updated expected outputs reflecting the new implementation structure |

Comments suppressed due to low confidence (1)

tests/test_ImageCollection.py:525

  • The test only validates one specific error case (mismatched array sizes), but doesn't test the actual sorting functionality with the new implementation. Consider adding a test that verifies the sorting behavior still works correctly.
        with pytest.raises(ValueError):

12rambau commented Aug 4, 2025

I ran some tests to benchmark this new solution against the previous one using the following code:

import ee
import geetools
import pandas as pd

ee.Initialize(project="ee-geetools")

with ee.geetools.Profiler() as p:
    ic = ee.ImageCollection("NOAA/GFS0P25")
    icSorted = (
        ic.limit(500)
        .geetools.sortMany(["forecast_time", "creation_time"])
        .limit(30)
    )
    info = icSorted.toList(icSorted.size()).map(lambda x: ee.Dictionary({
        "index": ee.Image(x).get("system:index"),
        "forecast_time": ee.Date(ee.Image(x).get("forecast_time")).format(),
        "creation_time": ee.Date(ee.Image(x).get("creation_time")).format()
    })).getInfo()

df = pd.DataFrame(p.profile)
df["EECU-s"].sum()

The two implementations are in a comparable range:

| | old implementation | new implementation |
| --- | --- | --- |
| EECU-s | 0.144 | 0.12 |

So I guess this modification doesn't make the computation cost explode.
