ryanhoangt

Reference Issues/PRs

Fix SWE-bench#377

What does this implement/fix? Explain your changes.

When running patch evaluation on Modal, I see that for some instances the contents of the captured test output files are out of order. This causes the test summary to fall outside the >>>>> Start Test Output and >>>>> End Test Output markers. A sample log file is attached below.

test_output_astropy__astropy-12907.txt

This PR removes the line that slices the content between the markers and instead parses the whole file content.
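A minimal Python sketch of the failure mode described above, assuming pytest-style summary lines of the form `PASSED <test>` / `FAILED <test>`. Only the two marker strings come from this PR; the function names (`parse_statuses`, `parse_sliced`, `parse_whole`) and the status regex are hypothetical and are not the harness's actual code. When the summary lands after the end marker, slicing between the markers loses it, while parsing the whole file keeps it.

```python
from __future__ import annotations

import re

# Marker strings as they appear in the evaluation logs described in this PR.
START_TEST_OUTPUT = ">>>>> Start Test Output"
END_TEST_OUTPUT = ">>>>> End Test Output"

# Hypothetical summary-line format (pytest "-rA" style): "PASSED path::test".
STATUS_LINE = re.compile(r"^(PASSED|FAILED|ERROR|SKIPPED)\s+(\S+)")


def parse_statuses(text: str) -> dict[str, str]:
    """Map each test name to its status by scanning every line of `text`."""
    statuses = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            status, test = match.groups()
            statuses[test] = status
    return statuses


def parse_sliced(content: str) -> dict[str, str]:
    """Sketch of the old behavior: only parse text between the two markers."""
    if START_TEST_OUTPUT in content and END_TEST_OUTPUT in content:
        content = content.split(START_TEST_OUTPUT, 1)[1].split(END_TEST_OUTPUT, 1)[0]
    return parse_statuses(content)


def parse_whole(content: str) -> dict[str, str]:
    """Sketch of the new behavior: parse the whole file, wherever the markers are."""
    return parse_statuses(content)


# A log where the captured output was reordered and the summary ended up
# after the end marker.
log = "\n".join([
    START_TEST_OUTPUT,
    "running tests...",
    END_TEST_OUTPUT,
    "PASSED tests/test_a.py::test_one",
    "FAILED tests/test_a.py::test_two",
])

print(parse_sliced(log))  # {}
print(parse_whole(log))   # both tests found
```

Parsing the whole file trades strictness for robustness to reordering: any status-like line printed outside the test run (for example during environment setup) would also be picked up.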

Any other comments?

🧡 Thanks for contributing!

sedrick-keh-tri and others added 30 commits March 24, 2025 23:28
Some of the repo_setup.sh scripts leave the working tree in a dirty state, which can make it difficult to generate a patch that applies cleanly. This change commits any outstanding changes so that any patch generated with `git diff` applies cleanly to a newly launched container during the evaluation step.
* Simplify installation guidelines for inference submodule

* Fixes SWE-bench#368

* Update version
* add docs

* Add leaderboard

* Remove unused import

* Update docs

* Update version

* Update docs
* Support multilingual evaluation

* CI: Fix documentation building vs deploying

* Minor fixes

* Remove some redundancy

* Update dataset ref

---------

Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de>
Co-authored-by: John Yang <byjohnyang@gmail.com>
…h#358)

* fix: preserve all issue references with same keyword in PRs

* Modified extract_resolved_issues to use a set instead of a list to store references

Co-authored-by: John Yang <byjohnyang@gmail.com>
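The commit above switches extract_resolved_issues to a set so that several references using the same keyword are all kept. A minimal Python sketch of that idea follows; the keyword list, the regex, and both function names are hypothetical. `extract_by_keyword` only illustrates one way references can be lost when results are keyed by keyword; it is not the previous implementation.

```python
from __future__ import annotations

import re

# Hypothetical closing keywords; the real extract_resolved_issues may differ.
KEYWORDS = {"close", "closes", "closed", "fix", "fixes", "fixed",
            "resolve", "resolves", "resolved"}
# A word followed by "#<number>", e.g. "Fixes #12".
REFERENCE = re.compile(r"(\w+)\s+#(\d+)")


def extract_issue_refs(text: str) -> set[str]:
    """Collect every issue number preceded by a closing keyword."""
    refs = set()
    for keyword, number in REFERENCE.findall(text):
        if keyword.lower() in KEYWORDS:
            refs.add(number)  # a set keeps all numbers and drops duplicates
    return refs


def extract_by_keyword(text: str) -> dict[str, str]:
    """Illustration only: keying by keyword overwrites earlier references."""
    refs = {}
    for keyword, number in REFERENCE.findall(text):
        if keyword.lower() in KEYWORDS:
            refs[keyword.lower()] = number
    return refs


body = "Fixes #12. Also fixes #15, and fixes #12 again; see #99."
print(sorted(extract_issue_refs(body)))  # ['12', '15']
print(extract_by_keyword(body))          # {'fixes': '12'}, so #15 is lost
```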
…t.py SWE-bench#368 (SWE-bench#369)

* fix prompt_col from text_inputs to text

* update log

---------

Co-authored-by: changqingai <changqingai@tencent.com>
Match the documentation for installing additional dependencies with the contents of `pyproject.toml`
carlosejimenez and others added 18 commits June 1, 2025 21:07
This action fails if more than one run is active at the same time (which happens if you merge multiple PRs in quick succession). The fix disables concurrent runs so that they queue up instead.
SWE-bench#417)

* fix(build): fix incorrect types-setuptools version in Python base image requirements when replacing version specs

* Update clean_requirements and clean_environment_yml patterns to remove version specs safely

---------

Co-authored-by: baixuran <baixuran@bytedance.com>
Co-authored-by: carlose <cjsaltlake@gmail.com>
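The commit above fixes an incorrect types-setuptools version produced when replacing version specs. The exact cause is not stated here, but one common pitfall with such replacements is matching a package name inside a longer one (setuptools inside types-setuptools). The Python sketch below uses a hypothetical `strip_version_spec` helper, not the repository's clean_requirements code, and strips a specifier only when the whole distribution name matches.

```python
import re


def strip_version_spec(requirement: str, package: str) -> str:
    """Drop the version specifier from `requirement` if it names `package`.

    Hypothetical helper. The pattern is anchored at the start of the line and
    requires a comparison operator right after the name, so 'setuptools' does
    not match 'types-setuptools' or 'setuptools-scm'. Environment markers
    (after ';') and comments (after '#') are left in place.
    """
    pattern = re.compile(
        rf"^(\s*{re.escape(package)})\s*[<>=!~][^;#]*?(?=\s*[;#]|\s*$)",
        re.IGNORECASE,
    )
    return pattern.sub(r"\1", requirement)


reqs = [
    "setuptools==65.5.0",
    "types-setuptools==65.5.0.3",
    "setuptools-scm>=7",
    "setuptools >= 60 ; python_version >= '3.8'",
]
for line in reqs:
    print(strip_version_spec(line, "setuptools"))
# setuptools
# types-setuptools==65.5.0.3
# setuptools-scm>=7
# setuptools ; python_version >= '3.8'
```

This sketch does not handle extras (`pkg[extra]`), URL requirements, or PEP 503 name normalization (treating `-`, `_` and `.` as equivalent).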
Development

Successfully merging this pull request may close these issues.

Inconsistent evaluation in Modal vs without using Modal