--rdma-mpi flag fix #996
base: master
Pull Request Overview
This PR implements a proper fix for the --rdma-mpi flag, which controls whether RDMA MPI tests are included in the test suite. Previously, RDMA MPI tests were always skipped; now they are added conditionally, based on the flag.
Key Changes
- Changed test filtering logic to conditionally add RDMA MPI tests instead of always skipping them
- Set default value for rdma_mpi argument to False in the global state
- Updated GitHub Actions workflow scripts to use the flag appropriately for different scenarios
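The conditional inclusion described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the toolchain's actual code: `g_arg`, `arg`, and `list_cases` are hypothetical stand-ins for the global state, the ARG accessor, and the case generator in cases.py, and the non-RDMA case title is a placeholder.

```python
# Hypothetical stand-ins for illustration; not MFC's real implementation.
g_arg = {"rdma_mpi": False}  # default mirrors the value this PR sets


def arg(name):
    """Look up a parsed command-line argument in the global state."""
    return g_arg[name]


def list_cases():
    """Build the list of test case titles, adding RDMA MPI cases on demand."""
    cases = ["2 MPI Ranks"]  # placeholder for the always-present cases
    if arg("rdma_mpi"):
        # Only generated when the flag is set in the global state.
        cases.append("2 MPI Ranks -> RDMA MPI")
    return cases


default_cases = list_cases()

g_arg["rdma_mpi"] = True  # simulates passing --rdma-mpi
rdma_cases = list_cases()
```

With the default in place, the lookup succeeds even when the flag was never set, and the RDMA MPI case only appears in `rdma_cases`.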
Reviewed Changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated no comments.
File | Description |
---|---|
toolchain/mfc/test/test.py | Removed code that unconditionally skipped RDMA MPI tests |
toolchain/mfc/test/cases.py | Added conditional logic to include RDMA MPI test cases only when flag is enabled |
toolchain/mfc/state.py | Set default value for rdma_mpi argument to False |
tests/FA4D8FEF/golden.txt | Test output data for RDMA MPI test case |
tests/FA4D8FEF/golden-metadata.txt | Test metadata for RDMA MPI test case |
tests/2C9844EF/golden-metadata.txt | Test metadata for RDMA MPI test case |
tests/1B300F28/golden-metadata.txt | Test metadata for RDMA MPI test case |
.github/workflows/frontier/test.sh | Updated test command to use --rdma-mpi flag conditionally |
.github/workflows/frontier/build.sh | Updated build command to use --rdma-mpi flag |
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
if ARG("rdma_mpi"):
    cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))
Suggestion: Guard against non-boolean or string values from ARG("rdma_mpi") to avoid truthy string pitfalls. Normalize the flag to a strict boolean before the conditional. [possible issue, importance: 6]
-if ARG("rdma_mpi"):
-    cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))
+rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on")
+if rdma_enabled:
+    cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))
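The pitfall this suggestion has in mind is that any non-empty string is truthy in Python, including "False". Whether ARG("rdma_mpi") can ever be a string depends on how the flag is parsed, which this thread does not show. If it is declared as an argparse store_true flag (an assumption), the value is already a bool and the extra normalization adds nothing. A quick sketch of both situations, using a throwaway parser rather than the toolchain's:

```python
import argparse

# A non-empty string is truthy regardless of what it spells.
string_flag_is_truthy = bool("False")

# Throwaway parser for illustration only; not the toolchain's parser.
parser = argparse.ArgumentParser()
parser.add_argument("--rdma-mpi", action="store_true", default=False)

# argparse maps --rdma-mpi to the attribute name rdma_mpi.
absent = parser.parse_args([]).rdma_mpi
present = parser.parse_args(["--rdma-mpi"]).rdma_mpi
```

Here `absent` and `present` are real booleans, so a plain `if` on them behaves as expected; only a string-valued flag would need the normalization.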
if ARG("rdma_mpi"):
    cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))
Suggestion: Mirror the boolean normalization here to keep behavior consistent across 2D and 3D paths. This prevents accidental enabling when ARG("rdma_mpi") is a non-empty string. [general, importance: 6]
-if ARG("rdma_mpi"):
-    cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))
+rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on")
+if rdma_enabled:
+    cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))
@@ -50,7 +50,7 @@ def __str__(self) -> str:

 gCFG: MFCConfig = MFCConfig()
-gARG: dict = {}
+gARG: dict = {"rdma_mpi": False}
Suggestion: Ensure the default aligns with expected type usage by explicitly using a boolean and documenting accepted truthy strings. Add a simple helper to parse flags once and reuse. [general, importance: 7]
 gARG: dict = {"rdma_mpi": False}
+
+def arg_bool(name: str, default: bool = False) -> bool:
+    val = ARG(name, default)
+    if isinstance(val, bool):
+        return val
+    return str(val).lower() in ("1", "t", "true", "yes", "y", "on")
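The helper proposed above depends on ARG from the toolchain. To try its behavior in isolation, here is a self-contained sketch in which `g_arg` and `arg` are hypothetical stand-ins for the global dictionary and its accessor; the accepted truthy spellings are copied from the suggestion, and the extra keys exist only for the demonstration.

```python
# Hypothetical stand-ins for the toolchain's global state and accessor.
g_arg = {"rdma_mpi": False}

TRUTHY = ("1", "t", "true", "yes", "y", "on")


def arg(name, default=None):
    """Return the stored argument, or the default when it is absent."""
    return g_arg.get(name, default)


def arg_bool(name, default=False):
    """Normalize a stored argument to a strict bool."""
    val = arg(name, default)
    if isinstance(val, bool):
        return val
    # Anything else is compared, case-insensitively, against known spellings.
    return str(val).lower() in TRUTHY


# Demonstration-only entries covering bool and string inputs.
g_arg["from_bool"] = True
g_arg["from_text"] = "Yes"
g_arg["from_false_text"] = "False"
```

Parsing once through a helper like this keeps the 2D and 3D call sites identical, at the cost of an extra function that is only useful if string values can actually reach the global state.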
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #996 +/- ##
=======================================
Coverage 40.91% 40.91%
=======================================
Files 70 70
Lines 20270 20270
Branches 2520 2520
=======================================
Hits 8293 8293
Misses 10439 10439
Partials 1538 1538
Nothing else to add, ready for review and merge.
.github/workflows/frontier/test.sh
Outdated
 else
-    ./mfc.sh test -a --rdma-mpi --max-attempts 3 -j 32 -- -c frontier
+    ./mfc.sh test --max-attempts 3 -j 32 -- -c frontier
you removed testing of post_process on frontier (that's what -a does). please put it back 😄
User description
Description
Proper implementation of the --rdma-mpi flag for the self-hosted Frontier GPU test. The flag is intended to append the additional tests when needed, instead of always including them and then skipping them. Also, ./mfc.sh test --rdma-mpi --gpu -- -c frontier worked as anticipated.
Subsequent to (#878)
PR Type
Enhancement, Tests
Description
• Fixed --rdma-mpi flag implementation to properly append additional tests instead of including/skipping
• Added conditional logic to RDMA MPI test case generation in cases.py using ARG("rdma_mpi") check
• Initialized default rdma_mpi: False value in state configuration
• Updated frontier workflow commands by removing -a flags and adjusting --rdma-mpi flag usage
• Added new golden reference test data files and metadata for test cases FA4D8FEF, 1B300F28, and 2C9844EF
• Enhanced test metadata to include CMake configuration, environment variables, and CPU details for RDMA-MPI GPU tests
File Walkthrough
1 file
cases.py
Conditional RDMA MPI test case generation
toolchain/mfc/test/cases.py
• Added import for ARG from ..state module
• Wrapped RDMA MPI test case generation with conditional check using ARG("rdma_mpi")
• Applied conditional logic to both 3D and 2D test case scenarios
3 files
state.py
Initialize RDMA MPI flag default value
toolchain/mfc/state.py
• Initialized gARG dictionary with default rdma_mpi: False value
test.sh
Update frontier test workflow commands
.github/workflows/frontier/test.sh
• Removed -a flag from GPU test command
• Removed --rdma-mpi flag from CPU test command
build.sh
Update frontier build workflow command
.github/workflows/frontier/build.sh
• Removed -a flag from test command while keeping --rdma-mpi flag
4 files
golden.txt
Add golden reference test data
tests/FA4D8FEF/golden.txt
• Added new golden reference file with numerical test data
• Contains conservative and primitive variable data at different time steps
golden-metadata.txt
Add test metadata for golden reference
tests/FA4D8FEF/golden-metadata.txt
• Added test metadata file with build configuration and system information
• Contains CMake configuration, environment variables, and CPU details
golden-metadata.txt
Add additional test metadata file
tests/1B300F28/golden-metadata.txt
• Added another test metadata file with identical system configuration
• Contains same build and environment information as FA4D8FEF test
golden-metadata.txt
Add golden metadata file for RDMA-MPI GPU test configuration
tests/2C9844EF/golden-metadata.txt
• Added a new golden metadata file containing test configuration and system information
• Includes CMake configuration details for pre_process, simulation, syscheck, and post_process modules
• Contains CPU architecture information from an AMD EPYC 7A53 64-Core Processor system
• Records test invocation with --rdma-mpi and --gpu flags on frontier configuration
3 files