`--rdma-mpi` flag fix #996

Malmahrouqi3 · 2025-09-07T21:44:29Z

User description

Description

Proper implementation of the flag for self-hosted frontier gpu test. Simply, it is intended to append the additional tests when needed instead of including/skipping. Also, ./mfc.sh test --rdma-mpi --gpu -- -c frontier worked as anticipated.

Subsequent to (#878)

PR Type

Enhancement, Tests

Description

• Fixed --rdma-mpi flag implementation to properly append additional tests instead of including/skipping
• Added conditional logic to RDMA MPI test case generation in cases.py using ARG("rdma_mpi") check
• Initialized default rdma_mpi: False value in state configuration
• Updated frontier workflow commands by removing -a flags and adjusting --rdma-mpi flag usage
• Added new golden reference test data files and metadata for test cases FA4D8FEF, 1B300F28, and 2C9844EF
• Enhanced test metadata to include CMake configuration, environment variables, and CPU details for RDMA-MPI GPU tests

Diagram Walkthrough

flowchart LR
  A["state.py"] -- "initialize default" --> B["rdma_mpi: False"]
  B --> C["cases.py"]
  C -- "conditional check" --> D["ARG('rdma_mpi')"]
  D -- "wrap test generation" --> E["RDMA MPI test cases"]
  F["frontier workflows"] -- "remove -a flags" --> G["updated commands"]
  H["new test files"] --> I["golden reference data"]

File Walkthrough

Relevant files

Enhancement

1 files

cases.py `Conditional RDMA MPI test case generation` toolchain/mfc/test/cases.py • Added import for `ARG` from `..state` module • Wrapped RDMA MPI test case generation with conditional check using `ARG("rdma_mpi")` • Applied conditional logic to both 3D and 2D test case scenarios	+5/-2

Configuration changes

3 files

state.py `Initialize RDMA MPI flag default value` toolchain/mfc/state.py • Initialized `gARG` dictionary with default `rdma_mpi: False` value	+1/-1
test.sh `Update frontier test workflow commands` .github/workflows/frontier/test.sh • Removed `-a` flag from GPU test command • Removed `--rdma-mpi` flag from CPU test command	+2/-2
build.sh `Update frontier build workflow command` .github/workflows/frontier/build.sh • Removed `-a` flag from test command while keeping `--rdma-mpi` flag	+1/-1

Tests

4 files

golden.txt `Add golden reference test data` tests/FA4D8FEF/golden.txt • Added new golden reference file with numerical test data • Contains conservative and primitive variable data at different time steps	+32/-0
golden-metadata.txt `Add test metadata for golden reference` tests/FA4D8FEF/golden-metadata.txt • Added test metadata file with build configuration and system information • Contains CMake configuration, environment variables, and CPU details	+189/-0
golden-metadata.txt `Add additional test metadata file` tests/1B300F28/golden-metadata.txt • Added another test metadata file with identical system configuration • Contains same build and environment information as FA4D8FEF test	+189/-0
golden-metadata.txt `Add golden metadata file for RDMA-MPI GPU test configuration` tests/2C9844EF/golden-metadata.txt • Added a new golden metadata file containing test configuration and system information • Includes CMake configuration details for pre_process, simulation, syscheck, and post_process modules • Contains CPU architecture information from an AMD EPYC 7A53 64-Core Processor system • Records test invocation with `--rdma-mpi` and `--gpu` flags on frontier configuration	+189/-0

Additional files

3 files

golden.txt	+20/-0
golden.txt	+23/-0
test.py	+0/-5

Copilot

Pull Request Overview

This PR implements a proper fix for the --rdma-mpi flag that controls whether RDMA MPI tests are included in the test suite. Previously, RDMA MPI tests were always skipped, but now they are conditionally added based on the flag.

Key Changes

Changed test filtering logic to conditionally add RDMA MPI tests instead of always skipping them
Set default value for rdma_mpi argument to False in the global state
Updated GitHub Actions workflow scripts to use the flag appropriately for different scenarios

Reviewed Changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
toolchain/mfc/test/test.py	Removed code that unconditionally skipped RDMA MPI tests
toolchain/mfc/test/cases.py	Added conditional logic to include RDMA MPI test cases only when flag is enabled
toolchain/mfc/state.py	Set default value for `rdma_mpi` argument to `False`
tests/FA4D8FEF/golden.txt	Test output data for RDMA MPI test case
tests/FA4D8FEF/golden-metadata.txt	Test metadata for RDMA MPI test case
tests/2C9844EF/golden-metadata.txt	Test metadata for RDMA MPI test case
tests/1B300F28/golden-metadata.txt	Test metadata for RDMA MPI test case
.github/workflows/frontier/test.sh	Updated test command to use `--rdma-mpi` flag conditionally
.github/workflows/frontier/build.sh	Updated build command to use `--rdma-mpi` flag

qodo-merge-pro · 2025-09-07T21:45:41Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Conditional Coverage RDMA-MPI cases are now appended only when ARG('rdma_mpi') is true. Please verify that all intended RDMA-MPI scenarios (both 3D and non-3D branches) are covered and that other generators don’t need similar conditional additions to avoid silent gaps. if ARG("rdma_mpi"): cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2)) else: cases.append(define_case_d(stack, '2 MPI Ranks', {}, ppn=2)) if ARG("rdma_mpi"): cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2)) Default Behavior Setting gARG with a default {'rdma_mpi': False} may mask missing CLI parsing. Confirm that command-line parsing properly toggles this flag and that no other expected defaults are overwritten or shadowed. gARG: dict = {"rdma_mpi": False} def ARG(arg: str, dflt = None) -> typing.Any: # pylint: disable=global-variable-not-assigned Workflow Parity The CPU path no longer passes --rdma-mpi, while GPU does. Confirm this is intentional and that CPU RDMA-MPI tests aren’t required. Also validate removal of -a doesn’t drop needed test groups. ./mfc.sh test --rdma-mpi --max-attempts 3 -j $ngpus -- -c frontier else ./mfc.sh test --max-attempts 3 -j 32 -- -c frontier fi

qodo-merge-pro · 2025-09-07T21:46:21Z

toolchain/mfc/test/cases.py

+            if ARG("rdma_mpi"):
+                cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))


Suggestion: Guard against non-boolean or string values from ARG("rdma_mpi") to avoid truthy string pitfalls. Normalize the flag to a strict boolean before the conditional. [possible issue, importance: 6]

Suggested change

if ARG("rdma_mpi"):

cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))

rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on")

if rdma_enabled:

cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))

qodo-merge-pro · 2025-09-07T21:46:22Z

toolchain/mfc/test/cases.py

+            if ARG("rdma_mpi"):
+                cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))


Suggestion: Mirror the boolean normalization here to keep behavior consistent across 2D and 3D paths. This prevents accidental enabling when ARG("rdma_mpi") is a non-empty string. [general, importance: 6]

Suggested change

if ARG("rdma_mpi"):

cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))

rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on")

if rdma_enabled:

cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))

qodo-merge-pro · 2025-09-07T21:46:22Z

toolchain/mfc/state.py

@@ -50,7 +50,7 @@ def __str__(self) -> str:


 gCFG: MFCConfig = MFCConfig()
-gARG: dict      = {}
+gARG: dict      = {"rdma_mpi": False}


Suggestion: Ensure the default aligns with expected type usage by explicitly using a boolean and documenting accepted truthy strings. Add a simple helper to parse flags once and reuse. [general, importance: 7]

Suggested change

gARG: dict = {"rdma_mpi": False}

gARG: dict = {"rdma_mpi": False}

def arg_bool(name: str, default: bool = False) -> bool:

val = ARG(name, default)

if isinstance(val, bool):

return val

return str(val).lower() in ("1", "t", "true", "yes", "y", "on")

codecov · 2025-09-08T00:13:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.91%. Comparing base (1bf4e9a) to head (6c4e8cf).

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #996   +/-   ##
=======================================
  Coverage   40.91%   40.91%           
=======================================
  Files          70       70           
  Lines       20270    20270           
  Branches     2520     2520           
=======================================
  Hits         8293     8293           
  Misses      10439    10439           
  Partials     1538     1538

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Malmahrouqi3 · 2025-09-08T14:28:48Z

Nothing else to add, ready for review and merge.

sbryngelson · 2025-09-10T14:52:40Z

.github/workflows/frontier/test.sh

 else
-    ./mfc.sh test -a --rdma-mpi --max-attempts 3 -j 32 -- -c frontier
+    ./mfc.sh test --max-attempts 3 -j 32 -- -c frontier


you removed testing of post_process on frontier (that's what -a does). please put it back 😄

Malmahrouqi3 added 3 commits September 7, 2025 13:57

updated fixes

b0ea72d

improvement

fa58f9b

workflow files

98b4e4a

Copilot AI review requested due to automatic review settings September 7, 2025 21:44

Malmahrouqi3 requested review from a team and sbryngelson as code owners September 7, 2025 21:44

Copilot AI reviewed Sep 7, 2025

View reviewed changes

qodo-merge-pro bot added the Review effort 2/5 label Sep 7, 2025

sbryngelson approved these changes Sep 7, 2025

View reviewed changes

qodo-merge-pro bot reviewed Sep 7, 2025

View reviewed changes

Malmahrouqi3 and others added 2 commits September 7, 2025 23:30

re-generated golden file

e7348b5

Merge branch 'master' into RDMA-fix

0518a4f

sbryngelson approved these changes Sep 8, 2025

View reviewed changes

sbryngelson reviewed Sep 10, 2025

View reviewed changes

Malmahrouqi3 and others added 2 commits September 10, 2025 12:47

added -a for post_process testing

07d217b

Merge branch 'master' into RDMA-fix

6c4e8cf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`--rdma-mpi` flag fix #996

`--rdma-mpi` flag fix #996

Uh oh!

Malmahrouqi3 commented Sep 7, 2025 •

edited by qodo-merge-pro bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

qodo-merge-pro bot commented Sep 7, 2025

Uh oh!

qodo-merge-pro bot Sep 7, 2025

Uh oh!

qodo-merge-pro bot Sep 7, 2025

Uh oh!

qodo-merge-pro bot Sep 7, 2025

Uh oh!

codecov bot commented Sep 8, 2025 •

edited

Loading

Uh oh!

Malmahrouqi3 commented Sep 8, 2025

Uh oh!

sbryngelson Sep 10, 2025

Uh oh!

Uh oh!

		if ARG("rdma_mpi"):
		cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))

-gARG: dict      = {"rdma_mpi": False}
+gARG: dict = {"rdma_mpi": False}
+def arg_bool(name: str, default: bool = False) -> bool:
+    val = ARG(name, default)
+    if isinstance(val, bool):
+        return val
+    return str(val).lower() in ("1", "t", "true", "yes", "y", "on")

--rdma-mpi flag fix #996

Are you sure you want to change the base?

--rdma-mpi flag fix #996

Uh oh!

Conversation

Malmahrouqi3 commented Sep 7, 2025 • edited by qodo-merge-pro bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Description

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Key Changes

Reviewed Changes

Uh oh!

qodo-merge-pro bot commented Sep 7, 2025

PR Reviewer Guide 🔍

Uh oh!

qodo-merge-pro bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

qodo-merge-pro bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

qodo-merge-pro bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Malmahrouqi3 commented Sep 8, 2025

Uh oh!

sbryngelson Sep 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

`--rdma-mpi` flag fix #996

`--rdma-mpi` flag fix #996

Malmahrouqi3 commented Sep 7, 2025 •

edited by qodo-merge-pro bot

Loading

codecov bot commented Sep 8, 2025 •

edited

Loading