Skip to content

Conversation

Malmahrouqi3
Copy link
Collaborator

@Malmahrouqi3 Malmahrouqi3 commented Sep 7, 2025

User description

Description

Proper implementation of the flag for self-hosted frontier gpu test. Simply, it is intended to append the additional tests when needed instead of including/skipping. Also, ./mfc.sh test --rdma-mpi --gpu -- -c frontier worked as anticipated.

Subsequent to (#878)


PR Type

Enhancement, Tests


Description

• Fixed --rdma-mpi flag implementation to properly append additional tests instead of including/skipping
• Added conditional logic to RDMA MPI test case generation in cases.py using ARG("rdma_mpi") check
• Initialized default rdma_mpi: False value in state configuration
• Updated frontier workflow commands by removing -a flags and adjusting --rdma-mpi flag usage
• Added new golden reference test data files and metadata for test cases FA4D8FEF, 1B300F28, and 2C9844EF
• Enhanced test metadata to include CMake configuration, environment variables, and CPU details for RDMA-MPI GPU tests


Diagram Walkthrough

flowchart LR
  A["state.py"] -- "initialize default" --> B["rdma_mpi: False"]
  B --> C["cases.py"]
  C -- "conditional check" --> D["ARG('rdma_mpi')"]
  D -- "wrap test generation" --> E["RDMA MPI test cases"]
  F["frontier workflows"] -- "remove -a flags" --> G["updated commands"]
  H["new test files"] --> I["golden reference data"]
Loading

File Walkthrough

Relevant files
Enhancement
1 files
cases.py
Conditional RDMA MPI test case generation                               

toolchain/mfc/test/cases.py

• Added import for ARG from ..state module
• Wrapped RDMA MPI test
case generation with conditional check using ARG("rdma_mpi")
• Applied
conditional logic to both 3D and 2D test case scenarios

+5/-2     
Configuration changes
3 files
state.py
Initialize RDMA MPI flag default value                                     

toolchain/mfc/state.py

• Initialized gARG dictionary with default rdma_mpi: False value

+1/-1     
test.sh
Update frontier test workflow commands                                     

.github/workflows/frontier/test.sh

• Removed -a flag from GPU test command
• Removed --rdma-mpi flag from
CPU test command

+2/-2     
build.sh
Update frontier build workflow command                                     

.github/workflows/frontier/build.sh

• Removed -a flag from test command while keeping --rdma-mpi flag

+1/-1     
Tests
4 files
golden.txt
Add golden reference test data                                                     

tests/FA4D8FEF/golden.txt

• Added new golden reference file with numerical test data
• Contains
conservative and primitive variable data at different time steps

+32/-0   
golden-metadata.txt
Add test metadata for golden reference                                     

tests/FA4D8FEF/golden-metadata.txt

• Added test metadata file with build configuration and system
information
• Contains CMake configuration, environment variables, and
CPU details

+189/-0 
golden-metadata.txt
Add additional test metadata file                                               

tests/1B300F28/golden-metadata.txt

• Added another test metadata file with identical system configuration

• Contains same build and environment information as FA4D8FEF test

+189/-0 
golden-metadata.txt
Add golden metadata file for RDMA-MPI GPU test configuration

tests/2C9844EF/golden-metadata.txt

• Added a new golden metadata file containing test configuration and
system information
• Includes CMake configuration details for
pre_process, simulation, syscheck, and post_process modules
• Contains
CPU architecture information from an AMD EPYC 7A53 64-Core Processor
system
• Records test invocation with --rdma-mpi and --gpu flags on
frontier configuration

+189/-0 
Additional files
3 files
golden.txt +20/-0   
golden.txt +23/-0   
test.py +0/-5     

@Copilot Copilot AI review requested due to automatic review settings September 7, 2025 21:44
@Malmahrouqi3 Malmahrouqi3 requested review from a team and sbryngelson as code owners September 7, 2025 21:44
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a proper fix for the --rdma-mpi flag that controls whether RDMA MPI tests are included in the test suite. Previously, RDMA MPI tests were always skipped, but now they are conditionally added based on the flag.

Key Changes

  • Changed test filtering logic to conditionally add RDMA MPI tests instead of always skipping them
  • Set default value for rdma_mpi argument to False in the global state
  • Updated GitHub Actions workflow scripts to use the flag appropriately for different scenarios

Reviewed Changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated no comments.

Show a summary per file
File Description
toolchain/mfc/test/test.py Removed code that unconditionally skipped RDMA MPI tests
toolchain/mfc/test/cases.py Added conditional logic to include RDMA MPI test cases only when flag is enabled
toolchain/mfc/state.py Set default value for rdma_mpi argument to False
tests/FA4D8FEF/golden.txt Test output data for RDMA MPI test case
tests/FA4D8FEF/golden-metadata.txt Test metadata for RDMA MPI test case
tests/2C9844EF/golden-metadata.txt Test metadata for RDMA MPI test case
tests/1B300F28/golden-metadata.txt Test metadata for RDMA MPI test case
.github/workflows/frontier/test.sh Updated test command to use --rdma-mpi flag conditionally
.github/workflows/frontier/build.sh Updated build command to use --rdma-mpi flag

Copy link

qodo-merge-pro bot commented Sep 7, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Conditional Coverage

RDMA-MPI cases are now appended only when ARG('rdma_mpi') is true. Please verify that all intended RDMA-MPI scenarios (both 3D and non-3D branches) are covered and that other generators don’t need similar conditional additions to avoid silent gaps.

    if ARG("rdma_mpi"):
        cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))
else:
    cases.append(define_case_d(stack, '2 MPI Ranks', {}, ppn=2))
    if ARG("rdma_mpi"):
        cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))
Default Behavior

Setting gARG with a default {'rdma_mpi': False} may mask missing CLI parsing. Confirm that command-line parsing properly toggles this flag and that no other expected defaults are overwritten or shadowed.

gARG: dict      = {"rdma_mpi": False}

def ARG(arg: str, dflt = None) -> typing.Any:
    # pylint: disable=global-variable-not-assigned
Workflow Parity

The CPU path no longer passes --rdma-mpi, while GPU does. Confirm this is intentional and that CPU RDMA-MPI tests aren’t required. Also validate removal of -a doesn’t drop needed test groups.

    ./mfc.sh test --rdma-mpi --max-attempts 3 -j $ngpus -- -c frontier
else
    ./mfc.sh test --max-attempts 3 -j 32 -- -c frontier
fi

Comment on lines +366 to +367
if ARG("rdma_mpi"):
cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Guard against non-boolean or string values from ARG("rdma_mpi") to avoid truthy string pitfalls. Normalize the flag to a strict boolean before the conditional. [possible issue, importance: 6]

Suggested change
if ARG("rdma_mpi"):
cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))
rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on")
if rdma_enabled:
cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2))

Comment on lines +370 to +371
if ARG("rdma_mpi"):
cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Mirror the boolean normalization here to keep behavior consistent across 2D and 3D paths. This prevents accidental enabling when ARG("rdma_mpi") is a non-empty string. [general, importance: 6]

Suggested change
if ARG("rdma_mpi"):
cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))
rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on")
if rdma_enabled:
cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2))

@@ -50,7 +50,7 @@ def __str__(self) -> str:


gCFG: MFCConfig = MFCConfig()
gARG: dict = {}
gARG: dict = {"rdma_mpi": False}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Ensure the default aligns with expected type usage by explicitly using a boolean and documenting accepted truthy strings. Add a simple helper to parse flags once and reuse. [general, importance: 7]

Suggested change
gARG: dict = {"rdma_mpi": False}
gARG: dict = {"rdma_mpi": False}
def arg_bool(name: str, default: bool = False) -> bool:
val = ARG(name, default)
if isinstance(val, bool):
return val
return str(val).lower() in ("1", "t", "true", "yes", "y", "on")

Copy link

codecov bot commented Sep 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.91%. Comparing base (1bf4e9a) to head (6c4e8cf).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #996   +/-   ##
=======================================
  Coverage   40.91%   40.91%           
=======================================
  Files          70       70           
  Lines       20270    20270           
  Branches     2520     2520           
=======================================
  Hits         8293     8293           
  Misses      10439    10439           
  Partials     1538     1538           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Malmahrouqi3
Copy link
Collaborator Author

Nothing else to add, ready for review and merge.

else
./mfc.sh test -a --rdma-mpi --max-attempts 3 -j 32 -- -c frontier
./mfc.sh test --max-attempts 3 -j 32 -- -c frontier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you removed testing of post_process on frontier (that's what -a does). please put it back 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants