Skip to content

Conversation

ScottTodd
Copy link
Member

@ScottTodd ScottTodd commented Aug 5, 2025

Progress on ROCm/TheRock#1200. Tested downstream (via a patch) in ROCm/TheRock#1201.

The HIP SDK sets the HIP_PATH environment variable, which hipcc and hipconfig were using instead of neighboring files.

A robust way to handle this would be to stamp all information about related tools into the tools themselves during packaging, rather than use global state like environment variables or heuristics like scanning the file system. I think once we've progressed further towards rolling out the new ROCm "superrepos" and open source CI/CD we can make those deeper changes. For now, I've extended the brittle logic with another special case.

@ScottTodd ScottTodd marked this pull request as ready for review August 5, 2025 22:52
ScottTodd added a commit to ROCm/TheRock that referenced this pull request Aug 5, 2025
Fixes #1200. Sent upstream for
review as well at ROCm/llvm-project#289.

The HIP SDK sets the `HIP_PATH` environment variable, which hipcc and
hipconfig were using instead of neighboring files. This resulted in
errors like this when the HIP SDK was installed:
```
λ .\build\compiler\hipcc\dist\bin\hipcc --version
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.
The system cannot find the path specified.
'nvcc' is not recognized as an internal or external command,
operable program or batch file.
Device not supported - Defaulting to AMD
HIP version: 6.2.41512-db3292736
The system cannot find the path specified.
failed to execute:""C:\Program Files\AMD\ROCm\6.2\lib\llvm\bin\clang.exe"  --driver-mode=g++ -O3 -fuse-ld=lld --ld-path="C:\Program Files\AMD\ROCm\6.2\lib\llvm\bin\lld-link.exe" -Llib --hip-link  --version"
```

A robust way to handle this would be to stamp all information about
related tools into the tools themselves during packaging, rather than
use global state like environment variables or heuristics like scanning
the file system. I think once we've progressed further towards rolling
out the new ROCm "superrepos" and open source CI/CD we can make those
deeper changes. For now, I've extended the brittle logic with another
special case.
@ScottTodd
Copy link
Member Author

Ping? Does anyone monitor this repository? Who should review this?

@lamb-j
Copy link
Collaborator

lamb-j commented Sep 11, 2025

Taking a look. thanks for the ping!

@ScottTodd
Copy link
Member Author

Thanks! We've been carrying this patch in TheRock for some time now: https://github.com/ROCm/TheRock/blob/main/patches/amd-mainline/llvm-project/0005-Rework-constructHipPath-so-HIP_PATH-env-var-is-lower.patch and we're making a push to get to 0 patches across LLVM, rocm-libraries, rocm-systems, etc.

@david-salinas
Copy link

I have some concerns. The original issue was on windows, but this change is OS agnostic. Granted the hip_path and rocm_path variables and their purpose/use is confusing. But I think what you really want to change is rocm_path, as that is what is used to find clang. I'd also suggest looking at the function "constructCompilerPath()", that function has similar changes and maybe is a better place to put this change? For reference, how and when hipcc uses these variables is documented here: https://rocmdocs.amd.com/projects/llvm-project/en/latest/LLVM/clang/html/HIPSupport.html#path-setting-for-dependencies . Also is the issue you're trying to solve the case where you build hipcc locally and you don't want that built hipcc to use the ROCm installed clang? Couldn't that just be solved by setting the ROCM_PATH as needed?

@ScottTodd
Copy link
Member Author

cc @stellaraccident

Also is the issue you're trying to solve the case where you build hipcc locally and you don't want that built hipcc to use the ROCm installed clang? Couldn't that just be solved by setting the ROCM_PATH as needed?

No, without this or an equivalent patch there is a bug where installs through ROCm Python packages (a new recommended way to use ROCm on Linux and Windows) were escaping their sandbox and looking at system-installed packages. The way those system-installed packages were called happened to have a syntax error (not escaping spaces in the path), but the sandbox escape in general was the larger bug.

Unfortunately, many users and developers have those environment variables already set by installers and scripts, pointing to global (potentially old) system installs. Development builds and self-contained installs (e.g. in Python virtual environments) need the ability to reference their own tools.

Thanks for the link to the docs... yes, I think this is a problem:

Order of Precedence for HIP Path

  1. --hip-path compiler option
  2. HIP_PATH environment variable (use with caution)
  3. --rocm-path compiler option
  4. ROCM_PATH environment variable (use with caution)
  5. Default automatic detection (relative to Clang or at the default ROCm installation location)

Even if we recommend users not install system-wide at the same time as they install self-contained packages (of which there may be many on the system), there would still be this dangerous failure mode where the global environment variable would change behavior.

The original issue was on windows, but this change is OS agnostic. Granted the hip_path and rocm_path variables and their purpose/use is confusing. But I think what you really want to change is rocm_path, as that is what is used to find clang. I'd also suggest looking at the function "constructCompilerPath()", that function has similar changes and maybe is a better place to put this change?

I'll look around to see if changes to those functions would also work.

@ScottTodd
Copy link
Member Author

Maybe we could bake some --hip-path or --rocm-path settings into the packages? I'm not sure... these environment variables are some pretty big tech debt that we've been trying to fix.

@ScottTodd
Copy link
Member Author

I think by default all ROCm tools should know where other ROCm tools in the same install are located and should call them directly. Relying on the system path and environment variables to poke around elsewhere on disk is very brittle.

@david-salinas
Copy link

@ScottTodd So I do think there is something wrong with the logic in "constructHipPath()" that needs to be fixed. Ideally if we follow the order of precedence documented, we should do the following:

  1. use the value in compiler option --hip-path
  2. if no --hip-path option, use HIP_PATH env var
  3. if no HIP_PATH env var, use --rocm-path option
  4. if no --rocm-path option, use ROCM_PATH env var
  5. if no ROCM_PATH env var, detect based on self location path - which changes between Windows and Linux. Windows SDK doesn't have llvm/lib path.

We're close to that, but I think we're missing the check for rocm_path. Note, the code to handle the env vars and options, is not pretty. We have a "variables" struct as well as a "envVariables". The former is used as a place to hold the final/reconciled values, and the envVariables is used to just hold the actual (shell) environment vars.

But you should also note that the hipPath is primarily used to find the HIP runtime and its header files. Or at least it should. I think part of the problem is that we also used hip-path to help find llvm/clang because the value of hip-path has precedence over rocm-path. I think we did this specifically on Windows, because we package the HIP RT, headers, and compiler (clang) all in the same location as part of the HIP_SDK package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants