-
Notifications
You must be signed in to change notification settings - Fork 73
Rework constructHipPath so HIP_PATH env var is lower priority. #289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: amd-staging
Are you sure you want to change the base?
Conversation
Fixes #1200. Sent upstream for review as well at ROCm/llvm-project#289. The HIP SDK sets the `HIP_PATH` environment variable, which hipcc and hipconfig were using instead of neighboring files. This resulted in errors like this when the HIP SDK was installed: ``` λ .\build\compiler\hipcc\dist\bin\hipcc --version 'C:\Program' is not recognized as an internal or external command, operable program or batch file. The system cannot find the path specified. 'nvcc' is not recognized as an internal or external command, operable program or batch file. Device not supported - Defaulting to AMD HIP version: 6.2.41512-db3292736 The system cannot find the path specified. failed to execute:""C:\Program Files\AMD\ROCm\6.2\lib\llvm\bin\clang.exe" --driver-mode=g++ -O3 -fuse-ld=lld --ld-path="C:\Program Files\AMD\ROCm\6.2\lib\llvm\bin\lld-link.exe" -Llib --hip-link --version" ``` A robust way to handle this would be to stamp all information about related tools into the tools themselves during packaging, rather than use global state like environment variables or heuristics like scanning the file system. I think once we've progressed further towards rolling out the new ROCm "superrepos" and open source CI/CD we can make those deeper changes. For now, I've extended the brittle logic with another special case.
Ping? Does anyone monitor this repository? Who should review this? |
Taking a look. thanks for the ping! |
Thanks! We've been carrying this patch in TheRock for some time now: https://github.com/ROCm/TheRock/blob/main/patches/amd-mainline/llvm-project/0005-Rework-constructHipPath-so-HIP_PATH-env-var-is-lower.patch and we're making a push to get to 0 patches across LLVM, rocm-libraries, rocm-systems, etc. |
I have some concerns. The original issue was on windows, but this change is OS agnostic. Granted the hip_path and rocm_path variables and their purpose/use is confusing. But I think what you really want to change is rocm_path, as that is what is used to find clang. I'd also suggest looking at the function "constructCompilerPath()", that function has similar changes and maybe is a better place to put this change? For reference, how and when hipcc uses these variables is documented here: https://rocmdocs.amd.com/projects/llvm-project/en/latest/LLVM/clang/html/HIPSupport.html#path-setting-for-dependencies . Also is the issue you're trying to solve the case where you build hipcc locally and you don't want that built hipcc to use the ROCm installed clang? Couldn't that just be solved by setting the ROCM_PATH as needed? |
No, without this or an equivalent patch there is a bug where installs through ROCm Python packages (a new recommended way to use ROCm on Linux and Windows) were escaping their sandbox and looking at system-installed packages. The way those system-installed packages were called happened to have a syntax error (not escaping spaces in the path), but the sandbox escape in general was the larger bug. Unfortunately, many users and developers have those environment variables already set by installers and scripts, pointing to global (potentially old) system installs. Development builds and self-contained installs (e.g. in Python virtual environments) need the ability to reference their own tools. Thanks for the link to the docs... yes, I think this is a problem:
Even if we recommend users not install system-wide at the same time as they install self-contained packages (of which there may be many on the system), there would still be this dangerous failure mode where the global environment variable would change behavior.
I'll look around to see if changes to those functions would also work. |
Maybe we could bake some |
I think by default all ROCm tools should know where other ROCm tools in the same install are located and should call them directly. Relying on the system path and environment variables to poke around elsewhere on disk is very brittle. |
@ScottTodd So I do think there is something wrong with the logic in "constructHipPath()" that needs to be fixed. Ideally if we follow the order of precedence documented, we should do the following:
We're close to that, but I think we're missing the check for rocm_path. Note, the code to handle the env vars and options, is not pretty. We have a "variables" struct as well as a "envVariables". The former is used as a place to hold the final/reconciled values, and the envVariables is used to just hold the actual (shell) environment vars. But you should also note that the hipPath is primarily used to find the HIP runtime and its header files. Or at least it should. I think part of the problem is that we also used hip-path to help find llvm/clang because the value of hip-path has precedence over rocm-path. I think we did this specifically on Windows, because we package the HIP RT, headers, and compiler (clang) all in the same location as part of the HIP_SDK package. |
Progress on ROCm/TheRock#1200. Tested downstream (via a patch) in ROCm/TheRock#1201.
The HIP SDK sets the
HIP_PATH
environment variable, which hipcc and hipconfig were using instead of neighboring files.A robust way to handle this would be to stamp all information about related tools into the tools themselves during packaging, rather than use global state like environment variables or heuristics like scanning the file system. I think once we've progressed further towards rolling out the new ROCm "superrepos" and open source CI/CD we can make those deeper changes. For now, I've extended the brittle logic with another special case.