Skip to content

Conversation

jayhawk-commits
Copy link
Contributor

Depends on ROCm/rocm-systems#808 and Windows CI being green again. When Windows CI is green, I can temporarily change the rocm-systems submodule to point to PR 808's branch to run CI against the combination of both changes.

The dvc usage is only needed for the pal libs, so it is guarded with an if Windows. However, this tool will be used by miopen and hipblaslt at some point, and I could not confirm that the Linux pipelines install pip modules before the fetch sources sequence. The Windows pipelines appear to install the pip modules before fetch sources.

- Separate the dvc exec calls into its own function.
- Move the execution of the calls.
# presently, only amdgpu-windows-interop in rocm-systems uses DVC, but...
# eventually, DVC will be rolled out to math libraries and in linux
print(f"dvc detected in {project_dir}, running dvc pull")
exec(["dvc", "pull"], cwd=project_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to check that dvc exists before running this command. We can then log a helpful error message like

Could not find dvc on PATH so large files could not be fetched
To install dvc, run pip install dvc or pip install -r requirements.txt

Can check with shutil.which

Thankfully, our documentation already suggests running fetch_sources.py from a venv with the requirements installed (https://github.com/ROCm/TheRock?tab=readme-ov-file#building-from-source), but I would definitely prefer if that wasn't a hard requirement. On my systems I may opt for a system-wide install of dvc since I'm already juggling quite a few Python venvs for various builds and tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied this feedback. Thanks!

Comment on lines 94 to 95
for project in dvc_projects:
if project in projects:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can save on indentation by returning/continuing early here

https://testing.googleblog.com/2017/06/code-health-reduce-nesting-reduce.html

def pull_large_files(dvc_projects, projects):
    for project in dvc_projects:
        if not project in projects:
           continue

        submodule_path = get_submodule_path(project)
        ...
        if not dvc_config_file.exists():
          log(...)
          continue

       # check for DVC config in the ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied this feedback. Thanks!

Copy link
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structurally this LGTM. I'm wondering about the cost of this new dvc dependency for build environment setup. Curious to see if we can trim it down somehow.

requirements.txt Outdated
Comment on lines 17 to 25
# data version control for pal .lib files
# and eventually large kernels from miopen
# and large logic files from hipblaslt
dvc==3.62.0
dvc-s3==3.2.2
# dvc dependencies that pip takes time
# resolving versioning without guidance
aiobotocore==2.24.1
boto3==1.39.11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, these new requirements pull in a lot of indirect deps. 108 packages to be specific, compared to 131 total for this whole requirements.txt.

That's another reason to allow (and maybe recommend) installing outside of pip.

We could make these deps conditional on the platform (; platform_system == "Windows"), though that won't help for when MIOpen and hipblaslt also start to use it. I wonder if there's a more lightweight package or dependency chain we could use instead...

λ py -V:3.12 -m venv 3.12.venv && .\3.12.venv\Scripts\activate.bat
(3.12.venv) λ pip freeze
(3.12.venv) λ pip install dvc==3.62.0 dvc-s3==3.2.2 aiobotocore==2.24.1 boto3==1.39.11
(3.12.venv) λ pip freeze
aiobotocore==2.24.1
aiohappyeyeballs==2.6.1
aiohttp==3.12.15
aiohttp-retry==2.9.1
aioitertools==0.12.0
aiosignal==1.4.0
amqp==5.3.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
asyncssh==2.21.0
atpublic==6.0.1
attrs==25.3.0
billiard==4.2.1
boto3==1.39.11
botocore==1.39.11
celery==5.5.3
certifi==2025.8.3
cffi==1.17.1
charset-normalizer==3.4.3
click==8.2.1
click-didyoumean==0.3.1
click-plugins==1.1.1.2
click-repl==0.3.0
colorama==0.4.6
configobj==5.0.9
cryptography==45.0.7
dictdiffer==0.9.0
diskcache==5.6.3
distro==1.9.0
dpath==2.2.0
dulwich==0.24.1
dvc==3.62.0
dvc-data==3.16.12
dvc-http==2.32.0
dvc-objects==5.1.1
dvc-render==1.0.2
dvc-s3==3.2.2
dvc-studio-client==0.22.0
dvc-task==0.40.2
entrypoints==0.4
filelock==3.19.1
flatten-dict==0.4.2
flufl.lock==8.2.0
frozenlist==1.7.0
fsspec==2025.9.0
funcy==2.0
gitdb==4.0.12
GitPython==3.1.45
grandalf==0.8
gto==1.8.0
hydra-core==1.3.2
idna==3.10
iterative-telemetry==0.0.10
jmespath==1.0.1
kombu==5.5.4
markdown-it-py==4.0.0
mdurl==0.1.2
multidict==6.6.4
networkx==3.5
omegaconf==2.3.0
orjson==3.11.3
packaging==25.0
pathspec==0.12.1
platformdirs==4.4.0
prompt_toolkit==3.0.52
propcache==0.3.2
psutil==7.0.0
pycparser==2.22
pydantic==2.11.7
pydantic_core==2.33.2
pydot==4.0.1
pygit2==1.18.2
Pygments==2.19.2
pygtrie==2.5.0
pyparsing==3.2.3
python-dateutil==2.9.0.post0
pywin32==311
PyYAML==6.0.2
requests==2.32.5
rich==14.1.0
ruamel.yaml==0.18.15
ruamel.yaml.clib==0.2.12
s3fs==2025.9.0
s3transfer==0.13.1
scmrepo==3.5.2
semver==3.0.4
setuptools==80.9.0
shellingham==1.5.4
shortuuid==1.0.13
shtab==1.7.2
six==1.17.0
smmap==5.0.2
sqltrie==0.11.2
tabulate==0.9.0
tomlkit==0.13.3
tqdm==4.67.1
typer==0.17.3
typing-inspection==0.4.1
typing_extensions==4.15.0
tzdata==2025.2
urllib3==2.5.0
vine==5.1.0
voluptuous==0.15.2
wcwidth==0.2.13
wrapt==1.17.3
yarl==1.20.1
zc.lockfile==3.0.post1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was something I was looking at on Friday. Initially, before I added some of the sub-dependencies to requirements.txt, pip would iterate over every version to find matches for those.

Example: https://github.com/ROCm/TheRock/actions/runs/17331389446/job/49208004639

I managed to at least stop any of that iterative version download sequence with what I added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooof. That doesn't inspire much confidence in those packages being safe to depend on :/. Far too many indirect dependencies in the network to be developer-friendly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll snoop around Hugging Face and some of the other advertised consumers of dvc to see how they're dealing with dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option for the short term when it's still only for Windows, we could add it the chocolatey install list instead of with pip. I did ask in the dvc Discord about this topic of their python dependency list in the context of a CI runner that is mainly using the tool for pull command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Discord suggested installing the binary package or using the github action https://github.com/iterative/setup-dvc

Comment on lines +109 to +113
# check for DVC config in the submodule and run dvc pull if found.
# presently, only amdgpu-windows-interop in rocm-systems uses DVC, but...
# eventually, DVC will be rolled out to math libraries and in linux
print(f"dvc detected in {project_dir}, running dvc pull")
exec(["dvc", "pull"], cwd=project_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this on my dev machine. Seems to work well enough:

(.venv) λ python ./build_tools/fetch_sources.py
++ Exec [D:\projects\TheRock]$ git submodule update --init -- base/half comm-libs/rccl comm-libs/rccl-tests base/rocm-cmake profiler/rocprof-trace-decoder/binaries compiler/hipify compiler/amd-llvm compiler/spirv-llvm-translator rocm-libraries rocm-systems math-libs/BLAS/hipSOLVER math-libs/BLAS/rocSOLVER
dvc detected in D:\projects\TheRock\rocm-systems, running dvc pull
++ Exec [D:\projects\TheRock\rocm-systems]$ dvc pull
Collecting                                                                                   |32.0 [00:00,  483entry/s] Fetching
Building workspace index                                                                     |6.00 [00:00,  187entry/s] Comparing indexes                                                                           |39.0 [00:00, 13.6kentry/s] Applying changes                                                                             |32.0 [00:00,   848file/s] A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddRpcClient.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddRpcShared.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddEventClient.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\zstd.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddNet.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\SettingsRpcService2.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\DriverUtilsService.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\stb_sprintf.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\amdrdf.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\addrlib.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddEventServer.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddEventParser.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\devdriver.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddRpcServer.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\palCompilerDeps.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddCore.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\metrohash.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddYaml.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddCommon.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\pal_lz4.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\dd_settings.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\cwpack.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\vam.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\pal.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\dd_common.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddSocket.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\ddEventStreamer.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\UberTraceService.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\palUtil.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\mpack.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\dd_libyaml.lib
A       shared\amdgpu-windows-interop\pal\lib\Release\x64\pal_uuid.lib
32 files added and 32 files fetched
++ Exec [D:\projects\TheRock]$ git update-index --no-skip-worktree -- base/half comm-libs/rccl comm-libs/rccl-tests base/rocm-cmake profiler/rocprof-trace-decoder/binaries compiler/hipify compiler/amd-llvm compiler/spirv-llvm-translator rocm-libraries rocm-systems math-libs/BLAS/hipSOLVER math-libs/BLAS/rocSOLVER

if shutil.which("dvc") is None:
print("Could not find `dvc` on PATH so large files could not be fetched")
print(
"To install dvc, run `pip install dvc` or `pip install -r requirements.txt`"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requirements.txt - we need to add this dvc to requirements.txt file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are removing it from the requirements due to how many extra dependencies it pulls and adding the setup to be done in the runner image. Will change the output when we decide on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: TODO
Development

Successfully merging this pull request may close these issues.

3 participants