christiangnrd
Member

This may have to wait for KA 0.10 depending on how much `cpu=true` affects performance.

@christiangnrd
Member Author

It seems that, at least with CUDA.jl, using dynamic workgroup sizes recovers ~50% of the performance lost when switching over to KernelAbstractions. Is there potentially some overhead in KA that is lower with dynamic workgroup sizes?
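
For reference, a minimal sketch of the two launch styles being compared. The kernel `scale!`, the array length, and the workgroup size of 256 are hypothetical and chosen only for illustration; the `CPU()` backend is an assumption so the snippet runs without a GPU, and `CUDABackend()` from CUDA.jl would take its place for the case discussed here.

```julia
using KernelAbstractions

# Hypothetical kernel, used only to illustrate the two launch styles.
@kernel function scale!(y, x, a)
    i = @index(Global, Linear)
    @inbounds y[i] = a * x[i]
end

backend = CPU()  # assumption: replace with CUDABackend() to run through CUDA.jl
x = rand(Float32, 1024)
y_static = similar(x)
y_dynamic = similar(x)

# Static workgroup size: 256 is fixed when the kernel object is constructed,
# so it is available to the compiler as a constant.
scale!(backend, 256)(y_static, x, 2f0; ndrange = length(x))

# Dynamic workgroup size: nothing is fixed at construction; the size is
# supplied as a runtime value at launch.
scale!(backend)(y_dynamic, x, 2f0; ndrange = length(x), workgroupsize = 256)

KernelAbstractions.synchronize(backend)
```

Both launches compute the same result; they differ only in whether the workgroup size is known at compile time or passed at launch.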

@maleadt
Member

maleadt commented Sep 1, 2025

> Is there potentially some overhead with KA that is lesser with Dynamic workgroup sizes?

cc @vchuravy

@vchuravy
Member

vchuravy commented Sep 1, 2025

Huh, I would expect static kernel sizes to be a performance benefit or at least performance neutral.

The only thing that could happen is that suddenly we are able to unroll more or something like that.
