Port `reverse` from CUDA.jl #609

christiangnrd · 2025-08-04T19:49:29Z

This may have to wait for KA 0.10, depending on how much `cpu=true` affects performance.

christiangnrd · 2025-08-05T21:42:47Z

It seems that, at least with CUDA.jl, using dynamic workgroup sizes recovers ~50% of the performance lost by switching over to KernelAbstractions. Is there potentially some overhead in KA that is smaller with dynamic workgroup sizes?

maleadt · 2025-09-01T05:05:17Z

> Is there potentially some overhead in KA that is smaller with dynamic workgroup sizes?

cc @vchuravy

vchuravy · 2025-09-01T09:25:03Z

Huh, I would expect static kernel sizes to be a performance benefit or at least performance neutral.

The only thing that could happen is that suddenly we are able to unroll more or something like that.
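
For readers unfamiliar with the distinction being discussed, here is a minimal sketch of the two ways a KernelAbstractions kernel can be instantiated: with a static workgroup size fixed at construction, or with a dynamic one supplied at launch. The kernel `reverse_sketch!`, the array names, and the size `64` are hypothetical and chosen for illustration only; this is not the kernel from this PR, and it says nothing about which variant is faster.

```julia
using KernelAbstractions

# Hypothetical out-of-place 1D reverse kernel, for illustration only.
@kernel function reverse_sketch!(dst, @Const(src))
    i = @index(Global, Linear)
    @inbounds dst[i] = src[length(src) - i + 1]
end

src = Float32.(1:1024)
dst_static = similar(src)
dst_dynamic = similar(src)

# Use whatever backend the input array lives on (the CPU for a plain Array).
backend = KernelAbstractions.get_backend(src)

# Static workgroup size: fixed when the kernel object is constructed,
# so it is part of the kernel's type and known at compile time.
k_static = reverse_sketch!(backend, 64)
k_static(dst_static, src; ndrange = length(src))

# Dynamic workgroup size: left open at construction and passed at launch.
k_dynamic = reverse_sketch!(backend)
k_dynamic(dst_dynamic, src; ndrange = length(src), workgroupsize = 64)

# Launches are asynchronous, so wait before reading the results.
KernelAbstractions.synchronize(backend)
```

Both launches compute the same result; the only difference is whether the workgroup size is visible to the compiler, which is the property the comments above are speculating about.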

christiangnrd added 10 commits July 29, 2025 18:50

- Start using AcceleratedKernels.jl (`9d9f432`)
- Port reverse from CUDA (`46e8f03`)
- Try with cpu=true (`aed2acc`)
- Oops (`8eb1ba1`)
- Remove return (`ae2d26c`)
- shbrg (`3d98fea`)
- Finally? (`ee2cfa1`)
- ifhbvw (`fcc63e4`)
- minor opt (`47f4ea1`)
- dynamic (`db77f9d`)
