[FEA]: Ampere mbarrier support for barriers with non-default completion function #419
Comments
I don't know how to implement what you are asking for without a bunch of additional state and additional atomics + branches on, well, somewhat hot paths. If a barrier is completed by an async thread, who calls the completion function? Do you want every call to I also don't understand where this need comes from; a user will be able to do a
An async thread that completes a There are a bunch of ways to implement this:
If the question is what completion functions are useful for, we have a tutorial in the programming guide showing how to use them to perform a reduction: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#completion-function
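For readers following along, here is a minimal sketch of what a block-scoped barrier with a non-default completion function could look like. It is not the code from the programming guide and not a proposal for how libcu++ should implement this; the names `sum_on_completion`, `block_barrier`, `block_sum`, `run_block_sum` and `kThreads` are all made up for illustration. It assumes a single block of 128 threads and a GPU and toolkit on which `cuda::barrier` is available.

```cuda
#include <cuda/barrier>
#include <cuda_runtime.h>
#include <new>

constexpr int kThreads = 128;  // hypothetical block size for this sketch

// Hypothetical completion function: invoked once per phase, after the last
// arrival and before any thread waiting on the barrier is released.
struct sum_on_completion {
    const int* partial;  // per-thread values in shared memory
    int* total;          // destination for the reduced value
    __device__ void operator()() const noexcept {
        int s = 0;
        for (int i = 0; i < kThreads; ++i) s += partial[i];
        *total = s;
    }
};

using block_barrier = cuda::barrier<cuda::thread_scope_block, sum_on_completion>;

__global__ void block_sum(int* out) {
    __shared__ int partial[kThreads];
    __shared__ int total;
    // Raw storage, because the barrier holds a functor with state and is
    // constructed explicitly by one thread below.
    __shared__ alignas(block_barrier) unsigned char storage[sizeof(block_barrier)];
    block_barrier* bar = reinterpret_cast<block_barrier*>(storage);

    if (threadIdx.x == 0) {
        new (bar) block_barrier(kThreads, sum_on_completion{partial, &total});
    }
    __syncthreads();  // make the constructed barrier visible to the whole block

    partial[threadIdx.x] = static_cast<int>(threadIdx.x) + 1;
    bar->arrive_and_wait();  // the completion function runs before this returns

    if (threadIdx.x == 0) *out = total;
}

// Host helper: launches one block and returns the reduced value (-1 on failure).
int run_block_sum() {
    int* d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    block_sum<<<1, kThreads>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With each thread contributing `threadIdx.x + 1`, `run_block_sum()` should return 128 * 129 / 2 = 8256. Every arrival in this sketch comes from a thread in the block, so it does not address the case discussed in this thread where an asynchronous operation completes the phase.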
No, I mean specifically the case you are describing with expect_tx.
I'm not describing anything about expect_tx in this issue. I think you are looking for: #420
You used expect_tx as a motivating example here, but I guess that #420 is that motivating example.
Yes, that example is common enough that it deserves a built-in solution; this issue is just for general-purpose support for completion functions.
Is this a duplicate?
Area
libcu++
Is your feature request related to a problem? Please describe.
We are not using mbarriers for cuda::barrier<thread_scope_block, userdefined>. We should use mbarrier for those, since on Hopper the user-defined completion function is required to automatically perform an expect_tx operation during the phase completion step.
Describe the solution you'd like
We are not using mbarriers for cuda::barrier<thread_scope_block, userdefined>. We should use mbarrier for those, since on Hopper the user-defined completion function is required to automatically perform an expect_tx operation during the phase completion step.
Describe alternatives you've considered
No response
Additional context
No response