[FEA]: Experimental PTX wrappers to expose TMA #359
Comments
@jrhemstad @griwes @gonzalobg @vishalmehta1991: I have created a minimal proposal to get preliminary support for some TMA instructions out there in libcu++. If you could provide feedback on the location of the header, the deprecation policy, and how to go about this in general, that would be great!
Personally, I am against experimental headers. We just recently added an experimental feature by guarding it behind a feature flag. That way, when the experimental phase is over, a user has nothing to do, as the only code change is removal of the feature flag guard.
Moreover, I see no benefit in experimental features if it isn't clear that we can break them at any time (obviously with clear justification and a replacement).
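A minimal sketch of the feature-flag approach described in the comment above, for illustration only. The macro MYLIB_ENABLE_EXPERIMENTAL_TMA, the namespace flag_sketch, and the function experimental_api_revision are hypothetical names, not anything libcu++ defines. The point it tries to show is that the guarded declaration already lives under its final name, so dropping the guard later would need no change in user code.

```cpp
// Hypothetical opt-in macro. In a real build it would normally be passed on
// the command line (e.g. -DMYLIB_ENABLE_EXPERIMENTAL_TMA); it is defined
// here only so that the sketch is self-contained.
#define MYLIB_ENABLE_EXPERIMENTAL_TMA

namespace flag_sketch {

#if defined(MYLIB_ENABLE_EXPERIMENTAL_TMA)
// The experimental API is declared under its final name and namespace.
// Once the experimental phase ends, only this #if/#endif pair is removed.
inline int experimental_api_revision() { return 1; }
#endif

} // namespace flag_sketch
```

User code that calls flag_sketch::experimental_api_revision() compiles only while the flag is defined, and compiles unchanged once the guard is gone.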
I think it makes sense to move the header out of experimental, since the plan is to not evolve and therefore not break these APIs.
@miscco this is a stop-gap implementation that we fully intend to remove in the future.
For each of the exposed instructions (perhaps except the 2D bulk one), there is a 1:1 correspondence to the planned exposure in #38. So these features should be deprecatable in a painless fashion.
I would rather not start adding experimental headers, but use a feature flag. That is much easier to remove.
@jrhemstad the reason this stop-gap implementation is being added is to teach users to use it in the programming guide. We can never remove this; it would break users who read the docs and use these APIs.
It's no different than any other API in that we are free to deprecate and remove it at a major version change, right?
You can try, but given the time frame between these wrappers and their replacement landing, and the extremely high demand that has been expressed for these wrappers, I'm skeptical we will ever be able to remove them.
Is this a duplicate?
Area
libcu++
Is your feature request related to a problem? Please describe.
I would like to use TMA features, even if in a somewhat rough fashion using functions with a C-like signature that simply wrap PTX instructions.
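As an illustration of what such C-like wrappers could look like, here is a minimal sketch using inline PTX. The namespace wrapper_sketch, the function names, and the WRAPPER_SKETCH_* macros are hypothetical and are not the API proposed in this issue. The sketch assumes the instructions need compute capability 9.0, so the PTX is only emitted when compiling device code for that architecture or newer; in a host-only build the bodies are empty.

```cpp
// Hypothetical helper macros: mark functions __device__ only under a CUDA
// compiler, so the sketch also compiles as plain host C++.
#if defined(__CUDACC__)
#define WRAPPER_SKETCH_DEVICE __device__
#else
#define WRAPPER_SKETCH_DEVICE
#endif

// Assumed minimum architecture (sm_90) for the bulk async-copy instructions.
#define WRAPPER_SKETCH_MIN_ARCH 900

namespace wrapper_sketch {

// Wraps cp.async.bulk.commit_group: closes the current bulk async-group.
WRAPPER_SKETCH_DEVICE inline void cp_async_bulk_commit_group() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= WRAPPER_SKETCH_MIN_ARCH
  asm volatile("cp.async.bulk.commit_group;" ::: "memory");
#endif
}

// Wraps cp.async.bulk.wait_group.read N: the operand must be an immediate,
// so it is passed as a template argument and bound with the "n" constraint.
template <int N>
WRAPPER_SKETCH_DEVICE inline void cp_async_bulk_wait_group_read() {
  static_assert(N >= 0, "group count must be non-negative");
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= WRAPPER_SKETCH_MIN_ARCH
  asm volatile("cp.async.bulk.wait_group.read %0;" ::"n"(N) : "memory");
#endif
}

// Wraps fence.proxy.async.shared::cta.
WRAPPER_SKETCH_DEVICE inline void fence_proxy_async_shared_cta() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= WRAPPER_SKETCH_MIN_ARCH
  asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
#endif
}

} // namespace wrapper_sketch
```

A kernel would typically call the fence after writing shared memory, issue its bulk copies, then call cp_async_bulk_commit_group() followed by cp_async_bulk_wait_group_read<0>().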
Describe the solution you'd like
Location
Provide the header <cuda/experimental/ctk_12_features> that exposes a number of new PTX instructions, especially around TMA.
PTX instructions
- cp.async.bulk: both shared to global and global to shared
- fence.proxy.async.shared::cta
- cp.async.bulk.commit_group
- cp.async.bulk.wait_group.read: .read should be fine.
- cp.async.bulk.tensor.2d: both global to shared and shared to global
  - Higher dimensions: not sure if we should expose 3d, 4d, and 5d. Open to suggestions.
- mbarrier.arrive.expect_tx: This one has so many integration issues with the barrier::arrival_token that I have opened a separate PR #354 to get it included.
Deprecation policy
We can support these wrappers during the CTK 12 life-cycle, mark them deprecated for the CTK 13 cycle, and remove them in the CTK 14 cycle.
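One way the three-stage policy above could be expressed in a header is sketched below. The macro SKETCH_CTK_MAJOR is a hypothetical stand-in for whatever toolkit version macro a real header would consult, and deprecation_sketch::stopgap_wrapper is a placeholder rather than one of the proposed wrappers.

```cpp
// Hypothetical toolkit major version; defaults to 12 for this sketch.
#ifndef SKETCH_CTK_MAJOR
#define SKETCH_CTK_MAJOR 12
#endif

#if SKETCH_CTK_MAJOR >= 14
// CTK 14 cycle: the wrappers are no longer declared.
#define SKETCH_HAS_STOPGAP_WRAPPERS 0
#elif SKETCH_CTK_MAJOR == 13
// CTK 13 cycle: still available, but every use triggers a warning.
#define SKETCH_HAS_STOPGAP_WRAPPERS 1
#define SKETCH_STOPGAP_DEPRECATED \
  [[deprecated("stop-gap wrapper; scheduled for removal in the next cycle")]]
#else
// CTK 12 cycle: fully supported, no warning.
#define SKETCH_HAS_STOPGAP_WRAPPERS 1
#define SKETCH_STOPGAP_DEPRECATED
#endif

#if SKETCH_HAS_STOPGAP_WRAPPERS
namespace deprecation_sketch {

// Placeholder standing in for one of the PTX wrappers.
SKETCH_STOPGAP_DEPRECATED inline void stopgap_wrapper() {}

} // namespace deprecation_sketch
#endif
```

Building with -DSKETCH_CTK_MAJOR=13 would make calls to stopgap_wrapper() emit a deprecation warning, and with -DSKETCH_CTK_MAJOR=14 the same calls would fail to compile.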
Describe alternatives you've considered
Many alternatives were considered. It may not make sense to list them in this GitHub issue.
Additional context
No response