Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prune CUB's ChainedPolicy by __CUDA_ARCH_LIST__ #2154
base: main
Are you sure you want to change the base?
Prune CUB's ChainedPolicy by __CUDA_ARCH_LIST__ #2154
Changes from all commits
83d4192
86040f3
5789029
c2b6385
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested: as described below, this is safe only when we have namespace magic. Although this leads to UB when linking TUs compiled for different architecture sets, old chained policy would at least find a policy to get executed. New chained policy would not. So, I'd prefer something along the lines of:
but rapids is going to disable namespace magic, so a change like that would make this PR less useful. I guess if we change
cudaError_t e = cudaSuccess;
to return an actual error as suggested below, we'll at least catch the problem at runtime. I'd recommend adding a note close to the error we are going to return when no arch fromCudaArches
matched device ptx suggesting users to re-enable namespace magic or wrap namespace.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
important: there are a few potential situations that could lead us to a situation where
device_ptx_version
is outside ofCudaArches
. For instance, change inPtxVersion
function I described below, or disabled namespace magic. This situation would lead to this function returningcudaSuccess
while not invoking any algorithms. Given that it'd be a corrupted use case that we want to report to the user, I'd prefer having something other thancudaSuccess
as a default value here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: this approach works because
device_ptx_version
is not CC of the current device but rather PTX version closest to current device CC from the set of PTX versions thatcub::EmptyKernel
was compiled for. When magic namespace is enabled, this property provides us a guarantee thatdevice_ptx_version
is one of theCudaArches
becausecub::EmptyKernel
was compiled forCudaArches
. At some point we discussed switching to a querying CC of current device directly instead of using empty kernel. This change would be one of the reasons for us not to do that, because thendevice_ptx_version
could be outside ofCudaArches
, leaving algorithm not executed when someone compiled for, say, on Ampere but tried running code on Ada. I'd suggest adding a note somewhere onPtxVersion
saying that we should always query CC from empty kernel for that reason.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here was the issue for the alternative approach: #898