replace _CCCL_ALWAYS_INLINE with _CCCL_FORCEINLINE #2439
base: main
🟨 CI finished in 1h 19m: Pass: 99%/368 | Total: 2d 03h | Avg: 8m 22s | Max: 49m 36s | Hits: 74%/25647
Modifications in project?

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |

Modifications in project or dependencies?

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |

🏃 Runner counts (total jobs: 368)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
28 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
🟩 CI finished in 2h 32m: Pass: 100%/368 | Total: 2d 03h | Avg: 8m 24s | Max: 49m 36s | Hits: 74%/25647
  template <class Vector>
- void TestTransformInputOutputIterator()
+ THRUST_DISABLE_BROKEN_GCC_VECTORIZER void TestTransformInputOutputIterator()
This fixes our tests, but won't gcc still be miscompiling Thrust for users?
That is nothing we can change. I want to note that this is exceptionally fickle and dependent on exact sizes and optimization settings, so I don't see anything we can do there.
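For context, an annotation like the one in this hunk can be sketched as a macro that turns off GCC's loop vectorizer for a single function. The macro name `EXAMPLE_NO_VECTORIZE` and the function `sum_first_n` are hypothetical, and the real definition of `THRUST_DISABLE_BROKEN_GCC_VECTORIZER` may differ (for instance, it may be limited to specific GCC versions). The sketch only assumes GCC's `optimize` function attribute.

```cpp
#include <cstddef>

// Hypothetical macro: disable tree vectorization for one function on GCC only.
// Clang also defines __GNUC__ but does not implement this attribute, so it is excluded.
#if defined(__GNUC__) && !defined(__clang__)
#  define EXAMPLE_NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define EXAMPLE_NO_VECTORIZE
#endif

// Hypothetical function: a plain reduction loop that GCC would otherwise
// be free to vectorize at higher optimization levels.
EXAMPLE_NO_VECTORIZE int sum_first_n(const int* data, std::size_t n)
{
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    total += data[i];
  }
  return total;
}
```

Because the attribute is attached per function, it only protects the annotated code. Callers that instantiate the same templates in their own translation units are still compiled with whatever optimizations their flags enable, which is the concern raised in the review comment.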
@@ -39,7 +39,7 @@ struct wrapped_function

  _CCCL_EXEC_CHECK_DISABLE
  template <typename... Ts>
- _CCCL_FORCEINLINE _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const
+ inline _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const
@miscco at least locally, this change avoids the gcc optimizer issue.
/ok to test
🟨 CI finished in 2h 00m: Pass: 99%/368 | Total: 7d 00h | Avg: 27m 29s | Max: 1h 25m | Hits: 54%/25647
Description
CCCL has `_CCCL_FORCEINLINE` and `_CCCL_ALWAYS_INLINE`. There should be only one. Also, `_CCCL_FORCEINLINE` currently expands to `inline` when not using a CUDA compiler. That is unexpected: it should expand to either `__attribute__((always_inline))` or `__forceinline`, depending on which is supported by the host compiler.

closes #2438

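A minimal sketch of the kind of host-compiler dispatch described above. The macro name `EXAMPLE_FORCEINLINE` and the helper `add_one` are hypothetical, and this is not the actual CCCL definition. It only assumes that MSVC-compatible compilers accept `__forceinline` and that GCC-compatible compilers accept `__attribute__((always_inline))`.

```cpp
// Hypothetical macro: pick the strongest inlining hint the host compiler supports.
#if defined(_MSC_VER)
// MSVC (and clang-cl, which also defines _MSC_VER) spell it __forceinline.
#  define EXAMPLE_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
// GCC and Clang need the attribute in addition to the inline keyword.
#  define EXAMPLE_FORCEINLINE inline __attribute__((always_inline))
#else
// Unknown compiler: fall back to a plain inline hint.
#  define EXAMPLE_FORCEINLINE inline
#endif

// Hypothetical helper used only to show the macro in a declaration.
EXAMPLE_FORCEINLINE int add_one(int x)
{
  return x + 1;
}
```

Calling `add_one(41)` returns 42 with any of these compilers. The macro only influences the inlining decision, not the result.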
This PR moves the definition of `_CCCL_FORCEINLINE` from `execution_space.h` to `visibility.h`. It also changes the definition to expand directly to either `__inline__ __attribute__((always_inline))` or `__forceinline`, rather than indirectly through the `__forceinline__` macro defined in `host_defines.h`.

Checklist