Widen histogram agent constructor to more types #2380

Merged
merged 1 commit into from
Sep 6, 2024

Conversation

bernhardmgruber
Contributor

This allows the agent constructor to accept more data types beyond arrays of an exact static size.

No SASS changes on CUB device histogram test with CTK 12.6.

Fixes #1877 for AgentHistogram

bernhardmgruber added the cub label (For all items related to CUB) Sep 6, 2024
Contributor

github-actions bot commented Sep 6, 2024

🟨 CI finished in 3h 52m: Pass: 99%/251 | Total: 1d 16h | Avg: 9m 42s | Max: 47m 51s | Hits: 98%/24387
  • 🟨 thrust: Pass: 99%/118 | Total: 14h 15m | Avg: 7m 15s | Max: 24m 35s | Hits: 99%/20079

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/110 | Total: 13h 26m | Avg:  7m 20s | Max: 24m 35s | Hits:  99%/20079 
      🟩 arm64              Pass: 100%/8   | Total: 49m 06s | Avg:  6m 08s | Max:  7m 49s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  1h 11m | Avg:  4m 45s | Max: 15m 58s | Hits:  99%/2231  
      🟩 11.8               Pass: 100%/3   | Total: 14m 43s | Avg:  4m 54s | Max:  5m 11s
      🔍 12.5               Pass:  99%/100 | Total: 12h 49m | Avg:  7m 41s | Max: 24m 35s | Hits:  99%/17848 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  9m 13s | Avg:  4m 36s | Max:  4m 40s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 11m | Avg:  4m 45s | Max: 15m 58s | Hits:  99%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 14m 43s | Avg:  4m 54s | Max:  5m 11s
      🔍 nvcc12.5           Pass:  98%/98  | Total: 12h 40m | Avg:  7m 45s | Max: 24m 35s | Hits:  99%/17848 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 13s | Avg:  4m 36s | Max:  4m 40s
      🔍 nvcc               Pass:  99%/116 | Total: 14h 06m | Avg:  7m 17s | Max: 24m 35s | Hits:  99%/20079 
    🔍 cxx: Clang17 🔍
      🟩 Clang9             Pass: 100%/6   | Total: 30m 24s | Avg:  5m 04s | Max:  6m 02s
      🟩 Clang10            Pass: 100%/3   | Total: 18m 45s | Avg:  6m 15s | Max:  6m 54s
      🟩 Clang11            Pass: 100%/4   | Total: 18m 42s | Avg:  4m 40s | Max:  4m 58s
      🟩 Clang12            Pass: 100%/4   | Total: 18m 17s | Avg:  4m 34s | Max:  4m 44s
      🟩 Clang13            Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 10s
      🟩 Clang14            Pass: 100%/4   | Total: 19m 18s | Avg:  4m 49s | Max:  5m 05s
      🟩 Clang15            Pass: 100%/4   | Total: 19m 44s | Avg:  4m 56s | Max:  5m 09s
      🟩 Clang16            Pass: 100%/4   | Total: 20m 08s | Avg:  5m 02s | Max:  5m 14s
      🔍 Clang17            Pass:  94%/18  | Total:  2h 42m | Avg:  9m 02s | Max: 23m 38s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  3m 51s
      🟩 GCC7               Pass: 100%/6   | Total: 24m 38s | Avg:  4m 06s | Max:  4m 49s
      🟩 GCC8               Pass: 100%/6   | Total: 24m 46s | Avg:  4m 07s | Max:  4m 34s
      🟩 GCC9               Pass: 100%/6   | Total: 25m 23s | Avg:  4m 13s | Max:  4m 41s
      🟩 GCC10              Pass: 100%/4   | Total: 19m 12s | Avg:  4m 48s | Max:  5m 04s
      🟩 GCC11              Pass: 100%/7   | Total: 35m 20s | Avg:  5m 02s | Max:  5m 36s
      🟩 GCC12              Pass: 100%/4   | Total: 19m 50s | Avg:  4m 57s | Max:  5m 19s
      🟩 GCC13              Pass: 100%/20  | Total:  3h 14m | Avg:  9m 42s | Max: 24m 35s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 17m 50s | Avg:  5m 56s | Max:  6m 21s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 58s | Avg: 15m 58s | Max: 15m 58s | Hits:  99%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 30m 55s | Avg: 15m 27s | Max: 15m 42s | Hits:  99%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  1h 53m | Avg: 18m 50s | Max: 22m 35s | Hits:  99%/13386 
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  98%/51  | Total:  5h 27m | Avg:  6m 25s | Max: 23m 38s
      🟩 GCC                Pass: 100%/55  | Total:  5h 50m | Avg:  6m 22s | Max: 24m 35s
      🟩 Intel              Pass: 100%/3   | Total: 17m 50s | Avg:  5m 56s | Max:  6m 21s
      🟩 MSVC               Pass: 100%/9   | Total:  2h 39m | Avg: 17m 46s | Max: 22m 35s | Hits:  99%/20079 
    🔍 jobs: TestGPU 🔍
      🟩 Build              Pass: 100%/99  | Total:  9h 09m | Avg:  5m 33s | Max: 17m 37s | Hits:  99%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 03m | Avg: 11m 14s | Max: 22m 35s | Hits:  99%/6693  
      🔍 TestGPU            Pass:  87%/8   | Total:  3h 02m | Avg: 22m 47s | Max: 24m 35s
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/30  | Total:  2h 55m | Avg:  5m 50s | Max: 23m 16s
      🔍 14                 Pass:  97%/34  | Total:  4h 16m | Avg:  7m 32s | Max: 24m 35s | Hits:  99%/8924  
      🟩 17                 Pass: 100%/33  | Total:  4h 05m | Avg:  7m 26s | Max: 24m 08s | Hits:  99%/6693  
      🟩 20                 Pass: 100%/21  | Total:  2h 58m | Avg:  8m 29s | Max: 23m 51s | Hits:  99%/4462  
    🟨 gpu
      🟨 v100               Pass:  99%/118 | Total: 14h 15m | Avg:  7m 15s | Max: 24m 35s | Hits:  99%/20079 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 14m 43s | Avg:  4m 54s | Max:  5m 11s
      🟩 90a                Pass: 100%/4   | Total: 16m 32s | Avg:  4m 08s | Max:  4m 21s
    
  • 🟩 cub: Pass: 100%/132 | Total: 1d 02h | Avg: 11m 50s | Max: 47m 51s | Hits: 96%/4308

    🟩 cpu
      🟩 amd64              Pass: 100%/124 | Total:  1d 00h | Avg: 12m 03s | Max: 47m 51s | Hits:  96%/4308  
      🟩 arm64              Pass: 100%/8   | Total:  1h 09m | Avg:  8m 40s | Max: 11m 08s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  2h 15m | Avg:  9m 02s | Max: 45m 57s | Hits:  96%/718   
      🟩 11.8               Pass: 100%/3   | Total: 24m 33s | Avg:  8m 11s | Max:  8m 52s
      🟩 12.5               Pass: 100%/114 | Total: 23h 23m | Avg: 12m 18s | Max: 47m 51s | Hits:  96%/3590  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 37s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  2h 15m | Avg:  9m 02s | Max: 45m 57s | Hits:  96%/718   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 24m 33s | Avg:  8m 11s | Max:  8m 52s
      🟩 nvcc12.5           Pass: 100%/112 | Total: 23h 13m | Avg: 12m 26s | Max: 47m 51s | Hits:  96%/3590  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 37s
      🟩 nvcc               Pass: 100%/130 | Total:  1d 01h | Avg: 11m 56s | Max: 47m 51s | Hits:  96%/4308  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 38m 18s | Avg:  6m 23s | Max:  7m 40s
      🟩 Clang10            Pass: 100%/3   | Total: 21m 56s | Avg:  7m 18s | Max:  7m 36s
      🟩 Clang11            Pass: 100%/4   | Total: 25m 48s | Avg:  6m 27s | Max:  7m 56s
      🟩 Clang12            Pass: 100%/4   | Total: 25m 54s | Avg:  6m 28s | Max:  7m 18s
      🟩 Clang13            Pass: 100%/4   | Total: 25m 49s | Avg:  6m 27s | Max:  6m 58s
      🟩 Clang14            Pass: 100%/4   | Total: 26m 23s | Avg:  6m 35s | Max:  7m 47s
      🟩 Clang15            Pass: 100%/4   | Total: 25m 38s | Avg:  6m 24s | Max:  6m 59s
      🟩 Clang16            Pass: 100%/4   | Total: 25m 42s | Avg:  6m 25s | Max:  6m 56s
      🟩 Clang17            Pass: 100%/26  | Total:  7h 40m | Avg: 17m 41s | Max: 42m 54s
      🟩 GCC6               Pass: 100%/2   | Total: 12m 06s | Avg:  6m 03s | Max:  6m 11s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 17m | Avg: 12m 53s | Max: 45m 57s
      🟩 GCC8               Pass: 100%/6   | Total: 36m 13s | Avg:  6m 02s | Max:  6m 47s
      🟩 GCC9               Pass: 100%/6   | Total: 35m 27s | Avg:  5m 54s | Max:  6m 40s
      🟩 GCC10              Pass: 100%/4   | Total: 26m 17s | Avg:  6m 34s | Max:  7m 22s
      🟩 GCC11              Pass: 100%/7   | Total: 51m 11s | Avg:  7m 18s | Max:  8m 52s
      🟩 GCC12              Pass: 100%/4   | Total: 28m 04s | Avg:  7m 01s | Max:  7m 36s
      🟩 GCC13              Pass: 100%/29  | Total:  8h 27m | Avg: 17m 29s | Max: 47m 51s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 22m 38s | Avg:  7m 32s | Max:  8m 15s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 59s | Avg: 15m 59s | Max: 15m 59s | Hits:  96%/718   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 29m 46s | Avg: 14m 53s | Max: 16m 27s | Hits:  96%/1436  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 46m 16s | Avg: 15m 25s | Max: 16m 19s | Hits:  96%/2154  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total: 11h 15m | Avg: 11m 26s | Max: 42m 54s
      🟩 GCC                Pass: 100%/64  | Total: 12h 53m | Avg: 12m 05s | Max: 47m 51s
      🟩 Intel              Pass: 100%/3   | Total: 22m 38s | Avg:  7m 32s | Max:  8m 15s
      🟩 MSVC               Pass: 100%/6   | Total:  1h 32m | Avg: 15m 20s | Max: 16m 27s | Hits:  96%/4308  
    🟩 gpu
      🟩 v100               Pass: 100%/132 | Total:  1d 02h | Avg: 11m 50s | Max: 47m 51s | Hits:  96%/4308  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total: 12h 30m | Avg:  7m 34s | Max: 45m 57s | Hits:  96%/4308  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 54m | Avg: 21m 49s | Max: 26m 34s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 31m | Avg: 18m 59s | Max: 24m 44s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 53m | Avg: 21m 39s | Max: 27m 00s
      🟩 SmallGMem          Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 25m | Avg: 33m 13s | Max: 42m 54s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 24m 33s | Avg:  8m 11s | Max:  8m 52s
      🟩 90a                Pass: 100%/4   | Total: 18m 40s | Avg:  4m 40s | Max:  5m 03s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  6h 02m | Avg: 10m 40s | Max: 37m 24s
      🟩 14                 Pass: 100%/37  | Total:  6h 46m | Avg: 10m 59s | Max: 31m 13s | Hits:  96%/2154  
      🟩 17                 Pass: 100%/37  | Total:  7h 57m | Avg: 12m 54s | Max: 47m 51s | Hits:  96%/1436  
      🟩 20                 Pass: 100%/24  | Total:  5h 16m | Avg: 13m 11s | Max: 42m 54s | Hits:  96%/718   
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CUDA C Core Library

🏃‍ Runner counts (total jobs: 251)

# Runner
178 linux-amd64-cpu16
42 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

bernhardmgruber marked this pull request as ready for review September 6, 2024 14:52
bernhardmgruber requested review from a team as code owners September 6, 2024 14:52
Contributor

github-actions bot commented Sep 6, 2024

🟩 CI finished in 6h 06m: Pass: 100%/251 | Total: 1d 16h | Avg: 9m 39s | Max: 47m 51s | Hits: 98%/24387
  • 🟩 cub: Pass: 100%/132 | Total: 1d 02h | Avg: 11m 50s | Max: 47m 51s | Hits: 96%/4308

    🟩 cpu
      🟩 amd64              Pass: 100%/124 | Total:  1d 00h | Avg: 12m 03s | Max: 47m 51s | Hits:  96%/4308  
      🟩 arm64              Pass: 100%/8   | Total:  1h 09m | Avg:  8m 40s | Max: 11m 08s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  2h 15m | Avg:  9m 02s | Max: 45m 57s | Hits:  96%/718   
      🟩 11.8               Pass: 100%/3   | Total: 24m 33s | Avg:  8m 11s | Max:  8m 52s
      🟩 12.5               Pass: 100%/114 | Total: 23h 23m | Avg: 12m 18s | Max: 47m 51s | Hits:  96%/3590  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 37s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  2h 15m | Avg:  9m 02s | Max: 45m 57s | Hits:  96%/718   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 24m 33s | Avg:  8m 11s | Max:  8m 52s
      🟩 nvcc12.5           Pass: 100%/112 | Total: 23h 13m | Avg: 12m 26s | Max: 47m 51s | Hits:  96%/3590  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 37s
      🟩 nvcc               Pass: 100%/130 | Total:  1d 01h | Avg: 11m 56s | Max: 47m 51s | Hits:  96%/4308  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 38m 18s | Avg:  6m 23s | Max:  7m 40s
      🟩 Clang10            Pass: 100%/3   | Total: 21m 56s | Avg:  7m 18s | Max:  7m 36s
      🟩 Clang11            Pass: 100%/4   | Total: 25m 48s | Avg:  6m 27s | Max:  7m 56s
      🟩 Clang12            Pass: 100%/4   | Total: 25m 54s | Avg:  6m 28s | Max:  7m 18s
      🟩 Clang13            Pass: 100%/4   | Total: 25m 49s | Avg:  6m 27s | Max:  6m 58s
      🟩 Clang14            Pass: 100%/4   | Total: 26m 23s | Avg:  6m 35s | Max:  7m 47s
      🟩 Clang15            Pass: 100%/4   | Total: 25m 38s | Avg:  6m 24s | Max:  6m 59s
      🟩 Clang16            Pass: 100%/4   | Total: 25m 42s | Avg:  6m 25s | Max:  6m 56s
      🟩 Clang17            Pass: 100%/26  | Total:  7h 40m | Avg: 17m 41s | Max: 42m 54s
      🟩 GCC6               Pass: 100%/2   | Total: 12m 06s | Avg:  6m 03s | Max:  6m 11s
      🟩 GCC7               Pass: 100%/6   | Total:  1h 17m | Avg: 12m 53s | Max: 45m 57s
      🟩 GCC8               Pass: 100%/6   | Total: 36m 13s | Avg:  6m 02s | Max:  6m 47s
      🟩 GCC9               Pass: 100%/6   | Total: 35m 27s | Avg:  5m 54s | Max:  6m 40s
      🟩 GCC10              Pass: 100%/4   | Total: 26m 17s | Avg:  6m 34s | Max:  7m 22s
      🟩 GCC11              Pass: 100%/7   | Total: 51m 11s | Avg:  7m 18s | Max:  8m 52s
      🟩 GCC12              Pass: 100%/4   | Total: 28m 04s | Avg:  7m 01s | Max:  7m 36s
      🟩 GCC13              Pass: 100%/29  | Total:  8h 27m | Avg: 17m 29s | Max: 47m 51s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 22m 38s | Avg:  7m 32s | Max:  8m 15s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 59s | Avg: 15m 59s | Max: 15m 59s | Hits:  96%/718   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 29m 46s | Avg: 14m 53s | Max: 16m 27s | Hits:  96%/1436  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 46m 16s | Avg: 15m 25s | Max: 16m 19s | Hits:  96%/2154  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total: 11h 15m | Avg: 11m 26s | Max: 42m 54s
      🟩 GCC                Pass: 100%/64  | Total: 12h 53m | Avg: 12m 05s | Max: 47m 51s
      🟩 Intel              Pass: 100%/3   | Total: 22m 38s | Avg:  7m 32s | Max:  8m 15s
      🟩 MSVC               Pass: 100%/6   | Total:  1h 32m | Avg: 15m 20s | Max: 16m 27s | Hits:  96%/4308  
    🟩 gpu
      🟩 v100               Pass: 100%/132 | Total:  1d 02h | Avg: 11m 50s | Max: 47m 51s | Hits:  96%/4308  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total: 12h 30m | Avg:  7m 34s | Max: 45m 57s | Hits:  96%/4308  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 54m | Avg: 21m 49s | Max: 26m 34s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 31m | Avg: 18m 59s | Max: 24m 44s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 53m | Avg: 21m 39s | Max: 27m 00s
      🟩 SmallGMem          Pass: 100%/1   | Total: 47m 51s | Avg: 47m 51s | Max: 47m 51s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 25m | Avg: 33m 13s | Max: 42m 54s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 24m 33s | Avg:  8m 11s | Max:  8m 52s
      🟩 90a                Pass: 100%/4   | Total: 18m 40s | Avg:  4m 40s | Max:  5m 03s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  6h 02m | Avg: 10m 40s | Max: 37m 24s
      🟩 14                 Pass: 100%/37  | Total:  6h 46m | Avg: 10m 59s | Max: 31m 13s | Hits:  96%/2154  
      🟩 17                 Pass: 100%/37  | Total:  7h 57m | Avg: 12m 54s | Max: 47m 51s | Hits:  96%/1436  
      🟩 20                 Pass: 100%/24  | Total:  5h 16m | Avg: 13m 11s | Max: 42m 54s | Hits:  96%/718   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 14h 02m | Avg: 7m 08s | Max: 24m 35s | Hits: 99%/20079

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total: 13h 13m | Avg:  7m 12s | Max: 24m 35s | Hits:  99%/20079 
      🟩 arm64              Pass: 100%/8   | Total: 49m 06s | Avg:  6m 08s | Max:  7m 49s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 11m | Avg:  4m 45s | Max: 15m 58s | Hits:  99%/2231  
      🟩 11.8               Pass: 100%/3   | Total: 14m 43s | Avg:  4m 54s | Max:  5m 11s
      🟩 12.5               Pass: 100%/100 | Total: 12h 36m | Avg:  7m 33s | Max: 24m 35s | Hits:  99%/17848 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  9m 13s | Avg:  4m 36s | Max:  4m 40s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 11m | Avg:  4m 45s | Max: 15m 58s | Hits:  99%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 14m 43s | Avg:  4m 54s | Max:  5m 11s
      🟩 nvcc12.5           Pass: 100%/98  | Total: 12h 27m | Avg:  7m 37s | Max: 24m 35s | Hits:  99%/17848 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 13s | Avg:  4m 36s | Max:  4m 40s
      🟩 nvcc               Pass: 100%/116 | Total: 13h 53m | Avg:  7m 11s | Max: 24m 35s | Hits:  99%/20079 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 30m 24s | Avg:  5m 04s | Max:  6m 02s
      🟩 Clang10            Pass: 100%/3   | Total: 18m 45s | Avg:  6m 15s | Max:  6m 54s
      🟩 Clang11            Pass: 100%/4   | Total: 18m 42s | Avg:  4m 40s | Max:  4m 58s
      🟩 Clang12            Pass: 100%/4   | Total: 18m 17s | Avg:  4m 34s | Max:  4m 44s
      🟩 Clang13            Pass: 100%/4   | Total: 19m 24s | Avg:  4m 51s | Max:  5m 10s
      🟩 Clang14            Pass: 100%/4   | Total: 19m 18s | Avg:  4m 49s | Max:  5m 05s
      🟩 Clang15            Pass: 100%/4   | Total: 19m 44s | Avg:  4m 56s | Max:  5m 09s
      🟩 Clang16            Pass: 100%/4   | Total: 20m 08s | Avg:  5m 02s | Max:  5m 14s
      🟩 Clang17            Pass: 100%/18  | Total:  2h 29m | Avg:  8m 18s | Max: 23m 38s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  3m 51s
      🟩 GCC7               Pass: 100%/6   | Total: 24m 38s | Avg:  4m 06s | Max:  4m 49s
      🟩 GCC8               Pass: 100%/6   | Total: 24m 46s | Avg:  4m 07s | Max:  4m 34s
      🟩 GCC9               Pass: 100%/6   | Total: 25m 23s | Avg:  4m 13s | Max:  4m 41s
      🟩 GCC10              Pass: 100%/4   | Total: 19m 12s | Avg:  4m 48s | Max:  5m 04s
      🟩 GCC11              Pass: 100%/7   | Total: 35m 20s | Avg:  5m 02s | Max:  5m 36s
      🟩 GCC12              Pass: 100%/4   | Total: 19m 50s | Avg:  4m 57s | Max:  5m 19s
      🟩 GCC13              Pass: 100%/20  | Total:  3h 14m | Avg:  9m 42s | Max: 24m 35s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 17m 50s | Avg:  5m 56s | Max:  6m 21s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 58s | Avg: 15m 58s | Max: 15m 58s | Hits:  99%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 30m 55s | Avg: 15m 27s | Max: 15m 42s | Hits:  99%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  1h 53m | Avg: 18m 50s | Max: 22m 35s | Hits:  99%/13386 
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total:  5h 14m | Avg:  6m 09s | Max: 23m 38s
      🟩 GCC                Pass: 100%/55  | Total:  5h 50m | Avg:  6m 22s | Max: 24m 35s
      🟩 Intel              Pass: 100%/3   | Total: 17m 50s | Avg:  5m 56s | Max:  6m 21s
      🟩 MSVC               Pass: 100%/9   | Total:  2h 39m | Avg: 17m 46s | Max: 22m 35s | Hits:  99%/20079 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total: 14h 02m | Avg:  7m 08s | Max: 24m 35s | Hits:  99%/20079 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  9h 09m | Avg:  5m 33s | Max: 17m 37s | Hits:  99%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 03m | Avg: 11m 14s | Max: 22m 35s | Hits:  99%/6693  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 49m | Avg: 21m 09s | Max: 24m 35s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 14m 43s | Avg:  4m 54s | Max:  5m 11s
      🟩 90a                Pass: 100%/4   | Total: 16m 32s | Avg:  4m 08s | Max:  4m 21s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 55m | Avg:  5m 50s | Max: 23m 16s
      🟩 14                 Pass: 100%/34  | Total:  4h 03m | Avg:  7m 09s | Max: 24m 35s | Hits:  99%/8924  
      🟩 17                 Pass: 100%/33  | Total:  4h 05m | Avg:  7m 26s | Max: 24m 08s | Hits:  99%/6693  
      🟩 20                 Pass: 100%/21  | Total:  2h 58m | Avg:  8m 29s | Max: 23m 51s | Hits:  99%/4462  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 16m 05s | Avg: 16m 05s | Max: 16m 05s
    


@fbusato
Contributor

fbusato commented Sep 6, 2024

Just to make sure I'm not missing something: is the idea here to extend the constructor to anything that can be converted to a pointer?

Contributor

@fbusato fbusato left a comment

LGTM

@bernhardmgruber
Contributor Author

> just to make sure I'm not missing something. The idea here is to extend the constructor to anything that can be converted to a pointer.

Yes. Previously, the constructor only accepted a C-style array of matching length. Now the constructor accepts a contiguous sequence of data from all kinds of sources. This allows us to use the .data() member of cuda::std::array instead of passing its inner array __elems_, which relies on implementation specifics. So the motivation is more to get rid of a workaround on our side than to enable more functionality for users.
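
To illustrate the difference, here is a minimal host-only sketch. It is not the actual AgentHistogram code: `NarrowAgent`, `WideAgent`, `num_channels`, and the helper functions are hypothetical names, `std::array` stands in for `cuda::std::array`, and a plain pointer parameter is assumed as one way to widen the constructor.

```cpp
#include <array>
#include <cassert>

// Hypothetical channel count, chosen only for this illustration.
constexpr int num_channels = 2;

// "Before" (assumed shape): only a C-style array of exactly
// num_channels elements can bind to this parameter.
struct NarrowAgent {
  int bins[num_channels];
  explicit NarrowAgent(const int (&num_bins)[num_channels]) {
    for (int c = 0; c < num_channels; ++c) bins[c] = num_bins[c];
  }
};

// "After" (assumed shape): any pointer to contiguous ints is accepted.
// The compile-time length check is lost, so the caller must guarantee
// that at least num_channels elements are readable.
struct WideAgent {
  int bins[num_channels];
  explicit WideAgent(const int* num_bins) {
    assert(num_bins != nullptr);
    for (int c = 0; c < num_channels; ++c) bins[c] = num_bins[c];
  }
};

// The widened constructor works with .data(), so no implementation
// detail of the array type (such as an inner member) is touched.
int total_bins_from_std_array() {
  std::array<int, num_channels> num_bins{{256, 128}};
  WideAgent agent(num_bins.data());
  return agent.bins[0] + agent.bins[1];
}

// C-style arrays still work with both: they bind by reference to the
// narrow constructor and decay to a pointer for the wide one.
int total_bins_from_c_array() {
  int num_bins[num_channels] = {256, 128};
  NarrowAgent narrow(num_bins);
  WideAgent wide(num_bins);
  return narrow.bins[0] + wide.bins[1];
}
```

With the narrow form, passing `num_bins.data()` would not compile, which is why a workaround reaching into the array's internals was needed in the first place.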

bernhardmgruber merged commit fcf7c91 into NVIDIA:main Sep 6, 2024
267 checks passed
bernhardmgruber deleted the hist_agent branch September 6, 2024 19:15
Labels
cub For all items related to CUB
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA]: Make CUB block algorithms usable with cuda::std::array
2 participants