[BUG]: GH200 DeviceReduce performance: 14x (<1 GiB) and 2x (>1 GiB) lower than SOL #437

Open
2 of 5 tasks
gonzalobg opened this issue Sep 12, 2023 · 0 comments
Labels
bug Something isn't working right.

gonzalobg commented Sep 12, 2023

Is this a duplicate?

Type of Bug

Performance

Component

Thrust

Describe the bug

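On GH200, thrust::reduce and thrust::transform_reduce run about 14x below speed-of-light (SOL) throughput for input sizes < 1 GiB, and about 2x below SOL for input sizes > 1 GiB. Their performance should be improved by those factors.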

This requires fixing the following bugs:

How to Reproduce

Run DeviceReduce and compare it against SOL throughput.
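
The comparison above can be sketched with a few host-side helpers. This is only an illustration under stated assumptions, not the benchmark used for this issue: `achieved_gbps`, `slowdown_vs_sol`, and `time_best_of` are hypothetical names, and the SOL bandwidth is a value you supply yourself (for example from a bandwidth microbenchmark on the same machine). It assumes the reduction reads every input byte exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of Thrust or CUB): effective bandwidth in GB/s
// of a reduction that reads `bytes` input bytes in `seconds` seconds.
inline double achieved_gbps(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Hypothetical helper: factor by which a run falls short of SOL.
// `sol_gbps` is an assumption supplied by the caller, not a measured constant.
inline double slowdown_vs_sol(double achieved, double sol_gbps) {
    assert(achieved > 0.0);
    return sol_gbps / achieved;
}

// Runs `reduce_fn` `reps` times and returns the fastest wall-clock time in
// seconds. The callable must block until the reduction has finished.
template <class F>
double time_best_of(F&& reduce_fn, int reps = 5) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < reps; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        reduce_fn();
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

In a real measurement, `reduce_fn` would wrap the reduction under test (for instance a `thrust::reduce` call over a device vector), `bytes` would be the element count times `sizeof` of the element type, and the resulting slowdown would be checked for each input size of interest.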

Expected behavior

DeviceReduce should not be more than an order of magnitude slower than SOL.

Reproduction link

Internal link available.

Operating System

Linux.

nvidia-smi output

GH200

NVCC version

Any.

@gonzalobg added the bug label on Sep 12, 2023
@gonzalobg changed the title from "[BUG]: improving reductions < 1 GB by 10x" to "[BUG]: GH200 DeviceReduce performance: 14x (<1 GiB) and 2x (>1 GiB) lower than SOL" on Aug 14, 2024