In #2055, we experimented with bit-packed tile states in the decoupled look-back of algorithms that need to carry the offset type through the look-back.
While overall performance for 64-bit offset types improved with bit-packed tile states compared to regular tile states, 64-bit offset types still lag noticeably behind 32-bit offset types.
We want to investigate where the remaining performance degradation comes from. One possibility to mitigate that performance degradation is to use two different offset types within the relevant algorithms: (1) one that is used for indexing items within a tile and (2) one that is used for indexing within global memory.
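To make the two-offset-type idea concrete, here is a minimal host-side C++ sketch. It is not CUB code: `local_offset_t`, `global_offset_t`, `tile_items`, `select_tile`, and `select_flagged` are hypothetical names chosen for illustration, and the sequential loop over tiles only stands in for the running prefix that the decoupled look-back would provide on the GPU. The assumption is that a tile never holds more items than a 32-bit integer can index, so all per-item arithmetic can stay 32-bit and only the per-tile base offsets need 64 bits.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical type names for illustration; not CUB's API.
using local_offset_t  = std::int32_t; // (1) indexes items within a single tile
using global_offset_t = std::int64_t; // (2) indexes into global memory

// Illustrative tile size; real tiles are much larger but still fit in 32 bits.
constexpr local_offset_t tile_items = 4;

// Compacts the flagged items of one tile. Per-item counters are 32-bit;
// 64-bit values only enter through the two tile base offsets.
template <typename T>
local_offset_t select_tile(const T* in, const char* flags, T* out,
                           global_offset_t tile_in_base,
                           global_offset_t tile_out_base,
                           local_offset_t valid_items)
{
  local_offset_t num_selected = 0;
  for (local_offset_t i = 0; i < valid_items; ++i)
  {
    if (flags[tile_in_base + i])
    {
      out[tile_out_base + num_selected] = in[tile_in_base + i];
      ++num_selected;
    }
  }
  return num_selected;
}

// Processes the input tile by tile and returns the total number of selected
// items. `out_base` plays the role of the exclusive prefix that a GPU
// implementation would obtain from the decoupled look-back.
template <typename T>
global_offset_t select_flagged(const std::vector<T>& in,
                               const std::vector<char>& flags,
                               std::vector<T>& out)
{
  const global_offset_t num_items = static_cast<global_offset_t>(in.size());
  global_offset_t out_base = 0;
  for (global_offset_t base = 0; base < num_items; base += tile_items)
  {
    const global_offset_t remaining = num_items - base;
    const local_offset_t valid =
      remaining < tile_items ? static_cast<local_offset_t>(remaining) : tile_items;
    out_base += select_tile(in.data(), flags.data(), out.data(), base, out_base, valid);
  }
  return out_base;
}
```

In this sketch the only 64-bit operations are one addition per tile and the base-plus-index address computations. Whether that split recovers a meaningful part of the gap on the GPU is exactly what would need to be measured.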
elstehle changed the title from "Try to mitigate performance degradation when moving from 32- to 64-bit offset types in DeviceSelect" to "Try to mitigate performance degradation when moving from 32- to 64-bit offset types when using bit-packed tile states in decoupled look-back" on Jul 31, 2024
For future reference, a draft PR is posted at elstehle#3. Despite efforts to mitigate the slowdown, the worst-case slowdown from using i64 instead of i32 offsets is still 1.35x with the bit-packed tile state.