
Enable hash computation from variable length keys #327

Merged (19 commits) on Jul 12, 2023

Conversation

sleeepyjack (Collaborator)

This PR enables hash computation of key types whose sizes are only known at runtime.

@sleeepyjack added the type: feature request label on Jun 30, 2023
@sleeepyjack (Collaborator, Author)

Murmur3 doesn't play nice with the dynamic size. For now, I just implemented this idea for xxhash.

@sleeepyjack (Collaborator, Author)

I'm seeing a few build warnings in the tests:

```
In member function ‘constexpr cuco::detail::XXHash_64<Key>::result_type cuco::detail::XXHash_64<Key>::operator()(const Key&, Extent) const [with Extent = long unsigned int; Key = int]’,
    inlined from ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’ at /workspaces/cuCollections/tests/utility/hash_test.cu:170:33:
/workspaces/cuCollections/include/cuco/detail/hash_functions/xxhash.cuh:296:38: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
  296 |         v2 += blocks8[pipeline_offset + 1] * prime2;
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/workspaces/cuCollections/tests/utility/hash_test.cu: In function ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’:
/workspaces/cuCollections/tests/utility/hash_test.cu:170:75: note: ‘<anonymous>’ declared here
  170 |     CHECK(hash(42) == hash(42, key_size));
      |                                                                           ^
In member function ‘constexpr cuco::detail::XXHash_64<Key>::result_type cuco::detail::XXHash_64<Key>::operator()(const Key&, Extent) const [with Extent = long unsigned int; Key = int]’,
    inlined from ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’ at /workspaces/cuCollections/tests/utility/hash_test.cu:170:33:
/workspaces/cuCollections/include/cuco/detail/hash_functions/xxhash.cuh:299:38: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
  299 |         v3 += blocks8[pipeline_offset + 2] * prime2;
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/workspaces/cuCollections/tests/utility/hash_test.cu: In function ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’:
/workspaces/cuCollections/tests/utility/hash_test.cu:170:75: note: ‘<anonymous>’ declared here
  170 |     CHECK(hash(42) == hash(42, key_size));
      |                                                                           ^
In member function ‘constexpr cuco::detail::XXHash_64<Key>::result_type cuco::detail::XXHash_64<Key>::operator()(const Key&, Extent) const [with Extent = long unsigned int; Key = int]’,
    inlined from ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’ at /workspaces/cuCollections/tests/utility/hash_test.cu:170:33:
/workspaces/cuCollections/include/cuco/detail/hash_functions/xxhash.cuh:302:38: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
  302 |         v4 += blocks8[pipeline_offset + 3] * prime2;
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/workspaces/cuCollections/tests/utility/hash_test.cu: In function ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’:
/workspaces/cuCollections/tests/utility/hash_test.cu:170:75: note: ‘<anonymous>’ declared here
  170 |     CHECK(hash(42) == hash(42, key_size));
```

@PointKernel (Member) commented Jun 30, 2023

> Murmur3 doesn't play nice with the dynamic size. For now, I just implemented this idea for xxhash.

What is the issue with murmurhash? We could take the cudf code as a reference (https://github.com/rapidsai/cudf/blob/d14b6cce9cc39793c118f065b113c83d0210ceb6/cpp/include/cudf/detail/utilities/hash_functions.cuh#L195-L285). Eventually, I would like to get rid of all hash function details in libcudf and rely on cuco hashers instead.

@sleeepyjack (Collaborator, Author)

Dynamic murmurhash3 passes the unit test now (see diff 4a95fa8) but I'm getting even more warnings:

```
/workspaces/cuCollections/include/cuco/detail/hash_functions/murmurhash3.cuh:183:13: warning: this statement may fall through [-Wimplicit-fallthrough=]
  183 |       case 3: k1 ^= tail[2] << 16;
      |               ~~~^~~~~~~~~~~~~~~~
/workspaces/cuCollections/include/cuco/detail/hash_functions/murmurhash3.cuh:184:1: note: here
  184 |       case 2: k1 ^= tail[1] << 8;
      | ^
/workspaces/cuCollections/include/cuco/detail/hash_functions/murmurhash3.cuh:184:13: warning: this statement may fall through [-Wimplicit-fallthrough=]
  184 |       case 2: k1 ^= tail[1] << 8;
      |               ~~~^~~~~~~~~~~~~~~
...
```

@PointKernel (Member)

> Dynamic murmurhash3 passes the unit test now (see diff 4a95fa8) but I'm getting even more warnings: […]

`[[fallthrough]]`
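For reference, a minimal sketch of the suggested annotation, assuming the tail handling follows the reference MurmurHash3 implementation (the helpers `rotl32`, `c1`, `c2`, and `h1` are taken from that reference and are not necessarily cuco's exact names):

```cpp
// Sketch: mark the intentional fall-through between tail cases so that
// -Wimplicit-fallthrough no longer warns.
switch (size & 3) {
  case 3: k1 ^= tail[2] << 16; [[fallthrough]];
  case 2: k1 ^= tail[1] << 8; [[fallthrough]];
  case 1:
    k1 ^= tail[0];
    k1 *= c1;
    k1 = rotl32(k1, 15);
    k1 *= c2;
    h1 ^= k1;
}
```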

Review thread on include/cuco/detail/hash_functions/murmurhash3.cuh (outdated, resolved)
```cpp
 * @return A resulting hash value for `key`
 */
template <typename Extent>
constexpr result_type __host__ __device__ operator()(Key const& key, Extent size) const noexcept
```
Member:

As discussed offline, we would provide a member like

```cpp
constexpr result_type __host__ __device__ compute_hash(Key const& key, Extent size) const noexcept;
```

instead of overloading the `()` operator.

Per `std::hash`:

> Specializations of `std::hash` should define an `operator()` that takes a single argument `key` of type `Key`.

Collaborator (Author):

Technically we still meet the standard requirements/interface. This is just another overload of the operator. The standard does not prohibit additional functionality.
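A condensed sketch of the resulting interface, with both overloads side by side (the two-argument signature is taken from the diff above; the rest of the class body is abbreviated):

```cpp
template <typename Key>
struct XXHash_64 {
  using result_type = std::uint64_t;

  // std::hash-style entry point: a single argument of type Key.
  constexpr result_type __host__ __device__ operator()(Key const& key) const noexcept;

  // Additional overload introduced by this PR: hashes a key whose byte size
  // is only known at runtime.
  template <typename Extent>
  constexpr result_type __host__ __device__ operator()(Key const& key, Extent size) const noexcept;
};
```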

@sleeepyjack (Collaborator, Author) commented Jul 1, 2023

Proof that, with the changes introduced by this PR, the compiler is still able to apply the same optimizations in the static-extent case (the PTX output is identical): https://godbolt.org/z/zGd4G4heG

Edit: updated link incorporating the latest changes: https://godbolt.org/z/31srn8f3e

Comment on lines 151 to 170
```diff
-  const uint32_t* const blocks = (const uint32_t*)(data + nblocks * 4);
-  for (int i = -nblocks; i; i++) {
-    uint32_t k1 = blocks[i];  // getblock32(blocks,i);
+  for (std::remove_const_t<decltype(nblocks)> i = 0; size >= 4 && i < nblocks; i++) {
+    std::uint32_t k1 = load_chunk<std::uint32_t>(data, i);
```
Collaborator (Author):

@PointKernel Here's the essential change I had to make to eliminate the compiler warnings.
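For context, `load_chunk` stands for a memcpy-based block read; a sketch of what such a helper could look like (only the call site `load_chunk<std::uint32_t>(data, i)` appears in the diff, so the body and parameter types here are assumptions):

```cpp
#include <cstddef>
#include <cstring>

// Sketch: copy the index-th sizeof(T)-byte chunk out of the byte stream
// instead of dereferencing a type-punned pointer, so the read is well
// defined and no longer triggers -Wmaybe-uninitialized.
template <typename T>
__host__ __device__ T load_chunk(char const* bytes, std::size_t index) noexcept
{
  T chunk;
  memcpy(&chunk, bytes + index * sizeof(T), sizeof(T));
  return chunk;
}
```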

@PointKernel (Member) left a comment:

Conflicts to resolve (due to the bug fix in #326)

@sleeepyjack Can you please add a benchmark verifying the memcpy won't affect runtime performance?

Otherwise LGTM

@sleeepyjack (Collaborator, Author)

Performance difference between the current dev branch (Ref) and this branch (Cmp):

[0] NVIDIA RTX A6000

| Hash | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| cuco::detail::MurmurHash3_32 | 3.852 ms | 0.01% | 3.747 ms | 0.01% | -105.309 us | -2.73% | FAIL |
| cuco::detail::MurmurHash3_32 | 7.526 ms | 0.01% | 7.527 ms | 0.00% | 0.138 us | 0.00% | PASS |
| cuco::detail::MurmurHash3_32<large_key<32> > | 33.123 ms | 0.00% | 33.123 ms | 0.00% | -0.257 us | -0.00% | PASS |
| cuco::detail::XXHash_32 | 3.158 ms | 0.02% | 2.974 ms | 0.02% | -183.604 us | -5.81% | FAIL |
| cuco::detail::XXHash_32 | 6.253 ms | 0.01% | 6.253 ms | 0.01% | 0.397 us | 0.01% | PASS |
| cuco::detail::XXHash_32<large_key<32> > | 26.792 ms | 0.00% | 26.792 ms | 0.00% | 0.108 us | 0.00% | PASS |
| cuco::detail::XXHash_64 | 6.620 ms | 0.01% | 6.379 ms | 0.01% | -241.088 us | -3.64% | FAIL |
| cuco::detail::XXHash_64 | 8.156 ms | 0.01% | 8.705 ms | 0.01% | 548.905 us | 6.73% | FAIL |
| cuco::detail::XXHash_64<large_key<32> > | 64.476 ms | 0.00% | 64.475 ms | 0.00% | -0.467 us | -0.00% | PASS |
| cuco::detail::MurmurHash3_fmix32 | 2.772 ms | 0.01% | 2.772 ms | 0.01% | 0.017 us | 0.00% | PASS |
| cuco::detail::MurmurHash3_fmix64 | 4.633 ms | 0.01% | 4.633 ms | 0.01% | 0.095 us | 0.00% | PASS |

@sleeepyjack (Collaborator, Author)

rerun tests

@sleeepyjack (Collaborator, Author) commented Jul 4, 2023

OK, so it turns out that the commit that introduced the memcpy (3aedeb0) blew everything up: the compiler segfaults in every TU that uses a hasher inside a map/set. Interestingly, hash_test and hash_bench work fine. Am I doing something stupid here? cc @jrhemstad

@sleeepyjack (Collaborator, Author)

With CTK 12.2 everything compiles just fine - no compiler segfaults. Does anyone know of a recently fixed nvcc bug causing this issue?

```cpp
 * @return A resulting hash value for `key`
 */
template <typename Extent>
constexpr result_type __host__ __device__ operator()(Key const& key, Extent size) const noexcept
```
Collaborator:

This feels weird. Normally, I'd expect such an esoteric use case to provide a custom hash function as opposed to trying to twist the standard hash function signature to make it work for this kind of situation.

@sleeepyjack (Collaborator, Author), Jul 6, 2023:

Just discussed this with @PointKernel. I will change the signature to

```cpp
template <typename Extent>
constexpr result_type __host__ __device__ compute_bytes(std::byte const* data, Extent size) const noexcept;
```

Collaborator (Author):

The idea is that a user can use this function to build a custom hasher around our hashers.
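For illustration, a user-defined hasher wrapping `compute_bytes` could look roughly like this (a sketch; the key type `raw_bytes_key` and everything other than the `compute_bytes` signature above are made up for the example):

```cpp
#include <cstddef>

// Hypothetical variable-length key: a pointer to the bytes plus a runtime size.
struct raw_bytes_key {
  std::byte const* data;
  std::size_t size;
};

// Custom hasher built around the proposed compute_bytes member of a cuco hasher.
template <typename Hash>
struct raw_bytes_hasher {
  Hash hash{};

  constexpr typename Hash::result_type __host__ __device__
  operator()(raw_bytes_key const& key) const noexcept
  {
    return hash.compute_bytes(key.data, key.size);
  }
};
```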

@sleeepyjack (Collaborator, Author), Jul 7, 2023:

A slightly more modern approach would be:

```cpp
template <class T, std::size_t N>
constexpr result_type __host__ __device__ compute_hash(cuda::std::span<T, N> data) const noexcept;
```

Edit: We're still waiting for cuda::std::span support: #332
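Once that lands, calling the span-based overload could look roughly like this (purely illustrative; it assumes #332 is resolved and that the hasher exposes `compute_hash` as sketched above):

```cpp
#include <cuda/std/cstddef>
#include <cuda/std/span>

// Illustrative only: hash `size` bytes through a dynamic-extent span.
template <typename Hasher>
__host__ __device__ typename Hasher::result_type hash_bytes(Hasher const& hash,
                                                            cuda::std::byte const* data,
                                                            cuda::std::size_t size)
{
  return hash.compute_hash(cuda::std::span<cuda::std::byte const>{data, size});
}
```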

Collaborator (Author):

Addressed in 74a3739

@sleeepyjack (Collaborator, Author)

Updated benchmarks (reference is current dev branch):

[0] NVIDIA RTX A6000

| Hash | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| cuco::detail::MurmurHash3_32 | 3.123 ms | 0.97% | 3.094 ms | 0.67% | -29.645 us | -0.95% | FAIL |
| cuco::detail::MurmurHash3_32 | 6.541 ms | 1.08% | 6.506 ms | 1.27% | -35.056 us | -0.54% | PASS |
| cuco::detail::MurmurHash3_32<large_key<32> > | 27.241 ms | 0.62% | 27.031 ms | 0.82% | -209.693 us | -0.77% | FAIL |
| cuco::detail::XXHash_32 | 2.619 ms | 0.73% | 2.602 ms | 0.24% | -16.662 us | -0.64% | FAIL |
| cuco::detail::XXHash_32 | 5.501 ms | 1.17% | 5.503 ms | 1.95% | 2.386 us | 0.04% | PASS |
| cuco::detail::XXHash_32<large_key<32> > | 22.452 ms | 0.77% | 22.218 ms | 0.58% | -234.048 us | -1.04% | FAIL |
| cuco::detail::XXHash_64 | 5.504 ms | 0.24% | 5.437 ms | 0.46% | -67.710 us | -1.23% | FAIL |
| cuco::detail::XXHash_64 | 6.784 ms | 0.00% | 6.714 ms | 0.33% | -69.711 us | -1.03% | FAIL |
| cuco::detail::XXHash_64<large_key<32> > | 55.766 ms | 0.53% | 55.026 ms | 0.63% | -739.887 us | -1.33% | FAIL |
| cuco::detail::MurmurHash3_fmix32 | 2.251 ms | 1.56% | 2.231 ms | 1.47% | -20.388 us | -0.91% | PASS |
| cuco::detail::MurmurHash3_fmix64 | 3.832 ms | 0.51% | 3.794 ms | 0.55% | -37.373 us | -0.98% | FAIL |

Labels: type: feature request

3 participants