
WARPCORE

The following modifications have been made to the base warpcore library in this fork:

  • Support for aggregations in SingleValueHashTable (SQL GROUP BY). Values in the table must be initialized by a call to init_values(), where the initial value must be the identity of the aggregate operation (e.g., 0 for atomicAdd(), INT32_MAX for atomicMin()). The aggregator functor argument has signature atomic_aggregator(value_type* value_address, value_type value_to_aggregate); a sketch of the functor shapes is given after this list.
  • BloomFilter::retrieve_write() writes out the filtered input table.
  • In general, the writer functor argument has signature writer(int write_index, int read_index, [HashTableValueType hash_table_value, FilterValueType filter_value]), depending on the hash table type and retrieval type.
  • BloomFilter::insert_if() inserts a key into the Bloom filter only if its corresponding value passes a predicate (SQL WHERE).
  • BloomFilter::retrieve_write_if() and BloomFilter::retrieve_if() are defined similarly.
  • SingleValueHashTable::insert_if(), SingleValueHashTable::retrieve_write(), and SingleValueHashTable::retrieve_write_if() are also implemented.
  • HashSet::insert_if(), HashSet::retrieve_write(), and HashSet::retrieve_write_if() are implemented.
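
To make the functor contracts above concrete, here is a minimal sketch of a SUM aggregator (for GROUP BY ... SUM) and a column-gathering writer. The struct names sum_aggregator and gather_writer, the member layout, and the std::uint32_t/int element types are illustrative assumptions; only the operator() parameter shapes are taken from the signatures documented in the list above.

#include <cstdint>

// Illustrative SUM aggregator (hypothetical name). Matches the documented
// signature atomic_aggregator(value_type* value_address, value_type value_to_aggregate).
// For SUM, the identity passed to init_values() must be 0.
struct sum_aggregator
{
    __device__ void operator()(std::uint32_t* value_address,
                               std::uint32_t value_to_aggregate) const noexcept
    {
        atomicAdd(value_address, value_to_aggregate);
    }
};

// Illustrative writer (hypothetical name) that gathers one payload column of the
// input table into the filtered output. Matches the documented signature
// writer(int write_index, int read_index); retrieval variants may additionally
// receive the hash table value and/or filter value.
struct gather_writer
{
    const int* in_column;  // column of the input table
    int*       out_column; // materialized, filtered output column

    __device__ void operator()(int write_index, int read_index) const noexcept
    {
        out_column[write_index] = in_column[read_index];
    }
};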

The goal is to eliminate the need to write hash table kernel operations customized to specific SQL queries. The exception is hash-join pipelining, but I have found that, in cases of high selectivity, the thread divergence caused by pipelining is not amortized by the savings from avoiding materialization. The underlying implementation details are unaltered.

Below you will find the original README.

NOTE: There is a bug in the test build (which is also present in the original repo).

Hashing at the speed of light on modern CUDA-accelerators

Introduction

warpcore is a framework for creating high-throughput, purpose-built hashing data structures on CUDA-accelerators.

This library provides several purpose-built hashing data structures, including SingleValueHashTable, HashSet, and BloomFilter.

Implementations support the key types std::uint32_t and std::uint64_t together with any trivially copyable value type. In order to be adaptable to a wide range of possible use cases, we provide a multitude of combinable modules such as hash functions, probing schemes, and data layouts (visit the documentation for further information).
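
As a minimal sketch of how the key and value types come together, the snippet below builds a SingleValueHashTable over std::uint32_t keys and values. It assumes the bulk host-side insert(keys, values, num)/retrieve(keys, num, values_out) interface shown in the upstream examples; consult the examples and documentation for the authoritative signatures and optional parameters.

#include <cstdint>
#include <warpcore/warpcore.cuh>

int main()
{
    using hash_table_t = warpcore::SingleValueHashTable<std::uint32_t, std::uint32_t>;

    const std::uint64_t num_pairs = 1UL << 20;
    hash_table_t table(2 * num_pairs); // capacity with some headroom over the input size

    std::uint32_t* keys_d;
    std::uint32_t* values_d;
    cudaMalloc(&keys_d, sizeof(std::uint32_t) * num_pairs);
    cudaMalloc(&values_d, sizeof(std::uint32_t) * num_pairs);
    // ... fill keys_d (unique keys) and values_d on the device ...

    table.insert(keys_d, values_d, num_pairs);   // bulk insert of key/value pairs
    table.retrieve(keys_d, num_pairs, values_d); // bulk lookup, overwriting values_d

    cudaFree(keys_d);
    cudaFree(values_d);

    return 0;
}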

warpcore won the Best Paper Award at the IEEE HiPC 2020 conference (link to manuscript, link to preprint) and is based on our previous work on massively parallel GPU hash tables, warpdrive, which was published at the prestigious IEEE IPDPS conference (link to manuscript).

Development Status

This library is still under heavy development, and users should expect breaking changes and refactoring to be common. Development mainly takes place on our in-house GitLab instance; however, we plan to migrate to GitHub in the near future.

Requirements

Dependencies

  • hpc_helpers - utils, timers, etc.
  • kiss_rng - a fast and lightweight GPU PRNG
  • CUB - high-throughput primitives for GPUs (already included in newer versions of the CUDA toolkit, i.e., ≥ v10.2)

Note: Dependencies are automatically managed via CMake.

Getting warpcore

warpcore is header-only and can be incorporated manually into your project by downloading the headers and placing them in your source tree.

Adding warpcore to a CMake Project

warpcore is designed to make it easy to include within another CMake project. The CMakeLists.txt exports a warpcore target that can be linked¹ into a target to set up the include directories, dependencies, and compile flags necessary to use warpcore in your project.

We recommend using CMake Package Manager (CPM) to fetch warpcore into your project. With CPM, getting warpcore is easy:

cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

include(path/to/CPM.cmake)

CPMAddPackage(
  NAME warpcore
  GITHUB_REPOSITORY sleeepyjack/warpcore
  GIT_TAG XXXXX # or VERSION XXXXX; replace XXXXX with the desired tag or release
)

target_link_libraries(my_target warpcore)

This will take care of downloading warpcore from GitHub and making the headers available in a location that can be found by CMake. Linking against the warpcore target will provide everything needed for warpcore to be used by my_target.

¹: warpcore is header-only and therefore there is no binary component to "link" against. The linking terminology comes from CMake's target_link_libraries, which is still used even for header-only library targets.

Building warpcore

Since warpcore is header-only, there is nothing to build to use it.

To build the tests, benchmarks, and examples:

cd $WARPCORE_ROOT
mkdir -p build
cd build
cmake .. -DWARPCORE_BUILD_TESTS=ON -DWARPCORE_BUILD_BENCHMARKS=ON -DWARPCORE_BUILD_EXAMPLES=ON
make

Binaries will be built into:

  • build/tests/
  • build/benchmarks/
  • build/examples/

Where to go from here?

Take a look at the examples, test your own system's performance using the benchmark suite, and make sure everything works as expected by running the test suite.

How to cite warpcore?

BibTeX:

@inproceedings{DBLP:conf/hipc/JungerKM0XLS20,
  author    = {Daniel J{\"{u}}nger and
               Robin Kobus and
               Andr{\'{e}} M{\"{u}}ller and
               Christian Hundt and
               Kai Xu and
               Weiguo Liu and
               Bertil Schmidt},
  title     = {WarpCore: {A} Library for fast Hash Tables on GPUs},
  booktitle = {27th {IEEE} International Conference on High Performance Computing,
               Data, and Analytics, HiPC 2020, Pune, India, December 16-19, 2020},
  pages     = {11--20},
  publisher = {{IEEE}},
  year      = {2020},
  url       = {https://doi.org/10.1109/HiPC50609.2020.00015},
  doi       = {10.1109/HiPC50609.2020.00015},
  timestamp = {Wed, 05 May 2021 09:45:30 +0200},
  biburl    = {https://dblp.org/rec/conf/hipc/JungerKM0XLS20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

@inproceedings{DBLP:conf/ipps/Junger0S18,
  author    = {Daniel J{\"{u}}nger and
               Christian Hundt and
               Bertil Schmidt},
  title     = {WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes},
  booktitle = {2018 {IEEE} International Parallel and Distributed Processing Symposium,
               {IPDPS} 2018, Vancouver, BC, Canada, May 21-25, 2018},
  pages     = {441--450},
  publisher = {{IEEE} Computer Society},
  year      = {2018},
  url       = {https://doi.org/10.1109/IPDPS.2018.00054},
  doi       = {10.1109/IPDPS.2018.00054},
  timestamp = {Sat, 19 Oct 2019 20:31:38 +0200},
  biburl    = {https://dblp.org/rec/conf/ipps/Junger0S18.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

warpcore Copyright (C) 2018-2021 Daniel Jünger

This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the file LICENSE for details.
