LlamaLib

llama.cpp library for UndreamAI

LlamaLib implements an API for the llama.cpp server. The focus of this project is to:

  • build the library in a cross-platform way to support most of the architectures available in llama.cpp
  • expose the API so that it can be imported in Unity (and C# in general), specifically for the LLMUnity project
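
To give an idea of how the built libraries are consumed, the following minimal C++ sketch loads one of the released shared libraries at runtime and resolves a C entry point, which is essentially what a host application (or Unity, via P/Invoke) does. The library filename, the LLM_Construct symbol and its signature are illustrative placeholders, not the documented LlamaLib API; check the exported symbols of the release you downloaded.

```cpp
// Minimal sketch (Linux) of loading a LlamaLib shared library at runtime
// and resolving a C entry point -- essentially what a host application
// (or Unity through P/Invoke) does with the released libraries.
// The filename, the symbol name and its signature are illustrative
// placeholders, not the documented LlamaLib API.
#include <dlfcn.h>
#include <cstdio>

int main() {
    void* lib = dlopen("./libundreamai_llamacpp.so", RTLD_NOW);
    if (!lib) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Resolve a hypothetical exported function of the server API.
    using construct_fn = void* (*)(const char* command);
    auto construct = reinterpret_cast<construct_fn>(dlsym(lib, "LLM_Construct"));
    if (!construct) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(lib);
        return 1;
    }

    std::printf("library loaded, entry point resolved\n");
    dlclose(lib);
    return 0;
}
```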

Each release contains:

  • the built libraries for different architectures in the undreamai-[VERSION]-llamacpp.zip
  • built libraries with additional functionality (IQ quants, flash attention) for Nvidia/AMD GPUs in the undreamai-[VERSION]-llamacpp-full.zip
  • server binaries that can use the above libraries, similarly to the llama.cpp server, in the undreamai-[VERSION]-server.zip.
    Note: the libraries and the binaries need to be extracted into the same directory for the binaries to work.
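
For example, on Linux both archives can be extracted into one directory with unzip undreamai-[VERSION]-llamacpp.zip -d server followed by unzip undreamai-[VERSION]-server.zip -d server (the server directory name here is arbitrary).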

The following architectures are provided:

  • *-noavx (Windows/Linux): support for CPUs without AVX instructions (also runs on CPUs that do support AVX)
  • *-avx (Windows/Linux): support for CPUs with AVX instructions
  • *-avx2 (Windows/Linux): support for CPUs with AVX-2 instructions
  • *-avx512 (Windows/Linux): support for CPUs with AVX-512 instructions
  • *-cuda-cu11.7.1 (Windows/Linux): support for Nvidia GPUs with CUDA 11 (CUDA doesn't need to be separately installed)
  • *-cuda-cu12.2.0 (Windows/Linux): support for Nvidia GPUs with CUDA 12 (CUDA doesn't need to be separately installed)
  • *-hip (Windows/Linux): support for AMD GPUs with AMD HIP (HIP doesn't need to be separately installed)
  • *-vulkan (Windows/Linux): support for most GPUs independent of manufacturer
  • macos-*-acc (macOS arm64/x64): support for macOS with the Accelerate framework
  • macos-*-no_acc (macOS arm64/x64): support for macOS without the Accelerate framework

In addition, the windows-archchecker and linux-archchecker libraries are used to determine the presence and type of AVX instructions on Windows and Linux.
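
As a rough sketch of what such a check involves (this is not the actual archchecker implementation), the GCC/Clang builtin __builtin_cpu_supports can be used on x86 CPUs to pick the most capable library variant:

```cpp
// Rough sketch of CPU feature detection, analogous in spirit to the
// archchecker libraries (NOT their actual implementation).
// __builtin_cpu_supports is a GCC/Clang builtin for x86 targets.
#include <cstdio>

const char* best_variant() {
    if (__builtin_cpu_supports("avx512f")) return "avx512";
    if (__builtin_cpu_supports("avx2"))    return "avx2";
    if (__builtin_cpu_supports("avx"))     return "avx";
    return "noavx";
}

int main() {
    __builtin_cpu_init();  // initialize CPU feature detection
    std::printf("use the *-%s libraries\n", best_variant());
    return 0;
}
```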

The server CLI startup guide can be accessed by running ./undreamai_server -h on Linux/macOS or .\undreamai_server.exe -h on Windows for the architecture of interest.
More information on the different options can be found in the llama.cpp server README.
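
For example, a typical startup serving a model on the local network could look like the following, where model.gguf is a placeholder for the model path and the flags follow the llama.cpp server options: ./undreamai_server -m model.gguf --host 0.0.0.0 --port 8080 -ngl 99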

The server binaries can be used to deploy remote servers for LLMUnity.
You can print the required command within Unity by running the scene.
More information can be found at the Use a remote server section of the LLMUnity Readme.
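
Assuming the deployed server exposes the same HTTP endpoints as the llama.cpp server, it can be smoke-tested with a completion request such as curl http://<server-ip>:8080/completion -d '{"prompt": "Hello", "n_predict": 16}', where <server-ip> and the port are placeholders for the actual deployment.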