BPE in Haskell #79

Open
BobMcDear opened this issue May 24, 2024 · 0 comments

Hello,

I have created a port of minbpe in Haskell, minbpe-hs, that provides the same functionality as minbpe minus GPT4Tokenizer. Because BPE is inherently recursive, it can be expressed quite naturally in a functional language, and I hope those who are struggling to understand how the algorithm works can benefit from studying its Haskell implementation.
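
To give a rough idea of that recursive structure, here is a minimal, self-contained sketch of BPE training on a list of byte values. It is not the minbpe-hs code: the names `mergePair`, `mostFrequentPair`, `learnMerges`, and `applyMerges` are hypothetical, ties between equally frequent pairs are broken by taking the larger pair (an arbitrary choice), and `map ord` stands in for UTF-8 encoding, which is only valid for ASCII input.

```haskell
import Data.Char (ord)
import Data.List (group, sort)

type Token = Int
type Pair  = (Token, Token)

-- Replace every left-to-right, non-overlapping occurrence of the pair
-- with the new token id.
mergePair :: Pair -> Token -> [Token] -> [Token]
mergePair (a, b) new (x : y : rest)
  | x == a && y == b = new : mergePair (a, b) new rest
mergePair p new (x : rest) = x : mergePair p new rest
mergePair _ _ []           = []

-- Most frequent adjacent pair, or Nothing if there are fewer than two tokens.
-- Assumption: ties are broken by the larger pair, which is arbitrary.
mostFrequentPair :: [Token] -> Maybe Pair
mostFrequentPair ts
  | null ps   = Nothing
  | otherwise = Just (snd (maximum counts))
  where
    ps     = zip ts (drop 1 ts)
    counts = [(length g, p) | g@(p : _) <- group (sort ps)]

-- Learn up to n merges, minting new token ids starting at nextId.
-- Each step merges the most frequent pair and recurses on the shorter list.
learnMerges :: Int -> Token -> [Token] -> ([(Pair, Token)], [Token])
learnMerges 0 _ ts = ([], ts)
learnMerges n nextId ts =
  case mostFrequentPair ts of
    Nothing -> ([], ts)
    Just p  ->
      let (merges, final) = learnMerges (n - 1) (nextId + 1) (mergePair p nextId ts)
      in  ((p, nextId) : merges, final)

-- Encode by replaying the learned merges in the order they were learned.
applyMerges :: [(Pair, Token)] -> [Token] -> [Token]
applyMerges merges ts = foldl (\acc (p, new) -> mergePair p new acc) ts merges

main :: IO ()
main = do
  let input            = map ord "aaabdaaabac"  -- ASCII only (assumption)
      (merges, tokens) = learnMerges 3 256 input
  print merges                      -- prints [((97,97),256),((256,97),257),((257,98),258)]
  print tokens                      -- prints [258,100,258,97,99]
  print (applyMerges merges input)  -- prints [258,100,258,97,99]
```

Training and encoding share the single `mergePair` helper, so the whole algorithm reduces to repeatedly picking a pair and rewriting the list.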

The Wikipedia example can be reproduced using minbpe-hs as follows.

{-# LANGUAGE OverloadedStrings #-}

import BPE.Base
import BPE.Basic

main :: IO ()
main = do
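    -- Train on the Wikipedia string: 256 byte-level tokens plus 3 merges.
    let (merges, vocab) = trainTokenizer (256 + 3) "aaabdaaabac"
    print $ encode merges "aaabdaaabac"
    print $ decode vocab [258, 100, 258, 97, 99]
    -- Save the trained merges and vocabulary under the name "toy".
    saveMergesAndVocab "toy" merges vocab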

Would it be all right if I submit a PR to add this to the list of community extensions?

Thank you,
Borna
