chunkk

Recursively generates a dataset for fine-tuning pre-trained GPT models from a large text file, such as a book or a set of documentation.

Usage

node index.js --input [inputFilePath] --output [outputFilePath] --numIterations [number] --numTokens [number] --model [chatGptModel]

or

node index.js -i [inputFilePath] -o [outputFilePath] -n [number] -t [number] -m [chatGptModel]

input (required) - path to the input text file (for example, a book or documentation)
output - file path for the generated JSON file // default output.json
numIterations - how many times to ask for questions for each chunk // default 3
numTokens - max number of tokens for the ChatGPT model of your choice // default 2000
model - ChatGPT model // default gpt-3.5-turbo
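
As an illustration of how the options and defaults above fit together, here is a minimal sketch that parses them with Node's built-in util.parseArgs (available in Node 18.3 and later). The function name parseCliOptions and the single-letter aliases are assumptions made for this example; the actual index.js may use a different parser.

```javascript
// Sketch only: not the project's real argument handling.
const { parseArgs } = require('node:util');

// Defaults copied from the option list above.
const DEFAULTS = {
  output: 'output.json',
  numIterations: 3,
  numTokens: 2000,
  model: 'gpt-3.5-turbo',
};

// parseCliOptions is a hypothetical helper name.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      input: { type: 'string', short: 'i' },
      output: { type: 'string', short: 'o' },
      numIterations: { type: 'string', short: 'n' },
      numTokens: { type: 'string', short: 't' },
      model: { type: 'string', short: 'm' },
    },
  });

  // input is the only required option.
  if (!values.input) {
    throw new Error('--input is required');
  }

  // Fall back to the documented defaults and convert numeric options.
  return {
    input: values.input,
    output: values.output ?? DEFAULTS.output,
    numIterations: Number(values.numIterations ?? DEFAULTS.numIterations),
    numTokens: Number(values.numTokens ?? DEFAULTS.numTokens),
    model: values.model ?? DEFAULTS.model,
  };
}

// Example: only input and numIterations given, the rest use defaults.
console.log(parseCliOptions(['-i', 'book.txt', '-n', '5']));
```

In a real script the argument list would be process.argv.slice(2).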

Example

node index.js --input '../Downloads/TedChiang-The truth of fact the truth of feeling.txt' --numIterations 5 --output '../Downloads/Ted.json' --numTokens 2500 --model 'gpt-4'

Here is how it works

Takes a big text file
Splits it into chunks of at most numTokens tokens
For each chunk:
- Asks GPT to create a set of questions. The same request is repeated numIterations times in total. Each request returns about 8-10 questions, so the number of questions per chunk will be roughly numIterations * 8 to numIterations * 10.
- All these questions are then fed as a prompt to ChatGPT for answers.
- The last request asks for a summary of this chunk of text.
Summaries are concatenated into a new text, and the process repeats recursively until just one chunk is left
All questions, answers, and summaries are recorded in JSON format in the output file (the output option, default output.json).
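
The steps above can be sketched roughly as follows. This is not the code in chunkk.js: the names splitIntoChunks, buildDataset, and askModel are hypothetical, the prompts are placeholders, the model call is replaced by a stub so the example runs offline, and tokens are estimated at about 4 characters each instead of using a real tokenizer.

```javascript
// Assumption: about 4 characters per token; a real tokenizer is more accurate.
const CHARS_PER_TOKEN = 4;

// Split text on whitespace into chunks of at most numTokens estimated tokens.
// A single word longer than the limit becomes its own oversized chunk.
function splitIntoChunks(text, numTokens) {
  const maxChars = numTokens * CHARS_PER_TOKEN;
  const chunks = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current && current.length + 1 + word.length > maxChars) {
      chunks.push(current);
      current = word;
    } else {
      current = current ? current + ' ' + word : word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// askModel is a hypothetical stand-in for the ChatGPT request:
// an async function that takes a prompt string and returns a reply string.
async function buildDataset(text, options, askModel, records = []) {
  const { numIterations, numTokens } = options;
  const chunks = splitIntoChunks(text, numTokens);
  const summaries = [];

  for (const chunk of chunks) {
    // Ask for questions numIterations times and collect one question per line.
    const questions = [];
    for (let i = 0; i < numIterations; i++) {
      const reply = await askModel(`Write questions about this text:\n${chunk}`);
      questions.push(...reply.split('\n').filter(Boolean));
    }
    // Feed all questions back for answers, then ask for a summary.
    const answers = await askModel(
      `Answer these questions using the text.\nText:\n${chunk}\nQuestions:\n${questions.join('\n')}`
    );
    const summary = await askModel(`Summarize this text:\n${chunk}`);
    records.push({ questions, answers, summary });
    summaries.push(summary);
  }

  // Stop once the text fit into a single chunk.
  if (chunks.length <= 1) return records;

  // Otherwise recurse on the concatenated summaries. The length check is a
  // safety guard added for this sketch so the recursion always terminates.
  const next = summaries.join('\n');
  if (next.length >= text.length) return records;
  return buildDataset(next, options, askModel, records);
}

// Offline demo with a stub model instead of a real API call.
const stubModel = async (prompt) => {
  if (prompt.startsWith('Write questions')) return 'Question one?\nQuestion two?';
  if (prompt.startsWith('Answer')) return 'Answer one.\nAnswer two.';
  return 'short summary';
};

buildDataset(
  'A long text that would normally come from the input file and span many chunks.',
  { numIterations: 3, numTokens: 5 },
  stubModel
).then((records) => console.log(JSON.stringify(records, null, 2)));
```

Passing the model call in as a function keeps the recursion testable without network access; a real run would replace stubModel with a function that calls the chosen ChatGPT model and would write the records to the output file.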

TODO

Add streaming. For now this will not work for huge files, because the whole file is read with fs.readFileSync
Add quizzes.
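
For the streaming item, one possible direction is to read the input with fs.createReadStream and hand out fixed-size pieces as they arrive, so the whole file never has to sit in memory. The sketch below is only an assumption about how that could look; readInChunks is a hypothetical name, and it cuts on character counts, so a piece may end in the middle of a word.

```javascript
const fs = require('node:fs');

// Hypothetical helper: async generator that yields pieces of maxChars
// characters (the last piece may be shorter) without loading the whole file.
async function* readInChunks(filePath, maxChars) {
  // With an encoding set, the stream yields strings instead of Buffers.
  const stream = fs.createReadStream(filePath, { encoding: 'utf8' });
  let buffer = '';
  for await (const piece of stream) {
    buffer += piece;
    // Emit full-size pieces as soon as enough text has accumulated.
    while (buffer.length >= maxChars) {
      yield buffer.slice(0, maxChars);
      buffer = buffer.slice(maxChars);
    }
  }
  // Emit whatever is left at the end of the file.
  if (buffer) yield buffer;
}

// Example usage (hypothetical path and size):
// for await (const chunk of readInChunks('book.txt', 8000)) {
//   // process one chunk at a time here
// }
```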
