Skip to content

Latest commit

 

History

History
80 lines (53 loc) · 5.81 KB

automind.md

File metadata and controls

80 lines (53 loc) · 5.81 KB

automind automind.py

Introduction

automind.py is a Python module that contributes to the CodePhreak project by facilitating the transformation of memory data into a structured prompt. This prompt is formatted using a specialized style known as "Llama Chat Style," which is designed to enable intelligent code generation and interaction. Table of Contents

Constants Function: format_to_llama_chat_style(memory)

Constants

In automind.py, the following constants are defined to assist in formatting the prompts:

BOS: The beginning-of-sequence token, denoting the start of a sequence. EOS: The end-of-sequence token, indicating the conclusion of a sequence. B_INST, E_INST: Tokens marking the beginning and end of an instruction in the Llama Chat Style. B_SYS, E_SYS: Tokens enclosing a system instruction in the Llama Chat Style. DEFAULT_SYSTEM_PROMPT: A default system prompt template used for instruction formatting.

Function: format_to_llama_chat_style(memory)

This function takes memory as input, where memory represents a sequence of dialog pairs, each consisting of an instruction and a response (except for the most recent dialog). The function formats this memory data into a structured prompt that adheres to the Llama Chat Style.

The process involves the following steps:

Iterate through each dialog pair in the memory, extracting the instruction and response. Handle None values for instruction and response by replacing them with empty strings. For the first dialog pair, prepend the default system prompt template using the B_SYS and E_SYS tokens. Format each instruction and response pair using the B_INST, E_INST, and EOS tokens. For the first instruction, the BOS token is not explicitly added due to tokenization behavior. Extract the new instruction from the most recent dialog and append it to the prompt. If this is the first dialog, the default system prompt is prepended.

The result is a formatted prompt that encapsulates the historical dialog interactions and the user's new instruction, all conforming to the Llama Chat Style.

Conclusion

automind.py provides a crucial function, format_to_llama_chat_style(memory), that converts memory data into a structured prompt suitable for intelligent code generation. By implementing this Llama Chat Style formatting, the module contributes to the seamless integration of memory-based dialog interactions and automated code generation within the codephreak project. {Professor Codephreak is an expert in machine learning, computer science and computer programming}

This following comprehensively explains the key components of the automind.py module, emphasizing its purpose, the constants involved, and the operation of the primary function in transforming memory data into a suitable prompt format for intelligent code generation.

automind.py - AutoMind Formatting Utilities Overview

The automind.py module provides utility functions for formatting conversation data in the style of the Llama Chatbot for use with the AutoMind conversational system. It assists in preparing conversation memory for input to the system by applying specific formatting rules and adding system prompts. Functions format_to_llama_chat_style(memory) -> str

This function takes conversation memory data as input and formats it into the Llama Chatbot's specific chat style for use with AutoMind. The function performs the following steps:

Prepend the default system prompt to the first instruction.
Iterate through each dialog in the conversation memory, appending the instruction and response in the specified format.
Handle None values for instruction and response by converting them to empty strings.
Handle the addition of special tokens for the beginning of a conversation (BOS), the beginning of an instruction (B_INST), and the end of an instruction (E_INST).
Handle the addition of the system prompt for the first instruction.
Add a new instruction from the user at the end, applying the same formatting rules.

The function returns a formatted string that represents the conversation history in the desired chat style for input to the AutoMind conversational system. Constants

BOS, EOS: Tokens representing the beginning and end of a sequence.
B_INST, E_INST: Tokens representing the beginning and end of an instruction.
B_SYS, E_SYS: Tokens representing the beginning and end of a system prompt.
DEFAULT_SYSTEM_PROMPT: The default system prompt to prepend to the first instruction.

Usage

Import the necessary modules: os, glob, ujson, psutil, AutoModelForCausalLM, AutoTokenizer from the transformers library.

Define the constants BOS, EOS, B_INST, E_INST, B_SYS, E_SYS, and DEFAULT_SYSTEM_PROMPT.

Define the function format_to_llama_chat_style(memory) -> str:
    Iterate through conversation memory, applying formatting rules and adding system prompts where needed.
    Handle special tokens and formatting for instructions and responses.
    Return the formatted conversation history as a string.

Example Use Case

The automind.py module can be utilized within a conversational system to prepare conversation memory for input to AutoMind and AUTOMINDx engine. By formatting conversation data in the specific chat style of the Llama Chatbot, the system ensures that memory data is properly structured and enhanced with system prompts, ready for use in generating context-aware responses within the AutoMind environment.

Professor Codephreak in self portrait
AUTOMINDx design goal