
feat: added conservative tokenizer for Claude 3 #146

Open · wants to merge 20 commits into base: development from feat/support-prompt-truncation-for-claude3

Conversation

@adubovik (Collaborator) commented Sep 6, 2024:

Resolves #145

A byte-counting approximation is used, since the Claude 3 tokenizer hasn't been released yet.
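
A minimal sketch of the approximation (illustrative, not the exact adapter code): assuming every token covers at least one byte of text, the UTF-8 byte length of a string is an upper bound on its token count, which is what makes the estimate conservative.

def approximate_token_count(text: str) -> int:
    # Each token covers at least one byte, so the UTF-8 byte length
    # can never undercount the true number of tokens. Overestimating
    # keeps truncation conservative: the prompt never exceeds the limit.
    return len(text.encode("utf-8"))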

@adubovik adubovik self-assigned this Sep 6, 2024
@adubovik adubovik marked this pull request as ready for review September 10, 2024 10:48
@adubovik (Collaborator, Author) commented Sep 10, 2024:

/deploy-review

Environment URL: https://chat-ai-dial-adapter-bedrock-pr-146.nightly-test.deltixhub.io
E2E tests status: success


@adubovik force-pushed the feat/support-prompt-truncation-for-claude3 branch from 7b6944d to ca99288 on September 10, 2024 16:36
@adubovik (Collaborator, Author) commented:

See anthropics/anthropic-sdk-python#656 for a discussion of why Anthropic datatypes contain Iterable fields, which don't play well with Pydantic.
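
A toy illustration of the pitfall (hypothetical code, not from the SDK): a generator satisfies Iterable but can be consumed only once, so any layer that iterates the field more than once, as validation and serialization may, silently sees nothing the second time.

from typing import Iterable

def render(blocks: Iterable[str]) -> str:
    return " ".join(blocks)

blocks = (b for b in ["Hello", "world"])
print(render(blocks))  # "Hello world"
print(render(blocks))  # "" - the generator is already exhausted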

@adubovik (Collaborator, Author) commented:

See also anthropics/anthropic-sdk-python#375, which raises the issue of the missing tokenizer for Claude 3.

) -> List[int]:
    prompt = self.get_text_completion_prompt(params, messages)
    return prompt.discarded_messages or []
) -> List[int] | None:
@roman-romanov-o (Contributor) commented:

I guess that's the same discussion as we have here:

epam/ai-dial-adapter-openai#150 (comment)

My main argument for avoiding such -> List[int] | None in favour of a "squeezed" -> SqueezedResult is:

If you look at the result type of a function and see just primitive types, that gives you no information about what meaning those types hold.

For example, here I need to go and double-check that List[int] means we have some discarded-message indexes
(and this also assumes that I know we pass discarded messages as indexes of messages; if I didn't know that, this List[int] would make no sense to me before I actually go and read the code).

And None means that we discarded nothing.

Which is also kind of counterintuitive: why isn't it just an empty list then?


If this were some "squeezed" type, like

class PromptTruncationResult(BaseModel):
    """Indexes of discarded messages"""
    discarded_messages: List[int] | None = None

I would get the sense of this function immediately, just from the hints of my IDE.

@adubovik (Collaborator, Author) commented Sep 16, 2024:

And None means that we discarded nothing
Which is also kind of counterintuitive: why isn't it just an empty list then?

None and empty list have different meanings:

  • None - the algorithm wasn't applied at all,
  • Empty list - the algorithm was applied and the messages turned out to fit into the limit.

Here None encodes the optionality of applying the truncate-prompt algorithm in the first place.
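
A sketch of the three cases a caller has to distinguish (illustrative only; the call shape is assumed from the signatures above):

result: List[int] | None = await self.truncate_prompt(params, messages)

if result is None:
    ...  # truncation wasn't requested, so nothing was computed
elif not result:
    ...  # truncation ran and every message already fits the limit
else:
    ...  # indexes of the messages that must be discarded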

So, strictly speaking, PromptTruncationResult isn't a prompt-truncation result; it's an OptionalPromptTruncationResult, and PromptTruncationResult should be a wrapper for List[int], following the logic of your proposal.

I really don't like the proliferation of trivial types like this. It's Python, not Java, after all.

If you look at the result type of a function and see just primitive types, that gives you no information about what meaning those types hold.

Typically the name of the function provides a hint about the meaning of the result. Like:

def concat_lists(a: List[T], b: List[T]) -> List[T]:
    pass

It's clear what List[T] means. You don't have to wrap it into a ConcatenationResult[T] class.

truncate_prompt is literally part of the DIAL API; it's not some obscure method - it's implemented in 3 DIAL packages at this point.

@roman-romanov-o (Contributor) commented:

You provided a great example for discussion.

Because, for instance, if this function were something like this:

async def truncate_prompt(
    self, params: ModelParameters, messages: List[Message]
) -> List[Message]:
    ...

I would have no objections: here you can tell at first glance that truncation takes a list of input messages and returns a modified list of messages.

From the naming of the function itself I clearly get that truncate means shortening/shrinking something,
and the typing helps me with understanding.


Now let's look at what we have:

async def truncate_prompt(
    self, params: ModelParameters, messages: List[Message]
) -> List[int] | None:
    ...

Ok, so I'm truncating a prompt, that is, a list of messages, and I get... a list of integers?? What does this list of integers mean? What does None mean?

@roman-romanov-o (Contributor) commented:

I really don't like the proliferation of trivial types like this. It's Python, not Java, after all.

If this function were super easy to understand, then I would have no objections.

We're basically writing in a language that has

Explicit is better than implicit.
...
Readability counts.

in its motto.

@roman-romanov-o (Contributor) commented:

This function does not truncate anything.

It returns a list of indexes that we should remove from messages in order for the prompt to fit into the max_prompt_tokens limit.

Maybe we should rename this function then, to something like:

async def get_indexes_to_remove(
    self, params: ModelParameters, remove_from: List[Message]
) -> List[int] | None:
    ...

@adubovik (Collaborator, Author) commented:

  1. Renamed to compute_discarded_messages
  2. Introduced DiscardedMessages = List[int] to improve readability
  3. Added comments to explain what None means
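
A sketch of the resulting shape (the names follow the list above; the signature is assumed from the earlier snippets):

DiscardedMessages = List[int]

async def compute_discarded_messages(
    self, params: ModelParameters, messages: List[Message]
) -> DiscardedMessages | None:
    # None: the truncation algorithm wasn't applied at all.
    # Otherwise: indexes of the messages to discard (possibly empty).
    ...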

tokenize_messages: Callable[[Messages], int],
keep_message: Callable[[Messages, int], bool],
partition_messages: Callable[[Messages], List[int]],
async def truncate_prompt(
@roman-romanov-o (Contributor) commented:

This truncate_prompt and truncate_prompt_ setup looks twisted.

Why can't we just do these operations inside of chat_model and the adapter?

That's only 6 lines of code (3 of them are duplicated), but it's more explicit:

    @override
    async def truncate_and_linearize_messages(
        self, messages: List[BaseMessage], max_prompt_tokens: Optional[int]
    ) -> TextCompletionPrompt:
        truncate_result = await truncate_prompt(
            messages=messages,
            tokenize_messages=self.tokenize_messages,
            keep_message=keep_last_and_system_messages,
            partition_messages=self.partitioner,
            model_limit=None,
            user_limit=max_prompt_tokens,
        )
        if isinstance(truncate_result, TruncatePromptError):
            raise truncate_result.to_dial_exception()
        messages = omit_by_indices(messages, truncate_result)

@adubovik (Collaborator, Author) commented:

What is now duplicated will become triplicated and many-more-times-licated in the future, when we add new models with truncate-prompt support. And we will.

@roman-romanov-o (Contributor) commented Sep 18, 2024:

Ok, can we at least get rid of this truncate_prompt and truncate_prompt_?

Imagine that you're new to this repo, just scrolling the codebase, and you see two functions with basically the same naming and different output types.

You have two functions:

async def truncate_prompt(
...
) -> Tuple[List[int], List[_T]]:
    ...


async def truncate_prompt_(
...
) -> Set[int] | TruncatePromptError:
    ...

I can't imagine any situation where you wouldn't be confused by this without actually digging into each of these functions.

@adubovik (Collaborator, Author) commented:

Renamed truncate_prompt_ => compute_discarded_messages since it only returns a list of indices of discarded messages, whereas truncate_prompt does actually return a truncated list of messages.
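
A sketch of the resulting pair (signatures copied from the snippets above, with the rename applied; parameters elided as in the original snippets):

async def compute_discarded_messages(
...
) -> Set[int] | TruncatePromptError:
    # Only computes which message indexes should be discarded.
    ...

async def truncate_prompt(
...
) -> Tuple[List[int], List[_T]]:
    # Returns both the discarded indexes and the truncated messages.
    ...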

messages: List[_T],
tokenize_messages: Callable[[List[_T]], Awaitable[int]],
keep_message: Callable[[List[_T], int], bool],
partition_messages: Callable[[List[_T]], List[int]],
@roman-romanov-o (Contributor) commented:

It's partitioner, if we're consistent with the chain of calls.

@adubovik (Collaborator, Author) commented Sep 16, 2024:

I don't follow the logic. Why should it be partitioner?

Here I named all the callbacks as verbs, after the action they perform:

  • tokenize_messages - it tokenizes a list of messages,
  • keep_message - it marks a message as required (maybe I should rename it to is_message_required?),
  • partition_messages - it creates a partition of the messages.

@roman-romanov-o (Contributor) commented:

Because the chain of calls here is this:

class PseudoChatModel(...):
    partitioner: Callable[[List[BaseMessage]], List[int]]

    @override
    async def truncate_and_linearize_messages(
        self, messages: List[BaseMessage], max_prompt_tokens: Optional[int]
    ) -> TextCompletionPrompt:
        discarded_messages, messages = await truncate_prompt(
            ...
            partition_messages=self.partitioner,  # <-------
        )

The meaning of this function has not changed: it's a function that takes a list of elements and returns a list of partition sizes.

Why do we need to change the naming then?

Let's choose either partition_messages or partitioner, and use it consistently.
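
A hypothetical illustration of "a list of partition sizes" (the message names are made up): for a system message followed by two user/assistant turns, a turn-based partitioner would return the group sizes, not the groups themselves.

messages = [system_msg, user_1, assistant_1, user_2, assistant_2]
partitioner(messages)  # -> [1, 2, 2]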

@adubovik (Collaborator, Author) commented:

Fixed, renamed to partitioner and tokenizer.

messages: List[_T],
tokenize_messages: Callable[[List[_T]], Awaitable[int]],
keep_message: Callable[[List[_T], int], bool],
partition_messages: Callable[[List[_T]], List[int]],
@roman-romanov-o (Contributor) commented:

partitioner


@adubovik (Collaborator, Author) commented:

Fixed

Resolved review threads:

  • aidial_adapter_bedrock/llm/truncate_prompt.py (outdated)
  • aidial_adapter_bedrock/llm/chat_model.py
@adubovik force-pushed the feat/support-prompt-truncation-for-claude3 branch from fea64c5 to da2fcda on September 20, 2024 11:24
Development

Successfully merging this pull request may close these issues:

  • Support prompt truncation for Claude 3 models

2 participants