Add PromptGuard to safety_utils #608

Open · wants to merge 2 commits into main

Conversation

tryrobbo (Contributor)

PromptGuard has been introduced as a system-level safety tool for checking LLM prompts for malicious text. This PR adds the guard to the safety_utils module and adjusts the recipes to use the new check.

  • Implementation details
    • PromptGuard has a limit of 512 tokens. Naive sentence splitting is performed on inputs to try to ensure that this limit is not breached; a warning is printed if a sentence is longer than 512 tokens. (A sketch of the check is given after this list.)
    • Only the jailbreak score of PromptGuard is acted on. The other scores are intended for agentic implementations, none of which currently use the safety_utils module.
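
For reference, a minimal sketch of the check described above, pieced together from the snippets quoted in the review below; the helper name check_text is hypothetical, while self.tokenizer, self.get_scores and the 0.5 threshold follow those snippets and may differ from the final PR code.

def check_text(self, text_for_check):
    sentences = text_for_check.split(".")  # naive sentence splitting
    running_scores = {'jailbreak': 0, 'indirect_injection': 0}
    for sentence in sentences:
        # Warn if a single sentence exceeds PromptGuard's 512-token limit.
        if len(self.tokenizer(sentence)["input_ids"]) > 512:
            print("Warning: sentence longer than 512 tokens; PromptGuard input will be truncated.")
        scores = self.get_scores(sentence)
        running_scores['jailbreak'] = max(running_scores['jailbreak'], scores['jailbreak'])
        running_scores['indirect_injection'] = max(running_scores['indirect_injection'], scores['indirect_injection'])
    # Only the jailbreak score is acted on; indirect_injection is kept for
    # future agentic use cases.
    is_safe = running_scores['jailbreak'] < 0.5
    return "PromptGuard", is_safe, running_scores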

General simple test

echo "hello" | python3 inference.py "meta-llama/Meta-Llama-3.1-8B-Instruct" --quantization '8bit' --use_fast_kernels --enable-promptguard-safety True

To test the text splitter feature

python3 inference.py "meta-llama/Meta-Llama-3.1-8B-Instruct" --quantization '8bit' --use_fast_kernels --enable-promptguard-safety True < ~/longprompt
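
If you don't have a suitable long input to hand, something like the following (a hypothetical helper, not part of the PR) generates one:

import os

# Write a single very long sentence (no full stops until the end) so that the
# sentence splitter cannot break it below the 512-token limit and the warning
# path is exercised.
with open(os.path.expanduser("~/longprompt"), "w") as f:
    f.write("Tell me a very detailed story about the history of computing " * 200 + ".")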

@albertodepaola albertodepaola left a comment

Overall looks good! I added just a couple of simple comments.

@@ -34,6 +34,7 @@ def main(
enable_sensitive_topics: bool=False, # Enable check for sensitive topics using AuditNLG APIs
enable_salesforce_content_safety: bool=True, # Enable safety check with Salesforce safety flan t5
enable_llamaguard_content_safety: bool=False, # Enable safety check with Llama-Guard
enable_promptguard_safety: bool = False,

Given the small size of the model, can we leave it as default True?

@@ -37,6 +37,7 @@ def main(
enable_saleforce_content_safety: bool=True, # Enable safety check woth Saleforce safety flan t5
use_fast_kernels: bool = False, # Enable using SDPA from PyTorch Accelerated Transformers, make use Flash Attention and Xformer memory-efficient kernels
enable_llamaguard_content_safety: bool = False,
enable_promptguard_safety: bool = False,

Especially for this call, can we set it to True?

scores = self.get_scores(sentence)
running_scores['jailbreak'] = max([running_scores['jailbreak'],scores['jailbreak']])
running_scores['indirect_injection'] = max([running_scores['indirect_injection'],scores['indirect_injection']])
is_safe = True if running_scores['jailbreak'] < 0.5 else False

Should we set the bar at 0.5? I think 0.8 or 0.9 would be better based on talks with the team.
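
If the bar does move, it may be worth exposing it rather than hard-coding it; a rough illustration (the name JAILBREAK_THRESHOLD and the 0.8 value are suggestions, not agreed API):

# Illustrative only: make the threshold configurable instead of hard-coding 0.5.
JAILBREAK_THRESHOLD = 0.8  # value floated in review, not yet settled
is_safe = running_scores['jailbreak'] < JAILBREAK_THRESHOLD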

for sentence in sentences:
scores = self.get_scores(sentence)
running_scores['jailbreak'] = max([running_scores['jailbreak'],scores['jailbreak']])
running_scores['indirect_injection'] = max([running_scores['indirect_injection'],scores['indirect_injection']])

Should we comment that this is not being used for the user dialog?

@mreso mreso left a comment

Looks good overall, but let's address the way we feed the prompt to the model (see comments).

from torch.nn.functional import softmax
inputs = self.tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = inputs.to(device)
if len(inputs[0]) > 512:

As max_length is 512, this condition should never be true. Can we instead follow the PromptGuard recommendation and split the text into multiple segments that we score in parallel (batched)? Especially because of the much bigger context length of Llama 3.1.

return "PromptGuard", True, "PromptGuard is not used for model output so checking not carried out"
sentences = text_for_check.split(".")
running_scores = {'jailbreak':0, 'indirect_injection' :0}
for sentence in sentences:

It's probably more efficient to do this batched, as commented above. Let's split the prompt into blocks of 512 tokens (with some overlap) and feed them into the model as a single batch, which will be far more efficient than scoring the sentences one by one.
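
Something along these lines, as a rough sketch of the batched approach (the helper name, chunk/overlap sizes, and label indices are assumptions, not the PR's code; it assumes a sequence-classification PromptGuard model behind self.model and self.tokenizer):

import torch
from torch.nn.functional import softmax

def get_scores_batched(self, text, chunk_size=512, overlap=64):
    # Tokenize once without truncation, then slice into overlapping chunks.
    token_ids = self.tokenizer(text, add_special_tokens=False)["input_ids"]
    step = chunk_size - overlap
    chunks = [token_ids[i:i + chunk_size] for i in range(0, max(len(token_ids), 1), step)]
    chunk_texts = [self.tokenizer.decode(c) for c in chunks]

    # Re-tokenize the chunks as one padded batch and run a single forward pass.
    inputs = self.tokenizer(chunk_texts, return_tensors="pt", padding=True,
                            truncation=True, max_length=chunk_size).to(self.model.device)
    with torch.no_grad():
        probs = softmax(self.model(**inputs).logits, dim=-1)

    # Keep the worst (maximum) score across chunks for each category; the
    # label-to-index mapping below is an assumption and must match the
    # model's config (model.config.id2label).
    jailbreak_idx, indirect_injection_idx = 2, 1
    return {
        'jailbreak': probs[:, jailbreak_idx].max().item(),
        'indirect_injection': probs[:, indirect_injection_idx].max().item(),
    }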
