Regex Detector
The Regex Detector checks language model outputs against predefined regular expression patterns. Users can define both desirable ("good") and undesirable ("bad") patterns to control how model output is validated.
Usage
This detector operates using two primary lists of regular expressions: good_patterns and bad_patterns.
- Good Patterns: When the good_patterns list is provided, the model's output is considered valid if any of these patterns match the output. This is useful when you expect specific formats or keywords in the output.
- Bad Patterns: Conversely, if the bad_patterns list is provided, the model's output is deemed invalid if any of these patterns match the output. This is useful for filtering undesired phrases, words, or formats out of the model's responses.

The detector can function with either list on its own.
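The two rules above can be sketched with Python's standard re module. This is only an illustration of the described semantics, not the detector's actual implementation: the helper name is_valid is hypothetical, the use of re.search (a match anywhere in the output) is an assumption, and so is the choice to check bad patterns first when both lists are supplied.

```python
import re
from typing import Optional, Sequence


def is_valid(
    output: str,
    good_patterns: Optional[Sequence[str]] = None,
    bad_patterns: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical helper mirroring the good/bad pattern rules described above."""
    # Bad patterns: any single match makes the output invalid.
    if bad_patterns and any(re.search(p, output) for p in bad_patterns):
        return False
    # Good patterns: at least one must match for the output to be valid.
    if good_patterns:
        return any(re.search(p, output) for p in good_patterns)
    # No good patterns supplied and no bad pattern matched.
    return True


# A good pattern that expects an ID in the output.
print(is_valid("Order ID: 12345", good_patterns=[r"\bID: \d+\b"]))
# A bad pattern that rejects SQL keywords.
print(is_valid("select * from users", bad_patterns=[r"\b(select|from)\b"]))
```

The first call reports the output as valid because a good pattern matches; the second reports it as invalid because a bad pattern matches.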
Configuration
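from safeguards.shield import Shield  # assumption: Shield is exported here; adjust to your installation
from safeguards.shield.output_detectors import RegexOutput

safeguards = Shield()
# Raw string so that \b and \s reach the regex engine as written
output_detectors = [RegexOutput(bad_patterns=[r'\b(union(\s+all)?|select|insert|update|delete|from|where)\b'], redact=True)]
# sanitized_prompt and response_text are the prompt and the model response to check
sanitized_response, valid_results, risk_score = safeguards.scan_output(sanitized_prompt, response_text, output_detectors)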
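Patterns that rely on \b word boundaries, like the one in the configuration, are sensitive to how the Python string literal is written. The short check below uses only the standard re module and a simplified version of that pattern (an assumption made for brevity) to show the difference between a plain and a raw string literal.

```python
import re

text = "select name from users"

# Without the r prefix, "\b" is the backspace character (\x08), not a word boundary.
plain = re.compile('\b(select|from)\b')
# With the r prefix, the regex engine receives \b and treats it as a word boundary.
raw = re.compile(r'\b(select|from)\b')

plain_match = plain.search(text)
raw_match = raw.search(text)

print(plain_match)         # None: the text contains no backspace characters
print(raw_match.group(0))  # select
```

Writing patterns as raw strings (r'...') avoids this class of silent non-matches.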