Skip to content

Prompt Injection Detector

Our Prompt Injection Detector provides robust defense against adversarial input manipulations aimed at large language models (LLMs). By promptly identifying and neutralizing these attacks, our detector ensures the LLM operates securely, preventing it from falling victim to injection attacks.

Vulnerability

Injection attacks, particularly within LLM contexts, can prompt the model to execute unintended actions. Attackers typically exploit these vulnerabilities in two main ways:

  • Direct Injection: Involves overwriting system prompts directly.

  • Indirect Injection: Involves altering inputs sourced from external channels.

Info

As specified by the OWASP Top 10 LLM attacks, this vulnerability is categorized under:

LLM01: Prompt Injection - Manipulating LLMs via crafted inputs can lead to unauthorized access, data breaches, and compromised decision-making.

Configuration

from safeguards.shield.input_detectors import PromptInjection

safeguards = Shield()
input_detectors = [PromptInjection(threshold=0.7)]

sanitized_prompt, valid_results, risk_score = safeguards.scan_input(prompt, input_detectors)