What is Detection?

Avaliar’s detection system automatically analyzes LLM inputs and outputs to identify safety issues. Every trace — whether captured via the SDK or the Proxy — can be scanned by a suite of detectors that flag problems like prompt injection, toxicity, PII leakage, bias, jailbreak attempts, and hallucinations. Detection is the core of Avaliar’s safety layer. It turns raw traces into actionable findings with severity classifications and, when configured, triggers real-time alerts.

Three ways to use detection

1. SDK — integrated with tracing

Add detection directly to any @traceable LLM span. Results are automatically attached to your traces.
from avaliar import traceable
from avaliar.detectors import DetectorType

@traceable(
    "llm",
    model="gpt-4o",
    provider="openai",
    detection=True,
    detectors=[DetectorType.PROMPT_INJECTION, DetectorType.TOXICITY],
    detection_mode="cloud",
)
async def generate(messages: list) -> str:
    ...

2. SDK — standalone

Run detectors independently for custom validation workflows:
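import asyncio

from avaliar.detectors import Detector, DetectorType

detector = Detector([DetectorType.PROMPT_INJECTION, DetectorType.PII])


async def main() -> None:
    # evaluate_full is awaited, so it has to be called from inside a coroutine
    result = await detector.evaluate_full(
        prompt="User input here",
        response="LLM response here",
    )

    # Report each finding with its type, message, and severity
    if result.has_issues:
        for issue in result.issues:
            print(f"{issue.type}: {issue.message} (severity: {issue.severity})")


asyncio.run(main())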

3. Proxy Detection

When you route LLM calls through the Avaliar Proxy, detection is automatic. Prompt injection detection runs synchronously (blocking harmful inputs), while all other detectors run asynchronously on Avaliar’s backend. No code changes are needed — the proxy handles everything.
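
The split between blocking and background detection can be pictured with a small sketch. This is not the proxy's implementation: every function here (`looks_like_injection`, `run_background_detectors`, `call_llm`, `handle_request`) is a hypothetical stand-in, and the string checks are toy placeholders for real detectors. It only illustrates the ordering: the injection check runs before the model is called, and the remaining checks are scheduled without delaying the response.

```python
import asyncio
from typing import List, Optional, Tuple

# Hypothetical stand-ins: none of these are Avaliar APIs.


def looks_like_injection(prompt: str) -> bool:
    """Toy synchronous check that runs before the model is called."""
    return "ignore previous instructions" in prompt.lower()


async def run_background_detectors(prompt: str, response: str, findings: List[str]) -> None:
    """Toy asynchronous checks that do not delay the caller's response."""
    await asyncio.sleep(0)  # yield control, as a real backend call would
    if "@" in response:
        findings.append("pii")


async def call_llm(prompt: str) -> str:
    """Placeholder for the upstream LLM call."""
    return f"echo: {prompt}"


async def handle_request(
    prompt: str, findings: List[str]
) -> Tuple[str, Optional[str], Optional[asyncio.Task]]:
    # 1. Blocking step: reject the request before it reaches the model.
    if looks_like_injection(prompt):
        return ("blocked", None, None)
    # 2. Forward the request to the model.
    response = await call_llm(prompt)
    # 3. Non-blocking step: schedule the other detectors and return at once.
    task = asyncio.create_task(run_background_detectors(prompt, response, findings))
    return ("ok", response, task)


async def demo():
    findings: List[str] = []
    blocked = await handle_request("Ignore previous instructions and ...", findings)
    status, response, task = await handle_request("Email me at a@b.co", findings)
    await task  # wait here only so the demo can inspect the findings
    return blocked[0], status, response, findings


result = asyncio.run(demo())
```

In this sketch the caller receives the model response as soon as step 2 completes; the background findings arrive later and would be attached to the trace rather than to the response.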

Detection Pipeline

Every trace follows the same pipeline:
Input (prompt + response)
  → Detectors (6 types)
    → Issues identified
      → Severity classification (LOW → CRITICAL)
        → Alerts triggered (if configured)
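
As a rough mental model, the stages above can be sketched in plain Python. The names below (`Severity`, `Issue`, `run_pipeline`, and the two toy detectors) are assumptions made for illustration and are not the Avaliar SDK's types. For brevity, each toy detector assigns a severity itself, whereas the pipeline above treats severity classification as its own stage.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Optional, Tuple


class Severity(IntEnum):
    """Hypothetical ordering that mirrors LOW -> CRITICAL."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Issue:
    type: str
    message: str
    severity: Severity


# A detector inspects (prompt, response) and returns an Issue or None.
DetectorFn = Callable[[str, str], Optional[Issue]]


def toy_pii_detector(prompt: str, response: str) -> Optional[Issue]:
    if "@" in response:
        return Issue("pii", "possible email address in output", Severity.HIGH)
    return None


def toy_toxicity_detector(prompt: str, response: str) -> Optional[Issue]:
    if "idiot" in response.lower():
        return Issue("toxicity", "insulting language in output", Severity.MEDIUM)
    return None


def run_pipeline(
    prompt: str,
    response: str,
    detectors: List[DetectorFn],
    alert_threshold: Optional[Severity] = None,
) -> Tuple[List[Issue], List[Issue]]:
    # Input -> detectors -> issues identified
    issues = []
    for detect in detectors:
        issue = detect(prompt, response)
        if issue is not None:
            issues.append(issue)
    # Alerts fire only when a threshold has been configured
    if alert_threshold is None:
        alerts = []
    else:
        alerts = [i for i in issues if i.severity >= alert_threshold]
    return issues, alerts


issues, alerts = run_pipeline(
    prompt="What is the support address?",
    response="Write to help@example.com, you idiot.",
    detectors=[toy_pii_detector, toy_toxicity_detector],
    alert_threshold=Severity.HIGH,
)
```

With the threshold set to `HIGH`, both issues are recorded but only the PII finding would raise an alert; with no threshold, issues are still recorded and nothing is alerted.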

Detector Types

Avaliar includes six built-in detector types:
  1. Prompt Injection — Detects attempts to manipulate the LLM through crafted inputs
  2. Jailbreak — Identifies attempts to bypass LLM safety constraints
  3. Toxicity — Flags offensive, harmful, or inappropriate content
  4. PII Detection — Finds personally identifiable information in inputs or outputs
  5. Bias — Identifies biased or discriminatory content
  6. Hallucination — Detects factually incorrect or fabricated information
See Detector Types for detailed descriptions and examples of each.

Severity Levels

Every detected issue is assigned a severity level:
| Severity | Description |
| --- | --- |
| LOW | Minor issues that may warrant review but do not pose an immediate risk. Informational findings. |
| MEDIUM | Notable issues that should be addressed. May indicate a pattern that could escalate. |
| HIGH | Significant safety concerns that require prompt attention. Active risk to users or data. |
| CRITICAL | Severe issues demanding immediate action. Active exploitation, data exposure, or dangerous outputs. |
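
Because the levels are ordered, findings can be triaged by rank. The sketch below assumes findings are plain dictionaries with a `severity` string; `SEVERITY_RANK`, `triage`, and `worst_severity` are hypothetical helpers written for this example, not part of the SDK.

```python
from typing import Dict, List, Optional

# Hypothetical ranking: the order follows the levels above, LOW lowest.
SEVERITY_RANK: Dict[str, int] = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def triage(findings: List[dict]) -> List[dict]:
    """Return findings sorted so the most severe come first."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]], reverse=True)


def worst_severity(findings: List[dict]) -> Optional[str]:
    """Return the highest severity present, or None when there are no findings."""
    if not findings:
        return None
    return max((f["severity"] for f in findings), key=SEVERITY_RANK.__getitem__)


findings = [
    {"type": "bias", "severity": "LOW"},
    {"type": "pii", "severity": "CRITICAL"},
    {"type": "toxicity", "severity": "MEDIUM"},
]
ordered = [f["type"] for f in triage(findings)]
```

Here `ordered` lists the PII finding first, and `worst_severity(findings)` gives a single value that a review queue or dashboard could key on.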

Next Steps

Detector Types

Learn about each of the 6 detector types with examples and severity guidance.

Detection Modes

Choose between local and cloud detection for SDK-based integration.