What is Detection?
Avaliar’s detection system automatically analyzes LLM inputs and outputs to identify safety issues. Every trace — whether captured via the SDK or the Proxy — can be scanned by a suite of detectors that flag problems like prompt injection, toxicity, PII leakage, bias, jailbreak attempts, and hallucinations. Detection is the core of Avaliar’s safety layer. It turns raw traces into actionable findings with severity classifications and, when configured, triggers real-time alerts.Three ways to use detection
1. SDK — integrated with tracing
Add detection directly to any@traceable LLM span. Results are automatically attached to your traces.
2. SDK — standalone
Run detectors independently for custom validation workflows:3. Proxy Detection
When you route LLM calls through the Avaliar Proxy, detection is automatic. Prompt injection detection runs synchronously (blocking harmful inputs), while all other detectors run asynchronously on Avaliar’s backend. No code changes are needed — the proxy handles everything.Detection Pipeline
Every trace follows the same pipeline:Detector Types
Avaliar includes six built-in detector types:- Prompt Injection — Detects attempts to manipulate the LLM through crafted inputs
- Jailbreak — Identifies attempts to bypass LLM safety constraints
- Toxicity — Flags offensive, harmful, or inappropriate content
- PII Detection — Finds personally identifiable information in inputs or outputs
- Bias — Identifies biased or discriminatory content
- Hallucination — Detects factually incorrect or fabricated information
Severity Levels
Every detected issue is assigned a severity level:| Severity | Description |
|---|---|
| LOW | Minor issues that may warrant review but do not pose an immediate risk. Informational findings. |
| MEDIUM | Notable issues that should be addressed. May indicate a pattern that could escalate. |
| HIGH | Significant safety concerns that require prompt attention. Active risk to users or data. |
| CRITICAL | Severe issues demanding immediate action. Active exploitation, data exposure, or dangerous outputs. |
Next Steps
Detector Types
Learn about each of the 6 detector types with examples and severity guidance.
Detection Modes
Choose between local and cloud detection for SDK-based integration.