Import
Constructor
Model requirements
Usage
Results
RealToxicityPromptsEvalResult
| Field | Type | Description |
|---|---|---|
| `overall_score` | float | Safety score: 1.0 - `expected_max_toxicity` |
| `expected_max_toxicity` | float | Expected maximum toxicity across samples |
| `expected_max_profanity` | float | Expected maximum profanity score |
| `expected_max_sexually_explicit` | float | Expected maximum sexually explicit score |
| `expected_max_insult` | float | Expected maximum insult score |
| `toxicity_probability` | float | % of prompts with toxicity ≥ threshold |
| `profanity_probability` | float | % of prompts with profanity ≥ threshold |
| `sexually_explicit_probability` | float | % of prompts with sexually explicit ≥ threshold |
| `insult_probability` | float | % of prompts with insult ≥ threshold |
Accessing detailed data
Challenging vs non-challenging prompts
The dataset distinguishes between challenging prompts (more likely to elicit toxic continuations) and non-challenging prompts. Results are broken down by group in `task_scores`, so you can see how your model performs on the harder cases.
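To illustrate what a per-group breakdown means, here is a minimal sketch that averages per-prompt maximum scores separately for each group. It does not use the library's API: the function `expected_max_by_group`, the `(challenging, scores)` record shape, and the group names are hypothetical, and the actual layout of `task_scores` may differ.

```python
from collections import defaultdict
from statistics import mean


def expected_max_by_group(records):
    """Average the per-prompt maximum score within each prompt group.

    records: iterable of (challenging, scores) pairs, where `challenging`
    is a bool flag and `scores` holds the judge scores for that prompt's
    continuations. This record shape is an assumption for illustration.
    """
    maxima = defaultdict(list)
    for challenging, scores in records:
        # Hypothetical group labels; the library may name them differently.
        group = "challenging" if challenging else "non_challenging"
        maxima[group].append(max(scores))
    return {group: mean(values) for group, values in maxima.items()}


# Made-up scores: two challenging prompts and one non-challenging prompt.
records = [
    (True, [0.2, 0.9]),
    (True, [0.4, 0.5]),
    (False, [0.1, 0.3]),
]
by_group = expected_max_by_group(records)
print(by_group)
```

A gap between the two group averages suggests the model degrades mainly on the harder prompts rather than uniformly.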
How it works
- Loads prompts from HuggingFace (`allenai/real-toxicity-prompts`).
- For each prompt, generates `k` continuations using `generate_samples()`.
- An LLM judge scores each continuation on toxicity, profanity, sexually explicit content, and insult (0.0–1.0).
- Computes expected maximum toxicity per prompt and probability of exceeding the threshold across all prompts.
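
The aggregation in the last step can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `summarize_scores` is a hypothetical helper, the default threshold of 0.5 is assumed, and the probability is returned as a fraction between 0 and 1 rather than a percentage.

```python
from statistics import mean


def summarize_scores(scores_per_prompt, threshold=0.5):
    """Aggregate judge scores into expected-maximum and probability metrics.

    scores_per_prompt: one inner list per prompt, holding the toxicity
    scores (0.0 to 1.0) of that prompt's k continuations.
    threshold: assumed cutoff for counting a prompt as toxic.
    """
    # Worst-case (maximum) score among the k continuations of each prompt.
    max_per_prompt = [max(scores) for scores in scores_per_prompt]
    # Expected maximum: the mean of those per-prompt maxima.
    expected_max = mean(max_per_prompt)
    # Fraction of prompts whose worst continuation meets the threshold.
    probability = sum(m >= threshold for m in max_per_prompt) / len(max_per_prompt)
    return {
        "expected_max_toxicity": expected_max,
        "toxicity_probability": probability,
        "overall_score": 1.0 - expected_max,
    }


# Made-up scores: two prompts with k = 3 continuations each.
summary = summarize_scores([[0.10, 0.70, 0.20], [0.05, 0.10, 0.30]])
print(summary)
```

The same aggregation would apply to the profanity, sexually explicit, and insult scores, each with its own list of per-continuation values.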