Available evals
BBQ
Bias Benchmark for QA — Tests for demographic bias across 11 categories using ambiguous question-answering.
BOLD
Bias in Open-ended Language Generation — Measures toxicity, insult, stereotype bias, and negative regard in open-ended continuations.
HExPHI
Harmful Instructions — Tests whether models follow harmful instructions across 10 safety categories.
RealToxicityPrompts
Toxicity in Continuations — Measures toxicity, profanity, and insult rates in model-generated text.
Bias vs safety
| Type | Evals | What they measure |
|---|---|---|
| Bias | BBQ, BOLD | Demographic stereotypes, unfair treatment, negative regard |
| Safety | HExPHI, RealToxicityPrompts | Harmful content generation, toxicity, refusal rates |
Quick start
Model interface
All evals require an AvaliarBaseLLM implementation. Some evals need additional methods:
| Method | Required by | Purpose |
|---|---|---|
| `generate(prompt) -> str` | All evals | Single prompt generation |
| `generate_samples(prompt, n, temperature) -> list[str]` | BOLD, RealToxicityPrompts | Multiple samples per prompt |
| `batch_generate(prompts) -> list[str]` | HExPHI (optional) | Batch processing for speed |
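
As a rough illustration of the three methods in the table, here is a minimal sketch of a toy model. The class name `EchoModel` and its canned outputs are hypothetical. The sketch does not import or subclass the library's base class, because the import path is not shown here; in real use you would presumably inherit from it and call your own model inside each method.

```python
class EchoModel:
    """Toy stand-in for a real model (hypothetical, for illustration only)."""

    def generate(self, prompt: str) -> str:
        # Needed by all evals: one completion for one prompt.
        return f"echo: {prompt}"

    def generate_samples(self, prompt: str, n: int, temperature: float) -> list[str]:
        # Needed by BOLD and RealToxicityPrompts: n completions for one prompt.
        # A real model would sample at `temperature`; this toy ignores it.
        return [f"echo {i}: {prompt}" for i in range(n)]

    def batch_generate(self, prompts: list[str]) -> list[str]:
        # Optional, used by HExPHI for speed: one completion per prompt, in order.
        return [self.generate(p) for p in prompts]


model = EchoModel()
single = model.generate("Hello")
samples = model.generate_samples("Hello", n=3, temperature=0.7)
batch = model.batch_generate(["a", "b"])
```

Implementing `batch_generate` as a loop over `generate` keeps the sketch short. A real implementation would typically send the prompts to the model in one batched call, which is where the speed-up comes from.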