Overview
This page covers tracing patterns beyond the basics — parallel execution, streaming LLM responses, multi-provider setups, token tracking with different provider SDKs, and sync code.Concurrent Spans
When your agent runs multiple operations in parallel, useasyncio.gather to run them concurrently. The SDK correctly links each parallel call as a child of the current parent span.
Multi-Provider Tracing
You can trace calls to different LLM providers in the same application. Setprovider accurately so the Avaliar dashboard can break down cost and latency by provider.
- OpenAI
- Anthropic
- Google Gemini
Streaming LLM Responses
For streaming responses, decorate the generator function. The SDK captures the full concatenated response after the generator is exhausted and submits it as a single trace.Async streaming
Sync streaming
Token counts are not available from the streaming API mid-stream. Use
update_current_llm_run with the usage object from the final chunk if you need them:Synchronous Code
@traceable works with regular (non-async) functions. Use this when integrating with synchronous code or frameworks that don’t support asyncio.
Token Tracking
Always callupdate_current_llm_run from inside a span_type="llm" function to attach token counts. This data drives cost calculations in the Avaliar dashboard.
| Field | Source |
|---|---|
input_tokens | response.usage.prompt_tokens (OpenAI) / response.usage.input_tokens (Anthropic) |
output_tokens | response.usage.completion_tokens (OpenAI) / response.usage.output_tokens (Anthropic) |
Deep Agent Hierarchies
Traces can be arbitrarily deep. This example traces a three-level agent: coordinator → researcher → LLM.Custom Metadata
Pass extra fields to the provider API alongside your messages and they’ll be recorded in the trace:temperature and top_p decorator parameters are stored in the trace’s generation_info and shown in the Trace Explorer alongside the prompt and response.
Detection on Specific Spans Only
You don’t need to enable detection on every span — add it only where the safety risk is highest. For example, run detection on the final LLM response but not on intermediate summarization calls:Choosing Between Local and Cloud Detection
| Local | Cloud | |
|---|---|---|
| Where it runs | Your infrastructure | Avaliar’s servers |
| Additional dependencies | avaliar_eval, OPENAI_API_KEY | None |
| Latency | Depends on your hardware | Low, managed |
| Data leaves your environment | No | Yes |
| Plan requirement | Free | Pro |
| Best for | Development, air-gapped systems | Production |