Overview
When an LLM decides to call a tool, Avaliar can trace that tool call as a child span of whichever span is active when the tool runs, typically the agent or LLM span that invoked it. You get a complete picture of what the agent decided to do, what parameters it passed, what the tool returned, and how long each step took.
Tool call tracing is built into the SDK, so no special integration is required: add @traceable("tool") to any function and every call to it is recorded as a traced span.
Basic Usage
Add @traceable("tool") to any function your agent calls:
from avaliar import traceable
from openai import AsyncOpenAI

client = AsyncOpenAI()

@traceable("tool")
async def search_knowledge_base(query: str) -> str:
    """Query internal documentation."""
    # `db` stands in for your own search backend
    results = db.search(query)
    return "\n".join(results)

@traceable("tool")
async def send_email(to: str, subject: str, body: str) -> dict:
    """Send an email via the internal mail service."""
    # `mail_client` stands in for your own mail service client
    response = mail_client.send(to=to, subject=subject, body=body)
    return {"status": "sent", "message_id": response.id}

@traceable(
    "llm",
    model="gpt-4o",
    provider="openai",
)
async def agent_step(messages: list) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

@traceable("agent")
async def run_agent(user_query: str) -> str:
    # Both calls below (the tool and the LLM step) appear as children of this agent span
    context = await search_knowledge_base(user_query)
    result = await agent_step([
        {"role": "system", "content": f"Context: {context}"},
        {"role": "user", "content": user_query},
    ])
    return result
The trace tree looks like:
run_agent (agent)
├── search_knowledge_base (tool) ← query in, docs out
└── agent_step (llm) ← messages in, response out
Each tool span records the exact function arguments as inputs and the return value as output. No extra configuration is needed: the SDK picks up the parent span from the current context automatically.
What Gets Traced
For every @traceable("tool") span, Avaliar records:
| Field | Content |
|---|---|
| Name | The function name |
| Inputs | All positional and keyword arguments passed to the function |
| Output | The return value |
| Start time | When the function was called |
| End time | When it returned |
| Duration | End time minus start time |
| Error | The exception message, if the function raised |
| Parent span | The span that was active when the function was called (typically the agent or LLM span) |
This makes it possible to answer questions like “What did the knowledge base return for that query?” or “Why did the booking tool fail?” directly from the Trace Explorer.
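The fields in the table map naturally onto a small record type. The sketch below uses a hypothetical synchronous `record_tool` decorator (not Avaliar's code) to show one way to capture them: `inspect.signature(...).bind` maps positional arguments onto parameter names, so all positional and keyword arguments end up in a single dict. Parent tracking is left out to keep it short, and `search_flights` is a stub.

```python
import functools
import inspect
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SpanRecord:
    name: str
    inputs: dict
    start_time: float
    end_time: float = 0.0
    output: Any = None
    error: Optional[str] = None

    @property
    def duration(self) -> float:
        # Duration is derived: end minus start.
        return self.end_time - self.start_time


RECORDS: list = []


def record_tool(fn):
    """Hypothetical decorator that records one SpanRecord per call."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names.
        inputs = dict(sig.bind(*args, **kwargs).arguments)
        rec = SpanRecord(name=fn.__name__, inputs=inputs, start_time=time.time())
        RECORDS.append(rec)
        try:
            rec.output = fn(*args, **kwargs)
            return rec.output
        except Exception as exc:
            rec.error = str(exc)  # keep the message, then re-raise
            raise
        finally:
            rec.end_time = time.time()

    return wrapper


@record_tool
def search_flights(departure_airport: str, arrival_airport: str, date: str) -> list:
    # Stub: reject very old dates so the error path can be shown.
    if date < "2000-01-01":
        raise ValueError("date is too far in the past")
    return [{"flight": "XY123"}]


search_flights("LIS", "OPO", date="2025-06-01")
ok = RECORDS[-1]
print(ok.inputs, ok.output, ok.error)

try:
    search_flights("LIS", "OPO", date="1999-12-31")
except ValueError:
    pass
failed = RECORDS[-1]
print(failed.error)
```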
Sync and Async
@traceable("tool") works with both sync and async functions:
import httpx
from avaliar import traceable

# Async tool
@traceable("tool")
async def fetch_from_api(endpoint: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get(endpoint)
        return response.json()

# Sync tool
@traceable("tool")
def query_database(sql: str) -> list[dict]:
    # `db` stands in for your own database connection
    cursor = db.execute(sql)
    return cursor.fetchall()
Both can be mixed freely in the same trace tree.
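Supporting both kinds of function usually means choosing the wrapper at decoration time. The hypothetical `trace_any` decorator below (a sketch, not Avaliar's code) uses `inspect.iscoroutinefunction` to pick an async wrapper that awaits the call, so the span covers the real work rather than just the creation of the coroutine object. `fetch` and `query` are stubs.

```python
import asyncio
import functools
import inspect

EVENTS: list = []


def trace_any(fn):
    """Hypothetical decorator that works on both sync and async functions."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            EVENTS.append(("start", fn.__name__))
            try:
                # Awaiting here keeps the span open until the coroutine finishes.
                return await fn(*args, **kwargs)
            finally:
                EVENTS.append(("end", fn.__name__))
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        EVENTS.append(("start", fn.__name__))
        try:
            return fn(*args, **kwargs)
        finally:
            EVENTS.append(("end", fn.__name__))
    return sync_wrapper


@trace_any
async def fetch(x: int) -> int:
    await asyncio.sleep(0)  # stand-in for network I/O
    return x * 2


@trace_any
def query(x: int) -> int:
    return x + 1  # stand-in for a blocking database call


async def main() -> tuple:
    doubled = await fetch(2)
    return doubled, query(doubled)


result = asyncio.run(main())
print(result)
print(EVENTS)
```

A single sync wrapper would not be enough for `fetch`: calling an async function only creates a coroutine, so the span would close before any of the actual work ran.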
LangChain @tool Decorator
If you’re using LangChain’s @tool decorator, place @tool on the outside and @traceable("tool") directly above the function. This order preserves LangChain’s schema generation (which reads the function’s docstring and type hints) while still giving Avaliar the original function to wrap.
from langchain_core.tools import tool
from avaliar import traceable

@tool
@traceable("tool")
def lookup_policy(query: str) -> str:
    """Search company policy documentation for the given query."""
    # `retriever` stands in for your own LangChain retriever
    docs = retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in docs)

@tool
@traceable("tool")
def search_flights(
    departure_airport: str,
    arrival_airport: str,
    date: str,
) -> list[dict]:
    """Search for available flights between two airports on a given date."""
    # `flight_db` stands in for your own flight data source
    return flight_db.search(departure_airport, arrival_airport, date)
The @tool decorator wraps the already-traced function, so LangChain calls the traced wrapper and the span is created correctly.
If you reverse the order (@traceable outer, @tool inner), @tool runs first and turns your function into a LangChain tool object, and @traceable then wraps that object rather than your function. The result is generally no longer a LangChain tool, so LangChain may not be able to read its schema or describe it to the LLM.
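Both points come down to how Python applies stacked decorators: bottom-up, so @tool over @traceable means `tool(traceable(fn))`. The sketch below checks this with two hypothetical stand-ins: `traced`, a wrapper that preserves metadata with `functools.wraps` (an assumption about how a tracing decorator behaves), and `describe_tool`, which reads the name, docstring, and parameters the way a schema generator might (it is not LangChain's code). Because `functools.wraps` copies `__name__` and `__doc__` and sets `__wrapped__`, which `inspect.signature` follows, the outer decorator still sees the original metadata.

```python
import functools
import inspect

APPLIED: list = []


def traced(fn):
    """Stand-in tracing decorator that preserves the wrapped function's metadata."""
    APPLIED.append("traced")

    @functools.wraps(fn)  # copies __name__/__doc__ and sets __wrapped__
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


def describe_tool(fn):
    """Stand-in schema reader: records what a tool decorator might inspect."""
    APPLIED.append("describe_tool")
    sig = inspect.signature(fn)  # follows __wrapped__ back to the original
    fn.schema = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "params": {name: p.annotation.__name__ for name, p in sig.parameters.items()},
    }
    return fn


@describe_tool  # outer: applied second, sees the traced wrapper
@traced         # inner: applied first, wraps the original function
def lookup_policy(query: str) -> str:
    """Search company policy documentation for the given query."""
    return f"policy text for {query}"  # stub


print(APPLIED)
print(lookup_policy.schema)
```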
LangGraph: Restoring Context Across Threads
LangGraph sometimes runs tool nodes in a new thread or as part of a separate execution context. When this happens, the tool function may start without a parent span because Python’s contextvars are not inherited across thread boundaries.
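The thread-boundary behaviour can be checked with the standard library alone. In this sketch `current_parent` is a hypothetical stand-in for the SDK's current-span variable: on a standard CPython build, a plain `threading.Thread` and a `ThreadPoolExecutor` worker both see the default value, while running the call through `contextvars.copy_context().run` carries the caller's value into the worker. This only demonstrates the Python behaviour; it is not the Avaliar-specific workaround.

```python
import contextvars
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the SDK's "current span" variable.
current_parent = contextvars.ContextVar("current_parent", default=None)


def read_parent():
    return current_parent.get()


current_parent.set("agent-span")
seen = {"main": read_parent()}


def record_in_thread():
    # A new thread starts with an empty context on standard CPython builds.
    seen["thread"] = read_parent()


t = threading.Thread(target=record_in_thread)
t.start()
t.join()

with ThreadPoolExecutor(max_workers=1) as pool:
    # Submitting directly: the worker thread has its own context.
    seen["pool"] = pool.submit(read_parent).result()
    # Running inside a copy of the caller's context carries the value over.
    seen["pool_copied"] = pool.submit(contextvars.copy_context().run, read_parent).result()

print(seen)
```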
If you see tool spans appearing as disconnected root spans instead of children of the agent span, use this pattern:
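from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool
from avaliar import traceable
from avaliar.run_tree import _CURRENT_RUN_TREE

# Parent run tree captured by the root span. A module-level variable assumes
# one turn is processed at a time; concurrent turns would overwrite it.
_CURRENT_TRACE_PARENT = None

@traceable("agent")
async def process_turn(user_input: str) -> str:
    global _CURRENT_TRACE_PARENT
    # Save the agent span so tools running in new threads can find it
    _CURRENT_TRACE_PARENT = _CURRENT_RUN_TREE.get()
    # ... run your LangGraph graph ...

@traceable("tool")
def _fetch(user_id: str) -> list[dict]:
    # `db` stands in for your own database client
    return db.query("SELECT * FROM users WHERE id = ?", user_id)

# In any tool that LangGraph might run in a new thread. The outer function is
# not decorated with @traceable: its span would be created before the parent
# is restored, so it would still show up as a disconnected root span.
@tool
def fetch_user_data(config: RunnableConfig) -> list[dict]:
    """Fetch data for the current user."""
    token = None
    # Restore context if this thread doesn't have it
    if _CURRENT_RUN_TREE.get() is None and _CURRENT_TRACE_PARENT is not None:
        token = _CURRENT_RUN_TREE.set(_CURRENT_TRACE_PARENT)
    try:
        # The _fetch span now attaches to the restored parent
        return _fetch(config["configurable"]["user_id"])
    finally:
        # Undo the restore (assumes _CURRENT_RUN_TREE is a contextvars.ContextVar)
        # so a reused worker thread doesn't keep a stale parent
        if token is not None:
            _CURRENT_RUN_TREE.reset(token)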
This workaround is only needed when your orchestration framework (like LangGraph) dispatches tools to new threads. Plain asyncio.gather does not have this problem because async tasks inherit their parent’s contextvars context.
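Task-level inheritance can be seen in a few lines of standard-library code. In this sketch `current_parent` is a hypothetical stand-in for the SDK's current-span variable. `asyncio.gather` wraps each coroutine in a task, and each task starts with a copy of the context that was active when it was created, so both stub tools see the value set by the agent.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the SDK's "current span" variable.
current_parent = contextvars.ContextVar("current_parent", default=None)


async def stub_tool(name: str) -> str:
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"{name}: parent={current_parent.get()}"


async def agent() -> list:
    current_parent.set("agent-span")
    # Each coroutine becomes a task that copies the current context.
    return await asyncio.gather(stub_tool("search"), stub_tool("inventory"))


results = asyncio.run(agent())
print(results)
```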
Full Example: Multi-Tool Agent
This example shows a realistic agent that calls multiple tools in sequence and traces the full pipeline:
import asyncio
from avaliar import traceable
from avaliar.trace import update_current_llm_run
from openai import AsyncOpenAI

client = AsyncOpenAI()

# `catalog`, `inventory`, and `orders` stand in for your own services.

@traceable("tool")
async def search_products(query: str, max_results: int = 5) -> list[dict]:
    """Search the product catalog."""
    return catalog.search(query, limit=max_results)

@traceable("tool")
async def check_inventory(product_id: str) -> dict:
    """Check current stock levels for a product."""
    return inventory.get(product_id)

# Defined for completeness; shopping_agent below does not call it.
@traceable("tool")
async def create_order(product_id: str, quantity: int, customer_id: str) -> dict:
    """Place an order for a product."""
    return orders.create(product_id=product_id, qty=quantity, cid=customer_id)

@traceable(
    "llm",
    model="gpt-4o",
    provider="openai",
)
async def decide_and_act(messages: list, available_context: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        # System message first, followed by the conversation
        messages=[
            {"role": "system", "content": f"Context:\n{available_context}"}
        ] + messages,
    )
    # Attach token counts to the current LLM span
    update_current_llm_run(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
    )
    return response.choices[0].message.content

@traceable("agent")
async def shopping_agent(customer_id: str, request: str) -> str:
    # Step 1: find products matching the request
    products = await search_products(request)
    # Step 2: check inventory for the first result
    if products:
        stock = await check_inventory(products[0]["id"])
        context = f"Products: {products}\nStock: {stock}"
    else:
        context = "No products found."
    # Step 3: ask the LLM what to do
    messages = [{"role": "user", "content": request}]
    decision = await decide_and_act(messages, context)
    return decision

if __name__ == "__main__":
    # Hypothetical customer ID and request
    print(asyncio.run(shopping_agent("C-123", "wireless headphones")))
Trace tree for a single shopping_agent call:
shopping_agent (agent)
├── search_products (tool) ← query="wireless headphones", returns 5 products
├── check_inventory (tool) ← product_id="SKU-42", returns {in_stock: 3}
└── decide_and_act (llm) ← messages + context in, recommendation out
Error Handling
If a tool raises an exception, the span records the error, marks the span as failed, and the exception propagates to the caller as usual. As long as your code handles it, the rest of the trace continues unaffected:
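import httpx
from avaliar import traceable

# Placeholder base URL; relative paths like "/api/data" resolve against it
http_client = httpx.AsyncClient(base_url="https://api.example.com")

@traceable("tool")
async def risky_api_call(endpoint: str) -> dict:
    response = await http_client.get(endpoint)
    response.raise_for_status()  # May raise; recorded as the span's error
    return response.json()

@traceable("agent")
async def agent(query: str) -> str:
    try:
        data = await risky_api_call("/api/data")
    except Exception:
        # The tool span has already been recorded with the error;
        # handle the failure gracefully here
        data = {}
    # `process` stands in for your own post-processing
    return process(data)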
The failed tool span appears in the Trace Explorer with the error message, status, and the exact inputs that caused it, which is useful for debugging intermittent tool failures.
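The record-then-re-raise behaviour described above can be sketched in plain Python. `record_errors`, `SPANS`, and the always-failing `risky_api_call` stub are hypothetical and only illustrate the pattern; Avaliar's actual span fields and status values may differ.

```python
import asyncio
import functools

SPANS: list = []


def record_errors(fn):
    """Hypothetical async decorator: marks the span failed, then re-raises."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        span = {"name": fn.__name__, "inputs": args, "status": "ok", "error": None}
        SPANS.append(span)
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:
            span["status"] = "error"
            span["error"] = f"{type(exc).__name__}: {exc}"
            raise  # the caller still receives the original exception
    return wrapper


@record_errors
async def risky_api_call(endpoint: str) -> dict:
    # Stub that always fails, standing in for a flaky HTTP call.
    raise ConnectionError(f"timeout calling {endpoint}")


async def agent() -> dict:
    try:
        return await risky_api_call("/api/data")
    except ConnectionError:
        return {}  # degrade gracefully; the span already holds the error


result = asyncio.run(agent())
print(result, SPANS[0])
```

Re-raising keeps the decorator transparent: the span captures the failure for later inspection, while the decision about how to recover stays with the calling code.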