Tool Call Tracing - Avaliar AI

Overview

When an LLM decides to call a tool, Avaliar can trace that call as a child span under the LLM or agent span that triggered it. The trace shows what the agent decided to do, what parameters it passed, what the tool returned, and how long each step took. Tool call tracing is built into the SDK and needs no special integration: add @traceable("tool") to any function and it becomes a traced span.

Basic Usage

Add @traceable("tool") to any function your agent calls:

from avaliar import traceable
from openai import AsyncOpenAI

client = AsyncOpenAI()


@traceable("tool")
async def search_knowledge_base(query: str) -> str:
    """Query internal documentation."""
    results = db.search(query)
    return "\n".join(results)


@traceable("tool")
async def send_email(to: str, subject: str, body: str) -> dict:
    """Send an email via the internal mail service."""
    response = mail_client.send(to=to, subject=subject, body=body)
    return {"status": "sent", "message_id": response.id}


@traceable(
    "llm",
    model="gpt-4o",
    provider="openai",
)
async def agent_step(messages: list) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content


@traceable("agent")
async def run_agent(user_query: str) -> str:
    # The tool call and the LLM call both appear as children of this agent span
    context = await search_knowledge_base(user_query)
    result = await agent_step([
        {"role": "system", "content": f"Context: {context}"},
        {"role": "user", "content": user_query},
    ])
    return result

The trace tree looks like:

run_agent             (agent)
  ├── search_knowledge_base  (tool)   ← query in, docs out
  └── agent_step             (llm)    ← messages in, response out

Each tool span records the exact function arguments as inputs and the return value as output. No extra configuration is needed — the SDK picks up the parent context automatically.

What Gets Traced

For every @traceable("tool") span, Avaliar records:

Field	Content
Name	The function name
Inputs	All positional and keyword arguments
Output	The return value
Start time	When the function was called
End time	When it returned
Duration	End minus start
Error	The exception message if it raised
Parent span	The LLM or agent span that called it

This makes it possible to answer questions like: “What did the knowledge base return for that query?” or “Why did the booking tool fail?” — directly from the Trace Explorer.
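
Recorded inputs are easier to read when positional values are stored under their parameter names. The following is a minimal sketch of how such a record could be assembled with the standard library's `inspect.signature`. The `record_call` helper and the dictionary keys are hypothetical and only mirror the table above; they are not Avaliar's storage format.

```python
import inspect
import time


def record_call(fn, *args, **kwargs):
    """Call fn and return a record shaped like the table above (hypothetical)."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # include defaults the caller did not pass
    start = time.time()
    output = fn(*args, **kwargs)
    end = time.time()
    return {
        "name": fn.__name__,
        "inputs": dict(bound.arguments),  # positional args keyed by name
        "output": output,
        "start_time": start,
        "end_time": end,
        "duration": end - start,
    }


def search_products(query, max_results=5):
    return [f"{query} #{i}" for i in range(max_results)]


rec = record_call(search_products, "headphones", max_results=2)
print(rec["inputs"])
print(rec["output"])
```

Binding against the signature means a call written as `search_products("headphones")` and one written as `search_products(query="headphones")` produce the same inputs record.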

Sync and Async

@traceable("tool") works with both sync and async functions:

import httpx


# Async tool
@traceable("tool")
async def fetch_from_api(endpoint: str) -> dict:
    # Named http_client so it does not shadow the OpenAI client defined earlier
    async with httpx.AsyncClient() as http_client:
        response = await http_client.get(endpoint)
        return response.json()


# Sync tool
@traceable("tool")
def query_database(sql: str) -> list[dict]:
    cursor = db.execute(sql)
    return cursor.fetchall()

Both can be mixed freely in the same trace tree.
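
One common way for a single decorator to support both kinds of function is to check `inspect.iscoroutinefunction` and return a matching wrapper. The sketch below shows that technique with a hypothetical `traced_tool` decorator and an in-memory `CALLS` list. It is an illustration of the general pattern, not a description of how Avaliar does it.

```python
import asyncio
import functools
import inspect

CALLS = []  # hypothetical in-memory log standing in for a trace backend


def traced_tool(fn):
    """Toy decorator that logs calls to both sync and async functions."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            CALLS.append((fn.__name__, "async", result))
            return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        CALLS.append((fn.__name__, "sync", result))
        return result
    return sync_wrapper


@traced_tool
async def fetch(endpoint):
    await asyncio.sleep(0)  # stands in for network I/O
    return {"endpoint": endpoint}


@traced_tool
def query(sql):
    return [{"sql": sql}]


async def main():
    rows = query("SELECT 1")         # sync tool called from async code
    data = await fetch("/api/data")  # async tool awaited as usual
    return rows, data


rows, data = asyncio.run(main())
print(CALLS)
```

The async wrapper stays a coroutine function, so callers still `await` it, and the sync wrapper stays an ordinary function.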

Using with LangChain Tools

If you’re using LangChain’s @tool decorator, place @tool on the outside and @traceable("tool") directly above the function. This order preserves LangChain’s schema generation (which reads the function’s docstring and type hints) while still giving Avaliar the original function to wrap.

from langchain_core.tools import tool
from avaliar import traceable


@tool
@traceable("tool")
def lookup_policy(query: str) -> str:
    """Search company policy documentation for the given query."""
    docs = retriever.invoke(query)
    return "\n\n".join([doc.page_content for doc in docs])


@tool
@traceable("tool")
def search_flights(
    departure_airport: str,
    arrival_airport: str,
    date: str,
) -> list[dict]:
    """Search for available flights between two airports on a given date."""
    return flight_db.search(departure_airport, arrival_airport, date)

The @tool decorator wraps the already-traced function, so LangChain calls the traced wrapper and the span is created correctly.

If you reverse the order (@traceable outer, @tool inner), LangChain will read the schema from the @traceable wrapper instead of the original function, which loses the docstring and parameter annotations that LangChain uses to describe the tool to the LLM.
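
Stacking decorators only works if the inner one hands the outer one an intact docstring and signature, and this can be checked in plain Python. The sketch below contrasts a naive wrapper with one that copies metadata using `functools.wraps`. Both wrappers are hypothetical stand-ins, and whether Avaliar's decorator uses `functools.wraps` internally is an assumption. `__doc__` and `inspect.signature` are the kind of metadata a schema generator typically reads.

```python
import functools
import inspect


def naive_wrapper(fn):
    """Wraps fn without copying any metadata."""
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


def preserving_wrapper(fn):
    """Wraps fn and copies its name, docstring, and signature information."""
    @functools.wraps(fn)  # also sets __wrapped__, which inspect.signature follows
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


def lookup_policy(query: str) -> str:
    """Search company policy documentation for the given query."""
    return f"policy for {query}"


naive = naive_wrapper(lookup_policy)
kept = preserving_wrapper(lookup_policy)

print(naive.__doc__, inspect.signature(naive))
print(kept.__doc__, inspect.signature(kept))
```

A decorator applied on top of `kept` still sees the `query: str` parameter and the docstring, while one applied on top of `naive` sees only `*args, **kwargs` and no description.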

LangGraph: Restoring Context Across Threads

LangGraph sometimes runs tool nodes in a new thread or as part of a separate execution context. When this happens, the tool function may start without a parent span because Python’s contextvars are not inherited across thread boundaries. If you see tool spans appearing as disconnected root spans instead of children of the agent span, use this pattern:

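from avaliar import traceable
from avaliar.run_tree import _CURRENT_RUN_TREE
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool

# Module-level slot for the root span's run tree. It is shared by every
# concurrent call in the process, so it suits one turn at a time.
_CURRENT_TRACE_PARENT = None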

@traceable("agent")
async def process_turn(user_input: str) -> str:
    global _CURRENT_TRACE_PARENT
    # Save parent so tools in new threads can find it
    _CURRENT_TRACE_PARENT = _CURRENT_RUN_TREE.get()
    # ... run your LangGraph graph ...


# In any tool that LangGraph might run in a new thread:
@tool
@traceable("tool")
def fetch_user_data(config: RunnableConfig) -> list[dict]:
    """Fetch data for the current user."""
    # Restore context if this thread doesn't have it
    if _CURRENT_RUN_TREE.get() is None and _CURRENT_TRACE_PARENT is not None:
        _CURRENT_RUN_TREE.set(_CURRENT_TRACE_PARENT)

    # Now proceed with actual logic
    @traceable("tool")
    def _fetch(user_id: str) -> list[dict]:
        return db.query("SELECT * FROM users WHERE id = ?", user_id)

    return _fetch(config["configurable"]["user_id"])

This workaround is only needed when your orchestration framework (like LangGraph) dispatches tools to new threads. Plain asyncio.gather does not have this problem because async tasks inherit their parent’s contextvars context.
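
The difference between threads and async tasks can be reproduced with the standard library alone. The sketch below uses a hypothetical `current_span` context variable in place of the SDK's run tree and records what each execution style sees. It also shows `contextvars.copy_context()`, a standard way to carry a context into a thread you start yourself.

```python
import asyncio
import contextvars
import threading

# Hypothetical stand-in for the SDK's "current run tree" variable.
current_span = contextvars.ContextVar("current_span", default=None)
seen = {}


def read_span(label):
    seen[label] = current_span.get()


async def read_span_async(label):
    seen[label] = current_span.get()


async def main():
    current_span.set("agent-span")

    # A plain thread: on standard CPython builds it starts with an empty
    # context, so the variable falls back to its default.
    t = threading.Thread(target=read_span, args=("plain thread",))
    t.start()
    t.join()

    # Running the target inside a copy of the caller's context carries it over.
    ctx = contextvars.copy_context()
    t = threading.Thread(target=ctx.run, args=(read_span, "thread + copy_context"))
    t.start()
    t.join()

    # Tasks created by gather copy the current context automatically.
    await asyncio.gather(read_span_async("task a"), read_span_async("task b"))

    # asyncio.to_thread also copies the context before handing off to a worker.
    await asyncio.to_thread(read_span, "to_thread")


asyncio.run(main())
print(seen)
```

If you control how the thread is started, `copy_context().run` avoids the need for a module-level global. The global-based pattern above is for cases where the framework starts the thread for you.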

Multi-Tool Agent Example

This example shows a realistic agent that calls multiple tools in sequence and traces the full pipeline:

from avaliar import traceable
from avaliar.trace import update_current_llm_run
from openai import AsyncOpenAI

client = AsyncOpenAI()


@traceable("tool")
async def search_products(query: str, max_results: int = 5) -> list[dict]:
    """Search the product catalog."""
    return catalog.search(query, limit=max_results)


@traceable("tool")
async def check_inventory(product_id: str) -> dict:
    """Check current stock levels for a product."""
    return inventory.get(product_id)


@traceable("tool")
async def create_order(product_id: str, quantity: int, customer_id: str) -> dict:
    """Place an order for a product."""
    return orders.create(product_id=product_id, qty=quantity, cid=customer_id)


@traceable(
    "llm",
    model="gpt-4o",
    provider="openai",
)
async def decide_and_act(messages: list, available_context: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages + [
            {"role": "system", "content": f"Context:\n{available_context}"}
        ],
    )
    update_current_llm_run(
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
    )
    return response.choices[0].message.content


@traceable("agent")
async def shopping_agent(customer_id: str, request: str) -> str:
    # Step 1: find products matching the request
    products = await search_products(request)

    # Step 2: check inventory for first result
    if products:
        stock = await check_inventory(products[0]["id"])
        context = f"Products: {products}\nStock: {stock}"
    else:
        context = "No products found."

    # Step 3: ask LLM what to do
    messages = [{"role": "user", "content": request}]
    decision = await decide_and_act(messages, context)

    return decision

Trace tree for a single shopping_agent call:

shopping_agent        (agent)
  ├── search_products (tool)   ← query="wireless headphones", returns 5 products
  ├── check_inventory (tool)   ← product_id="SKU-42", returns {in_stock: 3}
  └── decide_and_act  (llm)    ← messages + context in, recommendation out

Tracing Tool Errors

If a tool raises an exception, the span records the error and marks the span as failed. The rest of the trace continues unaffected.

# http_client is assumed to be an httpx.AsyncClient created elsewhere with a
# base URL; it is a separate object from the OpenAI client used above.
@traceable("tool")
async def risky_api_call(endpoint: str) -> dict:
    response = await http_client.get(endpoint)
    response.raise_for_status()  # May raise; recorded as the span error
    return response.json()


@traceable("agent")
async def agent(query: str) -> str:
    try:
        data = await risky_api_call("/api/data")
    except Exception:
        # The tool span is already recorded with the error
        # Handle gracefully here
        data = {}

    return process(data)

The failed tool span appears in the Trace Explorer with the error message, status, and exact inputs that caused it — useful for debugging intermittent tool failures.
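
The record-then-re-raise behaviour described above can be sketched in a few lines. The `traced_tool` decorator and `SPANS` list below are hypothetical and stand in for the SDK and its backend. The point is the `try` / `except` / `finally` shape, which stores the span on every path and still lets the caller see the original exception.

```python
import functools

SPANS = []  # hypothetical in-memory store standing in for the trace backend


def traced_tool(fn):
    """Toy decorator: record status and error, then re-raise."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "status": "ok",
            "error": None,
        }
        try:
            span["output"] = fn(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["status"] = "error"
            span["error"] = f"{type(exc).__name__}: {exc}"
            raise  # the caller still sees the original exception
        finally:
            SPANS.append(span)  # stored whether the call succeeded or not
    return wrapper


@traced_tool
def risky_lookup(key):
    return {"a": 1}[key]


def agent(key):
    try:
        return risky_lookup(key)
    except KeyError:
        return None  # fall back; the failed span is already stored


agent("a")
agent("missing")
print([(span["status"], span["error"]) for span in SPANS])
```

Because the wrapper re-raises, the agent's own `try` / `except` keeps working exactly as it would without tracing.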
