MCP server

Query agent signals, run details, and health scores directly from Claude Code, Cursor, Codex, or any MCP-compatible client, without leaving your editor. All MCP access is read-only: the server only queries the signals and run metadata already stored in your Dunetrace backend.

What it is

The MCP server wraps the Dunetrace Customer API in the Model Context Protocol. Your editor (or any LLM) can call it as a tool and ask things like:

"Is my langchain-example-agent healthy?"
"What failed in the last 24 hours?"
"Show me signal #518 — what happened and how do I fix it?"
"Is the TOOL_LOOP I'm seeing systemic or a one-off?"
"Walk me through run 019e2314-6b7 step by step."

Prerequisites

Dunetrace backend running (docker compose up -d)
Python 3.11+
Customer API accessible at http://localhost:8002 (or set DUNETRACE_API_URL)

Install

pip install dunetrace-mcp

Or install from source for development:

cd packages/mcp-server
pip install -e .

Client setup

Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}
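If ~/.claude.json already holds other settings, hand-editing risks overwriting them. The following is a minimal sketch of merging the entry programmatically. It assumes the file is plain JSON with a top-level "mcpServers" object as shown above; `add_dunetrace_server` is a hypothetical helper and not part of dunetrace-mcp.

```python
import json
from pathlib import Path


def add_dunetrace_server(config_path, api_url="http://localhost:8002",
                         api_key="dt_dev_test"):
    """Merge a dunetrace entry into an MCP client config, keeping other keys.

    Hypothetical helper: assumes the file (if present) contains valid JSON.
    """
    path = Path(config_path).expanduser()
    # Start from the existing config if there is one, otherwise from scratch.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Only the "dunetrace" entry is written; sibling servers are left alone.
    servers["dunetrace"] = {
        "command": "dunetrace-mcp",
        "env": {"DUNETRACE_API_URL": api_url, "DUNETRACE_API_KEY": api_key},
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_dunetrace_server("~/.claude.json")` would write the entry shown above. Because the Cursor config uses the same layout, the same call with `.cursor/mcp.json` should work there too.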

Restart Claude Code. The dunetrace server will appear in the MCP tools list.

Cursor

Create .cursor/mcp.json in your project root (or ~/.cursor/mcp.json to apply it globally):

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}

SSE clients (Codex, Langdock, etc.)

Run the server in SSE mode (listens on :8000 by default):

dunetrace-mcp --sse
dunetrace-mcp --sse --port 9000   # custom port

Point your client's tool endpoint at http://localhost:8000/sse.

Manual test (stdio)

dunetrace-mcp

The server speaks MCP over stdin/stdout. You can pipe JSON-RPC messages manually or use the MCP Inspector.
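For piping messages by hand, the sketch below prints the three newline-delimited JSON-RPC messages that open a minimal MCP session: `initialize`, the `notifications/initialized` notification, and `tools/list`. The protocol version string and the client name are assumptions chosen for illustration; the server may negotiate a different version.

```python
import json


def handshake_messages(client_name="manual-test", protocol_version="2024-11-05"):
    """Return newline-delimited JSON-RPC lines for a minimal MCP session.

    The client name and protocol version are illustrative assumptions.
    """
    messages = [
        # 1. initialize request: the client announces itself and its capabilities
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": protocol_version,
                    "capabilities": {},
                    "clientInfo": {"name": client_name, "version": "0.0.1"}}},
        # 2. notification that initialization is complete (no id, so no response)
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. ask the server which tools it exposes
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    # The stdio transport expects one JSON message per line.
    return "".join(json.dumps(m) + "\n" for m in messages)


print(handshake_messages(), end="")
```

Saved under a name of your choice (say, handshake.py), the output can be piped into `dunetrace-mcp`. A strict client would wait for the `initialize` response before sending the next message, so whether the server accepts all three lines up front is an assumption worth checking with the MCP Inspector.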

Environment variables

Variable	Default	Description
`DUNETRACE_API_URL`	`http://localhost:8002`	Customer API base URL
`DUNETRACE_API_KEY`	`dt_dev_test`	Bearer token (auth header)

For production, set DUNETRACE_API_KEY to your real API key.
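As a rough illustration of how these two variables might be resolved, the sketch below applies the defaults from the table and flags the case where the development key is still in use against a non-local host. `resolve_settings` is a hypothetical helper, and the `Authorization: Bearer` header layout is an assumption based on the table's description, not taken from the server's source.

```python
import os
from urllib.parse import urlparse

DEV_KEY = "dt_dev_test"
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def resolve_settings(environ=None):
    """Resolve the API URL and auth header, using the documented defaults."""
    env = os.environ if environ is None else environ
    api_url = env.get("DUNETRACE_API_URL", "http://localhost:8002").rstrip("/")
    api_key = env.get("DUNETRACE_API_KEY", DEV_KEY)
    host = urlparse(api_url).hostname
    return {
        "api_url": api_url,
        # Assumed header layout for a bearer token.
        "headers": {"Authorization": f"Bearer {api_key}"},
        # True when the dev default key is pointed at a non-local backend.
        "dev_key_on_remote_host": api_key == DEV_KEY and host not in LOCAL_HOSTS,
    }
```

A check like `dev_key_on_remote_host` is one way to catch a production deployment that never replaced the default key.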

Tools

`list_agents`

List all monitored agents with their run counts, signal counts, and failure type breakdown. No arguments.

`summarize_agent`

One-shot diagnosis of an agent. Combines health score, failure breakdown, recent signals with their fixes, and health component bars. Start here before diving deeper.

Argument	Type	Description
`agent_id`	string	Agent ID (from `list_agents`)

`get_agent_health`

Health score (0–100) and per-component breakdown for an agent. A score requires at least 3 runs. The token and latency components return a neutral value (half their maximum points) until at least 30 runs have accumulated to form a baseline.

Component	Max points	Measures
`failure_rate`	40	% of runs that triggered any signal
`loop_avoidance`	25	% of runs without a tool loop
`token_efficiency`	20	Avg prompt tokens vs. per-agent baseline
`latency`	15	Avg LLM latency vs. per-agent baseline
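To make the arithmetic concrete, here is a sketch of how the four components could add up to a 0–100 score. Only the maximum points and the 3-run and 30-run thresholds come from the text and table above. Scaling each component linearly by a 0.0–1.0 fraction is an assumption for illustration, not the documented formula.

```python
MAX_POINTS = {"failure_rate": 40, "loop_avoidance": 25,
              "token_efficiency": 20, "latency": 15}
MIN_RUNS_FOR_SCORE = 3
MIN_RUNS_FOR_BASELINE = 30
BASELINE_COMPONENTS = ("token_efficiency", "latency")


def health_score(run_count, fractions):
    """Combine per-component fractions (0.0 to 1.0) into a 0-100 score.

    Linear scaling is an illustrative assumption, not the real formula.
    """
    if run_count < MIN_RUNS_FOR_SCORE:
        return None  # too few runs to produce a score
    total = 0.0
    for name, max_pts in MAX_POINTS.items():
        if name in BASELINE_COMPONENTS and run_count < MIN_RUNS_FOR_BASELINE:
            total += max_pts / 2  # neutral: half points until a baseline exists
        else:
            total += max_pts * fractions[name]
    return total
```

Under these assumptions, an agent with 10 otherwise perfect runs tops out at 40 + 25 + 10 + 7.5 = 82.5, because the two baseline-dependent components are still neutral.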

`get_agent_patterns`

Analyze failure patterns: systemic vs. one-off classification, daily signal trend, failure rates by type, and input hashes that consistently trigger failures. A failure marked SYSTEMIC has appeared in a high proportion of runs over an extended window. Only input patterns with ≥50% hit rate are shown.

Argument	Type	Description
`agent_id`	string	Agent ID
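The 50% hit-rate filter on input patterns can be sketched as follows. The input shape, a list of `(input_hash, had_signal)` pairs, is hypothetical and chosen only to show the calculation; the real tool derives this from stored run data.

```python
from collections import defaultdict


def consistent_failure_inputs(runs, min_hit_rate=0.5):
    """Return {input_hash: hit_rate} for inputs at or above the threshold.

    `runs` is a list of (input_hash, had_signal) pairs (hypothetical shape).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for input_hash, had_signal in runs:
        totals[input_hash] += 1
        if had_signal:
            failures[input_hash] += 1
    # Hit rate = share of runs with this input that triggered a signal.
    rates = {h: failures[h] / totals[h] for h in totals}
    return {h: rate for h, rate in rates.items() if rate >= min_hit_rate}
```

An input that fails in 1 of 3 runs is dropped, while one that fails in 1 of 2 runs is kept, which matches the "≥50%" wording.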

`get_agent_runs`

List recent runs for an agent with durations and signal status.

Argument	Type	Default	Description
`agent_id`	string	required	Agent ID
`limit`	int	20	Max runs to return (max 100)

`get_agent_signals`

Recent failure signals for a specific agent, with titles, explanations, and fix suggestions.

Argument	Type	Default	Description
`agent_id`	string	required	Agent ID
`limit`	int	20	Max signals to return (max 100)
`severity`	string	—	Filter: `CRITICAL`, `HIGH`, `MEDIUM`, or `LOW`

`get_signal_detail`

Full detail for a specific signal: complete evidence dict, impact statement, and all suggested fixes with code snippets.

Argument	Type	Description
`signal_id`	int	Integer signal ID (visible in `search_signals` output)
`agent_id`	string	Optional — omit to search all agents

`search_signals`

Search signals across all agents with combined filters. Useful for cross-agent audits or time-bounded investigations.

Argument	Type	Default	Description
`severity`	string	—	Filter: `CRITICAL`, `HIGH`, `MEDIUM`, or `LOW`
`failure_type`	string	—	Detector name e.g. `TOOL_LOOP`, `COST_SPIKE`, `CONTEXT_BLOAT`
`since_hours`	int	—	Only signals from the last N hours
`agent_id`	string	—	Restrict to one agent; searches all agents if omitted
`limit`	int	30	Max signals to return (max 200)
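For clients that build requests by hand, the sketch below assembles a JSON-RPC `tools/call` request for `search_signals` and validates the arguments against the table above. `search_signals_request` is a hypothetical helper, and the lower bound of 1 on `limit` is an assumption; only the maximum of 200 is documented.

```python
SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}


def search_signals_request(request_id, severity=None, failure_type=None,
                           since_hours=None, agent_id=None, limit=30):
    """Build a JSON-RPC tools/call request for the search_signals tool."""
    if severity is not None and severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    if not 1 <= limit <= 200:  # lower bound is an assumption
        raise ValueError("limit must be between 1 and 200")
    arguments = {"severity": severity, "failure_type": failure_type,
                 "since_hours": since_hours, "agent_id": agent_id,
                 "limit": limit}
    # Leave out unset filters so the server applies its own defaults.
    arguments = {k: v for k, v in arguments.items() if v is not None}
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": "search_signals", "arguments": arguments}}
```

For example, `search_signals_request(3, severity="HIGH", failure_type="TOOL_LOOP", since_hours=24)` describes a search for high-severity tool loops across all agents in the last day.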

`get_run_detail`

Full event timeline for a specific run.

Argument	Type	Description
`run_id`	string	Run UUID
`agent_id`	string	Optional

For a failing run, the output shows the full event timeline with timestamps, token counts, and latency for each LLM call. Growth in prompt tokens across calls is a secondary signal of context inflation.
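If you post-process a timeline yourself, the two checks mentioned here (repeated identical tool calls and prompt token growth) might look like the sketch below. The event shape, dicts with `type`, `tool`, `args`, and `prompt_tokens` keys, is hypothetical; adapt the field names to whatever `get_run_detail` actually returns.

```python
import json
from collections import Counter


def repeated_tool_calls(events, min_repeats=2):
    """Count tool calls that share the same tool name and arguments.

    Events are dicts with hypothetical keys: type, tool, args.
    """
    counts = Counter(
        # Serialize args with sorted keys so equal dicts compare equal.
        (e["tool"], json.dumps(e["args"], sort_keys=True))
        for e in events if e["type"] == "tool_call"
    )
    return {key: n for key, n in counts.items() if n >= min_repeats}


def prompt_token_growth(events):
    """Return the change in prompt tokens between consecutive LLM calls."""
    tokens = [e["prompt_tokens"] for e in events if e["type"] == "llm_call"]
    return [later - earlier for earlier, later in zip(tokens, tokens[1:])]
```

Steadily positive deltas from `prompt_token_growth` alongside a non-empty result from `repeated_tool_calls` would point to redundant results piling up in the context.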

`get_instrumentation_guide`

Quick-start code snippet for instrumenting an agent with Dunetrace.

Argument	Type	Description
`framework`	string	`langchain`, `python`, `typescript`, `tools`, or `otel`

Aliases accepted: langgraph, lc, ts, js, node, otlp, opentelemetry, langdock, dify, tool-calls, tracking.

Worked example: diagnosing a failing agent

This is a real diagnostic session using actual data from langchain-example-agent.

Step 1 — triage

You:   Is my LangChain agent healthy?

Claude: [calls list_agents]
        langchain-example-agent — 134 runs, 57 signals, health 48/100.
        Dominant failure: TOOL_LOOP (48 signals, 36% of runs).

        [calls summarize_agent("langchain-example-agent")]
        Health is 🔴 48/100. TOOL_LOOP is the main issue — 48 signals
        across 36% of runs, confidence 90% on every one. The agent is
        calling web_search 6 times with identical arguments (steps 2–7)
        without tracking which queries it's already tried.

Step 2 — check if it's systemic

You:   Is it getting worse?

Claude: [calls get_agent_patterns("langchain-example-agent")]
        Yes — confirmed SYSTEMIC (75% of runs), and escalating:
        TOOL_LOOP signals went 1 → 1 → 1 → 2 → 5 → 7 over the last 7 days.
        Three input hashes trigger it at 98–100% rate. It's a deterministic
        bug, not flaky behaviour.

Step 3 — look at a specific failing run

You:   Show me a failing run.

Claude: [calls get_agent_runs("langchain-example-agent", limit=5)]
        Most recent failing run: 019e2314-6b7 (4.1s, 8 steps, 1 signal).
        Clean runs have 4 steps. Failing runs always have 8 — the loop
        adds exactly 4 extra steps every time.

        [calls get_run_detail("019e2314-6b7")]
        web_search called at steps 2, 4, 6 with identical args. Prompt
        tokens growing 512 → 612 → 710 → 805 as redundant results accumulate.

Step 4 — get the fix

You:   What's the fix?

Claude: [calls get_signal_detail(518, "langchain-example-agent")]
        Two options:

        1. Code fix — add dedup in your tool wrapper:
           seen_queries = set()
           def web_search(query):
               if query in seen_queries:
                   return "Already searched. Try rephrasing."
               seen_queries.add(query)
               return _do_search(query)

        2. Prompt addition (faster to ship):
           "Do not repeat a search query you have already tried.
            If a search returned no useful results, reformulate
            the query before trying again."

        Option 2 can be applied via Langfuse in one click from the dashboard.
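One caveat about option 1: with `seen_queries` at module level, the set persists across runs in a long-lived process, so a legitimate query could be rejected because an earlier run already made it. A minimal sketch that scopes the set to one wrapper instance follows. `make_deduped_search` and `search_fn` are hypothetical names, and normalizing case and whitespace is an added assumption about what counts as "the same query".

```python
def make_deduped_search(search_fn):
    """Wrap a search function so repeated queries within one run are rejected.

    Create a fresh wrapper per agent run so the seen set starts empty.
    """
    seen_queries = set()

    def web_search(query):
        # Treat queries that differ only in case or spacing as duplicates.
        key = " ".join(query.lower().split())
        if key in seen_queries:
            return "Already searched. Try rephrasing."
        seen_queries.add(key)
        return search_fn(query)

    return web_search
```

Building the tool with `make_deduped_search(_do_search)` at the start of each run keeps the dedup behaviour from the suggested fix while avoiding state that leaks between runs.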

Typical workflows

Investigate a run from a Slack alert

You:   Check run 019e2314-6b7

Claude: [calls get_run_detail("019e2314-6b7")]
        4.1s, 8 steps. TOOL_LOOP at step 7 — web_search called 6×
        with identical args. Context growing 512→805 tokens per call.
        Fix: add a dedup set or prompt instruction.

Cross-agent audit

You:   Which agents had issues in the last 24 hours?

Claude: [calls search_signals(since_hours=24)]
        langfuse-example-agent — 47 HIGH signals (TOOL_LOOP), last 12h.
        langfuse-ts-example-agent — 3 HIGH signals (TOOL_LOOP), last 21h.
        Both looping on web_search. Likely the same root cause.

Before a deploy

You:   Is langchain-example-agent stable enough to deploy?

Claude: [calls get_agent_patterns("langchain-example-agent")]
        No — TOOL_LOOP is systemic (75% of runs) and escalating daily.
        Three input hashes trigger it at 98–100%. Ship the dedup fix first.

Instrument a new agent

You:   How do I add Dunetrace to my LangChain agent?

Claude: [calls get_instrumentation_guide("langchain")]
        pip install 'dunetrace[langchain]'

        from dunetrace import Dunetrace
        from dunetrace.integrations.langchain import DunetraceCallbackHandler

        dt = Dunetrace(endpoint="http://localhost:8001")
        callback = DunetraceCallbackHandler(dt, agent_id="my-agent",
                                            model="gpt-4o-mini",
                                            tools=["web_search"])
        agent.invoke(input, config={"callbacks": [callback]})
        dt.shutdown()

Troubleshooting

`starlette` conflict with `fastapi`

The mcp package pulls in starlette 1.0.0. FastAPI 0.115 and earlier cap starlette below that. FastAPI 0.136+ removed the upper bound and is fully compatible.

pip install --upgrade fastapi

Server not appearing in Claude Code / Cursor

Confirm dunetrace-mcp is on your PATH: which dunetrace-mcp
Confirm the Customer API is reachable: curl http://localhost:8002/health
Restart the editor after editing the MCP config file
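The first two checks can also be scripted. The sketch below uses only the standard library; `diagnose` is a hypothetical helper, and treating an HTTP 200 from /health as "reachable" is an assumption about how the Customer API reports health.

```python
import shutil
import urllib.error
import urllib.request


def health_url(base_url):
    """Join the API base URL with the /health path used in the curl check."""
    return base_url.rstrip("/") + "/health"


def diagnose(base_url="http://localhost:8002", command="dunetrace-mcp", timeout=3):
    """Return pass/fail results for the PATH and API reachability checks."""
    results = {"on_path": shutil.which(command) is not None}
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            # Assumption: a healthy API answers /health with HTTP 200.
            results["api_reachable"] = resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        results["api_reachable"] = False
    return results
```

If `diagnose()` reports both checks as `True` and the server still does not appear, the remaining suspect is the editor not having reloaded its MCP config.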