MCP server

Query agent signals, run details, and health scores directly from Claude Code, Cursor, Codex, or any MCP-compatible client without leaving your editor. Access is read-only, and no raw prompts, tool arguments, or model outputs ever leave your process.

What it is

The MCP server wraps the Dunetrace Customer API in the Model Context Protocol. Your editor (or any LLM) can call it as a tool and ask things like:

  • "Is my langchain-example-agent healthy?"
  • "What failed in the last 24 hours?"
  • "Show me signal #518 — what happened and how do I fix it?"
  • "Is the TOOL_LOOP I'm seeing systemic or a one-off?"
  • "Walk me through run 019e2314-6b7 step by step."

Prerequisites

  • Dunetrace backend running (docker compose up -d)
  • Python 3.11+
  • Customer API accessible at http://localhost:8002 (or set DUNETRACE_API_URL)

Install

pip install dunetrace-mcp

Or install from source for development:

cd packages/mcp-server
pip install -e .

Client setup

Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}

Restart Claude Code. The dunetrace server will appear in the MCP tools list.
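
If ~/.claude.json already lists other MCP servers, pasting the snippet above over the file would drop them. The following is a minimal sketch of merging the entry instead. The helper names add_dunetrace_server and update_config_file are hypothetical and not part of dunetrace-mcp; the sketch assumes the file is plain JSON with a top-level "mcpServers" object, as shown above.

```python
import json
from pathlib import Path


def add_dunetrace_server(config: dict, api_url: str, api_key: str) -> dict:
    """Return a copy of `config` with a "dunetrace" entry under "mcpServers".

    Existing servers and unrelated top-level keys are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["dunetrace"] = {
        "command": "dunetrace-mcp",
        "env": {
            "DUNETRACE_API_URL": api_url,
            "DUNETRACE_API_KEY": api_key,
        },
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, api_url: str, api_key: str) -> None:
    """Read the JSON file at `path` (if it exists), merge the entry, write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = add_dunetrace_server(existing, api_url, api_key)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

For example, update_config_file(Path.home() / ".claude.json", "http://localhost:8002", "dt_dev_test") would add the entry in place. The same merge applies to a Cursor mcp.json file, since it uses the same "mcpServers" layout.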

Cursor

Create .cursor/mcp.json in your project root, or use the global ~/.cursor/mcp.json:

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}

SSE clients (Codex, Langdock, etc.)

Run the server in SSE mode (listens on :8000 by default):

dunetrace-mcp --sse
dunetrace-mcp --sse --port 9000   # custom port

Point your client's tool endpoint at http://localhost:8000/sse.

Manual test (stdio)

dunetrace-mcp

The server speaks MCP over stdin/stdout. You can pipe JSON-RPC messages manually or use the MCP Inspector.
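
As a rough illustration of piping JSON-RPC by hand, the sketch below builds the three messages an MCP client typically sends first: an initialize request, the notifications/initialized notification, and a tools/list request. The stdio transport expects one JSON message per line. The protocol version string and the clientInfo values are assumptions chosen for illustration; the server may negotiate a different version.

```python
import json


def handshake_lines(protocol_version: str = "2024-11-05") -> list[str]:
    """Build newline-free JSON-RPC 2.0 messages for a minimal MCP handshake."""
    messages = [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": protocol_version,  # assumed; server may negotiate
                "capabilities": {},
                "clientInfo": {"name": "manual-test", "version": "0.0.1"},
            },
        },
        # Notifications carry no "id" and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    return [json.dumps(message) for message in messages]
```

Printing each line and pasting it into a running dunetrace-mcp process should produce one response for id 1 and one for id 2. For anything beyond a quick check, the MCP Inspector is likely more convenient.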

Environment variables

Variable            Default                 Description
DUNETRACE_API_URL   http://localhost:8002   Customer API base URL
DUNETRACE_API_KEY   dt_dev_test             Bearer token (auth header)

For production, set DUNETRACE_API_KEY to your real API key.

Tools

list_agents

List all monitored agents with their run counts, signal counts, and failure type breakdown. No arguments.

summarize_agent

One-shot diagnosis of an agent. Combines health score, failure breakdown, recent signals with their fixes, and health component bars. Start here before diving deeper.

Argument   Type     Description
agent_id   string   Agent ID (from list_agents)

get_agent_health

Health score (0–100) and per-component breakdown for an agent. Requires ≥3 runs for a score. Token/latency components return neutral (half points) until ≥30 runs accumulate a baseline.

Component          Max points   Measures
failure_rate       40           % of runs that triggered any signal
loop_avoidance     25           % of runs without a tool loop
token_efficiency   20           Avg prompt tokens vs. per-agent baseline
latency            15           Avg LLM latency vs. per-agent baseline

get_agent_patterns

Analyze failure patterns: systemic vs. one-off classification, daily signal trend, failure rates by type, and input hashes that consistently trigger failures. A failure marked SYSTEMIC has appeared in a high proportion of runs over an extended window. Only input patterns with ≥50% hit rate are shown.

Argument   Type     Description
agent_id   string   Agent ID
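
The ≥50% hit-rate cutoff for input patterns can be pictured with a short sketch. The input shape (a mapping from input hash to failing and total run counts) and the function name consistent_triggers are hypothetical, chosen only to illustrate the filter; the tool's real response format may differ.

```python
def consistent_triggers(
    stats: dict[str, tuple[int, int]],
    min_hit_rate: float = 0.5,
) -> dict[str, float]:
    """Keep input hashes whose failure hit rate meets the threshold.

    `stats` maps an input hash to (failing_runs, total_runs). Hashes with no
    runs are skipped. Results are ordered from highest to lowest hit rate.
    """
    rates = {
        input_hash: failing / total
        for input_hash, (failing, total) in stats.items()
        if total > 0
    }
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return {input_hash: rate for input_hash, rate in ordered if rate >= min_hit_rate}
```

With this filter, a hash that fails in 49 of 50 runs is kept at 0.98, while one that fails in 1 of 10 runs is dropped.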

get_agent_runs

List recent runs for an agent with durations and signal status.

Argument   Type     Default    Description
agent_id   string   required   Agent ID
limit      int      20         Max runs to return (max 100)

get_agent_signals

Recent failure signals for a specific agent, with titles, explanations, and fix suggestions.

Argument   Type     Default    Description
agent_id   string   required   Agent ID
limit      int      20         Max signals to return (max 100)
severity   string   -          Filter: CRITICAL, HIGH, MEDIUM, or LOW

get_signal_detail

Full detail for a specific signal: complete evidence dict, impact statement, and all suggested fixes with code snippets.

Argument    Type     Description
signal_id   int      Integer signal ID (visible in search_signals output)
agent_id    string   Optional; omit to search all agents

Privacy note: The evidence dict in signal responses contains the SHA-256 hashed fingerprints the detector used, not the original content. Raw tool arguments never leave your agent process.

search_signals

Search signals across all agents with combined filters. Useful for cross-agent audits or time-bounded investigations.

Argument       Type     Default   Description
severity       string   -         Filter: CRITICAL, HIGH, MEDIUM, or LOW
failure_type   string   -         Detector name, e.g. TOOL_LOOP, COST_SPIKE, CONTEXT_BLOAT
since_hours    int      -         Only signals from the last N hours
agent_id       string   -         Restrict to one agent; searches all agents if omitted
limit          int      30        Max signals to return (max 200)

get_run_detail

Full event timeline for a specific run.

Argument   Type     Description
run_id     string   Run UUID
agent_id   string   Optional

For a failing run, the output shows the full event timeline with timestamps, token counts, and latency per LLM call. Prompt token growth across calls is a secondary signal of context inflation.
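
As a reader-side illustration of that secondary signal, the sketch below summarizes how prompt tokens change from one LLM call to the next. It is not the Dunetrace detector; the function name prompt_token_growth and the choice of summary fields are assumptions, and the input is simply the list of prompt token counts read off a run timeline.

```python
def prompt_token_growth(prompt_tokens: list[int]) -> dict:
    """Summarize call-to-call prompt token growth within a single run."""
    deltas = [later - earlier for earlier, later in zip(prompt_tokens, prompt_tokens[1:])]
    has_ratio = len(prompt_tokens) > 1 and prompt_tokens[0] > 0
    return {
        "deltas": deltas,
        # True only when every call used more prompt tokens than the one before.
        "monotonic": bool(deltas) and all(delta > 0 for delta in deltas),
        "growth_ratio": prompt_tokens[-1] / prompt_tokens[0] if has_ratio else None,
    }
```

A steadily rising sequence with similar deltas suggests the same kind of content is being appended on every call, which is what a repeated tool result would look like.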

get_instrumentation_guide

Quick-start code snippet for instrumenting an agent with Dunetrace.

Argument    Type     Description
framework   string   langchain, python, typescript, tools, or otel

Aliases accepted: langgraph, lc, ts, js, node, otlp, opentelemetry, langdock, dify, tool-calls, tracking.

Worked example: diagnosing a failing agent

This is a real diagnostic session using actual data from langchain-example-agent.

Step 1 — triage

You:   Is my LangChain agent healthy?

Claude: [calls list_agents]
        langchain-example-agent — 134 runs, 57 signals, health 48/100.
        Dominant failure: TOOL_LOOP (48 signals, 36% of runs).

        [calls summarize_agent("langchain-example-agent")]
        Health is 🔴 48/100. TOOL_LOOP is the main issue — 48 signals
        across 36% of runs, confidence 90% on every one. The agent is
        calling web_search 6 times with identical arguments (steps 2–7)
        without tracking which queries it's already tried.

Step 2 — check if it's systemic

You:   Is it getting worse?

Claude: [calls get_agent_patterns("langchain-example-agent")]
        Yes — confirmed SYSTEMIC (75% of runs), and escalating:
        TOOL_LOOP signals went 1 → 1 → 1 → 2 → 5 → 7 over the last 7 days.
        Three input hashes trigger it at 98–100% rate. It's a deterministic
        bug, not flaky behaviour.

Step 3 — look at a specific failing run

You:   Show me a failing run.

Claude: [calls get_agent_runs("langchain-example-agent", limit=5)]
        Most recent failing run: 019e2314-6b7 (4.1s, 8 steps, 1 signal).
        Clean runs have 4 steps. Failing runs always have 8 — the loop
        adds exactly 4 extra steps every time.

        [calls get_run_detail("019e2314-6b7")]
        web_search called at steps 2, 4, 6 with identical args. Prompt
        tokens growing 512 → 612 → 710 → 805 as redundant results accumulate.

Step 4 — get the fix

You:   What's the fix?

Claude: [calls get_signal_detail(518, "langchain-example-agent")]
        Two options:

        1. Code fix — add dedup in your tool wrapper:
           seen_queries = set()
           def web_search(query):
               if query in seen_queries:
                   return "Already searched. Try rephrasing."
               seen_queries.add(query)
               return _do_search(query)

        2. Prompt addition (faster to ship):
           "Do not repeat a search query you have already tried.
            If a search returned no useful results, reformulate
            the query before trying again."

        Option 2 can be applied via Langfuse in one click from the dashboard.
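
One caveat with the code fix in option 1: a module-level seen_queries set would persist across runs in a long-lived process, so a query searched in one run would be blocked in the next. The sketch below is a possible variation that scopes the set to a wrapper created once per run. The name dedup_tool is hypothetical, and the whitespace and case normalization is an added assumption, not part of the suggested fix.

```python
from collections.abc import Callable


def dedup_tool(search: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a search function so repeated queries are refused.

    Create one wrapper per agent run so the seen-set does not leak between runs.
    """
    seen: set[str] = set()

    def wrapper(query: str) -> str:
        # Normalize so trivial variations (case, spacing) count as repeats.
        key = " ".join(query.lower().split())
        if key in seen:
            return "Already searched. Try rephrasing."
        seen.add(key)
        return search(query)

    return wrapper
```

At the start of each run, something like web_search = dedup_tool(_do_search) would give the agent a fresh wrapper, where _do_search is the underlying search function from the snippet above.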

Typical workflows

Investigate a run from a Slack alert

You:   Check run 019e2314-6b7

Claude: [calls get_run_detail("019e2314-6b7")]
        4.1s, 8 steps. TOOL_LOOP at step 7 — web_search called 6×
        with identical args. Context growing 512→805 tokens per call.
        Fix: add a dedup set or prompt instruction.

Cross-agent audit

You:   Which agents had issues in the last 24 hours?

Claude: [calls list_agents]
        langfuse-example-agent — 47 HIGH signals (TOOL_LOOP), last 12h.
        langfuse-ts-example-agent — 3 HIGH signals (TOOL_LOOP), last 21h.
        Both looping on web_search. Likely the same root cause.

Before a deploy

You:   Is langchain-example-agent stable enough to deploy?

Claude: [calls get_agent_patterns("langchain-example-agent")]
        No — TOOL_LOOP is systemic (75% of runs) and escalating daily.
        Three input hashes trigger it at 98–100%. Ship the dedup fix first.

Instrument a new agent

You:   How do I add Dunetrace to my LangChain agent?

Claude: [calls get_instrumentation_guide("langchain")]
        pip install 'dunetrace[langchain]'

        from dunetrace import Dunetrace
        from dunetrace.integrations.langchain import DunetraceCallbackHandler

        dt = Dunetrace(endpoint="http://localhost:8001")
        callback = DunetraceCallbackHandler(dt, agent_id="my-agent",
                                            model="gpt-4o-mini",
                                            tools=["web_search"])
        agent.invoke(input, config={"callbacks": [callback]})
        dt.shutdown()

Privacy

All data served by the MCP tools comes from the Dunetrace Customer API, which stores only hashed or structural metadata:

  • Tool arguments → SHA-256 hash (shown as args_hashes)
  • LLM prompts and outputs → SHA-256 hash (the raw text is never stored)
  • Token counts, latency, step counts → stored as plain numbers
  • Run and signal metadata → stored as plain text

The evidence dict in signal responses contains the hashed fingerprints the detector used — not the original content.
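
To show why a hashed fingerprint is enough to spot repeated tool calls without revealing their content, here is a minimal sketch using Python's hashlib. The canonicalization step (sorted keys, compact separators) and the function name args_hash are assumptions; how Dunetrace actually serializes arguments before hashing is not described here.

```python
import hashlib
import json


def args_hash(tool_args: dict) -> str:
    """Return a SHA-256 hex digest of tool arguments in a canonical JSON form."""
    # Sorting keys makes the digest independent of argument order (assumed scheme).
    canonical = json.dumps(tool_args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two calls with identical arguments produce the same digest, so a loop detector can count repeats by comparing hashes alone, while the original query text cannot be read back from the digest.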

Troubleshooting

starlette conflict with fastapi

The mcp package pulls in starlette 1.0.0. FastAPI 0.115 and earlier cap starlette below that. FastAPI 0.136+ removed the upper bound and is fully compatible.

pip install --upgrade fastapi

Server not appearing in Claude Code / Cursor

  • Confirm dunetrace-mcp is on your PATH: which dunetrace-mcp
  • Confirm the Customer API is reachable: curl http://localhost:8002/health
  • Restart the editor after editing the MCP config file
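
The first two checks can also be scripted. The sketch below uses only the standard library; the function names on_path and api_reachable are hypothetical, and treating any 2xx response from /health as healthy is an assumption about the endpoint.

```python
import shutil
import urllib.error
import urllib.request


def on_path(command: str = "dunetrace-mcp") -> bool:
    """Return True if `command` resolves to an executable on PATH."""
    return shutil.which(command) is not None


def api_reachable(base_url: str = "http://localhost:8002", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with a 2xx status."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False
```

If on_path() is False, the editor cannot launch the server at all; if api_reachable() is False, the server may start but its tools will likely fail when they try to reach the Customer API.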