What it is
The MCP server wraps the Dunetrace Customer API in the Model Context Protocol. Your editor (or any MCP-capable LLM client) can call it as a tool and ask things like:
- "Is my langchain-example-agent healthy?"
- "What failed in the last 24 hours?"
- "Show me signal #518 — what happened and how do I fix it?"
- "Is the TOOL_LOOP I'm seeing systemic or a one-off?"
- "Walk me through run 019e2314-6b7 step by step."
Prerequisites
- Dunetrace backend running (docker compose up -d)
- Python 3.11+
- Customer API accessible at http://localhost:8002 (or set DUNETRACE_API_URL)
Install
pip install dunetrace-mcp
Or install from source for development:
cd packages/mcp-server
pip install -e .
Client setup
Claude Code
Add to ~/.claude.json:
{
"mcpServers": {
"dunetrace": {
"command": "dunetrace-mcp",
"env": {
"DUNETRACE_API_URL": "http://localhost:8002",
"DUNETRACE_API_KEY": "dt_dev_test"
}
}
}
}
Restart Claude Code. The dunetrace server will appear in the MCP tools list.
Cursor
Create .cursor/mcp.json in your project root (or global ~/.cursor/mcp.json):
{
"mcpServers": {
"dunetrace": {
"command": "dunetrace-mcp",
"env": {
"DUNETRACE_API_URL": "http://localhost:8002",
"DUNETRACE_API_KEY": "dt_dev_test"
}
}
}
}
SSE clients (Codex, Langdock, etc.)
Run the server in SSE mode (listens on :8000 by default):
dunetrace-mcp --sse
dunetrace-mcp --sse --port 9000 # custom port
Point your client's tool endpoint at http://localhost:8000/sse.
Manual test (stdio)
dunetrace-mcp
The server speaks MCP over stdin/stdout. You can pipe JSON-RPC messages manually or use the MCP Inspector.
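For piping messages by hand, it helps to see what the first few look like. The sketch below assumes the standard MCP stdio framing (one JSON-RPC 2.0 message per line) and the `2024-11-05` protocol revision. The client name and version are placeholders, and the server may negotiate a different protocol version.

```python
import json


def mcp_handshake_lines(client_name="manual-test", protocol_version="2024-11-05"):
    """Return newline-free JSON-RPC messages for a minimal MCP session."""
    messages = [
        # 1. initialize: the client announces its protocol version and capabilities.
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": protocol_version,
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": "0.0.1"},
            },
        },
        # 2. initialized: a notification (no id), sent after the server replies.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. tools/list: ask the server which tools it exposes.
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    return [json.dumps(message) for message in messages]


if __name__ == "__main__":
    for line in mcp_handshake_lines():
        print(line)
```

One way to use it is to start dunetrace-mcp in a terminal and paste the printed lines one at a time, waiting for the initialize response before sending the rest.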
Environment variables
| Variable | Default | Description |
|---|---|---|
| DUNETRACE_API_URL | http://localhost:8002 | Customer API base URL |
| DUNETRACE_API_KEY | dt_dev_test | Bearer token (auth header) |
For production, set DUNETRACE_API_KEY to your real API key.
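As a rough illustration of how these two variables might be resolved, the sketch below falls back to the documented defaults and builds a Bearer header. The function name and the returned dictionary shape are hypothetical, not taken from the dunetrace-mcp source.

```python
import os

DEFAULT_API_URL = "http://localhost:8002"
DEFAULT_API_KEY = "dt_dev_test"


def resolve_settings(env=None):
    """Resolve the API URL and auth header, using the documented defaults."""
    env = os.environ if env is None else env
    api_url = env.get("DUNETRACE_API_URL", DEFAULT_API_URL).rstrip("/")
    api_key = env.get("DUNETRACE_API_KEY", DEFAULT_API_KEY)
    return {
        "api_url": api_url,
        # Assumption: the key travels as a standard Bearer Authorization header.
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Handy for warning when the development key is still in use.
        "using_dev_key": api_key == DEFAULT_API_KEY,
    }
```

Checking `using_dev_key` at startup is one simple way to catch a production deployment that still runs with dt_dev_test.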
Tools
list_agents
List all monitored agents with their run counts, signal counts, and failure type breakdown. No arguments.
summarize_agent
One-shot diagnosis of an agent. Combines health score, failure breakdown, recent signals with their fixes, and health component bars. Start here before diving deeper.
| Argument | Type | Description |
|---|---|---|
| agent_id | string | Agent ID (from list_agents) |
get_agent_health
Health score (0–100) and per-component breakdown for an agent. Requires ≥3 runs for a score. Token/latency components return neutral (half points) until ≥30 runs accumulate a baseline.
| Component | Max points | Measures |
|---|---|---|
| failure_rate | 40 | % of runs that triggered any signal |
| loop_avoidance | 25 | % of runs without a tool loop |
| token_efficiency | 20 | Avg prompt tokens vs. per-agent baseline |
| latency | 15 | Avg LLM latency vs. per-agent baseline |
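To make the point budget concrete, here is a minimal sketch of how the four components could add up to 100. Only the maximum points, the 3-run minimum, and the 30-run baseline rule come from the text above. The linear scaling and the `token_ratio` / `latency_ratio` inputs (baseline divided by current average, clamped to 0..1) are assumptions for illustration, not the actual scoring formula.

```python
MAX_POINTS = {
    "failure_rate": 40,
    "loop_avoidance": 25,
    "token_efficiency": 20,
    "latency": 15,
}
MIN_RUNS_FOR_SCORE = 3
MIN_RUNS_FOR_BASELINE = 30


def health_score(total_runs, runs_with_signal, runs_with_loop,
                 token_ratio=None, latency_ratio=None):
    """Return (score, components), or None when there are too few runs."""
    if total_runs < MIN_RUNS_FOR_SCORE:
        return None

    # Assumption: points scale linearly with the share of clean runs.
    components = {
        "failure_rate": MAX_POINTS["failure_rate"] * (1 - runs_with_signal / total_runs),
        "loop_avoidance": MAX_POINTS["loop_avoidance"] * (1 - runs_with_loop / total_runs),
    }

    for name, ratio in (("token_efficiency", token_ratio), ("latency", latency_ratio)):
        if total_runs < MIN_RUNS_FOR_BASELINE or ratio is None:
            # No baseline yet: neutral half points, as described above.
            components[name] = MAX_POINTS[name] / 2
        else:
            components[name] = MAX_POINTS[name] * max(0.0, min(1.0, ratio))

    return round(sum(components.values()), 1), components
```

Under these assumptions, an agent with 10 clean runs tops out at 82.5 because the token and latency components stay neutral until a baseline exists.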
get_agent_patterns
Analyze failure patterns: systemic vs. one-off classification, daily signal trend, failure rates by type, and input hashes that consistently trigger failures. A failure marked SYSTEMIC has appeared in a high proportion of runs over an extended window. Only input patterns with ≥50% hit rate are shown.
| Argument | Type | Description |
|---|---|---|
| agent_id | string | Agent ID |
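The ≥50% hit-rate filter on input patterns can be pictured as a simple group-by over runs. The sketch below is illustrative only: the `(input_hash, failed)` pair format and the function name are assumptions, and the real analysis may weight or window the data differently.

```python
from collections import defaultdict


def failing_input_patterns(runs, min_hit_rate=0.5):
    """Group runs by input hash and keep hashes that fail often enough.

    runs: iterable of (input_hash, failed) pairs, where failed is a bool.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for input_hash, failed in runs:
        totals[input_hash] += 1
        if failed:
            failures[input_hash] += 1

    patterns = []
    for input_hash, run_count in totals.items():
        hit_rate = failures[input_hash] / run_count
        if hit_rate >= min_hit_rate:
            patterns.append(
                {"input_hash": input_hash, "runs": run_count, "hit_rate": hit_rate}
            )

    # Most reliable triggers first, ties broken by how many runs back them up.
    return sorted(patterns, key=lambda p: (-p["hit_rate"], -p["runs"]))
```

A hash that fails on every run points at a deterministic bug for that input, while hashes below the threshold are more likely noise.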
get_agent_runs
List recent runs for an agent with durations and signal status.
| Argument | Type | Default | Description |
|---|---|---|---|
| agent_id | string | required | Agent ID |
| limit | int | 20 | Max runs to return (max 100) |
get_agent_signals
Recent failure signals for a specific agent, with titles, explanations, and fix suggestions.
| Argument | Type | Default | Description |
|---|---|---|---|
| agent_id | string | required | Agent ID |
| limit | int | 20 | Max signals to return (max 100) |
| severity | string | — | Filter: CRITICAL, HIGH, MEDIUM, or LOW |
get_signal_detail
Full detail for a specific signal: complete evidence dict, impact statement, and all suggested fixes with code snippets.
| Argument | Type | Description |
|---|---|---|
| signal_id | int | Integer signal ID (visible in search_signals output) |
| agent_id | string | Optional; omit to search all agents |
The evidence dict in signal responses contains the SHA-256 hashed fingerprints the detector used, not the original content. Raw tool arguments never leave your agent process.
search_signals
Search signals across all agents with combined filters. Useful for cross-agent audits or time-bounded investigations.
| Argument | Type | Default | Description |
|---|---|---|---|
| severity | string | — | Filter: CRITICAL, HIGH, MEDIUM, or LOW |
| failure_type | string | — | Detector name e.g. TOOL_LOOP, COST_SPIKE, CONTEXT_BLOAT |
| since_hours | int | — | Only signals from the last N hours |
| agent_id | string | — | Restrict to one agent; searches all agents if omitted |
| limit | int | 30 | Max signals to return (max 200) |
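For clients that talk to the server directly, a search_signals call is an ordinary MCP `tools/call` request. The sketch below builds one from the arguments in the table and checks the documented severity values and limit ceiling before sending. The helper name is hypothetical, and the server may validate differently.

```python
import json

SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}
MAX_LIMIT = 200


def search_signals_request(request_id, severity=None, failure_type=None,
                           since_hours=None, agent_id=None, limit=30):
    """Build a JSON-RPC tools/call request for the search_signals tool."""
    if severity is not None and severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")

    optional = {
        "severity": severity,
        "failure_type": failure_type,
        "since_hours": since_hours,
        "agent_id": agent_id,
    }
    # Omit unset filters so the server applies its own defaults.
    arguments = {key: value for key, value in optional.items() if value is not None}
    arguments["limit"] = limit

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_signals", "arguments": arguments},
    }


if __name__ == "__main__":
    # All HIGH tool-loop signals from the last 24 hours, across every agent.
    request = search_signals_request(
        3, severity="HIGH", failure_type="TOOL_LOOP", since_hours=24
    )
    print(json.dumps(request))
```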
get_run_detail
Full event timeline for a specific run.
| Argument | Type | Description |
|---|---|---|
| run_id | string | Run UUID |
| agent_id | string | Optional |
For a failing run, the output shows the full event timeline with timestamps, token counts, and latency per LLM call. Prompt-token growth across calls is a secondary signal for context inflation.
get_instrumentation_guide
Quick-start code snippet for instrumenting an agent with Dunetrace.
| Argument | Type | Description |
|---|---|---|
| framework | string | langchain, python, typescript, tools, or otel |
Aliases accepted: langgraph, lc, ts, js, node, otlp, opentelemetry, langdock, dify, tool-calls, tracking.
Worked example: diagnosing a failing agent
This is a real diagnostic session using actual data from langchain-example-agent.
Step 1 — triage
You: Is my LangChain agent healthy?
Claude: [calls list_agents]
langchain-example-agent — 134 runs, 57 signals, health 48/100.
Dominant failure: TOOL_LOOP (48 signals, 36% of runs).
[calls summarize_agent("langchain-example-agent")]
Health is 🔴 48/100. TOOL_LOOP is the main issue — 48 signals
across 36% of runs, confidence 90% on every one. The agent is
calling web_search 6 times with identical arguments (steps 2–7)
without tracking which queries it's already tried.
Step 2 — check if it's systemic
You: Is it getting worse?
Claude: [calls get_agent_patterns("langchain-example-agent")]
Yes — confirmed SYSTEMIC (75% of runs), and escalating:
TOOL_LOOP signals went 1 → 1 → 1 → 2 → 5 → 7 over the last 7 days.
Three input hashes trigger it at 98–100% rate. It's a deterministic
bug, not flaky behaviour.
Step 3 — look at a specific failing run
You: Show me a failing run.
Claude: [calls get_agent_runs("langchain-example-agent", limit=5)]
Most recent failing run: 019e2314-6b7 (4.1s, 8 steps, 1 signal).
Clean runs have 4 steps. Failing runs always have 8 — the loop
adds exactly 4 extra steps every time.
[calls get_run_detail("019e2314-6b7")]
web_search called at steps 2, 4, 6 with identical args. Prompt
tokens growing 512 → 612 → 710 → 805 as redundant results accumulate.
Step 4 — get the fix
You: What's the fix?
Claude: [calls get_signal_detail(518, "langchain-example-agent")]
Two options:
1. Code fix — add dedup in your tool wrapper:
seen_queries = set()
def web_search(query):
    if query in seen_queries:
        return "Already searched. Try rephrasing."
    seen_queries.add(query)
    return _do_search(query)
2. Prompt addition (faster to ship):
"Do not repeat a search query you have already tried.
If a search returned no useful results, reformulate
the query before trying again."
Option 2 can be applied via Langfuse in one click from the dashboard.
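The code fix in option 1 keeps its set at module level, so queries seen in one run would also be blocked in the next. A slightly more defensive variant, sketched below, builds one wrapper per run and normalizes whitespace and case before comparing. The factory name, the normalization rule, and the stand-in search function are all assumptions for illustration.

```python
def make_deduped_tool(search_fn, normalize=lambda q: " ".join(q.lower().split())):
    """Wrap search_fn so repeated queries within one run are short-circuited.

    Create a fresh wrapper per agent run so the seen-set does not leak
    between runs.
    """
    seen = set()

    def web_search(query):
        key = normalize(query)
        if key in seen:
            # Nudge the model to reformulate instead of looping.
            return "Already searched. Try rephrasing."
        seen.add(key)
        return search_fn(query)

    return web_search


if __name__ == "__main__":
    # Stand-in for the real search backend.
    def fake_search(query):
        return f"results for {query!r}"

    web_search = make_deduped_tool(fake_search)
    print(web_search("dunetrace mcp"))     # first call reaches the backend
    print(web_search("  Dunetrace  MCP"))  # normalized duplicate is blocked
```

Whether to normalize at all is a design choice: exact-match dedup is stricter about what counts as a repeat, while normalization also catches trivially reworded retries.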
Typical workflows
Investigate a run from a Slack alert
You: Check run 019e2314-6b7
Claude: [calls get_run_detail("019e2314-6b7")]
4.1s, 8 steps. TOOL_LOOP at step 7 — web_search called 6×
with identical args. Context growing 512→805 tokens per call.
Fix: add a dedup set or prompt instruction.
Cross-agent audit
You: Which agents had issues in the last 24 hours?
Claude: [calls list_agents]
langfuse-example-agent — 47 HIGH signals (TOOL_LOOP), last 12h.
langfuse-ts-example-agent — 3 HIGH signals (TOOL_LOOP), last 21h.
Both looping on web_search. Likely the same root cause.
Before a deploy
You: Is langchain-example-agent stable enough to deploy?
Claude: [calls get_agent_patterns("langchain-example-agent")]
No — TOOL_LOOP is systemic (75% of runs) and escalating daily.
Three input hashes trigger it at 98–100%. Ship the dedup fix first.
Instrument a new agent
You: How do I add Dunetrace to my LangChain agent?
Claude: [calls get_instrumentation_guide("langchain")]
pip install 'dunetrace[langchain]'
from dunetrace import Dunetrace
from dunetrace.integrations.langchain import DunetraceCallbackHandler
dt = Dunetrace(endpoint="http://localhost:8001")
callback = DunetraceCallbackHandler(dt, agent_id="my-agent",
model="gpt-4o-mini",
tools=["web_search"])
agent.invoke(input, config={"callbacks": [callback]})
dt.shutdown()
Privacy
All data served by the MCP tools comes from the Dunetrace Customer API, which stores only hashed or structural metadata:
- Tool arguments → SHA-256 hash (shown as args_hashes)
- LLM prompts and outputs → SHA-256 hash (the raw text is never stored)
- Token counts, latency, step counts → stored as plain numbers
- Run and signal metadata → stored as plain text
The evidence dict in signal responses contains the hashed fingerprints the detector used — not the original content.
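To show why hashed fingerprints are still enough to detect repeated calls, the sketch below hashes tool arguments with SHA-256. The canonical JSON serialization (sorted keys, compact separators) is an assumption; the Dunetrace SDK may serialize arguments differently before hashing.

```python
import hashlib
import json


def args_hash(tool_args):
    """Return a SHA-256 hex digest of tool arguments.

    Assumption: arguments are serialized as canonical JSON so that key
    order does not change the fingerprint.
    """
    canonical = json.dumps(tool_args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = args_hash({"query": "weather in Paris", "top_k": 3})
    second = args_hash({"top_k": 3, "query": "weather in Paris"})
    # Identical arguments produce identical hashes, which is all a loop
    # detector needs; the query text itself is not recoverable from the digest.
    print(first == second)
```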
Troubleshooting
starlette conflict with fastapi
The mcp package pulls in starlette 1.0.0. FastAPI 0.115 and earlier cap starlette below that. FastAPI 0.136+ removed the upper bound and is fully compatible.
pip install --upgrade fastapi
Server not appearing in Claude Code / Cursor
- Confirm dunetrace-mcp is on your PATH: which dunetrace-mcp
- Confirm the Customer API is reachable: curl http://localhost:8002/health
- Restart the editor after editing the MCP config file