Technical Guide · Companion to Build Tutorial

How to Test Your MCP Server: Validation, Debugging, and Scoring Impact

You built your MCP server. Now how do you know it actually works? A broken MCP server is worse than no MCP server — agents will try to connect, fail, and mark your business as unreliable. This guide covers five validation methods, the six most common bugs, and how testing translates directly to your Agent Readiness Score.

AgentHermes Research

April 15, 2026 · 13 min read

Why MCP Testing Is Not Optional

A website with a broken contact form is annoying. An MCP server with a broken tool is catastrophic for agent trust. Here is why: when a human hits a broken form, they try again or call you. When an AI agent hits a broken tool, it marks your server as unreliable and deprioritizes you in future queries. There is no second chance — agents have perfect memory and zero patience.

Our scan data from 500+ businesses shows that 40% of deployed MCP servers have at least one broken tool. The most common failure: tools that work during development but break in production due to environment differences, missing auth, or transport configuration issues. Testing is not about quality — it is about agent trust.

40%

MCP servers with broken tools

+22

Score boost from working MCP

0

Second chances from agents

5 Methods to Validate Your MCP Server

Test from the outside in. Start with visual inspection (MCP Inspector), then real-world agent testing (Claude Desktop), then protocol-level verification (curl), then automated regression (Jest), and finally score impact (AgentHermes scan).

Step 1

MCP Inspector — Visual Validation

The MCP Inspector (npx @modelcontextprotocol/inspector) connects to your server and displays all tools, resources, and prompts in a browser UI. You can call tools interactively, see response schemas, and verify descriptions. This is your first test: if Inspector cannot connect or lists fewer tools than you defined, agents will hit the same problem.

npx @modelcontextprotocol/inspector

Step 2

Claude Desktop — Real Agent Testing

Add your MCP server to Claude Desktop's configuration file (claude_desktop_config.json). Then ask Claude to use your tools naturally: "Check availability at my business" or "Get a quote for lawn care." Claude will discover your tools, call them, and show you the results. This is the closest simulation to how real agents will interact with your server.

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows)
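A minimal sketch of what goes into that file, assuming a stdio server started with Node. The "mcpServers" key and its command/args/env fields are the shape Claude Desktop reads; the server name, script path, and environment variable below are placeholders, not values from this guide.

```typescript
// Shape of one entry under "mcpServers" in claude_desktop_config.json.
interface ServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface DesktopConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Adds or replaces one server entry without touching the others.
function addServer(config: DesktopConfig, name: string, entry: ServerEntry): DesktopConfig {
  return { mcpServers: { ...config.mcpServers, [name]: entry } };
}

// Hypothetical values: replace the name, path, and variable with your own.
const desktopConfig = addServer({ mcpServers: {} }, "my-business", {
  command: "node",
  args: ["/absolute/path/to/build/index.js"],
  env: { BACKEND_API_TOKEN: "replace-me" },
});

// Paste the printed JSON into the config file, then restart Claude Desktop.
console.log(JSON.stringify(desktopConfig, null, 2));
```

Use an absolute path for the script, since Claude Desktop does not start the server from your project directory.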

Step 3

curl for JSON-RPC 2.0 Verification

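MCP uses JSON-RPC 2.0 over stdio or HTTP (Streamable HTTP in current spec revisions, HTTP with SSE in older ones). For HTTP servers, use curl to send raw JSON-RPC requests. Test the initialize handshake, tools/list, and individual tool calls, in that order, because a server may reject tools/list until initialize has completed. This verifies your server handles the protocol correctly at the lowest level, with no client abstractions hiding bugs.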

curl -X POST http://localhost:3000/mcp -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
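The sketch below builds the three request bodies in the order you would send them and checks the envelope of whatever comes back. It makes no network calls; paste each printed line into the -d argument of the curl command. The protocol version string and client name are assumptions, and get_services is only an example tool name.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Builds a JSON-RPC 2.0 request with a fresh numeric id.
function rpc(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

const handshake: JsonRpcRequest[] = [
  rpc("initialize", {
    protocolVersion: "2024-11-05", // assumption: use the revision your server implements
    capabilities: {},
    clientInfo: { name: "manual-test", version: "0.0.1" },
  }),
  // After initialize succeeds, a client also sends the "notifications/initialized"
  // notification (no id, no response) before making further requests.
  rpc("tools/list"),
  rpc("tools/call", { name: "get_services", arguments: {} }),
];

// Returns a list of problems with a parsed response; an empty list means the envelope is valid.
function checkResponse(request: JsonRpcRequest, response: unknown): string[] {
  if (typeof response !== "object" || response === null) return ["response is not a JSON object"];
  const r = response as Record<string, unknown>;
  const problems: string[] = [];
  if (r.jsonrpc !== "2.0") problems.push('jsonrpc must be "2.0"');
  if (r.id !== request.id) problems.push(`id must echo ${request.id}`);
  if ("result" in r === "error" in r) problems.push("exactly one of result or error is required");
  return problems;
}

for (const message of handshake) console.log(JSON.stringify(message));
```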

Step 4

Automated Test Suite with Jest

Write Jest tests that import your MCP server handlers directly. Test each tool with valid inputs, invalid inputs, missing required fields, and edge cases. Assert response schemas match your tool definitions. Run in CI so every commit validates the MCP contract. This catches regressions before agents hit them.

npx jest --testPathPattern=mcp
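As a rough illustration of the cases worth covering, here is a table-driven check written without any test framework so it runs on its own. The getQuote handler, its fields, and its pricing rule are all hypothetical stand-ins for a handler you would import from your server module; in Jest, each row would become one test() with expect() assertions.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical handler under test.
function getQuote(input: Record<string, unknown>): ToolResult {
  const { service, areaSqFt } = input;
  if (typeof service !== "string" || service.length === 0) {
    return { content: [{ type: "text", text: "Error: service is required" }], isError: true };
  }
  if (typeof areaSqFt !== "number" || !(areaSqFt > 0)) {
    return { content: [{ type: "text", text: "Error: areaSqFt must be a positive number" }], isError: true };
  }
  const price = Math.round(areaSqFt * 0.05 * 100) / 100; // made-up pricing rule
  return { content: [{ type: "text", text: JSON.stringify({ service, price }) }] };
}

// One row per scenario: valid input, invalid input, missing required field, edge case.
const quoteCases: { name: string; input: Record<string, unknown>; expectError: boolean }[] = [
  { name: "valid input", input: { service: "lawn care", areaSqFt: 2000 }, expectError: false },
  { name: "invalid type", input: { service: "lawn care", areaSqFt: "big" }, expectError: true },
  { name: "missing required field", input: { areaSqFt: 2000 }, expectError: true },
  { name: "edge case: zero area", input: { service: "lawn care", areaSqFt: 0 }, expectError: true },
];

// Returns the names of failing cases; an empty array means every case passed.
function runQuoteCases(): string[] {
  const failures: string[] = [];
  for (const c of quoteCases) {
    const result = getQuote(c.input);
    const gotError = result.isError === true;
    const wellFormed = result.content.length > 0 && result.content.every((part) => part.type === "text");
    if (gotError !== c.expectError || !wellFormed) failures.push(c.name);
  }
  return failures;
}

console.log(runQuoteCases());
```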

Step 5

AgentHermes Scan — D2 Scoring Impact

Run an AgentHermes scan on your domain after deploying the MCP server. The scanner detects MCP endpoints, checks tool count and quality, verifies SSE transport, and measures the impact on your D2 (API) dimension score. A working MCP server with 5+ well-described tools typically adds 15-25 points to your overall Agent Readiness Score.

Visit agenthermes.ai/audit and enter your domain

6 Most Common MCP Server Bugs

These are the bugs we see most frequently when scanning MCP servers. Each one silently degrades agent trust. Check your server against this list before deploying.

Wrong method names in tool definitions

Critical

Symptom

Inspector shows tools but agents get "method not found" errors when calling them

Fix

Tool name in the definition must exactly match the handler function name. Check for typos, underscores vs hyphens, and case sensitivity.
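One way to catch this before deploying is a startup check that compares the advertised tool names with the handler map. This is a sketch under the assumption that your server keeps definitions and handlers in two separate structures; the names and shapes here are illustrative, not part of any SDK.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
}

type Handler = (input: Record<string, unknown>) => unknown;

// Reports tools that are advertised without a handler, and handlers that are never advertised.
function findNameMismatches(definitions: ToolDefinition[], handlers: Record<string, Handler>) {
  const defined = new Set(definitions.map((d) => d.name));
  const missingHandlers = [...defined].filter(
    (name) => !Object.prototype.hasOwnProperty.call(handlers, name),
  );
  const orphanHandlers = Object.keys(handlers).filter((name) => !defined.has(name));
  return { missingHandlers, orphanHandlers };
}

// Example with a hyphen vs underscore typo.
const exampleDefinitions: ToolDefinition[] = [
  { name: "get_services", description: "Lists the services the business offers." },
  { name: "check-availability", description: "Checks open appointment slots for a date." },
];

const exampleHandlers: Record<string, Handler> = {
  get_services: () => [],
  check_availability: () => true,
};

const mismatches = findNameMismatches(exampleDefinitions, exampleHandlers);
console.log(mismatches);
```

Failing startup when either list is non-empty turns a silent production bug into an immediate local error.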

Missing error handling on tool calls

High

Symptom

Agent gets a raw stack trace or empty response instead of a structured error

Fix

Wrap every tool handler in try/catch. Return { content: [{ type: "text", text: "Error: descriptive message" }], isError: true } on failure. Never expose internal errors.
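A minimal sketch of that wrapper, assuming handlers are async functions that return the content/isError shape quoted above. The function and type names are illustrative. The full error goes to a server-side logger, and the agent only sees a generic message.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type ToolHandler = (input: Record<string, unknown>) => Promise<ToolResult>;
type ErrorLogger = (message: string, error: unknown) => void;

// Wraps a handler so that any thrown error becomes a structured tool error.
function withErrorHandling(
  toolName: string,
  handler: ToolHandler,
  // Default logs to stderr, which keeps stdout clean for stdio transport.
  log: ErrorLogger = (message, error) => console.error(message, error),
): ToolHandler {
  return async (input) => {
    try {
      return await handler(input);
    } catch (error) {
      log(`[${toolName}] tool call failed`, error);
      const result: ToolResult = {
        content: [{ type: "text", text: `Error: ${toolName} could not complete the request.` }],
        isError: true,
      };
      return result;
    }
  };
}

// Usage: wrap each handler once, where it is registered.
const safeGetServices = withErrorHandling("get_services", async () => {
  return { content: [{ type: "text", text: "[]" }] };
});
```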

Auth not forwarded from MCP to backend

High

Symptom

Tools work in local testing but return 401 in production. Inspector works but Claude Desktop fails.

Fix

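If your MCP tools call authenticated backend APIs, your server must attach valid credentials to every backend request. For user-scoped access, pass the token supplied by the MCP client through to the backend. For service-to-service access, read a service credential from environment variables rather than reusing user tokens, and confirm those variables are set in every environment where the server runs.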
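For the service-to-service case, a small fail-fast helper makes a missing credential obvious at startup instead of surfacing as a 401 on the first tool call. The variable name BACKEND_API_TOKEN and the Bearer scheme are assumptions; substitute whatever your backend expects.

```typescript
type Env = Record<string, string | undefined>;

// Builds the headers for backend calls, or throws if the credential is not configured.
function backendAuthHeaders(env: Env): Record<string, string> {
  const token = env.BACKEND_API_TOKEN; // hypothetical variable name
  if (!token) {
    throw new Error("BACKEND_API_TOKEN is not set; backend calls would return 401");
  }
  return { Authorization: `Bearer ${token}` };
}

// In a Node server you would pass process.env here, once, at startup.
const exampleHeaders = backendAuthHeaders({ BACKEND_API_TOKEN: "example-token" });
console.log(Object.keys(exampleHeaders));
```

When the server is launched by a desktop client, the variable has to be supplied through that client's server configuration, since the server does not inherit the shell you developed in.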

SSE transport not sending keep-alive

Medium

Symptom

Connection drops after 30-60 seconds of inactivity. Tools work for first call then fail.

Fix

Send SSE comments (": keep-alive\n\n") every 15-30 seconds. Most reverse proxies (nginx, Cloudflare) time out idle SSE connections, so also set the proxy's read timeout to at least 300s.
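A sketch of the keep-alive timer, kept independent of any web framework. The write parameter stands for whatever sends a string down the open SSE response (for example, a bound res.write in Node's http module); that wiring is assumed, not shown.

```typescript
// An SSE comment line: starts with a colon, ends with a blank line, ignored by clients.
const KEEP_ALIVE_FRAME = ": keep-alive\n\n";

// Starts sending keep-alive comments and returns a function that stops them.
function startKeepAlive(write: (chunk: string) => void, intervalMs = 15_000): () => void {
  const timer = setInterval(() => write(KEEP_ALIVE_FRAME), intervalMs);
  return () => clearInterval(timer);
}

// Usage: start when the SSE connection opens, stop when it closes.
const sentFrames: string[] = [];
const stopKeepAlive = startKeepAlive((chunk) => { sentFrames.push(chunk); }, 10);
setTimeout(stopKeepAlive, 100);
```

Call the returned stop function in the connection's close handler; otherwise the timer keeps writing to a dead socket.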

Tool input schemas missing required fields

High

Symptom

Agent sends partial data and tool returns garbage instead of a validation error

Fix

Define inputSchema with JSON Schema "required" array for every mandatory field. Validate inputs before processing. Return clear error messages listing which fields are missing.
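To illustrate, here is a hand-rolled required-field check that runs before any tool logic. It only covers the "required" array, not full JSON Schema validation, and the check_availability schema and its fields are hypothetical.

```typescript
interface InputSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
}

// Hypothetical schema for a check_availability tool.
const availabilitySchema: InputSchema = {
  type: "object",
  properties: {
    date: { type: "string", description: "Date to check, in YYYY-MM-DD format" },
    service: { type: "string", description: "Service name, for example lawn care" },
    partySize: { type: "number" },
  },
  required: ["date", "service"],
};

// Returns the required fields that are absent or null in the input.
function missingRequiredFields(schema: InputSchema, input: Record<string, unknown>): string[] {
  return (schema.required ?? []).filter(
    (field) => input[field] === undefined || input[field] === null,
  );
}

// Returns a message naming every missing field, or null when the input is complete.
function requiredFieldError(schema: InputSchema, input: Record<string, unknown>): string | null {
  const missing = missingRequiredFields(schema, input);
  return missing.length === 0 ? null : `Missing required fields: ${missing.join(", ")}`;
}

console.log(requiredFieldError(availabilitySchema, { partySize: 2 }));
// prints "Missing required fields: date, service"
```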

Tool descriptions too vague for agent discovery

Medium

Symptom

Agent has access to your tools but never calls them because it does not understand when to use them

Fix

Tool descriptions must answer: What does this tool do? When should an agent use it? What will it return? Include example use cases. Bad: "Get info." Good: "Returns business hours, address, phone number, and service area for the specified business location."

Debugging Techniques

When a tool fails and the error is not obvious, use these techniques to isolate the problem. The goal is always the same: see the exact JSON-RPC message the client sends and the exact response your server returns.

Enable verbose logging

Set DEBUG=mcp:* or your framework's verbose flag. Log every incoming JSON-RPC message and outgoing response. This shows exactly what the client sends and what your server returns — indispensable for protocol-level debugging.

Check SSE transport headers

For HTTP/SSE servers, verify Content-Type is "text/event-stream" and Cache-Control is "no-cache". Over HTTP/1.1, also verify Connection is "keep-alive"; HTTP/2 does not use the Connection header, so skip that check there. Missing headers cause silent failures in some MCP clients.
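The header check can be scripted so it runs against a captured response in CI. This sketch takes a plain object of response headers, however you obtained them; the function name and the http1 flag are illustrative. The Connection check is switched off for HTTP/2, where that header is not used.

```typescript
// Returns a list of problems with SSE response headers; an empty list means they look right.
function checkSseHeaders(headers: Record<string, string>, http1 = true): string[] {
  // Header names are case-insensitive, so normalise before comparing.
  const normalised: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    normalised[name.toLowerCase()] = value.toLowerCase();
  }

  const problems: string[] = [];
  if (!(normalised["content-type"] ?? "").startsWith("text/event-stream")) {
    problems.push('Content-Type should be "text/event-stream"');
  }
  if (!(normalised["cache-control"] ?? "").includes("no-cache")) {
    problems.push('Cache-Control should include "no-cache"');
  }
  if (http1 && normalised["connection"] !== "keep-alive") {
    problems.push('Connection should be "keep-alive" over HTTP/1.1');
  }
  return problems;
}

console.log(checkSseHeaders({ "Content-Type": "application/json" }));
```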

Validate tool schemas with ajv

Install ajv (JSON Schema validator) and validate your tool inputSchema definitions against the JSON Schema draft-07 spec. Invalid schemas silently break agent input validation.

Test with multiple MCP clients

If your server works in Inspector but not Claude Desktop, the bug is in transport or auth handling — not tool logic. Test with at least two different clients to isolate client-specific issues.

Monitor with structured logging

Log tool calls as JSON objects: { tool: "get_services", input: {...}, duration_ms: 142, success: true }. Aggregate these to find slow tools, high error rates, and unused tools that agents ignore.
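A sketch of a wrapper that emits one log entry per call in the shape above, plus a small aggregation over collected entries. The sink parameter is an assumption: it could write JSON lines to stderr or forward to your logging service.

```typescript
interface ToolCallLog {
  tool: string;
  input: unknown;
  duration_ms: number;
  success: boolean;
}

// Wraps a handler so every call produces exactly one log entry, whether it succeeds or throws.
function withLogging<I, O>(
  tool: string,
  handler: (input: I) => Promise<O>,
  sink: (entry: ToolCallLog) => void,
): (input: I) => Promise<O> {
  return async (input) => {
    const started = Date.now();
    let success = false;
    try {
      const output = await handler(input);
      success = true;
      return output;
    } finally {
      sink({ tool, input, duration_ms: Date.now() - started, success });
    }
  };
}

// Computes the share of failed calls per tool from collected entries.
function errorRateByTool(entries: ToolCallLog[]): Record<string, number> {
  const totals: Record<string, { calls: number; failures: number }> = {};
  for (const entry of entries) {
    if (!totals[entry.tool]) totals[entry.tool] = { calls: 0, failures: 0 };
    totals[entry.tool].calls += 1;
    if (!entry.success) totals[entry.tool].failures += 1;
  }
  const rates: Record<string, number> = {};
  for (const [tool, t] of Object.entries(totals)) rates[tool] = t.failures / t.calls;
  return rates;
}
```

Tools that are advertised but never appear in the aggregated output are the unused ones worth a second look at their descriptions.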

The most important debugging insight: if your server works in MCP Inspector but fails with a real agent, the bug is almost always in transport or auth — not in your tool logic. Inspector often runs locally via stdio, while production agents connect via HTTP/SSE through proxies and load balancers that can interfere with the connection. Always test the full production path, not just local.

How Testing Impacts Your Agent Readiness Score

A working MCP server directly impacts three of the nine scoring dimensions. Here is the breakdown from our scoring methodology:

D2: API Quality

15% weight

MCP tools count as structured API endpoints. 5+ working tools with proper schemas = 70+ on this dimension.

D8: Reliability

13% weight

Consistent responses, proper error handling, and uptime. Broken tools that return 500s actively hurt this score.

D9: Agent Experience

10% weight

MCP is the gold standard for agent experience. Having an MCP server at all puts you in the top 1% of businesses.

The scoring math: A business with no MCP server scores 0 on D9 (Agent Experience). Adding a working MCP server with 5 tools jumps D9 to 60-80. With D9 weighted at 10%, that is a direct 6-8 point boost to your total score. Combined with D2 and D8 improvements, expect a 15-25 point total increase from a properly tested MCP server. Run a free scan at agenthermes.ai/audit to see your before and after.
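The D9 arithmetic can be written out as a weighted sum, assuming the total score is a straight weighted average of dimension scores, which is how the figures above behave. Only the D9 numbers come from this guide; to estimate a full delta you would add rows for D2 and D8 using your own before and after scores.

```typescript
interface DimensionChange {
  dimension: string;
  weightPct: number; // dimension weight as a percentage of the total score
  before: number;    // dimension score before the change, 0-100
  after: number;     // dimension score after the change, 0-100
}

// Sums each dimension's score change multiplied by its weight.
function totalScoreDelta(changes: DimensionChange[]): number {
  return changes.reduce((sum, c) => sum + ((c.after - c.before) * c.weightPct) / 100, 0);
}

const d9LowEnd = totalScoreDelta([{ dimension: "D9", weightPct: 10, before: 0, after: 60 }]);
const d9HighEnd = totalScoreDelta([{ dimension: "D9", weightPct: 10, before: 0, after: 80 }]);

console.log(d9LowEnd, d9HighEnd); // prints 6 8
```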

Frequently Asked Questions

How do I test an MCP server that uses stdio transport?

For stdio-based MCP servers, the MCP Inspector is your primary testing tool — it handles stdio communication automatically. For automated testing, import your server module directly in Jest and call handler functions. You cannot use curl with stdio servers since they communicate via stdin/stdout, not HTTP. If you need HTTP-based testing, consider adding SSE transport as an alternative — most production deployments benefit from having both.

How many tools should my MCP server expose?

Quality matters more than quantity, but 5-8 tools is the sweet spot for most businesses. Too few (1-2) means agents cannot do much. Too many (20+) means agents struggle to pick the right tool. Our scan data shows the highest-scoring MCP servers have 5-10 well-described tools with clear use cases. Each tool should do one thing well with typed inputs and outputs.

Will testing my MCP server improve my Agent Readiness Score?

Testing itself does not directly change your score, but fixing the bugs you find does. A business with a broken MCP server that returns errors will score lower than a business with no MCP server at all, because the scanner detects failed endpoints. The D2 (API) dimension rewards working, well-documented endpoints. The D8 (Reliability) dimension rewards consistent uptime and proper error responses. Testing ensures both dimensions score well.

How often should I re-test my MCP server?

Run automated Jest tests on every commit in CI. Run an AgentHermes scan monthly or after any significant change to tools, schemas, or transport. Manual testing with MCP Inspector is most valuable when adding new tools or changing existing ones. The most common failure pattern is a code change that breaks an existing tool without anyone noticing — agents silently stop using it.
