Understanding Sampling


So far, the intelligence in the Client-Server relationship has flowed one way. The Client (Host) has the “Brain” (the LLM), and the Server has the “Brawn” (Tools/Data).

But sometimes, the Server needs a little bit of intelligence to finish its job.

Consider a Log Archival Server. Its job is to save error logs to a file. However, simply saving raw, cryptic error codes isn’t very useful. You want the server to:

  1. Read the raw error.
  2. Understand and summarize it into plain English.
  3. Save the explained version to a report file.

To do step #2, the server needs an LLM. Instead of buying a separate API key for the server, MCP allows it to use Sampling to “borrow” the Client’s LLM connection.

The Architecture of Sampling

Sampling reverses the usual flow of control. Normally, the Client calls the Server’s tools. With sampling, the Server, while handling a tool call, sends a request back to the Client asking for an LLM completion. The Client forwards that request to its model and returns the generated text, so the Server’s tool can finish its work without ever holding an API key of its own.
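Concretely, the Server sends a `sampling/createMessage` request back over the same connection mid-tool-call. A rough sketch of that request’s shape as a Python dict (the MCP SDK builds and parses this for you; the prompt text here is illustrative):

```python
# Approximate shape of the JSON-RPC request a server sends to the
# client during sampling. You never construct this by hand; the SDK
# does it when you call ctx.session.create_message(...).
sampling_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Explain this log error..."},
            }
        ],
        "maxTokens": 100,
    },
}

print(sampling_request["method"])
```

The Client answers with a result containing the assistant’s text, which the SDK hands back to the server as the return value of `create_message`.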

Implementing the Server

You will build a tool called process_log_entry. It takes a raw log line, asks the Client to explain it, and then appends that explanation to a local file.

from mcp.server.fastmcp import FastMCP, Context
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("Log-Archiver")

@mcp.tool()
async def process_log_entry(log_line: str, ctx: Context) -> str:
    """
    Analyzes a log entry using the Client's LLM and saves the report to a file.
    """
    prompt = f"You are a Site Reliability Engineer. Explain this log error in one concise sentence: {log_line}"

    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )

    if result.content.type == "text":
        llm_explanation = result.content.text
    else:
        llm_explanation = str(result.content)

    report_file = "incident_report.txt"
    with open(report_file, "a") as f:
        f.write(f"--- INCIDENT REPORT ---\n")
        f.write(f"RAW: {log_line}\n")
        f.write(f"ANALYSIS: {llm_explanation}\n")
        f.write("-" * 30 + "\n")

    return f"Success: Log analyzed and appended to '{report_file}'."

if __name__ == "__main__":
    mcp.run(transport="streamable-http")

What You Are Building

You are building a server that asks the client to use its LLM to explain a log entry, then saves that explanation to a local report file. The server never talks to the model directly; it requests a sampling response from the client.

Implementing the Sampling Client

The Client acts as the bridge. It connects to the server and listens for sampling requests. When the server asks for help, the client forwards the request to the actual Anthropic API.

import asyncio
import os
from anthropic import Anthropic
from mcp import ClientSession, types
from mcp.client.session import RequestContext
from mcp.client.streamable_http import streamablehttp_client

SERVER_URL = "http://127.0.0.1:8000/mcp"

llm_client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

async def sampling_handler(
    context: RequestContext,
    params: types.CreateMessageRequestParams
) -> types.CreateMessageResult:
    """
    This function triggers when the Server asks for an LLM completion.
    We bridge the request to the real Anthropic API.
    """
    server_prompt = params.messages[0].content.text
    print(f"\n[Client] Server requested LLM generation for: '{server_prompt}'")

    print(f"[Client] Forwarding request to Claude...")

    # NOTE: this Anthropic SDK call is synchronous and blocks the event
    # loop; for production, prefer anthropic.AsyncAnthropic with `await`.
    message = llm_client.messages.create(
        max_tokens=params.maxTokens or 1024,
        messages=[
            {
                "role": "user",
                "content": server_prompt,
            }
        ],
        model="claude-sonnet-4-5-20250929",
    )

    ai_response = message.content[0].text
    print(f"[Client] Received answer from Claude: '{ai_response}'")

    return types.CreateMessageResult(
        model=message.model,  # echo the model that actually served the request
        role="assistant",
        content=types.TextContent(type="text", text=ai_response)
    )

async def run_client():
    print(f"Connecting to Server at {SERVER_URL}...")

    async with streamablehttp_client(SERVER_URL) as (read, write, _):
        async with ClientSession(
            read,
            write,
            sampling_callback=sampling_handler
        ) as session:
            await session.initialize()
            print("Connected.\n")

            print("--- Test: Archiving Complex Error ---")

            complex_error = (
                "2025-12-21 14:02:11 UTC [821] ERROR:  deadlock detected "
                "DETAIL:  Process 821 waits for ShareLock on transaction 456; process 999 waits for ShareLock on transaction 821. "
                "HINT:  See server log for query details."
            )

            result = await session.call_tool(
                "process_log_entry",
                arguments={"log_line": complex_error}
            )

            print(f"\nFinal Tool Result:\n{result.content[0].text}")

if __name__ == "__main__":
    asyncio.run(run_client())

How the Server Script Works

  • ctx.session.create_message(...) triggers a sampling request to the client.
  • The server waits for the client to return an LLM response.
  • The server writes the explanation to incident_report.txt.
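The report-writing step is easy to unit-test if you factor the formatting out of the tool. A minimal sketch that reproduces the server’s output format (the `format_report` helper is hypothetical, not part of the tutorial code):

```python
def format_report(log_line: str, explanation: str) -> str:
    """Build one incident-report entry matching the server's output format."""
    return (
        "--- INCIDENT REPORT ---\n"
        f"RAW: {log_line}\n"
        f"ANALYSIS: {explanation}\n"
        + "-" * 30 + "\n"
    )

entry = format_report(
    "ERROR: deadlock detected",
    "Two transactions blocked each other waiting for locks.",
)
print(entry)
```

The tool body would then shrink to `f.write(format_report(log_line, llm_explanation))`, keeping the sampling logic and the file format independently testable.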

How the Client Script Works

  • sampling_handler(...) is the callback the server triggers for sampling.
  • It forwards the prompt to Anthropic and returns the response.
  • sampling_callback=... is what makes the client “sampling-aware.”
  • The API key is read from ANTHROPIC_API_KEY.
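Note that `sampling_handler` reads only `params.messages[0]`. If a server ever sends a multi-message conversation, you could convert the whole list into Anthropic’s message format. A minimal sketch, using plain dataclasses as stand-ins for the MCP types so it runs on its own (`flatten_messages` is a hypothetical helper):

```python
from dataclasses import dataclass

# Stand-ins for mcp.types.TextContent / SamplingMessage, for illustration.
@dataclass
class TextContent:
    type: str
    text: str

@dataclass
class SamplingMessage:
    role: str
    content: TextContent

def flatten_messages(messages):
    """Convert MCP sampling messages into Anthropic-style message dicts."""
    out = []
    for m in messages:
        if m.content.type == "text":  # skip non-text content for simplicity
            out.append({"role": m.role, "content": m.content.text})
    return out

msgs = [
    SamplingMessage("user", TextContent("text", "Explain this error.")),
    SamplingMessage("assistant", TextContent("text", "It is a deadlock.")),
]
print(flatten_messages(msgs))
```

Inside the real handler, you would pass `flatten_messages(params.messages)` as the `messages` argument to `llm_client.messages.create(...)`.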

Run It

  1. Install the Anthropic SDK:

uv add anthropic

  2. Set your API key. On macOS/Linux:

export ANTHROPIC_API_KEY="YOUR_KEY_HERE"

     On Windows (PowerShell):

$env:ANTHROPIC_API_KEY="YOUR_KEY_HERE"

  3. Start the server:

uv run python sampling_server.py

  4. In a second terminal, run the client:

uv run python sampling_client.py