draft: MRTR (SEP-2322) lowlevel plumbing + handler-shape comparison by maxisbey · Pull Request #2322 · modelcontextprotocol/python-sdk

maxisbey · 2026-03-20T16:10:50Z

Draft implementation of Multi Round-Trip Requests (SEP-2322) for the Python SDK. Two commits: lowlevel plumbing, then the handler-shape comparison deck.

Counterpart to typescript-sdk#1701 — same weather-lookup tool throughout, so the diff between option files is the argument. Unlike the TS demos, the lowlevel plumbing here is real (not smuggled through JSON text blocks); every option round-trips IncompleteResult through the actual wire protocol.

Commit 1: types + lowlevel + client retry loop

| | Where | Shape |
|---|---|---|
| Types | `src/mcp/types/_types.py` | `IncompleteResult` (discriminated by `result_type`), `InputRequest`/`InputResponse` unions, `input_responses`+`request_state` folded into `RequestParams` |
| Server | `src/mcp/server/lowlevel/server.py` | `on_call_tool` return widened to `CallToolResult \| IncompleteResult \| CreateTaskResult` |
| Shared | `src/mcp/shared/session.py` | `send_request` accepts `TypeAdapter` via overload, which enables union result parsing |
| Session | `src/mcp/client/session.py` | `call_tool_mrtr()` returns the union; `call_tool()` stays narrow, raises clearly on `IncompleteResult` |
| Client | `src/mcp/client/client.py` | `call_tool()` drives the retry loop internally: dispatches embedded input requests to `elicitation_callback`/`sampling_callback`/`list_roots_callback`, retries with collected responses + echoed `request_state`. `max_mrtr_rounds=8` bound. |

The client-side delta from today's code is zero: elicitation_callback is the same function whether it fires from SSE push or MRTR retry.
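To make the retry loop concrete, here is a minimal synchronous sketch of the shape described above. It does not use the SDK: `CallToolResult` and `IncompleteResult` are simplified stand-in dataclasses, the handler signature and the `callbacks` mapping are assumptions made for illustration, and how the real client counts rounds against `max_mrtr_rounds` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallToolResult:
    # Simplified stand-in, not the SDK type.
    text: str


@dataclass
class IncompleteResult:
    # Simplified stand-in: key -> embedded input request.
    input_requests: dict
    request_state: Optional[str] = None


def call_tool(handler, arguments, callbacks, max_mrtr_rounds=8):
    """Retry until the handler returns a complete result or the bound is hit."""
    input_responses = None
    request_state = None
    for _ in range(max_mrtr_rounds):
        result = handler(arguments, input_responses, request_state)
        if isinstance(result, CallToolResult):
            return result
        # Answer every embedded request with the matching client callback.
        input_responses = {}
        for key, request in result.input_requests.items():
            callback = callbacks.get(request["method"])
            if callback is None:
                raise RuntimeError(f"no callback for {request['method']}")
            input_responses[key] = callback(request)
        # Echo the opaque state back unchanged on the retry.
        request_state = result.request_state
    raise RuntimeError(f"still incomplete after {max_mrtr_rounds} rounds")


def weather(arguments, input_responses, request_state):
    # Hypothetical handler: ask for units once, then answer.
    if not input_responses or "units" not in input_responses:
        return IncompleteResult(
            {"units": {"method": "elicitation/create", "message": "Which units?"}}
        )
    return CallToolResult(f"{arguments['location']}: 22°{input_responses['units']['u']}")


result = call_tool(
    weather,
    {"location": "Paris"},
    {"elicitation/create": lambda request: {"u": "C"}},
)
print(result.text)
```

The callback passed in is an ordinary function of the request, which is the point of the "delta is zero" claim: nothing about it depends on whether the request arrived by push or inside an incomplete result.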

Commit 2: handler-shape comparison

SDK primitives in src/mcp/server/experimental/mrtr.py:

- `MrtrCtx.once(key, fn)`: idempotency guard tracked in `request_state` (Option F)
- `ToolBuilder`: `incomplete_step(...).end_step(...).build()`; `end_step` runs exactly once regardless of round count (Option G)
- `input_response(params, key)`: sugar for the guard-first pattern
- `sse_retry_shim()` + `dispatch_by_version()`: comparison artifacts for A/D
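As an illustration of the idempotency guard, the following sketch shows one way a `once(key, fn)` helper could record completed keys in `request_state`. `OnceCtx` is a hypothetical stand-in, not `MrtrCtx`; the helper names `encode_state`/`decode_state` appear in the PR's file listing, but their bodies here (plain base64-JSON, unsigned) are assumptions.

```python
import base64
import json


def encode_state(state):
    # Assumed encoding: unsigned base64-JSON, as the demos are described.
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def decode_state(blob):
    return json.loads(base64.urlsafe_b64decode(blob)) if blob else {}


class OnceCtx:
    """Hypothetical stand-in for a context carrying a once() guard."""

    def __init__(self, request_state=None):
        self._state = decode_state(request_state)
        self._state.setdefault("done", [])

    def once(self, key, fn):
        # Skip the side effect if an earlier round already recorded it.
        if key in self._state["done"]:
            return
        fn()
        self._state["done"].append(key)

    def state(self):
        # Serialised state to return in the incomplete result.
        return encode_state(self._state)


audit_log = []

round1 = OnceCtx()
round1.once("audit", lambda: audit_log.append("Paris"))

# The client echoes request_state back; the retry re-enters the handler.
round2 = OnceCtx(round1.state())
round2.once("audit", lambda: audit_log.append("Paris"))

print(audit_log)
```

Because the guard lives entirely in the echoed state, any server instance can handle the retry, but the guard is only as trustworthy as the state blob itself.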

Option examples in examples/servers/mrtr-options/:

| Option | Author writes | SDK does | Hidden re-entry | Old client gets |
|---|---|---|---|---|
| E | MRTR-native only | Nothing | No | Result w/ default, or error |
| A | MRTR-native only | Retry-loop over SSE | Yes, safe | Full elicitation |
| B | `await elicit()` | Exception → `IncompleteResult` | Yes, unsafe | Full elicitation |
| C | One handler, `if version` branch | Version accessor | No | Full elicitation |
| D | Two handlers | Picks by version | No | Full elicitation |
| F | MRTR-native + `ctx.once` wraps | `once()` guard in request_state | No | (same as E) |
| G | Step functions + `.build()` | Step-tracking in request_state | No | (same as E) |
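For Option G, a rough sketch of the step-function idea follows. `StepBuilder` is a hypothetical, much-simplified stand-in for `ToolBuilder`: the callback signatures, the `done` flag, and the way answers are carried in `request_state` are all assumptions. It only shows why the end step cannot run until every incomplete step has an answer.

```python
import base64
import json


def _encode(state):
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def _decode(blob):
    return json.loads(base64.urlsafe_b64decode(blob)) if blob else {}


class StepBuilder:
    """Hypothetical, simplified stand-in for a step-based tool builder."""

    def __init__(self):
        self._steps = []
        self._end = None

    def incomplete_step(self, key, make_request):
        self._steps.append((key, make_request))
        return self

    def end_step(self, fn):
        self._end = fn
        return self

    def build(self):
        steps, end = list(self._steps), self._end
        if end is None:
            raise ValueError("end_step is required")

        def handler(arguments, input_responses=None, request_state=None):
            # Answers from earlier rounds travel in request_state.
            answers = _decode(request_state)
            answers.update(input_responses or {})
            for key, make_request in steps:
                if key not in answers:
                    return {
                        "done": False,
                        "input_requests": {key: make_request(arguments, answers)},
                        "request_state": _encode(answers),
                    }
            # Reached only when every step has an answer.
            return {"done": True, "text": end(arguments, answers)}

        return handler


audit_log = []


def finish(arguments, answers):
    audit_log.append(arguments["location"])  # side effect lives in the end step
    return f"{arguments['location']}: 22°{answers['units']}"


weather = (
    StepBuilder()
    .incomplete_step("units", lambda arguments, answers: "Which units?")
    .end_step(finish)
    .build()
)

round1 = weather({"location": "Paris"})
round2 = weather({"location": "Paris"}, {"units": "C"}, round1["request_state"])
print(round2["text"], audit_log)
```

There is no code "above the guard" for the author to get wrong: side effects belong in the end step, and the built handler returns early on every round that still lacks an answer.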

Testing

tests/experimental/test_mrtr.py parametrises E/F/G against the same Client + callback to prove identical wire behaviour — the server's internal choice doesn't leak. The footgun test measures audit_log count: naive handler fires twice for one tool call, F and G fire once.

tests/client/test_client.py has 8 new E2E tests covering the retry loop (single-round elicitation, multi-round with request_state accumulation, sampling/roots dispatch, round-limit, missing-callback error paths).

Not in scope

- Persistent/Tasks workflow: ServerTaskContext already does input_required; MRTR integration is a separate PR
- mrtrOnly client flag: trivial to add, not demoed
- requestState HMAC signing: called out in code comments; demos use plain base64-JSON
- High-level MCPServer integration (@server.tool decorator shape): lowlevel-first, this PR stops at Server
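Since HMAC signing of the state blob is out of scope here, the following is only a sketch of what it could look like using the standard library. The function names, the `payload.tag` token layout, and the choice of SHA-256 are assumptions for illustration, not anything this PR implements.

```python
import base64
import hashlib
import hmac
import json


def sign_state(state, secret):
    """Serialise state and append an HMAC-SHA256 tag (assumed layout: payload.tag)."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    tag = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode()
    return (payload + b"." + tag).decode()


def verify_state(blob, secret):
    """Return the decoded state, or raise ValueError if the tag does not match."""
    payload, sep, tag = blob.encode().rpartition(b".")
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest().encode()
    # compare_digest avoids leaking how many leading characters matched.
    if not sep or not hmac.compare_digest(tag, expected):
        raise ValueError("request_state failed integrity check")
    return json.loads(base64.urlsafe_b64decode(payload))


secret = b"server-side-secret"  # placeholder; a real key would come from config
token = sign_state({"done": ["audit"]}, secret)
print(verify_state(token, secret))
```

Without a check like this, a client could edit the echoed state, for example to mark a guarded side effect as already done or to skip a step.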

Exploratory — not intended to merge as-is. Open questions: which of F/G (or both) to ship as SDK primitives, whether to keep call_tool_mrtr as public or fold the union into call_tool once SEP finalises, whether sse_retry_shim belongs in the SDK at all vs docs-only.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Lowlevel plumbing for Multi Round-Trip Requests:

Types:
- IncompleteResult with result_type discriminator, input_requests, request_state
- InputRequest/InputResponse unions (elicitation, sampling, roots)
- input_responses + request_state fields on RequestParams

Server (lowlevel):
- on_call_tool return widened to include IncompleteResult

Session:
- send_request accepts TypeAdapter (overload) for union result parsing
- call_tool_mrtr() returns CallToolResult | IncompleteResult
- call_tool() stays narrow, raises on IncompleteResult with migration hint

Client:
- call_tool() drives MRTR retry loop internally: dispatches embedded input requests to elicitation/sampling/list_roots callbacks, retries with collected responses + echoed request_state
- max_mrtr_rounds bound (default 8)

The client-side delta from today's code is zero: elicitation_callback is the same function whether it fires from SSE push or MRTR retry.

Python-SDK counterpart to typescript-sdk#1701. Seven ways to write the same weather-lookup tool so the diff between files is the argument.

SDK primitives (src/mcp/server/experimental/mrtr.py):
- MrtrCtx.once(): idempotency guard tracked in request_state (Option F)
- ToolBuilder: structural step decomposition; end_step runs exactly once regardless of round count (Option G)
- input_response(): sugar for the guard-first pattern
- sse_retry_shim(): Option A comparison artifact (pragma no-cover until LATEST_PROTOCOL_VERSION bumps past the MRTR gate)
- dispatch_by_version(): Option D comparison artifact

Option examples (examples/servers/mrtr-options/):
- E (degrade-only): the SDK default. MRTR-native; pre-MRTR gets a default or error. Both quadrant rows collapse here.
- A (SSE shim): SDK emulates retry over SSE. Safe re-entry, hidden loop.
- B (await shim): exception-based. UNSAFE: hidden double-execution above await. Not a ship target; for contrast.
- C (version branch): explicit if/else in handler body.
- D (dual handler): two functions, SDK picks by version.
- F (ctx.once): idempotency guard, opt-in per side-effect.
- G (ToolBuilder): no above-the-guard zone; end_step structurally unreachable until all elicitations complete.

The invariant test (tests/experimental/test_mrtr.py) parametrises E/F/G against the same Client + callback to prove identical wire behaviour; the server's internal choice doesn't leak. The footgun test measures audit_log count to prove F and G actually hold the guard (naive handler fires twice; F and G fire once).

Both F and G depend on request_state integrity. The demos use plain base64-JSON; a production SDK MUST HMAC-sign the blob.

Two standalone reference examples before the comparison deck:
- basic.py: the simple-tool equivalent for MRTR. One IncompleteResult, one retry. Comments walk through the two moves every MRTR handler makes: check input_responses, return IncompleteResult if missing. Runnable end-to-end against the in-memory Client.
- basic_multiround.py: the ADO-rules SEP example translated. Two cascading elicitation rounds with request_state carrying accumulated context so any server instance can handle any round. Shows the key gotcha: input_responses carries only the latest round's answers, not accumulated; anything that must survive goes in request_state.
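The multi-round gotcha can be sketched without the SDK. The handler below is hypothetical (it is not the contents of `basic_multiround.py`): the handler signature, the `done` flag, and the two questions are assumptions, and the state is unsigned base64-JSON. It shows both moves, plus the need to copy a round-1 answer into `request_state` because round 3 receives only the round-2 answer.

```python
import base64
import json


def encode_state(state):
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def decode_state(blob):
    return json.loads(base64.urlsafe_b64decode(blob)) if blob else {}


def incomplete(key, message, carried):
    return {
        "done": False,
        "input_requests": {key: message},
        "request_state": encode_state(carried),
    }


def forecast(arguments, input_responses=None, request_state=None):
    carried = decode_state(request_state)
    latest = input_responses or {}  # only the most recent round's answers

    # Move 1: check input_responses, persisting what later rounds will need.
    if "units" in latest:
        carried["units"] = latest["units"]

    # Move 2: return an incomplete result if something is still missing.
    if "units" not in carried:
        return incomplete("units", "Which units?", carried)
    if "days" not in latest:
        return incomplete("days", f"How many days, in °{carried['units']}?", carried)

    text = f"{arguments['location']}: {latest['days']}-day forecast in °{carried['units']}"
    return {"done": True, "text": text}


args = {"location": "Paris"}
r1 = forecast(args)
r2 = forecast(args, {"units": "C"}, r1["request_state"])
r3 = forecast(args, {"days": 3}, r2["request_state"])
print(r3["text"])
```

If the handler read `units` from `input_responses` in the final round instead of from the carried state, the lookup would fail, because that round's responses contain only `days`.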

mrtr.py → mrtr/
├── __init__.py: package docstring + re-exports
├── _state.py: encode_state/decode_state + input_response helper
├── context.py: MrtrCtx (Option F, ship target)
├── builder.py: ToolBuilder (Option G, ship target)
└── compat.py: sse_retry_shim + dispatch_by_version (comparison artifacts)

Ship targets (F/G) now live separately from the dual-path compat shims. All imports from mcp.server.experimental.mrtr unchanged.

The Option B footgun was: await elicit() looks like a suspension point but is actually a re-entry point, so everything above it runs twice. Option H fixes that by making it a REAL suspension point: the coroutine frame is held in a ContinuationStore across MRTR rounds, keyed by request_state.

Handler code stays exactly as it was in the SSE era:

    async def my_tool(ctx: LinearCtx, location: str) -> str:
        audit_log(location)  # runs exactly once
        units = await ctx.elicit("Which units?", UnitsSchema)
        return f"{location}: 22°{units.u}"

The wrapper linear_mrtr(my_tool, store=...) translates this into a standard MRTR on_call_tool handler. Round 1 starts the coroutine; elicit() sends IncompleteResult back through the wrapper and parks on a stream. Round 2's retry wakes it with the answer. The coroutine continues from where it stopped: no re-entry, no double-execution.

Trade-off: server holds the frame in memory between rounds. Client sees pure MRTR (no SSE, independent requests), but server is stateful within a single tool call. Horizontally-scaled deployments need sticky routing on the request_state token. Same operational shape as Option A's SSE hold, without the long-lived connection.

SDK pieces (src/mcp/server/experimental/mrtr/linear.py):
- LinearCtx with async elicit(message, PydanticSchema) -> instance
- ContinuationStore: owns the task group, TTL-based frame expiry
- linear_mrtr(handler, store=...): the wrapper
- ElicitDeclined raised when user declines/cancels

7 E2E tests including the key assertion: side-effects above await fire exactly once (the test measures audit_log count).
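The held-frame idea can be illustrated with plain generators in place of the coroutine, task group, and stream the PR describes. This is a swapped-in technique chosen for brevity: `GeneratorStore`, its `handle` signature, and the random token are hypothetical, and there is no TTL expiry or decline handling.

```python
import uuid

audit_log = []


def my_tool(location):
    audit_log.append(location)    # side effect above the suspension point
    units = yield "Which units?"  # frame parks here until the next round
    return f"{location}: 22°{units}"


class GeneratorStore:
    """Hypothetical stand-in for a continuation store: frames keyed by token."""

    def __init__(self):
        self._frames = {}

    def handle(self, tool, arguments, answer=None, request_state=None):
        try:
            if request_state is None:
                # Round 1: start the frame and run to its first question.
                frame = tool(**arguments)
                question = next(frame)
                token = uuid.uuid4().hex
            else:
                # Later rounds: resume the parked frame with the answer.
                frame = self._frames.pop(request_state)
                question = frame.send(answer)
                token = request_state
        except StopIteration as finished:
            return {"done": True, "text": finished.value}
        self._frames[token] = frame
        return {"done": False, "question": question, "request_state": token}


store = GeneratorStore()
r1 = store.handle(my_tool, {"location": "Paris"})
r2 = store.handle(
    my_tool, {"location": "Paris"}, answer="C", request_state=r1["request_state"]
)
print(r2["text"], audit_log)
```

The frame lives in one process's memory, so a retry that lands on a different server instance would find no frame for the token. That is the sticky-routing trade-off in miniature.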

halter73 · 2026-03-21T20:49:00Z

@maxisbey I had to do a double take. What sorcery did you use to get this PR number (#2322) to match the SEP (#2322)!?

maxisbey added 5 commits March 20, 2026 15:54

felixweinberger mentioned this pull request Mar 20, 2026

examples: MRTR dual-path options for 2025-11 client compat modelcontextprotocol/typescript-sdk#1701

maxisbey wants to merge 5 commits into main from mrtr-draft
