
draft: MRTR (SEP-2322) lowlevel plumbing + handler-shape comparison #2322

maxisbey wants to merge 5 commits into main from mrtr-draft

@maxisbey

Draft implementation of Multi Round-Trip Requests (SEP-2322) for the Python SDK. Two commits: lowlevel plumbing, then the handler-shape comparison deck.

Counterpart to typescript-sdk#1701 — same weather-lookup tool throughout, so the diff between option files is the argument. Unlike the TS demos, the lowlevel plumbing here is real (not smuggled through JSON text blocks); every option round-trips IncompleteResult through the actual wire protocol.

Commit 1: types + lowlevel + client retry loop

| Where | Shape |
|---|---|
| Types (src/mcp/types/_types.py) | IncompleteResult (discriminated by result_type), InputRequest/InputResponse unions, input_responses + request_state folded into RequestParams |
| Server (src/mcp/server/lowlevel/server.py) | on_call_tool return widened to the union of CallToolResult, IncompleteResult and CreateTaskResult |
| Shared (src/mcp/shared/session.py) | send_request accepts a TypeAdapter via an overload, which enables union result parsing |
| Session (src/mcp/client/session.py) | call_tool_mrtr() returns the union; call_tool() stays narrow and raises a clear error on IncompleteResult |
| Client (src/mcp/client/client.py) | call_tool() drives the retry loop internally: it dispatches embedded input requests to elicitation_callback/sampling_callback/list_roots_callback, then retries with the collected responses plus the echoed request_state. Bounded by max_mrtr_rounds=8. |

The client-side delta from today's code is zero: elicitation_callback is the same function whether it fires from SSE push or MRTR retry.
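
The retry loop that Client.call_tool() runs can be sketched in plain Python. This is a minimal synchronous sketch, not the SDK's code: IncompleteResult and CallToolResult are simplified stand-in dataclasses, `send` stands in for one wire round-trip, and the callbacks are hypothetical one-argument functions keyed by request kind (the SDK's real elicitation_callback and friends are async and have their own signatures). It shows the two properties described above: each retry carries only that round's responses plus the echoed request_state, and the loop gives up after max_mrtr_rounds (8).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class IncompleteResult:  # simplified stand-in, not mcp.types.IncompleteResult
    input_requests: dict[str, dict[str, Any]]  # key -> {"kind": ..., ...}
    request_state: str | None = None


@dataclass
class CallToolResult:  # simplified stand-in
    content: str


def call_tool(
    send: Callable[..., CallToolResult | IncompleteResult],
    name: str,
    arguments: dict[str, Any],
    callbacks: dict[str, Callable[[dict[str, Any]], Any]],
    max_rounds: int = 8,
) -> CallToolResult:
    input_responses: dict[str, Any] | None = None
    request_state: str | None = None
    for _ in range(max_rounds):
        result = send(name, arguments, input_responses, request_state)
        if isinstance(result, CallToolResult):
            return result
        # Each retry carries only this round's answers; older context lives in request_state.
        input_responses = {}
        for key, request in result.input_requests.items():
            callback = callbacks.get(request["kind"])
            if callback is None:
                raise RuntimeError(f"server sent a {request['kind']} request but no callback is registered")
            input_responses[key] = callback(request)
        request_state = result.request_state  # echoed back verbatim, never inspected
    raise RuntimeError(f"tool call still incomplete after {max_rounds} rounds")


def demo_server(name, arguments, input_responses, request_state):
    # One elicitation round, then the real answer.
    if not input_responses:
        return IncompleteResult({"units": {"kind": "elicitation", "message": "Which units?"}}, "opaque")
    return CallToolResult(f"{arguments['location']}: 22°{input_responses['units']}")


result = call_tool(demo_server, "weather", {"location": "Paris"}, {"elicitation": lambda req: "C"})
print(result.content)  # Paris: 22°C
```

The client never decodes request_state, which is what lets the same elicitation function serve both the SSE path and the MRTR path.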

Commit 2: handler-shape comparison

SDK primitives in src/mcp/server/experimental/mrtr.py:

  • MrtrCtx.once(key, fn) — idempotency guard tracked in request_state (Option F)
  • ToolBuilder: .incomplete_step(...).end_step(...).build(); end_step runs exactly once regardless of round count (Option G)
  • input_response(params, key) — sugar for the guard-first pattern
  • sse_retry_shim() + dispatch_by_version() — comparison artifacts for A/D
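
A sketch of the once() idea follows, assuming request_state is a plain base64-JSON dict (as in the demos) that holds a hypothetical "done" list of completed keys. OnceCtx and weather_tool are illustrative names, not the PR's MrtrCtx API. The guard records a key after its side effect runs, that record travels to the client inside request_state, and the retry that comes back skips the side effect.

```python
from __future__ import annotations

import base64
import json
from typing import Any, Callable


def encode_state(state: dict[str, Any]) -> str:
    # Plain base64-JSON, as in the demos; production state should be HMAC-signed.
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def decode_state(blob: str | None) -> dict[str, Any]:
    return json.loads(base64.urlsafe_b64decode(blob.encode())) if blob else {}


class OnceCtx:
    """Hypothetical stand-in for MrtrCtx: remembers finished side effects in request_state."""

    def __init__(self, request_state: str | None) -> None:
        self.state = decode_state(request_state)
        self.state.setdefault("done", [])

    def once(self, key: str, fn: Callable[[], Any]) -> None:
        if key in self.state["done"]:
            return  # already ran in an earlier round of this tool call
        fn()
        self.state["done"].append(key)

    def request_state(self) -> str:
        return encode_state(self.state)


audit_log: list[str] = []


def weather_tool(location: str, input_responses: dict | None, request_state: str | None) -> dict:
    ctx = OnceCtx(request_state)
    ctx.once("audit", lambda: audit_log.append(location))  # guarded side effect
    units = (input_responses or {}).get("units")
    if units is None:
        return {"result_type": "incomplete",
                "input_requests": {"units": {"kind": "elicitation", "message": "Which units?"}},
                "request_state": ctx.request_state()}
    return {"content": f"{location}: 22°{units}"}


round1 = weather_tool("Paris", None, None)
round2 = weather_tool("Paris", {"units": "C"}, round1["request_state"])
print(round2["content"], audit_log)  # Paris: 22°C ['Paris']
```

Calling audit_log.append(location) directly instead of through ctx.once leaves audit_log as ['Paris', 'Paris'] after the same two rounds, which is the double-fire the footgun test counts.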

Option examples in examples/servers/mrtr-options/:

| Option | Author writes | SDK does | Hidden re-entry | Old client gets |
|---|---|---|---|---|
| E | MRTR-native only | Nothing | No | Result with default, or error |
| A | MRTR-native only | Retry loop over SSE | Yes, safe | Full elicitation |
| B | await elicit() | Exception → IncompleteResult | Yes, unsafe | Full elicitation |
| C | One handler with an if/else version branch | Version accessor | No | Full elicitation |
| D | Two handlers | Picks by version | No | Full elicitation |
| F | MRTR-native + side effects wrapped in ctx.once | once() guard in request_state | No | (same as E) |
| G | Step functions + .build() | Step tracking in request_state | No | (same as E) |

Testing

tests/experimental/test_mrtr.py parametrises E/F/G against the same Client + callback to prove identical wire behaviour: the server's internal choice doesn't leak. The footgun test counts audit_log entries: a naive handler logs twice for one tool call, while F and G log once.

tests/client/test_client.py has 8 new E2E tests covering the retry loop (single-round elicitation, multi-round with request_state accumulation, sampling/roots dispatch, round-limit, missing-callback error paths).

Not in scope

  • Persistent/Tasks workflow — ServerTaskContext already does input_required; MRTR integration is a separate PR
  • mrtrOnly client flag — trivial to add, not demoed
  • requestState HMAC signing — called out in code comments; demos use plain base64-JSON
  • High-level MCPServer integration (@server.tool decorator shape) — lowlevel-first, this PR stops at Server

Exploratory — not intended to merge as-is. Open questions: which of F/G (or both) to ship as SDK primitives, whether to keep call_tool_mrtr as public or fold the union into call_tool once SEP finalises, whether sse_retry_shim belongs in the SDK at all vs docs-only.

Lowlevel plumbing for Multi Round-Trip Requests:

Types:
- IncompleteResult with result_type discriminator, input_requests, request_state
- InputRequest/InputResponse unions (elicitation, sampling, roots)
- input_responses + request_state fields on RequestParams

Server (lowlevel):
- on_call_tool return widened to include IncompleteResult

Session:
- send_request accepts TypeAdapter (overload) for union result parsing
- call_tool_mrtr() returns CallToolResult | IncompleteResult
- call_tool() stays narrow, raises on IncompleteResult with migration hint

Client:
- call_tool() drives MRTR retry loop internally — dispatches embedded
  input requests to elicitation/sampling/list_roots callbacks, retries
  with collected responses + echoed request_state
- max_mrtr_rounds bound (default 8)

Python-SDK counterpart to typescript-sdk#1701. Seven ways to write the
same weather-lookup tool so the diff between files is the argument.

SDK primitives (src/mcp/server/experimental/mrtr.py):
- MrtrCtx.once() — idempotency guard tracked in request_state (Option F)
- ToolBuilder — structural step decomposition; end_step runs exactly once
  regardless of round count (Option G)
- input_response() — sugar for the guard-first pattern
- sse_retry_shim() — Option A comparison artifact (pragma no-cover until
  LATEST_PROTOCOL_VERSION bumps past the MRTR gate)
- dispatch_by_version() — Option D comparison artifact
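
The ToolBuilder shape can be approximated as below. StepBuilder is a hypothetical sketch, not the PR's class or signatures: each incomplete_step is assumed to be a key plus the input request to send, answers accumulate in base64-JSON request_state (because input_responses only carries the latest round), and end_step is called only once every key has an answer, so a side effect placed there cannot fire twice.

```python
from __future__ import annotations

import base64
import json
from typing import Any, Callable


def _encode(state: dict[str, Any]) -> str:
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def _decode(blob: str | None) -> dict[str, Any]:
    return json.loads(base64.urlsafe_b64decode(blob.encode())) if blob else {}


class StepBuilder:
    """Hypothetical sketch of the ToolBuilder idea; not the PR's actual class."""

    def __init__(self) -> None:
        self._steps: list[tuple[str, dict[str, Any]]] = []
        self._end: Callable[[dict[str, Any], dict[str, Any]], dict[str, Any]] | None = None

    def incomplete_step(self, key: str, request: dict[str, Any]) -> StepBuilder:
        self._steps.append((key, request))
        return self

    def end_step(self, fn: Callable[[dict[str, Any], dict[str, Any]], dict[str, Any]]) -> StepBuilder:
        self._end = fn
        return self

    def build(self) -> Callable[..., dict[str, Any]]:
        steps, end = list(self._steps), self._end
        if end is None:
            raise ValueError("end_step() must be called before build()")

        def handler(arguments, input_responses, request_state):
            answers = _decode(request_state)
            answers.update(input_responses or {})  # fold this round's answers into the record
            for key, request in steps:
                if key not in answers:
                    return {"result_type": "incomplete",
                            "input_requests": {key: request},
                            "request_state": _encode(answers)}
            return end(arguments, answers)  # only reachable once every step has an answer

        return handler


finished: list[str] = []


def finish(args: dict[str, Any], answers: dict[str, Any]) -> dict[str, Any]:
    finished.append(args["location"])  # the side effect lives in the end step
    return {"content": f"{args['location']}: 22°{answers['units']} ({answers['span']})"}


tool = (StepBuilder()
        .incomplete_step("units", {"kind": "elicitation", "message": "Which units?"})
        .incomplete_step("span", {"kind": "elicitation", "message": "Today or the week?"})
        .end_step(finish)
        .build())

args = {"location": "Paris"}
r1 = tool(args, None, None)
r2 = tool(args, {"units": "C"}, r1["request_state"])
r3 = tool(args, {"span": "today"}, r2["request_state"])
print(r3["content"], finished)  # Paris: 22°C (today) ['Paris']
```

The design point is structural: there is no code path above a guard, so the author cannot accidentally put a side effect where it would re-run.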

Option examples (examples/servers/mrtr-options/):
- E (degrade-only): the SDK default. MRTR-native; pre-MRTR gets a default
  or error. Both quadrant rows collapse here.
- A (SSE shim): SDK emulates retry over SSE. Safe re-entry, hidden loop.
- B (await shim): exception-based. UNSAFE — hidden double-execution above
  await. Not a ship target; for contrast.
- C (version branch): explicit if/else in handler body.
- D (dual handler): two functions, SDK picks by version.
- F (ctx.once): idempotency guard, opt-in per side-effect.
- G (ToolBuilder): no above-the-guard zone; end_step structurally
  unreachable until all elicitations complete.
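
Option D's dispatch can be sketched as a wrapper that picks between two handlers using the negotiated protocol version. This is a hypothetical stand-in for dispatch_by_version with an assumed signature. It relies on MCP protocol versions being YYYY-MM-DD strings, so string comparison orders them by date; MRTR_GATE is a placeholder because the gate version is not fixed, and sse_elicit fakes the blocking SSE elicitation an old client would receive.

```python
from __future__ import annotations

from typing import Any, Callable

Handler = Callable[[dict[str, Any]], dict[str, Any]]


def dispatch_by_version(mrtr: Handler, legacy: Handler, gate: str) -> Callable[[str, dict[str, Any]], dict[str, Any]]:
    """Hypothetical sketch of Option D: the SDK, not the handler body, picks the code path."""

    def handler(protocol_version: str, arguments: dict[str, Any]) -> dict[str, Any]:
        # MCP protocol versions are YYYY-MM-DD strings, so string order is date order.
        chosen = mrtr if protocol_version >= gate else legacy
        return chosen(arguments)

    return handler


def sse_elicit(message: str) -> str:
    return "F"  # stands in for a blocking elicitation over the SSE stream


def weather_mrtr(args: dict[str, Any]) -> dict[str, Any]:
    # First round only, for brevity.
    return {"result_type": "incomplete",
            "input_requests": {"units": {"kind": "elicitation", "message": "Which units?"}}}


def weather_legacy(args: dict[str, Any]) -> dict[str, Any]:
    return {"content": f"{args['location']}: 22°{sse_elicit('Which units?')}"}


MRTR_GATE = "2099-01-01"  # placeholder, not the real gate version
tool = dispatch_by_version(weather_mrtr, weather_legacy, MRTR_GATE)
print(tool("2025-06-18", {"location": "Paris"}))  # {'content': 'Paris: 22°F'}
```

Compared with Option C, the version check moves out of the handler body, at the cost of maintaining two handlers for one tool.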

The invariant test (tests/experimental/test_mrtr.py) parametrises E/F/G
against the same Client + callback to prove identical wire behaviour —
the server's internal choice doesn't leak. The footgun test measures
audit_log count to prove F and G actually hold the guard (naive handler
fires twice; F and G fire once).

Both F and G depend on request_state integrity. The demos use plain
base64-JSON; a production SDK MUST HMAC-sign the blob.
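
Signing request_state needs only the standard library. The sketch below assumes an HMAC-SHA256 tag appended to the base64-JSON payload with a "." separator; the format and key handling are illustrative, not something the PR specifies. Without the tag, a client could edit the blob, for example emptying a once() record so that a guarded side effect fires again.

```python
import base64
import hashlib
import hmac
import json
from typing import Any


def sign_state(state: dict[str, Any], key: bytes) -> str:
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode()).decode()
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"  # base64url and hex never contain ".", so the split is unambiguous


def verify_state(blob: str, key: bytes) -> dict[str, Any]:
    payload, _, tag = blob.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    if not payload or not hmac.compare_digest(tag.encode(), expected.encode()):
        raise ValueError("request_state failed its integrity check")
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


KEY = b"demo-only-secret"  # hypothetical; load a real secret from configuration
blob = sign_state({"done": ["audit"]}, KEY)
print(verify_state(blob, KEY))  # {'done': ['audit']}

forged = sign_state({"done": []}, b"attacker-key")  # a client trying to reset the once() guard
try:
    verify_state(forged, KEY)
except ValueError as err:
    print(err)  # request_state failed its integrity check
```

A production scheme would probably also bind the blob to the specific tool call and give it an expiry, so a validly signed blob cannot be replayed elsewhere; that is outside this sketch.
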

Two standalone reference examples before the comparison deck:

- basic.py: the simple-tool equivalent for MRTR. One IncompleteResult,
  one retry. Comments walk through the two moves every MRTR handler
  makes: check input_responses, return IncompleteResult if missing.
  Runnable end-to-end against the in-memory Client.

- basic_multiround.py: the ADO-rules SEP example translated. Two
  cascading elicitation rounds with request_state carrying accumulated
  context so any server instance can handle any round. Shows the key
  gotcha: input_responses carries only the latest round's answers, not
  accumulated — anything that must survive goes in request_state.
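
The input_responses gotcha is easiest to see side by side. This sketch uses hypothetical helper names, base64-JSON state as in the demos, and the weather tool rather than the ADO-rules example: weather folds each round's answers into request_state, while weather_naive reads only input_responses and therefore forgets the round-1 answer by round 3.

```python
from __future__ import annotations

import base64
import json
from typing import Any


def encode_state(state: dict[str, Any]) -> str:
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def decode_state(blob: str | None) -> dict[str, Any]:
    return json.loads(base64.urlsafe_b64decode(blob.encode())) if blob else {}


def ask(key: str, message: str, state: dict[str, Any]) -> dict[str, Any]:
    return {"result_type": "incomplete",
            "input_requests": {key: {"kind": "elicitation", "message": message}},
            "request_state": encode_state(state)}


def respond(location: str, answers: dict[str, Any]) -> dict[str, Any]:
    if "units" not in answers:
        return ask("units", "Which units?", answers)
    if "span" not in answers:
        # Cascading round: the second question depends on the first answer.
        return ask("span", f"Show the °{answers['units']} forecast for today or the week?", answers)
    return {"content": f"{location}: 22°{answers['units']} ({answers['span']})"}


def weather(location, input_responses, request_state):
    answers = decode_state(request_state)  # what earlier rounds learned
    answers.update(input_responses or {})  # plus ONLY this round's answers
    return respond(location, answers)


def weather_naive(location, input_responses, request_state):
    return respond(location, dict(input_responses or {}))  # bug: drops earlier rounds


r1 = weather("Paris", None, None)
r2 = weather("Paris", {"units": "C"}, r1["request_state"])
r3 = weather("Paris", {"span": "today"}, r2["request_state"])  # "units" is not resent
print(r3["content"])  # Paris: 22°C (today)

n3 = weather_naive("Paris", {"span": "today"}, r2["request_state"])
print(list(n3["input_requests"]))  # ['units'], asked all over again
```

Because all context rides in request_state, any server instance can serve any round.
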

mrtr.py → mrtr/
├── __init__.py  — package docstring + re-exports
├── _state.py    — encode_state/decode_state + input_response helper
├── context.py   — MrtrCtx (Option F, ship target)
├── builder.py   — ToolBuilder (Option G, ship target)
└── compat.py    — sse_retry_shim + dispatch_by_version (comparison artifacts)

Ship targets (F/G) now live separately from the dual-path compat shims.
All imports from mcp.server.experimental.mrtr unchanged.

The Option B footgun was: await elicit() looks like a suspension point but
is actually a re-entry point, so everything above it runs twice. Option H
fixes that by making it a REAL suspension point — the coroutine frame is
held in a ContinuationStore across MRTR rounds, keyed by request_state.

Handler code stays exactly as it was in the SSE era:

    async def my_tool(ctx: LinearCtx, location: str) -> str:
        audit_log(location)      # runs exactly once
        units = await ctx.elicit("Which units?", UnitsSchema)
        return f"{location}: 22°{units.u}"

The wrapper linear_mrtr(my_tool, store=...) translates this into a standard
MRTR on_call_tool handler. Round 1 starts the coroutine; elicit() sends
IncompleteResult back through the wrapper and parks on a stream. Round 2's
retry wakes it with the answer. The coroutine continues from where it
stopped — no re-entry, no double-execution.
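
The park-and-resume mechanism can be sketched with asyncio primitives. This illustrates the idea under stated assumptions and is not linear.py: SketchCtx and SketchStore are hypothetical names, an asyncio.Queue and Future stand in for the streams and task group the commit describes, and request_state is just an opaque frame id. Round 1 starts the coroutine and returns as soon as elicit() posts a request; round 2 resolves the parked future, so the frame continues after the await instead of re-running from the top.

```python
from __future__ import annotations

import asyncio
import itertools
from typing import Any, Awaitable, Callable


class SketchCtx:
    """Hypothetical stand-in for LinearCtx: elicit() parks the frame instead of re-entering."""

    def __init__(self) -> None:
        self.outbox: asyncio.Queue[tuple[str, Any]] = asyncio.Queue()
        self.answer: asyncio.Future[Any] | None = None
        self.task: asyncio.Task[None] | None = None  # strong reference keeps the task alive

    async def elicit(self, message: str) -> Any:
        self.answer = asyncio.get_running_loop().create_future()
        await self.outbox.put(("incomplete", message))  # wrapper turns this into IncompleteResult
        return await self.answer  # suspended here until the next round


class SketchStore:
    """Hypothetical continuation store: parked frames keyed by their request_state token."""

    def __init__(self) -> None:
        self._frames: dict[str, SketchCtx] = {}
        self._ids = itertools.count()

    async def call(self, handler: Callable[..., Awaitable[Any]], arguments: dict[str, Any],
                   answer: Any = None, token: str | None = None) -> dict[str, Any]:
        if token is None:  # round 1: start the coroutine
            ctx, token = SketchCtx(), f"frame-{next(self._ids)}"
            self._frames[token] = ctx

            async def run() -> None:
                try:
                    await ctx.outbox.put(("done", await handler(ctx, **arguments)))
                except Exception as exc:
                    await ctx.outbox.put(("error", exc))

            ctx.task = asyncio.create_task(run())
        else:  # later rounds: wake the parked frame with the client's answer
            ctx = self._frames[token]  # KeyError if this process never saw the token
            assert ctx.answer is not None
            ctx.answer.set_result(answer)
        kind, payload = await ctx.outbox.get()
        if kind == "incomplete":
            return {"result_type": "incomplete", "message": payload, "request_state": token}
        del self._frames[token]
        if kind == "error":
            raise payload
        return {"content": payload}


audit_log: list[str] = []


async def my_tool(ctx: SketchCtx, location: str) -> str:
    audit_log.append(location)  # above the await: runs exactly once
    units = await ctx.elicit("Which units?")
    return f"{location}: 22°{units}"


async def main() -> tuple[dict[str, Any], dict[str, Any]]:
    store = SketchStore()
    r1 = await store.call(my_tool, {"location": "Paris"})
    r2 = await store.call(my_tool, {"location": "Paris"}, answer="C", token=r1["request_state"])
    return r1, r2


r1, r2 = asyncio.run(main())
print(r2["content"], audit_log)  # Paris: 22°C ['Paris']
```

The frame lives in this process's memory, so a retry routed to a different instance would hit an unknown token; that is the reason for sticky routing on request_state.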

Trade-off: server holds the frame in memory between rounds. Client sees
pure MRTR (no SSE, independent requests), but server is stateful within
a single tool call. Horizontally-scaled deployments need sticky routing on
the request_state token. Same operational shape as Option A's SSE hold,
without the long-lived connection.

SDK pieces (src/mcp/server/experimental/mrtr/linear.py):
- LinearCtx with async elicit(message, PydanticSchema) -> instance
- ContinuationStore — owns the task group, TTL-based frame expiry
- linear_mrtr(handler, store=...) — the wrapper
- ElicitDeclined raised when user declines/cancels
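
TTL-based expiry for parked frames can be sketched independently of the coroutine machinery. TtlFrames is a hypothetical name, the 30-second TTL in the usage is arbitrary, and the clock is injectable so expiry can be exercised without sleeping. Expired or unknown tokens fail loudly rather than silently restarting the handler.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class TtlFrames:
    """Hypothetical sketch of TTL-based expiry for parked frames."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._frames: dict[str, tuple[float, Any]] = {}  # token -> (deadline, frame)

    def park(self, token: str, frame: Any) -> None:
        self._frames[token] = (self._clock() + self._ttl, frame)

    def resume(self, token: str) -> Any:
        self.expire()
        try:
            _, frame = self._frames.pop(token)
        except KeyError:
            raise LookupError(f"unknown or expired request_state {token!r}") from None
        return frame

    def expire(self) -> list[str]:
        now = self._clock()
        dead = [t for t, (deadline, _) in self._frames.items() if deadline <= now]
        for t in dead:
            del self._frames[t]  # a real store would presumably also cancel the parked coroutine
        return dead


now = [0.0]
frames = TtlFrames(30.0, clock=lambda: now[0])
frames.park("frame-0", "parked coroutine A")
frames.park("frame-1", "parked coroutine B")
now[0] = 10.0
print(frames.resume("frame-0"))  # parked coroutine A
now[0] = 45.0
print(frames.expire())  # ['frame-1']
```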

7 E2E tests including the key assertion: side-effects above await fire
exactly once (the test measures audit_log count).

@halter73

@maxisbey I had to do a double take. What sorcery did you use to get this PR number (#2322) to match the SEP (#2322)!?
