Export, serve, and containerize any ML model — plus auto-generate MCP servers for AI agents.
anydeploy is the last-mile deployment toolkit for ML models. It exports PyTorch or sklearn models to ONNX, TorchScript, or TFLite with smart defaults; generates a FastAPI server with health checks and OpenAPI docs; auto-creates a Model Context Protocol (MCP) server so any AI agent (Claude Desktop, Continue, Cursor) can call your model as a tool; and produces Dockerfiles + requirements files for reproducible deployment. Three deployment profiles (edge, balanced, quality) pick quantization and precision for you.
Built by Viet-Anh Nguyen at NRL.ai.
- One-liner API — `anydeploy.export(model, "onnx")` handles shape inference, opset, and validation
- Plugin architecture — Register custom exporters, servers, or container targets
- Local-first — Everything runs on your machine; no cloud account needed
- Minimal core deps — Base install has zero heavy deps; torch/tf are optional
- Production-ready — MCP integration, FastAPI generation, Dockerfile scaffolding
```bash
pip install anydeploy
```

For optional features:

```bash
pip install anydeploy[onnx]    # ONNX export + onnxruntime verification
pip install anydeploy[torch]   # TorchScript export
pip install anydeploy[tflite]  # TFLite conversion
pip install anydeploy[serve]   # FastAPI + uvicorn server
pip install anydeploy[mcp]     # Model Context Protocol server generation
pip install anydeploy[all]     # everything
```

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13).
```python
import anydeploy
import torch

model = torch.load("resnet50.pt").eval()

# 1. Export to ONNX with smart defaults (opset, dynamic axes, validation)
anydeploy.export(
    model,
    format="onnx",
    out="resnet50.onnx",
    example_input=torch.randn(1, 3, 224, 224),
    profile="balanced",  # edge | balanced | quality
)

# 2. Generate a FastAPI server with health check + OpenAPI docs
anydeploy.serve("resnet50.onnx", host="0.0.0.0", port=8000)

# 3. Generate an MCP server so Claude Desktop / Cursor can call the model
anydeploy.mcp("resnet50.onnx", out="my_mcp_server/", name="image-classifier")

# 4. Generate a Dockerfile + requirements.txt for reproducible deployment
anydeploy.containerize("resnet50.onnx", out="docker/", base="python:3.11-slim")
```

| Format | How it works | Notes |
|---|---|---|
| ONNX | `torch.onnx.export` with auto-derived dynamic axes + opset 17 defaults | Validates via onnxruntime after export |
| TorchScript | `torch.jit.trace` (default) or `torch.jit.script` | Python-free runtime |
| TFLite | torch → onnx → tf → tflite via onnx-tf + TensorFlow converter | Mobile / embedded |
All exports include automatic shape inference, input/output naming, and a round-trip validation step that runs a dummy input through both the original and the exported model and compares outputs.
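The comparison step can be sketched in plain Python, with toy stand-ins for the two forward passes; `validate_roundtrip` is an illustrative name, not anydeploy's API:

```python
import math

def validate_roundtrip(original_fn, exported_fn, dummy_input, tol=1e-4):
    """Run the same dummy input through both models and compare outputs
    elementwise; raise if any value drifts beyond the tolerance."""
    ref = original_fn(dummy_input)
    got = exported_fn(dummy_input)
    if not all(math.isclose(a, b, abs_tol=tol) for a, b in zip(ref, got)):
        raise ValueError("exported model diverges from the original")
    return True

# Toy stand-ins for the original and exported forward passes:
original = lambda x: [v * 2.0 for v in x]
exported = lambda x: [v * 2.0 + 1e-6 for v in x]  # tiny numeric drift
assert validate_roundtrip(original, exported, [0.5, 1.5])
```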
| Profile | Precision | Quantization | Intended target |
|---|---|---|---|
| edge | int8 | Post-training static quantization | Raspberry Pi, phones, MCUs |
| balanced (default) | fp16 | Optional fp16 conversion | Laptop / workstation CPU |
| quality | fp32 | None | Server / GPU inference |
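A profile is essentially a named bundle of export settings. A minimal sketch of that mapping, using illustrative names and the values from the table above (the real internals may differ):

```python
# Illustrative profile table; anydeploy's internals may store this differently.
PROFILES = {
    "edge":     {"precision": "int8", "quantization": "static"},
    "balanced": {"precision": "fp16", "quantization": "fp16"},
    "quality":  {"precision": "fp32", "quantization": None},
}

def resolve_profile(name="balanced"):
    """Look up the export settings for a deployment profile."""
    return PROFILES[name]

print(resolve_profile("edge"))  # {'precision': 'int8', 'quantization': 'static'}
```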
`anydeploy.serve(model_path)` generates and launches a FastAPI app with:

- `POST /predict` — accepts JSON or multipart image upload
- `GET /health` — liveness check
- `GET /docs` — interactive OpenAPI UI (Swagger)
- Automatic request/response Pydantic schemas inferred from the model's input/output shapes
- Optional batching, CORS, and API-key authentication
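Inferring a request schema from a tensor shape can be sketched as a shape-to-JSON-schema mapping; `schema_from_shape` is an illustrative helper written for this example, not part of anydeploy's public API:

```python
def schema_from_shape(shape):
    """Build a nested JSON-schema-style array spec from a tensor shape.
    Dynamic dims (None/0) are left unconstrained."""
    spec = {"type": "number"}
    for dim in reversed(shape):
        spec = {"type": "array", "items": spec,
                **({"minItems": dim, "maxItems": dim} if dim else {})}
    return spec

# A (1, 3) input becomes an array of length-3 arrays of numbers:
print(schema_from_shape((1, 3)))
```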
`anydeploy.mcp(model_path, name=...)` generates a complete MCP server implementation that exposes your model as an AI-callable tool. Any MCP-compatible client — Claude Desktop, Cursor, Continue, Zed — can then invoke your model via natural language.

The generated server:

- Exposes a `run_model` tool with a JSON schema derived from model inputs
- Handles image decoding, tensor conversion, and postprocessing
- Ships with a `claude_desktop_config.json` snippet ready to copy
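For reference, such a snippet follows the standard `mcpServers` shape used by Claude Desktop; the command and path below are placeholders, not the generator's exact output:

```json
{
  "mcpServers": {
    "image-classifier": {
      "command": "python",
      "args": ["my_mcp_server/server.py"]
    }
  }
}
```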
`anydeploy.containerize(model_path)` generates:

- `Dockerfile` — minimal base image (python-slim by default) with only the runtime dependencies your model needs
- `requirements.txt` — pinned versions discovered from the export step
- `.dockerignore` — sensible defaults
- `docker-compose.yml` (optional) — for multi-container setups
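A generated Dockerfile might look roughly like this; the file names and server command are illustrative, and the real output depends on your model's runtime dependencies:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.onnx server.py ./
EXPOSE 8000
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]
```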
| Function | Purpose |
|---|---|
| `anydeploy.export(model, format, out, **opts)` | Export to ONNX/TorchScript/TFLite |
| `anydeploy.serve(model_path, host, port)` | Launch a FastAPI server |
| `anydeploy.generate_server(model_path, out)` | Generate FastAPI code to disk |
| `anydeploy.mcp(model_path, out, name)` | Generate an MCP tool server |
| `anydeploy.containerize(model_path, out)` | Generate Dockerfile + requirements |
| `anydeploy.quantize(model_path, mode="int8")` | Post-training quantization |
| `anydeploy.benchmark(model_path)` | Measure latency + throughput |
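The core of a latency/throughput measurement like `anydeploy.benchmark` boils down to timing repeated calls. A minimal sketch, with an illustrative `benchmark` function and a toy workload standing in for model inference:

```python
import time

def benchmark(run_fn, runs=100):
    """Time repeated calls and report mean latency plus throughput."""
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_fn()
        latencies.append(time.perf_counter() - t0)
    mean = sum(latencies) / len(latencies)
    return {"mean_latency_s": mean, "throughput_rps": 1.0 / mean}

# Toy workload in place of a model forward pass:
stats = benchmark(lambda: sum(range(1000)), runs=50)
print(stats)
```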
```bash
# Export
anydeploy export model.pt --format onnx --out model.onnx --profile edge

# Serve
anydeploy serve model.onnx --port 8000

# Generate MCP server
anydeploy mcp model.onnx --out mcp_server/ --name my-model

# Containerize
anydeploy containerize model.onnx --out docker/

# Benchmark
anydeploy benchmark model.onnx --runs 100
```

```python
import traincv, anydeploy

# Train a YOLOv8 detector
run = traincv.train("datasets/pets/", task="detect", model="yolov8n", epochs=50)

# Export to ONNX, edge-quantized
anydeploy.export(run.weights_path, format="onnx",
                 out="pets.onnx", profile="edge")

# Expose as an MCP tool for Claude Desktop
anydeploy.mcp("pets.onnx", out="pets_mcp/", name="pet-detector")
```

```python
import anydeploy

anydeploy.containerize("model.onnx", out="deploy/")
# Then:
#   cd deploy && docker build -t my-model .
#   docker run -p 8000:8000 my-model
```

```python
import anydeploy

print(anydeploy.benchmark("model.onnx"))        # fp32 baseline
anydeploy.quantize("model.onnx", mode="int8", out="model_int8.onnx")
print(anydeploy.benchmark("model_int8.onnx"))   # int8 quantized
```

MIT (c) Viet-Anh Nguyen