Export, serve, and containerize any ML model — plus auto-generate MCP servers for AI agents.
anydeploy is the last-mile deployment toolkit for ML models. It exports PyTorch or sklearn models to ONNX, TorchScript, or TFLite with smart defaults; generates a FastAPI server with health checks and OpenAPI docs; auto-creates a Model Context Protocol (MCP) server so any AI agent (Claude Desktop, Continue, Cursor) can call your model as a tool; and produces Dockerfiles + requirements files for reproducible deployment. Three deployment profiles (edge, balanced, quality) pick quantization and precision for you.
Built by Viet-Anh Nguyen at NRL.ai.
- One-liner API — `anydeploy.export(model, "onnx")` handles shape inference, opset, and validation
- Plugin architecture — Register custom exporters, servers, or container targets
- Local-first — Everything runs on your machine; no cloud account needed
- Minimal core deps — Base install has zero heavy deps; torch/tf are optional
- Production-ready — MCP integration, FastAPI generation, Dockerfile scaffolding
```bash
pip install anydeploy
```

For optional features:

```bash
pip install anydeploy[onnx]    # ONNX export + onnxruntime verification
pip install anydeploy[torch]   # TorchScript export
pip install anydeploy[tflite]  # TFLite conversion
pip install anydeploy[serve]   # FastAPI + uvicorn server
pip install anydeploy[mcp]     # Model Context Protocol server generation
pip install anydeploy[all]     # everything
```

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13).
```python
import anydeploy
import torch

model = torch.load("resnet50.pt").eval()

# 1. Export to ONNX with smart defaults (opset, dynamic axes, validation)
anydeploy.export(
    model,
    format="onnx",
    out="resnet50.onnx",
    example_input=torch.randn(1, 3, 224, 224),
    profile="balanced",  # edge | balanced | quality
)

# 2. Generate a FastAPI server with health check + OpenAPI docs
anydeploy.serve("resnet50.onnx", host="0.0.0.0", port=8000)

# 3. Generate an MCP server so Claude Desktop / Cursor can call the model
anydeploy.mcp("resnet50.onnx", out="my_mcp_server/", name="image-classifier")

# 4. Generate a Dockerfile + requirements.txt for reproducible deployment
anydeploy.containerize("resnet50.onnx", out="docker/", base="python:3.11-slim")
```

| Format | How it works | Notes |
|---|---|---|
| ONNX | `torch.onnx.export` with auto-derived dynamic axes + opset 17 defaults | Validates via onnxruntime after export |
| TorchScript | `torch.jit.trace` (default) or `torch.jit.script` | Python-free runtime |
| TFLite | torch → onnx → tf → tflite via onnx-tf + TensorFlow converter | Mobile / embedded |
All exports include automatic shape inference, input/output naming, and a round-trip validation step that runs a dummy input through both the original and the exported model and compares outputs.
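The comparison step can be sketched in plain Python, with toy stand-ins for the two forward passes; `validate_roundtrip` is an illustrative name, not anydeploy's API:

```python
import math

def validate_roundtrip(original_fn, exported_fn, dummy_input, tol=1e-4):
    """Run the same dummy input through both models and compare outputs
    elementwise; raise if any value drifts beyond the tolerance."""
    ref = original_fn(dummy_input)
    got = exported_fn(dummy_input)
    if not all(math.isclose(a, b, abs_tol=tol) for a, b in zip(ref, got)):
        raise ValueError("exported model diverges from the original")
    return True

# Toy stand-ins for the original and exported forward passes:
original = lambda x: [v * 2.0 for v in x]
exported = lambda x: [v * 2.0 + 1e-6 for v in x]  # tiny numeric drift
assert validate_roundtrip(original, exported, [0.5, 1.5])
```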
| Profile | Precision | Quantization | Intended target |
|---|---|---|---|
| edge | int8 | Post-training static quantization | Raspberry Pi, phones, MCUs |
| balanced (default) | fp16 | Optional fp16 conversion | Laptop / workstation CPU |
| quality | fp32 | None | Server / GPU inference |
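A profile is essentially a named bundle of export settings. A minimal sketch of that mapping, using illustrative names and the values from the table above (the real internals may differ):

```python
# Illustrative profile table; anydeploy's internals may store this differently.
PROFILES = {
    "edge":     {"precision": "int8", "quantization": "static"},
    "balanced": {"precision": "fp16", "quantization": "fp16"},
    "quality":  {"precision": "fp32", "quantization": None},
}

def resolve_profile(name="balanced"):
    """Look up the export settings for a deployment profile."""
    return PROFILES[name]

print(resolve_profile("edge"))  # {'precision': 'int8', 'quantization': 'static'}
```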
`anydeploy.serve(model_path)` generates and launches a FastAPI app with:

- `POST /predict` — accepts JSON or multipart image upload
- `GET /health` — liveness check
- `GET /docs` — interactive OpenAPI UI (Swagger)
- Automatic request/response Pydantic schemas inferred from the model's input/output shapes
- Optional batching, CORS, and API-key authentication
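Inferring a request schema from a tensor shape can be sketched as a shape-to-JSON-schema mapping; `schema_from_shape` is an illustrative helper written for this example, not part of anydeploy's public API:

```python
def schema_from_shape(shape):
    """Build a nested JSON-schema-style array spec from a tensor shape.
    Dynamic dims (None/0) are left unconstrained."""
    spec = {"type": "number"}
    for dim in reversed(shape):
        spec = {"type": "array", "items": spec,
                **({"minItems": dim, "maxItems": dim} if dim else {})}
    return spec

# A (1, 3) input becomes an array of length-3 arrays of numbers:
print(schema_from_shape((1, 3)))
```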
`anydeploy.mcp(model_path, name=...)` generates a complete MCP server implementation that exposes your model as an AI-callable tool. Any MCP-compatible client — Claude Desktop, Cursor, Continue, Zed — can then invoke your model via natural language.

The generated server:

- Exposes a `run_model` tool with a JSON schema derived from model inputs
- Handles image decoding, tensor conversion, and postprocessing
- Ships with a `claude_desktop_config.json` snippet ready to copy
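For reference, such a snippet follows the standard `mcpServers` shape used by Claude Desktop; the command and path below are placeholders, not the generator's exact output:

```json
{
  "mcpServers": {
    "image-classifier": {
      "command": "python",
      "args": ["my_mcp_server/server.py"]
    }
  }
}
```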
`anydeploy.containerize(model_path)` generates:

- `Dockerfile` — minimal base image (python-slim by default) with only the runtime dependencies your model needs
- `requirements.txt` — pinned versions discovered from the export step
- `.dockerignore` — sensible defaults
- `docker-compose.yml` (optional) — for multi-container setups
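A generated Dockerfile might look roughly like this; the file names and server command are illustrative, and the real output depends on your model's runtime dependencies:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.onnx server.py ./
EXPOSE 8000
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]
```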
| Function | Purpose |
|---|---|
| `anydeploy.export(model, format, out, **opts)` | Export to ONNX/TorchScript/TFLite |
| `anydeploy.serve(model_path, host, port)` | Launch a FastAPI server |
| `anydeploy.generate_server(model_path, out)` | Generate FastAPI code to disk |
| `anydeploy.mcp(model_path, out, name)` | Generate an MCP tool server |
| `anydeploy.containerize(model_path, out)` | Generate Dockerfile + requirements |
| `anydeploy.quantize(model_path, mode="int8")` | Post-training quantization |
| `anydeploy.benchmark(model_path)` | Measure latency + throughput |
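The core of a latency/throughput measurement like `anydeploy.benchmark` boils down to timing repeated calls. A minimal sketch, with an illustrative `benchmark` function and a toy workload standing in for model inference:

```python
import time

def benchmark(run_fn, runs=100):
    """Time repeated calls and report mean latency plus throughput."""
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_fn()
        latencies.append(time.perf_counter() - t0)
    mean = sum(latencies) / len(latencies)
    return {"mean_latency_s": mean, "throughput_rps": 1.0 / mean}

# Toy workload in place of a model forward pass:
stats = benchmark(lambda: sum(range(1000)), runs=50)
print(stats)
```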
```bash
# Export
anydeploy export model.pt --format onnx --out model.onnx --profile edge

# Serve
anydeploy serve model.onnx --port 8000

# Generate MCP server
anydeploy mcp model.onnx --out mcp_server/ --name my-model

# Containerize
anydeploy containerize model.onnx --out docker/

# Benchmark
anydeploy benchmark model.onnx --runs 100
```

```python
import traincv, anydeploy

# Train a YOLOv8 detector
run = traincv.train("datasets/pets/", task="detect", model="yolov8n", epochs=50)

# Export to ONNX, edge-quantized
anydeploy.export(run.weights_path, format="onnx",
                 out="pets.onnx", profile="edge")

# Expose as an MCP tool for Claude Desktop
anydeploy.mcp("pets.onnx", out="pets_mcp/", name="pet-detector")
```

```python
import anydeploy

anydeploy.containerize("model.onnx", out="deploy/")
# Then:
#   cd deploy && docker build -t my-model .
#   docker run -p 8000:8000 my-model
```

```python
import anydeploy

print(anydeploy.benchmark("model.onnx"))        # fp32 baseline
anydeploy.quantize("model.onnx", mode="int8", out="model_int8.onnx")
print(anydeploy.benchmark("model_int8.onnx"))   # int8 quantized
```

MIT (c) Viet-Anh Nguyen