
anydeploy

Export, serve, and containerize any ML model — plus auto-generate MCP servers for AI agents.


anydeploy is the last-mile deployment toolkit for ML models. It exports PyTorch or sklearn models to ONNX, TorchScript, or TFLite with smart defaults; generates a FastAPI server with health checks and OpenAPI docs; auto-creates a Model Context Protocol (MCP) server so any AI agent (Claude Desktop, Continue, Cursor) can call your model as a tool; and produces Dockerfiles + requirements files for reproducible deployment. Three deployment profiles (edge, balanced, quality) pick quantization and precision for you.

Built by Viet-Anh Nguyen at NRL.ai.

Why anydeploy?

  • One-liner API — anydeploy.export(model, "onnx") handles shape inference, opset selection, and validation
  • Plugin architecture — Register custom exporters, servers, or container targets
  • Local-first — Everything runs on your machine; no cloud account needed
  • Minimal core deps — Base install has zero heavy deps; torch/tf are optional
  • Production-ready — MCP integration, FastAPI generation, Dockerfile scaffolding

Installation

pip install anydeploy

For optional features:

pip install anydeploy[onnx]      # ONNX export + onnxruntime verification
pip install anydeploy[torch]     # TorchScript export
pip install anydeploy[tflite]    # TFLite conversion
pip install anydeploy[serve]     # FastAPI + uvicorn server
pip install anydeploy[mcp]       # Model Context Protocol server generation
pip install anydeploy[all]       # everything

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

Quick Start

import anydeploy
import torch

model = torch.load("resnet50.pt").eval()

# 1. Export to ONNX with smart defaults (opset, dynamic axes, validation)
anydeploy.export(
    model,
    format="onnx",
    out="resnet50.onnx",
    example_input=torch.randn(1, 3, 224, 224),
    profile="balanced",          # edge | balanced | quality
)

# 2. Generate a FastAPI server with health check + OpenAPI docs
anydeploy.serve("resnet50.onnx", host="0.0.0.0", port=8000)

# 3. Generate an MCP server so Claude Desktop / Cursor can call the model
anydeploy.mcp("resnet50.onnx", out="my_mcp_server/", name="image-classifier")

# 4. Generate a Dockerfile + requirements.txt for reproducible deployment
anydeploy.containerize("resnet50.onnx", out="docker/", base="python:3.11-slim")

Models & Methods

Export formats

| Format      | How it works                                                        | Notes                                  |
|-------------|---------------------------------------------------------------------|----------------------------------------|
| ONNX        | torch.onnx.export with auto-derived dynamic axes + opset 17 defaults | Validated via onnxruntime after export |
| TorchScript | torch.jit.trace (default) or torch.jit.script                        | Python-free runtime                    |
| TFLite      | torch -> onnx -> tf -> tflite via onnx-tf + the TensorFlow converter | Mobile / embedded                      |

All exports include automatic shape inference, input/output naming, and a round-trip validation step that runs a dummy input through both the original and the exported model and compares outputs.
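The round-trip check amounts to a tolerance comparison between the two output tensors. A minimal sketch of that idea, assuming flat float outputs — the function name and tolerance thresholds here are illustrative, not anydeploy's actual internals:

```python
import math

def outputs_match(ref, exported, rel_tol=1e-3, abs_tol=1e-5):
    """Compare original vs. exported model outputs element-wise.

    `ref` and `exported` are flat lists of floats produced by running the
    same dummy input through both models. The tolerances are assumptions;
    anydeploy's internal thresholds may differ.
    """
    if len(ref) != len(exported):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(ref, exported))

print(outputs_match([1.0, 2.0], [1.0, 2.000001]))  # True
print(outputs_match([1.0, 2.0], [1.0, 2.5]))       # False
```

Small numeric drift is expected after export (especially for fp16 or int8 profiles), which is why a relative tolerance is used rather than exact equality.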

Deployment profiles

| Profile            | Precision | Quantization                      | Intended target            |
|--------------------|-----------|-----------------------------------|----------------------------|
| edge               | int8      | Post-training static quantization | Raspberry Pi, phones, MCUs |
| balanced (default) | fp16      | Optional fp16 conversion          | Laptop / workstation CPU   |
| quality            | fp32      | None                              | Server / GPU inference     |
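Conceptually, a profile is just a named bundle of export settings. A hypothetical mapping mirroring the table above — the key names ("precision", "quantize") are assumptions for illustration, not anydeploy's API:

```python
# Hypothetical profile table; key names are assumptions, not anydeploy's API.
PROFILES = {
    "edge":     {"precision": "int8", "quantize": "post-training-static"},
    "balanced": {"precision": "fp16", "quantize": "optional-fp16"},
    "quality":  {"precision": "fp32", "quantize": None},
}

def resolve_profile(name="balanced"):
    """Look up a profile by name, failing loudly on a typo."""
    if name not in PROFILES:
        raise ValueError(
            f"unknown profile {name!r}; expected one of {sorted(PROFILES)}"
        )
    return PROFILES[name]

print(resolve_profile("edge")["precision"])  # int8
```

The point of the bundle is that you choose a target (edge, balanced, quality) once and the precision/quantization decisions follow from it, instead of being passed piecemeal.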

FastAPI server generation

anydeploy.serve(model_path) generates and launches a FastAPI app with:

  • POST /predict — accepts JSON or multipart image upload
  • GET /health — liveness check
  • GET /docs — interactive OpenAPI UI (Swagger)
  • Automatic request/response Pydantic schemas inferred from the model's input/output shapes
  • Optional batching, CORS, and API-key authentication
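Once the server is running, the JSON endpoint can be called with nothing but the standard library. The payload field name below is an assumption — check the generated /docs page for the exact schema your model produces:

```python
import json
from urllib import request

# Hypothetical payload; the actual schema is inferred from your model's
# input shape, so consult http://localhost:8000/docs for the real fields.
payload = {"inputs": [[0.0, 0.1, 0.2]]}

req = request.Request(
    "http://localhost:8000/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires a server started with anydeploy.serve(...) to be listening:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```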

MCP (Model Context Protocol) server generation

anydeploy.mcp(model_path, name=...) generates a complete MCP server implementation that exposes your model as an AI-callable tool. Any MCP-compatible client — Claude Desktop, Cursor, Continue, Zed — can then invoke your model via natural language.

The generated server:

  • Exposes a run_model tool with a JSON schema derived from model inputs
  • Handles image decoding, tensor conversion, and postprocessing
  • Ships with a claude_desktop_config.json snippet ready to copy
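For reference, a claude_desktop_config.json entry for such a server typically looks like the following — the entry-point file name (server.py) is an assumption about what anydeploy generates, not a documented path:

```json
{
  "mcpServers": {
    "image-classifier": {
      "command": "python",
      "args": ["my_mcp_server/server.py"]
    }
  }
}
```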

Containerization

anydeploy.containerize(model_path) generates:

  • Dockerfile — minimal base image (python-slim by default) with only the runtime dependencies your model needs
  • requirements.txt — pinned versions discovered from the export step
  • .dockerignore — sensible defaults
  • docker-compose.yml (optional) — for multi-container setups

API Reference

| Function                                     | Purpose                           |
|----------------------------------------------|-----------------------------------|
| anydeploy.export(model, format, out, **opts) | Export to ONNX/TorchScript/TFLite |
| anydeploy.serve(model_path, host, port)      | Launch a FastAPI server           |
| anydeploy.generate_server(model_path, out)   | Generate FastAPI code to disk     |
| anydeploy.mcp(model_path, out, name)         | Generate an MCP tool server       |
| anydeploy.containerize(model_path, out)      | Generate Dockerfile + requirements |
| anydeploy.quantize(model_path, mode="int8")  | Post-training quantization        |
| anydeploy.benchmark(model_path)              | Measure latency + throughput      |

CLI Usage

# Export
anydeploy export model.pt --format onnx --out model.onnx --profile edge

# Serve
anydeploy serve model.onnx --port 8000

# Generate MCP server
anydeploy mcp model.onnx --out mcp_server/ --name my-model

# Containerize
anydeploy containerize model.onnx --out docker/

# Benchmark
anydeploy benchmark model.onnx --runs 100

Examples

Train with traincv, deploy with anydeploy

import traincv, anydeploy

# Train a YOLOv8 detector
run = traincv.train("datasets/pets/", task="detect", model="yolov8n", epochs=50)

# Export to ONNX, edge-quantized
anydeploy.export(run.weights_path, format="onnx",
                 out="pets.onnx", profile="edge")

# Expose as an MCP tool for Claude Desktop
anydeploy.mcp("pets.onnx", out="pets_mcp/", name="pet-detector")

Auto-generate a Docker image and run it

import anydeploy

anydeploy.containerize("model.onnx", out="deploy/")

# Then:
#   cd deploy && docker build -t my-model .
#   docker run -p 8000:8000 my-model

Benchmark before and after quantization

import anydeploy

print(anydeploy.benchmark("model.onnx"))              # fp32 baseline
anydeploy.quantize("model.onnx", mode="int8", out="model_int8.onnx")
print(anydeploy.benchmark("model_int8.onnx"))         # int8 quantized

License

MIT (c) Viet-Anh Nguyen
