Edgeflow#
Train in Python. Serve in Rust.
Edgeflow is an MLflow-compatible experiment tracker, model registry, and inference server, built for nodes where a Python serving stack is too heavy to fit. Models log through the MLflow client you already use; the runtime is a single Rust binary that loads ONNX, runs WASM pre/post processing, and hot-swaps deployments without dropping traffic.
Why it exists#
A Python serving container easily passes several hundred MB resident before it answers a single request. On a constrained node - an edge box, a small VPS, a free tier - that is the difference between fitting your model and not. Edgeflow keeps the authoring ergonomics teams already have (MLflow tracking, a Python SDK, ONNX export) but moves inference into a Rust + WASM runtime measured in tens of MB.
Hot-swap, per-target observability, and multi-target deployments are first-class because rebuilding and pushing a container image is not a realistic update mechanism for a node living behind a flaky link in a warehouse.
Who it’s for#
ML engineers shipping models to constrained nodes who want to keep their MLflow workflow.
Teams running fleets of edge devices who do not want to pay the per-node weight of a full Python serving stack.
Anyone building a side project or proof of concept that needs a serving story lighter than a full container orchestration deploy.
Where to start#
New here. Walk through the Quickstart tutorial. About two minutes from zero to a
live /infer endpoint.
Evaluating edgeflow. Skim the system architecture to see how the control plane, the inference runtime, and the artifact store fit together.
Building on the API. A dedicated reference section is in flight. For now, the tutorials cover the SDK and HTTP surface end-to-end on real models.