Quickstart: your first deploy in 2 minutes#
This tutorial gets you from zero to a live inference endpoint. We use the iris dataset because it trains in under a second and has no heavyweight dependencies. Later tutorials move to real models.
You will:
Bring up edgeflow locally with docker compose.
Walk through a training script that pushes the model to edgeflow.
Send a request to the live inference endpoint.
Prerequisites#
Docker and docker compose
uv (installation guide)
1. Bring up edgeflow#
Download the quickstart compose file and start the stack. You do not
need to clone the repository: quickstart.yaml references pre-built
images on GHCR.
curl -O https://raw.githubusercontent.com/jordandelbar/edgeflow/main/deploy/quickstart.yaml
docker compose -f quickstart.yaml up -d
Two containers start: the control-plane server on :5000 and an
inference pod on :8080.
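To confirm that both containers are reachable before moving on, a small standard-library check like the one below can help. It is a sketch and not part of edgeflow: the helper names are made up for illustration, and the ports are the two stated above. An open TCP port only shows that something is listening, not that the service has finished starting.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all of these as "not up yet".
        return False


def wait_for_ports(ports, host="localhost", deadline_s=60.0, interval_s=1.0) -> bool:
    """Poll until every port accepts connections, or give up at the deadline."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        if all(port_is_open(host, port) for port in ports):
            return True
        if time.monotonic() >= give_up_at:
            return False
        time.sleep(interval_s)
```

Calling wait_for_ports([5000, 8080]) returns True once both ports accept connections, and False if the deadline passes first.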
2. Build the training script#
If you just want to see it work, the finished script is on GitHub:
curl -O https://raw.githubusercontent.com/jordandelbar/edgeflow/main/examples/01-quickstart-iris/train.py
uv run train.py
The rest of this section walks through train.py piece by piece.
Dependencies#
Create a pyproject.toml in your project directory with these
dependencies:
[project]
name = "iris-tutorial"
version = "0.0.0"
requires-python = ">=3.12"
dependencies = [
    "edgeflow",
    "mlflow>=3.11.1,<4",
    "numpy>=2.4.4,<3",
    "scikit-learn>=1.8.0,<2",
]
Imports#
Besides os and numpy, the imports fall into three groups: edgeflow’s SDK, MLflow for experiment tracking, and scikit-learn for the model itself.
import os
import edgeflow
import mlflow
import numpy as np
from edgeflow.models import sklearn_to_onnx
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
edgeflow.models.sklearn_to_onnx is a small helper that wraps
skl2onnx with sensible defaults. You can call skl2onnx
directly if you need finer control.
Configuration#
Two values point the script at the edgeflow server and pick a
deployment target. The defaults match what docker compose up
exposes, so you only need to override them when running against a
different setup.
EDGEFLOW_SERVER = os.environ.get("EDGEFLOW_SERVER", "http://localhost:5000")
EDGEFLOW_TARGET = os.environ.get("EDGEFLOW_TARGET", "iris-inference")
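The fallback behaviour of os.environ.get can be illustrated in isolation. The sketch below is not part of train.py: resolve_settings is a hypothetical helper that takes the environment as a plain mapping so both cases are easy to see, and the staging hostname in the usage note is invented.

```python
import os

# Defaults copied from the two configuration lines above.
DEFAULTS = {
    "EDGEFLOW_SERVER": "http://localhost:5000",
    "EDGEFLOW_TARGET": "iris-inference",
}


def resolve_settings(env=os.environ):
    """Return each setting from env, falling back to its default when unset."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

With an empty environment, resolve_settings({}) returns the defaults unchanged. Passing {"EDGEFLOW_SERVER": "http://staging.example:5000"} overrides only the server URL and leaves the target at iris-inference.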
Train the classifier#
This is the standard scikit-learn flow. Iris ships with scikit-learn
as a built-in dataset; the features are cast to float32 because that’s
what the ONNX exporter expects.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data.astype(np.float32), iris.target, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
Nothing here is edgeflow-specific yet: this is the same code you would write to fit and evaluate any sklearn classifier.
Log to MLflow and bundle the model#
Edgeflow speaks the MLflow tracking protocol. Point mlflow at the
edgeflow server and start a run as you normally would; log_params
and log_metric behave exactly as they would against a vanilla
MLflow server.
The edgeflow-specific call is edgeflow.log_model. It serialises
the trained classifier to ONNX and bundles it with a postprocess
pipeline into a single artifact. Here the pipeline is just
ClassifierOutput, which maps the model’s raw probability vector to
{class_id, label, confidence} so the inference endpoint returns
something humans can read.
mlflow.set_tracking_uri(EDGEFLOW_SERVER)
exp = mlflow.set_experiment("iris-poc")
with mlflow.start_run(experiment_id=exp.experiment_id, run_name="iris-logistic") as run:
    mlflow.log_params(
        {
            "model": "LogisticRegression",
            "max_iter": 200,
            "n_features": 4,
            "n_classes": 3,
        }
    )
    mlflow.log_metric("accuracy", accuracy)
    edgeflow.log_model(
        model_bytes=sklearn_to_onnx(clf),
        postprocess=edgeflow.Pipeline(
            [edgeflow.ClassifierOutput(labels=list(iris.target_names))]
        ),
    )
    run_id = run.info.run_id
The bundled artifact is the unit edgeflow loads into an inference pod later. Both the ONNX bytes and the postprocess pipeline travel together; the pod has everything it needs from a single download.
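To make the postprocess step concrete, here is a plain-Python sketch of the kind of mapping described for ClassifierOutput: pick the most probable class and report its index, label, and probability. This is an illustration only, not edgeflow's implementation. The function name and the argmax rule are assumptions based on the {class_id, label, confidence} description above.

```python
def classifier_output(probabilities, labels):
    """Map a probability vector to {class_id, label, confidence} via argmax."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per class probability")
    # Index of the largest probability; ties resolve to the lowest index.
    class_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return {
        "class_id": class_id,
        "label": labels[class_id],
        "confidence": float(probabilities[class_id]),
    }
```

For example, classifier_output([0.9766, 0.0230, 0.0004], ["setosa", "versicolor", "virginica"]) yields class_id 0, label "setosa", and confidence 0.9766.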
Register and deploy#
The MLflow run is a record of the experiment. To make the model addressable, promote it into the registry, then point a target at it.
mv = edgeflow.register(run_id, "iris-classifier", server=EDGEFLOW_SERVER)
deployment = edgeflow.deploy(
    mv.name, mv.version, EDGEFLOW_TARGET, server=EDGEFLOW_SERVER, wait=True
)
register creates a versioned ModelVersion from the run.
deploy tells the iris-inference target to load that version, and
wait=True blocks until the inference pod confirms the new pipeline
is live, so by the time the script exits you can hit the endpoint.
Expected output:
training iris classifier...
accuracy: 0.9667
pushing to edgeflow at http://localhost:5000...
run_id: 1f2a...
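The accuracy figure is easy to sanity-check by hand. With 150 iris rows and test_size=0.2, the held-out set has 30 rows, so 0.9667 is what a four-decimal format shows for 29 correct predictions out of 30. The walkthrough above does not show the print statements, so the format string below is an assumption chosen to match the output line, and accuracy_line is a hypothetical helper.

```python
N_SAMPLES = 150          # rows in the iris dataset
N_TEST = N_SAMPLES // 5  # test_size=0.2 holds out 30 rows


def accuracy_line(n_correct: int, n_total: int = N_TEST) -> str:
    """Format an accuracy value the way the expected output shows it."""
    return f"accuracy: {n_correct / n_total:.4f}"
```

Here accuracy_line(29) gives "accuracy: 0.9667". One fewer correct prediction gives 0.9333, and a perfect score gives 1.0000.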
3. Send a request#
The inference endpoint accepts a JSON array of feature values (sepal length, sepal width, petal length, petal width).
curl -X POST http://localhost:8080/infer \
  -H 'Content-Type: application/json' \
  -d '[5.1, 3.5, 1.4, 0.2]'
You should get back something like:
{"class_id":0,"label":"setosa","confidence":0.9766}
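The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above; the function names are illustrative, and the response is assumed to be the single JSON object shown.

```python
import json
import urllib.request

INFER_URL = "http://localhost:8080/infer"


def build_infer_request(features, url=INFER_URL):
    """Build a POST request whose body is the JSON array of feature values."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(features, url=INFER_URL, timeout=5.0):
    """Send the features to the inference endpoint and decode the JSON reply."""
    request = build_infer_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, infer([5.1, 3.5, 1.4, 0.2]) should return a dict with the class_id, label, and confidence keys.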
Next steps#
Iris with preprocessing: ship transforms with the model - move feature normalisation off the client and into a WASM pre-transform that ships with the model.
Adult income: JSON input with mixed feature types - JSON input with encoded categorical features (named-input mode).
YOLOv8 on edgeflow: image input, WASM pre/post - real CV model, image input, k3d deployment.