Quickstart: your first deploy in 2 minutes#

This tutorial gets you from zero to a live inference endpoint. We use the iris dataset because a classifier trains on it in under a second and it needs no heavyweight dependencies. Later tutorials move to real models.

You will:

  1. Bring up edgeflow locally with docker compose.

  2. Walk through a training script that pushes the model to edgeflow.

  3. Send a request to the live inference endpoint.

Prerequisites#

1. Bring up edgeflow#

Pull the quickstart compose file and start the stack. No clone needed - quickstart.yaml references pre-built images on GHCR.

curl -O https://raw.githubusercontent.com/jordandelbar/edgeflow/main/deploy/quickstart.yaml
docker compose -f quickstart.yaml up -d

Two containers start: the control-plane server on :5000 and an inference pod on :8080.
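
If you want to confirm that both containers are listening before moving on, a plain TCP probe is enough. This is a minimal sketch using only the Python standard library; port_open and stack_status are hypothetical helper names, not part of edgeflow, and an open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def stack_status(host: str = "localhost") -> dict[str, bool]:
    """Probe the two ports named in this tutorial (5000 and 8080)."""
    return {
        "control-plane (:5000)": port_open(host, 5000),
        "inference pod (:8080)": port_open(host, 8080),
    }
```

Once the stack is up, stack_status() should report True for both entries.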

2. Build the training script#

If you just want to see it work, the finished script is on GitHub:

curl -O https://raw.githubusercontent.com/jordandelbar/edgeflow/main/examples/01-quickstart-iris/train.py
uv run train.py

The rest of this section walks through train.py piece by piece.

Dependencies#

Create a pyproject.toml in your project directory with these dependencies:

[project]
name = "iris-tutorial"
version = "0.0.0"
requires-python = ">=3.12"
dependencies = [
    "edgeflow",
    "mlflow>=3.11.1,<4",
    "numpy>=2.4.4,<3",
    "scikit-learn>=1.8.0,<2",
]

Imports#

Three groups: edgeflow’s SDK, MLflow for experiment tracking, and scikit-learn for the model itself.

import os

import edgeflow
import mlflow
import numpy as np
from edgeflow.models import sklearn_to_onnx
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

edgeflow.models.sklearn_to_onnx is a small helper that wraps skl2onnx with sensible defaults. You can call skl2onnx directly if you need finer control.

Configuration#

Two values point the script at the edgeflow server and pick a deployment target. The defaults match what docker compose up exposes, so you only need to override them when running against a different setup.

EDGEFLOW_SERVER = os.environ.get("EDGEFLOW_SERVER", "http://localhost:5000")
EDGEFLOW_TARGET = os.environ.get("EDGEFLOW_TARGET", "iris-inference")
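
To make the override behaviour concrete, the sketch below wraps the same two lookups in a function that accepts any mapping, so you can see what the script would pick up for a given environment. resolve_config is a hypothetical name used only for illustration, and the 10.0.0.5 address in the usage note is a made-up example.

```python
import os


def resolve_config(env=os.environ):
    """Return (server, target), falling back to the tutorial defaults."""
    server = env.get("EDGEFLOW_SERVER", "http://localhost:5000")
    target = env.get("EDGEFLOW_TARGET", "iris-inference")
    return server, target
```

With nothing set, resolve_config({}) returns the two defaults. Passing {"EDGEFLOW_SERVER": "http://10.0.0.5:5000"} changes only the server and leaves the target at iris-inference, which is what happens when you export just one of the variables before running the script.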

Train the classifier#

Standard scikit-learn flow. The iris dataset ships with scikit-learn; cast the features to float32 because that’s what the ONNX exporter expects.

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data.astype(np.float32), iris.target, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

Nothing edgeflow-specific yet - this is the same code you would write to fit and evaluate any sklearn classifier.

Log to MLflow and bundle the model#

Edgeflow speaks the MLflow tracking protocol. Point MLflow at the edgeflow server and start a run as you normally would; log_params and log_metric behave exactly as they would against a vanilla MLflow server.

The edgeflow-specific call is edgeflow.log_model. It serialises the trained classifier to ONNX and bundles it with a postprocess pipeline into a single artifact. Here the pipeline is just ClassifierOutput, which maps the model’s raw probability vector to {class_id, label, confidence} so the inference endpoint returns something humans can read.

mlflow.set_tracking_uri(EDGEFLOW_SERVER)
exp = mlflow.set_experiment("iris-poc")

with mlflow.start_run(experiment_id=exp.experiment_id, run_name="iris-logistic") as run:
    mlflow.log_params(
        {
            "model": "LogisticRegression",
            "max_iter": 200,
            "n_features": 4,
            "n_classes": 3,
        }
    )
    mlflow.log_metric("accuracy", accuracy)
    edgeflow.log_model(
        model_bytes=sklearn_to_onnx(clf),
        postprocess=edgeflow.Pipeline(
            [edgeflow.ClassifierOutput(labels=list(iris.target_names))]
        ),
    )
    run_id = run.info.run_id

The bundled artifact is the unit edgeflow loads into an inference pod later. Both the ONNX bytes and the postprocess pipeline travel together; the pod has everything it needs from a single download.
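
To picture what the postprocess step does, here is a plain-Python sketch of the mapping from a probability vector to {class_id, label, confidence}. It illustrates the idea only and is not edgeflow's implementation: classifier_output is a hypothetical function, and rounding the confidence to four decimals is an assumption based on the sample response later in this tutorial.

```python
def classifier_output(probs, labels):
    """Map a probability vector to {class_id, label, confidence}."""
    if len(probs) != len(labels):
        raise ValueError("expected one probability per label")
    # Index of the highest probability is the predicted class.
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return {
        "class_id": class_id,
        "label": labels[class_id],
        # Assumption: four decimals, to mirror the sample response.
        "confidence": round(float(probs[class_id]), 4),
    }
```

For example, classifier_output([0.9766, 0.0233, 0.0001], ["setosa", "versicolor", "virginica"]) picks index 0 and labels it setosa.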

Register and deploy#

The MLflow run is a record of the experiment. To make the model addressable, promote it into the registry, then point a target at it.

mv = edgeflow.register(run_id, "iris-classifier", server=EDGEFLOW_SERVER)
deployment = edgeflow.deploy(
    mv.name, mv.version, EDGEFLOW_TARGET, server=EDGEFLOW_SERVER, wait=True
)

register creates a versioned ModelVersion from the run. deploy tells the iris-inference target to load that version, and wait=True blocks until the inference pod confirms the new pipeline is live - so by the time the script exits, you can hit the endpoint.
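
If you ever need a similar readiness check from a separate process, the usual pattern is a poll loop with a deadline. The sketch below is a generic illustration of that pattern, not how edgeflow implements wait=True; wait_until and is_live are hypothetical names, and is_live stands for whatever readiness check you have available.

```python
import time


def wait_until(is_live, timeout=30.0, interval=0.5):
    """Call is_live() repeatedly until it returns True or timeout elapses.

    Returns True if the check passed, False if the deadline was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_live():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A monotonic clock is used for the deadline so that system clock adjustments cannot shorten or extend the wait.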

Expected output (the finished train.py prints progress messages that the snippets above leave out):

training iris classifier...
accuracy: 0.9667
pushing to edgeflow at http://localhost:5000...
run_id: 1f2a...

3. Send a request#

The inference endpoint accepts a JSON array of the four feature values in the order the model was trained on: sepal length, sepal width, petal length, petal width (all in cm).

curl -X POST http://localhost:8080/infer \
     -H 'Content-Type: application/json' \
     -d '[5.1, 3.5, 1.4, 0.2]'

You should get back something like:

{"class_id":0,"label":"setosa","confidence":0.9766}
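
The same request can be sent from Python with only the standard library. This is a minimal sketch that assumes the URL, the JSON array body, and the response shape shown above; build_infer_request and infer are hypothetical helper names, not part of the edgeflow SDK.

```python
import json
import urllib.request


def build_infer_request(features, url="http://localhost:8080/infer"):
    """Build a POST request whose body is a JSON array of floats."""
    body = json.dumps([float(x) for x in features]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(features, url="http://localhost:8080/infer", timeout=5.0):
    """Send the features to the endpoint and return the decoded JSON reply."""
    req = build_infer_request(features, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the stack running, infer([5.1, 3.5, 1.4, 0.2]) should return a dict with the class_id, label, and confidence keys.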

Next steps#