ADR-001: System Architecture#

Status: Accepted

Date: 2026-03-28 (revised from 2026-03-24)

Context#

Edgeflow aims to be an MLOps platform purpose-built for edge device fleets: robots, drones, industrial hardware, autonomous systems. The core problem it solves:

Teams deploying ML models to physical devices have no standard tooling to know what model version is running on which device, detect degradation in production, and safely roll back across a fleet.

Existing tools (MLflow, Weights & Biases, BentoML) solve parts of this problem but assume a cloud or datacenter context. None provide a coherent story for edge-constrained environments where devices may be offline, have limited resources, and are managed as a fleet rather than individually.

The system must:

Run the server on constrained hardware (single binary, low memory footprint)
Communicate reliably with devices on flaky or intermittent network connections
Be compatible with existing MLflow Python clients to lower adoption friction
Not require additional infrastructure to get started (broker, message queue, sidecar)
Provide a path toward a native richer API as the product matures

Decision#

The system is two components with separate responsibilities. A third component (device manager / lifecycle supervisor) was considered and rejected - see Alternatives Considered.

1. edgeflow-server#

The central control plane. Runs in the infrastructure (bare metal, VM, or k3s cluster).

Responsibilities:

Experiment tracking (MLflow-compatible API surface)
Model registry (versioning, staging, promotion)
Deployment orchestration (targeting devices and fleets)
Telemetry ingestion and storage
Drift detection evaluation
Alert rule management
Serving the Svelte UI as static files
Embedded MQTT broker (rumqttd) for device communication

Stack: Rust + Axum + SQLite (swappable via Store trait) + tower-http + rumqttd

Key design choice: the MQTT broker is embedded in the server binary via rumqttd. No separate broker process is required. For deployments that already operate a managed broker (AWS IoT, HiveMQ, Mosquitto), the embedded broker can be disabled and an external URL configured instead:

[mqtt]
broker = "embedded"          # default - runs rumqttd inside the server process
# broker = "mqtt://your-broker.example.com:1883"   # external alternative

2. edgeflow-inference#

A standalone process that runs on each target (edge device, VM, k8s pod, or bare metal server). Responsible for model execution and, in managed mode, for its own lifecycle coordination with the server.

Responsibilities:

Load and run ONNX models via ort (ONNX Runtime)
Expose a local HTTP endpoint (POST /infer) for the application to call
Collect input/output telemetry and latency metrics
Detect input distribution drift against a stored baseline

In managed mode (when [server] is configured):

Register with edgeflow-server on startup
Listen for deployment instructions (MQTT subscription or HTTP poll fallback)
Download model artifacts from the registry
Hot-swap the loaded model without downtime
Confirm deployment status back to the server

Stack: Rust + ort + minimal HTTP server (no axum, weight matters on device) + optional MQTT client

Two modes, one binary:

# Standalone - pure inference server, no server communication
# (omit [server] block entirely)

# Managed - full lifecycle coordination
[server]
url  = "http://edgeflow-server:5000"
mqtt = "mqtt://edgeflow-server:1883"   # optional; falls back to HTTP poll if omitted

Key design choice: edgeflow-inference is useful standalone, without edgeflow-server. A user can run it as a pure ONNX inference server and never use the rest of the platform. This lets users adopt the inference component without committing to the rest of edgeflow. Process supervision (restart on crash) is delegated to the OS layer (systemd, Docker restart policy) rather than implemented as a third binary.

Communication Architecture#

Transport evolution:

The managed communication path can evolve without breaking changes:

HTTP pull (v1, no extra infrastructure) - inference polls server for pending deployments. Works through NAT and firewalls. Acceptable latency for model deployment (not control-plane timing-critical).
MQTT (v2, embedded broker) - inference subscribes to its deployment topic. Server publishes instructions immediately on deployment creation. Better for large fleets and real-time rollback instructions.

Both transports can coexist. Devices without MQTT connectivity fall back to HTTP poll automatically.

MQTT Topic Structure#

edgeflow/targets/{target_id}/deploy          ← server → inference (deployment instruction)
edgeflow/targets/{target_id}/deploy/ack      ← inference → server (accepted, loading)
edgeflow/targets/{target_id}/deploy/status   ← inference → server (deployed / failed)
edgeflow/targets/{target_id}/telemetry       ← inference → server

API Surface#

Native API (product identity)#

/api/v1/targets/**       ← target registration, pending deployments
/api/v1/deployments/**
/api/v1/models/**
/api/v1/experiments/**
/api/v1/runs/**
/api/v1/alerts/**

MLflow Compatibility Shim (adoption layer, frozen surface)#

/api/2.0/mlflow/experiments/**
/api/2.0/mlflow/runs/**
/api/2.0/mlflow/metrics/**
/api/2.0/mlflow/artifacts/**

The shim is a translation layer over the native data model, not a separate implementation. The native data model is designed first; the shim is built on top of it.

Model Format#

ONNX is the sole supported model format for deployment.

Rationale:

Universal export target from PyTorch, TensorFlow, and scikit-learn
ort (ONNX Runtime Rust bindings) is actively maintained
The ONNX graph contains full I/O schema (tensor names, shapes, dtypes) which edgeflow uses to auto-generate inference wrappers and telemetry configuration without user configuration
Enables hardware-specific optimization (CUDA, TensorRT, CoreML) via ORT execution providers without changing the model artifact

Drift Detection Strategy (PoC)#

For the initial implementation: running mean/std shift on input tensors.

Baseline distribution established from training data samples pushed with the model artifact
Agent computes running statistics on live inputs
Drift flagged when input distribution exceeds N standard deviations from baseline
PSI (Population Stability Index) and KL Divergence added in subsequent iterations

Key Lineage Chain#

Edgeflow’s core value is full lineage from training to device:

This chain must be queryable end to end. A user must be able to ask:

“What training run produced the model on robot-07?”
“Which devices are running a model trained on dataset version X?”
“What changed between the model on robot-07 and robot-12?”

Consequences#

Positive:

Two-component design keeps the operational surface minimal - one server binary, one inference binary per device
edgeflow-inference can be adopted on its own, useful as a pure ONNX server
Embedded MQTT broker means zero extra infrastructure for the common case
HTTP poll fallback means the system works through NAT and firewalls without extra setup
MQTT handles intermittent edge connectivity (QoS, persistent sessions, offline buffering) when available
MLflow shim provides immediate compatibility with existing Python training code
Single Rust binary for the server is easy to deploy on constrained hardware
Store trait abstraction allows SQLite → Postgres migration without changing business logic

Negative / risks:

edgeflow-inference now carries both inference logic and lifecycle coordination - these need clean internal separation so standalone mode keeps working without the server
ONNX-only model format excludes teams using TensorFlow SavedModel or TorchScript directly; acceptable tradeoff for PoC, revisit at v1
Maintaining MLflow API compatibility is ongoing work as MLflow evolves; treat the shim as frozen at the current API surface, do not track MLflow HEAD

Alternatives Considered#

Separate ``edgeflow-device`` binary (device manager)

Rejected. The responsibilities originally assigned to a separate device manager - registration, artifact download, hot-swap, deployment confirmation - are already implemented in edgeflow-inference (pod health is observed via the k8s API instead). The only remaining concern (process restart on crash) is better handled by systemd or Docker restart policies than by a custom-written process supervisor. A third binary adds operational surface (a third systemd unit, a third thing to monitor and update) with no architectural benefit.

gRPC instead of MQTT for device communication

Rejected. gRPC assumes stable connections. MQTT’s QoS levels and persistent sessions are designed for exactly the intermittent connectivity pattern of edge devices.

REST polling instead of MQTT

Not fully rejected - HTTP poll is the v1 transport precisely because it requires no extra infrastructure and works through NAT. MQTT is the v2 transport that replaces polling once the embedded broker is in place. The two coexist: devices without MQTT connectivity continue to poll.

External MQTT broker (Mosquitto, HiveMQ) as a required dependency

Rejected for the default case. Early adopters should not have to operate a broker to get started. rumqttd embedded in the server binary provides a real MQTT broker with zero operational overhead. External brokers remain supported for production deployments that already have one.