Why Three Components
How Basalt's three-component architecture (gateway, agent, and image service) avoids the complexity tax of microservice infrastructure platforms.
Infrastructure platforms tend to accumulate components faster than they accumulate clarity. A scheduler becomes a control plane. The control plane grows a queue, an identity service, a policy service, a workflow service, a metadata service, and a fleet of sidecars that exist mostly to translate between the others. The result can scale, but it also turns ordinary operations into distributed-systems incidents.
Basalt takes a narrower shape: gateway, agent, and image service. That is enough separation to keep concerns honest, but not so much separation that the platform needs its own platform team to run it.
The gateway owns truth
The gateway is the API server. It is Axum-based, serves the portal UI, exposes the REST endpoints, authenticates users and automation, and owns the Postgres database. That last part is the important part. Basalt does not split infrastructure truth across a scheduler database, a VM database, a network database, and a workflow database. There is one source of truth for tenants, projects, hosts, virtual machines, networks, storage pools, image references, migrations, audit events, and long-running tasks.
A single database is not a lack of ambition. It is an operational constraint chosen on purpose. Many infrastructure failures begin as disagreement between services: the scheduler thinks a VM moved, the network service still has the old binding, the storage service has a half-finished volume attachment, and the UI is showing whichever component answered last. Basalt avoids that class of failure by making state transitions durable in one place and making every worker reconcile from that truth.
Long-running work still exists. Creating a VM, importing an image, or migrating a workload cannot be reduced to a single SQL statement. Basalt handles those operations through a durable task executor with progress tracking. The task record gives operators and users a stable object to inspect, retry, or audit. The task system is not a separate workflow microservice with its own database; it is part of the same control-plane model, backed by the same state that describes the resources being changed.
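To make "backed by the same state" concrete, here is a minimal sketch of a task row and a resource row changing inside one transaction. It is not Basalt's schema: the table names, columns, and function names are hypothetical, and an in-memory SQLite database stands in for the gateway's Postgres.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for the gateway's Postgres database.
    # The schema below is hypothetical, chosen only for illustration.
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE vms (id TEXT PRIMARY KEY, state TEXT NOT NULL);
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            vm_id TEXT NOT NULL REFERENCES vms(id),
            kind TEXT NOT NULL,
            status TEXT NOT NULL,
            progress INTEGER NOT NULL
        );
        """
    )
    return db


def start_vm_create(db: sqlite3.Connection, vm_id: str) -> int:
    # One transaction: the VM row and its task row commit together,
    # or neither is written if an exception is raised.
    with db:
        db.execute("INSERT INTO vms (id, state) VALUES (?, 'creating')", (vm_id,))
        cur = db.execute(
            "INSERT INTO tasks (vm_id, kind, status, progress) "
            "VALUES (?, 'vm.create', 'running', 0)",
            (vm_id,),
        )
        return cur.lastrowid


def report_progress(db: sqlite3.Connection, task_id: int, percent: int) -> None:
    with db:
        db.execute("UPDATE tasks SET progress = ? WHERE id = ?", (percent, task_id))


def finish_vm_create(db: sqlite3.Connection, task_id: int) -> None:
    # The task outcome and the resource state change atomically, so a reader
    # never sees a succeeded task next to a VM that is still 'creating'.
    with db:
        db.execute(
            "UPDATE tasks SET status = 'succeeded', progress = 100 WHERE id = ?",
            (task_id,),
        )
        db.execute(
            "UPDATE vms SET state = 'running' "
            "WHERE id = (SELECT vm_id FROM tasks WHERE id = ?)",
            (task_id,),
        )
```

Because both rows live in one database, there is no window in which a workflow store and a resource store disagree about whether the VM exists.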
The agent reconciles reality
The agent is a per-host daemon. It manages the host-local mechanics: KVM and libvirt domains, Open vSwitch networking, storage pools, and the runtime facts that only the host can observe. It does not receive a stream of fragile imperative instructions such as “create tap device, then attach bridge port, then define this domain, then start it” and hope that every step finishes in order.
Instead, the agent reconciles manifests. The gateway tells it what should exist. The agent compares that desired state with what actually exists on the host and converges the host toward the manifest. If a VM is defined but stopped, the agent can start it. If a network attachment exists in the database but the local OVS port is missing, the agent can recreate it. If a host restarts halfway through an operation, reconciliation runs again from durable state instead of trying to resume an in-memory procedure.
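The compare-and-converge step can be sketched in a few lines. This is an illustration, not the agent's code: `VmSpec`, `plan`, and `apply` are hypothetical names, the manifest is reduced to "defined" and "running", and `apply` mutates a dictionary where a real agent would call libvirt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    name: str
    running: bool


def plan(desired: dict[str, VmSpec], observed: dict[str, VmSpec]) -> list[tuple[str, str]]:
    """Compute the actions that move the host from `observed` to `desired`."""
    actions: list[tuple[str, str]] = []
    for name, want in sorted(desired.items()):
        have = observed.get(name)
        if have is None:
            actions.append(("define", name))
            if want.running:
                actions.append(("start", name))
        elif want.running and not have.running:
            actions.append(("start", name))
        elif not want.running and have.running:
            actions.append(("stop", name))
    # Anything on the host that the manifest does not mention is removed.
    for name in sorted(observed.keys() - desired.keys()):
        if observed[name].running:
            actions.append(("stop", name))
        actions.append(("undefine", name))
    return actions


def apply(observed: dict[str, VmSpec], actions: list[tuple[str, str]]) -> dict[str, VmSpec]:
    """Simulate the host after the actions run (stand-in for hypervisor calls)."""
    host = dict(observed)
    for verb, name in actions:
        if verb == "define":
            host[name] = VmSpec(name, running=False)
        elif verb == "start":
            host[name] = VmSpec(name, running=True)
        elif verb == "stop":
            host[name] = VmSpec(name, running=False)
        elif verb == "undefine":
            del host[name]
    return host
```

The plan is computed from observed state rather than from a record of steps already taken. Running it again after a crash or reboot therefore yields only the remaining work, and an empty plan once the host matches the manifest.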
This is the difference between orchestration and reconciliation. Imperative orchestration optimizes for the happy path: perform these steps in this order. Manifest-driven reconciliation optimizes for the real path: observe, compare, and converge until reality matches intent. It accepts that hosts reboot, libvirt commands fail, storage pools disappear temporarily, and network state can drift. The recovery mechanism is the normal mechanism, not a separate disaster path.
Basalt’s stuck-migration reconciler is a concrete example. Live migration touches two hosts, hypervisor state, storage assumptions, and control-plane records. A purely imperative system has to answer difficult questions after a timeout: which step completed, which side owns the VM, and which service is authoritative? In Basalt, migration state is represented durably, and the agent-side reconciliation logic can detect a migration that stopped making progress and recover it according to observed host state and the gateway’s intended outcome. The platform does not need a human to inspect three service logs and decide which component won.
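The shape of such a reconciler can be sketched as a pure decision function. Everything here is an assumption made for illustration: the `Migration` fields, the `STALL_SECONDS` threshold, the `running_on` observation, and the four outcomes are hypothetical, and Basalt's actual recovery policy is not described in this article.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: how long without progress before a migration
# is treated as stuck.
STALL_SECONDS = 300.0


@dataclass(frozen=True)
class Migration:
    vm: str
    source: str
    target: str
    last_progress_at: float  # seconds, on the same clock as `now`


def resolve(m: Migration, running_on: Optional[str], now: float) -> str:
    """Decide what to do with a migration, based on where the VM is observed.

    `running_on` is the host on which the VM was actually observed running,
    or None if it was observed on neither side.
    """
    if now - m.last_progress_at < STALL_SECONDS:
        return "wait"  # still making progress; nothing to recover
    if running_on == m.target:
        return "complete"  # VM is live on the target: finish the bookkeeping
    if running_on == m.source:
        return "rollback"  # VM never left the source: abort and clean up
    return "escalate"  # observed on neither host: needs an operator
```

The decision depends only on durable migration state and observed host state, so it can run on every reconciliation pass without knowing which imperative step last succeeded.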
The image service isolates blob mechanics
Images are different from ordinary control-plane rows. They are large, transferred in chunks, retried across unreliable networks, and often moved between machines that should not receive broad database credentials. Basalt keeps those mechanics in the image service.
The image service is a blob store backed by the local filesystem. It supports chunked-resume uploads, so a large image import does not have to restart from byte zero after a client interruption. Transfers use single-use HMAC-signed tokens, giving the gateway a way to authorize a specific movement of bytes without turning the blob endpoint into a general-purpose credential. Completion callbacks are written to a durable spool and retried with exponential backoff, so a transient gateway outage does not strand an otherwise successful upload.
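A single-use HMAC-signed token can be sketched with the standard library alone. The token layout, field names, and the `issue_token` and `TokenVerifier` names are hypothetical, not Basalt's wire format. The sketch also assumes that the gateway and the image service share a secret key, that image IDs never contain the `|` separator, and that keeping spent nonces in memory is acceptable (a real service would need to persist them).

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def _sign(key: bytes, payload: str) -> str:
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(key: bytes, image_id: str, action: str,
                ttl: int = 300, now: Optional[float] = None) -> str:
    """Gateway side: authorize one specific transfer for a limited time."""
    now = time.time() if now is None else now
    nonce = secrets.token_hex(16)
    payload = f"{image_id}|{action}|{int(now) + ttl}|{nonce}"
    return f"{payload}|{_sign(key, payload)}"


class TokenVerifier:
    """Image-service side: check scope, expiry, and single use."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._used: set[str] = set()  # spent nonces (in-memory for the sketch)

    def redeem(self, token: str, image_id: str, action: str,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        payload, _, sig = token.rpartition("|")
        # Constant-time comparison of the presented and expected signatures.
        if not hmac.compare_digest(sig.encode(), _sign(self._key, payload).encode()):
            return False
        parts = payload.split("|")
        if len(parts) != 4:
            return False
        tok_image, tok_action, expires, nonce = parts
        if (tok_image, tok_action) != (image_id, action):
            return False  # token was issued for a different transfer
        if now > int(expires):
            return False  # expired
        if nonce in self._used:
            return False  # already redeemed once
        self._used.add(nonce)
        return True
```

The token names one image and one action, so a leaked token authorizes at most that transfer, once, until it expires.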
This is a separate component because the failure profile is separate. Blob transfer wants resumability, token scoping, and filesystem durability. Control-plane authorization wants tenant context, RBAC, audit events, and relational consistency. Host reconciliation wants local privileges to manage hypervisor state. Collapsing all three into one binary would blur security boundaries; splitting them into a dozen services would add coordination tax without improving the model.
Why not microservices?
The usual argument for microservices is independent scale and independent deployment. That argument is strongest when teams are independently shipping unrelated domains with different load profiles. It is weaker for an infrastructure control plane where most operations are state transitions over the same resources.
A VM create request is not only a compute event. It involves identity, quota, project membership, image metadata, network selection, storage placement, task progress, audit logging, and host capability reasoning. Splitting those into separately deployed services means either duplicating state or introducing synchronous calls between every step. Once that happens, the platform must solve distributed transactions, retry storms, partial failure semantics, schema drift between services, and debugging across many traces. The user asked for a VM. The operator gets a consensus problem.
Basalt’s three-component design keeps the unavoidable boundaries and removes the optional ones. The gateway is the transactional brain. The agent is the privileged host actuator. The image service is the large-object transfer boundary. Capability reasoning happens where the platform has the full inventory of host facts and desired outcomes, not in a maze of services that each know one dimension. A host can advertise what it can actually run; the gateway can decide what should be placed there; the agent can enforce that decision locally.
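The division of labor in that last sentence can be illustrated with a toy placement function. The fact fields, the feature names, and the "most free memory" policy are hypothetical choices for this sketch and say nothing about how Basalt actually ranks hosts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostFacts:
    # What a host advertises about itself (hypothetical fields).
    name: str
    free_vcpus: int
    free_memory_mib: int
    features: frozenset[str]


@dataclass(frozen=True)
class VmRequest:
    vcpus: int
    memory_mib: int
    required_features: frozenset[str]


def place(req: VmRequest, hosts: list[HostFacts]) -> Optional[str]:
    """Gateway-side decision: pick a host that can actually run the VM."""
    candidates = [
        h for h in hosts
        if h.free_vcpus >= req.vcpus
        and h.free_memory_mib >= req.memory_mib
        and req.required_features <= h.features  # subset check
    ]
    if not candidates:
        return None
    # Toy policy: most free memory first, then name for a deterministic tie-break.
    best = min(candidates, key=lambda h: (-h.free_memory_mib, h.name))
    return best.name
```

The whole decision is one function over one inventory. The equivalent in a split design would be a chain of calls to services that each hold one of these fields.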
The operational property is simple: three components mean three things to deploy, monitor, secure, and upgrade. There is one database to back up and one schema to migrate. There are fewer version-skew combinations. Failure domains are explicit: API/database, host reconciliation, and blob transfer. Operators can reason about the system without first drawing a service map.
This does not make Basalt small. It makes the complexity visible. The hard parts of infrastructure platforms are still present: hypervisor lifecycle, OpenFlow and OVS state, storage pool behavior, live migration, tenant isolation, RBAC, image movement, task durability, and auditability. The design choice is to put those hard parts behind three stable responsibilities instead of hiding them behind a large number of internal APIs.
A control plane should be boring when nothing is wrong and legible when something is. Gateway, agent, and image service are Basalt’s answer to that requirement.