We document the architecture because the buyers who matter ask. If your CTO wants a deeper read or a custom variant, write us. We will send the whitepaper and a sample repo.
L5
Telemetry & Ops
Health, latency, intent logs, alerts, fleet view, training feedback loop. The dashboard a real on-call engineer watches at 2am when something is off.
Prometheus
Grafana
Sentry
Custom audit log
PagerDuty / Opsgenie hooks
L4
Agent Layer
Task graph, brand voice, persona, safety envelope, escalation paths, handoff to humans. This is the layer that makes a robot feel like a coworker instead of a parrot.
Custom Python orchestrator
LangGraph patterns where they fit
Pydantic schemas
YAML personas
Eval harness with graded test set
L3
Models
Whatever model wins for the task at hand. Long-context reasoning here, fast tool-calling there, on-device speech for latency, on-device LLM when the network is unreliable.
Claude 4 family for reasoning
GPT-4o for tool calling
Llama 3 8B Q4 on Orin (fallback)
Whisper-small.en (ASR)
Piper / ElevenLabs (TTS)
Moondream 2 (VLM)
L2
Orchestration
The wiring between the SDK and the agent. Message bus, task scheduler, sensor fusion, the bits that ROS 2 does well and the bits we replace because ROS 2 does not.
ROS 2 Humble
micro-ROS bridge
DDS
Custom Python services
Redis for ephemeral state
PostgreSQL for episodic memory
L1
Vendor SDK
The hardware floor. Manufacturer SDK, low-level motor control, sensor I/O, on-robot networking, kill switch, charging dock protocol.
Unitree SDK 2 (low + high level)
Boston Dynamics SDK
Figure / 1X partner APIs
CAN bus + GPIO for retrofits
Operating principles
Five rules we will not break.
Models change. Interfaces should not. The agent layer talks to a model interface, not a vendor SDK. Swapping Claude for a local Llama is a config change, not a refactor.
Every agent has an escalation path. If the agent cannot answer with a confidence threshold, it hands off to a human with structured context. No silent failures.
Evals are not optional. Every deployment ships with a graded eval set. Regressions break the build, not the customer experience.
Telemetry is a first-class citizen. If we cannot see what happened in production, we did not deploy a system, we deployed a prayer.
The kill switch is hardware, not software. Anything that moves around humans gets a physical, visible, single-action stop. No exceptions.