Hermes Agent: LXC Deployment Architecture
Level 2 (Topic)
Concept
Hermes Agent runs as a bare-metal installation inside a hardened unprivileged Proxmox LXC container. The deployment follows the same enterprise pattern as the Hetzner Vault runtime: Terraform creates the LXC infrastructure, Ansible configures the internal state, and a top-level shell orchestrator (create.sh/destroy.sh) ties them together with structured logging and timing instrumentation.
The critical architectural challenge was achieving PVC-like data persistence on LVM-thin storage, which is not natively supported by Proxmox 9.1 for LXC containers (volumes matching vm-<vmid>-disk-* are always destroyed on container destroy). The solution is a Pet vs Cattle pattern: a dedicated “pet” container (CT 999, protection=1) that owns the persistent volume, mounted by ephemeral “cattle” containers (CT 105).
Architecture / Design
Container Model
- Cattle, CT 105 (Hermes Agent): Unprivileged Debian 12 LXC, managed by Terraform, destroyed/recreated on each cycle
- Pet, CT 999 (pet-storage): Minimal container (1 core, 256MB), protection=1, never destroyed. Owns persistent volumes
- Hermes runs as the hermes user (UID 10000, non-root, no sudo) inside CT 105
- features: nesting=true for compatibility with Hermes’ internal subprocess management
Storage Architecture
- Cattle rootfs: 20G on local-lvm (ephemeral, destroyed with CT 105)
- Pet rootfs: 2G on local-lvm (permanent, tied to CT 999)
- Persistent volume: 10G LVM-thin volume local-lvm:vm-999-disk-1, owned by CT 999, mounted into CT 105 at /home/hermes via API
- No backup/restore needed: Volume survives destroy/create because ownership is tied to CT 999, not CT 105
- Reusable: CT 999 can own multiple volumes for different cattle containers
Lifecycle
create.sh: Pet Ensure (2s) → Terraform Create (11s) → Wait SSH (15s) →
Attach Volume (8s) → Ansible Baseline (56s) → Agent (30s) →
Configure (6s) → Verify (9s) → TOTAL: 137s (2min 17s)
destroy.sh: Stop CT 105 → API detach mp0 → Terraform Destroy → total ~15s
(volume vm-999-disk-1 preserved on CT 999)
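The stage timings above can be reproduced with a small timed step runner. The following is a minimal sketch of how such an orchestrator might log and time each stage; run_step, format_duration and the log line format are hypothetical names chosen for illustration, not taken from the real create.sh.

```shell
#!/usr/bin/env bash
# Sketch of a timed, logged step runner (hypothetical helpers, not the real create.sh).
set -eu

TOTAL_SECONDS=0

# run_step NAME COMMAND [ARGS...]
# Runs the command, logs start and end with a UTC timestamp,
# and adds the elapsed wall-clock seconds to TOTAL_SECONDS.
run_step() {
  local name="$1"; shift
  local start end elapsed
  start=$(date +%s)
  printf '%s level=info step="%s" event=start\n' "$(date -u +%FT%TZ)" "$name"
  "$@"
  end=$(date +%s)
  elapsed=$(( end - start ))
  TOTAL_SECONDS=$(( TOTAL_SECONDS + elapsed ))
  printf '%s level=info step="%s" event=done seconds=%d\n' "$(date -u +%FT%TZ)" "$name" "$elapsed"
}

# format_duration SECONDS: render a total in the "137s (2min 17s)" style used above.
format_duration() {
  printf '%ds (%dmin %ds)\n' "$1" $(( $1 / 60 )) $(( $1 % 60 ))
}

# Cross-check of the documented create.sh stage timings (seconds, in lifecycle order).
stage_seconds="2 11 15 8 56 30 6 9"
sum=0
for s in $stage_seconds; do sum=$(( sum + s )); done
format_duration "$sum"   # prints: 137s (2min 17s)
```

In a real orchestrator each stage would be wrapped as, for example, run_step "Terraform Create" terraform apply -auto-approve, with the total printed from TOTAL_SECONDS at the end.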
Key Design Decisions
- Bare-metal install, not Docker: Docker in an unprivileged LXC falls back from the overlay2/ZFS storage drivers to vfs. Rootless Docker breaks network_mode: host. Running install.sh directly avoids both.
- No Bubblewrap sandbox: Bubblewrap requires CAP_SYS_ADMIN (dropped by hardening), and Proxmox AppArmor blocks mount(). LXC isolation alone was deemed sufficient.
- Pet vs Cattle, not bind-mount: The Proxmox API rejects bind-mounts for token-based auth (HTTP 403). The pet container pattern provides true persistence with only API token access.
- Volume attached via API on stopped container: --data-urlencode "mp0=local-lvm:vm-999-disk-1,mp=/home/hermes" on PUT /config, then start the container. Hotplug (mounting on a running container) fails, so the container must be stopped first.
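The stop, attach, start sequence from the last decision can be sketched as follows. This is an illustration under stated assumptions, not the project's actual script: the pve and attach_volume helpers, the PVE_URL, PVE_NODE, PVE_TOKEN and DRY_RUN variables, and the placeholder token value are all hypothetical. The sketch defaults to dry-run mode and only prints the calls it would make.

```shell
#!/usr/bin/env bash
# Sketch: attach the pet volume to the stopped cattle container, then start it.
# All variable and function names here are assumptions made for illustration.
set -eu

PVE_URL="${PVE_URL:-https://proxmox:8006/api2/json}"
PVE_NODE="${PVE_NODE:-tazlab}"
PVE_TOKEN="${PVE_TOKEN:-user@pam!tokenid=secret}"   # placeholder, not a real token
DRY_RUN="${DRY_RUN:-1}"                              # 1 = print calls, do not send them

# pve METHOD PATH [curl args...]
# Sends one API request, or prints the equivalent curl call in dry-run mode.
pve() {
  local method="$1" api_path="$2"
  shift 2
  if [ "$DRY_RUN" = "1" ]; then
    echo "curl -X $method $PVE_URL$api_path $*"
    return 0
  fi
  curl -sk --fail -X "$method" "$PVE_URL$api_path" \
    -H "Authorization: PVEAPIToken=$PVE_TOKEN" "$@"
}

# attach_volume VMID VOLUME MOUNTPOINT
attach_volume() {
  local vmid="$1" volume="$2" mountpoint="$3"
  # The container must be stopped: hotplug attach fails on a running container.
  pve POST "/nodes/$PVE_NODE/lxc/$vmid/status/stop"
  # Note: stop is asynchronous; a real script would wait for the task to finish here.
  pve PUT "/nodes/$PVE_NODE/lxc/$vmid/config" \
    --data-urlencode "mp0=$volume,mp=$mountpoint"
  pve POST "/nodes/$PVE_NODE/lxc/$vmid/status/start"
}

attach_volume 105 "local-lvm:vm-999-disk-1" "/home/hermes"
```

Setting DRY_RUN=0 with a valid token would send the three requests for real; the detach path in destroy.sh would follow the same pattern with -d "delete=mp0" on the config endpoint.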
Pet Container (CT 999) Details
- Managed by separate terraform-pet/main.tf (independent state)
- protection = true prevents any accidental deletion via API or GUI
- 1 core, 256MB RAM, 2G rootfs
- Single purpose: own and serve the persistent 10G volume vm-999-disk-1
- Must be created ONCE (terraform apply), then never modified
Research History
Three research sessions confirmed the Proxmox 9.1 LVM-thin behavior:
- API mount syntax: --data-urlencode required for values with commas. size= causes errors on existing volumes. delete=mp0 works for detach.
- LVM ownership: Proxmox always scans storage for vm-<vmid>-disk-* during container destroy. The delete_unreferenced_disks_on_destroy flag doesn’t exist for LXC in bpg/proxmox v0.106.
- Proxmox 9.1 specifics: destroy-unreferenced-disks=0 ignored for LXC. Bind-mounts blocked (403 for tokens). No API rename endpoint. ZFS not available on this host.
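To make the first finding concrete, the sketch below shows the request body that curl's --data-urlencode produces for the mount value: everything after the first = is percent-encoded, including the comma and the inner =. The urlencode function is a hypothetical helper written for illustration; it handles ASCII input only and is not part of the project scripts.

```shell
#!/usr/bin/env bash
# Sketch: percent-encode a value the way curl --data-urlencode does for ASCII input.
# urlencode is a hypothetical helper, shown only to make the encoded body visible.
set -eu

urlencode() {
  local rest="$1" out="" c
  while [ -n "$rest" ]; do
    c="${rest%"${rest#?}"}"   # first character of the remaining input
    rest="${rest#?}"          # drop that character
    case "$c" in
      # Unreserved characters pass through unchanged.
      [A-Za-z0-9._~-]) out="$out$c" ;;
      # Everything else becomes %XX (uppercase hex of the character code).
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# The form field name stays as-is; only the value is encoded.
printf 'mp0=%s\n' "$(urlencode 'local-lvm:vm-999-disk-1,mp=/home/hermes')"
# prints: mp0=local-lvm%3Avm-999-disk-1%2Cmp%3D%2Fhome%2Fhermes
```

The detach call needs no encoding because delete=mp0 contains only unreserved characters in its value, which is consistent with plain -d working there.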
Full research assets are in AGENTS.ctx/crisp-build/assets/.
Iteration History
| Iteration | Feature | Status |
|---|---|---|
| 1 | LXC base + networking + SSH | Done |
| 2 | Hermes bare-metal install (Playwright dep pre-install, --skip-setup) | Done |
| 3 | Configuration (opencode-go backend, systemd units, dashboard :9119) | Done |
| 4a | Managed volume + backup/restore | Deprecated |
| 4b | Pet vs Cattle persistence (current) | Done |
| 5 | Bubblewrap sandbox | Cancelled |
Known Issues / Technical Debt
- Dashboard Web UI npm build may fail on first start. If it does, run cd web && npm install && npm run build manually inside the container
- CT 999 has protection=1, so protection must be disabled before any maintenance operation
- Volume attach requires the container to be stopped (hotplug not supported)
Reference Blocks
Create API call (attach existing pet volume)
curl -sk -X PUT "https://proxmox:8006/api2/json/nodes/tazlab/lxc/105/config" \
--data-urlencode "mp0=local-lvm:vm-999-disk-1,mp=/home/hermes"
Destroy API call (detach, preserve volume)
curl -sk -X PUT "https://proxmox:8006/api2/json/nodes/tazlab/lxc/105/config" \
-d "delete=mp0"
ansible.cfg SSH keepalive
[ssh_connection]
ssh_args = -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ConnectTimeout=10
See Also
- Parent hub: Hermes Agent
- Infrastructure layer: ephemeral-castle
- Operator environment: tazpod