Ephemeral Castle: Ansible Vault Detail

Level 3 (Detail) — Ansible role orchestration for the Hetzner Vault runtime.

Concept

The vault-runtime Ansible role (ansible/roles/vault-runtime/) handles all lifecycle operations on the Hetzner Vault VM: classification, PKI, bootstrap, restore, unseal, backup, and persistence.

Role Structure

ansible/roles/vault-runtime/
├── tasks/
│   └── main.yml           # Main orchestrator (~848 lines)
├── handlers/
│   └── main.yml           # Restart vault, reload systemd
├── files/
│   ├── vault-phase-a-bootstrap.sh    # First init
│   ├── vault-local-unseal.sh         # Auto unseal
│   ├── vault-snapshot-backup.sh      # S3 snapshot
│   └── vault-remote-restore.sh       # S3 restore
├── templates/
│   ├── vault.hcl.j2                          # Vault config
│   ├── lushycorp-vault.service.j2            # Main systemd unit
│   ├── vault-local-unseal.service.j2         # Unseal service
│   ├── vault-snapshot-backup.service.j2      # Backup service
│   ├── vault-snapshot-backup.timer.j2        # Backup timer
│   ├── vault-remote-restore.service.j2       # Restore service
│   └── lushycorp-vault.container.j2          # Quadlet (TD-016)
└── defaults/
    └── main.yml           # Variables and paths

Orchestration Flow

1. Classification Phase

The role classifies three domains before any lifecycle mutation:

TazPod domain (controller-side):

  • Checks 5 canonical files: init.json, unseal-keys.json, root-token.txt, admin-token.txt, admin-token.json
  • Classifies as: empty (0 files present), inconsistent (some files missing, or a metadata mismatch), or coherent (all 5 present with valid metadata)
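
The TazPod classification rule above can be sketched in Python. This is only an illustration, not the role's actual task logic: the function name `classify_tazpod` and the `metadata_ok` flag are hypothetical, and what counts as "valid metadata" is left to the caller because it is not specified here.

```python
# The five canonical files named in the list above.
CANONICAL_FILES = (
    "init.json",
    "unseal-keys.json",
    "root-token.txt",
    "admin-token.txt",
    "admin-token.json",
)


def classify_tazpod(present_files, metadata_ok=True):
    """Classify the controller-side domain from the file names found.

    present_files: iterable of file names that exist on the controller.
    metadata_ok:   hypothetical stand-in for the metadata check, whose
                   details are an assumption of this sketch.
    """
    found = set(present_files) & set(CANONICAL_FILES)
    if not found:
        return "empty"
    if len(found) == len(CANONICAL_FILES) and metadata_ok:
        return "coherent"
    # Some files are missing, or all are present but metadata mismatches.
    return "inconsistent"
```

For example, a directory holding only `init.json` would classify as inconsistent, which the guard phase treats as a hard failure.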

Local domain (VM-side):

  • Probes Vault status via podman exec vault status
  • Checks lifecycle receipt and data directory
  • Classifies as: empty, coherent, or inconsistent

Remote domain (S3):

  • Reads latest.json pointer from S3
  • Classifies as: empty, coherent, or inconsistent

2. Guard Phase

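The role applies the classification matrix. T, H, and S denote the TazPod, local (host), and remote (S3) domains; 0 means empty and 1 means coherent: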

  • inconsistent → hard fail immediately
  • T0 + H0 + S1 → hard fail (bootstrap anchor gap, TD-021)
  • T0 + H0 + S0 → fresh init allowed
  • T1 + H0 + S1 → restore from S3 allowed
  • T1 + H0 + S0 → hard fail
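
The matrix above can be expressed as a small lookup. The sketch below assumes that T, H, and S stand for the TazPod, local (host), and S3 domains and that 0 and 1 mean empty and coherent. The function name `guard` and the action labels are hypothetical. Combinations the list does not mention (for example, a coherent local domain) are reported as "unspecified" rather than guessed.

```python
VALID_STATES = {"empty", "coherent", "inconsistent"}

# (tazpod, local, remote) -> allowed action, per the list above.
ALLOWED = {
    ("empty", "empty", "empty"): "fresh-init",       # T0 + H0 + S0
    ("coherent", "empty", "coherent"): "restore",    # T1 + H0 + S1
}

# Combinations the list explicitly marks as hard failures.
HARD_FAIL = {
    ("empty", "empty", "coherent"): "bootstrap anchor gap (TD-021)",  # T0 + H0 + S1
    ("coherent", "empty", "empty"): "listed as hard fail",            # T1 + H0 + S0
}


def guard(tazpod, local, remote):
    """Return the allowed action, or raise RuntimeError on a hard fail."""
    states = (tazpod, local, remote)
    for state in states:
        if state not in VALID_STATES:
            raise ValueError(f"unknown state: {state!r}")
    # Any inconsistent domain fails before the matrix is consulted.
    if "inconsistent" in states:
        raise RuntimeError("hard fail: at least one domain is inconsistent")
    if states in HARD_FAIL:
        raise RuntimeError(f"hard fail: {HARD_FAIL[states]}")
    # Combinations not covered by the list are not decided by this sketch.
    return ALLOWED.get(states, "unspecified")
```

Keeping the allowed and failing combinations in two explicit tables makes each row of the matrix auditable against the list.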

3. Execution Phase

Fresh init (T0+H0+S0):

  1. vault-phase-a-bootstrap.sh: init Vault, create root token, create admin token, write lifecycle receipt
  2. Phase B: copy artifacts back to controller (~/secrets/lushycorp-vault/)
  3. Phase C: install the unseal shares and the lifecycle receipt on the host
  4. Enable and run vault-local-unseal.service
  5. Verify Vault is initialized and unsealed

Restore (T1+H0+S1):

  1. vault-remote-restore.sh: download S3 snapshot, restore via vault operator raft snapshot restore
  2. Run the same unseal and verification as in a fresh init (steps 4 and 5 above)

4. Post-Convergence

  • Validate Vault HTTPS endpoint: curl --resolve <fqdn>:8200:<tailnet-ip>
  • Run remote-durability-post.yml to finalize S3 surfaces
  • Fetch PKI certs and admin token from VM back to controller (ansible.builtin.fetch)
  • Check that the TazPod RAM vault is mounted (assert that mountpoint -q /home/tazpod/secrets succeeds)
  • Run tazpod save from /workspace to re-encrypt updated canonical files
  • Run tazpod push vault from /workspace to persist to S3
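
The endpoint check in the first bullet above pins the Vault FQDN to the tailnet address with curl's `--resolve host:port:addr` option, so TLS validation still runs against the FQDN. A minimal sketch of building that command follows. The function name, the example host and address, and the `/v1/sys/health` path are assumptions; the URL path the role actually probes is not stated here.

```python
def build_vault_check_command(fqdn, tailnet_ip, port=8200):
    """Build a curl argv that reaches the Vault FQDN via the tailnet IP."""
    # --resolve takes host:port:addr and bypasses DNS for that host/port.
    resolve = f"{fqdn}:{port}:{tailnet_ip}"
    # Assumption: Vault's health endpoint is used as the probe target.
    url = f"https://{fqdn}:{port}/v1/sys/health"
    return ["curl", "--silent", "--fail", "--resolve", resolve, url]


# Hypothetical values, for illustration only.
cmd = build_vault_check_command("vault.example.internal", "100.64.0.10")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises if curl exits non-zero because of `--fail`.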

See Also