Ephemeral Castle Vault Bootstrap and Restore

The Vault runtime on Hetzner uses a “Classification” step to decide how to take a fresh VPS to a fully hydrated, unsealed Vault.

The Classification Matrix

The Ansible role vault-runtime (tasks in ansible/roles/vault-runtime/tasks/) classifies each of three domains into one of two states:

  1. TazPod (The Source):
    • empty: No keys found in the operator vault.
    • coherent: Keys exist and match the current lineage.
  2. Local (The Target):
    • empty: No Raft data or unseal shares on the VPS.
    • consistent: Local data matches TazPod lineage.
  3. S3 (The Backup):
    • empty: No snapshots found.
    • coherent: Valid Raft snapshots available for the lineage.
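The matrix above can be sketched as a small decision function. This is a minimal illustration, not the role's actual task logic: only the "restore" row (TazPod coherent, Local empty, S3 coherent) comes from this document. The Classification type, the plan function, and the "bootstrap", "converge", and "halt" outcomes are hypothetical names and assumed behaviors.

```python
from dataclasses import dataclass

EMPTY = "empty"
COHERENT = "coherent"      # state name used for TazPod and S3
CONSISTENT = "consistent"  # state name used for Local


@dataclass(frozen=True)
class Classification:
    """Hypothetical container for the three domain states."""
    tazpod: str
    local: str
    s3: str


def plan(c: Classification) -> str:
    """Map a classification to an action name (action names are illustrative)."""
    states = (c.tazpod, c.local, c.s3)
    if states == (COHERENT, EMPTY, COHERENT):
        # Documented case: the VPS is gone, but keys and backups survive.
        return "restore"
    if states == (EMPTY, EMPTY, EMPTY):
        # Assumption: nothing exists anywhere, so this is a first init.
        return "bootstrap"
    if (c.tazpod, c.local) == (COHERENT, CONSISTENT):
        # Assumption: local data already matches the lineage.
        return "converge"
    # Assumption: any other combination is ambiguous and needs an operator.
    return "halt"
```

For example, plan(Classification("coherent", "empty", "coherent")) selects the restore path. Failing closed on unlisted combinations is a design choice of this sketch, so a mismatched lineage is never overwritten automatically.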

Automated Restore Logic (C2)

If the Local VPS is destroyed but TazPod and S3 are both coherent, create.sh runs the following steps in order:

  • Metadata Sync: Reads the latest.json pointer from S3 to find the correct snapshot.
  • Binary Restore: Executes vault operator raft snapshot restore using a temporary local instance.
  • Passphrase Hydration: Rehydrates the unseal shares in /var/lib/lushycorp-vault/bootstrap/ from TazPod.
  • Final Unseal: Reruns the systemd-managed unseal helper to bring the leader node online.
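The first two steps above (Metadata Sync and Binary Restore) can be sketched as a function that turns the pointer into an ordered command plan. This is an illustration under stated assumptions, not the contents of create.sh: the pointer field snapshot_key, the bucket name, the working directory, and the use of the AWS CLI for the download are all hypothetical. Only the vault operator raft snapshot restore command comes from this document.

```python
import json


def restore_plan(pointer_json: str, bucket: str,
                 workdir: str = "/tmp/vault-restore") -> list[list[str]]:
    """Build the download and restore commands from a latest.json pointer.

    Nothing is executed here; the caller decides how to run each argv.
    """
    pointer = json.loads(pointer_json)
    key = pointer["snapshot_key"]  # hypothetical field name
    local_file = f"{workdir}/{key.rsplit('/', 1)[-1]}"
    return [
        # Assumption: the snapshot is fetched with the AWS CLI.
        ["aws", "s3", "cp", f"s3://{bucket}/{key}", local_file],
        # From the document: restore the Raft snapshot into the temporary instance.
        ["vault", "operator", "raft", "snapshot", "restore", local_file],
    ]
```

Returning the commands as data rather than running them keeps the sketch easy to inspect; hydrating the unseal shares and rerunning the unseal helper would follow as further steps.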

S3 Pointers and Rotation

  • Lineage ID: A unique UUID generated at first init (vault_lineage_id).
  • Global Path: vault/raft-snapshots/latest.json.
  • Lineage Path: vault/raft-snapshots/<lineage_id>/latest.json.
  • A/B Slots: Snapshot objects alternate between slot-a and slot-b, so a new upload never overwrites the most recent snapshot and a failed or partial write cannot corrupt it.

See Also