Ephemeral Castle: Create/Destroy Detail

Level 3 (Detail) — Hetzner Vault runtime provisioning pipeline.

Concept

The Hetzner Vault runtime is provisioned by an 8-stage pipeline (create.sh) and torn down by a 5-step destroy process (destroy.sh). Each create stage writes its own timestamped log file under logs/; destroy.sh logs to a single file (*-90-destroy.log).

Create Pipeline

File: runtimes/lushycorp-vault/hetzner/create.sh (228 lines)

Stage 1: Terraform (VM)

Log: *-10-terraform.log

  • cd terraform/ && terraform init && terraform apply
  • Variables: hcloud_token, image_id, image_name
  • Creates: Hetzner VPS from golden image, firewall, initial network
  • Outputs: server IP, public inventory file (inventory.public.ini)
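
A minimal sketch of how Stage 1 might hand its three variables to Terraform. Terraform reads any environment variable named TF_VAR_<name> as the input variable <name>. The function name, the argument order, and the -input=false / -auto-approve flags are assumptions for illustration and are not taken from create.sh.

```shell
# Hypothetical helper: run Stage 1 from the runtime directory.
# Usage: stage1_terraform <hcloud_token> <image_id> <image_name>
# The body runs in a subshell, so the cd and the exports do not leak out.
stage1_terraform() (
  # Terraform maps TF_VAR_<name> to the input variable <name>.
  export TF_VAR_hcloud_token="$1"
  export TF_VAR_image_id="$2"
  export TF_VAR_image_name="$3"

  cd terraform/ || exit 1
  terraform init -input=false &&
    terraform apply -input=false -auto-approve
)
```

Passing the token through the environment keeps it out of the process argument list, which is one reason to prefer TF_VAR_ variables over -var flags for secrets.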

Stage 2: Public Bootstrap (SSH + Tailscale)

Log: *-20-public-bootstrap.log

  • Waits for SSH reachability (up to 30 attempts, 10s apart)
  • Runs Ansible playbook tailscale-bootstrap.yml via public IP
  • Tailscale OAuth credentials passed as env vars
  • Installs Tailscale on the VM, registers with tags tag:tazlab-vault, tag:vault-api
  • Hostname registered as lushycorp-vault
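
The SSH wait described above is a bounded retry loop. The sketch below shows one way to write it; the function name retry and the root login in the example comment are assumptions, not details from create.sh.

```shell
# Retry a command until it succeeds or the attempt budget is spent.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() (
  max=$1; delay=$2; shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      exit 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
)

# Example matching Stage 2 (30 attempts, 10s apart); not executed here:
#   retry 30 10 ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$SERVER_IP" true
```

With these numbers the stage waits at most about five minutes (29 sleeps of 10s plus connection timeouts) before failing.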

Stage 3: Tailscale Validation

Log: *-30-tailscale-validation.log

  • Verifies operator-side tailscale status and tailscale ip -4
  • Runs validate-device-tags.sh to confirm lushycorp-vault has required tags
  • Outputs tag validation JSON to configs/tailscale-tags.json
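
The contents of validate-device-tags.sh are not shown here, so the following is only a sketch of the core check: every required tag must appear in the device's tag list. The function name and the space-separated input format are assumptions; how the tags are fetched from Tailscale is left out.

```shell
# Succeed only if every required tag appears in the device's tag list.
# Usage: has_required_tags "<space-separated device tags>" <required_tag>...
has_required_tags() (
  device_tags=" $1 "; shift
  for required in "$@"; do
    case "$device_tags" in
      *" $required "*) ;;   # tag present, keep checking
      *) echo "missing tag: $required" >&2; exit 1 ;;
    esac
  done
)

# Example with the two tags named in Stage 2; not executed here:
#   has_required_tags "$DEVICE_TAGS" tag:tazlab-vault tag:vault-api
```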

Stage 4: Transport Switch

Log: *-40-transport-switch.log

  • render-tailscale-inventory.sh generates Ansible inventory with tailnet IP
  • Verifies SSH over Tailscale transport (up to 6 retries, 5s apart)
  • This is the critical transition from public-IP SSH to tailnet-only SSH
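
To make the transport switch concrete, here is a sketch of the kind of inventory the renderer could emit once the tailnet IP is known. The group name [vault], the root user, and the function name are assumptions; ansible_host, ansible_user, and ansible_ssh_private_key_file are standard Ansible inventory variables.

```shell
# Print a one-host Ansible INI inventory that targets the tailnet IP.
# Usage: render_tailnet_inventory <host_alias> <tailnet_ip> <ssh_key_path>
render_tailnet_inventory() {
  printf '[vault]\n'
  printf '%s ansible_host=%s ansible_user=root ansible_ssh_private_key_file=%s\n' \
    "$1" "$2" "$3"
}

# Example with a placeholder tailnet address; not executed here:
#   render_tailnet_inventory lushycorp-vault 100.64.0.10 \
#     "$HOME/secrets/ssh/lushycorp-vault/id_ed25519" > inventory.tailnet.ini
```

Because only ansible_host changes between the public and tailnet inventories, later playbooks can run unmodified after the switch.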

Stage 5: Podman Verify

Log: *-50-podman-verification.log

  • Runs ansible-playbook common.yml over tailnet transport
  • Verifies Podman is installed and functional on the target VM

Stage 6: Vault Runtime Install

Log: *-60-vault-runtime-install.log

  • Runs ansible-playbook vault-runtime-install.yml over tailnet
  • Vault binary + config installation, systemd unit setup, golden image validation

Stage 7: Vault Runtime Converge

Log: *-70-vault-runtime-converge.log

  • Runs ansible-playbook vault-runtime-converge.yml over tailnet
  • Classification, PKI generation, init/restore, unseal, health verification

Stage 8: Vault Runtime Post

Log: *-80-vault-runtime-post.log

  • Runs ansible-playbook vault-runtime-post.yml over tailnet
  • Admin token persistence, snapshot backup, TazPod encrypted archive refresh and S3 push

Timing Summary

After completion, create.sh prints a timing table showing elapsed seconds per stage.
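
The per-stage logging and the timing table can be sketched with a small stage runner. Everything named here (run_stage, print_timing_table, LOG_DIR, RUN_ID, the timestamp format) is hypothetical; the source only states that logs are timestamped and that elapsed seconds are printed per stage.

```shell
LOG_DIR=${LOG_DIR:-logs}
RUN_ID=${RUN_ID:-$(date +%Y%m%d-%H%M%S)}   # assumed timestamp format
TIMINGS=""

# Run one stage, send its output to a timestamped log, record elapsed seconds.
# Usage: run_stage <NN-name> <command> [args...]
run_stage() {
  stage=$1; shift
  mkdir -p "$LOG_DIR"
  started=$(date +%s)
  "$@" > "$LOG_DIR/$RUN_ID-$stage.log" 2>&1 || return 1
  elapsed=$(( $(date +%s) - started ))
  TIMINGS="${TIMINGS}${stage} ${elapsed}
"
}

# Print one row per completed stage.
print_timing_table() {
  printf '%-32s %s\n' STAGE SECONDS
  printf '%s' "$TIMINGS" | while read -r stage elapsed; do
    printf '%-32s %s\n' "$stage" "$elapsed"
  done
}

# Example; not executed here:
#   run_stage 10-terraform stage1_command && run_stage 20-public-bootstrap stage2_command
#   print_timing_table
```

A failed stage returns early without recording a timing, so the table lists only stages that completed.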

Preflight Checks

Before starting any stage, the script validates that:

  • TazPod vault is unlocked (/workspace/.tazpod/vault/vault.tar.aes)
  • Hetzner token exists (~/secrets/hetzner-token)
  • Tailscale OAuth credentials exist
  • SSH key exists (~/secrets/ssh/lushycorp-vault/id_ed25519)
  • Golden image env file exists (configs/golden-image.env)
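
A sketch of a preflight helper for the file-based checks above. It treats each check as a plain existence test and reports every missing path before failing; both choices are assumptions, since the source does not say how create.sh detects an unlocked vault or whether it stops at the first failure. The Tailscale OAuth check is omitted because its location is not given.

```shell
# Report every missing prerequisite, then fail if any were missing.
# Usage: preflight <path>...
preflight() (
  missing=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "preflight: missing $path" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
)

# Example with the paths listed above; not executed here:
#   preflight \
#     /workspace/.tazpod/vault/vault.tar.aes \
#     "$HOME/secrets/hetzner-token" \
#     "$HOME/secrets/ssh/lushycorp-vault/id_ed25519" \
#     configs/golden-image.env || exit 1
```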

Transport: Hostnet+TUN Auto-Detection

create.sh auto-detects the Tailscale socket at startup. If /dev/net/tun is available and a TUN-mode socket exists at /var/run/tailscale/tailscaled.sock, it uses that socket; otherwise it falls back to the legacy userspace socket under AGENTS.ctx/tools/tailscale/state/. The inventory renderer (render-tailscale-inventory.sh) performs the same TUN-mode detection and, when TUN mode is available, generates a direct-tailnet SSH inventory with no ProxyCommand.
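
The detection rule can be sketched as a single function. The variable names are hypothetical, and the userspace socket file name is an assumption: the source gives only its directory. The paths are overridable so the logic can be exercised without a real TUN device.

```shell
TUN_DEVICE=${TUN_DEVICE:-/dev/net/tun}
TUN_SOCKET=${TUN_SOCKET:-/var/run/tailscale/tailscaled.sock}
# Assumption: only the directory is documented; the file name is hypothetical.
USERSPACE_SOCKET=${USERSPACE_SOCKET:-AGENTS.ctx/tools/tailscale/state/tailscaled.sock}

# Print the TUN-mode socket if both the device and the socket exist,
# otherwise print the userspace fallback.
detect_tailscale_socket() {
  if [ -e "$TUN_DEVICE" ] && [ -S "$TUN_SOCKET" ]; then
    echo "$TUN_SOCKET"
  else
    echo "$USERSPACE_SOCKET"
  fi
}

# Example; not executed here:
#   tailscale --socket="$(detect_tailscale_socket)" status
```

The -S test requires a real Unix socket, so a stale regular file left at the TUN path does not trigger TUN mode.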

Timing

With hostnet+TUN mode and split playbooks, the full create cycle completes in ~344s (down from ~1200s with the old monolithic playbook). Typical per-stage timing: Terraform 4s, Public Bootstrap 16s, Tailscale Validation 1s, Transport Switch 9s, Podman Verify 11s, Vault Install 175s, Vault Converge 90s, Vault Post 38s.
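
As a quick check, the per-stage figures above can be summed to confirm the ~344s total and to see where the time goes. This is plain arithmetic on the quoted numbers, not output from create.sh.

```shell
# Sum the per-stage timings quoted above (seconds, stages 1 through 8).
total=0
for seconds in 4 16 1 9 11 175 90 38; do
  total=$((total + seconds))
done

echo "total: ${total}s"                                            # total: 344s
echo "saved vs ~1200s: $((1200 - total))s"                         # saved vs ~1200s: 856s
echo "install+converge share: $(( (175 + 90) * 100 / total ))%"    # install+converge share: 77%
```

The two Vault playbooks (install and converge) account for roughly three quarters of the run, so they are the natural place to look for further savings.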

Destroy Pipeline

File: runtimes/lushycorp-vault/hetzner/destroy.sh (188 lines)

Log: *-90-destroy.log

1. Delete Hetzner server (by name)
2. Wait for server to disappear (30 attempts, 2s apart)
3. Delete Hetzner firewall (by name)
4. Delete Tailscale device (by hostname via OAuth API)
5. Cleanup local artifacts (inventories, terraform state)

Destroy Order

  1. delete_server_fast: Finds server by name (lushycorp-vault) via hcloud server list, deletes all matching IDs
  2. wait_server_gone: Polls until server is no longer listed
  3. delete_firewall_if_present: Finds and deletes firewall named <server_name>-bootstrap
  4. delete_tailscale_device: Obtains OAuth token, lists devices, finds device by hostname lushycorp-vault, deletes it via Tailscale API
  5. cleanup_local_artifacts: Removes generated inventories, terraform state, runtime metadata, tag validation JSON
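
The first two steps (find by name, then poll until gone) can be sketched as below. The helper names ids_for_name and wait_gone are hypothetical stand-ins, not the functions in destroy.sh; the hcloud invocations in the comments use the CLI's -o noheader and -o columns output options.

```shell
# Print the IDs of all rows whose name column equals $1.
# Input on stdin: "<id> <name>" per line.
ids_for_name() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Poll a lister command until the name no longer appears in its output.
# Usage: wait_gone <name> <max_attempts> <delay_seconds> <lister> [args...]
wait_gone() (
  name=$1; max=$2; delay=$3; shift 3
  attempt=1
  while [ -n "$("$@" | ids_for_name "$name")" ]; do
    [ "$attempt" -lt "$max" ] || exit 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
)

# Example matching steps 1 and 2 (30 attempts, 2s apart); not executed here:
#   hcloud server list -o noheader -o columns=id,name |
#     ids_for_name lushycorp-vault |
#     while read -r id; do hcloud server delete "$id"; done
#   wait_gone lushycorp-vault 30 2 hcloud server list -o noheader -o columns=id,name
```

Matching on the exact name column and deleting every matching ID mirrors the "deletes all matching IDs" behavior described in step 1, which also cleans up duplicates left by an interrupted earlier run.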

Important: No S3 Clearing

destroy.sh does not clear the S3 remote durability layer, so after a destroy S3 still contains the lineage and snapshots. A subsequent create.sh run will hit the T0 + H0 + S1 hard-fail if the operator-side canonical bootstrap files are absent.

See Also