Ephemeral Castle Operational Cheat Sheet

This page provides a quick reference for common commands and maintenance tasks within the ephemeral-castle repository.

Cluster Lifecycle (tazlab-k8s)

Run these from clusters/tazlab-k8s/proxmox/:

Action           Command
Full Bootstrap   ./create.sh
Full Teardown    ./destroy.sh
Force VM Delete  ./nuclear-wipe.sh
Check Logs       ls -ltr logs/
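
Both ./destroy.sh and ./nuclear-wipe.sh are destructive. The sketch below adds a y/N prompt before either one runs. It assumes an interactive shell. The confirm_run function is hypothetical and is not part of the repository.

```shell
# Hypothetical helper (not in the repo): ask before running a destructive command.
# The prompt goes to stderr so the wrapped command's stdout stays clean.
confirm_run() {
  local answer
  printf 'About to run: %s\nContinue? [y/N] ' "$*" >&2
  # On EOF with no input, answer stays empty and the run is aborted.
  read -r answer || true
  case "$answer" in
    y|Y|yes|YES) "$@" ;;
    *) echo "Aborted." >&2; return 1 ;;
  esac
}
```

Usage: `confirm_run ./nuclear-wipe.sh`. Any answer other than y or yes returns 1 without running the script.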

Manual Terragrunt Operations

To apply changes to a specific layer without a full rebirth:

  1. Export Secrets (from TazPod):

    # tr strips stray quotes and spaces from each stored value; $(...) drops the trailing newline.
    export INFISICAL_CLIENT_ID="$(tr -d "'\" " < ~/secrets/infisical-client-id)"
    export INFISICAL_CLIENT_SECRET="$(tr -d "'\" " < ~/secrets/infisical-client-secret)"
    export PROXMOX_TOKEN_ID="$(tr -d "'\" " < ~/secrets/proxmox-token-id)"
    export PROXMOX_TOKEN_SECRET="$(tr -d "'\" " < ~/secrets/proxmox-token-secret)"
    export GITHUB_TOKEN="$(tr -d "'\" " < ~/secrets/github-token)"
    
  2. Navigate to Layer:

    cd clusters/tazlab-k8s/live/<layer-name>
    
  3. Execute:

    terragrunt plan
    terragrunt apply --non-interactive --auto-approve
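
The export step repeats the same tr pipeline for every secret. The sketch below moves it into one function that fails early when a file is missing or empty. Both export_secret and load_tazlab_secrets are hypothetical helpers. The ~/secrets/ file names are the ones used in step 1.

```shell
# Hypothetical helper: export VAR from FILE, stripping ' " and spaces
# the same way as the tr -d "'\" " pipeline in step 1.
export_secret() {
  local var="$1" file="$2" value
  if [ ! -r "$file" ]; then
    echo "missing or unreadable secret file: $file" >&2
    return 1
  fi
  # Command substitution also removes the trailing newline.
  value="$(tr -d "'\" " < "$file")"
  if [ -z "$value" ]; then
    echo "empty secret file: $file" >&2
    return 1
  fi
  export "$var=$value"
}

# Load all five secrets, stopping at the first failure.
load_tazlab_secrets() {
  export_secret INFISICAL_CLIENT_ID     ~/secrets/infisical-client-id &&
  export_secret INFISICAL_CLIENT_SECRET ~/secrets/infisical-client-secret &&
  export_secret PROXMOX_TOKEN_ID        ~/secrets/proxmox-token-id &&
  export_secret PROXMOX_TOKEN_SECRET    ~/secrets/proxmox-token-secret &&
  export_secret GITHUB_TOKEN            ~/secrets/github-token
}
```

Usage: `load_tazlab_secrets && terragrunt plan`. A missing secret then stops the run before Terragrunt starts with an empty variable.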
    

Networking (Tailscale)

Action                Command
Start / Join Tailnet  AGENTS.ctx/tools/tailscale/start.sh
Apply ACL/OAuth       cd tailscale/ && ./setup.sh
Check Peers           tailscale status
Ping Vault            tailscale ping lushycorp-vault
Ping Cluster          tailscale ping tazlab-k8s-control-plane-01

Note: run start.sh from the workspace root. It launches tailscaled in the background, so the shell returns immediately while the daemon is still initializing. Commands such as tailscale status may fail briefly until the daemon is ready.
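
Because start.sh returns before tailscaled is ready, a script that runs tailscale ping right away can race the daemon. The sketch below is a generic polling helper. It assumes that tailscale status exits non-zero until the node is up; check this against your Tailscale version. The wait_until function is hypothetical.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or TIMEOUT seconds have elapsed. Uses bash's built-in SECONDS counter.
wait_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}
```

Usage: `AGENTS.ctx/tools/tailscale/start.sh && wait_until 60 tailscale status && tailscale ping lushycorp-vault`.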

Vault Runtime (Hetzner)

Run these from runtimes/lushycorp-vault/hetzner/:

Action              Command
Create/Restore      ./create.sh
Nuclear Destroy     ./destroy.sh
Build Golden Image  ./golden-image/scripts/build-golden-image.sh --snapshot-name "<name>"
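
The build script takes an explicit --snapshot-name. The sketch below generates a unique, sortable name so repeated builds do not collide. The prefix-plus-UTC-timestamp scheme and the golden_snapshot_name function are hypothetical, not a repository convention.

```shell
# Hypothetical naming scheme: <prefix>-<UTC date>-<UTC time>,
# e.g. lushycorp-vault-golden-20250101-120000.
golden_snapshot_name() {
  local prefix="${1:-lushycorp-vault-golden}"
  printf '%s-%s\n' "$prefix" "$(date -u +%Y%m%d-%H%M%S)"
}
```

Usage: `./golden-image/scripts/build-golden-image.sh --snapshot-name "$(golden_snapshot_name)"`.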

Ansible Playbooks

Run from runtimes/lushycorp-vault/hetzner/ansible/:

Playbook                    When to use
tailscale-bootstrap.yml     First-time Tailscale join on new VM (public IP)
common.yml                  Podman + package verification (tailnet)
vault-runtime-install.yml   Runtime installation, config, service setup
vault-runtime-converge.yml  Classification, restore/init, unseal, health
vault-runtime-post.yml      Admin token, snapshot backup, TazPod persistence

The old monolithic vault-s3-backup-recovery.yml was split into three playbooks (install/converge/post) to improve observability and allow per-stage timing.
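
The split playbooks run in a fixed order (install, converge, post), so each stage can be timed on its own when run by hand. The sketch below assumes ansible-playbook is on PATH and that you supply an inventory path; create.sh may pass different arguments. Both functions are hypothetical.

```shell
# Hypothetical helper: run one stage, then print its name and elapsed whole seconds.
# Preserves the stage's exit status.
time_stage() {
  local name="$1"; shift
  local start=$SECONDS rc=0
  "$@" || rc=$?
  printf '%s %ds\n' "$name" $(( SECONDS - start ))
  return "$rc"
}

# Run the three Vault runtime playbooks in order, stopping at the first failure.
run_vault_playbooks() {
  local inventory="$1" pb
  for pb in vault-runtime-install.yml vault-runtime-converge.yml vault-runtime-post.yml; do
    time_stage "$pb" ansible-playbook -i "$inventory" "$pb" || return
  done
}
```

Usage, from the ansible/ directory and with a hypothetical inventory file: `run_vault_playbooks inventory.ini`.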

Vault Runtime Preflight

Before running ./create.sh for the Hetzner Vault runtime:

  1. Hostnet+TUN mode: Tailscale runs natively inside the container via tazpod-tailscale-up, so the old userspace start.sh path is no longer needed. create.sh auto-detects the Tailscale socket and prefers the TUN socket.
  2. Ensure the TazPod vault is unlocked:
    tazpod unlock
    
  3. Verify the canonical bootstrap files if the run depends on restore or remote-durability continuity:
    ls ~/secrets/lushycorp-vault/init.json \
       ~/secrets/lushycorp-vault/unseal-keys.json \
       ~/secrets/lushycorp-vault/root-token.txt \
       ~/secrets/lushycorp-vault/admin-token.txt \
       ~/secrets/lushycorp-vault/admin-token.json
    
  4. Remember that /workspace/.tazpod/vault/vault.tar.aes alone is not enough for runtime classification; the playbook reads the decrypted canonical files under ~/secrets/lushycorp-vault/.
  5. Check phase logs under logs/:
    • *-10-terraform.log
    • *-20-public-bootstrap.log
    • *-30-tailscale-validation.log
    • *-40-transport-switch.log
    • *-50-podman-verification.log
    • *-60-vault-runtime-install.log
    • *-70-vault-runtime-converge.log
    • *-80-vault-runtime-post.log
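
When a run fails, the highest-numbered phase log from the latest run shows the last phase that started. The sketch below assumes the * prefix in the log names sorts chronologically (for example, a timestamp). If it does not, fall back to ls -ltr logs/. The last_phase_log function is hypothetical.

```shell
# Hypothetical helper: print the lexically last <prefix>-NN-<phase>.log in DIR.
# With a chronologically sortable prefix, this is the highest phase
# reached by the most recent run.
last_phase_log() {
  local dir="${1:-logs}" f last=""
  for f in "$dir"/*-[0-9][0-9]-*.log; do
    [ -e "$f" ] || continue   # the glob matched nothing
    last="$f"
  done
  [ -n "$last" ] || { echo "no phase logs in $dir" >&2; return 1; }
  printf '%s\n' "$last"
}
```

Usage: `tail -n 50 "$(last_phase_log logs)"`.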

Timing

With hostnet+TUN mode and the split playbooks, the full create.sh cycle completes in ~344s (down from ~1200s):

Phase                             Duration (s)
Terraform (VM)                    4
Public Bootstrap (SSH+Tailscale)  16
Tailscale Validation              1
Transport Switch                  9
Podman Verify (common)            11
Vault Runtime Install             175
Vault Runtime Converge            90
Vault Runtime Post                38
TOTAL                             344
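
The phase durations add up to the stated total: 4 + 16 + 1 + 9 + 11 + 175 + 90 + 38 = 344 s, roughly 29% of the earlier ~1200 s. The awk sketch below sums a tab-separated "phase, seconds" listing so a new run can be compared against the table. The input format is an assumption; create.sh's own timing output may look different.

```shell
# Sum the last tab-separated field of each line and print "TOTAL <n>".
sum_phase_seconds() {
  awk -F'\t' '{ total += $NF } END { printf "TOTAL %d\n", total }'
}
```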

Destroy/Create Warning

./destroy.sh removes the Hetzner server and the local Terraform outputs, but it does not clear the S3 remote-durability layer. As a result, a subsequent ./create.sh can legitimately take the T0 + H0 + S1 hard-fail branch of the classification matrix when the operator-side canonical bootstrap files are absent but the remote S3 state is still coherent.
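
Because destroy.sh leaves S3 intact, it can help to check the operator-side files before re-running create.sh. The sketch below inspects only the local side and does not query S3. The function name is hypothetical. The file list matches the canonical bootstrap files in the preflight checklist.

```shell
# Hypothetical pre-create guard: report canonical bootstrap files that are
# missing or empty. A non-zero return means only that the local side is incomplete.
check_canonical_files() {
  local dir="${1:-$HOME/secrets/lushycorp-vault}" f missing=0
  for f in init.json unseal-keys.json root-token.txt admin-token.txt admin-token.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$(( missing + 1 ))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: `check_canonical_files || echo "create.sh may hard-fail if S3 still holds state"`.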

Common Debugging Tools

Talos OS

  • Check Dashboard: talosctl dashboard --talosconfig clusters/tazlab-k8s/proxmox/configs/talosconfig
  • Get Config: talosctl get machineconfig --talosconfig clusters/tazlab-k8s/proxmox/configs/talosconfig

Kubernetes

  • Access Cluster: kubectl --kubeconfig clusters/tazlab-k8s/proxmox/configs/kubeconfig get nodes
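  • Flux Status: flux get kustomizations --kubeconfig clusters/tazlab-k8s/proxmox/configs/kubeconfig
  • ESO Logs: kubectl --kubeconfig clusters/tazlab-k8s/proxmox/configs/kubeconfig logs -n external-secrets -l app.kubernetes.io/name=external-secrets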
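
Instead of repeating --talosconfig and --kubeconfig on every command, you can export the environment variables these CLIs read. talosctl honors TALOSCONFIG, and kubectl and flux honor KUBECONFIG. The use_tazlab_configs function below is a hypothetical sketch. It resolves absolute paths so later cd commands do not break them.

```shell
# Hypothetical helper: point talosctl (TALOSCONFIG) and kubectl/flux (KUBECONFIG)
# at the generated configs under ROOT (default: current directory).
use_tazlab_configs() {
  local cfg
  cfg="$(cd "${1:-.}/clusters/tazlab-k8s/proxmox/configs" 2>/dev/null && pwd)" || {
    echo "configs directory not found" >&2; return 1; }
  [ -f "$cfg/talosconfig" ] || { echo "no talosconfig in $cfg" >&2; return 1; }
  [ -f "$cfg/kubeconfig" ]  || { echo "no kubeconfig in $cfg" >&2; return 1; }
  export TALOSCONFIG="$cfg/talosconfig"
  export KUBECONFIG="$cfg/kubeconfig"
}
```

Usage, from the repository root: `use_tazlab_configs && kubectl get nodes && flux get kustomizations`.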

Proxmox

  • List VMs: qm list (on the Proxmox host)
  • Check Task Log: Look at the Proxmox Web UI “Tasks” pane.
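
qm list prints a header row followed by one VM per line, with VMID, NAME and STATUS as the first three columns. The sketch below narrows that listing to cluster VMs by name. The default tazlab pattern is an assumption based on the node names on this page, and filter_qm_list is hypothetical.

```shell
# Hypothetical filter: print "VMID NAME STATUS" for VMs whose name matches PATTERN.
# Reads qm list output on stdin, skipping the header row.
filter_qm_list() {
  local pattern="${1:-tazlab}"
  awk -v pat="$pattern" 'NR > 1 && $2 ~ pat { print $1, $2, $3 }'
}
```

Usage, on the Proxmox host: `qm list | filter_qm_list tazlab-k8s`.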

See Also