Ephemeral Castle Rebirth Protocol
The Rebirth Protocol (create.sh v11.1) is a high-level orchestrator that takes the cluster from “Nothing” to “Blog Online” in approximately 9-10 minutes (down from 12 minutes, following the v11.1 hardening).
The create.sh Logic (Step-by-Step)
1. Secret Resolution
Uses the resolve() bash function to extract credentials from the TazPod Vault (/home/tazpod/secrets/).
- If a file exists in the vault, it’s exported as an env var.
- GITHUB_TOKEN is used to create a cluster-wide ghcr-pull-secret for authenticated ghcr.io pulls (avoids the anonymous rate limit).
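The body of resolve() is not shown here, so the following is only a minimal sketch of the behavior described above. The argument order, the VAULT_DIR variable, and the return codes are assumptions made for illustration.

```shell
# Hypothetical sketch of resolve(); the real helper in create.sh may differ.
# VAULT_DIR defaults to the TazPod Vault path named in the text.
VAULT_DIR="${VAULT_DIR:-/home/tazpod/secrets}"

# resolve VAR_NAME FILE_NAME
# Exports VAR_NAME with the contents of $VAULT_DIR/FILE_NAME if the file exists.
# Returns 1 and leaves the environment untouched if it does not.
resolve() {
  local var_name="$1" file_name="$2" path
  path="${VAULT_DIR}/${file_name}"
  if [ -f "$path" ]; then
    # Command substitution strips trailing newlines, which suits tokens.
    export "${var_name}=$(cat "$path")"
    return 0
  fi
  return 1
}
```

A call such as `resolve GITHUB_TOKEN github-token` would then make the token available to later steps without it ever appearing on the command line.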
2. Tailscale AuthKey Minting
The script executes mint_tailscale_authkey():
- Authenticates with Tailscale OAuth (client_id + client_secret).
- Requests a short-lived (3600s), reusable, ephemeral key with tag:tazlab-k8s.
- The key is kept in memory and never committed to Terraform state.
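The minting flow can be sketched roughly as follows. The endpoints are the public Tailscale API v2; the environment variable names TS_CLIENT_ID and TS_CLIENT_SECRET, the authkey_payload helper, and the use of curl and jq are assumptions, not the actual contents of mint_tailscale_authkey().

```shell
# Sketch only: the real mint_tailscale_authkey() body is not shown in the source.
TS_API="https://api.tailscale.com/api/v2"

# authkey_payload TAG TTL_SECONDS
# Builds the JSON body for a reusable, ephemeral, tagged auth key.
authkey_payload() {
  local tag="$1" ttl="$2"
  printf '{"capabilities":{"devices":{"create":{"reusable":true,"ephemeral":true,"tags":["%s"]}}},"expirySeconds":%s}' "$tag" "$ttl"
}

mint_tailscale_authkey() {
  local token
  # 1. Exchange the OAuth client credentials for an API access token.
  token="$(curl -fsS \
    -d "client_id=${TS_CLIENT_ID}" \
    -d "client_secret=${TS_CLIENT_SECRET}" \
    "${TS_API}/oauth/token" | jq -r '.access_token')"
  [ -n "$token" ] && [ "$token" != "null" ] || return 1
  # 2. Request the key; "-" selects the tailnet that owns the credentials.
  curl -fsS \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d "$(authkey_payload "tag:tazlab-k8s" 3600)" \
    "${TS_API}/tailnet/-/keys" | jq -r '.key'
}

# The key lives only in a shell variable; nothing is written to disk or state:
# TS_AUTHKEY="$(mint_tailscale_authkey)"
```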
3. Platform Layer (proxmox-talos module)
- Creates Proxmox VMs (1 CP + 1 Worker) via Terraform.
- Applies Talos machine configuration with baked-in patches:
  - cluster.apiServer.certSANs: adds lushycorp-k8s.magellanic-gondola.ts.net to the K8s API server TLS cert.
  - cluster.apiServer.extraArgs: configures multi-issuer service-account-issuer, api-audiences, service-account-jwks-uri.
  - cluster.coreDNS.disabled: true: disables Talos built-in CoreDNS.
  - machine.install.image: Talos v1.12.0 installer image.
- CRITICAL LESSON: machine.certSANs is for the Talos node API (port 50000), NOT the K8s API server (port 6443). Using machine.certSANs triggers trustd PKI regeneration that can invalidate kubelet/kube-proxy certificates, causing a cluster deadlock that requires a full rebuild.
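To make the certSANs distinction concrete, here is a small sketch that emits a patch containing only the fields named above, plus a naive guard against the machine.certSANs mistake. The function names are hypothetical, extraArgs is left out because its values are not given here, and the guard is a line-based check that assumes two-space indentation rather than a real YAML parser.

```shell
# Sketch: print a Talos config patch with the SAN on the Kubernetes API server.
talos_patch() {
  cat <<'EOF'
cluster:
  apiServer:
    certSANs:
      - lushycorp-k8s.magellanic-gondola.ts.net
  coreDNS:
    disabled: true
# Deliberately absent: machine.certSANs (Talos node API, port 50000).
EOF
}

# has_machine_certsans FILE
# Succeeds if FILE sets certSANs directly under the top-level machine: key.
has_machine_certsans() {
  awk '
    /^[^ #]/                     { in_machine = ($0 ~ /^machine:/) }
    in_machine && /^  certSANs:/ { found = 1 }
    END                          { exit(found ? 0 : 1) }
  ' "$1"
}
```

Running `has_machine_certsans` over each patch file before applying it would be one cheap way to catch a regression of this lesson.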
4. Engine Layer (k8s-engine module)
Layers applied AFTER Talos machines are configured:
- ESO HelmRelease: external-secrets operator (required for CRDs).
- CoreDNS (user-managed): ServiceAccount, ClusterRole, ClusterRoleBinding, ConfigMap (with ts.net forward → Tailscale nameserver 10.96.0.101), Deployment (2 replicas), ClusterIP Service (10.96.0.10). Replaces the Talos built-in CoreDNS disabled above.
- Vault bootstrap secrets: vault-ca-cert and vault-eso-token read directly from ~/secrets/lushycorp-vault/ via Terraform file().
- Vault ClusterSecretStore tazlab-secrets-vault: deployed with depends_on on the ESO helm release (ensures CRDs are registered).
- GitHub token: created as a direct kubernetes_secret_v1 from ~/secrets/github-token, with no ExternalSecret dependency.
- Tailscale operator OAuth: created as a direct kubernetes_secret_v1 from ~/secrets/tailscale-operator-*, with no ExternalSecret dependency.
5. Networking + GitOps + Storage (Parallel)
- networking layer: MetalLB (IP pool + L2Advertisement).
- gitops layer: Flux bootstrap using pre-created GitHub token secret. Flux DAG then reconciles: bridge -> namespaces -> operators-tailscale (uses pre-created OAuth) -> tailscale-dns -> configs (Vault store) -> instances -> auth -> apps.
- storage layer: Longhorn (parallel with networking + gitops, non-blocking). Longhorn does not depend on MetalLB, so there is no technical reason to sequence them. Creates the volume backing for PVCs.
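Running the three layers side by side could look something like the sketch below: each layer runs as a background job with its own log file, and the script waits for all of them. The run_layer and wait_layers helpers and the LOG_DIR default are illustrative assumptions, not the actual create.sh internals.

```shell
# Sketch: run independent layers concurrently, one log file per layer.
LOG_DIR="${LOG_DIR:-/workspace/logs/dag-fix}"
LAYER_JOBS=""

# run_layer NAME CMD...
# Starts CMD in the background, logging to $LOG_DIR/layer-NAME.log.
run_layer() {
  local name="$1"
  shift
  "$@" >"${LOG_DIR}/layer-${name}.log" 2>&1 &
  # Remember "pid:name" so failures can be reported per layer.
  LAYER_JOBS="${LAYER_JOBS} $!:${name}"
}

# wait_layers
# Waits for every started layer; returns non-zero if any of them failed.
wait_layers() {
  local job failed=0
  for job in $LAYER_JOBS; do
    if ! wait "${job%%:*}"; then
      echo "layer ${job#*:} failed" >&2
      failed=1
    fi
  done
  LAYER_JOBS=""
  return "$failed"
}

# Usage (apply_layer is a hypothetical wrapper around the Terragrunt call):
# run_layer networking apply_layer networking
# run_layer gitops     apply_layer gitops
# run_layer storage    apply_layer storage
# wait_layers || exit 1
```

Collecting every exit status before failing means that one broken layer does not hide the logs of the others.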
7. Post-Bootstrap: PGO → Vault Sync
After Flux deploys PGO and the database is ready:
- Script waits for the tazlab-db-pguser-grafana secret in the tazlab-db namespace.
- Reads the fresh PGO-generated password.
- Syncs it to Vault at secret/tazlab-k8s/static/monitoring/grafana/GRAFANA_DB_PASSWORD.
- Grafana ExternalSecret picks up the password and Grafana starts with 3/3 containers.
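A rough sketch of the wait-then-sync step follows. The retry counts, the wait_for helper, and the reading of the Vault path (KV mount secret/, entry tazlab-k8s/static/monitoring/grafana, field GRAFANA_DB_PASSWORD) are assumptions; the password key in the pguser secret follows the usual PGO convention.

```shell
# wait_for TRIES DELAY CMD...
# Runs CMD up to TRIES times, sleeping DELAY seconds between attempts.
wait_for() {
  local tries="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    [ "$i" -ge "$tries" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}

secret_exists() {
  kubectl -n tazlab-db get secret tazlab-db-pguser-grafana >/dev/null 2>&1
}

sync_grafana_password() {
  local pw
  # Assumed budget: 60 attempts, 5 seconds apart.
  wait_for 60 5 secret_exists || return 1
  # Kubernetes stores secret values base64-encoded under .data.
  pw="$(kubectl -n tazlab-db get secret tazlab-db-pguser-grafana \
        -o jsonpath='{.data.password}' | base64 -d)"
  # Assumed layout: last path segment is the field name inside the entry.
  vault kv put secret/tazlab-k8s/static/monitoring/grafana \
    GRAFANA_DB_PASSWORD="$pw"
}
```

Note that `vault kv put` writes a new version containing only the fields given; if the entry holds other fields, `vault kv patch` would be the safer choice.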
8. Verification
- Waits for Traefik LoadBalancer IP assignment.
- Runs check-blog.sh to verify the blog is responding.
- Reports timing for each layer (approximate: platform 2m, engine 95s, networking+gitops 106s, storage 85s).
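Per-layer timing of this kind can be collected with a small wrapper. The time_layer helper and its output format below are hypothetical, shown only to illustrate the idea.

```shell
# time_layer NAME CMD...
# Runs CMD, prints how long it took in whole seconds, and preserves its exit code.
time_layer() {
  local name="$1" start end rc=0
  shift
  start="$(date +%s)"
  "$@" || rc=$?
  end="$(date +%s)"
  echo "layer=${name} seconds=$((end - start)) exit=${rc}"
  return "$rc"
}

# Usage (apply_layer is a hypothetical wrapper around the Terragrunt call):
# time_layer platform apply_layer platform
# time_layer engine   apply_layer engine
```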
Bootstrap Deadlocks Fixed (2026-05-28)
The engine layer was restructured to eliminate bootstrap circular dependencies:
| Dependency | Problem | Fix |
|---|---|---|
| Operator OAuth → Vault → DNS → Operator | Chicken-and-egg in fresh bootstrap | OAuth secret created directly from operator files in engine layer |
| CoreDNS → Talos SSA | Talos v1.12+ Server-Side Apply overwrites manual Corefile changes | coreDNS.disabled: true + user-managed CoreDNS deployed by engine |
| ESO CRD → Vault ClusterSecretStore | CRD not registered when Terraform applies store resource | depends_on = [helm_release.external_secrets] |
| GitHub token → ESO → Vault → DNS | ExternalSecret chain needs DNS which needs operator | Direct kubernetes_secret_v1 from operator file, bypassing ESO |
| CoreDNS Corefile syntax | Single-line health { lameduck 5s } causes CoreDNS crash loop | Multiline health block in Corefile |
| Grafana secret missing from kustomization | secrets.yaml not in resources list of monitoring kustomization | Added to kustomization.yaml and pushed to master |
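For the Corefile row, a sketch of the multiline form is shown below together with a naive lint that flags any line opening and closing a block at once. Only the health block, the ts.net forward to 10.96.0.101, and the 2-space-free layout come from this page; the remaining directives are the common CoreDNS defaults and are assumed, as are the function names.

```shell
# Sketch: print a Corefile that uses the multiline health block.
render_corefile() {
  cat <<'EOF'
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
ts.net:53 {
    forward . 10.96.0.101
}
EOF
}

# has_single_line_block FILE
# Succeeds if any line contains both an opening and a closing brace,
# such as "health { lameduck 5s }".
has_single_line_block() {
  grep -Eq '\{[^}]*\}' "$1"
}
```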
Log Locations
Logs are stored in clusters/tazlab-k8s/proxmox/logs/ with a timestamped directory structure. Parallel Terragrunt layers output to /workspace/logs/dag-fix/layer-*.log.
See Also
- Layers: Terragrunt Layers
- Hub: Ephemeral Castle Hub
- Reference: Operator Cheat Sheet