Ephemeral Castle Cluster Bootstrap

Scope

This page documents the active tazlab-k8s one-shot bootstrap flow inside ephemeral-castle/.

Current Synthesis

clusters/tazlab-k8s/proxmox/create.sh is the high-level rebirth orchestrator. It mints bootstrap credentials, applies the Terragrunt foundation layers, patches Talos for Tailscale, then drives Flux convergence and post-bootstrap validation.

Bootstrap Sequence

1. Secret Minting

  • resolve Infisical, Proxmox, GitHub, and Tailscale operator secrets from /home/tazpod/secrets
  • mint a short-lived Tailscale AuthKey in memory
  • avoid persisting the AuthKey into Terraform/Terragrunt state
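
The minting step can be sketched roughly as follows. This is a minimal illustration, not the contents of create.sh: it assumes one file per secret under the secrets directory, a hypothetical file name tailscale-api-token holding a Tailscale API access token, a hypothetical tag tag:k8s, and that curl and jq are installed. An OAuth client would need a token exchange first, which is not shown.

```shell
#!/usr/bin/env bash
# Sketch only: the file-per-secret layout and every name below are assumptions.

SECRETS_DIR="${SECRETS_DIR:-/home/tazpod/secrets}"

# Print one secret with newlines stripped; fail if the file is missing.
read_secret() {
  local path="$SECRETS_DIR/$1"
  [ -r "$path" ] || { echo "missing secret: $1" >&2; return 1; }
  tr -d '\n' < "$path"
}

# Build the JSON body for a single-use, preauthorized key.
# Args: lifetime in seconds, device tag.
authkey_request_body() {
  printf '{"capabilities":{"devices":{"create":{"reusable":false,"ephemeral":false,"preauthorized":true,"tags":["%s"]}}},"expirySeconds":%d}' "$2" "$1"
}

# Ask the Tailscale API for a key and print it on stdout.
# The caller captures it in a shell variable, so it is never written to a file.
mint_authkey() {
  local api_token response
  api_token="$(read_secret tailscale-api-token)" || return 1
  response="$(authkey_request_body 600 "tag:k8s" |
    curl -fsS -u "${api_token}:" -H 'Content-Type: application/json' \
      --data @- "https://api.tailscale.com/api/v2/tailnet/-/keys")" || return 1
  printf '%s' "$response" | jq -r '.key'
}
```

A caller would use something like TS_AUTHKEY="$(mint_authkey)" and pass the variable on to the patching step, keeping the key out of any Terraform/Terragrunt input. Note that passing the token with -u makes it briefly visible in the process list.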

2. Terragrunt Foundation

  • the secrets layer fetches Proxmox credentials and TALOS_SECRETBOX_KEY from Infisical
  • the platform layer creates the Proxmox VMs and applies the Talos machine configuration
  • the engine layer installs ESO and creates the tazlab-secrets bridge
  • the networking layer installs MetalLB
  • the gitops layer bootstraps Flux
  • the storage layer installs Longhorn and creates the S3 backup secret
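
A strictly sequential version of the layer walk could look like the sketch below. It assumes a hypothetical layout with one directory per layer under ./layers and uses terragrunt apply -auto-approve in each; the real script's paths and flags may differ, and it runs some layers in parallel rather than one by one.

```shell
#!/usr/bin/env bash
# Sketch only: the layers/ directory layout is an assumption.

TERRAGRUNT="${TERRAGRUNT:-terragrunt}"
LAYERS_DIR="${LAYERS_DIR:-./layers}"

# Layer order as listed above.
FOUNDATION_LAYERS="secrets platform engine networking gitops storage"

# Apply one layer non-interactively from inside its own directory.
apply_layer() {
  ( cd "$LAYERS_DIR/$1" && "$TERRAGRUNT" apply -auto-approve )
}

# Apply every layer in order and stop at the first failure.
apply_foundation() {
  local layer
  for layer in $FOUNDATION_LAYERS; do
    echo "==> applying layer: $layer"
    apply_layer "$layer" || { echo "layer failed: $layer" >&2; return 1; }
  done
}
```

Stopping at the first failure matters here because each later layer consumes something an earlier one created.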

3. Talos Tailscale Patching

  • create.sh patches the Talos machine config with a Tailscale ExtensionServiceConfig document
  • the patch is applied node-by-node with talosctl apply-config
  • hostnames are generated from the cluster name and role
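
The patching step might be sketched as below. The hostname format (cluster, role, index joined by hyphens) and all function names are assumptions made for illustration; the ExtensionServiceConfig fields follow the layout documented for the Talos Tailscale extension. Passing the patch inline keeps the AuthKey off disk, although it is briefly visible in the process arguments.

```shell
#!/usr/bin/env bash
# Sketch only: the hostname format and function names are assumptions.

# Derive a hostname from cluster name, role, and node index,
# e.g. "tazlab-k8s" "worker" 1 gives tazlab-k8s-worker-1.
node_hostname() {
  printf '%s-%s-%s' "$1" "$2" "$3"
}

# Print an ExtensionServiceConfig document for the Tailscale extension.
# Args: auth key, hostname.
tailscale_patch() {
  cat <<EOF
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: tailscale
environment:
  - TS_AUTHKEY=$1
  - TS_HOSTNAME=$2
EOF
}

# Apply the base machine config plus the Tailscale patch to one node.
# Args: node IP, base config file, auth key, hostname.
patch_node() {
  talosctl apply-config --nodes "$1" --file "$2" \
    --config-patch "$(tailscale_patch "$3" "$4")"
}
```

A node-by-node loop would call patch_node once per node with the hostname from node_hostname, so each machine registers under a distinct name.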

4. Flux Reconciliation

  • reconcile the flux-system source
  • reconcile core infrastructure kustomizations
  • reconcile infrastructure-auth, apps-data, and apps-static
  • wait for each kustomization to become Ready
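
One way to express this reconcile-then-wait loop is sketched below. The three kustomization names come from the list above; the names of the core infrastructure kustomizations are not given on this page, so they would have to be prepended to KUSTOMIZATIONS. The 10-minute timeout is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: the timeout and helper names are assumptions.

FLUX_NS="flux-system"
KUSTOMIZATIONS="${KUSTOMIZATIONS:-infrastructure-auth apps-data apps-static}"

# Trigger a reconcile of one kustomization, then block until it reports Ready.
reconcile_and_wait() {
  flux reconcile kustomization "$1" -n "$FLUX_NS" &&
    kubectl wait "kustomization/$1" -n "$FLUX_NS" \
      --for=condition=Ready --timeout=10m
}

# Refresh the Git source first, then walk the kustomizations in order.
reconcile_all() {
  local name
  flux reconcile source git flux-system -n "$FLUX_NS" || return 1
  for name in $KUSTOMIZATIONS; do
    reconcile_and_wait "$name" || { echo "not Ready: $name" >&2; return 1; }
  done
}
```

Reconciling the source before any kustomization means every later step works from the same Git revision.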

5. Post-Bootstrap Validation

  • wait for Longhorn PVCs to bind
  • wait for the PostgreSQL restore job to start and complete
  • sync runtime-generated Grafana secret data back into the monitoring namespace
  • wait for Grafana readiness
  • wait for the Traefik LoadBalancer IP
  • run check-blog.sh
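
Several of these waits reduce to kubectl wait or a simple poll, as in the sketch below. The namespace, job name, service name (traefik in namespace traefik), and timeouts are all placeholders, the Grafana steps are left out, and the job wait assumes the restore job already exists when it is called.

```shell
#!/usr/bin/env bash
# Sketch only: namespaces, resource names, and timeouts are assumptions.

# Poll a command until it prints non-empty output or the attempts run out.
# Args: attempts, delay in seconds, then the command to run.
wait_for_output() {
  local attempts="$1" delay="$2" out
  shift 2
  while [ "$attempts" -gt 0 ]; do
    out="$("$@" 2>/dev/null)" || true
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Print the external IP of a LoadBalancer service; empty until one is assigned.
# Args: service name, namespace.
lb_ip() {
  kubectl get svc "$1" -n "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Wait for bound PVCs, a completed job, and a LoadBalancer IP, in that order.
# Args: namespace holding the PVCs and job, job name.
validate_cluster() {
  kubectl wait pvc --all -n "$1" \
    --for=jsonpath='{.status.phase}'=Bound --timeout=10m &&
    kubectl wait "job/$2" -n "$1" --for=condition=complete --timeout=20m &&
    wait_for_output 60 5 lb_ip traefik traefik
}
```

Polling is used for the LoadBalancer IP because the field is simply absent until an address has been assigned.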

Important Implementation Details

  • the networking and gitops layers run in parallel once engine has finished
  • storage stays after the parallel group because it depends on the ESO bridge
  • the script logs parallel layer output to /workspace/logs/dag-fix/
  • success output includes per-layer timing and total rebirth time
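
The parallel group with per-layer logs and timing could be approximated with background jobs and wait, as sketched below. apply_layer here is a stand-in for the real per-layer apply step, and the one-log-file-per-layer naming is an assumption; only the log directory comes from this page.

```shell
#!/usr/bin/env bash
# Sketch only: apply_layer is a placeholder and the log file names are assumptions.

LOG_DIR="${LOG_DIR:-/workspace/logs/dag-fix}"

# Placeholder so the sketch runs on its own; a real version would apply the layer.
apply_layer() {
  echo "applying $1"
}

# Run one layer, send its output to a per-layer log, and print how long it took.
timed_layer() {
  local start end rc=0
  start="$(date +%s)"
  apply_layer "$1" > "$LOG_DIR/$1.log" 2>&1 || rc=$?
  end="$(date +%s)"
  echo "$1: $((end - start))s (exit $rc)"
  return "$rc"
}

# Run networking and gitops side by side, wait for both, then run storage.
run_parallel_group() {
  local net_pid git_pid failed=0
  mkdir -p "$LOG_DIR"
  timed_layer networking & net_pid=$!
  timed_layer gitops & git_pid=$!
  wait "$net_pid" || failed=1
  wait "$git_pid" || failed=1
  [ "$failed" -eq 0 ] || { echo "parallel group failed, see $LOG_DIR" >&2; return 1; }
  timed_layer storage
}
```

Waiting on each PID separately lets the script tell a failed layer apart from a successful one, which a bare wait would hide.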

Operational Verification

  • check-blog.sh verifies HTTPS and looks for the Hugo marker string
  • precision-test.sh and stress-test.sh exercise the rebirth loop and log per-cycle behavior
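
A check in the spirit of check-blog.sh might look like the sketch below. It is not the script itself: the URL and marker string are caller-supplied placeholders, and the retry count and delay are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: the URL, marker text, and retry settings are placeholders.

# Succeed when URL $1 uses HTTPS and the fetched page contains marker $2.
page_has_marker() {
  case "$1" in
    https://*) ;;
    *) echo "refusing non-HTTPS URL: $1" >&2; return 2 ;;
  esac
  curl -fsS --max-time 10 "$1" | grep -qF -- "$2"
}

# Retry the check a few times before giving up.
# Args: URL, marker, optional number of tries (default 10).
check_blog() {
  local tries="${3:-10}"
  while [ "$tries" -gt 0 ]; do
    page_has_marker "$1" "$2" && return 0
    tries=$((tries - 1))
    sleep "${CHECK_DELAY:-5}"
  done
  echo "marker not found at $1" >&2
  return 1
}
```

Matching on a marker string rather than only the HTTP status helps distinguish the real site from a default ingress or error page that also answers over HTTPS.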

Source Basis

  • clusters/tazlab-k8s/proxmox/create.sh
  • clusters/tazlab-k8s/proxmox/destroy.sh
  • clusters/tazlab-k8s/proxmox/check-blog.sh
  • clusters/tazlab-k8s/proxmox/precision-test.sh
  • clusters/tazlab-k8s/proxmox/stress-test.sh
  • clusters/tazlab-k8s/BOOTSTRAP.md