TazLab K8s Flux DAG (Dependency Graph)

The cluster's desired state is reconciled through a directed acyclic graph (DAG) of 15 Flux Kustomizations. This DAG ensures that foundational resources (CNI, DNS, namespaces) are ready before higher-level operators or applications are applied.

The Build Chain

The following order is defined by the dependsOn property in clusters/tazlab-k8s/*.yaml.
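
As a rough sketch of what one of those files might contain, the Kustomization below wires infrastructure-bridge to its two Level 0 parents. The spec.path value, the flux-system GitRepository name, and the interval are assumptions for illustration, not values taken from the TazLab repository.

```yaml
# Hypothetical sketch of a file under clusters/tazlab-k8s/.
# path, interval and sourceRef.name are assumptions, not the real manifest.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-bridge
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/bridge          # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                    # assumed bootstrap source name
  # Flux reconciles this Kustomization only after both dependencies are Ready.
  dependsOn:
    - name: infrastructure-operators-core
    - name: infrastructure-operators-namespaces
  wait: true
```

A Kustomization with no dependsOn list, like the Level 0 roots, is eligible to reconcile immediately, which is what lets those four run in parallel.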

Level 0 — Foundation (Parallel Roots)

These four Kustomizations have no dependencies and run in parallel:

  • infrastructure-operators-namespaces → Creates the ai-agents namespace and health-checks the kube-flannel DaemonSet and the coredns Deployment to confirm that the cluster network is operational.
  • infrastructure-operators-core → Installs cert-manager, Traefik, reloader, dex, OAuth2 Proxy, cloudflare-ddns, and namespace declarations for tazlab-db, hugo-blog, hugo-wiki.
  • infrastructure-operators-data → Installs Crunchy PostgreSQL Operator (PGO).
  • infrastructure-tailscale → Creates tailscale namespace, ExternalSecret for Operator OAuth credentials (k8s_operator client), and HelmRepository for the Tailscale chart.
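
For the infrastructure-tailscale entry, a minimal sketch of the HelmRepository it declares, assuming Flux's source.toolkit.fluxcd.io/v1 API. The object name tailscale is hypothetical; the URL is Tailscale's public Helm chart repository.

```yaml
# Sketch only: the object name is hypothetical.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: tailscale                        # hypothetical name
  namespace: tailscale
spec:
  interval: 1h                           # how often Flux refreshes the chart index
  url: https://pkgs.tailscale.com/helmcharts
```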

Level 1 — Bridge, Monitoring & Tailscale

  • infrastructure-bridge → Configures the traefik IngressClass and the tazlab-issuer ClusterIssuer (Let’s Encrypt production). DependsOn: infrastructure-operators-core, infrastructure-operators-namespaces.
  • infrastructure-monitoring → Installs kube-prometheus-stack (Grafana + Prometheus). DependsOn: infrastructure-operators-namespaces.
  • infrastructure-operators-tailscale → Installs the Tailscale Operator HelmRelease (v1.96.5). DependsOn: infrastructure-tailscale.
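
The HelmRelease applied by infrastructure-operators-tailscale could look roughly like this, pinning the chart to the version listed above. The release name and the sourceRef name are assumptions; sourceRef has to match whatever name the Level 0 HelmRepository actually uses. Because that repository lives in a separate Kustomization, the dependsOn on infrastructure-tailscale is what guarantees it exists before Flux fetches the chart.

```yaml
# Sketch only: names other than the chart and version are assumptions.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tailscale-operator
  namespace: tailscale
spec:
  interval: 30m
  chart:
    spec:
      chart: tailscale-operator          # chart name in Tailscale's repository
      version: "1.96.5"                  # pinned version from the list above
      sourceRef:
        kind: HelmRepository
        name: tailscale                  # hypothetical; must match the HelmRepository
        namespace: tailscale
```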

Level 2 — Tailscale DNS

  • infrastructure-tailscale-dns → Deploys a hostNetwork CoreDNS relay DaemonSet (port 5353) that resolves magellanic-gondola.ts.net names, a Service with the static ClusterIP 10.96.0.101, and a patch that adds a tailnet forwarding zone to the coredns ConfigMap. General pod DNS does not depend on it. DependsOn: infrastructure-operators-tailscale.
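
A sketch of the patched coredns ConfigMap, assuming a stock kubeadm-style default server block (the real cluster's Corefile may differ) and assuming the relay Service at 10.96.0.101 listens on port 53. If the Service exposes 5353 instead, the forward target becomes 10.96.0.101:5353.

```yaml
# Sketch only: the ".:53" block is a typical default, not the cluster's actual Corefile.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    # Tailnet zone added by the patch: only *.magellanic-gondola.ts.net
    # queries are sent to the relay Service (port 53 assumed).
    magellanic-gondola.ts.net:53 {
        errors
        cache 30
        forward . 10.96.0.101
    }
```

CoreDNS routes each query to the server block with the most specific matching zone, so only tailnet names reach the relay; everything else keeps using the default block, which is why ordinary pod DNS is unaffected if this Kustomization is not ready.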

Level 3 — Secrets & Identity

  • infrastructure-configs → Deploys all ExternalSecrets (Cloudflare token, wildcard TLS, S3, Dex OIDC, GitHub token, OpenClaw secrets). DependsOn: infrastructure-bridge.
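
As an illustration of the kind of object this Kustomization holds, here is a sketch of an ExternalSecret for the Cloudflare token, assuming the External Secrets Operator with a ClusterSecretStore. The store name, remote key, target namespace, and secret key names are hypothetical.

```yaml
# Hypothetical ExternalSecret for the Cloudflare token.
# Newer External Secrets Operator releases also serve external-secrets.io/v1;
# older ones use v1beta1 as below.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: cloudflare-api-token             # hypothetical
  namespace: cert-manager                # hypothetical target namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: example-store                  # hypothetical store name
  target:
    name: cloudflare-api-token           # Kubernetes Secret that ESO creates
  data:
    - secretKey: api-token               # key inside the generated Secret
      remoteRef:
        key: cloudflare-api-token        # hypothetical key in the external store
```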

Level 4 — Workloads (Parallel)

Every Level 4 Kustomization depends on infrastructure-configs, and they are applied in parallel. Only infrastructure-instances disables waiting (wait: false):

  • infrastructure-instances → Deploys PostgresCluster, Traefik Service/LoadBalancer, Longhorn, Dex, pgadmin, homepage, cloudflare-ddns, OpenClaw, plus all 4 image automation pipelines. DependsOn: infrastructure-configs, infrastructure-operators-data. Uses wait: false, so Flux does not block on readiness; pods stay in Pending or Init until their own dependencies are up.
  • apps-static → Deploys hugo-blog (nginx + static files). DependsOn: infrastructure-configs.
  • apps-static-wiki → Deploys hugo-wiki (nginx + static wiki). DependsOn: infrastructure-configs.
  • apps-data → Deploys mnemosyne-mcp (Go MCP server). DependsOn: infrastructure-configs.
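
A minimal sketch of how infrastructure-instances might combine its two dependencies with wait: false. As in the other Kustomization sketches, the path and the sourceRef name are assumptions.

```yaml
# Sketch only: path and sourceRef.name are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-instances
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/instances       # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                    # assumed bootstrap source name
  dependsOn:
    - name: infrastructure-configs
    - name: infrastructure-operators-data   # PGO CRDs must exist before PostgresCluster
  # Mark the Kustomization Ready once the apply succeeds, without waiting
  # for pods, PostgresCluster or Longhorn volumes to become healthy.
  wait: false
```

One consequence of this choice: anything that depends on infrastructure-instances (such as infrastructure-auth) can be applied before the workloads inside it are actually serving.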

Level 5 — Access Management

  • infrastructure-auth → Deploys OAuth2 Proxy + ForwardAuth middleware. DependsOn: infrastructure-instances.
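
A sketch of a Traefik ForwardAuth middleware pointing at OAuth2 Proxy, assuming Traefik's traefik.io/v1alpha1 CRDs (older Traefik v2 releases use traefik.containo.us/v1alpha1). The middleware name, namespace, and Service address are hypothetical; /oauth2/auth is OAuth2 Proxy's authentication endpoint, and the X-Auth-Request-* headers are only set when OAuth2 Proxy runs with --set-xauthrequest.

```yaml
# Sketch only: name, namespace and Service address are hypothetical.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-forward-auth              # hypothetical
  namespace: auth                        # hypothetical
spec:
  forwardAuth:
    # Traefik sends each request here first; a 2xx response lets it through.
    address: http://oauth2-proxy.auth.svc.cluster.local/oauth2/auth
    trustForwardHeader: true
    # Copy identity headers from the auth response onto the upstream request.
    authResponseHeaders:
      - X-Auth-Request-User
      - X-Auth-Request-Email
```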

Wait Policy

Not all Kustomizations use the same synchronization strategy:

Kustomization          wait            Notes
operators-namespaces   true            Blocks until flannel + coredns ready
operators-core         true
operators-data         true
tailscale              true
bridge                 true
monitoring             unset (false)   Flux default; does not block
operators-tailscale    true
tailscale-dns          true
configs                true            Blocks until secrets available
instances              false           Non-blocking; init containers handle readiness
apps-static            unset (false)
apps-static-wiki       unset (false)
apps-data              unset (false)
auth                   true            Blocks until Dex + OAuth2 healthy

The key architectural insight: infrastructure-instances uses wait: false because it bundles database, storage, and workloads that take variable time to become ready. Flux applies the manifests quickly and lets Kubernetes handle the Pending/Init states through init containers (e.g., wait-for-db).
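
A sketch of a wait-for-db init container of the kind described above. It assumes the Crunchy PGO convention of a <cluster>-primary Service; the cluster name pgcluster, the Deployment, and the application image are hypothetical, while tazlab-db is the namespace declared by infrastructure-operators-core.

```yaml
# Sketch only: pgcluster, example-app and example/app are hypothetical names.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      initContainers:
        - name: wait-for-db
          image: postgres:16-alpine      # ships the pg_isready client
          # Retry every 2s until the primary accepts connections; the pod
          # stays in Init until this loop exits successfully.
          command:
            - sh
            - -c
            - until pg_isready -h pgcluster-primary.tazlab-db.svc -p 5432; do echo waiting for db; sleep 2; done
      containers:
        - name: app
          image: example/app:latest      # hypothetical application image
```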

Health Checks

Only two Kustomizations use explicit health checks:

  • infrastructure-operators-namespaces: DaemonSet kube-flannel + Deployment coredns
  • infrastructure-tailscale: Deployment tailscale-operator (HelmRelease target)

This is deliberate: once the network is proven healthy, the rest of the DAG relies on dependsOn ordering, wait, and init containers rather than on explicit health-check entries (a Kustomization with wait: true already waits for every resource it applies).
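
A sketch of explicit health checks on infrastructure-operators-namespaces. The kube-system namespace for kube-flannel is an assumption, as are path and sourceRef. The sketch leaves wait unset because the Flux Kustomization API reference documents that spec.healthChecks is ignored when spec.wait is true.

```yaml
# Sketch only: path, sourceRef.name and the kube-flannel namespace are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-operators-namespaces
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/operators/namespaces   # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                    # assumed bootstrap source name
  timeout: 5m                            # fail the health check after 5 minutes
  # These objects are not created by this Kustomization; Flux only checks
  # that they exist and are Ready before marking it Ready.
  healthChecks:
    - apiVersion: apps/v1
      kind: DaemonSet
      name: kube-flannel
      namespace: kube-system             # assumption
    - apiVersion: apps/v1
      kind: Deployment
      name: coredns
      namespace: kube-system
```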

See Also