TazLab K8s: Flux Kustomizations Detail

Level 3 (Detail) — All 15 Flux Kustomizations with exact spec, dependencies, and health checks.

Concept

Flux watches the clusters/tazlab-k8s/ directory for Kustomization resources. Each Kustomization points to a path in the repository and defines its dependencies, interval, and post-build substitution. Flux applies them in dependency order.

DAG Overview

infrastructure-operators-namespaces ──┬── infrastructure-bridge ────────── infrastructure-configs ──┬── infrastructure-instances ── infrastructure-auth
infrastructure-operators-core ────────┘                                                             ├── apps-static (hugo-blog)
infrastructure-operators-data ────────┴── infrastructure-monitoring                                  ├── apps-static-wiki (hugo-wiki)
                                                                                                      ├── apps-data (mnemosyne-mcp)
infrastructure-tailscale ─────────────── infrastructure-operators-tailscale ── infrastructure-tailscale-dns

Note: The Tailscale chain (infrastructure-tailscale → infrastructure-operators-tailscale → infrastructure-tailscale-dns) is an independent branch: its members depend only on each other, not on any core/infra Kustomization. Tailscale DNS resolution is orthogonal to cluster core operations.

Kustomization Inventory

All Kustomizations share:

  • namespace: flux-system
  • sourceRef: kind: GitRepository, name: flux-system
  • postBuild.substituteFrom → ConfigMap cluster-vars (provisioned by ephemeral-castle)
  • interval: 1h
  • prune: true
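
As an illustration, the shared fields listed above could be written as the following skeleton. This is a minimal sketch, not a copy of the repository: the apiVersion is assumed to be the current Flux v1 API, and metadata.name and spec.path are placeholders.

```yaml
# Sketch of the fields shared by every Kustomization in this inventory.
# Assumption: Flux Kustomization API v1; name and path are placeholders.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-kustomization      # placeholder, not a real entry
  namespace: flux-system
spec:
  interval: 1h
  prune: true
  path: ./path/in/repository       # placeholder
  sourceRef:
    kind: GitRepository
    name: flux-system
  postBuild:
    substituteFrom:
      # ConfigMap provisioned by ephemeral-castle
      - kind: ConfigMap
        name: cluster-vars
```

Each entry below then adds only its own path, wait, timeout, retryInterval, dependsOn, and healthChecks values on top of this skeleton.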

1. infrastructure-operators-namespaces

Field          Value
Path           ./infrastructure/operators/namespaces
Wait           true
HealthChecks   DaemonSet kube-flannel (kube-system), Deployment coredns (kube-system)
DependsOn      none (root)
Timeout        5m

Declares the ai-agents namespace. Other namespaces (cert-manager, traefik, monitoring, tazlab-db, hugo-blog, hugo-wiki, dex, auth, reloader, cloudflare-ddns) are declared inline in each operator’s folder, following the Option A.1 pattern.
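
The table for this root Kustomization might translate into a manifest like the one below. It is a sketch reconstructed from the listed fields only; the apiVersion values are assumptions. Note that Flux's documentation describes healthChecks as ignored when wait is true, so the sketch simply mirrors the table rather than asserting which setting takes effect.

```yaml
# Sketch of infrastructure-operators-namespaces, built from the table above.
# Assumption: Flux Kustomization API v1 and apps/v1 for the checked workloads.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-operators-namespaces
  namespace: flux-system
spec:
  interval: 1h
  prune: true
  path: ./infrastructure/operators/namespaces
  wait: true
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  # Root of the DAG: no dependsOn block.
  healthChecks:
    - apiVersion: apps/v1
      kind: DaemonSet
      name: kube-flannel
      namespace: kube-system
    - apiVersion: apps/v1
      kind: Deployment
      name: coredns
      namespace: kube-system
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars
```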

2. infrastructure-operators-core

Field          Value
Path           ./infrastructure/operators/core
Wait           true
DependsOn      none (root)
Timeout        10m
RetryInterval  2m

Installs all core operators via HelmReleases:

  • cert-manager (v1.16.2)
  • traefik (v34.0.0)
  • reloader (v1.2.1)
  • dex
  • auth (OAuth2 Proxy)
  • cloudflare-ddns
  • tazlab-db namespace declaration
  • hugo-blog namespace declaration
  • hugo-wiki namespace declaration

3. infrastructure-operators-data

Field          Value
Path           ./infrastructure/operators/data
Wait           true
DependsOn      none (root)
Timeout        10m
RetryInterval  2m

Installs:

  • postgres-operator (Crunchy PGO v5.7.2)

4. infrastructure-tailscale

Field          Value
Path           ./infrastructure/tailscale
Wait           true
HealthChecks   Deployment tailscale-operator (tailscale)
DependsOn      none (root)
Timeout        5m
RetryInterval  1m

Creates the tailscale namespace, provisions the OAuth ExternalSecret (k8s_operator client), and defines the HelmRepository for https://pkgs.tailscale.com/helmcharts. Layer 1 of the Tailscale Operator 3-layer DAG.

5. infrastructure-operators-tailscale

Field          Value
Path           ./infrastructure/operators/tailscale
Wait           true
DependsOn      infrastructure-tailscale
Timeout        5m
RetryInterval  1m

Installs the Tailscale Operator HelmRelease (v1.96.5, tailscale/tailscale-operator). Layer 2 of the 3-layer DAG — only applies after the namespace, Secret, and HelmRepository exist.

6. infrastructure-tailscale-dns

Field          Value
Path           ./infrastructure/tailscale-dns
Wait           true
DependsOn      infrastructure-operators-tailscale
Timeout        5m
RetryInterval  1m

Deploys the hostNetwork CoreDNS relay DaemonSet (port 5353) for magellanic-gondola.ts.net resolution, static ClusterIP Service (10.96.0.101), and patches the coredns ConfigMap with a tailnet forwarding zone. Layer 3 — only applies after the Operator CRDs are available.
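
The layering of the Tailscale chain comes down to two dependsOn links. The following sketch shows layers 2 and 3 as they could be declared from the tables above; the apiVersion is an assumption and the shared postBuild block is omitted for brevity.

```yaml
# Sketch of layers 2 and 3 of the Tailscale chain.
# Assumption: Flux Kustomization API v1; shared postBuild block omitted.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-operators-tailscale
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  prune: true
  wait: true
  path: ./infrastructure/operators/tailscale
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure-tailscale            # layer 1
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-tailscale-dns
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  prune: true
  wait: true
  path: ./infrastructure/tailscale-dns
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure-operators-tailscale  # layer 2
```

Because each layer names only its direct predecessor, the ordering is transitive: layer 3 cannot reconcile until layer 1 and layer 2 have both reported ready.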

7. infrastructure-bridge

Field          Value
Path           ./infrastructure/cluster-bridge
Wait           true
DependsOn      infrastructure-operators-core, infrastructure-operators-namespaces
Timeout        5m

Aggregator for ./infrastructure/bridge. Configures:

  • IngressClass traefik
  • ClusterIssuer tazlab-issuer (Let’s Encrypt prod, HTTP01)
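
A Kustomization with two parents lists both under dependsOn. The sketch below is derived from the table for infrastructure-bridge; the apiVersion is an assumption.

```yaml
# Sketch of infrastructure-bridge with its two dependencies.
# Assumption: Flux Kustomization API v1.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-bridge
  namespace: flux-system
spec:
  interval: 1h
  timeout: 5m
  prune: true
  wait: true
  path: ./infrastructure/cluster-bridge
  sourceRef:
    kind: GitRepository
    name: flux-system
  # Both parents must report ready before this Kustomization is applied.
  dependsOn:
    - name: infrastructure-operators-core
    - name: infrastructure-operators-namespaces
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars
```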

8. infrastructure-monitoring

Field          Value
Path           ./infrastructure/operators/monitoring
Wait           not set (defaults to true for non-root)
DependsOn      infrastructure-operators-namespaces
Timeout        10m
RetryInterval  2m

Installs:

  • kube-prometheus-stack (HelmRelease)
  • metrics-server
  • Grafana dashboards as ConfigMaps (cluster-health, nodes-pro)
  • Grafana Ingress
  • flux-secret-sync (syncs Grafana credential from ExternalSecret)

9. infrastructure-configs

Field          Value
Path           ./infrastructure/configs
Wait           true
DependsOn      infrastructure-bridge
Timeout        5m

Deploys all ExternalSecrets and static configuration:

  • cert-manager: Cloudflare API token + DNS01 ClusterIssuer
  • wildcard-tls: *.tazlab.net TLS cert via ExternalSecret
  • hugo-wiki: namespace-specific ExternalSecrets
  • tazlab-db: S3 backup credentials
  • dex: OIDC client credentials
  • ai-agents: OpenClaw gateway / Telegram / OpenAI / ElevenLabs tokens
  • github-external-secret: GitHub token for Flux image automation

10. infrastructure-instances

Field                   Value
Path                    ./infrastructure/cluster-instances
Wait                    false
DependsOn               infrastructure-configs, infrastructure-operators-data
Timeout                 10m
RetryInterval           2m
SubstituteFrom (extra)  Secret grafana-bootstrap-secret

Aggregator for ./infrastructure/instances plus all 4 image automation pipelines. Uses wait: false because some workloads (tazlab-db, longhorn, apps with init containers) can take variable time to become ready. Kubernetes handles Pending/Init states naturally.

Deploys:

  • tazlab-db: PostgresCluster (Crunchy PGO, 1 replica, 4Gi, S3 backup)
  • traefik: LoadBalancer Service (IP 192.168.1.240)
  • longhorn: Ingress + Service
  • dex: Deployment + Service + Ingress + ConfigMap
  • pgadmin: Deployment + PVC + Service
  • homepage: Dashboard UI
  • cloudflare-ddns: Deployment + ExternalSecret

Plus image automation for:

  • hugo-blog
  • hugo-wiki
  • mnemosyne-mcp
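
The two settings that distinguish infrastructure-instances, wait: false and the additional substitution source, could look like this in manifest form. This is a sketch based on the table for this Kustomization; the apiVersion and the order of the substituteFrom entries are assumptions.

```yaml
# Sketch of infrastructure-instances: non-blocking, with an extra Secret
# as a substitution source. Assumption: Flux Kustomization API v1.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-instances
  namespace: flux-system
spec:
  interval: 1h
  retryInterval: 2m
  timeout: 10m
  prune: true
  wait: false                       # do not block on workload readiness
  path: ./infrastructure/cluster-instances
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure-configs
    - name: infrastructure-operators-data
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars          # shared by all Kustomizations
      - kind: Secret
        name: grafana-bootstrap-secret
```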

11. apps-static (hugo-blog)

Field          Value
Path           ./apps/cluster/hugo-blog
Wait           not set (defaults to true)
DependsOn      infrastructure-configs
Timeout        5m

Deploys hugo-blog from apps/base/hugo-blog/:

  • nginx serving static Hugo build
  • Certificate CR for blog.tazlab.net
  • Traefik Ingress + middlewares
  • Redirect middleware (tazlab.net → blog.tazlab.net)

12. apps-static-wiki (hugo-wiki)

Field          Value
Path           ./apps/cluster/hugo-wiki
Wait           not set (defaults to true)
DependsOn      infrastructure-configs
Timeout        5m

Deploys hugo-wiki from apps/base/hugo-wiki/:

  • nginx serving static Hugo wiki build
  • Ingress for wiki.tazlab.net
  • Wildcard TLS via ExternalSecret (same cert chain as blog)

13. apps-data (mnemosyne-mcp)

Field          Value
Path           ./apps/cluster/mnemosyne-mcp
Wait           not set (defaults to true)
DependsOn      infrastructure-configs
Timeout        5m

Deploys mnemosyne-mcp from apps/base/mnemosyne-mcp/:

  • Go MCP server (tazzo/mnemosyne-mcp)
  • LoadBalancer Service port 8004 → 8080
  • ExternalSecret for GEMINI_API_KEY
  • wait-for-db initContainer patch
  • RBAC for secret reading
  • Reloader annotation on tazlab-db-pguser-mnemosyne + mnemosyne-mcp-secrets

14. (name not given)

Field          Value
Wait           true
DependsOn      infrastructure-configs
Timeout        5m
RetryInterval  1m

Deploys:

  • PVCs for config and workspace (5Gi + 10Gi, tazlab-storage)
  • MetalLB LoadBalancer IP (192.168.1.242)
  • ExternalSecrets for gateway token, Telegram, OpenAI, ElevenLabs

15. infrastructure-auth

Field          Value
Path           ./infrastructure/auth
Wait           true
DependsOn      infrastructure-instances
Timeout        5m

Deploys:

  • OAuth2 Proxy Deployment + Service + Ingress
  • ForwardAuth middleware
  • RBAC for the oauth2-proxy ServiceAccount

Wait Policy Truth Table

Kustomization                        wait            healthChecks        Effect
infrastructure-operators-namespaces  true            flannel + coredns   Blocks dependents until CNI + DNS ready
infrastructure-operators-core        true            -                   Blocks until core HelmReleases installed
infrastructure-operators-data        true            -                   Blocks until PGO installed
infrastructure-tailscale             true            tailscale-operator  Blocks until Layer 1 ready
infrastructure-operators-tailscale   true            -                   Blocks until Layer 2 HelmRelease installed
infrastructure-tailscale-dns         true            -                   Blocks until Layer 3 DNS resources ready
infrastructure-bridge                true            -                   Blocks until IngressClass + Issuer ready
infrastructure-monitoring            default (true)  -                   -
infrastructure-configs               true            -                   Blocks until secrets available
infrastructure-instances             false           -                   Non-blocking: pods handle Pending/Init naturally
apps-static                          default (true)  -                   -
apps-static-wiki                     default (true)  -                   -
apps-data                            default (true)  -                   -
infrastructure-auth                  true            -                   Blocks until Dex + OAuth2 healthy

DAG Integrity Rules

  1. Namespaces must exist before their operators are installed
  2. ClusterIssuer / IngressClass must exist before ExternalSecrets can reference them
  3. ExternalSecrets must be available before workloads that consume them
  4. PGO must be installed before PostgresCluster CRs are applied
  5. Infrastructure instances must be ready before the auth layer, which depends on them, is applied
  6. Never apply instances before their operator is ready — dependsOn enforces this

See Also