TazLab Flux DAG Troubleshooting

Scope

This runbook explains how to debug the active Flux DAG when a layer does not converge.

Start Here

  1. Read TazLab Flux DAG.
  2. Identify the first layer that is not ready.
  3. Inspect the layer immediately upstream before looking at the failed application.
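
One way to find the layer immediately upstream is to read the dependsOn list of the Kustomization that is not ready. The following is a minimal sketch, not the cluster's official tooling. It assumes the layers are Flux Kustomization objects in the flux-system namespace (override FLUX_NS if they live elsewhere) and that kubectl already points at the cluster, for example through KUBECONFIG. The helper name upstream_of is hypothetical.

```shell
# Assumption: layers are Flux Kustomizations in $FLUX_NS (default: flux-system).
FLUX_NS="${FLUX_NS:-flux-system}"

# Hypothetical helper: print the names of the Kustomizations a layer depends on.
# $1 = name of the Kustomization that is not ready
upstream_of() {
  kubectl get kustomizations.kustomize.toolkit.fluxcd.io "$1" \
    -n "$FLUX_NS" \
    -o jsonpath='{.spec.dependsOn[*].name}'
}
```

Calling upstream_of with the name of the failing Kustomization prints its direct dependencies separated by spaces. Empty output means the layer declares no dependencies and is a DAG root.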

Read Order

Use this order when debugging:

  1. Flux root and source readiness
  2. DAG roots: operators-namespaces, operators-core, operators-data
  3. bridge and configs
  4. instances
  5. auth
  6. apps
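
The read order above can be sketched as a loop that stops at the first layer whose Ready condition is not True. This is an illustration under stated assumptions: the first three names come from the list above, while the remaining names assume each Kustomization is named after its layer, which may not hold in this repository. The flux-system namespace and the helper name first_unready_layer are also assumptions. Source readiness (step 1) is not covered here and is still checked with flux get sources all -A.

```shell
# Layer names in read order. Only the three operators-* names are confirmed by
# the list above; the others are assumed to match the Kustomization names.
LAYERS="${LAYERS:-operators-namespaces operators-core operators-data bridge configs instances auth apps}"
FLUX_NS="${FLUX_NS:-flux-system}"

# Hypothetical helper: print the first layer whose Ready condition is not "True".
# Returns 0 when an unready layer is found, 1 when every layer is ready.
# A Kustomization that does not exist yields an empty status and is reported too.
first_unready_layer() {
  for layer in $LAYERS; do
    status=$(kubectl get kustomizations.kustomize.toolkit.fluxcd.io "$layer" \
      -n "$FLUX_NS" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
    if [ "$status" != "True" ]; then
      echo "$layer"
      return 0
    fi
  done
  return 1
}
```

Set LAYERS to the real Kustomization names before relying on the result, for example LAYERS="operators-core configs" first_unready_layer.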

Useful Commands

flux get kustomizations -A
flux get sources all -A
flux reconcile source git flux-system
kubectl --kubeconfig=../ephemeral-castle/clusters/tazlab-k8s/proxmox/configs/kubeconfig get pods -A
kubectl --kubeconfig=../ephemeral-castle/clusters/tazlab-k8s/proxmox/configs/kubeconfig get events -A --sort-by=.lastTimestamp

What To Check By Layer

Operators

  • namespace exists
  • HelmRelease exists and is healthy
  • CRDs are installed before dependent resources are applied
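
HelmRelease health can be listed with flux get helmreleases -A. For the CRD check, a small helper such as the one below may help. It is a sketch: the name crd_established is hypothetical, and the CRD name in the usage note assumes the External Secrets Operator provides the ExternalSecret kind used in the configs layer.

```shell
# Hypothetical helper: succeed only if the named CRD reports Established=True.
# $1 = full CRD name, in the form <plural>.<group>
crd_established() {
  status=$(kubectl get crd "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Established")].status}' 2>/dev/null)
  [ "$status" = "True" ]
}
```

Example: crd_established externalsecrets.external-secrets.io || echo "CRD missing or not established". A failure here usually means the operator layer has not finished installing, so dependent resources cannot be applied yet.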

Configs

  • ExternalSecret objects resolved successfully
  • cluster vars are available
  • TLS and secret artifacts exist before auth or app layers consume them
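
To see which ExternalSecret objects have not resolved, the Ready condition of every object can be listed and filtered. The sketch below assumes the External Secrets Operator CRD group external-secrets.io and uses a hypothetical helper name, unready_externalsecrets.

```shell
# Hypothetical helper: list ExternalSecret objects whose Ready condition is not "True".
# Prints one <namespace>/<name> per line. Objects with no Ready condition yet are included.
unready_externalsecrets() {
  kubectl get externalsecrets.external-secrets.io -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    awk 'NF && $2 != "True" { print $1 }'
}
```

Empty output means every ExternalSecret reports Ready. For any object that is listed, kubectl describe externalsecret shows the condition message from the operator.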

Instances

  • Pods are ready and Service selectors match the Pod labels
  • layers with wait: false are not mistaken for actual readiness failures
  • database and dashboard surfaces have the expected LoadBalancer IPs and ports
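
The selector and LoadBalancer checks can be sketched with two small helpers. Both names are hypothetical, and the namespace and Service name are supplied by the caller because this runbook does not list them.

```shell
# Hypothetical helper: print the ready Pod IPs behind a Service.
# Empty output suggests the selector matches no ready Pods.
# $1 = namespace, $2 = Service name
svc_endpoint_ips() {
  kubectl get endpoints "$2" -n "$1" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Hypothetical helper: print the LoadBalancer IP assigned to a Service, if any.
# $1 = namespace, $2 = Service name
svc_lb_ip() {
  kubectl get service "$2" -n "$1" \
    -o jsonpath='{.status.loadBalancer.ingress[*].ip}'
}
```

Compare the output of svc_lb_ip with the IP the layer is expected to expose. If svc_endpoint_ips prints nothing while the Pods are running, check the Service selector against the Pod labels.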

Auth

  • oauth2-proxy is healthy
  • auth.tazlab.net ingress is reachable
  • the forward-auth middleware name matches the protected ingress annotations
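
The middleware name check can be sketched as follows. This example assumes the ingress controller is Traefik with Kubernetes CRD middlewares, which this runbook does not state. Under that assumption, an Ingress references a Middleware through the traefik.ingress.kubernetes.io/router.middlewares annotation, using the form <namespace>-<name>@kubernetescrd. The helper names are hypothetical.

```shell
# ASSUMPTION: Traefik with CRD middlewares. Adjust or discard for another controller.

# Build the reference Traefik expects for a Middleware: <namespace>-<name>@kubernetescrd
# $1 = Middleware namespace, $2 = Middleware name
expected_middleware_ref() {
  printf '%s-%s@kubernetescrd\n' "$1" "$2"
}

# Print the middleware annotation currently set on an Ingress.
# $1 = Ingress namespace, $2 = Ingress name
ingress_middlewares() {
  kubectl get ingress "$2" -n "$1" \
    -o jsonpath='{.metadata.annotations.traefik\.ingress\.kubernetes\.io/router\.middlewares}'
}
```

If the value printed by ingress_middlewares does not contain the string from expected_middleware_ref, the protected ingress points at a middleware that does not exist under that name.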

Apps

  • image automation has promoted the expected tag
  • app-specific ingress or service surfaces match the layer contract
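
To confirm a promotion, flux get images policy -A shows the latest image each policy has selected, and that value can be compared with what the workload actually runs. The helper below is a sketch that assumes the app is a Deployment. Its name, deployment_runs_image, is hypothetical.

```shell
# Hypothetical helper: succeed if any container in a Deployment runs exactly the
# expected image reference.
# $1 = namespace, $2 = Deployment name, $3 = expected image (repository:tag)
deployment_runs_image() {
  kubectl get deployment "$2" -n "$1" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}' |
    grep -Fxq -- "$3"
}
```

The match is on the whole image reference, so a tag that is only a prefix of the running tag does not count as promoted.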

Common Failure Pattern

If a downstream layer is broken, first verify the upstream config or operator layer that feeds it. Most Flux failures in this cluster are dependency or secret-flow issues, not application container bugs.

Relationships