TazLab Flux DAG Troubleshooting
Scope
This runbook explains how to debug the active Flux DAG when a layer does not converge.
Start Here
- Read TazLab Flux DAG.
- Identify the first layer that is not ready.
- Inspect the layer immediately upstream of it before looking at the failed application.
Read Order
Use this order when debugging:
- Flux root and source readiness
- DAG roots: operators-namespaces, operators-core, operators-data
- bridge and configs
- instances
- auth
- apps
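The read order above can be turned into a small check. This is a minimal sketch, not the cluster's actual tooling. It assumes each layer is a Flux Kustomization in the flux-system namespace, named exactly as in the list. Only operators-namespaces, operators-core, and operators-data are names confirmed by this runbook; the rest are assumptions. first_not_ready is a hypothetical helper name.

```shell
#!/bin/sh
# first_not_ready (hypothetical helper): print the first layer, in the order
# given as arguments, whose Ready status is not "True".
#   stdin: lines of "<kustomization-name> <ready-status>"
#   args : layer names in DAG read order
# Returns 0 and prints "<layer> <status>" when a blocked layer is found,
# returns 1 and prints nothing when every listed layer is Ready.
first_not_ready() {
  table=$(cat)
  for layer in "$@"; do
    status=$(printf '%s\n' "$table" | awk -v n="$layer" '$1 == n { print $2; exit }')
    if [ "$status" != "True" ]; then
      printf '%s %s\n' "$layer" "${status:-missing}"
      return 0
    fi
  done
  return 1
}

# Possible usage against the cluster (namespace and layer names are assumptions):
#   kubectl get kustomizations.kustomize.toolkit.fluxcd.io -n flux-system --no-headers \
#     -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' \
#     | first_not_ready operators-namespaces operators-core operators-data \
#         bridge configs instances auth apps
```

The helper reports a layer as missing when no Kustomization with that name appears in the input, which usually means the assumed name is wrong rather than that the layer failed.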
Useful Commands
flux get kustomizations -A
flux get sources all -A
flux reconcile source git flux-system
kubectl --kubeconfig=../ephemeral-castle/clusters/tazlab-k8s/proxmox/configs/kubeconfig get pods -A
kubectl --kubeconfig=../ephemeral-castle/clusters/tazlab-k8s/proxmox/configs/kubeconfig get events -A --sort-by=.lastTimestamp
What To Check By Layer
Operators
- namespace exists
- HelmRelease exists and is healthy
- CRDs are installed before dependent resources are applied
Configs
- ExternalSecret objects resolved successfully
- cluster vars are available
- TLS and secret artifacts exist before auth or app layers consume them
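One way to spot unresolved ExternalSecret objects is to filter on their Ready condition. The sketch below assumes the External Secrets Operator CRD (externalsecrets.external-secrets.io) is what provides these objects in this cluster. unresolved_external_secrets is a hypothetical helper name.

```shell
#!/bin/sh
# unresolved_external_secrets (hypothetical helper): list ExternalSecrets
# whose Ready status is anything other than "True".
#   stdin : lines of "<namespace> <name> <ready-status>"
#   stdout: one "<namespace>/<name>" per unresolved object
unresolved_external_secrets() {
  awk '$3 != "True" { print $1 "/" $2 }'
}

# Possible usage against the cluster (assumes the External Secrets Operator CRD):
#   kubectl get externalsecrets.external-secrets.io -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' \
#     | unresolved_external_secrets
```

Any object listed here is worth a kubectl describe before moving on to the auth or app layers, since those layers consume the resulting secrets.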
Instances
- Pod readiness and service selectors match
- wait: false layers are not confused with actual readiness failure
- database and dashboard surfaces have the expected LoadBalancer IPs and ports
Auth
- oauth2-proxy is healthy
- auth.tazlab.net ingress is reachable
- the forward-auth middleware name matches the protected ingress annotations
Apps
- image automation has promoted the expected tag
- app-specific ingress or service surfaces match the layer contract
Common Failure Pattern
If a downstream layer is broken, first verify the upstream config or operator layer that feeds it. Most Flux failures in this cluster are dependency or secret-flow issues, not application container bugs.
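To make "verify the upstream layer first" concrete, the dependency chain of a broken layer can be listed from each Kustomization's spec.dependsOn field. The following is a sketch under assumptions: upstream_of is a hypothetical helper, the flux-system namespace is assumed, and the actual dependency edges in this cluster may differ from any example input.

```shell
#!/bin/sh
# upstream_of (hypothetical helper): print every layer the given layer depends
# on, directly or transitively, nearest dependencies first (breadth-first).
#   arg  : the layer to start from
#   stdin: lines of "<layer> <dependency> [<dependency> ...]"
upstream_of() {
  awk -v start="$1" '
    { for (i = 2; i <= NF; i++) deps[$1] = deps[$1] " " $i }
    END {
      queue[1] = start; head = 1; tail = 1; seen[start] = 1
      while (head <= tail) {
        cur = queue[head++]
        n = split(deps[cur], d, " ")
        for (i = 1; i <= n; i++) {
          if (!(d[i] in seen)) {
            seen[d[i]] = 1
            print d[i]
            queue[++tail] = d[i]
          }
        }
      }
    }'
}

# Possible way to build the input (namespace and layer names are assumptions):
#   for k in operators-namespaces operators-core operators-data; do
#     printf '%s %s\n' "$k" "$(kubectl get kustomizations.kustomize.toolkit.fluxcd.io "$k" \
#       -n flux-system -o jsonpath='{.spec.dependsOn[*].name}')"
#   done | upstream_of operators-data
```

Checking the printed layers from top to bottom follows the same rule as the rest of this runbook: the nearest upstream layer is inspected first.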