TazLab K8s: Monitoring Detail
Level 3 (Detail) — Prometheus, Grafana, dashboards, metrics-server.
Concept
The monitoring stack uses kube-prometheus-stack (Prometheus + Grafana + Alertmanager), managed as a Flux HelmRelease. Grafana uses the shared PostgreSQL instance (tazlab-db) as its backend, so dashboards persist across pod restarts. Dashboards are managed as ConfigMaps that Grafana's dashboard sidecar loads automatically.
HelmRelease
File: infrastructure/operators/monitoring/helmrelease.yaml
| Field | Value |
|---|---|
| Chart | kube-prometheus-stack |
| Repository | prometheus-community |
| Namespace | monitoring |
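The table above corresponds to a Flux HelmRelease roughly like the one below. This is a minimal sketch, not the contents of helmrelease.yaml: the release name, the HelmRepository's namespace and the reconcile interval are assumptions, and it assumes Flux 2.3 or later (helm.toolkit.fluxcd.io/v2; older clusters use v2beta1/v2beta2). The manifest is built as a Python dict and printed as JSON, which kubectl accepts as readily as YAML.

```python
import json


def helmrelease_manifest(
    name: str = "kube-prometheus-stack",  # hypothetical release name
    namespace: str = "monitoring",
    repo_name: str = "prometheus-community",
    repo_namespace: str = "monitoring",  # assumption: HelmRepository sits next to the release
    interval: str = "30m",  # assumption: reconcile interval
) -> dict:
    """Return a Flux HelmRelease for kube-prometheus-stack as a plain dict."""
    return {
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "interval": interval,
            "chart": {
                "spec": {
                    "chart": "kube-prometheus-stack",
                    # No version pin: the source does not state one.
                    "sourceRef": {
                        "kind": "HelmRepository",
                        "name": repo_name,
                        "namespace": repo_namespace,
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(helmrelease_manifest(), indent=2))
```

Keeping the chart name, repository and namespace as parameters mirrors the three rows of the table; everything else (values, upgrade/remediation settings) would be layered on top.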
Manifests
Directory: infrastructure/operators/monitoring/
| File | Purpose |
|---|---|
| helmrepository.yaml | Helm repository source (prometheus-community) |
| helmrelease.yaml | kube-prometheus-stack installation |
| namespace.yaml | monitoring namespace |
| metrics-server.yaml | Resource metrics (CPU/memory) |
| flux-secret-sync.yaml | Grafana admin credentials sync |
| grafana-ingress.yaml | Grafana HTTPS ingress |
| dashboards/cluster-health.yaml | Cluster health dashboard ConfigMap |
| dashboards/nodes-pro.yaml | Detailed node metrics dashboard ConfigMap |
Grafana PostgreSQL Backend
Grafana is configured to use tazlab-db as its database backend (database grafana, user grafana). This ensures:
- Dashboard configurations survive pod restarts
- User sessions and preferences are persistent
- No PVC dependency for Grafana itself
The grafana-bootstrap-secret Secret, referenced by the postBuild.substituteFrom of the infrastructure-instances Kustomization, provides the initial admin credentials.
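A minimal sketch of kube-prometheus-stack values that point Grafana at an external PostgreSQL database, built as a Python dict. The [database] keys (type, host, name, user) are standard grafana.ini settings, and envFromSecret and persistence.enabled are Grafana chart options; the tazlab-db Service DNS name, port and the password Secret name are hypothetical. The password is kept out of the values on purpose: Grafana also reads GF_DATABASE_PASSWORD from its environment, which envFromSecret populates from the Secret.

```python
import json


def grafana_db_values(
    db_host: str = "tazlab-db-rw.tazlab-db.svc:5432",  # hypothetical Service DNS name and port
    db_name: str = "grafana",
    db_user: str = "grafana",
    password_secret: str = "grafana-db-credentials",  # hypothetical Secret with GF_DATABASE_PASSWORD
) -> dict:
    """kube-prometheus-stack values that move Grafana's state into PostgreSQL."""
    return {
        "grafana": {
            "grafana.ini": {
                "database": {
                    "type": "postgres",
                    "host": db_host,
                    "name": db_name,
                    "user": db_user,
                    # No password here: Grafana picks up GF_DATABASE_PASSWORD
                    # from the environment instead.
                }
            },
            # Expose every key of this Secret as an environment variable.
            "envFromSecret": password_secret,
            # No PVC for Grafana: its state lives in PostgreSQL.
            "persistence": {"enabled": False},
        }
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be pasted under spec.values.
    print(json.dumps(grafana_db_values(), indent=2))
```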
Dashboards as Code
Dashboards are stored as ConfigMaps labeled grafana_dashboard: "1". Grafana's dashboard sidecar watches for ConfigMaps carrying this label and loads their dashboards automatically, without restarting Grafana.
Current dashboards:
- cluster-health.yaml: high-level node and pod metrics
- nodes-pro.yaml: detailed hardware and kernel metrics
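The label contract can be sketched as follows. The first function wraps a dashboard JSON model in a ConfigMap carrying the grafana_dashboard: "1" label; the second is a simplified model of the sidecar's label filter, not the sidecar's actual code. In recent kube-prometheus-stack versions the label and value come from grafana.sidecar.dashboards.label and labelValue, which default to these same strings. The dashboard content in the usage example is hypothetical.

```python
import json

DASHBOARD_LABEL = "grafana_dashboard"
DASHBOARD_LABEL_VALUE = "1"


def dashboard_configmap(name: str, dashboard: dict, namespace: str = "monitoring") -> dict:
    """Wrap a Grafana dashboard JSON model in a ConfigMap the sidecar will pick up."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Label values must be strings, hence "1" rather than 1.
            "labels": {DASHBOARD_LABEL: DASHBOARD_LABEL_VALUE},
        },
        # Each data key becomes a file name in the sidecar's output folder.
        "data": {f"{name}.json": json.dumps(dashboard)},
    }


def sidecar_selects(configmaps: list[dict]) -> list[str]:
    """Simplified sidecar filter: names of ConfigMaps that carry the dashboard label."""
    return [
        cm["metadata"]["name"]
        for cm in configmaps
        if cm["metadata"].get("labels", {}).get(DASHBOARD_LABEL) == DASHBOARD_LABEL_VALUE
    ]


if __name__ == "__main__":
    cm = dashboard_configmap("cluster-health", {"title": "Cluster Health", "panels": []})
    print(json.dumps(cm, indent=2))
```

A ConfigMap without the label (or with grafana_dashboard: "0") is simply ignored, which is why adding a dashboard is a matter of committing one labeled ConfigMap.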
metrics-server
File: infrastructure/operators/monitoring/metrics-server.yaml
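Provides the resource metrics (CPU and memory, served through the metrics.k8s.io API) used by kubectl top and the HorizontalPodAutoscaler. It is installed alongside kube-prometheus-stack but does not depend on Prometheus; it collects its data directly from the kubelets.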
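To see what metrics-server actually serves, `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes` returns a NodeMetricsList whose usage fields are Kubernetes quantity strings (for example "250m" CPU, "512Mi" memory). A minimal sketch that turns that JSON into numbers, assuming only the suffixes listed below appear (the full quantity grammar is larger, e.g. decimal exponents are not handled). The script name in the usage line is hypothetical.

```python
import json
import sys

# Subset of Kubernetes quantity suffixes commonly seen in metrics.k8s.io responses.
CPU_SUFFIX = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
MEM_SUFFIX = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_cores(q: str) -> float:
    """Convert a CPU quantity such as '250m' or '123456789n' to cores."""
    if q and q[-1] in CPU_SUFFIX:
        return float(q[:-1]) * CPU_SUFFIX[q[-1]]
    return float(q)


def mem_bytes(q: str) -> int:
    """Convert a memory quantity such as '512Mi' to bytes (binary suffixes only)."""
    for suffix, factor in MEM_SUFFIX.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def node_usage(node_metrics_list: str) -> dict[str, tuple[float, int]]:
    """Map node name -> (CPU cores, memory bytes) from a NodeMetricsList JSON document."""
    doc = json.loads(node_metrics_list)
    return {
        item["metadata"]["name"]: (
            cpu_cores(item["usage"]["cpu"]),
            mem_bytes(item["usage"]["memory"]),
        )
        for item in doc["items"]
    }


if __name__ == "__main__":
    for name, (cores, mem) in node_usage(sys.stdin.read()).items():
        print(f"{name}: {cores:.2f} cores, {mem / 1024**2:.0f} MiB")
```

Usage: `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | python3 node_usage.py`.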
flux-secret-sync
File: infrastructure/operators/monitoring/flux-secret-sync.yaml
Syncs the Grafana admin credentials from an ExternalSecret to the Grafana deployment, ensuring the initial admin password is available before Grafana starts.
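One common way to wire this, sketched under stated assumptions: an External Secrets Operator ExternalSecret materializes a Kubernetes Secret, and the Grafana chart's admin.existingSecret, admin.userKey and admin.passwordKey options read the credentials from it. This is not necessarily what flux-secret-sync.yaml contains. The ClusterSecretStore name, remote key path, Secret name and refresh interval are all hypothetical, and the ESO API version shown is external-secrets.io/v1beta1 (newer releases also serve external-secrets.io/v1).

```python
import json


def grafana_admin_externalsecret(
    target: str = "grafana-admin",  # hypothetical Secret name
    namespace: str = "monitoring",
    store: str = "cluster-secret-store",  # hypothetical ClusterSecretStore
    remote_key: str = "grafana/admin",  # hypothetical path in the secret backend
) -> dict:
    """ExternalSecret that materializes Grafana admin credentials as a Secret."""
    return {
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": target, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # assumption
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": store},
            "target": {"name": target, "creationPolicy": "Owner"},
            "data": [
                {"secretKey": "admin-user", "remoteRef": {"key": remote_key, "property": "username"}},
                {"secretKey": "admin-password", "remoteRef": {"key": remote_key, "property": "password"}},
            ],
        },
    }


def grafana_admin_values(secret_name: str) -> dict:
    """Grafana chart values that read the admin user/password from that Secret."""
    return {
        "grafana": {
            "admin": {
                "existingSecret": secret_name,
                "userKey": "admin-user",
                "passwordKey": "admin-password",
            }
        }
    }


if __name__ == "__main__":
    es = grafana_admin_externalsecret()
    print(json.dumps(es, indent=2))
    print(json.dumps(grafana_admin_values(es["spec"]["target"]["name"]), indent=2))
```

The only coupling between the two objects is the Secret name and its two keys, which is what the usage in the main block checks by construction.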
DAG Position
operators-namespaces (Level 0, creates the monitoring namespace)
→ monitoring (Level 1, installs kube-prometheus-stack and metrics-server)
→ configs (Level 2, creates the S3 backup ExternalSecret for tazlab-db)
→ instances (Level 3, creates the Grafana ingress and syncs grafana-bootstrap-secret)
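Flux expresses these edges through spec.dependsOn on each Kustomization. The sketch below encodes the chain above as a dependency map and derives both a valid reconcile order and each node's level (the length of its longest dependency chain), using the standard library's graphlib. The node names are the short labels used above; the real Kustomization names may differ (for example infrastructure-instances).

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (the DAG above).
DEPENDS_ON: dict[str, set[str]] = {
    "operators-namespaces": set(),
    "monitoring": {"operators-namespaces"},
    "configs": {"monitoring"},
    "instances": {"configs"},
}


def apply_order(graph: dict[str, set[str]]) -> list[str]:
    """Return a valid reconcile order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def levels(graph: dict[str, set[str]]) -> dict[str, int]:
    """Level = length of the longest dependency chain below a node (roots are 0)."""
    memo: dict[str, int] = {}

    def level(node: str) -> int:
        if node not in memo:
            deps = graph[node]
            memo[node] = 0 if not deps else 1 + max(level(d) for d in deps)
        return memo[node]

    return {node: level(node) for node in graph}


if __name__ == "__main__":
    print(apply_order(DEPENDS_ON))
    print(levels(DEPENDS_ON))
```

Computing levels from the edges, rather than maintaining them by hand, keeps the "Level N" annotations honest if a Kustomization later gains an extra dependency.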
Prometheus Status Note
Since the 2026-04-29 power-loss recovery, Prometheus has been running on a manually salvaged Longhorn volume. The volume is healthy and the pod is 2/2 Running. Future consideration: if the monitoring data becomes important, the Prometheus PVC should use 2 Longhorn replicas, so that losing a single node or disk does not lose the data.
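A sketch of how that future change might look: a Longhorn StorageClass with numberOfReplicas set to 2, plus kube-prometheus-stack values that request the Prometheus PVC from it through prometheus.prometheusSpec.storageSpec.volumeClaimTemplate. The StorageClass name, reclaim policy and the 20Gi size are assumptions. StorageClass parameters only apply when a volume is provisioned, so the salvaged volume already in use would keep its current replica count; for that volume the count would be raised on the Longhorn volume itself (Longhorn UI or the Volume's spec.numberOfReplicas).

```python
import json


def longhorn_storageclass(name: str = "longhorn-2-replicas", replicas: int = 2) -> dict:
    """StorageClass whose volumes Longhorn provisions with `replicas` copies."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},  # hypothetical name
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Retain",  # assumption: keep the data if the PVC is deleted
        # StorageClass parameters are strings.
        "parameters": {"numberOfReplicas": str(replicas)},
    }


def prometheus_storage_values(storage_class: str, size: str = "20Gi") -> dict:
    """kube-prometheus-stack values: request the Prometheus PVC from the given class."""
    return {
        "prometheus": {
            "prometheusSpec": {
                "storageSpec": {
                    "volumeClaimTemplate": {
                        "spec": {
                            "storageClassName": storage_class,
                            "accessModes": ["ReadWriteOnce"],
                            "resources": {"requests": {"storage": size}},  # assumed size
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    sc = longhorn_storageclass()
    print(json.dumps(sc, indent=2))
    print(json.dumps(prometheus_storage_values(sc["metadata"]["name"]), indent=2))
```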
See Also
- Parent topic: Monitoring & Dashboards
- Sibling details: tazlab-db Detail, External Secrets Detail