TazLab K8s: Monitoring Detail

Level 3 (Detail) — Prometheus, Grafana, dashboards, metrics-server.

Concept

The monitoring stack is kube-prometheus-stack (Prometheus, Grafana, and Alertmanager), managed as a Flux HelmRelease. Grafana uses the shared PostgreSQL instance (tazlab-db) as its database backend, so dashboards persist across pod restarts. Dashboards are managed as ConfigMaps and loaded by a sidecar.

HelmRelease

File: infrastructure/operators/monitoring/helmrelease.yaml

Field	Value
Chart	kube-prometheus-stack
Repository	prometheus-community
Namespace	monitoring
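
For orientation, a HelmRelease carrying the three fields above might look roughly like the following. This is a minimal sketch, not the contents of helmrelease.yaml: the apiVersion, release name, reconcile interval, and the namespace of the HelmRepository object are all assumptions.

```yaml
# Sketch only. Chart, repository name and namespace come from the table above;
# everything else is assumed for illustration.
apiVersion: helm.toolkit.fluxcd.io/v2   # assumed; older Flux releases use v2beta1/v2beta2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack           # assumed release name
  namespace: monitoring
spec:
  interval: 30m                         # assumed reconcile interval
  chart:
    spec:
      chart: kube-prometheus-stack
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: monitoring           # assumed; the repository object may live elsewhere
```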

Manifests

Directory: infrastructure/operators/monitoring/

File	Purpose
`helmrepository.yaml`	Helm repository reference
`helmrelease.yaml`	kube-prometheus-stack installation
`namespace.yaml`	`monitoring` namespace
`metrics-server.yaml`	Resource metrics (CPU/memory)
`flux-secret-sync.yaml`	Grafana admin credentials sync
`grafana-ingress.yaml`	Grafana HTTPS ingress
`dashboards/cluster-health.yaml`	Cluster health dashboard ConfigMap
`dashboards/nodes-pro.yaml`	Detailed node metrics dashboard

Grafana PostgreSQL Backend

Grafana is configured to use tazlab-db as its database backend (database grafana, user grafana). This ensures:

Dashboard configurations survive pod restarts
User sessions and preferences are persistent
No PVC dependency for Grafana itself
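
The backend settings described here could be expressed in the chart values along these lines. This is a hedged sketch: the database name and user come from this page, while the host address, the Secret name and key, and the use of `envValueFrom` to inject the password are assumptions about how the release is wired.

```yaml
# Fragment of HelmRelease spec.values (sketch only).
grafana:
  grafana.ini:
    database:
      type: postgres
      host: tazlab-db.example.svc:5432   # hypothetical service address
      name: grafana
      user: grafana
  envValueFrom:
    GF_DATABASE_PASSWORD:                # Grafana maps GF_<SECTION>_<KEY> env vars onto grafana.ini
      secretKeyRef:
        name: grafana-db-credentials     # hypothetical Secret name
        key: password                    # hypothetical key
  persistence:
    enabled: false                       # no PVC; Grafana state lives in PostgreSQL
```

Passing the password through an environment variable keeps it out of the rendered grafana.ini ConfigMap.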

The grafana-bootstrap-secret Secret, referenced in the substituteFrom of the infrastructure-instances Kustomization, provides the initial admin credentials.

Dashboards as Code

Dashboards are stored as ConfigMaps with label grafana_dashboard: "1". Grafana’s sidecar watches for this label and loads dashboards automatically.

Current dashboards:

cluster-health.yaml — High-level node and pod metrics
nodes-pro.yaml — Detailed hardware and kernel metrics
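
A dashboard ConfigMap of the kind the sidecar picks up might be shaped like this. The label and namespace come from this page; the ConfigMap name, the data key, and the dashboard JSON are placeholders, not the real contents of cluster-health.yaml.

```yaml
# Sketch only: placeholder dashboard body.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dashboard-cluster-health     # assumed name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"           # the label the Grafana sidecar watches for
data:
  cluster-health.json: |-            # assumed key; the sidecar writes each key out as a file
    {
      "title": "Cluster Health",
      "uid": "cluster-health",
      "panels": []
    }
```

In practice the JSON body would be a dashboard exported from Grafana, so changes to it go through Git rather than the UI.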

metrics-server

File: infrastructure/operators/monitoring/metrics-server.yaml

metrics-server provides the resource metrics (CPU and memory) used by kubectl top and the HorizontalPodAutoscaler. It is installed alongside kube-prometheus-stack.

flux-secret-sync

Syncs the Grafana admin credentials from an ExternalSecret to the Grafana deployment, so that the initial admin password is available before Grafana starts.

DAG Position

operators-namespaces (Level 0, creates monitoring namespace)
→ monitoring (Level 1, installs kube-prometheus-stack and metrics-server)
→ configs (Level 2, creates S3 backup ExternalSecret for tazlab-db)
→ instances (Level 3, creates Grafana ingress + syncs grafana-bootstrap-secret)
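
In Flux, an ordering like the one above is normally expressed with `dependsOn` on each Kustomization. The sketch below shows what the last layer could look like. Only the names infrastructure-instances and grafana-bootstrap-secret come from this page; the namespace, interval, source name, path, and the name of the configs Kustomization are assumptions.

```yaml
# Sketch only: one layer of the dependency chain.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-instances
  namespace: flux-system               # assumed
spec:
  interval: 10m                        # assumed
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                  # assumed source name
  path: ./infrastructure/instances     # assumed path
  dependsOn:
    - name: infrastructure-configs     # assumed name for the "configs" layer
  postBuild:
    substituteFrom:
      - kind: Secret
        name: grafana-bootstrap-secret
```

With `dependsOn`, Flux applies this layer only after the layer it names has reconciled successfully, which is what gives the chain its Level 0 to Level 3 ordering.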

Prometheus Status Note

As of the 2026-04-29 power-loss recovery, Prometheus runs on a manually salvaged Longhorn volume. The volume is healthy and the pod is 2/2 Running. Future consideration: if monitoring data becomes important, the Prometheus PVC should use 2 Longhorn replicas.
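
One way to get a two-replica volume, should that become necessary, is a dedicated Longhorn StorageClass. This is a sketch under assumptions: the class name is hypothetical and nothing here reflects the current cluster configuration.

```yaml
# Sketch only: a Longhorn StorageClass that provisions volumes with 2 replicas.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2r                  # hypothetical name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"              # parameter values must be strings
```

The Prometheus volume claim template in the chart values would then reference this class through `storageClassName`. An existing PVC cannot switch StorageClass in place, so this would apply to a newly provisioned volume.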
