Tailscale: Service Exposure (Ingress + LoadBalancer)

Level 2 (Topic) — Replacing MetalLB and public Traefik with Tailscale-native Ingress and LoadBalancer for internal services.

Concept

After the Tailscale Operator was deployed (for DNS resolution) and all secrets were migrated from Infisical to Vault, the next architectural step was to move internal service access onto the tailnet. The project 20-tailscale-service-exposure replaced the public Traefik ingress and the MetalLB LoadBalancer with Tailscale-native endpoints for 6 services.

Four Exposure Mechanisms

The Tailscale Operator supports four ways to expose cluster workloads to the tailnet:

  1. LoadBalancer Service with loadBalancerClass: tailscale — for any TCP/UDP protocol
  2. Annotation tailscale.com/expose: "true" on an existing Service
  3. Ingress with ingressClassName: tailscale — HTTP/HTTPS only, auto-provisioned Let’s Encrypt TLS
  4. ProxyGroup kube-apiserver — dedicated HA proxy for the K8s API server (port 443, Let’s Encrypt TLS)
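
As a rough illustration of mechanism 2, the sketch below annotates an existing ClusterIP Service so that the operator creates a tailnet proxy for it. The Service name, namespace, selector, and ports are hypothetical and not taken from this cluster; only the tailscale.com/expose annotation comes from the list above. The tailscale.com/hostname annotation is optional and sets the device name on the tailnet.

```yaml
# Mechanism 2 sketch: expose an existing Service by annotation.
# All names, labels, and ports below are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: example-app              # hypothetical
  namespace: example             # hypothetical
  annotations:
    tailscale.com/expose: "true"
    tailscale.com/hostname: "example-app"   # optional tailnet device name
spec:
  type: ClusterIP                # the Service type itself is unchanged
  selector:
    app: example-app             # hypothetical label
  ports:
    - name: http
      port: 80
      targetPort: 8080
```

This variant suits a Service that must keep its existing type, whereas mechanism 1 changes the Service to type LoadBalancer.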

ProxyGroup kube-apiserver (2026-05-29)

The old LoadBalancer Service for the K8s API server (tailscale-apiserver in kube-system) was replaced by a ProxyGroup of type kube-apiserver (mode: noauth). This is the official Tailscale pattern for API server exposure.

  • Resource: tazlab-k8s/infrastructure/tailscale-dns/apiserver-proxy.yaml
  • Name: lushycorp-apiserver-proxy (namespace: tailscale)
  • Mode: noauth (Vault handles JWT+JWKS authentication)
  • Replicas: 2 (HA)
  • FQDN: lushycorp-apiserver-proxy.magellanic-gondola.ts.net
  • TLS: Let’s Encrypt (no custom CA cert needed)
  • ACL: grants tag:tazlab-vault → tag:k8s:443,80 + autoApprovers for Tailscale Services
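
A minimal sketch of what such a ProxyGroup could look like is given here, built only from the values listed above. The actual manifest is apiserver-proxy.yaml and may differ. The tag is assumed from the ACL entry (tag:k8s), and the field names follow the tailscale.com/v1alpha1 ProxyGroup CRD as commonly documented, so they should be checked against the CRD version installed in the cluster.

```yaml
# Sketch only: the real resource is defined in
# tazlab-k8s/infrastructure/tailscale-dns/apiserver-proxy.yaml.
apiVersion: tailscale.com/v1alpha1
kind: ProxyGroup
metadata:
  name: lushycorp-apiserver-proxy
spec:
  type: kube-apiserver
  replicas: 2                    # two proxy pods for HA
  tags:
    - tag:k8s                    # assumed from the ACL entry above
  kubeAPIServer:
    mode: noauth                 # the proxy does not impersonate tailnet identities
```

With mode: noauth the proxy only forwards traffic, so the API server still authenticates each request itself.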

Note: All Tailscale Services (LoadBalancer, ProxyGroup) assign VIPs with ipMode: VIP. Linux clients can reach these VIPs only when they run with --accept-routes=true. The Vault host (Hetzner VM) was therefore configured with tailscale set --accept-routes=true --accept-dns=true.

Services Migrated

Service           | Before                                                | After                                    | Port | Method
----------------- | ----------------------------------------------------- | ---------------------------------------- | ---- | ------------
PostgreSQL        | MetalLB 192.168.1.241:5432                            | tazlab-db.magellanic-gondola.ts.net:5432 | 5432 | LoadBalancer
Homepage          | Traefik home.tazlab.net + MetalLB 192.168.1.240:8000  | home.magellanic-gondola.ts.net           | 8000 | Ingress
pgAdmin           | Traefik pgadmin.tazlab.net                            | pgadmin.magellanic-gondola.ts.net        | 8001 | Ingress
Longhorn          | Traefik longhorn.tazlab.net                           | longhorn.magellanic-gondola.ts.net       | 8002 | Ingress
Traefik dashboard | Traefik traefik.tazlab.net                            | traefik.magellanic-gondola.ts.net        | 8003 | Ingress
Grafana           | Traefik grafana.tazlab.net                            | grafana.magellanic-gondola.ts.net        | 8005 | Ingress

Key Design Decisions

D1: Authentication — ACL + Identity Headers (no oauth2-proxy)

oauth2-proxy is deployed as a forward-auth middleware for Traefik (--upstream=static://200), not as a reverse proxy, so it cannot serve as a Tailscale Ingress backend. The chosen model instead combines three layers:

  • Tailscale ACL — network-level authorization (device tags)
  • Identity headers (Tailscale-User, Tailscale-User-Login) injected by the Ingress proxy
  • App-layer auth — each service retains its own login

D2: pgBouncer Bypass (Intentional)

The Postgres tailnet Service targets the primary pod directly, bypassing pgBouncer. TazPod is the sole tailnet DB consumer with 1-2 persistent connections — connection pooling adds overhead with no benefit. The existing MetalLB path also bypassed pgBouncer.
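
The direct-to-primary path could be expressed roughly as follows. This is a sketch under assumptions: the Service name, namespace, and selector labels are hypothetical, and the real selector depends on how the Postgres deployment labels its primary pod. The hostname and port are taken from the migration table.

```yaml
# Sketch: tailnet LoadBalancer that selects the Postgres primary directly.
# Name, namespace, and selector labels are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: tazlab-db-tailscale      # hypothetical
  namespace: database            # hypothetical
  annotations:
    tailscale.com/hostname: "tazlab-db"   # yields tazlab-db.<tailnet>.ts.net
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale   # handled by the Tailscale Operator, not MetalLB
  selector:
    app: postgres                # hypothetical labels that match only
    role: primary                # the primary pod, not pgBouncer
  ports:
    - name: postgres
      protocol: TCP
      port: 5432
      targetPort: 5432
```

Because the selector points at the primary pod rather than the pooler, tailnet clients get a plain Postgres connection with no pooling in between.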

D3: MetalLB Removal Strategy

Old MetalLB Services are commented out in git rather than deleted, and Flux then prunes the corresponding resources from the cluster. To roll back, uncomment the manifest and push. The commented-out blocks are permanently removed after 7 days of proven stability.
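
This strategy relies on pruning being enabled on the Flux Kustomization that owns the component. The sketch below shows the relevant setting, assuming a standard Flux v2 layout; the Kustomization name, path, and source name are hypothetical and not taken from this repository.

```yaml
# Sketch: Flux Kustomization with pruning enabled.
# Name, path, and sourceRef are placeholders.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-component        # hypothetical
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/example-component   # hypothetical
  prune: true                    # objects no longer rendered from git are deleted
  sourceRef:
    kind: GitRepository
    name: flux-system            # hypothetical source name
```

With prune: true, a Service that disappears from the rendered output (for example because it was commented out) is removed on the next reconciliation, and it is recreated when the manifest is restored.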

D4: Wildcard TLS Cleanup

After each Tailscale Ingress is validated, the wildcard TLS ExternalSecret block and Traefik Ingress resource are removed from the component’s kustomization.

Requirements

  • Tailscale Operator v1.96.x with OAuth scopes devices, auth_keys, AND services (bug #19471)
  • HTTPS enabled at the tailnet level (tailscale.com/admin/dns)
  • ACL tags: tag:k8s (operator proxies), tag:internal-apps (admin ingress)
  • Hairpin annotation tailscale.com/experimental-forward-cluster-traffic-via-ingress: "true" on every Ingress for in-cluster pod connectivity
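
To make the Ingress-related requirements concrete, here is a minimal sketch of a Tailscale Ingress carrying the hairpin annotation, using Grafana as the example. The resource name, namespace, backend Service name, and backend port are hypothetical; the annotation, ingress class, and hostname come from this document.

```yaml
# Sketch: Tailscale Ingress with the hairpin annotation.
# Name, namespace, and backend details are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-tailscale        # hypothetical
  namespace: monitoring          # hypothetical
  annotations:
    tailscale.com/experimental-forward-cluster-traffic-via-ingress: "true"
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: grafana              # hypothetical backend Service
      port:
        number: 80               # hypothetical backend port
  tls:
    - hosts:
        - grafana                # becomes grafana.magellanic-gondola.ts.net
```

The first label in tls.hosts determines the tailnet device name, and the certificate for the resulting FQDN is provisioned only if HTTPS is enabled for the tailnet.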

Implementation Flow (3 Slices)

  1. Slice 1 — Postgres via LoadBalancer Service (TCP), MetalLB removed
  2. Slice 2 — 5 Admin surfaces via Ingress, one at a time, each with validation before cleanup
  3. Slice 3 — Homepage links updated to tailnet hostnames

See Also