Airbnb5d ago
Monitoring reliably at scale
Here's a summary of the post "Monitoring reliably at scale" from Airbnb Engineering: Airbnb's observability stack depended on the same systems it was intended to monitor, introducing a circular dependency that risked visibility during outages. To break this dependency, the team isolated compute resources and networking layers to provide redundant, highly available paths for collecting metrics. The team created dedicated Kubernetes clusters for observability workloads to minimize shared failure domains and operational overhead. For networking, they built a custom Layer 7 network ingress layer using Envoy to load-balance traffic, isolate observability traffic, and prioritize telemetry.
FrontendData Science
9 min