Day 20: Recap – Creating a Basic Observability Stack

Welcome to Day 20 of the Zero to Platform Engineer in 30 Days challenge! 🚀 Today, we’re doing a recap of observability best practices and assembling a basic observability stack using Prometheus, Grafana, Loki, and OpenTelemetry.

What Is an Observability Stack?

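An observability stack is a set of tools that collect, store, and visualize telemetry (metrics, logs, and traces) from applications and infrastructure, so you can understand how your systems behave and keep them performant and reliable.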

Key Components of Observability:

  • Metrics → Measure system health (CPU, memory, request rates).
  • Logs → Capture detailed event information for debugging.
  • Traces → Provide insights into request flows across services.
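The three signals are most useful when they can be linked to each other. The standard-library Python sketch below emits a metric sample, a log line, and a span for one request, all carrying the same trace ID, and then follows the log line back to its trace. The schema, the `checkout` service, and the helper names are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    """Metric: a numeric value in a labeled time series (cheap to store and aggregate)."""
    name: str
    labels: dict
    value: float


@dataclass
class LogLine:
    """Log: a text event with labels (detailed, one per event)."""
    labels: dict
    line: str


@dataclass
class SpanRecord:
    """Trace span: one timed operation, tied to its request by trace_id."""
    trace_id: str
    name: str
    duration_s: float


def record_request(trace_id: str, status: int, duration_s: float):
    """Emit all three signals for one request (hypothetical schema)."""
    metric = MetricSample(
        "http_request_duration_seconds",
        {"service": "checkout", "status": str(status)},
        duration_s,
    )
    log = LogLine({"service": "checkout"}, f"status={status} trace_id={trace_id}")
    span = SpanRecord(trace_id, "POST /checkout", duration_s)
    return metric, log, span


def trace_id_from_log(line: str) -> str:
    """Extract the trace_id=... field so you can jump from a log line to its trace."""
    for token in line.split():
        if token.startswith("trace_id="):
            return token.split("=", 1)[1]
    raise ValueError("no trace_id in log line")


metric, log, span = record_request("4bf92f35", status=500, duration_s=0.25)
# The metric shows that failures are happening, the log line gives details,
# and its trace_id leads to the matching trace.
print(trace_id_from_log(log.line) == span.trace_id)  # True
```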

The stack we'll assemble today uses:

  • Prometheus → Metrics collection and storage
  • Grafana → Visualization, dashboards, and alerting
  • Loki → Centralized logging
  • OpenTelemetry → Distributed tracing (instrumentation and telemetry collection)

Building a Cloud-Native Observability Stack

Metrics and Prometheus

  • Scrapes (pulls) time-series metrics from Kubernetes, applications, and infrastructure over HTTP.
  • Uses PromQL to query and aggregate data.
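Prometheus scrapes HTTP endpoints that return plain text in its exposition format. As a rough sketch of what a scrape target looks like, the standard-library Python server below exposes one hypothetical counter, `demo_requests_total`, at `/metrics`. The metric name and port 8000 are assumptions, and a real service would normally use an official client library (for Python, `prometheus_client`) rather than rendering the format by hand.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counter: total requests served, partitioned by HTTP status.
REQUESTS = {"200": 0, "500": 0}
LOCK = threading.Lock()


def render_metrics() -> str:
    """Render REQUESTS in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Total requests served.",
        "# TYPE demo_requests_total counter",
    ]
    with LOCK:
        for status, count in sorted(REQUESTS.items()):
            lines.append(f'demo_requests_total{{status="{status}"}} {count}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            # Any other path counts as one successful request.
            with LOCK:
                REQUESTS["200"] += 1
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok\n")


def serve(port: int = 8000) -> None:
    """Serve the app and its /metrics endpoint until interrupted (blocking)."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` blocks and serves the endpoint; a Prometheus scrape job (or, with kube-prometheus-stack, a ServiceMonitor) pointed at port 8000 would then collect the counter.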

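1. Add the chart repository and install Prometheus (kube-prometheus-stack) with Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace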

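2. Example PromQL query (CPU usage per pod, in cores, averaged over the last 5 minutes):

sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)

The container!="" matcher drops the pod-level aggregate series that cAdvisor also reports, so usage is not counted twice.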
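To see what the query above computes, here is a simplified Python sketch over made-up samples. For each series, `rate()` turns a counter's increase across the window into a per-second value, treating a drop in the counter as a reset; `sum ... by (pod)` then adds up the per-container results of each pod. Because the counter measures CPU seconds, the result is in CPU cores. Unlike real PromQL, this sketch does not extrapolate to the edges of the 5-minute window.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase of a counter from (timestamp, value) samples.

    Simplified vs. PromQL rate(): handles counter resets but does not
    extrapolate to the edges of the range window.
    """
    if len(samples) < 2:
        return None  # rate() also needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. container restart): count from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def sum_rate_by_pod(series):
    """Emulate sum(rate(metric[window])) by (pod) over already-windowed series."""
    totals = defaultdict(float)
    for labels, samples in series:
        r = simple_rate(samples)
        if r is not None:
            totals[labels["pod"]] += r
    return dict(totals)


# Hypothetical 5m window of container_cpu_usage_seconds_total samples (t in s, CPU seconds).
series = [
    ({"pod": "web-1", "container": "nginx"}, [(0, 100.0), (150, 130.0), (300, 160.0)]),
    ({"pod": "web-1", "container": "sidecar"}, [(0, 10.0), (300, 25.0)]),
    ({"pod": "api-1", "container": "api"}, [(0, 500.0), (150, 30.0), (300, 90.0)]),  # reset
]
print(sum_rate_by_pod(series))
```

Here `web-1` uses about 0.25 cores (0.2 from nginx plus 0.05 from the sidecar), and `api-1` about 0.3 cores despite its counter resetting mid-window.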

Dashboards and Grafana

  • Provides a graphical interface for creating and managing dashboards.
  • Allows you to visualize metrics and logs.
  • Integrates with Prometheus and Loki.
  • Supports alerting for proactive monitoring.
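Alert rules in Prometheus and Grafana usually pair a condition with a pending period (`for`): the alert fires only after the condition has held on every evaluation for that long, which filters out brief spikes. The Python sketch below models that state machine using Prometheus's inactive/pending/firing states; the threshold, interval, and data are hypothetical, and this is a simplification, not either tool's actual evaluator.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"


def evaluate(values, threshold, for_seconds, interval):
    """Replay periodic evaluations of `value > threshold` with a `for` duration.

    values: one metric value per evaluation, taken every `interval` seconds.
    Returns the alert state after each evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, value in enumerate(values):
        now = i * interval
        if value > threshold:
            if active_since is None:
                active_since = now
            held = now - active_since
            states.append(State.FIRING if held >= for_seconds else State.PENDING)
        else:
            active_since = None  # condition cleared: reset the pending timer
            states.append(State.INACTIVE)
    return states


# Hypothetical CPU usage (cores) evaluated every 60s, alerting on > 0.8 for 2m.
cpu = [0.5, 0.9, 0.95, 0.4, 0.9, 0.9, 0.92]
print([s.value for s in evaluate(cpu, threshold=0.8, for_seconds=120, interval=60)])
# → ['inactive', 'pending', 'pending', 'inactive', 'pending', 'pending', 'firing']
```

The first spike never fires because it clears after one minute; only the second, sustained breach reaches the two-minute `for` duration.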

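Install Grafana with Helm. Note that kube-prometheus-stack already deploys a Grafana instance with Prometheus preconfigured as a data source, so this standalone install is optional; the grafana chart repository added here is also needed for Loki:

helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana --namespace monitoring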

Import a Pre-Built Dashboard:

  1. Go to Dashboards > Import.
  2. Enter Dashboard ID: 9135 (Kubernetes Cluster Monitoring).
  3. Select Prometheus as the data source.
  4. Click Import to visualize your cluster metrics.
  5. Click Save to store your dashboard.
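Importing by ID is the quickest start, but dashboards can also be managed as code. The sketch below builds a minimal Grafana dashboard JSON model with one time-series panel running a per-pod CPU query like the one in the Prometheus section, and prepares a request to Grafana's `POST /api/dashboards/db` endpoint. The Grafana URL, the token, the panel layout, and the data source UID `prometheus` are assumptions; use the UID your instance actually assigns to its Prometheus data source.

```python
import json
import urllib.request

PROMQL = 'sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)'


def build_dashboard(title: str, datasource_uid: str) -> dict:
    """Return a minimal Grafana dashboard JSON model with one time-series panel."""
    return {
        "id": None,  # null id tells Grafana to create a new dashboard
        "title": title,
        "panels": [
            {
                "type": "timeseries",
                "title": "CPU usage by pod (cores)",
                "gridPos": {"x": 0, "y": 0, "w": 24, "h": 8},
                "datasource": {"type": "prometheus", "uid": datasource_uid},
                "targets": [{"refId": "A", "expr": PROMQL}],
            }
        ],
    }


def build_request(grafana_url: str, token: str, dashboard: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /api/dashboards/db."""
    payload = {"dashboard": dashboard, "overwrite": True}
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


# Hypothetical values: your Grafana URL, a service-account token, and the
# UID of your Prometheus data source.
req = build_request("http://localhost:3000", "<token>", build_dashboard("Cluster CPU", "prometheus"))
# To actually create the dashboard: urllib.request.urlopen(req)
```

Keeping dashboards as JSON in Git lets you review and reapply them the same way you manage Helm values.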

Logs with Loki

  • Lightweight, horizontally scalable log aggregation system that indexes only labels rather than full log text, which keeps indexing cheap.
  • Works seamlessly with Grafana for log analysis.

Install Loki with Helm (the loki-stack chart also deploys Promtail, an agent that ships pod logs to Loki):

helm install loki grafana/loki-stack --namespace monitoring

Query logs in Grafana (Explore, with Loki as the data source) using LogQL:

{job="nginx"} |= "error"
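This query has two stages: the stream selector `{job="nginx"}` picks log streams by their labels, and the line filter `|= "error"` keeps only lines containing the exact, case-sensitive substring `error`. The Python sketch below mimics those semantics over made-up streams; it illustrates the query's meaning, not how Loki executes it. Because Loki indexes labels rather than log content, the selector narrows the data before the line filter scans any text.

```python
def select_streams(streams, selector):
    """Keep streams whose labels match every key=value pair in `selector`."""
    return [
        s for s in streams
        if all(s["labels"].get(k) == v for k, v in selector.items())
    ]


def line_filter_contains(streams, needle):
    """Emulate LogQL `|= "needle"`: keep lines containing the substring (case-sensitive)."""
    return [
        (s["labels"], line)
        for s in streams
        for line in s["lines"]
        if needle in line
    ]


# Made-up log streams, each identified by its label set.
streams = [
    {"labels": {"job": "nginx", "pod": "web-1"},
     "lines": ["GET /health 200", "upstream error: connection refused", "GET / 200"]},
    {"labels": {"job": "nginx", "pod": "web-2"},
     "lines": ["ERROR: disk full", "GET / 200"]},  # uppercase, so not matched
    {"labels": {"job": "api", "pod": "api-1"},
     "lines": ["db error: timeout"]},  # excluded by the stream selector
]

# {job="nginx"} |= "error"
for labels, line in line_filter_contains(select_streams(streams, {"job": "nginx"}), "error"):
    print(labels["pod"], line)  # prints: web-1 upstream error: connection refused
```

For a case-insensitive match, LogQL offers a regex line filter such as `|~ "(?i)error"`.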

Distributed Tracing with OpenTelemetry

  • Captures end-to-end request flows across microservices.
  • Helps debug latency issues and optimize performance.
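A trace is a tree of spans: each span records one operation's start and end time and points to its parent span, and all spans of a request share a trace ID. The standard-library Python sketch below uses made-up spans for one checkout request to show how that structure pinpoints latency, by computing each span's self time (its duration minus the time spent in its direct children). It illustrates the data model only and is not the OpenTelemetry SDK API; the self-time calculation assumes child spans run sequentially without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_times(spans):
    """Duration of each span minus the durations of its direct children.

    Assumes children do not overlap each other (sequential calls).
    """
    child_total = {s.span_id: 0 for s in spans}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms
    return {s.name: s.duration_ms - child_total[s.span_id] for s in spans}


# Made-up spans from one trace (they would share a trace ID, omitted here):
# frontend -> checkout -> (inventory, payment)
trace = [
    Span("1", None, "frontend GET /checkout", 0, 480),
    Span("2", "1", "checkout-service process", 20, 460),
    Span("3", "2", "inventory-service reserve", 30, 90),
    Span("4", "2", "payment-service charge", 100, 440),
]
times = self_times(trace)
slowest = max(times, key=times.get)
print(slowest, times[slowest])  # payment-service charge 340
```

Even though the frontend span is the longest overall, its own work takes only 40 ms; the trace attributes most of the latency to the payment call.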

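Install the OpenTelemetry Collector with Helm. The Collector receives, processes, and exports telemetry; to store and explore traces you also need a tracing backend such as Jaeger or Grafana Tempo. Recent versions of the chart refuse to install unless a deployment mode and a collector image are set:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-collector open-telemetry/opentelemetry-collector --namespace monitoring --set mode=deployment --set image.repository=otel/opentelemetry-collector-k8s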
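To check that the Collector accepts traces, you can send it a single span over OTLP/HTTP. The standard-library Python sketch below builds a one-span payload in OTLP's JSON encoding (where trace and span IDs are hex strings) and POSTs it to `/v1/traces`. The service and span names are hypothetical, and the default endpoint assumes the chart's OTLP/HTTP receiver on port 4318 is reachable from where the script runs.

```python
import json
import os
import time
import urllib.request


def build_otlp_trace(service: str, span_name: str, duration_ms: int) -> dict:
    """Build a one-span OTLP/JSON trace export payload."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-test"},
                "spans": [{
                    # OTLP/JSON encodes IDs as hex: 16 bytes for the trace, 8 for the span.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def send(payload: dict, endpoint: str = "http://localhost:4318/v1/traces") -> int:
    """POST the payload to an OTLP/HTTP receiver and return the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


payload = build_otlp_trace("demo-service", "GET /hello", duration_ms=120)
# send(payload)  # requires a reachable OTLP/HTTP receiver
```

From a pod inside the cluster, set `endpoint` to the Collector's Service on port 4318 (the exact Service name depends on your release; `kubectl get svc -n monitoring` lists it) and call `send(payload)`. A 200 status means the span was accepted; in the chart's default configuration it typically goes only to a debug exporter that writes to the Collector's logs, until you add an exporter for a tracing backend.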

Why Use an Observability Stack?

  • Faster troubleshooting and debugging. Correlate metrics, logs, and traces to find root causes quickly.
  • Better performance monitoring. Identify bottlenecks and optimize resource utilization.
  • Improved incident response. Detect problems early with alerts, reduce downtime, and improve customer satisfaction.

Activity for Today

  1. Review how metrics, logs, and traces complement each other.
  2. Install Prometheus, Grafana, Loki, and OpenTelemetry.
  3. Test queries, dashboards, and alerts in Grafana.

What’s Next?

Tomorrow, we’ll shift focus to Internal Developer Platforms (IDPs) and discuss how platform teams improve developer experience.

👉 Check it out here: Zero to Platform Engineer Repository

Feel free to clone the repo, experiment with the code, and even contribute if you’d like! 🚀

Follow the Series!

🎉 Don’t miss a single step in your journey to becoming a Platform Engineer! 🎉

This post is just the beginning. Here’s what we’ve covered so far and what’s coming up next:

👉 Bookmark this blog and check back every day for new posts in the series. 📣 Share your progress on social media with the hashtag #ZeroToPlatformEngineer to connect with other readers!

Subscribe to Alex Parra Newsletter

One update per month. No spam.