
Pinnacle’s Shift to Unified, Open-Source Observability


Engagement snapshot

  • Client: Pinnacle
  • Duration: Ongoing
  • Industry: Oil & Gas
  • Category: DevOps Implementation
Overview

Project context

The InfraShift Strategy: The LGTM Stack on Azure


InfraShift architected a modern, open-source observability pipeline often referred to as the LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus). By decoupling the "intelligence" of the monitoring from the "storage" of the cloud provider, we gave Pinnacle total control over their data and their costs.


The Mission: Replace expensive, fragmented monitoring with a centralized, high-performance platform for logs, metrics, and traces.

Challenge

What needed to change

The "Data Tax" of Proprietary Monitoring

Pinnacle, a rapidly scaling enterprise, was facing a dual crisis with their Azure-native monitoring stack. First, the cost of log ingestion and metric storage was ballooning, becoming a significant line item in their cloud budget. Second, their data was fragmented. Engineers had to jump between multiple tools and write complex KQL (Kusto Query Language) queries just to correlate a single spike in latency with a specific log entry.

They needed a centralized "single pane of glass" that was easy for the whole team to use, without the high premium of proprietary analytics.

Approach

How InfraShift executed

The Execution: Building the Unified Observability Hub

We bypassed the high-cost ingestion points by rerouting all Azure diagnostic data through a custom-built processing engine.


1. Centralized Log Ingestion with Azure Event Hub & Loki

Instead of sending logs directly to high-cost analytics workspaces, we streamed all Azure Diagnostic Logs and metrics into Azure Event Hub.

  • Loki as the Log Store: We used Loki to ingest these events. Because Loki indexes only metadata (labels) rather than the full text of each log line, it is significantly cheaper to run and faster at ingestion than engines that index the full log content.
  • No More KQL: We simplified the user experience. Engineers no longer need to be KQL experts; they use Grafana's Explore view with the Loki data source to run fast string searches (LogQL line filters) across the entire infrastructure.
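
To make the label-only indexing concrete, here is a minimal Python sketch of how a batch of Azure diagnostic records read from Event Hub could be reshaped into the JSON body that Loki's push endpoint (/loki/api/v1/push) accepts. It is not InfraShift's actual processing engine. The label names ("category", "source"), the "azure-diagnostics" value, and both function names are assumptions made for illustration. The sketch only builds the payload and the query string; reading from Event Hub and sending the HTTP request are left out.

```python
import json
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(iso_time: str) -> str:
    """Return an ISO 8601 UTC timestamp as a string of Unix nanoseconds.

    Assumes a trailing "Z"; fractional digits beyond six may need trimming
    on older Python versions.
    """
    ts = datetime.fromisoformat(iso_time.replace("Z", "+00:00"))
    delta = ts - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return str(seconds * 10**9 + delta.microseconds * 1_000)


def to_loki_push_payload(event_body: str) -> dict:
    """Group diagnostic records into Loki streams keyed by a small label set."""
    streams = {}
    for record in json.loads(event_body).get("records", []):
        # Only these low-cardinality labels are indexed (hypothetical choice).
        labels = (
            ("category", str(record.get("category", "unknown"))),
            ("source", "azure-diagnostics"),
        )
        # The full record is stored as the unindexed log line.
        entry = [to_unix_ns(record["time"]), json.dumps(record, sort_keys=True)]
        streams.setdefault(labels, []).append(entry)
    return {
        "streams": [
            {"stream": dict(labels), "values": values}
            for labels, values in streams.items()
        ]
    }


def line_filter_query(category: str, needle: str) -> str:
    """Build a LogQL query: select a stream by label, then filter lines by substring."""
    return f"{{category={json.dumps(category)}}} |= {json.dumps(needle)}"
```

The query builder shows what replaces KQL for everyday searches: a label selector narrows the streams, and the |= operator filters the remaining lines for a plain substring.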


2. Full-Stack Metrics & Traces (Prometheus & Tempo)

To provide a 360-degree view, we integrated the industry standards for metrics and distributed tracing:

  • Prometheus: Deployed to extract real-time performance metrics directly from the Kubernetes (AKS) clusters.
  • Tempo: Integrated to track distributed traces, allowing Pinnacle to follow a single user request across multiple microservices to pinpoint exactly where a bottleneck occurs.
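
Following one request across services depends on every service passing the same trace ID downstream. The case study does not say how Pinnacle's services propagate that context, so the sketch below assumes the W3C Trace Context "traceparent" header, which is one common choice. The function names are hypothetical, and a real deployment would normally rely on an instrumentation library rather than hand-built headers.

```python
import re
import secrets

# traceparent layout: version - trace-id (32 hex) - parent span id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace at the edge service, marked as sampled (flags 01)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming: str) -> str:
    """Header a service sends downstream: same trace ID, fresh span ID."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        # A missing or malformed header starts a new trace instead of failing.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because the trace ID survives every hop, the tracing backend can stitch the spans from each microservice into one timeline and show which hop contributed the latency.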


3. The "Red-to-Green" Dashboard Strategy

We consolidated everything into a series of highly intuitive Grafana dashboards.

  • The Drill-Down Engine: We built a high-level "Executive View" where the health of every service is represented by a simple Red/Green status.
  • Single-Click Debugging: When a service turns Red, an engineer can click the metric to instantly see the correlated logs in Loki and the specific traces in Tempo—all within the same window.
  • Unified Alerting: We replaced fragmented alerts with a centralized Alertmanager system configured directly within Grafana.
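
The Red/Green roll-up and the click-through can be sketched as two small functions. This is an illustration, not the dashboards' actual configuration: the thresholds, the "service" label, the five-minute window, and all names below are assumptions. In Grafana the same idea is usually expressed through panel thresholds and data links rather than application code.

```python
from dataclasses import dataclass


@dataclass
class ServiceHealth:
    name: str
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # 95th percentile latency in milliseconds


# Hypothetical thresholds; real values depend on each service's objectives.
MAX_ERROR_RATE = 0.01
MAX_P95_LATENCY_MS = 500.0


def status(health: ServiceHealth) -> str:
    """Collapse the metrics into the single Red/Green signal of the executive view."""
    healthy = (
        health.error_rate <= MAX_ERROR_RATE
        and health.p95_latency_ms <= MAX_P95_LATENCY_MS
    )
    return "green" if healthy else "red"


def drill_down(health: ServiceHealth, spike_unix_s: int, window_s: int = 300) -> dict:
    """Build the log and trace queries for a time window centred on the spike."""
    return {
        # LogQL: select the service's streams, keep lines containing "error".
        "logql": f'{{service="{health.name}"}} |= "error"',
        # TraceQL: select traces that include spans from this service.
        "traceql": f'{{ resource.service.name = "{health.name}" }}',
        "start_ns": (spike_unix_s - window_s) * 10**9,
        "end_ns": (spike_unix_s + window_s) * 10**9,
    }
```

Sharing one service name and one time window across the metric, the log query, and the trace query is what lets the three views line up in the same window.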


Outcome

What improved after rollout

High Visibility, Low Overhead


The transition to an open-source observability framework transformed how Pinnacle’s SRE team operates:

  • Massive Cost Reduction: Pinnacle now primarily pays for raw storage costs (Blob storage) rather than expensive per-GB ingestion fees. This has drastically lowered their monthly Azure monitoring bill.
  • Zero Learning Curve: By removing the requirement for KQL and providing a simplified search interface, the entire engineering team—not just the DevOps leads—can now troubleshoot issues.
  • Reduced MTTR (Mean Time to Resolution): Having logs, metrics, and traces in a single platform allows engineers to find the "root cause" in minutes instead of hours.
  • Operational Simplicity: A single dashboard provides the status of the entire Azure environment, making it easy for stakeholders to see the health of the business at a glance.
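
The cost argument comes down to which quantity is billed. The sketch below contrasts the two pricing shapes with a deliberately simple model. Every parameter is a placeholder supplied by the caller, not an Azure price or a Pinnacle figure, and the model ignores the compute needed to run Loki, request charges, and network egress.

```python
def ingestion_priced_monthly(gb_per_day: float, price_per_gb_ingested: float,
                             days: int = 30) -> float:
    """Monthly cost when every ingested gigabyte is billed once on arrival."""
    return gb_per_day * days * price_per_gb_ingested


def storage_priced_monthly(gb_per_day: float, retention_days: int,
                           compression_ratio: float,
                           price_per_gb_month: float) -> float:
    """Monthly cost when only the compressed data at rest in object storage is billed."""
    stored_gb = gb_per_day * retention_days / compression_ratio
    return stored_gb * price_per_gb_month
```

Plugging in a team's own log volume, retention period, and current price sheet shows where the two models diverge for that workload. The self-hosted side also needs a line for cluster compute before the comparison is complete.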

