DevOps Implementation | Oil & Gas

Pinnacle’s Shift to Unified, Open-Source Observability


Engagement snapshot

  • Client: Pinnacle
  • Duration: Ongoing
  • Industry: Oil & Gas
  • Category: DevOps Implementation
Overview

Project context

The InfraShift Strategy: The LGTM Stack on Azure


InfraShift architected a modern, open-source observability pipeline often referred to as the LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus). By decoupling the "intelligence" of monitoring (querying, dashboards, and alerting) from the cloud provider's storage, we gave Pinnacle full control over its data and its costs.


The Mission: Replace expensive, fragmented monitoring with a centralized, high-performance platform for logs, metrics, and traces.

Challenge

What needed to change

The "Data Tax" of Proprietary Monitoring

Pinnacle, a rapidly scaling enterprise, was facing a dual crisis with their Azure-native monitoring stack. First, the cost of log ingestion and metric storage was ballooning, becoming a significant line item in their cloud budget. Second, their data was fragmented. Engineers had to jump between multiple tools and write complex KQL (Kusto Query Language) queries just to correlate a single spike in latency with a specific log entry.

They needed a centralized "single pane of glass" that the whole team could use easily, without paying the premium charged for proprietary analytics.

Approach

How InfraShift executed

The Execution: Building the Unified Observability Hub

We bypassed the high-cost ingestion points by rerouting all Azure diagnostic data through a custom-built processing engine.


1. Centralized Log Ingestion with Azure Event Hub & Loki

Instead of sending logs directly to high-cost analytics workspaces, we streamed all Azure Diagnostic Logs and metrics into Azure Event Hub.

  • Loki as the Log Store: We used Loki to ingest and store these events. Because Loki indexes only metadata (labels) rather than the full log line, it is significantly cheaper to run and faster to ingest into than traditional full-text indexing engines.
  • No More KQL: We simplified the user experience. Engineers no longer need to be KQL experts. They use Grafana's Explore view with the Loki data source to run fast string searches (LogQL line filters) across the entire infrastructure.
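The sketch below shows why the label-only indexing described above keeps Loki cheap. Azure diagnostic records are grouped into a Loki push-API body. A few low-cardinality labels are indexed, and each full record is stored as an unindexed log line. A matching LogQL search is then built from those labels.

This is a minimal sketch under several assumptions:
- Records arrive from Event Hubs in the usual `{"records": [...]}` envelope.
- Each record carries `time` (UTC, `Z`-suffixed), `category`, and `resourceId` fields.
- The label names `source`, `category`, and `resource_group` are hypothetical choices.

It does not claim to reproduce InfraShift's custom processing engine.

```python
import json
from datetime import datetime, timezone

# Assumption: Azure diagnostic settings that stream to Event Hubs deliver a JSON
# envelope shaped like {"records": [{...}, ...]}. The field names used below
# ("time", "category", "resourceId") are typical of Azure resource logs but
# vary by service, so treat them as illustrative.


def to_unix_ns(iso: str) -> int:
    """Convert a UTC timestamp such as 2024-05-01T12:00:00.1234567Z to Unix nanoseconds."""
    base, _, frac = iso.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    frac_ns = int((frac + "000000000")[:9]) if frac else 0
    return int(dt.timestamp()) * 1_000_000_000 + frac_ns


def resource_group(resource_id: str) -> str:
    """Return the lower-cased resource group segment of an Azure resource ID."""
    parts = resource_id.strip("/").split("/")
    lowered = [p.lower() for p in parts]
    if "resourcegroups" in lowered:
        i = lowered.index("resourcegroups")
        if i + 1 < len(parts):
            return lowered[i + 1]
    return "unknown"


def to_loki_push(envelope: dict) -> dict:
    """Group diagnostic records into a Loki push-API body, one stream per label set."""
    streams: dict = {}
    for record in envelope.get("records", []):
        # Hypothetical label set, kept low-cardinality: Loki indexes only these.
        labels = {
            "source": "azure",
            "category": str(record.get("category", "unknown")).lower(),
            "resource_group": resource_group(record.get("resourceId", "")),
        }
        key = tuple(sorted(labels.items()))
        # The full record becomes the (unindexed) log line.
        line = json.dumps(record, separators=(",", ":"), sort_keys=True)
        streams.setdefault(key, []).append([str(to_unix_ns(record["time"])), line])
    return {
        "streams": [
            {"stream": dict(key), "values": sorted(values, key=lambda v: int(v[0]))}
            for key, values in streams.items()
        ]
    }


def logql_search(labels: dict, needle: str) -> str:
    """Build a LogQL query: a label selector followed by a substring line filter."""
    selector = ",".join(f"{k}={json.dumps(v)}" for k, v in sorted(labels.items()))
    return f"{{{selector}}} |= {json.dumps(needle)}"
```

Loki's push API accepts a JSON body like this at `/loki/api/v1/push`. In practice a log shipper such as Grafana Alloy, which includes an Azure Event Hubs source, would usually handle this step. The resulting `logql_search(...)` string is the kind of query an engineer would type into Grafana's Explore view, and it needs no KQL.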


2. Full-Stack Metrics & Traces (Prometheus & Tempo)

To provide a 360-degree view, we integrated the industry standards for metrics and distributed tracing:

  • Prometheus: Deployed to extract real-time performance metrics directly from the Kubernetes (AKS) clusters.
  • Tempo: Integrated to track distributed traces, allowing Pinnacle to follow a single user request across multiple microservices to pinpoint exactly where a bottleneck occurs.
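Following a single request across microservices depends on every service forwarding a shared trace ID, which a tracing backend such as Tempo then uses to stitch spans into one trace. The standard carrier for that ID is the W3C Trace Context `traceparent` header.

The sketch below is stdlib-only and shows just the header mechanics: a trace is started at the edge, and each downstream hop keeps the trace ID while issuing a new span ID. In a real deployment an OpenTelemetry SDK would do this automatically and export the spans. The function names are hypothetical.

```python
import re
import secrets

# W3C Trace Context "traceparent" header: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def _random_hex(nbytes: int) -> str:
    """Random non-zero ID; the spec treats all-zero trace and span IDs as invalid."""
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace where a request enters the system (e.g. the ingress)."""
    flags = "01" if sampled else "00"
    return f"00-{_random_hex(16)}-{_random_hex(8)}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue a trace downstream: keep the trace ID, issue a new span ID."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None or int(match.group(1), 16) == 0:
        return new_traceparent()  # unusable header: start a fresh trace
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{_random_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    """Extract the trace ID, the key a tracing backend uses to group spans."""
    match = TRACEPARENT_RE.match(header)
    return match.group(1) if match else ""
```

Because every hop shares `trace_id_of(...)`, searching for that one ID in Tempo returns the whole request path, which is where a bottleneck becomes visible.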


3. The "Red-to-Green" Dashboard Strategy

We consolidated everything into a series of highly intuitive Grafana dashboards.

  • The Drill-Down Engine: We built a high-level "Executive View" where the health of every service is represented by a simple Red/Green status.
  • Single-Click Debugging: When a service turns Red, an engineer can click the metric to instantly see the correlated logs in Loki and the specific traces in Tempo—all within the same window.
  • Unified Alerting: We replaced fragmented alerts with a centralized Alertmanager system configured directly within Grafana.
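The Red/Green executive view reduces each service to a single health signal. One common choice is the error ratio over a recent window compared against a threshold. The sketch below assumes several hypothetical details:
- An `http_requests_total` counter with `service` and `code` labels.
- A 5-minute window.
- A 5% threshold.

The equivalent PromQL is shown as a string for reference. In Grafana itself the colouring would typically come from panel thresholds or value mappings rather than application code.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical threshold; real values would come from each service's SLO.
ERROR_RATIO_THRESHOLD = 0.05

# PromQL a Grafana panel might use for the same ratio (metric and label names are assumptions).
ERROR_RATIO_PROMQL = (
    'sum by (service) (rate(http_requests_total{code=~"5.."}[5m]))'
    " / sum by (service) (rate(http_requests_total[5m]))"
)


@dataclass
class CounterWindow:
    """Cumulative counter readings at the start and end of a window."""
    total_start: float
    total_end: float
    errors_start: float
    errors_end: float


def error_ratio(w: CounterWindow) -> float:
    """Fraction of requests in the window that failed."""
    requests = w.total_end - w.total_start
    errors = w.errors_end - w.errors_start
    if requests <= 0:
        return 0.0  # no traffic observed in the window
    return errors / requests


def status(w: CounterWindow, threshold: float = ERROR_RATIO_THRESHOLD) -> str:
    """Collapse a service's error ratio into the Red/Green signal."""
    return "RED" if error_ratio(w) > threshold else "GREEN"


def executive_view(windows: Dict[str, CounterWindow]) -> Dict[str, str]:
    """One status per service, as the top-level dashboard would show it."""
    return {service: status(w) for service, w in sorted(windows.items())}
```

A service that turns RED here is the entry point for the drill-down. Its labels, such as `service`, are what would link the metric to the matching Loki logs and Tempo traces.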


Outcome

What improved after rollout

High Visibility, Low Overhead


The transition to an open-source observability framework transformed how Pinnacle’s SRE team operates:

  • Massive Cost Reduction: Pinnacle now pays primarily for raw object storage (Azure Blob Storage) rather than expensive per-GB ingestion fees. This has drastically lowered their monthly Azure monitoring bill.
  • Zero Learning Curve: By removing the requirement for KQL and providing a simplified search interface, the entire engineering team—not just the DevOps leads—can now troubleshoot issues.
  • Reduced MTTR (Mean Time to Resolution): Having logs, metrics, and traces in a single platform allows engineers to find the "root cause" in minutes instead of hours.
  • Operational Simplicity: A single dashboard provides the status of the entire Azure environment, making it easy for stakeholders to see the health of the business at a glance.



Case study technical notes

Architecture, operating model, and business impact

Client challenge

The work starts by identifying the production constraint, such as migration risk, scaling pressure, cost drift, weak observability, or manual release dependency.

Technical approach

InfraShift maps dependencies, reviews architecture, improves automation, strengthens monitoring, and documents handover steps so the client team can operate the platform after delivery.

Outcome to measure

Track lead time, deployment failure rate, incident recovery, cloud spend variance, utilization, alert quality, and the number of manual steps removed from daily engineering work.