How We Built a One-Command Health Check for Kubernetes Clusters

Publish date

July 31, 2025

In fast-moving environments, it’s easy to assume Kubernetes is fine as long as workloads are running. But when real issues surface—like stale deployments, failed pods, or node reboots—assumptions break down quickly.

At Optimum Partners, we recently tackled this challenge in a single-node Kubernetes cluster running key observability components. The cluster was small, but the risk of invisible failure modes was real.

So we built something deceptively simple: a script.
One command. Full cluster visibility. Designed for humans.

Here’s what we learned.

The Visibility Gap in Kubernetes

Kubernetes gives you tools to see everything—but no default way to see it all at once.

In our setup, developers and support staff needed fast answers to common questions:

Which pods are down or in an error state?
Are services up across all namespaces?
Has the node rebooted recently?
Are there deprecated resources we missed?

Without dashboards or external tools, that meant jumping between kubectl commands, grepping outputs, and manually correlating data.
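
To make that manual work concrete, here is a minimal sketch of just the first question (which pods are unhealthy). It is not taken from the original script: the function name `unhealthy_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script).
# Reads `kubectl get pods -A --no-headers` output on stdin and prints every
# pod whose STATUS column (field 4) is neither Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}

# Usage against a live cluster (requires a working kubectl context):
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Note that this only looks at the STATUS column, so a pod that is Running but has unready containers would not be flagged. Each of the other questions needs its own command and its own filtering, which is the overhead the script is meant to remove.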

We wanted to compress all of that into one clean interface—with context.

What We Built: A Fast, Readable Health Snapshot

We created a Bash script that combines standard kubectl queries with process-level insight from the host node.

The script delivers a live cluster snapshot with:

✅ Verified kubectl connectivity
🧠 Node uptime, version, and readiness
📦 Status of all deployments, pods, services, PVCs, and custom resources
🌐 Ingress configurations
🗂 Recent events across namespaces (last 20–50)
💾 Optional save-to-file with timestamped logs for RCA or audit

Each section is color-coded and well-formatted for scanning large outputs during incident triage.
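
As an illustration of the color-coded sections, a header helper could look like the sketch below. The function name `section`, the cyan color, and the `== title ==` layout are assumptions for this example, not details from the original script. It honors the `NO_COLOR` environment variable so that saved reports can stay free of escape codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script).
# Prints a section header in bold cyan, or in plain text when NO_COLOR is set.
section() {
  local title="$1"
  if [ -n "${NO_COLOR:-}" ]; then
    printf '== %s ==\n' "$title"
  else
    printf '\033[1;36m== %s ==\033[0m\n' "$title"
  fi
}

# Usage:
#   section "Pods"
#   kubectl get pods -A
```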

Technical Highlights

We kept it minimal—but powerful:

Used kubectl get + JSONPath to iterate through namespaces and resource types
Pulled node uptime (approximated by the kubelet service start time) using systemctl show -p ActiveEnterTimestamp kubelet via SSH
Displayed and optionally saved output using tee
Structured the report using temporary files for clean export
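
The first two techniques above can be sketched as follows. This is an illustration under stated assumptions rather than the original script: the function names (`list_namespaces`, `report_resource`, `kubelet_started_at`) are hypothetical, and the SSH call assumes key-based access to the node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original script).

# Print namespace names, one per line, using kubectl's JSONPath output.
list_namespaces() {
  kubectl get namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Print one resource type (e.g. pods, services, pvc) for every namespace.
report_resource() {
  local kind="$1" ns
  list_namespaces | while read -r ns; do
    printf '%s\n' "--- ${kind} in ${ns} ---"
    # An unknown or unreadable resource type should not abort the report.
    kubectl get "$kind" -n "$ns" --no-headers 2>/dev/null || true
  done
}

# Print the time the kubelet service last became active on the given host.
# systemctl prints "ActiveEnterTimestamp=<value>"; cut keeps only the value.
kubelet_started_at() {
  local host="$1"
  ssh "$host" systemctl show -p ActiveEnterTimestamp kubelet | cut -d= -f2-
}

# Usage (requires a working kubectl context and SSH access to the node):
#   report_resource pods
#   kubelet_started_at my-node-host
```

Looping per namespace keeps each block of output short and labeled, which is easier to scan than a single all-namespaces table when the cluster is in a bad state.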

No dependencies beyond kubectl and SSH access. No dashboards.
Just structured shell scripting with discipline.

What Changed: Real Outcomes, Not Just Outputs

This wasn’t about better visuals—it was about better operational control. Here’s what improved:

🚀 Faster Triage

Teams can now run one command during incidents and immediately see degraded states, stale resources, or recent restarts.

🙌 Team Autonomy

Developers no longer need DevOps help to validate cluster health. They run checks themselves before escalating.

🕵️ Snapshot-Based RCA

Saved reports act as timestamped snapshots—useful for retrospective analysis, incident reviews, or internal audits.
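
A minimal sketch of how such a timestamped save could work with tee is shown below. The function name `run_report`, the `cluster-health-` file prefix, and the timestamp format are assumptions for illustration, not the original script's choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script).
# Runs a report command and shows its output live. If an output directory is
# given, a copy is also written to a timestamped log file via tee.
run_report() {
  local out_dir="$1"
  shift
  if [ -z "$out_dir" ]; then
    "$@"                      # no directory given: print only
    return
  fi
  mkdir -p "$out_dir"
  local file="${out_dir}/cluster-health-$(date +%Y%m%d-%H%M%S).log"
  "$@" 2>&1 | tee "$file"     # stdout and stderr both go into the snapshot
  echo "Saved report to ${file}" >&2
}

# Usage:
#   run_report ./reports kubectl get events -A
#   run_report "" kubectl get pods -A     # display only, nothing saved
```

Putting the timestamp in the file name means reports sort chronologically and can be matched against an incident timeline without opening them.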

🔄 Context Around Restarts

By surfacing node start time, we can quickly correlate incidents with reboots or kernel-level changes.
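
One way to turn that correlation into a quick check is sketched below. This is an assumption-laden illustration, not part of the original script: the function name `started_within` is hypothetical, and it works on epoch seconds so the comparison itself stays simple arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script).
# Succeeds if the node (kubelet) start time falls within `window` seconds
# before the incident time. All arguments are integers (epoch seconds,
# epoch seconds, seconds).
started_within() {
  local started="$1" incident="$2" window="$3"
  local delta=$(( incident - started ))
  [ "$delta" -ge 0 ] && [ "$delta" -le "$window" ]
}

# Usage (assumes GNU date for parsing the systemd timestamp in $ts):
#   started=$(date -d "$ts" +%s)
#   if started_within "$started" "$(date +%s)" 900; then
#     echo "Node started within the last 15 minutes"
#   fi
```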

Key Takeaways

Visibility scales with clarity, not just tooling. Even single-node clusters benefit from structured health checks.
Command-line output can be just as operational as a dashboard—when it’s clean, contextual, and shareable.
Simple tooling frees up DevOps bandwidth and empowers developers to act faster.

This script isn’t a monitoring replacement—it’s a visibility multiplier.

In high-velocity environments, that’s often the edge that matters most.