Kubernetes is famous for its "self-healing" capabilities. If a container crashes, the kubelet simply restarts it. On the surface, this is the dream: high availability without a human in the loop. But in a production environment, this silence can be dangerous. A pod that restarts 50 times a day isn't "healthy"; it's a system on life support, masking a deeper problem like a memory leak or a failing dependency.
Identifying the Metric
To hear the death cries of a restarting pod, we first need to look at the data. Kubernetes tracks every restart in a metric called kube_pod_container_status_restarts_total.
While you can see this manually with a quick kubectl get pods, the goal is automation. By using Prometheus to scrape these metrics, we can transform a static "restart count" into a time-series "rate of change."
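Before automating anything, it helps to eyeball the raw counts from the CLI. A quick sketch, assuming you have kubectl access to the cluster:

```shell
# List all pods sorted by the restart count of their first container
# (the noisiest pods end up at the bottom of the output).
kubectl get pods --all-namespaces \
  --sort-by='.status.containerStatuses[0].restartCount'
```

This is fine for a one-off look, but it only shows the current counter value, not how fast it is climbing, which is exactly what the Prometheus approach below gives us.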
Grafana Alerting
Using Grafana Alerting (rather than raw Prometheus rules) gives us a visual, user-friendly way to manage our thresholds. The logic is simple: we want to be notified the moment the restart counter increases.
sum by (namespace, pod) (increase(kube_pod_container_status_restarts_total[15m]))

By wrapping the metric in an increase() function over a 15-minute window, we filter out pods that have been stable for weeks and focus solely on the ones currently struggling.
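For comparison, if you preferred to encode the same logic as a raw Prometheus alerting rule instead of in Grafana, it would look roughly like this (the alert name, severity label, and group name here are placeholders, not anything Grafana generates):

```yaml
groups:
  - name: pod-restarts
    rules:
      - alert: PodRestartingFrequently
        # Fires when any container in the pod restarted within the last 15m.
        expr: sum by (namespace, pod) (increase(kube_pod_container_status_restarts_total[15m])) > 0
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} in {{ $labels.namespace }} has restarted in the last 15 minutes"
```

The Grafana route wins for us mainly because thresholds can be tuned in the UI without redeploying rule files.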
Telegram Integration
An alert is only as good as the notification that reaches you. While I could have used Slack or Teams, most of our team doesn't look at those messages out of hours. I picked Telegram as an easy-to-set-up alternative; the team was already using Telegram for UptimeRobot notifications, so it made sense.
Setting this up requires a quick chat with the @BotFather to generate an API token and give this to Grafana, but the real challenge is formatting. Kubernetes labels are verbose; a raw JSON alert dump at 3 AM is more of a nuisance than a help.
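Besides the token, Grafana's Telegram contact point also needs a chat ID so it knows where to deliver messages. One way to find it, assuming you have already sent your bot at least one message, is the Bot API's getUpdates endpoint (replace <YOUR_TOKEN> with the token BotFather gave you):

```shell
# Send your bot a message first, then look for "chat":{"id": ...}
# in the JSON response below.
curl -s "https://api.telegram.org/bot<YOUR_TOKEN>/getUpdates"
```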
Making a Clean Alert
By using Grafana’s Message Templates and setting the Telegram Parse Mode to HTML, we can strip away the noise. We don’t need the full metadata; we just need the "Who, Where, What, Why".
This is the format we chose:
Pod Restarted
Cluster: production
Namespace: production
Pod: superimportantpod-1234
To get that clean, professional output in Telegram, you have to use a two-part "handshake" within Grafana’s alerting system. You define the logic in a Message Template and then "call" that template inside your Contact Point settings.
Step 1
First, you need to save the formatting instructions in the Grafana Template store.
Go to Alerting > Contact points > Message templates
Click + Add message template
Give it a name
Paste this code into the Content box
{{ define "telegram_restart_style" }}
{{ range .Alerts }}
<b>Pod Restarted</b>
<b>Cluster:</b> {{ .Labels.cluster }}
<b>Namespace:</b> {{ .Labels.namespace }}
<b>Pod:</b> {{ .Labels.pod }}
<a href="{{ .GeneratorURL }}">View Alert in Grafana</a>
{{ end }}
{{ end }}

Step 2
Now, you have to tell your Telegram bot to actually use that specific template.
Edit your Telegram Contact Point
Scroll down to the Optional Telegram settings section
In the Message field, paste exactly this: {{ template "telegram_restart_style" . }}
Change the Parse Mode setting to HTML
If you don't use this code, Grafana uses its internal default template, which is designed to be "safe" but incredibly long. It includes every single label attached to the pod (like image hashes and internal IDs) that you usually don't need for a quick status check.
Conclusion
Monitoring pod restarts isn't about micro-managing your cluster; it's about situational awareness. By bridging the gap between Prometheus metrics and real-time Telegram notifications, you turn Kubernetes from a silent healer into a vocal collaborator that tells you exactly when and why it’s struggling.