
Process Restart Alert

Why do you receive this message?

The restart alert is sent when the monitoring system detects that the node process has restarted. Detection works as follows:

  • The value of the process_start_time_seconds metric differs from the previously recorded value by more than 1 second
  • A changed start time means a new node process is running, that is, the process was restarted
  • The change is detected during a regular metric collection cycle
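Before the start time can be compared, it has to be read from the node's Prometheus metrics output. The following is a minimal sketch, assuming the standard Prometheus text exposition format; the function name parse_process_start_time is hypothetical, and fetching the text over HTTP is left out because this page does not give the metrics endpoint URL.

```python
from typing import Optional

METRIC_NAME = "process_start_time_seconds"


def parse_process_start_time(metrics_text: str) -> Optional[float]:
    """Return the process_start_time_seconds value, or None if it is absent.

    Hypothetical helper; assumes Prometheus text exposition format.
    """
    for raw_line in metrics_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        first_token = line.split(None, 1)[0]
        if "{" in first_token:
            # Labelled sample: name{label="value",...} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            # Unlabelled sample: name value [timestamp]
            name = first_token
            rest = line[len(first_token) :]
        if name != METRIC_NAME:
            continue
        fields = rest.split()
        # The first field after the name (and labels) is the sample value.
        return float(fields[0]) if fields else None
    return None
```

The returned value is a Unix timestamp in seconds; a restart shows up as a new, larger value on the next collection.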

â„šī¸ Information: Restart can be intentional (manual restart) or unintentional (crash, system reboot). It's worth checking the cause.

What does the message contain?

  • Alert image (IMG_0499.png) - Sent as the first message
  • Status - ℹ️ Node Restart Detected
  • Restart time - Exact timestamp of the restart (from the Prometheus metric)
  • Source - process_start_time_seconds (process start timestamp)
  • Summary - Explanation of possible causes
  • Recommended checks - Commands to diagnose the cause
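The text part of the alert could be assembled from these fields roughly as follows. This is a hypothetical sketch: build_restart_message, the exact wording, and the display of the time in UTC are assumptions, not the bot's actual output. The metric name and commands are wrapped in backticks because, in Telegram's legacy Markdown mode, an unpaired underscore outside a code span can break entity parsing.

```python
from datetime import datetime, timezone


def build_restart_message(start_time: float) -> str:
    """Build a Markdown alert text from a process start timestamp (hypothetical format)."""
    # Convert the Unix timestamp from process_start_time_seconds to a readable UTC time.
    restart_at = datetime.fromtimestamp(start_time, tz=timezone.utc)
    lines = [
        "*Status:* ℹ️ Node Restart Detected",
        f"*Restart time:* {restart_at:%Y-%m-%d %H:%M:%S} UTC",
        # Backticks keep the underscores in the metric name from being parsed as italics.
        "*Source:* `process_start_time_seconds`",
        "*Summary:* The node process was restarted, either intentionally "
        "(manual restart) or unintentionally (crash, system reboot).",
        "*Recommended checks:*",
        "`last -x reboot | head`",
        "`journalctl -b`",
        "`journalctl -u redbelly.service -b`",
        "`dmesg | grep -i oom`",
    ]
    return "\n".join(lines)
```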

How should you react?

  1. Check whether the system itself was rebooted:

    last -x reboot | head

    If a reboot entry matches the time of the node restart, the restart was most likely caused by the system reboot.

  2. Check system boot logs:

    journalctl -b

    Look for errors or warnings that occurred during boot.

  3. Check node service logs:

    journalctl -u redbelly.service -b

    This shows the node service logs from the current boot (-b without an argument selects the current boot).

  4. Check synchronization status:

    • Open the monitoring panel
    • Check whether the node is synchronized
    • Check the peer count
  5. If the restart was unintentional:

    • Check the logs for errors
    • Check whether the process was killed by the kernel OOM killer (dmesg | grep -i oom)
    • Check for disk problems (for example, a full disk or I/O errors)
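The reboot check in step 1 can also be approached by comparing the node's start time with the system boot time. A minimal sketch for Linux, under these assumptions: the boot time is read from the btime line of /proc/stat (seconds since the Unix epoch), and a node process that started within a hypothetical 300-second window after boot is treated as restarted by the reboot. The function names and the window size are illustrative only.

```python
from pathlib import Path
from typing import Optional

# Hypothetical threshold: how soon after boot a service-started node is expected to come up.
REBOOT_WINDOW_S = 300.0


def parse_boot_time(proc_stat_text: str) -> Optional[int]:
    """Extract the boot time (seconds since the epoch) from /proc/stat content."""
    for line in proc_stat_text.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    return None


def read_boot_time() -> Optional[int]:
    """Read the boot time on a Linux host."""
    return parse_boot_time(Path("/proc/stat").read_text())


def likely_caused_by_reboot(process_start: float, boot_time: int,
                            window_s: float = REBOOT_WINDOW_S) -> bool:
    """True if the process started within window_s seconds after the system booted."""
    return 0 <= process_start - boot_time <= window_s
```

On the node host, pass read_boot_time() and the process start timestamp from the alert to likely_caused_by_reboot. A False result points toward a crash or manual restart, so the checks in step 5 are the next place to look.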

Sending Logic

| Element | Details |
|---|---|
| Trigger | collect_metrics_for_endpoint() function, during metric collection |
| Check frequency | Every 15 seconds (each metric collection cycle) |
| Metric | process_start_time_seconds from Prometheus (Unix timestamp) |
| Conditions | abs(current_start_time - last_process_start_time) > 1.0; Telegram alerts enabled; metric available in both the current and the previous measurement |
| Tolerance | 1 second (to avoid false alarms on small differences) |
| Format | Image first (IMG_0499.png), then a text message in Markdown |
| Duplicate prevention | Alert sent only when process_start_time_seconds changes |
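The conditions in the table can be expressed as a small stateful check. A minimal sketch, assuming one stored start time per monitored endpoint; RestartDetector and its check method are hypothetical names and do not reproduce collect_metrics_for_endpoint() itself.

```python
from typing import Dict, Optional

TOLERANCE_S = 1.0  # differences up to 1 second are ignored


class RestartDetector:
    """Hypothetical restart check mirroring the conditions in the table above."""

    def __init__(self, telegram_enabled: bool = True) -> None:
        self.telegram_enabled = telegram_enabled
        # Last seen process_start_time_seconds value, keyed by endpoint.
        self._last: Dict[str, float] = {}

    def check(self, endpoint: str, current_start_time: Optional[float]) -> bool:
        """Return True if an alert should be sent for this measurement."""
        if current_start_time is None:
            # Metric missing in the current measurement: keep the old value, no alert.
            return False
        last = self._last.get(endpoint)
        # Store the new value on every successful measurement, so one restart
        # produces exactly one alert (duplicate prevention).
        self._last[endpoint] = current_start_time
        if last is None:
            # No previous measurement to compare against.
            return False
        changed = abs(current_start_time - last) > TOLERANCE_S
        return changed and self.telegram_enabled
```

Calling check once per 15-second collection cycle gives the behavior described in the table: sub-second jitter is ignored, and a genuine restart triggers a single alert rather than one per cycle.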