
High Latency Alert

Why do you receive this message?

The high latency alert is sent when the system detects that the node is responding too slowly. You receive it when all of the following hold:

  • P95 latency (the 95th percentile of response time) reaches or exceeds a threshold:
    • WARNING: ≥ 1000 ms (1 second) over the last 60 minutes
    • CRITICAL: ≥ 2000 ms (2 seconds) over the last 60 minutes
  • The node is ONLINE but responds slowly
  • There is enough data to calculate P95 (a minimum number of samples from the last 60 minutes)
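
The conditions above can be sketched in Python as follows. This is only an illustration, not the monitoring tool's actual code: the function names, the nearest-rank percentile method, and the MIN_SAMPLES value are assumptions, since the page does not state how P95 is computed or how many samples are required.

```python
from typing import Optional, Sequence

WARNING_MS = 1000.0   # WARNING threshold from the list above
CRITICAL_MS = 2000.0  # CRITICAL threshold from the list above
MIN_SAMPLES = 20      # assumption: the real minimum sample count is not documented


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def classify(samples_ms: Sequence[float], node_online: bool = True) -> Optional[str]:
    """Return "CRITICAL", "WARNING", or None when no alert condition is met."""
    if not node_online or len(samples_ms) < MIN_SAMPLES:
        return None  # node offline, or not enough data to calculate P95
    value = p95(samples_ms)
    if value >= CRITICAL_MS:
        return "CRITICAL"
    if value >= WARNING_MS:
        return "WARNING"
    return None
```

For example, with 100 samples in the window, six or more slow responses above 2000 ms are enough to push P95 into the CRITICAL range, while five are not.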

⚠️ Warning! High latency may indicate performance problems that can lead to synchronization issues or node failure.

What does the message contain?

  • Alert image (IMG_0500.png) - Sent as the first message
  • Severity level:
    • ⚠️ WARNING - Action recommended (P95 ≥ 1000ms)
    • 🚨 CRITICAL - Action required (P95 ≥ 2000ms)
  • Metrics details:
    • P95 Latency - value in milliseconds
    • Threshold - threshold that was exceeded
    • Window - last 60 minutes
    • Normal Range - 50-300 ms (expected range)
  • Possible causes:
    • High CPU load
    • Disk I/O wait (slow disk)
    • Network overload
    • RPC overload (too many queries)
    • Memory pressure (insufficient free memory)
  • Recommended actions - Specific diagnostic commands

How should you react?

  1. Check CPU load:

    top
    # or
    htop

    Look for processes consuming a lot of CPU

  2. Check disk I/O:

    iostat -x 1 10

    Look for high %util or long await times, which point to read/write bottlenecks

  3. Check network:

    ping google.com
    mtr google.com

    Check network latency

  4. Check system logs:

    journalctl -u redbelly.service --since today

  5. Check application logs:

    tail -f /var/log/redbelly/rbn_logs/*.log

  6. Check memory:

    free -h

    Look for low available memory or heavy swap usage

  7. If the problem persists:

    • Consider restarting the node
    • Check if there are network problems
    • Contact the Redbelly community on Discord
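
If you want to script the memory check from step 6 instead of reading the output of free by eye, a minimal Python sketch could parse /proc/meminfo on Linux. The function names here are hypothetical and the script is not part of the Redbelly tooling; it only reports numbers and leaves the judgment to you.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_summary(text: str) -> dict:
    """Summarise available memory and swap usage from /proc/meminfo-style text."""
    info = parse_meminfo(text)
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "available_mib": info.get("MemAvailable", 0) / 1024,
        "swap_used_mib": swap_used / 1024,
        "swap_used_pct": 100 * swap_used / swap_total if swap_total else 0.0,
    }


def read_memory_summary(path: str = "/proc/meminfo") -> dict:
    """Read the live values on a Linux host."""
    with open(path) as f:
        return memory_summary(f.read())
```

Calling read_memory_summary() on the node returns the same figures that free -h shows, in a form that is easy to log alongside the latency alert.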

Sending Logic

| Element | Details |
|---|---|
| Trigger | check_latency_alerts() function, called every 5 minutes |
| Check frequency | Every 5 minutes (20 monitoring cycles × 15 seconds) |
| Time window | Last 60 minutes of data |
| Thresholds | WARNING: P95 ≥ 1000 ms; CRITICAL: P95 ≥ 2000 ms |
| Cooldown | 30 minutes between alerts (to avoid spam); skipped if latency increases by more than 50% |
| Conditions | Enough data (P95 calculable); P95 reaches or exceeds a threshold; outside the cooldown, or significant degradation |
| Format | Image first (IMG_0500.png), then a text message in Markdown |
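
The cooldown rule can be sketched in Python as below. This is a hedged illustration rather than the real check_latency_alerts() implementation: should_send and its parameters are hypothetical names, and the sketch assumes the 50% increase is measured against the P95 value of the last alert that was sent, which the page does not specify.

```python
from typing import Optional

CHECK_INTERVAL_S = 20 * 15   # 20 monitoring cycles x 15 seconds = 300 s (5 minutes)
COOLDOWN_S = 30 * 60         # 30 minutes between alerts
DEGRADATION_FACTOR = 1.5     # an increase of more than 50% skips the cooldown


def should_send(now_s: float, p95_ms: float,
                last_sent_s: Optional[float], last_p95_ms: Optional[float],
                warning_ms: float = 1000.0) -> bool:
    """Decide whether an alert should go out at this check."""
    if p95_ms < warning_ms:
        return False  # below the lowest threshold, nothing to report
    if last_sent_s is None or last_p95_ms is None:
        return True   # no previous alert, so no cooldown applies
    if now_s - last_sent_s >= COOLDOWN_S:
        return True   # cooldown has expired
    # Assumption: degradation is compared with the P95 of the last alert sent.
    return p95_ms > DEGRADATION_FACTOR * last_p95_ms
```

With these values, an alert sent at P95 = 1200 ms suppresses a follow-up at 1300 ms ten minutes later, but a jump to 1900 ms in the same period is sent immediately.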