@justisGipson
Last active March 13, 2024 15:08
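Grafana PromQL AWS metrics

Replace <DIMENSION NAME="DIMENSION VALUE"> in each query with the label matchers for the resource to chart. Common CloudWatch exporters, such as the Prometheus cloudwatch_exporter and YACE (yet-another-cloudwatch-exporter), expose each CloudWatch statistic as a gauge: a sampled value that can go up or down, not a Prometheus counter or histogram. Gauges are smoothed with the *_over_time functions and compared with delta() or offset; rate(), increase() and histogram_quantile() are meant for counters and histogram buckets.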

Latency

CloudFront

Average Origin Latency:

avg_over_time(aws_cloudfront_origin_latency{<DIMENSION NAME="DIMENSION VALUE">}[5m])

Elastic Beanstalk

Application Latency p99:

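ApplicationLatencyP99 is already a 99th-percentile value, so smooth it over time; histogram_quantile() needs le-labelled bucket series, which this metric does not have:

avg_over_time(aws_elasticbeanstalk_application_latency_p99{<DIMENSION NAME="DIMENSION VALUE">}[5m])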

EC2

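Disk Read Operations:

aws_ec2_disk_read_ops and aws_ec2_disk_write_ops count completed operations, so they show disk load rather than latency:

avg(avg_over_time(aws_ec2_disk_read_ops{<DIMENSION NAME="DIMENSION VALUE">}[5m]))

Disk Write Operations:

avg(avg_over_time(aws_ec2_disk_write_ops{<DIMENSION NAME="DIMENSION VALUE">}[5m]))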

RDS

Read Latency:

avg(avg_over_time(aws_rds_read_latency{<DIMENSION NAME="DIMENSION VALUE">}[5m]))

Write Latency:

avg(avg_over_time(aws_rds_write_latency{<DIMENSION NAME="DIMENSION VALUE">}[5m]))

Traffic

CloudFront

Total Requests (per CloudWatch period):

sum(aws_cloudfront_requests{<DIMENSION NAME="DIMENSION VALUE">})

Elastic Beanstalk

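Total Application Requests:

The _total suffix comes from the CloudWatch metric name ApplicationRequestsTotal; the exported series is still a gauge, so it is summed directly:

sum(aws_elasticbeanstalk_application_requests_total{<DIMENSION NAME="DIMENSION VALUE">})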

EC2

Network In (bytes):

sum(aws_ec2_network_in{<DIMENSION NAME="DIMENSION VALUE">})

Network Out (bytes):

sum(aws_ec2_network_out{<DIMENSION NAME="DIMENSION VALUE">})

RDS

Database Connections:

sum(aws_rds_database_connections{<DIMENSION NAME="DIMENSION VALUE">})

Errors

CloudFront

4XX Error Rate (% of requests):

avg(aws_cloudfront_4xx_error_rate{<DIMENSION NAME="DIMENSION VALUE">})

5XX Error Rate (% of requests):

avg(aws_cloudfront_5xx_error_rate{<DIMENSION NAME="DIMENSION VALUE">})

Elastic Beanstalk

4XX Requests:

sum(aws_elasticbeanstalk_application_requests4xx{<DIMENSION NAME="DIMENSION VALUE">})

5XX Requests:

sum(aws_elasticbeanstalk_application_requests5xx{<DIMENSION NAME="DIMENSION VALUE">})

EC2

Status Check Failed (non-zero if a check failed in the last 5 minutes):

max_over_time(aws_ec2_status_check_failed{<DIMENSION NAME="DIMENSION VALUE">}[5m])

RDS

Deadlocks:

max_over_time(aws_rds_deadlocks{<DIMENSION NAME="DIMENSION VALUE">}[5m])

Saturation

CloudFront

Cache Hit Rate:

avg_over_time(aws_cloudfront_cache_hit_rate{<DIMENSION NAME="DIMENSION VALUE">}[15m])

Elastic Beanstalk

CPU Utilization:

avg(avg_over_time(aws_elasticbeanstalk_cpuutilization{<DIMENSION NAME="DIMENSION VALUE">}[5m]))

EC2

CPU Utilization:

avg(avg_over_time(aws_ec2_cpuutilization{<DIMENSION NAME="DIMENSION VALUE">}[5m]))

RDS

CPU Utilization:

avg(avg_over_time(aws_rds_cpuutilization{<DIMENSION NAME="DIMENSION VALUE">}[5m]))

Enhanced Analysis

CloudFront

95th Percentile of Origin Latency Over Time

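Calculates the 95th percentile of the sampled origin-latency values over the past hour, highlighting the worst latency periods. This is a percentile of the exported values, not of individual requests; a per-request p95 requires collecting CloudWatch's p95 statistic through the exporter:

quantile_over_time(0.95, aws_cloudfront_origin_latency{<DIMENSION NAME="DIMENSION VALUE">}[1h])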
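histogram_quantile() can only interpolate over cumulative bucket series (counters carrying an le label); a single exported latency gauge has no buckets to work with. To show what the function expects, here is a plain-Python model of its linear interpolation. It is a simplified sketch, not Prometheus code: the bucket bounds and counts are invented, and edge cases such as non-monotonic buckets or non-positive lower bounds are not handled.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound, with the last bound equal to
    math.inf (the le="+Inf" bucket holding the total observation count).
    Simplified model of Prometheus's interpolation; not its actual code.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical origin-latency buckets in seconds: 10 requests <= 0.1 s, 60 <= 0.5 s, ...
example_buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
print(histogram_quantile(0.5, example_buckets))
```

The median lands inside the 0.1 to 0.5 s bucket and is interpolated from the counts on either side, which is why the function is meaningless without bucket series.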

Error Rate Increase

Measures how much the combined 4xx and 5xx error rates (in percentage points) changed over the last hour; a positive value captures a spike in client or server errors:

sum(delta(aws_cloudfront_4xx_error_rate{<DIMENSION NAME="DIMENSION VALUE">}[1h])) + sum(delta(aws_cloudfront_5xx_error_rate{<DIMENSION NAME="DIMENSION VALUE">}[1h]))
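Why delta() suits these series better than increase(): increase() and rate() assume a counter that only goes up, and they treat any drop as a counter reset. The plain-Python model below applies that reset rule to a made-up series of error-rate percentages and compares it with delta(). Both functions are simplified; Prometheus also extrapolates to the edges of the range, which this sketch omits.

```python
def counter_increase(values):
    """Mimic increase()'s reset handling: a drop means the counter restarted at 0."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur if cur < prev else cur - prev
    return total


def gauge_delta(values):
    """Mimic delta(): last sample minus first sample."""
    return values[-1] - values[0]


# Hypothetical 5xx error-rate samples (percent) across one hour: the rate is falling.
error_rate_pct = [5.0, 3.0, 4.0, 2.0, 2.5]
print(counter_increase(error_rate_pct))  # 6.5: the falling rate is reported as a rise
print(gauge_delta(error_rate_pct))       # -2.5: the actual change
```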

Cache Hit Ratio vs. Total Requests

Roughly estimates how many requests missed the cache and reached the origin by combining total requests with the cache hit rate (a percentage), relating cache effectiveness to demand:

sum(aws_cloudfront_requests{<DIMENSION NAME="DIMENSION VALUE">}) * (1 - avg(aws_cloudfront_cache_hit_rate{<DIMENSION NAME="DIMENSION VALUE">}) / 100)
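The arithmetic behind estimating origin load from request volume and cache hit rate, as a plain-Python sketch with made-up numbers. CloudFront reports CacheHitRate as a percentage of cacheable requests only, so applying it to all requests is an approximation.

```python
def estimated_origin_requests(total_requests, cache_hit_rate_pct):
    """Approximate requests that missed the cache: total x (1 - hit rate / 100)."""
    if not 0 <= cache_hit_rate_pct <= 100:
        raise ValueError("cache hit rate must be a percentage between 0 and 100")
    return total_requests * (1 - cache_hit_rate_pct / 100)


# Hypothetical period: 12,000 requests at an 85% cache hit rate.
print(estimated_origin_requests(12_000, 85))
```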

Elastic Beanstalk

Latency Distribution Change

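Tracks the day-over-day change in 99th percentile latency by comparing the average of the last hour with the same hour one day earlier, identifying significant shifts in application performance:

avg_over_time(aws_elasticbeanstalk_application_latency_p99{<DIMENSION NAME="DIMENSION VALUE">}[1h]) - avg_over_time(aws_elasticbeanstalk_application_latency_p99{<DIMENSION NAME="DIMENSION VALUE">}[1h] offset 1d)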
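changes() counts how many times a series' value changed inside the range; it does not measure how far the value moved. A day-over-day comparison instead subtracts an offset window, as in avg_over_time(x[1h]) - avg_over_time(x[1h] offset 1d). The plain-Python sketch below models that arithmetic on hypothetical (unix_timestamp, seconds) samples; its half-open window is a simplification of Prometheus's range selection.

```python
DAY_S = 24 * 60 * 60


def avg_over_window(samples, end, window_s):
    """Mean of values whose timestamps fall in (end - window_s, end]."""
    values = [v for t, v in samples if end - window_s < t <= end]
    if not values:
        raise ValueError("no samples in window")
    return sum(values) / len(values)


def day_over_day_change(samples, now, window_s=3600):
    """Average over the last window minus the same window one day earlier."""
    today = avg_over_window(samples, now, window_s)
    yesterday = avg_over_window(samples, now - DAY_S, window_s)
    return today - yesterday


# Hypothetical p99 latency samples in seconds: two from yesterday, two from today.
now = 2 * DAY_S
p99_samples = [
    (now - DAY_S - 1800, 0.20), (now - DAY_S - 600, 0.30),
    (now - 1800, 0.40), (now - 600, 0.50),
]
print(day_over_day_change(p99_samples, now))
```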

Traffic vs. CPU Utilization

Relates CPU utilization to application request volume as CPU percentage per request; a rising value means each request costs more CPU, showing how traffic drives resource consumption. The > 0 filter drops intervals without requests to avoid division by zero:

avg(avg_over_time(aws_elasticbeanstalk_cpuutilization{<DIMENSION NAME="DIMENSION VALUE">}[5m])) / (sum(avg_over_time(aws_elasticbeanstalk_application_requests_total{<DIMENSION NAME="DIMENSION VALUE">}[5m])) > 0)
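PromQL has no correlation function, so a ratio or product of two series only hints at their relationship. One offline option is to export both series from Grafana and compute a Pearson correlation coefficient; the sketch below does that in plain Python on invented samples and assumes both series are aligned to the same timestamps.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, aligned series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Hypothetical aligned samples: requests per interval and CPU utilization (%).
requests = [120, 180, 240, 300, 360]
cpu_pct = [22.0, 30.5, 39.0, 47.5, 56.0]
print(pearson(requests, cpu_pct))  # close to 1.0: CPU rises with traffic
```

A value near +1 means CPU tracks traffic closely; values near 0 suggest something other than request volume drives CPU.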

EC2

Disk I/O vs. CPU Saturation

Relates disk throughput (bytes read plus bytes written) to CPU utilization; a rising ratio points to an increasingly I/O-bound workload:

(sum(avg_over_time(aws_ec2_disk_read_bytes{<DIMENSION NAME="DIMENSION VALUE">}[5m])) + sum(avg_over_time(aws_ec2_disk_write_bytes{<DIMENSION NAME="DIMENSION VALUE">}[5m]))) / avg(avg_over_time(aws_ec2_cpuutilization{<DIMENSION NAME="DIMENSION VALUE">}[5m]))
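AWS documents EC2 byte metrics such as DiskReadBytes and NetworkIn as totals for the reporting period (5 minutes with basic monitoring, 1 minute with detailed monitoring), so a per-second throughput divides by the period. The plain-Python sketch below assumes the exporter collects the Sum statistic and uses made-up values; it converts disk bytes to bytes per second and relates them to CPU utilization.

```python
BASIC_MONITORING_PERIOD_S = 300
DETAILED_MONITORING_PERIOD_S = 60


def bytes_per_second(bytes_in_period, period_s=BASIC_MONITORING_PERIOD_S):
    """Convert a per-period byte total into an average bytes-per-second rate."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return bytes_in_period / period_s


def disk_throughput_per_cpu_pct(read_bytes, write_bytes, cpu_pct,
                                period_s=BASIC_MONITORING_PERIOD_S):
    """Disk bytes/s per CPU percentage point; None when CPU reads 0%."""
    if cpu_pct == 0:
        return None
    return bytes_per_second(read_bytes + write_bytes, period_s) / cpu_pct


# Hypothetical 5-minute period: 150 MB read, 150 MB written, 50% CPU.
print(disk_throughput_per_cpu_pct(150_000_000, 150_000_000, 50.0))
```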

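Network Throughput per Instance

Calculates the average network throughput (bytes in plus bytes out) per reporting EC2 instance, giving a per-instance view of network load across the fleet. The denominator counts the same filtered series as the numerator:

(sum(avg_over_time(aws_ec2_network_in{<DIMENSION NAME="DIMENSION VALUE">}[5m])) + sum(avg_over_time(aws_ec2_network_out{<DIMENSION NAME="DIMENSION VALUE">}[5m]))) / count(avg_over_time(aws_ec2_network_in{<DIMENSION NAME="DIMENSION VALUE">}[5m]))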
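A per-instance average is only meaningful when the numerator and denominator cover the same instances: counting every instance in the account while summing traffic for a filtered subset understates the result. The plain-Python sketch below averages hypothetical per-instance byte totals over exactly the instances that reported data.

```python
def avg_network_bytes_per_instance(per_instance):
    """Mean of (bytes in + bytes out) across the instances that reported data.

    per_instance maps an instance ID to a (network_in, network_out) pair.
    """
    if not per_instance:
        return None
    total = sum(bytes_in + bytes_out for bytes_in, bytes_out in per_instance.values())
    return total / len(per_instance)


# Hypothetical instance IDs and byte totals for one period.
fleet = {
    "i-example-a": (100_000, 50_000),
    "i-example-b": (300_000, 150_000),
}
print(avg_network_bytes_per_instance(fleet))
```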

RDS

Read/Write Ratio

Divides read IOPS by write IOPS; values above 1 indicate a read-heavy workload and values below 1 a write-heavy one. ReadIOPS and WriteIOPS are already per-second averages, so no rate() is needed:

sum(avg_over_time(aws_rds_read_iops{<DIMENSION NAME="DIMENSION VALUE">}[5m])) / sum(avg_over_time(aws_rds_write_iops{<DIMENSION NAME="DIMENSION VALUE">}[5m]))
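In PromQL, dividing by a zero write rate returns +Inf (or NaN when both rates are zero), which can distort a panel; filtering the denominator with > 0 drops those points. The plain-Python sketch below mirrors that guard and labels the result; the 1.0 dividing line is an assumption for illustration, not an AWS recommendation.

```python
def read_write_ratio(read_iops, write_iops):
    """Read IOPS divided by write IOPS; None when there are no writes."""
    if write_iops == 0:
        return None
    return read_iops / write_iops


def describe_workload(ratio):
    """Label the workload using 1.0 as the (assumed) dividing line."""
    if ratio is None:
        return "no writes"
    if ratio > 1:
        return "read-heavy"
    if ratio < 1:
        return "write-heavy"
    return "balanced"


# Hypothetical averages: 900 read IOPS, 300 write IOPS.
ratio = read_write_ratio(900, 300)
print(ratio, describe_workload(ratio))
```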

Database Connection Spikes

Monitors sudden increases in database connections over 15 minutes, helping identify unexpected spikes in demand or potential DDoS attacks:

delta(aws_rds_database_connections{<DIMENSION NAME="DIMENSION VALUE">}[15m])
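A fixed threshold on the change in connections depends on each database's normal connection count. An alternative is a relative check that compares the latest value with a trailing baseline (in PromQL, the series against its own avg_over_time over a longer window). The plain-Python sketch below models that check; the factor of 2 and the sample values are hypothetical.

```python
def is_connection_spike(history, latest, factor=2.0):
    """True when latest exceeds factor x the mean of the preceding samples."""
    if not history:
        raise ValueError("history must not be empty")
    baseline = sum(history) / len(history)
    return latest > factor * baseline


# Hypothetical connection counts from the preceding hour, then two candidate readings.
recent_connections = [40, 42, 38, 40]
print(is_connection_spike(recent_connections, 95))
print(is_connection_spike(recent_connections, 60))
```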

CloudWatch Custom Metrics

CWAgent Metrics

Sourced by running aws cloudwatch list-metrics --namespace "CWAgent" | rg "MetricName" and removing duplicates.

Names and dimension examples to filter on:

  "MetricName": "ethtool_rx_packets",
  "MetricName": "ethtool_tx_packets",
  "MetricName": "mem_used_percent",
  "MetricName": "ethtool_bw_in_allowance_exceeded",
  "MetricName": "ethtool_pps_allowance_exceeded",
  "MetricName": "ethtool_bw_out_allowance_exceeded",
  "MetricName": "ethtool_conntrack_allowance_exceeded",
  "MetricName": "ethtool_linklocal_allowance_exceeded",

  {
      "MetricName": "mem_used_percent",
      "Dimensions": [
          {
              "Name": "InstanceId",
              "Value": "i-0426944dd2ab4de8e"
          }
      ]
  },
  {
      "MetricName": "ethtool_rx_packets",
      "Dimensions": [
          {
              "Name": "driver",
              "Value": "ixgbevf"
          },
          {
              "Name": "InstanceId",
              "Value": "i-0426944dd2ab4de8e"
          },
          {
              "Name": "interface",
              "Value": "eth0"
          }
      ]
  },
  {
      "MetricName": "ethtool_tx_packets",
      "Dimensions": [
          {
              "Name": "driver",
              "Value": "ixgbevf"
          },
          {
              "Name": "InstanceId",
              "Value": "i-0426944dd2ab4de8e"
          },
          {
              "Name": "interface",
              "Value": "eth0"
          }
      ]
  },
  {
      "MetricName": "ethtool_bw_in_allowance_exceeded",
      "Dimensions": [
          {
              "Name": "driver",
              "Value": "ena"
          },
          {
              "Name": "InstanceId",
              "Value": "i-031b6cd4fd1ce0e96"
          },
          {
              "Name": "interface",
              "Value": "eth0"
          }
      ]
  },
  {
      "MetricName": "ethtool_linklocal_allowance_exceeded",
      "Dimensions": [
          {
              "Name": "driver",
              "Value": "ena"
          },
          {
              "Name": "InstanceId",
              "Value": "i-031b6cd4fd1ce0e96"
          },
          {
              "Name": "interface",
              "Value": "eth0"
          }
      ]
  },
  {
      "MetricName": "ethtool_pps_allowance_exceeded",
      "Dimensions": [
          {
              "Name": "driver",
              "Value": "ena"
          },
          {
              "Name": "InstanceId",
              "Value": "i-031b6cd4fd1ce0e96"
          },
          {
              "Name": "interface",
              "Value": "eth0"
          }
      ]
  },
  {
      "MetricName": "ethtool_bw_out_allowance_exceeded",
      "Dimensions": [
          {
              "Name": "driver",
              "Value": "ena"
          },
          {
              "Name": "InstanceId",
              "Value": "i-031b6cd4fd1ce0e96"
          },
          {
              "Name": "interface",
              "Value": "eth0"
          }
      ]
  },
  {
      "MetricName": "ethtool_conntrack_allowance_exceeded",
      "Dimensions": [
          {
              "Name": "driver",
              "Value": "ena"
          },
          {
              "Name": "InstanceId",
              "Value": "i-031b6cd4fd1ce0e96"
          },
          {
              "Name": "interface",
              "Value": "eth0"
          }
      ]
  }
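Instead of filtering the CLI output with rg and removing duplicates by hand, the same list can be built from the JSON that aws cloudwatch list-metrics returns ({"Metrics": [...]}, each entry with MetricName and Dimensions). The plain-Python sketch below assumes that output has been saved as a string; the sample reuses names from the excerpt above, with one hypothetical duplicate entry (instance ID "i-example"), and returns each unique metric name with the dimension names available for filtering.

```python
import json


def summarize_metrics(list_metrics_output):
    """Map each unique MetricName to the sorted dimension names seen for it."""
    data = json.loads(list_metrics_output)
    dims_by_metric = {}
    for metric in data.get("Metrics", []):
        dims = dims_by_metric.setdefault(metric["MetricName"], set())
        dims.update(d["Name"] for d in metric.get("Dimensions", []))
    return {name: sorted(dims) for name, dims in sorted(dims_by_metric.items())}


sample_output = """
{"Metrics": [
  {"Namespace": "CWAgent", "MetricName": "mem_used_percent",
   "Dimensions": [{"Name": "InstanceId", "Value": "i-0426944dd2ab4de8e"}]},
  {"Namespace": "CWAgent", "MetricName": "mem_used_percent",
   "Dimensions": [{"Name": "InstanceId", "Value": "i-example"}]},
  {"Namespace": "CWAgent", "MetricName": "ethtool_rx_packets",
   "Dimensions": [{"Name": "driver", "Value": "ixgbevf"},
                  {"Name": "InstanceId", "Value": "i-0426944dd2ab4de8e"},
                  {"Name": "interface", "Value": "eth0"}]}
]}
"""
for name, dims in summarize_metrics(sample_output).items():
    print(name, dims)
```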