Prometheus and PromQL¶
Brief notes on Prometheus metric types and PromQL.
Metric Types¶
Counter¶
- Used for things that only go up (the value resets to zero only when the process restarts)
- Used when we want to calculate the rate of increase of the value (requests etc.)
- rate(metric_name[5m])
- This shows the per-second rate of increase of metric_name, averaged over a 5 min period
- Ex
- Request Count
- Tasks Completed
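To make the rate calculation concrete, here is a minimal Python sketch of what a per-second rate over a window of counter samples looks like. It is a simplified model, not Prometheus's actual implementation: the function name `simple_rate` and the sample data are hypothetical, and real Prometheus also extrapolates to the edges of the window.

```python
def simple_rate(samples):
    """Per-second average rate of increase over (timestamp_seconds, value) samples.

    Simplified model: no extrapolation to the window edges.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a rate
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A drop means the counter was reset (e.g. a process restart);
        # in that case the post-reset value counts as the increase.
        increase += value if value < prev else value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical request counter scraped every 60 s over a 5 minute window.
window = [(0, 100), (60, 160), (120, 220), (180, 280), (240, 340), (300, 400)]
print(simple_rate(window))  # 300 requests over 300 s -> 1.0 per second
```

Because only increases are summed, a counter reset inside the window does not produce a negative rate.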
Gauges¶
- For values which can go up or down
- For metrics where you do not need to calculate the rate
- Ex
- CPU utilization
- Memory Utilization
- Queue Length
Histogram¶
- Measures the frequency of value observations that fall into specific pre-defined buckets.
- For example, we might want to keep track of HTTP response times by counting every observation into buckets with upper bounds 0.005, 0.1, 1.0 and so on. Buckets are cumulative: each one counts all observations less than or equal to its upper bound (the le label).
- We store the number of requests that fall into each bucket.
- We might need to configure custom buckets if the default ones do not work for us.
- Use this when
- we want to later calculate averages or percentiles
- we are not bothered by the exact values and approximations work for us
- we know the range of values beforehand, so we can use the default bucket definitions or define our own buckets
- Ex
- request duration
- payload size
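The bucketing described above can be sketched in Python as follows. This is an illustrative model of the values a histogram exposes, not the code of any real client library; the class name `SimpleHistogram` and the bucket bounds are assumptions chosen for the example.

```python
class SimpleHistogram:
    """Toy model of the exposed values of a histogram with cumulative buckets."""

    def __init__(self, bounds):
        # Upper bounds ("le" values); a final +Inf bucket catches everything.
        self.bounds = sorted(bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.bounds)
        self.count = 0   # total number of observations
        self.sum = 0.0   # sum of all observed values

    def observe(self, value):
        self.count += 1
        self.sum += value
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                # Cumulative: every bucket whose bound is >= value is incremented.
                self.bucket_counts[i] += 1


hist = SimpleHistogram([0.005, 0.01, 0.025, 0.05])
for duration in [0.003, 0.008, 0.02, 0.04, 0.2]:
    hist.observe(duration)

print(hist.bucket_counts)  # [1, 2, 3, 4, 5]
print(hist.count)          # 5
```

Note that the exact observed values are not kept, only the counts per bucket plus the running sum and count, which is why later calculations are approximations.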
Working Example¶
If the name of the metric is request_duration, then Prometheus will automatically create additional time series for the same metric, like:
- request_duration_bucket{le="0.005"}
- request_duration_bucket{le="0.01"}
- request_duration_bucket{le="0.025"}
- request_duration_bucket{le="0.05"}
- request_duration_count
- request_duration_sum
The last two, _sum and _count, are used to calculate averages (_sum divided by _count). Percentiles are estimated from the _bucket series, for example with histogram_quantile.
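As a rough illustration of how these series are used, the Python sketch below computes an average from the sum and count, and estimates a quantile from cumulative buckets by linear interpolation inside the matching bucket, in the spirit of histogram_quantile. The function names and the bucket data are hypothetical, and the sketch assumes 0 < q <= 1 and a lowest bucket starting at 0.

```python
def average(total_sum, total_count):
    """Mean observation: the _sum series divided by the _count series."""
    return total_sum / total_count


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by bound and end with the +Inf bucket.
    Assumes observations are spread evenly inside each bucket.
    """
    total = buckets[-1][1]
    if not 0 < q <= 1 or total == 0:
        raise ValueError("need 0 < q <= 1 and at least one observation")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if upper_bound == float("inf"):
                # Cannot interpolate towards infinity: fall back to the
                # highest finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (cumulative - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative


# Hypothetical snapshot: 100 requests taking 1.8 s in total.
buckets = [(0.005, 10), (0.01, 40), (0.025, 90), (0.05, 100), (float("inf"), 100)]
print(average(1.8, 100))                 # roughly 0.018 s per request
print(estimate_quantile(0.5, buckets))   # roughly 0.013 s (median estimate)
```

The quality of the estimate depends on how well the bucket bounds match the real distribution, which is why choosing buckets matters.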
Summary¶
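- Similar to histograms, but a bit more complicated: quantiles are calculated on the client side and exposed directly, instead of being estimated from buckets at query time.
- Use this mainly when we don't know the range of values beforehand and hence cannot define buckets for a histogram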
Operations and Patterns¶
rate¶
Calculates the per-second average rate of increase of the time series over the given time range.
i.e. this gives you the speed (the slope) in a distance-time graph.
Increase¶
Calculates the increase in the time series over the given time range.
Syntactic sugar for rate(time_series[xm]) multiplied by the number of seconds in xm.
i.e. this gives you the distance covered during the window in a distance-time graph.
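The relationship between the two functions can be checked with a small Python sketch. In distance-time terms, rate corresponds to speed and increase to the distance covered during the window. The function name `rate_and_increase` and the odometer readings are hypothetical, and the sketch assumes a counter with no resets and samples that span the whole window.

```python
def rate_and_increase(samples, window_seconds):
    """Return (per-second rate, increase over the window) for counter samples.

    Simplified model: uses only the first and last sample, no reset handling.
    """
    first_time, first_value = samples[0]
    last_time, last_value = samples[-1]
    rate = (last_value - first_value) / (last_time - first_time)
    # increase is the rate scaled back up by the window length in seconds.
    return rate, rate * window_seconds


# Hypothetical odometer readings in metres over a 5 minute (300 s) window.
odometer = [(0, 1000.0), (150, 1750.0), (300, 2500.0)]
speed, distance = rate_and_increase(odometer, 300)
print(speed)     # 5.0 metres per second
print(distance)  # 1500.0 metres covered in the window
```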
Sum¶
For summing over dimensions (labels) of a metric. Think of sum by (label) as a group-by operator.
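The group-by behaviour can be pictured with the Python sketch below, which sums series values per value of one label and drops all other labels. The function name `sum_by` and the label sets are made up for illustration.

```python
from collections import defaultdict


def sum_by(series, label):
    """Sum values grouped by one label, discarding every other label."""
    totals = defaultdict(float)
    for labels, value in series:
        # Series without the label are grouped under the empty string.
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical per-instance request rates, labelled by instance and status.
series = [
    ({"instance": "a", "status": "200"}, 12.0),
    ({"instance": "b", "status": "200"}, 8.0),
    ({"instance": "a", "status": "500"}, 0.5),
    ({"instance": "b", "status": "500"}, 1.0),
]
print(sum_by(series, "status"))    # {'200': 20.0, '500': 1.5}
print(sum_by(series, "instance"))  # {'a': 12.5, 'b': 9.0}
```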
Resources¶
- https://www.youtube.com/watch?v=09bR9kJczKM
- https://www.youtube.com/watch?v=fhx0ehppMGM
- https://promlabs.com/blog/2020/06/18/the-anatomy-of-a-promql-query/
- https://www.robustperception.io/rate-then-sum-never-sum-then-rate/
- https://www.robustperception.io/understanding-machine-cpu-usage/
- https://stackoverflow.com/questions/54494394/do-i-understand-prometheuss-rate-vs-increase-functions-correctly