Alerting rules allow you to define alert conditions based on Prometheus Query Language (PromQL) expressions and to send notifications about firing alerts to an external service. Whenever the alert expression results in one or more vector elements at a given point in time, the alert counts as active for those elements' label sets. If you have installed Prometheus with Helm, you already have a Custom Resource Definition (CRD) called PrometheusRule. Below is an example.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-name: mon
    meta.helm.sh/release-namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: mon
  name: regolatest
  namespace: monitoring
spec:
  groups:
  - name: customRules
    rules:
    - alert: rateGraph
      expr: rate(prometheus_http_requests_total{handler="/graph"}[5m]) * 60 >= 2
      labels:
        severity: none # You can add other labels, which can be useful later for grouping alerts
      annotations:
        description: This alert is activated if, over the last 5 minutes, the HTTP requests per minute towards the sub-path "/graph" of the Prometheus dashboard exceed the value 2
        summary: Many requests to "graph"
    - alert: rateAlerts
      expr: rate(prometheus_http_requests_total{handler="/alerts"}[5m]) * 60 >= 3
      labels:
        severity: none # You can add other labels, which can be useful later for grouping alerts
      annotations:
        description: This alert is activated if, over the last 5 minutes, the HTTP requests per minute towards the sub-path "/alerts" of the Prometheus dashboard exceed the value 3
        summary: Many requests to "alerts"
In this example we have defined two rules that evaluate the rate of HTTP requests to the Prometheus dashboard pages. We will then reload these web pages in the browser to trigger the alerts. Also note the labels used in this manifest: similarly to what was seen in Monitor a Service, they must match those present in the ruleSelector field.
# This is just an excerpt from the output of the "describe" command
$ kubectl describe -n monitoring prometheus
Spec:
  Rule Selector:
    Match Labels:
      App:      kube-prometheus-stack
      Release:  mon
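Once the manifest above is saved to a file (the name prometheus-rule.yaml used here is just an example), apply it and verify that the resource exists; the two rules should then also show up in the Prometheus UI under Status -> Rules.
$ kubectl apply -f prometheus-rule.yaml
$ kubectl get prometheusrules -n monitoring regolatest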
Trigger the rules
Now let's move to the Prometheus dashboard and reload the sub-paths /graph
and /alerts
(framed in yellow at the top left of the image) a few times (15-20 times should be enough), in order to trigger the rules defined above. As you can see from the image, the alerts we created have been activated and are indeed in the firing state. After a few minutes, if no other HTTP requests are made, these alerts will return to their initial state, i.e. inactive. The Watchdog alert, at the top of the screenshot, is always present: it is an alert meant to ensure that the entire alerting pipeline is functional.
Firing alerts
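If you prefer to generate the requests from the command line instead of reloading the pages in the browser, a simple curl loop works just as well; this is only a sketch, and <prometheus-host>:<port> stands for whatever address your Prometheus UI is exposed on.
$ for i in $(seq 1 20); do curl -s -o /dev/null http://<prometheus-host>:<port>/graph; done
$ for i in $(seq 1 20); do curl -s -o /dev/null http://<prometheus-host>:<port>/alerts; done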
Send alerts via e-mail
Let's try to send our alerts by e-mail. To do this we will use two other CRDs: Alertmanager and AlertmanagerConfig. Note the analogy with Prometheus and PrometheusRule. In this case too, the two CRDs are linked through labels (which can be chosen arbitrarily). Run a describe on the Alertmanager component.
# This is just an excerpt from the output of the "describe" command
$ kubectl describe -n monitoring alertmanager
Spec:
  Alertmanager Config Selector:
    Match Labels:
      Alertmanager:  config
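If you prefer to read the selector directly from the API object rather than scanning the describe output, a jsonpath query does the same job (a sketch, assuming a single Alertmanager object in the namespace):
$ kubectl get alertmanager -n monitoring -o jsonpath='{.items[0].spec.alertmanagerConfigSelector}'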
Create and configure the AlertmanagerConfig
component for sending e-mails (you can find more details on the official website), using the file below. As you can see, Gmail is used as the mail provider here. It is not recommended to use your personal password for this, so you should create an App Password. To do that, go to Account Settings -> Security -> Signing in to Google -> App passwords (if you don't see App passwords as an option, you probably haven't set up 2-Step Verification and will need to do that first). Copy the newly created password into the Secret below, after having encoded it with the command echo -n <password> | base64 -w0
.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-email
  namespace: monitoring
  labels:
    alertmanager: config
spec:
  route:
    groupBy: [severity]
    receiver: 'notifications'
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
  receivers:
  - name: 'notifications'
    emailConfigs:
    - to: <to@example.com>
      from: <from@gmail.com>
      smarthost: smtp.gmail.com:587
      authUsername: <from@gmail.com>
      authIdentity: <from@gmail.com>
      authPassword:
        name: gmail-pass
        key: alertmanager.yaml
      sendResolved: true
      headers:
      - key: From
        value: <from@gmail.com>
      - key: Subject
        value: 'Alertmanager notification'
      - key: To
        value: <to@example.com>
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: gmail-pass
  namespace: monitoring
data:
  alertmanager.yaml: <pass_encode_base64>
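For reference, the apply step could look like the one below (the file name alertmanager-email.yaml is just an example). As an alternative to encoding the App Password by hand, you can let kubectl build the Secret for you; if you choose this route, remove the Secret from the manifest before applying it.
$ kubectl apply -f alertmanager-email.yaml
# Optional alternative to the manual "echo -n <password> | base64 -w0" step:
$ kubectl create secret generic gmail-pass -n monitoring --from-literal=alertmanager.yaml='<app-password>'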
After executing an apply
of the file above, it is advisable to take a look at the logs of the following Pods, to check whether any errors occurred during execution.
$ kubectl logs -n monitoring pod/mon-kube-prometheus-stack-operator-bcf97f54f-w8s7j
level=info ts=2021-08-20T10:44:28.791312171Z caller=operator.go:1221 component=prometheusoperator key=monitoring/mon-kube-prometheus-stack-prometheus msg="sync prometheus"
level=info ts=2021-08-20T10:44:28.875362927Z caller=operator.go:742 component=alertmanageroperator key=monitoring/mon-kube-prometheus-stack-alertmanager msg="sync alertmanager"
$ kubectl logs -n monitoring pod/alertmanager-mon-kube-prometheus-stack-alertmanager-0
level=info ts=2021-08-20T10:34:03.964Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2021-08-20T10:34:03.964Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml
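Note that the Pod names above contain generated suffixes, so they will differ in your cluster; you can avoid looking them up by pointing kubectl logs at the owning workloads instead (a sketch, assuming the default kube-prometheus-stack object names):
$ kubectl logs -n monitoring deployment/mon-kube-prometheus-stack-operator
$ kubectl logs -n monitoring statefulset/alertmanager-mon-kube-prometheus-stack-alertmanager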
These logs also show the path, inside the Alertmanager-0
Pod, where the newly added configuration is saved. This information can also be found in the Alertmanager dashboard. To reach it, simply expose the service (via Ingress or NodePort).
$ kubectl get -n monitoring svc mon-kube-prometheus-stack-alertmanager
NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
mon-kube-prometheus-stack-alertmanager   ClusterIP   10.233.21.103   <none>        9093/TCP   9d
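For a quick look, without creating an Ingress or changing the Service type, you can also port-forward the Service to your workstation:
$ kubectl port-forward -n monitoring svc/mon-kube-prometheus-stack-alertmanager 9093:9093
# The Alertmanager dashboard is then reachable at http://localhost:9093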
Within the dashboard you will find the same alerts as in the Prometheus UI; moreover, you can silence them for a defined period of time, filter them if there are many, and much more. Finally, generate some alerts, if you do not want to wait for an error to occur spontaneously, and verify that the e-mail arrives at the address indicated in the AlertmanagerConfig
component. You can, for example, trigger the rateGraph or rateAlerts rules used previously once again, as in the screenshot below.
Alertmanager UI
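Silences can also be created from the command line with amtool, which is shipped inside the Alertmanager container image. The sketch below reuses the pod name seen earlier and assumes the container is named alertmanager; both may differ in your installation.
$ kubectl exec -n monitoring alertmanager-mon-kube-prometheus-stack-alertmanager-0 -c alertmanager -- \
    amtool silence add alertname=rateGraph --duration=2h --comment="maintenance" --author=admin --alertmanager.url=http://localhost:9093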
Send alerts via Slack
If you want to receive notifications via Slack, you must be part of a Slack workspace. To set up alerting in your Slack workspace, you are going to need a Slack API URL. Go to Slack -> Administration -> Manage apps. In the Manage apps directory, search for Incoming WebHooks and add it to your Slack workspace. Next, specify in which channel you'd like to receive notifications from Alertmanager. After you confirm and add the Incoming WebHooks integration, the Webhook URL is displayed; copy it.
Manage Apps
Incoming WebHooks
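Before wiring the webhook into Alertmanager, it is worth checking it with a plain HTTP call; the URL below is a placeholder for the one you just copied.
$ curl -X POST -H 'Content-type: application/json' --data '{"text":"Test message"}' https://hooks.slack.com/services/<your-webhook-path>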
At this point, as seen previously, we create an AlertmanagerConfig
component for sending notifications to Slack, using the following file. Complete the file by replacing the two placeholders: the name of the Slack channel
in which to receive notifications, and the apiURL
in the Secret
at the bottom, encoded in base64 (echo -n <apiURL> | base64 -w0
). Then proceed with the apply
of the file.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-slack
  namespace: monitoring
  labels:
    alertmanager: config
spec:
  route:
    groupBy: [severity]
    receiver: 'slack-notifications'
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
  receivers:
  - name: 'slack-notifications'
    slackConfigs:
    - channel: '<#channel>'
      apiURL:
        name: slack-pass
        key: alertmanager.yaml
      sendResolved: true
      # The following lines can be omitted, they have only an aesthetic value
      iconURL: 'https://avatars3.githubusercontent.com/u/3380462'
      title: |-
        [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
        {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
          {{" "}}(
          {{- with .CommonLabels.Remove .GroupLabels.Names }}
            {{- range $index, $label := .SortedPairs -}}
              {{ if $index }}, {{ end }}
              {{- $label.Name }}="{{ $label.Value -}}"
            {{- end }}
          {{- end -}}
          )
        {{- end }}
      text: >-
        {{ range .Alerts -}}
        *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

        *Description:* {{ .Annotations.description }}

        *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
        {{ end }}
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: slack-pass
  namespace: monitoring
data:
  alertmanager.yaml: <apiURL_encode_base64>
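The apply and a quick verification could look like the following; the file name alertmanager-slack.yaml is just an example, and the pod and container names come from the output shown earlier, so they may differ in your installation. The configuration path is the one reported in the Alertmanager logs above.
$ kubectl apply -f alertmanager-slack.yaml
$ kubectl exec -n monitoring alertmanager-mon-kube-prometheus-stack-alertmanager-0 -c alertmanager -- cat /etc/alertmanager/config/alertmanager.yaml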
If everything went smoothly (remember to always take a look at the logs of the Alertmanager-0
pod), you should see, in the chosen channel, a result similar to the following. We get two notifications for a single alert because we used the sendResolved
option in the configuration file above, in order to also be notified when the problem is resolved.
Slack Alert Firing
Slack Alert Resolved