Alerting rules allow you to define alert conditions based on Prometheus Query Language (PromQL) expressions and to send notifications about firing alerts to an external service. Whenever the alert expression results in one or more vector elements at a given point in time, the alert counts as active for those elements' label sets. If you have installed Prometheus with Helm, you already have a Custom Resource Definition (CRD) called PrometheusRule. Below is an example.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-name: mon
    meta.helm.sh/release-namespace: monitoring
  labels:
    app: kube-prometheus-stack
    release: mon
  name: regolatest
  namespace: monitoring
spec:
  groups:
  - name: customRules
    rules:
    - alert: rateGraph
      expr: rate(prometheus_http_requests_total{handler="/graph"}[5m]) * 60 >= 2
      labels:
        severity: none # You can add other labels, which can be useful later for grouping alerts
      annotations:
        description: This alert is activated if, over the last 5 minutes, the HTTP requests per minute towards the sub-path "/graph" of the Prometheus dashboard exceed the value 2
        summary: Many requests to "graph"
    - alert: rateAlerts
      expr: rate(prometheus_http_requests_total{handler="/alerts"}[5m]) * 60 >= 3
      labels:
        severity: none # You can add other labels, which can be useful later for grouping alerts
      annotations:
        description: This alert is activated if, over the last 5 minutes, the HTTP requests per minute towards the sub-path "/alerts" of the Prometheus dashboard exceed the value 3
        summary: Many requests to "alerts"
In this example we have defined two rules that evaluate the rate of HTTP requests to the Prometheus dashboard pages. We will then reload these web pages in the browser to trigger the alerts. Also note the labels used in this manifest: similarly to what was seen in Monitor a Service, they must match those present in the ruleSelector field.
# This is just an excerpt from the output of the "describe" command
$ kubectl describe -n monitoring prometheus
Spec:
  Rule Selector:
    Match Labels:
      App:      kube-prometheus-stack
      Release:  mon
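Once the manifest above is saved to a file (the name prometheus-rule.yaml used here is just an example), apply it and verify that the resource exists; the two rules should then also show up in the Prometheus UI under Status -> Rules.
$ kubectl apply -f prometheus-rule.yaml
$ kubectl get prometheusrules -n monitoring regolatest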
Trigger the rules
Now let's move to the Prometheus dashboard and reload the sub-paths /graph
and /alerts
(framed in yellow at the top left of the image) a few times (15-20 times should be enough), in order to trigger the rules defined above. As you can see from the image, the alerts we created have been activated and are indeed in the firing state. After a few minutes, if no other HTTP requests are made, these alerts will return to their initial state, i.e. inactive. The Watchdog alert, at the top of the screenshot, is always present: it is an alert meant to ensure that the entire alerting pipeline is functional.
Firing alerts
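If you prefer to generate the requests from the command line instead of reloading the pages in the browser, a simple curl loop works just as well; this is only a sketch, and <prometheus-host>:<port> stands for whatever address your Prometheus UI is exposed on.
$ for i in $(seq 1 20); do curl -s -o /dev/null http://<prometheus-host>:<port>/graph; done
$ for i in $(seq 1 20); do curl -s -o /dev/null http://<prometheus-host>:<port>/alerts; done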
Send alerts via e-mail
Let's try to send our alerts by e-mail. To do this we will use two other CRDs: Alertmanager and AlertmanagerConfig. Note the analogy with Prometheus and PrometheusRule. In this case too, the two CRDs are linked through labels (which can be chosen arbitrarily). Run a describe on the Alertmanager component.
# This is just an excerpt from the output of the "describe" command
$ kubectl describe -n monitoring alertmanager
Spec:
  Alertmanager Config Selector:
    Match Labels:
      Alertmanager:  config
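If you prefer to read the selector directly from the API object rather than scanning the describe output, a jsonpath query does the same job (a sketch, assuming a single Alertmanager object in the namespace):
$ kubectl get alertmanager -n monitoring -o jsonpath='{.items[0].spec.alertmanagerConfigSelector}'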
Create and configure the AlertmanagerConfig
component for sending e-mails (you can find more details on the official website), using the file below. As you can see, Gmail is used as the mail provider here. It is not recommended to use your personal password for this, so you should create an App Password. To do that, go to Account Settings -> Security -> Signing in to Google -> App passwords (if you don't see App passwords as an option, you probably haven't set up 2-Step Verification and will need to do that first). Copy the newly created password into the Secret below, after having encoded it with the command echo -n <password> | base64 -w0
.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-email
  namespace: monitoring
  labels:
    alertmanager: config
spec:
  route:
    groupBy: [severity]
    receiver: 'notifications'
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
  receivers:
  - name: 'notifications'
    emailConfigs:
    - to: <to@example.com>
      from: <from@gmail.com>
      smarthost: smtp.gmail.com:587
      authUsername: <from@gmail.com>
      authIdentity: <from@gmail.com>
      authPassword:
        name: gmail-pass
        key: alertmanager.yaml
      sendResolved: true
      headers:
      - key: From
        value: <from@gmail.com>
      - key: Subject
        value: 'Alertmanager notification'
      - key: To
        value: <to@example.com>
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: gmail-pass
  namespace: monitoring
data:
  alertmanager.yaml: <pass_encode_base64>
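For reference, the apply step could look like the one below (the file name alertmanager-email.yaml is just an example). As an alternative to encoding the App Password by hand, you can let kubectl build the Secret for you; if you choose this route, remove the Secret from the manifest before applying it.
$ kubectl apply -f alertmanager-email.yaml
# Optional alternative to the manual "echo -n <password> | base64 -w0" step:
$ kubectl create secret generic gmail-pass -n monitoring --from-literal=alertmanager.yaml='<app-password>'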
After executing an apply
of the file above, it is advisable to take a look at the logs of the following Pods, to check whether any errors occurred during execution.
$ kubectl logs -n monitoring pod/mon-kube-prometheus-stack-operator-bcf97f54f-w8s7j
level=info ts=2021-08-20T10:44:28.791312171Z caller=operator.go:1221 component=prometheusoperator key=monitoring/mon-kube-prometheus-stack-prometheus msg="sync prometheus"
level=info ts=2021-08-20T10:44:28.875362927Z caller=operator.go:742 component=alertmanageroperator key=monitoring/mon-kube-prometheus-stack-alertmanager msg="sync alertmanager"
$ kubectl logs -n monitoring pod/alertmanager-mon-kube-prometheus-stack-alertmanager-0
level=info ts=2021-08-20T10:34:03.964Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2021-08-20T10:34:03.964Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml
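Note that the Pod names above contain generated suffixes, so they will differ in your cluster; you can avoid looking them up by pointing kubectl logs at the owning workloads instead (a sketch, assuming the default kube-prometheus-stack object names):
$ kubectl logs -n monitoring deployment/mon-kube-prometheus-stack-operator
$ kubectl logs -n monitoring statefulset/alertmanager-mon-kube-prometheus-stack-alertmanager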
These logs also show the path, inside the Alertmanager-0
Pod, where the newly added configuration is saved. This information can also be found in the Alertmanager dashboard. To reach it, simply expose the service (via Ingress or NodePort).
$ kubectl get -n monitoring svc mon-kube-prometheus-stack-alertmanager
NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
mon-kube-prometheus-stack-alertmanager   ClusterIP   10.233.21.103   <none>        9093/TCP   9d
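For a quick look, without creating an Ingress or changing the Service type, you can also port-forward the Service to your workstation:
$ kubectl port-forward -n monitoring svc/mon-kube-prometheus-stack-alertmanager 9093:9093
# The Alertmanager dashboard is then reachable at http://localhost:9093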
Within the dashboard you will find the same alerts as in the Prometheus UI; moreover, you can silence them for a defined period of time, filter them if there are many, and much more. Finally, generate some alerts, if you do not want to wait for an error to occur spontaneously, and verify that the e-mail arrives at the address indicated in the AlertmanagerConfig
component. You can, for example, trigger the rateGraph or rateAlerts rules used previously once again, as in the screenshot below.
Alertmanager UI
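Silences can also be created from the command line with amtool, which is shipped inside the Alertmanager container image. The sketch below reuses the pod name seen earlier and assumes the container is named alertmanager; both may differ in your installation.
$ kubectl exec -n monitoring alertmanager-mon-kube-prometheus-stack-alertmanager-0 -c alertmanager -- \
    amtool silence add alertname=rateGraph --duration=2h --comment="maintenance" --author=admin --alertmanager.url=http://localhost:9093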
Send alerts via Slack
If you want to receive notifications via Slack, you must be part of a Slack workspace. To set up alerting in your Slack workspace, you are going to need a Slack API URL. Go to Slack -> Administration -> Manage apps. In the Manage apps directory, search for Incoming WebHooks and add it to your Slack workspace. Next, specify in which channel you'd like to receive notifications from Alertmanager. After you confirm and add the Incoming WebHooks integration, the Webhook URL is displayed; copy it.
Manage Apps
Incoming WebHooks
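Before wiring the webhook into Alertmanager, it is worth checking it with a plain HTTP call; the URL below is a placeholder for the one you just copied.
$ curl -X POST -H 'Content-type: application/json' --data '{"text":"Test message"}' https://hooks.slack.com/services/<your-webhook-path>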
At this point, as seen previously, we create an AlertmanagerConfig
component for sending notifications to Slack, using the following file. Complete the file by replacing the two placeholders: the name of the Slack channel
in which to receive notifications, and the apiURL
in the Secret
at the bottom, encoded in base64 (echo -n <apiURL> | base64 -w0
). Then proceed with the apply
of the file.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-slack
  namespace: monitoring
  labels:
    alertmanager: config
spec:
  route:
    groupBy: [severity]
    receiver: 'slack-notifications'
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
  receivers:
  - name: 'slack-notifications'
    slackConfigs:
    - channel: '<#channel>'
      apiURL:
        name: slack-pass
        key: alertmanager.yaml
      sendResolved: true
      # The following lines can be omitted, they have only an aesthetic value
      iconURL: 'https://avatars3.githubusercontent.com/u/3380462'
      title: |-
        [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
        {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
          {{" "}}(
          {{- with .CommonLabels.Remove .GroupLabels.Names }}
            {{- range $index, $label := .SortedPairs -}}
              {{ if $index }}, {{ end }}
              {{- $label.Name }}="{{ $label.Value -}}"
            {{- end }}
          {{- end -}}
          )
        {{- end }}
      text: >-
        {{ range .Alerts -}}
        *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

        *Description:* {{ .Annotations.description }}

        *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
        {{ end }}
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: slack-pass
  namespace: monitoring
data:
  alertmanager.yaml: <apiURL_encode_base64>
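The apply and a quick verification could look like the following; the file name alertmanager-slack.yaml is just an example, and the pod and container names come from the output shown earlier, so they may differ in your installation. The configuration path is the one reported in the Alertmanager logs above.
$ kubectl apply -f alertmanager-slack.yaml
$ kubectl exec -n monitoring alertmanager-mon-kube-prometheus-stack-alertmanager-0 -c alertmanager -- cat /etc/alertmanager/config/alertmanager.yaml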
If everything went smoothly (remember to always take a look at the logs of the Alertmanager-0
pod), you should see, in the chosen channel, a result similar to the following. We get two notifications for a single alert because we used the sendResolved
option in the configuration file above, in order to also be notified when the problem is resolved.
Slack Alert Firing
Slack Alert Resolved