Knowing the computational limits of a VM, or in this case of a cluster of VMs, tells us how far we can push the system without breaking it. Furthermore, knowing the maximum workload a device can support allows us to choose one with characteristics suited to our purposes: if a cluster with a certain configuration manages our applications well, even under sustained effort, there is no point in spending resources on an oversized one.
For this purpose, load, endurance and stress tests reveal how the system responds in various situations. More specifically, these three types of analysis are defined as:
...
To put the VMs under pressure, this tutorial places heavy demand on a PHP application running in a Kubernetes cluster. The aim is for the cluster to scale horizontally when incoming requests exceed normal usage patterns.
The tests will be performed on a cluster consisting of 4 nodes (1 master and 3 workers) with the same flavor. The flavor will be modified in turn, remaining identical across the VMs in the cluster, passing from training (2 CPUs and 4 GB RAM) to large (4 CPUs and 8 GB RAM) and finally xlarge (8 CPUs and 16 GB RAM). Finally, the clusters used for the tests are set to "factory settings", i.e. they contain only the software present on a freshly created k8s cluster. Before we continue, let's familiarize ourselves with a couple of tools suitable for our purposes: Metrics Server and Horizontal Pod Autoscaler.
Metrics Server
Metrics Server collects resource metrics (CPU and memory) from Kubelets and exposes them in the Kubernetes apiserver through the Metrics API, for use by the HPA. The Metrics API can also be accessed via kubectl top, as we will see later, making it easier to debug autoscaling pipelines.
Metrics Server Installation
The latest Metrics Server release can be installed by running
| Code Block |
|---|
| language | bash |
|---|
| title | Install Metrics Server |
|---|
|
# Components are installed in the kube-system namespace
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml |
The Metrics Server deployment will likely not become Ready. If, when analyzing its logs, you see the error "unable to fully scrape metrics", edit the deployment and insert the following flag
| Code Block |
|---|
| language | bash |
|---|
| title | Error "unable to fully scrape metrics" |
|---|
| collapse | true |
|---|
|
$ kubectl edit deploy metrics-server -n kube-system
# Insert flag "--kubelet-insecure-tls"
spec:
containers:
- args:
- --kubelet-insecure-tls # Do not verify the CA of serving certificates presented by Kubelets |
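After saving the edit, you can check that the fix worked before moving on. A minimal verification (the READY column should report 1/1; names and ages will differ on your cluster):
| Code Block |
|---|
| language | bash |
|---|
| title | Verify Metrics Server |
|---|
| collapse | true |
|---|
|
# Check that the metrics-server Deployment reports Ready
$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           2m |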
If everything went well, you can now run the kubectl top commands, which show the resource consumption of nodes and pods
| Code Block |
|---|
| language | bash |
|---|
| title | Visualize Pod and Node resources |
|---|
| collapse | true |
|---|
|
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master1.novalocal 159m 3% 2800Mi 36%
k8s-worker-1.novalocal 92m 4% 1794Mi 48%
k8s-worker-2.novalocal 78m 3% 1599Mi 43%
k8s-worker-3.novalocal 67m 3% 1611Mi 43%
$ kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
calico-kube-controllers-744cfdf676-6lwjv 1m 30Mi
calico-node-222bk 18m 98Mi
calico-node-68gk8 14m 99Mi
calico-node-f7xzz 23m 87Mi
calico-node-nj89p 19m 99Mi
coredns-74ff55c5b-q5l4w 2m 17Mi
coredns-74ff55c5b-q7gfb 2m 17Mi
kube-apiserver-k8s-master1.novalocal 68m 475Mi
kube-controller-manager-k8s-master1.novalocal 9m 58Mi
kube-proxy-4vgzb 1m 13Mi
kube-proxy-rhtsg 1m 14Mi
kube-proxy-rt6ld 1m 16Mi
kube-proxy-vb8zt 1m 17Mi
kube-scheduler-k8s-master1.novalocal 3m 22Mi
metrics-server-d895c4b8b-j96dk 3m 15Mi |
Limits and requests for CPU and memory resources are measured, respectively, in cpu units and bytes. One cpu, in Kubernetes, is equivalent to 1 vCPU/core on cloud providers (1 hyperthread on bare-metal Intel processors). You can express memory as a plain integer or as a fixed-point number using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, let's analyze the output of the kubectl top node command on the k8s-master1.novalocal node. This VM has 4 vCPUs and 8 GB of RAM. Doing some simple calculations (159m/4000m for CPU and 2800Mi/8000Mi for RAM), we obtain approximately the displayed percentages.
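As a quick illustration of these units (a sketch, not tied to our cluster), the following resources stanza requests half a vCPU and 123Mi of memory; note that 123Mi, 128974848 and 129M all denote roughly the same number of bytes:
| Code Block |
|---|
| language | yml |
|---|
| title | Resource quantity notation |
|---|
| collapse | true |
|---|
|
resources:
  requests:
    cpu: "500m"      # 500 milli-cores = 0.5 vCPU
    memory: "123Mi"  # 123 * 2^20 bytes = 128974848 bytes (~129M in decimal notation) |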
Horizontal Pod Autoscaler
Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a replication controller, deployment, replicaSet or statefulSet based on observed CPU or Memory (RAM) utilization. Let's see how it works in the next example.
Run and expose php-apache server
To demonstrate the HPA, we will use a custom Docker image based on the php-apache image. After copying the contents of the Dockerfile and the index.php below, launch the docker build command (a sketch of the build step follows the code block)
| Code Block |
|---|
| language | php |
|---|
| title | Dockerfile and index.php |
|---|
| collapse | true |
|---|
|
# These lines define the Dockerfile
FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php

<?php
// index.php performs some CPU-intensive computations
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
  $x += sqrt($x);
}
echo "OK!";
?> |
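A sketch of the build step, assuming a hypothetical registry name (in the deployment below we actually pull the prebuilt k8s.gcr.io/hpa-example image, which packages these same files):
| Code Block |
|---|
| language | bash |
|---|
| title | Build the image |
|---|
| collapse | true |
|---|
|
# Build from the directory containing the Dockerfile and index.php
$ docker build -t <your-registry>/php-apache-custom:latest .
# Push to a registry reachable by the cluster (hypothetical name)
$ docker push <your-registry>/php-apache-custom:latest |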
First, we will start a deployment running the image and expose it as a service using the following configuration
| Code Block |
|---|
| language | yml |
|---|
| title | php-apache.yaml |
|---|
| collapse | true |
|---|
|
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
spec:
selector:
matchLabels:
run: php-apache
replicas: 1
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: k8s.gcr.io/hpa-example
ports:
- containerPort: 80
resources: # <--- Pay attention
limits:
cpu: 500m
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
labels:
run: php-apache
spec:
ports:
- port: 80
selector:
run: php-apache |
The file just shown is a classic configuration of a Deployment and a Service. The only novelty is the resources parameter in the container specification. When you specify a Pod, you can optionally specify how much of each resource a container needs. When you specify the resource request for the containers in a Pod, the scheduler uses this information to decide which node to place the Pod on. When you specify a resource limit for a container, the kubelet enforces that limit, so that the running container is not allowed to use more of that resource than the limit you set. The kubelet also reserves at least the request amount of that system resource specifically for that container to use. If the node where a Pod is running has enough of a resource available, it is possible (and allowed) for a container to use more of that resource than its request specifies. However, a container is not allowed to use more than its resource limit.
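Assuming the manifest above is saved as php-apache.yaml, create both objects with
| Code Block |
|---|
| language | bash |
|---|
| title | Apply the configuration |
|---|
|
$ kubectl apply -f php-apache.yaml
deployment.apps/php-apache created
service/php-apache created |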
Create HPA
Now that the server is running, we will create the autoscaler using kubectl autoscale. The following command will create a HPA that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment we created before. Roughly speaking, the HPA will increase and decrease the number of replicas (via the deployment) to maintain an average CPU utilization across all Pods of 50%. Since each Pod requests 200 milli-cores, this means an average CPU usage of 100 milli-cores.
| Code Block |
|---|
| language | bash |
|---|
| title | Create HPA |
|---|
|
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10 |
We can check the current status of the autoscaler by running
| Code Block |
|---|
| language | bash |
|---|
| title | Autoscaler status |
|---|
| collapse | true |
|---|
|
$ kubectl get hpa
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache/scale 0% / 50% 1 10 1 18s |
Please note that the current CPU consumption is 0% as we are not sending any requests to the server.
| Info |
|---|
| title | Support for HPA in kubectl |
|---|
|
HPA, like every API resource, is supported in a standard way by kubectl. We can list autoscalers with kubectl get hpa and get a detailed description with kubectl describe hpa. We can also create a new autoscaler with the kubectl create command: in effect, instead of using the kubectl autoscale command to create a HPA imperatively, we can use a file to create it declaratively, as sketched below. Finally, we can delete an autoscaler with kubectl delete hpa. |
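For reference, here is a minimal sketch of a declarative manifest equivalent to the kubectl autoscale command above (the file name is arbitrary; apply it with kubectl apply -f):
| Code Block |
|---|
| language | yml |
|---|
| title | php-apache-hpa.yaml |
|---|
| collapse | true |
|---|
|
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:                       # The workload whose replicas are scaled
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50    # Same threshold as --cpu-percent=50 |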
Increase load
Now we will see how the autoscaler reacts to increased load. We will start a container and send an infinite loop of queries to the php-apache service (run it in a different terminal)
| Code Block |
|---|
| language | bash |
|---|
| title | Send infinite queries |
|---|
| collapse | true |
|---|
|
$ kubectl run -i --tty load-generator --rm --image=busybox --restart=Never \
-- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
If you don't see a command prompt, try pressing enter.
OK!OK!OK!OK!OK!OK!OK!OK!... |
Within a minute or so, we should see the higher CPU load by executing
| Code Block |
|---|
| language | bash |
|---|
| title | Overload |
|---|
| collapse | true |
|---|
|
$ kubectl get hpa
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache/scale 305% / 50% 1 10 1 3m |
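Rather than re-running the command by hand, you can also stream status updates while the load ramps up (the --watch flag of kubectl get prints a new line every time the HPA status changes):
| Code Block |
|---|
| language | bash |
|---|
| title | Watch the autoscaler |
|---|
| collapse | true |
|---|
|
# Stream HPA status updates; stop with <Ctrl> + C
$ kubectl get hpa php-apache --watch |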
| Info |
|---|
|
It may take a few minutes for the number of replicas to stabilize. Since the amount of load is not controlled in any way, the final number of replicas may differ from this example. |
Here, CPU consumption has increased to 305% of the request. As a result, the deployment was resized to 7 replicas
| Code Block |
|---|
| language | bash |
|---|
| title | Get deployment and HPA |
|---|
| collapse | true |
|---|
|
$ kubectl get deployment,hpa
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/php-apache 7/7 7 7 2d1h
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/php-apache Deployment/php-apache 42%/50% 1 10 7 2d |
We also take a look at the resource consumption of the Pods, to check how the system reacts. In the php-apache.yaml file seen above, we set requests.cpu: 200m in the container specification. We then entrusted the management of the deployment to the HPA, requiring that the average CPU consumption of the Pods does not exceed 100 milli-cores. The system indeed respects these constraints: taking the arithmetic mean of the CPU consumption of the php-apache Pods below gives about 84 milli-cores. Compare this result with the TARGETS column of the get hpa command above: 84 milli-cores corresponds to 42% of the 200 milli-cores requested per Pod.
| Code Block |
|---|
| language | bash |
|---|
| title | Metrics analysis |
|---|
| collapse | true |
|---|
|
$ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
load-generator 7m 0Mi
php-apache-d4cf67d68-26nnm 82m 8Mi
php-apache-d4cf67d68-5cxkh 103m 8Mi
php-apache-d4cf67d68-gn5l7 90m 8Mi
php-apache-d4cf67d68-j229m 74m 8Mi
php-apache-d4cf67d68-k9vqz 77m 8Mi
php-apache-d4cf67d68-tlssl 90m 8Mi
php-apache-d4cf67d68-x76h2 75m 10Mi |
Stop load
We will finish our example by stopping the user load. In the terminal where we created the container with the busybox image, terminate the load generation by typing <Ctrl> + C. Then we verify the resulting state: after a minute or so, re-run the two get commands used earlier. You should see that CPU utilization has dropped to 0% and that the HPA has scaled the number of replicas back down to 1.
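For reference, this is the kind of output to expect once the system has settled (illustrative values; ages and timings will differ):
| Code Block |
|---|
| language | bash |
|---|
| title | State after stopping the load |
|---|
| collapse | true |
|---|
|
$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          11m
$ kubectl get deployment php-apache
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           2d1h |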