Being aware of the computational limits of a VM, or in this case a cluster of VMs, is very useful for knowing how far we can push it without breaking it. Moreover, knowing the maximum workload a system can support lets us choose one with characteristics suited to our purposes: if a cluster with a certain configuration handles our applications well even under sustained effort, there is no point in spending resources on a larger one.
For this purpose, load, endurance, and stress tests reveal how the system responds in various situations. More specifically, these three types of analysis are defined as:
- Load testing: checking how the system behaves under an expected, realistic level of demand.
- Endurance (or soak) testing: applying a sustained load over a long period, to reveal problems such as memory leaks or slow resource exhaustion.
- Stress testing: pushing the system beyond its normal capacity, to find its breaking point and observe how it fails and recovers.
To put the VMs under pressure, this tutorial places heavy demand on a PHP application running in a Kubernetes cluster. The aim is for the cluster to scale horizontally when incoming requests exceed normal usage patterns.
The tests will be performed on a cluster consisting of 4 nodes (1 master and 3 workers), all with the same flavor. The flavor will be changed between test rounds, always remaining identical across the VMs of the cluster, doubling CPU and RAM each time: from medium (2 CPUs and 4GB RAM) to large (4 CPUs and 8GB RAM) and finally xlarge (8 CPUs and 16GB RAM). Finally, the clusters used for the tests are at "factory settings", i.e. they contain only the software of a typical, freshly created k8s cluster. Before we continue, let's familiarize ourselves with a couple of tools suitable for our purposes: Metrics Server and Horizontal Pod Autoscaler.
Metrics Server collects resource metrics (CPU and memory) from Kubelets and exposes them in the Kubernetes API server through the Metrics API, for use by the HPA. The Metrics API can also be queried with kubectl top, as we will see later, which makes it easier to debug autoscaling pipelines.
The latest Metrics Server release can be installed by running
# Components are installed in the kube-system namespace
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
The Metrics Server deployment will likely not become Ready. If, analyzing the Pod logs, you see the error "unable to fully scrape metrics", edit the deployment and insert the following flag
$ kubectl edit deploy metrics-server -n kube-system
# Insert flag "--kubelet-insecure-tls"
spec:
  containers:
  - args:
    - --kubelet-insecure-tls # Do not verify the CA of serving certificates presented by Kubelets
If everything went well, you can already try the kubectl top command, which shows resource consumption for nodes or pods
$ kubectl top node
NAME                     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master1.novalocal    159m         3%     2800Mi          36%
k8s-worker-1.novalocal   92m          4%     1794Mi          48%
k8s-worker-2.novalocal   78m          3%     1599Mi          43%
k8s-worker-3.novalocal   67m          3%     1611Mi          43%

$ kubectl top pod -n kube-system
NAME                                            CPU(cores)   MEMORY(bytes)
calico-kube-controllers-744cfdf676-6lwjv        1m           30Mi
calico-node-222bk                               18m          98Mi
calico-node-68gk8                               14m          99Mi
calico-node-f7xzz                               23m          87Mi
calico-node-nj89p                               19m          99Mi
coredns-74ff55c5b-q5l4w                         2m           17Mi
coredns-74ff55c5b-q7gfb                         2m           17Mi
kube-apiserver-k8s-master1.novalocal            68m          475Mi
kube-controller-manager-k8s-master1.novalocal   9m           58Mi
kube-proxy-4vgzb                                1m           13Mi
kube-proxy-rhtsg                                1m           14Mi
kube-proxy-rt6ld                                1m           16Mi
kube-proxy-vb8zt                                1m           17Mi
kube-scheduler-k8s-master1.novalocal            3m           22Mi
metrics-server-d895c4b8b-j96dk                  3m           15Mi
Limits and requests for CPU and memory resources are measured, respectively, in cpu units and bytes. One cpu, in Kubernetes, is equivalent to 1 vCPU/core for cloud providers (1 hyperthread on bare-metal Intel processors). You can express memory as a plain integer or as a fixed-point number using one of these suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. As an example, let's analyze the output of the kubectl top node command for the k8s-master1.novalocal node. This VM has 4 vCPUs and 8 GB of RAM. Doing some simple calculations (159m out of 4000m CPU and 2800Mi out of roughly 8000Mi RAM), we obtain approximately the displayed percentages.
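The same arithmetic can be reproduced in the shell. This is only a sketch: the 4000m and 8000Mi totals assume the node's nominal 4-vCPU/8 GB flavor, while the real allocatable memory reported by the Kubelet is slightly different, which is why the computed value does not match the displayed percentage exactly:

```shell
# CPU%: used millicores over total millicores (4 vCPUs = 4000m)
echo "CPU%: $(( 159 * 100 / 4000 ))"   # integer division
# MEMORY%: used Mi over the nominal total (8 GB taken as ~8000Mi)
echo "MEM%: $(( 2800 * 100 / 8000 ))"
```

kubectl top rounds and computes against the node's allocatable capacity, so small discrepancies with this back-of-the-envelope calculation are expected.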
Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet based on observed CPU or memory (RAM) utilization. Let's see how it works in the next example.
To demonstrate the HPA we will use a custom Docker image based on the php-apache image. Apply the following file to install a simple PHP web application in the Kubernetes cluster. Then, verify that the pods were created.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources: # <--- Pay attention
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
The file just shown is a .yaml file containing a classic configuration of a Deployment and a Service. The only novelty is the resources parameter in the container specification. When you specify a Pod, you can optionally specify how much of each resource a container needs. When you specify the resource request for containers in a Pod, the scheduler uses this information to decide which node to place the Pod on. When you specify a resource limit for a container, the kubelet enforces that limit so that the running container is not allowed to use more of that resource than the limit you set. The kubelet also reserves at least the request amount of that system resource specifically for that container to use. If the node where a Pod is running has enough of a resource available, it's possible (and allowed) for a container to use more of that resource than its request specifies. However, a container is never allowed to exceed its resource limit.
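The same pattern extends to memory. As a sketch (the values below are illustrative, not taken from this tutorial), a fuller resources block looks like this; note that exceeding a CPU limit results in throttling, while exceeding a memory limit gets the container OOM-killed:

```yaml
resources:
  requests:
    cpu: 200m        # scheduler only places the Pod on a node with 200m free
    memory: 64Mi     # amount the kubelet reserves for this container
  limits:
    cpu: 500m        # CPU usage above this is throttled
    memory: 128Mi    # memory usage above this triggers an OOM kill
```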
Now that the server is running, we will create the autoscaler using kubectl autoscale. The following command creates an HPA that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment we created before. Roughly speaking, the HPA will increase and decrease the number of replicas (via the deployment) to maintain an average CPU utilization across all Pods of 50%. Since each Pod requests 200 milli-cores, this means an average CPU usage of 100 milli-cores.
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
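The same autoscaler can also be created declaratively. A minimal equivalent manifest, using the stable autoscaling/v1 API, would look like this (a sketch, not part of the original tutorial):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:          # the workload whose replica count the HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50   # same as --cpu-percent=50
```

The declarative form is handy when the HPA should be versioned together with the Deployment and Service manifests.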
We may check the current status of autoscaler by running
$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s
Please note that the current CPU consumption is 0% as we are not sending any requests to the server.
HPA, like every API resource, is supported in a standard way by kubectl: you can list autoscalers with kubectl get hpa, get a detailed description with kubectl describe hpa, and remove one with kubectl delete hpa.
Now, we will see how the autoscaler reacts to increased load. Once the PHP web application is running in the cluster and the autoscaling deployment is set up, we introduce load on the web application. Here we use a BusyBox container that issues an endless stream of web requests to the PHP web application. Copy and deploy the infinite-calls.yaml file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: infinite-calls
  labels:
    app: infinite-calls
spec:
  replicas: 1
  selector:
    matchLabels:
      app: infinite-calls
  template:
    metadata:
      name: infinite-calls
      labels:
        app: infinite-calls
    spec:
      containers:
      - name: infinite-calls
        image: busybox
        command:
        - /bin/sh
        - -c
        - "while true; do wget -q -O- http://php-apache; done"
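As an alternative to a Deployment, a one-off load-generator Pod can be started interactively; deleting it (or exiting the shell) stops the load immediately. This is a sketch of the same idea, with `load-generator` as an arbitrary Pod name:

```shell
# Run an interactive BusyBox pod that hammers the php-apache Service;
# --rm removes the pod as soon as the command is interrupted (Ctrl+C)
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://php-apache; done"
```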
Within a minute or so, we should see the higher CPU load by executing
$ kubectl get hpa
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        1          3m
It may take a few minutes for the number of replicas to stabilize. Since the amount of load is not controlled in any way, the final number of replicas may differ from this example.
Here, CPU consumption has increased to 305% of the request. As a result, the deployment was resized to 7 replicas
$ kubectl get deployment,hpa
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/infinite-calls   1/1     1            1           5m19s
deployment.apps/php-apache       7/7     7            7           71m

NAME                                             REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/php-apache   Deployment/php-apache   42%/50%   1         10        7          71m
We also take a look at the resource consumption of the Pods, to check how the system reacts. In the php-apache.yaml file seen above, we set requests.cpu: 200m in the container specification. We then entrusted the management of the deployment to the HPA, requiring that the CPU consumption of the Pods does not exceed, on average, 100 milli-cores. The system does respect these constraints: taking the arithmetic mean of the CPU consumption of the php-apache Pods below yields about 84 milli-cores. Compare this result with the TARGETS column of the get hpa command above: 84 milli-cores correspond to 42% of the 200 milli-cores requested per Pod.
$ kubectl top pod
NAME                              CPU(cores)   MEMORY(bytes)
infinite-calls-69f758db46-hxssq   7m           2Mi
php-apache-d4cf67d68-26nnm        82m          8Mi
php-apache-d4cf67d68-5cxkh        103m         8Mi
php-apache-d4cf67d68-gn5l7        90m          8Mi
php-apache-d4cf67d68-j229m        74m          8Mi
php-apache-d4cf67d68-k9vqz        77m          8Mi
php-apache-d4cf67d68-tlssl        90m          8Mi
php-apache-d4cf67d68-x76h2        75m          10Mi
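The averaging described above can be checked directly with shell arithmetic, using the per-Pod values copied from the kubectl top output:

```shell
# Mean CPU of the seven php-apache Pods, in millicores (integer division)
echo $(( (82 + 103 + 90 + 74 + 77 + 90 + 75) / 7 ))
# As a percentage of the 200m request: this is the TARGETS column value
echo $(( 84 * 100 / 200 ))
```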
We will finish our example by stopping the process: simply delete the deployment/infinite-calls component or, if you want to reuse it for further testing, scale it to zero replicas. Then, verify the resulting state: after a minute or so, re-run the two get commands used earlier. You should see that CPU utilization dropped to 0%, and the HPA therefore scaled the number of replicas back down to 1.
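Concretely, either of the first two commands below stops the load, and the remaining ones verify the scale-down:

```shell
# Stop the load generator for good...
kubectl delete deployment infinite-calls
# ...or keep it around for later tests, just idle:
# kubectl scale deployment infinite-calls --replicas=0

# After a minute or so, check that the autoscaler returned to 1 replica
kubectl get hpa
kubectl get deployment php-apache
```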