Mistakes to Avoid when Configuring a Kubernetes Cluster

Kubernetes has become the go-to platform for deploying and orchestrating containerized workloads at scale. Organizations build and deploy their containerized workloads on a Kubernetes cluster that is either self-managed or running on a cloud provider. As their scale increases, it becomes increasingly important to avoid misconfigurations in Kubernetes manifests and other configuration issues that can lead to a wide range of problems.

In this blog post, we will go over some of the common mistakes made when deploying applications to Kubernetes. We will also share some of our experiences to help you minimize issues when working with Kubernetes.

1. Not Leveraging AWS EC2 Spot Instances / GCP Preemptible VMs

When you are running Kubernetes clusters in a cloud environment such as GCP or AWS, you always want to find a good balance between cost and performance. One common strategy for managing costs is using Spot Instances on AWS or Preemptible VMs on GCP.

AWS Spot Instances: With EC2 Spot instances, you pay the Spot price that’s in effect for the time period your instances are running. Spot instance prices are set by Amazon EC2 and adjust gradually based on long-term trends in supply and demand for Spot instance capacity. Spot Instances are available at up to a 90% discount compared to On-Demand prices.

GCP Preemptible VM: Preemptible instances are highly affordable, short-lived compute instances suitable for time-flexible workloads. They offer the same machine types and options as regular compute instances, last for up to 24 hours, and are available in all projects whose location is set to a Google Cloud region. Pricing is fixed so you will always get low cost and financial predictability, without taking the risk of gambling on variable market pricing. Preemptible instances are up to 80% cheaper than regular instances.

Thus, whenever possible, use AWS Spot Instances or GCP Preemptible VMs to reduce the cost of your Kubernetes cluster. The sketch below shows one way to do this on AWS.
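As an illustration, here is a minimal eksctl nodegroup sketch that requests Spot capacity. The cluster name, region, instance types, and sizes are placeholder values, not recommendations:

```yaml
# eksctl ClusterConfig snippet: a managed nodegroup backed by Spot Instances.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster        # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
  - name: spot-workers
    spot: true              # request Spot capacity instead of On-Demand
    instanceTypes:          # listing several similar types improves Spot availability
      - m5.large
      - m5a.large
      - m4.large
    minSize: 2
    maxSize: 6
    desiredCapacity: 3
```

Since Spot and Preemptible nodes can be reclaimed at short notice, run only interruption-tolerant workloads on them and keep critical components on regular On-Demand nodes.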

2. Not selecting the right Instance type/size

Another important factor is selecting the right instance type and size when configuring a Kubernetes cluster.

Why does the type of cloud instance matter? Since they all fall under the AWS umbrella, aren’t they effectively the same? Not quite. Some offer substantially more memory, CPU-optimized performance, or GPU acceleration, while others provide a more generalized balance of resources. So it’s really important to choose the right instance type based on your specific needs.

Each individual node needs to be powerful enough to run cluster-supporting workloads as well as a reasonable share of your business-critical applications. While you may think a couple of large nodes is the best option, there are some downsides to this.

Firstly, your cloud costs can skyrocket when you provision larger nodes. Secondly, with only a few big nodes, whenever a node fails your available capacity drops much more, and the knock-on effect on the rest of the cluster is much bigger: losing one of three large nodes takes a third of your capacity offline, while losing one of ten smaller nodes takes only a tenth. A good rule of thumb is to size nodes so that losing one doesn’t take a significant portion of your capacity offline.

3. Not Securely configuring the Kubernetes API server

The Kubernetes API server manages all API requests between external users and the cluster's internal components. It processes these operations and acts as the main gateway to the cluster’s shared data, enabling interaction between all the system's components.

Make sure to configure the settings securely when setting up your cluster’s Kubernetes API server, because if you don’t properly control access to the Kubernetes API, you’re leaving yourself wide open to attack. One of the most common and risky mistakes is not requiring authentication for access to the API server, since it is the main administrative entry point to your cluster.

Configuring your Kubernetes cluster so that an authentication token provides access to the Kubernetes API by default is also high risk: if that token has cluster-admin rights, an attacker could escalate privileges and take over the entire cluster just by compromising a single container. The sketch below shows two common hardening steps.
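As a hedged illustration for self-managed clusters, the first snippet shows hardening flags on the kube-apiserver static pod (the exact flag set varies by version and distribution, and the manifest is abbreviated); the second stops mounting the service account token into pods that don’t need API access:

```yaml
# kube-apiserver static pod manifest snippet (abbreviated sketch).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      image: registry.k8s.io/kube-apiserver:v1.29.0
      command:
        - kube-apiserver
        - --anonymous-auth=false          # reject unauthenticated requests
        - --authorization-mode=Node,RBAC  # authorize every API request
        - --client-ca-file=/etc/kubernetes/pki/ca.crt
---
# Don't hand every pod an API token it doesn't need: if a container is
# compromised, there is no credential to steal.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa             # hypothetical service account
  namespace: team-a        # hypothetical namespace
automountServiceAccountToken: false
```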

4. Not taking a holistic approach to container security

Many people assume that containers are inherently more secure because they are ephemeral. However, the ease of spinning up new containers automatically can backfire if your automated configuration includes security vulnerabilities.

It is advisable to follow a ten-layer approach to container security. This covers both the container stack layers (such as the container host and registries) and container lifecycle concerns (such as API management).

Focusing too narrowly on a single area – such as Kubernetes and orchestration – is likely to increase risks elsewhere. Even if you are sure that you have secured your cluster following best practices, that doesn’t mean the applications you run on K8s are also secure. They may still be vulnerable due to flaws in the code or misconfigured privileges, such as container images configured to run with root privileges. Two areas deserve particular attention:

  • Image scanning – scan container images for known vulnerabilities before they are deployed.
  • Security policies – enforce policies that block risky configurations, such as containers running as root (see the sketch after this list).
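As one concrete example of a security policy in practice, here is a minimal pod sketch that refuses to run as root and drops unneeded privileges; the pod name and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                       # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                     # kubelet refuses to start root containers
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.2.3  # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]                    # drop all Linux capabilities
```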

5. Improperly configuring (or ignoring) native Kubernetes features

It’s easy to misconfigure settings, for example by setting role-based access controls (RBAC) that allow too much or even too little access. This creates potential security holes, or deployment issues when applications try to communicate. The intent should be to limit what users, including administrators, can do on the cluster; a minimal RBAC sketch follows below.
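This sketch grants one user read-only access to pods in a single namespace, nothing more; the namespace and user name are illustrative:

```yaml
# Namespace-scoped Role: read-only access to pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a          # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# Bind the Role to a single user in that namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
  - kind: User
    name: jane               # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```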

In general, networking in Kubernetes comes with a significant learning curve, which in turn makes it fertile terrain for security mistakes; experts recommend taking a zero-trust approach. An example of a potentially risky default configuration is deploying all workloads into the default namespace: those workloads are not isolated from each other, even though namespaces (together with network policies) allow for such isolation, and this widens the blast radius if a workload is compromised. A default-deny network policy, sketched below, is a common starting point.
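Here is a minimal default-deny NetworkPolicy, a common zero-trust starting point: it blocks all inbound traffic to every pod in a namespace, after which you allow only the flows you need. The namespace is illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a          # hypothetical namespace
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
    - Ingress                # no ingress rules listed, so all inbound traffic is denied
```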

6. Not Selecting wisely between KOPS and Amazon EKS

It is really crucial to choose wisely between KOPS and Amazon EKS when configuring a Kubernetes cluster.

AWS is the most widely used cloud provider, and it offers EKS as a managed Kubernetes solution. EKS doesn’t give you control over the entire environment: AWS manages the control plane, and you only control the worker nodes. Setting up the Kubernetes cluster might seem difficult in the beginning, but once that tedious job is done, maintenance becomes really easy, as the control plane is managed for you by AWS.

If you decide to use KOPS to set up the Kubernetes cluster, the setup might seem really easy, but maintaining the cluster is a difficult job. With KOPS, you will be managing the cluster yourself, which includes routine activities such as certificate rotation, node upgrades, and more.

Both EKS and KOPS have their own advantages and disadvantages, so it is really important to know the needs of your organization and choose between Amazon EKS and KOPS accordingly. Check out this brilliant post that compares AWS EKS vs KOPS.

7. Not using Resources judiciously

You should always specify resource requests and limits (CPU, memory). If you forget to set them, Kubernetes will pack your Pods tightly onto a handful of nodes. A single pod may then consume all the CPU or memory available on its node, starving its neighbors of CPU or causing them to hit out-of-memory errors, while the cluster won’t scale itself up as needed.

Make sure to use resource requests: they let the scheduler know how many resources you expect your application to consume. When assigning pods to nodes, Kubernetes budgets them so that all of their requests are met by the node’s resources.

Moreover, setting resource requests is also important because the Kubernetes Horizontal Pod Autoscaler calculates resource utilization against these requests.

So, go ahead and define requests and limits for each of your containers; if you aren’t sure, take an educated guess and err on the higher side. Whether you are certain or not, make sure to monitor the actual resource usage of your pods and containers using your cloud provider’s monitoring tools. A minimal sketch follows.
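Here is a minimal sketch of per-container requests and limits; the numbers are illustrative starting points, not recommendations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                  # hypothetical pod
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"        # what the scheduler budgets for
          memory: "256Mi"
        limits:
          cpu: "500m"        # the container is throttled above this
          memory: "512Mi"    # the container is OOM-killed above this
```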

8. Not using an Autoscaler like KEDA

When your applications have been deployed to production environments, at any given point of time, the traffic to your application can increase. This increase in demand also means that greater CPU and memory resources will be required to serve your application to the end users. 

While it’s possible to provision large nodes up front, this is not recommended, as you will have a lot of underutilized resources, which will also inflate your cloud costs. Instead, we recommend using an autoscaler such as KEDA or the Horizontal Pod Autoscaler (HPA), which will scale your pods as required (paired with a cluster autoscaler to add or remove nodes).

KEDA allows you to set the conditions under which your workloads will scale horizontally, making it a great solution for handling increased demand; a minimal sketch follows. To learn more about how KEDA can help with scaling, please check out our detailed blog about KEDA.
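As an illustration, this minimal KEDA ScaledObject scales a Deployment on CPU utilization; the Deployment name, replica bounds, and threshold are hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-scaler
spec:
  scaleTargetRef:
    name: web                # hypothetical Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"          # scale out above 70% average CPU
```

Note that the CPU trigger relies on the target’s containers defining resource requests, which ties back to the previous point; KEDA also supports many event-driven triggers (queues, topics, and so on) beyond CPU.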

9. Lock Configurations for Deployment Environments

A single application is often deployed to multiple environments such as production, staging, testing, QA, and more. You might even be deploying a single application to multiple production environments distributed across different regions. Each environment will have some environment-specific configuration that you don’t want changed.

You must ensure that these configurations are locked and cannot be changed by anyone without the proper access permissions. For example, when the production application is deployed to multiple regions, you want the configuration of each region to be locked.

With Devtron, you can select all the different environments to deploy applications to, and even lock the configurations. Once the configurations are locked, they cannot be changed without the proper approval from an authorized user. You can learn more about how Devtron protects environment configuration here.

10. Add Approval Gates for triggering deployments

When deploying applications, you do not want a wrong build to be deployed to the production environments. If a buggy application version gets deployed accidentally, it can lead to a degraded user experience, negatively impacting the business. To ensure that the correct builds are deployed, approval gates should be in place. These approval gates act as guardrails against releasing the wrong version or triggering a deployment during peak hours.

Devtron has a robust approval gateway that can be configured for triggering deployments. You can set the user from whom approval is required before any deployment is triggered. Check out this blog to learn more about Devtron’s deployment gates and how they can help you.