11 Kubernetes implementation mistakes – and how to avoid them
I manage a team that designs and introduces in-house Kubernetes aaS at Mail.ru Cloud Solutions. We often see a lack of understanding of this technology, so I’d like to talk about common strategic mistakes in implementing Kubernetes in major projects.
Most of the problems arise because the technology is quite sophisticated. There are non-obvious implementation and operation challenges, as well as poorly used advantages, all of which result in lost money. Another issue is the global lack of knowledge and experience with Kubernetes. Learning it by the book can be tricky, and hiring qualified staff can be challenging. All the hype complicates Kubernetes-related decision making. Curiously enough, Kubernetes is often implemented as a formality – just for it to be there, in the hope that it will somehow make life better.
Hopefully, this post will help you to make a decision you will feel proud of later (and won’t regret or feel like building a time machine to undo it).
Let me walk you through the mistakes made when implementing Kubernetes in production, as well as some tricks that make the magic happen. By magic I mean, first of all, shorter time-to-market, faster releases, and guaranteed seamless app operation.
Mistake 1. Not getting ready to work with Stateful apps
Initially, Kubernetes was intended for Stateless apps that process data but store nothing – like web-apps.
To deploy Stateful apps that store data in Kubernetes, Persistent Volumes (PV) can be used. These are storage volumes attached to pods. From the app’s point of view, a PV is mounted in the container as a local directory, so the app doesn’t have to change the way it works with data. At the same time, Kubernetes makes the storage available to the app on any cluster node. To use PVs, you will have to set up a data storage system (DSS) and configure Kubernetes to work with it.
Alternatively, you can rework the app so that it stores data not locally but in separate databases or object storage. These can be deployed on your own or obtained as ready-to-use services (aaS).
When setting up such Stateful storage, you also have to solve the backup problem. This task is challenging and tricky to solve on your own, and such a system is hard to maintain. However, our Kubernetes aaS has out-of-the-box integrated PVs based on CEPH block storage, virtual file storage, or SSD/NVMe disks connected to every compute server via iSCSI. Besides, object storage or databases can be run on the same platform in aaS format.
Unless you refine the architecture for Stateful, various mishaps are possible. A typical scenario: Kubernetes is deployed without a distributed storage system, and only Stateless apps are started. After finding out that some data has to be stored, the team brings up an NFS on a nearby server for the Kubernetes-based apps to use. Early in the game, these are service apps like monitoring, and losing their data isn’t a big deal. Unfortunately, later the unreliable and slow NFS ends up being used by production and core services that store their data there. This situation carries many risks that are much more difficult to fix on the go. Production-grade storage should be integrated into the infrastructure along with Kubernetes.
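As a rough sketch of what using a PV looks like from the app side, the Python snippet below builds a PersistentVolumeClaim and a pod that mounts it as a local directory, and prints both as JSON (kubectl accepts JSON as well as YAML). The names `app-data` and `demo-app`, the image, the 10Gi size, the mount path, and the `ceph-rbd` storage class are all assumptions made for illustration. The storage class in particular depends on how the storage system behind the cluster was set up.

```python
import json

# Hypothetical values, chosen only for illustration.
CLAIM_NAME = "app-data"
STORAGE_CLASS = "ceph-rbd"  # assumption: depends on the cluster's storage backend


def build_pvc(name: str, size: str, storage_class: str) -> dict:
    """Return a PersistentVolumeClaim manifest requesting `size` of storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def build_pod(name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Return a Pod manifest that mounts the claim as a directory in the container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The app simply reads and writes files under mount_path.
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


pvc = build_pvc(CLAIM_NAME, "10Gi", STORAGE_CLASS)
pod = build_pod("demo-app", "registry.example.com/demo-app:1.0", CLAIM_NAME, "/var/lib/demo")

if __name__ == "__main__":
    bundle = {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}
    print(json.dumps(bundle, indent=2))
```

The app itself only sees the directory `/var/lib/demo`. Which node the pod lands on, and how the volume gets attached there, is left to Kubernetes and the storage system.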
Mistake 2. Not tuning monolithic app architecture for work in Kubernetes
It’s about using Kubernetes in a monolithic project without adjusting the infrastructure and/or its operation and management patterns.
Kubernetes is an optimal solution for apps with a loosely coupled microservice architecture: clearly defined module boundaries and independent development, deployment, and scaling. Such apps can be conveniently packed into containers and managed by the orchestrator. That’s why Kubernetes is a poor fit for monolithic architectures and for workloads with many dependencies: in that case, an app update alone can take hours.
What are the key points of tuning an app? Firstly, monoliths like to store data locally, so you’ll have to teach the monolith to work with Stateful storage.
Secondly, you’ll need to teach it how to run in several instances:
- Instances should work with data stored apart from the monolith and know how to access them.
- A monolith usually uses Memcached or Redis for caching, and the app accesses it locally. When the app is deployed in Kubernetes, a separate Redis cluster is required. Moreover, the app has to know how to use it as a cluster. This is something that classical monoliths can’t do.
- In Kubernetes, fault tolerance, balancing and self-healing are ensured by app instances easily migrating to other nodes. Things are different with a monolith: even if it runs as several instances, client apps can usually keep working only with the instance they started with. In this case, migrating to another node simply resets the session. That’s why the app must learn to handle such migrations. Another option is to configure sticky sessions at the balancer.
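The session problem from the last bullet can be illustrated with a toy model. This is only a sketch: the `Instance` class is hypothetical, and a plain dict stands in for an external store such as Redis.

```python
class Instance:
    """One app instance. `store` is where it keeps session data."""

    def __init__(self, name, store=None):
        self.name = name
        # Without an external store, each instance keeps sessions in its own memory.
        self.store = store if store is not None else {}

    def login(self, session_id, user):
        self.store[session_id] = user

    def whoami(self, session_id):
        # None means this instance does not know the session.
        return self.store.get(session_id)


# Monolith style: every instance has private session memory.
a, b = Instance("a"), Instance("b")
a.login("s1", "alice")
lost = b.whoami("s1")  # the client was moved to instance b

# Kubernetes-friendly style: sessions live in a shared store
# (the dict below is a stand-in for a real external store).
shared = {}
c, d = Instance("c", shared), Instance("d", shared)
c.login("s1", "alice")
kept = d.whoami("s1")

print(lost, kept)  # prints: None alice
```

In the first case the client has to log in again after the move. In the second case any instance can continue the session, so pods can be rescheduled freely.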
Finally, you’ll have to split the monolithic application to adapt it to Kubernetes. The point is to obtain a loosely coupled architecture, where each piece of functionality is split into an independent service and packed into a container along with its dependencies. Without that, the app becomes a distributed network monolith, that is, a set of strongly coupled micro-monoliths that can only be scaled all together in a single container. This in no way fixes the initial problem of unmanageable and risky deployments. Don’t count on reusing the monolith’s code: both the architecture and the code itself will require a lot of rework.
In some cases, part of the logic has already been migrated to microservices while the rest remains a monolith. Migrating everything (the monolith part included) to Kubernetes is a bad idea. A better option is to leave the monolith part outside Kubernetes and use Kubernetes to scale the microservices.
Mistake 3. Trying to switch to Kubernetes entirely
For many companies, Kubernetes is a way to unify server operation on a single platform. That’s where the urge to manage everything in the same way comes from. Starting a container in Kubernetes is easier than setting up an app or a database on a virtual machine. This is a tempting yet treacherous path. The production stage will surely clear things up, exposing everything that wasn’t considered at the migration stage.
For instance, migrating a database to Kubernetes is possible but hardly ever required. If you want to do it anyway, remember that since Kubernetes moves pods between nodes, you need a DBMS cluster with automatic Master/Slave switchover. Moreover, a database in Kubernetes will sit on top of network-attached storage, which will only increase latency.
If your system has any components that could unpredictably change their behavior after moving to Kubernetes, take your time and consider the consequences.
Mistake 4. Not preparing the team for Kubernetes implementation
When dealing with Kubernetes, we see a lack of experts. Suppose the administrators who install and maintain Kubernetes don’t understand its concepts. In that case, we end up not with a cloud-native system but with an architecture based on manual operation. It isn’t adapted to the company’s tasks, and it can’t take advantage of what the cluster offers.
Kubernetes needs administrators with deployment skills and specific expertise. Kubernetes isn’t just a file run on an independent virtual machine. It’s an ecosystem of components. You should know how to deploy a cluster and then maintain it in a production environment with software running. You have to know what Kubernetes is made of, its operation and adjustment principles, as well as the best practices for app running.
If system administrators aren’t familiar with Kubernetes, they will hardly get production right on the first try. You will have to teach them or have them trained, and give them time to experiment. Get ready for nasty surprises at the production stage. If you decide to hire experts, check whether they really can deploy and maintain Kubernetes on their own. Many of them can barely push a button in Rancher or a similar tool. Such a background is definitely not enough to maintain a cluster.
Nevertheless, if you find experts capable of managing the app infrastructure in Kubernetes, you will need far fewer people than you required for container migration. There is another benefit, too. Kubernetes as a technology is highly standardized, so any new employee with the relevant experience will be able to swing into action. Mind that things get really bad whenever home-made solutions are used for deployment automation instead of Kubernetes.
Mistake 5. Not getting the processes ready for Kubernetes implementation
Development has to be adapted to Kubernetes. First of all, you should introduce DevOps practices that affect the whole development process. Yet companies often keep their previous operating procedures when implementing Kubernetes, and developers keep writing code as before.
In addition, implementing Kubernetes changes how roles and responsibilities are distributed within the team. It amplifies all the cultural issues, too. If developers don’t feel like implementing DevOps and actually resist this methodology, none of the tools will be of use – including Kubernetes.
DevOps ideology has administrators maintain the infrastructure and developers deal with the app – from its design and code writing to the startup, implementation, monitoring, and operation in the production environment. Now developers don’t have to understand the infrastructure. However, they have to understand how their code works with all the processes.
It is common, though, for a team to decide to split a monolith into microservices, spend a lot of time on it, and halt all business development. After splitting the monolith, they plan to manage the app with Kubernetes. The problem is that nobody ever told them they would need to implement Docker and CI, too. This causes resistance, and in the end, although they nominally have Kubernetes and microservices, this has no effect on the app.
Why does nothing improve until you change your development approaches?
- If developers don’t know Docker, a DevOps engineer writes Dockerfiles for them. But it is the developers who know how the app works, its versions, and its dependencies. They could easily write a Dockerfile (if they knew how to) and maintain the app infrastructure. If a Dockerfile is written by DevOps engineers instead, they can miss some of the details. Moreover, they will need to look into the app, which makes things take longer.
- Sometimes, DevOps engineers substitute for developers in writing Kubernetes manifests. Yet the technology enables developers to control the app launch themselves, along with all the resources and health checks. As long as this job is done by DevOps, it just takes longer.
- Finally, there are fewer DevOps engineers than developers on any team. If a hundred developers are producing microservices while 5 to 10 DevOps engineers are trying to write manifests for all of them, they all end up in a rollout queue. Hence, the procedures don’t speed up but slow down.
What can cause the resistance? Developers have to familiarize themselves with the Docker Way. This includes learning Docker and Kubernetes manifests, adding health checks, and specifying resources for the app. Errors at these stages may result in an app that won’t launch, and a developer needs to understand such things. Sometimes, a development team just doesn’t understand why they have to take on typical administration tasks.
In fact, the new approach is good for developers, and this message is easy to communicate to the team. For instance, previously they had to write the code and create a Jira ticket for infrastructure provisioning, which could stay pending for a month. With Kubernetes, the development team has total control over rollouts. In our experience, as soon as developers understand it and learn to use it, they start enjoying it.
Kubernetes makes a developer’s life so much easier thanks to its declarative nature. All the configuration and infrastructure are described in files. By contrast, with an imperative approach, as in Ansible, configuration management requires you to manually order the tasks needed for the app rollout.
With the declarative approach that Kubernetes supports, you declare not the tasks required but the desired state. This means you don’t need much understanding of how things work or how they get done, and you don’t have to consider the system’s current state. Kubernetes uses easy-to-follow YAML manifests for configuration management, along with transparent patterns and IaC (infrastructure as code) tools, all of them describing what is going to be run, where and how, and with what parameters.
As the app evolves, the team acquires common pipelines for delivering software to servers, as well as common patterns for CI/CD and Kubernetes manifests. All of these significantly speed up the procedures. Developers no longer have to reinvent how things are connected but simply plug ready-to-use CI/CD task patterns into their CI files. Kubernetes manifest patterns save them from writing manifests anew every time. DevOps engineers provide the tools for making universal patterns, and developers only have to specify the parameters that make a particular case different from the pattern. When every member of the team sees the benefits to be gained, switching to DevOps practices becomes so much easier.
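To give a feel for what health checks and resource settings look like in a manifest, here is a minimal sketch in Python that builds a Deployment and checks that nothing important is missing. The app name, image, port, probe paths (`/healthz`, `/ready`) and resource figures are assumptions for illustration, and `missing_settings` is a hypothetical helper, not part of any Kubernetes tooling.

```python
def build_deployment(name, image, replicas=2, port=8080):
    """Return a Deployment manifest with resources and health checks filled in."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # How much the app asks for (requests) and may use at most (limits).
        "resources": {
            "requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"},
        },
        # Liveness: restart the container if this check keeps failing.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": port},
            "initialDelaySeconds": 10,
            "periodSeconds": 10,
        },
        # Readiness: send traffic only while this check passes.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": port},
            "periodSeconds": 5,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def missing_settings(deployment):
    """List resource and health check settings that containers do not define."""
    problems = []
    for c in deployment["spec"]["template"]["spec"]["containers"]:
        for key in ("resources", "livenessProbe", "readinessProbe"):
            if key not in c:
                problems.append(f"{c['name']}: {key} is not set")
    return problems


deployment = build_deployment("demo-api", "registry.example.com/demo-api:1.0")
print(missing_settings(deployment))
```

Only the developers know which endpoint really tells that the app is healthy and how much memory it needs, which is why these fields are best filled in by them.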
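The declarative idea can be sketched in a few lines of Python. This is a toy model, not how Kubernetes is actually implemented: the `reconcile` function and the app names are hypothetical. You state how many instances of each app you want, and the function works out what has to be done from whatever the current state happens to be.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Compare desired instance counts with actual ones and return the actions needed."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything running that is no longer declared gets stopped.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("stop", name, actual[name]))
    return actions


desired = {"web": 3, "worker": 2}
actual = {"web": 1, "worker": 2, "legacy": 1}

plan = reconcile(desired, actual)
print(plan)  # [('start', 'web', 2), ('stop', 'legacy', 1)]

# Once the actual state matches the declaration, there is nothing left to do.
print(reconcile(desired, desired))  # []
```

With an imperative tool you would write the "start two more web instances" step yourself, and it would be wrong as soon as the starting point changed.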
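A pattern with per-service parameters can be as simple as a set of defaults plus an override. The sketch below uses plain Python and hypothetical values (`BASE_PATTERN`, the `billing` service, the image name); real teams would more likely use a dedicated templating tool, but the principle is the same.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with values from `override` applied recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# The common pattern maintained centrally (hypothetical defaults).
BASE_PATTERN = {
    "replicas": 2,
    "image": None,
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
    "env": {"LOG_LEVEL": "info"},
}

# A developer states only what is different for this particular service.
billing = deep_merge(
    BASE_PATTERN,
    {
        "image": "registry.example.com/billing:1.4",
        "resources": {"requests": {"memory": "512Mi"}},
    },
)

print(billing)
```

The service gets the larger memory request it asked for and inherits everything else, including any best practice later added to the base pattern.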
Mistake 6. Not using technologies and approaches that extend Kubernetes
The first thing concerns testing new Kubernetes versions. Kubernetes itself is an application with its own lifecycle that involves configuration, updates, server connection, and other similar tasks. It can be compared to a standard OS. The issues are basically the same, although Kubernetes is more complicated than an OS on a standalone server. This is why Kubernetes updates can cause trouble in app operation. It is a good idea to set up a test environment where apps can be checked against a new version.
The second thing is that Kubernetes needs to be extended. Its ecosystem comprises some two hundred products that are easily integrated with it via API, in particular SDN, Ingress, Prometheus, Fluentd, DSS, authorization, and so on. But don’t expect all of it to work well right out of the box: all of it requires further configuration.
Basic things worth focusing on are:
- app data storage: Ceph, GlusterFS, Minio;
- monitoring: Prometheus, Influx + Telegraf;
- log registration and storage: Fluentd, Elasticsearch, Loki;
- autoscaling: Metrics Server, Prometheus Adapter;
- safety control: Dex, Keycloak, Open Policy Agent;
- CI/CD tools: Gitlab, Jenkins, Argo CD, Teamcity;
- Docker Registry: Harbor, Nexus, Official Docker Registry;
- networks: Calico, Weave net, Cilium;
- storage and operation tools for the Serverless function: Kubeless, OpenFaaS, Fission;
- Service Mesh for data flow control in the app without code modification: Istio, Linkerd, Consul.
A company can get Kubernetes but be unaware that it lacks an out-of-the-box monitoring solution. After installation, they will realize they need the cluster itself to be monitored, just like the apps within.
And it is not some regular type of monitoring that Zabbix can manage. Listing server IPs is just not enough – Prometheus or Telegraf is required. It is the same story with logs. Out of the box, a cluster keeps a log history, but only for the apps currently running. This history vanishes after redeployment. With a monolith, you can go to the server and check or collect logs manually, but this won’t work in Kubernetes.
Reliable access to the cluster requires a fail-safe balancer like MetalLB or VirtualIP. Such tools come “out of the box” only in Managed solutions – for instance, our K8s aaS already has integrated balancers.
Mistake 7. Not using typical task solution automation
A great feature of Kubernetes for development is that routine operations can be avoided thanks to manifest patterns (yet writing manifests manually is still a frequent case).
Using manifest patterns leaves out a major part of manual work and promotes the centralized implementation of best practices. Another tool is operators, which automate routine procedures on cluster objects depending on conditions. An operator runs within the cluster and can, for instance, create a Redis instance without you describing all the manifests, deployments, PVs, etc. It creates the objects from an existing configuration with the necessary adjustments. Check here for the operator list.
Compared to Kubernetes manifest patterns, operators are more powerful, being independent apps. They can read configurations and, in parallel, run the app logic, interact with other systems, register in an Auto-Discovery system, and/or create dashboards in Grafana.
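The core of what an operator does for the user can be sketched as a function that expands one short, high-level description into the full set of objects. Everything here is hypothetical: the `example.com/v1alpha1` custom resource, its fields, and the `expand_redis` function are made up for illustration and do not describe any real Redis operator. A real operator also watches the cluster, reacts to changes, and does far more than generate manifests.

```python
def expand_redis(custom_resource: dict) -> list:
    """Turn a short 'Redis' description into the Kubernetes objects behind it."""
    name = custom_resource["metadata"]["name"]
    spec = custom_resource["spec"]
    labels = {"app": name}

    # Headless Service giving each pod a stable network identity.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"clusterIP": "None", "selector": labels, "ports": [{"port": 6379}]},
    }
    # StatefulSet with one persistent volume claim per pod.
    stateful_set = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": spec.get("replicas", 3),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "redis",
                            "image": spec.get("image", "redis:7"),
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ]
                },
            },
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": spec.get("storage", "1Gi")}},
                    },
                }
            ],
        },
    }
    return [service, stateful_set]


# All the user writes: a few lines instead of several manifests.
request = {
    "apiVersion": "example.com/v1alpha1",  # hypothetical API group
    "kind": "Redis",
    "metadata": {"name": "cache"},
    "spec": {"storage": "5Gi"},
}

objects = expand_redis(request)
print([obj["kind"] for obj in objects])
```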
Finally, infrastructure management is barely possible without IaC tools like Terraform.
Mistake 8. Believing that you can forget about Kubernetes after starting it in the cloud
A cluster can be spun up in the cloud in 10 minutes. K8s aaS providers try to implement as many of the add-ons required by Kubernetes and its apps as possible. These are used for automated updating and scaling, balancing, networking, and monitoring. The “as a Service” format makes cluster configuration and management easier. And yet, it doesn’t mean that a cloud cluster can simply be spun up and abandoned.
No K8s aaS provider can guarantee absolute availability of their service. A 99.95% SLA allows for roughly 4.4 hours of downtime a year, and a provider can tolerate that. If you need top availability, load balancing and replication can help. If a provider lets you spread the cluster across several availability zones, you should definitely take this chance.
Besides, providers are only responsible for their own part of the K8s aaS infrastructure. They answer for the service availability, which depends on the hardware, the virtualization platform, and the IaaS and PaaS components. At the same time, analyzing the workload, traffic, and performance is always the customer’s responsibility. The company’s employees are in charge of query tuning, capacity planning, autoscaling logic, disaster recovery (unless the provider’s disaster recovery service is used), security and user access configuration, database operation, and network configuration.
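The downtime figure is simple arithmetic: 0.05% of the 8,760 hours in a year is about 4.38 hours, which is within the five-hour bound. The sketch below computes it and also shows why several availability zones help. The multi-zone formula assumes that zones fail independently and that the service is down only when all of them are down at once, which is an idealization.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def max_downtime_hours(sla_percent: float) -> float:
    """Hours of downtime per year that still satisfy the given SLA."""
    return HOURS_PER_YEAR * (1 - sla_percent / 100)


def combined_availability(sla_percent: float, zones: int) -> float:
    """Availability in percent across `zones` zones.

    Assumption: zones fail independently, and the service is unavailable
    only when every zone is unavailable at the same time.
    """
    unavailability = (1 - sla_percent / 100) ** zones
    return (1 - unavailability) * 100


print(round(max_downtime_hours(99.95), 2))         # hours per year for one zone
print(round(combined_availability(99.95, 2), 6))   # percent for two zones
print(round(max_downtime_hours(combined_availability(99.95, 2)) * 60, 2))  # minutes per year
```

Under these assumptions, two zones bring the expected downtime from hours to well under a minute a year. Real zones are not fully independent, so the actual gain is smaller.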
The cloud can manage most of the infrastructure administration and management tasks – but not all of them. Prior to connecting K8s aaS, you really should check the list of things that your company will have to manage on its own.
Mistake 9. Starting a cluster at K8s aaS but neglecting other services offered by the platform
I’ve already mentioned obtaining DBaaS, object, and Stateful storages. Now, let me explain how Kubernetes synergizes with other services offered by the platforms and associated with machine learning and big data processing. Such a synergy powers neural network learning, helps to use resources reasonably, and performs all the calculations within the cluster.
Kubernetes is a perfect fit for data science and big data workloads. Analysts feel that building pipelines for big data processing is easier than using the data for decision-making automation and other tasks. Kubernetes lets you quickly bring up as much infrastructure for machine learning as necessary. Thanks to autoscaling, the cluster promptly responds to load increases. Machine learning often requires many resources here and now, and autoscaling scales them up and down as required.
Besides, there are many machine learning solutions tailored to Kubernetes, for example, Kubeflow. They can also be used together with K8s aaS, although they are started independently.
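How autoscaling decides on the number of instances can be shown with the rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below assumes that this is the autoscaler in use; the function name, the replica bounds, and the sample numbers are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the proportional scaling rule."""
    ratio = current_metric / target_metric
    # Small deviations from the target are ignored to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# 4 pods at 90% average CPU with a 60% target: scale up.
print(desired_replicas(4, 90, 60))
# Load drops to 20%: scale down.
print(desired_replicas(4, 20, 60))
# 62% is within tolerance of 60%: keep the current count.
print(desired_replicas(4, 62, 60))
```

For a training job that suddenly needs many workers, the same rule keeps adding pods until the metric returns to the target or the upper bound is reached.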
Mistake 10. Believing that Kubernetes is secure by default
One shouldn’t think that the cluster, apps, and data are automatically protected as long as default settings are unchanged.
We know cases when the customer’s security service had no knowledge of Kubernetes and didn’t consider it an infrastructure component. Administrators don’t think about security, whereas the information security department believes an antivirus will do. Out of the box, Kubernetes leaves security gaps. For instance, a miner once got into the JW Player development cluster through unfixed security vulnerabilities and used it for crypto mining.
This issue can be solved by introducing DevOps and then moving all the company’s departments to common tooling. The information security team should take part in the DevOps process, and DevSecOps practices should ensure automated security checks by building security tools into the pipelines.
Basic security practices:
- Pod security policy;
- Network policies;
- User authorization;
- Privilege restrictions.
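Two of these practices can be sketched as follows. The first function builds a NetworkPolicy that denies all incoming traffic to pods in a namespace unless another policy allows it. The second is a simplified, hypothetical check (`privilege_findings`) of a container's `securityContext`, the kind of thing that could run in a pipeline. The namespace name and the set of checks are assumptions for illustration, not a complete security baseline.

```python
def default_deny_ingress(namespace: str) -> dict:
    """NetworkPolicy selecting every pod in the namespace and allowing no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector matches all pods; no ingress rules means no traffic in.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


# A restrictive container securityContext to use as a starting point.
RESTRICTED_SECURITY_CONTEXT = {
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
    "capabilities": {"drop": ["ALL"]},
}


def privilege_findings(container: dict) -> list:
    """Return a list of privilege-related issues found in a container spec."""
    ctx = container.get("securityContext", {})
    findings = []
    if ctx.get("privileged"):
        findings.append("runs privileged")
    # Treated as allowed unless explicitly disabled.
    if ctx.get("allowPrivilegeEscalation", True):
        findings.append("privilege escalation allowed")
    if not ctx.get("runAsNonRoot", False):
        findings.append("may run as root")
    return findings


hardened = {"name": "api", "securityContext": RESTRICTED_SECURITY_CONTEXT}
default = {"name": "api"}

print(privilege_findings(hardened))
print(privilege_findings(default))
```

A container with no `securityContext` at all produces findings, which is exactly the point: the defaults are permissive until someone tightens them.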
Mistake 11. Implementing at any cost even when you don’t need it
Not every big company dealing with development really needs Kubernetes.
It is worth it when development is continuous, new releases are expected regularly, and/or HighLoad requirements have to be met.
In the case of a minor development project, after implementing Kubernetes you will have to maintain a massive architecture just for some simple code to work. If you’re going to implement Kubernetes, think about which of your company’s projects need it and will really benefit from it.
For all the advantages of Kubernetes, remember that it is a sophisticated technology. That is where all the issues described above come from. Before implementing it, it is worth getting the architecture ready, making sure that you have teammates who know this tool well, and listing the projects where Kubernetes can be of use. If you want to introduce this technology as soon as possible, we are eager to offer our help and support with configuring and operating our Cloud Kubernetes aaS.