Kata Containers is now the leading way to run containers inside isolated virtual machines for greater security. I'll show how to install it for use with containerd and Docker while keeping the ability to switch between release versions.
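One common way to keep several releases installed side by side is to unpack each one into its own directory and point a stable `current` symlink at the active release. Runtime configs can then refer to paths under `current/`, so switching versions does not require editing them. This is an assumption for illustration, not necessarily the article's method. The layout (`<root>/<version>` plus `<root>/current`) and the function name `switch_release` are hypothetical.

```python
import os
from pathlib import Path


def switch_release(root: Path, version: str) -> Path:
    """Atomically point root/"current" at the release unpacked in root/<version>.

    The directory layout is a hypothetical example, not an official Kata layout.
    """
    target = root / version
    if not target.is_dir():
        raise FileNotFoundError(f"release directory not found: {target}")

    link = root / "current"
    tmp_link = root / ".current.tmp"

    # Remove a leftover temporary link from an interrupted previous run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()

    # Create the new link under a temporary name with a relative target,
    # so the whole tree can be moved without breaking the link...
    tmp_link.symlink_to(version, target_is_directory=True)
    # ...then rename it over the old link; on POSIX the rename is atomic.
    os.replace(tmp_link, link)
    return link
```

Because the rename is atomic, anything that resolves `current` sees either the old release or the new one, never a missing link.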
HashiCorp Vault: gcs + gcpckms
Secret Management and Why It’s Important
Hi! My name is Evgeny, and I work as a Lead DevOps engineer at Exante. In this article, I will share our practical experience of setting up a highly available HashiCorp Vault with a Google Cloud Storage (GCS) backend and auto-unseal in Kubernetes (K8s).
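To make this setup concrete, here is a minimal sketch that renders a Vault server config combining the `gcs` storage backend (with HA enabled) and the `gcpckms` seal for auto-unseal. The bucket, project, key ring, key, addresses and TLS file paths are all placeholders. No credentials are set, on the assumption that Vault picks up Application Default Credentials (for example, through GKE Workload Identity). The Python wrapper is only a convenient way to show the HCL; it is not the author's tooling.

```python
def render_vault_config(
    bucket: str,
    project: str,
    region: str,
    key_ring: str,
    crypto_key: str,
    api_addr: str,
    cluster_addr: str,
) -> str:
    """Render a minimal Vault server config (HCL) for GCS storage + GCP KMS auto-unseal."""
    # No "credentials" settings: assumes Application Default Credentials are available.
    # TLS file paths below are hypothetical placeholders.
    return f'''storage "gcs" {{
  bucket     = "{bucket}"
  ha_enabled = "true"
}}

seal "gcpckms" {{
  project    = "{project}"
  region     = "{region}"
  key_ring   = "{key_ring}"
  crypto_key = "{crypto_key}"
}}

listener "tcp" {{
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file   = "/vault/tls/tls.crt"
  tls_key_file    = "/vault/tls/tls.key"
}}

api_addr     = "{api_addr}"
cluster_addr = "{cluster_addr}"
'''


print(render_vault_config(
    bucket="example-vault-storage",
    project="example-project",
    region="global",
    key_ring="vault",
    crypto_key="vault-unseal",
    api_addr="https://vault.example.internal:8200",
    cluster_addr="https://vault-0.vault-internal:8201",
))
```

With `ha_enabled` set, several Vault pods can share the same bucket, with one active node and the others on standby. The KMS key lets each pod unseal itself on restart without manual key entry.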
Our infrastructure used to consist of thousands of virtual and physical machines hosting our legacy services. Configuration files, including plain-text secrets, were distributed across these machines, both manually and with the help of Chef.
We decided to change the company’s strategy for several reasons: to accelerate code delivery processes, ensure continuous delivery, securely store secrets, and speed up the deployment of new applications and environments.
To achieve this, we moved our product to a cloud-native model, which required changing our approach to both development and infrastructure. This involved refactoring our legacy services, adopting a microservices architecture, deploying services to Kubernetes (K8s) in the cloud, and using managed services such as Redis and PostgreSQL.
In our case, everything had to change: the applications, the infrastructure, and the way we distribute configs and secrets. We chose Google Cloud as our cloud provider and HashiCorp Vault for storing secrets. We have made significant progress on this journey since then.
Why HashiCorp Vault?
There were several reasons:
Iterative CI development in GitLab for an automated testing framework
The story of how we evolved the CI process for a monolithic Python repository of tests: the challenges we faced and how we solved them. Covers Docker, linters, Allure TestOps, and more.
Argo CD vs Flux CD
Lately I have been seeing more and more debates about two popular GitOps tools: Argo CD and Flux CD.
In fact, I consider such debates unfounded, because I am deeply convinced that both tools deserve attention and each is good at solving its own range of problems.
In my professional work I actively use both. I want to share my opinion and use cases with you. I hope this article helps you choose the tool that best fits your needs.
Memory consumption of .NET applications on Linux
In this article, I will cover the memory consumption of .NET applications on Linux. First, we will look at the idea of virtual memory. Then we will examine the memory statistics that Linux provides, such as RSS (Resident Set Size), VmData, RES (Resident Memory Size), and swap. After that, we will dive into the specifics of .NET.
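Several of these statistics can be read directly from `/proc/<pid>/status`, where the kernel reports `VmRSS`, `VmData` and `VmSwap` in kB. `top`'s RES column roughly corresponds to `VmRSS`. A minimal sketch of a parser follows. The function name `parse_status` is illustrative. The check that `/proc/self/status` exists only keeps it from failing on non-Linux systems.

```python
from pathlib import Path

# Fields in /proc/<pid>/status matching the metrics discussed above (values in kB).
FIELDS = ("VmRSS", "VmData", "VmSwap")


def parse_status(text: str) -> dict[str, int]:
    """Extract the selected memory fields from the text of /proc/<pid>/status."""
    result: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            # Values look like "   78901 kB"; keep just the number.
            result[key] = int(value.split()[0])
    return result


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # only present on Linux
        print(parse_status(status.read_text()))
```

To inspect a running .NET process instead of the current one, read `/proc/<pid>/status` for that process's PID.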
Managing AWS Auto Scaling Group Instance Refresh: The Harmony of Terraform and Ansible
In DevOps, where automation is crucial, managing cloud resources and the way they are updated is vitally important. Many modern projects, particularly in AWS, rely on Auto Scaling Groups (ASG). This mechanism serves three key objectives: balancing load, increasing service reliability, and optimizing operational costs.
Imagine working at a company where you deploy applications on AWS. To streamline deployment and manage configurations more effectively, you use pre-built AMIs. These are created with tools like HashiCorp Packer, so your applications launch quickly and reliably. For the infrastructure itself, you use Terraform, the de facto standard at many large companies for managing cloud resources with the Infrastructure as Code (IaC) approach.
As an engineer, you sometimes need to move instances to a newer AMI, either to pick up the latest security patches or to ship new functionality. The challenge is to update an active ASG without downtime. You also need to make sure the new AMI performs as reliably as the existing one, balancing the need for updates against system stability and uptime.
Instance refresh is an important ASG feature that replaces the instances in a group while minimizing downtime, which keeps the service highly available. However, confirming that such updates succeed, especially in large, complex systems, can be a challenge. Terraform resources such as aws_autoscaling_group can start an instance refresh but do not track its progress. This limitation becomes apparent when other infrastructure changes, such as certificate renewals or DNS updates, depend on the state and version of the instances. Monitoring the update is essential to keep the infrastructure state accurate after Terraform finishes.
To overcome this challenge, Ansible can be used...
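The article solves this with Ansible. To illustrate the missing piece, progress tracking, here is a minimal Python sketch that polls the AWS API until an instance refresh reaches a terminal state. It is a stand-in to show the idea, not the article's playbook. It uses the real `describe_instance_refreshes` call of the boto3 Auto Scaling client. The function name, poll interval and timeout defaults are assumptions.

```python
import time

# Instance refresh statuses after which nothing more will happen.
TERMINAL_STATUSES = {"Successful", "Failed", "Cancelled", "RollbackSuccessful", "RollbackFailed"}


def wait_for_instance_refresh(client, asg_name, refresh_id,
                              poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll an ASG instance refresh until it finishes; return its final status.

    `client` is expected to behave like boto3.client("autoscaling").
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.describe_instance_refreshes(
            AutoScalingGroupName=asg_name,
            InstanceRefreshIds=[refresh_id],
        )
        refresh = response["InstanceRefreshes"][0]
        status = refresh["Status"]
        print(f"{refresh_id}: {status} ({refresh.get('PercentageComplete', 0)}% complete)")
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(f"instance refresh {refresh_id} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

In a pipeline, you would pass `boto3.client("autoscaling")` and fail the job unless the returned status is `"Successful"`. Only after that would dependent steps such as DNS or certificate updates run.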
The simple Nginx Unit, in simple terms
This article describes the new Nginx Unit web server. You will learn about the server itself and how to install and configure it: how to use listeners and routing, and how to install TLS certificates. The article shows how easy Unit is to work with and that huge configs are slowly becoming a thing of the past.
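Unit is configured with JSON sent to its control API. A typical request is a `PUT` of the whole document to `/config` over the control socket, whose path depends on the installation. The sketch below builds such a document with the three pieces the article mentions: a TLS listener, routes, and an application. The certificate bundle name, app path and module name are placeholders. The bundle would have to be uploaded to Unit's `/certificates` section beforehand.

```python
import json


def unit_config(app_path: str, cert_bundle: str) -> dict:
    """Build a minimal Nginx Unit config: TLS listener -> routes -> Python app."""
    return {
        "listeners": {
            # Listen on 443 and hand requests to the routes below; TLS uses an
            # already uploaded certificate bundle (name is a placeholder).
            "*:443": {"pass": "routes", "tls": {"certificate": cert_bundle}},
        },
        "routes": [
            # Answer health checks directly without touching the app.
            {"match": {"uri": "/health"}, "action": {"return": 200}},
            # Everything else goes to the application.
            {"action": {"pass": "applications/app"}},
        ],
        "applications": {
            # A WSGI app loaded from <app_path>/wsgi.py (hypothetical layout).
            "app": {"type": "python", "path": app_path, "module": "wsgi"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(unit_config("/srv/app", "bundle"), indent=2))
```

The whole setup fits in one short JSON document, which is the article's point about huge configs becoming a thing of the past.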
Apache Kafka and AWS S3: backup and restore
It hasn't been long since my last cheat sheet was published, and today it's time for a new one. This time my task was to find a way to back up Kafka to an S3 bucket under the following conditions:
Validation WebHook troubleshooting, how low can you go?
I'm Alex Movergan, DevOps team lead at Altenar. I focus on automation in general and on improving troubleshooting skills within my team. In this article, I'll share a captivating tale that revolves around Kubernetes, validation webhooks, Kubespray, and Calico.
Join me on this DevOps journey as we explore real-world scenarios and unravel the intricacies of troubleshooting in a Kubernetes environment.
Google SSO for Kibana: the straightforward way, on a basic license
As many times before, I am writing a cheat sheet after a task that made me search a lot and glue things together before I found a solution. Long story short, this time I was asked to set up Google SSO for Kibana without switching from the basic license to a paid one. Kibana already had authentication set up, and the customer wanted to log in with Google Workspace user accounts while keeping the user accounts that already existed in Kibana. There was no need for role mapping or other advanced features, just plain SSO. As you probably know, Elastic provides SSO only on paid licenses, so my only option was to use third-party software. But first things first, let's list the steps we need to go through:
About «free» #iam, #oidc, #saml, #etc
The task is to develop a solution that provides:
- Access control to web applications via #oidc/#saml
- Access control to vanilla #Kubernetes
- SSH access control to bare-metal hosts, using SSH certificates if possible
- Authorization of users in other server applications such as #Vault, #PostgreSQL, #Kafka, #ClickHouse, #MongoDB
- The ability to give users from third-party organizations access to certain resources based on group membership, etc.
- All of the above working everywhere, including the bare-metal environment
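The group-membership requirement usually comes down to checking a groups claim in an identity token against a per-resource allow list. Here is a minimal sketch of that check. It assumes the identity provider puts a `groups` claim in the token, which is a common convention rather than a core OIDC claim. It also assumes the token signature has already been verified. The resource names and group names are hypothetical.

```python
# Hypothetical policy: resource -> groups allowed to access it.
RESOURCE_GROUPS: dict[str, set[str]] = {
    "kubernetes/prod": {"platform-admins"},
    "vault": {"platform-admins", "security"},
    "postgresql/analytics": {"analysts", "partner-acme"},  # a third-party org's group
}


def is_allowed(claims: dict, resource: str) -> bool:
    """Return True if any of the user's groups may access the resource.

    `claims` must be the payload of an already signature-verified token;
    this function performs no token validation itself.
    """
    user_groups = set(claims.get("groups", []))
    allowed_groups = RESOURCE_GROUPS.get(resource, set())
    # Deny by default: unknown resources and users without groups get nothing.
    return bool(user_groups & allowed_groups)
```

Keeping the policy keyed by group rather than by user is what makes onboarding an external organization a matter of adding one group to the right resources.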
Data Phoenix Digest — ISSUE 2.2023
Video recording of our webinar about dstack and reproducible ML workflows, AVL binary tree operations, Ultralytics YOLOv8, training XGBoost, productionizing ML models, an introduction to forecasting ensembles, domain expansion of image generators, Muse, X-Decoder, Box2Mask, RoDynRF, AgileAvatar and more.
Understanding the Differences Between Kafka and RabbitMQ: in Simple Terms
Software message brokers have become the standard for building complex systems. However, not all IT specialists understand how these tools work. Pavel Malygin, Lead System Analyst at Innotech, dives into message brokers and explains how they are used.
An Antidote to Absent-Mindedness, or How I Gained Access to an OpenShift Node without an SSH Key
Typically, when a node drops out of an OpenShift cluster, the problem is resolved by simply restarting the offending node. But what should you do if you have forgotten the SSH key or left it at the office? You can try to restore access using your wits and knowledge of Linux commands. Renat Garaev, lead developer at Innotech, describes how he solved this puzzle and how it turned out.
IaC Development Life Cycle
This is a translation of my talk at T-Meetup: DevOps Life Cycle.
I'm sure you have heard of the SDLC (systems development life cycle). Can the same ideas be applied to IaC?
Best warnings of static analyzer
Everyone who runs a static analyzer on a project for the first time is a bit shocked by hundreds, thousands, or even tens of thousands of warnings. It can be frustrating. Is my code really that bad? Or is the analyzer lying? Filtering by severity helps, though not completely. That's why we thought about how to improve the first experience with the analyzer. Let me walk you through the new feature step by step...
Turning RTSP into WebRTC: how many cameras will the server withstand?
This article continues our series on load testing. Today we will go over the testing methodology and answer the question: "How many IP cameras can be connected to a WebRTC server?"
Load test of WebRTC recording on AWS
Remember how just a few years ago losing a camera at the end of a vacation was a disaster? All the memorable photos and videos disappeared along with the lost device. This is probably what prompted great minds to invent cloud storage, so that recordings no longer depend on the devices they were made on.
WebRTC face to face video chat. Load test
We continue our review of load test scenarios. In this article, we go over the testing methodology and run a load test to determine how many users can stream and watch at the same time, that is, simultaneously publish and play streams.
Load testing for WebRTC mixer
This article continues our series of write-ups about load testing our server. We have already discussed how to collect metrics and use them to choose hardware, and we have also reviewed various load testing methods. Today we will look at how the server handles stream mixing.