Good afternoon. My name is Anatoly Bobunov, and I am an SDET (Software Development Engineer in Test) at EXANTE. In this article, I will discuss the part of my work that involves CI (Continuous Integration) in our project. I will briefly describe the state of CI when I joined the company, talk about the key stages of its development, and share the challenges we faced along with the solutions we implemented. I understand that each segment of this article could be expanded into a separate text, but I decided to start with a historical overview.
Currently, we have a monolithic repository with automated tests for the entire backend. We write tests in Python and use Pytest as the test runner. The tests are executed inside a Docker container in GitLab CI. The results of the automated tests are sent to Allure TestOps, some analytics data goes to Grafana, and custom notifications are sent to Slack. Since there are many services to test, the tests for each service live in separate subfolders within the main tests folder.
State of CI at the Time of My Arrival
I joined the company to optimise the testing framework and address specific challenges faced by automation engineers. When I started, the pipeline had only one stage called "tests", where tests for each service were executed as separate jobs. The number of tests and the duration of job runs varied significantly across services, with execution times ranging from 5 minutes to 1 hour. Slack notifications were sent regarding the overall pipeline status, but one had to delve into the GitLab CI logs for detailed information.
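For context, the original configuration looked roughly like this (a reconstruction; job names and paths are illustrative):
stages:
  - tests

service_a_tests:
  stage: tests
  script:
    - pytest tests/service_a/

service_b_tests:
  stage: tests
  script:
    - pytest tests/service_b/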
Our company organises the process so that testers are divided into small teams, usually of one to four people. Each team is assigned a specific group of services to test, and testers primarily work with the part of the framework that interacts with their assigned services. As a result, some team members lack broader skills in the framework, while more experienced colleagues are short on time to focus on framework development and tackle crucial architectural issues.
Dockerfile for the Test Environment
Our testing team is distributed, with most members working on Linux and macOS, and only occasionally on Windows. To avoid compatibility issues with libraries and operating systems, we isolated the entire test environment within a Docker image.
Writing the Dockerfile wasn’t particularly challenging, but the resulting image size of 1.3 GB is somewhat of a concern. While this size is acceptable, I occasionally consider ways to reduce it. Below is an example of the Dockerfile that was written at that time:
# Use the official Python 3.9 image with a minimal configuration
FROM python:3.9-slim
# Copy all files from the current directory on the host to the ./project directory in the container
COPY ./ ./project
# Set the working directory inside the container
WORKDIR /project
# Add the /project directory to the PATH environment variable
ENV PATH=$PATH:/project
# Create the allure-results and log directories if they do not already exist
# allure-results/ - is required for Allure reports
# log/ - is where various log files are stored
RUN mkdir allure-results || true && mkdir log || true
# Update packages and install necessary dependencies, then clean up the cache to reduce the image size
RUN apt-get update && apt-get install --no-install-recommends -y wget python3-pip python3-dev python3-venv default-jre gcc procps \
&& apt-get -y autoremove --purge && apt-get -y clean \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/cache/apt
# Update pip and install dependencies from requirements.txt
RUN pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir -r requirements.txt
# Download the latest version of allurectl and make it executable
# allurectl - CLI for interacting with Allure TestOps
RUN wget https://github.com/allure-framework/allurectl/releases/latest/download/allurectl_linux_386 -O ./allurectl && chmod +x ./allurectl
As a side effect of this task, we came up with the idea that if a Merge Request (MR) includes changes to configuration files, library versions, or the Dockerfile, we would also build a test version of the Docker image. In the MR pipeline, this test image is used instead of the master-branch image to validate the MR code. Once the MR is merged into the master or dev branch, the test image replaces the corresponding master or dev image and becomes the primary one.
The Docker image tag is stored as a variable in docker.env and is passed through all jobs.
# Files That Should Trigger Docker Image Rebuild
.changes_def:
changes: &changes-def
- Dockerfile
- requirements.txt
- .dockerignore
- .gitlab/rebuild-docker-stage.yml
# Rules for Triggering Docker Image Builds
.rules_push:
rules:
- &rules-push
if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_REF_PROTECTED == "true"' # Master and Dev branches is protected
changes: *changes-def
.docker_def: &docker_def
image: docker:24
tags:
- docker
services:
- name: docker:dind
command: [ "--tls=false", "--insecure-registry=my-nexus-registry" ]
before_script:
- docker login -u $NEXUS_LOGIN -p $NEXUS_PASS $REGISTRY_URL
build_docker:
stage: build-docker
<<: *docker_def
script:
- |
DOCKER_BUILDKIT=1 docker build \
--tag $IMAGE_TAG . \
--cache-from $IMAGE_TAG \
--label "org.opencontainers.image.title=$CI_PROJECT_TITLE" \
--label "org.opencontainers.image.url=$CI_PROJECT_URL" \
--label "org.opencontainers.image.created=$CI_JOB_STARTED_AT" \
--label "org.opencontainers.image.revision=$CI_COMMIT_SHA" \
--label "org.opencontainers.image.author=$CI_COMMIT_AUTHOR" \
--label "org.opencontainers.image.version=$CI_COMMIT_REF_NAME"
- docker push $IMAGE_TAG
- echo "IMAGE_TAG=$IMAGE_TAG" >> docker.env
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
variables:
IMAGE_TAG: $IMAGE_TAG_BRANCH
changes: *changes-def
- *rules-push
artifacts:
reports:
dotenv: docker.env
expire_in: 1 day
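A minimal sketch of how a later job consumes that tag (the job name and test path are illustrative): because docker.env is published as a dotenv report, IMAGE_TAG is available to the jobs in later stages, which simply run inside the freshly built image.
run_core_tests:
  stage: tests
  tags:
    - docker
  # IMAGE_TAG is taken from the dotenv artifact when build_docker ran in this pipeline
  image: $IMAGE_TAG
  script:
    - pytest tests/core/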
In this way, we addressed two problems:
Created a stable environment for development.
Eliminated the need to manually rebuild the image when updating libraries in the project.
The main issue with this approach is the image build time. The image is rebuilt every time a new pipeline starts for an MR that touches the tracked files, and the build takes about 2 minutes on average. While this is not critical yet, it’s worth considering optimising this step.
Integration with Allure TestOps
One of the first tasks was to integrate Allure reporting. Allure TestOps is a platform for managing and analysing testing, providing tools for report creation, data visualisation, and integration with CI/CD processes. My colleague handled the installation of Allure TestOps in our environment. My role involved assisting with creating the necessary integrations, updating the pipeline, and enhancing the framework.
To send data to Allure TestOps, we need to:
Log in to Allure TestOps (allurectl auth login).
Create a test run in Allure TestOps (allurectl job-run start).
Send data to Allure TestOps.
To run the tests and send their results, we must already be logged in and have an active job run in Allure TestOps. It is also important to consider that multiple jobs can run simultaneously during the Test stage: if the MR changes code used for testing multiple services, the jobs for all of those services will be triggered.
We had to create a “temporary” intermediate stage called Prepare allure run, which handles all the preparatory work for interacting with Allure. All of this data is saved as job artifacts and passed on to the test jobs.
prepare_allure:
stage: prepare-allure-run
tags:
- docker
image: $IMAGE_TAG
script:
- export PIPELINE_DATE=$(date +%Y_%b_%d)
- export PIPELINE_DATETIME=$(date +%b_%d_%H:%M)
- echo "PIPELINE_DATE=$PIPELINE_DATE" >> variables.env
- echo "PIPELINE_DATETIME=$PIPELINE_DATETIME" >> variables.env
- export ALLURE_LAUNCH_TAGS="${PIPELINE_DATE}, ${ALLURE_LAUNCH_TAGS}"
- export ALLURE_LAUNCH_NAME="${PIPELINE_DATETIME} - ${ALLURE_LAUNCH_NAME}"
- allurectl auth login --endpoint $ALLURE_ENDPOINT --token $ALLURE_TOKEN
- allurectl job-run start --launch-name "${ALLURE_LAUNCH_NAME}" --launch-tags "${ALLURE_LAUNCH_TAGS}"
- allurectl job-run env >> variables.env
artifacts:
reports:
dotenv: variables.env
expire_in: 2 days
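One way to send the results is to wrap pytest with allurectl watch, which uploads them to the launch created above as the tests run. A simplified sketch of such a test job (the job name and test path are illustrative):
core_tests:
  stage: tests
  tags:
    - docker
  image: $IMAGE_TAG
  script:
    # the ALLURE_* variables exported by `allurectl job-run env` arrive here via the dotenv artifact
    - allurectl watch -- pytest tests/core/
  artifacts:
    name: "$CI_JOB_NAME"
    paths:
      - log/
      - pytest.log
    expire_in: 2 days
    when: always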
Collect Stage: Basic Code Correctness Check
Another pressing issue was the lack of a quick check that all tests still worked when changes were made to the src/ folder. Code often made it into the master branch that worked fine for running the tests of one service but broke the tests of another. Issues like forgetting to update models, adding or removing arguments in functions or methods, and similar problems were common. We needed a solution that would quickly show whether the automated tests were in working order.
At that time, I was familiarising myself with the framework, and my first step was to attempt running tests for each service. To check code correctness during this initial familiarisation phase, I ran pytest {path_to_suite} --collect-only. The --collect-only option makes pytest collect and display the list of all tests that would be executed without actually running them.
This idea was useful, so I created a separate stage for this check. This allowed the entire team to quickly verify that pytest could correctly collect all tests.
Implementing the collect stage reduced the time spent fixing "simple" coding errors and issues arising from incorrect configurations.
.collect: &collect
stage: collect-tests
tags:
- docker
image: $IMAGE_TAG
before_script:
- mkdir log || true
artifacts:
name: "$CI_JOB_NAME"
paths:
- log/
- pytest.log
expire_in: 3 days
eb_collect:
<<: *collect
script:
- pytest tests/core/ --collect-only
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event" || ($CI_PIPELINE_SOURCE == "push" && ($CI_COMMIT_BRANCH == "master" || $CI_COMMIT_BRANCH == "dev"))'
changes:
- tests/core/**/*
- src/**/*
Static Code Analysis Stage
During MR reviews, discussing coding style with colleagues was common, as everyone has their own preferences. To avoid unnecessary debates on this topic, the idea emerged to standardise the coding style within the project, so I added a static code analysis stage (linting, hereafter referred to as linters).
For our Python project, we chose the standard set of tools: black, isort, and flake8. However, we decided to abandon black due to its strict rules requiring extensive rewriting of old code, and similarly, we opted not to use mypy at this time. We plan to revisit these tools in the future.
Installing the libraries, configuring settings, and addressing initial errors went relatively quickly. More time was spent discussing and agreeing on a unified style within the team.
During the first month after introducing the linters stage, the jobs were configured with allow_failure: true, which allowed the pipeline to continue even if a lint job failed. This grace period gave testers time to analyse and fix the reported errors.
Once the basic rules were established and the most common code errors were corrected, the next step was to switch the linters stage to enforce strict code checking.
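The lint job itself is straightforward. A minimal sketch of what it can look like (the stage name and commands are illustrative); during the grace period it carried allow_failure: true, which was later removed to make the check blocking:
linters:
  stage: linters
  tags:
    - docker
  image: $IMAGE_TAG
  # allow_failure: true   # kept only during the first month
  script:
    - isort --check-only .
    - flake8 .
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'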
The implementation of the static code analysis stage allowed us to address several issues at once:
Accelerated the MR review process as we no longer needed to focus on stylistic errors in the code.
Improved overall project code quality.
Eliminated simple errors in the code, such as incorrect imports, calls to non-existent methods or variables, and many other common mistakes.
In addition to the linters, to give testers quicker feedback, we decided to adopt the pre-commit tool. Pre-commit is a tool for automatically running linters, tests, and other checks before each commit to the version control system. It ensures that code meets the established standards and does not contain simple errors before it gets committed to the repository. The checks run on every local commit to the active branch.
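A possible .pre-commit-config.yaml mirroring the CI linters (hook versions are illustrative):
repos:
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 6.1.0
    hooks:
      - id: flake8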
Smoke Test Stage
The next step was to implement a smoke test stage.
The main challenge was determining which tests should be considered smoke tests. During testing, any service under test actively interacts with other services. Many tests require a preliminary setup of the environment and test data. The tests were complex and had many dependencies. Mocks and stubs were not available at that time, making it challenging to identify a minimal set of tests that would accurately indicate whether a specific service was ready for comprehensive testing.
After discussing with the team, we identified an initial set of smoke tests for some services. We marked these tests with @pytest.mark.smoke and separated their execution into a distinct stage in GitLab CI. The failure of any test in the smoke stage halts further pipeline progression, as there is little value in running a full test suite if critical functionality is broken.
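A simplified sketch of such a job (the job name, stage name, and path are illustrative); the -m smoke option tells pytest to run only the tests marked with @pytest.mark.smoke:
core_smoke:
  stage: smoke
  tags:
    - docker
  image: $IMAGE_TAG
  script:
    - pytest tests/core/ -m smoke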
As a result, we started receiving feedback on test failures much faster. This rapid feedback was especially valuable for test suites that took over an hour to execute at the time. It significantly accelerated identifying and fixing errors, improving overall code quality and development efficiency.
Another benefit of analysing tests and attempting to identify a set of smoke tests was that testers realised that tests should be as atomic as possible.
Health Checks Stage
In the previous sections, I described how we implemented the linters, the test collection stage, and the smoke tests. These steps made our test code more robust and kept it in working order. However, we frequently encountered situations where a service was unavailable due to infrastructure work, such as updates or maintenance, and tests failed even though the issue was not on our side.
At the same time, our company began migrating the tested services to Kubernetes. Consequently, we decided to advance the idea of implementing API requests to check the status of services, such as status endpoints and health checks. This led to the creation of automated tests to verify the service status, and a separate health checks stage was added to the pipeline. These tests were placed in a separate health checks folder.
The health checks stage was set to run after the linters and collection stages but before the main test execution, as there is no point in starting tests if the service is unavailable.
With the introduction of the healthchecks stage, we began receiving feedback on pipeline failures more quickly, and testers spent less time identifying the cause of the issues.
.healthcheck_def: &healthcheck_def
stage: healthcheck
tags:
- docker
image: $IMAGE_TAG
script:
- $TIMEOUT pytest $PYTEST_ARGS $HEALTH_CHECKS_SET
artifacts:
name: "$CI_JOB_NAME"
paths:
- log/
- pytest.log
expire_in: 14 days
healthcheck_for_article:
<<: *healthcheck_def
variables:
HEALTH_CHECKS_SET: "tests/healthchecks/test_health_for_article.py"
rules:
- *base_rules
- if: '$CI_PIPELINE_SOURCE == "merge_request_event" || ($CI_PIPELINE_SOURCE == "push" && ($CI_COMMIT_BRANCH == "master" || $CI_COMMIT_BRANCH == "dev"))'
changes:
- tests/core/suite_for_article/**/*
- tests/healthchecks/test_health_for_article.py
Internal Tests Stage
Here's a brief overview of the internal tests stage. In addition to the code written for interacting with services, our repository contains a folder with scripts used for preparing test data and various internal utilities. To monitor the functionality of these scripts, we introduced the Internal tests stage.
During this stage, unit tests are executed to verify the functionality of these scripts. The stage is triggered only if there are changes in the scripts folder within the MR.
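A minimal sketch of this job, assuming the helper scripts live in a scripts/ folder and their unit tests in a dedicated test folder (names and paths are illustrative):
internal_tests:
  stage: internal-tests
  tags:
    - docker
  image: $IMAGE_TAG
  script:
    - pytest tests/internal/
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      changes:
        - scripts/**/*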
Debugging in GitLab CI
Sometimes, to verify a theory or debug an issue, we wanted to run code within GitLab CI. It was time to recall the magical "Run pipeline" button.
The new idea was the ability to custom-run any job with a shortened version of the pipeline that skips certain checks. My colleague suggested using predefined ("prefilled") pipeline variables for this purpose.
After discussing with the team and reviewing the ideas, we implemented an approach that allowed any team member to go to the Pipelines tab, click the "Run pipeline" button, and, by selecting predefined options, create the desired test run. Below is an example of code where predefined variables are defined, allowing the selection of a test environment for running automated tests.
variables:
DEPLOY_ENVIRONMENT:
description: "Select the deployment target. Valid options are: 'test', 'stage'"
value: "test"
options:
- "test"
- "stage"
We also added the ability to run any tests without being tied to a specific service or configuration. This was implemented as a custom run in which the PYTEST_ARGS and TEST_PATH parameters can pass any arguments to pytest - directories, files, classes, logging levels, and so on.
.custom_def: &custom_def
stage: tests
tags:
- docker
image: $IMAGE_TAG
allow_failure: true
before_script:
- *printenv
rules:
- *rules-web
artifacts:
name: "$CI_JOB_NAME"
paths:
- log/
- pytest.log
expire_in: 2 days
when: always
custom_run:
<<: *custom_def
script:
- pytest $PYTEST_ARGS $TEST_PATH
During the custom run in the pipeline, only one job is executed in the Tests stage.
Summary
Below are the flowcharts of our pipelines to provide an overview of what was achieved. The first flowchart represents the full pipeline, which is run infrequently, primarily in MRs, where it's crucial to prevent bad code from merging, so all checks are included.
The following flowchart shows an example of implementing nightly (scheduled) runs and acceptance tests. The pipeline for both cases is the same: we check that the service is live, prepare the Allure report, and run all tests for the service.
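Scheduled runs are distinguished in the configuration by the pipeline source. A minimal sketch of the kind of rule used for a nightly job (the job shown is illustrative):
nightly_full_run:
  stage: tests
  tags:
    - docker
  image: $IMAGE_TAG
  script:
    - allurectl watch -- pytest tests/
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'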
Currently, I am working on optimising the entire pipeline to speed it up and reduce the total run time of automated tests to at most 5 minutes.
Thank you for reading, and I welcome your feedback. If you want to discuss the article, let’s chat in the comments below.