
What's New in the Angie 1.9 Web Server (an nginx fork) and What to Expect from 1.10?

Colleagues meeting in the lobby of our office (AI-processed in Miyazaki style)

You may have already seen in the news that on the eve of Cosmonautics Day we published Angie 1.9.0, a new stable release of the nginx fork that continues to be developed by a team of former nginx developers. We try to ship new stable versions roughly every quarter and delight users with numerous improvements. This release is no exception, but it's one thing to read a dry changelog and quite another to look at the new functionality in more detail and learn how, and in which cases, it can be applied.

The list of innovations that we will discuss in more detail:

  • Saving shared memory zones with cache index to disk;

  • Persistent switching to a backup group of proxied servers;

  • 0-RTT in the stream module;

  • New busy status for proxied servers in the built-in statistics API;

  • Improvements to the ACME module, which automates obtaining TLS certificates from Let's Encrypt and other certificate authorities;

  • Caching TLS certificates when using variables.

With one exception, all of this is included in the latest open source version of Angie and is available right now in our repositories.

And in addition, I'll briefly outline what to expect from the upcoming Angie 1.10, which should appear sometime in late June or early July.

Saving the shared memory zone with cache index to disk

One of the most interesting innovations in the fresh 1.9 release is the ability to save the shared memory zone that stores the cache index to disk.

For this purpose, in the proxy_cache_path directive, in addition to the name and size of the zone, you can now specify the path to the file:

proxy_cache_path  my_cache keys_zone=my_czone:256m:file=/path/to/my_cache.zone;

When shutting down, all data from the zone will be unloaded to the file, and when starting up, it will be loaded back from the file into shared memory.

What does this give us? Previously, a special process, the so-called cache loader, was used to load the cache after a server restart. It traversed all files in the cache directory and loaded their metadata into the index stored in shared memory. When there are many files, this can take minutes, hours, or even days, while creating additional, sometimes significant, load on the disk. By the way, you can watch this process in our monitoring: until the cache has been fully reloaded, the cold field reports true. Putting such a server back into rotation before that is not always an option, since the extra load can push latency to unacceptable levels, so after a restart you may have to wait all that time before returning the server to service.
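If the statistics API is exposed in the configuration, this flag is easy to poll. Below is a minimal sketch of such an exposure; the location prefix, the exact API path and the access rules here are illustrative assumptions, not part of the release notes:

location /status/ {
    # Illustrative: expose Angie's built-in statistics API; the cache zone
    # state, including the "cold" flag mentioned above, is then expected
    # under /status/http/caches/<zone name>.
    api /status/;

    allow 127.0.0.1;
    deny  all;
}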

But if the index stored in shared memory is saved to disk as a single file, loading it back into memory takes mere moments instead of minutes, hours, or even days. The "cache loader" process is therefore not needed at all, and traffic can be directed to the server immediately after Angie starts.

By the way, the saved index along with the cache can be transferred to another server, provided that the architectures and build parameters match.

In future releases, we plan to add saving shared memory zones to other modules as well.

Persistent switching to a backup group of proxied servers (PRO only)

Another interesting feature that has been added, but this time only to the PRO version, concerns the HTTP load balancer. The new backup_switch directive now allows changing the logic of switching to a backup group of servers:

upstream backend {
    zone backend 1m;

    backup_switch permanent=5m;

    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;

    server backup1.example.com   backup;
    server backup2.example.com   backup;
    server backup3.example.com   backup;
}

By default, for each incoming request Angie first tries to find a live server in the main group (the servers without the backup option), and only if there are none does it look in the backup group. Thus, as soon as one of the servers in the main group becomes available again, all subsequent traffic is immediately directed back to it. In some configurations such behavior is unacceptable: once traffic has switched to the backup group, requests should continue to be balanced there. This mode is enabled by the directive:

backup_switch permanent;

In this case, as soon as Angie finds that there are no live servers in the main group, it switches to the backup group and keeps balancing traffic there for as long as the backup group has live servers, or until the configuration is reloaded.

If a timeout is specified in the permanent option, for example permanent=5m, Angie will still attempt to select a server from the main group, but no more often than once per the specified interval. If such an attempt succeeds, traffic switches back to the main group. This mode is a compromise: it protects against overly frequent flapping between the backup and main groups in cases where switching back is still desired, but the servers in the main group remain unstable or overloaded.

In addition, a new backup_switch object has appeared in the statistics API; it contains a numeric field active and is only present when the backup_switch permanent mode is enabled. The active field shows which group is currently active: 0 for the main group and 1 for the backup group. In the extended version of the load balancer, Angie ADC, more than one backup group can be configured, so there the value of active can be greater than 1.

If the option is set with a time interval (for example, permanent=1m), then when switching to a backup group, the backup_switch object also includes a timeout field, which contains the number of milliseconds until the next attempt to switch to the main group.
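To illustrate, with backup_switch permanent=1m enabled and the backup group currently serving traffic, the corresponding fragment of the upstream's statistics entry could look roughly like this (a schematic sketch based only on the fields described above; surrounding fields and the exact placement in the API tree are omitted):

"backup_switch": {
    "active": 1,
    "timeout": 42350
}

Here active equal to 1 means the backup group is in use, and timeout shows that the next attempt to return to the main group is due in about 42 seconds.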

In future versions, we plan to support even more complex logic for switching between backup groups (for example, based on the number of available servers in a group, their load, and so on), as well as a new API interface that will make it possible to force a group switch with an external command.

Next, let's talk about simpler but useful functions that have been implemented in both the open source version and PRO.

0-RTT in the stream module

The ability to use the TLS 1.3 Early Data (0-RTT) mechanism, which was previously only available in the HTTP module, has been added to the stream module. It is enabled with the same directive:

ssl_early_data on;

This mechanism allows the client to send data before the TLSv1.3 handshake has fully completed, which reduces the delay between establishing the connection and processing its data during TLS termination. Keep in mind, however, that early data is vulnerable to so-called replay attacks, so it should only be enabled if the web service itself, or the protocol proxied through the stream module, is protected against such attacks.
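For example, a minimal sketch of a stream proxy terminating TLS with early data enabled might look like this (the port, backend address and certificate paths are placeholders):

stream {
    server {
        listen              12345 ssl;

        ssl_certificate     /path/to/cert.pem;
        ssl_certificate_key /path/to/key.pem;
        ssl_protocols       TLSv1.3;

        # Accept TLSv1.3 early data; enable only if the proxied protocol
        # is safe against replay attacks.
        ssl_early_data      on;

        proxy_pass          backend.example.com:12345;
    }
}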

New "busy" status for proxied servers

The next innovation is the new busy status, which can be displayed in the state field for proxied servers in our statistics API. This status can only be seen if the server configuration in the upstream group has a max_conns limitation:

server backend.example.com:80 max_conns=128;

As you might guess, when the server hits this connection limit, its state changes to busy and remains so until the number of connections drops below the limit (or the server goes down and transitions to one of the other states: down/unhealthy/unavailable). Previously, such a server simply stayed in the up state as long as it was alive, even though requests could not be directed to it because of the max_conns limit. The additional status was added to better indicate this condition and the reason why new requests are not being sent to the server.
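Schematically, once the limit from the example above is reached, the peer's entry in the statistics might report something like the following (a simplified, hypothetical fragment; the real object contains more fields, and peers may be keyed differently):

"backend.example.com:80": {
    "state": "busy"
}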

ACME module improvements

The next group of improvements concerns the further development of automatic certificate issuance via ACME. They improve the user experience with this module and were developed based on feedback from our Telegram community.

A new renew_on_load parameter has been added to acme_client; it makes it easy to force certificate renewal when the configuration is loaded, should that be needed for some reason (for example, if a key is known to have leaked).

The behavior of the enabled=off option has also changed and become more user-friendly: it now temporarily disables certificate renewal, while previously issued certificates remain accessible and can continue to be used in the configuration.
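Both options are parameters of the acme_client directive itself. A rough sketch (the client name and the ACME directory URL below are only an example):

# Force renewal of this client's certificates on configuration load:
acme_client example https://acme-v02.api.letsencrypt.org/directory renew_on_load;

# Or temporarily pause renewals while keeping previously issued
# certificates available to the configuration:
# acme_client example https://acme-v02.api.letsencrypt.org/directory enabled=off;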

The approach to validating the ACME directives has also been revised in the new version. Previously, each acme_client directive required at least one acme directive in a server block, which was not always convenient. Users often generate the Angie configuration dynamically or include parts of it from separate files, and since acme_client and acme are set at different levels, it is easy to end up with a configuration variant in which no acme directive refers to a given acme_client. Such configurations used to be considered invalid; now the check is more targeted, and an error is reported only if an acme directive is specified in a server block that has no valid domain name in server_name.

By the way, the easiest way to get acquainted with the capabilities of automatic TLS certificate issuance is to read the guide. In particular, it explains that for issuing wildcard certificates, Angie can handle the DNS requests from the ACME server itself, which significantly simplifies the setup.

Caching TLS certificates when using variables

And of course, we ported all the functionality from nginx 1.27.4, including its most significant new feature: caching of dynamic TLS certificates specified via variables. It is configured with the corresponding directives: ssl_object_cache_inheritable, ssl_certificate_cache, proxy_ssl_certificate_cache, grpc_ssl_certificate_cache and uwsgi_ssl_certificate_cache, and it reduces CPU load and latency when variables are used in the ssl_certificate, proxy_ssl_certificate and similar directives. This is especially relevant in combination with our ACME module, where certificates are referenced through variables:

ssl_certificate_cache max=10 inactive=6h valid=1d;

server {
    listen 443 ssl;

    server_name example.com www.example.com;

    acme example;

    ssl_certificate $acme_cert_example;
    ssl_certificate_key $acme_cert_key_example;
}

Other useful minor improvements

Among the minor improvements: we added the build date and time to both the statistics API and the command-line output of the -V flag. This is especially useful for telling individual builds apart in our "nightly" build repositories, which have recently become available to everyone at https://download.angie.software/angie-nightly/, complete with all third-party modules. It will also come in handy for those who build Angie from source themselves.

In addition, we fixed an issue with the inheritance of the proxy_ssl_certificate and proxy_ssl_certificate_key directives that manifested itself when variables were used in them in builds with NTLS (i.e., with the Tongsuo TLS library).

That's probably all regarding the 1.9.0 release.

Conclusion and plans for the next release

Starting with 1.10.0, we have a lot of interesting things coming. In particular, the ability to automatically update the list of proxied servers in upstream blocks based on Docker container labels is almost ready and currently under review. With it, Angie will be able to connect to the Docker API and monitor events: when you start a container with the appropriate label, its IP address and the specified port are immediately added to the list of servers in the upstream block without reloading the configuration; when the container stops, the server is removed; and pausing a container puts the corresponding server into the down state.

Currently, the list of servers in upstream groups can be updated via DNS, including from SRV records, and in the PRO version additionally via the REST configuration API.

Also coming soon is a new module for customizing the collected statistics, which will let users define their own counters in the configuration, calculated from variables or their combinations. You will thus be able to count anything your imagination allows: for example, set up a metric that tracks the average traffic volume accounted for by the most requested URIs. Why not? Whatever you configure is what gets counted, giving you very flexible statistics that can be requested through our API in JSON format or exported directly to Prometheus.

Statistics customization, Docker API integration, and other no less interesting capabilities will appear in the next version, which, as usual, we aim to release at least once every three months.

In addition, I hope to find enough time to bring the long-awaited application launch capability, which I announced in my talk at the last HighLoad++ conference, to release.

Thank you for your attention. Stay tuned!
