A brief overview of XHTTP for VLESS: what, why, and how / Habr

We were asked to talk about ~~the protocol~~ technology XHTTP in the context of XRay, VLESS, and others. You asked for it, so here it is!

Unfortunately, there is almost no official documentation on XHTTP apart from one rather peculiar post on GitHub, which is hard to follow for people who are 'not in the know'. There is also a Russian translation from Chinese in the comments, but it's so bad that it would be better if it didn't exist (although you can read it for a good laugh: the 'TCP head blocking' alone is worth it).

First, a bit of history. The classic use of VLESS and similar proxy protocols (including with XTLS-Reality) involves the client connecting directly to a proxy server running on some VPS. However, in many countries (including Russia), entire subnets of popular hosting providers have started to be blocked (or throttled), and in other countries, censors have begun to monitor connections to 'single' addresses with high traffic volumes. Therefore, ideas of connecting to proxy servers through CDNs (Content Delivery Networks) have long been considered and tested. Most often, the websocket transport was used for this, but this option has two major drawbacks. First, it has one characteristic feature (I won't specify it here, so as not to make the RKN's job easier). Second, the number of CDNs that support websocket proxying is not that large, and it would be desirable to be able to proxy through those that do not.

So the meek transport was invented, first in the well-known Tor project for its bridges. It carries data as a series of HTTP request-response pairs, which makes it possible to reach bridges (proxies) through any CDN. A little later, the same transport was implemented in the briefly resurrected V2Ray. But meek has two very significant drawbacks that stem from its operating principle: the speed is very low (in effect, transmission is half-duplex, with huge overhead from the constant requests and responses), and because of the huge number of GET/POST requests every second, free CDNs can quickly kick us out, and paid ones can present a hefty bill.
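
To get a feel for why strict request-response polling is both slow and request-hungry, here is a back-of-the-envelope model. It is only a sketch: the function name, the per-request payload limit, and the round-trip time are illustrative assumptions, not figures taken from meek itself.

```python
import math

def polling_cost(payload_bytes: int, max_body_bytes: int, rtt_s: float):
    """Rough cost of pushing a payload through strict request-response polling.

    Assumes one request in flight at a time (half-duplex) and at most
    max_body_bytes of payload per request. Both limits are hypothetical.
    """
    requests = math.ceil(payload_bytes / max_body_bytes)
    seconds = requests * rtt_s            # every request waits for its response
    throughput = payload_bytes / seconds  # bytes per second, best case
    return requests, seconds, throughput

# 10 MB of data, 64 kB per request, 100 ms round trip (all assumed values)
requests, seconds, throughput = polling_cost(10_000_000, 64_000, 0.1)
print(requests, "requests,", round(seconds, 1), "s,", round(throughput), "B/s")
```

In this model the request count grows linearly with the amount of data and the throughput is capped by the round-trip time, which matches the two drawbacks described above.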

The authors of XRay, however, came up with an excellent idea on how to improve this mechanism: what if we split reception and transmission into two connections, one for receiving and one for sending? This is how the SplitHTTP transport appeared, which, as a result of further development, turned into XHTTP.

Let me clarify two things right away. First, XHTTP is not a protocol, but a transport. It is usually used for the well-known VLESS protocol, although in theory it can be used for others. Second, XHTTP was originally designed to work through a CDN, but in fact it can be used without a CDN for a direct connection to the server, and this also provides certain advantages.

XHTTP has three modes of operation:

packet-up: the slowest mode (though still much faster than meek), but the one compatible with almost all web servers and CDNs. It uses one connection for transmitting data from the server to the client, and many short-lived connections for transmitting data from the client to the server.
(Here and below, 'connection' means not a TCP connection but an HTTP request-response exchange. Many of these can share a single TCP connection, but in any case the TCP connections for 'outbound' and 'inbound' traffic will be different.)
stream-up: faster, but with worse compatibility, since it only gets through certain (or specifically configured) web servers. One long-lived connection for 'client→server' transmission, one long-lived connection for 'server→client' transmission.
stream-one: the only mode that does not split connections for 'outbound' and 'inbound' traffic, but always transmits data in both directions within a single connection. In essence, it's the same as a regular VLESS with an HTTP header (there used to be such a transport in XRay). It only gets through Nginx configured with the grpc_pass directive and through Cloudflare with the gRPC feature enabled (gRPC is not actually used there; it's just a decoy so that Nginx and CF do not interfere with the transmitted requests and responses).
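
The differences between the three modes can be summarised in a small model. This is a sketch for illustration only: the helper names, the two boolean path capabilities, and the `max_post_bytes` parameter are assumptions made for the example, not XRay's actual API or option names.

```python
import math

MODES = ("packet-up", "stream-up", "stream-one")

def http_exchanges(mode: str, upload_bytes: int, max_post_bytes: int) -> int:
    """Count HTTP request-response exchanges used by one proxied stream."""
    if mode == "packet-up":
        # one long-lived download + one short-lived upload request per chunk
        return 1 + math.ceil(upload_bytes / max_post_bytes)
    if mode == "stream-up":
        return 2  # one long-lived upload + one long-lived download
    if mode == "stream-one":
        return 1  # both directions inside a single exchange
    raise ValueError(f"unknown mode: {mode}")

def compatible_modes(path_streams_uploads: bool, path_passes_grpc: bool) -> list:
    """List the modes that can get through a given web server or CDN path."""
    modes = ["packet-up"]  # works almost everywhere
    if path_streams_uploads:
        modes.append("stream-up")
    if path_passes_grpc:
        modes.append("stream-one")
    return modes

for mode in MODES:
    print(mode, http_exchanges(mode, upload_bytes=1_000_000, max_post_bytes=100_000))
```

For example, `compatible_modes(False, False)` leaves only packet-up, which is the situation with a CDN that buffers request bodies and knows nothing about gRPC.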

So what are the advantages of XHTTP?

1. The ability to connect to the server not only via HTTPS, but also via QUIC (you need to explicitly set ALPN to h3 on the client). This can work better on an unstable connection, or when HTTPS is blocked but the censor forgot to block QUIC.
2. The ability to connect to the proxy server using TLS v1.2 while presenting the fingerprint of a real web server. Let me explain this in more detail. A fingerprint is a set of specific characteristics of the TLS implementation and the overall behavior of a web server: its distinctive features, so to speak. The fingerprints of an XRay server and a real web server (e.g., Nginx, Caddy, etc.) are different, and this can be used to detect a proxy. To avoid this, XTLS-Reality is usually used in the so-called 'steal from yourself' mode, where you masquerade as your own web server running on the same host. The problem is that XTLS-Reality only works over TLS v1.3, and the RKN has been caught more than once blocking TLS v1.3 towards popular hosting providers. Using XHTTP, you can connect to the proxy over TLS v1.2, and the server's fingerprint will be authentic, because a real Nginx will be listening on port 443, and XRay will be behind it.
3. The ability to connect through CDNs, including those that do not support Websockets and gRPC. Moreover, some CDNs still allow domain fronting, which MiraclePtr talked about in his article. This way, you can masquerade as someone else's domain even better than with XTLS-Reality: when you connect to the proxy, you not only connect with the SNI of some other domain and receive its real certificate in response, but you also connect to the exact IP address that this domain resolves to. And you are not alone: thousands of other people from all over the country visiting this site connect to this address with the same SNI. Perfect.
4. You can use the browser dialer. This solves a fingerprint problem similar to the one in point 2, but for the client rather than the server. XRay and the like use the uTLS library to imitate the fingerprints of popular browsers, but this masquerade is not 100% perfect. With a browser dialer, the scheme looks like this: you open a page with a special script in your browser, and your XRay client connects not directly to the proxy, but first to the browser, and then the script in the browser connects to the proxy. As a result, the client's fingerprint and behavior are as similar to a real browser as possible, because the client is, in fact, the real browser itself.
5. And finally, the best part. In packet-up and stream-up modes, the transmitted data is split across at least two connections, 'outbound' and 'inbound', which greatly complicates the analysis of traffic patterns and all sorts of TLS-inside-TLS detection. But that's not all. Who said that these two connections must be established in the same way and to the same address? XRay allows you to configure them independently. As a result, you can send data 'outbound' via QUIC and receive it 'inbound' via HTTPS. Or send data 'outbound' over IPv6 and receive it 'inbound' over IPv4. Or connect 'outbound' and 'inbound' to different IPv6 addresses. You can even have two connections to two different servers with different hosting providers in different countries. You can send data one way directly and the other way through a CDN. And of course, in all these cases, you can use different domains. In short, the only limit is your imagination. All this makes analysis and blocking much more complicated.
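
To make the idea of independently configured directions concrete, here is a sketch that assembles a client-side transport fragment with upload over QUIC (ALPN h3) and download over ordinary HTTPS (ALPN h2) to a different host. Treat everything in it as an assumption: the hostnames and the path are made up, the helper functions are hypothetical, and the key names (`xhttpSettings`, `extra`, `downloadSettings`) follow my reading of the upstream examples and may differ between XRay versions, so check them against xray-examples before use.

```python
import json

def leg(address: str, alpn: str, path: str = "/xhttp-demo") -> dict:
    """One direction of the tunnel: where to connect and with which ALPN.

    Hypothetical helper; the hostnames and path used below are placeholders.
    """
    return {
        "address": address,
        "port": 443,
        "network": "xhttp",
        "security": "tls",
        "tlsSettings": {"serverName": address, "alpn": [alpn]},
        "xhttpSettings": {"path": path, "mode": "packet-up"},
    }

def split_stream_settings(upload: dict, download: dict) -> dict:
    """Nest the download leg inside the upload leg's transport settings.

    The upload address and port are dropped here because they belong to the
    outbound's server entry, not to the transport settings (assumed layout).
    """
    stream = {k: v for k, v in upload.items() if k not in ("address", "port")}
    stream["xhttpSettings"] = {
        **upload["xhttpSettings"],
        "extra": {"downloadSettings": download},
    }
    return stream

upload = leg("up.example.com", "h3")      # 'outbound' data via QUIC
download = leg("down.example.net", "h2")  # 'inbound' data via HTTPS, other host
stream_settings = split_stream_settings(upload, download)
print(json.dumps(stream_settings, indent=2))
```

The same shape covers the other combinations mentioned above: point the two legs at an IPv6 and an IPv4 address, or at a direct server and a CDN hostname.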
A few important points:
1. Along with XHTTP, a new multiplexing algorithm, XMUX, was introduced. There's nothing particularly interesting there, but also nothing complicated — those interested can use an auto-translator to translate the GitHub article from the link above into English, it's mostly understandable.
2. XHTTP cannot be used with XTLS-Vision. Protection against tls-inside-tls detection is provided by multiplexing (the same XMUX from the previous point) and splitting the 'outbound-inbound' streams (as mentioned above).
3. XHTTP can be used with XTLS-Reality (in this case, stream-one mode will be selected by default unless otherwise specified).
4. XHTTP is under active development, so it's important that the XRay versions on the client and server are the same — otherwise, you might experience strange glitches, or it might not work at all.
5. Clients based on Sing-box do not support XHTTP. Those based on XRay do. v2rayN and v2rayNG work without any problems.
6. An example configuration of XHTTP for the XRay client and server, as well as for Nginx, can be found in the xray-examples.

Instead of a conclusion. The most important thing

Over the past year, we have observed that the RKN, unable to cope with protocol detection (protocols can be detected by indirect signs or deep statistical analysis, but this requires brains and money, and while they seem to have the money, the brains are apparently in shorter supply), has switched to cruder measures with stronger side effects, which we could sometimes observe in some regions: blocking HTTPS towards popular hosting providers, blocking TLS v1.3 to them as well, blocking some CDNs, blocking unidentified protocols, shaping SSH connections after a certain amount of data has been transferred, and even temporarily (from 15 minutes to a couple of days) tightening blocks for specific subscribers who have triggered some DPI heuristics.

Therefore, the conclusion suggests itself that future tactics for fighting censorship should be based not on inventing ever more sophisticated protocols and masking methods, but on exploiting the features of the blocking mechanisms themselves, at least so that operating those mechanisms becomes more and more expensive and implementing them becomes more and more difficult.

For example, currently, in most cases, detection and blocking are not applied to traffic within the country. This suggests an option where the first destination point when connecting to a proxy could be some cheap domestic VPS (to which you can connect in any way you like, even with old shadowsocks, or with Reality masquerading as a Russian website), and from there the traffic would go to a proxy abroad.

And if the hosting provider also has DPI and it starts to go wild, remember the reverse proxy mechanism with its bridges and portals, which MiraclePtr also talked about: 'return' traffic is usually not filtered either, and filtering it would require far more resources. And by the way, since CDNs are mentioned in this article, here's another hint: CDNs don't have to be foreign. Think about it.