CacheBrowser experiment: bypassing the Chinese firewall without a proxy by caching content

Original author: Иван Сергеев

Today, a significant portion of all Internet content is distributed through CDNs (content delivery networks). At the same time, there is almost no research on how censors extend their influence to such networks. Researchers from the University of Massachusetts analyzed possible methods of blocking CDN content, using the practices of the Chinese authorities as an example, and also developed a tool to bypass such blocks.

We (specialists from a proxy service) have prepared an overview of the main conclusions and results of this experiment. This article is a translation of that overview.

Introduction

Censorship is a global threat to freedom of speech on the Internet and to free access to information. It is possible largely because the Internet borrowed the “end‑to‑end communication” model from the telephone networks of the 1970s. That model makes it possible to block access to content or user communication without serious effort or cost, simply based on the IP address. There are several ways to do this, from blocking the address that hosts the prohibited content to preventing users from even learning that address by manipulating DNS.

However, the development of the Internet has also led to the emergence of new ways to disseminate information. One of them is the use of cached content to improve performance and speed up communications. Today, CDN providers handle a significant share of all traffic in the world: Akamai alone, the leader in this segment, accounts for up to 30% of global static web traffic.

A CDN is a distributed system for delivering Internet content at maximum speed. A typical CDN consists of servers in various geographical locations that cache content in order to serve it to the users closest to each server. This significantly increases the speed of online communication.

In addition to improving the quality of service for end users, CDN hosting helps content creators scale their projects, reducing the load on infrastructure.

Censoring CDN content

Despite the fact that CDN traffic already accounts for a significant proportion of all information transmitted via the Internet, there is still almost no research on how censors in the real world approach its control.

The authors of the study began by researching censorship techniques that can be applied to CDNs. Then they studied the real mechanisms used by the Chinese authorities.

First, let's talk about possible censorship methods and the possibility of using them to control CDNs.

Filtering by IP

This is the simplest and least expensive Internet censorship technique. With this approach, the censor identifies and blacklists the IP addresses of resources hosting prohibited content. The Internet providers under its control then stop delivering packets sent to those addresses.

IP‑based blocking is one of the most common methods of Internet censorship. Most commercial network devices can implement such blocks without serious computing costs.

However, this method is not very suitable for blocking CDN traffic due to some properties of the technology itself:

  • Distributed caching: to ensure the best availability of content and optimize performance, CDNs cache user content on a large number of edge servers placed in geographically distributed locations. To filter such content by IP, the censor has to find out the addresses of all edge servers and blacklist them. This undermines the main advantage of the method, which is that in the usual scheme blocking a single server cuts off access to prohibited content for a large number of people at once.

  • Shared IPs: commercial CDN providers share their infrastructure (i.e. edge servers, mapping system, etc.) between multiple clients. As a result, prohibited CDN content is served from the same IP addresses as non‑prohibited content, and any attempt to filter by IP also blocks a huge number of sites and content that censors are not interested in.

  • Highly dynamic IP assignment: to optimize load balancing and improve the quality of service, the mapping between edge servers and end users changes quickly and dynamically. For example, Akamai updates the returned IP addresses every minute. This makes it almost impossible to link addresses with prohibited content.
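The shared‑IP problem can be illustrated with a toy model. This is only a sketch: the edge IP addresses, the domain names, and the mapping between them are made up for illustration and are not data from the study.

```python
# Hypothetical mapping: which customer sites each CDN edge IP serves.
EDGE_TO_SITES = {
    "203.0.113.10": {"banned.example", "shop.example", "news.example"},
    "203.0.113.11": {"banned.example", "bank.example"},
    "203.0.113.12": {"shop.example", "bank.example"},
}


def ips_to_block(target, edge_to_sites):
    """Every edge IP that can serve the target has to be blacklisted."""
    return {ip for ip, sites in edge_to_sites.items() if target in sites}


def collateral_damage(target, edge_to_sites):
    """Untargeted sites that are served from at least one blacklisted IP."""
    hit = set()
    for ip in ips_to_block(target, edge_to_sites):
        hit |= edge_to_sites[ip]
    return hit - {target}


blocked_ips = ips_to_block("banned.example", EDGE_TO_SITES)
damaged = collateral_damage("banned.example", EDGE_TO_SITES)
print(sorted(blocked_ips))
print(sorted(damaged))
```

In this toy model, blocking one site means blacklisting two of the three edge servers, and three unrelated sites lose at least part of their edge capacity.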

DNS interference

Besides IP filtering, DNS interference is another popular censorship technique. With this approach, the censor tries to make sure that users never learn the IP addresses of resources with prohibited content at all. In other words, the intervention happens at the domain name resolution level. There are several ways to do this, including hijacking DNS connections, DNS poisoning, and blocking DNS requests for banned sites.

This is a very effective way of blocking, but it can be bypassed with non‑standard DNS resolution methods, for example, out‑of‑band channels. Therefore, censors usually combine DNS blocking with IP filtering. But, as mentioned above, IP filtering is not effective for censoring CDN content.

URL/Keywords filtering using DPI

Modern network monitoring equipment can analyze specific URLs and keywords in transmitted data packets. This technology is called DPI (deep packet inspection). Such systems look for mentions of forbidden words and resources and then interfere with the communication, typically by simply dropping the packets.

This method is effective, but it is more complex and resource‑intensive, since it requires reassembling all data packets sent within certain streams.
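A minimal sketch of why reassembly is needed: a keyword that is split across two packets is invisible to a filter that inspects each packet in isolation. The keyword, the payloads, and the function names are hypothetical, and the sketch assumes packets arrive in order.

```python
FORBIDDEN = [b"forbidden-page"]


def per_packet_match(packets, keywords):
    """Naive filter: inspect each packet payload in isolation."""
    return any(k in p for p in packets for k in keywords)


def reassembled_match(packets, keywords):
    """DPI-style filter: rebuild the byte stream first, then search it."""
    stream = b"".join(packets)  # assumes the packets are already in order
    return any(k in stream for k in keywords)


# One HTTP request split across two TCP segments in the middle of the keyword.
packets = [
    b"GET /forbid",
    b"den-page.html HTTP/1.1\r\nHost: site.example\r\n\r\n",
]
naive = per_packet_match(packets, FORBIDDEN)
full = reassembled_match(packets, FORBIDDEN)
print(naive, full)
```

The per‑packet check misses the keyword, while the reassembled stream matches. That extra buffering per stream is where the additional cost comes from.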

CDN content can be protected from such filtering in the same way as “regular” content — in both cases, the use of encryption (i.e. HTTPS) helps.

Besides searching for keywords or URLs of prohibited resources, DPI tools can be used for more advanced analysis, such as statistical traffic analysis (online or offline) and protocol identification. These methods are extremely resource‑intensive, and so far there is no evidence that censors use them on any serious scale.

Self‑censorship of CDN providers

If the censor is a state, then it has every opportunity to prohibit those CDN providers working in the country that do not obey local laws governing access to content. There is no way to resist self‑censorship — therefore, if a CDN provider is interested in working in the country, it will be forced to comply with local laws, even if they restrict freedom of speech.

How China censors CDN content

The Great Firewall of China is reasonably considered the most effective and advanced Internet censorship system.

Research methodology

The researchers ran their experiments on a Linux node located inside China. They also had access to several computers outside the country. First, they checked that the node was subject to the same censorship as other Chinese users by trying to open various prohibited sites from this machine. This confirmed that the same level of censorship applied.

The list of sites that use CDNs and are blocked in China was taken from GreatFire.org. The researchers then analyzed the blocking method used in each case.

According to open data, the only major player in the CDN market with its own infrastructure in China is Akamai. Other providers included in the study are CloudFlare, Amazon CloudFront, EdgeCast, Fastly, and SoftLayer.

During the experiments, the researchers found the addresses of Akamai's edge servers inside the country and then tried to retrieve cached content through them. Prohibited content could not be accessed (the servers returned an HTTP 403 Forbidden error). Evidently, the company practices self‑censorship in order to keep operating in the country. At the same time, access to these resources remained open outside the country.

Providers without infrastructure in China do not use self‑censorship for local users.

For the other providers, DNS filtering turned out to be the most frequently used blocking method: requests for blocked sites are resolved to incorrect IP addresses. At the same time, the firewall does not block the CDN edge servers themselves, since they store both prohibited and permitted information.

With unencrypted traffic, the authorities can block individual pages of a site using DPI, but with HTTPS they can only block access to the entire domain. This also leads to the blocking of permitted content.

In addition, China has its own CDN providers, including ChinaCache, ChinaNetCenter, and CDNetworks. All these companies fully comply with the laws of the country and block prohibited content.
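The difference in blocking granularity can be sketched as follows. This is a simplified model, not the firewall's actual logic: it assumes that for HTTPS an on‑path observer sees only the hostname (for example, through DNS or the TLS SNI field), and all names and rule sets are hypothetical.

```python
from urllib.parse import urlsplit


def visible_to_censor(url):
    """Return the part of the URL an on-path observer can match on.

    Simplifying assumption: with HTTPS only the hostname is exposed,
    while the path travels encrypted.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        return parts.hostname
    return parts.hostname + parts.path


def is_blocked(url, blocked_pages, blocked_domains):
    """Apply page-level rules to what is visible, domain rules to the host."""
    host = urlsplit(url).hostname
    return visible_to_censor(url) in blocked_pages or host in blocked_domains


pages = {"site.example/banned.html"}
print(is_blocked("http://site.example/banned.html", pages, set()))
print(is_blocked("https://site.example/banned.html", pages, set()))
print(is_blocked("https://site.example/ok.html", pages, {"site.example"}))
```

The page‑level rule catches the plain HTTP request but not the HTTPS one. Once the censor falls back to a domain rule, the permitted page on the same domain is blocked as well.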

CacheBrowser: a tool for bypassing blocks using CDN

As the analysis showed, it is quite difficult for censors to block CDN content. Therefore, the researchers decided to go further and develop a tool for bypassing blocks that does not rely on proxy technology.

The main idea of the tool is that censors have to interfere with DNS to block CDNs, but domain name resolution is not actually required to load CDN content. A user can get the content they need by directly contacting an edge server on which it is already cached.

The diagram below shows the system composition.

Client software is installed on the user's computer, and a regular browser is used to access the content.

When the browser requests a URL or part of already requested content, it queries the local DNS system (LocalDNS) for the IP address of the host. Regular DNS is queried only for domains that are not yet in the LocalDNS database. The Scraper module continuously goes through the requested URLs and looks for potentially blocked domain names in the list. The Scraper then calls the Resolver module to resolve newly discovered blocked domains; the Resolver performs this task and adds an entry to LocalDNS. Finally, the browser's DNS cache is cleared so that existing DNS records for the blocked domain are removed from it.

If the Resolver module cannot figure out which CDN provider the domain belongs to, it will ask the Bootstrapper module for help.
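The interaction between the modules can be sketched roughly as follows. This is an illustration, not the researchers' code: the function names, the in‑memory dictionaries, the sample domains and addresses, and the assumption that the Bootstrapper returns an edge IP directly are all hypothetical. Clearing the browser's DNS cache is not modeled.

```python
from urllib.parse import urlsplit

# Hypothetical data; the dictionaries stand in for the tool's databases.
BLOCKED_DOMAINS = {"blocked.example", "unknown.example"}
CUSTOMER_TO_CDN = {"blocked.example": "cdn-a"}
CDN_TO_IP = {"cdn-a": ["198.51.100.7"]}


def scrape(urls, blocked, local_dns):
    """Scraper: blocked domains seen in requested URLs and not yet resolved."""
    hosts = {urlsplit(u).hostname for u in urls}
    return sorted(h for h in hosts if h in blocked and h not in local_dns)


def resolve(domain, bootstrap):
    """Resolver: domain -> CDN -> edge IP; ask the Bootstrapper if the CDN is unknown."""
    cdn = CUSTOMER_TO_CDN.get(domain)
    if cdn is None:
        return bootstrap(domain)  # assumed to return an edge IP
    return CDN_TO_IP[cdn][0]


def update_local_dns(urls, local_dns, bootstrap):
    """Add a LocalDNS entry for every newly discovered blocked domain."""
    for domain in scrape(urls, BLOCKED_DOMAINS, local_dns):
        local_dns[domain] = resolve(domain, bootstrap)
    return local_dns


def fake_bootstrap(domain):
    """Stand-in for the Bootstrapper; returns a fixed documentation address."""
    return "192.0.2.99"


local_dns = {}
requested = [
    "http://blocked.example/a.html",
    "http://unknown.example/b.html",
    "http://fine.example/",
]
update_local_dns(requested, local_dns, fake_bootstrap)
print(local_dns)
```

Domains that are not on the blocked list never enter LocalDNS in this sketch, so they would still go through regular DNS.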

How it works in practice

The client software was implemented for Linux, but it can easily be ported to other systems, including Windows. A regular Mozilla Firefox is used as the browser. The Scraper and Resolver modules are written in Python, and the Customer‑to‑CDN and CDN‑to‑IP databases are stored in .txt files. The LocalDNS database is the regular /etc/hosts file on Linux.
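As an illustration of how such text databases could be turned into /etc/hosts entries, here is a small sketch. The layout of the .txt files is not described in the article, so the two‑column "key value" format, the sample contents, and the function names below are assumptions. Only the "IP hostname" order of a hosts line is standard.

```python
def parse_pairs(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            key, value = line.split()
            pairs.append((key, value))
    return pairs


def hosts_lines(customer_to_cdn_txt, cdn_to_ip_txt):
    """Build hosts-file lines ('IP hostname') for each known customer domain."""
    # If a CDN is listed with several IPs, the last one wins in this sketch.
    cdn_ip = dict(parse_pairs(cdn_to_ip_txt))
    lines = []
    for domain, cdn in parse_pairs(customer_to_cdn_txt):
        if cdn in cdn_ip:
            lines.append(f"{cdn_ip[cdn]} {domain}")
    return lines


# Hypothetical file contents.
CUSTOMER_TO_CDN_TXT = "blocked.example cdn-a\nother.example cdn-b\n"
CDN_TO_IP_TXT = "# assumed layout: CDN name, edge IP\ncdn-a 198.51.100.7\n"

entries = hosts_lines(CUSTOMER_TO_CDN_TXT, CDN_TO_IP_TXT)
print("\n".join(entries))
```

A domain whose CDN has no known edge IP (other.example here) is simply skipped instead of producing a broken entry.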

As a result, for a blocked URL of the form blocked.com/BlockedURL.html, the script gets the IP address of the edge server from the /etc/hosts file and sends an HTTP GET request for BlockedURL.html with the following HTTP header fields:

Host: blocked.com
User‑Agent: Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1
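The request described above can be sketched with Python's standard socket module. This is not the researchers' script: the function names, the use of plain HTTP on port 80, and the added Connection: close header are assumptions made to keep the example short.

```python
import socket

USER_AGENT = "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1"


def build_request(host, path, user_agent):
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # names the site; the TCP connection goes to the edge IP
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_from_edge(edge_ip, host, path, user_agent=USER_AGENT, timeout=10.0):
    """Send the request straight to an edge IP, without resolving the blocked name."""
    with socket.create_connection((edge_ip, 80), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, user_agent))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


request = build_request("blocked.com", "/BlockedURL.html", USER_AGENT)
print(request.decode("ascii"))
```

Here fetch_from_edge would be called with an edge IP taken from the hosts file. Because the connection is opened to a numeric address, the blocked domain name appears only inside the Host header.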

The Bootstrapper module is implemented using the free tool digwebinterface.com. This DNS resolver cannot be blocked, and it answers DNS queries on behalf of many geographically distributed DNS servers in various network regions.

With this tool, the researchers managed to access Facebook from their Chinese node — although the social network has long been blocked in China.

Conclusion

The experiment showed that the problems that censors experience when trying to block CDN content can be used to create a system to bypass blocking. Such a tool allows you to bypass blocks even in China, where one of the most powerful online censorship systems operates.
