The National Internet Segment Reliability Research explains how the outage of a single Autonomous System might affect the connectivity of the impacted region with the rest of the world. Most of the time, the most critical AS in the region is the dominant ISP on the market, but not always.
As the number of alternate routes between ASes increases (remember that the Internet stands for “interconnected network”, and each network is an AS), so do the fault tolerance and stability of the Internet across the globe. Although some paths are inherently more important than others, establishing as many alternate routes as possible is the only viable way to ensure an adequately robust network.
The global connectivity of any given AS, regardless of whether it is an international giant or a regional player, depends on the quantity and quality of its paths to Tier-1 ISPs.
Usually, Tier-1 implies an international company offering global IP transit service over connections with other Tier-1 providers. Nevertheless, there is no guarantee that such connectivity will be maintained all the time. For many ISPs at all “tiers”, losing connection to just one Tier-1 peer would likely render them unreachable from some parts of the world.
The Methodology of Internet Reliability Measurement
When an AS experiences network degradation, we want to answer the following question: “How many ASes in the same region would lose connectivity with Tier-1 operators, and their global availability along with it?”
We have modeled this situation for years because, at the dawn of BGP and interdomain routing design, its creators assumed that every non-transit AS would have at least two upstream providers to guarantee fault tolerance in case one of them goes down.
However, the current reality is different: less than half of all ISPs in the world have only one connection to an upstream transit provider, yet that is still far from the original assumption. A range of unconventional relationships among transit ISPs further reduces availability.
Have transit ISPs ever failed? The answer is yes, and it happens with increasing frequency. The more appropriate question is — under what conditions would a particular ISP experience service degradation so severe we would call it an outage? If such problems seem unlikely, it may be worth considering Murphy’s Law: “Anything that can go wrong, will”.
To study such a scenario, we have applied the same model for the fourth year in a row. Once again, however, we did not merely repeat previous calculations: the research expands every year.
The following steps were taken to rate AS reliability:
- For every AS in the world, we examine all alternate paths to Tier-1 operators with the help of an AS relationship model, the core of Qrator.Radar;
- Using the Maxmind GeoIP database, we matched countries to every IP address of every AS;
- For every AS, we calculated the share of its address space that corresponds to the relevant region. We filtered out ISPs that are present at an Internet Exchange point in a region where they do not have a significant presence. The example we are using here is Hong Kong, where traffic is exchanged among hundreds of members of HKIX, the biggest Asian Internet Exchange, most of which have zero presence in the local internet segment;
- After isolating regional ASes, we analyzed the potential impact of each one’s outage on other ASes as well as their respective countries;
- In the end, for each country, we identified the AS with the greatest impact on other ASes in its region. Foreign ASes were not considered;
- We took that AS’s impact value as the reliability score for the country and used that score to rank countries by reliability. The lower the score, the better the reliability.
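The steps above can be illustrated with a minimal sketch. It is not the Qrator.Radar model: it assumes that a path to Tier-1 only climbs customer-to-provider links, that the impact of an outage is the unweighted share of the other regional ASes that lose every such path, and that the AS numbers (taken from the range reserved for documentation) and all function names are hypothetical.

```python
from collections import deque

def reaches_tier1(asn, providers, tier1, removed=None):
    """True if `asn` can climb provider links to a Tier-1 while avoiding `removed`."""
    if asn == removed:
        return False
    seen, queue = {asn}, deque([asn])
    while queue:
        current = queue.popleft()
        if current in tier1:
            return True
        for upstream in providers.get(current, ()):
            if upstream != removed and upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return False

def outage_impact(failed, regional_ases, providers, tier1):
    """Percent of the other regional ASes that lose all paths to Tier-1 if `failed` goes down."""
    others = [a for a in regional_ases if a != failed]
    if not others:
        return 0.0
    lost = sum(
        1 for a in others
        if reaches_tier1(a, providers, tier1)
        and not reaches_tier1(a, providers, tier1, removed=failed)
    )
    return 100.0 * lost / len(others)

def country_score(regional_ases, providers, tier1):
    """Return (critical AS, impact in percent); a lower impact means better reliability."""
    impacts = ((a, outage_impact(a, regional_ases, providers, tier1)) for a in regional_ases)
    return max(impacts, key=lambda pair: pair[1])

# Toy data (assumed): customer AS -> list of its upstream providers.
TIER1 = {64496, 64497}
PROVIDERS = {
    64500: [64496, 64497],  # regional transit ISP with two Tier-1 upstreams
    64501: [64500],         # single-homed behind 64500
    64502: [64500, 64496],  # multihomed, survives an outage of 64500
    64503: [64501],         # single-homed behind 64501
}
REGIONAL = [64500, 64501, 64502, 64503]

critical, score = country_score(REGIONAL, PROVIDERS, TIER1)
print(critical, round(score, 1))  # 64500 66.7
```

In this toy region an outage of AS64500 cuts off two of the three other ASes, so AS64500 is the critical AS and its impact becomes the score for the country.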
The 2019 column represents the score that an individual country had in the 2019 rating on a certain position.
Long Story Short:
- The United States regained 10 of the 11 positions it lost in 2019, residing at position 8 in 2020;
- Four new countries entered the Top-20 of the reliability rating: Liechtenstein, Japan, Indonesia and Argentina;
- Four countries left the Top-20: Ireland, Bulgaria, Luxembourg and the Czech Republic, which is at position 21 this year;
- Hong Kong dropped eight positions and closes the Top-20 in 2020;
- Singapore lost 11 positions;
- A longtime leader of the rating, Germany, gave way to Brazil, the 2020 leader of the reliability rating;
- Every year exciting movements happen in the reliability rating, often corresponding to what is happening inside the respective regions.
First things first: the overall trend in global reliability, measured as an average and a median. This time we are looking at five years of continuous research:
In 2020 the number of countries that improved their reliability score to under 10%, indicating high fault tolerance, increased by 5 for the second year in a row, reaching a total of 40.
As you can also see, the average reliability score is improving over time. However, the median has stayed at comparable levels since 2018: the lower part of the rating does not improve quickly enough compared to the upper half.
However, the most significant fact remains — for the period of our research, both IPv4 and IPv6 show significant improvements in reliability. Furthermore, there is an inevitable point in the future, where the IPv6 version of the rating would become the primary one.
In 2020 it seems that something has changed in the perception and adoption of the IPv6 protocol. Google provides the most relevant statistics on this.
As of September 2020, almost 30% of Google users use a native IPv6 connection, which effectively means that their ISPs support IPv6.
However, the main issue with IPv6 persists: partial connectivity. Due to peering wars, non-universal IPv6 adoption and other issues, IPv6 still has the problem of limited network visibility. To better understand this, take a look at IPv6 reliability versus the partial connectivity rate.
It is evident from this IPv6 Top-20 Reliability to Partial Connectivity Comparison chart that there are several countries where the partial connectivity in IPv6 exceeds 10%: Italy, Hong Kong, Ireland, Romania.
Looking at partial connectivity combined with the “classic” reliability percentage, which shows the share of resources that become unavailable in case of an outage, we can state that in Hong Kong alone an IPv6 failure of AS3491 would render 18% of IPv6-connected resources unavailable. The figure is 16% in Ireland and almost the same in Italy and Romania. The numbers are high even in Great Britain (7.5%), Germany (8%) and the United States (15%).
The lowest values in the IPv6 Top-20 belong to Brazil (4.66%), the Netherlands (4.72%) and Japan (5.24%).
It seems that in 2020 the tide turned: IPv6 reliability, even considering partial connectivity, looks better than that of IPv4. The average IPv4 reliability score in 2020 is 36.22%, while for IPv6 the same metric is 28.71%; since we measure outage impact, the lower the metric, the better. However, it is necessary to mention that country adoption of IPv6 is half that of IPv4: the newer version of the protocol still has a long way to go to total adoption.
Broadband Internet and PTR records
“Does a country’s leading ISP always influence regional reliability more than everyone else?” — this is the question we are trying to answer with the help of additional information and investigation. We suggest that the most significant (by user base or customer base) ISP in a region is not necessarily the most critical for the region’s network connectivity.
Two years ago, we started to analyze the PTR records. Generally, PTR records are used for Reverse DNS lookup: using the IP-address to identify the associated hostname or domain name.
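As a small illustration of what a reverse lookup asks for, the sketch below builds the DNS name that holds the PTR record for an address, using Python's standard `ipaddress` module. The helper name `ptr_query_name` is ours, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress

def ptr_query_name(ip):
    """Return the DNS name under which the PTR record for `ip` is published."""
    return ipaddress.ip_address(ip).reverse_pointer

print(ptr_query_name("192.0.2.10"))   # 10.2.0.192.in-addr.arpa
print(ptr_query_name("2001:db8::1"))  # a nibble-reversed name ending in .ip6.arpa
```

An actual lookup, for example with `socket.gethostbyaddr`, queries that name and returns the hostname only if the operator has published a PTR record for the address.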
Since we already know the largest ASes for every country in the world, we could count the PTR records within their networks and determine their share of overall PTR records for the corresponding region. We counted only PTR records and did not calculate the ratio of IP-addresses without PTR records to IP-addresses with them.
So, we are speaking strictly of IP-addresses with PTR records present. The practice of adding those is not universal; some providers do this and others do not.
In the PTR-based rating, we look at what share of a region’s PTR-enabled IP-addresses would go offline with an outage of a single AS in that country.
An approach that considers PTR records yields very different results. In most cases, not only does the primary regional AS change, but the percentage is entirely different. In all of the generally reliable (from the global availability point of view) regions, the number of PTR-enabled IP-addresses that shut down following an outage of one autonomous system is dozens of times higher. That could mean that the leading national ISP always handles end-users at one point or another.
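The counting described above could look roughly like the following sketch. The input structures (`ptr_ips`, a list of addresses known to have a PTR record, and `ip_to_asn`, a mapping from address to origin AS) and the function name are assumptions made for illustration, as are the documentation addresses and AS numbers.

```python
from collections import Counter

def ptr_share_by_as(ptr_ips, ip_to_asn):
    """Percent of a region's PTR-enabled addresses that belong to each AS."""
    counts = Counter(ip_to_asn[ip] for ip in ptr_ips if ip in ip_to_asn)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {asn: 100.0 * n / total for asn, n in counts.items()}

# Toy data (assumed): 203.0.113.1 has no PTR record, so it is never counted.
ip_to_asn = {
    "192.0.2.1": 64500,
    "192.0.2.2": 64500,
    "192.0.2.3": 64500,
    "198.51.100.1": 64501,
    "203.0.113.1": 64502,
}
ptr_ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.1"]

shares = ptr_share_by_as(ptr_ips, ip_to_asn)
print(shares)  # {64500: 75.0, 64501: 25.0}
```

Addresses without PTR records are ignored entirely, which matches the decision to speak strictly of IP-addresses with PTR records present.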
Thus, we should assume that this percentage represents the part of the ISP’s user base and customer base that would go offline (if switching to a second internet service provider were not possible) in the event of an outage. From this perspective, countries appear to be less reliable than they look from the transit point of view. We leave possible conclusions from this PTR-enabled rating to the reader.
ISPs With Only One Upstream (Stub networks) and Their Reliability
In seven out of the top twenty IPv4 Reliability Rating countries, we found a peculiar detail. If we look for the largest provider for “stub networks”, which are essentially networks with only one upstream provider, we find an AS and ISP different from the one responsible for the classical reliability metric for the corresponding national segment.
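A minimal sketch of that lookup follows, assuming the same kind of customer-to-provider mapping a relationship model would produce. The function names and the documentation AS numbers are hypothetical, and a network counts as a stub here simply because it has exactly one upstream.

```python
from collections import Counter

def stub_networks(providers):
    """Map every AS with exactly one upstream provider to that provider."""
    return {asn: ups[0] for asn, ups in providers.items() if len(ups) == 1}

def critical_as_for_stubs(providers, regional_ases):
    """Return (provider, number of regional stub customers) for the largest stub provider."""
    regional = set(regional_ases)
    counts = Counter(
        upstream for asn, upstream in stub_networks(providers).items() if asn in regional
    )
    return counts.most_common(1)[0] if counts else None

# Toy data (assumed): customer AS -> list of its upstream providers.
PROVIDERS = {
    64501: [64500],
    64502: [64500, 64499],  # multihomed, therefore not a stub
    64503: [64499],
    64504: [64499],
    64505: [64499],
}
REGIONAL = [64501, 64502, 64503, 64504, 64505]

print(critical_as_for_stubs(PROVIDERS, REGIONAL))  # (64499, 3)
```

Because this ranking only counts single-homed customers, it can point to a different AS than the outage-impact metric, which also follows longer provider chains.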
Here we highlighted the countries from both the top of IPv4 Reliability Rating and IPv6 Reliability Rating for 2020.
Let’s talk about the most visible differences between the critical AS in terms of global transit and the primary upstream choice in a specific region. Interestingly, it is rare for the critical AS for stub networks not to be the classical global critical AS as well.
AS174 (Cogent) is special in IPv4. Cogent’s changes are always interesting to investigate: as a Tier-1 with a strong presence in Europe, it carries tremendous responsibility, being a critical AS both for stub networks and for global transit. In 2020 it is the critical AS for IPv6 stub networks in France and Belgium. In IPv4, Cogent is responsible for all the reliability metrics in Great Britain, France and Belgium (from the IPv4 Top-20), as well as for Ireland’s and the Vatican’s global reliability metrics.
However, the Hong Kong example stands out. The classical critical AS for the region’s IPv4 connectivity is AS3491 (PCCW Global), a Tier-1 ISP. The stub networks’ critical AS, however, is AS4515 (ERX-STAR), which is connected to both PCCW Global and Hong Kong IX. So for ERX-STAR the situation is the following: if AS3491 somehow fails, it would still retain regional connectivity through the IX; if, on the other hand, the IX fails, the data would still be globally available through the Tier-1 network.
That is a particular example of how a significant Internet Exchange in a region can substitute for a second strong upstream provider. If there is a big IX nearby, connecting to it and to only one classical transit ISP can give almost the same regional reliability as connecting to two transit upstreams. Once again, though, an IX cannot fully replace a Tier-1 connection in the global sense. It is an excellent example of regional versus global connectivity.
AS6939 (Hurricane Electric) is the stub networks’ critical autonomous system in both Hong Kong and the U.S.A. for IPv6.
In IPv4 the picture also changes: in the U.S.A. the classical reliability score is determined by AS3356, belonging to the CenturyLink ISP, but the critical AS for stub networks in v4 is AT&T’s AS7018, another Tier-1 in the national segment of the United States.
Details by Regions
One of the most important questions to ask ourselves while conducting the 2020 research was: “How was it possible for the U.S. segment to regain some reliability after the dramatic 11-position drop last year and improve the fault tolerance of the national segment’s Internet, without swapping the ISP in question, CenturyLink?”
Well, the first thing is that CenturyLink could have lost some of its market value, and this is probably the main reason why the numbers improved. Although it is hard to say for sure, we tend to think that CenturyLink’s outages in previous years, and the most recent one, probably motivated its customers to at least try to find some additional transit capabilities, if not to drop CenturyLink service altogether.
Hong Kong’s drop in position between 2019 and 2020 could be connected to the changing situation within the region, although PCCW’s market share almost certainly fluctuated during the year as well.
We outlined the Singaporean change in critical AS for the region in last year’s research. Looking at the current change, we could state that AS4657 may be conquering the market further, consolidating its position as the primary ISP of the country, which would inevitably worsen the reliability score.
As we write this research from the Czech Republic, we feel obligated to highlight a crucial detail about CZ’s Internet segment. It is probably the only region we know for sure that does not have a single ISP business behind the critical autonomous system for the country. Instead, AS47232 (ISPAlliance), which is critical across all the metrics (classical global transit and stub networks), is a free and voluntary union of smaller ISPs within the Czech Republic. As the company writes on its “Aims and vision” page:
“Local telecommunications operators face many difficulties on the market, which put them at a disadvantage vis-à-vis the “big brands” of multinational competition and those who do not play fair. ISP Alliance a.s. we set it up to overcome these difficulties.”
It does help to overcome the presence of much more significant players in the national segment! That is a good example of a group uniting around a reasonable common goal, resulting in a good reliability score: even with four new countries entering the Top-20 in 2020, the Czech Republic is at position 21 with a reliability score of 6.5%. Moreover, such a small (compared to the other leaders) national segment taking eighth place by the number of IPv6-enabled ASes is an achievement for which everyone involved in building and maintaining computer networks deserves thanks.
Thank you for reading the Reliability Research! In case you have any questions, feel free to contact us at firstname.lastname@example.org.