A Comprehensive Overview of Web Scraping Legality: Frequent Issues, Major Laws, Notable Cases

Table of Contents

Is It Legal to Scrape a Web Page and Use Its Data?

1. Types of Data You Are Scraping
1.1. Public Data
1.2. Personal Data
1.3. Copyrighted Data
2. Frequent Web Scraping Legal Issues
2.1. Copyright Infringement
2.2. CFAA Violation
2.3. Trespass to Site Security
3. Existing Legislation
3.1. GDPR
3.2. US Privacy Act
3.3. EU versus US Laws Comparison
4. Precedent Cases
4.1. eBay v. Bidder’s Edge
4.2. The FTC v. Facebook
4.3. hiQ Labs v. LinkedIn

Closing Thoughts

Introduction

Nearly as old as the internet itself, web scraping is today the backbone of many marketing and lead-generation strategies. But the question of its legality arises more and more often, especially in connection with some high-profile cases.

There is no clear-cut answer to the question, and web scraping law has its gray areas. That’s why we’ve decided to explain the matter in more detail.

Is It Legal to Scrape a Web Page and Use Its Data?

This problem is, in fact, multifaceted, and each aspect of it deserves some special attention.

1. Types of Data You Are Scraping

The types of data you are scraping are quite variable, and though in most cases scraping is legal, your activities can be classified as illegal or fall into a gray area in particular situations.

1.1. Public Data

Scraping public sites is a completely legal practice. The logic is quite clear: the entry of a web scraping bot does not differ from the entry of the browser and only open data is provided in both cases. Though many site owners make attempts to technically protect their open information from the competitors’ crawlers, legally scraping such sites is neither theft nor illegal conduct.

The two types of data that merit concern and caution are personal and copyrighted data.

1.2. Personal Data

PII (personally identifiable information) is any data that can be used to identify a specific person. Name, date of birth, address, contact and employment information, and financial and medical details are just some items on the list. Personal data is a hot topic, and different jurisdictions regulate it differently. In general, it’s illegal to collect, store, and use someone’s data without the owner’s consent or another legal reason for doing so.

As a rule, when scraping data from websites, we do not have the consent of the personal data owner, and it’s difficult to argue a lawful basis for collecting it. So it’s better not to extract any personal data. Keep in mind that EU and California legislation are the strictest in this respect.
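One way to act on this advice is to strip personal data from scraped records before they are stored. The following is a minimal Python sketch, not a complete or legally sufficient filter: the field names in `PII_FIELDS`, the `strip_pii` helper, and the two regular expressions are assumptions made for illustration, and the patterns will miss many real-world formats.

```python
import re

# Hypothetical field names treated as personal data; adjust to your own schema.
PII_FIELDS = {"name", "date_of_birth", "address", "email", "phone", "employer"}

# Deliberately simple patterns (assumptions, not exhaustive detectors).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(record):
    """Return a copy of `record` without PII fields and with emails/phones masked."""
    cleaned = {}
    for key, value in record.items():
        if key.lower() in PII_FIELDS:
            continue  # drop the whole field
        if isinstance(value, str):
            # Mask personal details that appear inside free-text values.
            value = EMAIL_RE.sub("[redacted]", value)
            value = PHONE_RE.sub("[redacted]", value)
        cleaned[key] = value
    return cleaned
```

Calling `strip_pii` on each record right after parsing, before anything is written to disk, keeps personal details out of the stored dataset in the first place.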

1.3. Copyrighted Data

This is the type of data owned by an individual or business with full control over its reproduction and use. Copyrighted data includes anything like articles, images, songs, databases, and more, and even though this data is openly available online, it’s illegal to use without the consent of the owner. Therefore, while scraping is not illegal in such a case, any further usage of the data might be, depending on a country’s laws. Instead of replicating a piece of writing in full, you can, for instance, use snippets of the original text or provide the reference to the source of the image, table, or video you use.

However, factual data is not copyrightable. The names, prices, and features of products aren’t covered by copyright laws, and it’s legal to scrape them. Be aware of the issue of database rights, however. It is often illegal to scrape and reproduce a full database from the web, but using pieces of information without replicating the original database structure does not violate most of the regulations.
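The distinction between factual fields and copyrighted content can be sketched in code. The Python example below is an illustration under stated assumptions: the field names in `FACTUAL_FIELDS`, the `to_safe_record` helper, and the 160-character snippet length are all hypothetical, and whether a snippet of a given length is acceptable depends on the jurisdiction.

```python
# Hypothetical whitelist of factual product attributes.
FACTUAL_FIELDS = ("name", "price", "features")


def to_safe_record(item, source_url, snippet_chars=160):
    """Keep factual fields; reduce the description to a short, attributed snippet."""
    record = {field: item[field] for field in FACTUAL_FIELDS if field in item}
    description = item.get("description", "")
    if description:
        snippet = description[:snippet_chars].rstrip()
        if len(description) > snippet_chars:
            snippet += "..."  # mark that the text was shortened
        record["snippet"] = snippet
        record["source"] = source_url  # reference back to the original page
    return record
```

Everything outside the whitelist, such as images or full article text, is dropped rather than copied.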

Let’s talk a bit about the most frequent legal issues, and then proceed to a brief explanation of existing legislation and regulations in a few different jurisdictions.

2. Frequent Web Scraping Legal Issues

2.1. Copyright Infringement

“All rights reserved.” This phrase is pretty familiar to most of us. What is its relation to web scraping? The key aspect that matters is how the parsed information is used.

If you have harvested copyright-protected data, to stay within the legal framework, you cannot publish it or use it for commercial purposes. It’s not, for instance, forbidden to search YouTube for videos, but it’s illegal to repost them on other sites, since they are covered by copyright legislation. In general, copyright infringement involving media files can be pursued in court regardless of the way the data was obtained.

2.2. CFAA Violation

Back in 1986, the Computer Fraud and Abuse Act (CFAA) was passed in the US to protect specific computers that contained military, financial, or other sensitive data from hacking and unauthorized access. Later on, in 1996, it was extended to protect private information. Because someone using data scraping techniques on open pages can reach only publicly available information, the CFAA generally does not apply to such web crawlers.

The law has nothing to do with data scraping as long as scraping is used for harmless collection of public data.

2.3. Trespass to Site Security

Trespass to site security (more formally, trespass to chattels) occurs when a website is interfered with or its server is harmed in some way. It’s easy to forget about this possible issue, since it does not look like a legal issue at first glance. However, frequent web crawler requests can decrease the target site’s performance and slow or stop its server.

If the natural operation of the website is disturbed because of the scraping, something is wrong with the crawler software, and the site owner may think it is an intentional attack.
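A common way to keep a crawler from disturbing a site is to space out requests to the same host. The Python sketch below shows one possible approach under stated assumptions: the `HostThrottle` class and its two-second default delay are hypothetical choices for illustration, not a recommended standard, and the injectable `clock` and `sleep` parameters exist only to make the class easy to test.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum delay between consecutive requests to the same host."""

    def __init__(self, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds; 2.0 is an arbitrary illustrative default
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the last permitted request

    def wait(self, url):
        """Block until the host of `url` may be contacted again; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        pause = 0.0
        if last is not None:
            # Sleep only for the part of the delay that has not yet elapsed.
            pause = max(0.0, self.min_delay - (now - last))
            if pause > 0:
                self._sleep(pause)
        self._last_request[host] = now + pause
        return pause
```

In a crawler loop, calling `throttle.wait(url)` immediately before each download keeps the request rate per host bounded, while requests to different hosts are not delayed by one another.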

As you can see from above, it is important to be careful and know and understand the laws and legal regulations of the jurisdictions you are scraping in.

Let’s proceed to our discussion about the existing legislation.

3. Existing Legislation

Today, there isn’t any law clearly declaring website scraping legal or illegal. In most cases, defining web scraping legality requires understanding the court rulings of the lawsuits between website holders and data scrapers.

However, two legal acts are most widely referred to in such cases: the General Data Protection Regulation in the EEA and the US Privacy Act.

3.1. GDPR

In spring 2018, the General Data Protection Regulation (GDPR) came into effect, built around a “one-stop” principle under which a company deals with a single lead authority. It applies to the personal data of people within the European Economic Area. However, anonymized data is not covered by the Regulation.

Though the document is over a hundred pages of legal language, just a few articles comprise its key IT-related parts. The GDPR sets rules for personal information protection when it is gathered by data controllers and passed to data processors, including those in the cloud.

Besides, there is a breach notification requirement—data authorities and consumers should be notified when there’s been a data exposure. Companies are required to specify the nature of the breach, the categories and amount of information affected, as well as the measures initiated to mitigate the breach.

What is more, under the GDPR any company, even one not physically present within the EU, is subject to the Regulation when collecting data about people in the EU.

3.2. US Privacy Act

Unlike the European Union, the US does not have a single federal privacy law, just several sector-specific federal laws concerning privacy for finance (GLBA), health care (HIPAA), and children’s data (COPPA), plus consumer-oriented acts passed by individual states.

The Privacy Act, passed by Congress in 1974, is the most significant one. It remains the first legal reference in most court cases. It confirms the right of American citizens to access, copy, and correct data held about them by federal government bodies. But the Privacy Act has no impact on data collected online by private companies.

At present, the internet remains a largely unregulated area where social media and tech companies practice an anything-goes approach. However, American states have started stepping in with their own laws on data privacy, and California has taken the leading position.

3.3. EU versus US Laws Comparison

It’s tricky to compare the two approaches. While the European Union has its GDPR combined with data security laws, the US does not have any federal-level regulation for consumer data privacy in force, and only California, Maine, and Nevada have privacy regulations in effect. The other American states apply only data breach notification rules.

Nevertheless, California has taken the lead and adopted the California Consumer Privacy Act (CCPA). At the moment, it’s the most comprehensive internet-focused legislative document in the US. The CCPA became the cue for other American states to draft their own data privacy laws. It contains a long list of protected personal information identifiers: biometric data, geolocation, browsing history, employee information, email, and more.

GDPR vs CCPA

Both the GDPR and the CCPA allow people to access and remove their data and to opt out of data processing at any moment. Unlike EU consumers, Californians cannot correct their inaccurate personal information. The EU Regulation requires user consent, while the CCPA only asks for a privacy notice on the website.

The New York Privacy Act is currently on hold, yet it contains CCPA legal hallmarks and provides a user with the right to correct, delete, and request PII, coming close to the European GDPR.

The Act is quite strict, claiming an exclusive right of action for any law violation over all the companies that have divisions in the US and are therefore subject to it.

4. Precedent Cases

As we have already mentioned, there are certain court rulings most often referred to in lawsuit cases. Some of them are below.

4.1. eBay v. Bidder’s Edge

This case was the first web scraping-related case heard in a US court. On December 10, 1999, Bidder’s Edge made more than 100,000 requests to the eBay site without due authorization, which, eBay claimed, harmed its computer systems.

Though the legal dispute between the companies was settled out of court in 2001 for an undisclosed sum and an agreement not to access eBay’s data, the case became the first legal precedent.

4.2. The FTC v. Facebook

The Federal Trade Commission brought a complaint against Facebook, settled in 2012, over violations of the privacy of the user data it collects. In its applications and privacy notices, Facebook assured users it would not sell anybody’s personal information and claimed they could limit access to their information. However, the FTC accused the network of deceiving people about how their data was handled, and the social media giant was required to revamp its data security practices. In 2019, the FTC found that Facebook had violated the 2012 order, which cost the company US$5 billion and served as a good lesson not to violate such regulations.

4.3. hiQ Labs v. LinkedIn

After several years of tolerance, LinkedIn sent a cease-and-desist letter to hiQ Labs, a data analytics company. The company automatically collected information from open LinkedIn profiles and used it to advise employers whose workers posted resumes on the site. After receiving the letter, hiQ Labs sued LinkedIn.

By late 2019, both the court of first instance and the appeals court had held that the CFAA likely did not apply to data open to the general public and prohibited LinkedIn from interfering with hiQ’s web scraping.

This case became a historic moment that fundamentally changed the balance of power in legal data regulation cases. However, unlimited usage of the scraped data for commercial purposes is prohibited.

Closing Thoughts

Web scraping legality is still an intricate problem, but keep in mind that if the data is not protected with a login, it is generally legal to scrape.

If there is a login requirement, you should check the terms and conditions of the site. However, even if you follow the legally enforceable regulations, the way you use the parsed data matters.
The usage of data is our clients’ area of responsibility, and with this article, we aimed to clear up the matter.