Internet data mining. Is it legal in the EU?

It’s been a boom time for internet data mining. Countless large-scale web crawlers are collecting and aggregating data from multiple websites every day, every minute, every second. What for? To analyse and examine information from a large pool of datasets and transform it into new knowledge or a new product. 

More and more businesses are built on that concept. Scientists and medical professionals also use automatically combined data from different sources to generate predictions and unearth new insights.

Is your business based on internet data mining?

Data mining is the process of collecting and analyzing human-readable data for one's own purposes.

Look at some examples of data mining businesses:

Idealo is a price comparison site

SESAMm is an innovative startup that aggregates analytics and investment signals based on hundreds of thousands of textual data sources worldwide, using Natural Language Processing and, more precisely, emotion analysis

You could easily find thousands of other data mining examples. Is your company one of them?

"If you run a business in the EU based on internet data mining, you had better find out what you should be aware of"


If you plan to use data mining in your business, you should consider:

  1. regulations regarding databases;
  2. copyright law;
  3. specific regulations regarding the extracted content, such as personal data protection regulations;
  4. competition law;
  5. elements of contract law (websites’ Terms and Conditions and instructions for crawlers);
  6. the new Digital Single Market Directive, which regulates text and data mining.

The EU institutions always stress the importance of data mining for an innovative economy and their will to support European entrepreneurs using big data solutions. The problem is that, despite these declarations, the legal reality in this area is becoming stricter and stricter for business.

1. Database protection

Many websites comprise databases: online shops consist of data sets about products; social network sites hold data of natural persons; VOD services provide not only movies but also information, photos and reviews; and there are many others.

Such databases:

  1. may not be considered protected at all (e.g. the schedule of football matches available on football federation websites);
  2. may be protected by copyright if they’re original (these regulations have little practical impact);
  3. may be protected under the sui generis right.

The most important regulation to consider when working with big data is the Database Directive (96/9/EC), which created the so-called sui generis right (a right "of its own kind").

The protection covers those who have made a substantial investment in the assembly of a database, which means the “obtaining, verification and presentation” of its contents, rather than in the creation of the content itself.


  1. The maker of a database made available to the public may not prevent a lawful user from extracting and/or re-utilizing insubstantial parts of its contents for any purpose. Conversely, extracting or re-using a substantial part of the database is not allowed unless the maker gives permission (e.g. in the website’s Terms and Conditions). When assessing whether a part of the database is substantial, consider not only the quantity but also the importance of the extracted data.
  2. Additionally, it is not allowed to scrape (repeatedly and systematically extract) insubstantial parts of a protected database’s content and/or re-use them if these activities conflict with the normal exploitation of that database or unreasonably prejudice the legitimate interests of its maker.

"If you run a business based on data mining, always consider carefully which databases you want to scrape; check whether they fall within the scope of database protection regulations; try to limit the extracted and re-used content"
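One way to make "limit the extracted content" operational is to keep a cumulative count of what a crawler has taken from each source. The snippet below is only a minimal sketch of that idea. The class name `ExtractionBudget`, its fields and the 5% cap are hypothetical and chosen purely for illustration. The cap is an internal policy setting, not a legal threshold: whether a part is substantial also depends on the importance of the data, so a numeric limit alone does not establish that an extraction is lawful.

```python
from dataclasses import dataclass


@dataclass
class ExtractionBudget:
    """Tracks how many records have been taken from one source database.

    All names and numbers here are illustrative assumptions.
    """
    source_size: int     # estimated number of records in the source database
    max_share: float     # self-imposed cap (0.05 = 5%); NOT a legal threshold
    extracted: int = 0   # cumulative count across all crawler runs

    def remaining(self) -> int:
        """Records that may still be taken under the self-imposed cap."""
        return max(0, int(self.source_size * self.max_share) - self.extracted)

    def take(self, requested: int) -> int:
        """Grant up to `requested` records and record them as extracted."""
        if requested < 0:
            raise ValueError("requested must be non-negative")
        granted = min(requested, self.remaining())
        self.extracted += granted
        return granted


budget = ExtractionBudget(source_size=10_000, max_share=0.05)
first = budget.take(300)   # fully granted
second = budget.take(300)  # only partly granted: the cap is almost used up
third = budget.take(300)   # nothing left under the cap
print(first, second, third)
```

The count is cumulative on purpose: repeated and systematic extraction of small parts can also be a problem, so each run is checked against everything taken before.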

2. Website Terms and Conditions and instructions for crawlers

This always matters, but it is not always decisive. T&C create a legal bond between the website’s operator and the website’s users. But how should the status of instructions for crawlers be defined? Are bots users? Can T&C be legally binding on those who could not have accepted the conditions (even implicitly)?

Generally, website owners may limit data mining, but they should do so within the scope of the regulations. That sounds good, but it is not easy to interpret unambiguously.

In the significant 2015 case C-30/14, Ryanair Ltd v PR Aviation BV, the Court of Justice of the EU decided that where a website is not protected under the Database Directive, its operator may use T&C to prevent web scraping. Ironically, this ruling may result in better protection for databases that are “not protected” (not covered by database regulations).

"The good news is that many operators want their websites to be scraped (usually under certain conditions) and don’t mind fair re-use, which sometimes boosts their website too. If that is the case, good for you; if not, consider whether you have another legal basis"
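Instructions for crawlers are commonly published in a site's robots.txt file, so a crawler can at least read them before fetching a page. The sketch below uses Python's standard `urllib.robotparser` module for this. The robots.txt content, the bot names (`research-bot`, `price-bot`) and the example.com URLs are made up for illustration. Passing this check says nothing about the T&C or database rights discussed above; it only shows how a bot can respect the operator's machine-readable instructions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download the file
# from the target site instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

User-agent: price-bot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed instructions allow this bot to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for agent, url in [
    ("research-bot", "https://example.com/products/1"),
    ("research-bot", "https://example.com/private/data"),
    ("price-bot", "https://example.com/products/1"),
]:
    print(agent, url, may_crawl(parser, agent, url))
```

In this made-up file, a generic bot may fetch product pages but not the `/private/` area, while `price-bot` is excluded from the whole site.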

3. Text and data mining regulations

In 2019, after years of discussion, the EU finally adopted the new Directive on Copyright in the Digital Single Market (Directive (EU) 2019/790, the "DSM Directive"). Despite its name, it regulates some issues that fall outside the scope of copyright. Surprise, surprise.

When you plan a business on the Internet, always be aware of this regulation, which has to be transposed into the respective national laws by 7 June 2021.

In the DSM Directive the EU created a new legal concept called "text and data mining" ("TDM"). Some aspects of it have always been derived from copyright, but as a whole it had not been regulated before.

The directive provides for an exception to copyright for reproductions and extractions made by research organisations and cultural heritage institutions for the purposes of scientific research (Article 3). It also requires the member states to implement regulations which provide for an exception relating to TDM generally, without specifying the context (Article 4). This provision applies not only to copyright but also to databases. The Article 4 exception applies as long as the right holder has not expressly reserved such use.

"In practice, it seems that website owners will be entitled to limit internet data mining for commercial use to a greater extent than under the current regulations"


When you plan to start or develop a business based on big data and web scraping, do it carefully. Analyse the legal background first to avoid unnecessary risk. If you need help, do not hesitate to contact me. I will be glad to gain a full overview of your specific situation in order to determine the best approach to protecting your interests.


Disclaimer: information in this article is provided for informational purposes only. You should not construe any such information as legal, business, tax, investment, trading, financial, or other advice. Nothing in this article intends to promote any of the resources mentioned.

Ewa Wojnarska-Krajewska,

Legal Counsel, 

IT/IP lawyer, privacy and data protection specialist
