Internet data mining. Is it legal in the EU?
It’s been a boom time for internet data mining. Countless large-scale web crawlers are collecting and aggregating data from multiple websites every day, every minute, every second. What for? To analyse and examine information from a large pool of datasets and transform it into new knowledge or a new product.
More and more businesses are built on that concept. Scientists and medical professionals also use automatically combined data from different sources to make predictions and unearth new insights.
Is your business based on internet data mining?
Data mining is the process of collecting and analysing human-readable data for one’s own purposes.
Look at some examples of data mining businesses:
Idealo is a price comparison site
SESAMm is an innovative startup that aggregates analytics and investment signals from hundreds of thousands of textual data sources worldwide, using natural language processing and, more precisely, emotion analysis
You could easily find thousands of other data mining examples. Is your company one of them?
"If you run a business in the EU based on internet data mining, you had better find out what you should be aware of"
If you plan to use data mining in your business, you should consider:
- regulations regarding databases;
- copyright law;
- specific regulations regarding the extracted content, such as personal data protection regulations;
- competition law;
- elements of contractual law (websites’ Terms and conditions and instructions for crawlers);
- the new Digital Single Market Directive, which regulates text and data mining.
The EU institutions always stress the importance of data mining for an innovative economy and their will to support European entrepreneurs who use big data solutions. The problem is that, despite these declarations, the legal reality in this area is becoming stricter and stricter for business.
1. Database protection
Many websites comprise databases: online shops consist of data sets about products; social network sites hold data of natural persons; VOD services provide not only movies but also information, photos or reviews; and there are many others. Depending on the case, such databases:
- may not be protected at all (e.g. the schedule of football matches available on football federation websites);
- may be protected by copyright if they are original (these regulations have little practical impact);
- may be protected under the sui generis right.
The most important regulation to consider when working with big data is the Database Directive 96/9/EC, which created the so-called sui generis (“of its own kind”) right.
The protection covers those who have made a substantial investment in assembling a database, that is, in the “obtaining, verification and presentation” of its contents, rather than in creating the content itself.
- The maker of a database made available to the public may not prevent a lawful user from extracting and/or re-utilizing insubstantial parts of its contents for any purpose. Conversely, extracting or re-using a substantial part of the database is not allowed unless the maker gives permission (e.g. in the website’s Terms and Conditions). When assessing whether a part of the database is substantial, it is important to consider not only the quantity but also the importance of the extracted data.
- Additionally, it is not allowed to repeatedly and systematically extract (scrape) and/or re-use insubstantial parts of a protected database’s content if these activities conflict with the normal exploitation of that database or unreasonably prejudice the legitimate interests of its maker.
"If you run a business based on data mining, always consider carefully which databases you want to scrape; check whether they fall within the scope of database protection regulations; try to limit the extracted and re-used content"
2. Website’s Terms and conditions and instructions for crawlers
This always matters, but it is not always decisive. T&C create a legal bond between the website’s operator and the website’s users. But how should the status of instructions for crawlers be defined? Are bots users? Can T&C be legally binding on those who could not have accepted the conditions (even implicitly)?
Generally, website owners may limit data mining, but they should do so within the scope of the regulations. That sounds good, but it is not easy to interpret unambiguously.
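Instructions for crawlers are typically published in a site's robots.txt file. As a minimal sketch of how a crawler could read them before fetching a page, the Python example below uses the standard library's urllib.robotparser. The robots.txt content, the bot name "example-bot" and the example.com URLs are assumptions made up for illustration. Passing this check says nothing about the legal questions raised here; it only shows what the operator has technically asked crawlers to do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download the file
# from the target site instead of hard-coding it.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt does not disallow this URL for the agent."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
# "example-bot" and the URLs below are illustrative assumptions.
allowed = may_crawl(parser, "example-bot", "https://example.com/products/1")
blocked = may_crawl(parser, "example-bot", "https://example.com/private/data")
```

A crawler built this way would skip any URL for which may_crawl returns False and would still need to be assessed against the website's T&C.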
In the significant case C-30/14, Ryanair Ltd v PR Aviation BV, from 2015, the Court of Justice of the EU decided that where a website is not protected under the Database Directive, its operator may use T&C to prevent web scraping. Ironically, this ruling may result in better protection for databases that are “not protected” (not covered by database regulations).
"The good news is that many operators want their websites to be scraped (usually under certain circumstances) and don’t mind fair re-use, which sometimes boosts their website too; if so, good for you; if not, consider whether you have another legal basis"
3. Text and data mining regulations
In 2019, after years of discussion, the EU finally adopted the new Directive on Copyright in the Digital Single Market (the DSM Directive). Despite its name, it regulates some issues that fall outside the scope of copyright. Surprise, surprise.
When you plan a business on the Internet, always be aware of this regulation, which has to be transposed into the respective national laws by 7 June 2021.
In the DSM Directive the EU created a new legal concept called "text and data mining" ("TDM"). Some of its aspects have always been derived from copyright, but as a whole it has not been regulated before.
The directive provides for an exception to copyright for reproductions and extractions made by research organisations and cultural heritage institutions. It also requires the member states to implement regulations that provide for an exception relating to TDM generally, without specifying the context (Article 4). This provision applies not only to copyright but also to databases. The exception applies as long as it has not been expressly limited by the right holder.
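For content published online, the directive mentions machine-readable means as one appropriate way for a right holder to reserve their rights, but it does not prescribe a format. The Python sketch below shows how a crawler could look for such a reservation in HTTP response headers. It assumes the "tdm-reservation" header proposed in the W3C community group's TDM Reservation Protocol; the header name, the value "1" and the helper rights_reserved are illustrative assumptions, not something the directive requires. A missing header does not mean mining is permitted, because a reservation may also be expressed elsewhere, for example in the website's T&C.

```python
from typing import Mapping

# Assumption: the site signals its reservation with the "tdm-reservation"
# HTTP header from the W3C community group's TDM Reservation Protocol.
# The directive itself does not prescribe any particular format.
TDM_HEADER = "tdm-reservation"


def rights_reserved(headers: Mapping[str, str],
                    header_name: str = TDM_HEADER) -> bool:
    """Return True if the headers carry an explicit TDM reservation."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get(header_name.lower()) == "1"


# Hypothetical response headers, written out by hand for illustration.
reserved = rights_reserved({"TDM-Reservation": "1"})
not_reserved = rights_reserved({"Content-Type": "text/html"})
```

Treating only an explicit "1" as a reservation keeps the check simple; a cautious crawler would treat a False result as "no signal found" and not as permission.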
"In practice it seems the website owners will be entitled to limit internet data mining in respect to commercial use to a greater extent than under current regulations"
When you plan to start or develop a business based on big data and web scraping, do it carefully. Analyse the legal background first to avoid unnecessary risk. If you need help, do not hesitate to contact me. I will be glad to gain a full overview of your specific situation in order to determine the best approach to protecting your interests.
Disclaimer: information in this article is provided for informational purposes only. You should not construe any such information as legal, business, tax, investment, trading, financial, or other advice. Nothing in this article intends to promote any of the resources mentioned.
IT/IP lawyer, privacy and data protection specialist
15 years in IT law