A fundamental principle of copyright law is that data as such is not protected: copyright protects only the creative form, not the information incorporated in the protected work. Text and Data Mining (TDM) should therefore, in principle, not be a use covered by any exclusive intellectual property (IP) rights (IPRs), whether copyright or sui generis rights.
It could even be argued that this activity falls outside the scope of exclusive rights, and that any restriction would undermine the underlying rationales of copyright protection and result in an inadmissible restriction of freedom of expression and information. However, at some point in the chain of activities enabling TDM research, some IPR-relevant acts are technically necessary, so that, in the absence of a specific permission within the legal framework, TDM can lead to infringement.
Given that dominant market players customarily override exceptions by imposing both contractual and technological measures, limitations on technological blocking should be introduced as well, by clearly spelling out that neither Technological Protection Measures (TPMs) nor network security and integrity measures should undermine the effective application of the exception. Accordingly, protection against contractual and technological override should also be clearly extended to the mining of materials not protected by IPRs, including those made available in a database.
TDM usually involves some copying, which even in the case of limited excerpts might infringe the right of reproduction. TDM activities can concern text or data, either of which can be covered by intellectual property protection, whether copyright or sui generis database rights, or fall outside the scope of protection (e.g. because it lacks originality or is in the public domain). In short, IPRs can be affected whenever mining involves IP-protected subject matter. Only TDM tools that involve minimal copying of a few words, or that crawl through data and process each item separately, could be operated without running into potential liability for copyright infringement. This follows from the fact that copyright law does not protect data, but only the original expression within copyright-protected subject matter.
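The item-by-item approach mentioned above can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the function names (`term_counts`, `mine`, `sample_items`) and the sample sentences are hypothetical, and the tokenization is deliberately crude. The sketch reads one item at a time, derives word counts from it, and keeps only the aggregate counts rather than the text itself. Whether such processing falls outside the right of reproduction remains a legal question that the code does not settle; reading an item still places a transient copy of it in memory.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical, deliberately simple tokenizer: lowercase alphabetic words only.
TOKEN = re.compile(r"[a-z]+")


def term_counts(text: str) -> Counter:
    """Count the tokens of a single item."""
    return Counter(TOKEN.findall(text.lower()))


def mine(items: Iterable[str]) -> Counter:
    """Process items one by one and keep only aggregate counts.

    No item is appended to a corpus or written to disk; each text
    is dropped as soon as its counts have been added to the totals.
    """
    totals: Counter = Counter()
    for text in items:
        totals.update(term_counts(text))
    return totals


def sample_items() -> Iterator[str]:
    """Stand-in for a crawler that yields one source item at a time (assumption)."""
    yield "Data is not protected as such."
    yield "Copyright protects the creative form, not the data."


if __name__ == "__main__":
    result = mine(sample_items())
    # The output is purely quantitative: words and their frequencies.
    print(result.most_common(3))
```

The design choice worth noting is the generator: because `sample_items` yields items lazily, the sketch never holds the whole source collection at once, which mirrors the distinction drawn above between item-by-item processing and building a pre-processed corpus.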
Any reproduction that creates a copy of a protected work along the chain of TDM activities might trigger copyright infringement. In this respect, pre-processing to standardize materials into machine-readable formats might infringe the right of reproduction. Likewise, uploading the pre-processed material to a platform might also violate the right of reproduction; whether such uploading occurs depends on whether the TDM technique adopted uses software that crawls the data to be analysed directly from the source. Mining, the stage of the TDM process at which data is finally extracted, can also infringe the right of reproduction, depending on the mining software deployed and the character of the extraction.
TDM might involve the reproduction, translation, adaptation, arrangement, or any other alteration of a database protected by copyright, that is, of the original selection and arrangement of the database's contents. TDM might also infringe sui generis database rights, in particular through the extraction, and to a lesser extent the re-utilization, of substantial parts of a database. In this context, even if extraction occurs without reproduction of the original materials, the extraction itself would infringe the exclusive rights granted to the database owner.
Finally, the TDM output itself should not infringe any exclusive rights, as it merely reports the results of the quantitative analysis and typically does not include parts or extracts of the mined materials. However, contemporary research practices, which strive for verifiability of TDM research results, require researchers to be able to store source materials and to communicate them at least to their peers. From a legal perspective, this conduct would most likely infringe the right of communication to the public.