Prof. Michael Carroll Quoted in Nature Story on New Database for Text Mining

October 29, 2021

PIJIP Professor Michael Carroll was recently quoted in a Nature story on the General Index, a 38-terabyte index of words and phrases taken from 107 million research papers. The index allows researchers to run text mining operations to obtain information from the works without downloading the full text.

The papers in the General Index are copyrighted, and many of them are paywalled. Because the database returns only snippets of up to five words from each paper, and viewers cannot read the papers themselves, the General Index can distribute results without violating copyright laws. Prof. Carroll told Nature that “copyright does not protect facts and ideas, and these results would be treated as communication of facts derived from the analysis of the copyrighted articles.”

Professor Carroll teaches copyright and cyberlaw at WCL. His scholarship in the area of copyright and text mining includes “Copyright and the Progress of Science: Why Text and Data Mining Is Lawful,” published in the UC Davis Law Review.