Researchers face a growing volume of relevant documents from a wide variety of fields, so there is an increasing need for tools that let them quickly identify related texts across domains. Existing solutions do not support linking documents from text corpora that originate in different domains. Moreover, they either do not scale or rely on proprietary algorithms.
Logic Mill – A New Software System and Research Tool
Logic Mill is a new software system and research tool for identifying documents in other text corpora that are similar to a given text. It was designed by a research group in the Economics Department led by Dietmar Harhoff. It consists of a set of open-source software components and offers a public application programming interface (API) that the scientific community can use.
The Logic Mill software uses state-of-the-art machine learning techniques to analyze long passages of text, which consist not only of words but also of structure and context. Unlike previous approaches to estimating text similarity, Logic Mill treats semantic structure as an additional dimension of similarity: it looks not only at whether the same words occur, but also at the context in which they occur (that is, relative to the sentence and paragraph). Specialized machine learning models encode the text numerically, and these numerical representations allow the computation of various similarity measures.
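Once texts are encoded as vectors, comparing them becomes arithmetic. The following minimal sketch uses cosine similarity, one common measure for comparing embeddings; it is not necessarily one of the measures Logic Mill computes. The three-dimensional vectors are hypothetical toy values standing in for model output, which in practice would have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, NOT real Logic Mill output.
patent_vec = [0.2, 0.7, 0.1]
paper_vec = [0.25, 0.65, 0.05]      # points in nearly the same direction
unrelated_vec = [0.9, -0.1, 0.4]    # points in a rather different direction

print(round(cosine_similarity(patent_vec, paper_vec), 3))
print(round(cosine_similarity(patent_vec, unrelated_vec), 3))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a frequent choice when a document's meaning is assumed to be captured by the orientation of its embedding.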
Previous approaches to comparing text documents were mostly limited to texts of the same category, such as patents to patents or publications to publications. Logic Mill now makes it possible to compare documents across these and other domains.
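If documents from different domains are encoded into the same vector space (an assumption made here for illustration), a cross-domain query reduces to a nearest-neighbor search: score every candidate against the query vector and keep the best matches. The sketch below does this by brute force for a patent against a handful of publications. The document IDs and vectors are hypothetical, and `most_similar` is an illustrative helper, not part of Logic Mill's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, corpus, k=2):
    """Return the k (doc_id, score) pairs in `corpus` most similar to `query_vec`."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical patent embedding used as the query.
patent_query = [0.2, 0.7, 0.1]

# Hypothetical publication embeddings from a different corpus.
publications = {
    "paper-A": [0.25, 0.65, 0.05],
    "paper-B": [0.9, -0.1, 0.4],
    "paper-C": [0.1, 0.8, 0.3],
}

for doc_id, score in most_similar(patent_query, publications, k=2):
    print(doc_id, round(score, 3))
```

Scoring every document is fine for a toy corpus, but a system indexing millions of patents and papers would likely need an approximate nearest-neighbor index to answer such queries quickly.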
The currently indexed datasets include data from Semantic Scholar, the EPO, the USPTO and WIPO. An integration of Wikipedia is in preparation.
Research Applications
Logic Mill allows users to explore the literature quickly. It can find semantically similar patent documents, which is important for prior art search in patent examination and for assessing the likelihood of patent litigation. It can also link patents to related scientific publications. Logic Mill can recommend citations for new documents and suggest reading from newly published papers. It also makes it possible to assess the novelty of patents and publications. In addition, knowledge flows can be traced across domains, and new trends and the diffusion of new concepts can be detected.
The name of the project, Logic Mill, is inspired by “The Baroque Cycle”, a series of novels by the American writer Neal Stephenson. In them, the German polymath Gottfried Wilhelm Leibniz conceives a machine that organizes all human knowledge using a retrieval system based on prime numbers. While this machine is fictitious, Leibniz’s ideas echo in modern computing, in particular in the problem of representing any kind of data numerically.
If you would like to be notified about Logic Mill’s progress or to take part in the trial program, you can register on the Logic Mill website.
Read the full publication: Logic Mill – A Knowledge Navigation System.