Externally funded project
Innovation and Entrepreneurship Research

Tracing the Flow of Knowledge from Science to Technology Using Deep Learning

Domain-specific language models have become an important tool in the social sciences. They transform text into data points, which can then be used for further analysis. However, these models are usually built for one specific purpose: a model trained on scientific publications, for example, learns different features than a model trained on patents. We develop a textual relatedness model that spans both the scientific and the patent domain, optimized for similarity comparisons. During training, we use citations as a proxy for semantic similarity. Once the model is trained, citations are no longer required: the model relies only on the text of new documents to identify similarities. Throughout the project, we employ different strategies to build and train the models. After a thorough comparison, we select the best-performing model for real-world applications.
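The citation-as-proxy idea described above can be illustrated as a triplet objective: during training, the encoder is pushed to make a document more similar to a document it cites than to a random, non-cited document. The following is a minimal, self-contained sketch of such a triplet margin loss on toy embedding vectors; the vectors, the margin value, and the function names are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.5) -> float:
    """Citation-informed triplet objective (illustrative): the anchor
    document should be more similar to a document it cites (positive)
    than to a non-cited document (negative), by at least `margin`."""
    return max(0.0, cosine_sim(anchor, negative)
                    - cosine_sim(anchor, positive) + margin)

# Toy embeddings standing in for encoder outputs.
anchor = np.array([1.0, 0.0, 0.0])
cited = np.array([0.9, 0.1, 0.0])      # citation link -> should be close
non_cited = np.array([0.0, 1.0, 0.0])  # no link -> should be far

loss = triplet_loss(anchor, cited, non_cited)
```

Once training has minimized this loss over many citation triplets, the encoder can embed any new document from its text alone, and similarity search no longer needs citation data.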


Ghosh, Mainak; Erhardt, Sebastian; Rose, Michael E.; Buunk, Erik; Harhoff, Dietmar (2024). PaECTER: Patent-Level Representation Learning Using Citation-Informed Transformers, arXiv preprint 2402.19411.

Erhardt, Sebastian; Ghosh, Mainak; Buunk, Erik; Rose, Michael E.; Harhoff, Dietmar (2022). Logic Mill – A Knowledge Navigation System, arXiv preprint 2301.00200.

External Funding

EPO ARP (European Patent Office, Academic Research Programme 2021)

Fields of Research