Sebastian Erhardt, M.Sc.

Former Research Fellow

Innovation and Entrepreneurship Research



Areas of Interest

Data Science, Artificial Intelligence, Machine Learning, Natural Language Processing, Economics of Innovation

Academic Résumé

08/2020 – 06/2025
Junior Research Fellow and Doctoral Candidate, Max Planck Institute for Innovation and Competition (Innovation and Entrepreneurship Research)
Postgraduate Studies in Business Research (MBR) at Munich School of Management, LMU Munich
Doctoral Thesis: “Essays on Applications of Machine Learning to Science, Patent, and Economic Data”

02/2020 – 07/2020
Visiting Blockchain Researcher, Liphardt Lab ‒ Stanford Distributed Trust Initiative, Leland Stanford Junior University

10/2019 – 05/2021
Master of Science (M.Sc.) in Computer Science, Ludwig-Maximilians-Universität (LMU) Munich

08/2018 – 07/2020
Honours Degree in Technology Management, Center for Digital Technology and Management (CDTM), LMU Munich and Technical University of Munich (TUM)

10/2017 – 04/2019
Master of Science (M.Sc.) in Information Systems, Technical University of Munich (TUM)

10/2014 – 03/2017
Bachelor of Science (B.Sc.) in Information Systems, Technical University of Munich (TUM)

Work Experience

12/2005 – today
Self-Employed Developer/Consultant ‒ Freelance work for small, medium and large companies and for the Federal Government, Germany

08/2020 – 10/2020
Developer/Product Manager ‒ Tech4Germany, Federal Ministry of the Interior, Federal Ministry of Finance and German Federal Centre for Information Technology, Berlin

02/2016 – 03/2017
Working Student, Deloitte Digital, Munich

09/2015 – 02/2016
Working Student, Deloitte Consulting, Munich

Honors, Scholarships, Academic Prizes

Since 2019
Fellow of the Tech4Germany Program of the Federal Government of Germany, an initiative under the patronage of Prof. Dr. Helge Braun, Federal Minister and Chief of Staff of the Federal Government of Germany

2018 – 2019
Think Digital Scholarship of the Internet Business Cluster (IBC) e.V.

Since 2018
Member of the Elite Network of Bavaria, an initiative of the Free State of Bavaria to promote young scientists

2016 – 2018
Manage&More Scholarship of the UnternehmerTUM

Publications

Conference papers

Erhardt, Sebastian; Ghosh, Mainak; Buunk, Erik; Rose, Michael; Harhoff, Dietmar (2024). Logic Mill - A Knowledge Navigation System, CEUR Workshop Proceedings 3775, 25-35.

  • Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. It leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
  • https://ceur-ws.org/Vol-3775/paper7.pdf
  • Also published as: arXiv preprint 2301.00200
  • Event: 5th Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech 2024) co-located with the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), Washington D.C., 2024-10-24

Further Publications, Press Articles, Interviews

Harhoff, Dietmar; Rose, Michael; Ghosh, Mainak; Erhardt, Sebastian; Buunk, Erik (2024). Tracing the Flow of Knowledge From Science to Technology Using Deep Learning. Project Report for the Academic Research Programme of the EPO, 2024.

  • This project aims to enhance the tracing of knowledge flows from scientific research to technology using advanced deep-learning techniques. By developing models such as Pat-SPECTER and PaECTER, the project seeks to improve the accuracy of identifying connections between patents and scientific literature, surpassing the limitations of traditional citation-based analysis. Key findings include the analysis of the performance of Pat-SPECTER in predicting scientific citations for patents and the performance of PaECTER in predicting citations among patents. The evaluation process was made difficult by the incompleteness of the open-access database OpenAlex, which lacks abstracts for a portion of the scientific literature. Real-world tests demonstrated Pat-SPECTER’s effectiveness in identifying relevant prior art documents (patents and publications), improving the efficiency of prior art search. The project highlights the potential of advanced machine learning models and advances their use in tracing knowledge flows. It provides tools that can enhance patent examination processes, innovation tracking, and research and development strategies. These efforts help foster innovation by revealing the intricate connections between science and technology.
  • https://link.epo.org/elearning/en-ARP2021_Harhoff.pdf

Erhardt, Sebastian (2024). Automated Patent Landscaping Using Deep Learning, in: Hidir Aras et al. (ed.), Proceedings of the 5th Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech 2024), co-located with the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024) (CEUR Workshop Proceedings, 3775), Washington, D.C., 36-50.

  • Patent landscaping is a valuable instrument for many stakeholders, such as patent examiners, company decision-makers, researchers, and policymakers. They use this method to analyze the state of the art, compare organizations’ patenting activities, assess entire industries, or identify gaps in internal R&D activities. However, analyzing vast amounts of patent documents and aggregating and visualizing information is cumbersome and complex. The paper presents an innovative approach to automated patent landscaping by combining natural language processing models with approximate nearest neighbor search, dimensionality reduction, and clustering methods. The approach uses only the textual content of the underlying patents and does not use any additional metadata, such as technology classes or citations.
  • Event: 5th Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech 2024), Washington, D.C., 2024-07-28

Discussion Papers

Rose, Michael; Ghosh, Mainak; Erhardt, Sebastian; Li, Cheng; Buunk, Erik; Harhoff, Dietmar (2025). Tracing the Flow of Knowledge From Science to Technology Using Deep Learning, arXiv:2512.24259.

  • We develop a language similarity model suitable for working with patents and scientific publications at the same time. In a horse race-style evaluation, we task eight language (similarity) models with predicting credible patent-paper citations. We find that our Pat-SPECTER model, which is the SPECTER2 model fine-tuned on patents, performs best. In two real-world scenarios (separating patent-paper pairs and predicting patent-paper pairs), we demonstrate the capabilities of Pat-SPECTER. Finally, we test the hypothesis that US patents cite papers that are semantically less similar than those cited in other large jurisdictions, which we posit is because of the duty of candor. The model is open to the academic community and practitioners alike.

Ghosh, Mainak; Erhardt, Sebastian; Rose, Michael; Buunk, Erik; Harhoff, Dietmar (2024). PaECTER: Patent-level Representation Learning using Citation-informed Transformers, arXiv preprint 2402.19411.

  • PaECTER is a publicly available, open-source document-level encoder specific to patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain. More specifically, our model outperforms the next-best patent-specific pre-trained language model (BERT for Patents) on our patent citation prediction test dataset on two different rank evaluation metrics. PaECTER predicts at least one most similar patent at a rank of 1.32 on average when compared against 25 irrelevant patents. Numerical representations generated by PaECTER from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant in the context of prior art search for both inventors and patent examiners. PaECTER is available on Hugging Face.

Erhardt, Sebastian; Ghosh, Mainak; Buunk, Erik; Rose, Michael; Harhoff, Dietmar (2022). Logic Mill - A Knowledge Navigation System, arXiv preprint 2301.00200.

  • Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
  • https://doi.org/10.48550/arXiv.2301.00200
  • Also published in: CEUR Workshop Proceedings 3775

Presentations

23.09.2024
Patent Opposition in Technology Space
Research Seminar, Max Planck Institute for Innovation and Competition
Martinsried/Planegg

28.07.2024
Logic Mill - A Knowledge Navigation System
5th Workshop on Patent Text Mining and Semantic Technologies, CEUR / ACM SIGIR
Washington D.C., USA

28.07.2024
Automated Patent Landscaping using Deep Learning
5th Workshop on Patent Text Mining and Semantic Technologies, CEUR / ACM SIGIR
Washington D.C., USA

06.05.2024
Logic Mill & Patent Landscaping
Max(s)i Workshop, Copenhagen Business School
Copenhagen, Denmark

26.03.2024
Bold Research Projects & To Opt Out or Not
Research Seminar, Max Planck Institute for Innovation and Competition
Tutzing

07.12.2023
Logic Mill - A Knowledge Navigation System
2nd CESifo / ifo Junior Workshop on Big Data
Munich

02.12.2023
Logic Mill - A Knowledge Navigation System
Innovation Information Initiative Technical Working Group Meeting - National Bureau of Economic Research
Cambridge, MA, US

09.06.2022
Logic Mill
Munich Summer Institute
Munich

08.04.2022
Tracing the Flow of Knowledge from Science to Technology Using Deep Learning
European Patent Office ARP Program
Munich

04.11.2020
Introduction to Git & GitHub
Max Planck Institute for Innovation and Competition
online

Projects