Erik Buunk, M.Sc.
Project/Data Science Officer
Innovation and Entrepreneurship Research
+49 89 24246-583
erik.buunk(at)ip.mpg.de
Areas of Work
Data Science, Information Management, Business Consulting, Project Management, Process Optimization (Lean Six Sigma), Software Development, Data Visualization, Graphic Design, Geographic Information Systems (GIS)
Career
Since 2021
Project/Data Science Officer at the Max Planck Institute for Innovation and Competition (Innovation and Entrepreneurship Research)
2019 – 2020
Institute Member, Institute for Quantitative Social Science, Harvard University, MA, USA
2016 – 2019
Information Management Consultant, Security Region Utrecht, Netherlands
2011 – 2016
Senior IT Consultant, Municipality of Amersfoort, Netherlands
2007 – 2011
Information Consultant/Project Manager, Social Services, Municipality of Amersfoort, Netherlands
2003 – 2006
Graphic Design
1994 – 1996
Environmental Sciences, M.Sc. degree (1996)
1991 – 1994
Science, Business and Administration, propaedeutic diploma, with distinction (1992)
Publications
Conference Papers
Logic Mill - A Knowledge Navigation System, CEUR Workshop Proceedings 3775, 25-35 (2024).
- Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. It leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
- https://ceur-ws.org/Vol-3775/paper7.pdf
- Also published as: arXiv preprint 2301.00200
- Event: 5th Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech 2024) co-located with the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), Washington D.C., 2024-10-24
Discussion Papers
PaECTER: Patent-level Representation Learning using Citation-informed Transformers, arXiv preprint 2402.19411 (2024).
- PaECTER is a publicly available, open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain. More specifically, our model outperforms the next-best patent specific pre-trained language model (BERT for Patents) on our patent citation prediction test dataset on two different rank evaluation metrics. PaECTER predicts at least one most similar patent at a rank of 1.32 on average when compared against 25 irrelevant patents. Numerical representations generated by PaECTER from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant in the context of prior art search for both inventors and patent examiners. PaECTER is available on Hugging Face.
Logic Mill - A Knowledge Navigation System, arXiv preprint 2301.00200 (2022).
- Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
- https://doi.org/10.48550/arXiv.2301.00200
- Also published in: CEUR Workshop Proceedings 3775
Talks
19.09.2023
Logic Mill / Tracing The Flow Of Knowledge
Research Seminar
Location: Schloss Ringberg
18.09.2023
Research Project Management
Research Seminar
Location: Schloss Ringberg
12.07.2023
Tracing The Flow Of Knowledge
EPO/ARP Workshop
Location: online
03.07.2023
Tracing The Flow Of Knowledge
Poster Presentation
Board of Trustees, Max Planck Institute for Innovation and Competition
Location: Munich
27.02.2023
Startup Data
Research Seminar
Location: Frauenchiemsee
06.09.2022
Startup Data Project and GDPR
Research Seminar
Location: Bernried
04.07.2022
Logic Mill – Applications of Machine Learning to Patents, Publications, and Other Text Corpora
Poster Presentation
Board of Trustees, Max Planck Institute for Innovation and Competition
Location: Munich
09.06.2022
Logic Mill – Applications of Machine Learning to Patents, Publications, and Other Text Corpora
Poster Presentation
Munich Summer Institute
Location: Munich
13.04.2022
New Data Sources
Research Seminar
Location: Ohlstadt
02.12.2021
Logic Mill
Research Seminar
Location: Schloss Ringberg
01.12.2021
Dataroom Reproducibility
Research Seminar
Location: Schloss Ringberg
01.10.2021
Tools and Resources for Reproducibility
Research Seminar
Location: Feldkirchen-Westerham
30.09.2021
Logic Mill
Research Seminar
Location: Feldkirchen-Westerham
27.07.2021
Information and Data Management at MPI-IC: Human Research Data in Practice
Location: online
06.07.2021
Logic Mill, Applications of Machine Learning to Patents, Publications, and Other Text Corpora
Location: online
26.03.2021
Replicability
Research Seminar
Location: online