Mainak Ghosh, M.Sc.

Doctoral Student and Junior Research Fellow

Innovation and Entrepreneurship Research

Phone: +49 89 24246-561
Email: mainak.ghosh(at)ip.mpg.de

Personal Website:

https://ghoshmainak.github.io

Areas of Interest:

Natural Language Processing, Economics of Innovation, Science of Science, Intellectual Property Rights

Academic Résumé

Since 03/2020
Junior Research Fellow and Doctoral Candidate, Max Planck Institute for Innovation and Competition (Innovation and Entrepreneurship Research)

10/2017 - 11/2019
Master of Science (M.Sc.) in Data Engineering & Analytics, Technical University of Munich (TUM); Master Thesis: “Multilingual Opinion Mining on Social Media Comments Using Unsupervised Neural Clustering Methods”

11/2017 - 02/2018
Student Research Assistant, Max Planck Institute for Social Law and Social Policy, Munich

05/2013 - 07/2013
Research Intern, Indian Statistical Institute, Kolkata, India

06/2012 - 07/2012
Summer Intern, Globsyn Business School, Kolkata, India

2010 - 2014
Bachelor of Engineering (B.E.) in Computer Science & Technology, Indian Institute of Engineering Science & Technology, Shibpur, India

Work Experience

03/2018 - 03/2020
Working Student, IDS GmbH – Analysis and Reporting Services (IDS), Munich

08/2014 - 09/2017
Software Engineer, Acclaris Business Solutions Pvt Ltd, Kolkata, India

Academic Prizes and Honors

2013
Cognizant Certified Student (CCS), IT Foundation Skills

2009
Award, Mathematical Competence Test, Association for Improvement of Mathematics Teaching (AIMT), Kolkata, India

2008
Certificate of Merit in Physical Science & Mechanics

Publications

Chatterjee, Chirantan; Chugunova, Marina; Ghosh, Mainak; Singhal, Abhay; Wang, Lucy Xiaolu (2023). Human Mediation Leads to Higher Compliance in Digital Mental Health: Field Evidence from India, Frontiers in Behavioral Economics, 2.

Erhardt, Sebastian; Ghosh, Mainak; Buunk, Erik; Rose, Michael; Harhoff, Dietmar (2024). Logic Mill - A Knowledge Navigation System, CEUR Workshop Proceedings 3775, 25-35.

Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. It leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
https://ceur-ws.org/Vol-3775/paper7.pdf
Also published as: arXiv preprint 2301.00200
Event: 5th Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech 2024) co-located with the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), Washington D.C., 2024-10-24

Hagerer, Gerhard; Kirchhoff, Martin; Danner, Hannah; Pesch, Robert; Ghosh, Mainak; Roy, Archishman; Jiaxi, Zhao; Groh, Georg (2021). SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining, in: Galia Angelova et al. (ed.), Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021) - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd., Shoumen 2021, 475-482.

Recent research in opinion mining proposed word embedding-based topic modeling methods that provide superior coherence compared to traditional topic modeling. In this paper, we demonstrate how these methods can be used to display correlated topic models on social media texts using SocialVisTUM, our proposed interactive visualization toolkit. It displays a graph with topics as nodes and their correlations as edges. Further details are displayed interactively to support the exploration of large text collections, e.g., representative words and sentences of topics, topic and sentiment distributions, hierarchical topic clustering, and customizable, predefined topic labels. The toolkit optimizes automatically on custom data for optimal coherence. We show a working instance of the toolkit on data crawled from English social media discussions about organic food consumption. The visualization confirms findings of a qualitative consumer research study. SocialVisTUM and its training procedures are accessible online.
Event: International Conference "Recent Advances in Natural Language Processing", Shoumen, 2021-09-01

Hagerer, Gerhard; Moeed, Abdul; Dugar, Sumit; Gupta, Sarthak; Ghosh, Mainak; Danner, Hannah; Mitevski, Oliver; Nawroth, Andreas; Groh, Georg (2020). An Evaluation of Progressive Neural Networks for Transfer Learning in Natural Language Processing, in: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, 1376-1381.

A major challenge in modern neural networks is the utilization of previous knowledge for new tasks in an effective manner, otherwise known as transfer learning. Fine-tuning, the most widely used method for achieving this, suffers from catastrophic forgetting. The problem is often exacerbated in natural language processing (NLP). In this work, we assess progressive neural networks (PNNs) as an alternative to fine-tuning. The evaluation is based on common NLP tasks such as sequence labeling and text classification. By gauging PNNs across a range of architectures, datasets, and tasks, we observe improvements over the baselines throughout all experiments.
Event: 12th Language Resources and Evaluation Conference, Marseille, 2020-05-11

Ghosh, Mainak; Erhardt, Sebastian; Rose, Michael; Buunk, Erik; Harhoff, Dietmar (2024). PaECTER: Patent-level Representation Learning using Citation-informed Transformers, arXiv preprint 2402.19411.

PaECTER is a publicly available, open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain. More specifically, our model outperforms the next-best patent specific pre-trained language model (BERT for Patents) on our patent citation prediction test dataset on two different rank evaluation metrics. PaECTER predicts at least one most similar patent at a rank of 1.32 on average when compared against 25 irrelevant patents. Numerical representations generated by PaECTER from patent text can be used for downstream tasks such as classification, tracing knowledge flows, or semantic similarity search. Semantic similarity search is especially relevant in the context of prior art search for both inventors and patent examiners. PaECTER is available on Hugging Face.

Erhardt, Sebastian; Ghosh, Mainak; Buunk, Erik; Rose, Michael; Harhoff, Dietmar (2022). Logic Mill - A Knowledge Navigation System, arXiv preprint 2301.00200.

Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
https://doi.org/10.48550/arXiv.2301.00200
Also published in: CEUR Workshop Proceedings 3775

Presentations

19.09.2023
Patent Quality - Measurement & Analysis
Research Seminar
Location: Ringberg Castle

27.02.2023
Patent Quality - Measurement & Analysis
Research Seminar
Location: Frauenchiemsee

23.09.2023
Logic Mill
Summer School on Data and Algorithms for Science, Technology & Innovation Studies, KU Leuven
Location: Leuven, Belgium

07.09.2022
Logic Mill
Research Seminar
Location: Bernried am Starnberger See

11. – 14.04.2022
Logic Mill & Hierarchical Embedding
Research Seminar
Location: Ohlstadt

01.12.2021
Logic Mill: Patent Embedding
Research Seminar
Location: Ringberg Castle

01.10.2021
Logic Mill: Automation of Patent Full-Text Collection
Research Seminar
Location: Feldkirchen-Westerham

23.03.2021
Logic Mill - Citation Prediction
Research Seminar
Location: online

23.03.2021
Automation and Mental Health Platform Design: Field Experiment Plan
Research Seminar
Location: online

10.09.2020
Knowledge Mining, Digitalization, Machine Learning
Research Seminar
Location: online (Munich)

Projects

Essays on the Role of Science in Patents, Patent Quality, and Diffusion

Tracing the Flow of Knowledge from Science to Technology Using Deep Learning