Welcome to Prof. Dr. Marco Spruit's academic website. This dashboard highlights various aspects of my current work activities. Navigate to the appropriate pages in the 'Blue MenuBar' (above) for more details on my work as Full Professor of Advanced Data Science in Population Health at Leiden University's Medical Center (PHEG) and its Faculty of Science (LIACS).

Active Grants & Collaborations

Recent Publications & Talks

PhDs & Students & Committees

Research grants (21)

2024-2026: ECOTIP, EUR 130K (LUMC).
Identifying tipping points of the effects of living environments on ecosyndemics of lifestyle-related illnesses by ML/NLP modelling of a patient segmentation model based on EHR and environmental data. Applicant(s): Kiefte,J., Spruit,M., Vos,R., et al. Remark: grant total: 4.4M EUR. Researcher(s): Muizelaar,H. www.nwo.nl/en/projects/nwa151822151
2023-2026: INSAFEDARE, EUR 571K (LUMC).
Innovative applications of assessment and assurance of data and synthetic data for regulatory decision support. Generation and evaluation of a benchmarking synthetic dataset amenable to the regulatory process, analytical methods for validation of digital health applications, and components for data integration pipelines. Financer(s): Horizon Europe: HORIZON-HLTH-2022-TOOL-11-02: Tools and technologies for a healthy society. Applicant(s): Despotou,G. et al. HEU project #101095661; grant total: 4.8M EUR. Researcher(s): Achterberg,J. & Dijk,B. van 10.3030/101095661
2024: EuroQoL-LLM, EUR 1,325 (LUMC).
Applying Large Language Models to Identify EQ-5D Bolt-ons Based on Patient Text Data. Financer: EuroQol Group Seed grant: 1792-SG. Applicant: van den Akker-van Marle,E., Spruit,M., et al. Remark: Grant total: 42K EUR. Researcher(s): Heijdra Suasnabar,J. et al. euroqol.org/research-at-euroqol/our-research-portfolio/funded-projects/
2023-2024: HealthBox, EUR 66K (LUMC).
A personalized, home-based eHealth intervention to treat metabolic syndrome and prevent its complications by ML/NLP modelling of a patient segmentation model based on EHR and environmental data. Applicant(s): Chavannes,N., Atsma,D., Pijl,H., Vos,R., et al. Remark: grant total: 2.5M EUR. Researcher(s): Muizelaar,H. www.nwo.nl/en/projects/kich1gz0321007
2023-2024: SENSYN, EUR 5K (LUMC).
Making sensitive data reusable through synthetic data generation, and implementation of FAIR principles in highly sensitive data areas. Financer(s): NWO Open Science Fund. Applicant(s): Liem,M., Spruit,M., et al. Remark: grant total: 50K EUR. Researcher(s): Haas,M. & Achterberg,J. www.nwo.nl/en/projects/osf231006
2021-2024: VIPP, EUR 60K (LUMC).
Virtual Patients and Population Dataset. Develop a synthetic ELAN dataset to improve data science teaching. Financer(s): LUMC Interprofessional Education (IPE) programme. Applicant(s): Spruit,M., & Szuhai,K. Remark: Project Raamplan Implementatie Artsopleiding (PRIMA) 2020 working group deliverable with respect to Theme 5 on Big Data and AI. Researcher(s): Faiq,A. healthcampusdenhaag.nl/nl/project/virtuele-patient-en-populatie-vipp-dataset/

Research theme

Research collaborations (16)

2022-2024: EDAsynth (ULEI)
Emergency Department Admissions Forecasting with Generative AI. Sponsor: Universidad de Alcalá. Researcher(s): Álvarez-Chavez,H.
2022-2026: PreProMMF (ULEI)
Natural Language Processing in Mental Health: Detection, Prediction and Promotion with Multilingual, Multimodal and Federated Techniques. Sponsor: Arab Academy of Science, Technology & Maritime Transport (AAST). Financed as a 60% lecturer - 40% researcher contract. Researcher(s): Khalil,S.
2021-2025: Data2Bedside (LUMC)
Reusing routinely collected data from regional GP offices in ELAN to create a clinical decision support tool to identify disease progression risk levels in Type Two Diabetes Mellitus (T2DM) patients. Sponsor: Kingdom of Saudi Arabia scholarship. Researcher(s): Alfaraj,S.
2021-2026: PHA (LUMC)
Population Health Analytics. Maturity modelling for situational data infrastructure and scenario planning towards appropriate regional intelligence. Sponsor: Q-Consult Zorg. Researcher(s): Roorda,E.
2020-2024: ATS (ULEI)
A Telling Story. Mindreading with NLP. Sponsor: NWO. Applicant(s): Duijn,M. van. Researcher(s): Dijk,B. van.
2018-2024: PbD (UU)
Privacy-by-Design. How organisations can demonstrate responsible data use in information systems through Privacy-by-Design. Sponsor: P&O Rijk. Researcher(s): Dijk,F. van

Journal articles (106)

  1. Rijcken,E., Zervanou,K., Mosteiro,P., Scheepers,F., Spruit,M., & Kaymak,U. (2024). Topic Specificity: a Descriptive Metric for Algorithm Selection and Finding the Right Number of Topics. Natural Language Processing Journal, 8, 100082. 10.1016/j.nlp.2024.100082
  2. Muizelaar,H., Haas,M., van Dortmont,K., van der Putten,P., & Spruit,M. (2024). Extracting Patient Lifestyle Characteristics from Dutch Clinical Text with BERT Models. BMC Medical Informatics and Decision Making, 24, 151. 10.1186/s12911-024-02557-5
  3. Khalil,S., Tawfik,N., & Spruit,M. (2024). Federated learning for privacy-preserving depression detection with multilingual language models in social media posts. Patterns, 5, 100990. 10.1016/j.patter.2024.100990
  4. Khalil,S., Tawfik,N., & Spruit,M. (2024). Exploring the Potential of Federated Learning in Mental Health Research: A Systematic Literature Review. Applied Intelligence, 54, 1619-1636. 10.1007/s10489-023-05095-1
  5. Jungo,K., Salari,P., Meier,R., Bagattini,M., Spruit,M., Rodondi,N., Streit,S., & Schwenkglenks,M. (2024). Cost-effectiveness of a medication review intervention for general practitioners and their multimorbid older patients with polypharmacy: Analysis of data from the OPTICA trial. Socio-Economic Planning Sciences, 92, 101837. 10.1016/j.seps.2024.101837
  6. Jungo,K., Deml,M., Schalbetter,F., Moor,J., Feller,M., Lüthold,R., Huibers,J., Sallevelt,B., Meulendijk,M., Spruit,M., Schwenkglenks,M., Rodondi,N., & Streit,S. (2024). A mixed methods analysis of the medication review intervention centered around the use of the Systematic Tool to Reduce Inappropriate Prescribing Assistant (STRIPA) in Swiss primary care practices. BMC Health Services Research, 24, article number 350. 10.1186/s12913-024-10773-y

gScholar statistics

Metric       All    Since 2019
Citations    4288   2985
h-index      36     27
i10-index    91     72

Conference proceedings (84)

  1. Dijk,B. van, Duijn,M. van, Kloostra,L., Spruit,M., & Beekhuizen,B. (2024). Using a Language Model to Unravel Semantic Development in Children's Use of a Dutch Perception Verb. 8th Workshop on Cognitive Aspects of the Lexicon (CogALex @ LREC-COLING 2024) (pp. 98-106). 20 May 2024, Torino, Italy. 2024 - Dijk Duijn Kloostra Spruit Beekhuizen.pdf
  2. Wang,R., Verberne,S., & Spruit,M. (2024). Attend All Options at Once: Full Context Input for Multi-choice Reading Comprehension. In European Conference on Information Retrieval (ECIR 2024) (pp. 387-402). 24-28 March 2024, Glasgow, Scotland. Cham: Springer. 10.1007/978-3-031-56027-9_24
  3. Dijk,B. van, Kouwenhoven,T., Spruit,M., & Duijn,M. van (2023). Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding. Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 12641-12654). ACL. December 6-10, Singapore. aclanthology.org/2023.emnlp-main.779/

Invited talks (44)

  1. 21/03/2024: Natural language processing for enriching real world evidence from electronic health records: NLP @ Health Campus The Hague. Spring Symposium Young Epidemiologists, UMC Utrecht. [30 min] 2024 0321 spruit-haga.pdf
  2. 11/03/2024: Translational Data Science in Population Health: Data Techniques and Methodology for Violence as a Public Health Problem. KIEM Pressure Cooker Workshop, 11 March 2024. [10 min] 2024 0311 Kiem-pitch-tds-en.pdf
  3. 20/02/2024: Translational Data Science & AI: A case of Natural Language Processing for Violence Risk Assessment using CRISP-DM. Lorentz workshop Criminal Justice Settings, Crime, and Reintegration, Session on New insights from computer science and economics for the study of criminal justice involved individuals, Leiden. [30 min] www.lorentzcenter.nl 2024 0220 Lorentz spruit NLP.pdf
  4. 30/10/2023: Translational Data Science in Population Health: CRISP-DM Methodology in the TDS Lab. Amsterdam Public Health (APH) methodology workshop, Amsterdam. [45 min] 2023 1030 pitch-tds-en-crisp.pdf youtu.be/BhTLj2rdnPc
  5. 12/10/2023: ELAN-VIPP: Het ELAN Virtuele Patiënten en Populatie project - Onderweg naar een digitale tweeling mét ELAN? [ELAN-VIPP: The ELAN Virtual Patients and Population project - Towards a digital twin with ELAN?]. Nederlandstalig Platform Survey Onderzoek (NPSO) bijeenkomst Synthetische data [meeting on synthetic data], online. [35 min] 2023 1012 VIPP-NPSO.pdf
  6. 20/09/2023: Translationele gegevenswetenschap: een geval van natuurlijke taalverwerking in de geestelijke gezondheid [Translational data science: a case of natural language processing in mental health]. AG TechFest, Werkspoorkathedraal, Utrecht. [30 min] 2023 0920 mrspruit NLP mini.pdf

Postdocs & PhD candidates (13)

MSc students (97)

  1. Drougkas,George (in progress). Multimodal Machine Learning for Better Identification of Language Markers for Mental Health. Spruit,Marco, & Bakker,Erwin (UL).
  2. Rameshchandra,Ramya Tumkur (in progress). Unsupervised machine learning methods to understand the social and psychological effects of prescription opioids. Spruit,Marco, & Baratchi,Mitra (UL).
  3. Bianchi,Niccolo (in progress). Automated drug repurposing workflow for rare diseases. Spruit,Marco & Lefebvre,Armel (UL).
  4. Thiel,Haike van (committed).
  5. Tomassen,Floris (05/02/2024). LLM-Based Data Generation techniques for end-to-end models of grammatical error correction applied to Dutch Care Text. Spruit,Marco; Wijnholds,Gijs. (Prime Vision). [8.5]

BSc students (61)

  1. Leito, Roderick (in progress). Integration of the EQ5D PROM questionnaire into a natural and unobtrusive conversation using a RASA-driven chatbot. Spruit,Marco & Lefebvre,Armel (UL).
  2. Baghdasaryan, Ruzanna (in progress). Questionnaire-driven Dialogue: Utilizing Large Language Models for Hallucination-free Conversational AI in Elderly Well-being Monitoring. Spruit,Marco & Lefebvre,Armel (UL).
  3. Tanoesemito, Charma (est. 01/03/2024). Reconstructing family relationships using a routine primary care Electronic Health Record database. Life Sciences and Technology (LST) programme. Spruit,Marco; Beekman,Marian; Berg,Niels van den (MOLEPI). [8.0]
  4. Lelasseux, Maxine (05/02/2024). Analyzing offenses against life data: a machine learning approach on data extracted from the Human Relations Area Files (HRAF) database. Spruit,Marco; Liem,Marieke; Syme,Katharina (FGGA/ISGA). [6.5]

Leiden University committees (24)

  1. 2024-present: Self Steering Committee Member in UNA Europa, One Health Focus Area. .../una-europa-leiden/self-steering-committees
  2. 2024-present: Member ELAN Scientific Board.
  3. 2023-present: Member PHM Scientific Council.
  4. 2023-present: Member LIACS Scientific Council.
  5. 2023-present: Member PHEG Stuurgroep Studenten Onderwijs (SSO).
  6. 2022-present: Lead of ELAN implementation case in LUMC/Health-RI node.
  7. 2022-present: Member LUMC Student Research Award committee.
  8. 2021-present: Co-lead Special Interest Group Health Data Science (with profs. Kraaij & Fiocco).
  9. 2021-present: Member core team LUMC Clinical AI Implementation and Research Lab (CAIRELab).
  10. 2021-present: Member Advisory board of LUMC Research Facility Data Analytics.

Oppositions (19)

  1. M. Fragkiadakis (LIACS, 9/4/2024, secretary). Digital Tools for Sign Language Research: Towards Recognition and Comparison of Lexical Signs (prof M. Mous, P. van der Putten, V. Nyst).
  2. M. Lao (LIACS, 28/11/2023). Exploring Deep Learning for Multimodal Understanding (prof M. Lew, prof A. Plaat).
  3. R. Turner (ULEI/MI, 14/11/2023). Safe Anytime-Valid Inference: from Theory to Implementation in Psychiatry Research (prof P. Grünwald, prof F. Scheepers, A. Harma).
  4. K. van Mens (RUN/Psychiatry, 24/05/2023). Discovering insights with machine learning: Lessons learned from case studies in mental healthcare (prof B. Tiemens, prof R. Janssen, J. Lokkerbol, D. de Beurs).