Academic Profile

My academic drive is to connect practical problems in healthcare practice to fundamental challenges in data science, and then to address both simultaneously. This is in essence my Translational Data Science (TDS) research theme, which bridges the best of both worlds. Pasteur's Quadrant in Figure 1 visualises my drive to achieve a better fundamental understanding of the world around us by being societally inspired, demand-driven and solution-oriented.

In 2020 I was appointed Full Professor of Advanced Data Science in Population Health at both the Department of Public Health & Primary Care (PHEG) of the Leiden University Medical Center (LUMC) and the Leiden Institute of Advanced Computer Science (LIACS) at the Faculty of Science (FWN) of Leiden University (ULEI) to further pursue my vision of simultaneously translating novel data techniques to health innovations and implementing new insights from these novel applications into daily healthcare practices. On 1 April 2022 I delivered my inaugural lecture titled Translational Data Science in Population Health, in which I introduced TDS as an independent discipline embedded within the Dutch scientific landscape.

At the Health Campus The Hague I have started the TDS Lab. My strategic research objective is to establish an authoritative and open national infrastructure for Dutch health research, education and care to accelerate innovation and to democratise data science technologies, especially through natural language processing and automated machine learning. My research theme Translational Data Science in Population Health has three complementary research lines which together address the continuous knowledge discovery process as operationalised by the cross-industry standard process for data mining (CRISP-DM) as shown in the figure below:

Figure 1: Translational data science in Pasteur's Quadrant (on the left) combines basic data science understanding with applied data science use considerations. My Top-10 output items encompass the entire data science process (on the right).

Firstly, the Data Engineering research line investigates the further consolidation, standardisation and enrichment of the Extramural LUMC Academic Network (ELAN) data infrastructure which links structured medical, health, social domain, and socio-economic data of 1M+ inhabitants, extending my "FAIR Data in context ArchiTEcture" project [FeDerATE (UU/ITS) 200K]. I explore federated machine learning (FML) to enrich ELAN with inter-organisational linking of unstructured and multimodal data to further my "Computing Visits Data" programme [COVIDA (EWUU) 250K] [out #3]. I develop natural language processing (NLP) techniques for Dutch medical texts to extract diagnosis and treatment information from clinical notes, and mental health language markers from patient narratives [out #4]. I design clinical decision support systems (CDSS) that integrate guidelines and taxonomies to bootstrap machine learning (ML) innovations, e.g. the STRIP Assistant for optimising medication reviews in polypharmacy patients [out #8].

Secondly, the Data Analytics research line investigates the suitability of NLP and ML techniques for answering translational research questions. I democratise data science by utilising automated machine learning (AutoML) technology [out #5], synthetic data generation [INSAFEDARE (HE) 570K; SENSYN (NWO) 50K] and unsupervised topic discovery [SAF21 (H2020) 250K]. In my "Psychiatry Research Analytics InfraStructure" project [PRAISE (UMCU/Psychiatry) 200K] on clinical NLP we report on a deep learning-based prediction model for assessing inpatient violence risk using clinical notes [out #6]. Recently, we introduced the first-ever gender bias exploration and mitigation in an ML model trained on real clinical psychiatry data [out #7].

Finally, the e-Health Implementation research line designs and implements data science interventions as e-health solutions in The Hague region. In early BeHapp work I supervised the development of a sociability score metric to continuously monitor psychiatric patients [out #2]. In three Randomised Controlled Trials in the Netherlands, Switzerland, Ireland and Belgium, I led the work packages on the STRIP Assistant as the intervention instrument [OPERAM (H2020) 250K; STRIMP (ZonMW) 110K; OPTICA (SNF) 25K] [out #8]. Through the GEIGER and SMESEC projects [(H2020) 300K; (H2020) 280K] we established a sustainable ecosystem with integrated training, software tooling and user community for digitally-dependent professionals [out #9]. Core Life Analytics BV received 1M+ venture capital to commercialise our big data analytics research [CESCA (UMCU/CSC) 100K].

In recent years I have become increasingly visible internationally as a leader in health data science through editorships at the journals on Healthcare Analytics (Elsevier), Digital Public Health (Frontiers), Semantic Web & Information Systems (IGI), and Computer Information Systems (T&F), in addition to various programme committee memberships of leading conferences such as AIME, ICIS, ECIS and WWW.

Until 2020 I worked as an assistant/associate professor in the Information Systems and NLP research groups at Utrecht University's Computer Science department, where I developed my Applied Data Science (ADS) research theme. In 2015 I launched the ADS Lab with a specific focus on healthcare innovations. I authoritatively defined ADS as an independent research discipline [out #1]. My current TDS research theme can be considered a deeper and theoretically grounded improvement over ADS. Until 2007 I worked as a PhD researcher in language data science at the University of Amsterdam (UvA). I introduced a novel association rule mining technique, received an Association for Literary and Linguistic Computing (ALLC) bursary award in 2005, and was an invited researcher at the Università di Trieste. Before 2003 I worked in industry for ten years as an NLP/Big Data software engineer at ZyLAB Europe and the Dutch Royal Navy, among others.


Between 2013 and 2023 I participated in leadership programmes at Utrecht University and Leiden University to further develop my leadership capabilities. In 2014 I completed the Educational Leadership programme, while leading the Information Science CUrriculum REvision (CURE) as Education Manager. During 2015-2016 I contributed to the university-wide data science strategy, designed and led the ADS postgraduate programme, and co-designed and managed the ADS master's profile for several years. In 2017 I was awarded the Senior Teaching Qualification. In 2022 I proposed the data science specialisation in LUMC's Population Health Management programme.

In 2018, during the Academic Leadership programme, my colleagues characterised me as being creative, positive, witty, dauntless, and motivating. Between 2017 and 2020 I represented UU in the Data Science Platform Netherlands (DSPN). I was the data science expert in the National Science Agenda (NWA) Taskforce on Prevention [out #10]. In the "NWO Round Table Session on Health", I urged a reboot of the Dutch health infrastructure. In 2019 I was awarded the Senior Research Qualification and Ius Promovendi after completing the Research Leadership programme.

To date, 10 PhD students have completed their dissertations and are furthering their careers as assistant professor, lecturer, postdoc, data scientist, or data manager in academia or industry. My TDS Lab currently consists of 1 assistant professor, 2 postdocs, 6 full-time PhD students and 4 external part-time PhD students. In 2023 I completed the LUMC Leadership for Higher Management programme to revisit my leadership vision and capabilities.

Over the years a hybrid Coaching/Leading-by-Example leadership style crystallised in which I empower my Lab members with continuous motivational support and by stimulating creativity through blue-sky thinking, always ensuring their perceived research ownership. The hybrid TDS Lab has been meeting monthly since 2016 to catch up socially and academically, in addition to optional participation in UL's SIG Health Data Science seminars, which I co-lead. We also organise both periodic and ad-hoc individual meetings. Additionally, I prioritise timely written feedback. Finally, I am proud to observe that long-lasting friendships have developed in my Lab.

Top-10 output items

  1. Spruit,M., & Lytras,M. (2018). Applied Data Science in Patient-centric Healthcare: Adaptive Analytic Systems for Empowering Physicians and Patients. Telematics and Informatics, 35(4), Patient Centric Healthcare, 643-653. 10.1016/j.tele.2018.04.002
  2. Eskes,P., Spruit,M., Brinkkemper,S., Vorstman,J., & Kas,M. (2016). The Sociability Score: App-based Social Profiling from a Healthcare Perspective. Computers in Human Behavior, 59, 39-48. 10.1016/j.chb.2016.01.024
  3. Borger,T., Mosteiro,P., Kaya,H., Rijcken,E., Salah,A., Scheepers,F., & Spruit,M. (2022). Federated Learning for Violence Incident Prediction in a Simulated Cross-institutional Psychiatric Setting. Expert Systems with Applications, 199, 116720. 10.1016/j.eswa.2022.116720
  4. Spruit,M., Verkleij,S., Schepper,C. de, & Scheepers,F. (2022). Exploring Language Markers of Mental Health in Psychiatric Stories. Applied Sciences, 12(4), Current Approaches and Applications in Natural Language Processing, 2179. 10.3390/app12042179
  5. Ooms,R., & Spruit,M. (2020). Self-Service Data Science in Healthcare with Automated Machine Learning. Applied Sciences, 10(9), Medical Artificial Intelligence, 2992. 10.3390/app10092992
  6. Menger,V., Spruit,M., Est,R. van, Nap,E., & Scheepers,F. (2019). Machine Learning Approach to Inpatient Violence Risk Assessment Using Routinely Collected Clinical Notes in Electronic Health Records. JAMA Network Open, 2(7), e196709. 10.1001/jamanetworkopen.2019.6709
  7. Mosteiro,P., Kuiper,J., Masthoff,J., Scheepers,F., & Spruit,M. (2022). Bias Discovery in Machine Learning Models for Mental Health. Information, 13(5), Advances in Explainable Artificial Intelligence, 237. 10.3390/info13050237
  8. Blum,M., Sallevelt,B., Spinewine,A., O'Mahony,D., Moutzouri,E., Feller,M., Baumgartner,C., Roumet,M., Jungo,K., Schwab,N., Bretagne,L., Beglinger,S., Aubert,C., Wilting,I., Thevelin,S., Murphy,K., Huibers,C., Drenth-van Maanen,C., Boland,B., Crowley,E., Eichenberger,A., Meulendijk,M., Jennings,E., Adam,L., Roos,M., Gleeson,L., Shen,Z., Marien,S., Meinders,A., Baretella,O., Netzer,S., Montmollin,M., Fournier,A., Mouzon,A., O'Mahony,C., Aujesky,D., Mavridis,D., Byrne,S., Jansen,P., Schwenkglenks,M., Spruit,M., Dalleur,O., Knol,W., Trelle,S., & Rodondi,N. (2021). Optimizing Therapy to Prevent Avoidable Hospital Admissions in Multimorbid Older Adults (OPERAM): Cluster Randomised Controlled Trial. BMJ, 374(n1585). 10.1136/bmj.n1585
  9. CyberGEIGER GmbH. I am one of the six founders of CyberGEIGER, which provides integrated training, tooling and user community for non-IT professionals such as physicians, accountants, students and startups to assess, plan and support data protection throughout Europe.
  10. Taskforce Preventie (2018). Kennisagenda Preventie. Nationale Wetenschapsagenda route Gezondheidszorgonderzoek, preventie en behandeling. NFU-18.2849. 2020-08/18.2849_NFU