Dear students, since the LIACS matching meetup in October 2024, we have already teamed up with 7 new master students for our thesis topics, so we cannot take on more new students in the coming months. Check back in 2025 for more opportunities! Cheers, Marco
[ML] Balanced and balancing distance measures for mixed variable types
Many AI, ML and data science methods depend on the notion of a distance, which often acts as a dissimilarity measure between observations in the data set. Real-world data sets typically combine variables of various types, e.g. continuous, ordinal, nominal/categorical and binary, within one data set. In such cases, dissimilarity is almost always measured using Gower's distance. It min-max-scales numeric variables, and assigns a distance to non-numeric variables of 1 if the values are unequal and 0 if they are equal. The per-variable distances are then simply added, as in the Manhattan distance measure. The implication is that distances are dominated by the categorical dimensions: a non-zero categorical distance equals the largest possible distance in a numeric dimension, whereas the numeric distances themselves will typically be smaller. Also, average distances per dimension are not equalized (not even if the dimensions themselves are normalized or standardized first), and are dominated by imbalanced columns. This project will develop a balanced version of Gower's distance that makes the contribution of every feature equal on average, while leaving open the possibility to re-weight the contribution of features. The resulting distance measure will be used for risk stratification of people with metabolic syndrome on a large-scale data warehouse with health, demographic and socio-economic data, but is expected to find widespread use in distance-based machine learning tasks on heterogeneous data.
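To make the imbalance concrete, here is a minimal Python sketch of the Gower-style distance as described above, plus a helper that reports the average contribution of each variable over all pairs of records. The function names, the `numeric_ranges` argument and the toy records are assumptions made for illustration only. Note that Gower's coefficient is commonly averaged over the variables rather than summed; the sum is used here to follow the description above. This is not the balanced measure the project aims to develop.

```python
from itertools import combinations


def feature_contributions(a, b, numeric_ranges):
    """Per-variable distances between records a and b.

    numeric_ranges maps the index of each numeric column to its (min, max);
    every other column is treated as categorical (0 if equal, 1 otherwise).
    """
    contributions = []
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            lo, hi = numeric_ranges[i]
            span = hi - lo
            # Min-max scaling: the largest possible numeric distance is 1.
            contributions.append(abs(x - y) / span if span > 0 else 0.0)
        else:
            contributions.append(0.0 if x == y else 1.0)
    return contributions


def gower_distance(a, b, numeric_ranges):
    """Sum of the per-variable distances (Manhattan-style aggregation)."""
    return sum(feature_contributions(a, b, numeric_ranges))


def mean_contributions(records, numeric_ranges):
    """Average contribution of each variable over all pairs of records."""
    pairs = list(combinations(records, 2))
    totals = [0.0] * len(records[0])
    for a, b in pairs:
        for i, c in enumerate(feature_contributions(a, b, numeric_ranges)):
            totals[i] += c
    return [t / len(pairs) for t in totals]


# Toy records (made up): one numeric column and one categorical column.
records = [(20, "a"), (30, "b"), (40, "a"), (60, "b")]
ranges = {0: (20, 60)}
print(gower_distance(records[0], records[1], ranges))
print(mean_contributions(records, ranges))
```

In this toy example the categorical column contributes more on average than the numeric one, even though both are bounded by 1. Equalizing these per-variable averages is the kind of balance the project description refers to.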
Daily supervisor: Marcel Haas (LUMC), Marco Spruit
[ML] MDL-based association rule mining on ELAN data
Further the research in MSc thesis by
Since the 1990s, there has been a rapid increase in overuse, abuse, and overdose deaths associated with prescription opioids, along with significant medical, social, psychological, demographic, and economic consequences. Social and psychological effects are of particular interest because they extend beyond individual addiction to impact families, communities, and social systems, leading to issues such as mental health disorders, social isolation, and economic hardship. In this work, association rule mining is applied to ATC codes, ICPC codes, and patient demographics to uncover interesting relationships. Specifically, the Apriori and FP-growth algorithms were used to find frequent itemsets, from which association rules were derived.
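As an illustration of the frequent-itemset step, the following is a minimal Apriori-style sketch in plain Python. It is not the implementation used in the thesis. The function names, the thresholds and the item labels (such as "ATC:N02A") are placeholders assumed for this example, and the transactions are made up.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for subset in combinations(itemset, r):
                antecedent = frozenset(subset)
                # Subsets of a frequent itemset are frequent, so the lookup is safe.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Made-up transactions: one set of codes and demographics per patient.
transactions = [
    {"ATC:N02A", "ICPC:P76", "age:65+"},
    {"ATC:N02A", "ICPC:P76"},
    {"ATC:N02A", "age:65+"},
    {"ICPC:P76"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
for antecedent, consequent, support, confidence in association_rules(itemsets, 0.8):
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

FP-growth finds the same frequent itemsets without generating candidates level by level, which usually matters once the number of distinct codes grows large.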
Daily supervisor: Marco Spruit, t.b.d.
[NLP] Extracting Adverse Drug Reactions from SmPC Using Large Language Models
Background
Previous research has demonstrated the effectiveness of natural language processing techniques in extracting adverse drug reactions (ADRs) from Summary of Product Characteristics (SmPC) documents. However, the potential of large language models (LLMs) for this task remains unexplored.
Objective
To develop and evaluate a method using large language models to automatically extract adverse drug reactions from SmPC documents, comparing its performance to previous NLP approaches.
Methods
Expected Outcomes
Significance
This study will explore whether LLMs can improve the accuracy and efficiency of ADR extraction from SmPC documents, which could in turn enhance pharmacovigilance and drug safety monitoring processes.