I am an associate professor and the director of education at the Leiden Institute of Advanced Computer Science (LIACS), the computer science institute of Leiden University. I lead the Explanatory Data Analysis group.
My primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and, ultimately, novel knowledge?
For this, it is important that methods and results are explainable to domain experts, who may not be data scientists. My signature approach is to define and identify patterns that matter, i.e., succinct descriptions that characterise relevant structure present in the data. Which patterns matter strongly depends on the data and task at hand; hence, defining the problem is one of the key challenges of exploratory data mining. Information-theoretic concepts such as the Minimum Description Length (MDL) principle have proven very useful to this end. I am also interested in interactive data mining, i.e., involving humans in the loop. Finally, I am interested in fundamental data mining research for real-world applications, both in science (e.g., life sciences, social sciences) and industry (e.g., manufacturing and engineering, aviation), as this is the best way to show that the theory works in practice.
I am affiliated with SAILS, the university-wide research programme for artificial intelligence. Broadly speaking, my research can be situated in the fields of data mining, machine learning, data science, and artificial intelligence (AI).
In press

Cross-Domain Graph Level Anomaly Detection. Transactions on Knowledge and Data Engineering, IEEE.

2024

Conditional Density Estimation with Histogram Trees. In: Proceedings of the Conference on Neural Information Processing Systems (NeurIPS 2024), 2024.

A Survey on Explainable Anomaly Detection. Transactions on Knowledge Discovery from Data vol.18(1), ACM, 2024.

Human-guided Rule Learning for ICU Readmission Risk Analysis. In: Proceedings of the Workshop on AI and Data Science for Healthcare (AIDSH) at KDD 2024, 2024.

Graph Neural Networks based Log Anomaly Detection and Explanation. In: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pp 306-307, ACM, 2024.

2023

Evaluating Cluster-Based Synthetic Data Generation for Blood-Transfusion Analysis. Journal of Cybersecurity and Privacy vol.3(4), pp 882-894, MDPI, 2023.

WEARDA: recording wearable sensor data for human activity monitoring. Journal of Open Research Software vol.11(1), 2023.

The added value of ferritin levels and genetic markers for the prediction of haemoglobin deferral. Vox Sanguinis vol.118(10), pp 825-834, 2023.

Explainable Contextual Anomaly Detection using Quantile Regression Forests. Data Mining and Knowledge Discovery, Springer.

Novel approach for phenotyping based on diverse top-k subgroup lists. In: Proceedings of the Conference on Artificial Intelligence In Medicine (AIME 2023), Springer, 2023.

Discovering Diverse Top-k Characteristic Lists. In: Proceedings of the 21st International Symposium on Intelligent Data Analysis (IDA 2023), Springer, 2023.

Discovering Rule Lists with Preferred Variables. In: Proceedings of the 21st International Symposium on Intelligent Data Analysis (IDA 2023), Springer, 2023.

Defining migraine days, based on longitudinal E-diary data. Cephalalgia.

Unsupervised Discretization by Two-dimensional MDL-based Histogram. Machine Learning, Springer.

Generating synthetic mixed discrete-continuous health records with mixed sum-product networks. Journal of the American Medical Informatics Association vol.30(1), Oxford University Press, 2023.