I am an associate professor and group leader of the Explanatory Data Analysis group at the Leiden Institute of Advanced Computer Science (LIACS), the computer science institute of Leiden University. My primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and, ultimately, novel knowledge?
For this, it is important that methods and results are explainable to domain experts, who may not be data scientists. My signature approach is to define and identify patterns that matter, i.e., succinct descriptions that characterise relevant structure present in the data. Which patterns matter depends strongly on the data and task at hand; hence, defining the problem is one of the key challenges of exploratory data mining. Information-theoretic concepts such as the Minimum Description Length (MDL) principle have proven very useful to this end. I am also interested in interactive data mining, i.e., involving humans in the loop. Finally, I am interested in fundamental data mining research for real-world applications, in both science (e.g., life sciences, social sciences) and industry (e.g., manufacturing and engineering, aviation), as this is the best way to show that the theory works in practice.
I am affiliated with SAILS and DSRP, the university-wide research programmes for artificial intelligence and data science, respectively. Broadly speaking, my research can be situated in the fields of data mining, machine learning, data science, and artificial intelligence (AI).
In press

Robust subgroup discovery. Data Mining and Knowledge Discovery

2022

Feature Selection for Fault Detection and Prediction based on Log Analysis. In: Proceedings of the international workshop on AI for Manufacturing Workshop at ECMLPKDD 2022, 2022.

Histogram-based Probabilistic Rule Lists for Numeric Targets. In: Proceedings of the international workshop on Knowledge Discovery in Inductive Databases (KDID 2022) at ECMLPKDD 2022, 2022.

Truly Unordered Probabilistic Rule Sets for Multi-class Classification. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2022), Springer, 2022.

Finding Efficient Trade-offs in Multi-Fidelity Response Surface Modeling. Engineering Optimization

Probabilistic Rule Sets Ready for Interactive Machine Learning. In: AAAI'22-Workshop on Interactive Machine Learning, 2022.

Associations between symptoms, donor characteristics and IgG antibody response in 2082 COVID-19 convalescent plasma donors. Frontiers in Immunology, Frontiers

2021

Evaluating privacy of individuals in medical data. Health Informatics Journal, SAGE Publications

Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multi-Dimensional Adaptive Histograms. In: Proceedings of the SIAM Conference on Data Mining 2021 (SDM'21), SIAM, 2021.

Online Summarization of Dynamic Graphs using Subjective Interestingness for Sequential Data. Data Mining and Knowledge Discovery vol.35(1), pp 88-126, 2021. (ECML PKDD journal track)