Image by Fares Hamouche

Record linkage

Combining data from various sources lets researchers explore new questions, for example in healthcare monitoring studies. However, the lack of a unique identifier often makes this difficult. Record linkage procedures determine whether pairs of observations collected on different occasions belong to the same individual.

Summary

Data is typically anonymized, for very good reasons, but this makes it difficult to link records of the same individual for research purposes.

An example comes from the field of gynecology. A team of researchers wanted to investigate mothers who had previously had a preterm delivery of twins. They wondered what happened at the next birth if the mother had more children: how likely was it that the next baby would also be delivered preterm? The answer can be found in the Netherlands Perinatal Registry, which contains data on almost all births in the Netherlands, but the data does not include a unique identifier for the mother. Record linkage methods enable researchers to link records of multiple births by the same mother. You can read the results from the study on preterm delivery here.

The general idea of record linkage is that records are matched based on characteristics recorded in the data, such as sex, age and zip code. However, the data may contain errors, or the information may have changed over time. For example, people tend to move, so the same person could have different zip codes across data sets. With FlexRL, we introduce a flexible method that can perform reliable record linkage while taking into account errors in, or changes to, the data.
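
To make the matching idea concrete, here is a minimal Python sketch. It is not the FlexRL method, only a simple rule-based illustration: fields assumed to be stable (sex, birth year) must agree, while a field that can change (zip code) only raises the score. The records, field names and the `agreement_score` and `link` helpers are all hypothetical and made up for this example.

```python
# Hypothetical toy records; field names and values are invented for illustration.
births_a = [
    {"id": "a1", "sex": "F", "birth_year": 1985, "zip": "1011"},
    {"id": "a2", "sex": "F", "birth_year": 1990, "zip": "3511"},
]
births_b = [
    {"id": "b1", "sex": "F", "birth_year": 1985, "zip": "1012"},  # same person, moved
    {"id": "b2", "sex": "F", "birth_year": 1978, "zip": "3511"},
]

# Assumption: stable fields must agree; unstable fields only add to the score.
STABLE = ("sex", "birth_year")
UNSTABLE = ("zip",)


def agreement_score(r1, r2):
    """Return None if a stable field disagrees, else the number of agreeing fields."""
    if any(r1[f] != r2[f] for f in STABLE):
        return None
    return len(STABLE) + sum(r1[f] == r2[f] for f in UNSTABLE)


def link(records_a, records_b):
    """For each record in A, keep the best-scoring compatible record in B, if any."""
    links = {}
    for r1 in records_a:
        scored = [(agreement_score(r1, r2), r2["id"]) for r2 in records_b]
        scored = [(s, rid) for s, rid in scored if s is not None]
        if scored:
            links[r1["id"]] = max(scored)[1]
    return links


links = link(births_a, births_b)
print(links)
```

In this toy data, record `a1` is linked to `b1` even though the zip code differs, while `a2` stays unlinked because no record in the second file shares its birth year. A rule like this cannot handle errors in the fields it treats as stable, which is the kind of limitation that motivates more flexible methods.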

Related papers

Robach, K., van der Pas, S., van de Wiel, M. & Hof, M.H. (2024). A flexible model for record linkage. arXiv preprint arXiv:2407.06835.
