Journal articles

Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system

Abstract: Probabilistic record linkage is the process of combining data from different sources when those data refer to common entities but no unique identifier is available. Fellegi and Sunter proposed a probabilistic record linkage framework that takes into account several pieces of non-identifying information, but it is limited to simple binary comparisons between matching variables. In our work, we propose an extension of this model to mixed-type comparison vectors. We develop a mixture model for handling comparison values of low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables. The parameters are estimated by means of the Expectation Conditional Maximization (ECM) algorithm. Through a Monte Carlo simulation study, we evaluate both the estimation of the posterior probability that a record pair is a match and the prediction of matched record pairs. The simulation results indicate that the proposed methods outperform existing ones in most of the cases considered. The proposed methods are applied to a real dataset to link a registry of patients suffering from venous thromboembolism in the Brest district area (GETBO) with the French national health data system (SNDS).
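For reference, the binary Fellegi-Sunter baseline that the abstract extends can be sketched as a two-component mixture: each record pair is either a match (prevalence p) or a non-match, and each matching variable contributes a Bernoulli agreement indicator whose agreement probability is m_k among matches and u_k among non-matches, assumed conditionally independent given match status. The Python sketch below fits that binary model with a plain EM loop and returns the posterior match probability of every pair. It is illustrative only: the function name `fs_em`, the starting values, the clamping constant and the fixed iteration count are assumptions of this sketch, and it is not the authors' mixed-type method, which replaces the Bernoulli components with categorical mixtures and hurdle gamma distributions and is estimated with ECM rather than plain EM.

```python
import math
from typing import List, Tuple

EPS = 1e-6  # assumption: keeps probabilities away from 0 and 1 so log() stays finite


def _clamp(x: float) -> float:
    return min(max(x, EPS), 1.0 - EPS)


def fs_em(
    gammas: List[List[int]], p: float = 0.1, n_iter: int = 200
) -> Tuple[float, List[float], List[float], List[float]]:
    """Hypothetical EM fit of the binary Fellegi-Sunter mixture model.

    gammas[j][k] is 1 if record pair j agrees on matching variable k, else 0.
    Returns (p, m, u, g): match prevalence, per-variable m- and u-probabilities,
    and the posterior match probability g[j] of every record pair.
    """
    n, K = len(gammas), len(gammas[0])
    m = [0.9] * K  # initial guess for P(agree | match)
    u = [0.1] * K  # initial guess for P(agree | non-match)
    g: List[float] = []
    for _ in range(n_iter):
        # E-step: posterior P(match | gamma_j), computed in log space for stability.
        g = []
        for vec in gammas:
            log_m, log_u = math.log(p), math.log(1.0 - p)
            for k, agree in enumerate(vec):
                log_m += math.log(m[k] if agree else 1.0 - m[k])
                log_u += math.log(u[k] if agree else 1.0 - u[k])
            top = max(log_m, log_u)
            w_m, w_u = math.exp(log_m - top), math.exp(log_u - top)
            g.append(w_m / (w_m + w_u))
        # M-step: posterior-weighted proportions give closed-form updates.
        s_m = sum(g)
        s_u = n - s_m
        p = _clamp(s_m / n)
        m = [_clamp(sum(gj * v[k] for gj, v in zip(g, gammas)) / max(s_m, EPS))
             for k in range(K)]
        u = [_clamp(sum((1.0 - gj) * v[k] for gj, v in zip(g, gammas)) / max(s_u, EPS))
             for k in range(K)]
    return p, m, u, g


if __name__ == "__main__":
    # Toy comparison vectors over three binary matching variables.
    data = [[1, 1, 1]] * 4 + [[0, 0, 0]] * 12 + [[1, 0, 0]] * 2 + [[0, 1, 0]] * 2
    p, m, u, g = fs_em(data)
    print(round(p, 3), [round(x, 3) for x in g[:5]])
```

On this toy input the four all-agree pairs should end up with posteriors close to 1 and the estimated prevalence near 4/20. The closed-form M-step is what makes plain EM convenient here; with continuous comparison values such as a gamma component, some parameters no longer have closed-form updates, which is the setting where conditional maximization steps become useful.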

https://hal.archives-ouvertes.fr/hal-03290773
Contributor : Thanh Huan VO
Submitted on : Tuesday, November 8, 2022 - 6:57:30 AM
Last modification on : Saturday, November 26, 2022 - 6:11:41 AM

File

preprint_CSDA_Record linkage.p...
Files produced by the author(s)


Citation

Thanh Huan Vo, Guillaume Chauvet, André Happe, Emmanuel Oger, Stephane Paquelet, et al. Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system. Computational Statistics and Data Analysis, 2023, 179 (article n° 107656), ⟨10.1016/j.csda.2022.107656⟩. ⟨hal-03290773v2⟩


Metrics

Record views: 224

File downloads: 186