Journal article, Computational Statistics and Data Analysis, Year: 2023

Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system

Abstract

Probabilistic record linkage is the process of combining data from different sources when such data refer to common entities and identifying information is not available. Fellegi and Sunter proposed a probabilistic record linkage framework that takes into account multiple pieces of non-identifying information, but it is limited to simple binary comparisons of matching variables. In our work, we propose an extension of this model for mixed-type comparison vectors. We develop a mixture model for handling comparison values of low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables. The parameters are estimated by means of the Expectation Conditional Maximization (ECM) algorithm. Through a Monte Carlo simulation study, we evaluate both the estimation of the posterior probability that a record pair is a match and the prediction of matched record pairs. The simulation results indicate that the proposed methods outperform existing ones in most of the cases considered. The proposed methods are applied to a real dataset to perform linkage between a registry of patients suffering from venous thromboembolism in the Brest district area (GETBO) and the French national health information system (SNDS).
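
For context, the classical binary-comparison Fellegi-Sunter model that the abstract takes as its starting point can be sketched as below. This is a minimal illustration and not the authors' method: it covers only binary agreement indicators, assumes conditional independence of the fields given match status, and uses plain EM rather than the ECM algorithm with categorical and hurdle gamma components described above. The function names `posterior_match` and `fit_em`, the starting values, and the clipping constant are all hypothetical choices made for the example.

```python
def posterior_match(gamma, m, u, p):
    """Posterior probability that a record pair is a match.

    gamma: tuple of 0/1 agreement indicators, one per matching variable
    m[k]:  P(field k agrees | pair is a match)
    u[k]:  P(field k agrees | pair is a non-match)
    p:     prior proportion of matches among all compared pairs
    Fields are assumed conditionally independent given match status.
    """
    like_match = p
    like_nonmatch = 1.0 - p
    for g, mk, uk in zip(gamma, m, u):
        like_match *= mk if g else 1.0 - mk
        like_nonmatch *= uk if g else 1.0 - uk
    return like_match / (like_match + like_nonmatch)


def _clip(x, eps=1e-6):
    # Keep probabilities strictly inside (0, 1) so likelihoods never vanish.
    return min(max(x, eps), 1.0 - eps)


def fit_em(pairs, n_fields, n_iter=50, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate (p, m, u) from unlabeled binary comparison vectors by EM."""
    m = [m_init] * n_fields
    u = [u_init] * n_fields
    n = len(pairs)
    for _ in range(n_iter):
        # E-step: posterior match probability of every pair.
        w = [posterior_match(g, m, u, p) for g in pairs]
        total_w = sum(w)
        # M-step: weighted agreement frequencies in each latent class.
        p = _clip(total_w / n)
        for k in range(n_fields):
            agree_match = sum(wi * g[k] for wi, g in zip(w, pairs))
            agree_nonmatch = sum((1.0 - wi) * g[k] for wi, g in zip(w, pairs))
            m[k] = _clip(agree_match / total_w)
            u[k] = _clip(agree_nonmatch / (n - total_w))
    return p, m, u
```

Given a list of comparison vectors, `fit_em(pairs, n_fields)` returns estimates that can be passed back to `posterior_match` to score each pair; pairs whose posterior exceeds a chosen threshold would then be declared matches. The extension described in the abstract replaces the per-field Bernoulli terms with richer component distributions for categorical and continuous comparison values.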
Main file
preprint_CSDA_Record linkage.pdf (558.57 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03290773, version 1 (19-07-2021)
hal-03290773, version 2 (08-11-2022)

Cite

Thanh Huan Vo, Guillaume Chauvet, André Happe, Emmanuel Oger, Stephane Paquelet, et al. Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system. Computational Statistics and Data Analysis, 2023, 179 (article no. 107656), ⟨10.1016/j.csda.2022.107656⟩. ⟨hal-03290773v2⟩