OTrecod: An R Package for Data Fusion using Optimal Transportation Theory
Journal article, R Journal, Year: 2022

Abstract

Advances in information technology often confront users with large amounts of data that it is essential to be able to integrate easily. In this context, creating a single database from multiple separate data sources is an attractive but complex task when the same information of interest is stored in at least two distinct encodings. In this situation, merging the data sources consists of finding a common recoding scale to fill in the incomplete information in a synthetic database. The OTrecod package provides R users with two functions dedicated to solving this recoding problem using optimal transportation theory. Specific arguments of these functions enrich the algorithms by relaxing distributional constraints or adding a regularization term to make the data fusion more flexible. The OTrecod package also provides a set of support functions dedicated to the harmonization of separate data sources, the handling of incomplete information, and the selection of matching variables. This paper gives all the keys needed to quickly understand and master the original algorithms implemented in the OTrecod package, guiding the user step by step through a data fusion project.
Main file
OTRecod_Rpackage.pdf (707.44 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03827398, version 1 (24-10-2022)

Cite

Grégory Guernec, Valérie Garès, Jérémy Omer, Nicolas Savy, Philippe Saint-Pierre. OTrecod: An R Package for Data Fusion using Optimal Transportation Theory. R Journal, 2022, 14 (4), pp. 195-222. ⟨10.32614/RJ-2023-006⟩. ⟨hal-03827398⟩