LRez: C++ API and toolkit for analyzing and managing Linked-Reads data - Irisa Accéder directement au contenu
Poster De Conférence Année : 2021

LRez: C++ API and toolkit for analyzing and managing Linked-Reads data

Résumé

Linked-Reads technologies, such as 10x Genomics, Haplotagging, stLFR and TELL-Seq, partition and tag high-molecular-weight DNA molecules with a barcode using a microfluidic device prior to classical short-read sequencing. This way, Linked-Reads manage to combine the high-quality of the short reads and a long-range information which can be inferred by identifying distant reads belonging to the same DNA molecule with the help of the barcodes. This technology can thus efficiently be employed in various applications, such as structural variant calling, but also genome assembly, phasing and scaffoling. To benefit from Linked-Reads data, most methods first map the reads against a reference genome, and then rely on the analysis of the barcode contents of genomic regions, often requiring to fetch all reads or alignments with a given barcode. However, despite the fact that various tools and libraries are available for processing BAM files, to the best of our knowledge, no such tool currently exists for managing Linked-Reads barcodes, and allowing features such as indexing, querying, and comparisons of barcode contents. LRez aims to address this issue, by providing a complete and easy to use API and suite of tools which are directly compatible with various Linked-Reads sequencing technologies. LRez provides various functionalities such as extracting, indexing and querying Linked-Reads barcodes, in BAM, FASTQ, and gzipped FASTQ files (Table 1). The API is compiled as a shared library, helping its integration to external projects. Moreover, all functionalities are implemented in a thread-safe fashion. Our experiments show that, on a 70 GB Haplotagging BAM file from Heliconius erato [1], index construction took an hour, and resulted in an index occupying 11 GB of RAM. Using this index, querying time per barcode reached an average of 11 ms. In comparison, using a naive approach without a barcode-based index, querying time per barcode reached an hour.
Fichier principal
Vignette du fichier
JOBIM2021_paper_55.pdf (117.46 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-03441917 , version 1 (22-11-2021)

Identifiants

  • HAL Id : hal-03441917 , version 1

Citer

Pierre Morisse, Claire Lemaitre, Fabrice Legeai. LRez: C++ API and toolkit for analyzing and managing Linked-Reads data. JOBIM 2021 - Journées Ouvertes en Biologie, Informatique et Mathématiques, Jul 2021, Paris, France. pp.1. ⟨hal-03441917⟩
35 Consultations
49 Téléchargements

Partager

Gmail Facebook X LinkedIn More