Poster communications

LRez: C++ API and toolkit for analyzing and managing Linked-Reads data

Abstract : Linked-Reads technologies, such as 10x Genomics, Haplotagging, stLFR and TELL-Seq, partition and tag high-molecular-weight DNA molecules with a barcode using a microfluidic device prior to classical short-read sequencing. Linked-Reads thus combine the high quality of short reads with long-range information, which can be inferred by using the barcodes to identify distant reads that belong to the same DNA molecule. This technology can therefore be employed efficiently in various applications, such as structural variant calling, genome assembly, phasing and scaffolding. To benefit from Linked-Reads data, most methods first map the reads against a reference genome, and then analyze the barcode contents of genomic regions, which often requires fetching all reads or alignments with a given barcode. However, although various tools and libraries are available for processing BAM files, to the best of our knowledge, no tool currently exists for managing Linked-Reads barcodes and supporting features such as indexing, querying, and comparison of barcode contents. LRez aims to address this issue by providing a complete and easy-to-use API and suite of tools that are directly compatible with various Linked-Reads sequencing technologies. LRez provides functionalities such as extracting, indexing and querying Linked-Reads barcodes in BAM, FASTQ, and gzipped FASTQ files (Table 1). The API is compiled as a shared library, which eases its integration into external projects. Moreover, all functionalities are implemented in a thread-safe fashion. Our experiments show that, on a 70 GB Haplotagging BAM file from Heliconius erato [1], index construction took an hour and resulted in an index occupying 11 GB of RAM. Using this index, querying time per barcode reached an average of 11 ms. In comparison, using a naive approach without a barcode-based index, querying time per barcode reached an hour.
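
To illustrate why a barcode-based index makes such a difference at query time, here is a minimal C++ sketch. It is not LRez's API or implementation: the names TaggedRecord, BarcodeIndex, buildBarcodeIndex, queryIndex and queryNaive are hypothetical, and the records are assumed to be already reduced to a barcode and a file offset held in memory. The index is built in one pass over the records, after which a query is a single hash lookup, whereas the naive approach rescans every record for each query.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record: one read or alignment, reduced to its barcode and
// the position at which it starts in the file (assumption for this sketch).
struct TaggedRecord {
    std::string barcode;   // barcode sequence; empty if the read has none
    std::int64_t offset;   // offset of the record in the file
};

// Hypothetical index type: barcode -> offsets of all records carrying it.
using BarcodeIndex = std::unordered_map<std::string, std::vector<std::int64_t>>;

// Build the index with a single pass over all records.
BarcodeIndex buildBarcodeIndex(const std::vector<TaggedRecord>& records) {
    BarcodeIndex index;
    for (const TaggedRecord& r : records) {
        if (!r.barcode.empty()) {
            index[r.barcode].push_back(r.offset);
        }
    }
    return index;
}

// Indexed query: one hash lookup, independent of the number of records.
std::vector<std::int64_t> queryIndex(const BarcodeIndex& index,
                                     const std::string& barcode) {
    auto it = index.find(barcode);
    if (it == index.end()) {
        return {};
    }
    return it->second;
}

// Naive query: scans every record each time a barcode is requested.
std::vector<std::int64_t> queryNaive(const std::vector<TaggedRecord>& records,
                                     const std::string& barcode) {
    std::vector<std::int64_t> hits;
    for (const TaggedRecord& r : records) {
        if (r.barcode == barcode) {
            hits.push_back(r.offset);
        }
    }
    return hits;
}
```

In this sketch the index costs memory proportional to the number of barcoded records, which mirrors the trade-off reported above between index size and per-barcode query time.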
Document type :
Poster communications

https://hal.inria.fr/hal-03441917
Contributor : Claire Lemaitre
Submitted on : Monday, November 22, 2021 - 8:45:27 PM
Last modification on : Thursday, November 25, 2021 - 3:13:15 AM

File

JOBIM2021_paper_55.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03441917, version 1

Citation

Pierre Morisse, Claire Lemaitre, Fabrice Legeai. LRez: C++ API and toolkit for analyzing and managing Linked-Reads data. JOBIM 2021, Jul 2021, Paris, France. ⟨hal-03441917⟩
