Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k-mer Hashing

Enrico Petrucci; Laurent Noé; Cinzia Pizzi; Matteo Comin

doi:10.1007/978-3-030-20242-2_18

Conference paper, Year: 2019


Enrico Petrucci

Role: Author

Department of Information Engineering [Padova]

Laurent Noé

Role: Author
PersonId : 85
IdHAL : noe
ORCID : 0000-0002-1170-8376
IdRef : 093601948

Université de Lille, Sciences et Technologies

Université de Lille

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Cinzia Pizzi

Role: Author

Department of Information Engineering [Padova]

Matteo Comin

Role: Author

Department of Information Engineering [Padova]

Abstract

Alignment-free classification of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Much work has been done to speed up the indexing of k-mers through hash tables and other data structures. These efforts have led to very fast indexes, but because they are k-mer based, they often lack sensitivity in the presence of sequencing errors or polymorphisms. Spaced seeds are a special type of pattern that accounts for errors or mutations. They improve sensitivity and are now routinely used instead of k-mers in many applications. The major drawback of spaced seeds is that they cannot be hashed efficiently, so their use substantially increases computation time. In this paper we address the problem of efficient spaced seed hashing. We propose an iterative algorithm that combines multiple spaced seed hashes, exploiting the similarity of adjacent hash values to compute the next hash efficiently. We report a series of experiments on hashing HTS reads with several spaced seeds. Our algorithm computes the hash values of spaced seeds with a speedup of 6.2x, outperforming previous methods. Software and datasets are available at ISSH.
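To make the gap named in the title concrete, the following minimal sketch contrasts a rolling hash over contiguous k-mers with a naive hash over a spaced seed. It is not the ISSH algorithm from the paper. The 2-bit nucleotide encoding, the `'1'`/`'0'` seed notation, and the function names `kmer_hashes` and `spaced_seed_hashes` are assumptions chosen for illustration.

```python
# Assumed 2-bit nucleotide encoding (illustrative; not taken from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hashes(seq, k):
    """Rolling hash of contiguous k-mers.

    Each new hash is derived from the previous one by shifting in a single
    symbol, so the cost per position is constant.
    """
    mask = (1 << (2 * k)) - 1  # keep only the last k symbols (2 bits each)
    h = 0
    out = []
    for i, ch in enumerate(seq):
        h = ((h << 2) | ENCODE[ch]) & mask
        if i >= k - 1:
            out.append(h)
    return out


def spaced_seed_hashes(seq, seed):
    """Naive spaced seed hashing.

    `seed` is a string of '1' (position is used) and '0' (position is
    ignored). Every hash is recomputed from scratch, so the cost per
    position grows with the number of '1' symbols in the seed.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    out = []
    for i in range(len(seq) - len(seed) + 1):
        h = 0
        for j in care:
            h = (h << 2) | ENCODE[seq[i + j]]
        out.append(h)
    return out


if __name__ == "__main__":
    read = "ACGTAC"
    print(kmer_hashes(read, 3))
    print(spaced_seed_hashes(read, "101"))
```

With a seed made only of `'1'` symbols the two functions return the same values, but the rolling update no longer applies once the seed contains `'0'` positions. That per-position recomputation is the overhead the abstract refers to.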

Keywords

Efficient hashing; Gapped q-gram; Spaced seeds; k-mers

Domains

Bioinformatics [q-bio.QM]

Main file

ISSH_Camera.pdf (382.78 KB)

Origin: Explicit agreement for this deposit

Contributor: Laurent Noé

https://hal.science/hal-02146404

Submitted on: Wednesday, June 5, 2019, 12:04:42

Last modified on: Wednesday, January 24, 2024, 09:54:22

Dates and versions

hal-02146404, version 1 (05-06-2019)

Identifiers

HAL Id: hal-02146404, version 1
DOI: 10.1007/978-3-030-20242-2_18

Cite

Enrico Petrucci, Laurent Noé, Cinzia Pizzi, Matteo Comin. Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k-mer Hashing. 15th International Symposium on Bioinformatics Research and Applications (ISBRA), Jun 2019, Barcelona, Spain. pp.208-219, ⟨10.1007/978-3-030-20242-2_18⟩. ⟨hal-02146404⟩


Collections

CNRS CRISTAL CRISTAL-BONSAI UNIV-LILLE

155 views

246 downloads
