MarkedBERT: Integrating Traditional IR Cues in Pre-trained Language Models for Passage Retrieval

Conference paper, 2020

Abstract

The Information Retrieval (IR) community has witnessed a flourishing development of deep neural networks; however, only a few managed to beat strong baselines. Among them, models like DRMM and DUET were able to achieve better results thanks to their proper handling of exact match signals. Nowadays, the application of pre-trained language models to IR tasks has achieved impressive results exceeding all previous work. In this paper, we assume that established IR cues like exact term matching, proven to be valuable for deep neural models, can be used to augment the direct supervision from labeled data for training these pre-trained models. To study the effectiveness of this assumption, we propose MarkedBERT, a modified version of BERT, one of the most popular models pre-trained via language modeling tasks. MarkedBERT integrates exact match signals using a marking technique that locates and highlights exact-matched query-document terms using marker tokens. Experiments on the MS MARCO Passage Ranking task show that our rather simple approach is actually effective. We find that augmenting the input with marker tokens allows the model to focus on valuable text sequences for IR.
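To make the marking technique concrete, below is a minimal Python sketch of one plausible implementation. The marker format [e1] ... [/e1] and the helper name mark_exact_matches are illustrative assumptions; the paper's exact marker vocabulary and matching rules (e.g., casing or stemming) may differ.

import re

def mark_exact_matches(query: str, passage: str):
    """Wrap terms occurring in both query and passage with indexed
    marker tokens (illustrative format: [e1] term [/e1])."""
    def tokenize(s):
        return re.findall(r"\w+", s.lower())

    # Unique query terms that also occur in the passage, in query order.
    passage_terms = set(tokenize(passage))
    shared = []
    for t in tokenize(query):
        if t in passage_terms and t not in shared:
            shared.append(t)

    def mark(text):
        for i, term in enumerate(shared, start=1):
            # Case-insensitive, whole-word wrapping of each shared term.
            text = re.sub(rf"\b{re.escape(term)}\b",
                          lambda m: f"[e{i}] {m.group(0)} [/e{i}]",
                          text, flags=re.IGNORECASE)
        return text

    return mark(query), mark(passage)

q, p = mark_exact_matches(
    "what causes tides",
    "Tides are caused by the gravitational pull of the moon.")
print(q)  # what causes [e1] tides [/e1]
print(p)  # [e1] Tides [/e1] are caused by the gravitational pull of the moon.

The marked query-passage pair would then be packed into the usual BERT sentence-pair input ([CLS] query [SEP] passage [SEP]), with the marker tokens added to the tokenizer vocabulary so they stay single tokens instead of being split into subwords. Note that exact matching deliberately ignores morphology here: "causes" and "caused" do not match.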
Main file

sigir_2020_markedbert.pdf (472.37 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03011890, version 1 (18-11-2020)

Identifiers

HAL Id: hal-03011890

Cite

Lila Boualili, Jose G. Moreno, Mohand Boughanem. MarkedBERT: Integrating Traditional IR Cues in Pre-trained Language Models for Passage Retrieval. 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), Jul 2020, Virtual Event, China. pp. 1977-1980. ⟨10.1145/3397271.3401194⟩. ⟨hal-03011890⟩