CLILLAC-ARP scientific output
Conference paper. Year: 2023

Methods for Phonetic Scraping of Youtube Videos

Abstract

This paper discusses two pipelines for the automatic collection of automatic speech recognition (ASR) transcripts and audio content from YouTube videos and subsequent phonetic analysis: PEASYV (Phonetic Extraction and Alignment of Subtitled YouTube Videos) and YTPP (YouTube Phonetics Pipeline). The pipelines differ somewhat in terms of processing steps as well as the tools used for forced alignment, but produce comparable results. The two pipelines may be useful for large-scale collection of acoustic data for phonetic analysis.
Main file
2023.icnlsp-1.25.pdf (1.82 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04547365, version 1 (15-04-2024)

License

Attribution - NonCommercial - ShareAlike

Identifiers

  • HAL Id: hal-04547365, version 1

Cite

Adrien Méli, Steven Coats, Nicolas Ballier. Methods for Phonetic Scraping of Youtube Videos. 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), Mourad Abbas, Abed Alhakim Freihat, Dec 2023, Trento, Italy. pp. 244-249. ⟨hal-04547365⟩