6533b833fe1ef96bd129ba77
RESEARCH PRODUCT
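Wykorzystanie korpusów rosyjskojęzycznych newsów internetowych na potrzeby systemów automatycznego rozpoznawania mowy w obszarze monitoringu mediów [Using Russian-language Internet news corpora for automatic speech recognition systems in media monitoring]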
description
The author used open Internet news corpora (NewsRu and Taiga) to build N-gram language models for automatic speech recognition systems. The models were evaluated comprehensively: perplexity, word error rate (WER), proper-name recognition, and comparison with the base model and Google ASR. The author also rescored the N-gram models using recurrent neural networks. Model effectiveness was assessed on speech from the news channel Россия 24 (37 files with a total length of 1.5 hours). The test data were chosen to match the main goal of the article: speech recognition for media monitoring.
year | journal | country | edition | language |
---|---|---|---|---|
2022-01-01 | Przegląd Rusycystyczny | | | |