Title: Comparison of Czech Transformers on Text Classification Tasks
Authors: Lehečka, Jan
Švec, Jan
Citation: LEHEČKA, J., ŠVEC, J. Comparison of Czech Transformers on Text Classification Tasks. In Statistical Language and Speech Processing, SLSP 2021. Cham: Springer, 2021. pp. 27-37. ISBN: 978-3-030-89578-5, ISSN: 0302-9743
Issue Date: 2021
Publisher: Springer
Document type: conference paper
ConferenceObject
URI: 2-s2.0-85118152009
http://hdl.handle.net/11025/47192
ISBN: 978-3-030-89578-5
ISSN: 0302-9743
Keywords in different language: text categorization and summarization;monolingual transformers;sentiment analysis;multi-label topic identification
Abstract in different language: In this paper, we present our progress in pre-training monolingual Transformers for Czech and contribute to the research community by releasing our models to the public. The need for such models emerged from our effort to employ Transformers in our language-specific tasks, where we found the performance of the published multilingual models to be very limited. Since multilingual models are usually pre-trained on 100+ languages, most low-resource languages (including Czech) are under-represented in them. At the same time, a huge amount of monolingual training data is available in web archives such as Common Crawl. We have pre-trained and publicly released two monolingual Czech Transformers and compared them with relevant public models trained (at least partially) for Czech. The paper presents the Transformer pre-training procedure as well as a comparison of the pre-trained models on text classification tasks from various domains.
Rights: © Springer
Appears in Collections: Konferenční příspěvky / Conference Papers (KKY)
OBD

Files in This Item:
File: Lehečka-Švec2021_Chapter_ComparisonOfCzechTransformersO.pdf
Size: 276,39 kB
Format: Adobe PDF


Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/47192

