Published: 2025-10-31

Microcorpus of language errors in contemporary Polish

Magdalena Zawisławska

Abstract

The article discusses how the first corpus of errors in contemporary Polish was compiled and how it might be applied. The corpus was built primarily to train language models based on deep neural networks. During annotation, however, several problems emerged that may interest linguists, especially those working in prescriptive linguistics. The difficulties encountered in annotation suggest that the very concept of an error is unclear, as is the categorization of language errors. Corpus statistics give an approximate picture of the extent to which educated Poles know the linguistic norm and which types of errors appear most often in texts. This information can be used in language education, both at school and in Polish studies.

Keywords:

Polish, language errors, typology of errors, corpus, linguistic norm

How to cite

Zawisławska, M. (2025). Microcorpus of language errors in contemporary Polish. Poradnik Językowy, 827(8), 33–46. Retrieved from https://www.journals.polon.uw.edu.pl/index.php/pj/article/view/1838
