Grammarly is collecting texts in Ukrainian to support the development of Ukrainian NLP (natural language processing)

19.08.2020

The Ukrainian company Grammarly, which develops tools for working with text online, wants to create the first annotated grammatical error correction (GEC) corpus for Ukrainian. A corpus is a collection of texts, and this one is needed to develop speech recognition systems, voice assistants and grammar correction tools.

What is needed to create a GEC corpus

To teach algorithms to “speak” Ukrainian, Grammarly is collecting texts from users: posts from social networks, blogs, articles, essays, poems and letters. Linguists will then review the texts and correct stylistic and spelling errors.

“Ukrainian is a language with rich morphology. Unlike in English, each word has many word forms. NLP techniques developed for English will not always be optimal for Ukrainian. Finding the best methods for working with such languages is a separate task, and our corpus will come in handy here,” the company explains.

What the project will do

Accelerate the development of voice assistants and online grammar correction systems for Ukrainian,
promote the use of high-quality Ukrainian on the Internet,
increase the number of open NLP tools for Ukrainian.

How to help

The Ukrainian GEC corpus will be published in open access. Contributors will not be paid, but anyone who takes part can help develop the Ukrainian language online.

Texts will be collected until September 13. You can submit an existing text or write one from scratch here.

Source: ain.ua