
Artificial intelligence was taught to speak Ukrainian for the first time – the experience of ISD Group

For the first time, artificial intelligence has been taught to speak Ukrainian, slang included and uncensored. This was done by developers at the technological creative agency ISD Group. In his column for AIN.UA, Viktor Shkurba, the agency's founder and head, explains how it was done.

Developing the AI took six months, from compiling a language corpus to polishing the final result. The project team consisted of 8 people.

GPT-2

Our developers created their own version of the AI based on the GPT-2 model. GPT-2 was originally developed by OpenAI, and Hugging Face distributes an open implementation of it. It allows a neural network to generate coherent text that closely resembles human language. At its core is a language model that produces a probability distribution for the next word given the preceding ones, and builds up a logical sequence word by word.
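To make the idea of a "probability distribution for the next word" concrete, here is a minimal sketch. It is a toy bigram model that conditions on a single preceding word, whereas GPT-2 is a neural network that conditions on the whole preceding context. The corpus and all function names are illustrative assumptions, not anything from the ISD Group project.

```python
from collections import Counter, defaultdict
import random


def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}


def generate(counts, start, max_words=10, seed=0):
    """Sample one word at a time from the next-word distribution."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_words - 1):
        dist = next_word_distribution(counts, words[-1])
        if not dist:  # no known continuation: stop early
            break
        choices, probs = zip(*dist.items())
        words.append(rng.choices(choices, weights=probs, k=1)[0])
    return " ".join(words)


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "the model writes text",
    "the model writes code",
    "the model reads text",
]
counts = train_bigram(toy_corpus)
dist = next_word_distribution(counts, "model")
sample = generate(counts, "the")
print(dist)
print(sample)
```

In this toy corpus "writes" follows "model" twice and "reads" once, so the distribution gives them probabilities of 2/3 and 1/3. A real model does the same thing in spirit, but over a vocabulary of tens of thousands of tokens and with far longer context.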

At the first stage, the main challenge was to create a model that would be able to communicate in Ukrainian like an ordinary person – with slang and without censorship.

First, the developers built the language corpus for the neural network using open dictionaries and corpora created by Ukrainian developers:

  • Lang-uk: an informal group of enthusiasts who aim to improve the computer processing of Ukrainian-language texts.
  • Brown-uk: a Brown-style corpus of the Ukrainian language.
  • ВЕСУМ (VESUM): a large electronic dictionary of the Ukrainian language.
  • ГРАК (GRAC): the General Regionally Annotated Corpus of Ukrainian.

As a basis for pre-training the model, the team collected 3 gigabytes of Ukrainian works from online libraries.

After that, the GPT-2 generator was supplemented with tools that help build context and produce correct word sequences in texts.

This was not enough, because the neural network's output sounded old-fashioned. So subtitles from modern films and TV series, blogs, tweets and other user-generated content from social networks were added to the language corpus. As a result, the AI began to speak in a modern way, slang included.
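Mixing sources like these into one corpus usually involves some cleaning and de-duplication. The sketch below shows one possible way to do it. The cleaning rules, source names and sample lines are assumptions made for illustration; the article does not describe how the team actually processed its data.

```python
import re

# Matches SRT-style subtitle timing lines such as "00:01:02,000 --> 00:01:04,500".
TIMESTAMP = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
URL = re.compile(r"https?://\S+")


def clean_line(line):
    """Drop subtitle timestamps and URLs, then collapse whitespace."""
    line = TIMESTAMP.sub(" ", line)
    line = URL.sub(" ", line)
    return re.sub(r"\s+", " ", line).strip()


def build_corpus(sources):
    """Merge several sources, skipping empty lines and case-insensitive duplicates."""
    seen = set()
    corpus = []
    for name, lines in sources.items():
        for raw in lines:
            text = clean_line(raw)
            key = text.lower()
            if text and key not in seen:
                seen.add(key)
                corpus.append((name, text))
    return corpus


# Hypothetical sample data, for illustration only.
sources = {
    "literature": ["Old-fashioned sentence from a classic novel."],
    "subtitles": ["00:01:02,000 --> 00:01:04,500", "Hey,   what's up?"],
    "social": ["hey, what's up?", "Check this out https://example.com"],
}
corpus = build_corpus(sources)
for source_name, text in corpus:
    print(source_name, "|", text)
```

Keeping the source name next to each line makes it possible to check later how much of the corpus is literary and how much is conversational.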

Live speech

At the second stage, the developers decided to experiment. Rather than build yet another neutral system, an artificial advisor, they went further and gave it a character: a bad guy, even a little crazy, who constantly eggs his friends on to do wild things.

To do this, young people were involved in collecting the dataset through open game bots on Telegram. Users answered questions, and their answers were fed to the neural network.

To make the collected ideas more diverse and extreme, the AI was subjected to a kind of “cannibalism”: over several rounds, the team fed the ideas it had invented back through the neural network, marked the most successful ones, and monitored the result.

To this end, ISD created a system of praise and punishment for the AI. Each generated variant received a score made up of ratings on the following parameters: breadth of context, attention to the topic of the previous sentence, coherence, and distance from the original variants in the dataset. When a variant took all the parameters into account, the neural network received “pluses” (rewards); when it abused them, the system gave “minuses” (penalties).
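A scoring scheme of this kind can be sketched as follows. The four components mirror the parameters named above, but the way each one is measured here (word overlap, repetition counts), the thresholds, and all names are stand-in assumptions chosen for illustration. The article does not say how ISD actually computed its scores.

```python
def tokens(text):
    return text.lower().split()


def jaccard(a, b):
    """Word-set overlap between two token lists, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def component_scores(candidate, previous, dataset):
    """Crude proxies for the four parameters (assumed, not ISD's metrics)."""
    words = tokens(candidate)
    repeats = sum(1 for x, y in zip(words, words[1:]) if x == y)
    closest = max((jaccard(words, tokens(d)) for d in dataset), default=0.0)
    return {
        "context_breadth": len(set(words)) / len(words) if words else 0.0,
        "topic": jaccard(words, tokens(previous)),
        "coherence": 1.0 - repeats / max(len(words) - 1, 1),
        "novelty": 1.0 - closest,  # distance from the nearest dataset entry
    }


# Hypothetical thresholds: meeting one earns a plus, missing it a minus.
THRESHOLDS = {"context_breadth": 0.5, "topic": 0.1, "coherence": 0.8, "novelty": 0.5}


def total_score(candidate, previous, dataset, thresholds=THRESHOLDS):
    scores = component_scores(candidate, previous, dataset)
    return sum(1 if scores[name] >= limit else -1 for name, limit in thresholds.items())


def select_best(candidates, previous, dataset, keep=2):
    """Keep the top-scoring candidates, e.g. to feed into the next round."""
    ranked = sorted(
        candidates,
        key=lambda c: total_score(c, previous, dataset),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical example data.
previous = "what should we do tonight"
dataset = ["let us go dance tonight"]
candidates = [
    "let us go dance tonight",                  # verbatim copy of a dataset entry
    "tonight we should paint the bridge pink",  # on topic and new
    "go go go go go",                           # repetitive and off topic
]
best = select_best(candidates, previous, dataset)
for c in candidates:
    print(total_score(c, previous, dataset), c)
```

With these stand-in measures, the verbatim copy loses a point for novelty and the repetitive line loses points on three of the four components, so the on-topic, unexpected idea ranks first. Repeating `select_best` over several rounds of generated candidates gives a loop similar in spirit to the rounds described earlier.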

In this way, the team could not only make the neural network follow the patterns of the language corpus and the human-written variants in the dataset, but also steer its generation toward the most unpredictable and unusual responses.

You can see the result and test the first artificial intelligence that can “speak” Ukrainian here. The AI was created as part of a communications campaign for the REVO brand.

Author: Viktor Shkurba, head of ISD Group

Source: ain.ua
