Department of Electronic Computers (EOM)

Permanent URI for this collection

https://openarchive.nure.ua/handle/document/64

Comparative analysis of neural network models for the problem of speaker recognition
(ХНУРЕ, 2023) Kholiev, V.; Barkovska, O.
The subject matter of the article is neural network models designed or adapted for voice analysis in the context of speaker identification and verification tasks. The goal of this work is to perform a comparative analysis of relevant neural network models in order to determine the model(s) that best meet the formulated criteria: model type, programming language of the model’s implementation, parallelization potential, binary or multiclass classification, accuracy, and computational complexity. Some of these criteria, such as accuracy and computational complexity, were chosen because of their universal importance regardless of the particular application. Others were chosen due to the architecture and challenges of the scientific communication system described in the work, which performs speaker identification and verification. The relevance of the paper lies in the prevalence of audio as a communication medium, which results in a wide range of practical applications of audio intelligence in various fields of human activity (business, law, military), as well as in the need to enable and encourage an efficient environment for inward-facing audio-based scientific communication among young scientists, so that they can accelerate their research and acquire scientific communication skills. To achieve the goal, the following tasks were solved: criteria for judging the models were formulated based on the needs and challenges of the proposed model; models designed for speaker identification and verification were reviewed according to the formulated criteria, with the results compiled into a comprehensive table; optimal models were determined in accordance with the formulated criteria. The following neural network based models were reviewed: SincNet, VGGVox, Jasper, TitaNet, SpeakerNet, ECAPA_TDNN. Conclusions.
For future research and a practical solution to the problem of speaker authentication, it is reasonable to use a convolutional neural network implemented in the Python programming language, as Python offers a wide variety of development tools and libraries.
Justifying the selection of a neural network linguistic classifier
(ХНУРЕ, 2023) Barkovska, O.; Voropaieva, K.; Ruskikh, O.
The subject matter of this article is the exploration of neural network architectures to enhance the accuracy of text classification, particularly within natural language processing. The significance of text classification has grown notably in recent years due to its pivotal role in applications such as sentiment analysis, content filtering, and information categorization. Given the growing demand for precision and efficiency in text classification methods, evaluating and comparing diverse neural network models is necessary to determine optimal strategies. The goal of this study is to address the challenges and opportunities inherent in text classification while comparing the performance of two well-established neural network architectures: Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). To achieve the goal, the following tasks were solved: a comprehensive analysis of these neural network models was performed, considering several key aspects. These aspects include classification accuracy, training and prediction time, model size, data distribution, and overall ease of use. By systematically assessing these attributes, this study aims to provide valuable information about the strengths and weaknesses of each model and enable researchers and practitioners to make informed decisions when selecting a neural network classifier for text classification tasks. The methods used are comprehensive analysis of neural network models and assessment of classification accuracy, training and prediction time, model size, and data distribution. The following results were obtained: the LSTM model demonstrated superior classification accuracy across all three training sample sizes when compared to CNN. This highlights LSTM's ability to effectively adapt to diverse data types and consistently maintain high accuracy, even with substantial data volumes.
Furthermore, the study revealed that computing power significantly influences model performance, emphasizing the need to consider available resources when selecting a model. Conclusions. Based on the study's findings, the Long Short-Term Memory (LSTM) model emerged as the preferred choice for text data classification. Its adeptness in handling sequential data, recognizing long-term dependencies, and consistently delivering high accuracy positions it as a robust solution for text analysis across various domains. The decision is supported by the model's swift training and prediction speed and its compact size, making it a suitable candidate for practical implementation.
Research into speech-to-text transformation module in the proposed model of a speaker’s automatic speech annotation
(ХНУРЕ, 2022) Barkovska, O.
The subject matter of the article is the module for converting a speaker’s speech into text in the proposed model of automatic annotation of the speaker’s speech, which has become increasingly popular in Ukraine over the last two years due to the active transition to online communication and education, as well as online workshops, interviews, and discussions of urgent issues. Furthermore, users of personal educational platforms are not always able to join online meetings on time for various reasons (a blackout, for example), which explains the need to save the speakers’ presentations as audio files. The goal of the work is to eliminate false or corrupt data in the process of converting the audio sequence into the relevant text for further semantic analysis. To achieve the goal, the following tasks were solved: a generalized model of incoming audio data summarization was proposed; the existing STT models (for turning audio data into text) were analyzed; the ability of the STT module to operate in Ukrainian was studied; the efficiency and timing of the STT module for English and Ukrainian were evaluated. The proposed model of the speaker’s speech automatic annotation has two major functional modules: a speech-to-text (STT) module and a summarization (SUM) module. For the STT module, the following models of linguistic text analysis have been researched and improved: for English, wav2vec2-xls-r-1bz; for Ukrainian, the Ukrainian STT model (wav2vec2-xls-r-1b-uk-with-lm). Artificial neural networks were used as the mathematical apparatus in the models under consideration. The following results were obtained: the word error rate (WER) was reduced by almost 1.5 times, which affects the quality of word recognition from the audio and may lead to higher-quality output text data.
To estimate the timing of STT module operation, three audio recordings of different lengths (5 s, ~60 s, and ~240 s) in English and Ukrainian were analyzed. The results showed a clear trend: using the computational power of the NVIDIA Tesla T4 graphics accelerator speeds up generation of the output file for the longest recording. Conclusions: the use of a deep neural network at the stage of noise reduction in the input file is justified, as it improves the WER metric by almost 25%, while increasing the computing power of the graphics processor and the number of stream processors provides acceleration only for large input audio files. The author’s further research will focus on the efficiency of summarization methods for the obtained text.
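The abstract above reports its results in terms of the word error rate. As a minimal illustration of how that metric is commonly computed (this is not the authors' evaluation code), the sketch below counts word-level substitutions, deletions, and insertions with a Levenshtein distance and divides by the number of reference words. The function name `word_error_rate` and the sample sentences are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, a lower value means better recognition, and a value above 1.0 is possible when the hypothesis contains many inserted words.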
Research of the text processing methods in organization of electronic storages of information objects
(ХНУРЕ, 2022) Barkovska, O.; Khomych, V.; Nastenko, O.
The subject matter of the article is the electronic storage of information objects (IO), ordered by specified rules, at the stage of accumulating qualification theses and scientific works that contributors of the proposed knowledge exchange system submit in different formats (text, graphic, audio). The classified works of the system's contributors form the basis for organizing thematic discussion rooms to disseminate scientific achievements, adopt new ideas, exchange knowledge, and look for employers or mentors in different countries. The goal of the work is to study text processing and analysis libraries in order to speed up and increase the accuracy of scanned text document classification when organizing serialized electronic storage of information objects. The tasks are the following: to study text processing methods on the basis of the proposed generalized model of the scanned document classification system, with the specified location of the text processing and analysis block; to investigate how the execution time of the developed parallel modification of the text processing module's methods changes on a shared-memory system for collections of text documents of different sizes; and to analyze the results. The methods used are the following: parallel digital sorting methods, methods of mathematical statistics, and linguistic methods of text analysis. The following results were obtained: a generalized model of the scanned document classification system was proposed, consisting of an image processing unit and a text processing unit, which include scanned image preprocessing, text detection, text preprocessing, compilation of the frequency dictionary, and text proximity detection. Conclusions: the proposed parallel modification of the text preprocessing unit gives a speedup of up to 3.998 times.
However, at a very high computational load (a collection of 18144 files, about 1100 MB), the resources of an ordinary multiprocessor computer with shared memory are clearly not enough to solve such problems in a mode close to real time.
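The abstract above names two text processing steps, compiling a frequency dictionary and detecting text proximity, without stating which proximity measure is used. The sketch below is only an illustration under assumptions: it builds the frequency dictionary with a simple regular-expression tokenizer and uses cosine similarity as a stand-in proximity measure. The function names `frequency_dictionary` and `cosine_proximity` are hypothetical and do not come from the article.

```python
import math
import re
from collections import Counter


def frequency_dictionary(text: str) -> Counter:
    """Count lowercase word tokens in a text (assumed tokenization)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_proximity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two frequency dictionaries (assumed measure)."""
    shared = a.keys() & b.keys()
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated to any other
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical document fragments used only for demonstration.
    doc_a = frequency_dictionary("Parallel sorting of text documents")
    doc_b = frequency_dictionary("Sorting scanned text documents in parallel")
    print(cosine_proximity(doc_a, doc_b))
```

Both steps are independent per document, which is one reason such preprocessing is a natural candidate for the shared-memory parallelization the abstract describes.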
