Publication: Methods of Intelligent Analysis of Consolidated Data for Decision-Making Support
Date
2021
Publisher
ХНУРЕ (Kharkiv National University of Radio Electronics)
Abstract
The dissertation is devoted to developing methods for modeling, forming analytical features, and intelligently analyzing tabular and textual consolidated data in order to increase the accuracy, reliability and informativeness of the analysis results used for decision-making support in information-analytical systems. A method has been developed for optimizing the predictive analytics of time series using stacking and the selection of models of different types based on LASSO linear regression and Bayesian regression. The combination of Bayesian, linear and machine-learning logistic regressions is analyzed for the task of detecting technical failures. The optimization of an intelligent agent's sequence of actions in demand analytics tasks using deep Q-learning and simulation modeling of the environment is considered. A model for the vector representation of textual data in the space of semantic and thematic fields is proposed. Textual data are analyzed with machine learning algorithms using quantitative semantic features. A method has been developed for identifying additional analytical features based on lexeme combinations in the semantic structures of text arrays. A model of the semantic concepts of text arrays based on formal concept analysis is proposed.
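As an illustration of how LASSO regression can act as a stacking meta-learner that also selects base models, the following minimal sketch may help. It is not the method of the dissertation: the data, the two base models, the penalty value and the function names (`soft_threshold`, `lasso_coordinate_descent`) are all hypothetical, and the hold-out predictions of the base models are assumed to be already available.

```python
# Illustrative sketch only: LASSO as a stacking meta-learner that selects base models.
# All names and numbers below are hypothetical and are not taken from the dissertation.

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, target, alpha, n_iter=100):
    """Minimize (1/2n)*||target - sum_j w_j*columns[j]||^2 + alpha*sum_j |w_j|."""
    n = len(target)
    weights = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Residual with the contribution of base model j left out.
            partial = [
                target[i] - sum(weights[k] * columns[k][i]
                                for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n  # assumed non-zero
            weights[j] = soft_threshold(rho, alpha) / scale
    return weights


# Hold-out predictions of two hypothetical base models and the observed values.
actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
base_predictions = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],     # model A tracks the series
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # model B is uninformative
]

weights = lasso_coordinate_descent(base_predictions, actual, alpha=0.1)
# Base models with a non-zero weight stay in the ensemble.
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(weights, selected)
```

In this toy setting the L1 penalty drives the weight of the uninformative model to exactly zero, which is the mechanism by which a LASSO meta-learner can reduce the number of models in an ensemble.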
The thesis focuses on the development of methods for modeling, forming analytical features and intelligently analyzing tabular and textual consolidated data in order to increase the accuracy, reliability and informativeness of the analysis results, which are used to support decision-making in information-analytical systems. The object of the research is the processing and analysis of consolidated data with different structures and from different sources of information. The subject of the research is the models and methods of intelligent analysis of consolidated data of tabular and textual type. The research methods are: the theory and algorithms of machine and deep learning for creating predictive models and their ensembles; the theory of reinforcement learning for building models of intelligent agents in algorithms that optimize the sequence of decisions; probability theory and mathematical statistics for forming frequency-based semantic characteristics of textual lexemes and for creating probabilistic predictive models for intelligent data analysis; set theory for creating set-theoretic models of semantic and thematic fields; the theory of frequent sets and association rules, as well as formal concept analysis, for developing approaches to the analytics of text data streams.
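The notions of frequent sets and association rules named among the research methods can be illustrated with a small sketch. The toy documents, the support threshold and the function names are hypothetical, and the brute-force enumeration below stands in for a proper Apriori-style algorithm; it only shows what support and confidence mean for lexeme sets.

```python
from itertools import combinations

# Hypothetical toy "tweets", each reduced to a set of lexemes.
documents = [
    {"market", "price", "growth"},
    {"market", "price"},
    {"market", "growth"},
    {"price", "growth", "market"},
    {"weather", "rain"},
]


def support(itemset, docs):
    """Fraction of documents that contain every lexeme in itemset."""
    return sum(1 for d in docs if itemset <= d) / len(docs)


def frequent_itemsets(docs, min_support, max_size=2):
    """Enumerate itemsets up to max_size whose support reaches min_support.

    Brute force over all combinations; fine for a toy vocabulary only.
    """
    vocabulary = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            s = support(frozenset(combo), docs)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def confidence(antecedent, consequent, docs):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, docs) / support(antecedent, docs)


frequent = frequent_itemsets(documents, min_support=0.6)
rule_conf = confidence(frozenset({"price"}), frozenset({"market"}), documents)
print(sorted(tuple(sorted(s)) for s in frequent), rule_conf)
```

Frequent lexeme pairs and high-confidence rules of this kind are candidates for additional features in a predictive model.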
As a result of theoretical and experimental studies, the following scientific results were obtained: a method for optimizing the predictive analytics of time series has been developed that uses stacking and the selection of different types of models based on LASSO linear regression and Bayesian regression, which increases forecasting accuracy and yields an optimal predictive ensemble of models; a method for detecting technical failures has been developed that combines Bayesian, linear and machine-learning logistic regression, which increases the reliability of results and makes it possible to build effective diversified decision-making processes; the methods for optimizing the sequence of actions of an intelligent agent in demand analytics tasks, using deep Q-learning and simulation modeling of the interaction environment based on a parametric model and on historical data, were further developed, which increases the efficiency of business decision-making; a method of vector representation of textual data has been developed that, through the theory of semantic and thematic fields, represents text documents in a low-dimensional space of semantic features, reduces the complexity of calculations and increases the reliability of results in the analysis of textual data; a method for analyzing textual data with machine learning algorithms using quantitative features of semantic and thematic fields, as well as a method for genetic optimization of the set of these features, have been developed, which increases the reliability of the results of the intelligent analysis of text arrays; the method of classification and regression analysis of different types of consolidated data, which combines an LSTM neural network for input text data with a fully connected neural network for input quantitative features, has been improved, which increases the reliability of the results; a method for identifying additional analytical features based on lexeme combinations in the semantic structures of text arrays has been developed that, through the theory of frequent sets and association rules, expands the information basis for decision-making support in the analytics of consolidated data; a model of the semantic concepts of text based on formal concept analysis has been developed, making it possible to identify effective analytical features that take into account the semantic structure of text datasets.
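One plausible reading of the vector representation in the space of semantic fields is sketched below, under stated assumptions: a semantic field is treated as a plain set of lexemes, and a document is mapped to the relative frequency of each field. The field definitions, the sample sentence and the name `field_vector` are hypothetical; the abstract does not give the exact construction used in the thesis.

```python
from collections import Counter

# Hypothetical semantic fields: each is a set of lexemes sharing a common theme.
SEMANTIC_FIELDS = {
    "motion": {"run", "walk", "fly"},
    "emotion": {"joy", "fear", "love"},
    "nature": {"river", "forest", "sky"},
}


def field_vector(tokens, fields):
    """Map a tokenized document to the relative frequency of each semantic field.

    The vector has one component per field (in sorted field-name order),
    regardless of how large the lexeme vocabulary is.
    """
    counts = Counter(tokens)
    total = len(tokens)
    vector = []
    for name in sorted(fields):
        hits = sum(counts[lexeme] for lexeme in fields[name])
        vector.append(hits / total if total else 0.0)
    return vector


tokens = "birds fly over the river and the forest with joy".split()
vector = field_vector(tokens, SEMANTIC_FIELDS)
print(vector)  # components ordered as: emotion, motion, nature
```

The dimensionality equals the number of fields rather than the number of distinct lexemes, which is what makes such a representation low-dimensional compared with lexeme frequency features.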
The results obtained in the thesis research and the developed methods form a component technology for decision-making support in complex information systems, and they increase the informativeness and reliability of intelligent data analysis in the predictive analytics of different types of consolidated data. The obtained results make it possible to: increase the accuracy in forecasting tasks and reduce the number of models in a stacking ensemble by 30% for a certain class of tasks, due to the developed methods of stacking different types of models into predictive ensembles; assess the uncertainty and predictive risks of the constituent models when making expert decisions on the formation of a predictive ensemble, due to the developed method of using Bayesian regression for stacking predictive models; increase the accuracy and informativeness of the results in the analysis of demand dynamics and in the analytics of financial time series, due to the developed methods of applying linear, probabilistic and machine-learning predictive models based on analytical features of the consolidated data of a given subject area; optimize the set of predictive features and improve the forecasting accuracy in predicting technical failures on production assembly lines, due to the developed methods that use a stacking combination of models; reduce the number of analytical semantic features of textual data by a factor of 3 to 10 compared with a set of lexeme frequency features for the given characteristics of the textual data analysis, due to the developed methods of using the theory of semantic and thematic fields; quantitatively analyze the semantic component of an author's idiolect in text arrays, due to the developed method of text analysis using the theory of semantic and thematic fields; form additional semantic features for predictive models and improve the quality of information-analytical systems through the developed methods of intelligent analysis of Twitter text streams using the theory of frequent sets and association rules as well as formal concept analysis. The conducted studies have solved the relevant scientific and applied problem of choosing, combining and optimizing methods of intelligent analysis of consolidated data by developing methods of modeling, forming informative analytical features and intelligently analyzing tabular and textual data, taking into account the subject area of analysis. This made it possible to create effective multilevel predictive models, expand the informativeness of the intelligent analysis of various types of data and improve decision support for complex information-analytical systems.
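To make the formal concept analysis mentioned above more concrete, here is a minimal sketch that enumerates the formal concepts of a tiny context. The context (three documents and their lexemes) and the function names are hypothetical, and the enumeration closes every attribute subset, which is exponential in the number of attributes and suitable only as an illustration.

```python
from itertools import combinations

# Hypothetical formal context: documents (objects) and the lexemes (attributes) they contain.
context = {
    "doc1": {"price", "market"},
    "doc2": {"price", "market", "growth"},
    "doc3": {"growth"},
}


def extent(attributes, ctx):
    """Objects that have every attribute in the given set."""
    return frozenset(obj for obj, attrs in ctx.items() if attributes <= attrs)


def intent(objects, ctx):
    """Attributes shared by every object in the given set."""
    shared = set(frozenset().union(*ctx.values()))
    for obj in objects:
        shared &= ctx[obj]
    return frozenset(shared)


def formal_concepts(ctx):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    attributes = sorted(frozenset().union(*ctx.values()))
    concepts = set()
    for size in range(len(attributes) + 1):
        for combo in combinations(attributes, size):
            objs = extent(frozenset(combo), ctx)
            # (objs, intent(objs)) is closed in both directions, so it is a concept.
            concepts.add((objs, intent(objs, ctx)))
    return concepts


concepts = formal_concepts(context)
for objs, attrs in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```

Each concept pairs a maximal group of documents with the full set of lexemes they share; such groups are one possible source of semantic features for predictive models.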
Description
Keywords
data mining, machine learning methods, data features, time series, semantic fields, frequent sets, association rules, formal concept analysis
Bibliographic description
Pavlyshenko, B. M. Methods of intelligent analysis of consolidated data for decision-making support : author's abstract of the dissertation for the degree of Doctor of Technical Sciences : 05.13.23 "Systems and Means of Artificial Intelligence" / B. M. Pavlyshenko ; Ministry of Education and Science of Ukraine, Kharkiv National University of Radio Electronics. – Kharkiv, 2021. – 40 p.