(2019) Калайчев, Г. В.; Сидоров, М. В.; Шпакович, М. О.
The main goal of this work is to show how to prepare a large amount of data, build a classification model on a very large dataset, and evaluate the resulting model on test data. The problem addressed in this work is taken from the Microsoft Malware Prediction competition on Kaggle. This task suits our goal because the training dataset contains 9 million rows and features of different types that require preprocessing.