Publication:
Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words

dc.contributor.author: Savytska L.
dc.contributor.author: Turgut Sübay M.
dc.contributor.author: Vnukova N.
dc.contributor.author: Bezugla I.
dc.contributor.author: Pyvovarov V.
dc.date.accessioned: 2023-02-16T11:35:57Z
dc.date.available: 2023-02-16T11:35:57Z
dc.date.issued: 2022
dc.description.abstract: The study calculates the similarity between words in the Turkish language using word representation techniques. Word2Vec is a model that represents words as vectors. The model is trained on articles from the Turkish Wikipedia dump as the corpus, and cosine similarity is then used to determine the similarity value. The open-source Python programming language and the Gensim library are used to obtain high-quality word vectors with Word2Vec and to calculate the cosine similarity of the vectors. The Continuous Bag-of-Words (CBOW) algorithm is used to train the word vectors. The cosine similarity values in the results are derived from the weights (dimension values) of the vector dimensions. A window size of 10 and a vector dimension of 300 are used. Increasing the number of training cycles makes the vectors more accurate. The corpus is trained for five cycles (epochs) with the same parameters. The Turkish corpus contains more than 161 million words. The vocabulary (unique words) obtained from the corpus contains more than 367 thousand entries. A corpus of this size makes it possible to conduct high-quality semantic and morphological analysis and arithmetic operations on the word vectors.
dc.identifier.citation: Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words / Savytska, L., Turgut Sübay, M., Vnukova, N., Bezugla, I., Pyvovarov, V. // COLINS-2022: 6th International Conference on Computational Linguistics and Intelligent Systems, May 12–13, 2022, Gliwice, Poland, 2022, pp. 161–176. URL: https://ceur-ws.org/Vol-3171/paper17.pdf.
dc.identifier.uri: https://openarchive.nure.ua/handle/document/21943
dc.subject: NLP
dc.subject: Word2Vec
dc.subject: word vectors
dc.subject: cosine similarity
dc.subject: word embedding
dc.subject: semantic relations
dc.subject: formal (structural) relations
dc.subject: Turkish language
dc.title: Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words
dc.type: Article
dspace.entity.type: Publication
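
The configuration described in the abstract (CBOW, window size 10, 300 dimensions, five epochs, cosine similarity) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function names `cosine_similarity` and `train_cbow` and the `sentences` argument are hypothetical, the Gensim 4.x parameter names are assumed, and every setting the abstract does not mention is left at Gensim's default.

```python
import math
from typing import Iterable, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def train_cbow(sentences: Iterable[List[str]]):
    """Train a CBOW Word2Vec model with the settings named in the abstract.

    `sentences` is a hypothetical, already tokenized corpus: a restartable
    iterable (for example a list, not a one-shot generator) that yields one
    list of tokens per sentence. Requires Gensim 4.x. Parameters not named
    in the abstract keep Gensim's defaults and may differ from the
    authors' setup.
    """
    # Imported lazily so that cosine_similarity works without Gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=sentences,
        vector_size=300,  # 300-dimensional word vectors
        window=10,        # context window of 10 words
        sg=0,             # 0 selects CBOW (1 would select skip-gram)
        epochs=5,         # five passes over the corpus
    )
```

With a trained model, `model.wv.similarity(w1, w2)` returns the cosine similarity of two in-vocabulary words, and `model.wv.most_similar(positive=[...], negative=[...])` performs the kind of vector arithmetic the abstract mentions. The standalone `cosine_similarity` helper shows the same calculation on plain lists of numbers.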

Files

Original bundle
Name: Vnukova.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 9.64 KB
Format: Item-specific license agreed upon to submission