<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Larysa</forename><surname>Savytska</surname></persName>
							<email>larisa-savickaya@hotmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Simon Kuznets Kharkiv National University of Economics</orgName>
								<address>
									<addrLine>Nauky av. 9a</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><surname>Turgut Sübay</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Piramit Danismanlik A.S</orgName>
								<address>
									<settlement>İstanbul, Kadıköy</settlement>
									<country key="TR">Turkey</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nataliya</forename><surname>Vnukova</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Simon Kuznets Kharkiv National University of Economics</orgName>
								<address>
									<addrLine>Nauky av. 9a</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Iryna</forename><surname>Bezugla</surname></persName>
							<email>iryna.bezugla@hneu.net</email>
							<affiliation key="aff0">
								<orgName type="institution">Simon Kuznets Kharkiv National University of Economics</orgName>
								<address>
									<addrLine>Nauky av. 9a</addrLine>
									<postCode>61166</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vasyl</forename><surname>Pyvovarov</surname></persName>
							<email>v.pyvovarov@ukr.net</email>
							<affiliation key="aff2">
								<orgName type="institution">Yaroslav Mudryi National Law University</orgName>
								<address>
									<addrLine>Pushkinska str. 77</addrLine>
									<postCode>61024</postCode>
									<settlement>Kharkiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">International Conference on Computational Linguistics and Intelligent Systems</orgName>
								<address>
									<addrLine>May 12-13</addrLine>
									<postCode>2022</postCode>
									<settlement>Gliwice</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Word2Vec Model Analysis for Semantic and Morphologic Similarities in Turkish Words</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">6193A07D26C21881BA16E303A4AED891</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>NLP</term>
					<term>Word2Vec</term>
					<term>word vectors</term>
					<term>cosine similarity</term>
					<term>word embedding</term>
					<term>semantic relations</term>
					<term>formal (structural) relations</term>
					<term>Turkish language</term>
					<term>0000-0002-9158-6304 (L. Savytska)</term>
					<term>0000-0002-2967-694X (M. T. Sübay)</term>
					<term>0000-0002-1354-4838 (N. Vnukova)</term>
					<term>0000-0002-6285-2060 (I. Bezugla)</term>
					<term>0000-0001-9642-3611 (V. Pyvovarov)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The study presents the calculation of similarity between words in the Turkish language using word representation techniques. Word2Vec is a model used to represent words in vector form. The model is trained on articles from the Wikipedia dump Turkish service as the corpus, and the cosine similarity measure is then used to determine the similarity value. The open-source Python programming language and the Gensim library are used to obtain high-quality word vectors with Word2Vec and to calculate the cosine similarity of the vectors. The Continuous Bag-of-Words (CBOW) algorithm is used to train high-quality word vectors. The cosine similarity values in the results are derived from the weights (dimension values) of the vector dimensions. A window size of 10 and 300 vector dimensions are used. Increasing the number of training cycles helps the vectors reach more accurate values. The corpus is trained for five cycles (epochs) with the same parameters. The Turkish corpus contains more than one hundred and sixty-one million words, and the dictionary of unique words obtained from the corpus contains more than three hundred and sixty-seven thousand entries. Such big data provides an opportunity to conduct high-quality semantic and morphological analysis and arithmetic operations on the word vectors.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In today's world, automatic analysis is constantly being developed to meet growing industrial needs. Thanks to automatic analysis, needs such as accessing information, identifying people or objects in photographs, filtering the advertising content of e-mails, analyzing sentiment in correspondence, and translating between languages can be met. Natural Language Processing (NLP) is a general field of computer science, artificial intelligence (AI) and mathematical linguistics <ref type="bibr" target="#b0">[1]</ref>. The English mathematician Alan Turing asked the question "Can machines think like a human?" This question opened up the idea of AI and led to the discussion <ref type="bibr" target="#b1">[2]</ref> of whether AI technologies can learn like humans and communicate with people.</p><p>NLP studies the problems of computer analysis and synthesis of natural language. For AI, analysis means understanding the language, and synthesis means generating intelligent text. There are different approaches to NLP, such as statistical, linguistic and symbolic approaches.</p><p>The linguistic approach to natural language processing consists of four levels: graphematic, morphological, syntactic and semantic <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. The first level identifies the individual elements of a text/document, such as sections, paragraphs and sentences. The second level determines the morphological characteristics of each word. The third level is responsible for determining the syntactic dependencies of words in sentences. The last level is related to the semantic understanding of the text, including developments in the field of artificial intelligence <ref type="bibr" target="#b4">[5]</ref>.</p><p>The clusters and sub-clusters between the vectors obtained by machine learning parallel the syntax of words and their semantic and formal (structural) relations <ref type="bibr" target="#b5">[6]</ref>. These relationships between words find wide application, especially in industrial areas such as search engines. In natural language processing, techniques for matching words with vectors (finding word vectors) are called Word Embeddings (WE) <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. WE is the collective name for a set of language modelling and feature learning techniques in NLP in which words or phrases are represented as vectors of real numbers <ref type="bibr" target="#b8">[9]</ref>. Conceptually, WE involves mathematical formulas. The models used for word embeddings vary; one of them is the Word2Vec model. Word2Vec represents words as vectors based on several features, such as window size and vector dimensions. Word embedding has proved to be an incredibly important method for NLP tasks in recent years, enabling various machine learning models that rely on vector representations as input to enjoy richer representations of text. These representations preserve more semantic and syntactic information about words, leading to improved performance in almost every imaginable NLP task <ref type="bibr" target="#b9">[10]</ref>. One of the reasons for developing word embedding techniques is that they shorten machine learning training time. Shorter training periods make it practical to work with more vector dimensions and larger collections. The ability to train machine learning models on a large corpus with more vector dimensions is among the important factors affecting the correct representation of words by vectors.</p><p>Research using the Word2Vec machine technology is of great practical importance for computerising many areas of linguistic analysis, such as</p><p>• identifying the semantic similarity of words and phrases
• automatically clustering words according to the degree of their semantic similarity
• automatically generating thesauri and bilingual dictionaries
• expanding queries through associative connections
• constructing semantic maps of various subject areas and so on.</p></div>
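To make the notion of word-vector similarity used throughout this paper concrete, cosine similarity between two embeddings can be computed directly. A minimal NumPy sketch follows; the vectors and their dimensionality here are invented for illustration (real models use hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings (illustrative values only).
elma  = np.array([0.8, 0.1, 0.3, 0.0])   # "apple"
cilek = np.array([0.7, 0.2, 0.4, 0.1])   # "strawberry"
masa  = np.array([0.0, 0.9, 0.1, 0.8])   # "table"

print(cosine_similarity(elma, cilek))  # near 1: semantically related words
print(cosine_similarity(elma, masa))   # much smaller: unrelated words
```

The similarity is 1 for identical directions and falls as the vectors diverge, which is why semantically related words, whose vectors cluster together, score highest.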
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head><p>Learning vector representations of words using neural networks has generated strong enthusiasm in the NLP research community. In particular, many contributions were proposed after the work of Tomas Mikolov and his team <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref> on training word embeddings. The main reasons for this strong interest are the proposal of a simple and efficient neural architecture for learning word vector representations, the availability of the open-source tool Word2Vec, and the rapid formation of a user community. Later, several contributions extended the work of T. Mikolov on word vectors to phrases (sequences of words) <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref> and that of T. Luong to bilingual representations <ref type="bibr" target="#b16">[17]</ref>. All these vector representations capture similarities between words, phrases or sentences at different levels (morphological, semantic).</p><p>T. Mikolov and his team conducted research on training word embeddings using the Word2Vec model representation on an English corpus <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14]</ref>. We conducted research on training word embeddings using the Word2Vec model representation on a Ukrainian corpus <ref type="bibr" target="#b17">[18]</ref>. D. Chaplinskyi used the LexVec, Word2Vec and GloVe model representations to train Ukrainian word embeddings <ref type="bibr" target="#b18">[19]</ref>. A. Romanyuk suggests training Ukrainian word embeddings with the Word2Vec, FastText and MUSE model representations <ref type="bibr" target="#b19">[20]</ref>. V. Vysotska performed a comparative analysis of English and Ukrainian text processing based on semantic and syntactic approaches <ref type="bibr" target="#b20">[21]</ref>.</p><p>Thus, there are new challenges in conducting research on training word embeddings using the Word2Vec model representation on different corpora <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref>. In this article we train Turkish word embeddings using the Word2Vec model representation. The research examines the semantic clustering of Turkish word vectors, semantic relations between words under arithmetic operations on Turkish word vectors, formal clustering of Turkish word vectors, and formal relations between words under arithmetic operations on Turkish word vectors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology and Materials</head><p>The open-source Python programming language and the Gensim library are used to obtain high-quality word vectors with the Word2Vec word representation model and to calculate the cosine similarity of the vectors <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26,</ref><ref type="bibr" target="#b26">27]</ref>. The Continuous Bag-of-Words (CBOW) algorithm is used to train high-quality word vectors. The cosine similarity values in the results are derived from the weights (dimension values) of the vector dimensions. A window size of 10 and 300 vector dimensions are used. Increasing the number of training cycles helps the vectors reach more accurate values. The corpus is trained for five cycles (epochs) with the same parameters.</p><p>The operation steps are the following:</p><p>1. The latest version of the Python programming language is downloaded and installed.
2. The NumPy and SciPy libraries are installed using the Python package installer (pip).
3. The Gensim library is installed using the Python package installer (pip).
4. To write Python code, the command window can be used line by line, or an editor such as PyCharm can be used.
5. If a corpus related to the subject or with general content is required, the resource on the GitHub site <ref type="bibr" target="#b27">[28]</ref> can be used and/or the corpus can be compiled using different methods. If the corpus is not available, there is an internet address for accessing a ready-made corpus for this resource. Using the Wikimedia dump Turkish service <ref type="bibr" target="#b28">[29]</ref>, the corpus can be prepared with the "corpora.wikicorpus" library <ref type="bibr" target="#b24">[25]</ref> in Gensim.
6. The "models.word2vec" or "models.keyedvectors" libraries available in Gensim are used to obtain word vectors with the Word2Vec model.
7. The "keyedvectors" library in Gensim is used to calculate the cosine similarity of the vectors.</p><p>The Turkish corpus is obtained from the Wikipedia dump Turkish service <ref type="bibr" target="#b28">[29]</ref> and a source <ref type="bibr" target="#b25">[26]</ref>. To clean the corpus, capital letters were converted to lowercase using a mapping of Turkish capital letters to their lowercase forms (lowerMap, shown in the code example below).</p><p>A Python code example to obtain the corpus from the Wikipedia dump Turkish service is the following:</p><p>from __future__ import print_function
import os.path
import sys
from gensim.corpora import WikiCorpus
import xml.etree.ElementTree as etree
import warnings
import logging
import string
from gensim import utils

def tokenize_tr(content, token_min_len=2, token_max_len=50, lower=True):
    if lower:
        # Turkish capital-to-lowercase letter mapping (handles dotted/dotless I).
        lowerMap = {ord(u'A'): u'a', ord(u'B'): u'b', ord(u'C'): u'c', ord(u'Ç'): u'ç',
                    ord(u'D'): u'd', ord(u'E'): u'e', ord(u'F'): u'f', ord(u'G'): u'g',
                    ord(u'Ğ'): u'ğ', ord(u'H'): u'h', ord(u'I'): u'ı', ord(u'İ'): u'i',
                    ord(u'J'): u'j', ord(u'K'): u'k', ord(u'L'): u'l', ord(u'M'): u'm',
                    ord(u'N'): u'n', ord(u'O'): u'o', ord(u'Ö'): u'ö', ord(u'P'): u'p',
                    ord(u'R'): u'r', ord(u'S'): u's', ord(u'Ş'): u'ş', ord(u'T'): u't',
                    ord(u'U'): u'u', ord(u'Ü'): u'ü', ord(u'V'): u'v', ord(u'Y'): u'y',
                    ord(u'Z'): u'z'}
        content = content.translate(lowerMap)
    # The tail of the original listing is truncated in the source; a plausible
    # completion that filters tokens by the length bounds in the signature:
    return [utils.to_unicode(token) for token in utils.tokenize(content, errors='ignore')
            if token_min_len &lt;= len(token) &lt;= token_max_len]</p><p>The Turkish corpus contains more than one hundred and sixty-one million words. The dictionary of unique words obtained from the corpus contains more than three hundred and sixty-seven thousand entries. Such big data provides an opportunity to conduct high-quality semantic and morphological analysis and arithmetic operations on the word vectors.</p><p>A word-vector training Python code example for the Gensim library is the following: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Turkish Corpus trained using Word2Vec model: Experiment and Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Semantic clustering of Turkish word vectors</head><p>Word vectors obtained from the general-content Turkish corpus using the Word2Vec model are clustered and related in terms of the semantic relations of Turkish words.</p><p>The first example is the word "Elma". The first five word vectors with the closest cosine similarity to the ('elma') vector are shown below.</p><p>[('çilek', 0.7261281609535217), ('vişne', 0.6900818943977356), ('armut', 0.6884721517562866), ('dut', 0.6787133812904358), ('şeftali', 0.6731953024864197)]</p><p>The word "Elma" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1. Noun, botanical Rose; a tree (Pyrus malus) with pink or white flowers. 2. Noun, the bark of the tree is bright, hard; red, yellow and green in colour; pleasant smell; sour or sweet taste; crisp texture, stone fruit <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>Among the vectors obtained from the Turkish corpus, the vector ('çilek') is the closest cosine vector to ('elma'). The word "Çilek" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1. Noun, botanical Rosa; a plant; stems creeping, flowers white. 2. Noun, fragrant; pink, red coloured fruit <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>. These vectors are clustered together, reflecting the semantic relations between the words they belong to. Among the vectors obtained from the Turkish corpus, the third closest cosine-like vector to ('elma') is ('armut'). The word "Armut" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1.
Noun, botanical Rose; its flowers are white; it's a tree (Pyrus communis) that grows all over Turkey <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>These vectors are clustered together, reflecting the paradigmatic relations between the words they belong to.</p><p>The other results obtained from the Turkish corpus are vectors clustered together according to the lexical paradigm of words representing the names of fruit trees, related to the meaning of the word "Elma".</p><p>As a result of training the word "İstanbul", a city name, the first five word vectors with the closest cosine similarity to the vector ('istanbul') are shown below.</p><p>[('ankara', 0.6938591599464417), ('bursa', 0.6174916625022888), ('trabzon', 0.591408371925354), ('üsküdar', 0.581426739692688), ('yenibosna', 0.5711308121681213)]</p><p>The word "İstanbul" in the Turkish Language Academic dictionary of proper names is defined as 1. One of the provinces of Turkey in the Marmara Region <ref type="bibr" target="#b29">[30]</ref>. Among the vectors obtained from the Turkish corpus, the vector ('ankara') is the first closest cosine-like vector to ('istanbul'). The word "Ankara" in the Turkish Language Academic dictionary of proper names is defined as 1. One of the provinces located in the Central Anatolian Region of Turkey, the capital of Turkey <ref type="bibr" target="#b29">[30]</ref>.</p><p>These vectors are clustered together, reflecting the semantic relations between the words "İstanbul" and "Ankara", two important cities of Turkey.</p><p>As for vectors such as ('bursa') and ('trabzon'), they are clustered together according to the lexical paradigm of words representing the names of other important cities of Turkey. 
The word vectors ('üsküdar') and ('yenibosna') are clustered together with the word vector ('istanbul'), because the words "Üsküdar" and "Yenibosna" are the names of two important districts of İstanbul.</p><p>As a result of training the word "Ahmet", a proper name, the first five word vectors with the closest cosine similarity to the vector ('ahmet') are shown below.</p><p>[('osman', 0.6758742332458496), ('muhittin', 0.6753208637237549), ('niyazi', 0.6559439897537231), ('halit', 0.6479822993278503), ('mehmet', 0.6463955044746399)]</p><p>The word "Ahmet" in the Turkish Language Academic dictionary of proper names is defined as Origin: Arabic. Gender: Male. 1. Praised <ref type="bibr" target="#b29">[30]</ref>. Among the vectors obtained from the Turkish corpus, the vector ('osman') is the closest cosine-like vector to ('ahmet'). The word "Osman" in the Turkish Language Academic dictionary of proper names is defined as Origin: Arabic. Gender: Male. 1. A type of bird or dragon. 2. Saint Mohammed's son-in-law, the third caliph.</p><p>3. Founder and first ruler of the Ottoman Empire <ref type="bibr" target="#b29">[30]</ref>.</p><p>When the vectors similar to ('osman') are examined, they are found to belong to words/proper names representing male names, like the word "Osman". It is a semantic cluster related to the area of the word "Ahmet".</p><p>As a result of training the word "Ayşe", a proper name, the first five word vectors with the closest cosine similarity to the vector ('ayşe') are shown below.</p><p>[('melike', 0.796585202217102), ('cemile', 0.7877158522605896), ('merve', 0.7801972031593323), ('hatice', 0.7799881100654602), ('zeynep', 0.7753742933273315)]</p><p>The word "Ayşe" in the Turkish Language Academic dictionary of proper names is defined as Origin: Arabic, Gender: Female. 1. Living comfortably and peacefully <ref type="bibr" target="#b29">[30]</ref>. 
Among the vectors obtained from the Turkish corpus, the vector ('melike') is the closest cosine-like vector to ('ayşe'). The word "Melike" in the Turkish Language Academic dictionary of proper names is defined as Origin: Arabic, Gender: Female. 1. A female ruler. 2. The sultan's wife <ref type="bibr" target="#b29">[30]</ref>. The word "Ayşe" is used as a female name in Turkish. When the vectors similar to ('ayşe') are examined, they are found to belong to words/proper names representing female names, like the word "Melike". It is a semantic cluster related to the area of the word "Ayşe".</p><p>The word vectors ('ahmet') and ('ayşe') are in the same semantic cluster related to the proper-noun meaning relationship and are differentiated according to gender characteristics.</p><p>As a result of training the word "Okul", the first five word vectors with the closest cosine similarity to the vector ('okul') are shown below.</p><p>[('okulun', 0.7467690110206604), ('ilkokul', 0.6787807941436768), ('dershane', 0.6465392708778381), ('lise', 0.6133290529251099), ('ortaokul', 0.6094698905944824)]</p><p>The word "Okul" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1. Noun; the place where all kinds of education and training are held collectively <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>. Among the vectors obtained from the Turkish corpus, the vector ('ilkokul') is the second closest cosine-like vector to ('okul'). The word "İlkokul" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1. Noun; a four-year school, a primary school opened or allowed by the government to provide the basic education and training of girls and boys at the age of compulsory education <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>These vectors are in a semantic cluster referring to educational places. 
The other vectors similar to the cosine-like vector ('okul') are ('dershane'), ('lise'), ('ortaokul'). The word "Dershane" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1. Noun; classroom. 2. Noun; private institution that gives paid lessons to students outside of school <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word "Lise" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1. Noun; secondary education institution that prepares you for life or higher education with at least four years of education after eight years of primary education. 2. Noun, secondary education institution that prepares you for life or higher education with at least three years of education after three years of secondary school <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word "Ortaokul" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA), is defined as 1. Noun, generally three-year secondary (middle) school that prepares students for life on the one hand, and for high school on the other, through general education <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>These vectors form a semantic cluster of educational places, belonging to the lexical paradigm of words representing educational places, related to the meaning of the word "Okul". The first closest cosine-like vector to ('okul') is ('okulun'), obtained from the word "Okul". The word vectors ('okul') and ('okulun') are clustered together, representing a formal relation. It is a formal derivation of the noun root "okul" with the suffix "-in".</p><p>According to the results obtained from training the Turkish corpus using the Word2Vec model, it is shown that the vectors are clustered in terms of the semantic relations of Turkish words.</p></div>
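The nearest-neighbour queries reported above amount to ranking all vocabulary vectors by cosine similarity to a target vector; in the study this is done over the trained model's keyed vectors. A self-contained sketch of the ranking itself, over invented low-dimensional toy vectors (the words and values are illustrative only):

```python
import numpy as np

def most_similar(word, vectors, topn=5):
    """Rank words by cosine similarity to `word`, excluding the word itself."""
    target = vectors[word]
    scores = []
    for other, vec in vectors.items():
        if other == word:
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        scores.append((other, float(sim)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Invented 3-dimensional vectors for illustration only.
vectors = {
    "elma":  np.array([0.9, 0.1, 0.0]),
    "çilek": np.array([0.8, 0.2, 0.1]),
    "armut": np.array([0.7, 0.3, 0.0]),
    "okul":  np.array([0.1, 0.9, 0.2]),
}

print(most_similar("elma", vectors, topn=2))
```

The fruit words rank above the unrelated word, mirroring how ('çilek') and ('armut') cluster near ('elma') in the trained model.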
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Arithmetic operations of word vectors and semantic relations between words</head><p>New vectors can be obtained by adding and subtracting (arithmetic operations) the word vectors obtained from the Turkish corpus.</p><p>The first example is similar to the English example shown by T. Mikolov <ref type="bibr" target="#b10">[11]</ref>, obtained from the English corpus when the cosine analogues of a new vector are found by adding and subtracting vectors.</p><p>('king') - ('man') + ('woman') = ('queen')</p><p>The first five word vectors with the closest cosine similarity to the result vector of the ('kral') - ('erkek') + ('kadın') operation are shown below.</p><p>[('kraliçe', 0.5500485897064209), ('prens', 0.5298552513122559), ('kralın', 0.514844536781311), ('kralı', 0.49624234437942505), ('kraliçenin', 0.46907928586006165)]</p><p>The result obtained from the Turkish corpus is similar to the result obtained from the English corpus. The vector ('kraliçe'), belonging to the word "Kraliçe", is the Turkish equivalent of the word "Queen" and the closest cosine-like vector to the result vector of the operation.</p><p>The ('kral') - ('erkek') + ('kadın') operation is the replacement of the gender characteristic in the word "Kral", which expresses nobility. In terms of word meaning, the result of the operation is the word "Kraliçe". The word meaning is thus compatible with the result of adding and subtracting the vectors. 
The word "Kraliçe" is defined as "the wife of the king or the woman who rules the kingdom" in the current Turkish dictionaries, represented by the Turkish Language Association (TLA) <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>Another example below shows the first five word vectors with the closest cosine similarity to the result vector of the ('ingiltere') - ('londra') + ('ankara') operation.</p><p>[('türkiye', 0.6439434885978699), ('kırıkkale', 0.5729399919509888), ('niğde', 0.5030767917633057), ('eskişehir', 0.4853522777557373), ('tbmm', 0.4850592315196991)]</p><p>The ('ingiltere') - ('londra') + ('ankara') operation applies the relationship between countries and their cities (or capitals). The resulting vector is ('türkiye'), the first among the cosine-like vectors. The operation and result vectors are compatible with the result of adding and subtracting vectors.</p><p>The first six word vectors with the closest cosine similarity to the result vector of the ('finans') - ('para') + ('altın') operation are shown below.</p><p>[('bankacılık', 0.439474880695343), ('gayrimenkul', 0.4268363118171692), ('kuyumculuk', 0.4161675274372101), ('mücevherat', 0.41351592540740967), ('mücevher', 0.3932022750377655), ('sigortacılık', 0.3760865330696106)]</p><p>The word "Finans" in the current Turkish Language Academic dictionary of science and art terms, represented by the Turkish Language Association (TLA), is defined as 1. Commercial activity to raise funds and capital. 2. A sub-branch of economics that studies the management of money and other assets. 3. Management of money, credit, banking and investments <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word "Para" in the current Turkish Language Academic dictionary of science and art terms, represented by the Turkish Language Association (TLA), is defined as 1. 
Noun; a means of payment made of paper or metal with its value written on it, printed by the state; cash <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word "Altın" in the current Turkish Language Academic dictionary of science and art terms, represented by the Turkish Language Association (TLA), is defined as 1. A precious metal that is used as money or stored by governments in exchange for money due to its scarcity in nature <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The closest cosine-like vector obtained from the ('finans') - ('para') + ('altın') operation is ('bankacılık').</p><p>The word "Bankacılık" in the current Turkish Language Academic dictionary of science and art terms, represented by the Turkish Language Association (TLA), is defined as 1. Noun; all transactions made in the bank. 2. Noun; the job of the banker <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The second cosine-like vector obtained from the ('finans') - ('para') + ('altın') operation is ('gayrimenkul').</p><p>The word "Gayrimenkul" in the current Turkish Language Academic dictionary of science and art terms, represented by the Turkish Language Association (TLA), is defined as 1. Adjective; immovable. 2. Noun; law, house, field, etc. immovable property, real estate <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>In the sixth row, the cosine-like vector obtained from the ('finans') - ('para') + ('altın') operation is ('sigortacılık').</p><p>The word "Sigortacılık" in the current Turkish Language Academic dictionary of science and art terms, represented by the Turkish Language Association (TLA), is defined as 1. 
Noun; a bilateral agreement made with an organization in this business, in return for a premium paid in advance, to compensate for damage that something or someone may incur in the future <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>According to the results of the operation, the word vectors ('bankacılık'), ('gayrimenkul') and ('sigortacılık') are in semantic relations. The operand and result vectors are consistent with the outcome of vector addition and subtraction.</p><p>The first five word vectors with the closest cosine similarity to the result vector of the ('spor') -('futbol') + ('yüzme') operation are shown below.</p><p>[('olimpik', 0.5659219026565552), ('havuzu', 0.524342954158783), ('sporları', 0.5239308476448059), ('havuzları', 0.5116350650787354), ('binicilik', 0.49981582164764404)]</p><p>The word "Spor" is defined in the current Turkish academic dictionary of science and art terms, published by the Turkish Language Association (TLA), as 1. Noun; all the actions performed according to certain rules, individually or collectively, with the aim of improving the body or mind. 2. Adjective; easy to use <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>(Only the sense of the word related to body movement is examined here; the senses related to plant science and animal science do not occur in the results.)</p><p>The closest cosine-similar vector obtained from the ('spor') -('futbol') + ('yüzme') operation is ('olimpik').</p><p>The word "Olimpik" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. 
Related to the Olympics; of Olympic dimensions <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>In the second and fourth rows, the cosine-similar vectors obtained from the ('spor') -('futbol') + ('yüzme') operation are ('havuzu') and ('havuzları'), belonging to the words "Havuzu" and "Havuzları", which are derived from the word "Havuz" with the suffixes "-u" and "-ları". This is a formal derivation of the noun root "havuz" with the suffixes "-u" and "-ları".</p><p>The word "Havuz" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Noun; a place where water accumulates, used for swimming, beautifying the environment, etc. 2. It is generally an open place where the bottom and sides are made of materials such as marble or concrete and which is filled with water for swimming <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>In the third row, the cosine-similar vector obtained from the ('spor') -('futbol') + ('yüzme') operation is ('sporları'), belonging to the word "Sporları", derived from the word "Spor" with the suffix "-ları". This is a formal derivation of the noun root "spor" with the suffix "-ları".</p><p>In the fifth row, the cosine-similar vector obtained from the ('spor') -('futbol') + ('yüzme') operation is ('binicilik').</p><p>The word "Binicilik" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Noun; the state of being a rider. 2. Noun; the sport of horse riding <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word vector ('binicilik') results from the substitution of one sport branch for another in the vector operation. Semantic relations between Turkish words build clusters in the vectors. This demonstrates that the semantic results obtained by addition and subtraction of vectors trained on an English corpus can also be obtained from a Turkish corpus.</p></div>
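The analogy operations above can be sketched in code. The following is a minimal illustration, not the paper's pipeline: the three-dimensional vectors are invented toy values (the paper's vectors come from a Word2Vec model trained on a large Turkish corpus), and `analogy` mimics how Word2Vec tools rank candidates by cosine similarity to the result of vector addition and subtraction, excluding the query words themselves.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
vectors = {
    "ingiltere": [1.0, 1.0, 0.0],   # country + UK-related
    "londra":    [0.0, 1.0, 0.0],   # city, UK-related
    "ankara":    [0.0, 0.1, 1.0],   # city, Turkey-related
    "türkiye":   [1.0, 0.0, 1.0],   # country + Turkey-related
    "kırıkkale": [0.1, 0.0, 1.0],   # town, Turkey-related
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(positive, negative, topn=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative),
    excluding the query words, as Word2Vec toolkits do."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for w in positive:
        target = [t + x for t, x in zip(target, vectors[w])]
    for w in negative:
        target = [t - x for t, x in zip(target, vectors[w])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return sorted(((w, cosine(target, vectors[w])) for w in candidates),
                  key=lambda p: p[1], reverse=True)[:topn]

print(analogy(positive=["ingiltere", "ankara"], negative=["londra"]))
```

With these toy vectors the top-ranked candidate for ('ingiltere') -('londra') + ('ankara') is ('türkiye'), matching the pattern reported in the section.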
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Formal clustering of Turkish word vectors</head><p>Word vectors obtained from the general-content Turkish corpus using the Word2Vec model are clustered in terms of the formal (structural) relations of Turkish words, according to Turkish-specific suffixes.</p><p>Turkish is an agglutinative language. The general feature of agglutinative languages is that word roots remain constant while suffixes and inflections with various functions are attached to them. New words are derived by adding different suffixes to word roots, and the vocabulary of the language is formed in this way. All changes and developments in Turkish are based on root-suffix combinations <ref type="bibr" target="#b31">[32]</ref>. We should not expect only semantically similar words to come close to each other, as there may be similarities in more than one respect. In inflected languages these similarities may also arise from the suffixes a word takes. When searching for similar words using word vectors, words ending with similar suffixes can also be reached <ref type="bibr" target="#b10">[11]</ref>.</p><p>The first word to be examined is the word "Gitmek". The first five word vectors with the closest cosine similarity to the vector ('gitmek') are shown below.</p><p>[('dönmek', 0.7897772192955017), ('yetişmek', 0.7705608606338501), ('götürmek', 0.7535400390625), ('inmek', 0.7440905570983887), ('yerleşmek', 0.7398502230644226)]</p><p>The cosine-clustered word vectors all belong to verbs in the infinitive form. The clustering of the word vectors is related to the formal feature of the infinitive suffix "-mek". 
This is a formal derivation of the verb root with the suffix "-mek".</p><p>As another example, the first five word vectors with the closest cosine similarity to the vector ('gittim') are shown below.</p><p>[('gitmiştim', 0.8377403616905212), ('gittiğimde', 0.8276962637901306), ('gidiyordum', 0.7992637753486633), ('gidiyorum', 0.7966102361679077), ('gideceğim', 0.7883756160736084)]</p><p>The word "Gittim" is derived by adding the past tense first person singular suffix "-tim" to the verb root "git". The word vectors obtained from the words "Gitmiştim", "Gittiğimde", "Gidiyordum", "Gidiyorum" and "Gideceğim" belong to inflected words derived by adding first person singular suffixes to the verb root "git". The clustering of the word vectors is related to the formal feature of the first person singular.</p><p>The word "Elma" was discussed while analyzing the semantic relations between vectors. For the word "Elmalı" in the sentence "Elmalı turta severim" ("I like apple pie"), the first five word vectors with the closest cosine similarity to the vector ('elmalı') are shown below.</p><p>[('kumluca', 0.7562471628189087), ('akseki', 0.7351764440536499), ('ibradı', 0.7255643606185913), ('karacaören', 0.7211636304855347), ('akçapınar', 0.7149443626403809)]</p><p>The word "Elmalı" in the sentence "Elmalı turta severim" is derived by adding the suffix "-lı", which turns the noun "Elma" (apple) into an adjective. The word "Elmalı" also refers to a district of Antalya city. Among the vectors obtained by training on the Turkish corpus, the closest cosine-similar vectors are ('kumluca'), ('akseki') and ('ibradı'), representing districts of Antalya city. The clustering of the word vectors is related to the semantic feature of being a district of Antalya city. 
It takes place according to the semantic relation with the word "Elmalı".</p><p>The first five word vectors with the closest cosine similarity to the vector ('ağaçlık') are shown below.</p><p>[('ormanlık', 0.8628451228141785), ('çalılık', 0.7839390635490417), ('sazlık', 0.7809475660324097), ('makilik', 0.7765018939971924), ('otluk', 0.772311270236969)]</p><p>The word "Ağaçlık" is derived by adding the suffix "-lık" to the noun root "ağaç". A place name is derived from the noun describing the object. The cosine-clustered word vectors belong to words derived by adding the suffixes "-lık", "-lik", "-luk" to a noun root. The clustering of word vectors takes place within the relationship between the form and the meaning of the word. It is related to the formal derivation of the noun root with the suffixes "-lık", "-lik", "-luk" and the semantic feature of building a place name from the object name.</p><p>The first five word vectors with the closest cosine similarity to the vector ('avukatlık') are shown below.</p><p>[('muhasebecilik', 0.6621850728988647), ('doktorluk', 0.6428958177566528), ('hakimlik', 0.6376224160194397), ('yargıçlık', 0.635696530342102), ('memurluk', 0.593788206577301)]</p><p>The word "Avukatlık" is derived by adding the suffix "-lık" to the noun root "avukat". The occupation name is derived from the noun describing a profession. The cosine-clustered word vectors belong to words derived by adding the suffixes "-lık", "-lik", "-luk" to a noun root. The clustering of word vectors takes place within the relationship between the form and the meaning of the word. 
It is related to the formal derivation of the noun root with the suffixes "-lık", "-lik", "-luk" and the semantic feature of building an occupation name from a profession name.</p><p>The first five word vectors with the closest cosine similarity to the vector ('temizlik') are shown below.</p><p>[('temizleme', 0.5787885785102844), ('temizliği', 0.5512673258781433), ('banyo', 0.5476162433624268), ('kumlama', 0.5201424360275269), ('tamirat', 0.5156590342521667)]</p><p>The word "Temizlik" is derived by adding the suffix "-lik" to the adjective root "temiz". The noun is derived from the adjective. In the first and second rows, the cosine-similar vectors obtained from the vector ('temizlik') are ('temizleme') and ('temizliği'). The word "Temizlik" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Noun; the state of being clean; purity, chastity, kindness. 2. Noun; the state of being or keeping clean. 3. Noun; the cleaning job. 4. Noun; (slang) to eliminate, destroy, kill <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>. The word "Temizleme" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Noun; the cleaning job. 2. Noun; removing stains and dirt adhering to surfaces by transferring them into a solution or suspension <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The first two cosine-clustered word vectors belong to words derived by adding the suffixes "-leme" and "-liği" to the root "temiz". The clustering of these word vectors takes place within formal relations.</p><p>In the third row, the cosine-similar vector obtained from the vector ('temizlik') is ('banyo'). The word "Banyo" is defined in the Turkish dictionary of the Turkish Language Association as 1. Noun; the room in buildings where washing is done. 2. 
Noun; bathing in the bathtub <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word vectors ('temizlik') and ('banyo') are in semantic relations.</p><p>In the fourth row, the cosine-similar vector obtained from the vector ('temizlik') is ('kumlama'). The word "Kumlama" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Noun; sandblasting a surface using air pressure to emphasize the visual difference between the growth rings of pine wood <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word vectors ('temizlik') and ('kumlama') are in semantic relations.</p><p>The word vectors closest to the word "Temizlik" cluster by cosine similarity according to semantic and/or formal relationships.</p><p>Formal relations between Turkish words build clusters in the vectors. This demonstrates that the examined Turkish word vectors are clustered and related according to Turkish-specific suffixes.</p></div>
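The nearest-neighbour queries used throughout this section can be sketched as follows. This is a toy illustration, not the paper's trained model: the vectors are invented so that infinitives ("-mek"/"-mak" forms) lie near each other, mirroring the formal clustering described above, and `most_similar` ranks all other words by cosine similarity.

```python
import math

# Toy embeddings, invented for illustration: the "-mek" infinitives are
# deliberately placed close together in the space.
vectors = {
    "gitmek":   [0.9, 0.1, 0.8],
    "dönmek":   [0.8, 0.2, 0.9],
    "götürmek": [0.7, 0.1, 0.9],
    "git":      [0.9, 0.1, -0.8],
    "elma":     [-0.9, 0.8, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(word, topn=2):
    """Return the topn nearest neighbours of `word` by cosine similarity."""
    return sorted(((w, cosine(vectors[word], v)) for w, v in vectors.items() if w != word),
                  key=lambda p: p[1], reverse=True)[:topn]

# With these toy vectors, the nearest neighbours of 'gitmek' are other "-mek" verbs,
# not the bare root 'git' or the unrelated noun 'elma'.
print(most_similar("gitmek"))
```

The same query shape, applied to real Word2Vec vectors, produces the neighbour lists quoted in this section.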
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Arithmetic operations on word vectors and morphological relations between words</head><p>New vectors can also be obtained as a result of adding and subtracting (arithmetic operations on) the word vectors obtained from the Turkish corpus, and their formal clustering can be examined.</p><p>The first five word vectors with the closest cosine similarity to the result vector of the ('gitmek') -('git') + ('götür') operation are shown below.</p><p>[('götürmek', 0.7065088748931885), ('yetişmek', 0.5844410061836243), ('götürülmek', 0.5795775651931763), ('binmek', 0.5781220197677612), ('uğurlamak', 0.561299204826355)]</p><p>According to the results of the ('gitmek') -('git') + ('götür') operation, the obtained cosine-similar vectors cluster by the formal feature of the infinitive suffixes "-mek" and "-mak".</p><p>The first five word vectors with the closest cosine similarity to the result vector of the ('çiçekli') -('çiçek') + ('yaprak') operation are shown below.</p><p>[('yapraklı', 0.6582359671592712), ('dallı', 0.6488081812858582), ('dişbudak', 0.6367601752281189), ('otu', 0.624358594417572), ('yapraklar', 0.61882483959198)]</p><p>The word "Yapraklı" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Adjective; with leaves <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>. The word "Çiçekli" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. 
Adjective; with flowers or pictures of flowers <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>According to the results of the ('çiçekli') -('çiçek') + ('yaprak') operation, the cosine-similar vector ('yapraklı') reflects the formal feature of noun-rooted adjectives derived with the similar suffixes "-li" and "-lı".</p><p>The first five word vectors with the closest cosine similarity to the result vector of the ('tazelik') -('taze') + ('saydam') operation are shown below.</p><p>[('saydamlık', 0.4215427339076996), ('opak', 0.3784925937652588), ('erçivan', 0.3675283193588257), ('görüntüleme', 0.36332154273986816), ('tipindedir', 0.3586195111274719)]</p><p>The word "Tazelik" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Noun; the state of being fresh, young. 2. Noun; (metaphor) a state of cheerfulness, liveliness <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The word "Saydamlık" is defined in current Turkish dictionaries published by the Turkish Language Association (TLA) as 1. Noun; the state of being transparent; transparency <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>According to the results of the ('tazelik') -('taze') + ('saydam') operation, the cosine-similar vector ('saydamlık') reflects the formal feature of adjective-rooted nouns derived with the similar suffixes "-lik" and "-lık".</p><p>The vectors obtained from the Turkish corpus are clustered according to the formal relations between the words they belong to. This demonstrates that the formal results obtained by addition and subtraction of vectors are clustered and related according to Turkish-specific suffixes.</p></div>
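The intuition behind ('gitmek') -('git') + ('götür') ≈ ('götürmek') is that a productive suffix corresponds to a roughly constant offset in the vector space, so the offset extracted from one root-derivation pair transfers to another root. The sketch below uses invented two-dimensional toy vectors, constructed so that each derived form is exactly root + suffix offset; real Word2Vec vectors only approximate this.

```python
# Invented toy roots and a hypothetical "-mek" suffix offset.
root = {
    "git":   [1.0, 0.0],
    "götür": [0.0, 1.0],
}
suffix_mek = [0.5, 0.5]

# Derived forms modelled as root + suffix offset.
vectors = {
    "git": root["git"],
    "götür": root["götür"],
    "gitmek":   [r + s for r, s in zip(root["git"], suffix_mek)],
    "götürmek": [r + s for r, s in zip(root["götür"], suffix_mek)],
}

# Extract the suffix offset from one pair and apply it to the other root:
# ('gitmek') - ('git') + ('götür').
offset = [a - b for a, b in zip(vectors["gitmek"], vectors["git"])]
result = [a + o for a, o in zip(vectors["götür"], offset)]

# In this idealized setup the analogy lands exactly on 'götürmek'.
print(result)
```

The same mechanism explains why the ('çiçekli') -('çiçek') + ('yaprak') and ('tazelik') -('taze') + ('saydam') operations recover ('yapraklı') and ('saydamlık').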
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussions</head><p>Word vectors obtained from the general-content Turkish corpus using the Word2Vec model are clustered and related to the Turkish words they belong to in terms of semantic relations, formal (structural) relations according to Turkish-specific suffixes, or both simultaneously.</p><p>The word vector ('elma') is clustered together with word vectors obtained from words belonging to the lexical paradigm of fruit names. The word vector ('istanbul') is clustered together with the word vectors obtained from the words "Ankara", "Bursa" and "Trabzon", representing city names. The word vectors ('üsküdar') and ('yenibosna') are clustered together with the word vector ('istanbul') because the words "Üsküdar" and "Yenibosna" are the names of two important districts of Istanbul. The word vectors ('ahmet') and ('ayşe') are in the same semantic cluster, related to the proper noun-meaning relationship, and are differentiated according to gender characteristics. The word vector ('okul') is clustered together with the word vectors obtained from the words "İlkokul", "Dershane", "Lise" and "Ortaokul", representing educational places. The word vectors ('okul') and ('okulun') are clustered together, representing a formal relation: the formal derivation of the noun root "okul" with the suffix "-un".</p><p>The word vectors ('dönmek'), ('yetişmek'), ('götürmek'), ('inmek'), ('yerleşmek') are found in the cosine similarity of the word vector ('gitmek'). They are in formal relations, representing verbs in the infinitive form: a formal derivation of the verb root with the infinitive suffix "-mek". The word vectors ('gitmiştim'), ('gittiğimde'), ('gidiyordum'), ('gidiyorum'), ('gideceğim') are found in the cosine similarity of the word vector ('gittim'). 
They are in formal relations, representing inflected words derived by adding first person singular suffixes to the verb root "git". The word vectors ('kumluca'), ('akseki'), ('ibradı') are found in the cosine similarity of the word vector ('elmalı'). This clustering represents the semantic feature of being a district of Antalya city; it takes place according to the semantic relations with the word "Elmalı". The word vectors ('ormanlık'), ('çalılık'), ('sazlık'), ('makilik'), ('otluk') are found in the cosine similarity of the word vector ('ağaçlık'), representing place names. The clustering is related to the formal feature of inflected words derived by adding the suffixes "-lık", "-lik", "-luk" to a noun root, and takes place within both the semantic and the formal relations of the words: a formal derivation of the noun root with the suffixes "-lık", "-lik", "-luk" and the semantic feature of building a place name from the object name. The word vectors ('muhasebecilik'), ('doktorluk'), ('hakimlik'), ('yargıçlık'), ('memurluk') are found in the cosine similarity of the word vector ('avukatlık'), representing the building of an occupation name from a profession name. The clustering is related to the formal feature of inflected words derived by adding the suffixes "-lık", "-lik", "-luk" to a noun root, and takes place within both the semantic and the formal relations of the words: a formal derivation of the noun root with the suffixes "-lık", "-lik", "-luk" and the semantic feature of building an occupation name from a profession name. The first two word vectors, ('temizleme') and ('temizliği'), in the cosine similarity of the word vector ('temizlik') are clustered as vectors of inflected words derived by adding the suffixes "-leme" and "-liği" to the root; these are formal relations. The word vectors ('banyo') and ('kumlama') are clustered as vectors representing semantic relations with the word "Temizlik". 
The first five word vectors obtained for the word "Temizlik" thus cluster by cosine similarity according to semantic and/or formal relationships.</p><p>New vectors are obtained as a result of adding and subtracting (arithmetic operations on) the word vectors obtained from the Turkish corpus.</p><p>The vector obtained as a result of the ('kral') -('erkek') + ('kadın') operation is ('kraliçe'), the first among the cosine-similar vectors. It is the replacement of the gender characteristic in the word "Kral", which expresses nobility. The vector obtained as a result of the ('ingiltere') -('londra') + ('ankara') operation is ('türkiye'), the first among the cosine-similar vectors. It encodes the relationship between countries and their cities (or capitals). The vectors obtained as a result of the ('finans') -('para') + ('altın') operation are the word vectors ('bankacılık'), ('gayrimenkul'), ('sigortacılık'). They are in semantic relations consistent with the result of vector addition and subtraction. The closest cosine-similar vector obtained as a result of the ('spor') -('futbol') + ('yüzme') operation is ('olimpik'). It is in a semantic relation with the operand vectors. The word vectors ('havuzu'), ('havuzları') and ('sporları') are in formal relations: the formal derivation of the noun roots "havuz" and "spor" with the suffixes "-u" and "-ları". The word vector ('binicilik') results from the substitution of one sport branch for another in the vector operation. According to the results of the ('gitmek') -('git') + ('götür') operation, the obtained cosine-similar vectors cluster by the formal feature of the infinitive suffixes "-mek" and "-mak". According to the results of the ('çiçekli') -('çiçek') + ('yaprak') operation, the cosine-similar vector ('yapraklı') reflects the formal feature of noun-rooted adjectives derived with the similar suffixes "-li" and "-lı". 
The vector obtained as a result of the ('tazelik') -('taze') + ('saydam') operation is ('saydamlık'). The clustering of the word vectors is related to the formal feature of inflected words derived by adding the suffixes "-lik" and "-lık" to the root.</p><p>Our previous research trained word embeddings with the Word2Vec model on a Ukrainian corpus. In this paper we analysed word embeddings trained with the Word2Vec model on a Turkish corpus, which belongs to another language family, Turkic.</p></div>
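The discussion repeatedly separates neighbours into formal relations (shared root, different suffixes, e.g. "temizlik"/"temizleme") and semantic relations (no shared root, e.g. "temizlik"/"banyo"). A crude way to automate that separation, not used in the paper and offered only as a sketch, is a shared-prefix heuristic with a hypothetical threshold; a proper treatment would use a morphological analyser.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common character prefix of two words."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def relation_type(word: str, neighbour: str, min_root: int = 4) -> str:
    """Label a neighbour 'formal' if it plausibly shares a root with the
    query word (long common prefix), else 'semantic'. The threshold of 4
    characters is an arbitrary assumption for this sketch."""
    return "formal" if shared_prefix_len(word, neighbour) >= min_root else "semantic"

print(relation_type("temizlik", "temizleme"))  # formal: shared root "temiz"
print(relation_type("temizlik", "banyo"))      # semantic: no shared root
```

Such a heuristic misclassifies coincidental prefixes and suppletive forms, which is why the paper's manual dictionary-based analysis is more reliable.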
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions and Future Work</head><p>The analyses of the clustering of word vectors obtained from a Turkish corpus of general subject content (using the Word2Vec model) are made with respect to two sub-branches of linguistics: semantics and morphology. The semantic analyses show that the word vectors form clusters according to semantic or formal relations with the words they belong to. The morphological analyses show that word vectors are clustered and related in terms of morphological features according to Turkish-specific suffixes. This reflects the highly structured morphology of the Turkish language.</p><p>The cosine similarities of the vectors obtained by addition and subtraction of vectors are examined in terms of their compatibility with the meaning of the operation. It is shown that the semantic results obtainable by addition and subtraction of vectors trained on an English corpus can also be obtained from a Turkish corpus.</p><p>Considering the morphological properties of the words, the vectors can be clustered according to the suffixes they take or can represent semantic relations between words.</p><p>The analyses in terms of semantics and morphology show that vectors are clustered according to semantic or formal relations with the words they belong to. Verb- and noun-rooted words form clusters in the word vectors according to their semantic or morphological features, or a mixture of both: their meanings in the sentence and the suffixes they take.</p><p>Our future work on this topic will focus on constructing semantic maps of various subject areas and on query expansion via associative connections.</p></div>		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">References</head></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Natural Language Processing</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Allen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Encyclopedia of Computer Science</title>
				<imprint>
			<publisher>John Wiley and Sons Ltd</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="1218" to="1222" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Computing Machinery and Intelligence</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Turing</surname></persName>
		</author>
		<idno type="DOI">10.1093/mind/lix.236.433</idno>
	</analytic>
	<monogr>
		<title level="j">Mind</title>
		<imprint>
			<biblScope unit="page" from="433" to="460" />
			<date type="published" when="1950">1950</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Large Language Models in Machine Translation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brants</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Popat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Och</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Association for Computational Linguistics</title>
				<meeting>the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="858" to="867" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A neural probabilistic language model</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ducharme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jauvin</surname></persName>
		</author>
		<idno type="DOI">10.1162/153244303322533223</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="page" from="1137" to="1155" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Mobile Application Development Based on Online Music Library for Socializing in the World of Bard Songs and Scouts&apos; Bonfires</title>
		<author>
			<persName><forename type="first">B</forename><surname>Rusyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pohreliuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rzheuskyi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kubik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ryshkovets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chyrun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chyrun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vysotskyi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">B</forename><surname>Fernandes</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-33695-0_49</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Intelligent Systems and Computing IV, CSIT 2019</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<biblScope unit="volume">1080</biblScope>
			<biblScope unit="page" from="734" to="756" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Scaling learning algorithms towards AI</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<idno type="DOI">10.7551/mitpress/7496.003.0016</idno>
	</analytic>
	<monogr>
		<title level="m">Large-scale kernel machines</title>
				<editor>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Chapelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Decoste</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</editor>
		<meeting><address><addrLine>Cambridge, Mass</addrLine></address></meeting>
		<imprint>
			<publisher>Mit Press</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The Expressive Power of Word Embeddings</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Perozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Skiena</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3226</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30 th International Conference on Machine Learning, ICML 2013</title>
				<meeting>the 30 th International Conference on Machine Learning, ICML 2013<address><addrLine>Atlanta, Georgia, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Word Embeddings through Hellinger PCA</title>
		<author>
			<persName><forename type="first">R</forename><surname>Lebret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/E14-1051</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 14th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Gothenburg, Sweden</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="482" to="490" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Word2Vec Model Analysis for Semantic Similarities in English Words</title>
		<author>
			<persName><forename type="first">D</forename><surname>Jatnika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Bijaksana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Suryani</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.procs.2019.08.153</idno>
	</analytic>
	<monogr>
		<title level="m">The Workshop Proceedings of the 4th International Conference on Computer Science and Computational Intelligence 2019 (ICCSCI)</title>
				<imprint>
			<biblScope unit="page" from="160" to="167" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Quick Training of Probabilistic Neural Nets by Importance Sampling</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-S</forename><surname>Senecal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of AISTATS 2003. Society for Artificial Intelligence and Statistics</title>
				<meeting>AISTATS 2003. Society for Artificial Intelligence and Statistics<address><addrLine>Florida, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781v3</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Workshop at ICLR 2013, Computation and Language</title>
				<meeting>Workshop at ICLR 2013, Computation and Language<address><addrLine>Scottsdale, Arizona, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Exploiting Similarities among Languages for Machine Translation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1309.4168v1</idno>
	</analytic>
	<monogr>
		<title level="m">Computing Research Repository</title>
				<imprint>
			<publisher>CoRR</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Linguistic Regularities in Continuous Space Word Representations</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zweig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Atlanta, Georgia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="746" to="751" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Distributed Representations of Sentences and Documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Machine Learning</title>
				<meeting>the 31st International Conference on Machine Learning<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2931" to="2939" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)</title>
				<meeting>the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)<address><addrLine>Buenos Aires, Argentina</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="3650" to="3656" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Servan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bérard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Elloumi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Blanchon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Besacier</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/C16-1110" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers</title>
				<meeting>COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers<address><addrLine>Osaka, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1159" to="1168" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Bilingual word representations with monolingual quality in mind</title>
		<author>
			<persName><forename type="first">T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Pham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/W15-1521</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing</title>
				<meeting>the 1st Workshop on Vector Space Modeling for Natural Language Processing<address><addrLine>Denver, Colorado</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="151" to="159" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Using Word2Vec Technique to Determine Semantic and Morphologic Similarity in Embedded Words of the Ukrainian Language</title>
		<author>
			<persName><forename type="first">L</forename><surname>Savytska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Vnukova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bezugla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pyvovarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Sübay</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2870/paper21.pdf" />
	</analytic>
	<monogr>
		<title level="m">The Workshop Proceedings of the 5th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2021)</title>
				<meeting><address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">2870</biblScope>
			<biblScope unit="page" from="235" to="248" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Slovnyk VESUM ta inshi poviazani zasoby NLP dlia ukrainskoi movy [VESUM dictionary and other related NLP tools for the Ukrainian language]</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rysin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Starko</surname></persName>
		</author>
		<author>
			<persName><surname>Chaplynskyi</surname></persName>
		</author>
		<ptr target="https://r2u.org.ua/articles/vesum" />
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Vektorni predstavlennia sliv dlia ukrainskoi movy [Vector Representations of Ukrainian Words]</title>
		<author>
			<persName><forename type="first">A</forename><surname>Romanyuk</surname></persName>
		</author>
		<idno type="DOI">uam.2019.27.1062</idno>
	</analytic>
	<monogr>
		<title level="j">Ukraina moderna [Modern Ukraine]</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="46" to="72" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A Comparative Analysis for English and Ukrainian Texts Processing Based on Semantics and Syntax Approach</title>
		<author>
			<persName><forename type="first">V</forename><surname>Vysotska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Holoshchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Holoshchuk</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2870/paper26.pdf" />
	</analytic>
	<monogr>
		<title level="m">The Workshop Proceedings of the 5th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2021)</title>
				<meeting><address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">2870</biblScope>
			<biblScope unit="page" from="311" to="356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database</title>
		<author>
			<persName><forename type="first">E</forename><surname>Altszyler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sigman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fernández Slezak</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1610.01520</idno>
	</analytic>
	<monogr>
		<title level="j">Computer Science</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Corpus Specificity in LSA and Word2vec: The Role of Out-of-Domain Documents</title>
		<author>
			<persName><forename type="first">E</forename><surname>Altszyler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sigman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fernández Slezak</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W18-3001</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The Third Workshop on Representation Learning for NLP</title>
				<meeting>The Third Workshop on Representation Learning for NLP<address><addrLine>Melbourne, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<ptr target="https://radimrehurek.com/gensim/models/word2vec.html" />
		<title level="m">Gensim: topic modelling for humans</title>
				<imprint/>
	</monogr>
	<note>Word2vec embeddings</note>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<ptr target="https://radimrehurek.com/gensim/corpora/wikicorpus.html" />
		<title level="m">Gensim: topic modelling for humans</title>
				<imprint/>
	</monogr>
	<note>Corpus from a Wikipedia dump</note>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Aksoy</surname></persName>
		</author>
		<title level="m">Word2Vec gibi işlemlerde kullanılmaya uygun Türkçe metin dosyaları [Turkish text files suitable for use in processes such as Word2Vec]</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model</title>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez-Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pinto</surname></persName>
		</author>
		<idno type="DOI">10.13053/CyS-18-3-2043</idno>
	</analytic>
	<monogr>
		<title level="j">Computación y Sistemas, Thematic issue: Computational linguistics</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="491" to="504" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Python ile Türkçe derlem (corpus) hazırlama [Preparing a Turkish corpus with Python]</title>
		<author>
			<persName><forename type="first">A</forename><surname>Aksoy</surname></persName>
		</author>
		<ptr target="https://github.com/ahmetax/derlemtr" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<ptr target="https://archive.org/details/trwiki-20190101" />
		<title level="m">Wikimedia database dump of the Turkish Wikipedia</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Sözlük [Dictionary]</title>
		<author>
			<orgName type="institution">Türk Dil Kurumu [Turkish Language Association]</orgName>
		</author>
		<ptr target="https://sozluk.gov.tr/" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>Sözlük. Dictionary</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<ptr target="https://kelimeler.gen.tr/" />
		<title level="m">Türkçe Kelime Sözlüğü</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>Turkish Word Dictionary</note>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Korkmaz</surname></persName>
		</author>
		<title level="m">Türkiye Türkçesi grameri: şekil bilgisi [Turkey Turkish grammar: morphology]</title>
				<meeting><address><addrLine>Ankara</addrLine></address></meeting>
		<imprint>
			<publisher>Türk Dil Kurumu [Turkish Language Association]</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
