<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Stance Detection in Russian: a Feature Selection and Machine Learning Based Approach</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Sergey</forename><surname>Vychegzhanin</surname></persName>
							<email>vychegzhanin.sv@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Vyatka State University</orgName>
								<address>
									<settlement>Kirov</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Stance Detection in Russian: a Feature Selection and Machine Learning Based Approach</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CA2BAEC7F4E6377B5C18902E9307188E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T19:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>stance detection</term>
					<term>SVM</term>
					<term>Recursive Feature Elimination</term>
					<term>social media analysis</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The huge scale and constant growth of data volume in social media have created a high demand for automatic means of content analysis, in particular stance detection. This term stands for the task of assigning stance labels ("for" and "against") with respect to a discussion topic. In this paper we tackle stance detection for Russian texts from the social network "VKontakte" with the use of machine learning methods: the support vector machine, k-nearest neighbors, Naïve Bayes, AdaBoost, and decision trees. We also apply the Recursive Feature Elimination (RFE) algorithm for feature selection and explore the impact of morphological analysis on the quality of the task solution.</p><p>The best results (F 1 =84.3%) are achieved by using the SVM and a vector model with a relatively small set of normalized words chosen by RFE.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Over the last 10-15 years Web 2.0 services and social media have shown huge growth: for example, in January 2017 the number of active Facebook accounts was 1.87 billion, and the total number of social network users worldwide was more than 2.5 billion <ref type="foot" target="#foot_0">1</ref>. "User-generated content is the lifeblood of social media" <ref type="bibr" target="#b20">[Obar, Wildman, 2015]</ref>. Such content, especially in text form, is a potential source of useful information for government agencies, commercial companies and individuals. The huge scale and constant growth of data volume in social media have created a high demand for automatic means of content analysis <ref type="bibr" target="#b32">[Zafarani et al., 2014]</ref>. The subject of this article is stance detection, or stance classification, i.e., the task of assigning stance labels with respect to a discussion topic <ref type="bibr" target="#b28">[Sridhar, 2015]</ref>. The main labels in this task are "for" ("support", "pro", "favor") and "against" ("oppose", "con", "anti"). The labels "none" ("neither"), denoting cases where stance cannot be deduced from the text <ref type="bibr" target="#b19">[Mohammad et al., 2016]</ref>, and "observing", indicating the repetition of a previous opinion <ref type="bibr" target="#b6">[Ferreira, Vlachos, 2016</ref>], are also used.</p><p>The set of targets in relation to which a position is expressed can consist of one object, for example, "legalization of abortion" or "feminist movement" <ref type="bibr" target="#b19">[Mohammad et al., 2016]</ref>, a pair of objects, for example, "iPhone vs. 
Blackberry" <ref type="bibr" target="#b27">[Somasundaran, Wiebe, 2009]</ref>, or more objects, for example, "left, right and other political orientations" <ref type="bibr" target="#b14">[Malouf, Mullen, 2008]</ref>.</p><p>The range of object domains is very wide <ref type="bibr" target="#b1">[Anand et al, 2011;</ref><ref type="bibr">Mohammad et al., 2016]</ref>: politics ("communism vs. capitalism", "Donald Trump"), religion ("God's existence"), socially significant topics ("climate change", "death penalty"), products ("Firefox vs. Internet Explorer", "Mac vs. PC") and even games and entertainment ("Superman vs. Batman", "cats vs. dogs").</p><p>The data sources include congressional floor debates <ref type="bibr" target="#b29">[Thomas et al., 2006;</ref><ref type="bibr" target="#b3">Burfoot et al., 2011]</ref>, discussion forums <ref type="bibr" target="#b27">[Somasundaran, Wiebe, 2009;</ref><ref type="bibr" target="#b31">Walker et al., 2012]</ref>, social networks such as Twitter <ref type="bibr" target="#b22">[Rajadesingan, Liu, 2014;</ref><ref type="bibr" target="#b26">Sobhani et al., 2016]</ref>, online news articles <ref type="bibr" target="#b6">[Ferreira, Vlachos, 2016]</ref> and comments <ref type="bibr" target="#b25">[Sobhani et al., 2015]</ref>.</p><p>Stance detection can be performed at the author level, where it is assumed that the author's position does not change during the discussion, or at the document (user post) level, where it is assumed that the author's position may change <ref type="bibr" target="#b28">[Sridhar et al., 2015]</ref>.</p><p>Stance detection can be applied in information retrieval, text summarization, recommendation systems, targeted advertising, political polling, product reviews, and fact checking <ref type="bibr" target="#b18">[Mohammad, 2015;</ref><ref type="bibr" target="#b28">Sridhar et al., 2015;</ref><ref type="bibr" target="#b5">Elfardy et al., 2015;</ref><ref type="bibr" target="#b6">Ferreira, Vlachos, 
2016]</ref>.</p><p>Systems that try to automatically determine the position of a text's author face a number of difficulties: ─ an author may express opposite positions with respect to the object (sides) in the same text (post) or in different texts. Authors often give arguments in favor of their own position and against another position, sometimes in one post; they may even concede some of an opponent's points without sharing the opponent's position as a whole <ref type="bibr" target="#b27">[Somasundaran et al., 2009]</ref>; ─ the author's position may change during the discussion <ref type="bibr" target="#b28">[Sridhar et al., 2015]</ref>; ─ the target object may not be mentioned in the text <ref type="bibr" target="#b18">[Mohammad, 2015]</ref>; ─ authors with different positions use the same or similar lexicon <ref type="bibr" target="#b0">[Agrawal et al., 2003</ref>]; ─ the language is difficult to analyze: comparisons, irony, sarcasm and other rhetorical devices <ref type="bibr" target="#b14">[Malouf, Mullen, 2008]</ref>.</p><p>Stance detection is closely related to sentiment analysis, argumentation mining and argument-based opinion mining, but does not coincide with them. Sentiment analysis refers to the task of automatically determining the polarity of a given text: positive, negative, or neutral <ref type="bibr" target="#b18">[Mohammad, 2015]</ref>. The author's position and their sentiment in relation to the same object may not coincide. For example, the sentence "I do not like the party Yabloko, but I will vote for it" expresses negative polarity, but the position is "for". It turns out that the share of such sentences is quite high. 
For instance, in a dataset of tweets manually annotated for stance and sentiment <ref type="bibr" target="#b26">[Sobhani et al., 2016]</ref> the share of negative tweets in which the position "for" is expressed is 12.8%, and the share of positive tweets with the position "against" is 13.8%. Another difference is that sentiment changes more dynamically than stance, which is usually quite stable <ref type="bibr" target="#b5">[Elfardy et al., 2015]</ref>.</p><p>There is also a link between stance detection and aspect-based sentiment analysis <ref type="bibr">[Liu, 2012]</ref>. Authors can evaluate not the object as a whole, but its individual aspects, expressing different positions regarding them <ref type="bibr" target="#b27">[Somasundaran et al., 2009]</ref>. Thus, differentiating aspects can help in determining the position of the author.</p><p>Argumentation mining is a research area concerned with the automatic extraction of argumentation structure from text <ref type="bibr" target="#b17">[Moens et al., 2007]</ref>. Argument-based opinion mining aims to determine the arguments on which users base their stance without recovering the argumentation structure <ref type="bibr" target="#b2">[Boltužić, Šnajder, 2014]</ref>. Methods from both areas can be useful for solving the problem of stance detection.</p><p>In our work we studied user messages from Internet forums and the Russian social network "VKontakte"<ref type="foot" target="#foot_1">2</ref>, in which a position for or against childhood vaccination was expressed. Each post was regarded separately, so the author's position was assumed not to change.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Previous work</head><p>In the works devoted to stance detection, two main approaches can be distinguished: the first takes into account the discourse and the links between the posts (utterances), usually using graph-based methods <ref type="bibr" target="#b0">[Agrawal et al., 2003</ref>]; the second considers each post in isolation. One of the first works devoted to stance detection was the article by Agrawal et al. <ref type="bibr">[2003]</ref>. It considered newsgroup discussions; the authors were divided into opposite camps based on the analysis of the link structure of the interaction network and finding a maximum cut in the graph. The texts of the posts were not taken into account. An interesting peculiarity of many newsgroups is that people are more likely to respond to a message when they disagree than when they agree. <ref type="bibr" target="#b29">Thomas et al. [2006]</ref> explore transcripts of U.S. Congressional floor debates. A linear Support Vector Machine (SVM) with unigrams as features is used for classification. It is shown that considering same-speaker and different-speaker agreement information slightly improves the accuracy of the analysis. <ref type="bibr" target="#b14">Malouf and Mullen [2008]</ref> analyze U.S. informal political discussion posts. For stance detection they use approaches from sentiment analysis, namely PMI-IR <ref type="bibr" target="#b30">[Turney, 2002]</ref>, and from text classification, namely Naïve Bayes <ref type="bibr" target="#b16">[McCallum, 1996]</ref>. They also use an approach based on the co-citation graph, for which singular value decomposition followed by hierarchical clustering is performed. Cluster classes are defined based on the Naïve Bayes classifier. This approach turns out to be more accurate than straightforward Naïve Bayes. <ref type="bibr" target="#b1">Anand et al. 
[2011]</ref> examine stance detection on posts across 14 various topics, such as "abortion", "marijuana legalization" and "cats vs. dogs". As features they use n-grams (unigrams and bigrams), post statistics, repeated punctuation, syntactic dependencies, and a set of context features computed for the immediately preceding post. Classification is performed on the basis of Naïve Bayes and a rule-based classifier. The results of the analysis with and without the contextual features are mixed. <ref type="bibr" target="#b31">Walker et al. [2012]</ref> use MaxCut over a graph that represents the dialogic relations of agreement between speakers. In comparison with other classifiers (rule-based, Naïve Bayes and the SVM), the MaxCut-based algorithm shows some superiority.</p><p>In other works the discourse links between posts are not taken into account. For example, <ref type="bibr" target="#b27">Somasundaran and Wiebe [2009]</ref> use an unsupervised approach, mining polarity-target pairs from the web and computing conditional probabilities with respect to topics. The overall stance of a post is computed with Integer Linear Programming. <ref type="bibr" target="#b25">Sobhani et al. [2015]</ref> first apply Non-Negative Matrix Factorization for text clustering, then manually label the clusters with argument tags based on top keywords. The resulting argument tags are used in the SVM for stance classification.</p><p>In 2016, within the International Workshop on Semantic Evaluation (SemEval), a competition of systems for stance detection was held <ref type="bibr" target="#b19">[Mohammad et al., 2016]</ref>. English tweets expressing stance towards the following six targets were studied: "atheism", "climate change is a real concern", "feminist movement", "Hillary Clinton", "legalization of abortion", and "Donald Trump". The winner was the organizers' baseline system based on the SVM and word and character n-grams (F 1 =68.98%). 
The first place among the participants was taken by the MITRE system (F 1 =67.82%), based on recurrent neural networks and word embeddings.</p><p>Hasan and Ng [2013] used as datasets the debate posts on four topics: "abortion", "gay rights", "Obama", and "marijuana". Three types of models were used: binary classifiers (Naïve Bayes and SVM), sequence classifiers (Hidden Markov Models and linear-chain Conditional Random Fields), and fine-grained models that jointly determine the stance label of a debate post and the stance label of each of its sentences. One of the findings was that no clear leader could be identified between Naïve Bayes and the SVM.</p><p>In our work each post is considered in isolation; for classification a vector model with feature selection and several supervised classifiers are used, including the SVM, Naïve Bayes, k-nearest neighbors, AdaBoost and decision trees. To the best of our knowledge, this is the first work in which the stance detection task for Russian is solved and a corresponding labelled corpus is created. The main goal of this research is to evaluate the performance of machine learning methods for stance detection in Russian.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Text corpus</head><p>The text corpus is formed from Russian-language messages of users of Internet forums and the social network "VKontakte". To create the text corpus we used the messages containing opinions of users on the topic "Vaccinations for children". In the process of labeling, each message was assigned one of the following labels:</p><p>─ for, if the author supports vaccinations; ─ against, if the author is against vaccinations; ─ conflict, if the author is a supporter of some vaccinations and an opponent of others; ─ none, if on the basis of the message it is difficult to conclude whether the author is a supporter or an opponent of vaccinations.</p><p>Of the 22,000 analyzed texts, 1000 messages labelled for and against were selected for the corpus. The corpus was labelled by three annotators. We used the Fleiss' kappa statistical measure <ref type="bibr" target="#b8">[Fleiss, 1971]</ref> to evaluate the inter-annotator agreement. Its value of 0.87 confirms the high quality of the labelling.</p><p>Statistical characteristics of the text corpus are presented in Table <ref type="table" target="#tab_1">1</ref>. Below, examples from the text corpus are given for each class of texts. User messages from the for class (with author spelling and punctuation preserved):</p><p>Example 1. "Privivaju. Immunolog podruga sem'i, tak chto problem net. Opjat' zhe na moj vzgljad, esli pridumali privivki, to ne prosto tak" ("I'm vaccinating. The immunologist is a friend of the family, so there are no problems. Again, in my opinion, if they came up with vaccinations, it's not just like that").</p><p>Example 2. "Nuzhno ee protivnikov svodit' v infekcionku na jekskursiju!" ("It is necessary to take its opponents on a tour of an isolation hospital!").</p></div>
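The Fleiss' kappa agreement measure used above can be computed directly from a matrix of per-item category counts. A minimal sketch (the function name and the toy counts are illustrative; the actual corpus labels are not reproduced here):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix counts[i][j]: the number of annotators
    who assigned item i to category j (same number of raters per item)."""
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    # Per-item agreement: fraction of agreeing annotator pairs.
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the overall category distribution.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)
```

With three annotators and two categories ("for"/"against"), perfect agreement on items that span both categories yields a kappa of 1.0, and values around 0.87, as reported for the corpus, indicate near-perfect agreement.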
<div xmlns="http://www.tei-c.org/ns/1.0"><head>User messages from the against class:</head><p>Example 3. "Mne tozhe plevat' na mnenija vrachej, chto nado delat' privivki. Nado nadejat'sja ne na vrachej a verit' v Boga i vsjo budet horosho." ("I also couldn't care less about doctors' opinions that vaccinations should be given. Don't rely on doctors but trust in God and everything will be all right").</p><p>Example 4. "A chto vas ostanavlivaet ne delat' privivki? Ne vizhu v nih nikakogo smysla, i ne ponimaju zachem moemu rebenku nuzhno vvodit' vsjakuju gadost' i podryvat' ego immunitet v mladencheskom vozraste!" ("And what stops you from not getting vaccinated? I do not see any sense in them, and I do not understand why any muck needs to be injected into my child, undermining his immunity in infancy!").</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results and discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental design</head><p>The stance detection problem was tackled using five machine learning methods: Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Naïve Bayes (NB), AdaBoost (AB), and Decision Trees (DT). To optimize the parameters of the classifiers a five-fold nested cross-validation <ref type="bibr" target="#b4">[Cawley, 2010]</ref> procedure was applied, and to obtain objective estimates of the classification quality a five-fold cross-validation procedure was applied. Classifiers with the following parameters were used:</p><p>─ SVM with linear, RBF and polynomial kernels, regularization coefficient C = 10 p , p = [-1, 0, …, 6], gamma = 10 q , q = [-6, -5, …, -1] and gamma = 1/n_features, where n_features is the number of features; ─ kNN with number of neighbors k = [1..20]; ─ NB with multinomial distribution; ─ AB with a decision tree as the base classifier; ─ DT with maximum tree depth max_depth = [1..20].</p><p>The experiments were carried out using the machine learning library scikit-learn <ref type="bibr" target="#b21">[Pedregosa et al., 2011]</ref>. To represent texts a vector space model was used. Each text was represented as an n-dimensional binary vector, whose components indicate the presence or absence of the corresponding word in the text <ref type="bibr" target="#b15">[Manning et al., 2008]</ref>. The set of words considered in the vector model is the dictionary of this model.</p><p>In most of the experiments, the words from the texts were transformed to lemmas. 
Morphological analysis of the texts was carried out using the MyStem tool <ref type="bibr" target="#b24">[Segalovich, 2003]</ref>.</p><p>In total, five experiments were carried out, differing in the composition of the vector model dictionary:</p><p>• Test1 -the dictionary was made up of all the words of the text corpus without transforming them to lemmas; • Test2 -the dictionary was composed of all the words of the text corpus, transformed to lemmas; • Test3 -the dictionary was made up of nouns, adjectives, verbs, adverbs and interjections, transformed to lemmas; • Test4 -the dictionary from Test3 with the added words "za" ("favor") and "ne" ("not"); the word "protiv" ("against") already exists in Test3;</p><p>• Test5 -the dictionary was compiled on the basis of the dictionary from Test2 using Recursive Feature Elimination (RFE) with five-fold cross-validation.</p><p>RFE is a recursive repetition of the following procedure <ref type="bibr" target="#b7">[Guyon, 2002]</ref>:</p><p>1. Train the classifier (e.g., a linear SVM).</p><p>2. Compute the ranking criterion for all features.</p><p>3. Remove the feature with the smallest ranking criterion.</p><p>The sizes of the dictionaries used in the experiments are shown in Table <ref type="table" target="#tab_2">2</ref> (for Test5 the values for each of the five folds are given). Two simple classifiers were used as baselines. The first (BL1) assigned the label "against" to a text if the word "against" was found in it, and the label "for" otherwise; the second (BL2) assigned the label "for" if the word "for" was found, and the label "against" otherwise.</p></div>
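The three-step RFE loop above can be sketched compactly. This is a minimal illustration, not the paper's implementation: a least-squares linear model stands in for the linear SVM as the ranking criterion, and the function name and data are hypothetical.

```python
import numpy as np

def rfe_ranking(X, y, n_keep=1):
    """Recursive Feature Elimination sketch: repeatedly fit a linear
    model, rank features by weight magnitude, drop the weakest one."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        # Step 1: train a linear scorer (least squares stands in for an SVM).
        w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        # Step 2: the ranking criterion is the absolute weight.
        # Step 3: remove the feature with the smallest criterion.
        weakest = int(np.argmin(np.abs(w)))
        remaining.pop(weakest)
    return remaining
```

In practice the same scheme is available off the shelf as `sklearn.feature_selection.RFE`/`RFECV` combined with a linear-kernel SVM, which matches the setup described for Test5.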
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Results</head><p>The quality of the classification was estimated using the macro F 1 -measure. The values of the F 1 -measure are given in Table <ref type="table" target="#tab_3">3</ref>; the classifiers' parameters corresponding to the best values of the F 1 -measure are given in Table <ref type="table">4</ref>. In Test1 the SVM showed the best result, ahead of NB and AB by 1.7% and 2.6%, respectively. The reduction of the feature space by almost half (from 11 584 words to 6 007) due to lemmatization (Test2) led to an improvement in quality for NB (from 75.8% to 78.8%) and a slight decrease for the remaining classifiers (from 1.0% for the SVM to 4.4% for DT).</p><p>The use of only nouns, adjectives, verbs, adverbs and interjections with lemmatization (Test3) reduced the quality for all classifiers (from 2.3% for DT to 6.1% for AdaBoost). This decline was due to the exclusion of the keywords «favor» and «not», as demonstrated by Test4, in which these words were added to Test3. The results of Test4 were close to the results of Test2 (the difference does not exceed 1.8%).</p><p>Thus, in the experiments without feature selection (Test1-Test4), the Naïve Bayes classifier with a multinomial distribution achieved the best classification quality when using a dictionary composed of all lemmas of the text corpus (Test2: F 1 = 78.8%). At the same time, the quality remains practically at the same level when only nouns, adjectives, verbs, adverbs, interjections and the words «favor» and «not» are left (Test4: F 1 = 77.4%).</p><p>The selection of the optimal dictionary size in Test5 was based on five-fold nested cross-validation using the RFE procedure and the SVM classifier with a linear kernel (C = 100) for each fold independently (see Table <ref type="table" target="#tab_2">2</ref>). Note that the Naïve Bayes classifier showed weaker results. 
The optimal dictionaries were then used in each classifier to obtain F 1 -measure quality scores; Table <ref type="table" target="#tab_3">3</ref> shows the average F 1 -measure over all five folds. It turned out that the application of the RFE procedure significantly improves the quality of classification for the SVM, kNN and NB classifiers (by 6.8%, 4.6% and 4.1%, respectively). The difference between precision and recall for the SVM and NB classifiers in this test does not exceed 1.0%. Meanwhile, the AB and DT classifiers show their best quality on Test1 with the maximum size of the feature space (11 584 words). This is probably connected with the principle of building decision rules that underlies both DT and AB. However, their results on Test5 differ only slightly from Test1 (by 0.8%) with an average dictionary size of 693 words.</p></div>
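The macro F 1 -measure reported in Tables 3 and 6 averages per-class F1 scores, so both stance classes contribute equally. A minimal sketch for the two-label case (the label names mirror the corpus classes; the function is illustrative, equivalent to scikit-learn's `f1_score` with `average='macro'`):

```python
def macro_f1(y_true, y_pred, labels=("for", "against")):
    """Macro-averaged F1: compute precision/recall/F1 per class,
    then take the unweighted mean of the per-class F1 scores."""
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```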
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Analysis of the feature selection process</head><p>For the RFE procedure and the SVM classifier an additional study was carried out: we constructed the dependence of the stance detection quality on the dictionary size, including the features with the best ranking criterion (Fig. <ref type="figure" target="#fig_1">1</ref>). Note that this dependence is obtained for the test sets in the cross-validation procedure, therefore it cannot be used as a final rating of the classifier quality. However, it allows certain conclusions to be drawn about the feature space in the stance detection problem. Fig. <ref type="figure" target="#fig_1">1</ref> shows that the maximum value of the F 1 -measure (92.3%) is achieved with a dictionary size of 450 words (we denote this dictionary as "optimal"). The words "convincing", "schedule", "for", "bear (disease)", "allow" have the highest weights in the "for" class, and the words "muck", "evil", "against", "denial", "cease" in the "against" class.</p><p>F 1 -measure values greater than 90% can be obtained with the number of features from 350 to 700. A further increase in the number of features lowers the classification quality.</p><p>An analysis of the distribution of parts of speech in the optimal dictionary in comparison with the distribution of parts of speech in the Russian National Corpus [RNC] was performed (Table <ref type="table" target="#tab_4">5</ref>). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>It has long been known that adjectives and adverbs carry important information for the analysis of opinions <ref type="bibr" target="#b30">[Turney, 2002]</ref>; given this, the greater proportion of verbs relative to adjectives and adverbs in the optimal dictionary is somewhat surprising. For example, in sentiment lexicons the proportions of verbs, adjectives and adverbs on average are, respectively, 22%, 37% and 16% <ref type="bibr" target="#b11">[Kotelnikov, 2016]</ref>. At the same time, only 12 verbs (2 positive and 10 negative) of the optimal dictionary are found in the sentiment lexicons from <ref type="bibr" target="#b11">[Kotelnikov, 2016]</ref>.</p><p>Classifiers with dictionaries of 450 words were also tested; these dictionaries included the words with the maximum Term Frequency-Inverse Document Frequency (TF-IDF, Test6) <ref type="bibr" target="#b10">[Jones, 2004]</ref> and Relevance Frequency (RF, Test7) <ref type="bibr" target="#b12">[Lan et al., 2009]</ref> weights (see Table <ref type="table" target="#tab_6">6</ref>). Table <ref type="table" target="#tab_6">6</ref> shows that both the TF-IDF and RF weighting methods reduce the quality for all classifiers as compared to RFE, except for the RF + NB combination. But even in the latter case the F 1 -measure turns out to be lower than for RFE + SVM.</p><p>Thus, feature selection based on the RFE procedure can significantly improve the quality of the classification compared to other feature selection methods, but it is very resource intensive: in our experiments with five-fold cross-validation, the RFE procedure took 7.5 hours on an ordinary desktop computer.</p></div>
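The TF-IDF-based dictionary selection used in Test6 can be sketched as follows. This is a minimal illustration with a plain tf·log(N/df) weighting; the exact formulas of [Jones, 2004] and [Lan et al., 2009] are not reproduced, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter

def top_tfidf_words(docs, k):
    """Rank each word by its maximum TF-IDF weight over tokenized
    documents and return the k highest-weighted words."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within one document
        for word, count in tf.items():
            weight = count * math.log(n_docs / df[word])
            best[word] = max(best.get(word, 0.0), weight)
    return sorted(best, key=best.get, reverse=True)[:k]
```

Note that a word occurring in every document gets an IDF of zero, which is why topic words shared by both stance classes (e.g. the target itself) tend to drop out of such a dictionary.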
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>The task of stance detection is important from the point of view of social media analysis. Our work showed that traditional models and methods of machine learning and text classification, together with feature selection procedures, make it possible to obtain a sufficiently high quality of analysis, about 85%. This quality is achieved by using the SVM classifier with an RBF kernel and the Recursive Feature Elimination procedure with word lemmas as features. The multinomial Naïve Bayes classifier also performs well (here our results coincide with the work <ref type="bibr" target="#b9">[Hasan and Ng, 2013]</ref>), but, in general, it is slightly worse than the SVM.</p><p>Another result of our work is demonstrating the usefulness of verbs as features alongside nouns, adjectives and adverbs, as well as of the negation «not» and the task-specific words «favor» and «against».</p><p>In the future, we plan, on the one hand, to use deep learning approaches for stance detection, such as distributed representations of words, and, on the other hand, to apply sequence labeling methods such as conditional random fields.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Table 4 .</head><label>4</label><figDesc>The classifiers' parameters, corresponding to the best values of the F 1</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Graph of the dependence of F 1 -measure on the dictionary size</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>Characteristics of the text corpus</figDesc><table><row><cell>Label</cell><cell>Number of texts</cell><cell>Total amount of words</cell><cell>Average text length, words</cell><cell>Lexicon size (lemmas)</cell></row><row><cell>for</cell><cell>500</cell><cell>35 326</cell><cell>70</cell><cell>4 108</cell></row><row><cell>against</cell><cell>500</cell><cell>34 167</cell><cell>68</cell><cell>3 992</cell></row><row><cell>for &amp; against</cell><cell>1 000</cell><cell>69 493</cell><cell>69</cell><cell>6 007</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>The size of the dictionary</figDesc><table><row><cell></cell><cell>Test1</cell><cell>Test2</cell><cell>Test3</cell><cell>Test4</cell><cell>Test5</cell></row><row><cell>Dictionary size</cell><cell>11 584</cell><cell>6 007</cell><cell>5 716</cell><cell>5 718</cell><cell>{ 661, 612, 468, 767, 959 } Average = 693</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 .</head><label>3</label><figDesc>F 1 -measure, % (Test1 -Test5)</figDesc><table><row><cell>No. of test</cell><cell>BL1</cell><cell>BL2</cell><cell>SVM</cell><cell>kNN</cell><cell>NB</cell><cell>AB</cell><cell>DT</cell></row><row><cell>Test1</cell><cell></cell><cell></cell><cell>77.5</cell><cell>64.0</cell><cell>75.8</cell><cell>74.9</cell><cell>67.4</cell></row><row><cell>Test2</cell><cell></cell><cell></cell><cell>76.5</cell><cell>60.0</cell><cell>78.8</cell><cell>73.2</cell><cell>63.0</cell></row><row><cell>Test3</cell><cell>55.1</cell><cell>57.8</cell><cell>72.2</cell><cell>55.8</cell><cell>74.6</cell><cell>67.1</cell><cell>60.7</cell></row><row><cell>Test4</cell><cell></cell><cell></cell><cell>76.0</cell><cell>61.8</cell><cell>77.4</cell><cell>72.4</cell><cell>64.7</cell></row><row><cell>Test5</cell><cell></cell><cell></cell><cell>84.3</cell><cell>68.6</cell><cell>80.0</cell><cell>74.1</cell><cell>66.6</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Distribution of parts of speech in the optimal dictionary and in the Russian National Corpus</figDesc><table><row><cell>Part of speech</cell><cell cols="2">Optimal lexicon Count of words Portion, %</cell><cell>Portion in RNC, %</cell></row><row><cell>Nouns</cell><cell>148</cell><cell>32.9</cell><cell>28.5</cell></row><row><cell>Verbs</cell><cell>138</cell><cell>30.7</cell><cell>17.8</cell></row><row><cell>Adjectives</cell><cell>59</cell><cell>13.1</cell><cell>8.5</cell></row><row><cell>Adverbs</cell><cell>43</cell><cell>9.6</cell><cell>4.1</cell></row><row><cell>Others</cell><cell>62</cell><cell>13.8</cell><cell>41.1</cell></row><row><cell>Total</cell><cell>450</cell><cell>100</cell><cell>100</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5</head><label>5</label><figDesc>Table 5 shows that in the optimal dictionary the proportion of verbs, adjectives and adverbs is much larger than in the Russian National Corpus.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 6.</head><label>6</label><figDesc>F1-measure, % (Test5-Test7)</figDesc><table><row><cell>No. of test</cell><cell>SVM</cell><cell>kNN</cell><cell>NB</cell><cell>AB</cell><cell>DT</cell></row><row><cell>Test5 (RFE)</cell><cell>84.3</cell><cell>68.6</cell><cell>80.0</cell><cell>74.1</cell><cell>66.6</cell></row><row><cell>Test6 (TF-IDF)</cell><cell>77.2</cell><cell>65.0</cell><cell>77.7</cell><cell>73.1</cell><cell>63.3</cell></row><row><cell>Test7 (RF)</cell><cell>76.9</cell><cell>66.8</cell><cell>81.6</cell><cell>67.8</cell><cell>61.6</cell></row></table></figure>
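The RFE + SVM combination that scores highest in Table 6 (Test5) can be sketched with scikit-learn, which the bibliography cites. This is a minimal illustration only: the synthetic dataset, feature counts, and hyperparameters below are assumptions, not the authors' setup or data.

```python
# Sketch of Recursive Feature Elimination (Guyon et al., 2002) driven by a
# linear SVM, then evaluating the reduced feature set with cross-validated F1.
# Dataset and all parameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for a document-term matrix: 200 texts, 100 features.
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=10, random_state=0)

# RFE repeatedly fits the SVM and drops the lowest-weight features
# (step=5 per round) until 20 features remain.
selector = RFE(LinearSVC(dual=False, max_iter=5000),
               n_features_to_select=20, step=5)
X_sel = selector.fit_transform(X, y)

# F1 of an SVM trained on the selected features, 5-fold cross-validation.
scores = cross_val_score(LinearSVC(dual=False, max_iter=5000),
                         X_sel, y, cv=5, scoring="f1")
print(X_sel.shape, round(scores.mean(), 3))
```

The same pattern applies to the paper's setting by replacing the synthetic matrix with TF-IDF vectors of the lemmatized texts.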
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.statista.com.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://vk.com.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was carried out as part of the project «Research and Development of Sentiment Lexicons for Text Sentiment Analysis» of the Government Order No. 34.2092.2017/4.6 of the Ministry of Education and Science of the Russian Federation (2017-2019).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Mining Newsgroups Using Networks Arising from Social Behavior</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rajagopalan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">12th International Conference on World Wide Web (WWW 2003)</title>
				<meeting><address><addrLine>Budapest</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="529" to="535" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Cats Rule and Dogs Drool!: Classifying Stance in Online Debate</title>
		<author>
			<persName><forename type="first">P</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Walker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Abbott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Fox Tree</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bowmani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Minor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, ACL-HLT 2011</title>
				<meeting><address><addrLine>Portland, Oregon</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Back up your Stance: Recognizing Arguments in Online Discussions</title>
		<author>
			<persName><forename type="first">F</forename><surname>Boltužić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Šnajder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">First Workshop on Argumentation Mining</title>
				<meeting><address><addrLine>Baltimore, Maryland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="49" to="58" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Collective Classification of Congressional Floor-Debate Transcripts</title>
		<author>
			<persName><forename type="first">C</forename><surname>Burfoot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Baldwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">49th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting><address><addrLine>Portland, Oregon</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1506" to="1515" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On over-fitting in model selection and subsequent selection bias in performance evaluation</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">C</forename><surname>Cawley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">L C</forename><surname>Talbot</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">JMLR</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="2079" to="2107" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ideological Perspective Detection Using Semantic Features</title>
		<author>
			<persName><forename type="first">H</forename><surname>Elfardy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Diab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Callison-Burch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Fourth Joint Conference on Lexical and Computational Semantics (*SEM 2015)</title>
				<meeting><address><addrLine>Denver, Colorado</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="137" to="146" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Emergent: a novel data-set for stance classification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Ferreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NAACL-HLT 2016</title>
				<meeting><address><addrLine>San Diego, California</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1163" to="1168" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Gene selection for cancer classification using support vector machines</title>
		<author>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Barnhill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vapnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Mach. Learn</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="issue">1-3</biblScope>
			<biblScope unit="page" from="389" to="422" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Measuring nominal scale agreement among many raters</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Fleiss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychological Bulletin</title>
		<imprint>
			<biblScope unit="volume">76</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="378" to="382" />
			<date type="published" when="1971">1971</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Stance Classification of Ideological Debates: Data, Models, Features, and Constraints</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Hasan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Joint Conference on Natural Language Processing</title>
				<meeting><address><addrLine>Nagoya</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1348" to="1356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A statistical interpretation of term specificity and its application in retrieval</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Documentation</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="493" to="502" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Manually Created Sentiment Lexicons: Research and Development</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">V</forename><surname>Kotelnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Bushmeleva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">V</forename><surname>Razova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Peskisheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">V</forename><surname>Pletneva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics and Intellectual Technologies</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">22</biblScope>
			<biblScope unit="page" from="281" to="295" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Supervised and Traditional Term Weighting Methods for Automatic Text Categorization</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="721" to="735" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Synthesis lectures on human language technologies</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">5</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Taking sides: User classification for informal online political discourse</title>
		<author>
			<persName><forename type="first">R</forename><surname>Malouf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mullen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Internet Research</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="177" to="190" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<title level="m">Introduction to Information Retrieval</title>
				<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m">Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>McCallum</surname></persName>
		</author>
		<ptr target="http://www.cs.cmu.edu/~mccallum/bow" />
		<imprint>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Automatic detection of arguments in legal texts</title>
		<author>
			<persName><forename type="first">M.-F</forename><surname>Moens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Boiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Palau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Reed</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">11th International Conference on Artificial Intelligence and Law</title>
				<meeting><address><addrLine>Palo Alto, California</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="225" to="230" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Sentiment analysis: detecting valence, emotions, and other affectual states from text</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mohammad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Emotion Measurement</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mohammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kiritchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sobhani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cherry</surname></persName>
		</author>
		<title level="m">SemEval-2016 Task 6: Detecting Stance in Tweets</title>
				<meeting><address><addrLine>San Diego, California</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="31" to="41" />
		</imprint>
	</monogr>
	<note>Proceedings of SemEval-2016</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Social media definition and the governance challenge: An introduction to the special issue</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Obar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wildman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Telecommunications policy</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="745" to="750" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">JMLR</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Identifying Users with Opposing Opinions in Twitter Debates</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rajadesingan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SBP 2014</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">8393</biblScope>
			<biblScope unit="page" from="153" to="160" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<ptr target="http://ruscorpora.ru" />
		<title level="m">RNC: Russian National Corpus</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A Fast Morphological Algorithm with Unknown Word Guessing Induced by a Dictionary for a Web Search Engine</title>
		<author>
			<persName><forename type="first">I</forename><surname>Segalovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MLMTA-2003</title>
				<meeting><address><addrLine>Las Vegas</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">From Argumentation Mining to Stance Classification</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sobhani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Inkpen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Matwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2nd Workshop on Argumentation Mining</title>
				<meeting><address><addrLine>Denver, Colorado</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="67" to="77" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Detecting Stance in Tweets and Analyzing its Interaction with Sentiment</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sobhani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mohammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kiritchenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Fifth Joint Conference on Lexical and Computational Semantics (*SEM 2016)</title>
				<meeting><address><addrLine>Berlin</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="159" to="169" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Recognizing Stances in Online Debates</title>
		<author>
			<persName><forename type="first">S</forename><surname>Somasundaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wiebe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP</title>
				<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="226" to="234" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Joint Models of Disagreement and Stance in Online Debate</title>
		<author>
			<persName><forename type="first">D</forename><surname>Sridhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Foulds</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Getoor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Walker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</title>
				<meeting><address><addrLine>Beijing</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="116" to="125" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Get out the vote: Determining support or opposition from Congressional floor-debate transcripts</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EMNLP</title>
				<meeting><address><addrLine>Sydney</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="327" to="335" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">40th Annual Meeting of the ACL</title>
				<meeting><address><addrLine>Philadelphia, Pennsylvania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="417" to="424" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Stance Classification using Dialogic Properties of Persuasion</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Walker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Abbott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Grant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference of the North American Chapter of the ACL: Human Language Technologies</title>
				<meeting><address><addrLine>Montreal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="592" to="596" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Zafarani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Abbasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<title level="m">Social Media Mining: An Introduction</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
