<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Neural semi-supervised learning for multi-labeled short-texts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Johnny</forename><surname>Torres</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Electrical and Computer Engineering (FIEC)</orgName>
								<orgName type="institution">ESPOL Polytechnic University</orgName>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Carmen</forename><surname>Vaca</surname></persName>
							<email>cvaca@espol.edu.ec</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Electrical and Computer Engineering (FIEC)</orgName>
								<orgName type="institution">ESPOL Polytechnic University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Neural semi-supervised learning for multi-labeled short-texts</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CF04798E48797D58F32841116666E421</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The massive data generated by users on online platforms, such as social networks, create challenges for text classification tasks based on supervised learning. Supervised learning often requires extensive feature engineering or a significant amount of annotated data to achieve good results. However, the scarcity of annotated data is a critical issue, and manual annotation can be both costly and time-consuming. Semi-supervised learning requires far less annotated data and achieves performance similar to supervised approaches. In this paper, we introduce a semi-supervised neural architecture for multi-label settings that combines deep representation learning and k-means clustering. The results show that the semi-supervised approach can leverage large-scale unlabeled data and achieve better results than baseline unsupervised as well as supervised methods.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The classification or grouping of short texts is critical in various text mining and information retrieval tasks in the context of social networks or user-generated data on the web. Specifically, these tasks aim to categorize or group similar texts, so that texts with the same label or group are similar to each other and different from texts in other categories or groups. Traditional classification or grouping models often use a sparse representation for text data, such as the bag of words (BOW) or TF-IDF <ref type="bibr" target="#b11">[12]</ref>.</p><p>However, the characteristics of short texts create problems for both conventional unsupervised and supervised models. Usually, the number of unique words in each short text is small (90% of the text instances in the HappyDB dataset have fewer than 23 words), and as a result, the problem of lexical scarcity generally leads to poor grouping quality <ref type="bibr" target="#b7">[8]</ref>.</p><p>One alternative to address lexical scarcity is to enrich text representations by extracting features and relationships from sources such as Wikipedia <ref type="bibr" target="#b3">[4]</ref> or ontologies <ref type="bibr" target="#b8">[9]</ref>; however, this approach requires written knowledge, which also depends on the language. Another alternative is to encode texts as distributed dense vectors <ref type="bibr" target="#b13">[14]</ref> with neural networks <ref type="bibr" target="#b17">[18]</ref>.</p><p>Another problem is the definition of the labels for a specific task and the number of manually annotated instances for each label. Unsupervised methods learn the categories from the data, but the resulting groupings may not be related to the expected labels. Supervised methods have predefined labels but often require a considerable number of labeled instances to learn to categorize. 
Semi-supervised approaches offer an alternative to these problems: they use a small amount of data labeled according to predefined classes while taking advantage of the massive availability of unlabeled data <ref type="bibr" target="#b2">[3]</ref>.</p><p>This paper investigates the research question: How can a semi-supervised approach learn to categorize short texts in a multi-label taxonomy using a small set of labeled data while leveraging the availability of large amounts of unlabeled data? To that end, we build upon neural semi-supervised k-means clustering, which modifies the conventional objective function and adds a penalty term for labeled data <ref type="bibr" target="#b16">[17]</ref>. We extended the neural semi-supervised clustering and applied it to multi-label settings. The results show that semi-supervised k-means outperforms other baseline unsupervised models for multi-label classification tasks.</p><p>The rest of the paper is structured as follows: a) we review related work and k-means clustering, b) we describe the neural semi-supervised clustering for the multi-label setting, c) we analyze the experimental results, and d) finally, we outline the conclusions and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Previous work distinguishes two families of semi-supervised clustering methods: constraint-based and representation-based. Constraint-based approaches use a small percentage of labeled data to restrict the clustering process <ref type="bibr" target="#b6">[7]</ref>. Representation-based methods instead first learn a data representation model that satisfies the labeled data, and then use it to group both labeled and unlabeled data <ref type="bibr" target="#b2">[3]</ref>.</p><p>Hybrid approaches try to integrate both methods in a unified framework <ref type="bibr" target="#b4">[5]</ref>; however, the use of linear projections for representation learning limits the performance these methods can achieve. Recent methods use deep neural architectures to learn text representations that overcome the limitations of linear models <ref type="bibr" target="#b17">[18]</ref>. However, separating the training of the data representation model from the clustering model restricts the benefits and makes these methods more similar to the representation-based techniques. In this work, the proposed model builds on an approach that combines deep representation learning and the clustering method into an integrated framework <ref type="bibr" target="#b16">[17]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Unsupervised Learning</head><p>In unsupervised learning, k-means is a clustering algorithm used in many applications, including text mining tasks <ref type="bibr" target="#b5">[6]</ref>. The k-means algorithm divides the data into K clusters so as to minimize the distance of each point to the cluster centroids, assigning each point to its nearest cluster. The input to the clustering model is the set of short texts {s 1 , s 2 , s 3 , ..., s N } represented by the data points {x 1 , x 2 , x 3 , ..., x N }, where x i is a sparse or dense vector.</p><p>The k-means algorithm defines a set of binary variables r nk ∈ {0, 1} for each data point x n , where k ∈ {1, ..., K} specifies the assigned cluster; that is, r nk = 1 if x n is assigned to cluster k, and r nj = 0 for j ≠ k. The objective function in k-means is defined as:</p><formula xml:id="formula_0">J unsup = N n=1 K k=1 r nk x n − µ k 2 (1)</formula><p>where µ k is the centroid of the k-th cluster. The k-means algorithm learns the values of {r nk } and {µ k } that optimize J unsup . To minimize the objective function, k-means uses an iterative alternating-minimization procedure <ref type="bibr" target="#b15">[16]</ref>.</p><p>Each iteration involves two steps: assign clusters and estimate centroids. In the assign clusters step, k-means minimizes J unsup with respect to {r nk } while keeping {µ k } fixed. In this case, J unsup is a linear function of {r nk }, so we can optimize each data point separately by simply assigning the n-th data point to the closest cluster centroid.</p><p>In the estimate centroids step, k-means minimizes J unsup with respect to {µ k } while keeping {r nk } fixed. 
In this case, J unsup is a quadratic function of {µ k }, and we minimize it by setting its derivative with respect to {µ k } to zero, as follows:</p><formula xml:id="formula_1">∂J unsup ∂µ k = 2 N n=1 r nk (x n − µ k ) = 0<label>(2)</label></formula><p>Then, we can solve for {µ k } as</p><formula xml:id="formula_2">µ k = N n=1 r nk x n N n=1 r nk<label>(3)</label></formula><p>Thus, µ k corresponds to the mean of all the data points assigned to cluster k.</p></div>
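The two alternating steps above can be sketched in a few lines of NumPy. This is a minimal illustration of Eqs. (1)-(3) on raw vectors, not the implementation used in the paper; the initialization by sampling data points is an assumption for brevity.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain k-means: alternate the assign-clusters step (Eq. 1) and
    the estimate-centroids step (Eq. 3) until the centroids stop moving.
    Minimal sketch; initializes centroids with K random data points."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iters):
        # assign clusters: r_nk = 1 for the nearest centroid
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        r = d.argmin(1)
        # estimate centroids: mean of the points assigned to each cluster
        new_mu = np.array([X[r == k].mean(0) if (r == k).any() else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return r, mu
```

On two well-separated groups of points, the assignments recover the groups regardless of which data points seed the centroids.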
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Neural Semi-supervised Clustering</head><p>The classical k-means algorithm uses unlabeled data to solve the clustering problem based on an unsupervised learning approach; however, the clustering results may not be consistent with the expected labels. We extend the semi-supervised approach in <ref type="bibr" target="#b16">[17]</ref>, which injects some supervised information into the learning process to produce useful and coherent clusters. Similar to the classic k-means algorithm, the training steps for the neural semi-supervised k-means are:</p><p>1. Initialize {µ k } and f (•).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Repeat until convergence:</head><p>(a) assign clusters: assign each short-text to its nearest cluster centroid based on its neural representation. (b) estimate centroids: estimate the cluster centroids based on the cluster assignments from the previous step. (c) update parameters: update the neural network parameters according to the objective function while keeping the centroids and cluster assignments fixed.</p></div>
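One iteration of the three steps can be sketched as follows. This is a toy simplification, not the paper's method: the identity encoder stands in for the neural f(·), and step (c) nudges labeled representations toward their ground-truth centroid instead of performing a real gradient update of network weights.

```python
import numpy as np

def semi_kmeans_step(X, mu, labeled_idx, labeled_cluster, lr=0.5):
    """One iteration of steps (a)-(c) on raw vectors (toy sketch)."""
    # (a) assign clusters: nearest centroid for every point
    d = ((X[:, None] - mu[None]) ** 2).sum(-1)
    r = d.argmin(1)
    # (b) estimate centroids from the current assignments
    mu = np.array([X[r == k].mean(0) if (r == k).any() else mu[k]
                   for k in range(len(mu))])
    # (c) "update parameters": move each labeled representation toward
    # its ground-truth centroid (a stand-in for the gradient step on f)
    X = X.copy()
    for n, g in zip(labeled_idx, labeled_cluster):
        X[n] += lr * (mu[g] - X[n])
    return X, mu, r
```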
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Representation Learning</head><p>We represent each short-text entry s i as a sequence of word indices, which together with the initial centroids forms the input to the semi-supervised neural clustering model. The embedding layer then maps each word in the sequence to a dense vector x = f (s), using word embeddings initialized randomly or from pre-trained embeddings <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b12">13]</ref>.</p><p>In this approach, rather than training the text representation model independently, the semi-supervised clustering integrates it into the k-means training process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Objective Function</head><p>The neural semi-supervised clustering uses a small number of labeled instances to guide the clustering process and minimizes the objective function defined as:</p><formula xml:id="formula_3">J semi = C c=1 {α N n=1 K k=1 r nk f (s n ) − µ k 2 + (1 − α) L n=1 { f (s n ) − µ gn 2 + j≠gn [l + f (s n ) − µ gn 2 − f (s n ) − µ j 2 ] + }}<label>(4)</label></formula><p>where {(s 1 , y 1 ), (s 2 , y 2 ), ..., (s L , y L )} denote the labeled data, and the unlabeled data are {s L+1 , s L+2 , ..., s N }. The label y i specifies the cluster for each short-text s i . The outer sum iterates over the number of labels C defined in the taxonomy, thus extending the original objective function in <ref type="bibr" target="#b16">[17]</ref>. The objective function contains two terms:</p><p>1. The first term is the objective function of the classic k-means algorithm (Eq. 1), applied to the neural representations f (s n ). The factor α ∈ [0, 1] tunes the relative importance of the unlabeled data. 2. The second term penalizes predictions according to how far the predicted clusters are from the ground-truth clusters for the labeled data, and contains two sub-terms:</p><p>(a) The first sub-term penalizes the distance between each labeled instance and its correct cluster centroid, where g n = G(y n ) indicates the cluster ID given by the label y n . The mapping function G(•) is computed with the Hungarian algorithm <ref type="bibr" target="#b14">[15]</ref>. (b) The second sub-term is a hinge loss with margin l, where</p><p>[x] + = max(x, 0). This term incurs some loss unless the distance to the ground-truth centroid is smaller than the distance to each wrong centroid by at least the margin l.</p></div>
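For a single label dimension (C = 1), the loss of Eq. (4) can be computed directly from the representations, assignments, and centroids. The sketch below is illustrative; the names and the flat representation (precomputed f(s_n) vectors) are assumptions, not the paper's code.

```python
import numpy as np

def j_semi(Z, r, mu, labeled, alpha=0.1, margin=1.0):
    """Loss of Eq. (4) for C = 1.
    Z: representations f(s_n); r: cluster id per point; mu: centroids;
    labeled: list of (index, ground-truth cluster id) pairs."""
    # first term: classic k-means objective over all points
    unsup = sum(((Z[n] - mu[r[n]]) ** 2).sum() for n in range(len(Z)))
    sup = 0.0
    for n, g in labeled:
        d_true = ((Z[n] - mu[g]) ** 2).sum()
        sup += d_true  # distance to the ground-truth centroid
        for j in range(len(mu)):
            if j != g:  # hinge loss against each wrong centroid
                d_wrong = ((Z[n] - mu[j]) ** 2).sum()
                sup += max(margin + d_true - d_wrong, 0.0)
    return alpha * unsup + (1 - alpha) * sup
```

A labeled point sitting on its correct centroid contributes nothing, while the same point labeled with the wrong cluster is penalized by both sub-terms.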
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Model Training</head><p>The parameters in J semi are: the cluster assignment for each text {r nk }, the cluster centroids {µ k }, and the neural network weights f (•). The goal is to find the values of {r nk }, {µ k }, and the parameters of f (•) that minimize J semi . Following the k-means algorithm, the semi-supervised model iteratively minimizes J semi with respect to {r nk }, {µ k }, and the parameters of f (•).</p><p>First, the model initializes the cluster centroids {µ k } with the k-means++ method <ref type="bibr" target="#b0">[1]</ref>, and randomly initializes the parameters of the neural network. Then, the model iteratively carries out three steps (assign clusters, estimate centroids, and update parameters) until J semi converges.</p><p>The assign clusters step minimizes J semi with respect to {r nk } while keeping f (•) and {µ k } fixed, assigning a cluster ID to each data point. The second term in Eq. (<ref type="formula" target="#formula_3">4</ref>) does not depend on {r nk }; thus, the model only needs to minimize the first term by assigning each text to its nearest cluster centroid, which is identical to the assign clusters step of the k-means algorithm. In this step, the model also computes the mappings between the ground-truth clusters specified by {y i } and the cluster assignments for the labeled data.</p><p>The estimate centroids step minimizes J semi with respect to {µ k } while keeping {r nk } and f (•) fixed, which corresponds to the estimate centroids step of the classic k-means algorithm. It estimates the cluster centroids {µ k } based on the cluster assignments {r nk } from the assign clusters step. In Eq. 4, the second term includes each labeled instance in the estimation of the cluster centroids. 
By solving ∂J semi /∂µ k = 0, we get</p><formula xml:id="formula_4">µ k = N n=1 αr nk f (s n ) + L n=1 w nk f (s n ) N n=1 αr nk + L n=1 w nk (5) w nk = (1 − α)(I nk + j≠gn I ′ nkj − j≠gn I ′′ nkj ) I nk = δ(k, g n ) I ′ nkj = δ(k, g n ) • δ nj I ′′ nkj = δ(k, j) • δ nj δ nj = δ(l + f (s n ) − µ gn 2 − f (s n ) − µ j 2 &gt; 0)<label>(6)</label></formula><p>where δ(x 1 , x 2 ) = 1 if x 1 equals x 2 and 0 otherwise; and δ(x) = 1 if x is true and 0 otherwise. In the numerator of Eq. 5, the first term represents the contributions from all data points, where the weight of s n for µ k is αr nk . The second term represents the labeled data, where w nk is the weight of instance s n for µ k : each active hinge term adds weight toward the ground-truth centroid (I ′ nkj ) and subtracts weight from the wrong centroid (I ′′ nkj ).</p><p>The update parameters step minimizes J semi with respect to f (•) while keeping {r nk } and {µ k } fixed; it has no counterpart in the k-means algorithm. Its primary goal is to learn the parameters of the text representation model. The training uses J semi as the loss function and optimizes it with the Adam algorithm <ref type="bibr" target="#b10">[11]</ref>.</p></div>
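The weighted centroid update of Eqs. (5)-(6) can be written out explicitly. The sketch below follows the derivative of J semi term by term; it is an illustration consistent with the derivation above, not the authors' code, and the function names are invented for clarity.

```python
import numpy as np

def update_centroid(k, Z, r, labeled, mu, alpha=0.1, margin=1.0):
    """Centroid update of Eq. (5): a weighted mean where each unlabeled
    point contributes alpha * r_nk, and each labeled point contributes
    (1 - alpha) * w_nk with w_nk built from the indicators of Eq. (6)."""
    num = np.zeros_like(Z[0], dtype=float)
    den = 0.0
    for n in range(len(Z)):                   # alpha * r_nk term
        if r[n] == k:
            num += alpha * Z[n]
            den += alpha
    for n, g in labeled:                      # w_nk term for labeled data
        w = 1.0 if k == g else 0.0            # I_nk = delta(k, g_n)
        d_true = ((Z[n] - mu[g]) ** 2).sum()
        for j in range(len(mu)):
            if j == g:
                continue
            active = margin + d_true - ((Z[n] - mu[j]) ** 2).sum() > 0
            if active and k == g:             # I'_nkj: pull toward truth
                w += 1.0
            if active and k == j:             # I''_nkj: push wrong centroid
                w -= 1.0
        num += (1 - alpha) * w * Z[n]
        den += (1 - alpha) * w
    return num / den if den != 0 else mu[k]
```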
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Experimental Setting</head><p>We evaluate the models on the HappyDB dataset <ref type="bibr" target="#b1">[2]</ref>, comprising individual accounts of happy moments. The aim is to predict the agency and social labels that indicate the context of the happy moments. For training, we use a small labeled subset and a large unlabeled subset <ref type="bibr" target="#b9">[10]</ref>. Table 1 summarizes the number of labeled and unlabeled text instances for training, as well as the number of text instances in the test set. For the experiments, the splitting strategy is to randomly sample 80% of the labeled instances for training (training set) and the remaining 20% for validation (validation set). The unsupervised and semi-supervised models use the unlabeled instances for training (unlabeled set). We train the models using k-fold cross-validation (k = 10) on the training set and report the results on the validation set using the F 1 metric. </p></div>
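The 80/20 split of the labeled data described above can be sketched as follows; the function name and the fixed seed are illustrative assumptions, not details from the paper.

```python
import random

def split_labeled(instances, seed=13):
    """Randomly split labeled instances 80/20 into training and
    validation sets (sketch; the seed is arbitrary)."""
    idx = list(range(len(instances)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train = [instances[i] for i in idx[:cut]]
    val = [instances[i] for i in idx[cut:]]
    return train, val
```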
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Model Hyperparameters</head><p>Neural architectures introduce several hyper-parameters, such as the output dimension of the text representation models, while the semi-supervised k-means clustering adds α in Eq. 4. This subsection analyzes the impact of some of these hyper-parameters and determines the configuration for the remaining experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Embeddings dimension</head><p>To evaluate the effect of the output dimension of the text representation models, we perform experiments with embedding sizes of {50, 100, 200, 300, 500, 1000}, while keeping all other parameters fixed. Figure <ref type="figure" target="#fig_0">1</ref> shows that the F 1 score drops if the size is ≤ 100 and falls again if the size is ≥ 500. Based on these results, we use 300 as the embedding size. Alpha We evaluate the effect of α in Eq. 4, which indicates the importance of the unlabeled data for the performance of the model. We test α with values of {0.00001, 0.0001, 0.001, 0.01, 0.1}, keeping the other parameters fixed. Figure <ref type="figure">2</ref> shows that performance decays for small α values. As the value of α increases, we observe progressive improvements, reaching a peak F 1 score at α = 0.1. Further experiments use α = 0.1, as it maximizes F 1 . Labeled set size This parameter controls the influence of the size of the labeled data. We evaluate ratios of labeled training data between [1%, 10%], keeping the other parameters fixed. Figure <ref type="figure" target="#fig_2">3</ref> illustrates the performance improvement as the size of the labeled data increases and confirms the importance of labeled data for training. Pre-training This option measures the effect of pre-trained embeddings in the neural architectures. We pre-train the models on a classification task with the labeled data, and then use the weights (excluding the top layer) to initialize the semi-supervised clustering. We evaluate several pre-trained embeddings, namely Word2Vec, GloVe, and FastText. Figure <ref type="figure" target="#fig_3">4</ref> shows that pre-trained embeddings achieve superior performance compared to randomly initialized embeddings; for further experiments we use FastText. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Models Benchmarking</head><p>This subsection compares the proposed semi-supervised approach with unsupervised and supervised models.</p><p>Unsupervised learning: All unsupervised models use k-means for clustering. We cluster with k = 2 to map the values of each label (0, 1). For representation learning, we use the following methods:</p><p>-BOW: represents each short-text as a sparse vector based on term frequency (TF).</p><p>-TF-IDF: similar to BOW, uses a sparse vector to represent each short-text, based on term frequency-inverse document frequency. -AVG-EMB: represents each short-text as the average of its word embedding vectors.</p></div>
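The AVG-EMB baseline above can be sketched in a few lines; the dictionary-based lookup and the handling of out-of-vocabulary words are illustrative assumptions, not details from the paper.

```python
import numpy as np

def avg_embedding(text, emb, dim):
    """AVG-EMB baseline sketch: represent a short text as the mean of
    its word vectors; words missing from `emb` are skipped, and a text
    with no known words maps to the zero vector."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```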
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Supervised learning:</head><p>We evaluate several supervised models for the classification task; the representation learning depends on each model, as described next:</p><p>-LR: uses a sparse vector representation that feeds a logistic regression classifier.</p><p>-FastText: uses a dense word vector representation (embedding layer), followed by a Global Average Pooling layer, which averages the word embeddings, and then a Dense layer with sigmoid activation to predict the labels.</p><p>-CNN: uses a dense word vector representation (word embedding layer) followed by a Dropout layer, then a convolutional layer, and an output layer with sigmoid activation. -LSTM: similar to CNN, but the word embedding layer feeds a recurrent LSTM layer, which is more suitable for modeling sequences such as texts. -BiLSTM: uses two LSTM networks to model the text sequences in both directions, followed by a Dropout layer with rate 0.5, and then a dense layer with sigmoid activation. -CNN-LSTM: leverages the ability of the CNN layer to capture salient features and the sequence modeling capability of the LSTM.</p><p>Table <ref type="table" target="#tab_1">2</ref> summarizes the scores of the models on the test set. The models fall into three categories (type): unsupervised, supervised, and semi-supervised. The metrics are precision, recall, and F 1 , reported for each label (agency and social); the last three columns show the weighted total of each metric for each model. The results show that the supervised models outperform the unsupervised models by a large margin, which highlights the importance of labeled data. Among the supervised models, the deep neural models perform better than the baseline method (LR), though only by a small margin. Finally, the semi-supervised model shows promising results, achieving scores close to those of the supervised models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>This work builds on a neural semi-supervised clustering that integrates neural representation learning for short-texts and k-means clustering into a unified framework.</p><p>To that end, the model uses a small percentage of labeled data to guide the clustering. We extended the model to the multi-label clustering of short-texts. The results show that the proposed neural semi-supervised clustering is more effective than the unsupervised baselines and close to the supervised models. Therefore, the results show its potential to overcome critical issues, such as the scarcity of labeled data, by leveraging the availability of massive unlabeled data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Influence of the dimensionality of the text learning representation.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Influence of unlabeled data, where the x-axis is α in Eq. (4).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Influence of the size of labeled data used for training.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Influence using pre-training embeddings in neural models.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Statistics for the HappyDB dataset</figDesc><table><row><cell>Dataset</cell><cell>Labeled</cell><cell cols="2">Unlabeled Test</cell><cell>Total</cell></row><row><cell cols="2">HappyDB 10,560</cell><cell>72,324</cell><cell>17,215</cell><cell>100,099</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Performance of the models.</figDesc><table><row><cell>Type</cell><cell>Model</cell><cell cols="3">Agency</cell><cell cols="3">Social</cell><cell cols="3">Total (Weighted)</cell></row><row><cell></cell><cell></cell><cell>Prec.</cell><cell>Recall</cell><cell>F1</cell><cell>Prec.</cell><cell>Recall</cell><cell>F1</cell><cell>Prec.</cell><cell>Recall</cell><cell>F1</cell></row><row><cell>unsupervised</cell><cell>BOW</cell><cell>72.77</cell><cell>38.51</cell><cell>50.37</cell><cell>54.46</cell><cell>74.32</cell><cell>62.86</cell><cell>63.61</cell><cell>56.42</cell><cell>56.61</cell></row><row><cell></cell><cell>TF-IDF</cell><cell>72.70</cell><cell>84.69</cell><cell>78.24</cell><cell>54.02</cell><cell>52.96</cell><cell>53.49</cell><cell>63.36</cell><cell>68.83</cell><cell>65.86</cell></row><row><cell></cell><cell>EMB-AVG</cell><cell>73.73</cell><cell>97.79</cell><cell>84.07</cell><cell>53.59</cell><cell>54.16</cell><cell>53.87</cell><cell>63.66</cell><cell>75.97</cell><cell>68.97</cell></row><row><cell>supervised</cell><cell>LR</cell><cell>81.06</cell><cell>94.30</cell><cell>87.18</cell><cell>93.00</cell><cell>82.30</cell><cell>87.32</cell><cell>87.03</cell><cell>88.30</cell><cell>87.25</cell></row><row><cell></cell><cell>FastText</cell><cell>85.74</cell><cell>93.98</cell><cell>89.67</cell><cell>90.99</cell><cell>86.73</cell><cell>88.81</cell><cell>88.37</cell><cell>90.35</cell><cell>89.24</cell></row><row><cell></cell><cell>CNN</cell><cell>90.92</cell><cell>88.53</cell><cell>89.71</cell><cell>89.34</cell><cell>90.53</cell><cell>89.93</cell><cell>90.13</cell><cell>89.53</cell><cell>89.82</cell></row><row><cell></cell><cell>LSTM</cell><cell>89.19</cell><cell>89.81</cell><cell>89.50</cell><cell>91.57</cell><cell>87.52</cell><cell>89.50</cell><cell>90.38</cell><cell>88.67</cell><cell>89.50</cell></row><row><cell></cell><cell>BiLSTM</cell><cell>89.18</cell><cell>90.84</cell><cell>90.00</cell><cell>92.84</cell><cell>84.96</cell><cell>88.72</cell><cell>91.01</cell><cell>87.90</cell><cell>89.36</cell></row><row><cell></cell><cell>CNN-LSTM</cell><cell>89.38</cell><cell>88.92</cell><cell>89.15</cell><cell>89.70</cell><cell>87.88</cell><cell>88.78</cell><cell>89.54</cell><cell>88.40</cell><cell>88.96</cell></row><row><cell>semi-supervised</cell><cell>CNN</cell><cell>89.10</cell><cell>92.18</cell><cell>90.62</cell><cell>89.56</cell><cell>89.56</cell><cell>89.56</cell><cell>89.33</cell><cell>90.87</cell><cell>90.09</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">k-means++: The advantages of careful seeding</title>
		<author>
			<persName><forename type="first">D</forename><surname>Arthur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vassilvitskii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms</title>
				<meeting>the eighteenth annual ACM-SIAM symposium on Discrete algorithms</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1027" to="1035" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Happydb: A corpus of 100,000 crowdsourced happy moments</title>
		<author>
			<persName><forename type="first">A</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Evensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Golshan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halevy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lopatenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Stepanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Suhara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">C</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of LREC 2018. European Language Resources Association (ELRA)</title>
				<meeting>LREC 2018. European Language Resources Association (ELRA)<address><addrLine>Miyazaki, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-05">May 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Semi-supervised clustering methods</title>
		<author>
			<persName><forename type="first">E</forename><surname>Bair</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Wiley Interdisciplinary Reviews: Computational Statistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="349" to="361" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Clustering short texts using wikipedia</title>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ramanathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting>the 30th annual international ACM SIGIR conference on Research and development in information retrieval</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="787" to="788" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Integrating constraints and metric learning in semisupervised clustering</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bilenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Basu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Mooney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the twenty-first international conference on Machine learning</title>
				<meeting>the twenty-first international conference on Machine learning</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page">11</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Introduction to information retrieval</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Christopher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Prabhakar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hinrich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">An Introduction To Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">151</biblScope>
			<biblScope unit="issue">177</biblScope>
			<biblScope unit="page">5</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A survey of clustering with instance level</title>
		<author>
			<persName><forename type="first">I</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Basu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Constraints</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">2</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Information theoretic clustering of sparse cooccurrence data</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">S</forename><surname>Dhillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Third IEEE International Conference on Data Mining</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="517" to="520" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">On ontology-driven document clustering using core semantic features</title>
		<author>
			<persName><forename type="first">S</forename><surname>Fodeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Punch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Tan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge and Information Systems</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="395" to="421" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The CL-Aff Happiness Shared Task: Results and Key Insights</title>
		<author>
			<persName><forename type="first">K</forename><surname>Jaidka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mumick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Chhaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ungar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI (AffCon2019)</title>
				<meeting>the 2nd Workshop on Affective Content Analysis @ AAAI (AffCon2019)<address><addrLine>Honolulu, Hawaii</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-01">January 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Adam: A method for stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Scoring, term weighting and the vector space model</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Introduction to Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">100</biblScope>
			<biblScope unit="page" from="2" to="4" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Advances in pre-training distributed word representations</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Puhrsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Language Resources and Evaluation (LREC)</title>
				<meeting>the International Conference on Language Resources and Evaluation (LREC)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Algorithms for the assignment and transportation problems</title>
		<author>
			<persName><forename type="first">J</forename><surname>Munkres</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Society for Industrial and Applied Mathematics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="32" to="38" />
			<date type="published" when="1957">1957</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Pattern recognition and machine learning</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">M</forename><surname>Nasrabadi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Electronic Imaging</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">49901</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Semi-supervised clustering for short text via deep representation learning</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ittycheriah</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1602.06797</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Short text clustering via convolutional neural networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">VS@HLT-NAACL</title>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="62" to="69" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
