<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Encoder-Decoder neural networks for taxonomy classification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Makoto</forename><surname>Hiramatsu</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Kei</forename><surname>Wakabayashi</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Graduate School of Library, Information and Media Studies</orgName>
								<orgName type="institution">University of Tsukuba Tsukuba</orgName>
								<address>
									<settlement>Ibaraki</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Library, Information and Media Science</orgName>
								<orgName type="institution">University of Tsukuba Tsukuba</orgName>
								<address>
									<settlement>Ibaraki</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Encoder-Decoder neural networks for taxonomy classification</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5D303200D2D2E9BDA34831E9EAEEB1AB</idno>
					<idno type="DOI">10.1145/nnnnnnn.nnnnnnn</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Encoder-Decoder Neural Networks</term>
					<term>Recurrent Neural Networks</term>
					<term>Taxonomy classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes our taxonomy classifier for SIGIR eCom Rakuten Data Challenge. We propose a taxonomy classifier based on sequenceto-sequence neural networks, which are widely used in machine translation and automatic document summarization, by treating taxonomy classification as the translation problem from a description of a product to a category path. Experiments show that our method can predict category paths more accurately than baseline classifier.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Taxonomy is the major classification schemes in organizing concepts. With the rapid growth of the e-commerce market accompanying the development on the Internet, the number of products on e-commerce becomes enormous. In this situation, it is required to develop methods that predict taxonomic categories automatically because it is costly to classify all the products manually.</p><p>Rakuten Data Challenge, which is a competition we participated, provides a task to predict correct categories for each given product. As a feature of this task, categories have a hierarchical structure. This hierarchical structure corresponds to a taxonomy, which indicates that items in a category are further classified into a subcategory that contains further lower detail information. Each product has a path in the taxonomy like "Clothing, Shoes &amp; Accessories → Shoes → Men → Boots".</p><p>As an approach to solving this task, the most straightforward approach is to train a multi-class classifier (e.g., Random Forest) that predicts a category path as a class of a given product. However, as mentioned earlier, the number of category paths is 3,695, which is fairly large to be considered as a set of classes for ordinal machine learning classifier. Moreover, this approach independently treats these category paths although a category path shares a part of another category path of a similar product. It is expected that this fact causes more data sparseness issue and degrades the performance because the classifier has no way to find common patterns that are shared in two different category paths.</p><p>In this paper, we propose a taxonomy classifier based on Encoder-Decoder neural networks. The key idea is to regard the category path as a series of category names in each hierarchical level. From this perspective, the taxonomy classification task can be converted into a sequence-to-sequence problem, which has a text (i.e., a sequence of words) of the product name as the input and a sequence of category names as the output. In recent years, remarkable performance has been demonstrated in the field of machine translation and automatic summarization by using the model called neural network Encoder-Decoder architecture. We apply the Encoder-Decoder model to the taxonomy classification task and evaluate the performance. Experiments show that our approach can successfully predict category paths more precisely than the baseline approach that treats the task as a multi-class classification problem and applies Random Forest. We have 800,000 records for training data and 200,000 records for test data. Each record has a description of a product and a category path. The number of labels in the training data is 3,695, and each label is assigned to 868 items on average. The category (id=4015) is most frequently assigned to products, which is assigned to 268,295 items. Figure <ref type="figure" target="#fig_0">1</ref>  Table <ref type="table" target="#tab_0">1</ref> shows the histogram of the depth of category path in the training set. The depth of category path in training set is 4.01 on average. In other words, each product has four categories on average. The maximum depth of the depth of category path was 8, and the minimum depth was 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">DATASET</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">PROPOSED METHOD 3.1 Preprocessing</head><p>We used 20 % of the training dataset as the validation set to evaluate models. As preprocessing, we lowercase a product name in training/validation/test sets with SpaCy<ref type="foot" target="#foot_0">1</ref> . We use both the original corpus and the lowercase corpus and compare classifier performances.</p><p>For the weights of dense word representation layer, we use GloVe <ref type="bibr" target="#b5">[6]</ref> pre-trained embeddings trained on Gigaword and Wikipedia. GloVe contains the lowercase words in its vocabulary. The preprocessing of lowercase makes the vocabulary matchinд rate improve. We show the matchinд rate of two corpora in Table <ref type="table" target="#tab_1">2</ref> where source means descriptions of products, which are inputs. matchinд rate is defined by</p><formula xml:id="formula_0">matchinд rate = |V Dat aset ∩ V GloV e | |V Dat aset | ,<label>(1)</label></formula><p>where V Dat aset is the vocabulary of the dataset and V GloV e is the vocabulary in the GloVe embeddings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Encoder-Decoder neural networks for taxonomy classifier</head><p>Encoder-Decoder Neural Network is a type of neural network that is actively studied in recent years <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b6">7]</ref>, which shows very good performance in various tasks such as machine translation and automatic summarization. We will describe the Encoder-Decoder Neural Network used in this research.</p><p>Figure <ref type="figure">2</ref> shows our Encoder-Decoder neural network with attention mechanism <ref type="bibr" target="#b0">[1]</ref>. Our model has two main functions called encoder and decoder. An encoder function f enc takes an input sequence of words x = (x 1 , x 2 , . . . , x n ) and a decoder function f dec predicts the probability of a category path sequence y = (y 1 , y 2 , . . . , y m ). f enc outputs a sequnce of hidden states h = (h 1 , h 2 , . . . , h n ). To predict y t , f dec uses information from h and c t . A context vector c t captures input sequence information to help predict an each label y t . A context vector c t is defined as following:</p><formula xml:id="formula_1">c t = i a t i h i , (<label>2</label></formula><formula xml:id="formula_2">)</formula><p>and attention is defined as following:</p><formula xml:id="formula_3">a t i = âti j ât j , (<label>3</label></formula><formula xml:id="formula_4">) âti = att(h i , ht ),<label>(4)</label></formula><p>where att(h t , hi ) is an attention function. The attention function of our works is based on Luong et al. <ref type="bibr" target="#b3">[4]</ref> defined as following:</p><formula xml:id="formula_5">att(h i , ht ) = h i T W a ht , (<label>5</label></formula><formula xml:id="formula_6">)</formula><p>where h is the encoder state, h is the decoder state and W a is the weight matrix that controls the contribution of each h i and ht .</p><p>Encoder-Decoder neural networks for taxonomy classification SIGIR 2018 eCom Data Challenge, July 2018, Ann Arbor, Michigan, USA </p><p>After the encoder takes input, the decoder predicts outputs using encoder state. As a feature of the Encoder-Decoder neural network, the input sequence length and the output sequence length do not have to match. It can predict various length category path with various length of a description of a product.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">EXPERIMENTS</head><p>This section presents evaluations of our taxonomy classifier and the baseline classifier in the validation set. At the time of training, we use up to 50,000 words as the features both in baseline and the proposed model. In the experiment, we examine parameters of our taxonomy classifier (in Table <ref type="table">3</ref>) and show best parameters in each pair of encoder and decoder in Table <ref type="table">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Baseline</head><p>We use Random Forest <ref type="bibr" target="#b1">[2]</ref> as the baseline. Random Forest is commonly used in various kind of tasks including classification. If we try to solve the multi-label problem where there are 3,695 labels, the computational cost is very expensive. To avoid this difficulty, we use the category path as the label to predict. Therefore our baseline tries to solve the multi-class (3,695 classes) classification problem.</p><p>We use the TF-IDF vectors for features of the product description representations. To implement the baseline, we use scikit-learn <ref type="bibr" target="#b4">[5]</ref>. We use the scikit-learn's default parameters to train the Random Forest.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Results</head><p>We evaluate the performance of our proposed models and the baseline on the validation set with the official script (eval.py). We show the best parameters for each model in Table <ref type="table">4</ref>, and the results in Table <ref type="table">5</ref>. Bidirectional LSTM with GloVe achieves the best F1 score. Our model achieved the best performance when it uses Bidirectional LSTM as an encoder/decoder, lowercase dataset and use GloVe embeddings to initialize the weights of the embedding layer for the input sequence. Interestingly, it shows bad scores when we use GRU for encoder and decoder. We will further investigate the reason for this.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">CONCLUSION</head><p>In this paper, we propose an encoder-decoder neural network for taxonomy classification where there are various sizes of category paths. It is computationally expensive to solve this problem as a multi-label classification because there are over 3,695 categories in the dataset, To avoid this difficulty, we regarded taxonomy classification as the translation from the description of products to the</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Histogram of the number of words in product descriptions in the dataset</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Histogram of the depth of category paths</figDesc><table><row><cell>Category depth</cell><cell>Frequency of items</cell></row><row><cell>1</cell><cell>8,172</cell></row><row><cell>2</cell><cell>2,792</cell></row><row><cell>3</cell><cell>228,888</cell></row><row><cell>4</cell><cell>344,472</cell></row><row><cell>5</cell><cell>166,165</cell></row><row><cell>6</cell><cell>45,253</cell></row><row><cell>7</cell><cell>4,197</cell></row><row><cell>8</cell><cell>61</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Vocabulary matching rate</figDesc><table><row><cell>Preprocessing</cell><cell>Size of source vocabulary</cell><cell>Matching rate</cell></row><row><cell>None</cell><cell>670,092</cell><cell>10.69%</cell></row><row><cell>lowercase</cell><cell>626,567</cell><cell>57.82%</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://spacy.io</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGEMENTS</head><p>This work was supported by JSPS KAKENHI Grant Number 16H02904. Also, we would like to show our gratitude to Kento Nozawa and Taro Tezuka for comments that greatly improved the manuscript.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Neural Machine Translation by Jointly Learning to Align and Translate</title>
		<author>
			<persName><forename type="first">Dzmitry</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1409.0473" />
	</analytic>
	<monogr>
		<title level="m">Proc. International Conference on Learning Representations</title>
				<meeting>International Conference on Learning Representations</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Random Forests</title>
		<author>
			<persName><forename type="first">Leo</forename><surname>Breiman</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1010933404324</idno>
		<ptr target="https://doi.org/10.1023/A:1010933404324" />
	</analytic>
	<monogr>
		<title level="j">Random Forests. Mach. Learn</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="32" />
			<date type="published" when="2001-10">2001. Oct. 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</title>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bart</forename><surname>Van Merrienboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Caglar</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dzmitry</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fethi</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Holger</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<ptr target="http://emnlp2014.org/papers/pdf/EMNLP2014179.pdfhttp://arxiv.org/abs/1406.1078" />
	</analytic>
	<monogr>
		<title level="m">Proc. Empirical Methods in Natural Language Processing</title>
				<meeting>Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Effective Approaches to Attention-based Neural Machine Translation</title>
		<author>
			<persName><forename type="first">Minh-Thang</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Empirical Methods in Natural Language Processing</title>
				<meeting>Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1412" to="1421" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011. 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">GloVe: Global Vectors for Word Representation</title>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<ptr target="http://www.aclweb.org/anthology/D14-1162" />
	</analytic>
	<monogr>
		<title level="m">Proc. Empirical Methods in Natural Language Processing</title>
				<meeting>Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Sequence to sequence learning with neural networks</title>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oriol</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1409.3215" />
	</analytic>
	<monogr>
		<title level="m">Proc. Advances in Neural Information Processing Systems</title>
				<meeting>Advances in Neural Information Processing Systems</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3104" to="3112" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
