                                  Unconstrained Product Categorization with
                                       Sequence-to-Sequence Models
                                     Maggie Yundi Li∗                                                       Liling Tan, Stanley Kok, Ewa Szymanska
                          National University of Singapore                                                            Rakuten Institute of Technology
                                    Singapore                                                                                      Singapore
                            a0131278@comp.nus.edu.sg                                                                   {first.lastname}@rakuten.com

ABSTRACT
Product categorization is a critical component of e-commerce platforms that enables organization and retrieval of the relevant products. Instead of following the conventional classification approaches, we consider category prediction as a sequence generation task where we allow product categorization beyond the hierarchical definition of the full taxonomy.
   This paper presents our submissions for the Rakuten Data Challenge at SIGIR eCom'18. The goal of the challenge is to predict the multi-level hierarchical product categories given the e-commerce product titles. We ensembled several attentional sequence-to-sequence models to generate product category labels without supervised constraints. Such unconstrained product categorization suggests possible additions to the existing category hierarchy and reveals ambiguous and repetitive category leaves.
   Our system achieved a balanced F-score of 0.8256, while the organizers' baseline system scored 0.8142 and the best performing system scored 0.8513.

CCS CONCEPTS
• Computing methodologies → Natural language processing; • Applied computing → Electronic commerce;

KEYWORDS
Text Classification, Sequence-to-Sequence

Category: 3292>1041>4175>4258
   Canon EOS M10 Mirrorless Digital Camera with 15-45mm Lens + 16GB Memory Card + Camera Case
   Canon 6163B001M PowerShot ELPH 530HS White 10.1MP
   Panasonic Lumix DMC-GF7 Mirrorless Micro Four Thirds Digital Camera (Black Body Only)
Category: 3292>1041>4380>4953
   Canon PowerShot Elph 360 HS Wi-Fi Camera + 32GB + Case + Battery + Selfie Stick + Sling Strap + Kit
   Fujifilm X-E3 4K Digital Camera & 23mm f/2 XF Lens (Silver)
Category: 3292>1041>4380>4374
   Canon EF 70-200mm f/2.8L IS II USM Telephoto Zoom Lens Deluxe Accessory Bundle

Table 1: Product Titles and Categories in the Training Data

1 INTRODUCTION
Product categorization is necessary to ensure that e-commerce platforms accurately and efficiently retrieve the relevant items [9]. E-commerce sites use hierarchical taxonomies to organize products from generic to specific classes. For instance, the product ‘Dr. Martens Air Wair 1460 Mens Leather Ankle Boots’ falls under the ‘Clothing, Shoes, Accessories -> Shoes -> Men -> Boots’ category on Rakuten.com.
   Product taxonomies allow easy detection of similar products and are used for product recommendation and duplicate removal on e-commerce sites [16, 18]. Although merchants are encouraged to manually input categories for their products when they post them on e-commerce platforms, the process is labor-intensive and leads to inconsistent categories for similar items [3, 10]. Automatic product categorization based on available product information, such as product titles, would thus significantly smooth this process.
   Previous approaches to e-commerce product categorization focused on mapping product information (titles, descriptions, images, etc.) to the specific categories based on the existing labels from the training data. Despite the effectiveness of such approaches, products can only be classified into the categories given by the platform. Moreover, static product category hierarchies cannot adapt to the ever-growing number of products on an e-commerce platform. We want to automatically learn the cross-pollination of sub-categories beyond the predefined hierarchy, instead of imposing the hard boundaries inherited from higher-level categories.
   By redefining the classic product category classification task as a sequence generation task, we were able to generate categories that were not predefined in the training data. For example, our model assigned ‘Canon 9167b001 12.8 Megapixel Powershot(R) G1 X Mark Ii Digital Camera’ to the 3292>1041>4380>4258 category, which does not exist in the product taxonomy of the train set. Table 1 shows a sample of related product titles and their respective categories from the training data that overlap with the 3292>1041>4380>4258 label.

∗ Corresponding author.

Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
In: J. Degenhardt, G. Di Fabbrizio, S. Kallumadi, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org

Top-level Category   Count     (%)      Largest Sub-category       (%)
4015                 268,295   0.3353   4015>2337>1458>40          0.031851
3292                 200,945   0.2511   3292>3581>3145>2201        0.037682
2199                  96,714   0.1208   2199>4592>12               0.087393
1608                  85,554   0.1069   1608>4269>1667>4910        0.013727
3625                  29,557   0.0369   3625>4399>1598>3903        0.021400
2296                  28,412   0.0355   2296>3597>689              0.004927
4238                  23,529   0.0294   4238>2240>4187             0.001985
2075                  20,086   0.0251   2075>4764>272              0.004962
1395                  18,847   0.0235   1395>2736>4447>1477        0.004720
92                     8,172   0.0102   92                         0.010215
3730                   8,113   0.0101   3730>1887>3044>4882        0.003978
4564                   5,648   0.0070   4564>1265>1706>1158>2064   0.001281
3093                   5,098   0.0063   3093>4104>2151             0.001907
1208                   1,030   0.0012   1208>546>4262>572          0.000195

Table 2: Distribution of First-Level Categories and the Most Common Label in Each First-Level Category
2 SEQUENCE-TO-SEQUENCE LEARNING
The most common Sequence-to-Sequence (Seq2Seq) models belong to the encoder-decoder family. The source sequence, i.e. the product title string in our case, is first encoded as a fixed-length vector. This vector is then fed to a decoder, which steps through to generate the predicted output sequence one symbol at a time until an end-of-sequence (EOS) symbol is generated. In the context of product categorization, every sub-category is a symbol in our experiments, and a sequence of the sub-categories forms a full hierarchical category label. The encoder and decoder are jointly trained to maximize the probability of generating the correct output sequence given its input [4, 5, 8, 13].
   Simple encoder-decoder performance deteriorates when translating long input sequences; the single fixed-size encoded vector is not expressive enough to encapsulate that much information. To address this problem, the attention mechanism was proposed to learn an implicit alignment between the input and output sequences. Before the decoder generates an item, it first attends to a set of positions in the source sequence with the most relevant information [1]. The model then predicts the target item based on the context vectors of these relevant positions and the history of generated items. In other words, attention extracts contextual information for every symbol processed.
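To make the attention step concrete, the following is a minimal NumPy sketch of an additive (Bahdanau-style [1]) attention computation for a single decoding step; the dimensions, initializations, and variable names are illustrative assumptions, not the exact parameterization of our models:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    T, d = 7, 1024               # T source tokens, hidden size d (illustrative)
    H = np.random.randn(T, d)    # encoder states, one per product-title token
    s = np.random.randn(d)       # current decoder state

    # Additive attention: score(s, h_t) = v^T tanh(W_a s + U_a h_t)
    W_a = 0.01 * np.random.randn(d, d)
    U_a = 0.01 * np.random.randn(d, d)
    v = 0.01 * np.random.randn(d)

    scores = np.tanh(s @ W_a + H @ U_a) @ v   # one score per source position
    alpha = softmax(scores)                   # alignment weights, sum to 1
    context = alpha @ H                       # context vector fed to the decoder

The rows of alpha, collected over the decoding steps, are exactly the alignment weights visualized as heat maps in Section 6.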



3 DATASET CHARACTERISTICS
The Rakuten Data Challenge (RDC) dataset consists of 1 million product titles and their anonymized hierarchical category labels. The data was split 80-20 into training and testing sets. The test labels were kept unknown until the end of the competition.

3.1 Class Imbalance
Unbalanced class distribution presents a significant challenge to general classification systems, such as nearest neighbors and multi-layered perceptrons, and remedies like up-/downsampling and cost-sensitive learning have limited effectiveness [12].
   Like most e-commerce product categorization data [2, 6, 17], the distribution of the 14 top-level categories is highly skewed, as shown in Table 2. A similar imbalance is found in the distribution of the sub-category labels. In the train set, there are over 3000 unique sub-categories. The largest category (2199>4592>12) contains ~69,000 product titles that make up 8.7% of the 800,000 product titles in the train set.
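The distribution statistics above are straightforward to reproduce from the training file; a minimal sketch, assuming a tab-separated file of product title and category path (the file name and column layout are assumptions):

    from collections import Counter

    top, full = Counter(), Counter()
    with open("rdc-catalog-train.tsv", encoding="utf8") as f:  # assumed file name
        for line in f:
            title, category = line.rstrip("\n").split("\t")
            full[category] += 1
            top[category.split(">")[0]] += 1

    total = sum(top.values())
    for cat, count in top.most_common():        # cf. Table 2
        print(cat, count, f"{count / total:.4f}")
    print("unique sub-categories:", len(full))  # over 3000 in the train set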
3.2 Noisy Product Titles
Noise is inherent to product category datasets, and the RDC dataset is no different. Related work on product categorization has dedicated approaches to address the noise through a combination of feature engineering and classifier ensembles [3, 10].

[Figure 1: Lists of Characters not in Printable ASCII Range]

   We checked for common noise signatures in the RDC product titles by searching for characters beyond the printable ASCII range (0x20 to 0x7E). Figure 1 shows the list of characters outside the range; the left side shows the number of product titles that contain one or more of the characters on the right, e.g., \x99 appears in 2 to 10 product titles.1

1 The penultimate character in the >50 list is the non-breaking space \xa0 and the last character is a replacement character. They appear in 643 and 766 product titles respectively. Usually, these are breadcrumbs of HTML to Unicode conversion [14, 15].
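The scan itself is a one-liner per title; a minimal sketch of the kind of check we ran (the sample titles and counting scheme are illustrative):

    from collections import Counter

    titles = [
        "Canon EOS M10 Mirrorless Digital Camera with 15-45mm Lens",
        "Frankie Says Relax Statement Women\u2019s T-Shirt",
    ]

    def outside_printable_ascii(title):
        # Printable ASCII spans 0x20 (space) to 0x7E (~).
        return {ch for ch in title if not 0x20 <= ord(ch) <= 0x7E}

    char_counts = Counter()  # in how many titles each character occurs
    for title in titles:
        char_counts.update(outside_printable_ascii(title))

    for ch, count in char_counts.most_common():
        print(repr(ch), count)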
   Upon inspection, we found that the noise can be helpful to the learning systems due to its systematic nature. For example, the same strings of non-ASCII-printable characters appear consistently in the clothing category (1608>4269), such as “I (Heart) My *string of non-ASCII-printable characters* - INFANT One Piece - 18M” in category 1608>4269>4411>4306 and “Frankie Says Relax Statement Women’s T-Shirt by American Apparel by Spreadshirt *string of non-ASCII-printable characters*” in category 1608>4269>3031>62. Hence, we decided not to remove the noise detected in the product titles.

4 EXPERIMENTS
We lowercased the product titles from the RDC dataset and tokenized the data with the Moses tokenizer2,3. To frame the product categorization task as Seq2Seq generation, we split each category into its sub-categories and treat the category as a sentence. For example, "4015>3636>1319>1409>3606" becomes "4015 3636 1319 1409 3606".

2 https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl
3 Python port: https://github.com/alvations/sacremoses
4.1 Models
Without explicit tuning, we trained a single-layer attentional encoder-decoder using the Marian toolkit [7] (commit f429d4a) with the following hyperparameters; a sketch of the corresponding training invocation follows the list.
   • RNN Cell: GRU
   • Source/Target Vocab size: 120,000
   • Embedding dim.: 512
   • En/Decoder dim.: 1024
   • Embedding dropout: 0.1
   • Dropout: 0.2
   • Optimizer: Adam
   • Batch size: 5000
   • Learning Rate: 0.0001
   • Beam Size: 6
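These settings map roughly onto a Marian training command along the following lines; this is a reconstruction from the hyperparameter list, not our recorded command, and the flag spellings and the mapping of batch size and embedding dropout onto Marian options should be checked against the Marian version used:

    marian --type s2s \
        --train-sets train.titles train.cats \
        --vocabs vocab.src.yml vocab.trg.yml \
        --enc-cell gru --dec-cell gru \
        --dim-emb 512 --dim-rnn 1024 \
        --dropout-rnn 0.2 \
        --optimizer adam --learn-rate 0.0001 \
        --mini-batch 5000 --seed 0

The beam size applies at decoding time (see the decoding sketch below).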
   We allowed the model to over-fit the training data by using the full training set as our validation set. We trained the baseline model for 2 hours and stopped arbitrarily at the 77th epoch when the perplexity reached 1.18. Our baseline model achieved a 0.81 weighted F-score in the phase 1 result.
   For the rest of the submissions, we ensembled the baseline model with models trained on different random seeds, and we stopped training when we observed that the perplexity on the validation set dropped towards 1.0. It is unclear what the benefit of over-fitting the model to the training set and expecting a near-1.0 perplexity is, but the assumption is that at inference, given a product title that was seen in training, the model should output the same label.
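Marian can ensemble several trained models at decoding time by passing multiple model files to the decoder; a sketch of how the M1-4 ensemble can be decoded this way (the file names and exact invocation are illustrative):

    marian-decoder -m m1.npz m2.npz m3.npz m4.npz \
        -v vocab.src.yml vocab.trg.yml \
        --beam-size 6 \
        < test.titles > predictions.cats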
   Table 3 presents the validation metrics (cross-entropy and perplexity) for the different models. In retrospect, we could have been more disciplined in the stopping criteria and monitored the model validation more closely to stop with a consistent criterion, e.g., limiting the number of epochs/steps or setting a particular threshold for the validation metric.

Model   Random Seed   Epoch   Cross-entropy   Perplexity
M1      0             77      0.8446          1.1835
M2      1             189     0.0191          1.0038
M3      1             470     0.0723          1.0145
M4      2             54      0.0542          1.0108

Table 3: Cross-entropy and Perplexity during Model Training

5 RESULTS
Table 4 presents the precision, recall, and F-score of the baseline and ensemble systems. The phase 1 results are based on a subset of the full test data, and the phase 2 results are based on the entire test dataset. Our baseline system achieved competitive results with a 0.81 weighted F-score in phase 1 of the data challenge, and the ensembled systems improved the performance to 0.82 in phases 1 and 2 of the challenge.4,5
   Similarly, the best system (mcskinner) in the competition is an ensembled neural network system [11]. It used an ensemble of multiple bi-directional Long Short-Term Memory (LSTM) networks with a novel pooling method that balances max- and min-pooling across the recurrent states. The best system scored 0.85 in phase 2. However, the best system follows the traditional classification paradigm where supervised inference produces a fixed set of labels learned from the training data.

Phase   Model(s)                  P        R        F
1       M1 (Baseline)             0.82     0.81     0.81
        M1-3                      0.83     0.83     0.82
        M1-4                      0.8311   0.8296   0.8245
2       M1-4                      0.8267   0.8305   0.8256
        Best system (mcskinner)   0.8697   0.8418   0.8513

Table 4: Precision, Recall, F1 Scores on the Held-out Test Set

4 Initially, the data challenge reported scores to 2 decimal places, and the change to report 4 decimal places happened in the last couple of days of the challenge. Since the labels for the test set were not available at the time of publication, we could not perform a postmortem evaluation to find out the scores for the M1 baseline and M1-3 ensemble models.
5 The full ranking of the data challenge is available on https://sigir-ecom.github.io/data-task.html

6 ANALYSIS

6.1 Attention Alignment
The ability to generate alignments between the source and target sequences allows us to easily interpret the category predictions with respect to their product titles. We generated the attention weight alignment between source and target sequences for the training set using the baseline model, M1.6

6 We only analyzed the attention weight alignment on the test set minimally because the gold labels on the test set were not made accessible.
[Figure 2: Attention Alignments of Music Product Titles from the Training Set. Example 1 (correct label: 2296>3597>2989); Example 2 (correct label: 2296>3597>1997)]

   In this section, we analyze the behaviors of the model predictions in relation to their attention alignments based on cherry-picked examples (Figures 2-4). We also discuss the implications of such behaviors on the existing product category hierarchy.
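The heat maps in Figures 2-4 are simply the attention weight matrix with source tokens on one axis and predicted sub-category symbols on the other; a minimal plotting sketch with a stand-in weight matrix (the token and label names echo the Figure 3 example, and the extraction of the actual weights from the trained model is omitted):

    import matplotlib.pyplot as plt
    import numpy as np

    src = ["gucci", "eyeglasses", "frames", "woman", "</s>"]  # title tokens
    trg = ["1608", "2227", "574"]                             # predicted sub-categories

    # Stand-in attention matrix: one row per target symbol,
    # one column per source token; each row sums to 1.
    A = np.random.dirichlet(np.ones(len(src)), size=len(trg))

    fig, ax = plt.subplots()
    ax.imshow(A, cmap="Greys", aspect="auto")
    ax.set_xticks(range(len(src)))
    ax.set_xticklabels(src, rotation=45, ha="right")
    ax.set_yticks(range(len(trg)))
    ax.set_yticklabels(trg)
    ax.set_xlabel("product title tokens")
    ax.set_ylabel("predicted sub-categories")
    plt.tight_layout()
    plt.show()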
[Figure 3: Attention Alignments of a Correctly Labeled Product]

   Figure 3 shows an example of a correctly labeled product from the training set. The heat maps represent the attention weights that associate the subcategory labels with each word in the product title. The ‘gucci’ token aligns heavily with the 1608 first-level category, which, from eyeballing the data, may refer to the ‘jewelry and accessories’ category. We see that ‘eyeglasses’ and ‘frames’ align tightly with the 2227 subcategory while ‘woman’ and ‘gucci’ are associated with the 574 subcategory. We observe in the train set that the 2226 final-level category is dominated by ‘eyeglasses’. From the attention weights, we see that many tokens in the product titles have little or no effect on the alignment to the specific subcategories.

6.2 Music Category
The first row of Example 1 in Figure 2 shows an interesting phenomenon: the end-of-sequence (‘</s>’) token is highly associated with the 2296 first-level category. The attention model might have learned to correlate short sequence length with the 2296 category. The 2296 category seems to be related to media content whose titles are often succinct; in the train set, there are 2085 single-token product titles, out of which 1720 have 2296 as their first-level category.
   When the product titles are terse, the model is unable to distinguish between the fine-grained subcategories. In Example 2, the true label 2296>3597>1997 refers to the ‘Media>Music>Electronica’ category7, but the model predicts 2296>3597>689, i.e. the ‘Media>Music>Pop’ category.8 Although the model is smart enough to discover the correct top-level categories by learning to associate short sequences with the 2296>3597 label, it fails to correctly identify the lowest-level category. There are 25 sub-categories under 2296>3597; without additional information, it would be hard even for a human to categorize the music genre based on a short and sometimes single-word product title.

7 https://www.rakuten.com/search/asiatisch/4464/
8 We found this out by searching the product titles from the train set that are labeled with 2296>3597>689 on Rakuten.com, e.g. https://www.rakuten.com/search/Grey%20Sky%20Over%20Black%20Town/4455/

6.3 Machine Created Categories
Unlike traditional classification, the Seq2Seq approach has the ability to generate new categories.

Model           Data Split   Creation Count
M1 (Baseline)   Train        2
                Test         46
M1-4            Train        0
                Test         1

Table 5: Count of Created Categories

   Table 5 shows the breakdown of the created categories when we applied the models to the train and test sets.
While the baseline model created only 2 new categories on the train set, it created 46 on the test set. During model training, the optimizer makes updates that discourage the creation of new categories in order to minimize cross-entropy loss and perplexity, which explains the near-zero creation counts on the train set; at test time, the M1 baseline model still created 46 new categories, while the M1-4 ensemble produced only 1 new category.
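Counting created categories amounts to a set difference between the predicted label paths and the label inventory of the train set; a minimal sketch (file names and formats are assumptions, with predictions in the space-separated form described in Section 4):

    def load_lines(path):
        with open(path, encoding="utf8") as f:
            return [line.rstrip("\n") for line in f]

    train_labels = set(load_lines("train.cats.full"))   # e.g. "4238>4960>1286"
    predictions = load_lines("predictions.cats")        # e.g. "4238 2149 1286"

    created = {">".join(p.split()) for p in predictions} - train_labels
    print(len(created))        # 46 for M1 on the test set, 1 for M1-4 (Table 5)
    print(sorted(created)[:5])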
[Figure 4: Attention Alignment of Products with Created Categories.
Example 3: PM Company 07622 One-Ply Adding Machine/Calculator Rolls- 2-1/4" x 17 ft- White- 5/Pack
Example 4: "Universal Adding Machine/Calculator Roll, 16 lb, 1/2"" Core, 2-1/4"" x 150 ft,White, 100/CT - UNV35710"]

   Examples 3 and 4 from Figure 4 demonstrate how the Seq2Seq model creates cross-pollinated categories. In Example 3, the baseline Seq2Seq model M1 assigned the product, “PM Company 07622 One-Ply Adding Machine/Calculator Rolls- 2-1/4" x 17 ft- White- 5/Pack”, a new category, 4238>2149>1286.
   To break down this created category, we find in the train set that the overarching category 4238>2149 is for paper-related stationery products9. The last sub-category, 1286, consistently appears in 4238>4960>1286, which includes calculator-like machines10 and their accessories, like calculator cases11.
   In 4238>4960>1286, we also spotted a product analogous to Example 3: the Example 4 product, "Universal Adding Machine/Calculator Roll, 16 lb, 1/2"" Core, 2-1/4"" x 150 ft,White, 100/CT - UNV35710". The presence of this calculator printing roll from a different brand may suggest that Example 3 should fall under the same category. However, calculator-like machines dominate the category 4238>4960>1286, constituting 95 out of the 105 products in the train set. Therefore, 4238>2149>1286, created by our Seq2Seq model, is an adequate suggestion for a new category of calculator printing rolls.
   The ensemble model (M1-4) created one novel category by labelling the product “Natural Tech Well-Being Conditioner - 1000ml/33.8oz” as 3625>594>1920. However, it is unclear whether the created category is a valid one without the true labels of the test set, which were not released prior to the paper's publication.12
   There is a variety of creations across almost all categories in the existing category hierarchy. Although some are mislabelings, many of these created categories are worth considering as adaptations of and additions to the existing ones.13

9 Examples: Paper | FE4280-22-250 in 4238>2149>1644 and Lissom Design 24021 Paper Block Set -WB in 4238>2149>488
10 Examples: Hewlett Packard HP 10s Scientific Calculator, Casio DR-210TM Two-Color Desktop Printing Calculator and Ti Nspire Cx Graphing Calc
11 Guerrilla Accessories TI83BLKSC TI83 Plus Silicone Case Black
12 By inspecting the training data, most of the hair conditioners in the train set fall under the category 3625>3641>1920; M1-4 combined that category with 3625>594>..., which seems to be the skincare sub-category. This category creation, though sensible, might be a mislabel because 3625>3641>1920 is a well-defined hair product category.
13 The full list of created categories and the dataset exploratory code described in Section 3 are available at https://github.com/MaggieMeow/neko

7 CONCLUSION
By framing the product categorization task as a sequence generation task, we trained attentional sequence-to-sequence models to generate unconstrained product categories that are not limited to the supervised labels from the training dataset. These models created new categories based on the existing sub-categories, suggesting improvements to the existing product taxonomy. Categorization outcomes from these models can also highlight repetitive and ambiguous categories. In contrast to the traditional classification paradigm, the attention weight alignment generated for each product title makes the model easily interpretable. With an F1-score of 0.82 in the Rakuten Data Challenge at SIGIR eCom'18, attentional sequence-to-sequence models are shown to be adequate for product categorization.

ACKNOWLEDGEMENTS
We thank the organizers for organizing the Rakuten Data Challenge. Our gratitude goes to the Rakuten Institute of Technology (Singapore) for their support and the computation resources for our experiments. Additionally, we thank our dear colleagues, Ali Cevahir
and Kaidi Yue, for sharing their knowledge and insights on related research subjects.

REFERENCES
 [1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural ma-
     chine translation by jointly learning to align and translate. arXiv preprint
     arXiv:1409.0473 (2014).
 [2] Ali Cevahir and Koji Murakami. 2016. Large-scale Multi-class and Hierarchical
     Product Categorization for an E-commerce Giant. In Proceedings of COLING 2016,
     the 26th International Conference on Computational Linguistics: Technical Papers.
     525–535.
 [3] Jianfu Chen and David Warren. 2013. Cost-sensitive Learning for Large-scale Hi-
     erarchical Classification. In Proceedings of the 22Nd ACM International Conference
     on Information & Knowledge Management (CIKM ’13).
 [4] Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio.
     2014. On the Properties of Neural Machine Translation: Encoder–Decoder Ap-
     proaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and
     Structure in Statistical Translation. Association for Computational Linguistics.
 [5] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau,
     Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase
     Representations using RNN Encoder–Decoder for Statistical Machine Translation.
     In Proceedings of the 2014 Conference on Empirical Methods in Natural Language
     Processing (EMNLP). Association for Computational Linguistics.
 [6] Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual
     evolution of fashion trends with one-class collaborative filtering. In proceedings
     of the 25th international conference on world wide web. International World Wide
     Web Conferences Steering Committee.
 [7] Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang,
     Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham
     Fikri Aji, Nikolay Bogoychev, André F. T. Martins, and Alexandra Birch. 2018.
     Marian: Fast Neural Machine Translation in C++. In Proceedings of ACL 2018,
     System Demonstrations. Melbourne, Australia. https://arxiv.org/abs/1804.00344
 [8] Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation
     models. In Proceedings of the 2013 Conference on Empirical Methods in Natural
     Language Processing.
 [9] Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Jeff Yuan,
     and Lluis Garcia-Pueyo. 2012. Supercharging Recommender Systems Using
     Taxonomies for Learning User Purchase Behavior. In Proceedings of VLDB En-
     dowment.
[10] Zornitsa Kozareva. 2015. Everyone Likes Shopping! Multi-class Product Categorization for e-Commerce. In NAACL HLT 2015, The 2015 Conference of the
     North American Chapter of the Association for Computational Linguistics: Human
     Language Technologies.
[11] Michael Skinner. 2018. Product Categorization with LSTMs and Balanced Pooling
     Views. In SIGIR 2018 Workshop on eCommerce (ECOM 18).
[12] Yanmin Sun, Andrew K. C. Wong, and Mohamed S. Kamel. 2009. Classification
     of Imbalanced Data: a Review. International Journal of Pattern Recognition and
     Artificial Intelligence 23, 4 (2009), 687–719.
[13] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning
     with neural networks. In Advances in neural information processing systems.
[14] Liling Tan and Francis Bond. 2011. Building and Annotating the Linguistically
     Diverse NTU-MC (NTU-Multilingual Corpus). In Proceedings of the 25th Pacific
     Asia Conference on Language, Information and Computation.
[15] Liling Tan, Marcos Zampieri, Nikola Ljubešić, and Jörg Tiedemann. 2014. Merging
     Comparable Data Sources for the Discrimination of Similar Languages: The DSL
     Corpus Collection. In Proceedings of the 7th Workshop on Building and Using
     Comparable Corpora (BUCC).
[16] Li-Tung Weng, Yue Xu, Yuefen Li, and Richi Nayak. 2008. Exploiting Item
     Taxonomy for Solving Cold-Start Problem in Recommendation Making. In 2008
     20th IEEE International Conference on Tools with Artificial Intelligence.
[17] Yandi Xia, Aaron Levine, Pradipto Das, Giuseppe Di Fabbrizio, Keiji Shinzato, and
     Ankur Datta. 2017. Large-Scale Categorization of Japanese Product Titles Using
     Neural Attention Models. In Proceedings of the 15th Conference of the European
     Chapter of the Association for Computational Linguistics: Volume 2, Short Papers.
     Association for Computational Linguistics.
[18] Cai-Nicolas Ziegler, Georg Lausen, and Lars Schmidt-Thieme. 2004. Taxonomy-
     driven computation of product recommendations. In CIKM.



