<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Tag Recommendation for News Articles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zijian Győző Yang</string-name>
          <email>yang.zijian.gyozo@itk.ppke.hu</email>
          <email>yang.zijian.gyozo@uni-eszterhazy.hu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Attila Novák</string-name>
          <email>novak.attila@itk.ppke.hu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>László János Laki</string-name>
          <email>laki.laszlo@itk.ppke.hu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eszterházy Károly University, Faculty of Informatics</institution>
          ,
          <country country="HU">Hungary</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MTA-PPKE Hungarian Language Technology Research Group</institution>
          ,
          <country country="HU">Hungary</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Pázmány Péter Catholic University, Faculty of Information Technology and Bionics</institution>
          ,
          <country country="HU">Hungary</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
<p>In this paper, we present an automatic neural tag recommendation system for Hungarian news articles and the results of our experiments concerning the effect of preprocessing applied to the texts and various parameter settings. A novelty of the approach is a combination of subword tokenization with character-n-gram-based representations, which resulted in high gains in recall. The best system yields 76% precision at 58% recall. Subjective performance is higher, because suggested labels missing from the reference often fit the document well or are similar to missing reference labels. We also created an online GUI for the tag recommendation system that makes it possible for the user to interactively set threshold parameters, facilitating customization of precision and recall.</p>
      </abstract>
      <kwd-group>
        <kwd>tag recommendation</kwd>
        <kwd>tag suggestion</kwd>
        <kwd>keyword generation</kwd>
        <kwd>fastText</kwd>
        <kwd>SentencePiece tokenizer</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Content published at online news sites is often labeled using thematic tags. The
presence of these labels makes it possible for readers to focus on topics interesting
to them and for the publisher to display content related to each individual article.
The publisher can also customize/filter/recommend content likely to be of interest
to registered users who have a user profile. Content keywords added to the meta
tags in the HTML head section also play a role in the ranking algorithms of search
engines. As long as the keywords are really appropriate to the content, they may
positively affect search engine ranking.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Tag recommendation algorithms have existed for several years. For an
overview of methods and solutions see e.g. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Nevertheless, thematic tags are
assigned to content manually at many online publishers. Some editorial
boards include a dedicated staff member (usually a trained librarian) whose task
is to assign thematic tags to all published articles, while other publishers make it
the responsibility of each author to assign keywords. The former approach has
a rather limited throughput. The latter approach makes the process cheaper and
much more productive; however, it results in a proliferation of keywords and a much
less uniform and well-thought-out keyword usage. Uniformity of keyword usage
is only partially assured by tagging guidelines and by prefix-based predictive input
returning formerly used keywords integrated in the content management system
(CMS) used by the publisher. The latter, however, also results in the duplication of
typos in keywords and thus may increase rather than decrease variation in
many cases.
      </p>
      <p>In this paper, we present an automatic keyword recommendation system for
Hungarian that can be integrated in content management systems supporting
editorial work. We train our models for keyword prediction using manually tagged
past articles.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        The most important former result published on automatic keyword assignment for
Hungarian news was a system created for the [origo] news site [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The goal of
that research was automatic tagging of past news articles that had been published
by [origo] before manual tagging was introduced so as to make them available for
automatic content recommendation. The system created was not integrated in the
editorial system to support future work of the news editors or authors, or at least
such application is not mentioned in the publications. The solution outlined in
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is rather complex: the text underwent detailed automatic annotation including
PoS tagging, lemmatization, named entity recognition and NP chunking. Identified
NPs were normalized/lemmatized, and the type of each named entity was identified
(person, location or organization). A rather limited amount of tagged content was
available at that time, thus the authors of that paper relied primarily on extracting
and normalizing phrases present in the article when assigning keywords.
      </p>
      <p>
        We, in contrast, could rely on a reasonably large amount of tagged documents.
Our main goal is also different: automatic support for the tagging of future
content, which will remain a manually controlled process. This makes a simpler
solution possible, while manual control of precision vs. recall is a useful feature
in a CMS-integrated computer-aided keyword assignment system. We based our
solution on the fastText text classification library [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which uses neural classifiers
to assign thematic tags. The input to the classifiers is a semantic vector
representation of the document, computed as a weighted average of the vectors representing
individual tokens and token n-grams. The latter, in turn, are computed as the
average of the representations of the character n-grams of various lengths making up
the individual tokens in the text. The classifiers calculate a probability estimate of each label
being suitable to tag a document with the given semantic vector representation.
Setting a threshold on the probability estimate can be used to distinguish suitable
labels from unsuitable ones as well as to fine-tune the precision and recall of
the algorithm. Although more recent deep neural models surpass the performance
of fastText for text classification (with XLNet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] currently performing best on many
English text classification benchmark data sets), their complexity and resource
requirements exceed those of fastText by a much larger margin than the
difference in performance would justify.
      </p>
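      <p>The thresholding described above can be sketched in a few lines of Python (our own illustration; the function name, label names and probabilities are invented, not taken from the actual system):</p>

```python
def select_labels(probs, threshold):
    """Keep labels whose estimated probability reaches the threshold.

    probs: dict mapping label -> probability estimate from the classifier.
    Raising the threshold trades recall for precision; lowering it does
    the opposite.
    """
    return sorted(
        (label for label, p in probs.items() if p >= threshold),
        key=lambda label: -probs[label],
    )

# Hypothetical classifier output for one document.
estimates = {"olimpia": 0.62, "sport": 0.35, "budapest": 0.08, "wellness": 0.01}

print(select_labels(estimates, 0.2))   # stricter threshold: higher precision
print(select_labels(estimates, 0.02))  # looser threshold: higher recall
```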
      <p>
        An experiment concerning the performance of fastText on Hungarian text
classification was presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, the task in that experiment was a simple
two-class (sports vs. video game) classification problem, while our goal is to select
the most suitable thematic labels from a set of several thousand or tens of
thousands of possible labels, where the number of suitable labels can also widely differ
depending on the length and the topic of the document.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. The tag recommendation system</title>
      <p>In this section, we present the architecture and components of the system and the
training database.</p>
      <sec id="sec-3-1">
        <title>3.1. Architecture</title>
        <p>The tag recommendation system is a REST-based web application consisting of a
JavaScript-bootstrap-based front-end UI and a Python-based back-end server. The
title, lead, author, date and content of the article can be entered through the front
end, and it can be submitted for tagging to the back end.</p>
        <p>Proposed keywords can be either names, thematic labels, or trend labels.
Labels of the latter type pertain to unique events (a specific performance, festival,
sports event, election etc.), often of periodic recurrence (such as Olympic games or
elections). Trend labels are assigned to many articles published within a relatively
short period of time, but then they fall out of use. However, reports on a specific
fair, conference or award ceremony in an event series are quite similar to reports
on any other event in that series, thus trend labels pertaining to past instances of
periodically recurring events appear as noise when tagging a document on a
current instance of the event. To prevent this, we distinguish trend labels from static
names (of persons, organizations, products and more generic event types etc.) and
from generic conceptual labels.</p>
        <p>Suggested labels are displayed along with the confidence/probability assigned
to them by the classifier, and they can be filtered by setting a threshold using a
slider. In addition, a minimum number of top labels to be displayed can be set.</p>
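        <p>The combination of the threshold slider with the minimum-count setting might look like the following sketch (function and parameter names are our own, not those of the actual implementation):</p>

```python
def suggest(probs, threshold=0.2, min_labels=3):
    """Return labels above the threshold, but always at least `min_labels`
    of the highest-scoring ones, mirroring the slider plus the minimum
    setting in the GUI."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    above = [label for label in ranked if probs[label] >= threshold]
    return above if len(above) >= min_labels else ranked[:min_labels]

# Hypothetical scores: only two labels clear the threshold, so the
# third-best one is still shown to satisfy min_labels=3.
scores = {"választás": 0.7, "kormány": 0.25, "parlament": 0.12, "sport": 0.01}
print(suggest(scores, threshold=0.2, min_labels=3))
```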
        <p>The front end also contains a demo where test documents taken from the
original labeled data are displayed along with the original manually assigned reference
keywords and those proposed by the automatic classifier. The set can be
dynamically filtered using the threshold slider.</p>
        <p>The back end is implemented in Python. It uses a Flask-based web server to
communicate with the front end that uses AJAX requests to send the document to
be tagged and get the suggested labels. The format of the data packages is JSON.
The back end loads distinct models for static names and conceptual labels, and one
suggesting trend labels that is trained only on recent documents (published no
earlier than 6 months before). For older documents, trend labels are substituted
with their generic equivalents.</p>
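        <p>The JSON packages exchanged between the front end and the back end might look like the following sketch; all field names and values here are illustrative assumptions, not the actual wire format:</p>

```python
import json

# Hypothetical request: the article fields entered on the front end.
request = {
    "title": "Példa cím",
    "lead": "Példa lead.",
    "author": "Példa Szerző",
    "date": "2020-01-15",
    "content": "A cikk szövege...",
}

# Hypothetical response: suggested labels with classifier confidences,
# grouped by the three model types described above.
response = {
    "names": [{"label": "Budapest", "prob": 0.41}],
    "concepts": [{"label": "közlekedés", "prob": 0.33}],
    "trends": [{"label": "olimpia", "prob": 0.12}],
}

print(json.dumps(response, ensure_ascii=False, indent=2))
```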
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The tools</title>
        <p>
          As we mentioned above, our models are based on text representation and
classification models implemented in the open-source fastText library [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. FastText
was developed at Facebook and is implemented in C++. The semantic vector
representations created by the neural model implemented in fastText are based on
the distributional properties of words. The model is trained to predict words in the
context of a given word token (or vice versa). FastText handles the problem that
rare words and ones unseen in the training corpus (i.e. out-of-vocabulary, OOV,
words) would lack a representation by inducing character n-gram, rather than word,
representations. Vectors representing words and documents are calculated by
averaging these n-gram representations.
        </p>
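        <p>FastText's character n-grams include word-boundary markers, so even an unseen word decomposes into subunits the model has representations for. A minimal sketch of the extraction (the boundary padding follows the fastText approach; the code itself is our own illustration):</p>

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract the character n-grams that fastText averages into a word
    vector. The word is padded with boundary markers so that prefixes
    and suffixes get distinct n-grams."""
    padded = "<" + word + ">"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]

# Even a word unseen in training decomposes into familiar n-grams.
print(char_ngrams("autópálya", 3, 4)[:5])
```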
        <p>Recent end-to-end deep neural models for machine translation and other
highlevel NLP tasks use another approach, subword tokenization, to handle the serious
problems former word-token-based models had: excessive memory requirements
and a general inability to adequately handle rare and OOV word forms.</p>
        <p>
          The subword tokenizer most frequently used in current neural machine
translation systems is SentencePiece [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a language-independent subword tokenizer and
detokenizer implementing two subword tokenization algorithms, byte-pair encoding
(BPE) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and unigram language model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Using a subword tokenizer makes it
possible to do away with any language-specific preprocessing. It guarantees a limited
vocabulary, whose size can be specified in advance, almost entirely
eliminating the problem of unknown tokens. (The only exception is unknown
characters, e.g. in foreign-language document sections, which still result in
unknown tokens.)
        </p>
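        <p>The byte-pair-encoding algorithm behind this tokenization can be sketched as follows (a toy reimplementation for illustration only; SentencePiece's actual implementation differs in many details):</p>

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges from a word-frequency dict.

    Each word starts as a sequence of characters; the most frequent
    adjacent symbol pair is merged repeatedly, growing a subword
    vocabulary whose size is bounded in advance by `num_merges`."""
    vocab = {tuple(w): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

# Toy corpus: frequent character pairs are merged into subwords first.
print(learn_bpe({"alma": 5, "almafa": 2, "fa": 3}, 3))
```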
        <p>While the application of character n-grams handles the OOV problem in
fastText, and subword tokenization thus seems superfluous in this context, we
nevertheless tested during our preliminary experiments whether applying
subword tokenization influences the labeling performance of the classifiers. We
found that, indeed, training and testing the fastText classifiers on BPE-tokenized
input greatly improves recall, while precision is affected only to a much more
limited extent. Since our goal is to use the models in a human-controlled environment,
where improper keywords can easily be identified and deselected by the author
of the article, the slight reduction of precision is a fair price to pay for greatly
improved recall.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. The corpus</title>
        <p>For our experiments, we used articles from the HVG printed weekly newspaper
from 1994–2017 as well as news documents from the online hvg.hu news portal
(2012–2018). The former were tagged by a single expert librarian, while the latter
by the article authors. As a result, the set of labels used within the latter corpus
is not uniform. Authors may use different (often misspelled) forms of the same
label: M0-ás autópálya, M0-ás, M0-s autópálya, M0-s, M0-ás autóút, M0, M0-s
autóút ‘M0 highway, M0 motorway, M0, M0 freeway, highway M0’, etc. In many
cases, synonymous labels include not only different spellings, but also labels of different
origin or style for the same concept, for example fű, marihuána, kannabisz ‘weed,
marijuana, cannabis’ etc.</p>
        <p>Some documents in the weekly newspaper corpus are tables or graphs rather
than articles. We omitted these from our experimental data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and results</title>
      <p>We performed detailed experiments on the weekly newspaper and the online news
corpus concerning the effect of various factors on text classification and labeling
performance. We wanted to see how the number of training examples seen for a
specific label affects performance. The train-test split was done for both corpora
so that we have at least 5 test documents for each label occurring at least 15 times
in the corpus. Sizes and other features of the two train-test splits are shown in
Table 1.</p>
      <p>Although the weekly newspaper corpus spans 24 years while the online corpus
spans only 7 years, the latter contains more than 3 times as many documents and, due
to the much higher variation in label usage, 7.5 times more different label types.
About 95% of label occurrences are names in both corpora. There is, however,
a significant difference in the ratio of name labels among label types: about 96%
in the weekly corpus, while under 50% in the online corpus. This is due both to
the much higher variation in concept labels and to the sloppy lowercased spelling of many
rare name labels in the online corpus. The ratio of OOV label occurrences is 2.7% in
the weekly corpus and 6.9% in the online corpus.</p>
      <p>We tested different tokenization models on the weekly corpus: traditional
punctuation-based tokenization, no tokenization, and BPE subword tokenization. We
trained joint models where name labels were not distinguished from concept labels,
and ones where they were separated. When training and predicting named entity
(NE) labels separately, we preprocessed the documents, keeping only the maximum
two-word context of words containing capital letters. We omitted untokenized and
traditionally tokenized models for the online corpus from our experiments, as we
found these to have inferior performance on the weekly corpus.</p>
      <p>All models were trained using the same parameters. We trained one-to-many
classifiers to handle variable label counts. The dimension of the vectors was 100 for
all models. Training models for the online corpus took much longer than for the
weekly corpus due to the much higher number of different labels (many more
one-to-many classifiers need to be trained). Due to this, we trained the online models for
only 30 epochs, in contrast to the 50 epochs used when training the weekly models.</p>
      <p>We wanted to see how model performance is affected by the frequency of
labels in the training data. We thus assigned labels to bins based on their frequency,
and measured precision, recall and F1 score for the labels in each bin as a function of the
cutoff threshold used to select the top label candidates. We measured performance
for names, for non-name labels and for both combined. We were especially interested in
the performance on rare labels. If the model cannot learn to predict rare labels, we
can safely eliminate them from the training data, significantly speeding up training
and update of the models without affecting performance.</p>
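      <p>The binning and per-bin evaluation might be sketched like this (our own simplified reconstruction; the function names and bin boundaries are illustrative, not taken from the paper):</p>

```python
import bisect

def precision_recall(suggested, reference):
    """Micro-averaged precision, recall and F1 over a set of documents.

    suggested/reference: one set of labels per document.
    """
    tp = sum(len(s & r) for s, r in zip(suggested, reference))
    n_suggested = sum(len(s) for s in suggested)
    n_reference = sum(len(r) for r in reference)
    p = tp / n_suggested if n_suggested else 0.0
    r = tp / n_reference if n_reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def frequency_bin(label, train_freq, bounds=(5, 10, 30, 100)):
    """Assign a label to a frequency class by its number of training examples."""
    return bisect.bisect_right(bounds, train_freq.get(label, 0))

# A toy evaluation: one document, two suggested labels, one of them correct.
print(precision_recall([{"sport", "olimpia"}], [{"sport", "foci"}]))
```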
      <p>We present our findings in Fig. 1. The diagrams show the precision, recall
and F1 score of each model as a function of the cutoff threshold. The diagrams on
the left show performance on name labels, the ones on the right on non-name labels,
and the ones in the middle on all labels. We can clearly see that subword tokenization
(weekly-sp model) results in a much higher recall and F1 score than traditional
(weekly-tok) or no tokenization (weekly-untok). The latter two models performed
almost identically. Although their precision is higher than that of the subword-tokenized
models at lower cutoff threshold values, they have much lower recall.
We measured lower performance on the online corpus (online-sp model) due to the
much higher variation of labels. However, the subjective impression concerning the
quality of the labels suggested by this model is not worse than for the models trained on
the weekly corpus, due to the appearance of labels synonymous with the reference labels.
Members of the editorial board clearly found the performance of this model superior.
We created a tool that can be used to merge and normalize synonymous labels.
Normalization of the label set used in the online corpus is under way using this
tool.</p>
      <p>All models perform better for names than for non-name labels, except that recall
for non-names is higher for the models trained on the weekly corpus at lower
threshold values. For non-subword-tokenized models, there is a very pronounced peak in
F1 score at 0.02, while subword-tokenized models perform best at a threshold of
0.2, but they have a much more balanced performance overall. Training a
separate model for names and non-name labels clearly improved performance, with the
exception of a slight drop in precision for names in the weekly corpus; on the
online corpus, name label precision also slightly improved.1</p>
      <p>We also measured performance on distinct label frequency classes. For lack of
space, we present only the values for the online-sp model in Fig. 2. All
subword-tokenized models have measurable recall for all label frequency classes, although it
is rather low for very rare labels (those with fewer than 5 training examples). This means
that we can safely eliminate labels of very low frequency from the training data,
reducing model size and training time without hurting performance. Precision is
not very high either for labels with fewer than 10 training examples. The general
trend is that the more training examples there are for a label, the higher the precision
and recall we get. Untokenized and traditionally tokenized models have measurable
recall only at very low threshold values. While one fifth of the label occurrences in
the weekly test data pertain to rare labels having fewer than 30 training examples,
only 0.5% of these labels are actually suggested by these models above the 0.05
threshold value.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we presented an automatic thematic keyword suggestion system for
Hungarian news text. We created a web-based front end to the system that makes
it possible for the user to set certain parameters, e.g. the cutoff threshold for the
keyword suggestion list. This allows customization of the precision and recall of the
keyword candidates. In an optimal setting, we can recommend thematic keywords
with 76% precision at 58% recall. We performed a detailed evaluation of the models,
examining various preprocessing and parameter options. Combining the fastText
model with subword tokenization substantially improved recall, while the decrease
in precision was tolerable. At the same time, model size was also reduced to a
fraction of the original. We have also found that rare labels can be eliminated
from the training corpus, speeding up training and reducing model size without
significantly affecting performance.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was implemented with support provided by grant FK 125217 of the
National Research, Development and Innovation Office of Hungary financed under</p>
      <p>1. The drop of precision on names in the weekly corpus seems mainly to be due to country
labels, some of which are very frequent. Location can be inferred better relying on the whole text
than just on the names present in the document.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>A survey of text classification algorithms</article-title>
          . In Mining Text Data,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          and C. Zhai, Eds. Springer US, Boston, MA,
          <year>2012</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berend</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hegedűs</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kárpáti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Krich</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Automatic free-text-tagging of online news archives</article-title>
          .
          <source>In Proceedings of the 2010 Conference on ECAI 2010: 19th European Conference on Artificial Intelligence (Amsterdam</source>
          , The Netherlands, The Netherlands,
          <year>2010</year>
          ), IOS Press, pp.
          <fpage>529</fpage>
          -
          <lpage>534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Bag of tricks for efficient text classification</article-title>
          .
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers</source>
          (Valencia, Spain,
          <year>2017</year>
          ), ACL, pp.
          <fpage>427</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Kowsari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meimandi</surname>
            ,
            <given-names>K. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heidarysafa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>L. E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>D. E.</given-names>
          </string-name>
          <article-title>Text classification algorithms: A survey</article-title>
          . arXiv abs/1904.08067 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Subword regularization: Improving neural network translation models with multiple subword candidates</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          (Melbourne, Australia,
          <year>2018</year>
          ), ACL, pp.
          <fpage>66</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing</article-title>
          .
          <source>In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          (Brussels, Belgium, Nov.
          <year>2018</year>
          ),
          Association for Computational Linguistics
          , pp.
          <fpage>66</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Sennrich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haddow</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Birch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Neural machine translation of rare words with subword units</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          (Berlin, Germany, Aug.
          <year>2016</year>
          ),
          Association for Computational Linguistics
          , pp.
          <fpage>1715</fpage>
          -
          <lpage>1725</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Szántó</surname>
            ,
            <given-names>Zs.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincze</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Magyar nyelvű szó- és karakterszintű szóbeágyazások [Word and character embeddings for Hungarian]</article-title>
          .
          <source>In XIII. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2017) [13th Hungarian Conference on Computational Linguistics] (Szeged</source>
          ,
          <year>2017</year>
          ), SZTE, pp.
          <fpage>323</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          <article-title>XLNet: generalized autoregressive pretraining for language understanding</article-title>
          . CoRR abs/1906.08237 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>