<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Feature Reduction for Dependency Graph Construction in Computational Linguistics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>László Kovács</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>László Csépányi-Fürjes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Miskolc, Institute of Information Science</institution>
          ,
          <country country="HU">Hungary</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
<p>Dependency grammar is an important tool in the semantic analysis of text sources. In the transition-based approach to dependency graph construction, the engine detects the special features of the source text and determines the next construction steps using a machine learning method. Traditionally, the feature set is constructed manually, and some of the features may be irrelevant or redundant. In this paper, we investigate the efficiency of two known feature reduction methods in a grammar induction problem. The first method uses a variance minimization algorithm, and the other works with an approach based on mutual information content. We also propose a normalized feature similarity measure for an alternative cluster-based feature reduction approach. For the test evaluation, we use the sentence bank of the UD (Universal Dependencies) homepage, taking examples from two different languages: English and Hungarian.</p>
      </abstract>
      <kwd-group>
        <kwd>computational linguistics</kwd>
        <kwd>dependency graph</kwd>
        <kwd>feature reduction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The analysis of sentence structures is an important field of computational
linguistics. In the literature, we can find different approaches to describing the grammar of
human languages, as different languages may require different structure models to
represent the very rich language specialties. The most widely used grammar model
is the phrase-structure grammar or constituency grammar [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where the sentence
structure is based on the structural axis between the Noun-phrase and the
Verb-phrase units. The phrase-structure grammar provides very good results, especially
for languages with a fixed word order. (Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).) The other main approach to sentence-level
grammar is the family of dependency grammars. The idea of dependency
structures dates back to the work of Frege on algebraic logic [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. According to Frege,
the semantics of a sentence is based on the predicate (verb) phrase, where the exact
meaning of the sentence is given by the arguments of the predicate. In this sense,
the words related to the arguments depend on the words of the predicate. The first
explicit application of this semantic dependency model to sentence structuring
relates to the works of Tesnière [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In dependency grammars, the semantic structure of a sentence is represented
with the dependency graph connecting the words of the sentence. In the graph,
the edges may be assigned different semantic labels [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In a dependency
relationship, one of the words is called the head word and the other the dependent word.
This dependency corresponds to a parent-child relationship: every word, as a
dependent unit, can be bound to only one head word. The maximal number of related
dependent words is called the valency of the head. In the sentence parsing module,
first the POS (part of speech) and other main morphological attributes are determined
for the words, then the full dependency hierarchy is constructed. The grammar
is called projective if the order of the words in the sentence is identical to the
order of the corresponding leaf nodes in the dependency graph. In languages with
flexible word order, like Hungarian, these sequences may differ, so the language
contains non-projective grammar structures, too. According to the analysis of
Sartorio [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the ratio of sentences with some non-projective phenomena can reach
25%.
      </p>
      <p>
        In the literature, there are two main approaches to constructing the dependency
graph. One solution is the transition-based method, where the words in the
sentence are processed sequentially and the next elementary graph construction step is
determined from the current processing context [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The method uses a predefined
set of elementary transformation rules, and the appropriate rule for execution is
predicted using some machine learning method. This construction module uses
the following data elements: state descriptor variables, and initial and final state
descriptors. Considering the practical implementations, the most widely used variant
is the arc-eager method [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], where the state description is given by three base lists:
the buffer of words not tested yet, the stack of words under investigation, and the list
of words already processed. The model contains the following transformation
operators: Local-left, Local-right, Reduce and Shift. The Shift operator moves one
word from the buffer of words not tested yet into the stack. The Reduce operator
removes the topmost word from the stack.
      </p>
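To make the operator semantics above concrete, the following minimal sketch runs the four transitions over the three base lists. The function names and the pluggable `choose_op` predictor (standing in for the machine-learning classifier) are our illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the arc-eager state machine described above.
# "left"/"right" correspond to the Local-left/Local-right operators in the text.

def parse(words, choose_op):
    """Run arc-eager transitions over `words`.
    `choose_op(stack, buffer)` picks the next operation from the current state."""
    buffer = list(words)   # words not tested yet
    stack = []             # words under investigation
    arcs = []              # collected (head, dependent) edges

    while buffer or stack:
        op = choose_op(stack, buffer)
        if op == "shift" and buffer:              # buffer front moves onto the stack
            stack.append(buffer.pop(0))
        elif op == "reduce" and stack:            # finished stack top is removed
            stack.pop()
        elif op == "left" and stack and buffer:   # head = buffer front, dep = stack top
            arcs.append((buffer[0], stack.pop()))
        elif op == "right" and stack and buffer:  # head = stack top, dep = buffer front
            arcs.append((stack[-1], buffer[0]))
            stack.append(buffer.pop(0))
        else:
            break
    return arcs
```

In a real parser, `choose_op` would be the trained classifier queried with the context feature vector; here any deterministic rule can drive the machine for experimentation.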
    </sec>
    <sec id="sec-2">
      <title>2. Features vectors in graph construction</title>
      <p>
        The winning elementary graph construction step is generally predicted from the
current context parameters. These context parameters are called context features,
and they relate to some grammatical parameters of the words, like the POS tag or
the position. The applied classifier engine uses these feature vectors as input to
determine the winning operation category. It can be shown that the efficiency of
the classification process depends significantly on the feature set used to describe
the context status. According to the experiences presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the application
of a large, extended feature set yields improved classification accuracy. To
provide good prediction results, in the classical approach, the parser engines use a
large set of features in order to access rich information about the current context. In
practice, there exists a default feature set which is dominantly applied in the
different application systems. Only a few articles focus on the problem of the optimality
of the selected feature set. Among the related approaches, the dominant solution
uses a greedy expansion algorithm. Starting from a minimal feature set as a subset of
the global feature pool, the set is extended iteratively with the new features providing
the largest quality increase. The main quality factor is the classification accuracy
of the prediction engine.
      </p>
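The greedy expansion strategy described above can be sketched as follows. The `score` callback, standing in for training and evaluating the prediction engine on a candidate feature set, and all names are illustrative assumptions.

```python
# Sketch of the greedy feature-set expansion: start from a seed subset of the
# feature pool and repeatedly add the single feature giving the largest
# quality increase, stopping when no candidate improves the score.

def greedy_select(pool, seed, score, max_size=None):
    selected = list(seed)
    best = score(selected)
    limit = max_size if max_size is not None else len(pool)
    while len(selected) < limit:
        gains = [(score(selected + [f]), f)
                 for f in pool if f not in selected]
        if not gains:
            break
        top, feat = max(gains)
        if top <= best:        # no remaining feature improves quality: stop
            break
        best = top
        selected.append(feat)
    return selected
```

In the paper's setting, `score` would be the classification accuracy of the transition predictor trained with the candidate feature set.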
      <p>
        Although large feature sets can increase the classification accuracy, the large
data set decreases the time-cost efficiency of the system. From this point of view,
it seems reasonable to reduce the applied feature set. At the same time, we can also
see that the usual feature sets (like unigrams, bigrams and trigrams) are
sporadically overlapping each other, which suggests that some of the features are either
irrelevant or even misleading (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). The reduction mechanism [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for feature
set optimization eliminates the redundant features while it tries to maximize the
information content of the set. In the literature, there are only a few research works
on this optimization problem.
      </p>
      <p>In our investigation, we focus on the reduction approach to provide an optimal
reduced feature set for the investigated source language. In the evaluation tests we
use Hungarian and English text sources; these two languages belong to different
language categories regarding the projective status of the grammar. Our hypothesis
is that the default feature sets can be reduced, providing a more efficient feature
set selection. The main goal of our investigation is to construct a method to
discover the features with low relevance.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Related work</title>
      <p>
        The construction of a language dependency treebank is a labor-intensive process.
Researchers at the University of Szeged developed the first Hungarian dependency
corpus by transforming the existing expression-based Szeged Treebank. The
converted dependency trees were manually checked and repaired (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). The
creation of the dependency Treebank was motivated by the CoNLL 2007
Shared Task, which required participants to train and test their own dependency
parser systems using the same data set (see [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). The development of the
Hungarian Treebank enabled the Hungarian researchers to appear in the shared task
not only as the developers of a dependency parser, but also as the owners of the
Hungarian data set.
      </p>
      <p>
        The current website of the Universal Dependencies (UD) project contains the
so-called Hungarian Szeged Universal Treebank, which is an extract from the
Hungarian Dependency Treebank. This publicly available excerpt contains 910 train,
441 dev, and 449 test sentences and largely follows the UD 2.0 annotation principles
(see [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
      <p>
        The magyarlanc library is a tightly coupled tool chain that performs NLP tasks,
allowing basic operations such as text and sentence segmentation,
tokenization, lemmatization, POS tagging and dependency tree parsing of
Hungarian text. It was developed in accordance with the already mentioned Szeged
Universal Treebank. For dependency parsing, the graph-based MATE
parser was integrated into the magyarlanc system (see [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]).
      </p>
      <p>The reason why
they selected a graph-based parser was an experiment in which they compared the
popular ArcEager transition-based algorithm with the graph-based MATE parser.
The result showed that the graph-based parser outperforms its competitor.</p>
      <p>You may wonder why we still focus on the transition-based algorithms.
We believe that it is worth considering the use of these algorithms for the Hungarian
language for two reasons. Firstly, the ArcEager version of the algorithm is not
capable of exploring the dependency tree of non-projective sentence structures,
and is therefore unable to compete with a graph-based parser. Since the Hungarian
language is rich in non-projective structures, this characteristic is an important
factor when we are selecting the algorithm variant. For instance, there is the
non-projective list-based variant, which is specifically designed to analyze non-projective
sentence structures. Secondly, text processing from left to right is similar to
the way the human brain parses a written text. We think that experimenting with
a transition-based algorithm - which is a classic left-to-right processor - can provide
some interesting insights about the human parsing-learning process as well.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Feature reduction using maximum entropy</title>
      <p>We can observe that a certain feature may be more valuable than the others from
the viewpoint of classification accuracy if its information value regarding the
category label is higher than the average. If a feature shows very similar or even
the same values for all the different transitions, then that feature is weak in terms
of classification. The above-mentioned conditions show similarities with the
relevance of information in a probabilistic system. The transitions can be viewed
as possible signals in a communication channel:</p>
      <p>Ω = {t₁, t₂, t₃, . . .}.</p>
      <p>The probabilities p(tᵢ) of the transitions, determined by the frequency values in the
training set, can be viewed as a probability space:
P_Ω = {p(t₁), p(t₂), p(t₃), . . .}.
The information content of Ω can be expressed with the entropy value:
H(Ω) = − Σᵢ p(tᵢ) · log p(tᵢ).</p>
      <p>The entropy takes the greatest value when the probabilities calculated for all
transitions are the same, and the smallest if the probability calculated for a certain
transition is 1. It means that the lower the entropy, the more valuable the feature
template is.</p>
      <p>To evaluate the information content of a feature f, we group the test cases by
the feature value into disjoint subsets. The weighted entropy of f is calculated with
H_f = Σᵥ |Ω_v| · H(Ω_v),
where the summation runs over the possible feature values (v) and Ω_v denotes the
set of test cases where the f value is equal to v.</p>
      <p>Having an input training set, we calculate the frequency values, and using these
values as approximations of the probabilities, we also get the entropy for every
feature. Based on the entropy values, we can eliminate the features with high
entropy.</p>
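The entropy-based elimination described in this section can be sketched as follows, under the assumption that each training case of a feature is recorded as a (feature value, transition) pair; the function names are ours, not the authors'.

```python
# Sketch of the weighted-entropy score from Section 4: group the cases by
# feature value and sum |Omega_v| * H(Omega_v) over the feature values v.
# The feature with the highest score is the candidate for elimination.
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy of a list of transition labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def weighted_entropy(cases):
    """cases: list of (feature_value, transition) pairs for one feature."""
    groups = defaultdict(list)
    for value, transition in cases:
        groups[value].append(transition)
    return sum(len(g) * entropy(g) for g in groups.values())

def weakest_feature(data):
    """data: {feature_name: list of (feature_value, transition) pairs}."""
    return max(data, key=lambda f: weighted_entropy(data[f]))
```

A feature whose values perfectly predict the transition scores 0, while a constant "noise" feature keeps the full entropy of the transition distribution and is eliminated first.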
    </sec>
    <sec id="sec-5">
      <title>5. Feature reduction using mutual information content</title>
      <p>
        In the field of data analysis, attribute reduction is a widely used preprocessing
step. A key factor in attribute reduction is the dependency measure among the
different features. One of the most widely used base measures is the correlation
coefficient, where the correlation for two variables with continuous values
can be given [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] with
r = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / ( √(Σᵢ (xᵢ − x̄)²) · √(Σᵢ (yᵢ − ȳ)²) ).
      </p>
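For illustration, the correlation coefficient above can be computed directly; this small utility is a hedged sketch, not code from the paper.

```python
# Direct transcription of the Pearson correlation coefficient for two
# continuous-valued samples of equal length.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```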
      <p>
        In the literature, we can find many extensions of the base correlation measure,
like the Fast Correlation-based Filter method [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], using the symmetric uncertainty
measure.
      </p>
      <p>As the features in the transition-based graph construction model are
mainly categorical variables, the dependency between two variables is usually given
with a contingency table, which displays the multivariate frequency distribution of
the variables. To measure the strength of association between the two variables,
we can use measures like the contingency coefficient or the phi coefficient.</p>
      <p>In our investigation we selected an information theory oriented approach,
using the mutual information content to measure the mutual dependence between the
features. The mutual information can be calculated with
I(X, Y) = Σₓ Σᵧ p(x, y) · log( p(x, y) / (p_X(x) · p_Y(y)) ),
where p(x, y) is the joint distribution and the marginal distributions are p_X
and p_Y.</p>
      <p>The summation runs over the possible value pairs in the contingency table. This
formula can also be expressed as
I(X, Y) = H(X) + H(Y) − H(X, Y),
where H(X) is the entropy of the variable X and H(X, Y) denotes the joint
entropy of X and Y.</p>
      <p>5.1. Similarity measure using normalized mutual information content</p>
      <p>The mutual information measure can take any value from the set of non-negative
real numbers. In order to use a normalized similarity value, we propose the
Esim(X, Y) measure given with
Esim(X, Y) = I(X, Y) / min(H(X), H(Y))
in the following way. As H(X), H(Y) ≥ 0 and I(X, Y) ≥ 0, we have Esim(X, Y) ≥ 0.
On the other hand, we know that
I(X, Y) = H(X) + H(Y) − H(X, Y) ≤ H(X) + H(Y) − max(H(X), H(Y)) = min(H(X), H(Y)),
thus Esim(X, Y) ≤ 1.</p>
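A minimal sketch of the Esim computation for two categorical variables, assuming log base 2 and that neither variable is constant (otherwise the denominator entropy would be zero); the function names are ours.

```python
# Normalized mutual-information similarity: Esim(X, Y) = I(X, Y) / min(H(X), H(Y)),
# estimated from two aligned lists of categorical values.
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def esim(xs, ys):
    n = len(xs)
    joint = Counter(zip(xs, ys))          # contingency-table counts
    px, py = Counter(xs), Counter(ys)     # marginal counts
    mi = sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
             for (x, y), c in joint.items())
    # assumes min entropy > 0, i.e. neither variable is constant
    return mi / min(entropy(xs), entropy(ys))
```

Identical variables give Esim = 1 and independent variables give Esim = 0, matching the bounds derived above.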
    </sec>
    <sec id="sec-6">
      <title>6. Experiments</title>
      <p>6.1. Evaluating the features with the Hungarian data set</p>
      <p>
        In our experiment we implemented four variants of the transition-based algorithm,
namely ArcEager-Stack, ArcStandard-Stack, Projective-List and
NonProjective-List (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). We incorporated the feature list from the literature. Then we
applied the following iterative experiment: train ⇒ evaluate ⇒ entropy calculation
described in Section 4 ⇒ feature reduction. In the feature reduction phase we
omitted the feature that had the maximum entropy.
      </p>
      <p>The numbers on the horizontal axis of the Figure 1 graph show how many features
have been omitted. The vertical axis shows the mean UAS (unlabeled attachment
score) of the test, validation, and train data sets. It is interesting to see that
omitting the high-entropy features does increase the accuracy until we reach a
point where even the highest-entropy features are too valuable to leave out of
the calculation. These results suggest that the maximum entropy calculation can
be used to evaluate the features and to find ones that can be eliminated from the
training. This elimination helps reduce the calculation cost of the algorithms and
even increases the accuracy.</p>
      <p>6.2. Comparing the Hungarian result to English</p>
      <p>We compared the Hungarian result with an English training set downloaded from
the UD website1. To eliminate the difference that comes from the different training
set sizes, we shortened the English set to the same size as the Hungarian one (1800
sentences). Figure 2 shows that our feature reduction procedure appears to
be independent of the examined language. The other three algorithm variants
produced similar results.</p>
      <sec id="sec-6-1">
        <title>6.3. Comparing the maximum entropy based reduction to randomized reduction</title>
        <p>Finally, we wanted to investigate whether the maximal-entropy based elimination is
actually more beneficial than just leaving out a randomly selected feature. In this
final test, the features to be discarded were randomly selected and the UAS values
were compared with the aforementioned results. It can be seen that the accuracy
achieved with the randomly dropped features is constantly decreasing and falls
below that of the entropy-selected features (see Figure 2).</p>
        <p>1https://github.com/UniversalDependencies/UD_English-EWT</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.4. Feature Clustering with Esim measure</title>
        <p>In the mutual information content similarity approach, we used the Esim(c, fᵢ)
value to measure the importance of the feature fᵢ. This means that the relevance of
a feature is given by the mutual information content related to the category label
(transition code). As Figure 3 shows, both measures provided the same importance
order of the features.</p>
        <p>Thus both methods can be used for feature reduction, as they provide the same or
similar priority orders. On the other hand, we can mention an additional benefit
of the Esim method: it can also be used to measure the general
similarity among the different features, independently of the category label. Based
on the generated distance matrix, a clustering of the features can also be
constructed. We have selected the MDS (Multidimensional Scaling) method to map
the elements of the feature set into points of a Euclidean space. In our
test results (see Figure 4), the single outlier point with a thick border denotes the
category variable. We can use clustering techniques, like the k-means algorithm, to
determine the groups of similar features. The MDS result in Figure 4 shows, for
example, that S0_FORM_B0_FORM_POS and S0_FORM_POS_B0_FORM
are very similar to each other. The related cluster-based feature reduction and
the entropy-based reduction provide consistent results.</p>
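A cluster grouping like the one described can be sketched with a plain k-means pass over the (for example, MDS-produced) feature coordinates; the sample points and initial centers below are illustrative assumptions, not the paper's data.

```python
# Minimal k-means sketch for grouping feature points (e.g. 2-D MDS coordinates).
# `centers` supplies the initial cluster centers; `rounds` caps the iterations.

def kmeans(points, centers, rounds=20):
    clusters = [[] for _ in centers]
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for p in points:
            # assign each point to its nearest center (squared Euclidean distance)
            i = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[i].append(p)
        # recompute each center as the mean of its assigned points
        centers = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else ctr
                   for cl, ctr in zip(clusters, centers)]
    return clusters
```

Two well-separated groups of points end up in two clusters, mirroring how nearby feature points in the MDS plot (such as the two very similar templates named above) would be grouped together.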
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In our study, we analyzed a maximum entropy based feature reduction mechanism
for dependency parser algorithms. After evaluating our results, we can say that the
maximum entropy based evaluation is well suited for investigating the relevance
of features. It can be seen that the accuracy achieved with randomly dropped
features is constantly decreasing and falls below that of the entropy-selected features.
It also appears that this mechanism is language independent. By reducing the
number of features in the transition-based algorithms, we can achieve cost savings
without reducing the accuracy of the parser.</p>
      <p>Acknowledgements. The described study was carried out as part of the
EFOP3.6.1-16-00011 “Younger and Renewing University – Innovative Knowledge City –
institutional development of the University of Miskolc aiming at intelligent
specialization” project implemented in the framework of the Szechenyi 2020 program.
The realization of this project is supported by the European Union, co-financed by
the European Social Fund.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Prószéky</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutny</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wacha</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>1989</year>
          ).
          <article-title>A dependency syntax of Hungarian</article-title>
          ,
          <source>Metataxis in Practice</source>
          , (
          <year>1989</year>
          ),
          <fpage>151</fpage>
          -
          <lpage>181</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Klement</surname>
            <given-names>K. C.</given-names>
          </string-name>
          ,
          <article-title>Frege and the Logic of Sense and Reference</article-title>
          . Routledge,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Tesnière</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <source>Eléments de Syntaxe Structurale</source>
          , Klincksiek, Paris (
          <year>1959</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Ágel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Dependency grammar and valency theory</article-title>
          .
          <source>In: The Oxford Handbook of Linguistic Analysis</source>
          , (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Sartorio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <source>Improvements in Transition Based Systems for Dependency Parsing</source>
          ,
          <source>PhD Thesis</source>
          , Università degli Studi di Padova (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Nivre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Algorithms for deterministic incremental dependency parsing</article-title>
          ,
          <source>Computational Linguistics</source>,
          <volume>34</volume>
          (
          <issue>4</issue>
          ), (
          <year>2008</year>
          ),
          <fpage>513</fpage>
          -
          <lpage>553</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Nivre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nilson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Memory-based dependency parsing</article-title>
          ,
          <source>Proceedings of CoNLL</source>
          , (
          <year>2004</year>
          ),
          <fpage>49</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nivre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Transition-based dependency parsing with rich non-local features</article-title>
          ,
          <source>Proceedings of ACL-HLT</source>
          , (
          <year>2011</year>
          ),
          <fpage>188</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Vincze</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szauter</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almási</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Móra</surname>
            ,
            <given-names>Gy.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Csirik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <source>Hungarian Dependency Treebank</source>
          , (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Nivre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kübler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nilsson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuret</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <article-title>The CoNLL 2007 shared task on dependency parsing</article-title>
          .
          <source>EMNLP-CoNLL</source>
          , (
          <year>2007</year>
          ),
          <fpage>915</fpage>
          -
          <lpage>932</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Vincze</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simkó</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szántó</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varga</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          .
          <article-title>Univerzális dependencia és morfológia magyar nyelvre</article-title>
          .
          <source>XII. Magyar Számítógépes Nyelvészeti Konferencia</source>
          ,(
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szántó</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincze</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zsibrita</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Módosított morfológiai egyértelműsítés és integrált konstituenselemzés a magyarlanc 3.0-ban, XII. Magyar Számítógépes Nyelvészeti Konferencia</article-title>
          .(
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Blessie</surname>
            ,
            <given-names>E. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karthikeyan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <article-title>Sigmis: A feature selection algorithm using correlation based method</article-title>
          ,
          <source>Journal of Algorithms and Computational Technology</source>
          , Vol.
          <volume>6</volume>
          (
          <issue>3</issue>
          ), (
          <year>2012</year>
          ),
          <fpage>385</fpage>
          -
          <lpage>394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>Feature selection for high-dimensional data: A fast correlation-based filter solution</article-title>
          ,
          <source>Proceedings of the 20th international conference on machine learning (ICML-03)</source>
          , (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>