<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bulgarian Question Answering for Machine Reading</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kiril Simov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petya Osenova</string-name>
          <email>petya@bultreebank.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgi Georgiev</string-name>
          <email>georgi.georgiev@ontotext.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentin Zhikov</string-name>
          <email>valentin.zhikov@ontotext.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Toloşi</string-name>
          <email>laura.tolosi@ontotext.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Linguistic Modelling Department, IICT, Bulgarian Academy of Sciences</institution>
          ,
          <addr-line>Acad. G. Bonchev St. 25A, 1113 Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ontotext AD</institution>
          ,
          <addr-line>Polygraphia Office Center, fl. 4, 47 A Tsarigradsko Shosse, 1504, Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In CLEF 2012, the BulTreeBank Group of LMD, IICT, BAS participated in the QA4MRE task for Bulgarian. The system presented in the paper exploits an NLP pipeline for Bulgarian in order to process the questions, the answers and the supporting texts. The results of the analysis are represented as bags of linguistic units: lemmas and dependency relations. The bag for each question-plus-answer combination is matched against the bags for the sentences in the text, and the answer that maximizes the overlap is selected as the correct one. Since the system is deterministic, it has only one run, which achieved a score of 0.29. The other two runs are baseline runs with randomly selected answers; their scores are 0.20 and 0.12, respectively. Thus, using linguistic units in the overlap estimation provides a significant improvement over the baseline.</p>
      </abstract>
      <kwd-group>
        <kwd>Linguistic NLP Pipeline</kwd>
        <kwd>Linguistically-enhanced Similarity</kwd>
        <kwd>Bag of Linguistic Units</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Bulgarian has been among the participating languages in CLEF
tasks since 2004. This is the first time that Bulgarian systems have been tuned to the
new format of the CLEF main task, namely Question Answering for Machine
Reading Evaluation (QA4MRE). For the previous formats, an infrastructure was designed
        <xref ref-type="bibr" rid="ref6">(Osenova and Simov 2005)</xref>
        , which included NLP tools and processing strategies designed to
better handle the task requirements. These requirements were as follows:
detection of correct answers to specific categorized questions in large corpora. Thus,
our previous architecture had to be adapted to the new conditions of better
understanding small texts in pre-selected domains. Our approach focused on the analysis
of overlapping linguistic structures. To do this, we first manually converted
each question and each possible answer into a declarative sentence. For example, the question Защо
страдащите от деменция трябва да бъдат насърчавани да рисуват? (Why do
dementia sufferers have to be encouraged to paint?) and the possible answer
защото това укрепва паметта, вниманието и възприемането (because this would
strengthen their memory, attention and perception) are combined in the sentence:
Страдащите от деменция трябва да бъдат насърчавани да рисуват, защото това
укрепва паметта, вниманието и възприемането (Dementia sufferers have to be
encouraged to paint, because this would strengthen their memory, attention and
perception). Then these sentences and the supporting texts were analyzed by our NLP
pipeline for Bulgarian. This pipeline includes the following linguistic processing
steps: POS tagging, lemmatization and dependency parsing. For each
question-and-answer pair we extracted a bag of lemmas and triples of the form (dependent lemma, dependency
relation, head lemma). This bag was then compared to the bag for each sentence. In this
way, each question-and-answer pair was ranked with respect to its overlap
with the sentences in the texts. As a next step, the answer that provided the largest
overlap was chosen. The advantages of such an approach include the handling of
structural ambiguity, such as active/passive alternations, pro-drop subjects, and
modification/predication. During the mapping, we also included some new triples that
were derived from the possible varieties of the answer in the supporting text.
      </p>
      <p>
Our group provided three runs: one was based on the processing described above, and
two were performed via random selection in order to provide a baseline for
comparison. These two runs (2 and 3 in the uploaded information) gave baseline scores of
0.12 and 0.20, respectively. The result of the system based on the linguistic
processing is 0.29, which shows a significant improvement over the baseline case.
      </p>
      <p>
The paper is structured as follows: the next section describes the NLP pipeline for
Bulgarian, which we use for processing the data within the task; Section 3
presents the answer ranking based on the results of the processing via the NLP
pipeline; the last section concludes the paper and outlines some future directions of
development.
      </p>
      <sec id="sec-1-1">
        <title>The NLP Pipeline for Bulgarian</title>
        <p>In this section we present the linguistic processing pipeline for Bulgarian
(BTB-LPP), which we used to analyze the data. BTB-LPP comprises three main
modules: a Morphological Tagger, a Lemmatizer and a Dependency Parser.</p>
        <sec id="sec-1-1-1">
          <title>Morphological Tagger</title>
          <p>The morphological tagger is constructed as a pipeline of three modules: two
statistical taggers trained on the Morphologically Annotated Part of BulTreeBank
(BulTreeBank-Morph) and a rule-based module exploiting a large Bulgarian
Morphological Lexicon and manually crafted disambiguation rules.</p>
          <p>The BTB-LPP pipeline is developed on the basis of the language resources created within
the BulTreeBank project. The prefix BTB stands for BulTreeBank.</p>
          <p>SVM Tagger</p>
          <p>
            The first statistical tagger uses the SVMTool
            <xref ref-type="bibr" rid="ref2">(Giménez and Márquez 2004)</xref>
            , which
is an SVM-based statistical sequential classifier. It is built on top of the SVMLight
            <xref ref-type="bibr" rid="ref3">(Joachims and Schölkopf 1999)</xref>
            implementation of the Support Vector Machine
algorithm
            <xref ref-type="bibr" rid="ref10">(Vapnik 1999)</xref>
            . Its flexibility allows it to be trained on an arbitrary language
as long as it is provided with enough annotated data. The tagging accuracy
achieved with the optimal training configuration ranged from 89 % to 91 %,
depending on the text genre. After applying the morphological lexicon as a filter on the
possible tags for each word form, together with the set of disambiguation rules, the
best result achieved was 94.65 % tagging accuracy.
          </p>
          <p>Rule-based Component</p>
          <p>
            The task of this component is to correct some of the erroneous analyses made by
the SVM Tagger. The correction draws on two sources of linguistic knowledge: the
morphological lexicon and a set of context-based rules. In the repair process we
retained as much as possible of the information provided by the SVM Tagger. The context
rules are designed to achieve high precision, even at the cost of low recall. The lexicon
look-up is implemented as cascaded regular grammars within the CLaRK system
            <xref ref-type="bibr" rid="ref8">(Simov et al. 2001)</xref>
            . The lexicon is an extended version of
            <xref ref-type="bibr" rid="ref7">(Popov et al. 2003)</xref>
            and
covers more than 110 000 lemmas. Additionally, a set of gazetteers was incorporated
within the regular grammars. Here is an example of a rule: if a word form is
ambiguous between a masculine count noun (Ncmt) and a singular short definite masculine
noun (Ncmsh), the Ncmt tag should be chosen if the previous token is a numeral or a
number.
          </p>
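          <p>The example rule above can be sketched in code. This is only an illustration, not the CLaRK grammar used in BTB-LPP: the function names, the token representation (parallel lists of word forms and candidate tag sets) and the test for numerals (a candidate tag starting with M, or a string of digits) are assumptions made for the sketch.</p>
          <preformat><![CDATA[
```python
from typing import List, Set


def is_numeral_or_number(form: str, tags: Set[str]) -> bool:
    """Assumption: numeral tags start with 'M'; digit strings count as numbers."""
    return form.isdigit() or any(tag.startswith("M") for tag in tags)


def apply_count_noun_rule(forms: List[str],
                          candidates: List[Set[str]]) -> List[Set[str]]:
    """Prefer Ncmt over Ncmsh when the previous token is a numeral or a number.

    `candidates[i]` is the (hypothetical) set of tags still possible for
    `forms[i]` after lexicon look-up. Tokens the rule does not cover are
    returned unchanged, which mirrors the precision-over-recall design.
    """
    resolved = [set(tags) for tags in candidates]
    for i in range(1, len(forms)):
        ambiguous = resolved[i] == {"Ncmt", "Ncmsh"}
        if ambiguous and is_numeral_or_number(forms[i - 1], resolved[i - 1]):
            resolved[i] = {"Ncmt"}
    return resolved


# Illustrative input: a number followed by an ambiguous masculine noun form.
# The tag "M" on the first token is a placeholder, not a real BulTreeBank tag.
print(apply_count_noun_rule(["5", "стола"], [{"M"}, {"Ncmt", "Ncmsh"}]))
```
]]></preformat>
          <p>When the left context does not license the rule, the ambiguity is left in place for the next component to resolve.</p>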
          <p>Guided Learning System: GTagger</p>
          <p>
            GTagger is based on the guided learning system described in
            <xref ref-type="bibr" rid="ref1">(Georgiev et al. 2012)</xref>
            . Its best
tagging result is 97.98 % accuracy, which can be considered the state of the art for
Bulgarian. However, this result is achieved when the input to GTagger is already
annotated with the list of all possible tags for each token, similarly to the morphological
dataset BulTreeBank-Morph. BTB-LPP provides such input for GTagger by
exploiting the SVM Tagger as well as the rule-based component, which tags some tokens with a
list of the best candidate tags according to the morphological lexicon.
Additionally, the set of rules is applied in order to resolve some of the ambiguities.
          </p>
          <p>The combination of the three components implements the morphological tagger of
BTB-LPP. The SVM Tagger plays the role of a guesser for the unknown words. The
rule-based component provides an accurate annotation of the known words, leaving
some unsolved cases. GTagger provides the final result. This result is used by the
lemmatizer and the dependency parser.</p>
          <p>BulTreeBank-Morph: http://www.bultreebank.org/btbmorf/</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>Lemmatizer</title>
          <p>The second processing module of BTB-LPP is a functional lemmatization module
based on the morphological lexicon mentioned above. The functions are defined via
two operations on word forms: remove and concatenate. The rules have the following
form:</p>
          <p>if tag = Tag then {remove OldEnd; concatenate NewEnd}</p>
          <p>where Tag is the tag of the word form, OldEnd is the string which has to be
removed from the end of the word form and NewEnd is the string which has to be
concatenated to the end of the remaining string in order to produce the lemma. Here is an
example of such a rule:</p>
          <p>if tag = Vpitf-o1s then {remove ох; concatenate а}</p>
          <p>The application of this rule to the past simple verb form четох
(remove: ох; concatenate: а) gives the lemma чета (to read). Additionally, we encode
rules for unknown words in the form of guesser word forms: # ох and tag=Vpitf-o1s.
In these cases the rules are ordered.</p>
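          <p>A minimal sketch of how such a remove/concatenate rule could be applied is given below. The names LemmaRule and apply_rule are hypothetical and do not come from BTB-LPP, where the rules are attached to lexicon entries and applied within the regular grammars; the sketch only illustrates the two string operations.</p>
          <preformat><![CDATA[
```python
from typing import NamedTuple, Optional


class LemmaRule(NamedTuple):
    """One rule: if tag = Tag then {remove OldEnd; concatenate NewEnd}."""
    tag: str
    old_end: str
    new_end: str


def apply_rule(word_form: str, tag: str, rule: LemmaRule) -> Optional[str]:
    """Return the lemma, or None when the rule does not apply to this form."""
    if tag != rule.tag or not word_form.endswith(rule.old_end):
        return None
    # Remove OldEnd from the end of the word form, then append NewEnd.
    stem = word_form[: len(word_form) - len(rule.old_end)]
    return stem + rule.new_end


# The example rule from the text.
rule = LemmaRule(tag="Vpitf-o1s", old_end="ох", new_end="а")
print(apply_rule("четох", "Vpitf-o1s", rule))  # prints: чета
```
]]></preformat>
          <p>Returning None for a non-matching tag or ending makes it straightforward to try an ordered list of rules and keep the first one that applies.</p>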
          <p>In order to facilitate the application of the rules, we attach them to the word forms
in the lexicon. In this way, we gain two things: (1) we implement the lemmatization
tool as a part of the regular grammar for lexicon look-up discussed above, and (2) the
level of ambiguity is less than 2% for the correctly tagged word forms. In case of
ambiguities we produce all the lemmas. After the morphosyntactic tagging, the rules that
correspond to the selected tags are applied.</p>
        </sec>
        <sec id="sec-1-1-3">
          <title>Dependency Parser</title>
          <p>
            Many parsers have been trained on data from BulTreeBank. Especially successful was
the MaltParser of Joakim Nivre
            <xref ref-type="bibr" rid="ref5">(Nivre et al. 2006)</xref>
            , which achieves 87.6 % parsing
accuracy. The following text describes the dependency relations produced by the parser.
          </p>
          <p>The table below shows the dependency tagset of the Dependency part of the
BulTreeBank. This part has been used for training the dependency parser:</p>
          <table-wrap>
            <table>
              <thead>
                <tr><th>Relation</th><th>Occurrences</th><th>Description</th></tr>
              </thead>
              <tbody>
                <tr><td>adjunct</td><td>12009</td><td>Adjunct (optional verbal argument)</td></tr>
                <tr><td>clitic</td><td>2263</td><td>Short forms of the possessive pronouns</td></tr>
                <tr><td>comp</td><td>18043</td><td>Complement (arguments of non-verbal heads, non-finite verbal heads, copula, auxiliaries)</td></tr>
                <tr><td>conj</td><td>6342</td><td>Conjunction in coordination</td></tr>
                <tr><td>conjarg</td><td>7005</td><td>Argument (second, third, ...) of coordination</td></tr>
                <tr><td/><td/><td>Indirect Object (indirect argument of a non-auxiliary verbal head)</td></tr>
                <tr><td/><td/><td>Marked (clauses, introduced by a subordinator)</td></tr>
                <tr><td/><td/><td>Modifier (dependants which modify nouns, adjectives, adverbs; also the negative and interrogative particles)</td></tr>
                <tr><td/><td/><td>Object (direct argument of a non-auxiliary verbal head)</td></tr>
                <tr><td/><td/><td>Subject</td></tr>
                <tr><td/><td/><td>Pragmatic adjunct</td></tr>
                <tr><td/><td/><td>Punctuation</td></tr>
                <tr><td/><td/><td>Clausal adjunct</td></tr>
                <tr><td/><td/><td>Clausal complement</td></tr>
                <tr><td/><td/><td>Clausal modifier</td></tr>
                <tr><td/><td/><td>Clausal complement of preposition</td></tr>
                <tr><td/><td/><td>Clausal subject</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
    </sec>
    <sec id="sec-17">
      <p>
        In addition to the dependency tags, the morphosyntactic tags have also been
attached to each word
        <xref ref-type="bibr" rid="ref9">(Simov et al. 2004)</xref>
        . Each lexical node was assigned its lemma. The number given for each relation
indicates how many times the
relation appears in the dependency version of BulTreeBank.
      </p>
      <p>Here is an example of a processed sentence. The sentence is Бразилия е
епицентърът на пандемията на СПИН (Brazil is the epicenter of the AIDS
pandemic). After the application of the language pipeline, the result is represented in
tabular form following the CoNLL shared task format. It is given in Table 2.</p>
      <p>The column WF corresponds to the order of the word forms in the sentence. The
information in the Ling column is the suffix of the corresponding tag (according to the
BulTreeBank morphosyntactic tagset) after removing the prefix represented in the column
POSex (extended POS). The elements in Head point to the number of the dependency
head of the given word form. Rel is the dependency relation between the two
word forms.</p>
      <p>In the next section we present the procedure for using the pipeline for the QA4MRE
task for Bulgarian.</p>
      <sec id="sec-17-1">
        <title>Answer Ranking</title>
        <p>In the process of answer selection for each question we performed the following
steps:</p>
        <list list-type="order">
          <list-item><p>The supporting texts were processed by the NLP pipeline described in the previous
section.</p></list-item>
          <list-item><p>For each question and each potential answer to the question we constructed a
declarative sentence which provides evidence that the potential answer is really an
answer to the question.</p></list-item>
          <list-item><p>The analyses of the sentences in the texts were compared for similarity with the
analysis of the declarative sentence produced in step 2. In this way we ranked the
answers for each question.</p></list-item>
        </list>
        <p>In the rest of the section we describe each of the steps in more detail.</p>
        <p>
          Each sentence in the texts was represented as a bag of linguistic units, where each
unit is either a lemma or a triple from the dependency tree of the sentence:
&lt;DepLemma, Rel, HeadLemma&gt;. In the triple, DepLemma is the lemma of the dependent
node in the tree, HeadLemma is the lemma of the head node in the tree, and Rel is the
relation between the two nodes. Thus, the answers are ranked on the basis
of single sentences in the text and of bags of selected linguistic units. The first decision
is motivated by a limitation of the current processing pipeline, which cannot
establish reliable connections between linguistic units in more than one sentence. The
second decision is motivated by the fact that matching dependency trees might
be very complicated, although we are aware of work on edit distance comparisons,
such as the one used in
          <xref ref-type="bibr" rid="ref4">(Kouylekov and Magnini, 2005)</xref>
          .
        </p>
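        <p>The construction of such a bag can be sketched as follows. The Token record and the function bag_of_units are hypothetical names; the sketch assumes CoNLL-style rows (token index, lemma, head index, relation) with head index 0 for the root, and it treats the bag as a set, which is an assumption since the paper does not state how duplicates are counted. The three-token analysis in the example is hand-made for illustration and is not actual parser output.</p>
        <preformat><![CDATA[
```python
from typing import FrozenSet, List, NamedTuple, Tuple, Union


class Token(NamedTuple):
    idx: int    # 1-based position of the token in the sentence
    lemma: str
    head: int   # index of the head token; 0 is assumed to mark the root
    rel: str    # dependency relation to the head


Unit = Union[str, Tuple[str, str, str]]


def bag_of_units(tokens: List[Token]) -> FrozenSet[Unit]:
    """Collect lemmas and <DepLemma, Rel, HeadLemma> triples for one sentence."""
    lemma_by_idx = {t.idx: t.lemma for t in tokens}
    units = set()
    for t in tokens:
        units.add(t.lemma)
        if t.head in lemma_by_idx:  # the root has no head inside the sentence
            units.add((t.lemma, t.rel, lemma_by_idx[t.head]))
    return frozenset(units)


# Hand-made analysis of "Бразилия е епицентърът" (illustrative only).
tokens = [
    Token(1, "Бразилия", 2, "subj"),
    Token(2, "съм", 0, "ROOT"),
    Token(3, "епицентър", 2, "comp"),
]
bag = bag_of_units(tokens)
print(sorted(u for u in bag if isinstance(u, tuple)))
```
]]></preformat>
        <p>Because the triples are built from lemmas rather than word forms, two sentences that differ only in inflection still share the same units.</p>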
        <p>In order to ensure the mapping between the analysis of the question-and-answer
pair and the analyses of the sentences in the text, we had to process the pair in the
same way. The initial idea was to process the question and the corresponding answers
separately, but this raised some problems. The dependency parser does not perform well on
fragments of sentences, and the answers are fragments in most cases. In addition, some possible
relations between the words in the question and the words in the answer had to be used. On the
other hand, the ideal supporting sentence in the text would be composed of the
question (potentially rearranged in a declarative form) and the answer, combined in an
appropriate way. Thus, we decided to convert each pair of a question and a potential answer
into the best supporting declarative sentence. In our case this was done manually, but
in the future we envisage implementing this procedure automatically. Here are two
examples:</p>
        <p>Q1: Защо страдащите от деменция трябва да бъдат насърчавани да рисуват?
(Why do dementia sufferers have to be encouraged to paint?)</p>
        <p>A1: защото това укрепва паметта, вниманието и възприемането
(because this would strengthen their memory, attention and perception)</p>
        <p>D1: Страдащите от деменция трябва да бъдат насърчавани да рисуват, защото
това укрепва паметта, вниманието и възприемането.
(Dementia sufferers have to be encouraged to paint, because this
would strengthen their memory, attention and perception.)</p>
        <p>Q2: Кой е епицентърът на пандемията на СПИН?
(Where is the epicenter of the AIDS pandemic?)</p>
        <p>A2: Бразилия
(Brazil)</p>
        <p>D2: Бразилия е епицентърът на пандемията на СПИН.
(Brazil is the epicenter of the AIDS pandemic.)</p>
        <p>The ranking of each answer was done as follows. First, we calculated the size of the
intersection between the bag of linguistic units for each sentence in the texts and the bag for the
declarative sentence built from the question-and-answer pair. Then we took the maximum of these intersection sizes. This
maximum was considered the rank of the question-and-answer pair and, thus, the
rank of the answer. Finally, we selected the answer with the highest rank as the answer to
the question. If more than one answer shared the highest rank, we
selected one of them at random.</p>
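        <p>The ranking procedure can be sketched in a few lines. The functions rank and select_answer are hypothetical names, bags are modelled as frozensets (an assumption), and the toy bags in the example contain placeholder units rather than real lemmas or triples.</p>
        <preformat><![CDATA[
```python
import random
from typing import Dict, FrozenSet, Hashable, List, Optional

Bag = FrozenSet[Hashable]


def rank(pair_bag: Bag, sentence_bags: List[Bag]) -> int:
    """Maximum overlap between the pair's bag and any single sentence bag."""
    return max((len(pair_bag & s) for s in sentence_bags), default=0)


def select_answer(pair_bags: Dict[str, Bag],
                  sentence_bags: List[Bag],
                  rng: Optional[random.Random] = None) -> str:
    """Pick the answer whose declarative sentence has the highest rank.

    `pair_bags` maps an answer identifier to the bag of its declarative
    sentence and must not be empty. Ties are broken at random.
    """
    rng = rng or random.Random()
    ranks = {answer: rank(bag, sentence_bags) for answer, bag in pair_bags.items()}
    best = max(ranks.values())
    tied = [answer for answer, r in ranks.items() if r == best]
    return rng.choice(tied)


# Toy data with placeholder units.
sentence_bags = [frozenset({"a", "b", "c"}), frozenset({"c", "d"})]
pair_bags = {"A1": frozenset({"a", "b", "x"}), "A2": frozenset({"d", "y"})}
print(select_answer(pair_bags, sentence_bags))  # prints: A1
```
]]></preformat>
        <p>Passing a seeded random.Random instance makes the tie-breaking reproducible, which can be useful when comparing runs.</p>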
        <p>
          For the type of questions for which we know possible variations of the realization
of the answers in the text
          <xref ref-type="bibr" rid="ref6">(see Osenova and Simov 2005)</xref>
          , we also included the corresponding triples in
the bag for the question-and-answer pair. In this way we covered some basic cases
of paraphrase.
        </p>
        <p>Three runs of the system were performed. One is the actual application of the
above procedure. We also performed two random runs in order to establish a baseline.
The two baseline runs were evaluated with scores of 0.12 and 0.20. The actual run
received a score of 0.29, which is an improvement of 0.09 over the better of the two baseline
scores.</p>
        <p>The error analysis showed two main problems for the method. First, in many cases
the words in the question-and-answer pair differ from the words used in the text.
Second, we did not implement enough paraphrased linguistic units.</p>
      </sec>
      <sec id="sec-17-2">
        <title>Conclusion and Future Work</title>
        <p>In the paper we presented a method for QA4MRE for Bulgarian which exploits
linguistic analyses of both the question-and-answer pairs and the text. The similarity
metric between the question-and-answer pairs and the sentences in the text is based on
bags of linguistic units: lemmas and dependency relations between lemmas in the
sentences. Our conclusion is that improving the results requires a good synonym lexicon
to cover the lexical variety, as well as the use of further knowledge
resources, such as the provided background collections, various kinds of thesauri and
domain-specific dictionaries for the specialized terms. Additionally, we need to
extend the paraphrase generation mechanism. In the future, we will also work on the
inclusion of more semantic objects in the comparison algorithm using connections to
ontological knowledge. Another restriction of the current method is that the
sentence derived from the question-and-answer pair is compared with only one sentence in the text at a time. It is
necessary to develop a better model that broadens the observations both in the text
and in the question-and-answer unit. We expect that the approach proposed here would
perform better on technical domains, where the degree of lexical variety is minimized
and literal repetitions are used instead of synonyms.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Georgiev</surname>
            ,
            <given-names>Georgi</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zhikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Osenova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Simov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Feature-rich part-of-speech tagging for morphologically complex languages: Application to Bulgarian</article-title>
          .
          <source>In: Proceedings of EACL 2012</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Giménez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Márquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>SVMTool: A general POS tagger generator based on Support Vector Machines</article-title>
          .
          <source>In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04)</source>
          . Lisbon, Portugal.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Making Large-Scale SVM Learning Practical</article-title>
          . In:
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.),
          <source>Advances in Kernel Methods - Support Vector Learning</source>
          . Cambridge, MA, USA: MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kouylekov</surname>
            ,
            <given-names>Milen</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Magnini</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Tree Edit Distance for Recognizing Textual Entailment</article-title>
          .
          <source>In Recent Advances in Natural Language Processing (RANLP-2005)</source>
          , Borovetz, Bulgaria.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Nivre</surname>
            ,
            <given-names>Joakim</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Johan</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jens</given-names>
            <surname>Nilsson</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>MaltParser: A data-driven parser-generator for dependency parsing</article-title>
          .
          <source>In Proc. of LREC-2006</source>
          , pp
          <fpage>2216</fpage>
          -
          <lpage>2219</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Petya</given-names>
            <surname>Osenova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kiril</given-names>
            <surname>Simov</surname>
          </string-name>
          .
          <article-title>Infrastructure for Bulgarian Question Answering. Implication for the Language Resources and Tools</article-title>
          . Piperidis and Paskaleva (eds).
          <source>Proc. Workshop on Language and Speech Infrastructure for Information Access in the Balkan Countries</source>
          . Borovetc, Bulgaria.
          <year>2005</year>
          . pp
          <fpage>47</fpage>
          -
          <lpage>52</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
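          <string-name>
            <surname>Popov</surname>
            ,
            <given-names>Dimitar</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kiril</given-names>
            <surname>Simov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Svetlomira</given-names>
            <surname>Vidinska</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Petya</given-names>
            <surname>Osenova</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Spelling Dictionary of Bulgarian</article-title>
          . Nauka i izkustvo, Sofia, Bulgaria (in Bulgarian).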
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Simov</surname>
            ,
            <given-names>Kiril</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Peev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kouylekov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Simov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiryakov</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>CLaRK - an XML-based System for Corpora Development</article-title>
          .
          <source>In: Proc. of the Corpus Linguistics 2001 Conference</source>
          . Lancaster, UK.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
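          <string-name>
            <surname>Simov</surname>
            ,
            <given-names>Kiril</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Petya</given-names>
            <surname>Osenova</surname>
          </string-name>
          and
          <string-name>
            <given-names>Milena</given-names>
            <surname>Slavcheva</surname>
          </string-name>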
          .
          <year>2004</year>
          .
          <article-title>BTB-TR03: BulTreeBank Morphosyntactic Tagset</article-title>
          .
          <source>BulTreeBank Technical Report</source>
          №
          <volume>03</volume>
          (http://www.bultreebank.org/TechRep/BTB-TR03.pdf).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V. N.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>The nature of statistical learning theory</article-title>
          (2nd ed.). New York: Springer.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>