<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>IIITH at BioASQ Challenge 2015 Task 3a: Extreme Classification of PubMed Articles using MeSH Labels</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Avinash Kamineni⋆</string-name>
          <email>avinash.kamineni@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nausheen Fatma⋆</string-name>
          <email>nausheen.fatma@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arpita Das⋆</string-name>
          <email>arpita.das@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manish Shrivastava</string-name>
          <email>m.shrivastava@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manoj Chinnakotla</string-name>
          <email>manojc@microsoft.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International Institute of Information Technology Hyderabad</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Microsoft</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <volume>3</volume>
      <issue>0</issue>
      <abstract>
        <p>Automating the process of indexing journal abstracts has been a topic of research for several years. Biomedical Semantic Indexing aims to assign correct MeSH terms to PubMed documents. In this paper we report our participation in Task 3a of the BioASQ challenge 2015. The participating teams were provided with PubMed articles and asked to return relevant MeSH terms. We tried three different approaches: Nearest Neighbours, IDF-Ratio based indexing, and multi-label classification. The official challenge results demonstrate that we consistently performed better than the baseline approaches for Task 3a.</p>
      </abstract>
      <kwd-group>
        <kwd>MeSH Indexing</kwd>
        <kwd>Biomedical Semantic Indexing</kwd>
        <kwd>Hierarchical Text Classification</kwd>
        <kwd>FastXML</kwd>
        <kwd>PubMed</kwd>
        <kwd>Information Retrieval and Extraction</kwd>
        <kwd>Metamap</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The annotation of biomedical journals by experts is both expensive and
time-consuming. Therefore, Large Scale Hierarchical Text Classification in this
domain has gained much importance over the past few years. It is also helpful in
fields such as Question Answering, Information Retrieval, and Categorization. The
challenge introduced by BioASQ [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] deals with handling large scale complex
data and automatically assigning relevant MeSH [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] terms to the PubMed [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
articles.
      </p>
      <p>
        Researchers have tried to crack the problem of biomedical semantic indexing
using a wide variety of methods such as Latent Semantic Analysis [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Latent
Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Support Vector Machines [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] etc. We approach
the problem from a document clustering perspective, based on the observation
that similar documents often share MeSH terms. In this paper, we built a generic
model for tagging documents with MeSH terms that can be utilized in
any other domain (⋆ these authors contributed equally). Three different approaches, namely Nearest Neighbours,
IDF-Ratio based learning, and FastXML [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] based extreme classification, were used.
All three approaches beat the BioASQ baseline and achieved high precision values;
however, the recall values were comparatively low.
      </p>
      <p>The rest of the paper is divided into the following sections: Section 2 describes
previous work on the BioASQ semantic indexing task. Section 3 explains the
model using different approaches in detail. Section 4 contains the experiments
performed and the results obtained. Section 5 comprises the conclusion and
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Semantic Indexing has been a topic of research for several years. Amongst the
successful unsupervised models, the most well-known is Latent Semantic
Analysis (LSA) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] developed by Deerwester et al. LSA takes the high-dimensional
vector space representation of documents and applies dimension reduction
to it by Singular Value Decomposition (SVD). The similarities between
documents are more reliably estimated in the latent semantic space than in the
original one. However, LSA lacks a solid statistical foundation. Hence, Hofmann
introduced Probabilistic Latent Semantic Analysis (PLSA) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] based on a
statistical latent class model. This model dealt with domain-specific synonymy and
polysemy. David M. Blei et al. introduced Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
considering mixture models that capture the exchangeability of both words and
documents (cf. "Exchangeability and Related Topics" by David J. Aldous). Each item
of a collection is modeled as a finite mixture over an underlying set of topics.
      </p>
      <p>
        A few supervised methods were also developed in this area. Bing Bai et al.
proposed Supervised Semantic Indexing (SSI) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which defines a class of models
that can be trained on a supervised signal (i.e., labeled data) to provide a ranking
over a database of documents given a query. Chakraborti et al. proposed sprinkling
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to automatically index documents. Sprinkling is a simple extension of LSI
that augments the set of features with additional terms encoding
class knowledge. But sprinkling treats all classes in the same way. To overcome
this problem, they proposed Adaptive Sprinkling (AS), which leverages confusion
matrices to emphasise the differences between those classes which are hard to
separate.
      </p>
      <p>
        Considering the prediction of MeSH headings, we have the Medical Text Indexer
(MTI) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], the official solution of the National Library of Medicine (NLM). The
major components of MTI are:
1. MetaMap Indexing (MMI) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
2. PubMed Related Citations [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
3. Restrict to MeSH [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
4. Extract MeSH Descriptors
5. Clustering and Ranking [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
The approach of Tsoumakas, G. et al. [24] performed better than MTI.
MetaLabeler [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] by Tang et al. used binary classification models trained using linear
SVMs. A regression model was also trained to predict the number of MeSH
headings for each citation. Finally, given a target citation, different MeSH headings
were ranked according to the SVM prediction score of each classifier, and the top
K MeSH headings were returned. The Learning to Rank (LTR) method was
utilized by Lu et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for automatic MeSH annotation. In this method,
each citation was treated as a query and each MeSH heading as a document.
The LTR method was used to rank candidate MeSH headings with respect to the
target citation. The candidate MeSH headings came from similar citations (nearest
neighbors). In a similar line of thought, Huang et al. reformulated the indexing
task as a ranking problem [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. They retrieved 20 neighbor documents, obtained
a list of MeSH main headings from the neighbors, and ranked the MeSH headings
using the ListNet learning-to-rank algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Our Approach</title>
      <p>Our system mainly consists of three different modules. We compare these
different systems. In this section, we explain these approaches in detail.</p>
      <p>Fig. 1: System Modules</p>
      <p>We have implemented three distinct techniques to index articles. Eventually,
our aim was to find which of these techniques contributes the most to finding
relevant MeSH terms. The following are the three techniques:</p>
      <sec id="sec-3-1">
        <title>1. K Nearest Neighbours approach 2. IDF-Ratio based approach 3. Extreme Classification using FastXML.</title>
        <sec id="sec-3-1-1">
          <title>3.1 K Nearest Neighbours Approach</title>
          <p>
            In this approach, we use a K Nearest Neighbours [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] based lazy learning
approach to find the most relevant MeSH headings.
1. The training files were first converted to a Lucene index with the fields
"pmid", "title", "abstractText", and "meshMajors".
2. K Nearest Neighbours are retrieved to find the candidate MeSH terms.
For a given unknown test instance, the fields abstract and title were
concatenated into a single string. We then find the K Nearest Neighbours (with k=60)
in the Lucene index. Similarity of documents is computed by finding the
number of overlapping words and giving them different weights based on
TF-IDF [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ].
3. A rank is given to each candidate MeSH term based on its number of occurrences in
the neighbours.
The top 60 (k=60) similar records were retrieved and a HashMap was created
with every MeSH term found in the neighbours as key and the total
number of times that MeSH term occurs in all the neighbours together as the
value. The HashMap keys become our candidate MeSH terms for the given
test instance.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4. Threshold is used for final predictions</title>
        <p>For every &lt;key,value&gt; pair in the HashMap created above, the value is
compared against a threshold α. If value &gt;= α, then the key is included in a set
S. If the value &lt; α, then we check whether the key (which is a MeSH term) exists
in the title or abstract. If the key is present in the title or abstract, then it is
very likely that the key is a relevant label, and it is added to the set S. After all
the &lt;key,value&gt; pairs have been iterated, the set S becomes our final MeSH
label set for x.</p>
        <p>α was set to 12 empirically for k=60. It was observed that a threshold α = k/5
generally gave optimum results for unweighted votes.</p>
        <p>Query | k | alpha | precision | recall
Title + abstract - stopwords | 60 | 12 | 0.510845 | 0.503196
Title + abstract - stopwords | 75 | 3.75 | 0.472817 | 0.539864
Noun phrases (from Title + abstract) | 75 | 3.75 | 0.451753 | 0.540818
Nouns (from Title + abstract) | 75 | 3.85 | 0.464746 | 0.541609
Nouns (from Title + abstract) | 75 | 15 | 0.511757 | 0.487618
Nouns (from Title + abstract) | 60 | 12 | 0.50631 | 0.496969</p>
        <p>Some variations using this approach were also tried:
1. Weighted votes, using the similarity distance score as weight.
2. Using just noun phrases as queries.
3. Using just nouns as queries.</p>
        <sec id="sec-3-2-1">
          <title>3.2 IDF-Ratio based approach</title>
          <p>We know that IDF (Inverse Document Frequency) measures the importance of
a particular term in a set of documents. But certain terms like "is", "and", and
"are" may appear frequently but have little importance. Hence IDF weighs down
the frequently occurring terms and boosts the rare and significant ones. The IDF
for a term t can be expressed as:</p>
          <p>IDF(t) = log(N / Nt)   (1)</p>
          <p>where N is the total number of documents and Nt is the number of documents containing term t.
Here, for the task of semantic indexing, we need to find how important a
particular word is for a MeSH term. In other words, we want to find out which
particular word(s) in a document can lead to a MeSH term. For extracting this
information, the novel concept of the IDF-Ratio is introduced. This ratio identifies
the word(s) in a document that will certainly result in a MeSH term. The IDF-Ratio
of a word with respect to a MeSH term can be expressed as:</p>
          <p>IDFRatio(t|m) = (Nm / Ntm) / (N / Nt)   (2)</p>
          <p>where Nm is the number of times a particular MeSH term m occurs, and Ntm is the
total number of times the term t occurred with that MeSH term m. Thus, an
IDFRatio(t|m) for a term t exists for each of the 27,455 MeSH terms (m) provided.</p>
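          <p>Under one reading of Eq. (2) above, taking IDFRatio(t|m) as the quotient (Nm/Ntm)/(N/Nt), which is an assumption about the partially garbled original formula, the ratio table can be built from co-occurrence counts as follows (toy corpus; the (tokens, MeSH terms) layout is illustrative):</p>

```python
from collections import Counter, defaultdict

def idf_ratio_table(corpus):
    """Build IDFRatio(t|m) for every (word, MeSH term) pair that co-occurs.
    Reads Eq. (2) as (Nm/Ntm)/(N/Nt) -- an assumed reconstruction.
    corpus: list of (tokens, mesh_terms) pairs, one per document."""
    N = len(corpus)                      # total number of documents
    Nt = Counter()                       # documents containing word t
    Nm = Counter()                       # occurrences of MeSH term m
    Ntm = defaultdict(Counter)           # co-occurrences of t with m
    for tokens, mesh_terms in corpus:
        words = set(tokens)
        for t in words:
            Nt[t] += 1
        for m in mesh_terms:
            Nm[m] += 1
            for t in words:
                Ntm[t][m] += 1
    table = {}
    for t, per_mesh in Ntm.items():
        for m, ntm in per_mesh.items():
            table[(t, m)] = (Nm[m] / ntm) / (N / Nt[t])
    return table

corpus = [(["tumor", "growth"], ["Neoplasms", "Humans"]),
          (["tumor", "cells"], ["Neoplasms"]),
          (["growth", "rats"], ["Humans"])]
table = idf_ratio_table(corpus)
```

          <p>A threshold (0.55 in our experiments) is then applied to these values to keep only the 5-15 MeSH terms each word can plausibly lead to.</p>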
          <p>We have the IDF-Ratio of a word for all the MeSH terms. It does not make sense
to consider all 27,455 MeSH terms for a single word, since a word cannot lead
to all the MeSH terms. So it is necessary to filter out the unwanted MeSH terms
for each word. We do this by thresholding. After experimenting with different
values, a threshold of 0.55 was found to be optimum. Now every word is related
to 5-15 relevant MeSH terms which it can potentially lead to. Some of the MeSH
terms like "humans", "male", "female", and "animals" are very common and occur
with almost every word, so for any word, the IDF-Ratios with respect to these
MeSH terms are very high. So almost all the words lead to these MeSH terms.
1. Pre-processing</p>
          <p>The documents to be indexed are tokenized. The set of biomedical stopwords
is eliminated from the documents. Some special symbols are removed. The
symbols necessary for retaining the meaning of chemical components are
kept intact.
2. Extraction of meaning words</p>
          <p>
            A POS tagger is used to extract the NN, NNS, NNP, VB, JJ, and RB tags from
the documents. SENNA [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] is used for the tagging. It uses deep
learning (an unsupervised convolutional neural network) to tag sentences.
3. Collection of candidate MeSH terms
          </p>
          <p>After obtaining the meaning words, we consult the IDF-Ratios with respect
to the MeSH terms. For each word, we choose the set of MeSH terms it can
lead to. Finally, we get a candidate set of potential MeSH terms.
4. Ranking the candidate MeSH terms</p>
          <p>The MeSH terms in the candidate set have to be ranked correctly. The
following ranking approaches were used:
(a) Ranking in the order of IDF-Ratio: The words possess an IDF-Ratio
with respect to the MeSH terms, so we can rank these MeSH terms in the
order of these ratios. If more than one word in the document leads to
the same MeSH term, their corresponding IDF-Ratios are simply added.
(b) Ranking in terms of maximum intersection: If, in a document,
several words point to the same MeSH term, then that MeSH term
must be important for that document. This concept is utilised in this
ranking method. We gather the set of MeSH terms for each meaning
word and find the intersection of these sets. The elements of the intersection
are assigned as indices of the document.
(c) SVM-Rank: It is used to rank lists of items. For training, the inputs to
SVM-Rank are ordered entries of every possible pair of items, which are
assigned weights depending upon the correctness of the order. The initial
optimisation problem is formulated as ordinal regression; however, it
is turned into a classification problem due to the pairwise difference.
In the semantic indexing task, a feature vector is composed for the MeSH
terms. The feature vector consists of bag-of-words features, IDF-Ratio weights,
etc. The two ranking methods mentioned in (a) and (b) did not
yield good results, so the rankings obtained through them were included
as features for training SVM-Rank. Inclusion of this feature resulted in
a slight improvement in performance.</p>
          <p>The main difficulty was in assigning weights to the MeSH terms. While
training, we give all the terms assigned to that document very high
weights, but we cannot grade them in some order, as we have no clue
which of the tags assigned to the document has more weight and which
has less. Similarly, we have no other way of assigning weights to the
remaining MeSH terms in the data provided that are not assigned to
that document. (SVM-Rank: http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html#References)</p>
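          <p>The training input for SVM-Rank described above, one query per document and one line per candidate MeSH term, uses the SVM-light file format. The sketch below serializes hypothetical candidates; the three features (IDF-Ratio weight plus the two earlier rankings) are illustrative choices, not the exact feature set used:</p>

```python
def to_svmrank_lines(queries):
    """Serialize candidates into SVM-Rank's SVM-light input format:
    '<relevance> qid:<q> 1:<idf_ratio> 2:<rank_a> 3:<rank_b>'.
    queries: list (one per document) of (relevant, idf_ratio, rank_a, rank_b)."""
    lines = []
    for qid, candidates in enumerate(queries, start=1):
        for relevant, idf_ratio, rank_a, rank_b in candidates:
            lines.append(f"{int(relevant)} qid:{qid} "
                         f"1:{idf_ratio:.3f} 2:{rank_a} 3:{rank_b}")
    return lines

# One document with two candidate MeSH terms (feature values are made up):
lines = to_svmrank_lines([[(True, 0.83, 1, 2), (False, 0.41, 2, 1)]])
print("\n".join(lines))
# 1 qid:1 1:0.830 2:1 3:2
# 0 qid:1 1:0.410 2:2 3:1
```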
          <p>
            After ranking is done, the filtered top-ranked MeSH terms are assigned to
the document.</p>
          <p>
            The main objective of FastXML [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] is to achieve fast and efficient training of a
model. Training on the 4 million BioASQ 2015 documents took about 36 hours on a
4-core machine. FastXML is also capable of learning the hierarchy of the MeSH
terms by optimizing a ranking loss function. Existing approaches optimize
local measures of performance which depend solely on predictions made by the
current node being partitioned. FastXML allows the hierarchy to be learned node
by node, starting from the root and going down to the leaves; thus it is more
efficient than learning all the nodes jointly. The frequent MeSH terms could be
learnt better compared to the rare ones.
          </p>
          <p>FastXML is based on the assumption that only a small number of labels occur
in each region of the feature space. It learns an ensemble of trees and does not
rely on base classifiers. The output of the classifier is the labels along with their
probabilities. It also provides the precision at 1..k, where k is the maximum number
of labels that may be tagged for a document. The experimental results of this
approach are explained below.</p>
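          <p>The precision at 1..k reported by the classifier can be computed, in outline, as follows (a generic sketch of the metric, not FastXML's own code; the label/probability pairs are illustrative):</p>

```python
def precision_at_k(predicted, gold, k):
    """Precision@k: the fraction of the k highest-probability predicted
    labels that appear in the gold MeSH annotation."""
    topk = [label for label, _ in
            sorted(predicted, key=lambda p: p[1], reverse=True)[:k]]
    return sum(label in gold for label in topk) / k

pred = [("Humans", 0.9), ("Neoplasms", 0.7), ("Rats", 0.2)]
print(precision_at_k(pred, {"Humans", "Rats"}, k=2))  # -> 0.5
```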
        </sec>
        <sec id="sec-3-2-2">
          <title>1. Tokenization</title>
          <p>As the terms in this particular domain contain special symbols, e.g. in
chemical formulae, special care is taken while tokenizing. A few special
symbols like "-" and "," are maintained. This tokenization is done using the
tokenization module of the word2vec source code provided as open-source software by
BioASQ. They also provide a vocabulary list of 1.7 million words.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>2. DF Matrix Construction</title>
          <p>We iterate over each document in the BioASQ 2015 training set and tokenize
the title and abstract; for each token we increment the corresponding MeSH
term column. This gives us a sparse matrix, indexed accordingly, which
is later used for feature extraction. (The word vectors can then be used, for example, to estimate the relatedness
of two words or to perform query expansion: http://bioasq.lip6.fr/tools/BioASQword2vec/. For
words not in the vocabulary, we have done simple Laplace smoothing
to update the feature weights.)</p>
          <p>As part of the BioASQ 3a challenge 2015, we made weekly submissions
for two of the three batches. We performed better than the baseline system each
time. The results of one submission for Task 3a Batch 3, Week 3 are shown in the
following tables.</p>
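          <p>The DF matrix construction above can be sketched as a dict-of-dicts sparse structure (a minimal sketch; the (tokens, MeSH terms) input pairs stand in for the tokenized title and abstract of each training document):</p>

```python
from collections import defaultdict

def build_df_matrix(corpus):
    """For every document, increment the (token, MeSH term) cell for each
    MeSH term the document is tagged with; matrix[token][mesh] is a count."""
    matrix = defaultdict(lambda: defaultdict(int))
    for tokens, mesh_terms in corpus:
        for token in tokens:
            for m in mesh_terms:
                matrix[token][m] += 1
    return matrix

corpus = [(["tumor", "growth"], ["Neoplasms"]),
          (["tumor"], ["Neoplasms", "Humans"])]
m = build_df_matrix(corpus)
print(m["tumor"]["Neoplasms"])  # -> 2
```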
          <p>In Tables 3 and 4, IIIT System 3 represents the Nearest Neighbours approach,
IIIT System 4 represents the IDF-Ratio based approach, and qaiiit system 1
represents the FastXML approach.
1. This method gives a very high precision of 0.84, but the candidate set is too
large.
2. SVM-Rank gives a very low recall of only 0.25. This is due to the inability
to assign proper weights in descending order to the MeSH terms.
3. Ranking in the order of IDF-Ratio gave a recall of 0.267. Very common MeSH
terms like male, female, and rats had very high IDF-Ratio values across the
documents, hence they were assigned to almost all the documents, thus
decreasing the recall value.
4. Ranking in terms of maximum intersection gave a recall of 0.232. This
faced a similar problem as ranking in the order of IDF-Ratio.</p>
          <p>Mostly, the common MeSH terms were found in the intersection set.
5. Due to the high precision and low recall, the overall F-score reduced to about 0.4.</p>
          <p>Method | Precision | Recall | F-Score
SVM-Rank | 0.84 | 0.25 | 0.39
IDF-Ratio order | 0.84 | 0.267 | 0.41</p>
          <p>Intersection | 0.84 | 0.232 | 0.36
For the FastXML approach, the following observations were made:
1. A few common MeSH terms like "Humans", "Male", and "Female" occur in
most of the articles, hence these terms are tagged with high probability.
2. Rare MeSH terms like "2-Oxoisovalerate Dehydrogenase (Acylating)" and
"Hydroxyacyl-CoA Dehydrogenase" occur in very few articles, hence their probability of
being tagged is very low.
For the IDF-Ratio based approach, the following observations were made:
1. The concept of the IDF-Ratio is quite intuitive; it helps us determine the
importance of a word for a particular MeSH term. We can determine which
words' presence leads to a MeSH term.
2. As part of an experiment, we tried to infuse hierarchy information
into this method. Several approaches were tried: for a MeSH term, its
children, parent, and siblings were included up to 2 levels in the candidate set; or if
a parent was included in the candidate set, its children were excluded; etc. Several such
schemes were applied but with no significant change in results. No particular
hierarchical pattern was followed by the data provided.
3. As already mentioned, the precision of this approach was high; the
candidate set formed a sort of superset of the answers obtained by the other two
methods, i.e., Extreme Classification and Nearest Neighbour.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>It can be stated that by using Nearest Neighbours we can limit the
candidate MeSH terms while maintaining precision and recall. With the IDF-Ratio
approach we can gather all the MeSH terms a word can lead to; it
captures both lexical and semantic information. Using Extreme Classification,
training can be done quickly even on a single machine, and the process is scalable.
The information about the hierarchy between the MeSH terms can be captured. The
three approaches mentioned are implemented independently. The next logical
step would be to combine their results and use them as features for the ranking
algorithm, which will be done as part of our future work. Future work includes:
1. Coming up with a better ranking algorithm to rank the MeSH terms in the
candidate set.
2. Exploiting the hierarchy information of the MeSH headings provided.
3. Merging the 3 approaches to get a compact and smaller version of the
candidate set.
4. In the IDF-Ratio approach we are basically finding the MeSH terms which are
pointed to by individual words; in the future it would be a better idea to find the
MeSH terms which the entire document leads to.</p>
      <p>Ngonga Ngomo. BioASQ: A challenge on large-scale biomedical semantic
indexing and question answering.
24. Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, and Ioannis Vlahavas.
Large-scale semantic indexing of biomedical publications at BioASQ.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Medical subject headings https://www.nlm.nih.gov/mesh/.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Medical text indexer (MTI) processing flow whitepaper</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Search engine of medline database http://www.ncbi.nlm.nih.gov/pubmed.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Aronson</surname>
            <given-names>AR</given-names>
          </string-name>
          .
          <article-title>The mmi ranking function whitepaper</article-title>
          , (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          .
          <article-title>Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</article-title>
          .
          <source>Proc AMIA Symp</source>
          , pages
          <volume>17</volume>
          {
          <fpage>21</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Bing</given-names>
            <surname>Bai</surname>
          </string-name>
          , Jason Weston, David Grangier,
          <string-name>
            <given-names>Ronan</given-names>
            <surname>Collobert</surname>
          </string-name>
          , Kunihiko Sadamasa, Yanjun Qi, Olivier Chapelle, and
          <string-name>
            <given-names>Kilian</given-names>
            <surname>Weinberger</surname>
          </string-name>
          .
          <article-title>Supervised semantic indexing</article-title>
          .
          <source>In Proceedings of the 18th ACM Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '09</source>
          , pages
          <fpage>187</fpage>
          {
          <fpage>196</fpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>David</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>Andrew Y.</given-names>
          </string-name>
          <string-name>
            <surname>Ng</surname>
            , and
            <given-names>Michael I.</given-names>
          </string-name>
          <string-name>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent dirichlet allocation</article-title>
          ,
          <year>March 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          , Stuart J Nelson, William T Hole, and
          <string-name>
            <given-names>H Florence</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Beyond synonymy: exploiting the umls semantics in mapping vocabularies</article-title>
          .
          <source>In Proceedings of the AMIA symposium, page 815</source>
          . American Medical Informatics Association,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Lijuan</given-names>
            <surname>Cai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>Hierarchical document categorization with support vector machines</article-title>
          .
          <source>In Proceedings of the thirteenth ACM international conference on Information and knowledge management</source>
          , pages
          <volume>78</volume>
          {
          <fpage>87</fpage>
          . ACM,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Zhe</surname>
            <given-names>Cao</given-names>
          </string-name>
          , Tao Qin,
          <string-name>
            <surname>Tie-Yan</surname>
            <given-names>Liu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Feng Tsai</surname>
            , and
            <given-names>Hang</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Learning to rank: from pairwise approach to listwise approach</article-title>
          .
          <source>In Proceedings of the 24th international conference on Machine learning</source>
          , pages
          <volume>129</volume>
          {
          <fpage>136</fpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sutanu</surname>
            <given-names>Chakraborti</given-names>
          </string-name>
          , Rahman Mukras, Robert Lothian, Nirmalie Wiratunga,
          <string-name>
            <given-names>Stuart N. K.</given-names>
            <surname>Watt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David J.</given-names>
            <surname>Harper</surname>
          </string-name>
          .
          <article-title>Supervised latent semantic indexing using adaptive sprinkling</article-title>
          . In Manuela M. Veloso, editor,
          <source>IJCAI</source>
          , pages
          <volume>1582</volume>
          {
          <fpage>1587</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tsung-Hsien</surname>
            <given-names>Chiang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hung-Yi Lo</surname>
          </string-name>
          , and
          <string-name>
            <surname>Shou-De Lin</surname>
          </string-name>
          .
          <article-title>A ranking-based knn approach for multi-label classification</article-title>
          .
          <source>In ACML</source>
          , pages
          <volume>81</volume>
          {
          <fpage>96</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ronan</surname>
            <given-names>Collobert</given-names>
          </string-name>
          , Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Kuksa</surname>
          </string-name>
          .
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>12</volume>
          :
          <fpage>2493</fpage>
          –
          <lpage>2537</lpage>
          ,
          November
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Scott C.</given-names>
            <surname>Deerwester</surname>
          </string-name>
          , Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman.
          <article-title>Indexing by latent semantic analysis</article-title>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>Probabilistic latent semantic indexing</article-title>
          .
          <source>In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '99</source>
          , pages
          <fpage>50</fpage>
          –
          <lpage>57</lpage>
          , New York, NY, USA,
          <year>1999</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Minlie</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aurélie</given-names>
            <surname>Névéol</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhiyong</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>Recommending MeSH terms for annotating biomedical articles</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>W John</given-names>
            <surname>Wilbur</surname>
          </string-name>
          .
          <article-title>PubMed related articles: a probabilistic topic-based model for content similarity</article-title>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Prabhakar Raghavan, and Hinrich Schütze.
          <source>Introduction to Information Retrieval</source>
          . Cambridge University Press, New York, NY, USA,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Yuqing</given-names>
            <surname>Mao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zhiyong</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>NCBI at the 2013 BioASQ challenge task: Learning to rank for automatic MeSH indexing</article-title>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. James G. Mork, Antonio Jimeno-Yepes, and
          <string-name>
            <given-names>Alan R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          .
          <article-title>The NLM medical text indexer system for indexing biomedical literature</article-title>
          .
          <source>In BioASQ@CLEF</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Yashoteja</given-names>
            <surname>Prabhu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Manik</given-names>
            <surname>Varma</surname>
          </string-name>
          .
          <article-title>FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14</source>
          , pages
          <fpage>263</fpage>
          –
          <lpage>272</lpage>
          , New York, NY, USA,
          <year>2014</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Lei</given-names>
            <surname>Tang</surname>
          </string-name>
          , Suju Rajan, and
          <string-name>
            <given-names>Vijay K.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          .
          <article-title>Large scale multi-label classification via MetaLabeler</article-title>
          .
          <source>In Proceedings of the 18th international conference on World wide web</source>
          , pages
          <fpage>211</fpage>
          –
          <lpage>220</lpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>George</given-names>
            <surname>Tsatsaronis</surname>
          </string-name>
          , Michael Schroeder, Technische Universität Dresden, Georgios Paliouras, Yannis Almirantis, Eric Gaussier, Patrick Gallinari, Thierry Artières,
          <string-name>
            <surname>Michael R. Alvers</surname>
          </string-name>
          , Matthias Zschunke, Transinsight GmbH, and Axel-Cyrille
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>