<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Classification of PubMed Abstracts with Latent Semantic Indexing: Working Notes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joel Robert Adams</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Bedrick</string-name>
          <email>bedricks@ohsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Spoken Language Understanding, Oregon Health and Science University</institution>
          ,
          <addr-line>3181 SW Sam Jackson Park Road, Portland, OR</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>1275</fpage>
      <lpage>1282</lpage>
      <abstract>
        <p>The 2014 BioASQ challenge 2a tasks participants with assigning semantic tags to biomedical journal abstracts. We present a system that uses Latent Semantic Analysis to identify documents in MEDLINE that are semantically similar to an unlabeled abstract, and then uses a novel ranking scheme to select a list of MeSH headers from candidates drawn from the most similar documents. Our approach achieved good precision, but suffered in terms of recall. We describe several possible strategies to improve our system's performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Biomedical journal articles are manually indexed in the National Library of
Medicine's MEDLINE database with semantic descriptors selected from the
Medical Subject Headings (MeSH) hierarchy. These descriptors are then used
as key features in traditional document retrieval systems such as PubMed, as
well as for document classification and recommendation (cf. [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">11, 9, 10</xref>
        ]) and even
for word-sense disambiguation[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This manual indexing process is both
time-consuming and expensive[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and as a result the field of automatic MeSH
indexing has a long and rich history (cf. [
        <xref ref-type="bibr" rid="ref13 ref2">16, 2, 13</xref>
        ], just to name a few). The goal of
BioASQ Task 2a is to automatically assign MeSH index headings to un-tagged
MEDLINE abstracts.
      </p>
      <p>
        Previous researchers have tried a wide variety of approaches to this problem,
including discriminative classifiers such as Bayesian classifiers[15] and Support
Vector Machines[
        <xref ref-type="bibr" rid="ref3 ref6">6, 3</xref>
        ] as well as tools based on more traditional natural language
processing techniques[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We approach the problem from a document clustering
perspective, based on the observation that similar documents often share MeSH
terms. For example, two articles about treatments prolonging survival of
patients with Glioblastoma, one tagged with 15 MeSH descriptors and the other
with 17, share 10 of these terms. This work presents a system that uses
Latent Semantic Analysis (LSA) to identify articles that are semantically "similar" to an
unlabeled ("query") abstract. (LSA is described in brief in Section 2.2; for a more
complete description of the technique, see Furnas et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].) Given this set of similar abstracts, we use the
human-assigned MeSH descriptors of these similar abstracts to build a set of
candidate MeSH descriptors. We then use distributional features of these
descriptors to attempt to rank the most likely descriptor candidates for our query
abstract.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Data Selection</title>
        <p>Due to the large size of the training data and the changing nature of the MeSH
tree, we chose to focus only on the documents included in the list of 1,993 journals
that BioASQ has identified as having "small average annotation periods", and
to include only descriptors which appear in the 2014 edition of MeSH. As such,
we trained on a subset of the provided Training Set v.2014b, considering only
journal articles from 2005 and later.</p>
        <p>For development purposes, we used a 90/10 train/test split. For our BioASQ
submissions, we tested on the entire training set.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Latent Semantic Analysis</title>
        <p>
          Latent Semantic Analysis (LSA) is a technique for analyzing semantic
relationships between documents. It is an extension of standard vector-space retrieval[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
that is more robust in the face of synonymy[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. LSA has been applied to a wide
variety of information retrieval tasks, ranging from standard ad-hoc retrieval[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
to cross-language information retrieval[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Using LSA, one may perform
vector-space retrieval on a low-rank approximation of a term-document matrix, in which
"related" words end up grouped together (and are therefore retrieved together).
The combination of dimensionality reduction and semantic grouping makes LSA
a natural fit for the problem of computing document similarity for automatic
indexing.
        </p>
        <p>LSA produces this matrix approximation using the singular value
decomposition (SVD). The SVD effectively "splits" a term-document matrix X into
three new matrices, T, S, and D, which may be multiplied together in order to
re-create the original matrix (X = TSD′, where D′ denotes the transpose of D). The S matrix contains the
"singular values" of X, and T and D map terms and documents (respectively) onto
singular values. By multiplying more or less complete subsets of the decomposed
matrices, one may create more or less accurate approximations of the original
matrix.</p>
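        <p>As a short worked illustration of this truncation, written in the notation of the paragraph above: if we assume the singular values on the diagonal of S are sorted in decreasing order (the usual convention), a rank-k approximation keeps only the first k columns of T and D and the leading k-by-k block of S. The symbols k, T<sub>k</sub>, S<sub>k</sub>, D<sub>k</sub> and X<sub>k</sub> are introduced here for illustration only.</p>
        <disp-formula><tex-math><![CDATA[X \approx X_k = T_k \, S_k \, D_k']]></tex-math></disp-formula>
        <p>Smaller values of k give coarser approximations; choosing k equal to the rank of X reproduces X exactly.</p>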
        <p>Given the LSA-produced approximation of the term-document matrix, and
a query document, one may perform retrieval as follows. The query document is
transformed into a term vector, and this vector is projected into the LSA space.
Then, one may use standard vector-space retrieval techniques to score low-rank
approximations of corpus documents with the transformed query document.</p>
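        <p>One common formulation of this projection step is the "folding-in" construction from the LSA literature. It is sketched here as an illustration under stated assumptions, not as a description of the exact computation our implementation performs. Let T<sub>k</sub> and S<sub>k</sub> be the term matrix and singular-value matrix truncated to k dimensions, let q be the weighted term vector of the query document, and let d̂<sub>j</sub> be the k-dimensional representation of corpus document j (all of these symbols are introduced for illustration). The query is mapped into the reduced space and scored against each document by cosine similarity:</p>
        <disp-formula><tex-math><![CDATA[\hat{q} = S_k^{-1} \, T_k' \, q, \qquad \mathrm{sim}(\hat{q}, \hat{d}_j) = \frac{\hat{q} \cdot \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}]]></tex-math></disp-formula>
        <p>Some formulations additionally scale both vectors by S<sub>k</sub> before taking the cosine; which variant is used affects the ranking and should be checked against the library in use.</p>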
        <p>
          Our implementation begins by pre-processing MEDLINE abstracts using the
Python Natural Language Toolkit (NLTK) library.<sup>2</sup> We use NLTK's
implementation of the Punkt sentence tokenizer[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] along with the standard NLTK word
tokenizer. As part of pre-processing, we removed words found in the standard
NLTK English stop word list.
        </p>
        <p>We next used the Gensim library<sup>3</sup> to produce a term-document matrix, in
which each "row" represents a term, each "column" represents a document
(i.e., a MEDLINE abstract), and the values in cells represent occurrence counts.
We then weighted the counts by their normalized TF/IDF scores, and ran LSA
on the resulting matrix. Since the point of LSA is to produce a low-rank
approximation of the complete term-document matrix, users of LSA must set an
operating point for how approximate they wish their new matrix to be. We
(somewhat arbitrarily) use the first 200 ranks of our transformed matrix.</p>
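        <p>To make the weighting step concrete, the following minimal sketch builds sparse term-document columns from toy documents and applies a normalized TF-IDF weighting using only the Python standard library. It does not use or reproduce the Gensim API. The function name <monospace>tfidf_columns</monospace>, the toy documents, and the particular IDF variant (base-2 logarithm of N over document frequency, followed by scaling each document to unit length) are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def tfidf_columns(tokenized_docs):
    """Return one {term: weight} dict per document (a sparse matrix column).

    Weights are term counts times log2(N / df), then scaled so that each
    document vector has unit Euclidean length. This IDF variant is an
    assumption chosen for illustration.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))

    columns = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        weights = {t: c * math.log2(n_docs / doc_freq[t]) for t, c in counts.items()}
        # Drop zero weights (terms that occur in every document).
        weights = {t: w for t, w in weights.items() if w > 0.0}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0.0:
            weights = {t: w / norm for t, w in weights.items()}
        columns.append(weights)
    return columns


# Toy documents, already tokenized and stop-word filtered (hypothetical data).
docs = [
    ["glioblastoma", "survival", "treatment"],
    ["glioblastoma", "treatment", "treatment", "radiotherapy"],
    ["dialysis", "risk", "survival"],
]
columns = tfidf_columns(docs)
```
]]></preformat>
        <p>In this toy corpus the term that occurs in only one document ("radiotherapy") receives a larger weight in the second document than "treatment", even though "treatment" occurs twice there, because "treatment" is spread across more documents.</p>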
      </sec>
      <sec id="sec-2-3">
        <title>Choosing Closest Neighbors</title>
        <p>Once the similarity value is calculated for the new document, its n-closest
neighbors are calculated. Based on an initial tuning experiment, a provisional value
for n was set at 20. A minimum similarity threshold of 0.1 was chosen to avoid
considering documents with 0 or negative cosine similarity.</p>
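        <p>The neighbor selection just described can be sketched as follows. This is a minimal illustration, assuming dense document vectors held in a dictionary; the function names <monospace>cosine</monospace> and <monospace>closest_neighbors</monospace>, the toy vectors, and the choice to treat the threshold as inclusive are assumptions, not details of our implementation.</p>
        <preformat><![CDATA[
```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_neighbors(query_vec, corpus_vecs, n=20, min_sim=0.1):
    """Return up to n (doc_id, similarity) pairs, most similar first.

    Documents whose similarity falls below min_sim are discarded.
    """
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus_vecs.items())
    kept = [(doc_id, sim) for doc_id, sim in scored if sim >= min_sim]
    return heapq.nlargest(n, kept, key=lambda pair: pair[1])


# Hypothetical 3-dimensional LSA vectors keyed by document identifier.
corpus = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [-1.0, 0.0, 0.0],
    "doc_d": [0.7, 0.7, 0.0],
}
neighbors = closest_neighbors([1.0, 0.0, 0.0], corpus, n=2)
```
]]></preformat>
        <p>Here the documents with zero and negative similarity to the query are dropped by the threshold before the top-n cut is applied, so fewer than n neighbors may be returned.</p>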
        <p>The MeSH descriptors associated with these neighbors are the candidates for
our new abstract.</p>
      </sec>
      <sec id="sec-2-4">
        <title>MeSH Descriptor Selection</title>
        <p>For our initial submission (Test batch 3, week 4) we developed a simple scoring
algorithm to rank the candidate descriptors based on the following assumptions:
          <list list-type="order">
            <list-item><p>All else being equal, a MeSH term associated with a more similar document
should have a greater contribution to the score than a heading from a less
similar document.</p></list-item>
            <list-item><p>Terms which appear more frequently in neighboring documents are better
candidates than those which only occur a single time.</p></list-item>
            <list-item><p>This second point is mediated by the fact that some MeSH headings, such as
the check tag "Human", are much more frequent in the corpus than others, so
neighbors sharing one of these contribute less information than files sharing
a more obscure header.</p></list-item>
          </list>
        </p>
        <p>Let our n neighboring documents d<sub>1</sub>, d<sub>2</sub>, ..., d<sub>n</sub> be represented as the ordered
pairs d<sub>i</sub> = (s<sub>i</sub>, M<sub>i</sub>), where s<sub>i</sub> represents the cosine similarity between document i
and the new abstract, and M<sub>i</sub> is the set of MeSH terms associated with document
i.</p>
        <p><sup>2</sup> http://www.nltk.org/</p>
        <p><sup>3</sup> http://radimrehurek.com/gensim/</p>
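        <p>Then for any MeSH header m in our set of candidates, we can define a
weighted frequency f(m) as:
          <disp-formula><label>(1)</label><tex-math><![CDATA[f(m) = \sum_{i=1}^{n} e(i)\, s_i]]></tex-math></disp-formula>
where:
          <disp-formula><label>(2)</label><tex-math><![CDATA[e(i) = \begin{cases} 1 & \text{if } m \in M_i \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
        </p>
        <p>We also define an inverse document frequency idf(m) over the training corpus:
          <disp-formula><label>(3)</label><tex-math><![CDATA[\mathrm{idf}(m) = \log\left(\frac{N}{1 + C}\right)]]></tex-math></disp-formula>
where N is the number of documents in the training corpus and C is the number
of documents in the training corpus which contain m.</p>
        <p>Then our score for term m is:
          <disp-formula><label>(4)</label><tex-math><![CDATA[\mathrm{score}(m) = f(m) \cdot \mathrm{idf}(m)]]></tex-math></disp-formula>
        </p>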
        <p>We then assign a lower threshold of 1.5 and return the highest-scored MeSH
headers, up to a maximum of 12 headers.</p>
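        <p>The scoring and selection rule of this section can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: the function name <monospace>rank_descriptors</monospace>, the input data structures, the toy similarities and corpus counts, the use of the natural logarithm, and the inclusive treatment of the score threshold are all assumptions.</p>
        <preformat><![CDATA[
```python
import math
from collections import defaultdict


def rank_descriptors(neighbors, corpus_counts, n_corpus_docs, min_score=1.5, max_terms=12):
    """Rank candidate MeSH headers drawn from the neighbors of a query abstract.

    neighbors: list of (similarity, set_of_mesh_terms) pairs, i.e. (s_i, M_i).
    corpus_counts: dict mapping a MeSH term to C, the number of training
        documents that carry it.
    n_corpus_docs: N, the number of documents in the training corpus.
    """
    # f(m): sum of the similarities of the neighbors tagged with m.
    weighted_freq = defaultdict(float)
    for sim, terms in neighbors:
        for term in terms:
            weighted_freq[term] += sim

    # score(m) = f(m) * idf(m), with idf(m) = log(N / (1 + C)).
    scores = {}
    for term, f_m in weighted_freq.items():
        idf = math.log(n_corpus_docs / (1 + corpus_counts.get(term, 0)))
        scores[term] = f_m * idf

    # Keep terms at or above the threshold, best first, capped at max_terms.
    kept = [(t, s) for t, s in scores.items() if s >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_terms]


# Hypothetical neighbors and corpus statistics (made-up numbers).
neighbors = [
    (0.9, {"Humans", "Renal Dialysis", "Haplotypes"}),
    (0.8, {"Humans", "Renal Dialysis"}),
    (0.5, {"Humans", "Ankle Brachial Index"}),
]
corpus_counts = {"Humans": 700000, "Renal Dialysis": 5000,
                 "Haplotypes": 8000, "Ankle Brachial Index": 300}
ranked = rank_descriptors(neighbors, corpus_counts, n_corpus_docs=1000000)
```
]]></preformat>
        <p>With these made-up counts, a very frequent heading that appears in every neighbor still receives a low score because its inverse document frequency is small, while a rare heading seen in a single, less similar neighbor can clear the threshold.</p>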
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results and Discussion</title>
      <sec id="sec-3-1">
        <title>Flat Measures</title>
        <p>The Micro-Precision score of our system outperforms the BioASQ baseline
systems, MTI and MTI First Line Index. This suggests that the Latent Semantic
Indexing approach is returning semantically relevant MeSH headings.</p>
        <p>However, our system consistently performs below baseline on Micro-Recall
and, due to this, the Micro-F measure.</p>
        <p>This seems consistent with our scoring approach. As an example, let us
consider Table 2, which lists the candidates and scores for a document which was
manually labelled with the following MeSH descriptors: 'C-Reactive Protein',
'Cardiovascular Diseases', 'Female', 'Haplotypes', 'Humans', 'Male', 'Middle
Aged', 'Renal Dialysis', 'Risk Factors'.</p>
        <p>[Table 2. Columns: MeSH Descriptor, Score.]</p>
        <p>The horizontal line below the term 'Female' marks the 12-term threshold.
Ellipses mark where terms were removed for clarity. Terms in the actual list of
headers are marked in bold.</p>
        <p>In this particular example, a total of 147 candidate terms were considered.
The candidate list includes all of the MeSH terms that were manually applied
to the abstract. However, our current selection criteria exclude 'Male' due to
our choice of assigning a maximum of 12 terms, and further exclude three
more potential true positives from consideration because their score is below our
chosen threshold of 1.5.</p>
        <p>Simply increasing the ceiling on the number of allowed MeSH headers would
admit the term 'Male', but not without reducing precision. As such,
modifications will need to be made to the scoring rule to improve scores for relevant
terms like 'Male' and 'Haplotypes' while reducing the scores of irrelevant terms like 'Ankle
Brachial Index'.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Hierarchical Measures</title>
        <p>Table 3 shows our performance on the hierarchical Lowest Common
Ancestor measures. Again, our system's precision is competitive with the BioASQ
baseline system, but recall is lower.</p>
        <p>For the two submissions that we entered, both training and evaluation were
performed on a single 2.9 GHz MacBook Pro with 8 GB of memory. Under those
conditions, training the LSA model took approximately 6 hours, and once that
was complete, the system could generate MeSH headers for approximately 20
abstracts per minute.</p>
        <p>This made evaluating changes to the system unwieldy. Subsequently, both
the training and the assignment of headers have been updated to run on a cluster,
but we have yet to evaluate the performance gains.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>The results for the system are encouraging, and suggest that this is a viable
approach to semantic tagging. However, there are a number of potential avenues
for improvement that we will continue to explore.</p>
      <p>The scoring and selection of candidates naively seems to be the area where
the largest gains could be made, particularly in recall. We'll begin by separating
the features of cosine similarity and term frequency in the candidate set, in order
to allow for separate weighting of these features.</p>
      <p>In addition, we are experimenting with adding a feature to track whether a
candidate is a major MeSH term in the relevant training document, as these
should represent the primary concern of a given article.</p>
      <p>Another potential source of information is the hierarchical structure of MeSH
terms. Once a candidate set is chosen, leveraging the structure of the MeSH tree
should help us to reduce cases of over- and under-specialization.</p>
      <p>There is also room for improvement in the LSA model. The list of stopwords
should be given some consideration. Numbers without context seem largely
irrelevant in this case, and section headers which appear in some but not all
PubMed abstracts (such as 'RESULTS' and 'CONCLUSION') should
probably be ignored. In addition, we are investigating stemming and normalization of
acronyms to improve document matching.</p>
      <p>Finally, there are a number of variables that could be tuned. We are
investigating the effects of both varying the number of similar documents considered,
and replacing n-closest with a similarity threshold for documents. Similarly, we
are investigating removing the hard ceiling on the number of MeSH terms associated
with an abstract, and instead basing this decision on the distribution of scores
among the candidates.</p>
      <p>This investigation is still fairly preliminary. We'll continue to refine and
document the system going forward.</p>
      <p>15. Sohn, S., Kim, W., Comeau, D.C., Wilbur, W.J.: Optimal training sets for Bayesian
prediction of MeSH assignment. Journal of the American Medical Informatics
Association : JAMIA 15(4), 546–553 (Jul 2008)</p>
      <p>16. Trieschnigg, D., Pezik, P., Lee, V., de Jong, F., Kraaij, W., Rebholz-Schuhmann,
D.: MeSH Up: effective MeSH text classification for improved document retrieval.
Bioinformatics (Oxford, England) 25(11), 1412–1418 (Jun 2009)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>H.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphrey</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mork</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nelson</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          , Rindflesch, T.C.,
          <string-name>
            <surname>Wilbur</surname>
            ,
            <given-names>W.J.:</given-names>
          </string-name>
          <article-title>The NLM Indexing Initiative</article-title>
          .
          <source>Proceedings / AMIA Annual Symposium AMIA</source>
          Symposium pp.
          <volume>17</volume>
          –
          <issue>21</issue>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>F.M.:</given-names>
          </string-name>
          <article-title>An overview of MetaMap: historical perspective and recent advances</article-title>
          .
          <source>Journal of the American Medical Informatics Association : JAMIA</source>
          <volume>17</volume>
          (
          <issue>3</issue>
          ),
          <volume>229</volume>
          –236 (May
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Hierarchical Document Categorization with Support Vector Machines</article-title>
          .
          <source>In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management</source>
          . pp.
          <volume>78</volume>
          –
          <fpage>87</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Deerwester</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landauer</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furnas</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harshman</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>Indexing by latent semantic analysis</article-title>
          .
          <source>JASIS</source>
          <volume>41</volume>
          (
          <issue>6</issue>
          ),
          <volume>391</volume>
          –
          <fpage>407</fpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Furnas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deerwester</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landauer</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harshman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Streeter</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lochbaum</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Information retrieval using a singular value decomposition model of latent semantic structure</article-title>
          .
          <source>SIGIR '88: Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval (May</source>
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Jimeno</given-names>
            <surname>Yepes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Mork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.G.</given-names>
            ,
            <surname>Wilkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Aronson</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.R.</surname>
          </string-name>
          :
          <article-title>MEDLINE MeSH Indexing: Lessons Learned from Machine Learning and Future Directions</article-title>
          .
          <source>In: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium</source>
          . pp.
          <volume>737</volume>
          –
          <fpage>742</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jimeno-Yepes</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McInnes</surname>
            ,
            <given-names>B.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>12</volume>
          ,
          <issue>223</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kiss</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strunk</surname>
          </string-name>
          , J.:
          <source>Unsupervised Multilingual Sentence Boundary Detection. Computational Linguistics</source>
          <volume>32</volume>
          (
          <issue>4</issue>
          ),
          <volume>485</volume>
          –525 (Dec
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , DiCuccio,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Grigoryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Wilbur</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          :
          <article-title>Navigating information spaces: A case study of related article search in PubMed</article-title>
          .
          <source>Information Processing and Management</source>
          <volume>44</volume>
          (
          <issue>5</issue>
          ),
          <volume>1771</volume>
          –
          <fpage>1783</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilbur</surname>
            ,
            <given-names>W.J.:</given-names>
          </string-name>
          <article-title>PubMed related articles: a probabilistic topic-based model for content similarity</article-title>
          .
          <source>BMC Bioinformatics 8</source>
          ,
          <issue>423</issue>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A document clustering and ranking system for exploring MEDLINE citations</article-title>
          .
          <source>Journal of the American Medical Informatics Association : JAMIA</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <volume>651</volume>
          –
          <fpage>661</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Littman</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landauer</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          :
          <article-title>Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing</article-title>
          . In: Grefenstette,
          <string-name>
            <surname>G</surname>
          </string-name>
          . (ed.)
          <source>Cross-Language Information Retrieval: The Springer International Series on Information Retrieval</source>
          , pp.
          <volume>51</volume>
          –
          <fpage>62</fpage>
          . Springer (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ruch</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Automatic assignment of biomedical categories: toward a generic approach</article-title>
          .
          <source>Bioinformatics</source>
          (Oxford, England)
          <volume>22</volume>
          (
          <issue>6</issue>
          ),
          <volume>658</volume>
          –664 (Mar
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.S.:</given-names>
          </string-name>
          <article-title>A vector space model for automatic indexing</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>18</volume>
          (
          <issue>11</issue>
          ) (
          <year>Nov 1975</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>