UMass at BioASQ 2014: Figure-inspired Text Retrieval

Jesse Lingeman

lingeman@cs.umass.edu

Laura Dietz

dietz@cs.umass.edu

School of Computer Science, University of Massachusetts, Amherst, USA

1296 1310

Building on our experience with figure retrieval and with summarizing figures using sentences from the text, we study the utility of figure-based features and techniques for text retrieval. Figure-based approaches are compared to approaches that use abstracts instead of figures. We also explore two different relevance models: one built using the Unified Medical Language System (UMLS) and one built using Wikipedia. For our submission to the 2014 BioASQ competition, we conduct several experiments exploring different feature combinations, using a model trained on data from the TREC Genomics track.

1 Introduction

The BioASQ competition is about answering biomedical questions by extracting information from research publications in PubMed. BioASQ offers several subtasks: retrieving PubMed documents that contain an answer, retrieving snippets from those documents that contain an answer, retrieving relevant concepts or RDF triples, and extracting the answer from all retrieved material.

In a cooperation between the Center for Intelligent Information Retrieval at UMass Amherst and the BioNLP group at UMass Medical School in Worcester, we developed a figure-inspired text retrieval method as a new way of retrieving documents and text passages from biomedical publications. Our method is based on the insight that in biomedical publications, figures play such a central role that their captions and references provide abstract-like summaries of the paper. In this work we build on our experience with figure summarization and figure ranking algorithms [5, 8, 1].

We are test-driving our figure-inspired retrieval method in the BioASQ competition, where we focus our participation on document and snippet retrieval. As figures are the center of our attention, our methods rely on the availability of full text, e.g., in PMC format. Therefore we only retrieve documents and snippets contained in PubMed Central. We notice that the available training data covers PubMed Central only sparsely: most queries in the gold standard contain just one publication from PubMed Central, and only 13 queries contain at least 10 documents in PubMed Central. Since it is infeasible to define a complete gold standard ahead of time, our mission is to identify new material from PMC that answers the questions. To demonstrate the existence of relevant material, we show examples of relevant snippets in Table 1 and provide more examples in the results section.

Table 1: Examples of relevant snippets from PubMed Central.

5319ac18b166e2b806000030: Is clathrin involved in E-cadherin endocytosis?
"plasma membranes we have found here that non-trans-interacting e-cadherin is constitutively endocytosed like integrin ligand-independent endocytosis that the formation of endocytosed vesicles of e-cadherin is clathrin dependent and that e-cadherin but not other cams at ajs and tjs including nectins claudins and occludin is selectively sorted into the endocytosed" (PMC 15263019)

5319abc9b166e2b80600002d: Is Rac1 involved in cancer cell invasion?
"cells was clearly demonstrated by rna interference assay rac1 depletion significantly suppressed the frequency of invasion in both quiescent and igf-i-stimulated mda-mb-231 cells this indicates the necessity of rac1 for igf-i-induced cell invasion in the cells overexpression of rac1 has been" (PMC 21961005)

In the absence of suitable training data on full documents, we develop and train our method on data from the TREC Genomics track 2006 and 2007. Like BioASQ Task 2b (Phase A), the TREC Genomics task focuses on retrieving relevant documents and snippets for biomedical questions. The main difference is that TREC Genomics uses the Highwire corpus rather than PubMed Central. After training supervised models on the TREC data, we apply them to the questions posed in the BioASQ competition.

Our approach takes an Information Retrieval perspective on the problem. First, query expansion is performed with information from UMLS, Wikipedia, and figures to enrich the question. Second, a ranking of full documents and snippets is retrieved from a corpus of articles from PubMed Central. Third, we extract features for each document and snippet that indicate its relevance to the question, and re-rank documents and snippets with a supervised learning-to-rank approach.

2 Background: Information Retrieval

This section introduces document retrieval models and query expansion techniques.

2.1 Sequential Dependence Model

An early IR method called query likelihood makes an independence assumption between query terms and scores documents with Dirichlet collection smoothing. For query terms $q_1, q_2, \ldots, q_m$, each document D in the collection is scored by a product of scores under each query term, computed in log space:

$$\mathrm{score}_{uni}(D) = \log \prod_{i=1}^{m} \frac{\#(q_i, D) + \mu \frac{\#(q_i, \cdot)}{\#(\cdot, \cdot)}}{\#(\cdot, D) + \mu} \quad (1)$$

We use the notation '·' to denote sums over all possible entries. In particular, $\#(q_i, D)$ refers to the term frequency of $q_i$ in the given document, $\#(q_i, \cdot)$ refers to the term frequency of $q_i$ in the corpus, $\#(\cdot, D)$ is the document length, and $\#(\cdot, \cdot)$ is the number of terms in the collection. The scalar $\mu$ controls the amount of collection smoothing applied and is a hyperparameter to be estimated. Good values of $\mu$ are in the range [500, 5000].
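The Dirichlet-smoothed query likelihood score described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes pre-tokenized text, collection statistics held in a plain dictionary, and (our own choice) skips query terms that never occur in the collection, since their log probability would be undefined.

```python
import math
from collections import Counter
from typing import Dict, List


def score_uni(query_terms: List[str], doc_terms: List[str],
              coll_tf: Dict[str, int], coll_len: int, mu: float = 2500.0) -> float:
    """Log query likelihood of a document with Dirichlet smoothing.

    coll_tf[t] plays the role of #(t, .), coll_len of #(., .),
    len(doc_terms) of #(., D), and tf[q] of #(q, D).
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for q in query_terms:
        p_coll = coll_tf.get(q, 0) / coll_len  # background probability of q
        if p_coll == 0.0:
            continue  # assumption: ignore terms unseen in the collection
        score += math.log((tf[q] + mu * p_coll) / (doc_len + mu))
    return score


coll_tf = {"clathrin": 10, "endocytosis": 20, "the": 1000}
query = ["clathrin", "endocytosis"]
s1 = score_uni(query, ["clathrin", "mediated", "endocytosis"], coll_tf, 10000)
s2 = score_uni(query, ["the", "cell", "membrane"], coll_tf, 10000)
```

With the toy statistics above, the document that contains both query terms receives the higher (less negative) score, while the other document still gets a finite score from the collection background model.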

The query likelihood model is almost always outperformed by the sequential dependence model [6], which also includes exact bigrams and windowed skip-bigrams. The unigram model above can be generalized to arbitrary count statistics, such as occurrences of a bigram "$q_i\,q_{i+1}$" in document D, to derive $\mathrm{score}_{bi}$. Furthermore, counting co-occurrences of the two terms $q_i$ and $q_{i+1}$ in any order within a window of 8 terms in the document gives rise to the score under the windowed bigram model, $\mathrm{score}_{wbi}$, where the marginal counts in the denominator $\#(\cdot, D)$ are approximated by the document length.
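The two count statistics behind the bigram and windowed bigram scores can be sketched as follows. The function names are hypothetical, and the exact window convention (each pair of positions fewer than 8 tokens apart, in either order, counts once) is our assumption; retrieval engines implement ordered and unordered window operators with their own conventions.

```python
from typing import List


def count_ordered_bigram(doc_terms: List[str], a: str, b: str) -> int:
    """Occurrences of the exact bigram "a b" in the document."""
    return sum(1 for x, y in zip(doc_terms, doc_terms[1:]) if x == a and y == b)


def count_unordered_window(doc_terms: List[str], a: str, b: str, width: int = 8) -> int:
    """Co-occurrences of a and b, in any order, within a window of `width` terms.

    Assumed convention: every position pair (i, j) with doc_terms[i] == a,
    doc_terms[j] == b, i != j and |i - j| < width is counted once.
    """
    pos_a = [i for i, t in enumerate(doc_terms) if t == a]
    pos_b = [j for j, t in enumerate(doc_terms) if t == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) < width)


doc = "rac1 depletion suppressed invasion and rac1 drives invasion".split()
```

Substituting these counts (and their collection-wide counterparts) for the unigram counts in the Dirichlet-smoothed formula yields the bigram and windowed bigram scores.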

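The sequential dependence model combines the scores of the document D under the unigram, bigram and window model as a log-linear model:

$$\mathrm{score}_{SDM(q_1,\ldots,q_m)}(D) = \lambda_{uni}\,\mathrm{score}_{uni}(D) + \lambda_{bi}\,\mathrm{score}_{bi}(D) + \lambda_{wbi}\,\mathrm{score}_{wbi}(D) = \langle \lambda, \phi(D) \rangle \quad (2)$$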

The sequential dependence model requires setting the hyperparameters $\lambda_{uni}$, $\lambda_{bi}$, $\lambda_{wbi}$, and $\mu$, where the $\lambda$s can be estimated with machine learning.

2.2 Query Expansion

Keyword-based retrieval methods such as query likelihood and sequential dependence fail to retrieve documents that refer to the query terms via synonyms. A solution is to expand the original query $q_1, q_2, \ldots, q_m$ with additional terms $t_1, t_2, \ldots, t_K$, so-called expansion terms. Methods for predicting expansion terms $t_i$ also provide confidence weights $w_i$.

An expanded SDM query scores documents D by

$$\mathrm{score}_{Q}(D) = \mathrm{score}_{SDM(q_1,\ldots,q_m)}(D) + \omega \sum_{i} w_i\, \mathrm{score}_{uni(t_i)}(D) \quad (3)$$
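Equations (2) and (3) are plain weighted sums once the component scores are known, which is what makes them learnable as a linear model. The sketch below assumes the component scores of a document have already been computed; the dictionary keys and function names are hypothetical, and the example weights are the TREC-tuned values reported later in the paper.

```python
from typing import Dict, Sequence


def score_sdm(phi: Dict[str, float], lam: Dict[str, float]) -> float:
    """Eq. (2): <lambda, phi(D)> over the unigram, bigram and window-bigram scores."""
    return sum(lam[k] * phi[k] for k in ("uni", "bi", "wbi"))


def score_expanded(phi: Dict[str, float], lam: Dict[str, float], omega: float,
                   weights: Sequence[float], term_scores: Sequence[float]) -> float:
    """Eq. (3): SDM score plus omega * sum_i w_i * score_uni(t_i)(D).

    term_scores[i] is the unigram (query likelihood) score of D for expansion term t_i.
    """
    expansion = sum(w * s for w, s in zip(weights, term_scores))
    return score_sdm(phi, lam) + omega * expansion


phi = {"uni": -10.0, "bi": -20.0, "wbi": -15.0}
lam = {"uni": 0.77, "bi": 0.005, "wbi": 0.037}
total = score_expanded(phi, lam, 0.2, [0.6, 0.4], [-5.0, -8.0])
```

Here the SDM part contributes -8.355 and the weighted expansion part -1.24, so `total` is -9.595.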

The expanded retrieval model introduces another hyperparameter $\omega$, which can be estimated along with the $\lambda$s using machine learning.

2.3 Pseudo-relevance Feedback

Expansion terms can be derived from external synonym resources or estimated with pseudo-relevance feedback. In pseudo-relevance feedback, the expansion terms are estimated from the document collection [3]. The approach assumes that the un-expanded retrieval model achieves high precision at the top ranks but lacks recall.

The procedure gathers a feedback ranking $D_1, D_2, \ldots, D_n$ of the n documents in the collection that have the highest score under the un-expanded query, e.g., $\mathrm{score}_{SDM}(D)$.

The next step derives a distribution over terms from the feedback documents. This involves using the score of document $D_i$ to approximate the retrieval probability of $D_i$ relative to the rest of the feedback set:

$$p(D_i \mid q_1, \ldots, q_m) = \frac{\exp \mathrm{score}_{SDM}(D_i)}{\sum_{j=1}^{n} \exp \mathrm{score}_{SDM}(D_j)} \quad (4)$$
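The feedback-document probability above is a softmax over SDM scores. A minimal sketch follows; subtracting the maximum score before exponentiating is our implementation choice, which leaves the probabilities unchanged but avoids numeric underflow, since SDM scores are large negative log values.

```python
import math
from typing import List


def feedback_probabilities(sdm_scores: List[float]) -> List[float]:
    """p(D_i | q) = exp(score_SDM(D_i)) / sum_j exp(score_SDM(D_j))."""
    m = max(sdm_scores)
    exps = [math.exp(s - m) for s in sdm_scores]  # shifted for numeric stability
    z = sum(exps)
    return [e / z for e in exps]


p = feedback_probabilities([-1000.0, -1001.0, -1003.0])
```

Without the shift, `math.exp(-1000.0)` underflows to 0.0 and the division would fail; with it, the top document is exactly e times as likely as the second one.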

In addition, we derive a language model, i.e., a distribution over terms, for each feedback document.

These two parts are aggregated to estimate the term distribution for expansion. We derive the estimator as a mixture of document-specific language models in which the document retrieval probabilities govern the mixing weights:

$$p(t \mid D_i) \propto \frac{\#(t, D_i)}{\#(\cdot, D_i)} \quad (5)$$

$$p(t) = \sum_{i=1}^{n} p(t \mid D_i)\, p(D_i \mid q_1, \ldots, q_m) \quad (6)$$
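The mixture estimator above, followed by selecting the K most probable terms, can be sketched as below. It uses unsmoothed maximum-likelihood document language models; variants of relevance models often smooth these, so treat that choice as an assumption. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def relevance_model(feedback_docs: List[List[str]], doc_probs: List[float],
                    k: int) -> List[Tuple[str, float]]:
    """p(t) = sum_i p(t | D_i) * p(D_i | q); returns the k most probable (t_i, w_i)."""
    p_t: Dict[str, float] = defaultdict(float)
    for terms, p_doc in zip(feedback_docs, doc_probs):
        if not terms:
            continue
        length = len(terms)  # #(., D_i)
        for t, c in Counter(terms).items():
            p_t[t] += (c / length) * p_doc  # maximum-likelihood p(t | D_i), mixed by p(D_i | q)
    ranked = sorted(p_t.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


docs = [["rac1", "invasion", "rac1", "cells"], ["rac1", "migration"]]
top = relevance_model(docs, [0.75, 0.25], 2)
```

Because each document model sums to one and the mixing weights sum to one, the full term distribution also sums to one; the top term here is "rac1" with weight 0.5.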

The K most probable terms $t_i$ under this distribution, together with weights $w_i = p(t_i)$, are used as expansion terms.

2.4 Learning Hyperparameters

We exploit the fact that an SDM retrieval model with query expansion belongs to the family of log-linear models, which can be estimated efficiently with a learning-to-rank approach [7]. We represent each document by a feature vector with four entries: the document's score under the unigram, bigram, window-bigram, and expansion models. We use the document relevance assessments from the training set to estimate a log-linear learning-to-rank model.

In this work we use the coordinate ascent learner from the RankLib¹ package, optimizing for mean average precision (MAP).

The weights of the optimal learning-to-rank model are also the optimal settings of $\lambda_{uni}$, $\lambda_{bi}$, $\lambda_{wbi}$, and $\omega$ for the retrieval model. When the SDM model is expanded with multiple expansion models, this learning-to-rank approach generalizes by adding one feature, and thus one weight, per expansion model.
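To make the learning step concrete, here is a simplified coordinate ascent that tunes linear feature weights to maximize MAP. It is a sketch of the general technique, not RankLib's implementation, which adds refinements (for example random restarts) that we omit; the step sizes, round limit, and names are our assumptions.

```python
from typing import List, Sequence, Tuple

# One query = list of (feature vector, relevance label) for its candidate documents.
Query = List[Tuple[Sequence[float], int]]


def average_precision(labels: List[int]) -> float:
    """AP of a ranked label list (relevant = label > 0), over relevant candidates only."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(weights: Sequence[float], queries: List[Query]) -> float:
    """MAP after ranking each query's candidates by the linear score <w, x>."""
    aps = []
    for docs in queries:
        ranked = sorted(docs, key=lambda d: -sum(w * x for w, x in zip(weights, d[0])))
        aps.append(average_precision([rel for _, rel in ranked]))
    return sum(aps) / len(aps)


def coordinate_ascent(queries: List[Query], n_features: int,
                      steps=(-1.0, -0.1, -0.01, 0.01, 0.1, 1.0), rounds: int = 10):
    """Change one weight at a time; keep any change that raises MAP."""
    weights = [1.0 / n_features] * n_features
    best = mean_average_precision(weights, queries)
    for _ in range(rounds):
        improved = False
        for i in range(n_features):
            for step in steps:
                candidate = list(weights)
                candidate[i] += step
                score = mean_average_precision(candidate, queries)
                if score > best:
                    weights, best, improved = candidate, score, True
        if not improved:
            break
    return weights, best


query = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.9, 0.0], 1), ([0.0, 0.95], 0)]
weights, best = coordinate_ascent([query], 2)
```

In the toy query, feature 0 separates relevant from non-relevant documents, so the learner shifts weight toward it and reaches a MAP of 1.0 from an initial 0.75.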

This leaves only the Dirichlet smoothing parameter $\mu$ of the SDM, and the number of feedback documents n and the number of expansion terms K for each expansion model, to be tuned by grid search.

¹ http://people.cs.umass.edu/~vdang/ranklib.html

3 Retrieval Approaches

In this section we detail how retrieval and query expansion approaches are combined to leverage figure information for a first pass of biomedical text retrieval. We discuss re-ranking techniques in Section 4. We refer to the documents of the target collection as full documents, because we additionally extract pseudo-documents for figures and abstracts.

3.1 Indexes

From the full documents in the collection, we create different retrieval indexes.

The full document index contains the documents of the PubMed Central collection; the task is to retrieve relevant documents from this collection. The collection is converted into JSON format using the conversion tool provided by the BioASQ organizers. We index all visible text as-is while preserving character offsets and section information. The document preprocessing uses a special tokenizer that preserves the names of chemical compounds, genes and pathways.

We identify all figures in the original PubMed Central format and extract so-called figure documents for each of them. A figure document includes the caption of the figure and the sentences that reference the figure. In separate fields we also include sentences within a window of one and two sentences away from a figure reference. We use the figure documents for query expansion and feature generation.
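A figure document as described above could be assembled roughly as follows. This sketch makes two assumptions of its own: figure references are detected with a regular expression on labels such as "Figure 2" or "Fig. 2" (the PMC markup offers explicit cross-references that a real system would more likely use), and the window fields exclude the referencing sentences themselves. The function name is hypothetical.

```python
import re
from typing import Dict, List


def figure_document(sentences: List[str], caption: str, fig_label: str) -> Dict[str, List[str]]:
    """Fields: caption, referencing sentences, and +/-1 and +/-2 sentence windows."""
    pattern = re.compile(rf"\b(?:Figure|Fig\.)\s*{re.escape(fig_label)}(?!\d)", re.IGNORECASE)
    refs = [i for i, s in enumerate(sentences) if pattern.search(s)]
    ref_set = set(refs)

    def window(k: int) -> List[str]:
        # Sentences at most k positions from any reference, excluding the references.
        idx = sorted({j for i in refs for j in range(i - k, i + k + 1)
                      if 0 <= j < len(sentences) and j not in ref_set})
        return [sentences[j] for j in idx]

    return {"caption": [caption],
            "references": [sentences[i] for i in refs],
            "window1": window(1),
            "window2": window(2)}


sentences = ["Cells were imaged.", "Rac1 depletion reduced invasion (Figure 2).",
             "This effect was IGF-I dependent.", "Controls are shown in Fig. 3.",
             "We conclude Rac1 is required."]
fd = figure_document(sentences, "Figure 2. Invasion assay.", "2")
```

The negative lookahead `(?!\d)` keeps "Figure 2" from matching "Figure 21" while still matching panel labels such as "Figure 2A".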

In order to compare the expressiveness of figure documents to that of abstracts, we also create an index of abstracts that we swap in as a replacement for figure documents.

3.2 Document Retrieval

The most basic retrieval method uses the given query Q to obtain a ranking of full documents under the sequential dependence model. This ranking can be output directly [UMass-irSDM] or passed to a feature-based re-ranking method (described in Section 4).

We can improve the ranking by expanding the original query with expansion terms (yielding query Q′) before ranking the full documents. To expand the query with pseudo-relevance feedback, we have different options. We can use the figure document index [FigDoc Query Expansion] to retrieve a feedback run, compute term distributions according to the relevance model, and expand the query Q. Applying the same approach to the index of abstracts yields the method [Abstract Query Expansion].

As an external source of synonyms we can also use Wikipedia. For this we create a full-text index of a Wikipedia snapshot from January 2012, which contains articles on many different entities, some of them from the biomedical domain. We issue the original query against our Wikipedia index and apply standard pseudo-relevance feedback [Wiki Query Expansion].

Alternatively, we expand the query using an external synonym dictionary. In this study we use the Unified Medical Language System (UMLS) [4, 2]. We look up all query terms $q_i$ and all query bigrams $q_i q_{i+1}$ in the UMLS dictionary to build a pool of expansion terms. Prioritizing terms that are returned by more than one lookup, we select K expansion terms [UMLS Query Expansion].
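The pooling and prioritization step can be sketched as follows. The `synonyms` dictionary is a hypothetical stand-in for UMLS lookups (a real system would query the UMLS Metathesaurus), and breaking ties alphabetically and excluding terms already in the query are our own choices.

```python
from collections import Counter
from typing import Dict, List


def umls_expansion_terms(query: List[str], synonyms: Dict[str, List[str]], k: int) -> List[str]:
    """Pool synonyms of all query unigrams and bigrams; prefer terms found by several lookups."""
    keys = list(query) + [f"{a} {b}" for a, b in zip(query, query[1:])]
    pool: Counter = Counter()
    for key in keys:
        for term in set(synonyms.get(key, [])):  # count each lookup at most once per term
            if term not in query:
                pool[term] += 1
    # More lookups first; alphabetical order breaks ties deterministically.
    ranked = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


synonyms = {  # hypothetical lookup results
    "kidney": ["renal", "nephric"],
    "stones": ["calculi", "lithiasis"],
    "kidney stones": ["nephrolithiasis", "renal calculi", "renal"],
}
terms = umls_expansion_terms(["kidney", "stones"], synonyms, 3)
```

"renal" is returned by both the unigram and the bigram lookup, so it is ranked first.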

In all approaches we learn the SDM parameters and the expansion weight $\omega$ using 25% of the TREC Genomics queries as training data. We tune the hyperparameter $\mu$ of the sequential dependence model by grid search on another 25% of the TREC queries as validation data. We select the best-performing $\mu$ together with the corresponding $\lambda$s and $\omega$, and keep them fixed for the remainder of the experiment.

3.3 Snippet Retrieval

For the snippet retrieval task, the goal is to break the relevant documents down into passages that are likely to contain the answer. In the field of Information Retrieval this problem is known as answer passage retrieval.

The passage retrieval approach applies the document retrieval model to consecutive text segments inside the document to create a ranking at the sub-document level. We use windows of 50 words, shifted through the document in increments of 25 words. For efficiency reasons we only consider top-ranked documents for passage retrieval.
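The sliding-window segmentation and the selection of the best passage per document can be sketched as below. Adding one final window so that the tail of the document is covered is our assumption, and the scorer is a placeholder for the retrieval model (e.g. the SDM score of the window).

```python
from typing import Callable, List, Tuple


def passages(tokens: List[str], size: int = 50, step: int = 25) -> List[Tuple[int, int]]:
    """(start, end) token offsets of windows of `size` tokens shifted by `step`."""
    if len(tokens) <= size:
        return [(0, len(tokens))]
    spans = [(s, s + size) for s in range(0, len(tokens) - size + 1, step)]
    if spans[-1][1] < len(tokens):  # assumption: one extra window covers the tail
        spans.append((len(tokens) - size, len(tokens)))
    return spans


def max_passage(tokens: List[str], score: Callable[[List[str]], float],
                size: int = 50, step: int = 25) -> Tuple[Tuple[int, int], float]:
    """Highest-scoring window of a document and its score."""
    best = max(passages(tokens, size, step), key=lambda sp: score(tokens[sp[0]:sp[1]]))
    return best, score(tokens[best[0]:best[1]])


tokens = [f"w{i}" for i in range(120)]
toy_score = lambda p: float(p.count("w110"))  # placeholder scorer for illustration
```

Keeping token offsets rather than copies of the text makes it straightforward to map the winning window back to section IDs and character offsets afterwards.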

In the following, we only consider the highest-ranking passage of each document (called Max-Passage).

4 Feature-based Re-ranking Approaches

The ranking of full documents created by the methods in Section 3 can be further improved with a supervised re-ranking approach. We use four main classes of features. IR Features (Table 2) are derived from the retrieval score under the unigram, bigram, windowed bigram, and expansion models. The Fiat Document Features (Table 3) are based on similarity measures between the query and a semi-structured representation of the full document; figure captions are included in the text but not treated in any special way. The Fiat Figure Features (Table 4) are designed to capture the similarity of the query to figure-related information available in the semi-structured document. The fourth category is Figure Document Features (Table 5), which are derived by retrieving figure documents (or abstracts), generating features for every figure, and aggregating across figures within the same document. A full list of features can be found in the appendix.

The main idea behind the figure and figure document features is to use figures as a way to easily isolate important text. Articles contain a lot of technical content, such as related work sections or details on the experimental setup, that is not necessarily relevant to the question being asked and can skew search results. Figures and figure-related passages, on the other hand, usually describe an important finding of the article. Here, we use the index of figure documents to extract features capturing the essence of these findings. The query is issued against the FigDoc index, and we keep track of how many figures of the respective document we retrieve and at which ranks. We also keep track of whether high-ranking figures are referenced from the highest-scoring passage, and measure the textual similarity between the passage and high-ranked captions. This allows us to separate false positives from true positives: an article may be ranked highly because of something discussed in the related work or future work sections, whereas an article that is ranked slightly lower but has relevant figure documents may be the more relevant document.

Table 5: Figure Document features.

Type | Description
FigDoc | Average score of figure documents for a given document
FigDoc | Average rank of figure documents for a given document
FigDoc | Total number of figure documents returned
FigDoc | Number of figure documents returned at rank 1
FigDoc | Number of figure documents returned at rank 3
FigDoc | Number of figure documents returned at rank 5
FigDoc | Number of figure documents returned at rank 10
FigDoc | Number of figure documents returned at rank 20
FigDoc | Number of figure documents returned at rank 50
FigDoc | Number of figure documents returned at rank 100
FigDoc | Number of figure documents returned at rank 1000
FigDoc | Maximum score of returned figure documents
FigDoc | Minimum rank of returned figure documents
FigDoc | Average reciprocal rank of returned figure documents
FigDoc | Maximum reciprocal rank of returned figure documents
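The aggregation behind the Figure Document features in Table 5 can be sketched as follows: retrieve figure documents for the query, then summarize, per article, the ranks and scores of its figures. Reading "returned at rank k" as "within the top k" and using 0.0 for articles without any retrieved figure are our assumptions; the feature names are hypothetical.

```python
from typing import Dict, List, Tuple

CUTOFFS = (1, 3, 5, 10, 20, 50, 100, 1000)


def figdoc_features(ranking: List[Tuple[str, float]], doc_id: str) -> Dict[str, float]:
    """ranking: (article_id, score) of retrieved figure documents, best first."""
    hits = [(rank, score) for rank, (art, score) in enumerate(ranking, start=1) if art == doc_id]
    feats: Dict[str, float] = {"total": float(len(hits))}
    for k in CUTOFFS:
        feats[f"at_{k}"] = float(sum(1 for r, _ in hits if r <= k))  # "at rank k" read as top k
    if hits:
        ranks = [r for r, _ in hits]
        scores = [s for _, s in hits]
        feats.update(avg_score=sum(scores) / len(scores),
                     avg_rank=sum(ranks) / len(ranks),
                     max_score=max(scores),
                     min_rank=float(min(ranks)),
                     avg_rr=sum(1.0 / r for r in ranks) / len(ranks),
                     max_rr=1.0 / min(ranks))
    else:  # assumption: neutral defaults when no figure of the article was retrieved
        feats.update(avg_score=0.0, avg_rank=0.0, max_score=0.0,
                     min_rank=0.0, avg_rr=0.0, max_rr=0.0)
    return feats


ranking = [("PMC1", -5.0), ("PMC2", -5.5), ("PMC1", -6.0), ("PMC3", -7.0)]
f = figdoc_features(ranking, "PMC1")
```

Article PMC1 has figures at ranks 1 and 3, so its average reciprocal rank is (1 + 1/3) / 2 = 2/3.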

We also use features that consider the document as a whole. We generate binary quality indicators, e.g., whether a document has figures, citations, and tables. We also generate features about the passages, such as the number of figure references, the number of citation references, the number of table references, and the sum of all references in a passage. Binary features also indicate whether or not a passage is in a figure caption or in a document abstract.

Most of the generated features compare the tokens in the query to the tokens of some part of the document. Two measures are used for this: Query Cover and TF-IDF. Query Cover is the proportion of query tokens that appear in a particular part of the document. TF-IDF is similar, but each token is weighted by how rare it is in the corpus: a token that appears infrequently in the corpus but often in a part of the document receives a higher score than a token that is common in the corpus. These measures are evaluated over different segments of the document: we compare the query to the document abstract, to sentences in the document that reference a figure, to a window of sentences around a figure reference, to figure captions, and to sentences in the document that reference a citation or table.
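Both similarity measures can be sketched in a few lines. The text does not give the exact TF-IDF formula, so the variant below (raw term frequency times log(N / df), with df = 1 for unseen terms) is an assumption, as is computing Query Cover over distinct query tokens.

```python
import math
from collections import Counter
from typing import Dict, List


def query_cover(query: List[str], segment: List[str]) -> float:
    """Fraction of distinct query tokens that occur in the segment."""
    q = set(query)
    return len(q & set(segment)) / len(q) if q else 0.0


def tfidf_similarity(query: List[str], segment: List[str], df: Dict[str, int], n_docs: int) -> float:
    """Sum over query tokens of tf(t, segment) * log(N / df(t)) (assumed variant)."""
    tf = Counter(segment)
    return sum(tf[t] * math.log(n_docs / df.get(t, 1)) for t in set(query))


query = ["rac1", "cancer", "invasion"]
df = {"rac1": 10, "cancer": 5000, "invasion": 800}
segment = ["rac1", "drives", "cancer", "cell", "migration"]
```

A segment mentioning the rare token "rac1" scores much higher under TF-IDF than one mentioning only the frequent token "cancer", whereas Query Cover treats both tokens alike.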

5 Experimental Evaluation

We train and validate our methods on the test sets of the TREC Genomics track from 2006 and 2007. Both test sets use a collection of 162,259 documents from 59 biomedical journals published by Highwire Press. The documents are provided as raw HTML, with several download errors and partial documents. The 2006 collection comprises 27 queries and the 2007 collection includes 35 queries.

Table 6: Overview of the evaluated methods and the components each one uses.

In the following, we use a development set comprising the union of the first half of the queries from both the 2006 and 2007 test collections for feature development and hyperparameter tuning. We report results on both the development set and the combined test sets from 2006 and 2007.

Figure 1: Document MAP on the development set for IR SDM, IR RM, Rerank IR, Rerank Doc, Rerank Fig, Rerank FigDoc, Rerank All, and All no RM.

5.1 Retrieval Hyperparameters

Settings of hyperparameters for retrieval models are determined on the BioASQ training data, which we further subdivide into a 50% training fold for the log-linear model and a 50% validation fold. We train the sequential dependence parameters $\lambda_{uni}$, $\lambda_{bi}$, $\lambda_{wbi}$ and the relevance model balance weight $\omega$ as a log-linear model with coordinate ascent (using the RankLib package) on the training fold. We tune the Dirichlet smoothing parameter $\mu$ over the values 100, 1000, 2000, 2500, and 3000 on the validation fold.

The parameter settings differ between the submitted systems. As we aggregate more BioASQ training data from the previous batch submissions (queries for Task 2b, Phase B), the parameters also change across batches. A detailed list of which parameters were used in which batch is given in Table 9.

5.2 Retrieval and Reranking Methods

We study the impact of different components on the overall document retrieval effectiveness by omitting some components from the pipeline, as indicated in Table 7. The most complete method, referred to as All-Figdoc-UMLS, includes all elements of our pipeline: query expansion on the Figure Document index, retrieval of full documents with the expanded query, and generation of various features for re-ranking. The feature sets include scores from the IR system and text-only features, in addition to figure-related features extracted from the full documents and Figure Documents.

5.3 Training Supervised Re-ranking on TREC Genomics

As only few BioASQ training queries have more than 10 positive documents in the PubMed Central collection, we were hesitant to train the supervised re-ranking model on them. Instead, we learn the parameter vector for feature-based re-ranking on the TREC Genomics test queries from 2006 and 2007, which are judged on the corpus of Highwire publications. We use 50% of the TREC queries for training the re-ranker. As the re-ranker depends on the IR hyperparameters, we apply the tuning heuristic above to 25% of the TREC queries (yielding $\lambda_{uni} = 0.77$, $\lambda_{bi} = 0.005$, $\lambda_{wbi} = 0.037$, $\omega = 0.20$, and $\mu = 2500$).

5.4 Evaluation on TREC Genomics

We study different components of our methods on the TREC Genomics holdout set. We compare the Rerank All method (corresponding to system All-Figdoc-UMLS) to variants of this approach that omit certain feature classes or steps in the retrieval pipeline. An overview of the evaluated methods is given in Table 6.

The official evaluation metric of the TREC Genomics test set is mean average precision (MAP) on the document ranking. The results on the development set are presented in Figure 1. The re-ranking approaches gain a noticeable improvement, whereas the differences between feature sets are negligible. A paired t-test at significance level $\alpha = 5\%$ confirms that Rerank All and Rerank Doc yield significant improvements over both IR baselines (despite the overlap in error bars).

5.5 Submission to BioASQ

We restrict all rankings to the top 20 documents, and for each document we provide the best-scoring snippet, yielding 20 snippets per system and query. We score snippets with the same retrieval model that we use for document retrieval.

We inspect the top 50 documents and, for each of them, create snippet candidates with a sliding window of 50 terms (shifted by 25 terms); only the candidate with the highest score under the expanded retrieval model is kept. These snippets are re-ranked by their retrieval score under the passage model, and we output only the top 20 snippets. As a consequence, some snippets may stem from documents that are not among the 20 submitted documents.

The term windows are converted to section IDs and character offsets. In the batch 1 submission, we did not account for whitespace and XML formatting correctly; this was corrected for all remaining batches.

Table 7: Pipeline components of the submitted systems UMass-irSDM, Doc-Figdoc-UMLS, All-Figdoc-UMLS, All-Figdoc, and All-Abstract-UMLS across batches B1-B5.

We modified some components across the submitted batches to maximize our knowledge gain given the limit of 5 submitted systems. In particular, we varied the external source for query expansion, switching from UMLS to Wikipedia. This change is indicated in Table 7.

Timing. The methods were run on a Grid Engine cluster in which each node has a 2.21 GHz Intel Xeon CPU with 10 GB of RAM (much more than necessary). Averaging the CPU time over 100 queries, we observe 21 seconds for irSDM, 35 seconds for All-FigDoc-UMLS (with Wikipedia expansion), 41 seconds for All-Abstract-UMLS, 25 seconds for All-FigDoc, and 36 seconds for Doc-Figdoc-UMLS.

Results. After observing very low scores for all our systems in the official preliminary results, we manually inspected the quality of the snippets predicted at ranks one and two for 25 queries of batch 5, obtained by the irSDM method. Table 10 displays some of the relevant snippets. We notice that many of the corresponding documents are not listed in the gold standard. Exceptions are the query on archaeal genomes, where we found a much more descriptive snippet than the one provided in the gold standard, and the query on Gray platelet syndrome, where our passage includes the ground-truth passage.

We perform a more elaborate annotation on a subset of nine queries from batch 3 (irSDM). The results, measured as snippet precision at rank 10 (P@10), are presented in Table 8. The precision varies between 10% and 70%, but every query has non-zero precision. A common mistake occurs when questions ask about a particular brand of medicine or active ingredient: in such cases, a large percentage of retrieved snippets are about the disease in general but do not mention the brand or ingredient. In the future, we intend to address this by identifying such required words with an NLP tagger, such as conditional random fields, and discarding snippets that do not contain the required word.

Table 8: Snippet P@10 for nine queries of batch 3 (irSDM).

P@10 per query: 0.1, 0.2, 0.3, 0.6, 0.7, 0.4, 0.1, 0.2, 0.2
Average: 0.3
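For reference, snippet P@10 and the reported average can be reproduced as follows. The `precision_at_k` helper is a hypothetical illustration that treats missing ranks as non-relevant; the nine per-query values are the ones listed for batch 3.

```python
from typing import List


def precision_at_k(relevant_flags: List[bool], k: int = 10) -> float:
    """Fraction of the top-k snippets judged relevant (missing ranks count as non-relevant)."""
    return sum(1 for r in relevant_flags[:k] if r) / k


# Per-query P@10 values for the nine annotated batch 3 queries (irSDM).
p_at_10 = [0.1, 0.2, 0.3, 0.6, 0.7, 0.4, 0.1, 0.2, 0.2]
average = sum(p_at_10) / len(p_at_10)  # 2.8 / 9, about 0.31, reported as 0.3
```

Dividing by k rather than by the number of returned snippets penalizes a system that returns fewer than ten snippets, which matches the usual definition of P@k.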

6 Conclusion

For the UMass BioASQ submission we designed a figure-aware IR system that includes search indexes of full documents as well as of figure captions and references. We use figures as a resource for query expansion and also test external sources such as Wikipedia and UMLS. The retrieval approach is complemented by a supervised learning-to-rank method that includes features from IR, the document, figures, and the retrieval of figure documents.

We evaluate against a strong text-only baseline, which our re-ranking approaches outperform on the development set from the TREC Genomics track. We anticipate that including features from the figure documents in both the retrieval methods and the re-ranking will improve the ranking of both documents and snippets.

Acknowledgements

This work was supported in part by the Center for Intelligent Information Retrieval and in part by UMass Medical School subaward RFS2014051 under National Institutes of Health grant 5R01GM095476-04. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.

Table 10: Relevant snippets found by irSDM at rank one or two.

5319abb166e2b80600002f: Which growth factors are known to be involved in the induction of EMT?
"in emt induction additionally non-smad signaling pathways activated by tgf-β and cross-talk with other signaling pathways including fibroblast growth factor fgf and tumor necrosis factor-α tnf-α signaling play important roles in emt promotion induction of emt in tumor stromal cells by" (PMC 22111550, rank 1)

5319ac18b166e2b806000030: Is clathrin involved in E-cadherin endocytosis?
"plasma membranes we have found here that non-trans-interacting e-cadherin is constitutively endocytosed like integrin ligand-independent endocytosis that the formation of endocytosed vesicles of e-cadherin is clathrin dependent and that e-cadherin but not other cams at ajs and tjs including nectins claudins and occludin is selectively sorted into the endocytosed" (PMC 15263019, rank 1)

5319abc9b166e2b80600002d: Is Rac1 involved in cancer cell invasion?
"cells was clearly demonstrated by rna interference assay rac1 depletion significantly suppressed the frequency of invasion in both quiescent and igf-i-stimulated mda-mb-231 cells this indicates the necessity of rac1 for igf-i-induced cell invasion in the cells overexpression of rac1 has been" (PMC 21961005, rank 1)

5311bcc2e3eabad021000005: Describe a diet that reduces the chance of kidney stones.
"stone promoters and inhibitors reducing deposition and excretion of small particles of caox from the kidney maintaining the antioxidant environment and reducing the chance of them being retained in the urinary tract number of herbal extracts and their isolated constituents have also shown" (PMC 23112535, rank 1)
"for age study on the relationship of an animal-rich diet with kidney stone formation has shown that as the fixed acid content of the diet increases urinary calcium excretion also increases the inability to compensate for animal protein-induced calciuric response may be risk factor for the" (PMC 21369385, rank 2)

530cf4fe960c95ad0c000003: Could Catecholaminergic Polymorphic Ventricular Tachycardia (CPVT) cause sudden cardiac death?
"case of catecholaminergic polymorphic ventricular tachycardia introduction in reid et al.1 discovered catecholaminergic polymorphic ventricular tachycardia cpvt cpvt is known to cause syncope or sudden cardiac death and the three distinguishing features of cpvt has subsequently been described" (PMC 19568611, rank 1)

52fe58f82059c6d71c00007a: Do archaeal genomes contain one or multiple origins of replication?
"genomes in the genus bacillus such positive correlation cannot be explained by the pure c?u/t mutation bias archaeal genomes multiple replication origins are typically assumed for archaeal genome replication multiple origins of replication implies multiple changes in polarity in nucleotide" (PMC 22942672, rank 1)

52e204a998d0239505000012: Which is the definition of pyknons in DNA?
"processed the sequences of the human and mouse genomes using the previously outlined pyknon discovery methodology see methods section as well as ref and generated the corresponding pyknon sets by definition each pyknon is recurrent motif whose sequence has minimum length minimum number of intact" (PMC 18450818, rank 1)

52d8494698d0239505000007: Which genes have been found mutated in Gray platelet syndrome patients?
"nbeal2 is mutated in gray platelet syndrome and is required for biogenesis of platelet alpha-granules platelets are organelle-rich cells that transport granule-bound compounds to tissues throughout the body platelet α-granules the most abundant platelet organelles store large proteins that when released promote platelet adhesiveness haemostasis and wound" (PMC 21765412, rank 1)

52ce531f03868f1b06000031: Are retroviruses used for gene therapy?
"frequently employed forms of gene delivery in somatic and germline gene therapies retroviruses in contrast to adenoviral and lentiviral vectors can transfect dividing cells because they can pass through the nuclear pores of mitotic cells this character of retroviruses make them proper candidates" (PMC 23210086, rank 2)

1. Agarwal, S., Yu, H.: FigSum: automatically generating structured text summaries for figures in biomedical literature. In: AMIA Annual Symposium Proceedings, vol. 2009, p. 6. American Medical Informatics Association (2009)

2. Bodenreider, O.: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 32(Database issue), D267-D270 (Jan 2004)

3. Lavrenko, V., Croft, W.B.: Relevance based language models. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 120-127. SIGIR '01, ACM, New York, NY, USA (2001), http://doi.acm.org/10.1145/383952.383972

4. Lindberg, D.A., Humphreys, B.L., McCray, A.T.: The Unified Medical Language System. Methods of Information in Medicine 32(4), 281-291 (Aug 1993)

5. Liu, F., Yu, H.: Learning to rank figures within a biomedical article. PLoS ONE 9(3) (Mar 13 2014)

6. Metzler, D., Croft, W.B.: A Markov random field model for term dependencies. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 472-479. SIGIR '05, ACM, New York, NY, USA (2005), http://dx.doi.org/10.1145/1076034.1076115

7. Metzler, D., Croft, W.B.: Linear feature-based models for information retrieval. Inf. Retr. 10(3), 257-274 (Jun 2007), http://dx.doi.org/10.1007/s10791-006-9019-z

8. Yu, H., Liu, F., Ramesh, B.P.: Automatic figure ranking and user interfacing for intelligent figure search. PLoS ONE 5(10), e12983 (2010)