<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UMass at BioASQ 2014: Figure-inspired Text Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jesse Lingeman</string-name>
          <email>lingeman@cs.umass.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Dietz</string-name>
          <email>dietz@cs.umass.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, University of Massachusetts</institution>
          ,
          <addr-line>Amherst</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>1296</fpage>
      <lpage>1310</lpage>
      <abstract>
        <p>Building on our experience with the retrieval of figures and with figure summarization using sentences from the text, we study the utility of figure-based features and techniques for text retrieval. Figure-based approaches are compared to approaches using abstracts instead of figures. We also explore two different relevance models: one built using the Unified Medical Language System (UMLS) and one built using Wikipedia. We conduct several experiments exploring different feature combinations using a model built with the TREC Genomics track for submission to the 2014 BioASQ competition.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The BioASQ competition is about answering biomedical questions by extracting
information from research publications on Pubmed. BioASQ offers several
subtasks to participate in: retrieving Pubmed documents that contain an answer,
retrieving snippets from those documents that contain an answer, retrieving
relevant concepts or RDF triples, and extracting the answer from all retrieved
material.</p>
      <p>
        In a cooperation between the Center for Intelligent Information Retrieval at
UMass Amherst and the BioNLP group at UMass Medical School in Worcester,
we developed a figure-inspired text retrieval method as a new way of retrieving
documents and text passages from biomedical publications. Our method is based
on the insight that for biomedical publications, the figures play a central role, up
to the point where their captions and references provide abstract-like summaries
of the paper. In this work we build on our experience with figure summarization
and figure ranking algorithms [
        <xref ref-type="bibr" rid="ref1 ref5 ref8">5,8,1</xref>
        ].
      </p>
      <p>We are test driving our figure-inspired retrieval method in the BioASQ
competition, where we focus our participation on document and snippet retrieval.
As figures are the center of our attention, our methods rely on the
availability of full text, e.g. in PMC format. Therefore we only retrieve documents and
snippets contained in Pubmed Central. We notice that the available training
data covers Pubmed Central only sparsely. Most queries in the gold standard
contain just one publication from Pubmed Central; only 13 queries contain at
least 10 documents in Pubmed Central. Since it is infeasible to define a complete
gold standard ahead of time, our mission is to identify new material from PMC
that answers the questions. To demonstrate the existence of relevant material
we show examples of relevant snippets in Table 1 and provide more examples in
the result section.</p>
      <table-wrap>
        <label>Table 1</label>
        <caption><p>Examples of relevant snippets.</p></caption>
        <table>
          <thead>
            <tr><th>Query ID</th><th>Question</th><th>Snippet</th><th>Source</th></tr>
          </thead>
          <tbody>
            <tr><td>5319ac18b166e2b806000030</td><td>Is clathrin involved in E-cadherin endocytosis?</td><td>plasma membranes we have found here that non-trans-interacting e-cadherin is constitutively endocytosed like integrin ligand-independent endocytosis that the formation of endocytosed vesicles of e-cadherin is clathrin dependent and that e-cadherin but not other cams at ajs and tjs including nectins claudins and occludin is selectively sorted into the endocytosed</td><td>PMC 15263019</td></tr>
            <tr><td>5319abc9b166e2b80600002d</td><td>Is Rac1 involved in cancer cell invasion?</td><td>cells was clearly demonstrated by rna interference assay rac1 depletion significantly suppressed the frequency of invasion in both quiescent and igf-i-stimulated mda-mb-231 cells this indicates the necessity of rac1 for igf-i-induced cell invasion in the cells overexpression of rac1 has been</td><td>PMC 21961005</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In the absence of suitable training data on full documents, we develop and
train our method on data from the TREC Genomics track 2006 and 2007. Like
BioASQ Task 2b (phase A), the TREC Genomics task focuses on retrieving
relevant documents and snippets for biomedical questions. The main distinction lies in
the use of the Highwire corpus. After training supervised models on the TREC
data, we apply them to questions posed in the BioASQ competition.</p>
      <p>Our approach takes an Information Retrieval perspective on the problem.
First, query expansion is performed with information from UMLS, Wikipedia,
and figures to enrich the question. Second, a ranking of full documents and
snippets is retrieved from a corpus of articles from Pubmed Central. Third, we
extract features for each document and snippet that indicate its relevance for
the question and re-rank documents and snippets with a supervised learning-to-rank
approach.</p>
      <p>2 Background: Information Retrieval</p>
      <p>This section introduces document retrieval models and query expansion
techniques.</p>
      <p>2.1 Sequential Dependence Model</p>
      <p>
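An early IR method called query likelihood employs an independence
assumption across query terms to score documents with Dirichlet collection
smoothing. For query terms <inline-formula><tex-math>q_1, q_2, \ldots, q_m</tex-math></inline-formula>, each document <inline-formula><tex-math>D</tex-math></inline-formula> in the collection is scored by
a product of scores under each query term.
<disp-formula><label>(1)</label><tex-math>\mathrm{score}_{\mathrm{uni}\,(q_1, q_2, \ldots, q_m)}(D) = \prod_{i=1}^{m} \frac{\#(q_i, D) + \mu \, \frac{\#(q_i, \cdot)}{\#(\cdot, \cdot)}}{\#(\cdot, D) + \mu}</tex-math></disp-formula></p>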
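      <p>We use the notation '<inline-formula><tex-math>\cdot</tex-math></inline-formula>' to denote sums over all possible entries. In particular,
<inline-formula><tex-math>\#(q_i, D)</tex-math></inline-formula> refers to the term frequency of <inline-formula><tex-math>q_i</tex-math></inline-formula> in the given document, <inline-formula><tex-math>\#(q_i, \cdot)</tex-math></inline-formula> refers
to the term frequency of <inline-formula><tex-math>q_i</tex-math></inline-formula> in the corpus, <inline-formula><tex-math>\#(\cdot, D)</tex-math></inline-formula> is the document length,
and <inline-formula><tex-math>\#(\cdot, \cdot)</tex-math></inline-formula> is the number of terms in the collection. The scalar <inline-formula><tex-math>\mu</tex-math></inline-formula> controls the amount
of collection smoothing applied and is a hyperparameter to be estimated. Good
values of <inline-formula><tex-math>\mu</tex-math></inline-formula> are in the range [500, 5000].</p>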
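      <p>As an illustration, the Dirichlet-smoothed query likelihood can be sketched in a few lines of Python. This is a minimal sketch, not the system used in the paper: it works in log space, the function name, the toy documents, and the floor of 0.5 for terms unseen in the corpus are all assumptions made for this example.</p>

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_tokens, corpus_tf, corpus_len, mu=2500.0):
    """Log of the Dirichlet-smoothed query likelihood of one document."""
    doc_tf = Counter(doc_tokens)      # term frequencies #(q_i, D)
    doc_len = len(doc_tokens)         # document length #(., D)
    log_score = 0.0
    for q in query_terms:
        # Background probability #(q_i, .) / #(., .).
        # Assumption: a floor of 0.5 keeps terms unseen in the corpus finite.
        background = max(corpus_tf.get(q, 0), 0.5) / corpus_len
        log_score += math.log((doc_tf[q] + mu * background) / (doc_len + mu))
    return log_score

# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "rac1 depletion suppressed cell invasion".split(),
    "d2": "clathrin dependent endocytosis of e-cadherin".split(),
}
corpus_tf = Counter(t for tokens in docs.values() for t in tokens)
corpus_len = sum(corpus_tf.values())
scores = {name: query_likelihood(["rac1", "invasion"], tokens, corpus_tf, corpus_len)
          for name, tokens in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

      <p>Summing logarithms instead of multiplying probabilities leaves the ranking unchanged and avoids numerical underflow for long queries.</p>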
      <p>
        The query likelihood model is almost always outperformed by the
sequential dependence model [
        <xref ref-type="bibr" rid="ref6">6</xref>
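        ], which also includes exact bigrams and windowed
skip-bigrams. The unigram model above can be generalized to arbitrary count
statistics, such as occurrences of a bigram "<inline-formula><tex-math>q_i \, q_{i+1}</tex-math></inline-formula>" in document <inline-formula><tex-math>D</tex-math></inline-formula>, to derive
<inline-formula><tex-math>\mathrm{score}_{\mathrm{bi}}</tex-math></inline-formula>. Furthermore, counting co-occurrences of the two terms <inline-formula><tex-math>q_i</tex-math></inline-formula> and <inline-formula><tex-math>q_{i+1}</tex-math></inline-formula> in
any order within a window of 8 terms in the document gives rise to the score
under the windowed bigram model, <inline-formula><tex-math>\mathrm{score}_{\mathrm{wbi}}</tex-math></inline-formula>, where the marginal counts in the
denominator, <inline-formula><tex-math>\#(\cdot, D)</tex-math></inline-formula>, are approximated by the document length.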
      </p>
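      <p>The sequential dependence model combines the scores of the document <inline-formula><tex-math>D</tex-math></inline-formula>
under the unigram, bigram, and windowed bigram models as a log-linear model.
<disp-formula><label>(2)</label><tex-math>\mathrm{score}_{\mathrm{SDM}\,(q_1, q_2, \ldots, q_m)}(D) = \lambda_{\mathrm{uni}} \, \mathrm{score}_{\mathrm{uni}}(D) + \lambda_{\mathrm{bi}} \, \mathrm{score}_{\mathrm{bi}}(D) + \lambda_{\mathrm{wbi}} \, \mathrm{score}_{\mathrm{wbi}}(D) = \langle \lambda, \phi(D) \rangle</tex-math></disp-formula></p>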
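      <p>The combination can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: every count statistic is smoothed with the same constant background probability, windowed co-occurrences are counted as pairs of positions fewer than 8 tokens apart, and all function names are hypothetical. The weights are the values reported for TREC Genomics later in the paper.</p>

```python
import math

def ordered_count(a, b, tokens):
    """Occurrences of the exact bigram "a b"."""
    return sum(1 for x, y in zip(tokens, tokens[1:]) if (x, y) == (a, b))

def window_count(a, b, tokens, width=8):
    """Co-occurrences of a and b, in any order, at most width - 1 positions apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and width > abs(i - j))

def smoothed(count, doc_len, background, mu):
    """Dirichlet-smoothed log probability for an arbitrary count statistic."""
    return math.log((count + mu * background) / (doc_len + mu))

def sdm_score(query, tokens, lambdas, background=1e-4, mu=2500.0):
    """Return the SDM score and the feature vector phi(D) of one document."""
    n = len(tokens)
    pairs = list(zip(query, query[1:]))
    phi = (
        sum(smoothed(tokens.count(q), n, background, mu) for q in query),
        sum(smoothed(ordered_count(a, b, tokens), n, background, mu) for a, b in pairs),
        sum(smoothed(window_count(a, b, tokens), n, background, mu) for a, b in pairs),
    )
    # Inner product of the weight vector and the feature vector.
    return sum(w * f for w, f in zip(lambdas, phi)), phi

lambdas = (0.77, 0.005, 0.037)   # uni, bi, wbi weights reported later in the paper
query = ["cell", "invasion"]
doc_a = "rac1 is required for cell invasion in these cells".split()
doc_b = "invasion was observed and every single cell was counted".split()
score_a, phi_a = sdm_score(query, doc_a, lambdas)
score_b, phi_b = sdm_score(query, doc_b, lambdas)
```

      <p>Both toy documents contain the two query terms once and within the window, so they differ only in the exact bigram feature, which ranks the first document higher.</p>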
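      <p>The sequential dependence model requires setting the hyperparameters <inline-formula><tex-math>\lambda_{\mathrm{uni}}</tex-math></inline-formula>, <inline-formula><tex-math>\lambda_{\mathrm{bi}}</tex-math></inline-formula>,
<inline-formula><tex-math>\lambda_{\mathrm{wbi}}</tex-math></inline-formula>, and <inline-formula><tex-math>\mu</tex-math></inline-formula>, where the <inline-formula><tex-math>\lambda</tex-math></inline-formula> weights can be estimated with machine learning.</p>
      <p>2.2 Query Expansion</p>
      <p>Keyword-based retrieval methods such as query likelihood and sequential
dependence fail to retrieve documents that refer to the query terms via synonyms.
A solution is to expand the original query <inline-formula><tex-math>q_1, q_2, \ldots, q_m</tex-math></inline-formula> with additional terms
<inline-formula><tex-math>t_1, t_2, \ldots, t_K</tex-math></inline-formula>, so-called expansion terms. Methods for predicting expansion terms
<inline-formula><tex-math>t_i</tex-math></inline-formula> also provide confidence weights <inline-formula><tex-math>w_i</tex-math></inline-formula>.</p>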
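      <p>An expanded SDM query scores documents <inline-formula><tex-math>D</tex-math></inline-formula> by
<disp-formula><label>(3)</label><tex-math>\mathrm{score}_{Q}(D) = \mathrm{score}_{\mathrm{SDM}\,(q_1, q_2, \ldots, q_m)}(D) + \omega \sum_{i} w_i \, \mathrm{score}_{\mathrm{uni}\,(t_i)}(D)</tex-math></disp-formula></p>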
      <p>The expanded retrieval model introduces another hyperparameter <inline-formula><tex-math>\omega</tex-math></inline-formula>, which
can be estimated along with <inline-formula><tex-math>\lambda</tex-math></inline-formula> using machine learning.</p>
      <p>2.3 Pseudo-relevance Feedback</p>
      <p>
        Additional expansion terms can be derived from external synonym resources
or estimated with pseudo-relevance feedback. In pseudo-relevance feedback the
expansion terms are estimated from the document collection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The approach
is based on the assumption that the un-expanded retrieval model obtained high
precision in the top ranks, but was lacking recall.
      </p>
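      <p>The procedure gathers a feedback ranking <inline-formula><tex-math>D_1, D_2, \ldots, D_n</tex-math></inline-formula> of the documents
in the collection that have the highest score under the un-expanded query,
e.g. <inline-formula><tex-math>\mathrm{score}_{\mathrm{SDM}}(D)</tex-math></inline-formula>.</p>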
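      <p>The next step derives a distribution over terms from the feedback documents.
This involves taking the score of the document <inline-formula><tex-math>D_i</tex-math></inline-formula> to approximate a relative
retrieval probability of <inline-formula><tex-math>D_i</tex-math></inline-formula> compared to the rest of the feedback set.
<disp-formula><label>(4)</label><tex-math>p(D_i \mid q_1, \ldots, q_m) = \frac{1}{\sum_{j=1}^{n} \exp \mathrm{score}_{\mathrm{SDM}}(D_j)} \exp \mathrm{score}_{\mathrm{SDM}}(D_i)</tex-math></disp-formula></p>
      <p>In addition, for each feedback document, a distribution over terms is derived
as a language model.
<disp-formula><label>(5)</label><tex-math>p(t \mid D_i) \propto \frac{\#(t, D_i)}{\#(\cdot, D_i)}</tex-math></disp-formula></p>
      <p>These two parts are aggregated to estimate the term distribution for
expansion. We derive the estimator as a mixture of document-specific language models
where the document retrieval probabilities govern the mixing weights.
<disp-formula><label>(6)</label><tex-math>p(t) = \sum_{i=1}^{n} p(t \mid D_i) \, p(D_i \mid q_1, \ldots, q_m)</tex-math></disp-formula></p>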
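      <p>The feedback estimate of Equations 4 to 6, followed by the selection of the most probable terms, can be sketched as below. This is a minimal sketch: the function name and toy inputs are hypothetical, and the document language model is assumed to be the unsmoothed maximum-likelihood estimate.</p>

```python
import math
from collections import Counter

def relevance_model(feedback_docs, feedback_scores, k=10):
    """Return the k most probable expansion terms with their weights p(t)."""
    # Equation 4: normalize exponentiated retrieval scores over the feedback set.
    # Subtracting the maximum does not change the result and avoids overflow.
    top = max(feedback_scores)
    exp_scores = [math.exp(s - top) for s in feedback_scores]
    z = sum(exp_scores)
    p_doc = [e / z for e in exp_scores]

    p_term = Counter()
    for tokens, weight in zip(feedback_docs, p_doc):
        for term, count in Counter(tokens).items():
            # Equation 5 (assumed maximum-likelihood estimate) mixed by Equation 6.
            p_term[term] += weight * count / len(tokens)
    return p_term.most_common(k)

# Hypothetical feedback documents with their log-scale retrieval scores.
feedback_docs = [
    "rac1 depletion suppressed invasion".split(),
    "rac1 gtpase drives migration".split(),
]
expansion = relevance_model(feedback_docs, feedback_scores=[-4.0, -5.0], k=3)
```

      <p>A term that occurs in several highly scored feedback documents, such as "rac1" in this toy input, accumulates weight from each of them and therefore ends up at the top of the list.</p>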
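      <p>The <inline-formula><tex-math>K</tex-math></inline-formula> most probable terms <inline-formula><tex-math>t_i</tex-math></inline-formula> under this distribution, together with weights
<inline-formula><tex-math>w_i = p(t_i)</tex-math></inline-formula>, are predicted as expansion terms.</p>
      <p>2.4 Learning Hyperparameters</p>
      <p>
        We exploit the fact that an SDM retrieval model with query expansion falls into the family
of log-linear models, which can be efficiently estimated with a learning-to-rank
approach [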
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We represent each document by a feature vector with four entries:
the document's score under the unigram model, as well as under the bigram,
windowed bigram, and expansion models. We use the document relevance assessments from
the training set to estimate a log-linear learning-to-rank model.
      </p>
      <p>In this work we use the coordinate ascent learner from the RankLib<sup>1</sup> package,
optimizing for the metric mean average precision (MAP).</p>
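      <p>To make the optimization target concrete, the following sketch computes MAP and tunes feature weights with a deliberately simplified coordinate ascent. It is not RankLib's implementation: the step sizes, the number of sweeps, the starting weights, and all names are assumptions chosen for this example.</p>

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list of document ids."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def rank_by(weights, features):
    """Sort document ids by the weighted sum of their feature values."""
    def score(doc_id):
        return sum(w * f for w, f in zip(weights, features[doc_id]))
    return sorted(features, key=score, reverse=True)

def coordinate_ascent(queries, judgments, n_features, steps=(0.5, 0.1), sweeps=5):
    """Change one weight at a time and keep a change only if MAP improves."""
    def objective(w):
        total = sum(average_precision(rank_by(w, queries[q]), judgments[q])
                    for q in queries)
        return total / len(queries)

    weights = [1.0] * n_features
    best = objective(weights)
    for _ in range(sweeps):
        for i in range(n_features):
            for step in steps:
                for delta in (step, -step):
                    trial = list(weights)
                    trial[i] += delta
                    value = objective(trial)
                    if value > best:
                        weights, best = trial, value
    return weights, best

# One toy query: feature 0 is misleading, feature 1 points to the relevant document.
queries = {"q1": {"d1": (3.0, 0.0), "d2": (0.0, 2.0)}}
judgments = {"q1": {"d2"}}
weights, best_map = coordinate_ascent(queries, judgments, n_features=2)
```

      <p>On the toy query the search lowers the weight of the misleading feature until the relevant document is ranked first.</p>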
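      <p>The weights of the optimal learning-to-rank model are also the optimal
settings of <inline-formula><tex-math>\lambda_{\mathrm{uni}}</tex-math></inline-formula>, <inline-formula><tex-math>\lambda_{\mathrm{bi}}</tex-math></inline-formula>, <inline-formula><tex-math>\lambda_{\mathrm{wbi}}</tex-math></inline-formula>, and <inline-formula><tex-math>\omega</tex-math></inline-formula> for the retrieval model. When the SDM is
expanded with multiple expansion models, this learning-to-rank approach can be
generalized appropriately.</p>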
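      <p>This reduces the hyperparameters that need to be estimated by grid-tuning
to the Dirichlet smoothing <inline-formula><tex-math>\mu</tex-math></inline-formula> for SDM, and the number of feedback documents <inline-formula><tex-math>n</tex-math></inline-formula> and
the number of expansion terms <inline-formula><tex-math>K</tex-math></inline-formula> for each expansion model.</p>
      <p><sup>1</sup> http://people.cs.umass.edu/~vdang/ranklib.html</p>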
    </sec>
    <sec id="sec-2">
      <title>Retrieval Approaches</title>
      <p>In this section we detail how retrieval and query expansion approaches are
combined to leverage figure information to derive a first pass of biomedical text
retrieval. We discuss reranking techniques in Section 4. We refer to the target
document collection as full documents, as we further extract pseudo-documents
for figures and abstracts.</p>
      <p>3.1 Indexes</p>
      <p>From the full documents in the collection, we create different retrieval indexes.</p>
      <p>The full document index contains the documents in the Pubmed Central
document collection. The task is to retrieve relevant documents from this
collection. The collection is converted into JSON format using the conversion tool
provided by the BioASQ organizers. We index all visible text as-is while
preserving character offsets and section information. The document preprocessing
uses a special tokenizer that preserves the names of chemical compounds, genes,
and pathways.</p>
      <p>We identify all figures in the original Pubmed Central format and extract
so-called figure documents for each of them. The figure document includes the
caption of the figure and the sentences that reference the figure. In separate fields
we also include sentences within a window of one and two sentences away from
a figure reference. We use the figure documents for query expansion and feature
generation.</p>
      <p>In order to compare the expressiveness of figure documents to abstracts, we
also create an index of abstracts that we swap in as a replacement for figure
documents.</p>
      <p>3.2 Document Retrieval</p>
      <p>The most basic retrieval method uses the given query <inline-formula><tex-math>Q</tex-math></inline-formula> to obtain a ranking
of full documents under the sequential dependence model. This ranking can
be output directly [UMass-irSDM], or submitted to a feature-based re-ranking
method (described in Section 4).</p>
      <p>We can improve the ranking by expanding the original query with expansion
terms (to obtain query <inline-formula><tex-math>Q'</tex-math></inline-formula>) to derive a ranking of the full documents. To expand the
query with pseudo-relevance feedback, we have different options. We can employ
the figure document index [FigDoc Query Expansion] to retrieve a feedback run,
compute term distributions according to the relevance model, and expand the
query <inline-formula><tex-math>Q</tex-math></inline-formula>. This approach is also applied to the index of abstract documents to
derive the method [Abstract Query Expansion].</p>
      <p>As an external source of synonyms we can also use Wikipedia. For that
we create a full-text index of a Wikipedia snapshot from January 2012, which
contains articles for different entities, some of which target the biomedical
domain. We issue the original query against our Wikipedia index and apply standard
pseudo-relevance feedback [Wiki Query Expansion].</p>
      <p>
        Alternatively, we expand the query using an external synonym dictionary. In
this study we use the Unified Medical Language System (UMLS) [
        <xref ref-type="bibr" rid="ref2 ref4">4,2</xref>
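        ]. We look
up all query terms <inline-formula><tex-math>q_i</tex-math></inline-formula> and all query bigrams <inline-formula><tex-math>q_i \, q_{i+1}</tex-math></inline-formula> in the UMLS dictionary to
build a pool of expansion terms. Prioritizing terms that are returned by more
than one lookup, we identify <inline-formula><tex-math>K</tex-math></inline-formula> expansion terms [UMLS Query Expansion].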
      </p>
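      <p>The pooling step can be sketched as follows. The dictionary here is a plain Python mapping with made-up entries standing in for UMLS, and both the function name and the alphabetical tie-breaking are assumptions for this illustration.</p>

```python
from collections import Counter

def umls_expansion_terms(query_terms, synonym_lookup, k=5):
    """Pool synonyms of all query terms and bigrams; prefer terms found by several lookups."""
    bigrams = [" ".join(pair) for pair in zip(query_terms, query_terms[1:])]
    votes = Counter()
    for key in list(query_terms) + bigrams:
        # set() ensures that one lookup contributes at most one vote per synonym.
        for synonym in set(synonym_lookup.get(key, [])):
            if synonym not in query_terms:
                votes[synonym] += 1
    # More lookups first; alphabetical order breaks ties (an assumption).
    ranked = sorted(votes, key=lambda term: (-votes[term], term))
    return ranked[:k]

toy_dictionary = {  # hypothetical entries, not real UMLS content
    "kidney": ["renal", "nephric"],
    "stones": ["calculi"],
    "kidney stones": ["nephrolithiasis", "renal calculi", "renal"],
}
terms = umls_expansion_terms(["kidney", "stones"], toy_dictionary, k=3)
```

      <p>In the toy dictionary "renal" is returned by two lookups and is therefore selected first.</p>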
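      <p>In all approaches we learn the SDM parameters <inline-formula><tex-math>\lambda</tex-math></inline-formula> and the expansion weight <inline-formula><tex-math>\omega</tex-math></inline-formula>
using 25% of the TREC Genomics queries as training data. We tune the
hyperparameter <inline-formula><tex-math>\mu</tex-math></inline-formula> of the sequential dependence model using grid-tuning on another 25%
of the TREC queries as validation data. We select the best-performing <inline-formula><tex-math>\mu</tex-math></inline-formula>, together with the corresponding <inline-formula><tex-math>\lambda</tex-math></inline-formula>
and <inline-formula><tex-math>\omega</tex-math></inline-formula>, and keep them fixed for the remainder of the experiment.</p>
      <p>3.3 Snippet Retrieval</p>
      <p>In the snippet retrieval task, the goal is to break down the relevant
documents into passages that are likely to contain the answer. In the field of
Information Retrieval this problem is known under the name Answer-Passage
Retrieval.</p>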
      <p>The passage retrieval approach applies the document retrieval model to
consecutive text segments inside the document to create a ranking on the
sub-document level. We chose passages of 50 words, which are shifted through
the document in increments of 25 words. For efficiency reasons we only consider
documents in the high ranks for passage retrieval.</p>
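      <p>The windowing can be sketched as below, together with the selection of the best-scoring window of a document. The scoring function passed to max_passage is a placeholder for the retrieval model, and the handling of the final, shorter window is an assumption.</p>

```python
def sliding_passages(tokens, size=50, step=25):
    """Yield (start, end, passage_tokens) for overlapping fixed-size windows."""
    if not tokens:
        return
    start = 0
    while True:
        end = min(start + size, len(tokens))
        yield start, end, tokens[start:end]
        if end == len(tokens):
            break   # the last window reaches the end of the document
        start += step

def max_passage(tokens, score_fn, size=50, step=25):
    """Return the highest-scoring window; score_fn stands in for the retrieval model."""
    return max(sliding_passages(tokens, size, step), key=lambda p: score_fn(p[2]))

tokens = ["w%d" % i for i in range(120)]
spans = [(start, end) for start, end, _ in sliding_passages(tokens)]
best = max_passage(tokens, lambda passage: passage.count("w60"))
```

      <p>With a step of half the window size, every token away from the document boundaries is covered by two passages, so an answer that straddles one window boundary is still contained in the neighboring window.</p>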
      <p>For each document, we only consider the highest ranking passage (called
Max-Passage) in the following.</p>
      <p>4 Feature-based Re-ranking Approaches</p>
      <p>The ranking of full documents created by the methods in Section 3 can be further
improved with a supervised re-ranking approach. We use four main classes of
features. IR Features (Table 2) are derived from the retrieval score under the
unigram, bigram, windowed bigram, and expansion models. The Fiat
Document Features (Table 3) are based on similarity measures between the query
and a semi-structured representation of the full document. Figure captions are
included in the text, but not regarded in any special way. The Fiat Figure
Features (Table 4) are designed to capture the similarity of the query to figure-related
information available in the semi-structured document. The fourth category is
Figure Document Features (Table 5), which are derived by retrieving figure
documents (or abstracts), generating features for every figure, and aggregating
across figures within the same document. A full list of features can be found in
the appendix.</p>
      <p>The main idea behind the figure and figure document features is to use figures
as a way to easily isolate important text. There is a lot of technical content in
articles, such as related work sections or details on the experimental setup, that is
not necessarily relevant to the question being asked and can skew search results.
Figures and figure-related passages, on the other hand, usually describe
an important finding of the article. Here, we use the index of figure documents
to extract features capturing the essence of findings. The query is issued against
the FigDoc index and we keep track of how many figures we retrieve for the
respective document and at which ranks. We also keep track of whether high-ranking
figures are referenced from the highest scoring passage, and measure the textual
similarity between the passage and high-ranked captions. This allows us to separate the
false positives from the true positives: an article may be highly ranked because
of something discussed in the related work or future work sections, whereas an
article that is ranked slightly lower but has relevant figure documents may
be the more relevant document.</p>
      <table-wrap>
        <label>Table 5</label>
        <caption><p>Figure Document Features.</p></caption>
        <table>
          <thead>
            <tr><th>Type</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>FigDoc</td><td>Average score of figure documents for a given document</td></tr>
            <tr><td>FigDoc</td><td>Average rank of figure documents for a given document</td></tr>
            <tr><td>FigDoc</td><td>Total number of figure documents returned</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 1</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 3</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 5</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 10</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 20</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 50</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 100</td></tr>
            <tr><td>FigDoc</td><td>Number of figure documents returned at rank 1000</td></tr>
            <tr><td>FigDoc</td><td>Maximum score of returned figure documents</td></tr>
            <tr><td>FigDoc</td><td>Minimum rank of returned figure documents</td></tr>
            <tr><td>FigDoc</td><td>Average reciprocal rank of returned figure documents</td></tr>
            <tr><td>FigDoc</td><td>Maximum reciprocal rank of returned figure documents</td></tr>
          </tbody>
        </table>
      </table-wrap>
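      <p>The aggregation of a figure-document ranking into per-document features can be sketched as follows. The input layout, the feature names, and the reading of "returned at rank k" as "within the top k" are assumptions made for this illustration.</p>

```python
def figdoc_features(figure_hits, doc_id, cutoffs=(1, 3, 5, 10, 20, 50, 100, 1000)):
    """Aggregate a figure-document ranking into features for one full document.

    figure_hits is a list of (rank, score, parent_doc_id) with ranks starting at 1.
    """
    own = [(rank, score) for rank, score, parent in figure_hits if parent == doc_id]
    if not own:
        return {"count": 0}
    ranks = [rank for rank, _ in own]
    scores = [score for _, score in own]
    features = {
        "count": len(own),
        "avg_score": sum(scores) / len(scores),
        "avg_rank": sum(ranks) / len(ranks),
        "max_score": max(scores),
        "min_rank": min(ranks),
        "avg_reciprocal_rank": sum(1.0 / r for r in ranks) / len(ranks),
        "max_reciprocal_rank": 1.0 / min(ranks),
    }
    for k in cutoffs:
        # Assumption: count the figure documents of this article within the top k.
        features["count_at_%d" % k] = sum(1 for r in ranks if k >= r)
    return features

# Hypothetical FigDoc ranking: two figures of PMC1 at ranks 1 and 4.
hits = [(1, -3.2, "PMC1"), (2, -3.9, "PMC2"), (4, -4.4, "PMC1")]
feats = figdoc_features(hits, "PMC1")
```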
      <p>We also use features considering the document as a whole. We generate binary
values for quality indicators, e.g., whether a document has figures, citations, and
tables. We also generate features about the passages, such as number of figure
references, number of citation references, number of table references, and the sum
of all references in a passage. Binary features are also calculated for whether or
not a passage is in a figure caption or in a document abstract.</p>
      <p>Most of the generated features compare the tokens in the query to the tokens
of some part of the document. Two measures are used to do this: Query Cover and
TF-IDF. Query Cover is the proportion of query tokens that
appear in a particular part of the document. TF-IDF is similar, but each token is
weighted by how frequently it appears in the corpus. If a token does not frequently
appear in the corpus, but appears often in a part of the document, it gets a higher
score than if it is a common token in the corpus. These measures are evaluated
over different segments of the document: we obtain scores by comparing the
query to the document abstracts, sentences in the document that reference a
figure, a window of sentences around a figure reference, figure captions, and
sentences in the document that reference a citation or table.</p>
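      <p>Both measures can be sketched in a few lines. The exact TF-IDF weighting is not specified in the text, so the smoothed logarithmic inverse document frequency used here is an assumption, as are the function names and the toy statistics.</p>

```python
import math
from collections import Counter

def query_cover(query_tokens, segment_tokens):
    """Fraction of distinct query tokens that occur in the segment."""
    query = set(query_tokens)
    if not query:
        return 0.0
    return len(query.intersection(segment_tokens)) / len(query)

def tfidf_score(query_tokens, segment_tokens, doc_freq, n_docs):
    """Sum over query tokens of segment term frequency times inverse document frequency."""
    tf = Counter(segment_tokens)
    score = 0.0
    for term in set(query_tokens):
        # Assumed variant: add-one smoothed, logarithmic inverse document frequency.
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))
        score += tf[term] * idf
    return score

query = "clathrin e-cadherin endocytosis".split()
caption = "endocytosis of e-cadherin in epithelial cells".split()
cover = query_cover(query, caption)
score = tfidf_score(query, caption, doc_freq={"endocytosis": 9, "e-cadherin": 4}, n_docs=99)
```

      <p>The same two functions can be applied to each segment type in turn (abstract, figure captions, figure-referencing sentences, and so on) to obtain one feature per measure and segment.</p>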
    </sec>
    <sec id="sec-3">
      <title>Experimental Evaluation</title>
      <p>We train and validate our methods on test sets of the TREC Genomics track
from the years 2006 and 2007. Both test sets make use of a collection of 162,259
documents from 59 biomedical journals published by Highwire Press. The
documents are made available as raw HTML with several download errors and partial
documents. The 2006 collection comprises 27 queries and the 2007 collection
includes 35 queries.</p>
      <p>[Table 6: overview of the evaluated methods IR SDM, IR RM, Rerank IR, Rerank Doc, Rerank Fig, Rerank FigDoc, Rerank All, and All no RM.]</p>
      <p>In the following, we make use of a development set comprising the union of
the first half of queries from both 2006 and 2007 test collections for feature
development and hyperparameter tuning. We report results on both the development
set and the combined test sets from 2006 and 2007.</p>
      <p>[Figure 1: results on the development set (y-axis labeled DOCUMENT, range 0.00 to 0.45) for IR SDM, IR RM, Rerank IR, Rerank Doc, Rerank Fig, Rerank FigDoc, Rerank All, and All no RM.]</p>
      <p>5.1 Retrieval Hyperparameters</p>
      <p>
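Settings of hyperparameters for retrieval models are determined on the BioASQ
training data, which we further subdivide into a 50% training fold for the log-linear model and a
50% validation fold. We train the sequential dependence parameters <inline-formula><tex-math>\lambda_{\mathrm{uni}}</tex-math></inline-formula>, <inline-formula><tex-math>\lambda_{\mathrm{bi}}</tex-math></inline-formula>, <inline-formula><tex-math>\lambda_{\mathrm{wbi}}</tex-math></inline-formula>
and the relevance model balance weight <inline-formula><tex-math>\omega</tex-math></inline-formula> in log-linear model fashion with coordinate
ascent (using the RankLib package) on the training fold. We tune the Dirichlet
smoothing parameter <inline-formula><tex-math>\mu</tex-math></inline-formula> over the values 100, 1000, 2000, 2500, and 3000 on the
validation fold.</p>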
      <p>The parameter settings change with the system. As we aggregate more BioASQ
training data from the previous batch submissions (query for task 2b phase b),
the parameters also change across batches. A detailed list of which parameter
has been used in which batch is given in Table 9.</p>
      <p>5.2 Retrieval and Reranking Methods</p>
      <p>
We study the impact of different components on the overall document retrieval
effectiveness by omitting some components from the pipeline as indicated in
Table 7. The most complete method, referred to as All-FigDoc-UMLS, includes
all elements of our pipeline: query expansion on the Figure Document index,
retrieval of full documents with the expanded query, and generation of various
features for re-ranking. The feature sets include scores from the IR system as well
as text-only features in addition to figure-related features as extracted from the
full documents and Figure Documents.</p>
      <p>5.3 Training Supervised Re-ranking on TREC Genomics</p>
      <p>
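As only a few BioASQ training queries have more than 10 positive documents
in the Pubmed Central collection, we were hesitant to train the supervised
reranking model on them. We learn the parameter vector for feature-based reranking
on the TREC Genomics test sets from 2006 and 2007, which use the
corpus of Highwire publications. We use 50% of the TREC queries for learning
the supervised model. As the supervised model depends on IR hyperparameters, we apply
the tuning heuristic above to 25% of the TREC queries (yielding <inline-formula><tex-math>\lambda_{\mathrm{uni}} = 0.77</tex-math></inline-formula>,
<inline-formula><tex-math>\lambda_{\mathrm{bi}} = 0.005</tex-math></inline-formula>, <inline-formula><tex-math>\lambda_{\mathrm{wbi}} = 0.037</tex-math></inline-formula>, <inline-formula><tex-math>\omega = 0.20</tex-math></inline-formula>, and <inline-formula><tex-math>\mu = 2500</tex-math></inline-formula>).</p>
      <p>5.4 Evaluation on TREC Genomics</p>
      <p>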
We study different components of our methods on the TREC Genomics holdout set.
We evaluate the Rerank All method (corresponding to system
All-FigDoc-UMLS) compared to variants of this approach that omit certain feature
classes or steps in the retrieval pipeline. An overview of the evaluated methods
is given in Table 6.</p>
      <p>The official evaluation metric of the TREC Genomics test set is mean average
precision (MAP) on the document ranking. The results on the development set
are presented in Figure 1. We see that the re-ranking approaches gain a decent
boost, whereas the differences between different feature sets are negligible. With
a paired t-test at significance level <inline-formula><tex-math>\alpha = 5\%</tex-math></inline-formula>, we verify that Rerank All and
Rerank Doc yield significant improvements over both IR baselines (despite the
overlap in error bars).</p>
      <p>5.5 Submission to BioASQ</p>
      <p>
We restrict all rankings to the top 20 documents, and for each document we
provide the best scoring snippet, yielding 20 snippets per system and query. We
score snippets with the same retrieval model that we use for document retrieval.</p>
      <p>Inspecting the top 50 documents, for each document we create snippet
candidates by a sliding window of 50 terms (shifted by 25 terms) and only return the
snippet with the highest score under the expanded retrieval model. The snippets
are reranked by the retrieval score under the passage model and we only
output the top 20 snippets. This means that some snippets might stem from new
documents.</p>
      <p>The term windows are converted to section IDs and character offsets. In the
batch 1 submission, we did not incorporate whitespaces and XML formatting
correctly. This has been corrected for all remaining batches.</p>
      <p>[Table 7: pipeline components used by each submitted system (UMass-irSDM, Doc-FigDoc-UMLS, All-FigDoc-UMLS, All-FigDoc, All-Abstract-UMLS) and the batches (B1 to B5) in which each configuration was submitted.]</p>
      <p>We modified some components across different submitted batches to
maximize our knowledge gain in light of the limitation to 5 submission
systems. In particular we varied the query expansion with external sources, from
using UMLS to Wikipedia. This change is indicated in Table 7.</p>
      <p>Timing. The methods were run on a Grid Engine cluster, each node having a
2.21GHz Intel Xeon CPU with 10GB of RAM (much more than necessary).
Averaging the CPU time of 100 queries, we observe 21 seconds for irSDM, 35 seconds
for All-FigDoc-UMLS (with Wikipedia expansion), 41 seconds for
All-Abstract-UMLS, 25 seconds for All-FigDoc, and 36 seconds for Doc-FigDoc-UMLS.</p>
      <p>Results. After observing an abysmal score for all our systems on the official
preliminary results, we manually inspected the quality of predicted snippets on
rank one and two in 25 queries of batch 5 obtained by the irSDM method.
Table 10 displays some of the relevant snippets. We notice that many of the
documents are not listed in the gold standard. Exceptions are the query on
archaeal genomes, where we found a much more descriptive snippet than the one
provided in the gold standard, and the query on Gray platelet syndrome, where
our passage includes the ground truth passage.</p>
      <p>We perform a more elaborate annotation on a subset of nine queries from
batch 3 (irSDM). The results, measured in snippet precision at rank 10 (P@10),
are presented in Table 8. We see that the precision varies between 10% and 70%,
but all queries have a non-zero precision. One of our common mistakes occurs
when questions ask about a particular brand of medicine or active ingredient.
We notice that in such cases, a large percentage of retrieved snippets are about
the disease in general, but do not mention the brand or ingredient. In the future,
we intend to modify our approach by identifying such required words with an
NLP tagger such as conditional random fields and discard snippets that do not
contain the required word.</p>
      <p>Table 8. Snippet precision at rank 10 (P@10) for the nine annotated queries from batch 3 (irSDM): 0.1, 0.2, 0.3, 0.6, 0.7, 0.4, 0.1, 0.2, 0.2; average 0.3.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>For the UMass BioASQ submission we designed a figure-aware IR system which
includes search indexes of full documents as well as figure captions and
references. We use figures as a resource for query expansion and test external
sources such as Wikipedia and UMLS as well. The retrieval approach is
complemented by a supervised learning-to-rank method that includes features from IR,
the document, figure features, and features from retrieving figure documents.</p>
      <p>We evaluate against a very strong text-only baseline, which is outperformed
on our development test set from the TREC Genomics track. We anticipate that
including features from the figure documents in both the retrieval methods and
in reranking will improve the ranking of both documents and snippets.</p>
      <p>Acknowledgements</p>
      <p>This work was supported in part by the Center for Intelligent Information
Retrieval, in part by UMass Medical School subaward RFS2014051 under National
Institutes of Health grant 5R01GM095476-04. Any opinions, findings and
conclusions or recommendations expressed in this material are those of the authors
and do not necessarily reflect those of the sponsor.
      </p>
      <table-wrap>
        <label>Table 10</label>
        <caption><p>Relevant snippets found at ranks one and two by the irSDM method (batch 5).</p></caption>
        <table>
          <thead>
            <tr><th>Query ID</th><th>Question</th><th>Snippet</th><th>Source</th><th>Rank</th></tr>
          </thead>
          <tbody>
            <tr><td>5319abb166e2b80600002f</td><td>Which growth factors are known to be involved in the induction of EMT?</td><td>in emt induction additionally non-smad signaling pathways activated by tgf-β and cross-talk with other signaling pathways including fibroblast growth factor fgf and tumor necrosis factor-α tnf-α signaling play important roles in emt promotion induction of emt in tumor stromal cells by</td><td>PMC 22111550</td><td>1</td></tr>
            <tr><td>5319ac18b166e2b806000030</td><td>Is clathrin involved in E-cadherin endocytosis?</td><td>plasma membranes we have found here that non-trans-interacting e-cadherin is constitutively endocytosed like integrin ligand-independent endocytosis that the formation of endocytosed vesicles of e-cadherin is clathrin dependent and that e-cadherin but not other cams at ajs and tjs including nectins claudins and occludin is selectively sorted into the endocytosed</td><td>PMC 15263019</td><td>1</td></tr>
            <tr><td>5319abc9b166e2b80600002d</td><td>Is Rac1 involved in cancer cell invasion?</td><td>cells was clearly demonstrated by rna interference assay rac1 depletion significantly suppressed the frequency of invasion in both quiescent and igf-i-stimulated mda-mb-231 cells this indicates the necessity of rac1 for igf-i-induced cell invasion in the cells overexpression of rac1 has been</td><td>PMC 21961005</td><td>1</td></tr>
            <tr><td>5311bcc2e3eabad021000005</td><td>Describe a diet that reduces the chance of kidney stones.</td><td>stone promoters and inhibitors reducing deposition and excretion of small particles of caox from the kidney maintaining the antioxidant environment and reducing the chance of them being retained in the urinary tract number of herbal extracts and their isolated constituents have also shown</td><td>PMC 23112535</td><td>1</td></tr>
            <tr><td>5311bcc2e3eabad021000005</td><td>Describe a diet that reduces the chance of kidney stones.</td><td>for age study on the relationship of an animal-rich diet with kidney stone formation has shown that as the fixed acid content of the diet increases urinary calcium excretion also increases the inability to compensate for animal protein-induced calciuric response may be risk factor for the</td><td>PMC 21369385</td><td>2</td></tr>
            <tr><td>530cf4fe960c95ad0c000003</td><td>Could Catecholaminergic Polymorphic Ventricular Tachycardia (CPVT) cause sudden cardiac death?</td><td>case of catecholaminergic polymorphic ventricular tachycardia introduction in reid et al.1 discovered catecholaminergic polymorphic ventricular tachycardia cpvt cpvt is known to cause syncope or sudden cardiac death and the three distinguishing features of cpvt has subsequently been described</td><td>PMC 19568611</td><td>1</td></tr>
            <tr><td>52fe58f82059c6d71c00007a</td><td>Do archaeal genomes contain one or multiple origins of replication?</td><td>genomes in the genus bacillus such positive correlation cannot be explained by the pure c→u/t mutation bias archaeal genomes multiple replication origins are typically assumed for archaeal genome replication multiple origins of replication implies multiple changes in polarity in nucleotide</td><td>PMC 22942672</td><td>1</td></tr>
            <tr><td>52e204a998d0239505000012</td><td>Which is the definition of pyknons in DNA?</td><td>processed the sequences of the human and mouse genomes using the previously outlined pyknon discovery methodology see methods section as well as ref and generated the corresponding pyknon sets by definition each pyknon is recurrent motif whose sequence has minimum length minimum number of intact</td><td>PMC 18450818</td><td>1</td></tr>
            <tr><td>52d8494698d0239505000007</td><td>Which genes have been found mutated in Gray platelet syndrome patients?</td><td>nbeal2 is mutated in gray platelet syndrome and is required for biogenesis of platelet alpha-granules platelets are organelle-rich cells that transport granule-bound compounds to tissues throughout the body platelet α-granules the most abundant platelet organelles store large proteins that when released promote platelet adhesiveness haemostasis and wound</td><td>PMC 21765412</td><td>1</td></tr>
            <tr><td>52ce531f03868f1b06000031</td><td>Are retroviruses used for gene therapy?</td><td>frequently employed forms of gene delivery in somatic and germline gene therapies retroviruses in contrast to adenoviral and lentiviral vectors can transfect dividing cells because they can pass through the nuclear pores of mitotic cells this character of retroviruses make them proper candidates</td><td>PMC 23210086</td><td>2</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Figsum: automatically generating structured text summaries for figures in biomedical literature</article-title>
          .
          <source>In: AMIA Annual Symposium Proceedings</source>
          . vol.
          <volume>2009</volume>
          , p.
          <fpage>6</fpage>
          .
          <publisher-name>American Medical Informatics Association</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Unified Medical Language System (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>32</volume>
          (
          <issue>Database issue</issue>
          ),
          <fpage>D267</fpage>
          –
          <lpage>D270</lpage>
          (Jan
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lavrenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          :
          <article-title>Relevance based language models</article-title>
          .
          <source>In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>120</fpage>
          –
          <lpage>127</lpage>
          . SIGIR '01,
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA (
          <year>2001</year>
          ), http://doi.acm.org/10.1145/383952.383972
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lindberg</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphreys</surname>
            ,
            <given-names>B.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCray</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          :
          <article-title>The Unified Medical Language System</article-title>
          .
          <source>Methods of Information in Medicine</source>
          <volume>32</volume>
          (
          <issue>4</issue>
          ),
          <fpage>281</fpage>
          –
          <lpage>291</lpage>
          (Aug
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Learning to Rank Figures within a Biomedical Article</article-title>
          .
          <source>PLOS ONE</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ) (Mar 13
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Metzler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          :
          <article-title>A Markov random field model for term dependencies</article-title>
          .
          <source>In: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>472</fpage>
          –
          <lpage>479</lpage>
          . SIGIR '05,
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA (
          <year>2005</year>
          ), http://dx.doi.org/10.1145/1076034.1076115
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Metzler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          :
          <article-title>Linear feature-based models for information retrieval</article-title>
          .
          <source>Inf. Retr</source>
          .
          <volume>10</volume>
          (
          <issue>3</issue>
          ),
          <fpage>257</fpage>
          –
          <lpage>274</lpage>
          (Jun
          <year>2007</year>
          ), http://dx.doi.org/10.1007/s10791-006-9019-z
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramesh</surname>
            ,
            <given-names>B.P.</given-names>
          </string-name>
          :
          <article-title>Automatic figure ranking and user interfacing for intelligent figure search</article-title>
          .
          <source>PLoS One</source>
          <volume>5</volume>
          (
          <issue>10</issue>
          ),
          <elocation-id>e12983</elocation-id>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>