<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Evidence Extraction: Analysis of Scientific Figures from Studies of Molecular Interactions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gully BURNS</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiangyang SHI</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yue WU</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huaigu CAO</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prem NATARAJAN</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>USC Information Sciences Institute</institution>
          ,
          <addr-line>4676 Admiralty Way, Suite 1001, Marina del Rey CA 90292</addr-line>
          ,
          <country country="US">United States of America</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>Scientific figures, captions, and accompanying text provide a valuable resource comprising the evidence generated by a published scientific study. Extracting information pertaining to that evidence requires a pipeline made up of several intermediate steps. We describe machine reading analysis applied to papers that had been curated into the European Bioinformatics Institute's INTACT database of molecular interactions. We unpack the multiple steps of an extraction pipeline that ultimately attempts to automatically identify the type of experiment being performed. We apply machine vision and natural language processing to classify figures and their associated text based on the type of methods used in the experiment, reaching a level of accuracy that can likely support future biocuration tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Extraction</kwd>
        <kwd>Molecular Interactions</kwd>
        <kwd>Biomedical Informatics</kwd>
        <kwd>Image Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Figures in the results sections of experimental research papers serve as the
primary representation of evidence in scientific publications. They anchor the narrative flow
of a paper in data by showcasing relevant aspects that illuminate points in papers’
arguments. As scientists mature, they tend to focus more on the methods and results of
papers, and find figures easier to understand when reading the literature [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Although
experimental findings shown in figures are the most informative and valuable semantic
elements of scientific papers, in-depth knowledge of the domain may be required to
interpret the data correctly. This may make developing semantic representations of figures’
scientific content a less attractive target for information extraction (IE) researchers. Most
existing IE systems work with text and extract information from all claims available in
the text (not just those derived from evidence presented in the paper). Our goal in this
paper is to describe preliminary results from deep learning classification and extraction
work based on text and images pertaining to figures in a well-defined experimental
domain.
      </p>
      <p>
        Molecular interactions are binding events where two molecules join to form a single,
larger “molecular complex”. The European Bioinformatics Institute’s (EBI) INTACT
database provides an open-access, high-quality repository of molecular interactions that
have been manually curated from primary research papers. INTACT links subfigure
references (e.g., 1a, 2b, 5f) of experiments that describe interactions directly to their
database records [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. INTACT, therefore, provides a high-quality resource for IE in this
domain. We previously developed methods to extract ‘evidence fragments’ (i.e., text from
the main narrative of papers pertaining to specific figures) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We report initial efforts to
develop evidence extraction infrastructure. This involves extracting images from PDF files,
breaking them into subfigures, and classifying each based on the type of image. Each of
the various pieces described here should be considered preliminary and will be described
in subsequent technical papers. Here, we focus on the synthesis of these multiple steps into
a workflow.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Detecting and processing scientific figures in biomedical papers using machine vision
techniques is a well-established area of research [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This includes the work of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], who
extracted vector images from PDF files to analyze their substructure. FigSearch [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
classified the text of figure captions to identify ‘schematic representations of protein
interactions and signaling events’ with an F-score of 0.77. The Yale Image Finder system
supported search over data from scientific figures based on OCR analysis of text within
the figures [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. More focused extraction work from the same team was then centered on
molecular gel images given their ubiquity and regular structure [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Our long-term goal
follows their example by applying deep learning to gel-based images to reconstruct
primary measurements made with gels in molecular interaction experiments (see
Discussion).
      </p>
      <p>
        There are a few methods for segmentation of multipanel figures in the literature. In
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], panels are located by a line segment detection algorithm followed by a line
vectorization process that connects broken line segments on the boundary of the panel. As a useful
step to analyze and understand figures in biomedical papers, a caption localization and
recognition algorithm is presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. It is worth noting that the ImageCLEF
competition [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is an evaluation campaign with several image-related tasks; since 2017, these have
included prediction of condensed textual descriptions for biomedical images.
      </p>
      <p>
        The YOLO (“You Only Look Once”) method is a high-performance approach to
object detection in computer vision [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. YOLO is designed for real-time object detection,
and can be trained on user-provided data to detect a customized set of object
classes. This method has been used for medical image analysis [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], but not yet (to our
knowledge) for literature-based IE.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. INTACT Data</title>
        <p>[Figure 1: overview of the data pipeline, from PMC articles (.pdf + .nxml) through cropped figures and subfigures (panels A-D) to captions and subcaptions.]</p>
        <p>
          Our INTACT data set contains 20,065 papers, of which 2,254 were available as part of the
open access subset of PubMed Central’s (PMC) online digital collection. We downloaded
bundled .tar.gz files from the PMC ftp service (available at ftp://ftp.ncbi.nlm.
nih.gov/pub/pmc/), which provided access to both the .nxml and .pdf formatted
versions of each article. We downloaded the original INTACT data records for each
paper in PSI-MI 2.5 format [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] from https://www.ebi.ac.uk/intact/downloads.
        </p>
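        <p>As an illustrative sketch (the bundle path below is hypothetical; real bundle paths are listed in the PMC file lists), one such archive can be retrieved and unpacked as follows:</p>
        <preformat>
# Sketch: fetch and unpack one PMC open-access bundle.
# The bundle path is a hypothetical example; actual paths come from the
# PMC file lists under ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/.
import tarfile
import urllib.request

BASE = "https://ftp.ncbi.nlm.nih.gov/pub/pmc/"   # HTTPS mirror of the FTP site
bundle = "oa_package/aa/bb/PMC1234567.tar.gz"    # hypothetical example path

local = "PMC1234567.tar.gz"
urllib.request.urlretrieve(BASE + bundle, local)

# Keep only the .nxml and .pdf members used by the pipeline.
with tarfile.open(local, "r:gz") as tar:
    for member in tar.getmembers():
        if member.name.endswith((".nxml", ".pdf")):
            tar.extract(member, path="papers")
</preformat>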
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Preprocessing Figure-Based Image and Text Data</title>
        <p>Preprocessing of the text of each paper was performed using the UIMA-BIOC
library (https://github.com/SciKnowEngine/UimaBioC), using regular expressions
to identify and standardize subfigure references within the captions of papers. Subfigure
references are provided in the body of .nxml-formatted papers and can be read directly.</p>
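        <p>A minimal sketch of this step (the pattern below is illustrative; the actual expressions used in UimaBioC may differ) is:</p>
        <preformat>
# Sketch: detect and standardize subfigure references such as "Fig. 2B"
# in caption text. The pattern is illustrative, not the UimaBioC one.
import re

SUBFIG = re.compile(r"\b(?:Fig(?:ure)?s?\.?)\s*(\d+)\s*([A-Za-z])\b")

def standardize(text):
    """Rewrite matches to a canonical form such as 'Figure 2b'."""
    return SUBFIG.sub(lambda m: "Figure " + m.group(1) + m.group(2).lower(),
                      text)

print(standardize("As shown in Fig. 2B and Figure 2c ..."))
# prints: As shown in Figure 2b and Figure 2c ...
</preformat>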
        <p>We used LAPDFText (https://github.com/SciKnowEngine/lapdftext), to which we
added a new figure-extraction capability based on finding captions in PDF files (i.e., text
blocks that start with the word ‘Figure’) and identifying a nearby region of very low
word density on the page. Caption text was painted out by masking individual words
with whitespace, and the region was then cropped from the PDF to provide a bitmap version
of the figure image.</p>
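        <p>The underlying heuristic can be sketched as follows (hypothetical data structures; LAPDFText itself is a Java library operating on word blocks extracted from the PDF):</p>
        <preformat>
# Sketch of the figure-spotting heuristic. A page is represented as a
# list of word boxes (x0, y0, x1, y1, text); structures are hypothetical.

def word_density(words, region):
    """Fraction of the region's area covered by word boxes."""
    rx0, ry0, rx1, ry1 = region
    area = max(rx1 - rx0, 1) * max(ry1 - ry0, 1)
    covered = 0.0
    for x0, y0, x1, y1, _ in words:
        ix = max(0, min(x1, rx1) - max(x0, rx0))
        iy = max(0, min(y1, ry1) - max(y0, ry0))
        covered += ix * iy
    return covered / area

def find_figure_region(words, page_width):
    """Find a caption block starting with 'Figure' and return the
    low word-density band above it as the candidate figure region."""
    captions = [w for w in words if w[4].startswith("Figure")]
    if not captions:
        return None
    y0 = captions[0][1]                 # top edge of the caption block
    candidate = (0, 0, page_width, y0)  # everything above the caption
    if word_density(words, candidate) &lt; 0.05:  # nearly wordless: an image
        return candidate
    return None
</preformat>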
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Data Pipeline</title>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Figure Subpanel Extraction</title>
      </sec>
      <sec id="sec-3-5">
        <title>3.4.1. Simple Baseline: A Heuristic Connected Component Approach</title>
        <p>We developed a heuristic approach for subpanel extraction based on detecting the
uppercase letters that denote each subfigure (‘A’, ‘B’, etc.) and then using a greedy tiling
mechanism that places each letter at the top-left corner of a panel to construct a rectangular
layout for each panel in a figure. Letters are detected using connected component analysis. A
figure is cut into multiple subpanels by straight lines that run along the top or left side of
each detected letter. As a baseline, this is designed to be an easy-to-implement solution
that we use for comparison with more sophisticated methods.</p>
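        <p>A minimal sketch of this baseline, assuming OpenCV and dark panel labels on a light background (the test that a component really is an uppercase panel letter is simplified away), is:</p>
        <preformat>
# Sketch: find letter-sized connected components and cut the figure along
# the top and left edges of each detected label. Distinguishing true panel
# letters ('A', 'B', ...) from other small glyphs is omitted here.
import cv2

def split_panels(path, min_size=8, max_size=40):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    _, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    # Keep components whose size is plausible for an uppercase letter.
    labels = [(x, y) for x, y, w, h, _ in stats[1:]
              if min_size &lt;= w &lt;= max_size and min_size &lt;= h &lt;= max_size]
    # Greedy tiling: each label marks the top-left corner of a panel, so
    # the cut lines run along the detected letters' x- and y-coordinates.
    xs = sorted({0} | {x for x, _ in labels}) + [img.shape[1]]
    ys = sorted({0} | {y for _, y in labels}) + [img.shape[0]]
    return [img[y0:y1, x0:x1]
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]
</preformat>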
      </sec>
      <sec id="sec-3-6">
        <title>3.4.2. Applying and Modifying Convolutional Neural Networks for Subpanel Detection</title>
        <p>
          We applied the YOLO algorithm [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to detect subpanels in scientific figures (rather
than objects in photographic images). Each multipanel figure was resized to a 1:1
aspect ratio and fed into the input layer of YOLO, with all subpanels treated as the
same type of object. The network produces an output indicating the locations of
at most 13 × 13 (i.e., 169) subpanels in the input figure. In this architecture, images
are split horizontally and vertically into finely-divided, regular grids, and the system
uses this grid structure to predict the existence of bounding boxes. Since YOLO locates
bounding boxes by regression, these delineations are sensitive to the center position of
each box, and more finely-grained grids are more likely to give each subpanel an accurate
center point. As a result, YOLO tends to over-segment, erroneously splitting single
subfigures. We implemented a variant of YOLO that handles irregularly distributed grids
more flexibly by introducing constraints on the generated layout of the figure. This work
is ongoing and is reported here in preliminary form.
        </p>
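        <p>A minimal sketch of the grid decoding, assuming one predicted box per cell and a single ‘panel’ class (the per-cell tensor layout is illustrative; YOLO versions differ in their output encodings), is:</p>
        <preformat>
# Sketch: decode a 13x13 YOLO-style output grid into panel bounding boxes.
# The (confidence, cx, cy, w, h) per-cell layout is an illustrative
# simplification of the real YOLO output encoding.
import numpy as np

GRID = 13

def decode(pred, img_size, conf_thresh=0.5):
    """pred: array of shape (GRID, GRID, 5) with values in [0, 1]."""
    boxes = []
    for row in range(GRID):
        for col in range(GRID):
            conf, cx, cy, w, h = pred[row, col]
            if conf &lt; conf_thresh:
                continue
            # Box centers are offsets within their grid cell; widths and
            # heights are relative to the whole (square-resized) figure.
            x = (col + cx) / GRID * img_size
            y = (row + cy) / GRID * img_size
            boxes.append((x - w * img_size / 2, y - h * img_size / 2,
                          x + w * img_size / 2, y + h * img_size / 2, conf))
    return boxes

# Example: decode a random prediction tensor for a 416x416 input figure.
boxes = decode(np.random.rand(GRID, GRID, 5), img_size=416)
</preformat>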
      </sec>
      <sec id="sec-3-7">
        <title>3.5. Image Type Detection</title>
        <p>
          We applied the LeNet image classification algorithm [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] directly to subfigures labeled
as “gel” (for images of gel data), “graph” (for data visualizations with axes such as bar
and line charts), “histology” (for photographic images of tissue), and “diagram” (for any
conceptual diagrams). We hand-annotated subfigures extracted from the INTACT papers
and created binary classifiers for each of the four image types.
        </p>
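        <p>One such binary classifier can be sketched in the LeNet style as follows (tf.keras here; input size and training settings are assumptions rather than the configuration we used, which is in the accompanying research object):</p>
        <preformat>
# Sketch: a LeNet-style binary classifier for one image type (e.g. "gel").
# Input size and hyperparameters are illustrative assumptions.
import tensorflow as tf

def build_lenet(input_shape=(64, 64, 1)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(6, 5, activation="tanh",
                               input_shape=input_shape),
        tf.keras.layers.AveragePooling2D(),
        tf.keras.layers.Conv2D(16, 5, activation="tanh"),
        tf.keras.layers.AveragePooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation="tanh"),
        tf.keras.layers.Dense(84, activation="tanh"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # gel vs. not-gel
    ])

model = build_lenet()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
</preformat>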
      </sec>
      <sec id="sec-3-8">
        <title>3.6. Text Classification of Experimental Type</title>
        <p>
          We processed open access INTACT papers with pattern-based extraction to identify
individual sentences from figure captions that refer to specific subfigures (concatenating
them with captions for the figure as a whole). We matched these caption documents to
INTACT records to yield 3,366 entries with an associated annotation for the types of
methods used to detect molecules and their interactions [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. There were 122 separate
codes for “interaction detection method”, which we grouped into 18 higher-level codes.
Similarly, the INTACT set used 48 separate codes for detecting molecular participants
in interactions, which we simplified to 6 higher-level codes. We then applied document
classification tools based on one-dimensional convolutional neural networks (CNN) and
Long Short-Term Memory networks (LSTM). Source code for each classifier (with
complete configuration details) may be found through the paper’s accompanying research
object descriptor: http://purl.org/ske/ro/semsci18.
        </p>
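        <p>A minimal sketch of the CNN variant (vocabulary size, sequence length, and hyperparameters are assumptions; the exact configurations are in the research object):</p>
        <preformat>
# Sketch: a one-dimensional CNN document classifier over caption tokens.
# Sizes and hyperparameters are illustrative assumptions.
import tensorflow as tf

VOCAB, MAXLEN, CLASSES = 20000, 200, 18  # e.g. 18 interaction-method codes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, 128, input_length=MAXLEN),
    tf.keras.layers.Conv1D(128, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# An LSTM variant replaces the Conv1D/pooling pair with
# tf.keras.layers.LSTM(128).
</preformat>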
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>
        We describe the multiple analyses that together provide the initial stages of a full
information extraction pipeline for evidence from scientific figures. They do not
themselves provide a complete solution, but each contributes a step towards the construction
of such a system. We provide access to code and data for this work as a research object
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]: http://purl.org/ske/ro/semsci18.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Sub-panel Extraction</title>
        <p>Within the preliminary INTACT evidence extraction pipeline, the augmented YOLO
method yields an accuracy of 0.87, compared with our heuristic baseline
(accuracy = 0.78) and plain YOLO without our modifications (accuracy = 0.76).
Subpanel extraction is an essential part of the pipeline for constructing the basic data record
pertaining to each individual piece of evidence in a paper and will be a focus of continued
improvement going forward.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Image Type Detection</title>
        <p>We performed machine learning experiments on manually-tagged subfigure images
from INTACT. Table 1 shows very good performance even with simple, off-the-shelf
image classification technology. In our sample, we were able to detect histological
images with near-perfect accuracy (0.97), charts with an accuracy of 0.92, and gel images
with an accuracy of 0.83. Tagging accuracy for general conceptual diagrams was only
0.40. Given the variety of visual designs that these diagrams can take, this is unsurprising,
and they perhaps require a more finely-divided classification scheme. Table 1 also shows the
number of training and testing examples used in our experiments.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Text Classification of Experimental Type</title>
        <p>
          Table 2 shows how text source (from evidence fragments [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] or subcaptions), number of
classes, and neural network model affected the accuracy of experimental type
classifications. We investigated multi-way classification of the PSI-MI 2.5 experimental
‘participant detection method’ codes (marked ‘Participant’ in Table 2) or ‘interaction detection
method’ (marked ‘Interaction’) for each curated data record in our corpus at two levels
of granularity for both CNN and LSTM classifiers.
        </p>
        <p>
          First, we attempted to reconstruct the INTACT record classification, involving a
large number of target categories (48 for participant methods and 122 for interaction
methods). Our systems generally had quite poor performance for this data. We then
grouped together more finely delineated records into more general categories. For
example, we replaced the low-level category for ‘anti-tag coimmunoprecipitation’ (MI:0007)
with the higher-level category ‘affinity chromatography technology’ (MI:0004) to
provide a coarser classification target. We reduced the number of classification categories
from 48 to 6 for participant detection methods and from 122 to 18 for interaction
detection methods. This improved prediction accuracy for interaction detection methods to 0.83
(using a CNN document classifier) and to 0.75 for participant detection methods. Finally,
we performed a binary tagging classification to identify specific subtypes of method:
coimmunoprecipitation (‘Co-IP’), the most common interaction detection method, and
Western blot (‘WB’), the most common method for participant detection. The
classification accuracy for tagging coimmunoprecipitation experiments was 0.90, and for
western blots it was 0.85. We found that prediction performance was consistently better based on
caption text rather than text from evidence fragments. This is consistent with findings
from previous work [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
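        <p>The grouping step amounts to a many-to-one mapping over PSI-MI codes, as in the following sketch (only the MI:0007 to MI:0004 pair is taken from the text; a full table would be derived from the PSI-MI hierarchy):</p>
        <preformat>
# Sketch: roll fine-grained PSI-MI codes up to coarser targets.
# Only the MI:0007 -> MI:0004 pair comes from the text; other entries
# would be read off the PSI-MI ontology hierarchy.
COARSE = {
    # anti-tag coimmunoprecipitation -> affinity chromatography technology
    "MI:0007": "MI:0004",
}

def coarsen(mi_code):
    return COARSE.get(mi_code, mi_code)

assert coarsen("MI:0007") == "MI:0004"
</preformat>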
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Ultimately, we seek to isolate, model, and extract scientific evidence as a class of
entity distinct from the interpreted ‘facts’ asserted in scientific papers. Scientists spend the majority of their
effort on creating evidence to support mechanistic explanations through experimentation.</p>
      <p>
        Yet, informatics systems rarely support the complete chain of reasoning that supports a
given assertion. Typically, coding schemes such as the Evidence Ontology (ECO)
designate the type of evidence for a given claim (i.e., inferred from data, asserted by
curator, etc.), but do not deal with detailed representations of the evidence itself [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Similarly, the PSI-MI 2.5 codes for interaction and participant detection methods [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
provide a human-generated classification scheme for methods but do not provide any
structures to help understand and interpret data acting as evidence.
      </p>
      <p>
        An important use case is ‘document triage’, in which biocurators prioritize
studies for curation. Typically, this is viewed as a whole-document task [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], but being able to identify
types of individual experiments could provide a powerful, lower-level set of features for
triage.
      </p>
      <p>
        Figure 2 illustrates the desired outcome of what an evidence extraction system
should be able to do: given a scientific publication where experimental work is described
in text and figures, we envisage a system that can (A) identify a semantic model of the
experiment being performed; and (B) populate a tabular representation of the experiment’s
results through the execution of IE technology. We seek to use ontology-based semantic
models as a framework to accomplish this goal [
        <xref ref-type="bibr" rid="ref19 ref20 ref21">19,20,21</xref>
        ].
      </p>
      <p>
        [Figure 2. A manually-curated example of ‘evidence extraction’. A. Flowchart of the
protocol for experiment 1C from [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Dependency relations between independent and dependent variables are shown as red
lines. B. Original gel image showing gel-based measurements of protein concentration
indexed by values of independent variables (‘WB’, ‘IP’, ‘peptides’). C. Desired extracted
data table showing four values corresponding to values enclosed by a red dotted line in B.]
      </p>
      <p>
        As is often the case with work in eScience and biomedical informatics, developing
useful tools for scientists must initially pass through several intermediate steps. The
contribution of this paper is preliminary but provides a clear demonstration of the
feasibility of defining a framework for extracting and classifying types of evidence
pertaining to specific types of experiment.
      </p>
      <p>
        Acknowledgments. This work was funded by the DARPA Big Mechanism program under
ARO contract W911NF-14-1-0436, by the DARPA MEDIFOR project, and by NIH grant
R01LM012592. This material is based on research sponsored by the Air Force Research
Laboratory and the Defense Advanced Research Projects Agency under agreement number
FA8750-16-2-0204. The U.S. Government is authorized to reproduce and distribute reprints
for Governmental purposes notwithstanding any copyright notation thereon. The views and
conclusions contained herein are those of the authors and should not be interpreted as
necessarily representing the official policies or endorsements, either expressed or
implied, of the Air Force Research Laboratory and the Defense Advanced Research Projects
Agency or the U.S. Government. Additionally, we thank Dr. Nanyung Peng for her generous
advice and guidance. Thanks too to Deepthi Devaraj.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Hubbard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Dunbar</surname>
          </string-name>
          ,
          <article-title>Perceptions of scientific research literature and strategies for reading papers depend on academic career stage</article-title>
          .
          <source>PLoS One</source>
          <volume>12</volume>
          , (
          <year>2017</year>
          )
          <fpage>e0189753</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Orchard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ammari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Breuza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Briganti</surname>
          </string-name>
          , et al.,
          <article-title>The MIntAct project-IntAct as a common curation platform for 11 molecular interaction databases</article-title>
          .
          <source>Nucleic Acids Res</source>
          <volume>42</volume>
          , (
          <year>2014</year>
          )
          <fpage>D358</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dasigi</surname>
          </string-name>
          and
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Hovy</surname>
          </string-name>
          .
          <article-title>Extracting Evidence Fragments for Distant Supervision of Molecular Interactions</article-title>
          . SemSci 2017 Workshop, ISWC (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Luong</surname>
          </string-name>
          . and
          <string-name>
            <given-names>M.</given-names>
            <surname>Krauthammer</surname>
          </string-name>
          ,
          <article-title>Finding and accessing diagrams in biomedical publications</article-title>
          .
          <source>AMIA Annu Symp Proc</source>
          <year>2012</year>
          , (
          <year>2012</year>
          )
          <fpage>468</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McCusker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Krauthammer</surname>
          </string-name>
          ,
          <article-title>Yale Image Finder (YIF): a new search engine for retrieving biomedical images</article-title>
          .
          <source>Bioinformatics</source>
          <volume>24</volume>
          , (
          <year>2008</year>
          )
          <fpage>1968</fpage>
          -
          <lpage>1970</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Nagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Luong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Krauthammer</surname>
          </string-name>
          ,
          <article-title>Mining images in biomedical publications: Detection and analysis of gel diagrams</article-title>
          .
          <source>J Biomed Semantics</source>
          <volume>5</volume>
          , (
          <year>2014</year>
          )
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shao</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Futrelle</surname>
          </string-name>
          ,
          <article-title>Recognition and Classification of Figures in PDF Documents</article-title>
          .
          <source>Graphics Recognition. Ten Years Review and Future Perspectives (eds. W. Liu and J. Lladós)</source>
          ,
          <fpage>231</fpage>
          -
          <lpage>242</lpage>
          , Springer Berlin Heidelberg,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-K.</given-names>
            <surname>Jenssen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nygaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sack</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovig</surname>
          </string-name>
          ,
          <article-title>FigSearch: a figure legend indexing and classification system</article-title>
          .
          <source>Bioinformatics</source>
          <volume>20</volume>
          (
          <year>2004</year>
          )
          <fpage>2880</fpage>
          -
          <lpage>2882</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Santosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aafaque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Antani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Thoma</surname>
          </string-name>
          ,
          <article-title>Line Segment-Based Stitched Multipanel Figure Separation for Effective Biomedical CBIR</article-title>
          . IJPRAI
          <volume>31</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Antani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Thoma</surname>
          </string-name>
          ,
          <article-title>Localizing and Recognizing Labels for Multi-Panel Figures in Biomedical Journals</article-title>
          .
          <source>ICDAR</source>
          <volume>14</volume>
          (
          <year>2017</year>
          )
          <fpage>753</fpage>
          -
          <lpage>758</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schaer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bromuri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Overview of the ImageCLEF 2016 medical task</article-title>
          .
          <source>Working Notes of CLEF 2016 (Cross Language Evaluation Forum)</source>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Divvala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.B.</given-names>
            <surname>Girshick</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          ,
          <article-title>You Only Look Once: Unified, Real-Time Object Detection</article-title>
          .
          <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          (
          <year>2016</year>
          )
          <fpage>779</fpage>
          -
          <lpage>788</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>van Rijthoven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Swiderska-Chadaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Seeliger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van der Laak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciompi</surname>
          </string-name>
          ,
          <article-title>You Only Look on Lymphocytes Once</article-title>
          .
          <source>Medical Imaging with Deep Learning</source>
          (
          <year>2018</year>
          ) Amsterdam.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lecun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Haffner</surname>
          </string-name>
          ,
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          <volume>86</volume>
          , (
          <year>1998</year>
          )
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kerrien</surname>
          </string-name>
          , et al.
          <article-title>Broadening the horizon-level 2.5 of the HUPO-PSI format for molecular interactions</article-title>
          .
          <source>BMC Biol</source>
          <volume>5</volume>
          , (
          <year>2007</year>
          )
          <fpage>44</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Hettne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Palma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Klyne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Goble</surname>
          </string-name>
          ,
          <article-title>The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web</article-title>
          .
          <source>CoRR abs/1401</source>
          .4307, (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dasigi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>de Waard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>Automated detection of discourse segment and experimental types from the text of cancer pathway results sections</article-title>
          .
          <source>Database (Oxford)</source>
          (
          <year>2016</year>
          )
          baw122
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Chibucos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Mungall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Christie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Huntley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Blake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Giglio</surname>
          </string-name>
          ,
          <article-title>Standardized description of scientific evidence using the Evidence Ontology (ECO)</article-title>
          .
          <source>Database (Oxford)</source>
          <year>2014</year>
          , (
          <year>2014</year>
          ) bau075
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Russ</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bota</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <article-title>Knowledge Engineering Tools for Reasoning with Scientific Observations and Interpretations: a Neural Connectivity Use Case</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>351</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bandrowski</surname>
          </string-name>
          , et al.
          <article-title>The Ontology for Biomedical Investigations</article-title>
          .
          <source>PLoS One</source>
          <volume>11</volume>
          , (
          <year>2016</year>
          ) e0154556
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Burns</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chalupsky</surname>
          </string-name>
          ,
          <article-title>'It's All Made Up' - Why we should stop building representations based on interpretive models and focus on experimental evidence instead</article-title>
          .
          <source>Discovery Informatics: Scientific Discoveries Enabled by AI</source>
          (
          <year>2014</year>
          ) Quebec City.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Cohen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Hersh</surname>
          </string-name>
          ,
          <article-title>The TREC 2004 genomics track categorization task: classifying full text biomedical documents</article-title>
          .
          <source>J Biomed Discov Collab</source>
          <volume>1</volume>
          , (
          <year>2006</year>
          )
          <article-title>4</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Innocenti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frittoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ponzanelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Falck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Brachmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Di Fiore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Scita</surname>
          </string-name>
          ,
          <article-title>Phosphoinositide 3-kinase activates Rac by entering in a complex with Eps8, Abi1, and Sos-1</article-title>
          .
          <source>J Cell Biol</source>
          <volume>160</volume>
          (
          <year>2003</year>
          )
          <fpage>17</fpage>
          -
          <lpage>23</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>