=Paper= {{Paper |id=Vol-1650/smbm16Sammartino |storemode=property |title=Annotation Process, Guidelines and Text Corpus of Small Non-Coding RNA Molecules: the MiNCor for MicroRNA Annotations |pdfUrl=https://ceur-ws.org/Vol-1650/smbm16Sammartino.pdf |volume=Vol-1650 |authors=Jose Camilla Sammartino,Martin Krallinger,Alfonso Valencia |dblpUrl=https://dblp.org/rec/conf/smbm/SammartinoKV16 }} ==Annotation Process, Guidelines and Text Corpus of Small Non-Coding RNA Molecules: the MiNCor for MicroRNA Annotations== https://ceur-ws.org/Vol-1650/smbm16Sammartino.pdf
Annotation process, guidelines and text corpus of small non-coding RNA molecules: the MiNCor for microRNA annotations

José Camilla Sammartino
Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Italy
j.sammartino.88@gmail.com

Martin Krallinger
Centro Nacional de Investigaciones Oncológicas, Madrid, Spain
mkrallinger@cnio.es

Alfonso Valencia
Centro Nacional de Investigaciones Oncológicas, Madrid, Spain
avalencia@cnio.es




Abstract

MicroRNAs are small non-coding molecules that act as post-transcriptional regulators of gene expression in a wide spectrum of biological states. Most of the information about microRNAs is embedded in unstructured data (text files), and retrieving and analysing it requires specific text mining techniques. These techniques are generally based on supervised (or semi-supervised) learning methods, which need collections of carefully annotated and categorised training data. In this study we propose a comprehensive, granular protocol for the annotation of non-coding RNA molecules, focusing primarily on microRNA mentions. We used this protocol to build three resources: a manually annotated corpus of microRNA mentions (MiNCor Gold), a large semi-automatically generated silver standard corpus of microRNA mentions (MiNCor Silver) and a large microRNA name dictionary. We then evaluated the efficiency of these resources with a named entity recognition (NER) system and compared the results with another freely available corpus of microRNA mentions. The NER system trained on our silver corpus performed better. On the other corpus's test data it reached higher precision (96.67% vs. 94.00%) and recall (97.57% vs. 95.00%), and on ours it also reached higher precision (89.26% vs. 88.97%) and recall (90.03% vs. 86.74%). The corpora and guidelines are freely downloadable at http://zope.bioinfo.cnio.es/mincor/minacor.tar.gz.

1 Introduction

MicroRNAs are small non-coding RNA molecules involved in the post-transcriptional regulation of gene expression. Over the last decade they have been linked to a wide spectrum of biological and developmental processes and diseases, including cancer, metabolic disorders and infectious diseases (Bayoumi et al., 2016; Smith et al., 2015; Pogue et al., 2014; Ohtsuka et al., 2015; Pileczki et al., 2016). They exert this regulation by acting on their target messenger RNAs. MicroRNA maturation takes two steps in two cellular compartments. It starts in the nucleus, where the precursor is cleaved and then exported to the cytoplasm. There a second cleavage produces a double-stranded microRNA of about 22 nucleotides. This double-stranded microRNA is recognised by the RNA-Induced Silencing Complex (RISC) (Stroynowska-Czerwinska et al., 2014). The exact mechanism of action of RISC is not yet fully understood. There is evidence, however, that the microRNA guides RISC to the target messenger RNA (mRNA) because its sequence is complementary to the 3' untranslated region (3' UTR) of the target. Binding to this region enables regulation before or during translation of the messenger into protein, so the protein may or may not be produced (Morozova et al., 2012). The temporal and spatial expression of these molecules matters as much as their expression levels. A change in any of these can dysregulate the biological processes in which the molecules are involved, with effects that can extend to entire biological pathways. Several studies show that correct microRNA post-transcriptional regulation is important to prevent pathological states and developmental defects (Bhaskaran and Mohan, 2014). Their importance is further highlighted by their possible use as fast, specific and non-invasive biomarkers for a wide range of pathological conditions (Rubio et al., 2016; Benz et al., 2016; Larrea et al., 2016). These molecules can also be used as targets in pharmacological therapies and in clinical applications such as diagnosis and follow-up (Lin et al., 2014; Du et al., 2014; Mao et al., 2013). As a result, a growing number of publications are devoted to microRNA biology and to predictive bioinformatics methods for characterising miRNA expression and predicting targets.

Biomedical Natural Language Processing (BioNLP) techniques and text mining strategies can be used to retrieve, filter and analyse knowledge from unstructured data such as the scientific literature. One of the main hurdles in building text mining components is the construction of manually annotated, text-bound corpora. These usually demand a considerable human workload from annotators who combine deep domain knowledge with basic linguistic expertise. Corpus development is time-consuming and tedious, yet text mining and BioNLP methods badly need it (Neves, 2014). To promote advances in BioNLP, several competitive evaluations have been held (Hersh et al., 2004; Kim et al., 2004; Hirschman et al., 2005). In these evaluations different groups took part in tasks ranging from document retrieval and NER to complex relation and event extraction (Hunter and Cohen, 2006). The challenges produced valuable text corpora that the biomedical text mining community has since reused.

Several manually annotated text corpora devoted to biological entities have been released. Even so, no single manual or reference can serve as a general guide for writing specific annotation guidelines. Such guidelines are usually based on the authors' background knowledge and often do not cover every possible case. The annotation process can also be highly variable and complex. It combines different disciplines (medical/biological and linguistic) and serves different aims (chemical compounds, diseases, links between mutated proteins and diseases, case reports). Building a corpus therefore requires dedicated documents that describe the annotation process and define its guidelines.

For microRNAs, several attempts have been made to make it easier to extract information directly from the literature (Bagewadi et al., 2014; Griffiths-Jones et al., 2006; Li et al., 2015; Naeem et al., 2010; Xie et al., 2013). To our knowledge there are three freely available corpora for microRNA (miRNA) mentions. Two of them, miRBase and miRTex (Griffiths-Jones et al., 2006; Li et al., 2015), provide only very short annotation guidelines (Ambros et al., 2003; Meyers et al., 2008; Griffiths-Jones, 2004). These guidelines mostly focus on identifying single mentions and do not consider more granular annotation types. The third, the SCAI corpus (Bagewadi et al., 2014), provides more detail and a set of annotation rules. In our view, however, it underspecifies some relevant annotation criteria and focuses primarily on human microRNA mentions. For instance, it covers species-specific prefixes such as hsa for human miRNAs, but it does not annotate terms such as human when they precede a miRNA mention. Its annotation also leaves out general prefixes (anti-, onco-, pre-, pri-), specific miRNA class names (angiomir, antagomir, isomir) and non-coding RNA names.

Here we propose a comprehensive annotation protocol for labelling microRNA mentions in the biomedical literature. It covers all microRNA mentions regardless of species or origin, maturation step or classification, and it also includes classes for non-coding RNA names and miRNA clusters. We refined the protocol iteratively and then used it to annotate the MiNCor corpus, which served to evaluate several approaches to microRNA mention recognition. We believe the MiNCor guidelines may also be useful as an annotation template for building corpora of other biomedical entities.

We compared our corpus with the SCAI corpus, which to our knowledge has the most comprehensive guidelines so far. To test the efficiency of our corpus, we trained and tested a named entity recognition (NER) system with it and compared the results with those obtained using SCAI. The SCAI training and test sets were the only ones whose characteristics were comparable to ours.
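A toy pattern matcher makes the mention forms contrasted above concrete: species prefixes such as hsa-, general prefixes such as pre- and anti-, class names such as antagomir, and general terms such as microRNA. The Python sketch below is hypothetical and purely illustrative. The three patterns, their labels and the function find_mentions are our own assumptions, loosely modelled on miRBase-style names such as hsa-miR-21-5p. They are not the MiNCor guidelines, lexicon or tagger. A regular expression also cannot settle the boundary and uncertainty decisions that the guidelines exist to resolve.

```python
import re
from typing import List, Tuple

# Hypothetical, illustrative patterns only; not the MiNCor rules or lexicon.
SPECIFIC = re.compile(
    r"\b(?:(?:pre|pri|anti)-)?"           # optional general prefix: pre-, pri-, anti-
    r"(?:[a-z]{3,4}-)?"                    # optional species prefix: hsa-, mmu-, cel-
    r"(?:miR|mir|MIR|let|lin)-\d+[a-z]?"   # core name + number (+ letter): miR-21a, let-7a
    r"(?:-\d+(?![\dp]))?"                  # optional paralog number: -1, -2 (but not -5p)
    r"(?:-[35]p|\*)?"                      # optional arm suffix: -5p, -3p or *
)
GENERAL = re.compile(r"\b(?:micro-?RNAs?|miRNAs?)\b", re.IGNORECASE)
CLASS_NAMES = re.compile(r"\b(?:angio|antago|iso|onco)miRs?\b", re.IGNORECASE)

PATTERNS = [("specific", SPECIFIC), ("general", GENERAL), ("class", CLASS_NAMES)]

Mention = Tuple[int, int, str, str]  # (start, end, surface form, label)


def find_mentions(text: str) -> List[Mention]:
    """Return non-overlapping candidate mentions, preferring earliest then longest."""
    candidates: List[Mention] = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            candidates.append((m.start(), m.end(), m.group(), label))
    candidates.sort(key=lambda c: (c[0], -(c[1] - c[0])))
    kept: List[Mention] = []
    last_end = -1
    for start, end, surface, label in candidates:
        if start >= last_end:  # skip anything overlapping an already kept match
            kept.append((start, end, surface, label))
            last_end = end
    return kept


if __name__ == "__main__":
    sample = ("Levels of hsa-miR-21-5p and pre-miR-155 were high, whereas let-7a "
              "and other microRNAs were reduced by an antagomir.")
    for start, end, surface, label in find_mentions(sample):
        print(f"{start:3d}-{end:3d}  {label:8s}  {surface}")
```

Overlapping candidates are resolved by keeping the earliest, then longest, match, a common convention for pattern and dictionary taggers. The word 'Mir' in 'Space Station Mir' is not matched, because the specific-name pattern requires a hyphenated number.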

2 Annotation protocol and guidelines

The guidelines for the MiNCor annotation protocol form a 14-page written manual. A biotechnologist with extensive biological knowledge wrote it, drawing on previous miRNA corpora, a review of multiple resources (NCBI, MeSH terms, miRNA review articles) and the model of the manual for the annotation of chemical entities of the CHEMDNER corpus (Krallinger et al., 2015). The protocol is organised into rule types together with example cases. We call this the GPNCE annotation system, for general rules, positive rules, negative rules, class rules and examples. We believe that organising the protocol into such rules makes it easier for human annotators to follow the annotation criteria while labelling mentions.

Figure 1: Classes of microRNA mentions and the elements that compose them. The components shown in white can help discriminate between the classes. The examples for each class are highlighted in a different colour.

2.1 The GPNCE annotation protocol

We based our guidelines on a three-phase annotation protocol that we called GPNCE (General, Positive, Negative, Class and Examples).

In the first phase we describe the classes of mentions that can be found in the literature. We cover six classes of microRNA mentions in detail: (1) general microRNA names, (2) specific microRNA names, (3) multiple microRNA mentions, (4) nested microRNA mentions, (5) microRNA cluster mentions and (6) other/non-coding RNA mentions. Figure 1 gives examples for each class.

In the second phase we propose three types of annotation rules: General, Positive and Negative. The General Rules describe the decisions the annotator should take into account during annotation, namely what counts as a miRNA mention in general and how to handle uncertain cases. The Positive Rules describe how to annotate correct miRNA mentions and what to include in them (positive word boundaries, prefixes, suffixes, symbols), and they illustrate the criteria with examples of correctly annotated mentions. The Negative Rules cover what must be excluded from mentions during annotation (negative word boundaries, prefixes, part-of-speech entities, other entities, wrong mentions). All the rules come with positive examples, marked with a check mark, and negative examples, marked with a cross. We based the definition of these rules on the manual for the annotation of chemical entities of the CHEMDNER corpus (Krallinger et al., 2015).

The last phase, Examples, consists of two appendices at the end of the manual. They present examples of annotated mentions in sentences extracted from abstracts, together with possible errors to avoid. To help annotators see the correct labels, each mention is highlighted with a colour that indicates its class.

3 MiNCor Corpora

We built two corpora of microRNA mentions following the GPNCE annotation protocol. The first, MiNCor Gold, is a manually labelled corpus of 102 abstracts. The second, MiNCor Silver, is a semi-automatically generated corpus of 302K sentences. The corpora and the guidelines can be downloaded at http://zope.bioinfo.cnio.es/mincor/minacor.tar.gz. The directory contains six files, with a 'README.txt' file that describes the other files.
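Section 3.2 details how MiNCor Silver was labelled semi-automatically. The process combines four steps: dictionary expansion (core terms combined with species prefixes and numeric suffixes), pruning of highly ambiguous entries such as 'MIR', dictionary look-up, and a cascade of boundary rules for lists of microRNAs. The Python sketch below illustrates these steps on a toy scale. Every detail in it is an assumption made for illustration: whitespace tokenisation, case-insensitive greedy longest match, untyped IOB labels (B, I, O), a single list rule that labels each bare suffix such as '-23a' as its own mention, the hypothetical function names and the tiny dictionary. It does not reproduce the MiNCor lexicon or its actual rule cascade.

```python
import re
from itertools import product
from typing import Iterable, List, Set

# Hypothetical toy versions of the Section 3.2 steps; not the MiNCor lexicon or rules.


def expand_dictionary(cores: Iterable[str], prefixes: Iterable[str],
                      suffixes: Iterable[str]) -> Set[str]:
    """Combine core terms with numeric suffixes, with and without species prefixes."""
    prefixes = list(prefixes)
    names: Set[str] = set()
    for core, suffix in product(cores, suffixes):
        name = core + suffix                            # "miR" + "-101" -> "miR-101"
        names.add(name)
        names.update(f"{p}-{name}" for p in prefixes)   # -> "hsa-miR-101"
    return names


def prune(names: Iterable[str], ambiguous: Iterable[str]) -> Set[str]:
    """Drop entries that equal (case-insensitively) a known ambiguous term."""
    blocked = {a.lower() for a in ambiguous}
    return {n for n in names if n.lower() not in blocked}


def lookup(tokens: List[str], dictionary: Set[str], max_len: int = 3) -> List[str]:
    """Greedy, case-insensitive longest match over tokens; returns IOB labels."""
    entries = {d.lower() for d in dictionary}
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in entries:
                labels[i:i + n] = ["B"] + ["I"] * (n - 1)
                i += n
                break
        else:  # no dictionary entry starts at token i
            i += 1
    return labels


BARE_SUFFIX = re.compile(r"-\d+[a-z]?")  # e.g. "-23", "-23a"


def extend_lists(tokens: List[str], labels: List[str]) -> List[str]:
    """Rule: bare suffixes continuing a list opened by a mention become mentions."""
    labels = list(labels)
    in_list = False
    for i, token in enumerate(tokens):
        if labels[i] != "O":
            in_list = True
        elif in_list and token in {",", "and"}:
            continue  # list separators keep the list open
        elif in_list and BARE_SUFFIX.fullmatch(token):
            labels[i] = "B"
        else:
            in_list = False
    return labels


if __name__ == "__main__":
    seed = {"microRNA", "miRNA", "non-coding RNA", "MIR"}
    expanded = expand_dictionary(["miR"], ["hsa", "mmu"], ["-1", "-23a", "-101"])
    dictionary = prune(seed | expanded, ambiguous={"MIR"})
    tokens = "hsa-miR-1 , -23a and -101 regulate a non-coding RNA on Mir".split()
    for token, label in zip(tokens, extend_lists(tokens, lookup(tokens, dictionary))):
        print(f"{label}\t{token}")
```

On the sample sentence, hsa-miR-1, -23a and -101 come out as three mentions and non-coding RNA as one two-token mention. 'Mir' stays O because the pruned 'MIR' entry can no longer match it case-insensitively. Whether a list should produce one mention or several is a decision for the guidelines, not the tagger.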
3.1 MiNCor Gold

Following the MiNCor annotation guidelines, a domain expert annotator manually labelled 102 abstracts, each containing at least one microRNA mention. The abstracts were randomly selected from 3,869 abstracts retrieved from PubMed with the MeSH query "mirna", with the search restricted to papers published in 2016. Labelling was done manually in a customised AnnotateIt web interface (http://ubio.bioinfo.cnio.es/people/fleitner/mirnaner_test_250.html). This interface is similar to the one used for the CHEMDNER-Patents corpus (Krallinger et al., 2015), but its classes and labels were adapted to the microRNA mention classes. Figure 2 gives a schematic overview of the annotation protocol. The 102 abstracts yielded a total of 1,154 mentions. Table 1 shows how these mentions are distributed across the microRNA classes.

      Type of mention      Total mentions
      General microRNA           607
      Specific microRNA          501
      Multiple microRNA           14
      Nested microRNA              2
      Cluster microRNA             1
      ncRNA microRNA              29

Table 1: Number of mentions per microRNA class.

Figure 2: Schematic overview of the process used to construct the MiNCor corpus.

To validate the annotation process, 20 of these abstracts were randomly selected and annotated de novo by a second annotator following the same guidelines. The two annotations were then compared, counting only exact mention matches. Inter-annotator agreement was: precision 99.00, recall 99.45 and F1 99.22. All non-overlapping labelled entities were analysed and changed only when the annotators agreed. If uncertainty remained, the mentions were left unlabelled. Most disagreements concerned the non-coding RNA class, for example:
- [...the long intergenic non-coding RNAs (lincRNAs) expressed in...]
- Should 'non-coding RNAs (lincRNAs)' be treated as a single mention, or as two separate mentions, 'non-coding RNAs' and 'lincRNAs'?
- Should 'long intergenic' be included in the mention?

Under the guidelines, the correct labels in this example would be 'long intergenic non-coding RNAs' and 'lincRNAs'. In cases of this type, however, the annotators labelled differently, one marking a single mention and the other two separate mentions, which shows uncertainty. We therefore recommend not labelling a mention when in doubt.

3.2 MiNCor Silver

MiNCor Silver was built from sentences taken from PubMed results that contain the MeSH term microRNA or additional, manually defined microRNA search terms ("mirna"; "microrna"; "non-coding RNA"; "lin-4"; "let-7"; "antagomir"; "oncomir"). The search was limited to abstracts and full-text papers published from 2016 onwards. All resulting files were split into sentences and then tagged with a large dictionary of microRNA names (the MiNCor lexicon). This dictionary contained names from multiple microRNA databases, as well as microRNA mentions detected by GNormPlus. We expanded the dictionary following the microRNA nomenclature guidelines, combining core terms (e.g. miRNA, microRNA), prefixes (e.g. hsa, mmu) and suffixes (e.g. -101; -23a/b). We then pruned highly ambiguous entries. For example, 'MIR' was found to refer to other entities, such as the 'Space Station Mir' (Johannes et al., 2016). After dictionary look-up, we applied a cascade of rules to adjust mention boundaries, for instance to cover coordinated microRNAs or lists of microRNAs. The final silver training corpus has 302,560 sentences, over 3,000,000 tokens and over 175K labelled microRNA mentions. The dictionaries used for post-processing contain 788,784 distinct terms in total.

4 Comparison with other corpora

      Score       SCAI-Test   MiNCor-Test
      Precision     94.00       88.97
      Recall        95.00       86.74
      F-score       94.49       87.84

Table 2: Results for the CRF model trained on the SCAI training set.

      Score       SCAI-Test   MiNCor-Test
      Precision     96.67       89.26
      Recall        97.57       90.03
      F-score       97.11       89.64

Table 3: Results for the CRF model trained on the MiNCor Silver training set.
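The paper does not state the formula behind the F-score rows of Tables 2 and 3. Assuming they report the balanced F-measure, the harmonic mean F1 = 2PR / (P + R), they can be recomputed from the precision and recall rows. The short Python check below does this and also covers the inter-annotator agreement scores from Section 3.1. The values are transcribed from this paper; the function and variable names are ours. Every recomputed value agrees with the reported one to within 0.01. Two of them (94.497 and 97.118) would round up, so the reported figures appear to be truncated rather than rounded.

```python
from typing import Dict, Tuple


def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score), copied from Tables 2 and 3 and Section 3.1.
REPORTED: Dict[Tuple[str, str], Tuple[float, float, float]] = {
    ("trained on SCAI", "SCAI-Test"): (94.00, 95.00, 94.49),
    ("trained on SCAI", "MiNCor-Test"): (88.97, 86.74, 87.84),
    ("trained on MiNCor Silver", "SCAI-Test"): (96.67, 97.57, 97.11),
    ("trained on MiNCor Silver", "MiNCor-Test"): (89.26, 90.03, 89.64),
    ("second annotator", "20 Gold abstracts"): (99.00, 99.45, 99.22),
}

if __name__ == "__main__":
    for (model, test_set), (p, r, reported) in REPORTED.items():
        print(f"{model:26s} {test_set:18s} recomputed {f1(p, r):.3f}  reported {reported:.2f}")
```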

We used both the MiNCor corpus and the SCAI miRNA corpus (Bagewadi et al., 2014) as test sets to evaluate the performance of a CRF-based miRNA entity tagger. We chose the SCAI corpus because it provides annotation criteria, which allowed us to interpret some of the differences between the two annotation processes. We built the miRNA entity tagger with the NERsuite toolkit (Cho et al., 2010), which is freely available at http://nersuite.nlplab.org. NERsuite is based on CRFsuite (http://www.chokkan.org/software/crfsuite/), an implementation of Conditional Random Fields (Okazaki, 2007). It includes feature types commonly used for biomedical NER tasks, covering lemmatisation, part-of-speech and word morphology.

We trained two NER models. One used the SCAI miRNA training set (201 manually labelled abstracts) and the other the large MiNCor Silver training set (302K sentences). Both training corpora were split into sentences and tokenised. At token level they were lemmatised, tagged with part-of-speech and chunk tags, and labelled in the IOB format. Results are reported as precision, recall and F1. Table 2 shows the results of the model trained on the SCAI training set, and Table 3 those of the model trained on the MiNCor Silver training set, each evaluated on both test sets. Table 4 gives overall statistics for the two test corpora (MiNCor Gold and SCAI).

    Statistic           SCAI-Test    MiNCor-Test
    Abstracts              100          102
    Sentences              780         1063
    Total mentions         712         1154
    Unique mentions        130          232

Table 4: Statistics of the two microRNA test corpora.

5 Results and Discussion

Using our guidelines, we manually annotated 102 abstracts retrieved from PubMed with the MeSH query "mirna", keeping only recent publications (2016). In parallel, a more refined PubMed search with additional queries yielded 302K sentences. These were labelled semi-automatically following our guidelines, through dictionary look-up and a cascade of rules that adjust mention boundaries. We then compared our corpora with the manually labelled SCAI corpus, using the NERsuite toolkit for named entity recognition of microRNA mentions in the literature. To build our model, we used MiNCor Silver as the training set and MiNCor Gold as the test set. To obtain the SCAI model, we trained NERsuite on the SCAI training set, which can be downloaded at http://www.scai.fraunhofer.de/mirna-corpora.html. Tables 2 and 3 show that both tagger models score lower when our corpus is used as the gold standard (second column of each table). This holds for the model trained on the SCAI training set (Table 2) and for the one trained on our dictionary- and rule-based silver standard (Table 3). The reason is our more granular definition of microRNA mentions, which also includes, for instance, other ncRNA types that were not labelled in the training collections used. Our model, in turn, outperformed the SCAI model on both test sets. We attribute this to its broader coverage. Although it is not manually curated, it covers more possible mentions, including microRNA mentions for all species and biosynthesis steps, and it includes more mention classes, which leads to more comprehensive identification.

Even though our model performed better, its scores were not perfect. One main source of errors was lists of microRNAs, where the mentions appear as multiple overlapping entity mentions (mir-1, -23, -33 and -101). Other errors occurred in the labelling of non-coding RNAs.

Non-coding RNA mentions are hard to define because there is no specific nomenclature that researchers can refer to. Nevertheless, online resources (NCBI, MeSH terms, miRNA review articles, books) can help define this class. We tried to give rules for identifying non-coding RNA mentions. The most important is that an uncertain mention should not be labelled, which lowers the model's accuracy.

6 Conclusion and future work

Here we have presented the MiNCor corpora and the guidelines for annotating microRNA and non-coding RNA mentions in the scientific literature. The aim of this work was to provide comprehensive, explanatory annotation guidelines, with examples and rules that help annotators during the process. Exhaustive guidelines for annotating biomedical entities are an important contribution to Biomedical Natural Language Processing. They give researchers a standardised tool that can help define a line of research even without extensive knowledge of the field. Predefined guidelines can also shorten the time needed to build corpora.

We also built two corpora (gold and silver) with our guidelines and tested them on a named entity recognition task with the NERsuite toolkit, comparing the results with another available microRNA tagger. Manually curated corpora are considered a gold standard in Natural Language Processing because they can generally reach a higher level of accuracy. In our case, however, the model trained on the semi-automatically labelled MiNCor Silver outperformed the one trained on the manually labelled SCAI training set. This suggests that a silver corpus can be a good surrogate for manually annotated gold standard corpora. At present there are no very large gold standards for microRNA mentions that cover all possible characteristics and mention types. For this reason MiNCor Silver can be considered a better training resource, even though it is not manually curated, as our results show.

In future work we intend to extend our guidelines to other types of non-coding RNAs that are not yet covered (e.g. ribosomal RNAs, transfer RNAs). We also plan to provide a larger microRNA corpus drawn from full-text and patent abstract sentences, and to describe additional rules that help define the relations between these molecules and other biological entities (e.g. chemical compounds, genes, proteins).

References

Victor Ambros, Bonnie Bartel, David P Bartel, Christopher B Burge, James C Carrington, Xuemei Chen, Gideon Dreyfuss, Sean R Eddy, Sam Griffiths-Jones, Mhairi Marshall, et al. 2003. A uniform system for microRNA annotation. RNA, 9(3):277–279.

Shweta Bagewadi, Tamara Bobić, Martin Hofmann-Apitius, Juliane Fluck, and Roman Klinger. 2014. Detecting miRNA mentions and relations in biomedical literature. F1000Research, 3.

Ahmed S Bayoumi, Amer Sayed, Zuzana Broskova, Jian-Peng Teoh, James Wilson, Huabo Su, Yao-Liang Tang, and Il-man Kim. 2016. Crosstalk between long noncoding RNAs and microRNAs in health and disease. International Journal of Molecular Sciences, 17(3):356.

Fabian Benz, Sanchari Roy, Christian Trautwein, Christoph Roderburg, and Tom Luedde. 2016. Circulating microRNAs as biomarkers for sepsis. International Journal of Molecular Sciences, 17(1):78.

M Bhaskaran and M Mohan. 2014. MicroRNAs: history, biogenesis, and their evolving role in animal development and disease. Veterinary Pathology Online, 51(4):759–774.

HC Cho, N Okazaki, M Miwa, and J Tsujii. 2010. NERsuite: a named entity recognition toolkit. Tsujii Laboratory, Department of Information Science, University of Tokyo, Tokyo, Japan.
Bowen Du, Zhe Wang, Xin Zhang, Shipeng Feng, Guoxin Wang, Jianxing He, and Biliang Zhang. 2014. MicroRNA-545 suppresses cell proliferation by targeting cyclin D1 and CDK4 in lung cancer cells. PLoS ONE, 9(2):e88022.

Sam Griffiths-Jones, Russell J Grocock, Stijn Van Dongen, Alex Bateman, and Anton J Enright. 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Research, 34(suppl 1):D140–D144.

Sam Griffiths-Jones. 2004. The microRNA registry. Nucleic Acids Research, 32(suppl 1):D109–D111.

William Hersh, Ravi Teja Bhupatiraju, and Sarah Corley. 2004. Enhancing access to the bibliome: the TREC genomics track. Medinfo, 11(Pt 2):773–777.

Lynette Hirschman, Alexander Yeh, Christian Blaschke, and Alfonso Valencia. 2005. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 6(Suppl 1):S1.

Lawrence Hunter and K Bretonnel Cohen. 2006. Biomedical language processing: what's beyond PubMed? Molecular Cell, 21(5):589–594.

Bernd Johannes, Vyacheslav Salnitski, Alexander Dudukin, Lev Shevchenko, and Sergey Bronnikov. 2016. Performance assessment in the pilot experiment on board space stations Mir and ISS. Aerospace Medicine and Human Performance, 87(6):534–544.

Jin-Dong Kim, Tomoko Ohta, Yoshimasa Tsuruoka, Yuka Tateisi, and Nigel Collier. 2004. Introduction to the bio-entity recognition task at JNLPBA. In Proceedings of the international joint workshop on natural language processing in biomedicine and its

Bo Zhu, and Qi-Jing Li. 2014. Targeting miR-23a in CD8+ cytotoxic T lymphocytes prevents tumor-dependent immunosuppression. The Journal of Clinical Investigation, 124(12):5352–5367.

Yiping Mao, Ramkumar Mohan, Shungang Zhang, and Xiaoqing Tang. 2013. MicroRNAs as pharmacological targets in diabetes. Pharmacological Research, 75:37–47.

Blake C Meyers, Michael J Axtell, Bonnie Bartel, David P Bartel, David Baulcombe, John L Bowman, Xiaofeng Cao, James C Carrington, Xuemei Chen, Pamela J Green, et al. 2008. Criteria for annotation of plant microRNAs. The Plant Cell, 20(12):3186–3190.

Nadya Morozova, Andrei Zinovyev, Nora Nonne, Linda-Louise Pritchard, Alexander N Gorban, and Annick Harel-Bellan. 2012. Kinetic signatures of microRNA modes of action. RNA, 18(9):1635–1655.

Haroon Naeem, Robert Küffner, Gergely Csaba, and Ralf Zimmer. 2010. miRSel: automated extraction of associations between microRNAs and genes from the biomedical literature. BMC Bioinformatics, 11(1):135.

Mariana Neves. 2014. An analysis on the entity annotations in biological corpora. F1000Research, 3.

Masahisa Ohtsuka, Hui Ling, Yuichiro Doki, Masaki Mori, and George Adrian Calin. 2015. MicroRNA processing and human cancer. Journal of Clinical Medicine, 4(8):1651–1667.

Naoaki Okazaki. 2007. CRFsuite: a fast implementation of conditional random fields (CRFs).

Valentina Pileczki, Roxana Cojocneanu-Petric, Maha-
   applications, pages 70–75. Association for Compu-         farin Maralani, Ioana Berindan Neagoe, and Robert
   tational Linguistics.                                     Sandulescu. 2016. Micrornas as regulators of
                                                             apoptosis mechanisms in cancer. Clujul Medical,
Martin Krallinger, Obdulia Rabal, Florian Leitner,           89(1):50.
 Miguel Vazquez, David Salgado, Zhiyong Lu,
 Robert Leaman, Yanan Lu, Donghong Ji, Daniel M            Aileen I Pogue, James M Hill, and Walter J Lukiw.
 Lowe, et al. 2015. The chemdner corpus of chemi-            2014. Microrna (mirna): sequence and stability,
 cals and drugs and its annotation principles. Journal       viroid-like properties, and disease association in the
 of cheminformatics, 7(S1):1–17.                             cns. Brain research, 1584:73–79.

Erika Larrea, Carla Sole, Lorea Manterola, Ibai            Mercedes Rubio, Quique Bassat, Xavier Estivill, and
   Goicoechea, Marı́a Armesto, Marı́a Arestin,              Alfredo Mayor. 2016. Tying malaria and micrornas:
   Marı́a M Caffarel, Angela M Araujo, Marı́a Araiz,        from the biology to future diagnostic perspectives.
   Marta Fernandez-Mercado, et al. 2016. New con-           Malaria journal, 15(1):1.
   cepts in cancer biomarkers: Circulating mirnas in
   liquid biopsies. International journal of molecular     Tanya Smith, Cha Rajakaruna, Massimo Caputo, and
   sciences, 17(5):627.                                      Costanza Emanueli. 2015. Micrornas in congeni-
                                                             tal heart disease. Annals of translational medicine,
Gang Li, Karen E Ross, Cecilia N Arighi, Yifan Peng,         3(21).
  Cathy H Wu, and K Vijay-Shanker. 2015. mirtex:
  A text mining system for mirna-gene relation extrac-     Anna Stroynowska-Czerwinska, Agnieszka Fiszer, and
  tion. PLoS Comput Biol, 11(9):e1004391.                    Wlodzimierz J Krzyzosiak. 2014. The panorama
                                                             of mirna-mediated mechanisms in mammalian cells.
Regina Lin, Ling Chen, Gang Chen, Chunyan Hu, Shan           Cellular and Molecular Life Sciences, 71(12):2253–
  Jiang, Jose Sevilla, Ying Wan, John H Sampson,             2270.
Boya Xie, Qin Ding, Hongjin Han, and Di Wu. 2013.
  mircancer: a microrna–cancer association database
  constructed by text mining on literature. Bioinfor-
  matics, page btt014.