Annotation process, guidelines and text corpus of small non-coding RNA molecules: the MiNCor for microRNA annotations

José Camilla Sammartino
Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Italy
j.sammartino.88@gmail.com

Martin Krallinger
Centro Nacional de Investigaciones Oncológicas, Madrid, Spain
mkrallinger@cnio.es

Alfonso Valencia
Centro Nacional de Investigaciones Oncológicas, Madrid, Spain
avalencia@cnio.es

Abstract

MicroRNAs are small non-coding molecules that act as post-transcriptional regulators of gene expression in a wide spectrum of biological states. Most of the information about microRNAs is embedded in unstructured data (text files), which requires specific text mining techniques for its retrieval and analysis. These techniques are generally based on supervised (or semi-supervised) learning methods, which require collections of carefully annotated and categorised training data. In this study we propose a comprehensive, granular annotation protocol for non-coding RNA molecules, focusing primarily on microRNA mentions. This annotation protocol was used to construct a manually annotated corpus of microRNA mentions (MiNCor Gold), a large semi-automatically generated silver standard corpus of microRNA mentions (MiNCor Silver) and a large microRNA name dictionary. The efficiency of these standards was then evaluated using a named entity recognition (NER) system, in comparison with another freely available microRNA mention standard. The NER system trained with our silver corpus showed a better performance, with higher precision (96.67% vs. 94.00%) and recall (97.57% vs. 95.00%) on their test data, and also on ours (precision 89.26% vs. 88.97% and recall 90.03% vs. 86.74%). The corpora and guidelines are freely downloadable at http://zope.bioinfo.cnio.es/mincor/minacor.tar.gz.

1 Introduction

MicroRNAs are small non-coding RNA molecules involved in the post-transcriptional regulation of gene expression. In the last decade they have been linked to a wide spectrum of biological and developmental processes and to diseases including cancer, metabolic disorders and infectious diseases (Bayoumi et al., 2016; Smith et al., 2015; Pogue et al., 2014; Ohtsuka et al., 2015; Pileczki et al., 2016). MicroRNAs are post-transcriptional regulators of gene expression that act on a messenger RNA target. The maturation of microRNAs is a two-step process that takes place in two cellular compartments: it starts in the nucleus, where the precursor transcript is cleaved and then exported to the cytoplasm, where it undergoes a second cleavage that produces a double-stranded microRNA of 22 nucleotides. This double-stranded microRNA is recognised by the RNA-Induced Silencing Complex (RISC) (Stroynowska-Czerwinska et al., 2014). Even though the exact mechanism of action of RISC is not yet fully understood, there is evidence that RISC is led to the messenger RNA (mRNA) target by the microRNA, whose sequence is complementary to the 3' untranslated region (3' UTR) of the target. The binding to this region allows the regulation process, which can happen before or during the translation of the messenger into protein, so that the protein product may or may not be produced (Morozova et al., 2012). The temporal and spatial expression of these molecules is as important as their expression levels: a modification in any of these can lead to a dysregulation of the biological processes in which they are involved, with effects that can extend to entire biological pathways. Different studies show the importance of correct microRNA post-transcriptional regulation in preventing the development of pathological states and developmental defects (Bhaskaran and Mohan, 2014), and their importance is also highlighted by their possible application as fast, specific and non-invasive biomarkers in a large spectrum of harmful states (Rubio et al., 2016; Benz et al., 2016; Larrea et al., 2016). Furthermore, these molecules can be used as targets in pharmacological therapies and in clinical applications (for diagnosis and follow-up) (Lin et al., 2014; Du et al., 2014; Mao et al., 2013). This has led to an increasing number of publications devoted to the study of microRNA biology, as well as to predictive bioinformatics analysis methodologies tailored to the characterisation of miRNA expression and target prediction.

Biomedical Natural Language Processing (BioNLP) techniques and text mining strategies can be applied for the retrieval, filtering and analysis of knowledge from unstructured data such as the scientific literature. One of the main hurdles for the implementation of text mining building block components is the construction of manually annotated text-bound corpora, as they usually require a considerable human workload together with annotators who have deep domain knowledge and basic linguistic expertise. The development of corpora is a time-consuming and tedious process, but one that is very much needed for text mining and BioNLP methods (Neves, 2014). To promote advances in BioNLP, different competitive evaluations have been held (Hersh et al., 2004; Kim et al., 2004; Hirschman et al., 2005), in which distinct groups participated in different tasks, ranging from document retrieval and NER to complex relation/event extraction tasks (Hunter and Cohen, 2006). Those challenges resulted in valuable text corpora that have been re-used by the biomedical text mining community.

Despite the release of several manually annotated text corpora devoted to biological entities, there is no single manual, work or reference that can be considered a general guide for building specific guidelines. Guidelines are usually written based on the background knowledge of their authors, or do not include all the possible characteristics. Furthermore, the annotation process can be very variable and complex, due to the interconnection of different disciplines (medical/biological and linguistic) and the different aims of the annotation (chemical compounds, diseases, connections between mutated proteins and disease, case reports).

The assembling of a corpus requires specific documents that describe the annotation process and define its guidelines.

As for microRNAs, several attempts have been made to facilitate the extraction of information directly from the literature (Bagewadi et al., 2014; Griffiths-Jones et al., 2006; Li et al., 2015; Naeem et al., 2010; Xie et al., 2013). To our knowledge there are three freely available corpora for microRNA (miRNA) mentions. Two of them, miRBase and miRTex (Griffiths-Jones et al., 2006; Li et al., 2015), provide only very short annotation guidelines (Ambros et al., 2003; Meyers et al., 2008; Griffiths-Jones, 2004) for the annotation of microRNA mentions, which mostly focus on the identification of single mentions without considering more granular annotation types. The third corpus, the SCAI corpus (Bagewadi et al., 2014), does provide additional details and a set of annotation rules, but we believe that it underspecifies some of the relevant annotation criteria, and it focuses primarily on human microRNA mentions. For instance, it covers the annotation of species-specific prefixes, e.g. hsa for human miRNAs, but does not annotate terms such as "human" preceding miRNA mentions. Moreover, general prefixes (anti-, onco-, pre-, pri-), specific miRNA class names (angiomir, antagomir, isomir), as well as non-coding RNA names are not included in the annotation process.

Here we propose a comprehensive annotation protocol for labelling microRNA mentions in biomedical literature. It encompasses all microRNA mentions regardless of the species or origin, the maturation step or the classification, and also includes a class of non-coding RNA names and miRNA clusters. This annotation protocol has been iteratively refined and was then used for the annotation of the MiNCor corpus, which was used for the evaluation of several microRNA mention recognition approaches. We believe that the release of the MiNCor corpus guidelines might be useful as an annotation template for the construction of corpora for other biomedical entities.

We tested our corpus in comparison with the SCAI corpus, which is, to our knowledge, the one whose guidelines are the most comprehensive so far. To test the efficiency of our corpus we trained and tested a named entity recognition (NER) system with it and evaluated the results in comparison with SCAI, whose training and test sets for the NER system were the only ones with characteristics comparable to ours.

2 Annotation protocol and guidelines

The guidelines for the MiNCor annotation protocol consist of a 14-page written manual defined by a biotechnologist with extensive biological knowledge, integrating information from previous miRNA corpora, a review of multiple resources (NCBI, MeSH terms, miRNA review articles) and the model of the Manual for annotation of chemical entities of the CHEMDNER corpus (Krallinger et al., 2015).
The annotation protocol is structured into rule types together with example cases, which we call the GPNCE annotation system, standing for general rules, positive rules, negative rules, class rules and examples. We believe that structuring the annotation protocol into such rules makes it easier for the human annotators to follow the annotation criteria during the labelling of the mentions.

2.1 The GPNCE annotation protocol

We based our guidelines on a three-phase annotation protocol that we called GPNCE (General, Positive, Negative, Class and Examples).

In the first phase we describe the different classes that can be identified in the literature. We cover in detail six different classes of microRNA mentions: (1) general microRNA names, (2) specific microRNA names, (3) multiple microRNA mentions, (4) nested microRNA mentions, (5) microRNA cluster mentions and (6) other/non-coding RNA mentions. Figure 1 provides different examples for the classes.

Figure 1: The classes of microRNA mention and the elements that are part of them. In white are the different components of the mentions that can help discriminate between the classes. The examples of the different classes are highlighted with different colours (one for each class).

In the second phase we propose three types of rules for the annotation: General, Positive and Negative. The General Rules describe the decisions the annotator should take into account during the annotation process (what constitutes, at a general level, a miRNA mention and how to deal with cases of uncertainty). The Positive Rules describe how to annotate correct miRNA mentions and what to include in the mentions (positive word boundaries, prefixes, suffixes, symbols), and illustrate the criteria through different examples of correctly annotated mentions. The Negative Rules cover what needs to be excluded from the mentions during the annotation (negative word boundaries, prefixes, part-of-speech entities, other entities, wrong mentions). All the rules described include examples with positive cases (marked with the check mark symbol) and negative ones (marked with the cross mark symbol). The definition of these rules was based on the Manual for annotation of chemical entities of the CHEMDNER corpus (Krallinger et al., 2015).

The last phase, Examples, consists of two appendices at the end of the manual, which present different examples of annotated mentions in sentences extracted from abstracts, together with possible errors to avoid. To help visualise the correct labels, these were highlighted with a specific colour coding system according to the class to which they belong.

3 MiNCor Corpora

We decided to build two different corpora of microRNA mentions following the GPNCE annotation protocol. The first one is a manually labelled corpus of 102 abstracts called MiNCor Gold; the second is a semi-automatically generated corpus of 302K sentences called MiNCor Silver. The corpora and the guidelines can be downloaded at http://zope.bioinfo.cnio.es/mincor/minacor.tar.gz. The directory contains six files, with a 'README.txt' file that describes the other files.

3.1 MiNCor Gold

Using the MiNCor annotation guidelines, a domain expert annotator manually labelled 102 abstracts containing at least one microRNA mention. The abstracts were randomly selected from 3869 abstracts retrieved using the MeSH query "mirna" on PubMed, restricting the search to papers published in 2016.
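
The random selection step can be illustrated with a minimal sketch. The paper does not state how the selection was implemented, so the function name `sample_abstracts`, the fixed seed and the placeholder identifiers standing in for the 3869 retrieved abstracts are all assumptions made for illustration only.

```python
import random


def sample_abstracts(candidate_ids, k=102, seed=2016):
    """Draw k distinct identifiers uniformly at random, reproducibly.

    The seed value is an arbitrary, hypothetical choice.
    """
    rng = random.Random(seed)  # private generator, global random state untouched
    return sorted(rng.sample(list(candidate_ids), k))


# Placeholder identifiers standing in for the 3869 retrieved abstracts (hypothetical).
candidates = [f"PMID_{i:04d}" for i in range(3869)]
selected = sample_abstracts(candidates)
print(len(selected), selected[:3])
```

Fixing the seed is a design choice that makes the drawn subset reproducible, so that a second annotator or a later re-run works on exactly the same abstracts.
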
The labelling was performed manually using the customised AnnotateIt web interface http://ubio.bioinfo.cnio.es/people/fleitner/mirnaner_test_250.html, similar to the one used for the annotation of the CHEMDNER-Patents Corpus (Krallinger et al., 2015), but with the classes and labels adjusted to the microRNA mention classes. A schematic overview of the annotation protocol is given in Figure 2. From the 102 abstracts we extracted a total of 1154 mentions. Table 1 provides an overview of the distribution of the microRNA class types.

Figure 2: Schematic overview of the process used for the construction of the MiNCor corpus.

Type of mention        Total mentions
General microRNA       607
Specific microRNA      501
Multiple microRNA      14
Nested microRNA        2
Cluster microRNA       1
ncRNA microRNA         29

Table 1: Number of mentions for each microRNA class type.

To validate the annotation process, 20 of these abstracts were randomly selected and annotated de novo by a second annotator using the same annotation guidelines. The results were then compared with the first annotation, considering only perfect mention matches. The resulting inter-annotator agreement scores were: Precision 99.00, Recall 99.45 and F1 99.22. All the non-overlapping labelled entities were analysed and modified only when the annotators could agree; if uncertainty remained, the mentions were left unlabelled. The errors in the annotation mostly concerned the non-coding RNA class mentions:
- [...the long intergenic non-coding RNAs (lincRNAs) expressed in...]
- Should we consider 'non-coding RNAs (lincRNAs)' as a single mention or as two separate mentions, 'non-coding RNAs' and 'lincRNAs'?
- Should 'long intergenic' be included in the mention?
In this example, if we follow the guidelines, the correct labelling should be 'long intergenic non-coding RNAs' and 'lincRNAs', but in this type of case the annotators labelled differently (one labelled it as a single mention and the other as two separate mentions), showing uncertainty. Therefore, we suggest not labelling the mention when there are doubts.

3.2 MiNCor Silver

The MiNCor Silver corpus was obtained from sentences derived from PubMed results containing the MeSH term microRNA, as well as additional manually defined microRNA search query terms ("mirna"; "microrna"; "non-coding RNA"; "lin-4"; "let-7"; "antagomir"; "oncomir"). The search was limited to abstracts and full-text papers published from 2016 onwards. All the resulting files were segmented into sentences and then tagged using a large dictionary of microRNA names (the MiNCor lexicon). This dictionary contained names derived from multiple microRNA databases as well as microRNA mentions detected by GNormPlus. We carried out a dictionary expansion step that takes into account the nomenclature guidelines of microRNAs by considering core terms (e.g. miRNA, microRNA), prefixes (e.g. hsa, mmu) and suffixes (e.g. -101; -23a/b). A dictionary pruning step was carried out to remove highly ambiguous mentions (e.g. 'MIR' was found to refer to other entities, such as the space station Mir (Johannes et al., 2016)). After applying the dictionary look-up we additionally used a cascade of rules to adjust the mention boundaries, for instance to cover mentions of co-ordinated microRNAs or lists of microRNAs. In the end our silver training corpus had a total of 302,560 sentences, over 3,000,000 tokens and over 175K labelled microRNA mentions. The entities used in the different dictionaries for the post-processing amount to 788,784 different terms in total.

4 Comparison with another corpus

We decided to use both the MiNCor corpus and the SCAI miRNA corpus (Bagewadi et al., 2014) as test sets to evaluate the performance of a CRF-based miRNA entity tagger. We chose the SCAI corpus because it provides annotation criteria, and thus allows some interpretation of the differences in the annotation process. We constructed a miRNA entity tagger using the NERsuite toolkit (Cho et al., 2010). The NERsuite toolkit is freely available at http://nersuite.nlplab.org. It is based on CRFsuite (http://www.chokkan.org/software/crfsuite/), an implementation of Conditional Random Fields (Okazaki, 2007), and includes different feature types commonly used for biomedical NER tasks, covering lemmatisation, part-of-speech and word morphology.

We trained two different NER models, that is, we generated one CRF model from each training set: one using the SCAI miRNA training set (201 manually labelled abstracts) and another using the large MiNCor silver standard training set of 302K sentences.

Both corpora used for training the models were segmented into sentences and tokenised. At the token level the two corpora were lemmatised, labelled with part-of-speech and chunking tags, and labelled following the IOB format. The results were measured in terms of Precision, Recall and F1. The results obtained with the two models are shown in Table 2, for the model trained on the SCAI training set, and in Table 3, for the model trained on the MiNCor silver standard training set; each table reports the scores on both the SCAI test set and the MiNCor gold standard test set. Table 4 shows the overall statistics of the two test corpora (MiNCor Gold and SCAI).

Score        SCAI-Test   MiNCor-Test
Precision    94.00       88.97
Recall       95.00       86.74
F-score      94.49       87.84

Table 2: Results for the CRF model trained on the SCAI training set.

Score        SCAI-Test   MiNCor-Test
Precision    96.67       89.26
Recall       97.57       90.03
F-score      97.11       89.64

Table 3: Results for the CRF model trained on the MiNCor silver standard training set.

Statistic          SCAI-Test   MiNCor-Test
Abstracts          100         102
Sentences          780         1063
Total Mentions     712         1154
Unique Mentions    130         232

Table 4: Statistics of the two microRNA test corpora.

5 Results and Discussion

Using our guidelines we manually annotated 102 abstracts retrieved from PubMed using the MeSH query "mirna", filtering the results to include only recent ones (2016). At the same time, with a more refined search on PubMed (including different MeSH queries) we extracted 302K sentences that were semi-automatically labelled following our guidelines, using a pruned dictionary look-up followed by a cascade of rules to adjust the mention boundaries. We then tested our corpora in comparison with the manually labelled SCAI corpus, using the NERsuite toolkit to perform the named entity recognition task for microRNA mentions in the literature. We used MiNCor Gold as our test set and MiNCor Silver as the training set to build our model. To obtain the SCAI model we trained NERsuite with the SCAI training set, downloadable at http://www.scai.fraunhofer.de/mirna-corpora.html. As shown in Table 2 and Table 3, the microRNA tagger models trained using the SCAI training set (Table 2) and our dictionary/rule-based silver standard training set (Table 3) both report lower scores when our corpus is used as the gold standard (second column of the two tables). This is due to the more granular definition of the microRNA mentions in our corpus and to the inclusion of, for instance, other ncRNA types that were not labelled in the training collections used. On the other hand, our model performed better than the SCAI model on both test sets. This is because our training corpus, even though it is not manually curated, covers more possible mentions, including microRNA mentions for all species and biosynthesis steps; furthermore, it includes more classes of mentions, leading to a more comprehensive identification.

Even though our model performed better, the resulting scores were not perfect. One of the main sources of error in microRNA mention recognition was lists of microRNAs, where microRNA mentions are expressed as multiple overlapping entity mentions (mir-1, -23, -33 and -101). Other errors occurred in the labelling of non-coding RNAs.

Non-coding RNA mentions are hard to define because there is no specific nomenclature to which the researcher can refer. Nevertheless, there are online resources (NCBI, MeSH terms, miRNA review articles, books) that can help in the definition of this class. We tried to give rules for the identification of non-coding RNA mentions, the most important being that in case of uncertainty the mention should not be labelled, which results in a lower accuracy for the model.

6 Conclusion and future work

Here we have presented the MiNCor corpora and the guidelines for the annotation of microRNA and non-coding RNA mentions in scientific literature. The aim of this work was to provide annotation guidelines that are comprehensive and explanatory, using different examples and rules to help the annotator during the process. The availability of exhaustive guidelines for the annotation of biomedical entities is a very important contribution to Biomedical Natural Language Processing tasks, because it gives the researcher a standardised tool that can help in the definition of a line of research even without extensive knowledge of the field. Furthermore, the possibility of using predefined guidelines for the construction of corpora can reduce the time needed for the process.

We also constructed two corpora (gold and silver) using our guidelines and tested them with a named entity recognition task, using the NERsuite toolkit and comparing the results with another microRNA tagger already available. Manually curated corpora are considered a gold standard in Natural Language Processing because models trained on them can generally reach a higher level of accuracy. In our case this did not hold, which makes our silver standard an example of a good surrogate for manually annotated gold standard corpora. At the moment there is no very large gold standard for microRNA mentions that encompasses all the possible characteristics and types of mention, which is why our MiNCor Silver can be considered a better option, even though it is not manually curated, as shown by the results we obtained.

In the future, we intend to extend our guidelines to other types of non-coding RNAs (e.g. ribosomal RNAs, transfer RNAs) that are not included at the moment, to provide a larger corpus of microRNA mentions derived from full-text and patent abstract sentences, and to describe additional rules to help define the relations of these molecules with other biological entities (e.g. chemical compounds, genes, proteins).

References

Victor Ambros, Bonnie Bartel, David P Bartel, Christopher B Burge, James C Carrington, Xuemei Chen, Gideon Dreyfuss, Sean R Eddy, Sam Griffiths-Jones, Mhairi Marshall, et al. 2003. A uniform system for microRNA annotation. RNA, 9(3):277–279.
Shweta Bagewadi, Tamara Bobić, Martin Hofmann-Apitius, Juliane Fluck, and Roman Klinger. 2014. Detecting miRNA mentions and relations in biomedical literature. F1000Research, 3.
Ahmed S Bayoumi, Amer Sayed, Zuzana Broskova, Jian-Peng Teoh, James Wilson, Huabo Su, Yao-Liang Tang, and Il-man Kim. 2016. Crosstalk between long noncoding RNAs and microRNAs in health and disease. International journal of molecular sciences, 17(3):356.
Fabian Benz, Sanchari Roy, Christian Trautwein, Christoph Roderburg, and Tom Luedde. 2016. Circulating microRNAs as biomarkers for sepsis. International journal of molecular sciences, 17(1):78.
M Bhaskaran and M Mohan. 2014. MicroRNAs history, biogenesis, and their evolving role in animal development and disease. Veterinary Pathology Online, 51(4):759–774.
HC Cho, N Okazaki, M Miwa, and J Tsujii. 2010. NERsuite: a named entity recognition toolkit. Tsujii Laboratory, Department of Information Science, University of Tokyo, Tokyo, Japan.
Bowen Du, Zhe Wang, Xin Zhang, Shipeng Feng, Guoxin Wang, Jianxing He, and Biliang Zhang. 2014. MicroRNA-545 suppresses cell proliferation by targeting cyclin D1 and CDK4 in lung cancer cells. PLoS ONE, 9(2):e88022.
Sam Griffiths-Jones, Russell J Grocock, Stijn Van Dongen, Alex Bateman, and Anton J Enright. 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic acids research, 34(suppl 1):D140–D144.
Sam Griffiths-Jones. 2004. The microRNA registry. Nucleic acids research, 32(suppl 1):D109–D111.
William Hersh, Ravi Teja Bhupatiraju, and Sarah Corley. 2004. Enhancing access to the bibliome: the TREC genomics track. Medinfo, 11(Pt 2):773–777.
Lynette Hirschman, Alexander Yeh, Christian Blaschke, and Alfonso Valencia. 2005. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC bioinformatics, 6(Suppl 1):S1.
Lawrence Hunter and K Bretonnel Cohen. 2006. Biomedical language processing: what's beyond PubMed? Molecular cell, 21(5):589–594.
Bernd Johannes, Vyacheslav Salnitski, Alexander Dudukin, Lev Shevchenko, and Sergey Bronnikov. 2016. Performance assessment in the pilot experiment on board space stations Mir and ISS. Aerospace medicine and human performance, 87(6):534–544.
Jin-Dong Kim, Tomoko Ohta, Yoshimasa Tsuruoka, Yuka Tateisi, and Nigel Collier. 2004. Introduction to the bio-entity recognition task at JNLPBA. In Proceedings of the international joint workshop on natural language processing in biomedicine and its applications, pages 70–75. Association for Computational Linguistics.
Martin Krallinger, Obdulia Rabal, Florian Leitner, Miguel Vazquez, David Salgado, Zhiyong Lu, Robert Leaman, Yanan Lu, Donghong Ji, Daniel M Lowe, et al. 2015. The CHEMDNER corpus of chemicals and drugs and its annotation principles. Journal of cheminformatics, 7(S1):1–17.
Erika Larrea, Carla Sole, Lorea Manterola, Ibai Goicoechea, María Armesto, María Arestin, María M Caffarel, Angela M Araujo, María Araiz, Marta Fernandez-Mercado, et al. 2016. New concepts in cancer biomarkers: Circulating miRNAs in liquid biopsies. International journal of molecular sciences, 17(5):627.
Gang Li, Karen E Ross, Cecilia N Arighi, Yifan Peng, Cathy H Wu, and K Vijay-Shanker. 2015. miRTex: A text mining system for miRNA-gene relation extraction. PLoS Comput Biol, 11(9):e1004391.
Regina Lin, Ling Chen, Gang Chen, Chunyan Hu, Shan Jiang, Jose Sevilla, Ying Wan, John H Sampson, Bo Zhu, and Qi-Jing Li. 2014. Targeting miR-23a in CD8+ cytotoxic T lymphocytes prevents tumor-dependent immunosuppression. The Journal of clinical investigation, 124(12):5352–5367.
Yiping Mao, Ramkumar Mohan, Shungang Zhang, and Xiaoqing Tang. 2013. MicroRNAs as pharmacological targets in diabetes. Pharmacological research, 75:37–47.
Blake C Meyers, Michael J Axtell, Bonnie Bartel, David P Bartel, David Baulcombe, John L Bowman, Xiaofeng Cao, James C Carrington, Xuemei Chen, Pamela J Green, et al. 2008. Criteria for annotation of plant microRNAs. The Plant Cell, 20(12):3186–3190.
Nadya Morozova, Andrei Zinovyev, Nora Nonne, Linda-Louise Pritchard, Alexander N Gorban, and Annick Harel-Bellan. 2012. Kinetic signatures of microRNA modes of action. RNA, 18(9):1635–1655.
Haroon Naeem, Robert Küffner, Gergely Csaba, and Ralf Zimmer. 2010. miRSel: automated extraction of associations between microRNAs and genes from the biomedical literature. BMC bioinformatics, 11(1):135.
Mariana Neves. 2014. An analysis on the entity annotations in biological corpora. F1000Research, 3.
Masahisa Ohtsuka, Hui Ling, Yuichiro Doki, Masaki Mori, and George Adrian Calin. 2015. MicroRNA processing and human cancer. Journal of clinical medicine, 4(8):1651–1667.
Naoaki Okazaki. 2007. CRFsuite: a fast implementation of conditional random fields (CRFs).
Valentina Pileczki, Roxana Cojocneanu-Petric, Mahafarin Maralani, Ioana Berindan Neagoe, and Robert Sandulescu. 2016. MicroRNAs as regulators of apoptosis mechanisms in cancer. Clujul Medical, 89(1):50.
Aileen I Pogue, James M Hill, and Walter J Lukiw. 2014. MicroRNA (miRNA): sequence and stability, viroid-like properties, and disease association in the CNS. Brain research, 1584:73–79.
Mercedes Rubio, Quique Bassat, Xavier Estivill, and Alfredo Mayor. 2016. Tying malaria and microRNAs: from the biology to future diagnostic perspectives. Malaria journal, 15(1):1.
Tanya Smith, Cha Rajakaruna, Massimo Caputo, and Costanza Emanueli. 2015. MicroRNAs in congenital heart disease. Annals of translational medicine, 3(21).
Anna Stroynowska-Czerwinska, Agnieszka Fiszer, and Wlodzimierz J Krzyzosiak. 2014. The panorama of miRNA-mediated mechanisms in mammalian cells. Cellular and Molecular Life Sciences, 71(12):2253–2270.
Boya Xie, Qin Ding, Hongjin Han, and Di Wu. 2013. miRCancer: a microRNA–cancer association database constructed by text mining on literature. Bioinformatics, page btt014.
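
As a supplementary illustration of the evaluation described in Sections 3.1 and 4, the sketch below computes precision, recall and F1 when only perfect mention matches count as correct. It is not the scoring script used for the paper: the function names and the representation of a mention as a `(document id, start offset, end offset)` tuple are assumptions, and the example spans are invented.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact_match_scores(gold, predicted):
    """Score predictions against gold mentions, counting only identical spans.

    Each mention is a hypothetical (doc_id, start, end) tuple.
    Returns precision, recall and F1 as percentages.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = 100 * true_positives / len(predicted) if predicted else 0.0
    recall = 100 * true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Invented toy example: the second prediction misses the right boundary,
# so it counts both as a false positive and as a missed gold mention.
gold = [("doc1", 4, 11), ("doc1", 30, 38), ("doc2", 0, 7)]
predicted = [("doc1", 4, 11), ("doc1", 30, 35), ("doc2", 0, 7)]
print(exact_match_scores(gold, predicted))
```

Applying `f1_score` to the rounded precision and recall values in Tables 2 and 3, and to the agreement scores in Section 3.1, gives F-scores within 0.01 of those reported.
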