=Paper=
{{Paper
|id=Vol-2469/ERForum3
|storemode=property
|title=Integration and Analysis of Clinical and Genomic Data of Neuroblastoma applying Conceptual Modeling
|pdfUrl=https://ceur-ws.org/Vol-2469/ERForum3.pdf
|volume=Vol-2469
|authors=Sipan Arevshatyan,José Fabián Reyes Román,Verónica Burriel,Adela Cañete,Victoria Castel,Óscar Pastor
|dblpUrl=https://dblp.org/rec/conf/er/ArevshatyanRBCC19
}}
==Integration and Analysis of Clinical and Genomic Data of Neuroblastoma applying Conceptual Modeling==
Integration and Analysis of Clinical and Genomic Data of Neuroblastoma applying Conceptual Modeling Sipan Arevshatyan1[0000-0001-8718-2211], José Fabián Reyes Román1[0000-0002-9598-1301], Verónica Burriel2, Adela Cañete3, Victoria Castel3, and Óscar Pastor1[0000-0002-1320- 8471] 1 PROS Research Center, Universitat Politècnica de València, Camino Vera s/n. 46022, Valencia, Spain 2 Department of Information and Computing Sciences, Utrecht University, Domplein 29, 3512 JE, Utrecht, The Netherlands 3 Pediatric Oncology Unit of Hospital Universitari i Politècnic La Fe, Valencia, Spain siar5@doctor.upv.es, {jreyes|opastor}@pros.upv.es, v.burriel@uu.nl, {canyete_ade|castel_vic}@gva.es Abstract. Data management and analysis for risk assessment of rare and complex diseases such as Neuroblastoma require efficient management of multidisciplinary data. Recent advances in genomic testing are revealing new publicly available data whose storage and analysis with clinical and genomic data is becoming a big challenge. The use of Conceptual Modeling (CM) techniques helps to define and structure the Neuroblastoma domain, which serves as a basis to determine the information required for diagnosing the disease. It is important to highlight that a Genomic Information System (GeIS) based on a conceptual model allows improving the adaptation of new requirements of the domain, and greatly simplifies the integration and management of heterogeneous and homogeneous data. The main objectives of this work are: i) to present a Conceptual Model of Neuroblastoma (CMN), which defines all elements involved in the clinical and genomic domain. ii) to apply the SILE method, in order to obtain all (clinically) relevant variations associated with Neuroblastoma from genomic data sources. The developed GeIS is intended to make the correct exploitation of the validated data set to provide an early and efficient risk assessment for patients with Neuroblastoma. Keywords: Neuroblastoma, CM, GeIS, CMN, SILE Method, PM 1 Introduction Since the first complete human genome was published in 2003, there has been an astonishing progress regarding speed and cost. This task took 13 years and an approximate expense of $3 billion [1]. With the barrier of $1000 per genome Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 29 already broken, data acquirement is no longer a challenge for Next-Generation Sequencing (NGS) technologies. The storing, managing and making sense of the data has become the main issue since it requires a broad spectrum of specialists including IT, computational biologists, genetic counsellors or pathologists [2]. Once the raw sequence data is obtained, it is aligned to the reference sequence. According to a recent study performed on over 2,500 individuals, everyone differs at 4.1 to 5 million sites from that reference genome [3]. The combination of these genetic variations (or variants), known as “genotype”, together with environmental factors, determines its host physical traits (known as “phenotype”). The main goal of genomic medicine is to understand the genotype-phenotype relationships. Considering that the reference sequence does not codify for any serious condition (which is not clear is completely true [4]), the genetic driver of a disease can be found among the group of differences between a patient’s genome and the reference one. Therefore, Whole Genome Sequencing also known as WGS, may assist in differential diagnosis [5]. Intensive research has aimed to reveal associations between genetic variations and diseases through Genome-Wide Association Studies (GWAS) [6]. These associations are especially interesting in rare diseases in which hereditary studies are hindered by the high lethality in early childhood. Representing more than 7% of all pediatric cancers and causing around 15% oncology deaths, rarity and early lethality unfortunately fit with Neuroblastoma’s definition. Neuroblastoma is the most common extra-cranial solid tumor in childhood. It usually appears within the abdomen, neck, chest or pelvis, and its symptoms depend on the tumor’s location [7]. The variability of clinical presentation and likelihood of cure are examples of one of the most remarkable hallmarks of Neuroblastoma: its clinical heterogeneity. Although some tumors undergo spontaneous regression, others progress despite aggressive therapy [8]. This disparate clinical behavior has been shown to be related to biological factors such as age at diagnosis, tumor histology or genetic aberrations. For many years, research groups have used different factors in order to stratify patients for risk- based clinical trials. These differences prevented results from being compared. To ease that analysis, several international efforts have resulted in standards for patient classification and staging, such as the International Neuroblastoma Staging System (INSS) [9], International Neuroblastoma Risk Group Staging System (INRGSS) [10] and International Neuroblastoma Risk Group (INRG) classification System [11]. The INSS consists of six stages (1, 2a, 2b, 3, 4 and 4S) according to the degree of surgical resection. Patients are assigned a stage post-surgery. Since that fact makes INSS not suitable for pre-treatment risk-based classification, a new INRGSS was defined. In contrast to INSS, INRGSS classifies patients in four stages (L1, L2, M and MS) [9] depending on clinical criteria and image-defined risk factors. The INRGSS together with other genetic features as tumor ploidy, chromosome 11q aberration, and “v-myc avian myelocytomatosis viral oncogene neuroblastoma derived homolog” (MYCN) amplification are considered when assigning patients to its corresponding INRG Classification System risk group. 30 The variations in the number and size of chromosomes as well as in the number of MYCN repetitions are not the only genetic aberrations associated with Neuroblastoma development and outcome. Also, smaller variations like substitutions in genes such as “anaplastic lymphoma kinase” (ALK) [12], “pairedlike homeobox 2b” (PHOX2B) [13] and “kinesin family member 1B” (KIF1B) [14] are shown to be related to the hereditary disease. Similarly, variations in regions between genes have been found to affect Neuroblastoma progression. Moreover, they have a cumulative effect allowing to assign patients to different risk-based groups, depending on the type of variations they carry. These variations are normally presented in scientific publications. Although many of them remain locked in unstructured sources, some researchers have focused on targeting genetic variations in order to transfer them into curated clinical databases [15]. Some of these text mining efforts have successfully located genetic variations within papers and loaded into databases (e.g., ClinVar 1) with clinically relevant information. Scientists also have the option to directly load the variations they find in these databases or many others with different backgrounds. ClinVar and Human Gene Mutation Database (HGMD, www.hgmd.cf.ac.uk/) are examples of clinical repositories gathering information about the effect that genetic variations have in humans, independently of the population or possible treatments for patients who carry them. Drugs targeting specific variations can be obtained by browsing the Comparative Toxic genomics Database (CTD, ctdbase.org/). On the other hand, the variation frequencies in different populations can be found in dbSNP 2, together with a small portion of the sequence containing the variation. The complete sequence of genes is stored in databases such as Nucleotide 3 or Ensembl (https://www.ensembl.org/). These archives sometimes contain many sequences for a single gene. To facilitate study comparisons a Reference Sequence (RefSeq, https://www.ncbi.nlm.nih.gov/refseq/) database was created which stores a unique sequence per gene. This is only a small selection of databases showing the existing heterogeneity among the so-called “genomic chaos”. As seen above, valuable information in the decision-making process following a genetic test is dispersed over several sources. The same variation can be stored in different databases, which sometimes offers the same information. That fact leads in the best of the cases to redundancy. The data does not always match though; incongruities arise in these cases which can compromise the patient’s response to the prescribed treatment [16]. The main objective of the research is to create a conceptual model of Neuroblastoma which will integrate clinical and genomic data. Applying CM on the genomic domain is highly helpful since it allows to define it accurately, hence creating a robust structure from which data can be efficiently managed. According to Olivé [17] the activity which ultimately defines the general knowledge on which an Information System (IS) works is what we know as CM. Since an IS 1 https://www.ncbi.nlm.nih.gov/clinvar/ 2 https://www.ncbi.nlm.nih.gov/snp/ 3 https://www.ncbi.nlm.nih.gov/nucleotide/ 31 built upon a non-described knowledge is considered to be unpredictable, the obtaining of such a description is the main aim of CM. Once a genomic conceptual model is created, it is possible to design Genomic Information Systems (GeIS) [18], which play a key role in the biomedical domain. Taking into consideration the heterogeneity, dispersion and redundancy which characterize the genomic domain, the use of conceptual models structuring its basic features is of great use in the way towards Precision Medicine (PM) [19]. Personalizing the treatment, depending on the patient’s genome and environment, will only be possible if there is a clear definition of the domain and a robust GeIS able to integrate and manage multidisciplinary data. To achieve our goal following this research line, in Section 2 we firstly give a brief view of the current state of CM in the biological domain, as well as the contemporary condition of Neuroblastoma research. A Conceptual Model of Neuroblastoma (CMN) and an online tool built on it are introduced in Section 3. Section 4 shows the process of searching and identifying relevant genetic variations affecting Neuroblastoma development, and how to load them into our database, which will allow data exploitation. Finally, the conclusions and future works are presented in Section 5. 2 State of the Art We study the state of the art in two different fields. Firstly, we describe how CM is applied to the genomic domain in order to provide deep knowledge to generate the CMN, and design and develop a system adapted to the needs of Pediatric Oncology department of the Hospital Universitari i Politècnic La Fe (HUP/IIS La Fe). Secondly, we study the Neuroblastoma’s domain and the available databases in use by the hospital. This was an important step of the research in order to figure it out how to create a conceptual model adapted to the needs of the department and additionally, implement advanced data management techniques in order to improve decision making processes. 2.1 Conceptual Modeling in Biology Despite the large amount of publicly available genomic data repositories, it is not usual to find stable conceptual models underlying them. Since most of the accessible archives require storage improvements, CM in the biological domain has been used over the years. Initial proposals in the field were introduced by Paton [20], who presented several models defining cellular genetic components, products and their interactions. The work developed by Ram [21] was focused on the protein context and proved conceptual modeling usefulness in searching and comparing large volumes of 3D structural data. CM is not only an approach to 32 describe and represent a specific domain but also aids in software production [34]. The models have been already used in bioinformatics. By applying this method Garwood et al. [22] created user interfaces for querying biological data repositories. Considering data archives designed so that variations associated to diseases are stored, a proper data management may assist in generating a diagnosis and recommending personalized treatments. Keeping in mind the principles of PM, this research group 4 developed GeIS in order to manage some diseases like usher syndrome [23], breast cancer [24], alcohol sensitivity [18], among others; based on its respective domain conceptualization. More generally, with the aim of creating a standardized vocabulary and unequivocally defining the genomic domain, the Conceptual Model of the Human Genome (CMHG) [25-26] was created representing all relevant information in the genomic domain as a whole. This model led to the development of a Human Genome Database (HGDB) [26], which can be exploited by using “GenesLove.Me” (see more information in [27]), an internally matured tool believed to be of great use in genetic diagnostics. The use of CM is gaining momentum as a software development approach in the medical domain since it greatly improves geneticists’, lab scientists’ and physicians’ work. Through the creation of GeIS, diagnostic tools can be developed to allow the creation of accurate diagnosis once the right information is identified and collected. 2.2 Neuroblastoma Given the relative rarity of Neuroblastoma, small volumes of data are readily available. Its correct management is then paramount in order to properly understand the molecular basis of the disease. To make Neuroblastoma data widely available and thus facilitate international, multi-disciplinary research, an INRG database [28] was created with the specific aims of enabling complex queries and linking their results to biological databanks and publicly available datasets. In 2013, after two years of work and when the most recent progress report was published, this database contained a total of thirty-four biological metrics on approximately 11,000 patients, including tumor stage, MYCN status, tissue availability, race, ethnicity or site of relapse [28]. Since several common SNP (Single Nucleotide Polymorphism) alleles have been demonstrated to be involved in Neuroblastoma tumorigenicity [29-30] genomic data stored in the database of Genotypes and Phenotypes (dbGaP, https://www.ncbi.nlm.nih.gov/gap/) the NGS and TARGET 5 data were planned to be added to INRG database. The INRG international collaboration intends to compensate the relative lack of research focused on Neuroblastoma. This shortage was noticed after exhaustive 4 Research Center on Software Production Methods (PROS), http://www.pros.webs.upv.es 5 Therapeutically Applicable Research to Generate Effective Treatments, https://ocg.cancer.gov/programs/target 33 searches for scientific information prior to the project in collaboration with the Pediatric Oncology Unit of HUP/IIS La Fe (http://www.hospital-lafe.com/) in Valencia (Spain). Nowadays, the information related to Neuroblastoma available in databases is mainly clinical. The genetic tests performed on tumor cells in order to diagnose the disease analyses only a few variations for which there is a known treatment. This catalogue of druggable genetic variations is growing as new therapeutic targets are discovered and innovative genetic tests are developed. The information storage and analysis are becoming a challenge since it needs to meet the new requirements. This work is an effort to unify, in a single database, clinical and genetic information with the aim of assessing Neuroblastoma patient’s risk. Building a GeIS on a database gathering interdisciplinary information will allow unveiling genetic patterns within Neuroblastoma cohorts. Together with recent technological advances in molecular testing, and also in the computer sciences (e.g., applying Data Sciences, Artificial Intelligence, and Big Data techniques), could help and improve our understanding about genetic basis of the disease, and would ultimately lead to an efficient diagnosis. A precise diagnosis will in turn assist in the process of developing more effective targeted therapies. 3 Domain Definition The CM and ontological characterization of basic features making up a specific environment assure consistency, correction and an efficient exploitation of its specialized datasets. This is especially true for those repositories designed to store complex data coming from heterogeneous sources. Neuroblastoma represents an excellent example of a disease whose treatment choice, application and traceability require advanced technologies able to gather and manage data coming from many different fields, ranging from histological tests to nuclear medicine. The users of these technologies fit quite different profiles encompassing pathologists, geneticists, lab assistants or bioinformaticians. A precise definition of the domain is highly advantageous in order to build applications able to store valuable information for each of them separately and, most important, to obtain worthwhile conclusions from its aggregated analysis. For that reason, the CMN (Fig. 1) was created in collaboration with experts of the Clinical and Translational Cancer Research Group (GICT-Cancer) from HUP/IIS La Fe. This CMN is the result of iterative meetings and discussions with the experts of the Research Group. Through the discussions and analyzes carried out, the CMN was generated, which facilitates the understanding of the domain of Pediatric Oncology. 34 Fig. 1. The Conceptual Model of Neuroblastoma (CMN) 35 Next, the description of the CMN is presented, which contains all the elements of the previously defined domain. In the model the “Patient” class is characterized by attributes such as “name”, “NRG”, “date of birth” and “date of diagnosis”, “age” or “weight” and could be considered the center of the model. This class represents information about individuals attending a specific healthcare institution, represented in the model by the “Hospital” class. To start with, the patient is assigned a “Status” class, which is divided into “Deceased”, “Relapse” and “Disease Free” classes respectively. Given the Neuroblastoma heterogeneity, different diagnostic tests are performed depending on biological and genetic factors. Several standard procedures regulating tests were developed, which are represented by “Protocol” class. The disease development stages are also dependent on the protocol the patient is assigned to. They are represented by “Phase” class, whose intersection with patients in a specific phase results in the “Episode” class. The former class refers to tests performed, diagnosis arising from the tests and the treatment applied to the patient. One or several tests can be carried out during a single episode of the disease so its progression can be tracked. Since several tests coexist, with both common and unique attributes, “Test” class is divided into “Histological”, “Radiological”, “Nuclear Medicine”, “Biological” and “Minimal Residual Disease”. The “Histological” class has two classes which inherit from it. These classes are “Histological Marrow” class and “Histological Tumor” class, depending on the analyzed tissue. The tumor tissue can be used for genetic analysis, represented in the model by “Genetic Test” class. The results from these tests are referred by “Genetic Result”. All the previous analysis is performed in order to characterize the “Tumor” class, whose size, location and possible dissemination (i.e., “Metastasis”) are recorded. The tumor features together with other significant test results assist in generating the “Diagnosis”, which ultimately determines the prescribed “Treatment”. Since there are several treatment options, this entity is specialized in the classes “Chemotherapy”, “Surgery”, “Transplant Aut”, “Radiotherapy”, “MIBG” and “Maintenance”. Some of these treatments involve toxic substances whose effects need to be tracked. For that reason, the “Toxicity” class was created which refers to the toxicity level. The “Genetic Results” class has a relationship with the “Variation” which represents every genomic variation in that patient. If those variations have been reported in scientific literature and are stored in a public genomic database, a relation will be defined with “External DB” class. A “Variation” is related with the "Chromosome" in which it is located, and also it could be located inside a “Gene”. The relationship between the “Gene” and “Chromosome” classes shows the chromosome where the gene is located. The “Variation” class is divided into “Precise” and “Imprecise” classes to specify if the position of the mutation is known. In particular “Deletion”, “Insertion” and “Inversion” classes show the position of the deletion, inversion and insertion of nucleotide sequence in the DNA sequence of the chromosome while the “Indel” class represents consistent variations in insertions and deletions at the same time in the DNA sequence of the chromosome. In contrast to “Precise” class the “Imprecise” does not contain the position as it is unknown. This class could be 36 specialized in 2 subclasses to define two kinds of imprecise variations, such as changes in the number of copies of certain DNA fragments (also called Copy Number Variation) - “CNV” class – and big changes affecting the structure of the chromosome – “Structural” class. Besides that, the "Amplification" class represents an increase in the expression of a specific "Gene", which is represented by a relationship between these two classes. This model allows non-experts to understand the environment and easily use tools built on it. In particular, it was used as a basis to design an online tool meant for the management of clinical and genomic data collected from patients with Neuroblastoma. Furthermore, its structure does not limit its extension but leaves it open to evolution by adding possible diagnostic tests, drug targets or innovative treatments. Several technologies were used in the development process of the web-based tool (prototype) aimed for Neuroblastoma clinical and genomic data management. Among these technologies we can highlight, Java (https://www.java.com/en/) using Java Persistence API 6 for the database loading process. JavaScript (https://www.javascript.com/) was used in the development of the tool itself. Specifically, the backend was designed with Node.js (https://nodejs.org/en/), Jade (https://jade.tilab.com/) while Bootstrap (https://getbootstrap.com/), jQuery (https://jquery.com/) were used for frontend development. An analysis of the available data storage technologies was carried out in order to determine the most appropriate one based on the type, quantity and subsequent use of the data to be stored. In this stage, the relational SQL technology was selected. However, in the future in case of the expansion of the data and according to the needs of the doctors the data could be migrated to NoSQL technologies (e.g. MongoDB or Neo4j among others). This prototype is based on the conceptual model mentioned above and currently is being tested by the experts of the GICT-Cancer from HUP/IIS La Fe. In this stage, the prototype is being checked for errors, bugs with the aim of improving the prototype according to the needs of the doctors. 4 Applying the SILE Method to Neuroblastoma In order to assess Neuroblastoma risk from NGS data, the SILE (Search- Identification-Load-Exploitation) method was followed [31]. Its main goal is to systematize the search and identification of genomic information to be loaded, analyzed and exploited by a GeIS based on the CMHG [26]. A summary of the activities taking place at each level of the method is defined in Table 1. As previously stated, Neuroblastoma has been associated to/with a wide variety of genetic variations by means of different study methods. Since risk assessment in Neuroblastoma does not only rely on biological issues but also on clinical and 6 https://www.oracle.com/technetwork/java/javaee/tech/persistence-jsp-140049.html 37 genetic features, the SILE method [35] represents a good choice to unify and efficiently use all the available information. Table 1. Description of each level of the SILE method [31] Level Description (S) Search Determination of the information context, required to solve a concrete need, as well as the selection of data source from which to extract information (I) Identification Determination of a reliable and relevant dataset to be used to populate a database which structure is delimited by the CSHG (L) Load Population of the database with the data identified in the previous level (E) Exploitation Extraction of knowledge form the database by using tools to analyse and interpret genomic data 4.1 Results obtained from the SILE method • Search. An exhaustive research on existing integrative databases was carried out in order to study other group’s strategies (e.g., where to obtain variations from and how to properly annotate them). DisGeNET (www.disgenet.org/) was the major finding of this intensive search. Its creators define it as a discovery platform which integrates information on gene-disease associations (GDAs) from public data archives and the literature. Interestingly, DisGeNET classifies data as “Curated”, “Predicted” and “All” depending on the original source it comes from. We also have used ClinVar and dbGaP genomic repositories to find that variations [32]. • Identification. In this stage, the data which was collected from different data sources presented in the first stage of the SILE method, was analyzed in order to remove possible redundancies and other quality issues. The main aim of this step is to prepare the data for loading into the database. At the beginning there were 996 variations collected from different databases. After the Identification stage the complete validated dataset consisted of 375 clinically relevant variations annotated in order to allow a GeIS to efficiently locate and display the data [32-33]. Furthermore, search and identification processes allowed us to spot several data quality issues such as redundancy or inconsistency which must be solved before loading process. • Load. A Database of Neuroblastoma (DBN) was developed in our research group in order to efficiently store the clinical and genomic data. It was based on the CMN mentioned above (Fig. 1). Both the DBN and CMN were developed with the aim of defining all features related to research, diagnose and treatment of Neuroblastoma and creating a strong structure on which a GeIS has been developed. Based on the GeIS previous studies, Neuroblastoma data was loaded into the DBN so that it 38 could be exploited using an online tool, the GeIS, which is presented in the “Exploitation” section. • Exploitation. The exploitation will be carried out through the prototype (Fig. 2) developed for the GICT-Cancer from HUP/IIS La Fe. After the testing of the prototype by the experts it will be available for clinicians to use. The exploitation of the prototype will ultimately lead to a genetic risk assessment based on validated evidences available on public resources. Fig. 2. The prototype developed to integrate and analyse clinical and genomic data The design and the development of the prototype are based on emphasizing the efficiency, usability, and security aspects (e.g., the prototype will take into account all the current regulations of Data Protection Law 7). A Web-based approach is the selected strategy to design the application as it has some well-reported advantages for the considered working domain. The web-based software gives the user flexible access to the all necessary tools or required data. Although Model-Driven Engineering (MDE) tools can generate software code automatically by using the model as an artifact we used model-based archetypes [36] to specify user interaction strategies. Archetypes represent different views offered to the users. In order to design the interface of the application, some archetypes have been created based on the CM. The prototype stores and manages patient’s demographic information, episode description, complementary information, treatments, pathological and genomic 7 https://www.boe.es/buscar/doc.php?id=BOE-A-2018-16673 39 information. Additionally, the prototype has an analysis section where the clinicians can define certain queries and get information from the GeIS. 5 Conclusions and Further Work The main result of this work was the definition of the CMN, with the aim of making the disease widely understandable for any personal profile involved in its diagnosis, treatment and traceability process. The thorough analysis of the domain allowed us to spot which information was valuable for the disease diagnosis, its availability and current challenges regarding its management. In order to overcome the contemporary difficulties, a GeIS based on the CMN was built which was capable of efficiently managing Neuroblastoma-related clinical and genetic data. The implementation of the SILE method allowed to define a validated dataset of variations associated with Neuroblastoma, this group of variations was selectively loaded into our DBN, and finally the exploitation was carried out through the developed prototype, which is currently being tested. The construction of the CMN has been proved to be highly convenient in assisting the development of the GeIS since it provides a robust structure supporting a correct data organization. Besides that, the model is also flexible in the way it could easily adapt to new diagnostic tests or treatment strategies, hence growing as Neuroblastoma related knowledge does. Although only curated databases have been browsed in order to obtain the validated set of variations, other databases might be useful for doctors in their decision-making process. Following this research line, the CMN will be extended by characterizing new "concepts" or "knowledge" involved in the domain and adapting the developed prototype according to the new requirements. This study can serve as a basis or prototype for the risk assessment of many other genetic conditions. In the future it is foreseen to provide an IS that can be used in the Hospitals of the Community of Valencia (and even in the country) where all the studies and analyzes on Neuroblastoma can help researchers to continue advancing in the treatment and monitoring of this disease. It is important to highlight that the use of CM is very beneficial and efficient for the construction of GeISs, providing a strong structure for the data they are meant to manage, and capable of adapting heterogeneous, disperse, redundant and changing environment it ultimately defines. Acknowledgment The authors would like to thank the members of the PROS Research Center Genome group for the fruitful discussions regarding the application of CM in the medicine field. In addition, we would like to thank Dra. Desireé Ramal, Dra. Vanessa Segura, Dra. Blanca Martínez, and Dra. Yania Yáñez as experts in Neuroblastoma for their contribution to this research. This work was supported by the Spanish Ministry of Science and Innovation through Project DataME (ref: TIN2016-80811-P), the Preparatory Action - UPV / IIS La Fe (C07-ClinGenNBL, 40 2018) and the Generalitat Valenciana through project GISPRO (PROMETEO/2018/176). References 1. Van Dijk, E.L. et al.: “Ten years of next-generation sequencing technology”, Trends Genet., vol. 30, pp. 418-426 (2014). 2. Mardis, E.R.: “The $1,000 genome, the $100,000 analysis?”, Genome Med., DOI: 10.1186/gm205 (2010). 3. Auton, A., Abecasis, G.: The 1000 Genomes Project Consortium, “A global reference for human genetic variation”, Nature, vol. 526, pp. 68-74 (2015). 4. Chen, R., Butte, A.J.: “The reference human genome demonstrates high risk of type 1 diabetes and other disorders”, Pac Symp Biocomput., pp. 231-242, Jan. 2011. 5. Gonzaga-Jauregui C., Lupski J.R., Gibbs R.A.: “Human Genome Sequencing in Health and Disease”, Annu Rev Med, vol. 63, pp. 35-61 (2012). 6. Li, X. et al.: “Genome-wide association study identifies four SNPs associated with response to platinum-based neoadjuvant chemotherapy for cervical cancer”, Sci Rep, vol. 7, DOI: 10.1038/srep41103 (2017). 7. Maris, J.M.: “Recent advances in neuroblastoma”, N Engl J Med., vol. 362(23), pp. 2202-2211 (2010). 8. Cao, Y. et al.: “Research progress of neuroblastoma related gene variations”, Oncotarget, vol. 5, DOI: 10.18632/oncotarget.14408 (2016). 9. Monclair, T. et al.: “The International Neuroblastoma Risk Group (INRG) staging system: an INRG Task Force report”, J Clin Oncol., vol. 27, pp. 289-297 (2009) 10. Cohn, S.L. et al.: “The International Neuroblastoma Risk Group (INRG) classification system: An INRG Task Force Report”, J Clin Oncol., vol. 27, pp. 289-297(2009). 11. Castel, V. et al.: “Prospective evaluation of the International Neuroblastoma Staging System (INSS) and the International Neuroblastoma Response Criteria (INRC) in a multicentre setting”, Eur J Cancer., vol. 35, pp. 606-611 (1999). 12. Bourdeaut, F. et al.: “ALK germline mutations in patients with neuroblastoma: a rare and weakly penetrant syndrome”, Eur J Hum Genet., vol. 20, pp. 291-297 (2012). 13. Van Limpt, V. et al.: “The Phox2B homeobox gene is mutated in sporadic neuroblastomas”, Oncogene, vol. 23, pp. 9280-9288 (2004). 14. Yeh, I-T. et al.: “A germline mutation of the KIF1Bb gene on 1p36 in a family with neural and nonneural tumors”, Hum Genet, vol. 124, pp. 279-285 (2008). 15. Jimeno Yepes, A., Verspoor, K.: “Literature mining of genetic variants for curation: quantifying the importance of supplementary material”, Database (Oxford). DOI: 10.1093/database/bau003 (2014). 16. León, A. et al.: “Data Quality problems when integrating genomic information”. Conceptual Modeling (ER2016): 35th International Conference, 3rd. Workshop Quality of Models and Models of Quality (QMMQ 2016), pp. 173-182 (2016) 17. Olivé, A.: “Conceptual Modeling of Information Systems”, 1st ed, Springer-Verlag (2007). 18. Reyes Román, J.F., Pastor, Ó.: Use of GeIS for early diagnosis of alcohol sensitivity. In Proceedings of the International Joint Conference on Biomedical Engineering Systems and Technologies (pp. 284-289). SCITEPRESS-Science and Technology Publications, Lda (2016). 41 19. Roden, D.M., Tyndale, R.F., “Genomic Medicine, Precision Medicine, Personalized Medicine: What’s in a Name?” Clin Pharmacol Ther., vol. 94, pp. 169-172 (2013). 20. Paton, N. W. et al.: “Conceptual Modelling of Genomic Information”, Bioinformatics., vol. 16, pp. 548-557 (2000). 21. Ram, S. Wei, W.: “Modeling the semantics of 3D protein structures”, In Conceptual Modeling-ER 2004, Springer Berlin Heidelberg, pp. 696- 708 (2004). 22. Garwood, K. et al.: “Model-driven user interfaces for bioinformatics data resources: regenerating the wheel as an alternative to reinventing it,” BMC bioinformatics., vol. 7, p. 532 (2006). 23. Burriel, V. et al.: “Design and Development of an Information System to Manage Clinical Data about Usher Syndrome Based on Conceptual Modeling”, BIOTECHNO (2013). 24. Burriel, V., Pastor, Ó.: “Conceptual Schema of Breast Cancer: the background to design an efficient information system to manage data from diagnosis and treatment of breast cancer patients”, IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) (2014). 25. Reyes Román, J.F., Pastor, Ó., Casamayor, J.C., Valverde, F.: Applying conceptual modeling to better understand the human genome. In International Conference on Conceptual Modeling (pp. 404-412). Springer, Cham (2016). 26. Reyes Román, J.F.: Diseño y Desarrollo de un Sistema de Información Genómica Basado en un Modelo Conceptual Holístico del Genoma Humano. PhD Thesis, Universitat Politècnica de València. https://riunet.upv.es/handle/10251/99565 (2018). 27. Reyes Román, J.F., García, A., Rueda, U., Pastor, Ó.: GenesLove.Me 2.0: Improving the Prioritization of Genetic Variations. In: Damiani E., Spanoudakis G., Maciaszek L. (eds) Evaluation of Novel Approaches to Software Engineering. ENASE 2018. Communications in Computer and Information Science, vol 1023. Springer (2019). 28. Cohn, S.L.: “The interactive International Neuroblastoma Risk Group (INRG) database (db) 2nd year progress report” (2013). 29. Maris, J.M. et al.: “A genome-wide association study identifies a susceptibility locus to clinically aggressive neuroblastoma at 6p22”, N Engl J Med. (358), pp. 2585-2593 (2008). 30. Capasso, M. et al.: “Replication of GWAS-identified neuroblastoma risk loci strengthens the role of BARD1 and affirms the cumulative effect of genetic variations on disease susceptibility”, Carcinogenesis., vol. 34, pp. 605-611 (2013). 31. León, A., Pastor, Ó.: Smart Data for Genomic Information Systems: the SILE Method. Complex Systems Informatics and Modeling Quarterly, (17), pp.1-23 (2018). 32. Burriel, V. et al.: GeIS based on conceptual models for the risk assessment of neuroblastoma. In 2017 11th RCIS (pp. 451-452). IEEE (2017). 33. Soler, C.: Diseño de un sistema de información genómica para el diagnóstico del Neuroblastoma. https://riunet.upv.es/handle/10251/85716, Bachelor degree Project (2017). 34. Pastor, O., Molina, J.C.: Model-driven architecture in practice: a software production environment based on conceptual modeling. Springer Science & Business Media (2007). 35. León, A., Pastor, Ó.: From big data to smart data: a genomic information systems perspective. In 2018 12th RCIS, pp. 1-11. IEEE (2018). 36. Burriel, V.: Diseño y Desarrollo de un Sistema de Información para la Gestión de Información sobre Cáncer de Mama. PhD Thesis, Universitat Politècnica de València. https://riunet.upv.es/handle/10251/86158 (2017).