=Paper=
{{Paper
|id=Vol-1795/paper44
|storemode=property
|title=Semantic Linking and Integration of Researchers and Research Organizations in DISQOVER
|pdfUrl=https://ceur-ws.org/Vol-1795/paper44.pdf
|volume=Vol-1795
|authors=Filip Pattyn,Steven Vandeschaeve,Stijn Vermaere,Paulo Van Huffel,Kenny Knecht,Hans Constandt
|dblpUrl=https://dblp.org/rec/conf/swat4ls/PattynVVHKC16
}}
==Semantic Linking and Integration of Researchers and Research Organizations in DISQOVER==
Semantic linking and integration of researchers and research organizations in DISQOVER

Filip Pattyn, Steven Vanderschaeve, Stijn Vermaere, Paolo Van Huffel, Kenny Knecht, and Hans Constandt

ONTOFORCE, Ottergemsesteenweg-Zuid 808, 9000 Gent, Belgium, filip.pattyn@ontoforce.com, WWW home page: http://www.ontoforce.com

Abstract.

Keywords: linked data, semantic web, ORCiD, GRID, data aggregation, smart searching

1 Background

ONTOFORCE has developed DISQOVER (http://www.disqover.com), a semantic search engine with faceted search capabilities for the life sciences. It currently searches automatically across more than 115 different public data sources that are aggregated and interlinked and that contain information about 21 different data types.

The system uses semantic web technologies to embrace the mapping efforts of different projects such as the Unified Medical Language System (UMLS), SNOMED CT, ICD10, ICD9, MedDRA, the Human Disease Ontology (DO), Medical Subject Headings (MeSH) and the Human Phenotype Ontology (HPO), amongst others. These projects structure and encode information related to diseases, phenotypes, and clinical signs.

Many of the sources included in DISQOVER aren't available in a semantic web format. Therefore, we developed a data source update pipeline that constantly checks the update status of the data at its source. This also means that the data needs a conversion step to a semantic web format (e.g. ttl) before it can be linked to other data sources.

2 Results

Since we are trying to aggregate the information of identical concepts, we are investigating different mapping strategies. For example, data sources describing chemicals contain different identifiers depending on the original source of the data. ChEMBL and PubChem could be considered golden sources, and other identifiers such as a CAS number or an InChI key could be considered for mapping too.
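One possible way to picture the identifier-based mapping of chemicals is as a clustering problem: records from different sources are merged whenever they share an InChI key or a CAS number. The sketch below is only an illustration under that assumption and is not the DISQOVER implementation; the function name `cluster_records`, the record layout, the source label `other` and the record ids `rec-1` and `rec-2` are hypothetical.

```python
from collections import defaultdict

# Illustrative records: each source describes a chemical with the identifiers it carries.
RECORDS = [
    {"source": "chembl", "id": "CHEMBL25", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    {"source": "pubchem", "id": "2244", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "cas": "50-78-2"},
    {"source": "other", "id": "rec-1", "cas": "50-78-2"},   # hypothetical record
    {"source": "other", "id": "rec-2", "cas": "58-08-2"},   # hypothetical record
]


def cluster_records(records, keys=("inchikey", "cas")):
    """Group records that share at least one identifier value (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (identifier type, value) -> index of first record carrying it
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(idx, first_seen[token])
            else:
                first_seen[token] = idx

    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[find(idx)].append((rec["source"], rec["id"]))
    return sorted(clusters.values())


if __name__ == "__main__":
    for cluster in cluster_records(RECORDS):
        print(cluster)
```

In this sketch the ChEMBL and PubChem records are linked through the shared InChI key, and `rec-1` joins them through the CAS number it shares with the PubChem record, while `rec-2` stays in its own cluster. Transitive merging of this kind propagates any wrong identifier in a source, so a real pipeline would likely need additional checks.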
One of the more challenging tasks in linking data is resolving person names in publications, patents, clinical trials and grant applications. Names can be misspelled, initials are sometimes used instead of a full first name, or one person could be annotated with different spellings. We used ORCiD (http://www.orcid.org), which provides a persistent digital identifier for researchers, as a golden source for persons. A researcher can create a personal profile on ORCiD and add his or her scientific output to it. This makes it possible to resolve names to actual persons based on the claimed scientific output linked to an ORCiD profile. Subsequently, we employ mapping techniques to map the remaining names to these ORCiD Uniform Resource Identifiers. As a result, user profiles are directly linked to clinical trials, publications, patents and grant applications, and indirectly to drugs, chemicals, proteins, genes, pathways and more.

Moreover, we try to solve the issue of different layouts and spellings of author affiliations in publications, clinical trials, patents and grant applications. The Global Research Identifier Database (GRID) (http://grid.ac) is a curated catalogue with worldwide coverage of research organizations. We digitize the author affiliations and map them to other affiliation entries in public data sources.

Overall, this work has led to a more in-depth linking of persons and organizations with other data types.
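The two-step resolution of person names described above (first via scientific output claimed on an ORCiD profile, then via name mapping for the remaining names) could be sketched as follows. This is a minimal illustration, not the authors' method: the functions `name_key` and `resolve`, the profile layout, the work ids and the ORCiD iDs are all hypothetical placeholders, and the name key (family name plus first initial) is a deliberately simple assumption.

```python
import unicodedata


def name_key(name):
    """Reduce 'Given Family' or 'Family, G.' to a (family, first initial) key."""
    # Strip diacritics and lowercase so that 'Pérez' and 'Perez' compare equal.
    plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii").lower()
    if "," in plain:
        family, given = [part.strip() for part in plain.split(",", 1)]
    else:
        parts = plain.split()
        if not parts:
            return ("", "")
        given, family = " ".join(parts[:-1]), parts[-1]
    given = given.replace(".", " ").strip()
    return (family, given[:1])


def resolve(mentions, profiles):
    """Map (author name, work id) mentions to a profile id, or None if unresolved."""
    resolved = []
    for name, work_id in mentions:
        key = name_key(name)
        candidates = [pid for pid, p in profiles.items() if name_key(p["name"]) == key]
        # Step 1: prefer a profile that has claimed this work.
        claimed = [pid for pid in candidates if work_id in profiles[pid]["works"]]
        if len(claimed) == 1:
            resolved.append(claimed[0])
        # Step 2: otherwise accept a name match only if it is unambiguous.
        elif len(candidates) == 1:
            resolved.append(candidates[0])
        else:
            resolved.append(None)
    return resolved


# Placeholder profiles; the iDs below are not real ORCiD iDs.
PROFILES = {
    "0000-0000-0000-0001": {"name": "Jane Doe", "works": {"pub-1"}},
    "0000-0000-0000-0002": {"name": "John Doe", "works": {"pub-2"}},
    "0000-0000-0000-0003": {"name": "Ana Pérez", "works": set()},
}
MENTIONS = [("Doe, J.", "pub-1"), ("J. Doe", "pub-9"), ("Perez, A", "pub-3")]

if __name__ == "__main__":
    print(resolve(MENTIONS, PROFILES))
```

Here the first mention is resolved through the claimed work, the second stays unresolved because two profiles share the key, and the third is resolved by an unambiguous name match. The simple key does not handle multi-word family names written without a comma, so a production matcher would need more careful name parsing.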