DC-THERA Directory, a Knowledge Management System for the support of the European Dendritic Cell Immunology Community

Marco Brandizi¹,*, Michaela Gündel¹,*, Ciro Scognamiglio¹, Andrea Splendiani¹

¹ Leaf Bioscience s.r.l., Via G. Puccini 3, 20121 Milan, Italy
{marco, michaela.guendel, ciro.scognamiglio, andrea}@leafbioscience.com
* These authors contributed equally to this work.

Abstract. DC-THERA Directory is a web portal that supports collaboration, communication and knowledge sharing within DC-THERA, a community focused on immunology. We show how we have addressed the problem of representing and managing highly heterogeneous and interconnected knowledge. One aspect of the application interface is search and navigation through web ontologies. Another is the dynamic representation of information entities that have variable sets of properties. These results have been achieved by adopting a modelling approach that combines traditional object-oriented modelling with a triple-based knowledge representation. We discuss the advantages of such an approach, especially with regard to the possible future integration of other information sources.

Keywords: Knowledge Management Systems, Bio-Ontologies, Dendritic Cells, On Line Communities, Semantic Web.

Introduction

Knowledge Management Systems based on web technologies are widely used tools for promoting collaboration and information flow among organizations, including scientific communities [1]. Moreover, standard terminology and formal ontologies in knowledge applications are particularly important for the life sciences domain [2], as this is characterised by highly heterogeneous and interconnected information. These technologies are also useful for supporting cooperation in communities brought together by research projects [3,4]. An example is DC-THERA¹, a European project that focuses on integrating a large community of researchers and therapists working on dendritic cell research, an important topic in immunology.
In the DC-THERA context, people often need to answer questions such as: are certain entities (e.g., blood samples of a certain kind, or a purification method) available in the network? Who maintains them? Who is an expert in their usage? How can I contact them? These requirements are addressed by the DC-THERA Directory, a web-based knowledge management system that collects summary information about the bio-medical resources available among the DC-THERA researchers. The Directory eases knowledge sharing and exchange by providing a single, unified access point to a variety of bio-entities. It promotes collaboration by representing links between bio-resources and the people related to them (for example, someone with experience in applying a given laboratory protocol, or someone studying a certain data set). Furthermore, by integrating other biological repositories (e.g., about specific experimental data, or about scientific publications), it allows users to narrow down an initial search or to find desired details that are managed outside the scope of the Directory itself. Finally, an electronic repository that stores reference information about the work and achievements produced in an EU-funded research project helps to keep such information from being dispersed and potentially keeps it available to the general public.

¹ http://www.dc-thera.org

Figure 1: a list of resources (top) and a person description.

The Directory

The application currently supports the main categories visible in fig. 1. Clicking on one of them, or on one of their sub-categories, shows a list of existing resources, together with a taxonomy of the sub-types defined for that category, which is built from the underlying ontologies. From here, it is possible to open the details form of a particular resource in a “subject-centric” view, which puts together resource properties, such as title or description, and links to other items (fig. 1, bottom).
Both the details view and the list of properties can flexibly vary, depending on the specific resource.

Another use case is the classical keyword-based search. An auto-completion feature suggests search hints while typing, by dynamically querying the knowledge base (via AJAX) and proposing expressions such as term variants, associated keywords and synonyms (again, the knowledge contained in the ontologies is used for this). Results are presented with the matched terms highlighted in the text. Depending on the resource category that is queried, additional related external data are shown (e.g., the list of publications in fig. 1, or gene expression information coming from EBI's Gene Expression Atlas² or from BASE installations [5]).

Architecture and Implementation. DC-THERA Directory has been designed by adopting a modelling approach similar to the one proposed in [6]. On the one hand, an object model has been defined and implemented in PHP (fig. 2); it contains classes for the main resource types, and defines general properties in top-level classes and specific properties in the pertinent classes. On the other hand, the representation of property/value pairs, together with helpers for querying the knowledge base by means of triple patterns, has been embedded in the object model (e.g., to find all resources that are p-related to r).

Figure 2: the DC-THERA Directory object model.

This “mixed” model exploits the advantages of both the world of structured object-oriented architecture and the world of semi-structured “RDF-like” triples. The PHP classes can contain implementation-specific code, such as methods invoked by the Symfony³ framework (e.g., the Dataset class in fig. 2). The “triplified” part makes it possible to dynamically define any set of properties and links for a resource, e.g., characterising the resource with terms from external ontologies, or embedding triple statements obtained from external services.
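As a rough illustration of the mixed model described above, the following sketch pairs a small class hierarchy with an in-memory triple store. The Directory itself is implemented in PHP; Python is used here only for brevity. Every name in the sketch (KnowledgeBase, Resource, find_related, the hasExperienceWith and hasType predicates, the ex: term) is hypothetical and is not taken from the actual code base.

```python
class KnowledgeBase:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.resources = {}   # resource id -> Resource object
        self.triples = set()  # set of (subject_id, predicate, object) tuples

    def add(self, resource):
        """Register a resource so that it can state and query triples."""
        self.resources[resource.id] = resource
        resource.kb = self
        return resource

    def match(self, s=None, p=None, o=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    def find_related(self, p, r):
        """Return all registered resources x for which (x, p, r) is stated."""
        return [self.resources[s]
                for s, _, _ in self.match(p=p, o=r.id)
                if s in self.resources]


class Resource:
    """Top-level class: general properties are ordinary attributes."""

    def __init__(self, id, title, description=""):
        self.id = id
        self.title = title
        self.description = description
        self.kb = None  # set by KnowledgeBase.add

    def set(self, predicate, value):
        """State a dynamic property or link as a triple."""
        target = value.id if isinstance(value, Resource) else value
        self.kb.triples.add((self.id, predicate, target))

    def properties(self):
        """Return the dynamic (predicate, object) pairs of this resource."""
        return sorted((p, o) for _, p, o in self.kb.match(s=self.id))


class Person(Resource):
    """Specific resource type with no extra behaviour in this sketch."""


class Protocol(Resource):
    """Specific resource type; a place for implementation-specific code."""

    def summary(self):
        return f"Protocol: {self.title}"


kb = KnowledgeBase()
protocol = kb.add(Protocol("prot1", "DC purification"))
alice = kb.add(Person("p1", "Alice"))
bob = kb.add(Person("p2", "Bob"))

# Links between people and a bio-resource, stated as triples.
alice.set("hasExperienceWith", protocol)
bob.set("hasExperienceWith", protocol)
# Placeholder term standing in for a class from an external ontology.
protocol.set("hasType", "ex:PurificationProtocol")

# Who is p-related to r, with p = hasExperienceWith and r = protocol?
experts = sorted(r.title for r in kb.find_related("hasExperienceWith", protocol))
print(experts)  # ['Alice', 'Bob']
```

In this sketch the class hierarchy carries the fixed part of the model, while new properties or links can be added at run time without changing any class definition.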
To characterise resources with ontology terms, we have selected and imported those Open Biomedical Ontologies [7] that are suitable to represent the kind of information currently available in the Directory and the domain it covers, especially OBI, the Ontology for Biomedical Investigations⁴.

² http://www.ebi.ac.uk/gxa
³ http://www.symfony-project.org
⁴ http://purl.obofoundry.org/obo/obi

Discussion and future work

A first stable version of the Directory was released to the project's participants in July 2009. Beforehand, a test was carried out with a selected panel of pilot users, who helped to evaluate and further improve the quality of the tool and its contents by answering a questionnaire (unpublished). This confirmed that users find the application, and the way it has been developed, useful. As an example of an improvement suggested by the survey, we are using the BioLexicon [8] text mining tool to extend ontology classes with term variants extracted from the literature. This will improve keyword-based searches, which already use ontologies for aspects such as synonyms or semantically close terms.

Another feature under development is the export of the Directory content to the RDF/OWL format. This could, for instance, be useful to integrate and compare the knowledge base with similar online resources, so that useful information can be found without querying many repositories one by one [9]. Another example is comparing the Directory content by semantic similarity, analysing ontologically related terms [10].

In conclusion, as the first user feedback suggests, the DC-THERA Directory provides an effective way to make available the knowledge assets developed during the five-year project. As far as we know, this is the first time that the hybrid modelling approach described here has been used to build a rapid web development infrastructure. This makes it possible both to develop similar bio-medical applications and, thanks to its compatibility with Semantic Web technologies, to integrate them with one another.

References

1. Das, S., et al.: Building biomedical web communities using a semantically aware content management system. Brief Bioinform 10(2):129-38 (2009)
2. Coskun, G., et al.: Towards Corporate Semantic Web: Requirements and Use Cases. Freie Universität Berlin, Tech Rep TR-B-08-09 (2008)
3. Clark, T., Kinoshita, J.: Alzforum and SWAN: The Present and Future of Scientific Web Communities. Brief. in Bioinformatics 8(3):163-171 (2007)
4. Gaines, B. R., Shaw, M. L. G.: Knowledge management for research communities. Proc AAAI Spring Symposium on A.I. in Knowledge Management, Stanford University, pp. 55-62 (1997)
5. Saal, L. H., et al.: BioArray Software Environment: A Platform for Comprehensive Management and Analysis of Microarray Data. Genome Biology 3(8): software0003.1-0003.6 (2002)
6. Puleston, C., et al.: Integrating object-oriented and ontological representations: A case study in Java and OWL. The Semantic Web - ISWC 2008, pp. 130-145 (2008)
7. Smith, B., et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. (11):1251-5 (2007)
8. Rebholz-Schuhmann, D., et al.: BioLexicon: Towards a Reference Terminological Resource in the Biomedical Domain. Proc ISMB-2008 (2008)
9. Heng, T. S. P., et al.: The Immunological Genome Project: networks of gene expression in immune cells. Nat Immunology 9, 1091-1094 (2008)
10. Pedersen, T., et al.: Measures of semantic similarity and relatedness in the biomedical domain. J of Biomedical Informatics, Vol. 40, Issue 3 (2007)