A Collaborative Framework for Distributed Scientific Groups

Muthukkaruppan Annamalai
Dept. of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.
+61 3 83449100
mkppan@cs.mu.oz.au

Leon Sterling
Dept. of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.
+61 3 83449100
leon@cs.mu.oz.au

Glenn Moloney
School of Physics, The University of Melbourne, Victoria 3010, Australia.
+61 3 83445455
glenn@physics.unimelb.edu.au

ABSTRACT
The potential of collaborative work can be further harnessed if the implicit knowledge in collaboration documents can be exploited. Together with the Experimental High-Energy Physics (EHEP) community, we are investigating the use of ontologies for scientific collaboration. EHEP collaborative work revolves around experimental analyses. We propose an intuitive way to establish augmented collaborative experimental analysis documents: the collaboration documents are annotated with appropriate semantic descriptors, linked to ontologies published on the web. Our initiative will lead the EHEP community to produce and share information in innovative ways. This development exemplifies large-scale scientific collaboration and will provide the impetus for more rapid scientific advance.

1. INTRODUCTION
The WWW has become the de facto collaboration medium for distributed scientific communities to exchange information. However, much human mediation is still involved in utilising this information. This human effort can be greatly reduced when information is exchanged with its meaning attached. The key enabler for such meaningful collaboration is ontologies.

Ontologies are a specification of a conceptualisation [3]. In essence, an ontology is a set of formally defined vocabulary for a shared domain. An ontology does not have to be a universally standardised language; its usability depends chiefly on its adoption as a collaboration language by a user community. Accordingly, our research aims to demonstrate that suitable ontologies can be constructed to support the exchange of meaningful information within a distributed EHEP collaboration.

2. THE EHEP COLLABORATIVE WORK
EHEP is dominated by large collaborations with members from all over the world. For instance, the Belle collaboration (http://belle.kek.jp/), with 54 research groups (institutional members), is one such international undertaking. The University of Melbourne's Experimental Particle Physics group (http://www.ph.unimelb.edu.au/epp/) is a member of this collaboration. An EHEP collaboration is established to find answers to a narrow range of questions. For instance, the Belle collaboration was set up to study the nature of Charge-Parity symmetry violation, which physicists believe may explain the dominance of matter over anti-matter in the universe. Other contemporary collaborations are involved in similar studies [1]. Examples are the CLEO collaboration (http://www.lns.cornell.edu/public/CLEO/) with 25 research groups and the BaBar collaboration (http://www.slac.stanford.edu/BFROOT/) with 77 research groups.

The research groups within a collaboration analyse the huge data sets produced in an experiment, using various analysis techniques. The analyses attempt to discern the hidden patterns inside the data sets. Typically, the results of the analyses are communicated to fellow researchers in the form of pre-prints and research notes. These are stored in the collaboration's publication archive, such as http://belle.kek.jp/belle/publications/. This kind of information sharing promotes scientific productivity and trust, because a research group is free to verify and extend the analysis work of another.

The information about the experiment and its data is available to all members of the collaboration. However, only the research group that performed an analysis holds complete knowledge about it. Most EHEP publications do not provide a detailed description of the analyses performed. In the absence of a prescribed set of analysis description guidelines, authors generally state the aspects of the analysis procedure that they think are essential to convey to readers. As in other experimental science publications, authors tend to presume that readers already know the analysis procedure. The publications mainly highlight the results of the analyses that account for the observed phenomena.

Experimental analyses described in this fashion, in publications bogged down with tacit knowledge, are prone to be misunderstood. This is particularly true for new researchers and for researchers who are not familiar with the kind of analyses described in the document. Oftentimes, a researcher trying to replicate a published experimental analysis ends up with a somewhat different result. Precious time is spent trying to interpret the experimental analysis correctly, which often results in tedious debugging of the analysis procedure.

3. EXPLICATING THE EXPERIMENTAL ANALYSES
It is not difficult to see that the problem in the scenario described above can be traced to a lack of structure and semantics in the published analysis description documents. Debugging an experimental analysis is a daunting task when its authors hold a somewhat different ontological commitment about the domain. We believe this misinterpretation problem can be resolved if an analysis process is described to peer researchers explicitly, in definite terms.

To begin, we propose the creation of a formal scientific document, called an analysis report. It describes completed experimental analyses in an orderly manner according to the EHEP ontologies.
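The following minimal Python sketch makes the idea of an analysis report more concrete. It assumes a tiny hypothetical concept taxonomy. The concept names (such as `BMeson`), the role `reconstructs`, and the helpers `ancestors` and `find_reports` are illustrative inventions, not part of the actual EHEP ontologies. The sketch shows two things. First, a report's semantic descriptors are only accepted if they are terms defined in the ontology. Second, a lookup uses the is-a hierarchy, so a query for a general concept also finds reports annotated with its sub-concepts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical fragment of an EHEP concept taxonomy: each concept maps to its
# parent concept (an is-a link). Names are illustrative, not the real ontology.
TAXONOMY: Dict[str, Optional[str]] = {
    "Particle": None,
    "Hadron": "Particle",
    "Meson": "Hadron",
    "BMeson": "Meson",
    "Lepton": "Particle",
    "Muon": "Lepton",
}


def ancestors(concept: str) -> Set[str]:
    """Return the concept plus every concept above it in the taxonomy."""
    found: Set[str] = set()
    current: Optional[str] = concept
    while current is not None:
        if current not in TAXONOMY:
            raise KeyError(f"term not defined in the ontology: {current}")
        found.add(current)
        current = TAXONOMY[current]
    return found


@dataclass
class AnalysisReport:
    """A completed analysis annotated with semantic descriptors.

    `descriptors` maps a hypothetical role name (e.g. "reconstructs")
    to a concept defined in TAXONOMY.
    """
    title: str
    descriptors: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject descriptors that use terms the ontology does not define.
        for concept in self.descriptors.values():
            ancestors(concept)


def find_reports(reports: List[AnalysisReport], role: str,
                 concept: str) -> List[AnalysisReport]:
    """Return reports whose descriptor for `role` is `concept` or a sub-concept."""
    return [
        r for r in reports
        if role in r.descriptors and concept in ancestors(r.descriptors[role])
    ]


reports = [
    AnalysisReport("Report A", {"reconstructs": "BMeson"}),
    AnalysisReport("Report B", {"reconstructs": "Muon"}),
]
print([r.title for r in find_reports(reports, "reconstructs", "Meson")])  # ['Report A']
```

A query for `Meson` matches the report annotated with `BMeson` even though the strings differ, because the match goes through the taxonomy rather than through keyword comparison. In a real deployment this subsumption reasoning would presumably be delegated to the ontology language and its reasoner rather than to a hand-written dictionary.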
A systematic elaboration of the analyses would allow for ontologies. We intend to implement the ontologies in a clear and detailed description of the content. Publishing the DAML+OIL [4], a Frame and Descriptive Logic integrated analyses with annotations that further enrich its description can ontology language, which is set to be the standard semantic ensure optimal exchange of information between researchers markup language for web resources. within a collaboration. Moreover, these machine-readable ontologies can also be utilised 5. OUR MAIN RESEARCH ISSUES The EHEP ontologies will provide the framework for to describe analysis jobs. The formal specification of description communication, integration and sharing of resources among the can be interpreted runtime by analyser agents to perform the distributed research groups. It is the foundation for the web required data analyses. services that will be enacted for the EHEP community. In the The EHEP ontologies can also be used to mark-up the essential process, this research project is set to investigate the two key parts of the publications in open archives, allowing semantic issues: searches on the collection. Alternately, a publication can now straightaway point to the relevant experimental analysis reports in • How well can we express the domain knowledge pertaining to the analysis archive. Accessing relevant publications or the EHEP experimental analyses in a natural way (mirroring the discovering similar experimental analyses will require far less real world semantics)? time and effort. • There is a concomitant need to mark-up data and information This opportunity to embark upon an innovative way of handling regarding experimental context in the scientific documents, scientific information generated within an EHEP collaboration is before it can be used as knowledge. How can we facilitate the illustrated in Fig. 1. It affirms the belief that the next generation annotation of the EHEP collaboration documents? 
web can indeed change the way scientific knowledge is produced and shared, as envisaged by Berners-Lee and Hendler [5]. semantic links Analysis Job EHEP Ontologies 4. CREATING THE EHEP ONTOLOGIES new analysis The EHEP ontologies will be developed to be reused across different applications as depicted in section 3. The ontologies Analyser completed analyses produces Publication Analysis Report emphasise the formal semantics and capture the intrinsic structure of the domain embodied as concepts, relations and axioms. The creation of the EHEP ontologies is carried out in stages. First of review existing analysis hyperlinks all, there is a need for the ontologists to attain sufficient level of literacy in the EHEP domain to facilitate the impending Analysis Archive Publication Archive knowledge acquisition task. Initial discussion with the EHEP physicists and related literature review enabled us to identify the Fig. 1. Handling EHEP collaboration documents. Researchers main domain concepts in a typical EHEP experimental analysis. prepare and deliver the semantically marked up analysis reports These concepts will become the ‘hooks’ in the skeletal EHEP and publications, which can be archived and referred during knowledge model. subsequent experimental analysis. The content of the archives can Next, each of these ‘hook’ concepts is expanded systematically, as also be searched more productively using precisely defined sub-models of the EHEP domain. These models are in essence, queries. Jobs described using the ontological terms can be taxonomies of defined concepts with their roles (properties) processed directly by the agent analyser restricted. The knowledge models are elaborated from interviews with EHEP researchers, scientific documents, such as pre-prints and journal articles, and existing standard HEP terminology, such 6. REFERENCES as the terms maintained by the Particle Data Group. 
[1] Board on Physics and Astronomy, National Research We are developing these models using a Frame-based tool, called Council, USA: Elementary-Particle Physics - Revealing the Protégé-2000 [2]. Frames provide an object view of the world Secrets of Energy and Matter, National Academy Press, and an intuitive modelling style. In spite of some modelling Washington D.C., 1998. limitations, Protégé-2000 still is a useful interaction tool for [2] Grosso, W.E., Eriksson, H. et. el.: Knowledge Modelling at eliciting knowledge from the EHEP physicists. the Millennium (The Design and Evolution of Protégé-2000), A parallel activity undertaken during this time is the formulation In the Proceedings of KAW99, 1999. of a set of competency questions that outline the competence of [3] Gruber, T: A Translation Approach to Portable Ontologies, the EHEP ontologies. The regularly updated competency Knowledge Acquisition, Vol. 5, No. 2 (1993), 199-220. questions effectually guide the acquisition of the correct domain knowledge for the models. [4] Hendler, J. & McGuiness D: The DARPA Agent Markup Language, IEEE Intelligent Systems, Vol. 15, No. 6 (2000), In short, the development of the knowledge models follows an 72-73. evolutionary development cycle, which also encompasses the model validation, verification and refinement. This is part of our [5] Lee, T.B & Hendler J: Scientific Publishing on the Semantic ongoing work. Web, Nature, April 2001.