Computer-Supported Collaborative Knowledge Modeling in Ecology

Deana D. Pennington
University of New Mexico
MSC03 2020
Albuquerque, NM 87131
+1-505-277-2595
dpennington@LTERnet.edu

Joshua Madin
Univ. of California SB
735 State Street
Santa Barbara, CA 93101
+1-805-893-7108
madin@nceas.ucsb.edu

Ferdinando Villa
University of Vermont
617 Main St
Burlington, VT 05405
+1-802-656-2968
ferdinando.villa@uvm.edu

Ioannis N. Athanasiadis
Istituto Dalle Molle di Studi sull’Intelligenza Artificiale
Manno, Lugano, Switzerland
+41-586-666-671
ioannis@idsia.ch

ABSTRACT
We describe collaborative efforts between a knowledge representation team, a community of scientists, and scientific information managers in developing knowledge models for ecological and environmental sciences. Formal, structured approaches to knowledge representation used by the team (e.g., ontologies) can be informed by unstructured approaches to knowledge representation and semantic tagging already in use by the community. Observations about the process of collaboration between the team and the community are used to generate an interaction model for supporting software tools.

Categories and Subject Descriptors
H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces – Computer-supported cooperative work, Web-based interaction.

General Terms
Design, Human Factors

Keywords
Collaboration, observation, ontologies, concept maps, ecological knowledge

1. INTRODUCTION
Understanding and solving global environmental problems requires a new kind of science: science that is interdisciplinary, collaborative, and responsive to the needs of decision-makers [9, 17, 29]. Cross-disciplinary networks of scientists worldwide are marshalling their understanding in efforts to provide scientific results that target complex problems. Formal networks of scientists, such as the Long Term Ecological Research (LTER) networks originally developed in the US (http://www.lternet.edu/) and now located worldwide (http://www.ilternet.edu/), employ information managers whose primary task is to provide online access to relevant information. With available resources rapidly increasing, the difficulty of discovering and making use of those resources (e.g., knowledge synthesis) is increasing as well, especially in conjunction with rapid expansion of the Web as a whole. A number of efforts are underway to enable better sharing of data, information, and knowledge within the natural sciences, as discussed in [1, 16, 22, 26]. These efforts include ontology-driven applications that make use of formal semantic reasoning to enable integration of heterogeneous resources.

Ontology-based approaches require eliciting shared knowledge from large communities of domain scientists and decision makers. The authors are part of several large-scale initiatives that will use shared ontologies: the National Science Foundation-funded projects Science Environment for Ecological Knowledge (SEEK; http://seek.ecoinformatics.org) and Assessment and Research Infrastructure for Ecosystem Services (ARIES; http://ecoinformatics.uvm.edu/projects/the-aries-framework.html) that focus on automated integration of environmental and economic data with models and analytical pipelines; and the EU-funded SEAMLESS project (http://www.seamless-ip.org), aimed at generating integrated assessment tools to understand how future alternative agricultural and environmental policies affect sustainable development in Europe. In all these projects, the need to crystallize community knowledge into formal ontologies has emerged as paramount.

However, each of these projects has been confronted by the challenges identified by Grudin [10] specific to groupware development, in particular the following two problems:

• Disparity in work and benefit. Scientists who have the knowledge that must be incorporated into ontologies lack understanding of the benefits that semantic modelling will ultimately provide them and are unwilling to engage in activities that do not provide clear, short-term benefits. Information managers who might be able to provide some of the knowledge and may even understand the long-term benefits for the scientists have more immediate problems and focus their time on developing short-term solutions. Hence, ontology development requires “additional work from individuals who do not perceive a direct benefit” [10].

• Critical mass and Prisoner’s dilemma. Ontology-driven applications are expected to be most useful when multiple users share their resources. The work involved in ontology development and annotation of resources is not justified by a single user. Hence, these projects require a “critical mass of users to be useful” [10] and early adopters must commit to substantial effort with no guarantee that others will follow.

Grudin makes a number of relevant suggestions for addressing these problems [10]:

• Reduce the work required of non-beneficiaries and indirect beneficiaries.

• Design processes that create benefits for all group members.

• Build in incentives for use.

Developing an innovative approach to community-based ontology development that incorporates these suggestions presents an ill-defined, unstructured problem requiring creative thinking. Development of solutions to such problems can be conceived as two-phased [27]: 1) an idea generation phase that requires a combination of divergent thinking and domain expertise, and 2) an implementation phase. In this paper, we focus on the idea generation phase, envisioning systems that could effectively link short-term user needs supported by informal semantics with longer-term formal ontology development. The ideas are based on our experiences working with these science communities, understanding of their tasks, and ongoing efforts at community-based ontology development. The goal of this paper is to propose innovative designs for systems that enable collaborative ontology development derived from our particular case, and also to stimulate vibrant debate and creative thinking about generic issues that confront interdisciplinary ontology development efforts.

We begin with a brief description of the participants. That is followed by a brief description of our ontology needs and an upper-level ontology that we have created. These sections provide context for understanding the kinds of knowledge that we need to elicit from the community and the resources that we have available to apply to the problem. Next, we present a set of use cases for supporting semantic-based work tasks that are commonly undertaken in our communities. We describe how these tasks provide an opportunity to capture knowledge relevant to formal ontology development while providing immediate benefits to the users. We provide a high-level conceptualization of a system that we are currently designing to implement these ideas. Then, we describe methods that we have already undertaken to extract knowledge from users in direct and indirect ways, without the support of enabling systems. These provide real examples of tasks that inform ontology development. We discuss how these could be incorporated into our hypothetical system in ways that limit the work required from the user. Lastly, we abstract our specific problems and proposed solutions into a simple model for enabling collaborative ontology development.

2. PARTICIPANTS
Initially, each project had its own Knowledge Representation (KR) research and personnel. Several years ago we began to collaborate with a view towards constructing ontologies that would interoperate between projects, providing an opportunity to leverage each other’s work but also creating a larger, multi-disciplinary group that was more capable of critical evaluation of different proposed ontologies.

The KR team has cross-disciplinary expertise in computer science and domain science. It consists of two computer scientists with expertise in ontologies, reasoning, and semantic mediation, and four domain scientists with differing disciplinary expertise, relatively high levels of computing experience, and varying backgrounds in knowledge representation. The team has met regularly to devise strategies for ontology development. Discussion at these meetings ranges from formal symbolic logic to philosophy of science to targeted discussion about implicit knowledge embedded in datasets. Time and effort were required to bridge disciplinary boundaries and understand inherent assumptions that impact the team’s ability to collaborate on what is clearly an interdisciplinary task. Numerous real examples of environmental data and analyses obtained from scientists and information managers have guided and informed these discussions. One of the domain scientists is tasked with knowledge engineering, and is responsible for developing and maintaining the ontologies in Protégé (http://protege.stanford.edu/). Another is tasked with acting as liaison to the scientific community.

The KR team collaborates with the scientific and information management communities to elicit domain-specific knowledge. Few of the community collaborators have the time or interest to cultivate an understanding of formal ontologies. Nor do they fully understand the benefits of ontology-driven systems, since few examples of these systems exist. Hence, their personal commitment to ontology development is limited. Yet they recognize that semantic approaches may provide future benefits to them and are willing to help to the extent that it does not impede their more immediate objectives.

3. ONTOLOGY NEEDS
In each of our projects, KR is tightly integrated into technical research and development. We are working toward semi-automated and automated resource discovery and integration, including finding and merging heterogeneous datasets and construction of workflows that pipe data through heterogeneous computing environments [4, 5, 6]. We are also constructing knowledge-driven rule-based systems. These applications require high-quality ontologies and formal reasoning provided by description logics for consistency checking and validation. Much of the functionality provided by ontological reasoning will be hidden from the user, yet will automate many low-level tasks that the user would otherwise have to undertake manually.

Our ontology development has been two-tiered: 1) development of an upper-level structuring framework for observation and measurements (core ontology), and 2) development of domain-specific extensions to the core ontology. Our early work was more focused on the first, though the need for domain extensions was known and information was continually gathered from the community whenever possible. Recently, the core ontology has been finalized and is currently being documented [15].

Scientists make observations about the world that are recorded as measurements. The core ontology is the Extensible Observation Ontology (OBOE), which is a formal and generic conceptual framework for describing the semantics of observation and measurement. The objective of OBOE was to separate knowledge that is essential for describing observation and measurement from knowledge that is asserted by a scientist and therefore a function of opinion, interpretation, or even space and time. OBOE requires that an observation is about an entity (concept or thing), and a measurement is of a characteristic of the entity. Measurement relates a value to a measurement standard as well as an estimate about the confidence level of the value (e.g., measurement precision). OBOE prescribes a structured approach for organizing domain-specific ontologies through the use of “extension points,” i.e., specific classes, properties, and constraints that are elaborated by different areas or views/models of science. Therefore, OBOE can serve as an upper level framework for defining new domain ontologies as well as interoperating and relating existing domain ontologies.

While OBOE enforces a formal framework for describing the semantics of observational data, extension of this framework with domain ontologies requires the knowledge and experience of domain scientists. The KR team is continually involved in outreach to acquire community-based vocabularies and informally-structured knowledge. These outreach activities provide a flow of informally-structured semantic description among collaborators (Figure 1).

Figure 1. Collaborative relationships between the knowledge engineering team, scientists, and information managers, and the types of semantic information that are shared.

Ultimately the knowledge representation team must make some independent decisions about how best to model the domain within a formal ontology. This, therefore, necessitates at least one and perhaps many iterations of review by scientists. When the team has an ontology ready for review, we would like to recruit people from that domain to view it, comment, and propose changes. While trees may be used effectively to review the hierarchical structure, relationships are more difficult to communicate effectively. Domain scientists generally do not understand the symbolic logic commonly employed in ontology editors. Usability testing of graphic visualizations conducted by the SEEK project indicates that they are confusing to domain users (Downey, personal communication). Additionally, community-wide ontological commitment [8] requires collective decision making, difficult to achieve without synchronous communication. Currently, there is no obvious mechanism by which to obtain the needed input from reviewers.

4. USER SEMANTIC TASKS AND COMPUTER-SUPPORTED USE CASES
There is a need for more collaboration between our KR team, scientists, and information managers. The complexity of ontologies and the difficulty of the knowledge modeling task present a daunting obstacle to those who are not familiar with knowledge representation. We need tools that link knowledge elicitation with tasks in which the community is already engaged, and development of methods and tools that enable rapid mapping from those to formal ontologies.

There are many reasons to capture and represent knowledge in science, separate and apart from the resource discovery and integration goals of the Semantic Web. Smith [23] suggested that oftentimes philosophers turn to science as a reliable way to learn about the things and processes of a given domain. Much effort in science is focused on acquiring knowledge through scientific discourse. This begins during formal education but is ongoing throughout the life of a scientist, who must be able to share his own perspective and understand those of competing explanations. Those semantic perspectives are implicit in the artifacts of science: tools, models, datasets, and publications. Creation of these artifacts involves tasks that are inherently semantic and could both contribute to ontology development and be assisted by a knowledge base. Here we provide four use cases of some example tasks, knowledge-based computer support for those tasks, and a vision for interaction mechanisms between and among different stakeholders.

4.1 Controlled vocabulary use case
Karen Mann is an information manager for one of the LTER field sites. She and several of her colleagues at other field sites have decided to construct a standard set of terms and definitions to be used as metadata keywords, to enable better data discovery by scientists across the LTER network. She is aware of the observation ontologies that are being developed, but doesn’t really understand them. She is reluctant to attempt to make use of an approach that she doesn’t understand. She does understand that ontologies enable even better data discovery and integration than her approach. Therefore, she wants to work within the context of keywords and controlled vocabularies since that is what she understands, but she would also like to link her list of keywords to the ontology to take advantage of whatever additional functionality is made available.

Karen enters a website that provides an intuitive interface to a knowledge base that holds many ontologies, both private and shared. From this website she can create and manage her own private knowledge base. She imports a list of terms that she has previously generated. She can also import informal definitions (not constrained logical definitions), or she can enter the definitions on the website. Her colleagues import their lists into their own private knowledge bases as well. They all indicate to the system whether they want to share their private knowledge bases. Karen selects her colleagues’ shared knowledge bases from a list, generates a collaborative knowledge base, and sends a message through the website asking them to collaborate with her. From a collaboration screen, they are able to merge their vocabulary lists into a single unfiltered list. The system maintains a link between their individual lists and the collective list, so that any changes made during collaboration can optionally be copied back to their individual knowledge bases. Their screens are linked. When one person selects or edits a term, everyone else’s screen automatically shows the change. They can make use of VoIP or a chat window to discuss their vocabularies. In this case, because there are a number of participants they prefer to use chat [14]. Their chat session is recorded and at the end of their discussion they can request that the chat session be copied to a blog attached to the collaborative knowledge base, providing a permanent record.

They collaboratively review duplicate terms and definitions to determine semantic relationships. They identify synonyms and can drag and drop synonyms on the screen so that they are adjacent to one another. Where there are semantic conflicts they resolve them and edit the collective vocabulary.

Once they have a complete collective list of terms, they can choose an option to annotate the terms in their list with an ontology. A list of ontologies is provided to them, which includes a list of “Our Favorite Ontologies” that the system generates from each individual’s list of “My Favorite Ontologies.” They decide on the ontologies they want to use (all of which are extensions to the OBOE observation ontology), and begin the annotation process. For each term, the system automatically shows them syntactically exact matches from their selected ontologies along with definitions. They can easily explore parent, sibling, and child concepts as well as other related concepts to ensure that they understand the context of any given concept in the ontology and to reconsider their term selection. They are able to search the knowledge base using a Google-style interface to see what other concepts might be relevant. They can ask the system to analyze their searches and suggest concepts based on the choices by other users who have made similar searches. If they are uncertain about whether a concept is appropriate, they can request several levels of help: tips and tricks, online documentation of annotation procedures, examples, live chat with a knowledge engineer, or e-mail support.

If they do not find a concept that fits, they can suggest terms to be added to the ontology. They recommend a concept and the system provides them with a wizard to capture their recommendations about where the concept belongs in the ontology. The system allows them to go ahead and use the term with a tentative annotation. Asynchronously, a knowledge engineer will consider where to place the term in the ontology. The system will provide him with information about the term from their knowledge base and from their search history; he may also request additional information from them. If he decides to add the concept as suggested, the system makes any needed adjustments to their knowledge base. If the concept is not added, the knowledge engineer can identify it as a synonym or make some other link from that term to the ontology such that the user can continue to use that term but the system can resolve it to the correct annotation. They will get automatic notification of the final decision made by the knowledge engineer. Task support for the knowledge engineer is further discussed in Section 4.4.

When Karen and her colleagues apply keywords to resources such as datasets or publications, they each apply terms from their individual controlled vocabulary. They can then select an option for automatic annotation that runs a script that constructs the correct ontological annotation. The metadata therefore includes keywords from the local vocabulary and annotation to one or more ontologies, allowing the resources to be used with ontology-driven discovery and integration tools.

4.2 Data description use case
John Green is an ecologist with LTER who collects field data on plants. He has numerous spreadsheets with similar but slightly varying schemas that he has collected over a number of years. John is interested in contributing his data to a portal so that he can participate in a new collaborative project that will analyze plant species from around the globe. In order to do so, he must provide metadata that includes ontological annotation.

The LTER information managers have previously developed a web application that walks users through the process of creating metadata for datasets. Their knowledge base is accessed by this application, providing access to the site’s controlled vocabulary linked to ontologies. His information manager has provided some training on how to make use of the application. John has never actually used the system, but has a vague recollection of how to do it and enters the website with confidence knowing that both the description and annotation tasks are supported with intuitive user interfaces and online help for novices.

John creates metadata for the first dataset. He loads the dataset into the web application, which analyzes the dataset and is able to automatically generate a fair amount of metadata. The system prompts him for the remainder of the metadata. Then he must begin the semantic annotation process. He starts with the controlled vocabulary for his site. The system prompts him to select keywords for the dataset as a whole, then for each attribute in the dataset. Because the keywords are linked to an upper-level ontology, the system prompts him to annotate the relationships between attributes required by that ontology and guides him through that task. If John has an attribute that he does not think is adequately expressed by any of the terms in the controlled vocabulary, he has all of the same ontology exploration functionality available to the information managers. He can suggest terms to be added to the controlled vocabulary and/or to the ontology using the same procedure as the information manager. In this case, his recommendation is forwarded to the information manager who can assess the term, add it to the controlled vocabulary and link it to the ontology, or forward it to the knowledge engineer if it requires modification of the ontology.

Once the first dataset has been described and annotated, John has several datasets that used the same schema. He loads the second dataset and indicates to the system that it is a duplicate of the first in terms of physical, logical, and semantic description. The system analyzes both datasets using a metadata ontology and verifies that this appears to be the case. The system duplicates the metadata and annotations, then prompts John for any edits that might need to be made. The system “knows” which parts of the metadata or annotations could possibly change because of the existence of the metadata ontology and leads him through those. If the datasets are not duplicates, the system will inform John where there are discrepancies and support him through the process of comparing datasets, resolving issues and generating correct metadata and annotations.

The remaining datasets are similar to the first dataset but vary in different ways. John loads a new dataset into the tool and indicates to the system that it is similar to the first dataset. The system compares table structures, data types, and column content and recognizes where there are differences. Again, the system knows where metadata and annotations could possibly change, and prompts John to enter the correct information.

John wants to generate a template dataset that is already described and annotated (to the extent possible) for future use. He can pick any of the datasets already described and annotated, and request a template. The system generates a blank table with associated metadata and annotations, then prompts for other information that is likely to be constant, such as project descriptions and personnel. John can elect to fill these in automatically from the original dataset or he can enter new information manually. Once the template is finished, he can save it and easily generate new datasets from it. Every time he does so, the system prompts him for information that is collection-specific.

Now that John has his datasets described and annotated, he contributes them to the portal, which is also tied to the knowledge base. He and a number of other scientists then begin to collaboratively decide which data should be integrated. They enter a web application that allows them to load up multiple datasets and collectively discuss them. As with the information managers, they can link their screens such that changes by one person automatically appear on everyone else’s screen. They also have chat, blog, and VoIP options. As they discuss the datasets they are able to map between them semi-automatically using the knowledge base and attribute annotations. They can modify any of the mappings that the knowledge base suggests plus add new mappings. They can generate integrated datasets based on their mappings that inherit relevant metadata and annotations from the source datasets, prompting them to complete whatever new metadata or annotations are needed. As they collaboratively decide on the mappings between datasets, the knowledge base tracks their decisions. For instance, the scientists decide that dataset 1 attribute 12 maps to dataset 2 attribute 6. These two attributes were annotated differently and there currently is no relationship between those concepts in the ontology. Through their collaborative mapping, however, they have indicated that there is indeed a relationship between these concepts. As they work through semi-automatic mapping of many attributes from many datasets the system is able to analyze their choices and suggest changes to the ontology to the knowledge engineer.

4.3 Concept mapping use case
Through the data portal, John has begun a dialogue with several scientists from different disciplines about potentially working together on a research project. Because they are familiar with different theories, research paradigms, and study methods, they need to spend a significant amount of time developing a conceptual framework that is well thought out and integrates their different perspectives. They are located in different universities and they can’t take enough time away from their teaching to adequately develop a collaborative approach. They decide to make use of a new web application that provides collaborative concept mapping and is linked to the knowledge base.

They enter the website and rather than choose specific ontologies, they select the portal and request to use the same ontologies as the portal. Independently, they each draw concept maps and process flow diagrams that represent their research interests. Each term that they use, if present in the selected ontologies, is automatically completed as they type it in. Again, if they want to use a term that isn’t in the ontology they can suggest terms. The linkages between terms in the diagram provide information about relationships between concepts that the system tracks, analyzes, and can use to suggest changes to the knowledge engineer.

Once they have each constructed their own diagrams they can collaboratively view and discuss each other’s work using linked screens, chat, blogs, and VoIP. They can draw diagrams together representing their collective views. As they discuss the diagrams they begin to resolve semantic issues. They determine that there is a close relationship between certain concepts in their different disciplines but they use different terminology for those concepts. As they find these differences they draw links on their diagrams. The system tracks these linkages and can use them to suggest links across domain-specific extensions of the ontology.

They can request the system to “show datasets,” and next to each term on their maps it will provide titles of datasets in the portal that are annotated with that term or related terms. They can explore these datasets in the same collaborative way as described above, and construct integrated datasets. The portal is linked to a repository of publications that have been annotated. Therefore “show publications” can be used to display publications that have been annotated with the terms related to those they have used.

After drawing many diagrams, exploring datasets, and reading relevant publications they are ready to design their research project. They make use of a “workflow design” module that provides some structure for diagramming a conceptual scientific workflow using concepts from the knowledge base. Each node in the workflow represents a computational analysis or procedure [16]. Links between the nodes represent flow of output data from one component to input data for the next. They use terms from model and process ontologies, with the system using automatic word completion. They can indicate specific datasets from the portal that are to be input to the workflow. When they are satisfied with their workflow, they can export it as a beginning workflow for a scientific workflow system and the annotations are transferred with the workflow.

4.4 Ontology review use case
Bob Card is a knowledge engineer working with the LTER community. He works on a tightly-coupled team that includes both computer and domain scientists. Combining the team’s collective knowledge with information from text mining he has generated the knowledge base used in the above cases. He is rapidly receiving input from all of the suggestions made by his colleagues, as well as analysis of user actions from the system. He needs some sort of semantic management system to help him track all of these recommendations, make sense of them, and generate automated responses to users who are affected by a given decision that he makes.

He is able to generate term lists from any combination of the above sources, flexibly sort and group terms, and try out tentative hierarchical structures before making any changes to his formal ontology. As he works with the tentative hierarchies he can invite participants to collaborate with him using linked screens. Or, he can request that colleagues review and modify a copy of any tentative hierarchy. The system will compare the modified copy with his tentative structure and show him where changes have been proposed. At any point he can modify the tentative ontology. When Bob is ready, he can request the system to align his tentative ontology with the existing ontology and show changes. When he is satisfied with the tentative ontology he can “commit” it and the system will automatically replace the affected portion of the existing ontology with the necessary changes. The earlier version is stored in case he needs to return to it. The system analyzes the changes and determines which annotated resources are affected. It creates a new version of annotations for those resources and notifies the user of the change.

5. EXAMPLE COLLABORATION-CENTERED SOLUTION
Our team has started investigating technical solutions to the challenge of defining user-friendly, semi-automated processes to distill disciplinary knowledge into formal ontologies. Our goal is to accomplish this with the least possible amount of difficulty for the user and transparent, non-obtrusive involvement of the knowledge base. The approach that we are taking is design of interacting systems for knowledge base development and management, community-based ontology interaction, and multiple knowledge-based applications (Figure 2).

The ThinkCap Collaborative Knowledge Portal is a prototype web application still under development that provides user interfaces over a remote, multi-ontology knowledge base, designed to meet the needs of both non-technical and technical users (http://ecoinformatics.uvm.edu/technologies/thinkcap.html). It aims to allow remote users of diverse disciplines and technical levels to develop shared conceptualizations that are automatically formalized into OWL or RDFS ontologies.

The paradigm of knowledge elicitation being implemented in ThinkCap uses a knowledge engineer in an asynchronous way; by decoupling the formal knowledge base from the "arena" of user discussion, full concurrency of the editing of both is made possible. The process is assisted by a full-text search engine that indexes OWL concept descriptions as well as user-provided documentation (such as web pages or academic papers).

We are currently extending ThinkCap to help such a diverse community of users negotiate the rigorous, streamlined axioms in an OWL knowledge space. A new collaborative portal in ThinkCap will use a reasoner-assisted process and an upper ontology to define different views of an OWL knowledge base. These simplified views will allow applications to show only the level of semantic complexity necessary for the immediate task. Views will include conversion of ontologies to topic maps (www.topicmaps.org). Topic maps reflect the knowledge in the ontology base in ways that are much friendlier to the user community and much easier to operate on concurrently. The portal will provide a web-based whiteboard environment for collaborative topic map editing. A reasoner-assisted listener process will analyze user changes to the topic map and provide suggestions to a knowledge engineer about possible relevance to the underlying OWL axioms. Once a prototype has been tested with users, we will design additional interfaces.

[...] the semantic-driven functionality described in our use cases. SciDesign will provide an interface for knowledge-based scientific discourse, resource discovery, exploration, and management, and research design. As scientists and information managers make use of SciDesign for individual or collaborative efforts, their actions will be captured and analyzed by the system and used to inform ontology development. Technical designs for SciDesign, OntoGrow, and ThinkCap are currently being developed under the second, implementation phase of complex problem solving that follows idea generation.

6. KNOWLEDGE ELICITATION-CENTERED PROCESSES AND SOLUTIONS
We present four approaches that our KR team has used to acquire scientific knowledge, beginning with the least demanding for the participants and ending with the most collaboration-intensive. Each is followed by suggestions for incorporation of these tasks into the proposed system.

6.1 Text mining
In science, the knowledge representation method of choice has historically been written texts (publications) or conference presentations with accompanying figures and tables. These approaches are highly expressive and have worked well for sharing scientific knowledge for generations. A wealth of information about scientific concepts is locked up in textbooks and publications. Effective mechanisms for mining these sources provide abundant information for ontology development with no additional effort on scientists’ parts. The downside of this approach is that structure or presentation of knowledge within a text represents the perspective of one or a few scientists, and does not necessarily capture the perspective of the broader community.
It may not provide a knowledge model for which there can be widespread ontological commitment [8]. Therefore, text mining approaches are dependent on extensive collaborative review of Figure 2. Interacting systems that make use and inform the results. development of multiple ontologies. The knowledge representation team is exploring different ways of OntoGrow is an interface to ThinkCap that is currently under extracting knowledge from a popular ecological textbook [3] for design. OntoGrow will provide functionality for communities to use in the OBOE framework. The team is quantifying the interact with ThinkCap and can either be accessed directly or strength of association among key ecological terms using various indirectly through an application add-in. OntoGrow has three measures of proximity. For example, the term “population” is objectives: 1) provide community feedback/critique of strongly associated with “individual” and also “community”; ontologies, 2) recommend a term for an ontology, and 3) map however, the association between “individual” and “community” semantics between a resource and one or more ontologies. The is considerably weaker. Moreover, the proximity of different sets multiple views of ThinkCap will allow OntoGrow to provide of prepositions and verbs to coupled ecological terms is being wizards that step a user through these processes in more intuitive used as a mechanism to determine the most likely type of ways. For instance, to recommend a new term, the user could relationship between terms. For example, when “individual” and first be asked to provide its definition through the dictionary view, “population” are in close proximity, words like “in”, “part” and find a related term with a thesauri search, place the term in a “contain” are often also in close proximity suggesting a part-of hierarchy by using a taxonomy to expose the context around the relationship between these terms. 
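The proximity-based association measure described in Section 6.1 can be illustrated with a minimal sketch. The paper does not specify which measures of proximity the team uses, so the fixed token window, the function names (`tokenize`, `association_counts`, `cue_words_between`), and the sample sentence below are all assumptions made for illustration only.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def association_counts(tokens, terms, window=5):
    """For each pair of terms, count occurrences lying within `window` tokens.

    A simple co-occurrence count stands in for the unspecified
    "measures of proximity" (assumption).
    """
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t] for t in terms}
    counts = {}
    for a, b in combinations(sorted(terms), 2):
        counts[(a, b)] = sum(
            1 for i in positions[a] for j in positions[b] if abs(i - j) <= window
        )
    return counts


def cue_words_between(tokens, a, b, cues, window=5):
    """Tally cue words (e.g. 'in', 'part') found between nearby occurrences of a and b."""
    tally = Counter()
    pos_a = [i for i, tok in enumerate(tokens) if tok == a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == b]
    for i in pos_a:
        for j in pos_b:
            if abs(i - j) <= window:
                lo, hi = sorted((i, j))
                tally.update(tok for tok in tokens[lo + 1:hi] if tok in cues)
    return tally


# Hypothetical sample text, not taken from the textbook cited in the paper.
sample = (
    "An individual is part of a population. "
    "Every population is contained in a community. "
    "The population includes each individual."
)
tokens = tokenize(sample)
strengths = association_counts(tokens, {"individual", "population", "community"})
cues = cue_words_between(tokens, "individual", "population", {"in", "part", "contain"})
```

In this toy text, "population" co-occurs with both other terms more often than "individual" co-occurs with "community", and the cue word "part" appears between "individual" and "population", which is the kind of evidence the paragraph describes as suggesting a part-of relationship. A real analysis would need stemming (so that "contained" matches "contain") and a corpus far larger than one paragraph.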
The team is also using book related terms that the user has selected, and relate the term on a chapter, section, and subsection headings to help structure the topic map generated from the portion of the ontology that nested ecological terms, which helps distill broader concepts in includes that hierarchical element. Thus, the user can be stepped the textbook domain (e.g., “competition” or “ecosystem”). through the task of ontology navigation by using their choices at There are many mechanisms for incorporating text mining into each point to simplify the choices at the next level of complexity. the hypothetical system. This functionality could be provided to The applications that are currently being developed by our knowledge engineers within ThinkCap. Text mining could be projects will each be able to make use of OntoGrow as an add-in integrated into SciDesign as an aid for scientific literature search or through remote calls, providing a uniform mechanism of and review. Substantial time is dedicated by scientists to interaction with the knowledge base. In addition, we are following the literature in their own discipline. Increasingly the designing a new system, SciDesign, that is envisioned to provide boundaries between disciplines must be crossed and scientists must search for relevant literature in disciplines that are less well management needs can be leveraged to support the activities of known to them. Visual analytics is a new approach that mines the scientists. semantic content across many potential resources and provides For instance, information managers collectively invest much tools for visual content analysis [25]. Linking visual analytics effort in designing databases, developing normalized schemas, with text mining would provide scientists with functionality to standardizing keywords, and developing standards for metadata. 
more easily, effectively, and comprehensively conduct literature They have their own knowledge arena that combines both generic searches. Providing computer support that enables this task data management concepts and how those concepts are best would create an environment where it is to the scientist’s benefit applied to a particular domain of interest. Separate ontologies to use the system while providing valuable semantic information should be constructed to capture this knowledge. Rule-bases for ontology development. In a given literature search, selection could be constructed that link to those ontologies and can be used of multiple resources from different disciplines, journals, to guide data management efforts. For instance, in designing a websites, and other online sources provides evidence that these new table for collection of a particular kind of field data, the content sources are semantically related in some way. Combining system could use an ontology and rules about database design to source-specific semantic keywords with the choices and actions provide expert advice and best practices, mine available data to of many scientists equates to other forms of social tagging find and show examples of datasets that meet those guidelines and prevalent in Web 2.0. The system should be equipped to analyze are semantically equivalent to the data the scientist intends to these choices, mine the relevant texts, and both suggest other collect, and suggest one or more table designs. literature that might be relevant to the scientist and in parallel, propose terms and relationships to the knowledge engineer. 6.3 Concept mapping Concept mapping is an approach that the KR team has used that 6.2 Keywords and controlled vocabularies provides direct input for ontology development from a number of Scientist’s regularly apply keywords to textbooks, publications, scientists while they are engaged in an activity that is useful to and datasets. 
Traditionally these are uncontrolled, though them. Concept mapping is a representation mechanism that has controlled vocabularies are becoming more common (i.e. for been developed to support a constructivist notion of learning [18]. computer science publications IEEE and ACM share a definite Concept maps are a form of directed graph that captures tree-structured list of terms). Additionally, the titles they choose associations (links) between concepts (nodes; Figure 3). Concept provide information about important terms. Mining titles and mapping provides maximum flexibility for conceptualization of a keywords for concepts and relationships provides a pathway for domain of interest, and any kind of association can be mapped. acquiring views on scientific knowledge that requires little effort From a collaborative perspective, concept maps provide visual from scientists, but does require collaboration with information representation of disparate conceptual frameworks including the managers who know how to access these on their systems. most important terms from a particular view, and places those Separate from our projects, LTER information managers terms in context with one another for rapid understanding. conducted a mining project on network datasets and publications in order to develop a controlled vocabulary [21]. A list was generated by compiling all words appearing in metadata titles, keywords, and attributes, and in publication titles and keywords. The resultant list contained 21,153 terms. The list was filtered for ‘of,’ ‘the,’ and similar definite articles and prepositions. Terms were then rated in importance based on a number of usage criteria. The information managers are continuing to work with this list to develop a controlled vocabulary for use in tagging datasets and publications. They provided this list to our KR team, who were able to incorporate these terms into ontology development. 
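The vocabulary-mining steps described in Section 6.2 (compile words from titles and keywords, filter out articles and prepositions, rate terms by usage) can be sketched as follows. This is not the LTER information managers' procedure: the record fields, the short stopword list, and the use of the number of records containing a term as the only "usage criterion" are assumptions for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption); a real list would be much longer.
STOPWORDS = {"of", "the", "a", "an", "and", "in", "on", "for", "to"}


def term_usage(records):
    """Return a Counter mapping each candidate term to the number of records using it."""
    usage = Counter()
    for rec in records:
        words = set()
        for field in ("title", "keywords"):  # hypothetical metadata fields
            words.update(re.findall(r"[a-z]+", rec.get(field, "").lower()))
        # Count each term at most once per record, after dropping stopwords.
        usage.update(words - STOPWORDS)
    return usage


# Hypothetical metadata records, invented for this example.
records = [
    {"title": "Effects of nitrogen on the biomass of grasses", "keywords": "nitrogen, biomass"},
    {"title": "Soil nitrogen and precipitation", "keywords": "soil, precipitation"},
    {"title": "Biomass of the desert shrubs", "keywords": "biomass, shrubs"},
]
usage = term_usage(records)
candidates = [term for term, n in usage.most_common() if n >= 2]
```

Terms that recur across records (here "nitrogen" and "biomass") rise to the top of the candidate list, which could then be handed to information managers and knowledge engineers for review before anything is added to a controlled vocabulary or ontology.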
The intention of both groups is to ultimately link the information managers’ controlled vocabularies to the ontology such that controlled keywords applied to any resource are automatically annotated to the ontology, the ontology can be used to suggest terms that are not available in the controlled vocabulary, and the process of users applying new keywords can inform continued development of both the controlled vocabulary and the ontology. In the proposed system, support for information management activities could be embedded in SciDesign. One of the above use cases explicitly addresses supporting construction and management of local vocabularies. There are many other Figure 3. Example concept map showing relationships information management activities that could be supported. between terms related to the scientific method. Sometimes these activities are conducted by information The utility of concept maps as a mechanism for enabling managers, but there are many scientists who work independently interdisciplinary discussion has been demonstrated [11, 13]. In and must conduct these activities themselves. Even when cross-disciplinary problem solving efforts, colleagues with information managers are employed, they must work closely with differing conceptual frameworks often have limited ability to scientists. Design of functionality to assist information comprehend each other [7, 28, 13]. The degree to which comprehension is limited depends on the conceptual proximity of relevant conceptual frameworks - hence, two physical scientists the US. The selected participants represent the most technically- are more readily able to collaborate than a physical and social savvy of young ecologists tackling problems that require scientist, or a life scientist and a computer scientist. Enabling computational approaches. During the workshop, one full day is cross-disciplinary collaboration is therefore a problem of spent covering ontologies. 
Over the four years that the training representing disciplinary concepts in a way that enables rapid has been conducted, the ontology portion has been constantly comprehension and learning by those outside of that discipline modified based on feedback from students, and many new such that integrative problems can be solved. approaches have been tried. In general, the students are exposed to exercises that highlight the semantic issues in ecological The process of concept mapping is analogous in many ways to datasets and the requirements for resolving those issues. They social tagging systems. The content, in this case, is an construct ontologies for their research interests on paper. We unrepresented concept in the mind of the scientist. A node in a demonstrate ontology editors and touch graph visualizations. concept map represents that concept. Two scientists may use They step through portions of ontology editing exercises such as different terms in the node that describes that concept, essentially CO-ODE’s pizza ontology [12]. The ontology portion of the tagging that concept differently. Links between nodes specify training is always the most difficult to present, and often receives that a relationship of some sort exists between those concepts. criticism in post-training surveys. Even though participants This is roughly equivalent to inferring implicit semantic links understand the semantic issues and recognize that ontologies between Web content. Two scientists drawing concept maps might be useful for addressing them, they do not think that it is about the same research area will each have their own map using important for them to understand ontologies. In the most recent the same or different terms and relationships, but they are tagging training (January 2007) survey feedback indicated that 50 percent the same semantic content. 
During scientific discourse, these of participants, when asked what one thing they would change disparate concept spaces may become partially aligned. Hence, about the training, thought the ontology portion should be concept maps from multiple scientists build a participatory removed. This is a clear indication that direct exercises with ecosystem of content that can provide important vocabulary, ontologies is an obscure task for ecological scientists and more indicate synonyms, show informal associations between terms, gentle tools are needed for communicating semantic models. and provide hierarchical relationships. These semantic tags require structuring by the KR team and subsequent review and The KR team has attempted to engage groups of scientists in editing for clearance, cohesion, and soundness. However, the ontology development through working meetings where they are benefit of using concept maps is that it engages the scientific asked to talk about their research, explain terms, brainstorm community in supplying knowledge for ontology development in hierarchies, and provide lists of terms. Generally, their level of a way that has other direct and immediate benefits to them, such interest in these activities fades rather rapidly, mirroring the that they are more likely to participate. response from the training activities. Additionally, the hierarchical structures that they propose are often unusable in our In the proposed system, concept maps and other diagrammatic ontologies due to logical errors. Most importantly, those who are forms are expected to be an important part of SciDesign. willing to participate are typically new faculty who are under Scientists draw many sorts of diagrams and frequently find that substantial pressure to produce research results quickly in order to mode of expression useful while discussing complicated cross- obtain tenure. Their modus operandi is to only get involved in disciplinary subjects. 
Process diagrams, flow diagrams, project activities that will quickly lead to publication. Few obtain any diagrams – there are an unlimited number of uses of diagrams. short-term professional benefit for assisting in the development of The system should provide flexible, intuitive diagramming tools ontologies; hence, few can remain engaged at the level needed. that can be collaboratively constructed and shared, plus easily extracted and converted to publication-quality diagrams. If the Given all of these issues, the KR team has to be creative about nodes on the diagrams are linked to ontologies they can provide finding other ways to obtain their input. The hypothetical system an individual “view” of the knowledge base, allowing each as a whole represents a new approach to “meeting with the scientist to maintain his own conceptual perspective without scientists.” This new approach is virtual rather than physical, and compromising the collective formal structure. We have found focuses on linking user-centered task support with knowledge that it is important to the scientists to be able to express their development task needs. It combines “pulling” ontology individual view with no constraints, and that the underlying development through analysis of the way semantics are used by subsumption hierarchy is much less important to them [20]. the community with “pushing” ontology development with easy Science is, after all, about investigating areas of our mechanisms for reviewing and suggesting changes during task understanding where there is not agreement, and understanding performance. It is an attempt to solve the problems of disparity of linkages across hierarchies rather than within hierarchies. Much work and benefit, critical mass, and Prisoner’s dilemma [10] that of our analysis involves providing mechanisms for online and are prevalent in collaborative ontology development projects. 
It collaborative construction of concept maps and other scientific does that by bridging the gap between formal and informal diagrams that facilitate working with different ‘views’ of a set of semantic approaches in ways that reduces workload and provide ontologies based on individual perspectives and choices about benefits for all participants. representation. 7. COLLABORATIVE ONTOLOGY 6.4 Meeting with scientists DEVELOPMENT MODEL The utility of ontologies has been introduced to scores of Developing semantic systems that depend on and enable group ecologists during a week-long training workshop on sharing of resources differ in fundamental ways from developing ecoinformatics that the SEEK project holds each January. The software that supports individuals and large organizations [10]. participants in this training are 20 new faculty and postdoctoral One clear difference is that in both of the latter, the tasks to be associates selected from on average 60-80 applicants from around supported are well-defined in advance by product managers or in- house IT experts, respectively. In contrast, semantic tasks may be systems target. Developing cross-disciplinary understanding is understood for the work of the KR team but are poorly defined for the first step towards the truly interdisciplinary perspective that is any new community that is to be supported. For instance, much required for effective idea generation. While there are few work has been conducted on semantic tasks of online shoppers theories about enabling interdisciplinary interaction, social and therefore systems that support and make use of these science research on boundaries, boundary crossing, and boundary activities are becoming common place. Those tasks are not spanners point to the importance of constructing shared artifacts, necessarily analogous in any way to the semantic tasks of a facilitated by an individual whose is explicitly tasked with completely different group such as scientists. 
The semantic tasks mediating between the groups [24, 13, 30, 2]. The role of a must be understood before they can be supported. A second mediator in any sort of groupware development is currently difference is that the introduction of systems that drastically unspecified but in the semantic system case, could include soft change work patterns require corresponding investments in system analysis of the KR team, domain specialists, and the dealing with social and political factors that go along with change broader community. management. These issues are largely absent in development of single-user software. They are strongly present in organizational 8. CONCLUSIONS settings where there is also an infrastructure in place to provide This paper describes interactions that have taken place between a training, restructure work, and provide leadership. Our semantic knowledge representation team, natural scientists, and information systems for scientists bring about all of the challenges of managers, and uses those to drive a set of use cases for design of changing work processes with little of the supporting systems that enable better collaboration on ontology development. infrastructure. This is a common reason for failure of new Previous interactions have been stymied by the lack of groupware solutions. For these reasons and many others it is community understanding of ontologies and willingness to essential that collaborative knowledge development teams dedicate time towards ontology development. These problems become strategic in their activities. Unfortunately, there are few reflect the lack of direct, immediate benefit for the participant. models available to guide strategic choices. 
Our experience leads us to believe that formal ontology development could be more effectively informed by constructing We propose the following model for development of semantic tools that capture semantic decisions that are made in the course systems that depend on collaboration between knowledge of the community’s everyday work. Our community of interest representation specialists and the communities that they aspire to regularly semantically tags the artifacts used in the conduct of support. System development should be explicitly divided into science – datasets, publications, and models, and makes use of two phases: an idea generation phase and an implementation them in ways that capture semantic linkages. Design and phase (Figure 4). The idea generation phase can be conceived of development of systems that capture these semantic decisions and as product development on steroids. It is separated out to effectively make use of them to inform ontology development has emphasize that this is a lengthy, time-consuming process that may been initiated but is in its infancy. Ultimately, we hope to have require as much resource investment as the implementation phase. prototype systems and showcase applications that use those systems to demonstrate the collective benefits of ontology-based systems and applications. The ideas that are generated through this process are not a complete set. They represent one or a few of many possible integrated approaches to linking semantic tasks. As the ideas are implemented and enacted within the broader community, other ideas will emerge. It is extremely important that any strategy taken explicitly account for feedbacks throughout the entire process including providing mechanisms to incorporate the views of the broader community in long-term system development. 9. ACKNOWLEDGMENTS This work was funded through National Science Foundations Figure 4. Model of collaboration on semantic systems. 
The grant 0225665 for the SEEK project, grant DBI 0640837 for the idea generation phase is made explicit and involves needs ARIES project, and European Union grant 010036-2 for analysis across the collective stakeholders rather than a SEAMLESS. We would like to recognize the many relevant single user group. Every step in the model is iterative and discussions with the rest of the SEEK and ARIES teams, along involves feedback from other steps. with valuable comments by anonymous reviewers that led to Idea generation is an iterative process that has the goal of restructuring of this paper and considerable sharpening of content. discovering linkages between semantic tasks of the collective group of participants that can be leveraged by system design. In 10. REFERENCES its simplest form, it consists of learning about the workflow of [1] Athanasiadis, IN (2007). Towards a virtual enterprise each participating stakeholder group, analyzing those in terms of architecture for the environmental sector, In: (Protogeros, N, semantic tasks, then analyzing the collective set for tasks that can Ed.) Agent and Web Service Technologies in Virtual be linked in some way. In practice this involves a rather chaotic Enterprises. Idea Group Inc. period of interaction between different participants and the KR [2] Baker, KS, Jackson, SJ, and Wanetick, JR (2005). Strategies team as they learn about each other’s perspectives and search for supporting heterogeneous data and interdisciplinary common ground. These interactions are difficult because they collaboration: Towards an ocean informatics environment, depend on overcoming the very semantic barriers that semantic Proceedings of the 38th Hawaii International Conference on [17] Newell, B, Crumley, CL, Hassan, N, Lambin, EF, Pahl- system Sciences. Wostl, C, Underdal, A, Wasson, R (2005). A conceptual [3] Begon, M, Townsend, C, and Harper, JL (2006). Ecology, template for integrative human-environment research, Blackwell Publishing, 752 pp. 
Global Environmental Change 15:299-307. [4] Berkley, C, Bowers, S, Jones, M, Ludaescher, B, [18] Novak, JD, and Wurst, M (2005). Collaborative knowledge Schildhauer, M, and Tao, J (2005). Incorporating semantics visualization for cross-community learning, In: (Tergan, S in scientific workflow authoring, Proceedings of the and Keller, T Eds.) Knowledge and Information Statistical and Scientific Database Management (SSDBM) Visualization, Lecture Notes in Computer Science 3426:95- 2005. 116, Berlin Heidelberg: Springer-Verlag. [5] Bowers, S, and Ludaescher, B (2004). An ontology driven [19] Noy, NF, Sintek, M, Decker, S, Crubezy, M, Fergerson, RW, framework for data transformation in scientific workflows, and Musen, MA (2001). Creating semantic web content with Proceedings of Data Integration for Life Sciences (DILS) Protégé-2000, Intelligent Systems 16(2):60-71. 2004. [20] Pennington, D (2006). Representing the dimensions of an [6] Bowers, S, Thau, D, Williams, R, and Ludaescher, B (2004). ecological niche. Proceedings 5th International Semantic Data procurement for enabling scientific workflows: On Web Conference (ISWC’06) Workshop: Terra Cognita 2006 exploring inter-and parastism, Proceedings of Semantic Web – Directions to the Geospatial Semantic Web, November 6, and Databases (SWDB) 2004. 2006, Athens, Georgia. Available online: http://www.ordnancesurvey.co.uk/oswebsite/partnerships/res [7] Daily, GC and Ehrlich, PR (1999). Managing earth’s earch/research/terracognita.html. ecosystems: an interdisciplinary challenge, Ecosystems 2:277-280. [21] Porter, J (2006). Improving data queries through use of a controlled vocabulary, DataBits: An Electronic Newsletter [8] Davis, R, Shrobe, H, and Szolovits, P (1993). What is a for Information Managers, Spring 2006. Available online: knowledge representation? AI Magazine 14(1):17-33. http://intranet.lternet.edu/archives/documents/Newsletters/Da [9] DiCastri, F (2000). Ecology in a context of economic taBits/06spring/. 
globalization, BioScience 50(4):321-332. [22] Rizzoli, AE, Donatelli, M, Athanasiadis, IN, Villa, F, and [10] Grudin, J (1994). Groupware and social dynamics: Huber, D (accepted). Semantic links in integrated modeling frameworks, Mathematics and Computers in Simulation. eight challenges for developers, Communications of the ACM 37(1): 92-105. [23] Smith, B (2003). Ontology: An introduction. In: (Floridi, L ed.), Blackwell Guide to the Philosophy of Computing and [11] Heemskerk, M, Wilson, K, and Pavao-Zuckerman, M (2003). Information. Oxford:Blackwell, pp. 155-166. Conceptual models as tools for communication across disciplines, Conservation Ecology 7(3):8-17. [24] Star, SL (1990). The structure of ill-structured solutions: boundary objects and heterogeneous distributed problem [12] Horridge, H, Knublauch, H, Rector, A, Stevens, R, and solving. In: (L. Gasser and EMN Huhns, Eds.) Distributed Wroe, C (2004). A Practical Guide To Building OWL Artificial Intelligence, Vol. 2. London: Morgan Kaufmann Ontologies Using the Protégé-OWL Plugin and CO-ODE Publishers, Inc., pp. 35-54. Tools, Edition 1.0. Cooperative Ontologies Program tutorial, 118 pp. Available at http://www.co- [25] Thomas, JJ and Cook, KA (2006). A visual analytics ode.org/resources/tutorials/ProtegeOWLTutorial.pdf. agenda, IEEE Computer Graphics and Applications 26(1):10-13. [13] Jeffrey, P (2003). Smoothing the waters: observations on the process of cross-disciplinary research collaboration, Social [26] Villa, F, and Athanasiadis, IN (submitted). Modelling with Studies of Science 33(4):539-562. knowledge: Emerging semantic approaches to ecological modeling, Ecological Modelling. [14] Löber, A, Schwabe, G, Grimm, S (2007). Audio vs. chat: The effects of group size on media choice. [27] Vincent, AS, Decker, BP, and Mumford, MD (2002). Proceedings of the 40th HICCS Hawaii International Divergent thinking, intelligence, and expertise: A test of Conference on System Sciences. 
alternative models, Creativity Research Journal 14(2):163- 178. [15] Madin, J, Bowers, S, Schildhauer, M, Krivov, S, Pennington, D, and Villa, F (in review). An ontology for describing and [28] Wear, DN (1999). Challenges to interdisciplinary discourse, synthesizing ecological observation data. Submitted to Ecosystems 2:299-301. International Journal of Ecological Informatics. [29] Welp, MA, de la Vega-Leinert, A, Stoll-Kleemann, S, and [16] Michener, WK, Beach, JH, Jones, M.B, Ludaescher, B, Jaeger, CC (2006). Science-based stakeholder dialogues: Pennington, DD, Pereira, RS, Rajasekar, A, and Schildhauer, Theories and tools, Global Environmental Change 16:170- M, (2007). A knowledge environment for the biodiversity 181. and ecological sciences. Journal of Intelligent Information [30] Williams, P (2002). The competent boundary spanner, Systems DOI 10.1007/s10844-006-0034-8 available online at Public Administration 80(1):103-124. url: http://www.springerlink.com/content/e252n818242783g4/.