Semantic Association of Taxonomy-based Standards Using Ontology

Hung-Ju Chu, Randy Y. C. Chow, Su-Shing Chen
Computer and Information Science and Engineering, University of Florida, Gainesville, FL, U.S.A.
{hchu, chow, suchen}@cise.ufl.edu

Raja R. A. Issa, Ivan Mutis
Rinker School of Building Construction, University of Florida, Gainesville, FL, U.S.A.
{raymond-issa, imutis}@ufl.edu

ABSTRACT

The vision of semantic interoperability, the fluid sharing of digitalized knowledge, has led to much research on ontology/schema mapping and alignment. This line of research is fundamental and has brought valuable contributions to this endeavor. However, it does not solve the underlying challenge, semantic heterogeneity. The performance of the proposed approaches relies heavily on the uniformity, formalization, and sufficiency of data representations. Yet most of today's independently developed information systems seldom share a common knowledge modeling framework, and their data are often not formally and adequately specified. Consequently, a workable solution usually requires the intervention of domain experts.

In human society, hierarchically structured standards (or taxonomies) for characterizing complex application processes, and the objects used in those processes, are a common and effective way to achieve semantic agreement among stakeholders within a domain. This research hypothesizes that the establishment and use of such standards can serve as a framework that effectively facilitates the reconciliation of semantic heterogeneity in complex application domains. In reality, however, a comprehensive a priori consensus is extremely difficult, if not impossible, to reach. Consequently, various complementary and competing standards are often created, and their constantly changing nature adds another level of challenge to achieving the hypothesis.

This paper focuses on developing a methodology for bridging complementary standards within an application domain. It exemplifies such standards in the building construction industry, where interoperability problems are prevalent and human interactions are commonplace. It proposes a semi-automatic approach for semantically associating the standards to reduce costly human intervention in a workflow. The approach formalizes standards by using ontology. It then discovers their affinity (to what degree they are related with respect to their usage) from automated project document processing and semi-automatic domain expert inputs. A high-level architecture of an integration framework in a web environment is suggested to depict the role of the semantic association approach in the system.

Categories and Subject Descriptors

H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods, Linguistic processing; I.2.1 [Artificial Intelligence]: Applications and Expert Systems - Industrial automation; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods; I.2.6 [Artificial Intelligence]: Learning - Knowledge acquisition

Keywords

taxonomy and standards, semantic interoperability, ontology-based knowledge extraction, semantic mapping

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission by the copyright owners. Copyright 2005

1. INTRODUCTION

The vision of semantic interoperability, the fluid sharing of digitalized knowledge, has led to much research on ontology (a formal specification of a conceptualization) and its languages, such as the Web Ontology Language (OWL) [8]. The language provides primitives for specifying concepts, properties, explicit semantic relationships, and logical constraints on those objects. However, it does not address the issue of semantic heterogeneity between two independently developed ontologies. For example, a program that reads an ontology in OWL does not understand another ontology in the same language unless there is an explicit mapping between them.

This difficulty has led to much research on ontology/schema mapping and alignment [4], [5], [6], [11], [12], [13], [14]. Various matching technologies have been developed based on the attributes of objects and their associated data. This line of research is fundamental and has brought valuable contributions to this endeavor, but as we see it, it does not solve the challenge. The performance of the proposed approaches relies heavily on the uniformity, formalization, and sufficiency of data representations. Unfortunately, unified, formal, and sufficient specification is often an afterthought. Most of today's independently developed information systems seldom share a common knowledge modeling framework, and their data are often not formally and adequately specified. Consequently, a workable solution usually requires the intervention of domain experts.

In human society, hierarchically structured standards (or taxonomies) for characterizing complex application processes, and the objects used in those processes, are a common and effective way to achieve semantic agreement among stakeholders within a domain. This research hypothesizes that the establishment and use of such standards can serve as a framework that effectively facilitates the reconciliation of semantic heterogeneity in complex application domains. In reality, however, a comprehensive a priori consensus is extremely difficult, if not impossible, to reach. Consequently, various complementary and competing standards are often created, and their constantly changing nature adds another level of challenge to achieving the hypothesis.

This paper focuses on developing a methodology for bridging complementary standards within an application domain. We have chosen a target application in the building construction domain, where interoperability problems are prevalent and human interactions are commonplace. In that domain, a variety of taxonomy-based standards have been established. However, there is still no uniform and systematic way to support efficient collaboration among project participants who use different standards. This problem is further compounded by the complexity and dynamics of business applications, which often require changes to well-known standards. The interoperability cost in such an environment is tremendous. For example, a recent National Institute of Standards and Technology (NIST) report [3] put the annual cost of a lack of interoperability in the capital facilities industry in 2002 at a conservative $15.8 billion.

Two mainstream complementary standards in that domain, MasterFormat and UniformatII, are considered in our research. MasterFormat [1] is a specification standard established by the Construction Specifications Institute (CSI) for most nonresidential building construction projects in North America. UniformatII is a newer American Society for Testing and Materials (ASTM) standard. It aims to provide a consistent reference for the description, economic analysis, and management of buildings during all phases of their life cycles [2].

These standards were created by different stakeholders with different perspectives for different purposes. For instance, an architect is interested in the design and structure of a building. A contractor wants to know what materials are used and how much they cost. A building inspector is concerned with building code compliance. MasterFormat classifies items primarily by the specification of the products and materials used in construction, so it reflects a contractor's conceptual view. Complementarily, the taxonomical classification in UniformatII is based primarily on the attributes and location of structural building components, such as foundations and exterior walls. This reflects the architect's view of a construction project. Although their views differ, both address the same building objects. In other words, the two taxonomies classify the same set of objects, but on different attributes.

From this, one can easily infer that cross-referencing or document conversion between the standards is inevitable for interaction among project participants. This holds in applications such as cost estimation and code compliance checking. For example, a wall (interior or exterior) in UniformatII needs to be associated with its material (metal, wood, or fiberglass) in MasterFormat. It must also conform to its intended usage (hurricane-proof or fireproof) according to building code regulations, which are standards yet to be formalized by the industry.

In general, UniformatII by design is more suitable than MasterFormat as a framework for participant communication and interaction during the earlier phases of the life cycle. On the other hand, MasterFormat has been in use for years and has gained majority support in the construction industry for specifying detailed project documents. To facilitate more efficient collaboration among project participants, it is common practice to supplement UniformatII with Preliminary Project Descriptions (PPDs) or schematic designs in the earlier phases. These are then converted to construction documents in MasterFormat during later phases. The conversion is also necessary for cost calculation, since most building material suppliers' databases are based on MasterFormat. It is desirable to transform pre-bid elemental estimates to MasterFormat, and from there to the trade costs of the project [2]. This process is often tedious and requires cross-area knowledge. Currently, it is done manually by domain experts, and it is considered a major obstacle to interoperability in the construction domain. Bridging the two standards is therefore a key enabler for enhancing interoperability.

Direct matching approaches based on the attributes of the standards' entities are expected to be inefficient because complementary standards are heterogeneous by nature. This paper proposes a practical compromise. It redefines the notion of mapping within a semi-automatic semantic extraction framework that assists domain experts in achieving interoperability. The mapping, termed semantic association, relates elements between standards. It depends on the intended use, such as cross-referencing of elements or mapping of specification semantics.

The semantic relationship can be characterized by two measurements: similarity (how closely objects resemble each other in their representation) and affinity (to what degree they are coupled in their usage). In some sense, similarity is more static, while affinity is more dynamic and general. For example, a bicycle is similar to a car because of their physical structures and properties. Gasoline, however, has more affinity with a car, although the two do not resemble each other. Exploiting affinity in addition to similarity through semantic association is the focus of this research.

The approach consists of three components: formalization of taxonomies, ontology-based semantic extraction, and measurement of affinity.

- The first component is a simple yet novel approach for annotating a standard with primitive descriptive statements. These statements are constructed from a set of necessary and sufficient orthogonal relations, then normalized and generalized into an ontology.
- The second component shows how the ontology can be used to extract relevant information from the instances of other standards for semantic association.
- The third component quantifies affinity in order to rank the extracted metadata and identify the optimal association.

The following sections detail the three components. They also outline the overall architecture of an integration framework, depicting the relationship between the proposed approach and other related technologies and systems.

2. FORMALIZATION OF TAXONOMY

Taxonomies are initially designed for human consumption. Therefore, some domain knowledge that is obvious to, and assumed by, stakeholders is often omitted from their specifications. Moreover, taxonomies classifying large and complex items usually have the following characteristics:

1. The entities being classified, and the attributes upon which the classification is based, are themselves complex concepts.
2. Multiple attributes (different concepts) might be used to classify entities at the same level.
3. Attributes are not orthogonal and might result in overlapping concepts in low-level entities (an object can fit into multiple categories).

Before taxonomies can be used effectively for semantic association, a systematic approach is needed to annotate assumed semantics, clarify complex concepts, and transform them into a formal representation.

Semantics depend on context, and context depends on applications. In other words, the semantics of a standard remain open, depending on how the standard is used. To avoid binding a standard to specific applications, the intrinsic, context-free semantics of a standard should include the following:

1. the attributes used for classification, as generally perceived in the application domain, and
2. the entities under the inheritance of the taxonomy and the attributes.

To model these intrinsic semantics, this research uses an ontology. The following subsection describes a systematic approach for transforming a taxonomy into an ontology.

Ontology Development from Taxonomy

The term ontology has been widely used in several disciplines, such as philosophy, epistemology, and computer science, and there is much confusion about its definition. For example, in philosophy it refers to the subject of existence, while in epistemology it is about knowledge and knowing. In computer science, many people use Gruber's definition [10]: an explicit specification of a conceptualization. In the context of our research, we interpret it as a description of the concepts/terms and relationships that can exist in an application domain. Centered on terms and relations, the transformation of a taxonomy into an ontology is described in the following steps.

Step 1: relation set identification

The goal of this step is to identify a necessary and sufficient set of orthogonal relations for a given taxonomy/standard, so that assumed domain knowledge and complex concepts can be formally specified. This step should be done manually by standards committees, who know best the original intended use of the standards. The set should be constructed from two types of relations: primitive and derived.

Primitive relations have three defining features:

- They are unambiguously understood by the general public, and the relationship they establish between concepts does not change over time.
- They reflect the intrinsic properties of objects, or describe time, space, and the intention of users when the objects are used.
- Their definitions include the set relationship (instance-instance, instance-class, or class-class) to avoid ambiguity. For example, part_of is ambiguous, since it could mean a subcomponent of an object or the membership of an object in a class. Specifying instance-instance identifies the former meaning.

Derived relations are those that can be composed or modeled from primitive relations.

To elaborate this step, consider a small portion of the top three levels of the MasterFormat taxonomy. The example covers Division 5 (D5) Metals and Division 6 (D6) Wood and Plastics, both rooted at Material:

Division 5 - Metals
  05100 Structural Metal Framing
    05120 Structural steel
    05140 Structural aluminum
    05160 Metal framing systems
  05400 Cold formed metal framing
    05410 Load bearing metal studs
    05420 Cold formed metal joists
    05430 Slotted channel framing
Division 6 - Wood and Plastics
  06100 Rough carpentry
    06110 Wood framing
  06400 Architectural woodwork
    06460 Wood frames

The following relations are identified for formalizing the above example:

1. used_for (class-class, human intention): purpose
2. kind_of (class-class, intrinsic): containment relation of the attributes of instances
3. instance_of (instance-class, intrinsic): membership
4. made_of (class-class, intrinsic): material component

Table 1 shows the mathematical properties of these relations. These properties are used in the subsequent step for data normalization, and also for reasoning in knowledge extraction.

Table 1. Mathematical properties of the relations

Relation      Transitive   Reflexive   Antisymmetric
used_for          -            -             -
kind_of           +            +             +
instance_of       +            +             +
made_of           +            -             -

Step 2: relation statement construction

This step constructs simple statements using the relations defined in Step 1 and all keywords in the taxonomy. The statements are then processed in subsequent steps to construct the ontology. This bottom-up approach to formalizing taxonomies has two advantages. First, it better addresses the dynamic nature of standards, because it enables incremental updates and modifications of the statements and their resulting ontology. Second, domain experts who are not familiar with ontology can express their knowledge directly in simple statements, without the communication overhead of working through knowledge modeling experts.

The following relation statements partially describe the example shown in the previous step:

1. Metals (D5), Wood (D6), and Plastics (D6_1) are instance_of Material (root) → (D5_root, D6_root, D6_1_root)
2. Metals (D5) are used_for framing → 05100_1
3. Structural is a kind_of "metal framing" (05100_1) → 05100
4. Cold formed is a kind_of "metal framing" (05100_1) → 05400
5. Studs are made_of Metals (D5) → (05410_1)
6. "Load bearing metal studs" are kind_of Metal studs (05410_1) → 05410
7. 05410 is used_for 05400 → (05400_05410)

Each statement is given a unique identifier (following the arrow →) derived from the original identifier of a taxonomy entity.

Step 3: normalization

Redundant or conflicting statements are likely to be generated along the way when domain experts annotate their taxonomies in the above steps. Based on the mathematical properties of the relations, this step normalizes the statements by:

1. redundancy elimination (removing identical or equivalent statements);
2. conflict detection (for example, the statements A-r1-B and B-r1-A conflict if r1 is antisymmetric and A and B are distinct);
3. implication detection (for example, the statements A-r1-B and B-r1-C imply A-r1-C if r1 is transitive).

Step 4: semi-automatic generalization

This step generalizes the resulting statements from Step 3 into higher-level concepts connected by the same set of relations. Because the process is complex, it requires human intervention. For example, if A-r1-C, A-r1-D, B-r1-C, and B-r1-D exist, they can be generalized by union to concept1{A,B}-r1-concept2{C,D}. However, generalization becomes difficult when this example is extended to include concept1{A,B}-r1-E and concept2{C,D}-r2-F. One cannot conclude concept1{A,B}-r1-concept2{C,D,E} unless an exception is added stating that there is no E-r2-F. Alternatively, the statements can be generalized to concept1{A,B}-r1-concept3{E, concept2{C,D}}. While processing a whole taxonomy, the system prompts users to resolve such dilemmas.

Figure 1 depicts the generalized view, or ontology, of the relation statements shown in the previous steps.

Figure 1. Ontology Example. (Diagram: the concepts Material, Item, Metal, Function, Steel, Aluminum, and Process are connected by used_for, made_of, and kind_of relations. Annotations: {metals, wood, plastics, ...} are instance_of Material; {stud, joist, ...} are instance_of Item; {framing, ...} are instance_of Function; {cold formed, structural, ...} are instance_of Process.)

4. ONTOLOGY-BASED SEMANTIC EXTRACTION

The task of the previous module, standard formalization, is usually a one-time effort, though an iterative one, and it needs significant involvement from domain experts. This module is different. It is used in every workflow or task, and the extracted semantics can be accumulated in a repository and used to improve future semantic association performance. It can also be largely automated using general linguistic processing technologies.

The standards addressed in this paper, such as UniformatII and MasterFormat, are functionally complementary within an application domain. Because of their complexity (a vast number of many-to-many mappings), cross-referencing them in workflows is costly work for domain experts. This module essentially automates that process. It mimics a domain expert who cross-references from the context of a standard-compliant project specification, that is, a script representation indexed by the standard, which defines intentionality. For example, the following text is quoted from a PPD [7] under entity B2010 in the UniformatII taxonomy:

B SHELL
B20 EXTERIOR CLOSURE
B2010 EXTERIOR WALLS
1. Exterior Wall Framing: Cold-formed, light gage steel studs, C-shape, galvanized finish, 6" metal thickness as designed by manufacturer according to American Iron and Steel Institute (AISI) Specification for the Design of Cold Formed Steel Structural Members, for L/240 deflection.

A downside is that such specifications often contain note-style sentences.

Suppose the PPD is written by an architect and a contractor wants to estimate the cost of the exterior walls. The contractor might understand that the wall framing will be made of cold-formed steel studs (semantics). Based on his expertise, he identifies that its corresponding entity in MasterFormat is 05410 Load bearing metal studs (association). The following shows how the ontology/relation statements are used, under the context of entity B2010, to discover the semantics that link the entity to MasterFormat entity 05410 (semantic association):

B2010 Exterior Wall:
1. Exterior Wall Framing: Cold-formed, light gage steel studs, C-shape, galvanized finish, 6" metal thickness
(Diagram: keywords in this text are linked to relation statements 05100_1, 05400, and 05410.)

In the diagram, the match proceeds in four steps:

1. "Steel" and "framing" match statement 05100_1, which states that Metals (D5) are used_for framing. (05100_1 is one of the identifiers of the relation statements exemplified in the previous section.)
2. "Steel" matches "Metals" through the transitive property of the kind_of relation.
3. The match is extended to statement 05400, which includes "cold-formed".
4. "Studs" is added to the match of statement 05410 through statement (05400_05410).

Indeed, the entity B2010 Exterior Wall in UniformatII has a semantic relationship with 05410 Load bearing metal studs in MasterFormat, and that relationship can be described by the relation made_of.

One characteristic worth mentioning is that the entity B2010 Exterior Wall provides a good context in the taxonomy that helps refine the association. For instance, the above match is possible even without the keyword "framing". This is because the semantics inherited along the hierarchy (shell, closure, and exterior walls) have a meaning very close to framing.

As the above example shows, the documents or specifications that this research addresses have the following characteristics:

1. Content has limited scope. It often details what objects and activities are involved in a domain application, and where, how, and when. It usually contains rich semantics (the author's intention in communicating with other stakeholders) related to standards (owing to the agreement among stakeholders) that coordinate objects and activities in the domain.
2. Content is categorized according to the taxonomy. In other words, text in a document carries assumptions or context inherited along the taxonomy hierarchy.
3. Terminology is relatively unified and unambiguous.
4. Sentences are relatively free-styled, such as note-style or template-style, due to writing conventions or standards.

These characteristics distinguish this research from other work, such as [9] and [15], which extracts shallow information from general or web documents.

In addition to the intrinsic semantics of standards, this module also explores their application or context semantics in order to achieve more effective semantic extraction. Application semantics depend on the stakeholders' views or interests, such as the information they are looking for. For example, a cost estimator might look for MasterFormat items and numerical information, so that they can link them to MasterFormat-based cost databases. An inspector, on the other hand, might be interested in the same information but from a different viewpoint, which yields different semantics. To a cost estimator, "6" metal thickness" in the PPD means how much studs of that thickness cost. To an inspector, it means whether a 6" thickness complies with the associated code.

In summary, this module extracts semantics from the instances (specifications) of multiple standards based on three kinds of ontologies: the ontology of the source standard, the ontology of the target standard, and the application ontology based on the stakeholders' views. The extracted semantics serve as evidence of semantic association between entities of the source and target standards.

5. MEASUREMENT OF AFFINITY

The ontology-based semantic extraction module can be implemented as a matching process between relation statements and text. The goal is to identify a set of matched relation statements of related entities with respect to their standards. For a given entity, its associated relation statements carry different weights. These weights depend on the statements' positions in the taxonomy and on the information content [16] of their keywords. Measuring affinity means quantifying these weights, so that the closeness between matched relation statements and their associated entity can be determined. Based on this measurement, a ranking scheme can be devised to identify the optimal semantic associations among all matches. The ranking scheme can be modeled as a function of the following factors:

1. The number of relation statements matched.
2. The number of keywords matched.
3. The quality of the matches. How to measure quality is an open question, but in general, the more specific the matches, the higher their quality. One effective way to model quality uses two signals:
   - The matches' positions in the taxonomy: a higher level means less specific, and thus carries less weight.
   - The information content of their keywords. This can be quantified by the keywords' inverse document frequency (IDF) [17] combined with their counts in the taxonomy: appearing more times means less specific, and thus carries less weight.

For instance, in the given example, several entities in MasterFormat contain "framing" and "Metals", and all of them are candidates for semantic association. The entity 05410 is considered the optimal one because it matches more keywords along its taxonomy hierarchy. Some of those keywords, such as "studs", are also very specific with respect to both position and IDF.

6. ARCHITECTURE

The major thrust of the research is to develop an integration framework that facilitates exploitation of the semantics of taxonomy-based standards and their instantiations. The aim is higher interoperability between domain participants and their information systems. To demonstrate the applicability of the proposed approach toward this goal, this section presents an overall architecture. It depicts one possible implementation and its relationship with other related technologies.

Figure 2. Extensible Taxonomy-based Integration Framework (ETIF). (Diagram components: an Ontology DB holding standards, instances, views, metadata, and relations; Ontology Generation; Ontology Mapping, Reconciling, Merge, and Data Mining; Change Management; Taxonomy Formalization; Semantic Extraction & Association; Ontology Editing and Presentation; and a Protocol Adapter with a Component Plug-in (XML Interface & API). These connect over the Internet to data (instances, standards, PPDs), human views (application ontology, standard IDs), browsers (PC, mobile devices), and semantic web services based dynamic workflow systems.)

In the framework shown in Figure 2, processing proceeds as follows:

1. Stakeholders in the application domain develop relations and relation statements for various versions of standards, written in natural language. They upload them to the system via web-based tools.
2. The taxonomy formalization and change management modules process these statements for incremental update of the ontology database. Processing covers parsing, normalization, generalization, linguistic processing (such as inflection, derivation, compounds, and synonyms), and indexing.
3. For a particular application, the stakeholders upload instances of the source standard (e.g., PPDs), the target standard, and its application ontology.
4. The system processes the free text of PPD instances with linguistic techniques such as tokenization, chunk parsing, and grammatical function recognition [9].
5. The system applies the semantic extraction and ranking algorithms.
6. The system returns or deposits the extracted metadata and semantic associations to the ontology database. Where applicable, it also sends them to the users or clients for feedback.

The integration of competing and complementary standards is a critical step for enhancing interoperability among heterogeneous systems that use the standards. The proposed semantic association is only one aspect of this effort. It should be supplemented with other technologies, such as ontology mapping, reconciling, and merging, to provide a practical and complete solution. The framework includes a plug-in mechanism, via XML-based interfaces and an API, for integrating external software components. The formalized standards, their instances, users' application ontologies, and extracted metadata form a semantically rich ontology repository. Integrating the repository with other ontology techniques through the plug-in mechanism allows effective construction of application domain ontologies.

Web services enriched with the vision of the semantic web have emerged as a mainstream solution to system integration over the Internet. Following the same trend, the implementation of the proposed framework adopts the Web Ontology Language (OWL) [8]. The intention is to integrate building construction workflow systems via semantic web services.

7. CONCLUSION AND FUTURE WORK

This paper demonstrates the effective use of taxonomy for ontology development, and the semantic association of ontologies for interoperability in a workflow system, with building construction as the target example. It illustrates a systematic approach to semantic association through taxonomy formalization and ontology-based semantic extraction. An overall system implementation in a web environment is also proposed. Current activities of the research project include three efforts:

- the complete ontological formalization of the MasterFormat and UniformatII standards;
- refinement of the affinity measure for general taxonomies; and
- integration of the algorithms with dynamic workflow systems through semantic web services.

8. ACKNOWLEDGMENTS

This work is partially supported by NSF research grant ITR-0404113.

REFERENCES

[1] Construction Specifications Institute. MasterFormat 95™. Alexandria, VA: The Construction Specifications Institute, 1995 edition.
[2] Charette, R. P. and Marshall, H. E. UNIFORMAT II Elemental Classification for Building Specifications, Cost Estimating, and Cost Analysis. NISTIR 6389, Gaithersburg, MD: National Institute of Standards and Technology, October 1999.
[3] Gallaher, M. P., O'Connor, A. C., Dettbarn, J. L., Jr., and Gilday, L. T. Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry. NIST GCR 04-867, Gaithersburg, MD: National Institute of Standards and Technology, August 2004.
[4] Madhavan, J., Bernstein, P. A., and Rahm, E. Generic Schema Matching with Cupid. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), Roma, Italy, 2001.
[5] Noy, N. F. and Musen, M. A. The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping. International Journal of Human-Computer Studies, 59(6):983-1024, 2003.
[6] Paolucci, M., Kawamura, T., Payne, T., and Sycara, K. Semantic Matching of Web Services Capabilities. In Proceedings of the First International Semantic Web Conference (ISWC), 2002.
[7] Rosen, H. J. Construction Specifications Writing: Principles and Procedures, 5th edition. Hoboken, NJ: J. Wiley, 2005.
[8] Dean, M. and Schreiber, G. (Editors). OWL Web Ontology Language Reference. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/2004/REC-owl-ref-20040210
[9] Maedche, A., Neumann, G., and Staab, S. Bootstrapping an Ontology-Based Information Extraction System. In Intelligent Exploration of the Web, Springer, 2002.
[10] Gruber, T. R. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5:199-220, 1993.
[11] Rahm, E. and Bernstein, P. A. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, 10:334-350, 2001.
[12] Do, H., Melnik, S., and Rahm, E. Comparison of Schema Matching Evaluations. In Proceedings of the 2nd International Workshop on Web Databases (German Informatics Society), 2002.
[13] Aberer, K., Cudré-Mauroux, P., and Hauswirth, M. The Chatty Web: Emergent Semantics through Gossiping. In Proceedings of the 12th International World Wide Web Conference, pp. 197-206, 2003.
[14] Doan, A., Madhavan, J., Domingos, P., and Halevy, A. Learning to Map between Ontologies on the Semantic Web. The VLDB Journal, 12:303-319, 2003.
[15] Embley, D. W., Campbell, D. M., Smith, R. D., and Liddle, S. W. Ontology-based Extraction and Structuring of Information from Data-rich Unstructured Documents. In Proceedings of the Seventh International Conference on Information and Knowledge Management, pp. 52-59, Bethesda, MD, November 2-7, 1998.
[16] Ross, S. A First Course in Probability. Macmillan Publishing, 1976.
[17] Church, K. W. and Gale, W. A. Inverse Document Frequency (IDF): A Measure of Deviations from Poisson. In Yarowsky, D. and Church, K., editors, Proceedings of the Third Workshop on Very Large Corpora, pp. 121-130. Association for Computational Linguistics, 1995.
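The normalization in Step 3 of Section 2 (redundancy elimination, conflict detection, implication detection) lends itself to a small illustration. The Python code below is a minimal sketch under stated assumptions, not the authors' implementation:

- Relation statements are stored as (subject, relation, object) triples.
- The relation properties are copied from Table 1.
- Two terms count as equivalent when they differ only in case or spacing. This equivalence rule is hypothetical.

The names PROPERTIES, canonical, and normalize, the statement "Metal studs kind_of Metal framing", and the deliberately conflicting statement are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Relation properties transcribed from Table 1 of the paper.
PROPERTIES = {
    "used_for":    {"transitive": False, "reflexive": False, "antisymmetric": False},
    "kind_of":     {"transitive": True,  "reflexive": True,  "antisymmetric": True},
    "instance_of": {"transitive": True,  "reflexive": True,  "antisymmetric": True},
    "made_of":     {"transitive": True,  "reflexive": False, "antisymmetric": False},
}

Triple = tuple[str, str, str]  # (subject, relation, object)


@dataclass
class Normalized:
    statements: set[Triple]                                       # after redundancy elimination
    conflicts: set[frozenset[Triple]] = field(default_factory=set)
    implied: set[Triple] = field(default_factory=set)             # new statements via transitivity


def canonical(term: str) -> str:
    # Hypothetical equivalence rule: ignore case and repeated whitespace.
    return " ".join(term.lower().split())


def normalize(raw: list[Triple]) -> Normalized:
    # 1. Redundancy elimination: canonicalize terms; the set drops duplicates.
    stmts = {(canonical(s), r, canonical(o)) for s, r, o in raw}

    # 2. Conflict detection: A-r-B and B-r-A clash if r is antisymmetric and A != B.
    conflicts = {
        frozenset({(a, r, b), (b, r, a)})
        for a, r, b in stmts
        if PROPERTIES[r]["antisymmetric"] and a != b and (b, r, a) in stmts
    }

    # 3. Implication detection: A-r-B and B-r-C imply A-r-C if r is transitive.
    closure = set(stmts)
    changed = True
    while changed:
        changed = False
        for a, r, b in list(closure):
            if not PROPERTIES[r]["transitive"]:
                continue
            for b2, r2, c in list(closure):
                if r2 == r and b2 == b and a != c and (a, r, c) not in closure:
                    closure.add((a, r, c))
                    changed = True
    return Normalized(stmts, conflicts, closure - stmts)


raw = [
    ("Load bearing metal studs", "kind_of", "Metal studs"),
    ("load bearing  metal studs", "kind_of", "metal studs"),  # equivalent duplicate
    ("Metal studs", "kind_of", "Metal framing"),               # hypothetical statement
    ("Metal framing", "kind_of", "Metal studs"),               # deliberately conflicting
]
result = normalize(raw)
print(result.conflicts)
print(result.implied)
```

The closure is recomputed until nothing changes, so implied statements cover chains of any length. A larger repository would more likely index statements by subject than use this quadratic loop.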
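The union-based generalization in Step 4 of Section 2 can be sketched in the same triple form. The Python code below is a hedged illustration under three assumptions:

- Subjects are grouped when they share an identical object set for a relation.
- A grouping is reported as a dilemma for human review, rather than applied, when members of the proposed object concept do not share the same outgoing statements. This is the E versus concept2{C,D}-r2-F situation described in Step 4.
- The function names and the abstract statements A to F are hypothetical.

The sketch does not construct nested concepts such as concept3{E, concept2{C,D}}; it leaves that choice to the user.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relation, object)


def outgoing(stmts: list[Triple], term: str) -> frozenset:
    """All (relation, object) pairs in which `term` appears as the subject."""
    return frozenset((r, o) for s, r, o in stmts if s == term)


def propose_generalizations(stmts: list[Triple]):
    """Group statements by union; return (safe proposals, dilemmas for a human)."""
    objects_of = defaultdict(set)        # (relation, subject) -> objects
    for s, r, o in stmts:
        objects_of[(r, s)].add(o)

    subjects_of = defaultdict(set)       # (relation, object set) -> subjects
    for (r, s), objs in objects_of.items():
        subjects_of[(r, frozenset(objs))].add(s)

    proposals, dilemmas = [], []
    for (r, objs), subjects in subjects_of.items():
        if len(subjects) < 2 and len(objs) < 2:
            continue                     # a single statement: nothing to generalize
        candidate = (frozenset(subjects), r, objs)
        # If the members of the object concept do not share the same outgoing
        # statements, the union would over-generalize (e.g. imply E-r2-F).
        profiles = {outgoing(stmts, o) for o in objs}
        if len(profiles) > 1:
            dilemmas.append(candidate)
        else:
            proposals.append(candidate)
    return proposals, dilemmas


base = [("A", "r1", "C"), ("A", "r1", "D"), ("B", "r1", "C"), ("B", "r1", "D"),
        ("C", "r2", "F"), ("D", "r2", "F")]
ok, _ = propose_generalizations(base)

extended = base + [("A", "r1", "E"), ("B", "r1", "E")]
_, dilemmas = propose_generalizations(extended)
for subjects, r, objs in dilemmas:
    print("needs review:", sorted(subjects), r, sorted(objs))
```

For the base statements, the sketch proposes concept1{A,B}-r1-concept2{C,D} directly. Once E is added, the same union is held back for a human decision because E lacks the r2-F statement that C and D share.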
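The worked example in Section 4, which links B2010 Exterior Walls to MasterFormat 05410, can be approximated with plain keyword matching. The Python sketch below is illustrative only. It replaces the linguistic processing the paper relies on (tokenization, chunk parsing, grammatical function recognition [9]) with a regular-expression tokenizer. The following are assumptions made for the example:

- the STATEMENTS dictionary and its hypothetical "refines" field, which links a statement to the one it builds on;
- the KIND_OF map;
- the rival entry 06110.

```python
import re

# Hypothetical encoding of Step 2 statements: identifier -> keywords, plus the
# statement it refines (05400 refers to 05100_1; 05410 links to 05400 via 05400_05410).
STATEMENTS = {
    "05100_1": {"keywords": {"metals", "framing"}, "refines": None},
    "05400":   {"keywords": {"cold formed"},       "refines": "05100_1"},
    "05410":   {"keywords": {"studs"},             "refines": "05400"},
    "06110":   {"keywords": {"wood", "framing"},   "refines": None},  # hypothetical rival
}
# kind_of knowledge as in Figure 1: steel and aluminum are kinds of metal.
KIND_OF = {"steel": "metals", "aluminum": "metals"}


def analyze(text: str) -> tuple[str, set[str]]:
    """Lower-case word sequence plus the vocabulary expanded along kind_of."""
    words = re.findall(r"[a-z]+", text.lower())  # "cold-formed" -> "cold", "formed"
    vocab = set(words)
    for word in words:
        term, seen = word, set()
        while term in KIND_OF and term not in seen:  # follow kind_of transitively
            seen.add(term)
            term = KIND_OF[term]
            vocab.add(term)
    return " ".join(words), vocab


def extract(text: str) -> dict[str, list[tuple[str, str]]]:
    """For each statement, the (statement, keyword) pairs matched along its chain."""
    phrase, vocab = analyze(text)
    hits = {}
    for sid in STATEMENTS:
        matched, cur = [], sid
        while cur is not None:  # inherit context from the statements it refines
            for kw in sorted(STATEMENTS[cur]["keywords"]):
                if kw in vocab or f" {kw} " in f" {phrase} ":
                    matched.append((cur, kw))
            cur = STATEMENTS[cur]["refines"]
        if matched:
            hits[sid] = matched
    return hits


ppd = ('Exterior Wall Framing: Cold-formed, light gage steel studs, C-shape, '
       'galvanized finish, 6" metal thickness')
hits = extract(ppd)
best = max(hits, key=lambda sid: len(hits[sid]))
print(best, hits[best])
```

Each candidate inherits the keywords of the statements it refines. Entry 05410 therefore accumulates "studs", "cold formed", "framing", and "metals" (the last through steel kind_of metals), which mirrors the chain 05100_1, 05400, 05410 described for the PPD excerpt.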
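The three ranking factors in Section 5 can be combined in many ways. The Python sketch below shows one hypothetical choice: a weighted sum whose weights alpha, beta, and gamma are assumptions, not values from the paper. It rests on three further assumptions:

- IDF is computed by treating each title in the MasterFormat excerpt of Section 2 as a document, so words that appear in many titles carry less weight.
- Taxonomy position follows a hypothetical depth rule: codes ending in 00 sit one level above their children.
- The candidate match lists are invented for illustration.

```python
import math

# MasterFormat titles from the excerpt in Section 2, treated as IDF "documents".
TITLES = {
    "05100": "Structural Metal Framing", "05120": "Structural steel",
    "05140": "Structural aluminum", "05160": "Metal framing systems",
    "05400": "Cold formed metal framing", "05410": "Load bearing metal studs",
    "05420": "Cold formed metal joists", "05430": "Slotted channel framing",
    "06100": "Rough carpentry", "06110": "Wood framing",
    "06400": "Architectural woodwork", "06460": "Wood frames",
}


def idf_table(titles: dict[str, str]) -> dict[str, float]:
    """idf(w) = log(N / df(w)): words found in many titles are less specific."""
    df: dict[str, int] = {}
    for title in titles.values():
        for w in set(title.lower().split()):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(len(titles) / n) for w, n in df.items()}


def level(code: str) -> int:
    # Hypothetical rule for this excerpt: codes ending in "00" sit one level
    # above their children (e.g. 05400 above 05410); deeper means more specific.
    return 2 if code.endswith("00") else 3


def affinity(matches, idf, max_level=3, alpha=1.0, beta=1.0, gamma=1.0) -> float:
    """Weighted sum of the three factors in Section 5 (weights are assumptions)."""
    n_statements = len({code for code, _ in matches})           # factor 1
    n_keywords = len(matches)                                    # factor 2
    quality = sum(idf.get(w, 0.0) * level(code) / max_level      # factor 3
                  for code, w in matches)
    return alpha * n_statements + beta * n_keywords + gamma * quality


# Hypothetical matches per candidate: (entry where the keyword matched, keyword).
candidates = {
    "05410": [("05100", "metal"), ("05100", "framing"), ("05400", "cold"),
              ("05400", "formed"), ("05410", "studs")],
    "05160": [("05100", "metal"), ("05100", "framing"), ("05160", "framing")],
    "06110": [("06110", "framing")],
}
idf = idf_table(TITLES)
ranking = sorted(candidates, key=lambda c: affinity(candidates[c], idf), reverse=True)
print(ranking)  # ['05410', '05160', '06110']
```

In this sketch, "studs" occurs in only one of the twelve titles, so it receives the highest IDF (log 12). Together with its deeper position, this lifts 05410 above the other candidates, in line with the paper's informal argument. Other combinations, such as a product of factors or learned weights, are equally plausible readings of Section 5.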