=Paper=
{{Paper
|id=Vol-156/paper-4
|storemode=property
|title=Semantic Association of Taxonomy-based Standards Using Ontology
|pdfUrl=https://ceur-ws.org/Vol-156/paper4.pdf
|volume=Vol-156
|dblpUrl=https://dblp.org/rec/conf/kcap/ChuCCIM05
}}
==Semantic Association of Taxonomy-based Standards Using Ontology==
Semantic Association of Taxonomy-based Standards Using Ontology
Hung-Ju Chu, Randy Y. C. Chow, Su-Shing Chen
Computer and Information Science and Engineering, University of Florida
Gainesville, FL, U.S.A.
{hchu, chow, suchen}@cise.ufl.edu
Raja R.A. Issa, Ivan Mutis
Rinker School of Building Construction, University of Florida.
Gainesville, FL, U.S.A.
{raymond-issa, imutis}@ufl.edu
ABSTRACT

The vision of semantic interoperability, the fluid sharing of digitalized knowledge, has led to much research on ontology/schema mapping and alignment. Although this line of research is fundamental and has brought valuable contributions to this endeavor, it does not represent a solution to the challenge of semantic heterogeneity: the performance of the proposed approaches relies significantly on the degree of uniformity, formalization, and sufficiency of data representations, but most of today’s independently developed information systems seldom have common knowledge modeling frameworks, and their data are often not formally and adequately specified. Consequently, a workable solution usually requires the intervention of domain experts.

In human society, hierarchically structured standards (or taxonomies) for characterizing complex application processes and the objects used in those processes are a common and effective way to achieve some semantic agreement among stakeholders within a domain. This research hypothesizes that the establishment and use of such standards can serve as a framework that effectively facilitates the reconciliation of semantic heterogeneity in complex application domains. In reality, however, a comprehensive a priori consensus is extremely difficult, if not impossible, to reach. Consequently, various complementary and competing standards are often created, and their constantly changing nature adds another level of challenge to realizing the hypothesis.

This paper focuses on the development of a methodology for bridging complementary standards within an application domain. It exemplifies such standards in the building construction industry, where interoperability problems are prevalent and human interactions are commonplace. It proposes a semi-automatic approach for semantically associating the standards to reduce costly human intervention in a workflow. The approach formalizes standards by using ontology and discovers their affinity (to what degree they are related with respect to their usage) from automated project document processing and semi-automatic domain expert inputs. A high-level architecture of an integration framework in a web environment is suggested to depict the role of the semantic association approach in the system.

Categories and Subject Descriptors

H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods, Linguistic processing; I.2.4 [Artificial Intelligence]: I.2.1 Applications and Expert Systems - Industrial automation; I.2.4 Knowledge Representation Formalisms and Methods; I.2.6 [Artificial Intelligence]: Learning - Knowledge Acquisition

Keywords

taxonomy and standards, semantic interoperability, ontology-based knowledge extraction, semantic mapping.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission by the copyright owners. Copyright 2005

1. INTRODUCTION

The vision of semantic interoperability, the fluid sharing of digitalized knowledge, has led to much research on ontology (formal specification of conceptualization) and its languages, such as the Web Ontology Language (OWL) [8]. The language provides primitives for specifying concepts, properties, explicit semantic relationships, and logical constraints on those objects. However, it does not address the issue of semantic heterogeneity between two independently developed ontologies. For example, a program that reads an ontology in OWL does not understand another ontology in the same language unless there is an explicit mapping between them. This difficulty has led to much research on ontology/schema mapping and alignment [4], [5], [6], [11], [12], [13], [14], and various matching technologies have been developed based on the attributes of objects and their associated data. Although this line of research is fundamental and has brought valuable contributions to this endeavor, it does not, as we see it, represent a solution to the challenge. The performance of the proposed approaches relies significantly on the degree of uniformity, formalization, and sufficiency of data representations. Unfortunately, the concept of unified, formal, and sufficient specification is often an afterthought, and most of today’s independently developed
information systems seldom have common knowledge modeling frameworks, and their data are often not formally and adequately specified. Consequently, a workable solution usually requires the intervention of domain experts.

In human society, hierarchically structured standards (or taxonomies) for characterizing complex application processes and the objects used in those processes are a common and effective way to achieve some semantic agreement among stakeholders within a domain. This research hypothesizes that the establishment and use of such standards can serve as a framework that effectively facilitates the reconciliation of semantic heterogeneity in complex application domains. In reality, however, a comprehensive a priori consensus is extremely difficult, if not impossible, to reach. Consequently, various complementary and competing standards are often created, and their constantly changing nature adds another level of challenge to realizing the hypothesis.

This paper focuses on the development of a methodology for bridging complementary standards within an application domain. We have chosen a target application in the building construction domain, where interoperability problems are prevalent and human interactions are commonplace. In that domain, a variety of taxonomy-based standards have been established, but there is still no uniform and systematic way to support efficient collaboration among project participants using different standards. This problem is further compounded by the complexity and dynamics of business applications, which often require changes to the well-known standards. The interoperability cost in such an environment is tremendous. For example, a recent National Institute of Standards and Technology (NIST) report [3] conservatively estimated the annual cost of inadequate interoperability in the capital facilities industry at $15.8 billion in 2002.

Two mainstream complementary standards in that domain, MasterFormat and UniformatII, are considered in our research. MasterFormat [1] is a specification standard established by the Construction Specifications Institute (CSI) for most nonresidential building construction projects in North America. UniformatII is a newer American Society for Testing and Materials (ASTM) standard that aims to provide a consistent reference for the description, economic analysis, and management of buildings during all phases of their life cycles [2]. These standards were created by different stakeholders with different perspectives for different purposes. For instance, an architect is interested in the design and structure of a building, a contractor wants to know what materials are used and how much they cost, and a building inspector is concerned about building code compliance issues. MasterFormat classifies items primarily by the specification of products and materials used in construction, so it reflects a contractor’s conceptual view. Complementarily, the taxonomical classification in UniformatII is primarily based on the attributes and location of structural building components, such as foundations and exterior walls, which reflects the architect’s view of a construction project. Although their views differ, both address the same building object. In other words, the taxonomies of the standards classify the same set of objects but on different attributes. It follows that cross-referencing or document conversion between the standards is inevitable for interaction among project participants in applications such as cost estimation and code compliance checking. For example, a wall (interior or exterior) in UniformatII needs to be associated with the material (metal, wood, or fiberglass) in MasterFormat and conform to its intended usage (hurricane or fire proof) according to building code regulations (standards yet to be formalized by the industry). In general, UniformatII by design is more suitable than MasterFormat as a participant communication/interaction framework during the earlier phases of the life cycle. On the other hand, MasterFormat has been used for years and has gained the support of the majority of the construction industry for specifying detailed project documents. To facilitate more efficient collaboration among project participants, it is common practice to supplement UniformatII with Preliminary Project Descriptions (PPDs) or schematic designs in earlier phases, and to convert them to construction documents in MasterFormat during later phases. In addition, the conversion is necessary for cost calculation, since most databases of building materials suppliers are based on MasterFormat. It is desirable to transform pre-bid elemental estimates to MasterFormat, and from there to the trade costs of the project [2]. This process is often tedious and requires cross-area knowledge. Currently, it is done manually by domain experts, and it is considered a major cause that hampers interoperability in the construction domain. Bridging the two standards is a key enabler for enhancing interoperability.

Direct matching approaches based on attributes of the entities of the standards are expected to be inefficient due to the heterogeneous nature of complementary standards. This paper proposes a practical compromise by redefining the notion of mapping with a semi-automatic semantic extraction framework to assist domain experts in achieving interoperability. The mapping is termed semantic association; it relates elements between standards and depends on the intended use, such as cross-referencing of elements or specification semantic mapping. The semantic relationship can be characterized by two measurements: similarity (how closely objects resemble each other in their representation) and affinity (to what degree they are coupled in their usage). In some sense, similarity is more static while affinity is more dynamic and general. For example, a bicycle is similar to a car because of their physical structures and properties. Gasoline, however, has a higher affinity to a car although the two do not resemble each other. Exploiting affinity in addition to similarity through semantic association is the focus of this research.

The approach consists of three components: formalization of taxonomies, ontology-based semantic extraction, and measurement of affinity. The first component is a simple yet novel approach for annotating a standard with primitive descriptive statements constructed from a necessary and sufficient set of orthogonal relations. The statements are then normalized and generalized into an ontology. The second component shows how the ontology can be used to extract relevant information from the instances of other standards for semantic association. The third component quantifies the affinity for ranking the extracted metadata to identify the optimal association. The following sections detail the three components and outline an overall architecture of an integration framework depicting the relationship between the proposed approach and other related technologies and systems.

2. FORMALIZATION OF TAXONOMY

Taxonomies are initially designed for human consumption; therefore, some domain knowledge that is obvious to and assumed by stakeholders is often omitted from their specifications. Moreover, taxonomies classifying large and complex items usually have the following characteristics:
1. The entities being classified and the attributes upon which the classification is based are themselves complex concepts.
2. Multiple attributes (different concepts) might be used to classify entities at the same level.
3. Attributes are not orthogonal and might result in overlapping concepts in low-level entities (an object can fit into multiple categories).

There is a need for a systematic approach for annotating assumed semantics, clarifying complex concepts, and transforming them into a formal representation before taxonomies can be effectively used for semantic association.

Semantics depend on context, and context depends on applications. In other words, the semantics of a standard are open, depending on how the standard is used. To avoid binding a standard to specific applications, the intrinsic semantics of a standard without context should include the following:
1. the attributes being used for classification under the general perception in the application domain and
2. the entities under the inheritance of the taxonomy and the attributes.

To model the intrinsic semantics, ontology is considered in this research. The following subsection describes a systematic approach for transforming a taxonomy into an ontology.

Ontology Development from Taxonomy

The term ontology has been widely used in several disciplines, such as philosophy, epistemology, and computer science, and there is much confusion about its definition. For example, in philosophy it refers to the subject of existence, while in epistemology it is about knowledge and knowing. In computer science, many people use Gruber’s definition [10]: an explicit specification of a conceptualization. In the context of our research, we interpret it as a description of the concepts/terms and relationships that can exist in an application domain. Centered on terms and relations, the transformation of a taxonomy into an ontology is described in the following steps.

Step 1: relation set identification

The goal of this step is to identify a sufficient and necessary set of orthogonal relations for a given taxonomy/standard so that assumed domain knowledge and complex concepts can be formally specified. This step should be done manually by the standards committees, who know best the original intended use of the standards. The set should be constructed from two types of relations: primitive and derived. Primitive relations are those that are unambiguously understood by the general public, and the relationship between concepts connected by them does not change over time. Moreover, they reflect the intrinsic properties of objects, or describe time and space and the intention of users when the objects are used. In addition, their definitions should include a set relationship, such as instance-instance, instance-class, or class-class, to avoid ambiguity. For example, part_of is ambiguous since it could mean a subcomponent of an object or the membership of an object in a class. It takes the first meaning if instance-instance is specified.

Derived relations are those that can be composed/modeled from primitive relations.

To elaborate this step, a small portion of the top three levels of the MasterFormat taxonomy, Division 5 (D5) Metals and Division 6 (D6) Wood and Plastics, rooted at Material, is exemplified as follows:

Division 5 - Metals
05100 Structural Metal Framing
05120 Structural steel
05140 Structural aluminum
05160 Metal framing systems
05400 Cold formed metal framing
05410 Load bearing metal studs
05420 Cold formed metal joists
05430 Slotted channel framing
Division 6 - Wood and Plastics
06100 Rough carpentry
06110 Wood framing
06400 Architectural woodwork
06460 Wood frames

The following relations are identified for formalizing the above example:
1. used_for (class-class, human intention): purpose
2. kind_of (class-class, intrinsic): containment relation of attributes of instances
3. instance_of (instance-class, intrinsic): membership
4. made_of (class-class, intrinsic): material component

Table 1 shows the mathematical properties of these relations, which are used in a subsequent step for data normalization. They are also used for reasoning in knowledge extraction.

Table 1. Mathematical Properties of the relations

Relations     Transitive   Reflexive   Antisymmetric
used_for      -            -           -
kind_of       +            +           +
instance_of   +            +           +
made_of       +            -           -

Step 2: relation statements construction

This step constructs simple statements using the relations defined in Step 1 and all keywords in the taxonomy. The statements are then processed in subsequent steps to construct the ontology. This bottom-up approach to formalizing taxonomies has two advantages. One is that it can better address the dynamic nature of standards by enabling incremental updates and modifications of the statements and their resulting ontology. The other is that domain experts who are not familiar with ontology can directly express their knowledge in simple statements without the overhead of communicating with knowledge modeling experts.

The following are examples of relation statements that partially describe the example shown in the previous step.
1. Metals (D5), Wood (D6), Plastics (D6_1) are instance_of Material (root) (D5_root, D6_root, D6_1_root)
2. Metals (D5) are used_for framing 05100_1
3. Structural is a kind_of “metal framing” (05100_1) 05100
4. Cold formed is a kind_of “metal framing” (05100_1) 05400
5. Studs are made_of Metals (D5) (05410_1)
6. “Load bearing metal studs” are kind_of Metal studs (05410_1) 05410
7. 05410 is used_for 05400 (05400_05410)

Note that each statement is given a unique identifier (shown at the end of the statement) derived from the original identifier of a taxonomy entity.

Step 3: normalization

Redundant or conflicting statements are likely to be generated when domain experts annotate their taxonomies in the above steps. Based on the mathematical properties of the relations, this step normalizes the statements by:
1. redundancy elimination (removing identical or equivalent statements)
2. conflict detection (for example, the statements A-r1-B and B-r1-A conflict if r1 is antisymmetric)
3. implication detection (for example, the statements A-r1-B and B-r1-C imply A-r1-C if r1 is transitive).

Step 4: semi-automatic generalization

This step generalizes the statements resulting from Step 3 into higher-level concepts connected by the same set of relations. Human intervention is required in this step due to the complexity of the process. For example, if there exist A-r1-C, A-r1-D, B-r1-C, and B-r1-D, they can be generalized to concept1{A,B}-r1-concept2{C,D} by union. However, it becomes difficult when the above example is extended to include concept1{A,B}-r1-E and concept2{C,D}-r2-F. One cannot conclude concept1{A,B}-r1-concept2{C,D,E} unless an exception indicating no E-r2-F is added. Alternatively, it can be generalized to concept1{A,B}-r1-concept3{E,concept2{C,D}}. The system interacts with users by prompting them to resolve such dilemmas while processing a whole taxonomy.

Figure 1 depicts the generalized view, or ontology, of the relation statements shown in the previous steps.

[Figure 1 diagram: a graph whose nodes are Material, Item, Metal, Function, Steel, Aluminum, and Process, connected by edges labeled used_for, made_of, and kind_of.]
{metals, wood, plastics ..} are instance_of Material
{stud, joist ..} are instance_of Item
{framing, ..} are instance_of Function
{cold formed, structural ..} are instance_of Process
Figure 1. Ontology Example
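The normalization in Step 3 can be illustrated with a short Python sketch. It is only one possible reading of the step, not the authors' implementation: statements are assumed to be (subject, relation, object) triples, the function name normalize is hypothetical, and the property flags are transcribed from Table 1.

```python
# Relation properties transcribed from Table 1 (True for "+", False for "-").
PROPERTIES = {
    "used_for":    {"transitive": False, "antisymmetric": False},
    "kind_of":     {"transitive": True,  "antisymmetric": True},
    "instance_of": {"transitive": True,  "antisymmetric": True},
    "made_of":     {"transitive": True,  "antisymmetric": False},
}


def normalize(statements):
    """Normalize (subject, relation, object) triples (hypothetical helper).

    Returns a tuple (unique, conflicts, implied).
    """
    # 1. Redundancy elimination: drop exact duplicates, keep first-seen order.
    unique = list(dict.fromkeys(statements))
    known = set(unique)

    # 2. Conflict detection: A-r-B together with B-r-A (A != B) conflicts
    #    when r is antisymmetric. Each conflicting pair is reported once.
    conflicts = [
        ((a, r, b), (b, r, a))
        for (a, r, b) in unique
        if PROPERTIES[r]["antisymmetric"] and a < b and (b, r, a) in known
    ]

    # 3. Implication detection: A-r-B and B-r-C imply A-r-C when r is
    #    transitive. Repeat until no new statement can be derived.
    implied = set()
    changed = True
    while changed:
        changed = False
        facts = known | implied  # snapshot, safe to iterate while adding
        for (a, r, b) in facts:
            if not PROPERTIES[r]["transitive"]:
                continue
            for (b2, r2, c) in facts:
                new = (a, r, c)
                if (b2 == b and r2 == r and a != c
                        and new not in facts and new not in implied):
                    implied.add(new)
                    changed = True
    return unique, conflicts, sorted(implied)


statements = [
    ("A", "kind_of", "B"),
    ("B", "kind_of", "C"),
    ("A", "kind_of", "B"),  # duplicate of the first statement
    ("C", "kind_of", "B"),  # conflicts with B-kind_of-C
]
unique, conflicts, implied = normalize(statements)
print(unique)
print(conflicts)
print(implied)
```

The sketch only detects conflicts and implications; deciding which conflicting statement to keep is left to the domain expert, in line with the semi-automatic spirit of the approach.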
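The union case of Step 4 (A-r1-C, A-r1-D, B-r1-C, and B-r1-D generalized to concept1{A,B}-r1-concept2{C,D}) can be sketched as follows. This is a minimal illustration under the assumption that statements are (subject, relation, object) triples; the function name generalize is hypothetical, and the sketch makes no attempt to resolve the dilemmas that the text leaves to the user.

```python
from collections import defaultdict


def generalize(statements):
    """Group triples into concept-level statements (hypothetical helper).

    Subjects that share a relation and exactly the same set of objects are
    merged into one concept; their common objects form the other concept.
    """
    # Collect, for every (subject, relation), the set of objects it points to.
    objects_of = defaultdict(set)
    for subj, rel, obj in statements:
        objects_of[(subj, rel)].add(obj)

    # Subjects with the same relation and the same object set form a concept.
    subjects_of = defaultdict(set)
    for (subj, rel), objs in objects_of.items():
        subjects_of[(rel, frozenset(objs))].add(subj)

    return sorted(
        (tuple(sorted(subjs)), rel, tuple(sorted(objs)))
        for (rel, objs), subjs in subjects_of.items()
    )


simple = [("A", "r1", "C"), ("A", "r1", "D"), ("B", "r1", "C"), ("B", "r1", "D")]
print(generalize(simple))

# Adding A-r1-E breaks the symmetry between A and B, so the two subjects are
# no longer merged automatically; a case like this would be shown to the user.
print(generalize(simple + [("A", "r1", "E")]))
```

Requiring identical object sets is a deliberately conservative design choice: anything less clear-cut is left ungrouped so that it can be prompted for resolution rather than merged silently.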
4. ONTOLOGY-BASED SEMANTIC EXTRACTION

The task of the previous module, standard formalization, is usually a one-time effort (though it is an iterative process), and it needs significant involvement from domain experts. This module is different in that it is used in every workflow/task, and the extracted semantics can be accumulated in a repository and used to improve future semantic association performance. Also, it can be automated to a large extent by using general linguistic processing technologies.

Standards such as UniformatII and MasterFormat, addressed in this paper, are functionally complementary to each other in an application domain, and cross-referencing them is costly for domain experts in workflows because of their complexity (vast many-to-many mappings). This module basically automates the process by mimicking a domain expert doing cross-referencing from the context of a standard-compliant project specification, a script representation indexed of the standard, which defines intentionality. For example, the following text is quoted from a PPD [7] under entity B2010 in the UniformatII taxonomy:

B SHELL
B20 EXTERIOR CLOSURE
B2010 EXTERIOR WALLS
1. Exterior Wall Framing: Cold-formed, light gage steel studs, C-shape, galvanized finish, 6" metal thickness as designed by manufacturer according to American Iron and Steel Institute (AISI) Specification for the Design of Cold Formed Steel Structural Members, for L/240 deflection.

Downside: specifications often contain note-style sentences.

Suppose the PPD is written by an architect and a contractor wants to estimate the cost of the exterior walls. He might comprehend that the wall framing will be made of cold-formed steel studs (semantic). Based on his expertise, he identifies that its corresponding entity in MasterFormat is 05410 Load bearing metal studs (association). The following paragraph shows how the ontology/relation statements are used to discover the semantics, under the context of entity B2010, that link the entity to MasterFormat entity 05410 (semantic association):

B2010 Exterior Wall:
1. Exterior Wall Framing: Cold-formed, light gage steel studs, C-shape, galvanized finish, 6" metal thickness
[Diagram: keywords in the text above are linked to relation statements 05100_1, 05400, and 05410.]

In the diagram, “steel” and “framing” match the statement 05100_1 (one of the identifiers of the relation statements exemplified in the previous subsection), which is Metals (D5) used_for framing. The “steel” matches “Metals” through the transitive property of the relation kind_of. The match is extended to statement 05400, which includes “cold-formed”. Finally, “studs” is added to the match of statement 05410 through statement (05400_05410). Indeed, the entity B2010 Exterior Wall in UniformatII has a semantic relationship with 05410 Load bearing metal studs in MasterFormat, and the semantics can be described by the relation made_of.

One characteristic worth mentioning is that the entity B2010 Exterior Wall in the taxonomy provides a good context for refining the association. For instance, the above matching is still possible even without the “framing” keyword, since the inherited semantics of the hierarchy (shell, closure, and exterior walls) are very close in meaning to framing.

As shown in the above example, the documents or specifications that this research addresses have the following characteristics:
1. Content has limited scope. It often details what, where, how, and when objects and activities are involved in a domain application. It usually contains rich semantics (the author’s intention in communicating with other stakeholders) related to standards (due to the agreement among stakeholders) that coordinate objects and activities in the domain.
2. Content is categorized according to a taxonomy. In other words, text in a document has some assumption or context, which is inherited along the taxonomy hierarchy.
3. Terminologies are relatively unified and unambiguous.
4. Sentences are relatively free-styled, such as note-styled or template-styled, due to writing conventions or standards.

These characteristics distinguish this research from others, such as [9] and [15], which extract shallow information from general or web documents.

In addition to the intrinsic semantics of standards, this module also explores their application or context semantics in order to achieve more effective semantic extraction. The application semantics depend on the stakeholders’ views or interests, such as the information they are looking for. For example, a cost estimator might look for MasterFormat items and some numerical information so that they can link them to their MasterFormat-based cost databases. On the other hand, an inspector might be interested in the same information but from a different viewpoint that yields different semantics. For example, to a cost estimator, “6" metal thickness” in the PPD means how much studs of that thickness cost. But for an inspector, it means compliance of the 6" thickness with the associated code.

In summary, this module extracts semantics from the instances (specifications) of multiple standards based on three kinds of ontologies: the ontology of the source standard, the
ontology of the target standard, and the application ontology based on the stakeholders’ views. The extracted semantics are evidence of semantic association of entities between source and target standards.

5. MEASUREMENT OF AFFINITY

The ontology-based semantic extraction module can be implemented via a matching process between relation statements and text. The goal is to identify a set of matched relation statements of related entities with respect to their standards. For a given entity, its associated relation statements carry different weights depending on their positions in the taxonomy and the information content [16] of their keywords. The measurement of affinity quantifies the weights so that the degree of closeness between matched relation statements and their associated entity can be determined. Based on the measurement, a ranking scheme can be devised to identify optimal semantic associations among all matches. The ranking scheme can be modeled as a function of the following factors:
1. Number of relation statements matched.
2. Number of keywords matched.
3. Quality of the matches. The measurement of the quality is an open question. Basically, the more specific the matches are, the higher the quality they represent. One effective way to model the quality is by their positions in the taxonomy (a higher level means less specific and thus carries less weight) and by the information content of their keywords. The information content can be quantified by their inverse document frequency (IDF) [17] combined with their counts in the taxonomy (appearing more times means less specific and thus carries less weight).

For instance, in the given example, several entities in MasterFormat contain “framing” and “Metals”, which are all candidates for semantic association. The entity 05410 is considered the optimal one because it matches more keywords along its taxonomy hierarchy and some of them, such as studs, are very specific with respect to both position and IDF.

6. ARCHITECTURE

The major thrust of the research is to develop an integration framework that facilitates the exploitation of semantics from taxonomy-based standards and instantiations of the standards to achieve higher interoperability between domain participants and their information systems. To demonstrate the applicability of the proposed approach toward that goal, this section shows an overall architecture depicting one possible implementation and its relationship with other related technologies.

[Figure 2 diagram: the ETIF block contains an Ontology DB (standards, instances, views, metadata, relations) and the modules Ontology Generation; Ontology Mapping, Reconciling, Merge, Data Mining; Change Management; Taxonomy Formalization; Semantic Extraction & Association; and Ontology Editing and Presentation. Below them are a Protocol Adapter and a Component Plug-in (XML Interface & API). Across the Internet, the diagram shows Data (instances, standards, PPDs), Human View (application ontology, standard IDs), Browsers (PC, mobile devices), and Semantic Web Services Based Dynamic Workflow Systems.]
Figure 2. Extensible Taxonomy-based Integration Framework (ETIF)

In the framework shown in Figure 2, relations and relation statements of various versions of standards written in natural languages are developed and uploaded to the system via web-based tools by stakeholders in the application domain. The taxonomy formalization and change management modules process them through parsing, normalization, generalization, linguistic processing (such as inflection, derivation, compounds, and synonyms), and indexing for incremental update of the ontology database. For a particular application, the stakeholders upload instances of the source standard (e.g., PPDs), the target standard, and its application ontology. After processing the free text of PPD instances through linguistic techniques such as tokenization, chunk parsing, and grammatical function recognition [9], the system applies the semantic extraction and ranking algorithms, and returns/deposits the extracted metadata and semantic associations to the ontology database and also to the users or clients, if applicable, for feedback.

The integration of competing and complementary standards is a critical step for enhancing interoperability among heterogeneous systems using the standards. The proposed semantic association is only one aspect of this effort. It should be supplemented with other technologies such as ontology mapping, reconciling, and merging to provide a practical and complete solution. The framework includes a plug-in mechanism via XML-based interfaces and an API for external software component integration.
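As an illustration of the semantic extraction step that the framework applies to PPD text, the matching walk-through of Section 4 can be sketched in Python. Everything below is an assumption made for the example rather than the authors' implementation: the keyword sets attached to statements 05100_1, 05400, and 05410, the single kind_of link from "steel" to "metals", the simple regular-expression tokenizer, and the function names. No stemming or synonym handling is attempted.

```python
import re

# Assumed kind_of links (child -> parent), e.g. steel is a kind_of metals.
KIND_OF = {"steel": "metals"}

# Assumed keyword sets for three relation statements from Step 2.
STATEMENTS = {
    "05100_1": {"metals", "framing"},
    "05400": {"cold-formed", "framing"},
    "05410": {"load", "bearing", "metals", "studs"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens, including hyphenated ones."""
    return set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))


def expand(tokens):
    """Add every ancestor reachable through kind_of (transitive closure)."""
    expanded = set(tokens)
    frontier = list(tokens)
    while frontier:
        parent = KIND_OF.get(frontier.pop())
        if parent is not None and parent not in expanded:
            expanded.add(parent)
            frontier.append(parent)
    return expanded


def match(text):
    """Map each statement identifier to the keywords found in the text."""
    tokens = expand(tokenize(text))
    return {
        sid: sorted(keywords & tokens)
        for sid, keywords in STATEMENTS.items()
        if keywords & tokens
    }


ppd_text = ('Exterior Wall Framing: Cold-formed, light gage steel studs, '
            'C-shape, galvanized finish, 6" metal thickness')
for statement_id, keywords in match(ppd_text).items():
    print(statement_id, keywords)
```

In this sketch "steel" reaches statement 05100_1 only through the kind_of expansion, which mirrors the role the transitive property plays in the walk-through.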
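The ranking algorithm mentioned above is not specified in detail, and Section 5 calls the quality measure an open question. The following Python sketch shows one way the listed factors could be combined, purely as an assumption: each matched keyword contributes its IDF over the taxonomy entities, weighted by the level of the entity in which it occurs, and the contributions are summed along the entity's hierarchy. The taxonomy slice, keyword sets, and function names are hypothetical.

```python
import math

# Hypothetical slice of the MasterFormat taxonomy. Keywords are taken from
# the entity titles quoted in Section 2; levels and parents are assumed.
TAXONOMY = {
    "05100": {"level": 2, "parent": None, "keywords": {"structural", "metal", "framing"}},
    "05400": {"level": 2, "parent": None, "keywords": {"cold-formed", "metal", "framing"}},
    "05410": {"level": 3, "parent": "05400", "keywords": {"load", "bearing", "metal", "studs"}},
    "06100": {"level": 2, "parent": None, "keywords": {"rough", "carpentry"}},
    "06110": {"level": 3, "parent": "06100", "keywords": {"wood", "framing"}},
}


def idf(keyword):
    """Inverse document frequency of a keyword over the taxonomy entities."""
    containing = sum(1 for e in TAXONOMY.values() if keyword in e["keywords"])
    return math.log(len(TAXONOMY) / containing) if containing else 0.0


def affinity(entity_id, tokens):
    """Sum level-weighted IDF of matched keywords along the entity's hierarchy."""
    score = 0.0
    node = entity_id
    while node is not None:
        entry = TAXONOMY[node]
        matched = entry["keywords"] & tokens
        # Deeper (more specific) entities carry more weight than higher ones.
        score += entry["level"] * sum(idf(k) for k in matched)
        node = entry["parent"]
    return score


def rank(tokens):
    """Return (entity, score) pairs, best candidate first."""
    scored = [(eid, affinity(eid, tokens)) for eid in TAXONOMY]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


tokens = {"exterior", "wall", "framing", "cold-formed", "light", "gage",
          "steel", "studs", "metal", "thickness"}
for entity_id, score in rank(tokens):
    print(entity_id, round(score, 3))
```

With these assumed inputs, entity 05410 ranks first because it matches the rare keyword "studs" at a deep level and also inherits the matches of its parent 05400, which is consistent with the reasoning given for the example in Section 5.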
The formalized standards, their instances, users’ application ontologies, and extracted metadata form a semantically rich ontology repository. Integrating the repository with other ontology techniques through the plug-in mechanism allows the effective construction of an application domain ontology. Web services enriched with the vision of the semantic web have emerged as a mainstream solution to system integration over the Internet. Following the same trend, the implementation of the proposed framework adopts the Web Ontology Language (OWL) [8] with the intention of integrating building construction workflow systems via semantic web services.

7. CONCLUSION AND FUTURE WORK

This paper demonstrates the effective use of taxonomy for ontology development and the semantic association of ontology for interoperability in a workflow system, with building construction as the target example. It illustrates a systematic approach to semantic association through taxonomy formalization and ontology-based semantic extraction. The overall system implementation in a web environment is also proposed. Current activities of the research project include the complete ontological formalization of the MasterFormat and UniformatII standards, refinement of the affinity measure for general taxonomies, and the integration of the algorithms with dynamic workflow systems through semantic web services.

8. ACKNOWLEDGMENTS

This work is partially supported by an NSF research grant ITR-0404113.

REFERENCES

[1] Construction Specifications Institute. MasterFormat 95™. Alexandria, VA: The Construction Specifications Institute, 1995 edition.
[2] Charette, R. P. and Marshall, H. E.: UNIFORMAT II Elemental Classification for Building Specifications, Cost Estimating, and Cost Analysis, NISTIR 6389, Gaithersburg, MD: National Institute of Standards and Technology, October 1999.
[3] Gallaher, M. P.; O'Connor, A. C.; Dettbarn, J. L., Jr.; Gilday, L. T.: Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry, NIST GCR 04-867, Gaithersburg, MD: National Institute of Standards and Technology, August 2004.
[4] Jayant Madhavan, Philip A. Bernstein, and Erhard Rahm: Generic Schema Matching with Cupid, at the Twenty Seventh International Conference on Very Large Databases (VLDB'2001), Roma, Italy.
[5] N.F. Noy and M.A. Musen. The PROMPT suite: Interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies, 59(6):983-1024, 2003.
[6] M. Paolucci, T. Kawamura, T. Payne, and K. Sycara. Semantic matching of web services capabilities. In The First International Semantic Web Conference (ISWC), 2002.
[7] Rosen, Harold J.: Construction specifications writing: principles and procedures, 5th edition, Hoboken, N.J.: J. Wiley, c2005.
[8] Mike Dean and Guus Schreiber, Editors: OWL Web Ontology Language Reference, W3C Recommendation, http://www.w3.org/TR/2004/REC-owl-ref-20040210, 10 February 2004.
[9] Maedche, A., Neumann, G., Staab, S.: Bootstrapping an Ontology-Based Information Extraction System, Intelligent Exploration of the Web, Springer 2002.
[10] Gruber, T.R.: A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition 5: 199-220, 1993.
[11] Rahm, E. and Bernstein, P. A. “A Survey of Approaches to Automatic Schema Matching.” The VLDB Journal, Vol. 10, pp. 334-350, 2001.
[12] Do, H., Melnik, S. and Rahm, E. “Comparison of Schema Matching Evaluations.” In Proceedings of the 2nd Int. Workshop on Web Databases (German Informatics Society), 2002.
[13] Aberer, K., Cudré-Mauroux, P. and Hauswirth, M. “The Chatty Web: Emergent Semantics through Gossiping.” The Proceedings of the 12th International World Wide Web Conference, pp. 197-206, 2003.
[14] Doan, A., Madhavan, J., Domingos, P. and Halevy, A. “Learning to Map between Ontologies on the Semantic Web.” The VLDB Journal, Vol. 12, pp. 303-319, 2003.
[15] David W. Embley, Douglas M. Campbell, Randy D. Smith, Stephen W. Liddle: Ontology-based extraction and structuring of information from data-rich unstructured documents, Proceedings of the seventh international conference on Information and knowledge management, pp. 52-59, November 02-07, 1998, Bethesda, Maryland, United States.
[16] Ross, S.: A First Course in Probability. Macmillan Publishing, 1976.
[17] Church, K. W. and Gale, W. A.: Inverse document frequency (IDF): A measure of deviations from Poisson. In Yarowsky, D. and Church, K., editors, Proceedings of the Third Workshop on Very Large Corpora, pages 121-130. Association for Computational Linguistics, 1995.