=Paper= {{Paper |id=Vol-156/paper-4 |storemode=property |title=Semantic Association of Taxonomy-based Standards Using Ontology |pdfUrl=https://ceur-ws.org/Vol-156/paper4.pdf |volume=Vol-156 |dblpUrl=https://dblp.org/rec/conf/kcap/ChuCCIM05 }} ==Semantic Association of Taxonomy-based Standards Using Ontology== https://ceur-ws.org/Vol-156/paper4.pdf
                           Semantic Association of Taxonomy-based Standards Using Ontology
                                       Hung-Ju Chu, Randy Y. C. Chow, Su-Shing Chen
                                Computer and Information Science and Engineering, University of Florida
                                                                Gainesville, FL, U.S.A.
                                                         {hchu, chow, suchen}@cise.ufl.edu
                                                               Raja R.A. Issa, Ivan Mutis
                                       Rinker School of Building Construction, University of Florida.
                                                                Gainesville, FL, U.S.A.
                                                           {raymond-issa, imutis}@ufl.edu


ABSTRACT

The vision of semantic interoperability, the fluid sharing of digitalized knowledge, has led to much research on ontology/schema mapping/aligning. Although this line of research is fundamental and has brought valuable contributions to this endeavor, it does not represent a solution to the challenge of semantic heterogeneity, since the performance of the proposed approaches relies significantly on the degree of uniformity, formalization and sufficiency of data representations, whereas most of today’s independently developed information systems seldom have common knowledge modeling frameworks and their data are often not formally and adequately specified. Consequently, a workable solution usually requires interventions of domain experts.

In human society, hierarchically structured standards (or taxonomies) for characterizing complex application processes and the objects used in those processes are often used as a common and effective way to achieve some semantic agreement among stakeholders within a domain. This research hypothesizes that the establishment and the use of such standards can serve as a framework that can effectively facilitate the reconciliation of semantic heterogeneity in complex application domains. However, the reality shows that a comprehensive a priori consensus is extremely difficult, if not impossible, to reach. Consequently, various complementary and competing standards are often created, and their constantly changing nature yields another level of challenge in achieving the hypothesis.

This paper focuses on the development of a methodology for bridging complementary standards within an application domain. It exemplifies such standards in the building construction industry, where interoperability problems are prevalent and human interactions are commonplace. It proposes a semi-automatic approach for semantically associating the standards to reduce costly human intervention in a workflow. The approach formalizes standards by using ontology and discovers their affinity (to what degree they are related with respect to their usage) from automated project document processing and semi-automatic domain expert inputs. A high-level architecture of an integration framework in a web environment is suggested for depicting the role of the semantic association approach in the system.

Categories and Subject Descriptors

H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods, Linguistic processing; I.2.4 [Artificial Intelligence]: I.2.1 Applications and Expert Systems - Industrial automation; I.2.4 Knowledge Representation Formalisms and Methods; I.2.6 [Artificial Intelligence]: Learning - Knowledge Acquisition

Keywords

taxonomy and standards, semantic interoperability, ontology-based knowledge extraction, semantic mapping.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission by the copyright owners.
Copyright 2005

1. INTRODUCTION

The vision of semantic interoperability, the fluid sharing of digitalized knowledge, has led to much research on ontology (formal specification of conceptualization) and its languages, such as the Web Ontology Language (OWL) [8]. The language provides primitives for specifying concepts, properties, explicit semantic relationships, and logical constraints on those objects. However, it does not address the issue of semantic heterogeneity between two independently developed ontologies. For example, a program that reads an ontology in OWL does not understand another ontology in the same language unless there is an explicit mapping between them. This difficulty has led to much research on ontology/schema mapping/alignment [4], [5], [6], [11], [12], [13], and [14], and various matching technologies have been developed based on the attributes of objects and their associated data. Although this line of research is fundamental and has brought valuable contributions to this endeavor, it does not represent a solution to the challenge as we see it. The performance of the proposed approaches relies significantly on the degree of uniformity, formalization and sufficiency of data representations. Unfortunately, the concept of unified, formal, and sufficient specification is often an afterthought and most of today’s independently developed
information systems seldom have common knowledge modeling frameworks and their data are often not formally and adequately specified. Consequently, a workable solution usually requires interventions of domain experts.

In human society, hierarchically structured standards (or taxonomies) for characterizing complex application processes and the objects used in those processes are often used as a common and effective way to achieve some semantic agreement among stakeholders within a domain. This research hypothesizes that the establishment and the use of such standards can serve as a framework that can effectively facilitate the reconciliation of semantic heterogeneity in complex application domains. However, the reality shows that a comprehensive a priori consensus is extremely difficult, if not impossible, to reach. Consequently, various complementary and competing standards are often created, and their constantly changing nature yields another level of challenge in achieving the hypothesis.

This paper focuses on the development of a methodology for bridging complementary standards within an application domain. We have chosen a target application in the building construction domain, where interoperability problems are prevalent and human interactions are commonplace. In that domain, a variety of taxonomy-based standards have been established, but there is still no uniform and systematic way of supporting efficient collaboration among project participants using different standards. This problem is further compounded by the complexity and the dynamics of business applications, which often require changes to the well-known standards. The interoperability cost in such an environment is tremendous. For example, according to a recent National Institute of Standards and Technology (NIST) report [3], a conservative figure of $15.8 billion was determined to be the annual cost due to a lack of interoperability in the capital facilities industry in 2002.

Two mainstream complementary standards in that domain, MasterFormat and UniformatII, are considered in our research. MasterFormat [1] is a specification standard established by the Construction Specifications Institute (CSI) for most nonresidential building construction projects in North America. UniformatII is a newer American Society for Testing and Materials (ASTM) standard that aims to provide a consistent reference for the description, economic analysis, and management of buildings during all phases of their life cycles [2]. These standards were created by different stakeholders with different perspectives for different purposes. For instance, an architect is interested in the design and structure of a building, a contractor wants to know what materials are used and how much they cost, and a building inspector is concerned about building code compliance issues. MasterFormat classifies items primarily based on the specification of products and materials used in construction, so it reflects the conceptual view of a contractor. Complementarily, the taxonomical classification in UniformatII is primarily based on the attributes and location of structural building components, such as foundations and exterior walls, which reflects the architect’s view of a construction project. Although their views differ, both address the same building object. In other words, the taxonomies of the standards classify the same set of objects but on different attributes. It follows that cross-referencing or document conversion between the standards is inevitable for interaction among project participants in applications such as cost estimation and code compliance checking. For example, a wall (interior or exterior) in UniformatII needs to be associated with the material (metal, wood or fiberglass) in MasterFormat and must conform to its intended usage (hurricane or fire proof) according to building code regulations (standards yet to be formalized by the industry). In general, UniformatII is by design more suitable than MasterFormat as a participant communication/interaction framework during the earlier phases of the life cycle. On the other hand, MasterFormat has been used for years and has gained the support of most of the construction industry for specifying detailed project documents. To facilitate more efficient collaboration among project participants, it is common practice to supplement UniformatII with Preliminary Project Descriptions (PPDs) or schematic design in earlier phases, and convert them to construction documents in MasterFormat during later phases. In addition, the conversion is also necessary for cost calculation since most databases of building materials suppliers are based on MasterFormat. It is desirable to transform pre-bid elemental estimates to MasterFormat, and from there to the trade costs of the project [2]. This process is often tedious and requires cross-area knowledge. Currently, it is done manually by domain experts and it is considered a major obstacle to interoperability in the construction domain. Bridging the two standards is a key enabler for enhancing interoperability.

Direct matching approaches based on attributes of the entities of the standards are expected to be inefficient due to the heterogeneous nature of complementary standards. This paper proposes a practical compromise by redefining the notion of mapping with a semi-automatic semantic extraction framework to assist domain experts in achieving interoperability. The mapping is termed semantic association for relating elements between standards, and is dependent on the intended use such as cross-referencing of elements or specification semantic mapping. The semantic relationship can be characterized by two measures: similarity (how closely objects resemble each other in their representation) and affinity (to what degree they are coupled in their usage). In some sense similarity is more static while affinity is more dynamic and general. For example, a bicycle is similar to a car due to their physical structures and properties. However, gasoline has a higher affinity with a car although they do not resemble each other. Exploiting affinity in addition to similarity through semantic association is the focus of this research.

The approach consists of three components: formalization of taxonomies, ontology-based semantic extraction and measurement of affinity. The first component is a simple and yet novel approach for annotating a standard in primitive descriptive statements constructed from a set of necessary and sufficient orthogonal relations. They are then normalized and generalized into an ontology. The second component shows how the ontology can be used for the extraction of relevant information from the instances in other standards for semantic association. The third component quantifies the affinity for ranking the extracted metadata to identify the optimal association. The following sections detail the three components and outline an overall architecture of an integration framework depicting the relationship between the proposed approach and other related technologies and systems.

2. FORMALIZATION OF TAXONOMY

Taxonomies are initially designed for human consumption; therefore, some domain knowledge that is obvious to and assumed by stakeholders is often omitted in their specifications. Moreover, taxonomies classifying large and complex items usually have the following characteristics:

1.   The entities being classified and the attributes upon which the classification is based are themselves complex concepts.
2.   Multiple attributes (different concepts) might be used to classify entities at the same level.
3.   Attributes are not orthogonal and might result in overlapping concepts in low-level entities (an object can fit into multiple categories).

There is a need for a systematic approach for annotating assumed semantics, clarifying complex concepts, and transforming them into formal representation before taxonomies can be effectively used for semantic association.

Semantics depend on context and context depends on applications. In other words, the semantics of a standard are open, depending on how it is used. To avoid a standard being bound to specific applications, the intrinsic semantics of a standard without context should include the following:

1.   the attributes being used for classification under the general perception in the application domain and
2.   the entities under the inheritance of the taxonomy and the attributes.

To model the intrinsic semantics, ontology is considered in this research. The following subsection describes a systematic approach for transforming taxonomy into ontology.

Ontology Development from Taxonomy

The term ontology has been widely used in several disciplines, such as philosophy, epistemology, and computer science. There is much confusion in its definition. For example, in philosophy it refers to the subject of existence while in epistemology it is about knowledge and knowing. In computer science, many people use Gruber’s definition [10]: an explicit specification of a conceptualization. In the context of our research, we interpret it as a description of the concepts/terms and relationships that can exist in an application domain. Centered on terms and relations, the transformation of taxonomy into ontology is described in the following steps.

Step 1: relation set identification

The goal of this step is to identify a sufficient and necessary set of orthogonal relations for a given taxonomy/standard so that assumed domain knowledge and complex concepts can be formally specified. This step should be done manually by standards committees, who know best the original intended use of the standards. The set should be constructed from two types of relations: primitive and derived. Primitive relations are those that are unambiguously understood by the general public and for which the relationship between the concepts they connect does not change over time. Moreover, they reflect the intrinsic properties of objects or describe time and space and the intention of users when the objects are used. In addition, their definitions should include the set relationship, such as instance-instance, instance-class, and class-class, to avoid ambiguity. For example, part_of is ambiguous since it could mean a subcomponent of an object or the membership of an object in a class. Its meaning can be identified as the first explanation if instance-instance is specified.

Derived relations are those that can be composed/modeled from primitive relations.

To elaborate this step, a small portion of the top three levels in the MasterFormat taxonomy, Division 5 (D5) Metals and Division 6 (D6) Wood and Plastics rooted at Material, is exemplified as follows:

Division 5 - Metals
 05100 Structural Metal Framing
    05120 Structural steel
    05140 Structural aluminum
    05160 Metal framing systems
 05400 Cold formed metal framing
    05410 Load bearing metal studs
    05420 Cold formed metal joists
    05430 Slotted channel framing
Division 6 - Wood and Plastics
 06100 Rough carpentry
    06110 Wood framing
 06400 Architectural woodwork
    06460 Wood frames

The following relations are identified for formalizing the above example:

    1.   used_for (class-class, human intention): purpose
    2.   kind_of (class-class, intrinsic): containment relation of attributes of instances
    3.   instance_of (instance-class, intrinsic): membership
    4.   made_of (class-class, intrinsic): material component

Table 1 shows the mathematical properties of these relations that are used in a subsequent step for data normalization. They are also used for reasoning in knowledge extraction.

     Table 1. Mathematical properties of the relations

   Relation       Transitive   Reflexive   Antisymmetric
   used_for           -            -             -
   kind_of            +            +             +
   instance_of        +            +             +
   made_of            +            -             -

Step 2: relation statements construction

This step is to construct simple statements using the relations defined in step one and all keywords in the taxonomy. The statements are then processed in subsequent steps for constructing the ontology. There are two advantages to using this bottom-up approach for formalizing taxonomies. One is that it can better address the dynamic nature of standards by enabling incremental updates and modifications of the statements and their resulting ontology. The other advantage is that domain experts who are not familiar with ontology can directly express their knowledge in the simple statements without communication overhead with knowledge modeling experts.

The following are examples of relation statements that partially describe the example shown in the previous step.

      1. Metals (D5), Wood (D6), Plastics (D6_1) are instance_of Material (root) → (D5_root, D6_root, D6_1_root)
      2. Metals (D5) are used_for framing → 05100_1
      3. Structural is a kind_of “metal framing” (05100_1) → 05100
      4. Cold formed is a kind_of “metal framing” (05100_1) → 05400
      5. Studs are made_of Metals (D5) → (05410_1)
      6. “Load bearing metal studs” are kind_of Metal studs (05410_1) → 05410
      7. 05410 is used_for 05400 → (05400_05410)

Note that each statement is given a unique identifier (following →) derived from the original identifier of a taxonomy entity.

Step 3: normalization

It is likely that redundant or conflicting statements are generated along the way when domain experts annotate their taxonomies in the above steps. Based on the mathematical properties of the relations, this step normalizes the statements by:

1. redundancy elimination (removing identical or equivalent statements)
2. conflict detection (for example, the statements A-r1-B and B-r1-A conflict if r1 is antisymmetric and A and B are distinct)
3. implication detection (for example, the statements A-r1-B and B-r1-C imply A-r1-C if r1 is transitive).

Step 4: semi-automatic generalization

This step is to generalize the resulting statements from step 3 into higher-level concepts connected by the same set of relations. Human intervention is required in this step due to the complexity of the process. For example, if there exist A-r1-C, A-r1-D, B-r1-C, and B-r1-D, they can be generalized to concept1{A,B}-r1-concept2{C,D} by union. However, it becomes difficult when the above example is extended to include concept1{A,B}-r1-E and concept2{C,D}-r2-F. One cannot conclude concept1{A,B}-r1-concept2{C,D,E} unless an exception indicating no E-r2-F is added. Alternatively, it can be generalized to concept1{A,B}-r1-concept3{E,concept2{C,D}}. The system interacts with users by prompting them to resolve such dilemmas while the whole taxonomy is processed.

Figure 1 depicts the generalized view, or ontology, of the relation statements shown in the previous steps.

[Figure 1: graph whose nodes are Material, Item, Metal, Function, Steel, Aluminum and Process, connected by edges labeled made_of, kind_of and used_for.]

 {metals, wood, plastics ..} are instance_of Material
 {stud, joist ..} are instance_of Item
 {framing, ..} are instance_of Function
 {cold formed, structural ..} are instance_of Process

                    Figure 1. Ontology Example
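The three normalization operations of Step 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the system described here: the encoding of a statement as a (subject, relation, object) triple, the function name normalize and the PROPERTIES dictionary (a transcription of Table 1) are assumptions made for the example, and only literally identical statements are treated as redundant.

```python
from itertools import product

# Relation properties transcribed from Table 1 (True stands for "+").
# The dictionary layout is an assumption made for this sketch.
PROPERTIES = {
    "used_for":    {"transitive": False, "reflexive": False, "antisymmetric": False},
    "kind_of":     {"transitive": True,  "reflexive": True,  "antisymmetric": True},
    "instance_of": {"transitive": True,  "reflexive": True,  "antisymmetric": True},
    "made_of":     {"transitive": True,  "reflexive": False, "antisymmetric": False},
}


def normalize(statements):
    """Normalize (subject, relation, object) triples.

    Returns a tuple (unique, conflicts, implied):
      unique    - the statements with exact duplicates removed
      conflicts - pairs A-r-B / B-r-A where r is antisymmetric and A != B
      implied   - statements derivable through transitivity but not stated
    """
    # 1. Redundancy elimination: a set drops literally identical statements.
    unique = set(statements)

    # 2. Conflict detection for antisymmetric relations.
    conflicts = {
        frozenset([(a, r, b), (b, r, a)])
        for (a, r, b) in unique
        if a != b and PROPERTIES[r]["antisymmetric"] and (b, r, a) in unique
    }

    # 3. Implication detection: repeat until no new transitive statement appears.
    closure = set(unique)
    changed = True
    while changed:
        changed = False
        for (a, r1, b), (c, r2, d) in product(list(closure), repeat=2):
            if r1 == r2 and b == c and PROPERTIES[r1]["transitive"]:
                if (a, r1, d) not in closure:
                    closure.add((a, r1, d))
                    changed = True
    implied = closure - unique
    return unique, conflicts, implied
```

Calling normalize on the statements A-kind_of-B and B-kind_of-C reports A-kind_of-C as implied, whereas two chained used_for statements imply nothing because used_for is not transitive in Table 1. Detecting "equivalent" statements beyond exact duplicates would need domain knowledge that this sketch does not model.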



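The union-based generalization of Step 4 can be illustrated for its simplest case, where several subjects share a relation and exactly the same set of objects. The following Python sketch is an assumption-laden illustration rather than the system's procedure: the triple encoding and the function name generalize are introduced here, and the interactive resolution of dilemmas is not modeled.

```python
from collections import defaultdict


def generalize(statements):
    """Merge subjects that share a relation and an identical set of objects.

    `statements` is an iterable of (subject, relation, object) triples.
    Returns a set of (subjects, relation, objects) triples in which
    `subjects` and `objects` are frozensets standing for the
    generalized concepts.
    """
    # Collect, for every (subject, relation) pair, the objects it points to.
    objects_of = defaultdict(set)
    for subject, relation, obj in statements:
        objects_of[(subject, relation)].add(obj)

    # Subjects with the same relation and the same object set form one concept.
    subjects_of = defaultdict(set)
    for (subject, relation), objs in objects_of.items():
        subjects_of[(relation, frozenset(objs))].add(subject)

    return {
        (frozenset(subjects), relation, objs)
        for (relation, objs), subjects in subjects_of.items()
    }
```

Given A-r1-C, A-r1-D, B-r1-C and B-r1-D, the function returns the single generalized statement {A,B}-r1-{C,D}. In the extended example with E and F it would silently merge E into the object concept, which is exactly the case that requires confirmation from the user, so a practical implementation would have to flag such merges instead of applying them.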
4.    ONTOLOGY-BASED SEMANTIC EXTRACTION                              transitive property of the relation, kind_of. The match is
                                                                      extended to statement 05400, which includes “cold-formed”.
The task of the previous module, standard formalization, is           Finally “studs” is added to the match of statement 05410,
usually a one-time effort (though it is an iterative process)         through statement (05400_05410). Indeed the entity B2010
and it needs significant domain experts’ involvement.                 Exterior Wall in UniformatII has a semantic relationship with
This module is different in that it is used in every work-            05410 Load bearing metal studs in MasterFormat and the
flow/task and extracted semantics can be accumulated in               semantic can be described by the relation made_of.
repository and used for improving future semantic associa-            One characteristic worthy of mentioning is that the entity
tion performance. Also, it can be relatively automated by             B2010 Exterior Wall in the taxonomy provides a good con-
using general linguistic processing technologies.                     text for helping refining the association. For instance, the
Standards, such as UniformatII and MasterFormat, ad-                  above matching, even without the “framing” keyword, is still
dressed in this paper are functionally complementary to               possible since the inherited semantic of the hierarchy, shell,
each other in an application domain and they are costly               closure, and exterior walls, has very close meaning as framing.
cross-referenced by domain experts in workflows due to                As shown in the above example, the documents or specifi-
their complexity (vast many-to-many mappings). This                   cations that this research addresses have following charac-
module basically is to automat the process by mimicking a             teristics:
domain expert doing cross-referencing from the context of a               1.   Content has limited scope. It often details what,
standard-compliant project specification, a script represen-                   where, how, and when objects and activities being
tation indexed of the standard, which defines intentionality.                  involved in a domain application. It usually con-
For example, the following text is quoted from a PPD [7]                       tains rich semantics (author’s intention for com-
under entity B2010 in UniformatII taxonomy:                                    municating with other stakeholders) related to
B SHELL                                                                        standards (due to the agreement among stake-
     B20 EXTERIOR CLOSURE                                                      holders) that coordinate objects and activities in
                                                                               the domain.
     B2010 EXTERIOR WALLS
                                                                          2.   Content are categorized according to taxonomy. In
       1. Exterior Wall Framing: Cold-formed, light gage                       other words, text in a document has some assump-
          steel studs, C-shape, galvanized finish, 6" metal                    tion or context, which is inherited along the taxon-
          thickness as designed by manufacturer according                      omy hierarchy.
          to American Iron and Steel Institute (AISI) Specifi-
          cation for the Design of Cold Formed Steel Struc-               3.   Terminologies are relatively unified and unambi-
          tural Members, for L/240 deflection. Downside:                       guous.
          specifications often contain note-style sentences.              4.   Sentences are relatively free styled, such as note-
Supposedly, the PPD is written by an architect and a con-                      styled or template-styled due to writing convention
tractor wants to estimate cost for exterior walls. He might                    or standards.
comprehend that the wall framing will be made of cold-                These characteristics distinguish this research from others,
formed steel studs (semantic). Based on his expertise, he             such as [9] and [15] which extract shallow information from
identifies that its corresponding entity in MasterFormat is           general or web documents.
05410 Load bearing metal studs (association). The following           In addition to the intrinsic semantics of standards, this
example shows how the ontology/relation statements are                module also explores their application or context semantics
used to discover the semantics, in the context of entity              in order to achieve more effective semantic extraction. The
B2010, that link that entity to MasterFormat entity                   application semantics depend on the stakeholders’ view or
05410 (semantic association):                                         interests, such as the information they intend to find. For example,
                                                                      a cost estimator might look for MasterFormat items and
     B2010 Exterior Wall:                                             some numerical information so that they can link them to
     1. Exterior Wall Framing: Cold-formed, light gage                their MasterFormat-based cost databases. On the other
     steel studs, C-shape, galvanized finish, 6" metal                hand, an inspector might be interested in the same information
     thickness                                                        but from a different viewpoint, which yields different
                                                                      semantics. For example, to a cost estimator, “6" metal thickness”
                                                                      in the PPD indicates how much studs of that thickness
       05100_1               05400              05410                 cost, whereas to an inspector it indicates that the 6" thickness
                                                                      must comply with the associated code.
In the diagram, “steel” and “framing” match the statement
05100_1 (one of the identifiers of the relation statements            In summary, this module extracts semantics from the in-
exemplified in previous subsection) which is Metals (D5)              stances (specifications) of multiple standards based on three
used_for framing. The “steel” matches “Metals” through the            kinds of ontologies: the ontology of the source standard, the



ontology of the target standard, and the application ontology based on the stakeholders’ views. The extracted semantics serve as evidence for the semantic association of entities between the source and target standards.

5. MEASUREMENT OF AFFINITY

The ontology-based semantic extraction module can be implemented via a matching process between relation statements and text. The goal is to identify a set of matched relation statements of related entities with respect to their standards. For a given entity, its associated relation statements carry different weights depending on their positions in the taxonomy and the information content [16] of their keywords. The measurement of affinity quantifies these weights so that the degree of closeness between matched relation statements and their associated entity can be determined. Based on the measurement, a ranking scheme can be devised to identify optimal semantic associations among all matches. The ranking scheme can be modeled as a function of the following factors:

1.   Number of relation statements matched.
2.   Number of keywords matched.
3.   Quality of the matches. How to measure quality is an open question. Basically, the more specific the matches are, the higher the quality they represent. One effective way to model quality is by the positions of the matches in the taxonomy (a higher level means less specific and thus carries less weight) and by the information content of their keywords. The information content can be quantified by the keywords’ inverse document frequency (IDF) [17] combined with their counts in the taxonomy (appearing more times means less specific and thus carries less weight).

For instance, in the given example, several entities in MasterFormat contain “framing” and “Metals”, which are all candidates for semantic association. The entity 05410 is considered the optimal one because it matches more keywords along its taxonomy hierarchy and some of them, such as studs, are very specific with respect to both position and IDF.

6. ARCHITECTURE

The major thrust of the research is to develop an integration framework that facilitates exploitation of semantics from taxonomy-based standards and instantiations of the standards to achieve higher interoperability between domain participants and their information systems. To demonstrate the applicability of the proposed approach toward the goal, this section shows an overall architecture depicting one possible implementation and its relationship with other related technologies.

[Figure 2 components: Ontology DB (standards, instances, views, metadata, relations); Ontology Generation; Ontology Mapping, Reconciling, Merge, Data Mining; Change Management; Taxonomy Formalization; Semantic Extraction & Association; Ontology Editing and Presentation; Protocol Adapter; Component Plug-in: XML Interface & API; Internet; Data (standards, instances, PPD…); View (application ontology, standard IDs); Human Browsers (PC, mobile devices); Semantic Web Services Based Dynamic Workflow Systems]
Figure 2. Extensible Taxonomy-based Integration Framework (ETIF)

In the framework shown in Figure 2, relations and relation statements of various versions of standards written in natural languages are developed and uploaded via web-based tools to the system by stakeholders in the application domain. The taxonomy formalization and change management modules process them through parsing, normalization, generalization, linguistic processing (such as inflection, derivation, compounds, and synonyms), and indexing for incremental update in the ontology database.

For a particular application, the stakeholders upload instances of the source standard (e.g., PPDs), the target standard, and its application ontology. After processing the free text of PPD instances through linguistic techniques such as tokenization, chunk parsing, and grammatical function recognition [9], the system applies the semantic extraction and ranking algorithms, and returns/deposits the extracted metadata and semantic associations to the ontology database and also to the users or clients, if applicable, for feedback.

The integration of competing and complementary standards is a critical step for enhancing interoperability among heterogeneous systems using the standards. The proposed semantic association is only one aspect of this effort. It should be supplemented with other technologies such as ontology mapping, reconciling, and merging to provide a practical and complete solution. The framework includes a plug-in mechanism via XML-based interfaces and API for external software component integration.
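
As an illustration only, the extraction-and-ranking step of this framework, using the three ranking factors of Section 5, might be sketched as follows. The paper leaves the quality measure open, so the scoring formula (taxonomy depth times summed IDF), the `GENERALIZE` table, the depth values, the candidate paths, and every relation statement other than 05100_1 (Metals used_for framing) are hypothetical assumptions made for this sketch, not the authors' implementation.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationStatement:
    statement_id: str     # e.g. "05100_1"
    entity: str           # taxonomy entity that owns the statement
    depth: int            # assumed level in the taxonomy (larger = more specific)
    keywords: frozenset   # normalized keywords of the statement


# Hypothetical generalization table standing in for linguistic normalization.
GENERALIZE = {"steel": "metals"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and add generalized forms."""
    tokens = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return tokens | {GENERALIZE[t] for t in tokens if t in GENERALIZE}


def idf(keyword, statements):
    """Inverse document frequency, treating each relation statement as a document."""
    df = sum(1 for s in statements if keyword in s.keywords)
    return math.log(len(statements) / df) if df else 0.0


def affinity(path, tokens, statements):
    """Return (score, matched statements, matched keywords) for one candidate path."""
    score, n_statements, keywords = 0.0, 0, set()
    for s in statements:
        hits = s.keywords & tokens
        if s.entity in path and hits:
            n_statements += 1
            keywords |= hits
            # Assumed weighting: deeper statements and rarer keywords count more.
            score += s.depth * sum(idf(k, statements) for k in hits)
    return score, n_statements, len(keywords)


def rank(candidates, text, statements):
    """Sort candidate entities by descending affinity to the text."""
    tokens = tokenize(text)
    scored = {e: affinity(p, tokens, statements) for e, p in candidates.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical statements; only 05100_1 is taken from the paper's example.
STATEMENTS = [
    RelationStatement("05100_1", "05100", 1, frozenset({"metals", "framing"})),
    RelationStatement("05400_1", "05400", 2,
                      frozenset({"cold-formed", "metals", "framing"})),
    RelationStatement("05410_1", "05410", 3, frozenset({"studs", "metals"})),
    RelationStatement("05120_1", "05120", 2,
                      frozenset({"structural", "metals", "framing"})),
]
# Candidate target entities with their assumed taxonomy paths.
CANDIDATES = {"05410": ("05100", "05400", "05410"), "05120": ("05100", "05120")}
PPD_TEXT = ("Exterior Wall Framing: Cold-formed, light gage steel studs, "
            "C-shape, galvanized finish")

ranking = rank(CANDIDATES, PPD_TEXT, STATEMENTS)
print(ranking[0][0])  # entity with the highest affinity
```

With these assumed statements, entity 05410 ranks first because it matches more statements and keywords along its path, and "studs" is both deep in the taxonomy and rare. A real system would replace the toy tokenizer with the chunk parsing and grammatical function recognition mentioned above.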



The formalized standards, their instances, users’ application ontologies, and extracted metadata form a semantically rich ontology repository. Integrating the repository with other ontology techniques through the plug-in mechanism allows the effective construction of an application domain ontology. Web services enriched with the vision of the semantic web have emerged as a mainstream solution to system integration over the Internet. Following the same trend, the implementation of the proposed framework adopts the Web Ontology Language (OWL) [8] with the intention of integrating building construction workflow systems via semantic web services.

7. CONCLUSION AND FUTURE WORKS

This paper demonstrates the effective use of taxonomy for ontology development and the semantic association of ontologies for interoperability in a workflow system, with building construction as the target example. It illustrates a systematic approach to semantic association through taxonomy formalization and ontology-based semantic extraction. An overall system implementation in a web environment is also proposed. Current activities of the research project include the complete ontological formalization of the MasterFormat and UNIFORMAT II standards, refinement of the affinity measure for general taxonomies, and the integration of the algorithms with dynamic workflow systems through semantic web services.

8. ACKNOWLEDGMENTS

This work is partially supported by an NSF research grant ITR-0404113.

REFERENCES

[1] Construction Specifications Institute. MasterFormat 95™. Alexandria, VA: The Construction Specifications Institute, 1995 edition.
[2] Charette, R. P. and Marshall, H. E. UNIFORMAT II Elemental Classification for Building Specifications, Cost Estimating, and Cost Analysis. NISTIR 6389, Gaithersburg, MD: National Institute of Standards and Technology, October 1999.
[3] Gallaher, M. P., O'Connor, A. C., Dettbarn, J. L., Jr., and Gilday, L. T. Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry. NIST GCR 04-867, Gaithersburg, MD: National Institute of Standards and Technology, August 2004.
[4] Madhavan, J., Bernstein, P. A., and Rahm, E. Generic Schema Matching with Cupid. In Proceedings of the Twenty-Seventh International Conference on Very Large Databases (VLDB 2001), Roma, Italy.
[5] Noy, N. F. and Musen, M. A. The PROMPT suite: Interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies, 59(6):983-1024, 2003.
[6] Paolucci, M., Kawamura, T., Payne, T., and Sycara, K. Semantic matching of web services capabilities. In The First International Semantic Web Conference (ISWC), 2002.
[7] Rosen, H. J. Construction Specifications Writing: Principles and Procedures, 5th edition. Hoboken, NJ: J. Wiley, 2005.
[8] Dean, M. and Schreiber, G. (editors). OWL Web Ontology Language Reference. W3C Recommendation, http://www.w3.org/TR/2004/REC-owl-ref-20040210, 10 February 2004.
[9] Maedche, A., Neumann, G., and Staab, S. Bootstrapping an Ontology-Based Information Extraction System. In Intelligent Exploration of the Web, Springer, 2002.
[10] Gruber, T. R. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5:199-220, 1993.
[11] Rahm, E. and Bernstein, P. A. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, Vol. 10, pp. 334-350, 2001.
[12] Do, H., Melnik, S., and Rahm, E. Comparison of Schema Matching Evaluations. In Proceedings of the 2nd Int. Workshop on Web Databases (German Informatics Society), 2002.
[13] Aberer, K., Cudré-Mauroux, P., and Hauswirth, M. The Chatty Web: Emergent Semantics through Gossiping. In Proceedings of the 12th International World Wide Web Conference, pp. 197-206, 2003.
[14] Doan, A., Madhavan, J., Domingos, P., and Halevy, A. Learning to Map between Ontologies on the Semantic Web. The VLDB Journal, Vol. 12, pp. 303-319, 2003.
[15] Embley, D. W., Campbell, D. M., Smith, R. D., and Liddle, S. W. Ontology-based extraction and structuring of information from data-rich unstructured documents. In Proceedings of the Seventh International Conference on Information and Knowledge Management, pp. 52-59, November 2-7, 1998, Bethesda, Maryland, United States.
[16] Ross, S. A First Course in Probability. Macmillan Publishing, 1976.
[17] Church, K. W. and Gale, W. A. Inverse document frequency (IDF): A measure of deviations from Poisson. In Yarowsky, D. and Church, K., editors, Proceedings of the Third Workshop on Very Large Corpora, pp. 121-130. Association for Computational Linguistics, 1995.



