<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Linked Data in Architecture and Construction Workshop, May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A network-based framework for dynamic linkage of unstructured data to BIM: supporting predictive analysis in work order management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soroush Sobhkhiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tamer El-Diraby</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Information Systems in Infrastructure and Construction, Civil Engineering Department, University of Toronto</institution>
          ,
          <addr-line>35 St George St, Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>29</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Linking BIM to other data models is essential to establishing digital twins. Recently, ontologies have been used to establish the link between IFC and other data models. However, most of the data in digital twins are unstructured, for example, specifications, reports, and other communications. These types of data change dynamically and incorporate a wide range of concepts with complex relationships, so it is difficult to develop and maintain an ontological representation for them. This research explores the use of concept networks as a means to link BIM to unstructured data. Using topic modeling tools, a set of key terms is extracted from documents. The relationships between the key terms are investigated and a network of these key terms is established. The approach is illustrated through a use-case application in the work order management domain.</p>
      </abstract>
      <kwd-group>
        <kwd>BIM</kwd>
        <kwd>Text</kwd>
        <kwd>Unstructured Data</kwd>
        <kwd>Network Analytics</kwd>
        <kwd>Predictive Analysis</kwd>
        <kwd>Work Order Management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Unstructured data such as text and chats contain valuable knowledge. Processing and
managing such data within a BIM environment is essential to developing digital twins. In this
context, a digital twin is not a simple digitization or a 3D model of a facility. It is a complete virtualization
of facility data, work processes, and stakeholder profiles that aims to create a digital replica, enabling facility
managers to study work and operations scenarios in the virtual world before implementing them in the
real world [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Work orders, in particular, contain valuable information regarding overall
building performance and occupant satisfaction. Studying patterns in work orders can enable the
development of business intelligence tools, particularly predictive analysis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We can study and model
expected deterioration, the timing of critical levels of performance, the expected work durations, and
the expected maintenance costs.
      </p>
      <p>However, work orders are currently generated and managed inconsistently, and it is difficult to
derive the valuable knowledge hidden within them. More importantly, a comprehensive analysis
that takes both work order data and other data sources into account is practically infeasible, mainly
because there is practically no link between work order data and structured data sources such as BIM.</p>
      <p>
        We present here an approach to link BIM to work order data. Work order contents tend to be
unstructured data describing the performance situation and the needed action [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. They also contain
structured data, particularly IFC-based data such as location and facility specifications. As a result,
automating the analysis and processing of work orders requires finding pathways of linking structured
and unstructured data domains.
      </p>
      <p>
        Typical linkage approaches in the BIM domain have focused on the use of ontologies. Such an approach
provides stable and reliable correspondence between ifcOWL, or other building ontologies, and
established (and common) data models of other related domains, such as energy analysis and performance
management [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. While this top-down, structured approach provides a reliable and scalable means
of data linkage, it might not be flexible enough to capture specific contexts of work, especially when
dealing with concepts that change dynamically over time. Although there have been some efforts to
capture temporal changes, the overhead needed to update ontologies makes it hard to capture the
evolution of knowledge in the domain [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>We need to establish a means of formalizing the conceptualization in a text source in a manner that
enables clear linkages to IFC and, at the same time, is model-independent, so as to enable dynamic and
contextualized capture of the knowledge contained in an unstructured data corpus. By formalizing both the
structured (IFC-based) and unstructured data of a work order, we can automate many of the
processes in managing work orders. Comparing work order data can help us discover clusters or
similarities between work orders, which can be used to predict future work orders. For instance, if we
have historical records of a specific object, say an air handler, then we can perform predictive analysis
to estimate the likelihood of a new work order being submitted in the near future. We can perform this
analysis in a way that provides planning insights for managers by informing them of the overall
possible impacts (such as expected downtime or cost) of the work order.</p>
      <p>In this paper, we investigate an alternative, bottom-up solution to the problem of linking work
order text to IFC concepts. Specifically, the objective of this paper is to explore the potential of linking
IFC classes to network concepts obtained from textual data such as work orders. The motivation behind
this approach is two-fold. First, modeling a text as a network differs from pushing it into
predefined concept models (e.g., ontologies) in that the concepts are data-driven. Second, a network model
is flexible enough to adapt dynamically to changes in the data and can therefore capture its evolution.</p>
      <p>It should be noted that this paper is simply an exploration of a possible alternative solution, and
that the proposed methodology has not yet matured for practical applications. Nevertheless, we
provide a simple prototype use case to showcase how the approach can potentially be used in practice.
Future research will provide more details and implementations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of the proposed approach</title>
      <p>
        Conventional linked-data solutions are based on static representations and do not support capturing
the evolution of the knowledge (dynamic changes in concepts and relationships) [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ]. We argue that
the solution should not be a top-down expert-based method where we define the standard/model for
data correspondence. Rather, it should be a data-driven bottom-up method where the patterns are
dynamically obtained and used for identifying relationships.
      </p>
      <p>
        As for the case of work order data, we need an unsupervised method for extracting concepts and
relationships from textual data. Concept networks can potentially provide such a method. Here,
a network refers to a directed graph: an interlinked web of concepts that represents conceptual
connectivity. With network analysis, we can extract the key concepts and relationships in a set of data
without imposing any restrictions (i.e., without having to define the concepts or relationships in advance). Further, the
analysis can be dynamic, and the concepts and relationships can be updated over time as more data
become available [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The proposed approach is explained in more detail in the following section. As an example, take the
following set of work order components:
• “Door hardware issue”
• “Replace Door handle”
• “Need a locksmith in room GB329”
• “Rekey | Room 215”
• “Need fob access for contractor”
• “Cylinder issue in room 132, too much noise”
      </p>
      <p>The examples above are extracted from the work order database of the University of Toronto
Facilities. Conceptually, all of them relate to the object/concept ‘Door’. In order to make the link
between these work orders and BIM objects, we propose the following steps:
1. Establish standardized links for obvious cases: develop correspondence lists for clearly linked
terms and IFC concepts. For instance, the word ‘door’ always refers to the concept ‘IfcDoor’.
2. Transform the text corpus into concept networks.
3. Use data analytics to discover the cluster of other concepts that should also be linked to IFC
concepts. For instance, we do not manually link concepts such as ‘lock’ or ‘cylinder’ to IfcDoor. Instead,
we develop a concept network, in which we find that these concepts are consistently associated with
the key term ‘door’. With this, we use a bottom-up, no-model approach to deduce that a linkage
should be made between IfcDoor and ‘lock’. In other words, we use text mining to enrich the
simple, obvious, static relationship “door-corresponds-to-IfcDoor”. The anchor node ‘door’
can be used to find additional semantically relevant key terms to be linked to the IFC concept.</p>
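      <p>As a rough illustration of step 1 only, the sketch below applies a static correspondence list to the six sample work orders above. The dictionary <monospace>ANCHOR_LINKS</monospace> and the function <monospace>direct_links</monospace> are hypothetical names introduced here for illustration; they are not part of any BIM library or of the prototype described in this paper.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical correspondence list (step 1): obvious term -> IFC class.
ANCHOR_LINKS = {"door": "IfcDoor"}

# The six sample work orders quoted in the text.
WORK_ORDERS = [
    "Door hardware issue",
    "Replace Door handle",
    "Need a locksmith in room GB329",
    "Rekey | Room 215",
    "Need fob access for contractor",
    "Cylinder issue in room 132, too much noise",
]


def direct_links(text, anchors=ANCHOR_LINKS):
    """Return the IFC classes whose anchor term appears literally in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted({ifc for term, ifc in anchors.items() if term in tokens})


linked = {wo: direct_links(wo) for wo in WORK_ORDERS}
unlinked = [wo for wo, classes in linked.items() if not classes]

for wo, classes in linked.items():
    print(f"{wo!r}: {classes or 'no direct link'}")
```
]]></preformat>
      <p>Only the two work orders that literally mention ‘door’ are linked by the static list; the remaining four are left unlinked, which is the gap that steps 2 and 3 are meant to close.</p>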
      <p>The richness, diversity, and evolution of the key terms linked to an IFC concept open the door for
unsupervised learning, where we can discover clusters of work orders. This is the foundation for pattern
discovery and predictive analysis.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Association Rule Mining and Network analysis</title>
      <p>The process begins by pre-processing the text data so that it is clean enough for analysis. For instance,
punctuation and stop-words are removed and words are lowercased.</p>
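      <p>A minimal sketch of this pre-processing step is given below, using only the Python standard library. The stop-word list is a small illustrative assumption, not the list used in this study, and the function name <monospace>preprocess</monospace> is hypothetical.</p>
      <preformat><![CDATA[
```python
import string

# Small illustrative stop-word list (an assumption; a real pipeline
# would likely use a fuller, domain-tuned list).
STOP_WORDS = {"a", "an", "the", "in", "for", "of", "to", "too"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, replace punctuation with spaces, and drop stop-words."""
    table = str.maketrans({ch: " " for ch in string.punctuation})
    tokens = text.lower().translate(table).split()
    return [tok for tok in tokens if tok not in stop_words]


print(preprocess("Cylinder issue in room 132, too much noise"))
print(preprocess("Rekey | Room 215"))
```
]]></preformat>
      <p>Each work order becomes a list of lowercase tokens, which can then be treated as one transaction in the association rule mining step.</p>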
      <p>
        Next, we perform association rule mining to identify relationships between the concepts in the work
order data. The lift value of an association rule measures how much the presence of one word increases the
probability of the appearance of the other, see Equation (1) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This relationship can therefore provide
meaningful links between concepts, which we can use in our study. Here, if there is a strong relationship
between the word ‘door’ and any other concept, it will be discovered. We can use these links to
develop a graph network in which concepts are linked to each other based on their lift: the larger the lift
value, the stronger the relationship.
      </p>
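      <p>To make the lift computation concrete, the following sketch computes lift for every pair of co-occurring terms in a set of pre-processed work orders. It uses the standard definition of lift (the support of the pair divided by the product of the individual supports). The transactions are made-up toy data, not the records analysed in this paper, and <monospace>lift_table</monospace> is a hypothetical helper name.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from itertools import combinations


def lift_table(transactions):
    """Compute lift for every unordered pair of terms that co-occur.

    lift(A => B) = supp(A, B) / (supp(A) * supp(B)), where supp is the
    fraction of transactions containing the term(s). Lift is symmetric,
    so each pair is stored once under an alphabetically sorted key.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for tokens in transactions:
        items = sorted(set(tokens))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    # (n_ab / n) / ((n_a / n) * (n_b / n)) simplifies to n_ab * n / (n_a * n_b).
    return {
        (a, b): n_ab * n / (item_counts[a] * item_counts[b])
        for (a, b), n_ab in pair_counts.items()
    }


# Toy, made-up transactions (already pre-processed); not the paper's data.
toy = [
    ["door", "handle"],
    ["door", "cylinder"],
    ["door", "handle", "cylinder"],
    ["light", "bulb"],
    ["light", "flicker"],
    ["leak", "pipe"],
]

lifts = lift_table(toy)
print(lifts[("door", "handle")])
print(lifts[("cylinder", "handle")])
```
]]></preformat>
      <p>A lift above 1 indicates that two terms appear together more often than would be expected if they were independent; pairs that never co-occur are simply absent from the table.</p>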
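      <p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)} = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)\,\mathrm{supp}(B)}</tex-math>
        </disp-formula>
      </p>
      <p>• {Crossbar} =&gt; {Cylinder}: Lift value = 78.1
• {Crossbar} =&gt; {Handle}: Lift value = 71</p>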
      <p>As seen, the lift values between these words are very high, showing a very strong relationship. In
other words, these words are often used together in work orders. Now consider the following rules:
• {Crossbar} =&gt; {Door}: Lift value = 13
• {Handle} =&gt; {Door}: Lift value = 12.3
• {Cylinder} =&gt; {Door}: Lift value = 11.98</p>
      <p>As can be seen, these words also have a very strong connection to the word ‘door’, although not
as strong as their relationship with each other. Figure 2 shows an example of network extraction from
a set of work order data obtained from an institutional building located in Toronto. The sample included
over 1000 records collected over a period of 8 months. As can be seen, the concept ‘door’ (circled in
the figure) is connected to several other concepts in direct and indirect ways. For instance, there is a
connection to the concepts ‘cylinder’, ‘fob’, ‘hardware’, ‘handle’, ‘access’ and so on. We can use these
relationships to establish the links discussed in the previous section. For instance, if we establish a static
link between the concept ‘IfcDoor’ and the node ‘door’ in the network, then an indirect, weaker
link can be deduced between ‘IfcDoor’ and ‘lock’, because ‘lock’ is directly linked to ‘door’. The
stronger the relationship between ‘door’ and ‘lock’, the more probable the link between ‘IfcDoor’ and
‘lock’. In the following section, we show this through an example.</p>
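      <p>Before turning to that example, the link deduction described above can be sketched in a few lines of code. The sketch builds a weighted network from the lift values quoted in this section and ranks the direct neighbours of the anchor node ‘door’ as candidate terms for IfcDoor. Treating the network as undirected (because lift is symmetric), the <monospace>min_lift</monospace> threshold, and the helper names are simplifying assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Lift values quoted in the text: (antecedent, consequent, lift).
RULES = [
    ("crossbar", "cylinder", 78.1),
    ("crossbar", "handle", 71.0),
    ("crossbar", "door", 13.0),
    ("handle", "door", 12.3),
    ("cylinder", "door", 11.98),
]


def build_network(rules, min_lift=1.0):
    """Build a weighted adjacency map from rules whose lift exceeds min_lift."""
    graph = defaultdict(dict)
    for a, b, lift in rules:
        if lift > min_lift:
            graph[a][b] = lift
            graph[b][a] = lift  # lift is symmetric, so store both directions
    return graph


def candidate_links(graph, anchor, ifc_class):
    """Rank the anchor's direct neighbours as candidate terms for ifc_class."""
    neighbours = sorted(graph[anchor].items(), key=lambda kv: kv[1], reverse=True)
    return [(term, ifc_class, weight) for term, weight in neighbours]


network = build_network(RULES)
links = candidate_links(network, "door", "IfcDoor")
for term, ifc, weight in links:
    print(f"{term} -> {ifc} (lift {weight})")
```
]]></preformat>
      <p>Because the candidate list is recomputed from the network, re-running the sketch on newer work orders would update the terms associated with IfcDoor without editing the static correspondence list.</p>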
    </sec>
    <sec id="sec-4">
      <title>4. Example</title>
      <p>To better explain the process, we perform the proposed steps with a focus on the sample of work
orders provided above. The IFC schema (the BIM data schema) represents the relationships between
‘IfcDoor’ and other concepts as shown in Figure 3. Following step 1, assume that we can link
‘IfcDoor’ to the concept ‘door’ in the work order network. This is the obvious, static relationship. Step
2 has already been performed, with the results shown in Figure 2. We now perform step 3 and link the IFC data
schema to the work order network. The result is shown in Figure 4.</p>
      <p>Please note that this is a high-level prototype example simply to provide a generic understanding of
the idea of a network-based solution. Detailed implementations will be provided in future research.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusion</title>
      <p>The proposed approach combines the advantages of structured, expert-based analysis with
bottom-up, data-driven analysis. A set of anchor key terms is linked to the corresponding IFC concepts at the start.
The list of these linkages will tend to be stable and static. In parallel, text mining is used to find
other key terms that are typically connected to the anchor node. Indirectly, these additional key terms
are linked to the IFC concept. As the concept network evolves, the extended word cloud around an IFC
concept morphs.</p>
      <p>This process is better able to capture changes in the relationships. If a new relationship
emerges, the process can capture it because it is not limited to expert intuition and relies on the
actual patterns in the data. We can leverage the weights of the relationships to analyze how strong
they are. Therefore, the proposed approach does not produce firm connections; instead, each
connection comes with a degree of certainty. For instance, a work order with the concepts ‘cylinder’ and
‘hardware’ has a strong connection to the concept ‘door’ and therefore to the concept ‘IfcDoor’. But the
connection of the same work order to the concept ‘HVAC’ is weak. So, there is a higher probability that
the work order relates to the door object in a BIM model.</p>
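      <p>One simple way to express this graded certainty is to score a work order against each anchor concept by summing the weights of its terms and normalising across anchors. The sketch below is only an illustration of that idea: apart from the cylinder-to-door value quoted earlier, the weights are invented placeholders, the names <monospace>WEIGHTS</monospace> and <monospace>link_scores</monospace> are hypothetical, and the resulting scores are relative shares rather than calibrated probabilities.</p>
      <preformat><![CDATA[
```python
# Term-to-anchor weights (e.g. lift values). Only cylinder -> door (11.98)
# comes from the text; every other number is an invented placeholder.
WEIGHTS = {
    "door": {"cylinder": 11.98, "hardware": 9.0, "handle": 12.3},
    "hvac": {"hardware": 1.2, "filter": 15.0},
}


def link_scores(tokens, weights):
    """Sum each anchor's edge weights over the work order's terms, then
    normalise so that the scores across anchors add up to 1."""
    raw = {
        anchor: sum(edges.get(tok, 0.0) for tok in set(tokens))
        for anchor, edges in weights.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {anchor: 0.0 for anchor in raw}
    return {anchor: value / total for anchor, value in raw.items()}


scores = link_scores(["cylinder", "hardware", "issue"], WEIGHTS)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```
]]></preformat>
      <p>With these placeholder weights, the work order is attributed mostly to ‘door’ and only marginally to ‘HVAC’, mirroring the qualitative argument above.</p>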
      <p>The fundamental contribution of this approach is not the linking of a set of key terms to an IFC concept. It is
in doing so dynamically, and in an unsupervised manner that learns from data. The enrichment of an IFC
concept with a dynamically generated cloud of key terms enables the implementation of machine
learning on work order data that are both structured (extracted from BIM) and unstructured (extracted
from free text).</p>
    </sec>
    <sec id="sec-6">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><surname>El-Diraby</surname>, <given-names>Tamer</given-names></string-name>, and
          <string-name><given-names>Soroush</given-names> <surname>Sobhkhiz</surname></string-name>.
          "<article-title>The Building as a Platform: Predictive Digital Twinning</article-title>." In
          <source>Buildings and Semantics: Data Models and Web Technologies for the Built Environment</source>,
          edited by Pieter Pauwels.
          <publisher-name>CRC Press</publisher-name>,
          <year>2022</year>. To appear.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Lavy</surname>, <given-names>Sarel</given-names></string-name>,
          <string-name><given-names>Nishaant</given-names> <surname>Saxena</surname></string-name>, and
          <string-name><given-names>Manish</given-names> <surname>Dixit</surname></string-name>.
          "<article-title>Effects of BIM and COBie database facility management on work order processing times: Case study</article-title>."
          <source>Journal of Performance of Constructed Facilities</source>
          <volume>33</volume>, no.
          <issue>6</issue>
          (<year>2019</year>):
          <fpage>04019069</fpage>. doi:
          <pub-id pub-id-type="doi">10.1061/(ASCE)CF.1943-5509.0001333</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Dutta</surname>, <given-names>Saptak</given-names></string-name>,
          <string-name><given-names>H. Burak</given-names> <surname>Gunay</surname></string-name>, and
          <string-name><given-names>Scott</given-names> <surname>Bucking</surname></string-name>.
          "<article-title>A method for extracting performance metrics using work-order data</article-title>."
          <source>Science and Technology for the Built Environment</source>
          <volume>26</volume>, no.
          <issue>3</issue>
          (<year>2020</year>):
          <fpage>414</fpage>-<lpage>425</lpage>. doi:
          <pub-id pub-id-type="doi">10.1080/23744731.2019.1693208</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Lee</surname>, <given-names>Do-Yeop</given-names></string-name>,
          <string-name><given-names>Hung-lin</given-names> <surname>Chi</surname></string-name>,
          <string-name><given-names>Jun</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Xiangyu</given-names> <surname>Wang</surname></string-name>, and
          <string-name><given-names>Chan-Sik</given-names> <surname>Park</surname></string-name>.
          "<article-title>A linked data system framework for sharing construction defect information using ontologies and BIM environments</article-title>."
          <source>Automation in Construction</source>
          <volume>68</volume>
          (<year>2016</year>):
          <fpage>102</fpage>-<lpage>113</lpage>. doi:
          <pub-id pub-id-type="doi">10.1016/j.autcon.2016.05.003</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Turk</surname>, <given-names>Žiga</given-names></string-name>.
          "<article-title>Interoperability in construction-Mission impossible?</article-title>"
          <source>Developments in the Built Environment</source>
          <volume>4</volume>
          (<year>2020</year>):
          <fpage>100018</fpage>. doi:
          <pub-id pub-id-type="doi">10.1016/j.dibe.2020.100018</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Bizer</surname>, <given-names>Christian</given-names></string-name>,
          <string-name><given-names>Tom</given-names> <surname>Heath</surname></string-name>, and
          <string-name><given-names>Tim</given-names> <surname>Berners-Lee</surname></string-name>.
          "<article-title>Linked data: The story so far</article-title>." In
          <source>Semantic services, interoperability and web applications: emerging concepts</source>, pp.
          <fpage>205</fpage>-<lpage>227</lpage>.
          <publisher-name>IGI Global</publisher-name>,
          <year>2011</year>. doi:
          <pub-id pub-id-type="doi">10.4018/978-1-60960-593-3.ch008</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
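          [7]
          <string-name><surname>Sobhkhiz</surname>, <given-names>Soroush</given-names></string-name>,
          <string-name><given-names>Hossein</given-names> <surname>Taghaddos</surname></string-name>,
          <string-name><given-names>Mojtaba</given-names> <surname>Rezvani</surname></string-name>, and
          <string-name><given-names>Amir Mohammad</given-names> <surname>Ramezanianpour</surname></string-name>.
          "<article-title>Utilization of semantic web technologies to improve BIM-LCA applications</article-title>."
          <source>Automation in Construction</source>
          <volume>130</volume>
          (<year>2021</year>):
          <fpage>103842</fpage>. doi:
          <pub-id pub-id-type="doi">10.1016/j.autcon.2021.103842</pub-id>.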
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><surname>Aragao</surname>, <given-names>Rodrigo</given-names></string-name>, and
          <string-name><given-names>Tamer E.</given-names> <surname>El-Diraby</surname></string-name>.
          "<article-title>Network analytics and social BIM for managing project unstructured data</article-title>."
          <source>Automation in Construction</source>
          <volume>122</volume>
          (<year>2021</year>):
          <fpage>103512</fpage>. doi:
          <pub-id pub-id-type="doi">10.1016/j.autcon.2020.103512</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><surname>McNicholas</surname>, <given-names>Paul David</given-names></string-name>,
          <string-name><given-names>Thomas Brendan</given-names> <surname>Murphy</surname></string-name>, and
          <string-name><given-names>M.</given-names> <surname>O'Regan</surname></string-name>.
          "<article-title>Standardising the lift of an association rule</article-title>."
          <source>Computational Statistics &amp; Data Analysis</source>
          <volume>52</volume>, no.
          <issue>10</issue>
          (<year>2008</year>):
          <fpage>4712</fpage>-<lpage>4721</lpage>. doi:
          <pub-id pub-id-type="doi">10.1016/j.csda.2008.03.013</pub-id>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>