=Paper=
{{Paper
|id=Vol-210/paper-7
|storemode=property
|title=Semantic Interoperability in Multi-Disciplinary Domain. Applications in Petroleum Industry
|pdfUrl=https://ceur-ws.org/Vol-210/paper7.pdf
|volume=Vol-210
|dblpUrl=https://dblp.org/rec/conf/ecai/GullaST06
}}
==Semantic Interoperability in Multi-Disciplinary Domain. Applications in Petroleum Industry==
Jon Atle Gulla¹, Darijus Strasunskas¹ and Stein L. Tomassen¹

¹ Dept. of Computer & Information Science, Norwegian University of Science & Technology (NTNU), NO-7491 Trondheim, Norway; email: Jon.Atle.Gulla@idi.ntnu.no, Darijus.Strasunskas@idi.ntnu.no, Stein.L.Tomassen@idi.ntnu.no

Abstract. The petroleum industry is a technically challenging business with high investments, complex projects and complex operational structures. Numerous companies and public offices are involved in the exploitation of a new oil field, and there is a high degree of specialization among them. Even though standardization has been considered important in this industry for many years, there is still very little integration across phases and across disciplines. In 2004, an industrially driven consortium launched the Integrated Information Platform (IIP) project, which set out to develop semantic standards for the subsea petroleum industry based on OWL and Semantic Web technologies. In this paper, we present the IIP project in more detail and discuss applications for semantic information interoperability and retrieval.

1 INTRODUCTION

The petroleum industry in Norway is technically challenging, with subsea installations and difficult climatic conditions. Industrially, it is still quite fragmented, in the sense that there is little collaboration between phases and disciplines in large petroleum projects. Many specialized companies are involved, but their databases and applications are not necessarily well integrated with each other. Research by the Norwegian Oil Industry Association (OLF) shows that there is a need for more collaboration and integration across phases, disciplines and companies [1]. The existing standards do not provide the necessary support for this, and the result is costly and risky projects and decisions based on wrong or outdated data.

This paper presents the Integrated Information Platform (IIP) project [2] and its preliminary results. The project's goal is to extend and formalize an existing terminology standard for the petroleum industry, ISO 15926. Using Semantic Web technologies, we turn this standard into a real ontology that provides a consistent, unambiguous terminology for subsea petroleum production systems. However, creating and maintaining ontologies is both time-consuming and costly. To increase the return on investment (ROI), an ontology should therefore be applied to many different tasks. For this reason, the IIP project focuses on reusing the ontology in traditional vector-space information retrieval (IR) systems, in addition to rule-based notification. Given the multidisciplinary domain and the large variation in the terminology used, one of the challenges is adapting the created ontology to the document space. Finally, it is necessary to consider how ontologies will be used in those applications; that is, the application-specific value of the ontology is an important concern in IIP.

The paper is structured as follows. In Section 2 we go through the structures and challenges in the subsea petroleum industry, explaining the status of current standards and the vision of future integrated operations. Section 3 briefly introduces the IIP project, and Section 4 discusses the chosen approaches. Finally, conclusions are drawn in Section 5.

2 THE SUBSEA PETROLEUM INDUSTRY

The Norwegian subsea petroleum industry is a technically challenging business. Sophisticated equipment and highly competent companies are needed, and the projects tend to be both large and expensive. Many disciplines and competences need to come together in these projects, and their success is highly affected by the way people and systems are able to collaborate and coordinate their work. On the Norwegian Continental Shelf (NCS) there are traditional oil companies, specialized service companies and smaller ICT service companies. The multidisciplinarity of the industry results in various perspectives on the domain and in context-dependent usage of different terminologies. One of the challenges is to deal with contextual information and multi-perspective data integration in this multidisciplinary industry.

Both the projects and the subsequent production systems are information-intensive. When a well is put into operation, production has to be monitored closely to detect any deviations or problems. The next generation of subsea systems includes numerous sensors that measure the status of the systems and send real-time production data back to operation centers. For these centers to be effective, they need tools that allow them to understand this data, relate it to other relevant information, and deal with the situation at hand. The challenge lies not only in handling all this information, but also in interpreting information that is deeply rooted in various technical terminologies.

The multitude of companies involved, each with its own applications and databases, makes coordination and collaboration more important than in the past. For the industry as a whole, this fragmentation severely hampers the integration of applications and organizations, as well as decision-making processes in general:

• Integration. Even though there is some cooperation between companies in the petroleum sector, this cooperation tends to be set up on an ad-hoc basis for a particular purpose and supported by specifically designed mappings between applications and databases. There is little collaboration across disciplines and phases, as they usually have separate databases rooted in different goals, structures and terminologies. It is of course possible to map data from one database to another, but given the complexity of the data and the multitude of companies and applications in the business, this is not a viable approach for the industry as a whole.

• Decision making. A current problem is the lack of relevant, high-quality information in decision-making processes. Some data is available too late or not at all because databases are not integrated. In other cases, relevant data is not found due to differences in terminology or format. And even when information is available, it is often difficult to interpret its real content and understand its limitations and premises. This is the case, for example, when companies report production figures to the government using slightly different terminologies and structures, making it very hard to compare figures from one company to another.

XML is already used extensively in the petroleum industry as a syntactic format for exchanging data. Over the last few years, there
have been several initiatives for defining semantic standards to achieve semantic interoperability and information sharing in the business.

2.1 ISO 15926 Integration of Life-Cycle Data

ISO 15926 is a standard for integrating life-cycle data across phases (e.g. concept, design, construction, operation, decommissioning) and across disciplines (e.g. geology, reservoir, process, automation). It consists of seven parts, of which Parts 1, 2 and 4 are the most relevant to this work. Part 1 gives a general introduction to the principles and purpose of the standard, while Part 2 specifies the modeling language for defining application-specific terminologies. Part 2 comes in the form of a data model and includes 201 entities that are related in a specialization hierarchy of types and subtypes. It is intended to provide the basic types necessary for defining any kind of industrial data. Being specified in EXPRESS [3], it has a formal definition based on set theory and first-order logic.

Part 4 of ISO 15926 comprises application- or discipline-specific terminologies and is usually referred to as the Reference Data Library (RDL). These terminologies, described as RDL classes, are instances of the data types from Part 2 and are related to each other in a specialization hierarchy of classes and subclasses, as well as through memberships and relationships. If Part 2 defines the language for describing standardized terminologies, Part 4 describes the semantics of these terminologies. There is ongoing work in the Norwegian offshore industry to provide a comprehensive standardized terminology for the petroleum industry in Part 4. Part 4 today contains approximately 50,000 general concepts such as motors, turbines, pumps, pipes and valves.

ISO 15926 is still under development, and so far only Parts 1 and 2 have become ISO standards. In addition to adding more RDL classes for new applications and disciplines in Part 4, there is also discussion about standards for geometry and topology (Part 3), procedures for adding and maintaining reference data (Parts 5 and 6), and methods for integrating distributed systems (Part 7). Neither ISO 15926 nor other standards have the scope and formality to enable proper integration of data across phases and disciplines in the petroleum industry.

2.2 The Vision of Integrated Operations

The Norwegian Oil Industry Association proposed the Integrated Operations program in 2004. The fundamental idea is to integrate processes and people onshore and offshore using new information and communication technologies. Facilities that improve the onshore organization's ability to support offshore operations are considered vital in this program. Personnel onshore and offshore should have access to the same information in real time, and their work processes should be redefined to allow more collaboration and be less constrained by time and space. OLF has estimated that implementing integrated operations on the NCS can increase oil recovery by 3-4%, accelerate production by 5-10% and lower operational costs by 20-30% [1].

Central in this program is the semantic and uniform manipulation of heterogeneous data. Decisions often depend on real-time production data, visualization data, and background documents and policies, and the data range from highly structured database tables to unstructured textual documents. This necessitates intelligent facilities for capturing, tracking, retrieving and reasoning about data.

Figure 1 illustrates the objectives of the integrated operations initiative. Whereas the current situation has numerous databases that need to be mapped to each other on an ad hoc basis, we envision a future semantic standard that supports integration and interoperability between data from all phases and disciplines. Suppliers' applications interact with the operators' data through standardized semantic interfaces, ensuring that a unified terminology is used and that data is consistent and unambiguous. The implementation requirements for integrated operations include the introduction of proper standards for efficient sharing and exchange of information.

Figure 1. (a) Current situation; (b) The vision of integrated operations

3 THE INTEGRATED INFORMATION PLATFORM PROJECT

The Integrated Information Platform (IIP) project is a collaboration between companies active on the NCS and academic institutions, supported by the Norwegian Research Council (NFR). Its long-term target is to provide high-quality real-time information for decision making at onshore operation centers.

The IIP project addresses the need for a common understanding of terms and structures in the subsea petroleum industry. The objective is to ease the integration of data and processes across phases and disciplines by providing a comprehensive, unambiguous and well-accepted terminology standard that lends itself to machine-processable interpretation and reasoning. This should reduce risks and costs in petroleum projects and indirectly lead to faster, better and cheaper decisions.

The project is identifying an optimal set of real-time data from reservoirs, wells and subsea production facilities. The OWL web ontology language has been chosen as the markup language for describing these terms semantically in an ontology. The entire standard is thus rooted in the formal properties of OWL, which has a model-theoretic interpretation and to some extent supports formal reasoning. A major part of the project is to convert and formalize the terms already defined in ISO 15926 Part 2 (Data Model) and Part 4 (Reference Data Library), which we come back to in the next section. Since the ISO standard addresses rather generic concepts, though, the ontology must also include more specialized terminologies for the oil and gas segment. Detailed terminologies for standard products and services are included from other dictionaries and initiatives (DISKOS, WITSML, ISO 13628/14224, SAS), and the project also allows for the inclusion of terms from particular processes and products at the bottom level. In sum, the ontology being built in IIP has the structure shown in Figure 2.
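
The layered structure described above can be illustrated with a small sketch in Python. It assumes a much-simplified in-memory model: the layer labels, the Part 2 type name, the equipment classes and the company terms (CompanyA:SGV, CompanyB:gate-vlv) are hypothetical examples, not content of the actual IIP ontology or the ISO 15926 RDL, and the real ontology is expressed in OWL rather than Python. Company-specific terms specialize a more specialized terminology, which in turn specializes Part 4 (RDL) classes, and each RDL class records the Part 2 entity type it is an instance of. Walking up the specialization links shows how two companies' local terms can resolve to the same standard class.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One class in a toy layered ontology (illustrative, not the IIP model)."""
    name: str
    layer: str                                        # hypothetical labels: "part4", "specialized", "company"
    parents: list[str] = field(default_factory=list)  # specialization (subclass-of) links
    part2_type: str | None = None                     # Part 2 entity type this RDL class is an instance of


class LayeredOntology:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}

    def add(self, name: str, layer: str, parents: tuple[str, ...] = (),
            part2_type: str | None = None) -> None:
        # Requiring parents to exist before children keeps the hierarchy acyclic.
        missing = [p for p in parents if p not in self.concepts]
        if missing:
            raise KeyError(f"{name!r} has unknown parents: {missing}")
        self.concepts[name] = Concept(name, layer, list(parents), part2_type)

    def ancestors(self, name: str) -> list[str]:
        """All superclasses of `name`, nearest first (breadth-first walk)."""
        seen: list[str] = []
        queue = deque(self.concepts[name].parents)
        while queue:
            current = queue.popleft()
            if current not in seen:
                seen.append(current)
                queue.extend(self.concepts[current].parents)
        return seen

    def standard_class(self, name: str) -> str | None:
        """Nearest Part 4 (RDL) class that a term is, or specializes."""
        for candidate in [name, *self.ancestors(name)]:
            if self.concepts[candidate].layer == "part4":
                return candidate
        return None


# Hypothetical content: two companies use different local terms for related equipment.
# The Part 2 type name is used for illustration only.
onto = LayeredOntology()
onto.add("valve", "part4", part2_type="class_of_inanimate_physical_object")
onto.add("gate valve", "part4", ("valve",), part2_type="class_of_inanimate_physical_object")
onto.add("subsea gate valve", "specialized", ("gate valve",))
onto.add("CompanyA:SGV", "company", ("subsea gate valve",))
onto.add("CompanyB:gate-vlv", "company", ("gate valve",))

print(onto.ancestors("CompanyA:SGV"))            # ['subsea gate valve', 'gate valve', 'valve']
print(onto.standard_class("CompanyA:SGV"))       # gate valve
print(onto.standard_class("CompanyB:gate-vlv"))  # gate valve
```

Keeping specialization (parents) separate from classification (part2_type) mirrors the distinction between RDL classes and the Part 2 data types they instantiate. Classes that are themselves instances of other classes are a form of metamodelling, which may be part of why the project needs the expressive power of OWL Full for ISO 15926-2/4.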

Figure 2. The standardization approach in IIP

4 APPROACH AND DISCUSSION

The success of the new ontology, and of standardization work in general, depends on the users' willingness to commit to the standard and devote the necessary resources. If people do not find it worthwhile to make the effort to follow the new terminology, it will be difficult to build up the necessary support. It is therefore important to provide environments and tools that demonstrate the value of using the ontology. Intelligent ontology-driven applications must demonstrate the benefits of the new technology and convince the users that the additional sophistication pays off.

Recall the multidisciplinary setting of the petroleum industry. The multidisciplinarity results in different views on the domain, accompanied by vast terminology variation between disciplines and between types of companies, e.g. oil companies, specialized service companies and ICT service companies. Inconsistent usage of terminology causes problems in document exchange among the industrial partners (see the illustration in the left part of Figure 3). Furthermore, the variation in terminology may prevent successful commitment to the ontology and its adoption in daily work routines. Therefore, we propose an approach that bridges the gap between terminologies by constructing a feature vector for each of the concepts in the ontology (see the right part of Figure 3).

Figure 3. Multidisciplinary interpretation of domain and ontology

The approach is inspired by a linguistic method for describing the meaning of objects: the semiotic triangle, also known as the triangle of meaning or Ogden's triangle [4]. In our approach, a feature vector connects a concept and a document collection (Figure 4); that is, the feature vector is tailored to the terminology used in a particular collection of documents, which is company- or discipline-specific. The construction of feature vectors is further explained in Section 4.3 and in [5].

Figure 4. Explanation of a feature vector by adapted semiotic triangle

4.1 Semantic Web Technology and Interoperability

The general idea in the Semantic Web is to annotate each piece of data with machine-processable semantic descriptions. These descriptions must be specified according to a certain grammar and with reference to a standardized domain vocabulary. The domain vocabulary is referred to as an ontology and is meant to represent a common conceptualization of some domain. The grammar is a semantic markup language, for example the OWL web ontology language recommended by the W3C. With these semantic annotations in place, intelligent applications can retrieve and combine documents and services at a semantic level; they can share, understand and reason about each other's data; and they can operate more independently and adapt to a changing environment by consulting a shared ontology.

Interoperability can be defined as a state in which two application entities can accept and understand data from each other and perform a given task in a satisfactory manner without human intervention. We often distinguish between syntactic, structural and semantic interoperability [6, 7]:

• Syntactic interoperability denotes the ability of two or more systems to exchange and share information by marking up data in a similar fashion (e.g. using XML).

• Structural interoperability means that the systems share semantic schemas (data models) that enable them to exchange and structure information (e.g. using RDF).

• Semantic interoperability is the ability of systems to share and understand information at the level of formally defined and mutually accepted domain concepts, enabling machine-processable interpretation and reasoning.

For Semantic Web technology to enable semantic interoperability in the petroleum industry, it needs to tackle the problem of semantic conflicts, also called semantic heterogeneity. Since the databases are developed by different companies and for different phases and/or disciplines, it is often difficult to relate information that is found in different applications. Even if they represent the same type of information, they may use formats or structures that prevent computers from detecting the correspondence between the data.

4.2 Industrial Ontologies

In recent years a number of powerful new ontologies have been constructed and applied in selected domains. This is particularly true in medicine and biology, where Semantic Web technologies and web mining have been exploited in new intelligent applications [6, 8, 9]. However, these disciplines are heavily influenced by government support and are not as commercially fragmented as the petroleum industry. Creating an industry-wide standard in a
fragmented industry is a huge undertaking that should not be underestimated. In this particular case, we have been able to build on an existing standard, ISO 15926, which has ensured sufficient support from companies and public institutions. There is still an open question, though, of what the coverage of such an ontology should be. There are other, smaller standards, and many companies use their own internal terminologies for particular areas. The scope of this standard has been discussed throughout the project as the ontology grew and new companies signalled their interest. For any standard of this complexity, it is also important to decide where the ontology stops and to what extent hierarchical or complementary ontologies are to be encouraged. Techniques for handling ontology hierarchies, ontology alignment and ontology enrichment must be considered in a broader perspective.

4.3 Ontology-driven Information Retrieval

For an information retrieval tool developed in IIP, we are adding a mechanism that adapts the ontology to the words used in a particular discipline (i.e. by a particular company) [5]. Figure 5 illustrates the overall architecture of the ontology-based information retrieval system. The individual components of the system are briefly described below.

Figure 5. Architecture of ontology-driven IR system

Feature vector miner: This component associates concepts from the ontology with relevant terms from the document space. An ontology concept is a class defined in the ontology being used. These concepts are extended into feature vectors with a set of relevant terms extracted from the document collection using text-mining techniques. The feature vectors provide interpretations of concepts with respect to the document collection and need to be updated as the document collection changes. This allows us to relate the concepts defined in the ontology to the terms actually used in the document collection.

Indexing engine: The main task of this component is to index the document collection. The indexing system is built on top of Lucene, a freely available and fully featured text search engine from Apache. Lucene uses the traditional vector space approach, counting term frequencies and using tf.idf scores to calculate term weights in the index.

Query enrichment: This component handles the query specified by the user. The query can initially consist of concepts and/or ordinary terms (keywords). Each concept or term can be individually weighted. The concepts are replaced by their corresponding feature vectors.

Onto-based retrieval engine: This component performs the search and the post-processing of the retrieved results.

4.4 Rule-based Notification

Since the Semantic Web is still a rather immature technology, there are open issues that need to be addressed in the future. One problem in the IIP project is that we need the full expressive power of OWL (OWL Full) to represent the structures of ISO 15926-2/4. Reasoning with OWL Full specifications is undecidable [10], so any practical reasoner is necessarily incomplete. Here we investigate the limits of inference using the ontology implemented in OWL Full. This will allow us to identify possible scenarios and restrictions in using OWL Full for a project of this scale. This is important, since one of the application areas is the specification of rules that will be used to analyze anomalies in real-time data from subsea sensors. At that point we will need to exploit the logical properties of OWL and start experimenting with the next generation of rule-based notification systems.

4.5 Application-specific Ontology Value

The quality of ontologies is a delicate topic. It is important to choose an appropriate level of granularity. In this project we have been fortunate to have an existing standard to start with. However, what was considered satisfactory in ISO 15926 may not be optimal for the ontology-driven applications that will use the future ontology. Ultimately, we need to consider how the ontology will be used in these applications.

Figure 6. Ontology value quadrant [11]

The ontology value quadrant [11] in Figure 6 is used to evaluate an ontology's usefulness in a particular application. Two important features, addressed in many ontology/model quality frameworks (e.g. [12, 13, 14, 15]), are the ontology's ability to capture the content of the universe of discourse at the appropriate level of granularity and precision, and its ability to offer the application understandable and correct information. But the construction of the ontology also needs to take into account dynamic aspects of the domain as well as the behavior of the application. For ontology-driven information retrieval, this means that we need to consider the following issues about content and dynamics [11]:

Concept familiarity. Terminologies are used to subcategorize phenomena and make semantic distinctions about reality. Ideally, the concepts preferred by users in their queries correspond to the concepts found in the ontology.

Document discrimination. The structure of concepts in the ontology decides which groups of documents in the collection can theoretically be singled out and returned as result sets. Similarly, the concepts preferred by the user indicate which groups of documents they might be interested in and which distinctions between documents they consider irrelevant. If the granularity of the user's preferred concepts and that of the ontology concepts are perfectly compatible,
combinations of these terms can single out the same result sets from the document collection.

Query formulation. User queries are usually very short, typically 2-3 words, and more specialized or more general terms tend to be added to refine a query [16]. This economy of expression seems more important to users than being able to specify detailed and precise information needs, as very few users use advanced features to detail their queries.

Domain stability. The search domain may be constantly changing, and parts of the domain may be poorly described in documents compared to others. The ontology then needs regular and frequent maintenance, which makes it impractical to depend on domain experts always being available.

5 CONCLUSIONS

The Integrated Information Platform project is one of the first attempts at applying state-of-the-art Semantic Web technologies in an industrial setting. Existing standards are now being converted and extended into a comprehensive OWL ontology for reservoir and subsea production systems. The intention is that this ontology will later be approved as an ISO standard and form a basis for developing interoperable applications in the industry.

With the new ontology at hand, the industry will have taken the first step towards integrated operations on the Norwegian Continental Shelf. Data can then be related across phases and disciplines, helping people collaborate and reducing costs and risks. However, there are costs associated with building and maintaining such an ambitious ontology. It remains to be seen whether the industry is able to take full advantage of the additional expressive power and formality of the new ontology. The work in IIP indicates that both information retrieval systems and sensor monitoring systems can benefit from having access to an underlying ontology for analyzing data and interpreting user needs.

One of the main applications developed in IIP is an ontology-driven information retrieval system [5]. Here, the concepts in the ontology are associated with contextual definitions in terms of weighted feature vectors, tailoring the ontology to the content of the document collection. The feature vectors are then used to enrich a given query. Query enrichment by feature vectors provides a means to bridge the gap between query terms and the terminology used in a document set, while still employing the knowledge encoded in the ontology.

With the ontology in place, we can also build more complete semantic descriptions of documents and add more reasoning capabilities to our information retrieval tools. We will then see whether a strong semantic foundation makes it easier for us to handle and interpret the vast amounts of data that are so typical of the petroleum industry.

The main future work is the inclusion of rules for analyzing anomalies in the real-time data from the subsea sensors. We will then need to evaluate and investigate the logical properties of OWL and start experimenting with the next generation of rule-based notification systems.

Acknowledgements

This research work is funded by the Integrated Information Platform for reservoir and subsea production systems (IIP) project, which is supported by the Norwegian Research Council (NFR), NFR project number 163457/S30.

REFERENCES

[1] OLF. Integrated Work Processes: Future work processes on the Norwegian Continental Shelf. The Norwegian Oil Industry Association. URL: http://www.olf.no/?28867.pdf (Accessed: 2006 03 05).

[2] Sandsmark, N.; Mehta, S. Integrated Information Platform for Reservoir and Subsea Production Systems. In Proceedings of the 13th Product Data Technology Europe Symposium (PDT 2004), Stockholm, (2004).

[3] International Standards Association. Industrial automation systems and integration – Product data representation and exchange. Part 11: Description methods: The EXPRESS language reference manual. URL: http://www.iso.org/iso/en/CatalogueDetailPage.CatalogoueDetail?CSNUMBER=18348 (Accessed: 2006 03 05).

[4] Ogden, C.K.; Richards, I.A. The Meaning of Meaning. 8th Edition, New York, Harcourt, Brace & World, Inc., (1923).

[5] Tomassen, S.L.; Gulla, J.A.; Strasunskas, D. Document Space Adapted Ontology: Application in Query Enrichment. Proceedings of 11th International Conference on Applications of Natural Language to Information Systems (NLDB'2006), Klagenfurt, Austria, Springer-Verlag, LNCS 3999, 46-57, (2006).

[6] Aguilar, A. Semantic Interoperability in the Context of e-Health. CDH Seminar, (2005). URL: http://m3pe.org/seminar/aguilar.pdf (Accessed: 2006 03 05).

[7] Dublin Core. Dublin Core Metadata Glossary. URL: http://library.csun.edu/mwoodley/dublincoreglossary.html (2004).

[8] Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet. 25, 25-29, (2000).

[9] Pisanelli, D. M. (Ed.). Ontologies in Medicine. Volume 102 Studies in Health Technology and Informatics. IOS Press, (2004).

[10] Horrocks, I.; Patel-Schneider, P.; van Harmelen, F. From SHIQ and RDF to OWL: The making of a web ontology language. Journal of Web Semantics 1(1), 7–26, (2003).

[11] Gulla, J. A.; Borch, H.O.; Ingvaldsen, J.E. Unsupervised Keyphrase Extraction for Search Ontologies. Proceedings of 11th International Conference on Applications of Natural Language to Information Systems (NLDB'2006), Klagenfurt, Austria, Springer-Verlag, LNCS 3999, (2006).

[12] Burton-Jones, A.; Storey, V.C.; Sugumaran, V.; Ahluwalia, P. A Semiotic Metrics Suite for Assessing the Quality of Ontologies. Data and Knowledge Engineering, 55(1), 84-102, (2005).

[13] Gangemi, A.; Catenacci, C.; Ciaramita, M.; Lehmann, J. Ontology evaluation and validation. An integrated formal model for the quality diagnostic task. Technical Report, ISTC-CNR, Trento, Italy, (2005). URL: http://www.loa-cnr.it/Files/OntoEval4OntoDev_Final.pdf (Accessed: 2006 04 02).

[14] Lindland, O. I.; Sindre, G.; Solvberg, A. Understanding Quality in Conceptual Modeling. IEEE Software, 11(2), 42-49, (1994).

[15] Lozano-Tello, A.; Gomez-Perez, A. Ontometric: A method to choose appropriate ontology. Journal of Database Management, 15(2), 1-18, (2004).

[16] Gulla, J.A.; Auran, P.G.; Risvik, K.M. Linguistic Techniques in Large-Scale Search Engines. Fast Search & Transfer, (2002), 15 p.