Towards Integration and Coverage Assessment of Ontologies for Knowledge Reuse in the Aviation Sector

Jos Lehmann (jos.lehmann@bauhaus-luftfahrt.net)
Michael Shamiyeh (michael.shamiyeh@bauhaus-luftfahrt.net)
Sven Ziemer (sven.ziemer@bauhaus-luftfahrt.net)
Bauhaus Luftfahrt e.V., Willy-Messerschmitt-Straße 1, 82024 Taufkirchen, Germany

© 2017 Copyright held by the author/owners.
SEMANTiCS 2017 workshops proceedings: LIDARI, September 11-14, 2017, Amsterdam, Netherlands

ABSTRACT
Ontology-based applications can support the reuse of corporate-knowledge in engineering environments. In order to streamline the harnessing of semantics, problems of integration and of coverage assessment of ontologies need to be addressed. This paper provides examples of such problems when applying Semantic Technology in the aviation sector and outlines a strategy towards their solution. The paper also provides a preliminary discussion of how knowledge management architectures such as the one presented may be positioned in the wider research area of Industrie 4.0.

Keywords
Knowledge Integration and Language Technologies; Terminology, Thesaurus and Ontology Management; Industry and Engineering; Web Ontology Language; Aviation; Industrie 4.0; Linked Data

1. INTRODUCTION
In the context of the German high-tech strategic program Industrie 4.0, which promotes research on the ongoing fourth industrial revolution brought about by the digitization of products, processes and organizations, we are investigating the application of Semantic Technology to the reuse of corporate-knowledge in the aviation sector.

This research focuses on how semantics could be harnessed by the information systems employed during the conceptual design of aeronautical components. The key idea, akin to examples of Knowledge-based Engineering such as [9] or [11], is that engineering-projects in their early stages would benefit from semantic search, as this would expand access to existing corporate-knowledge (e.g. legacy-data from previous projects) as well as make search-results more relevant. The types of corporate-knowledge being considered include two main categories of information sources: textual data sources and non-textual data sources.

The core Semantic Technology applied to increase access and relevance is ontology. This makes it possible, on the one hand, to relate non-matching information that conceptually belongs together and, on the other hand, to filter matching information that is conceptually unrelated.

Section 2 of this paper describes an ontology-based software architecture for the support of knowledge reuse in the aviation sector. Section 3 provides examples of the problems raised by the harnessing of the needed semantics. Section 4 outlines a strategy towards the solution of such problems. Section 5 provides a preliminary discussion of how knowledge management architectures such as the one presented should be positioned with respect to the wider research area of Industrie 4.0, in particular with respect to subareas that involve Linked Data and Robotics applications.

2. ARCHITECTURE
Figure 1 shows the software architecture being researched and developed to support knowledge reuse. The architecture attempts to combine two pre-existing architectures and legacy-data (all in gray areas). The pre-existing architectures are based on divergent ontological commitments, especially different representational choices (regarding, for instance, what to model as a class and what as an individual, how many different properties to use in the ontology, whether to represent constraints, etc.). Such pre-existing representational commitments are conveyed by the two main ontological modules being only partially in a gray area of Figure 1. Moreover, the textual and non-textual data are pre-existing as well. All white parts of Figure 1, including the proper ontological content of the ontological modules, the criteria of mutual relevance of textual and non-textual data (conveyed by the larger data-cylinder) as well as the import and query procedures, are being researched and developed as part of the project discussed in this paper.

Figure 1: Architecture supporting Knowledge Reuse

On the left of Figure 1, two datasets of textual and non-textual data are selected for integration. The present case study works on a dataset of textual data that contains a varied corpus of documents ranging from design-descriptions, to performance-reports, to internal standards, to inspection-reports, to lessons-learned. The dataset of non-textual data is intended to contain a selection of component models ranging from Computer-Aided Design (CAD) models, to calculation (MatLab) models, to simulation (Simulink) models, to models for in-house tools used in preliminary design. It is assumed that the textual and the non-textual datasets contain information sources that are interrelated or relevant to one another. For instance, the textual data may contain indirect references to the non-textual data, in the form of figures of components generated from the non-textual data. It may also contain knowledge that helps recover the original intended meaning of legacy non-textual data that is poorly documented.

The right side of Figure 1 shows the intended end-users. The user at the top types text into a text document through a text editor, and in the process receives references to relevant textual as well as non-textual data. The user at the bottom works on tagging and/or on modifying a model in a model-management tool (possibly embedding design software, from which the model is accessed), and in the process receives references to relevant non-textual as well as textual data.

The central part of Figure 1 shows the part of the architecture being developed to achieve the linguistic-semantic integration within and across the two datasets. This part of the architecture relies on three ontologies: the Documentation Ontology (DocO), the Model Ontology (ModO) and the Design Ontology (DesO). In the case-study under consideration, these ontologies have been developed independently of one another and they make disparate ontological commitments in order to meet the representational and computational requirements of the software components that rely on them (the boxes in Figure 1). Their differences should be resolved in such a way that each component can access knowledge available in all three ontologies without changing its representational requirements.

Figure 2 provides an overview of the main groups of concepts (white modules) that the three ontologies under consideration represent. The gray bars at the bottom show the extent to which each ontology covers such concepts, as well as the extent of their overlap. DocO, DesO and ModO contain specifications of upper, core and domain classes essential for aviation.

Figure 2: Ontological modules in Architecture

2.1 Documentation Ontology
The top part of Figure 1 illustrates a proprietary ontology-based Natural Language Processing software that classifies word and phrase occurrences in text using DocO, which contains aliases as well as disambiguation terms in multiple languages. As exemplified in Figure 3¹, DocO's terminology box (TDocO) is simple, as it includes a limited number of owl classes of things likely to be mentioned in aviation-related texts, e.g. names of aviation companies, of company management, of aircraft families, models or components. For the most part, DocO contains individuals in its assertion box (ADocO), which are related to one another by assertions of a single type of object-property akin to the narrower-than relation between a hyponym and its hypernym in linguistics. For instance, the individual representing a given aircraft model (e.g. A320-100) has a narrower scope than the individual representing such model's aircraft family (i.e. A320). Similarly, both individuals representing the model and the family have a narrower scope than the individual representing the airframer (i.e. Airbus). As explained in Section 3.1.2, this design choice allows greater representational freedom, by supporting for instance meta-classifications. Also note that at present this ontology is not publicly available.

TDocO ::= {
    AircraftModel ⊑ Thing,                      (1)
    AircraftFamily ⊑ Thing,                     (2)
    Airframer ⊑ Thing,                          (3)
    AircraftClass ⊑ Thing,                      (4)
    AircraftComponent ⊑ Thing,                  (5)
    narrower-than ⊑ topObjectProperty}          (6)
ADocO ::= {
    A320 : AircraftFamily,                      (7)
    A320-100 : AircraftModel,                   (8)
    SingleAisle : AircraftClass,                (9)
    Airbus : Airframer,                         (10)
    narrower-than(A320-100, A320),              (11)
    narrower-than(A320-100, SingleAisle),       (12)
    narrower-than(A320, Airbus),                (13)
    narrower-than(A320-100, Airbus)}            (14)

Figure 3: DocO on A320-100

¹ As in the Web Ontology Language (owl), in this paper the description logic (dl) notion of concept is called class. dl's logical constants have their standard meaning, i.e.: C1 ⊑ C2, C1 is a subclass of C2; i : C, i is an individual of class C; ∃R.C, the class of individuals that are in a relation R with at least one individual of class C.
2.2 Model Ontology
The central bottom part of Figure 1 shows the model extraction and management software that supports the integration of technical model data, based on a system proposed in [5] and similar to the framework discussed in [11]. That is achieved by first transforming a technical model into an ontology (a transformation that takes place in the Semantic Data Model Integration module). Consider for instance the decomposition for an A320-100's fuselage height modeled in the model-excerpt shown in Figure 4. Such decomposition is transformed into a model ontology by creating an individual of class MeasuredValue with data properties for name, value and unit, as shown in Figure 5. Note that TModO in Figure 5 contains all of ModO's class hierarchy. An interface between such hierarchy and a particular tool's data model allows for the automatic transformation of a technical model's parameters into individuals of ModO's classes.

aircraft model: A320-100
    geom
        fuselage
            height: 4.14 m
            length: 37.57 m
    mass
        fuselage: 7800 kg
    aero

Figure 4: Excerpt of model for sizing code

TModO ::= {
    <java:javax.measure.unit.Unit> ⊑ Thing,     (15)
    NamedElement ⊑ Thing,                       (16)
    DataType ⊑ NamedElement,                    (17)
    CompositeValues ⊑ DataType,                 (18)
    LeafValue ⊑ DataType,                       (19)
    Scalar ⊑ LeafValue,                         (20)
    FloatPointValue ⊑ Scalar,                   (21)
    MeasuredValue ⊑ FloatPointValue,            (22)
    FloatPointValue.value ⊑ topDataProperty,    (23)
    NamedElement.name ⊑ topDataProperty,        (24)
    MeasuredValue.unit ⊑ topDataProperty}       (25)
AModO ::= {
    geom.fuselage.height : MeasuredValue,       (26)
    NamedElement.name(geom.fuselage.height,
        "height"^^xsd:string),                  (27)
    MeasuredValue.unit(geom.fuselage.height,
        "m"^^<java:javax.measure.unit.BaseUnit>),   (28)
    FloatPointValue.value(geom.fuselage.height,
        "4.14"^^xsd:double)}                    (29)

Figure 5: ModO on A320-100

2.3 Design Ontology
ModO is mapped onto a reference design ontology DesO. In the present case-study DesO builds on the Aircraft Ontology² proposed in [1]. As exemplified in Figure 6, DesO contains a rich terminology for aircraft design. The emphasis is on providing conceptual descriptors: DesO contains upper and core modules for quantities, dimensions, units and parameters, imported from the ontologies QU³ and QU-Rec-20⁴. It also provides a mereological and connectedness structure specified between and across aircraft components, with a module for so-called aircraft aspects, i.e. functional combinations of physically separate subcomponents (e.g. the undercarriage group). As mentioned, the model ontology is combined with the design ontology as in Figure 7, i.e. by establishing an updated version of ModO, ModO*, which imports DesO and allows the individual introduced in Axiom (26) to be classified.

TDesO ::= {
    Aircraft ⊑ Thing,                           (30)
    Fuselage ⊑ AircraftSubComponent,            (31)
    FuselageDescribingParameter ⊑
        SubComponentDescribingParameter,        (32)
    DistanceParameter ⊑ SingleAircraftParameter,    (33)
    hasFuselage ⊑ ··· ⊑ hasPart,                (34)
    isDescribedByHeight ⊑ ··· ⊑
        isDescribedByParameter,                 (35)
    Aircraft ⊑ ∃hasFuselage.Fuselage,           (36)
    Fuselage ⊑
        ∃isDescribedByFuselageDescribingParameter.
        FuselageDescribingParameter,            (37)
    FuselageDescribingParameter ⊑
        ∃isDescribedByHeight.DistanceParameter, (38)
    DistanceParameter ⊑ ∃unit.DistanceUnit,     (39)
    DistanceUnit ⊑ Unit,                        (40)
    Unit ⊑ ∃name.xsd:string ⊓ ∃symbol.xsd:string,   (41)
    SingleAircraftParameter ⊑
        ∃numericalValue.xsd:double}             (42)
ADesO ::= {
    A320-100 : Aircraft}                        (43)

Figure 6: DesO on A320-100

TModO* ::= TModO ∪ TDesO                        (44)
AModO* ::= AModO ∪ {
    geom.fuselage.height : DistanceParameter}   (45)

Figure 7: Combined ModO and DesO on A320-100

² https://github.com/astbhltum/Aircraft-Ontology
³ http://www.w3.org/2005/Incubator/ssn/ssnx/qu/qu
⁴ http://www.w3.org/2005/Incubator/ssn/ssnx/qu/qu-rec20

3. THE CHALLENGES
There are two main types of challenges in combining DocO, ModO and DesO. On the one hand, there are challenges of ontology integration: given ontologies, each separately representing knowledge relevant to the aviation sector, how can they be combined? As mentioned, their differences should be resolved in a way that preserves each component's representational requirements. On the other hand, there are challenges of coverage: do the integrated ontologies adequately represent the aviation sector or should their content be enriched?

3.1 Ontology Integration
DocO, ModO and DesO need to be integrated in two ways: terminology alignments should be found by means of ontology matching techniques [8], and differences in abstraction levels should be resolved by meta-modeling [7, 6].

Figure 8: Target results of integration. (a) Matching. (b) Meta-modeling.

3.1.1 Matching
DocO contains individuals for measurement-related notions, although these are not organized in any structure. On the other hand, the measurement-related modules of ModO and DesO largely overlap, as is apparent in the similarities between Axioms (27) and (38), or Axioms (28) and (39), or Axioms (29) and (42), all of which make Axiom (45) plausible. The challenge is to find a general approach to resolve the modeling differences between ModO and DesO. Without such alignment between AModO and TDesO, the consequences of classifying geom.fuselage.height as a DistanceParameter cannot be tested by a reasoner based on the properties of individuals and classes, thereby limiting the main feature of ontological modeling.

3.1.2 Meta-modeling
DocO contains meta-classifications. Axiom (8) asserts an individual aircraft model (class introduced in Axiom (1)). Axiom (9) asserts an individual aircraft class (class introduced in Axiom (4)). Yet, an A320-100 is often classified as an instance of (in owl: an individual of class) single-aisle. This would require asserting in dl a higher-order axiom like the following: A320-100 : SingleAisle : AircraftClass. Of course, that is not possible, as an individual in dl (a fragment of first-order logic) cannot in turn classify other individuals. DocO mimics such encapsulated classification between individuals by asserting a narrower-than relation, as in combined Axioms (9) and (12).

3.2 Ontology Coverage Assessment
While the ontology resulting from the integration of the ontologies shown in Figure 2 would include many notions relevant to aviation, it would miss modules usually included in multi-disciplinary engineering ontologies.

One way of assessing ontology coverage is to compare a given group of ontological modules with benchmarks proposed in the relevant ontological literature. For instance, [10] discusses the range of notions comprised in such multidisciplinary engineering domains. According to this proposal, DesO misses conceptualizations for: physical objects (though implied by part-of relationships between components), functionality (partly implicit in aircraft aspects), processes and materials. Also, DocO contains many individuals representing agents (e.g., persons, organizations). DesO does not provide any conceptualization of agents, given its focus on preliminary design. For a wider scope, though, at least one agent may become relevant: the pilot. Automatic or human, the pilot has control functions that require agent-like properties. Finally, as described in [4], automated alternatives to literature-surveys have been proposed. Such techniques measure semantic similarity against a standard ontology, or against a relevant corpus or a thesaurus, or measure the fitness of an application that embeds the to-be-evaluated ontology to accomplish a certain task (e.g. answering competency questions relevant to the goals of the ontology development).

4. FUTURE WORK
The system being developed based on the proposed architecture attempts to tackle the challenges described above by providing operational definitions of the arrows References and Mappings in Figure 1. This should result in the following operations.

Match: To integrate taxonomical structures and parametric data, alignments between DocO, DesO and ModO need to be established. Recommender systems are being tested to support this operation, which presents complexities, for instance when matching ModO with DesO. As shown in Figure 8a, the former entangles in a single individual the notions of: distance parameter, measure of height, fuselage parameter; the latter, on the other hand, separates these notions. The matching mechanism may need to be complemented with the creation of individuals or property assertions (e.g. between FuselageDescribingParameter and geom.fuselage.height).

Meta-model: To make available for DocO the result of the match operation at the appropriate level of abstraction, relevant classes in DesO need to be modeled as individuals, i.e. meta-modeled or, more specifically, reified in DocO. As shown in Figure 8b, the resulting version of ADesO would contain the same knowledge as the three original matched ontologies, although extra classes would be added to classify the meta-modeled (reified) classes.

Assess Coverage: To estimate to which extent the matched DocO, DesO, ModO contain the terminology found in a corpus and point out missing notions, coverage assessment techniques will be tested, focusing on the automated testing of mereological and functional properties.

5. DISCUSSION
This section provides a preliminary methodological discussion of how knowledge management architectures, such as the one presented above, should be positioned with respect to the wider research area of Industrie 4.0, in particular with respect to subareas that involve Linked Data and Robotics applications.

In our present working definition of the relationships between the different research-areas that contribute to the vision of Industrie 4.0 we are assuming a fairly rigid partition between:

Pre-production processes (or work phases) which, for the most part, are based on intellectual or experimental activities (i.e. the part of the process chain from Conceptual Design to Prototyping).

Production processes (or work phases) which have at their core physical activities or transformations (i.e. the part of the process chain from Mass Production and Assembly to Quality Assurance).

Questions about production processes are investigated in the subarea of Industrie 4.0 usually referred to as Smart Factory. Here Robotics plays a central role, as a means to reduce production costs by more efficient and effective adjustment of production lines. Alongside Robotics, Cyber-physical Systems and the Internet of Things are key-ingredients in achieving interoperability and decentralization on the floor of the Smart Factory.

On the other hand, questions about pre-production processes are investigated in a subarea of Industrie 4.0 that, by analogy, could be called Smart Studio. Here Robotics plays a less important role, if any, whereas Knowledge Management and Artificial Intelligence are more prominent. The architecture presented in Figure 1 contributes to achieving interoperability in the Smart Studio.

In this context Linked Data, i.e. the result of interlinking structured data coming from different sources (as proposed in 2006 by Tim Berners-Lee⁵ or in [3]), plays an important role in achieving research goals either within the Smart Factory or within the Smart Studio separately, because requirements and models within each of these research areas are sufficiently homogeneous.

What still needs to be clarified, though, is the extent to which Linked Data (or any other integration approach) can deliver results across the Smart Factory and the Smart Studio. Is it possible to blur the distinction between data used or generated during pre-production and data used or generated during production?

Ideally, (i) knowledge that is gained during the production of a product would be fed back to previous phases (e.g. the design of a new version of that same product): such a feedback loop would allow the design of a product to be modified based on data generated during its production or its quality assessment; (ii) conversely, knowledge that is gained during the pre-production of a product may be fed forward to later phases (e.g. scheduling or configuration): such a feed-forward loop would directly impact the production line and the robots operating in it.

Also, the increasing role of virtualization is an additional motivation for researching whether it is possible to blur the distinction between data generated in the Smart Factory and in the Smart Studio. On the one hand, the Smart Factory needs virtual representations of physical products for new production techniques (e.g. 3D Printing). On the other hand, the Smart Studio applies virtual design approaches (e.g. virtual testing or hardware-in-the-loop testing). As both the Smart Factory and the Smart Studio will increasingly be working on virtual representations of the same final physical product, linking the data underlying those representations would translate into increased agility throughout the product life-cycle.

As a final (counter)point on what is discussed in this Section, it should be noted that the possibility of blurring the distinction between data generated in the Smart Factory and in the Smart Studio, even if eventually viable, may not be unconditionally welcomed by practitioners. The aviation industry, for instance, is subject to the strictest design certification requirements, which entail long and costly certification procedures. As a result, designs in Aviation are rather stable over time. Therefore, the advantages of a feedback loop, such as (i) above, may not be obvious, because putting effort into an automated Knowledge Management infrastructure to achieve fine-grained changes to product designs based on feedback from the production line may not be considered cost-effective.

⁵ W3C Design Issues note: https://www.w3.org/DesignIssues/LinkedData.html

6. ACKNOWLEDGMENTS
Research supported by German program LUFOV2, project EFFPRO 4.0, grant no. 20Y1509E.

7. REFERENCES
[1] M. Ast, M. Glas, and T. Roehm. Creating an ontology for aircraft design. In Deutscher Luft- und Raumfahrtkongress 2013, Stuttgart, 2013.
[2] S. Biffl and M. Sabou, editors. Semantic Web Technologies for Intelligent Engineering Applications. Springer, 2016.
[3] C. Bizer. The emerging web of linked data. IEEE Intelligent Systems, 24(5):87–92, 2009.
[4] N. DiGiuseppe, L. C. Pouchard, and N. F. Noy. SWEET ontology coverage for earth system sciences. Earth Science Informatics, 7(4):249–264, 2014.
[5] M. Glas. Ontology-based Model Integration for the Conceptual Design of Aircraft. Dissertation, Technische Universität München, München, 2013.
[6] B. Glimm, S. Rudolph, and J. Völker. Integrated metamodeling and diagnosis in OWL 2. In P. F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm, editors, International Semantic Web Conference, Revised Selected Papers, Part I, volume 6496 of Lecture Notes in Computer Science, pages 257–272. Springer, 2010.
[7] N. Jekjantuk, J. Z. Pan, and Y. Qu. Diagnosis of software models with multiple levels of abstraction using ontological metamodeling. In Proceedings of the International Computer Software and Applications Conference, pages 239–244. IEEE Computer Society, 2011.
[8] O. Kovalenko and J. Euzenat. Semantic matching of engineering data structures. In Biffl and Sabou [2], pages 137–157.
[9] G. La Rocca. Knowledge based engineering: Between AI and CAD. Review of a language based technology to support engineering design. Advanced Engineering Informatics, 26(2):159–179, 2012.
[10] C. Legat, C. Seitz, S. Lamparter, and S. Feldmann. Semantics to the shop floor: towards ontology modularization and reuse in the automation domain. IFAC Proceedings Volumes, 47(3):3444–3449, 2014.
[11] T. Moser. The engineering knowledge base approach. In Biffl and Sabou [2], pages 85–103.
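As a supplementary illustration of the transformation described in Section 2.2, the following Python sketch turns a nested model excerpt like the one in Figure 4 into assertions shaped like those of Figure 5. It is a minimal sketch under stated assumptions, not the Semantic Data Model Integration module itself: the nested-dictionary encoding of the model, the dataclass MeasuredValue and the functions to_individuals and to_assertions are hypothetical names introduced here, the height value follows Axiom (29), and the unit is rendered as a plain string rather than as a typed javax.measure literal.

```python
from dataclasses import dataclass

# Excerpt of the sizing-code model in Figure 4; leaves are (value, unit) pairs.
# Assumption: the model is available as a nested dictionary.
MODEL = {
    "geom": {"fuselage": {"height": (4.14, "m"), "length": (37.57, "m")}},
    "mass": {"fuselage": (7800.0, "kg")},
    "aero": {},
}


@dataclass(frozen=True)
class MeasuredValue:
    """Hypothetical stand-in for an individual of ModO's class MeasuredValue."""

    iri: str      # dotted path, e.g. geom.fuselage.height
    name: str     # value of NamedElement.name
    value: float  # value of FloatPointValue.value
    unit: str     # value of MeasuredValue.unit


def to_individuals(node, path=()):
    """Walk the nested model and yield one MeasuredValue per leaf parameter."""
    for key, child in node.items():
        here = path + (key,)
        if isinstance(child, dict):
            yield from to_individuals(child, here)
        else:
            value, unit = child
            yield MeasuredValue(".".join(here), key, float(value), unit)


def to_assertions(ind):
    """Render the four assertions of Figure 5 for one individual as strings."""
    return [
        f"{ind.iri} : MeasuredValue",
        f'NamedElement.name({ind.iri}, "{ind.name}"^^xsd:string)',
        f'MeasuredValue.unit({ind.iri}, "{ind.unit}")',
        f'FloatPointValue.value({ind.iri}, "{ind.value}"^^xsd:double)',
    ]


if __name__ == "__main__":
    for individual in to_individuals(MODEL):
        print("\n".join(to_assertions(individual)))
```

The dotted path doubles as the individual's identifier, which mirrors the naming of geom.fuselage.height in Axiom (26); a real interface would more likely derive identifiers and unit literals from the tool's own data model.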