Ontology as a backbone of future-proof software development

Patrik Kompuš
Faculty of Informatics and Statistics, University of Economics, nám. W. Churchilla 1938/4, Prague, Czech Republic

Abstract
The world is experiencing a huge demand for software developers while, at the same time, the number of skilled software developers is low. As a result, companies tend to hire unskilled resources, and the development of a system, relative to business demands, is therefore often very slow. Among the various technical solutions to this issue, separation of concerns between services is showing good results. I would like to show that we can apply the same logic beyond purely technical solutions and introduce (and demand) separation of concerns in the development process itself. This would include correctly assigning ownership of the data model to data experts instead of development teams. The proposition is that ontologies are a suitable means of supporting this separation. In this PhD project, I will attempt to partially verify whether a new methodology and approach based on ontologies will allow us to lower the resources spent on developing new products and future product features; these resources include time, personnel and costs.

Keywords
ontology modelling tools, semantics in services and processes, business optimization

1. Introduction
Software development has been a crucial part of almost every industry for the past decades, often referred to as part of the "fourth industrial revolution". Software existed before, of course, but as hardware capabilities rose dramatically, the need for large amounts of software multiplied even more, and we are long past the point where we could get by without it.
This spans the world of data persistence (big-data management, cloud storage, etc.), libraries and other implementations of business logic on top of these data (micro-services, APIs, etc.), up to the presentation towards human users (the explosion of front-end and UX pattern libraries, mobile apps, etc.). As a side effect, we are facing enormous demand for so-called "IT specialists" as a human resource. These roles accounted for 4.5 % of the total workforce in 2021 across the whole of the EU, a 1.3 percentage point increase from 2012. In numbers, the growth in that decade is more than 50 %, slightly less than 8 times as high as the corresponding increase (6.3 %) for total employment [1]. Every company out there today has some version of an IT department, i.e., someone must be responsible for the company's IT infrastructure and daily operations. But let us focus on software development.

The 22nd International Semantic Web Conference, November 06–10, 2023, Athens, Greece
qkomp00@vse.cz (P. Kompuš)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

Figure 1: Natuvion 2022 Study - data and product transformation challenges

2. Problem definition

2.1. The problem
Since there is a huge demand for software developers and the number of skilled software developers is low, companies tend to hire unskilled resources. Their idea is the following:
• invest time and resources into a junior developer,
• after a few months their expertise rises to mid-level, later to senior,
• such a resource then stays with the company for the next decade, preferably longer.
Such a flow generally works in non-IT positions, but it hits a wall very early in all IT fields.
People leave for the competition, which offers them better compensation and benefits. This is nothing new; what is new is the time they stay with one employer: the employee retention. It varies between 1.5 and 3.2 years in big tech companies (Facebook, Tesla, Netflix, etc.) [2]. Although this looks like a human resources problem at first, the biggest challenge companies face when they lose an IT resource is the knowledge that leaves with them. They must run the whole recruitment cycle all over again and, on top of that, they need to collect the knowledge of the leaving personnel, and they need to do it very quickly. One could argue that the knowledge is not really lost if it was documented, which is true. Unfortunately, because companies hire junior developers at the start, those often lack the discipline to document what they have been working on. If companies manage to hire mid-level or senior developers, those are often too busy fixing issues left by departed junior developers, or busy reverse-engineering the code of departed senior developers who took the knowledge with them. Therefore, development is slow; and when something is developed fast, under pressure from stakeholders, errors are introduced. A study by Natuvion from 2022 [3], in which 200 companies responded to a structured survey investigating their experiences and challenges during product and data transformation journeys, shows that the biggest challenge is the shortage of qualified personnel, followed by data quality issues, see Figure 1. In the same study, the companies were asked how quickly relevant changes to their systems will have to be implemented in the future. About 95 % stated that changes should be introduced within a year, about 70 % indicated a time frame of six months, and 40 % even a period of only three months.
To summarize the problem: software in industry is very often developed slowly and/or with errors, and one reason for this seems to be the huge demand for software developers. Companies hire unskilled resources and assign them core responsibilities. Instead of having access to the needed data knowledge or consulting with data experts, these developers tend to build data structures on their own, which causes a mismatch between the data experts' view and the product owners' view of the same domain. As a result, developers become the owners of the data model. When the development team is assigned to another project, or simply leaves the company, the knowledge of the data model is usually lost or incomplete.

2.2. Typically proposed solutions
This issue has been known for some time now, and solutions have been proposed from different angles. Namely, these three are used broadly:
• raise motivational incentives for developers, thus making them stay with the company longer (HR),
• outsource the whole IT or some part of it (HR + business management),
• introduce a different methodology for software development, e.g. Agile and Scrum (project management).
There is one more proposed solution that, in my opinion, at least partly addresses the issue correctly: micro-services (IT).

2.2.1. Micro-services
This is not a business, HR, or project management solution, but a technical one. It decouples a monolithic infrastructure into micro-services, where each service has one purpose only, and together they form a symbiotic ecosystem of services/components communicating through predefined channels and only across allowed layers of the infrastructure. By doing that, each service not only has its own responsibility externally, but it must also assure its integrity internally.
With such a requirement, the developer working on such a service is required to:
• communicate with external stakeholders (other developers) in a way that starts to look more like a business relationship than a friendly discussion; they mutually agree on a contract between services;
• by working on a contract, they are in fact preparing the documentation for external parties that will try to communicate with their service in the future; it also serves any newcomer working on that service, thus again serving both externally and internally;
• as they are responsible for one service with one purpose only, they are keen on writing appropriate tests to keep the integrity of the service solid and not to be held accountable if an error occurs in the chain of instructions invoked across several services/components; in other words, they do not want to be the one where the error is, and surprisingly, this comes to them naturally;
• as they work towards smaller objectives, they put high effort into the reusability of the code and therefore of the component itself; again, this comes to them naturally.
For companies, moving towards micro-services requires huge investments of resources, but also time. Yet it is still not the final solution to the problem of high demand for developers. Although it brought a great shift in the way developers perform, it did not go deep enough: e.g., contracts are prepared, reviewed, and agreed between developers only, and it is also only they who create the documentation.

2.2.2. Research question
The problem summary and the typically proposed solutions lead to the main research questions:
Q1: Can a non-technical separation of concerns in software development, i.e., assigning the ownership of data models solely to data experts, help solve the technical problem of badly designed, not future-proof and unreliable software, as well as help solve the problem of knowledge loss within a business?
Q2: What methodologies and supporting tools can provide a sufficient foundation for such separation of concerns and for effective long-term dynamic persistence of domain knowledge?

3. Approach
Using an ontology as the foundation for data models, and using it in the development phase, might solve the problem partially, if not fully. The idea of micro-services has shown how separation of concerns between services solves technical issues. Using a similar, logical rather than technical, approach to concern separation, I would like to show how it can be applied on a bigger scale and introduce (and demand) separation of concerns in design and development itself. Once the knowledge is inside the graph, it is bound to no single expert or technology, which solves the problem of losing crucial internal business knowledge and information. This work is, among others, going to follow in the steps of the following projects:
• MOST - Marrying Ontology and Software Technology [4], a European research project with the goal of improving software engineering by leveraging ontology technology,
• ODSD - Ontology-Driven Software Development [5], a result of MOST and a way of moving on from Model-Driven Software Development (MDSD).
To answer the research questions, I would like to test my objective in a real environment, meaning an operating business. The following goals are meant to be reached, at least partially:

3.1. Methodology and supporting tools for data experts
The preliminary propositions are as follows: current data experts in business (not researchers) often lack knowledge of ontologies and semantic technologies. Therefore, it is necessary to prepare, evaluate and later implement a new methodology for data owners to easily manipulate the models. Moreover, it needs to be backed by a lightweight but powerful toolkit/services suite that enables them to accomplish the modelling tasks without any technical obstructions.
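To make the intended division of labour tangible, the following is a minimal, stdlib-only Python sketch of the kind of artefact such a toolkit could let a data expert produce. All names (Customer, hasEmail, etc.) are hypothetical, and a real implementation would serialize the statements to RDF/OWL rather than keeping them as tuples; the point is only that the model is declared data, not code written by developers.

```python
# Hypothetical example: a data expert declares a tiny domain model as plain
# (subject, predicate, object) statements. A real toolkit would emit RDF/OWL.
model = {
    ("Customer", "a", "Class"),
    ("hasEmail", "a", "DatatypeProperty"),
    ("hasEmail", "domain", "Customer"),
    ("hasEmail", "maxCount", "1"),
}

def properties_of(cls: str) -> list[str]:
    """Properties whose declared domain is the given class -- the kind of
    query that generated documentation or a service contract would run."""
    return sorted(s for (s, p, o) in model if p == "domain" and o == cls)

print(properties_of("Customer"))  # -> ['hasEmail']
```

A query such as properties_of is exactly what generated documentation or a service contract could be derived from, keeping the model itself in the data owners' hands.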
This methodology should be strict on the following:
• do not let developers construct the models of the data that a service operates with; let the data owners do this modelling and deliver it to developers as-is;
• do not let developers discuss the contracts between services; let the predefined data models rule the language throughout all communication channels;
• let data owners and product owners control the process of full quality assurance across services/components, based on the models defined by data owners and the requirements defined by product owners;
• let these models define the final contract to the outside world, so that possible external stakeholders discuss potential issues with data owners and not with developers;
• let the whole knowledge of data structures and business requirements live outside of the developers' domain, documented and stored for anyone to access.
All of the above will be the subject of broad field research, interviews with data experts, and evaluation of the answers and information provided.

3.2. Methodology and supporting tools for developers
Just as a new methodology is needed for data experts, one is also needed for the developers who will implement the business requirements and will need to understand what the data experts have modelled. In fact, it will be the services, more than the developers themselves, that need to understand it. A high level of automation and generalisation will need to take place, and developers will need to start using predefined tests rather than verbal descriptions of requirements.

3.3. Prototype of data pipeline software
As a use case to demonstrate how the whole data-product collaboration can be backed by ontology and semantics, a prototype data pipeline supporting the data models and data deliveries will be implemented.
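Before turning to the pipeline itself, the "predefined tests rather than verbal descriptions" idea from Section 3.2 can be sketched in a few lines of stdlib-only Python. In practice this role would be played by a constraint language such as SHACL over the knowledge graph; the table of required properties and all names below are hypothetical stand-ins for constraints derived from the data owners' model.

```python
# Hypothetical example: requirements derived from the data owners' model,
# expressed as an executable check instead of a verbal description.
REQUIRED = {"Customer": {"hasEmail"}}

def violations(instance: dict) -> list[str]:
    """Report every required property missing from a data instance."""
    missing = REQUIRED.get(instance.get("type", ""), set()) - instance.keys()
    return sorted(f"missing {p}" for p in missing)

good = {"type": "Customer", "hasEmail": "a@example.org"}
bad = {"type": "Customer"}
print(violations(good))  # -> []
print(violations(bad))   # -> ['missing hasEmail']
```

Because the required-property table is derived from the data owners' model, every component of the pipeline could run the same check in the same manner, without developers restating the requirements.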
The idea is to create an event-based data supply pipeline that transforms heterogeneous datasets into domain-specific homogeneous data events with their own life-cycle and passes them on to the product as single small RDF graphs, as opposed to big XML/JSON structures. The integrity of the whole data-delivery process, with emphasis on test-driven development in the data/mastering layer, shall be one of the cornerstones here and shall be described in detail. The proposed architecture is shown in Figure 2. The technical implementation of such a pipeline is not the key part of my research; the key part is rather the internal complications in communication and agreement between the components of the pipeline and the people developing and managing them. The knowledge graph should serve as a mediator and source of truth for all the components' data models.

Figure 2: Data pipeline architecture

3.4. Prototype of product software
With the data pipeline in place, a prototype of product software would consume the data events and prepare persistent storage of the whole data universe. Here, the methodology for developers would be put up for evaluation. This too shall be implemented with a test-driven approach and, at the same time, kept within the agile way of feature implementation.

4. State of the art in research
The opportunity of using ontologies in software engineering projects has already been addressed by many. However, most of the previous approaches mainly focused on resolving particular technological problems, related either to the mapping between an ontological representation and the structure of the implemented code, or to the way of developing the ontology itself. Parreiras [6] discusses Model-Driven Software Development (MDSD) in synergy with Semantic Web technologies, ontologies in particular, as a new way of enterprise computing. MDSD is a way to form a contract between components based on a static UML model.
Yet this model does not contain all relevant metadata, nor any representations of the modelled entities, as OWL does with its individuals. Also, the ownership of the data models still lies with developers only. Using a knowledge graph populated with an ontology not only as a contract, but as a core part of the development that dictates the structures anywhere in the pipeline (and further, at external consumers), allows for model and data content validation anywhere, in any of the components, and in the same manner, thus reducing the implementation time. At the same time, the dynamic essence of the ontology is kept and used, again throughout the whole pipeline. Ledvinka & Křemen [7] bring a very helpful comparison of existing mapping libraries between triples (the semantic world) and objects (object-oriented programming). Yet this does not suffice for the purpose of this research, the competence split between developers and data experts: once the object-oriented classes are prepared, developers are in charge, which drags them back to their standard way of thinking. Verborgh & Taelman [8] discuss and showcase how helpful it is for the developer experience if they think outside of the box, for example with a Linked Data abstraction. This is the view from the other end than modelling: the application view. In my research, I would like to show how we can bridge these worlds of data experts and developers. At ISWC in 2022, Hovland and Chrislock [9] described their need for sending small portions of data in a non-traditional way to satisfy their business requirements. Their proposal discusses the implementation of so-called versioned objects, small RDF graphs. During the research, I would like to study and evaluate this approach in more depth and possibly improve it with my findings while implementing the data pipeline prototype. OWL-S: Semantic Markup for Web Services, a 2004 W3C Submission [10], is an ontology formally describing web services.
This is not essential for the proposed idea, but it might be beneficial, especially when integrating the knowledge graph in the product layer; therefore, it will be further researched in my work. Last but not least is the LOT project [11], which is oriented towards engineering ontologies for use in industry. Its topic is very similar to the purpose of my research, and it also proposes a methodology for ontology creation and maintenance; therefore, it will be a major point in my evaluation. However, the LOT methodology does not go further, to the usage of the ontologies and the data by further components in the chain, which is exactly what I would like to address in my research.

5. Evaluation
The main target of the research is to show how having the ontology as a core component of the whole system can speed up the process of implementing and augmenting it. A series of use-case scenarios will be selected from interviews with stakeholders. For comparison, an identical system/product will be developed using more standard technologies and tools, e.g. SQL, XML, JSON and no primary modelling language, as that is the situation in businesses nowadays (system A). Once in place, performing the same use cases with the proposed methodologies and pipeline (system B) would show whether using the ontology improved the implementation time and cut the resources and dependencies. The goal is to show that change-request time-to-market is the key for businesses to be keen to adopt new methodologies. Therefore, the scenarios will mainly include structural changes, new product feature requirements, the application of dataset usage restrictions, and other compliance rules. Evaluation metrics will be defined based on their importance for the stakeholders, e.g. time-to-market, the amount of resources needed, etc. Moreover, evaluations and comparisons with current state-of-the-art tools will take place.
The LOT project, mentioned before, will be the closest competitor of the proposed ontology-creation methodology for data experts, together with the tools it uses.

6. Conclusion
This article proposes an idea and a prototype of a technical/architectural solution, using semantic technologies as a core component, to the social and business problem of the high demand for IT professionals, from the perspectives of employee retention and cost optimization, respectively. The idea is provocative and needs deeper research and evaluation. If we see significant time savings in the development phase and a clear separation of responsibilities and concerns between data experts, developers and product owners, yet achieve the same, or an even higher, level of quality and integrity of the demonstrated solutions, this will be a key achievement for the majority of businesses that are struggling with low-quality or late-to-market solution and product deliveries. It will also show that all stakeholders involved will need to change their view of their roles in the chain of solution and product development.

Acknowledgments
Here I would like to thank my supervisor, prof. Vojtěch Svátek, for his support and guidance.

References
[1] Eurostat, ICT specialists in employment, 2022. URL: https://ec.europa.eu/eurostat/statistics-explained/index.php.
[2] A. Levitsky, Facebook no longer has Silicon Valley's highest employee turnover, LinkedIn user data shows, 2020. URL: https://www.bizjournals.com/sanjose/news/2020/12/30/employee-turnover-linkedin-data-2020.html.
[3] Natuvion, Transformation 2022, the study, 2022. URL: https://www.natuvion.com/newsroom/challenges-2023/.
[4] MOST: Marrying ontology and software technology, 2011. URL: https://cordis.europa.eu/project/id/216691.
[5] J. Z. Pan, S. Staab, U. Aßmann, J. Ebert, Y. Zhao, Ontology-Driven Software Development, Springer Berlin Heidelberg, 2012.
[6] F. Parreiras, Semantic Web and Model-Driven Engineering, Wiley-IEEE Press, 2012.
[7] M.
Ledvinka, P. Křemen, Comparison of object-triple mapping libraries, Semantic Web Journal 11 (2020) 483–524.
[8] R. Verborgh, R. Taelman, LDflex: A read/write Linked Data abstraction for front-end web developers, ISWC 2 (2020) 193–211.
[9] D. Hovland, F. Chrislock, Versioned objects, ISWC Industry Proceedings (2020).
[10] D. Martin, M. Burstein, J. Hobbs, O. Lassila, D. McDermott, S. McIlraith, S. Narayanan, M. Paolucci, B. Parsia, T. Payne, E. Sirin, N. Srinivasan, K. Sycara, OWL-S: Semantic markup for web services, 2004. URL: https://www.w3.org/Submission/OWL-S/.
[11] M. Poveda-Villalón, A. Fernández-Izquierdo, M. Fernández-López, R. García-Castro, LOT: An industrial oriented ontology engineering framework, 2022. URL: https://doi.org/10.1016/j.engappai.2022.104755.