Methodical Conversion of Text to Models: MuDForM Definition and Case Study

Robert Deckers, Patricia Lago
Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
robert.deckers@atomfreeit.com (R. Deckers); p.lago@vu.nl (P. Lago)
https://www.linkedin.com/in/robertdeckers/ (R. Deckers); http://patricialago.nl (P. Lago)
ORCID: 0000-0002-3020-7550 (R. Deckers); 0000-0002-2234-0845 (P. Lago)
PoEM 2022 Forum, 15th IFIP Working Conference on the Practice of Enterprise Modeling 2022 (PoEM-Forum 2022), November 23-25, 2022, London, UK

Abstract
To enable the people involved in a software development process to communicate and reason close to their area of knowledge, we are investigating a method to formalize and integrate knowledge into domain models and into specifications in terms of those domain models. For this purpose, we have previously defined a list of method objectives and an initial version of the method, called MuDForM. This paper reports on the method part that covers the creation of an initial model from textual documents via systematic grammatical analysis, which is especially helpful in the transition from a text-based to a model-driven development process. We performed a case study in the printing domain to validate the method. We found that the presented analysis concepts, method steps, and guidelines help to systematically convert a textual specification into an unambiguous model.

Keywords
Method engineering, Natural language processing, Domain modeling, Model-based engineering

1. Introduction

This work introduces an integral modeling method, called Multi-Domain Formalization Method (MuDForM), which provides support for the creation of domain models (DM), and for the creation of models that are defined in terms of a domain model, called domain-based models (see Fig. 1). Together, these are called domain-oriented models. MuDForM provides analysis and modeling concepts, steps, and guidelines to conduct a modeling process, which starts with a knowledge source, like a (domain) text or a (domain) expert. This paper explains the method part for creating an initial MuDForM model from a text, and demonstrates it in a case study.

Figure 1: Context of MuDForM models (UML class diagram)

The rest of this section describes the problem we aim to address, our contribution, and the target audience. Section 2 explains in more detail what we aim to achieve with MuDForM. Section 3 explains the research methodology. Section 4 gives an overview of MuDForM, which explains how the support for grammatical analysis (GA) and the text-to-model transformation, respectively defined in Sections 5 and 6, are integrated in MuDForM. Section 7 reports on a case study in which we applied MuDForM to formalize system behavior descriptions. Section 8 reflects on MuDForM's support for grammatical analysis. Section 9 discusses related work, and Section 10 concludes the paper and presents suggestions for future work.

Problem Statement.
When organizations transition from a development process based on natural language specifications to a model-based development process, they face the challenge of creating correct models not only from the input of (domain) experts, but also from existing system specification documents. A process that utilizes such documents, which have often cost significant effort to produce, and that minimizes the need to involve often busy domain experts, would be a great advantage. Kosar et al. [1] present a systematic mapping study on domain-specific languages (DSLs). They conclude that (domain) analysis is mostly done in an informal and incomplete way. Among the reasons for this weakness, they mention that domain analysis is too complex and outside software engineers' competencies. Czech et al. [2] gathered 130 best practices from 19 studies on domain-specific modeling (DSM). They group the best practices into different classes: domain model, language design and concepts, generators, DSL-tooling, meta-model tooling, and practices that concern an entire DSM solution. Only 3 best practices are about the domain model, and those are actually not about modeling itself, but about the context of a domain model. We observe that they did not find and distill any best practices for extracting domain models from text. Deckers and Lago observe in a systematic literature review (SLR) [3] that most approaches for domain-oriented specifications do not offer full methodical support, i.e., a metamodel, notation, fine-grained method steps, and guidelines, for extracting models from natural language texts. Some offer parts of those, but none integrates them all. MuDForM explicitly aims at making the (domain) analysis phase a systematic activity, with an integrated metamodel, steps, and guidelines, starting from a natural language text, in order to make the creation of models more predictable and easier to learn.

Contribution and Audience. This paper has two main contributions. First, it presents integrated methodical support for the analysis of domain texts to extract model elements and model fragments. The support goes further than that of comparable methods because, next to extracting domain models, MuDForM also supports the extraction of model elements for feature and context models (clarified in Sections 8 and 9). The metamodel covers the GA concepts and their integration into the modeling concepts. The method steps enable the planning and organization of analysis and modeling activities. The explicit guidelines help to capture and disseminate analysis and modeling knowledge. Moreover, the method steps and guidelines are defined in terms of the metamodel. Practitioners may use the methodical support to bootstrap their modeling activity. Method developers may use the description of the support as an example of how to extend a modeling method with a part for bootstrapping a model from an input text. As a second contribution, the paper presents the validation of the method in an industrial case study. This paper reports on the phase from text to initial model. Researchers may use the case study to understand the methodical support. Practitioners may use it as an example of how to systematically analyze a text in order to create domain models.

2. Background: MuDForM Development

To understand the work that is reported here, we explain what we aim to achieve with MuDForM.
We envision software development as a process in which the involved people make decisions in their own area of knowledge, i.e., domain, and in which those decisions are integrated and finally result in a machine-readable specification. We have presented the objectives for MuDForM in [3]. One of them is that a method should have a complete definition, which means it has a clear underlying model, i.e., a metamodel with clear semantics, a defined notation (viewpoints and syntax), defined method steps, and guidance for the steps and viewpoints. We have explicitly defined a new metamodel, because no existing metamodel fulfilled all the objectives. We have based the method steps on the KISS method for object orientation [4], which already offered grammatical analysis integrated with domain modeling and feature modeling, and extended it with explicit guidelines. The MuDForM objective that is the focus of this paper is as follows: Almost all people, including domain experts, use natural language to convey their knowledge and decisions. It is used in many documents that are relevant in a system development process. A specification method should support the transformation of knowledge described in natural language into unambiguous models. The purpose of this support is to minimize loss of semantics and increase mutual understanding in the communication between modelers and domain experts.

3. Research Methodology

This section describes the research methodology we have applied to gather the results presented in this paper. Based on the problem statement and the above explanation of the MuDForM vision, we define the following research questions:
(RQ1) What methodical support can be given for the conversion of text into ingredients of a domain-oriented model? The answer is given in Sections 5 and 6, in terms of GA concepts, method steps, and guidelines, and validated through the case study from Section 7.
(RQ2) How should methodical support for extracting knowledge from text be integrated in a method that aims to produce domain-oriented models? The answer is given in Section 4, in terms of how the modeling concepts and method steps fit with MuDForM's other modeling concepts and method steps.
The development of MuDForM started as a project in which experience from industry practice was captured and made tangible in a method vision and definition, followed by a phase in which the method is applied to cases and adjusted based on case findings. The approach can be seen as a combination of design science and action research in the way described by Iivari and Venable [5]. For this paper, we focus on the action research aspects according to the description by Petersen et al. [6, 7], which inspired us to organize our study along the phases described below.

Diagnosis. Based on our experience with modeling, architecture, and model-driven development in the past 25 years, we have defined a vision on software development and related method objectives (see Section 2), and defined an initial version of the method. We have been recording and generalizing our experiences, and working them out in detail, since we started the MuDForM research program in 2015. The method definition is available in [8].

Action planning. We have performed an SLR [3], which was derived from the same method objectives. From the SLR and the initial method definition, we identified topics needing further research, and the parts of MuDForM that needed further development.
One of them is the methodical support for extracting models from natural language texts, i.e., the topic of this paper. Meanwhile, we contacted industry partners and explained the MuDForM vision, the MuDForM modeling process, and what a case study could do for them.

Action taking. We have defined the metamodel and steps for the identified gaps and added applicable guidelines from other approaches.

Case study. We defined the case-specific objectives together with the industry partner, and agreed on the timeline and the availability of people and documentation. We performed the case study and presented and explained the recorded model to the industry partner. They used the final model as the terminology in Gherkin test scenarios (e.g., [9]), in order to make those scenarios unambiguous, and provided feedback. In Sections 8 and 9, we reflect on the case study from the perspective of the research questions and related work.

Reflection and action re-design. After completing the case study, we identified method gaps and flaws, and defined the required method changes, i.e., we revised the metamodel, method steps, and guidelines.

4. MuDForM Overview

This section presents an overview of MuDForM, which forms the framework for the method parts described in Sections 5 and 6. MuDForM is defined according to the guidelines of Kronlöf [10], which has resulted in a method definition with the following ingredients: (i) a metamodel containing classes, activities, attributes, associations, specializations, and constraints, which define the modeling concepts and their relations, and (ii) a method flow containing steps, guidelines, and viewpoints, which guide the modeling process. Section 4.1 explains the overall MuDForM modeling process; Section 4.2 the high-level structure of a MuDForM model; and Section 4.3 the modeling concepts that form the link between the GA and the model engineering phase.

4.1. MuDForM Modeling Process

Figure 2a shows the steps of the MuDForM modeling process:
1. Scoping: the scope of the targeted model is specified by defining its purpose, its boundaries, and the input text that is selected from the knowledge source. The knowledge source is often an existing document, or a document that is created from interviews with (domain) experts.
2. Grammatical analysis: the input text is analyzed and transformed into a set of phrases with terms that are candidate elements for the model. The goal of this step is to maximize the knowledge elicitation from the source, and to make the resulting model traceable back to the input. This step is explained in detail in Section 5.
3. Text-to-model transformation: the specification spaces, which form the top-level structure of a model (see Section 4.2), are identified, and the phrases are transformed into model fragments, which are allocated to one of the specification spaces. This transformation is the transition from working with text to working with models, and is explained in Section 6.
4. Model engineering: the initial model is completed and inconsistencies are solved. Model engineering consists of a step to manage the dependencies between the specification spaces, and three steps for engineering the different types of specification spaces, i.e., contexts, domains, and features. The complete MuDForM definition [8] contains more detail about the sub-steps of model engineering.

4.2. MuDForM Model Structure

The top-level structure of a MuDForM model consists of related specification spaces, as depicted by the MuDForM metamodel fragment in Fig. 2b. MuDForM uses specification spaces (similar to UML packages) as containers for model elements. In our notion of domain, a domain model describes what can happen and what can exist in a domain. A feature model prescribes what shall happen and what shall exist, and is expressed in terms of domain model elements (see Figure 1). Context models capture assumptions and knowledge about elements that are needed to specify domains and features, but that exist outside those domains and features. By defining the dependencies between the different specification spaces, domains and features have no implicit semantics.
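To make the structure of Fig. 2b concrete, the following is a minimal Python sketch of specification spaces. It is an illustration only: the class and attribute names are ours, not part of the MuDForM definition, and the example instances anticipate the three spaces identified in the case study of Section 7.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class SpaceKind(Enum):
    CONTEXT = "context"
    DOMAIN = "domain"
    FEATURE = "feature"

@dataclass
class SpecificationSpace:
    """Container for model elements; spaces can be nested (parent/child) and can depend on each other."""
    name: str
    kind: SpaceKind
    parent: Optional["SpecificationSpace"] = None
    depends_on: List["SpecificationSpace"] = field(default_factory=list)

# Illustrative instances, mirroring the spaces used in Section 7.3:
context = SpecificationSpace("Context", SpaceKind.CONTEXT)
history_domain = SpecificationSpace("History domain", SpaceKind.DOMAIN, depends_on=[context])
history_feature = SpecificationSpace("History feature", SpaceKind.FEATURE, depends_on=[history_domain])
```

The dependency from the feature space to the domain space reflects that a feature model is expressed in terms of domain model elements; the dependency on the context space is our assumption for the History case, where the Retain period is allocated to the Context.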
4.3. MuDForM Model Elements

MuDForM offers different types of model elements. The type of specification space, i.e., domain, feature, or context, determines which types of model elements are allowed, and what their semantics are. All three types of specification spaces have concepts to specify state, concepts to specify behavior, and concepts to specify the relation between state and behavior. Moreover, almost all model elements can have attributes and specializations, and can have constraints attached to them. The following elements are specific for engineering the domain model, and are thus possible output of the GA and text-to-model transformation (a minimal code sketch of these element types is given at the end of this section):
• Domain activities define what can happen in a domain. Instances of domain activities are actions, which represent atomic (state) changes in the domain.
• Domain classes define what objects can exist in a domain. Instances of domain classes are objects, which have a state that can be changed via actions.
• Interactions define which objects can participate in which actions. Objects change state when participating in an action.

Figure 2: MuDForM overview. (a) MuDForM method steps (UML activity diagram); (b) Model structure (UML class diagram)

We have limited the explanation above to the domain model, because feature models and context models are absent in the description of the case study in Section 7. However, they are explained in the complete MuDForM metamodel [8]. The overview of MuDForM from this section forms the context for the definition of GA and the text-to-model transformation, which are explained in the next two sections.
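As with the specification spaces above, the following Python sketch only illustrates the three domain-model element types; the class names and the example interaction (taken from the History case in Section 7) are our own encoding, not prescribed by MuDForM.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DomainClass:
    """Defines what objects can exist in a domain; instances are objects with a state."""
    name: str

@dataclass
class DomainActivity:
    """Defines what can happen in a domain; instances are actions, i.e., atomic state changes."""
    name: str

@dataclass
class Interaction:
    """Defines which objects can participate in which action; participating objects change state."""
    activity: DomainActivity
    participants: List[DomainClass] = field(default_factory=list)

# Illustrative fragment from the History domain (cf. Figure 4): a Job is deleted from the History.
job = DomainClass("Job")
history = DomainClass("History")
delete = DomainActivity("to Delete")
delete_job_from_history = Interaction(delete, [job, history])
```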
5. Grammatical Analysis

This section describes the grammatical analysis step as introduced in Section 4.1. This paper only describes the method steps and GA concepts. The full method description can be found in [8]. The sub-steps of grammatical analysis are:
1. Extract phrases from the selected input sentences and format them according to one of the following phrase types:
• An interaction structure expresses a change to one or more objects. The format is: (subject) TO verb object (preposition object)*.
• A static structure expresses a static relation between two terms. The format is: noun HAS noun, or verb HAS noun, or verb HAS verb.
• A state structure expresses a property or type of a term. The format is: noun IS adjective, or verb IS adverb, or noun ISA noun, or verb ISA verb.
• A constraint expresses some condition on a term, typically formatted with propositional or predicate logic, like "if A then B" or "for all A: B". Temporal constraints are also possible, like "after A then B" or "within X seconds after A".
Table 1 in Section 7.2 shows examples of these phrase types. Typically, each sentence from the input text leads to one or more extracted phrases, each containing two or more terms (nouns, verbs, adverbs, adjectives). The extracted phrases form a decomposition of the original sentence, and are processed in the next method steps, in which they can change in terminology or structure due to analysis decisions.
2. Determine the relevance of each extracted phrase from the perspective of the defined scope. Discard phrases that do not fit the scope definition. Also check whether phrases are still valid in case legacy text is analyzed.
3. Eliminate homonyms and synonyms: all phrases are checked for homonyms and synonyms. These are then eliminated in consultation with the domain experts to ensure that all terms have exactly one meaning, and that all relevant meanings are covered by exactly one term.
4. List the final phrases: this results in a list of final phrases, which is used as input for the model. The list contains all extracted phrases that are marked as relevant and not discarded, and newly added phrases, in which the identified homonyms and synonyms are replaced with the chosen term.
During the analysis, issues can be raised for an analysis item, i.e., a phrase or term. Guidelines can be used in the decisions made to solve an issue. Fig. 3 presents the metamodel for GA and the text-to-model transformation.

Figure 3: MuDForM concepts for grammatical analysis (UML class diagram)

6. Text-to-model Transformation

The transformation from text to an initial model consists of the following steps:
1. Identify candidates: determine which terms, i.e., nouns, verbs, adjectives, and adverbs, are a potential model element.
2. Classify candidates: select the type of each identified term. The metaclass Term type in Fig. 3 gives the possible types, which are partially explained in Section 4.3.
3. Identify specification spaces: identify contexts, domains, and features. Each specification space should have an owner who is responsible for its content.
4. Create initial specification spaces view: create a view with all the specification spaces. Create dependencies and compositions between spaces if they are expected, or already known.
5. Declare and allocate elements: create a model element for each candidate term and put it in a specification space. The model engineering phase will reallocate an element if it was initially allocated incorrectly.
6. Create initial models: create a first version of the models from the list of final phrases. All interaction phrases become a relation between a behavioral element (activity, operation, function) and a class. All static structure phrases become an attribute of the subject, and the attribute type corresponds with the object of the phrase. All state structure phrases become a generalization relation between the subject and the nominal part of the phrase. For the constraint phrases it depends: they can become invariants, preconditions, postconditions, or a temporal ordering in the lifecycle of a domain class or function.
The initial model is the input for the model engineering step, which offers support for making the model complete and consistent. A sketch of how final phrases map to initial model fragments is given below.
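The following Python sketch illustrates how the phrase types of Section 5 and the mapping rules of step 6 (Create initial models) could be encoded. It is our own minimal encoding under stated assumptions, not part of the method definition; the phrase instances are final phrases from the History case in Section 7.2, and the string output merely names the kind of model fragment each phrase would yield.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class PhraseType(Enum):
    INTERACTION = "interaction structure"  # (subject) TO verb object (preposition object)*
    STATIC = "static structure"            # noun HAS noun, verb HAS noun, verb HAS verb
    STATE = "state structure"              # noun IS adjective, noun ISA noun, ...
    CONSTRAINT = "constraint"              # e.g., "if A then B", "for all A: B", "after A then B"

@dataclass
class Phrase:
    type: PhraseType
    text: str
    terms: List[str]  # the candidate terms contained in the phrase

# A few final phrases from the History case (Section 7.2):
final_phrases = [
    Phrase(PhraseType.INTERACTION, "TO delete job from history", ["to Delete", "Job", "History"]),
    Phrase(PhraseType.STATIC, "Controller HAS history", ["Controller", "History"]),
    Phrase(PhraseType.STATE, "History ISA job store", ["History", "Job store"]),
]

def to_model_fragment(phrase: Phrase) -> str:
    """Simplified rendering of the mapping rules in step 6 (Create initial models)."""
    if phrase.type is PhraseType.INTERACTION:
        activity, *classes = phrase.terms
        return f"interaction between activity '{activity}' and classes {classes}"
    if phrase.type is PhraseType.STATIC:
        subject, obj = phrase.terms
        return f"attribute of '{subject}' with type '{obj}'"
    if phrase.type is PhraseType.STATE:
        subject, general = phrase.terms
        return f"generalization: '{subject}' is a '{general}'"
    return f"constraint, resolved during model engineering: {phrase.text}"

for p in final_phrases:
    print(to_model_fragment(p))
```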
7. A Case Study: System Behavior Description of the History Feature

This section presents the results from a case study in which we, together with domain experts from a high-tech company, applied MuDForM to a system feature described in a so-called system behavior description (SBD). The high-tech company develops and produces products and services for printing and workflow management. The development process for one of their product lines uses SBDs to specify the behavior of product features. SBDs are the result of discussions and negotiations between product managers, developers, and testers, and are used throughout the development and test process. Currently, SBDs contain mostly natural language text. The case in this section is about one of 90 SBDs in total, namely the SBD of the History feature, which describes the system behavior for the management of completed print jobs. The goal of the case study is to evaluate MuDForM support for transforming textual specifications into initial models. The rest of this section focuses on the phase from text to initial domain model. Deckers and Lago present more GA examples, including some for feature modeling, in [11].

7.1. Case Study Overview and Execution

The case study was executed as a collaboration between a MuDForM researcher, a modeling expert from the customer to guard the fitness for purpose of the model, and several domain experts supporting the unraveling of unclarities in the SBD text. During the modeling process, the most important decisions were recorded, and some of them are used in the explanation of the results. We show examples of the resulting model to illustrate how the method is applied. The complete model is not publicly available due to intellectual property rights, but we have made a more elaborate excerpt of the case available via [12]. The next two sections discuss the execution of the steps Grammatical analysis and Text-to-model transformation as explained in Sections 5 and 6.

7.2. Grammatical Analysis of the History SBD

The GA starts with Extracting phrases from the input text. We followed the guideline "Use a structure to separate input sentences", as described in [8], and created Table 1, which shows the sentences that are selected from the History SBD, and the phrases that are extracted from them. In each row, the first column contains the input sentence, and the second column has one or more extracted phrases. After the extraction, we Determined the relevance of each phrase together with the domain expert, and Eliminated homonyms and synonyms across the phrases. The last column explains the analysis decisions made for the raised issues, possibly with a reference to the guideline on which a decision is based. After that we List the final phrases, which are the emphasized phrases in the table. For clarification, we explain one row (highlighted in gray) of the table: "Therefore, jobs that are too old will automatically be removed from the history".
First, we extracted two phrases: "TO remove job from history" and "Job IS too old". We already had "to Delete", so we asked what the difference is with "to Remove". The domain experts said they are synonyms, and chose the term "to Delete". Following the guideline "Detect type of adjectives and adverbs", we asked what kind of thing "too old" is. The involved domain experts could not immediately provide clarity and started discussing it. So, we applied the guideline "Postpone too long analysis discussions", kept the information as is, and postponed the discussion to the model engineering phase, which will solve the issue because there the discussions are more directed due to the use of specific viewpoints, like the object lifecycle, and model engineering criteria, like (data) normalization.

Table 1: Selection of the grammatical analysis

Input sentence | Extracted (including final) phrases | Decisions (including new final phrases)
When a print job is completed, it will be archived in the so-called "History". | TO complete job; TO archive job in history | To archive and to move are synonyms. Chosen: to Move.
The History is a job store that will be used as a local temporary job store and is not intended for long term archiving purposes. | History ISA job store; TO use history as local temporary job store; TO intend History for purpose | To intend and to use are ignored because of guideline "Ignore intention phrases".
Only jobs that have been completed will end up in the History. | TO complete job; Job TO end up in history | To end up is not a domain activity. "Job is in History" is a state after "to archive". Chosen: to move job from job store to job store.
Proof prints initiated from the waiting room and system jobs will not end up in the history when completed. | TO initiate proof print from waiting room; System job ISA job | To initiate is considered out of scope (it is in the scope of Job scheduling), but Proof print ISA job.
Also jobs that have been aborted or deleted will not end up in the History. | To abort job; To delete job |
The Settings editor provides functionality to clean up the History at specified time periods. | To clean up history at time period; TO specify time period | Use retain period instead of time period. Furthermore, it is the retain period of the History which is specified, giving TO specify retain period of history, and TO clean up History at retain period.
The following time periods can be specified: One day, One week, One month, Forever. | One day, one week, one month, forever ISA time period | One day, one week, one month, forever are possible values of retain period.
Jobs that have been longer in the History than the specified time period for the automatic cleanup are removed from the History. | History HAS jobs; To specify time period |
Therefore, jobs that are too old will automatically be removed from the history. | TO remove job from history; Job IS too old | To Remove and to Delete are synonyms. Chosen: to Delete. Following the guideline "Detect type of adjectives and adverbs", we asked what kind of thing "too old" is. We did not get a clear answer. So, we kept it.
If the history is disabled new completed jobs will be removed from the system, so they will not end up in the history. | TO disable history; TO complete job; TO remove job from system | System and controller are synonyms. Chosen: controller. Giving: Controller HAS history.
A job can be reprinted from the History by copying them from history to waiting room. | TO reprint job from history; TO copy job from history to waiting room | Is "reprint" the activity or "copy"? Answer: To copy. Reprint is the intention. And, what is a waiting room? Answer: Waiting room ISA job store. Giving: TO copy job from job store to job store.
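As an aside, the synonym decisions recorded in Table 1 amount to a simple term substitution over the extracted phrases. The Python snippet below is our own illustration of that normalization step; the synonym pairs are taken from the decisions in Table 1, while the function itself is not part of MuDForM.

```python
# Synonym decisions from Table 1: the chosen term replaces its synonym in the final phrases.
synonym_choices = {
    "to archive": "to move",
    "to remove": "to delete",
    "time period": "retain period",
}

def normalize(phrase_text: str) -> str:
    """Rewrite an extracted phrase so that every synonym is replaced by the chosen term."""
    result = phrase_text.lower()
    for synonym, chosen in synonym_choices.items():
        result = result.replace(synonym, chosen)
    return result

print(normalize("TO remove job from history"))  # -> "to delete job from history"
```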
7.3. Text-to-model Transformation for the History Domain Model

This section discusses the creation of the initial models from the results of the grammatical analysis. The first steps are Identify candidates and Classify candidates as described in Section 6. In this case, every term becomes a domain class if it is a noun, or a domain activity if it is a verb. "Too old" is an adjective that probably indicates a possible value of a context class, but we classified it as a class, for the reasons explained in the previous section. The next step is to Identify the specification spaces. We used the guideline "Begin with one context, one domain, and one feature", because the case was relatively small and there were no existing specification spaces. This led to the specification spaces History domain, History feature, and Context. The next step is to Declare and allocate the model elements to the specification spaces. We have allocated all terms to the History domain by following the guideline "In case of doubt, put a candidate term in the domain", except for Retain period and its possible values, which are allocated to the Context. The last step is to Create the initial models from the phrases, which resulted in Figures 4a and 4b. (The interaction view is compliant with the UML metamodel [13], because classes and activities are both classifiers, and, as such, can have association relations.) All the emphasized phrases are present in the diagrams. The more elaborate report of the case study contains more phrases (see [12]).

Figure 4: Initial model of the History domain (UML notation). (a) Interaction view; (b) Static view

8. Discussion

In the following we reflect on the research questions and discuss the findings from the case study. We first discuss the support for GA (RQ1 from Section 3) and how it fits in the rest of MuDForM (RQ2). Figure 2a presents the steps for the conversion of a text into an initial model. The steps form a clear structure for organizing this process. Each step can be planned and executed accordingly. However, during the case study we found out that in practice it is easier to first focus on the basic elements of the domain model, and not on the constraints and other aspects of the feature model. This means that there are at least two iterations. The first iteration focuses on the extraction of static phrases and state phrases, and of interaction phrases that have no actor, i.e., most phrases starting with "TO". After that, the model engineering is conducted until the domain model is stable. The second iteration is about the extraction of interaction phrases with actors, and about constraint phrases, which can immediately be rewritten to match the created domain model. This second iteration is also a validation of the created domain model. Namely, all the constraints should be expressed in terms of the domain model and possibly the context model.
If not, then either the constraint phrase is unclear or incorrect, or the domain model must be adapted. Thanks to this insight, we have added the guideline "First do the domain model, then the feature model" to the MuDForM method flow [8]. We also found that the usefulness of this and other guidelines depends on the purpose of the input text, which in this case was describing system behavior. If the text's purpose is different, e.g., a set of requirements, a process description, or a pure explanation, then the usefulness of the guidelines might shift. This is even more apparent when the analyzed specifications are not pure natural language. Further research is needed to take this aspect into account.

Section 4 presents how the steps for the text-to-model conversion fit into MuDForM. The partial metamodel of Fig. 3 addresses how the concepts fit. All the possible values for Term type and Phrase type correspond to classes and relations from the rest of the MuDForM metamodel [8]. The fact that we do not have all the classes from the MuDForM metamodel as a possible term type is due to two pragmatic reasons. First, we have only put classes in the metamodel that we have actually used in one of our past modeling projects. Second, the main purpose of the model engineering phase, which comes after the phase described in this paper, is to bring preciseness, consistency, and completeness to the model. The modeling environment is more suitable for that than the natural language environment. However, it is possible that we change the possible Phrase types and Term types due to new insights in later projects.

The above discussion only pertains to the integration of GA in MuDForM. We think that similar constructs should be applied when support for GA is integrated in other modeling methods. The following describes the general aspects of such an integration.

On the metamodel. The presented metamodel (Fig. 3) has concepts that are specific for GA, which are related to the MuDForM modeling concepts via the classes Phrase type and Term type. For another method, other phrase types and term types may be used. For example, most domain modeling methods do not have a primary modeling concept for specifying behavior, like the domain activity concept in MuDForM. They just model classes, attributes, and relations between classes, and often capture behavior in class operations or in generic data-oriented operations like create, update, and delete.

On the notation. The case study uses tables and plain text for the notation. MuDForM itself does not prescribe a specific notation. When GA is integrated with another modeling method, it is possible to choose a notation that is close to the existing notation of that modeling method.

On the method steps. The four main steps of the MuDForM method flow (Fig. 2a) can be generalized into: Scoping, Discovery and Elicitation (for capturing specific knowledge from a knowledge source), Switch to modeling, and Model engineering. In general, the GA step can (partially) replace the Discovery and Elicitation step of another method. Having different modeling concepts may also imply that the step of switching from text to model will differ.

On the guidelines. Guidelines can be reused as is. But if other phrase types and term types are identified, which is very likely, the guidelines might need to be adjusted too.

9. Related Work

Deckers and Lago performed an SLR on domain-oriented specification techniques [3].
It identified several approaches that extract models from text [14, 15, 4, 16, 17, 18, 19, 20]. None of them, however, provides a metamodel for GA. MuDForM is based on the KISS method [4], which is the only approach from the mentioned SLR with an explicit phase and concepts for GA, and with a distinction between domain and feature. It does, however, not provide a metamodel, fine-grained method steps, or guidelines. Abirami et al. [20] give guidelines for conceptual modeling of non-functional requirements. They overlap with the MuDForM guidelines for extracting phrases, but do not distinguish an explicit intermediate step for GA. Arora et al. [15] present an approach for extracting domain models from natural-language requirements. They give guidelines for creating classes, associations, and attributes from sentences. Some of those guidelines are also present in MuDForM. The main difference is that they do not distinguish behavioral concepts, such as the domain activity concept in MuDForM, and do not distinguish between domain, feature, and context. Elbendak et al. [16] describe an approach for automatic generation of class diagrams from use case descriptions. They solved the issue of multiple binary associations representing one action by using n-ary associations. However, they too do not distinguish between domain, feature, and context, and let the creation of a class in the target model depend on the number of occurrences that its corresponding noun has in the text. The same holds for the paper by Sagar and Abirami [17], which reuses and improves many of the rules given by Elbendak, and introduces a clear distinction between a strict text-to-model transformation and suggesting model candidates. However, it is limited to models that can be captured fully in standard UML class diagrams. Ibrahim and Ahmad [18] introduce a tool for the automatic extraction of class diagrams from textual requirements, which follows many of the rules from the other papers. Compared to MuDForM, these approaches lose semantics in the transition from text to model, regarding the different specification spaces (domain, feature, context) and the way behavior is captured. Repairing this semantic loss in the model would require going back to the input text to perform the GA anyway. Although we are open to automating part of the text-to-model process, we think that the involvement of domain experts in the GA process is essential. They not only provide missing information and help to eliminate homonyms and synonyms, but often also feel more comfortable with discussing natural language sentences than with discussing graphical models, which mostly have their own specific metamodel. The paper by Hoppenbrouwers et al. [19], which is based on the KISS method [4], makes a claim for partially automating the text-to-model phase, such that domain experts are still actively involved via natural language. MuDForM also supports the involvement of domain experts via the verbalization of models in natural language, which is also addressed by Proper et al. [14], Kristen [4], and Hoppenbrouwers et al. [19]. The method steps List the final phrases, Identify candidates, and Create initial models can easily be automated. In an experiment, we have tried to automate the step Extract phrases. However, we observed that this leads to an abundance of irrelevant phrases, which cost more effort to discard than the time it saved compared to doing the extraction manually.
There are more papers about the transformation of text into models, e.g., the 20 primary studies in the SLR of Yue et al. [21]. They all have in common that they focus on the transformation from text to model, but do not consider an explicit model engineering phase with main principles similar to MuDForM's. For example, they do not separate domain, feature, and context, and they do not have modeling concepts for integrating static and behavioral properties in a model. However, some of the studies might contain useful guidelines for the text-to-model phase of MuDForM, which we will investigate.

10. Conclusion and Future Work

This paper describes the MuDForM methodical support for converting a text into an initial model, and reports on an industrial case study. In doing so, we observe that the defined metamodel and method steps are quite mature, as we did not detect relevant knowledge in the case text that we could not capture. The guidelines, however, are far from complete, because we easily found new ones during the relatively small case study. The results from our study fill an important gap in the state of the art, which to the best of our knowledge does not provide such methodical support in the first place. They lay the foundation for our future work on building a validated and reusable set of guidelines, for which we plan the following: (i) building a community that actively validates, identifies, and manages guidelines; (ii) conducting a literature review to find and analyze guidelines from natural language processing approaches, e.g., the primary studies from [21], to possibly integrate them in the GA step of MuDForM. To facilitate industrial adoption, we plan to create a MuDForM handbook for practitioners, and to manage its evolution via an open platform, as a replacement of the document that contains the method definition [8]. We are currently investigating the requirements and possibilities for a modeling tool that supports MuDForM, in order to replace MS Word and Enterprise Architect [22].

References

[1] T. Kosar, S. Bohra, M. Mernik, Domain-specific languages: A systematic mapping study, Information and Software Technology 71 (2016) 77–91.
[2] G. Czech, M. Moser, J. Pichler, A systematic mapping study on best practices for domain-specific modeling, Software Quality Journal (2019) 1–30.
[3] R. Deckers, P. Lago, Systematic literature review of domain-oriented specification techniques, Journal of Systems and Software (2022) 1–23. doi:10.1016/j.jss.2022.111415.
[4] G. Kristen, Object Orientation, The KISS Method, From Information Architecture to Information System, Addison Wesley, 1994.
[5] J. Iivari, J. R. Venable, Action research and design science research - seemingly similar but decisively dissimilar, in: ECIS 2009 Proceedings, 2009. URL: https://aisel.aisnet.org/ecis2009/73/.
[6] K. Petersen, C. Gencel, N. Asghari, D. Baca, S. Betz, Action research as a model for industry-academia collaboration in the software engineering context, in: Proceedings of the 2014 International Workshop on Long-term Industrial Collaboration on Software Engineering, 2014, pp. 55–62.
[7] K. Petersen, C. Gencel, N. Asghari, S. Betz, An elicitation instrument for operationalising GQM+Strategies (GQM+S-EI), Empirical Software Engineering 20 (2015) 968–1005.
[8] R. Deckers, MuDForM Method Definition, Technical Report, Atom Free IT, online at https://github.com/robertdeckers/MuDForM, 2022.
[9] J. Smart, BDD in Action: Behavior-driven development for the whole software lifecycle, Simon and Schuster, 2014.
[10] K. Kronlöf, Method Integration: Concepts and Case Studies, John Wiley and Sons, 1993.
[11] R. Deckers, D. van den Brand, P. Lago, Modeling features in terms of domain models: MuDForM method definition and case study, https://research.vu.nl/en/publications/modeling-features-in-terms-of-domain-models-mudform-method-defini, under submission.
[12] R. Deckers, From Text to Model for the SBD History, Technical Report, Atom Free IT, online at https://github.com/robertdeckers/CaseStudySBDHistory, 2022.
[13] OMG, Unified Modeling Language Version 2.5.1, Technical Report, OMG, 2017. URL: https://www.omg.org/spec/UML/2.5.1/pdf.
[14] H. A. Proper, A. I. Bleeker, S. J. B. A. Hoppenbrouwers, Object-role modelling as a domain modelling approach, in: Proceedings of the Workshop on Evaluating Modeling Methods for Systems Analysis and Design (EMMSAD'04), 2004, pp. 317–328.
[15] C. Arora, M. Sabetzadeh, L. Briand, F. Zimmer, Extracting domain models from natural-language requirements: Approach and industrial evaluation, in: Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, ACM, 2016, pp. 250–260.
[16] M. Elbendak, P. Vickers, N. Rossiter, Parsed use case descriptions as a basis for object-oriented class model generation, Journal of Systems and Software 87 (2011) 1209–1223.
[17] V. B. V. Sagar, S. Abirami, Conceptual modeling of natural language functional requirements, Journal of Systems and Software 88 (2014).
[18] M. Ibrahim, R. Ahmad, Class diagram extraction from textual requirements using natural language processing techniques, in: 2nd International Conference on Computer Research and Development (ICCRD'10), IEEE, 2010, pp. 200–204.
[19] B. van der Vos, J. Hoppenbrouwers, S. Hoppenbrouwers, NL structures and conceptual modelling: the KISS case, in: Applications of Natural Language to Information Systems: Proceedings of the Second International Workshop, 1996, p. 197.
[20] S. Abirami, G. Shankari, S. Akshaya, M. Sithika, Conceptual modeling of non-functional requirements from natural language text, in: L. C. Jain, H. S. Behera, J. K. Mandal, D. P. Mohapatra (Eds.), Computational Intelligence in Data Mining - Volume 3, Springer India, New Delhi, 2015, pp. 1–11.
[21] T. Yue, L. C. Briand, Y. Labiche, A systematic review of transformation approaches between user requirements and analysis models, Requirements Engineering 16 (2011) 75–99.
[22] Sparx Systems, Enterprise Architect version 15.2, https://sparxsystems.com/products/ea/, 2021. Accessed: 2021-08-19.