<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Application of process-oriented case-based reasoning to the delimitation of mobile genetic elements in bacterial chromosomes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Toufik Hamadouche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université de Lorraine</institution>
          ,
          <addr-line>CNRS, LORIA, F-54000 Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Lorraine, INRAE, DynAMic</institution>
          ,
          <addr-line>F-54000 Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This PhD project investigates the use of process-oriented case-based reasoning (PO-CBR) to formalize and automate expert reasoning in the field of microbiology. The goal is to represent expert reasoning as structured process cases, where each case encodes a sequence of reasoning steps that can be reused to solve new problems. The method is applied to the problem of delimiting mobile genetic elements (MGEs) in bacterial genomes, a task that traditionally relies on manual biological expertise. By formalizing this reasoning using PO-CBR, we build a case base that can be reused to delimit different MGEs. The approach integrates adaptation mechanisms to handle failures and adjust the reasoning. An initial application of the case base to 254 manually annotated MGEs in 124 bacterial genomes shows a high success rate (96.8% of the elements were correctly delimited). This study demonstrates the feasibility of encoding biological expertise into a structured automated reasoning system that offers a reliable alternative for identifying MGEs in bacterial genomes.</p>
      </abstract>
      <kwd-group>
        <kwd>PO-CBR</kwd>
        <kwd>application to genomics</kwd>
        <kwd>identification of mobile genetic elements</kwd>
        <kwd>knowledge representation</kwd>
        <kwd>expert reasoning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Case-Based Reasoning (CBR [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) is an approach that solves new problems by relying on past experiences,
each of which is represented by a case. In this research, I use Process-Oriented Case-Based
Reasoning (PO-CBR [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]), an extension of CBR in which these experiences are represented as sequences
of steps. The aim of my thesis is to formalize biological expertise in the form of episodes using the
PO-CBR approach, thus capturing and reusing the knowledge of biologists in process cases. The final
objective is to develop an explanatory system based on expert knowledge defined by biologists, which
improves both the precision and efficiency of the task while also ensuring the explainability of the
decisions made.
      </p>
      <p>
        This approach is applied to a microbiological problem: the precise delimitation (or identification)
of Mobile Genetic Elements (MGEs) in bacterial chromosomes. MGEs are DNA segments capable of
moving between bacteria through the mechanism of bacterial conjugation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Their precise delimitation
is important, as they frequently disseminate antibiotic resistance and virulence genes at high rates,
with a significant impact on human health. Currently, this task relies on the manual expertise of
biologists [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Although previous work has addressed the automatic delimitation of MGEs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], it
often lacks precision in identifying the exact start and end positions of these elements in bacterial
genomes and lacks explainability of the results obtained. A paper about this work has been accepted to
ICCBR-2025.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Research Plan</title>
      <sec id="sec-2-1">
        <title>2.1. Biological background on mobile genetic elements</title>
        <p>Mobile Genetic Elements (MGEs) are DNA segments that are integrated into the bacterial genome.
These elements can excise themselves from a bacterium, transfer, and then reintegrate into the genome
of another bacterium. A microbiology expert has studied these elements for years and, through his
observations, has established the knowledge required to delimit MGEs, that is, the
reasoning experience needed to find their boundaries. This reasoning follows a sequence of steps based on
genomic characteristics of these elements, such as the integrase gene (the gene responsible for the excision
and integration of the MGE) and the target gene (or insertion site) where the element integrates.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Research Objectives</title>
        <p>Our research aims to automate the delimitation of mobile genetic elements (MGEs) by formalizing the
reasoning of a microbiology expert into reusable process cases using the PO-CBR approach. To achieve
this, the following objectives have been defined:</p>
        <list list-type="order">
          <list-item><p>Capture expert reasoning by formalizing representative MGE delimitation cases
as process cases, based on integration mechanisms involving integrase and target gene classes.</p></list-item>
          <list-item><p>Define a case representation model in which each problem is described by structured attributes
(e.g., integrase class, target gene class) and each solution is a sequence of reasoning steps applied
by the expert.</p></list-item>
          <list-item><p>Implement a retrieval mechanism to identify and reuse a relevant process case from the case
base to solve a new delimitation problem.</p></list-item>
          <list-item><p>Develop and integrate adaptation mechanisms that can adjust retrieved solutions when
direct reuse fails (e.g., when direct repeats (DRs) differ at the nucleotide level but match at the amino acid level).</p></list-item>
          <list-item><p>Evaluate the system on a collection of previously annotated MGEs to measure its ability to
reproduce expert-level delimitation results.</p></list-item>
          <list-item><p>Ensure explainability at both the system and user levels by making the reasoning steps traceable
and by allowing rule-based adaptations to be triggered from failure analysis.</p></list-item>
        </list>
        <p>The purpose of these objectives is to provide a system of reasoning that is structured and adaptable
for precise MGE delimitation.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Approach / Methodology</title>
        <p>Definition of the case base. The modeling of the microbiology expert’s knowledge as process cases
led to the establishment of the case base (CB). A source case (an element of CB) is a pair (x, y), where x
denotes a source problem and y is its corresponding solution. A delimitation problem x is formalized
by three attributes:</p>
        <list list-type="bullet">
          <list-item><p>idGenomes: the bacterial genome identifiers characterizing a bacterial strain;</p></list-item>
          <list-item><p>targetGeneClass: the target gene class where the mobile element is integrated;</p></list-item>
          <list-item><p>integraseClass: the integrase class involved in the excision and integration processes of an MGE.</p></list-item>
        </list>
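        <p>As an illustration, the problem part x of a case could be held in a small immutable record. This is a minimal sketch, not the system's actual data model: the class name, the Python field names and the example values are assumptions chosen for readability, and only the three attributes themselves come from the description above.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DelimitationProblem:
    """Problem part x of a case, described by the three attributes above."""
    id_genomes: Tuple[str, ...]  # idGenomes: genome identifiers of the strain
    target_gene_class: str       # targetGeneClass
    integrase_class: str         # integraseClass


# Placeholder values, for illustration only (not taken from the case base).
x = DelimitationProblem(
    id_genomes=("GENOME_0001",),
    target_gene_class="tRNA-Lys",
    integrase_class="tyrosine integrase",
)
print(x.target_gene_class, "/", x.integrase_class)
```
]]></preformat>
        <p>Freezing the record keeps a stored source problem from being modified by accident when a case is reused.</p>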
        <p>The solution y applied by the expert to solve the problem x corresponds to an ordered sequence of
steps together with knowledge about the MGE to be delimited, for example the number of genes between the
integrase and target genes and the potential size of the MGE boundaries. The first two steps in the
delimitation process consist of (1) identifying, within the bacterial genome, the integrase genes specified in the attributes of the MGE
delimitation problem and (2) locating, within the same genome, the target gene also specified in the
problem attributes. Currently, the process case solution y is implemented
as a sequence of reasoning steps defined in Python. Each step corresponds to an operation such as
locating an integrase, validating a gene distance, or identifying boundary DRs. Ongoing work will extend
this structure into a graph-based representation to allow conditional paths and clearer process modeling.</p>
        <p>A delimitation result for an MGE is obtained by executing y on x, that is, by computing y(x). This result
includes the genomic coordinates of the element’s integrase and target gene, as well as the genomic
positions of the left and right boundaries, which flank the element and precisely define its position
within the bacterial genome.</p>
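        <p>The execution of y on x can be pictured as a list of step functions applied in order to a shared context. The sketch below covers only the first two steps named in the text (locating the integrase and the target gene); the boundary steps are omitted. All function names, the dictionary layout and the toy annotations are hypothetical and do not reproduce the actual implementation.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, List

# A step reads the shared context and returns an extended copy of it.
Step = Callable[[Dict], Dict]


class StepFailure(Exception):
    """Raised by a step that cannot complete; the message names the failure."""


def locate_integrase(ctx: Dict) -> Dict:
    # Step 1: find a gene of the integrase class given in the problem.
    for gene in ctx["annotations"]:
        if gene["class"] == ctx["problem"]["integraseClass"]:
            return {**ctx, "integrase": (gene["start"], gene["end"])}
    raise StepFailure("integrase not found")


def locate_target_gene(ctx: Dict) -> Dict:
    # Step 2: find a gene of the target gene class given in the problem.
    for gene in ctx["annotations"]:
        if gene["class"] == ctx["problem"]["targetGeneClass"]:
            return {**ctx, "target_gene": (gene["start"], gene["end"])}
    raise StepFailure("target gene missing")


def run_process_case(steps: List[Step], problem: Dict, annotations: List[Dict]) -> Dict:
    """Execute y on x: apply each step in order and return the final context."""
    ctx = {"problem": problem, "annotations": annotations}
    for step in steps:
        ctx = step(ctx)
    return ctx


# Toy gene annotations and problem (placeholder values only).
annotations = [
    {"class": "tRNA-Lys", "start": 1000, "end": 1075},
    {"class": "tyrosine integrase", "start": 1200, "end": 2400},
]
problem = {
    "idGenomes": ["GENOME_0001"],
    "targetGeneClass": "tRNA-Lys",
    "integraseClass": "tyrosine integrase",
}
y = [locate_integrase, locate_target_gene]
result = run_process_case(y, problem, annotations)
print(result["integrase"], result["target_gene"])
```
]]></preformat>
        <p>Because each step raises an exception carrying a short message, a failed execution reports which step failed, which is the kind of information a later adaptation step can use.</p>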
        <p>Each case (x, y) is defined from a discussion with the microbiology expert, who describes a concrete
example of MGE delimitation. From this discussion, both the problem x and the reasoning process he
applied, represented by the solution y, are formalized.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Retrieval and reuse of a source case to solve a target problem</title>
        <p>To solve a new delimitation problem x<sub>tgt</sub>, a retrieval mechanism selects from CB a process case (x, y) that is relevant to x<sub>tgt</sub>.
This retrieval mechanism relies on the similarity between the attributes defining x and x<sub>tgt</sub>.</p>
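        <p>The text does not specify the similarity measure, so the following sketch assumes the simplest possible one: the fraction of class attributes (targetGeneClass, integraseClass) on which two problems agree. The function names, the case labels and the attribute values are hypothetical.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

Problem = Dict[str, str]
Case = Tuple[Problem, str]  # (x, label of the process y); y is only a label here


def similarity(a: Problem, b: Problem) -> float:
    """Fraction of the compared attributes on which two problems agree."""
    keys = ("targetGeneClass", "integraseClass")
    return sum(a[k] == b[k] for k in keys) / len(keys)


def retrieve(case_base: List[Case], x_tgt: Problem) -> Case:
    """Return the source case whose problem is most similar to x_tgt."""
    # max() keeps the first case encountered when several cases tie.
    return max(case_base, key=lambda case: similarity(case[0], x_tgt))


# Toy case base with placeholder attribute values.
case_base = [
    ({"targetGeneClass": "tRNA-Lys", "integraseClass": "tyrosine integrase"}, "process_A"),
    ({"targetGeneClass": "rpsI", "integraseClass": "serine integrase"}, "process_B"),
]
x_tgt = {"targetGeneClass": "rpsI", "integraseClass": "serine integrase"}
print(retrieve(case_base, x_tgt)[1])
```
]]></preformat>
        <p>A weighted or class-hierarchy-based measure could replace the exact-match comparison without changing the retrieval function.</p>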
        <p>The reuse of a process case (x, y) ∈ CB consists in applying y to x<sub>tgt</sub>: y(x<sub>tgt</sub>) is computed, and
there are then two possibilities:</p>
        <list list-type="bullet">
          <list-item><p>Either the computation y(x<sub>tgt</sub>) succeeds and returns the delimitation result for the target
problem.</p></list-item>
          <list-item><p>Or it returns a failure: the application of y fails to solve x<sub>tgt</sub>. In such cases, it may
be possible to adapt y so that it solves x<sub>tgt</sub>. This adaptation step is still under
development: a few adaptation rules have already been acquired to improve the system’s reuse
capabilities. Adaptation in our system applies to the solution process y, not to the stored source case
itself, and is triggered only when the process fails to handle x<sub>tgt</sub>.</p></list-item>
        </list>
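        <p>The reuse step with failure-triggered adaptation can be sketched as follows, using the example given in the objectives (DRs that differ at the nucleotide level but match at the amino acid level). This is an illustrative assumption about how such a rule might look, not the rules actually acquired: the function names, the failure message and the toy sequences are hypothetical, and comparing translations presumes that both repeats are read in the same coding frame.</p>
        <preformat><![CDATA[
```python
from itertools import product
from typing import Callable, Dict, Optional

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

Process = Callable[[Dict[str, str]], Dict[str, str]]


class DelimitationFailure(Exception):
    """Failure raised by a process; its message identifies the failure type."""


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon (trailing bases are ignored)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def strict_process(x_tgt: Dict[str, str]) -> Dict[str, str]:
    # Source solution y: both candidate DRs must be identical nucleotide strings.
    if x_tgt["left_dr"] != x_tgt["right_dr"]:
        raise DelimitationFailure("limits not found")
    return {"status": "delimited", "match": "nucleotide"}


def relax_to_amino_acids(y: Process) -> Process:
    # Adaptation rule: build a process that retries at the amino acid level.
    def adapted(x_tgt: Dict[str, str]) -> Dict[str, str]:
        try:
            return y(x_tgt)
        except DelimitationFailure:
            if translate(x_tgt["left_dr"]) == translate(x_tgt["right_dr"]):
                return {"status": "delimited", "match": "amino acid"}
            raise
    return adapted


# Adaptation rules indexed by the failure message that triggers them.
ADAPTATION_RULES = {"limits not found": relax_to_amino_acids}


def reuse(y: Process, x_tgt: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Apply y to x_tgt; on failure, try the matching adaptation rule once."""
    try:
        return y(x_tgt)
    except DelimitationFailure as failure:
        rule = ADAPTATION_RULES.get(str(failure))
        if rule is None:
            return None
        try:
            return rule(y)(x_tgt)
        except DelimitationFailure:
            return None


# CTT and CTC both encode leucine, so these toy repeats match after translation.
outcome = reuse(strict_process, {"left_dr": "CTTGCA", "right_dr": "CTCGCA"})
print(outcome)
```
]]></preformat>
        <p>The source process is left untouched: the rule wraps y in a new process, which matches the statement that adaptation applies to the solution being executed rather than to the stored case.</p>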
        <p>Failures during case reuse are automatically detected at execution time. For example, if the limits of an
MGE are not found or the target gene is missing, a failure message is generated. These messages are
used to trigger adaptation rules, when available. The adaptation rules are designed from recurring
failure patterns observed during testing, and their application is guided by the type of failure identified.</p>
        <p>Evaluation of the system. At the start of the acquisition and modeling of process cases, the expert
provided 291 informal cases, each corresponding to a distinct MGE to be delimited. From these, 37
process cases were selected to constitute the case base (CB). The selection is based on a biological diversity
criterion, ensuring broad coverage of integration mechanisms. The aim was to maximize the
coverage of the remaining 291 − 37 = 254 cases, which constitute the test base (TB). These informal cases
are real MGE delimitation problems that were manually annotated by the expert prior to the formalization
phase. They serve as ground-truth examples used to test the system’s ability to reproduce expert-level
reasoning.</p>
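        <p>One way to score the system against the test base is to count an element as correctly delimited only when both predicted boundaries equal the expert's annotation. The exact-match criterion, the function name and the toy coordinates below are assumptions made for illustration; they are not the evaluation data.</p>
        <preformat><![CDATA[
```python
from typing import Dict, Optional, Tuple

Boundaries = Tuple[int, int]  # (left boundary, right boundary) positions


def success_rate(predicted: Dict[str, Optional[Boundaries]],
                 annotated: Dict[str, Boundaries]) -> float:
    """Share of annotated MGEs whose predicted boundaries equal the expert's."""
    hits = sum(predicted.get(mge_id) == bounds for mge_id, bounds in annotated.items())
    return hits / len(annotated)


# Toy test base with four elements; None marks a failed delimitation.
annotated = {"mge1": (100, 900), "mge2": (1500, 4200), "mge3": (50, 700), "mge4": (10, 20)}
predicted = {"mge1": (100, 900), "mge2": (1500, 4200), "mge3": (50, 700), "mge4": None}
print(f"{success_rate(predicted, annotated):.1%}")
```
]]></preformat>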
        <p>
          A total of 246 elements (96.8%) were delimited precisely by the system; the remaining 8 (3.2%) exhibited
delimitation failures unrelated to the case base. The key limitation of the approach is its
reliance on external genomic annotations from public databases. External genomic databases (NCBI) and
tools such as ICEscreen [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] are used to extract gene annotations (e.g., integrases, tRNA genes), which
are required to execute the reasoning steps in each process case. The system depends on the availability
and accuracy of these annotations.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Progress Summary</title>
      <p>A PO-CBR system has been implemented, allowing the reuse of process cases on new delimitation
problems. The case base is operational, and the reuse mechanism works efficiently on bacterial strains
of the Streptococcus group. Current work focuses on the adaptation of process cases. The objective
is to improve reuse on Streptococcus strains and to extend the approach to other bacterial groups. This
requires further acquisition and formalization of expert knowledge, specifically for adaptation rules.</p>
      <p>We are also investigating the XAI (eXplainable Artificial Intelligence) dimension of the system. We
consider explainability at two levels: (1) for the user, to understand the reasoning steps behind a result,
and (2) for the system itself, since adaptation rules are triggered by explanations generated from failure
analysis. For example, when a process case fails on a new problem, the system displays a detailed trace
of the failed step (e.g., unmatched limits or a missing target gene) and uses this information to trigger an
appropriate adaptation rule, if available.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>This work demonstrates the relevance of PO-CBR for automating the delimitation of mobile genetic
elements (MGEs) in bacterial genomes, a task traditionally performed through manual expert analysis.
The developed approach retrieves and applies process cases according to problem similarity, and
integrates adaptation mechanisms where required. Currently, the PO-CBR system has shown high
delimitation accuracy on bacterial strains from the Streptococcus group, precisely identifying MGE
boundaries while making the reasoning process explainable.</p>
      <p>In the future, we will focus on extending the approach to other bacterial groups and new types of
MGEs. A key objective is to define further adaptation rules in order to better handle cases not covered
by the current case base, while balancing the expansion of the case base with the evolution
of the adaptation rules. The external data used will also be improved by integrating more
comprehensive data sources. Process cases are currently implemented as Python files. This implementation will
be refined by integrating a graphical representation that allows intuitive modification and export to Python for
execution.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Perplexity.ai exclusively as a research assistant to
locate background information on MGE biology, and Reverso for grammar and spelling checks and sentence
reformulation. All content was reviewed and verified by the authors, who take full responsibility for
the final publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Riesbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Schank</surname>
          </string-name>
          ,
          <source>Inside Case-Based Reasoning</source>
          , Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Minor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Recio-García</surname>
          </string-name>
          ,
          <article-title>Process-oriented case-based reasoning</article-title>
          ,
          <source>Information Systems</source>
          <volume>40</volume>
          (
          <year>2014</year>
          )
          <fpage>103</fpage>
          -
          <lpage>105</lpage>
          . URL: https://doi.org/10.1016/j.is.2013.06.004. doi:10.1016/j.is.2013.06.004.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>Generalization of Workflows in Process-Oriented Case-Based Reasoning</article-title>
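          , in:
          <string-name>
            <given-names>I.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Eberle</surname>
          </string-name>
          (Eds.),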
          <source>Proceedings of the 28th International Florida Artificial Intelligence Research Society Conference (FLAIRS-28)</source>
          , AAAI Press, Hollywood, Florida, USA,
          <year>2015</year>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>396</lpage>
          . URL: https://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS15/paper/view/10437.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Bellanger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Payot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leblond-Bourget</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guédon</surname>
          </string-name>
          ,
          <article-title>Conjugative and mobilizable genomic islands in bacteria: evolution and diversity</article-title>
          ,
          <source>FEMS Microbiology Reviews</source>
          <volume>38</volume>
          (
          <year>2014</year>
          )
          <fpage>720</fpage>
          -
          <lpage>760</lpage>
          . URL: https://doi.org/10.1111/1574-6976.12058. doi:10.1111/1574-6976.12058.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ambroset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Coluzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guédon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-D.</given-names>
            <surname>Devignes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Loux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Payot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leblond-Bourget</surname>
          </string-name>
          ,
          <article-title>New insights into the classification and integration specificity of Streptococcus integrative conjugative elements through extensive genome exploration</article-title>
          ,
          <source>Frontiers in Microbiology</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <elocation-id>1483</elocation-id>
          . doi:10.3389/fmicb.2015.01483.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Coluzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guédon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-D.</given-names>
            <surname>Devignes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ambroset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Loux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Payot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leblond-Bourget</surname>
          </string-name>
          ,
          <article-title>A glimpse into the world of integrative and mobilizable elements in streptococci reveals an unexpected diversity and novel families of mobilization proteins</article-title>
          ,
          <source>Frontiers in Microbiology</source>
          <volume>8</volume>
          (
          <year>2017</year>
          ). URL: https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2017.00443/full. doi:10.3389/fmicb.2017.00443.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y.</given-names>
            <surname>Ou</surname>
          </string-name>
          ,
          <article-title>ICEberg 2.0: an updated database of bacterial integrative and conjugative elements</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>47</volume>
          (
          <year>2018</year>
          )
          <fpage>D660</fpage>
          -
          <lpage>D665</lpage>
          . URL: https://doi.org/10.1093/nar/gky1123. doi:10.1093/nar/gky1123.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guédon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Coluzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Payot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leblond-Bourget</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chiapello</surname>
          </string-name>
          ,
          <article-title>ICEscreen: a tool to detect Firmicute ICEs and IMEs, isolated or enclosed in composite structures</article-title>
          ,
          <source>NAR Genomics and Bioinformatics</source>
          <volume>4</volume>
          (
          <year>2022</year>
          )
          <elocation-id>lqac079</elocation-id>
          . URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9585547/. doi:10.1093/nargab/lqac079.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>