<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>Efficient and Effective Approximate Query Answering in Probabilistic DeLP (Thesis Summary)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario A. Leiva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Cs. e Ing. de la Computacion, Universidad Nacional del Sur (UNS) &amp; Inst. de Cs. e Ing. de la Computacion (UNS-CONICET)</institution>
          ,
          <addr-line>San Andres 800, (8000) Bahia Blanca</addr-line>
          ,
          <country country="AR">Argentina</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>0</volume>
      <fpage>6</fpage>
      <lpage>09</lpage>
      <abstract>
<p>Within the domain of Artificial Intelligence, continuous developments and research across diverse domains are a daily occurrence, with machine learning taking the forefront in recent years. An ongoing challenge in this field is the need for explainability and interpretability. While knowledge-based systems offer promise in this regard, many systems in various domains grapple with information from multiple heterogeneous sources characterized by different levels of uncertainty stemming from gaps in knowledge, over-specification, or inherent unpredictability. Given the intricate nature of these domains, an ideal solution involves automated reasoning systems with human-in-the-loop capabilities. Such systems should possess several key features [1], including the ability to: (i) reason about evidence rigorously; (ii) account for evidence with probabilistic uncertainty; (iii) incorporate logical rules to derive conclusions from evidence; (iv) handle conflicting pieces of information, determining their relevance; and (v) provide an understandable and transparent status report, explaining the rationale behind conclusions (i.e., ensuring explainability and interpretability). The DeLP3E framework [1], tailored to meet these specific needs, is adopted in this work. A DeLP3E knowledge base comprises two components: an analytical model (AM) and an environmental model (EM), representing distinct aspects of the domain (cf. Figure 1). The AM models background knowledge, including rules, facts, or presumptions that inform domain knowledge. Conclusions are drawn from this model using defeasible logic programming (DeLP) [2]. In contrast, the EM focuses on probabilistic background knowledge, ensuring consistency by adhering to probabilistic distributions over possible domain states while satisfying constraints and the axioms of probability theory. The EM typically contains data such as evidence, intelligence reporting, or information about actions, software, and systems. DeLP3E effectively accommodates both probabilistic and defeasible uncertainty, allowing for a seamless combination of the two, including annotations of defeasible rules and presumptions with probabilistic events through annotation functions (AF).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Analytical Model (AM)</title>
<p>[Figure 1, AM panel: rules, facts, and presumptions (Ω ∪ Θ ∪ Δ ∪ Φ) give rise to arguments, which are evaluated through the dialectical process.]</p>
      </sec>
      <sec id="sec-1-2">
        <title>Environmental Model (EM)</title>
<p>[Figure 1, EM panel: a probabilistic model assigns a probability to each possible world, i.e., to each truth assignment over the EM variables.]</p>
        <p>AM output:
All literals with their
warrant status</p>
      </sec>
      <sec id="sec-1-3">
        <title>Annotation Function (af)</title>
<p>[Figure 1, AF panel: the annotation function maps each element of Ω ∪ Θ ∪ Δ ∪ Φ to a formula over EM variables.]</p>
      </sec>
      <sec id="sec-1-4">
        <title>DeLP3E Knowledge Base</title>
<p>[Figure 1, KB panel: the DeLP3E knowledge base combines the AM and the EM through the annotation function.]</p>
        <p>EM output:
Probability Distribution
over all possible worlds</p>
        <p>DeLP3E output:</p>
        <p>Literals with their
probability intervals in
the DeLP3E KB</p>
<p>In this model, answering a query for a literal means computing the probability interval with which it is warranted in the DeLP3E KB. The probability mass of the worlds where the queried literal is warranted (the warranting scenarios) yields the lower bound, while one minus the mass of the worlds that warrant the complement of the query yields the upper bound. Together, these bounds comprise the probability interval for the query.</p>
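<p>The interval computation just described can be sketched as follows. This is a minimal illustration, not the DeLP3E implementation: the <code>warrants(world, literal)</code> oracle stands in for running the DeLP dialectical process on the subprogram induced by a world, and <code>prob_of_world</code> is an assumed interface to the EM distribution.</p>

```python
from itertools import product

def query_interval(em_vars, prob_of_world, warrants, literal):
    """Exact probability interval for `literal` over all EM worlds.

    Lower bound: mass of worlds warranting the literal.
    Upper bound: one minus the mass of worlds warranting its complement.
    `warrants` is a hypothetical oracle standing in for the DeLP
    dialectical process run on the world's induced subprogram.
    """
    lower = 0.0
    mass_against = 0.0
    for values in product([False, True], repeat=len(em_vars)):
        world = dict(zip(em_vars, values))
        p = prob_of_world(world)
        if warrants(world, literal):
            lower += p          # warranting scenario for the query
        elif warrants(world, "~" + literal):
            mass_against += p   # warranting scenario for the complement
    return lower, 1.0 - mass_against
```

<p>Enumerating all 2<sup>n</sup> worlds is precisely the intractable step that motivates the approximation methods below.</p>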
<p>This is one of the main sources of computational intractability, since answering the query requires considering either all of the worlds in the EM or all of the subprograms of the AM. Addressing this intractability is the central problem of this work, which focuses on approximate query answering in probabilistic structured argumentation based on DeLP3E. The ultimate goal is to develop a suite of algorithms for tackling the intractability of this task, together with selection criteria for choosing the best one based on an analysis of the available information.</p>
<p><bold>2. Approximation Methods Based on Information from the KB</bold></p>
<p>One established approach for addressing intractability in such situations is to sample a subset of the solution space, aiming to obtain an approximate solution. Two sampling techniques were proposed to approximate the exact value. To determine the most suitable technique for a given scenario, we conducted an analysis of the information within each model, aiming to establish a range of metrics for assessing the configuration&#8217;s characteristics. Once these metrics were defined, we devised a collection of algorithms capable of leveraging them to approximate the exact interval.</p>
<p>
Regarding sampling techniques, we consider two types: world-based and subprogram-based. In world-based sampling, a subset of the possible EM worlds is chosen and, via the annotation function, each sampled world induces a subprogram of the AM; each such subprogram is a classical DeLP program [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] in which we can query for the status of a literal. Subprogram-based sampling, on the other hand, divides the universe of all possible subprograms into regions representing programs that warrant the query or its complement; each of these programs can be generated by multiple worlds through their annotations. In both cases there are also worlds or programs in which the status of the query is &#8220;undecided&#8221; or &#8220;unknown&#8221;. The objective is to sum the probability mass of the worlds that are warranting scenarios for the query or its complement. Given the incomplete nature of the process, some probability mass will remain unexplored (though a sound approximation is guaranteed).
</p>
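<p>World-based sampling can be sketched as below, reusing the same hypothetical <code>warrants</code>/<code>prob_of_world</code> interfaces as before (neither is the thesis implementation). The key property illustrated is soundness: mass that is never explored simply stays between the bounds, so the returned interval always contains the exact one.</p>

```python
import random

def sample_interval(em_vars, prob_of_world, warrants, literal,
                    n_samples, seed=0):
    """World-based sampling sketch yielding sound (possibly loose) bounds.

    Each distinct sampled world contributes its probability mass to the
    'for' or 'against' bucket; unexplored mass widens the interval but
    never makes it unsound.
    """
    rng = random.Random(seed)
    seen = set()
    mass_for = mass_against = 0.0
    for _ in range(n_samples):
        values = tuple(rng.choice([False, True]) for _ in em_vars)
        if values in seen:      # do not double-count a repeated world
            continue
        seen.add(values)
        world = dict(zip(em_vars, values))
        p = prob_of_world(world)
        if warrants(world, literal):
            mass_for += p
        elif warrants(world, "~" + literal):
            mass_against += p
    return mass_for, 1.0 - mass_against
```

<p>As more distinct worlds are sampled, the bounds tighten toward the exact interval.</p>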
<p>According to the information obtained using the metrics that determine the structure of each model, we can explore possible alternatives for deriving sampling-based approximation algorithms and run experiments over different configurations. Our ultimate goal is to develop a set of decision criteria to select the best algorithm for the job, taking into account the inherent trade-offs between execution time (including the cost of computing any necessary approximate metrics) and the accuracy of the result obtained.</p>
<p><bold>3. KB Metrics, Master Algorithm, and Benchmark Generators</bold></p>
<p>To base technique selection on the best available data, we analyzed the information available in each model in order to define a series of metrics that characterize the current configuration.
KB Metrics. For the metrics of the input KB, two main attributes were established: size and complexity. The first is a quantitative value given by the number of elements in the model, while the second relates to the difficulty of working with and generating the structures associated with the model. The defined metrics are listed below, each annotated with the attribute it measures (size or complexity) and whether it can be computed tractably (*) or only approximated (∼):
• Analytical Model (AM):
– #Rules and #Facts: number of rules and facts in the program, resp. (size, *)
– ALE: average argument length, i.e., defeasible rules present in an argument (complexity, ∼)
– Average height of dialectical trees (complexity, ∼)
– Number of dialectical trees (size, ∼)
• Environmental Model (EM):
– #RandomVars and #PGM_Arcs: number of random variables and of arcs in a probabilistic graphical model (PGM), resp. (size, *)
– PGM_TW: treewidth of a PGM (such as a BN); approximations of this metric can be computed (complexity, ∼)
– ent: entropy of the probability distribution represented by the model (complexity, ∼)
• Annotation Function (AF):
– %AF_ann and AF_Comp: percentage of AM formulas that are annotated, and the complexity of each annotation itself, resp. (size and complexity, *)</p>
<p>
Master Query Answering Algorithm. We defined a series of sampling algorithms keyed to the values of the presented metrics. For example, consider a scenario with 10 elements in the AM (#Rules+#Facts), 50 in the EM (#RandomVars), and low AF_Comp (each annotation uses only one EM variable, without operators). Here we can approximate the probability of a query by sampling subprograms, since the number of subprograms is smaller than the number of worlds (2<sup>10</sup> &lt; 2<sup>50</sup>), and the annotations associated with subprograms are simple to evaluate (conjunctions of variables or their negations). On the other hand, if the annotations are complex (using several variables, connectives, and negations, e.g., an annotation of the form a ∨ b → ¬c), then sampling by worlds is simpler, even if there are more elements in the EM than rules in the AM. This is because obtaining the subprogram for a sampled world is easy (substitute the sampled values for the random variables, evaluate each annotation formula, then query for the literal in the generated program), whereas computing the probability of a complex expression in the EM is not.</p>
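<p>The parenthetical recipe above (substitute the sampled values, evaluate each annotation, keep the elements whose annotation holds) can be sketched as follows. Representing an annotation as a Boolean function over EM variables is our simplification for illustration (real DeLP3E annotations are formulas), and the rule strings are made-up examples.</p>

```python
def induce_subprogram(annotated_am, world):
    """Keep exactly the AM elements whose annotation holds in `world`.

    annotated_am: list of (element, annotation) pairs, where annotation
    is a callable mapping a world (dict of EM variable values) to bool.
    The result is a classical DeLP program that can then be queried.
    """
    return [elem for elem, ann in annotated_am if ann(world)]
```

<p>The induced program is then queried for the literal with standard DeLP machinery.</p>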
<p>Note that the examples above expose an asymmetry between the AM and the EM: while a given world can generate only a single program, a single program can be generated by a set of worlds (since the formulas in annotations can have many models). Consider a case in which the EM is a Bayesian network with high treewidth and entropy. This leads to slower, more complex, and less guided sampling; thus, an algorithm that samples worlds at random is not recommended, and it is better to decide in a previous step which worlds to sample (e.g., weighted sampling given the BN) in order to optimize resources.</p>
<p>
Based on the characterization of the size and complexity of DeLP3E models, we can define a set of criteria to determine different levels of complexity, which can be represented as branches of a decision tree whose leaf nodes determine the sampling algorithm to choose. The purpose of this tree is to serve as a guide for selecting the approximation algorithm based on the values of the metrics and the cost of computing and/or approximating them. This master query answering algorithm is presented in detail in the thesis [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
</p>
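<p>A leaf-level sketch of such a decision tree might look as follows. The metric keys, thresholds, and branch order here are invented for illustration (the thesis derives the actual criteria empirically); they are not taken from [4].</p>

```python
def choose_sampler(metrics):
    """Illustrative selection criteria over KB metrics.

    metrics: dict with hypothetical keys 'af_comp', 'am_size',
    'em_size', 'pgm_treewidth', 'entropy'.  Thresholds are made up.
    """
    if metrics["af_comp"] == "low" and metrics["am_size"] < metrics["em_size"]:
        # Fewer subprograms than worlds and cheap annotations:
        # sample subprograms.
        return "subprogram-based"
    if metrics["pgm_treewidth"] > 20 or metrics["entropy"] > 0.9:
        # Hard-to-guide EM: pick worlds in a prior weighted step.
        return "weighted world-based"
    return "world-based"
```

<p>Each branch trades off the cost of computing (or approximating) the metrics against the expected accuracy of the resulting sampler.</p>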
<p>Automatic Generation of Benchmarks. After developing approximation methods, the
need arises to evaluate them under different conditions of size and complexity to analyze
their behavior and performance. Towards this end, we developed generators designed to yield
synthetic instances of each of the DeLP3E components, and thus be able to create our own
datasets to act as benchmarks in our experiments. In particular, the objective was to generate
DeLP programs, Bayesian Networks (to be used as a probabilistic model in the environment
model) and Annotation Functions that relate the elements of the other two models. Each
component generator was designed with parameters that allowed flexibility and range in terms
of their capacity to yield instances of varied size and complexity, in order to map instances to
each branch of the decision tree.</p>
<p>Here, we only briefly describe the main aspects that guide the creation of models in each generator. For analytical model generation, we build DeLP programs in three stages: (Stage 1) the basic components on which more complex structures will be built are generated; here the facts and presumptions are created. (Stage 2) Arguments are created following an organization by levels, where each level groups arguments of a certain length (the number of rules that make up the argument) and is constructed using arguments from lower levels. (Stage 3) Defeaters and dialectical trees are generated; for this, the structure of an argument is taken and its defeater is constructed using a greater number of rules and supporting the complement of the conclusion of the initial argument. This entire process is guided by a set of parameters that allow the generation of structures that will later impact the values of the model&#8217;s metrics.</p>
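<p>The three stages can be sketched roughly as below. All names, the shape of rules as (head, body) pairs, and the crude way defeater bodies are lengthened are simplifications for illustration only, not the thesis generator itself.</p>

```python
import random

def generate_delp_program(n_facts, n_levels, args_per_level, seed=0):
    """Toy three-stage DeLP program generator.

    Stage 1: base facts/presumptions.  Stage 2: arguments organized by
    levels, each built from conclusions of lower levels.  Stage 3: one
    defeater per argument, with a longer body, supporting the
    complement of the argument's conclusion.
    """
    rng = random.Random(seed)
    facts = [f"f{i}" for i in range(n_facts)]                    # Stage 1
    rules, conclusions = [], list(facts)
    for level in range(1, n_levels + 1):                         # Stage 2
        for j in range(args_per_level):
            body = rng.sample(conclusions, min(level, len(conclusions)))
            head = f"a{level}_{j}"
            rules.append((head, body))
            conclusions.append(head)
    defeaters = [("~" + head, body + ["extra_support"])          # Stage 3
                 for head, body in rules]
    return facts, rules, defeaters
```

<p>Varying the stage parameters shifts the resulting values of the AM metrics (#Rules, ALE, tree height, and so on).</p>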
<p>As for the Bayesian network generator, it begins by creating a directed acyclic graph with as many variables and arcs as required. The conditional probability tables of each variable are then modified to reduce or increase the entropy of the network. Finally, the annotation function generator takes a subset of AM elements and, for each of them, creates and associates a logical formula built from EM variables and logical connectors. The details of all the parameters that guide these generators, as well as their design and implementation, are presented in Chapter 4 of the thesis [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
</p>
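<p>The two knobs this generator turns, graph shape and CPT entropy, can be sketched as below under the simplifying assumption of binary variables. The <code>sharpen</code> helper is an invented mechanism for tuning a Bernoulli CPT entry toward or away from determinism; the thesis's actual procedure may differ.</p>

```python
import math
import random

def random_dag(n_vars, n_arcs, seed=0):
    """Random DAG: arcs only go from lower- to higher-indexed nodes,
    which rules out cycles by construction."""
    rng = random.Random(seed)
    candidates = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
    return rng.sample(candidates, min(n_arcs, len(candidates)))

def sharpen(p, t):
    """Push a Bernoulli CPT entry toward 0/1 (t > 1 lowers entropy,
    t < 1 raises it); an invented helper for entropy tuning."""
    a, b = p ** t, (1 - p) ** t
    return a / (a + b)

def entropy(p):
    """Entropy (in bits) of a Bernoulli(p) variable."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
```

<p>Applying <code>sharpen</code> across all CPT entries moves the network's overall entropy, which is exactly the EM metric (ent) the benchmarks vary.</p>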
        <p>Based on all these developments, several benchmark scenarios were generated and each
approximation method was tested. The results of these tests informed the process of developing
the master approximate query answering algorithm described above.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Conclusions and Future Work</title>
<p>This work focuses on efficiently computing query answers in probabilistic structured argumentation formalisms, specifically the DeLP3E model. This formalism is instrumental in handling inconsistent, incomplete, and probabilistic information. To address this challenge, our work introduces mechanisms that approximate answers using algorithms that make the most of the information available in each component of the model.</p>
<p>Future research will involve a broader empirical evaluation exploring various sampling
methods. The aim is to optimize sampling processes to avoid redundancy and wasted effort.
Additionally, we will investigate the use of different probabilistic models, ranging from complex
ones like Markov Logic Networks to simpler ones like sets of pairwise independent random
variables, each offering unique opportunities for approximations. Finally, we will explore the
application of machine learning techniques to guide sampling during the approximation process.
Acknowledgments. This Ph.D. thesis was carried out at Universidad Nacional del Sur (UNS)
under the supervision of Gerardo I. Simari and Marcelo A. Falappa.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shakarian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Simari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moores</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Paulo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parsons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Falappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aleali</surname>
          </string-name>
          ,
          <article-title>Belief revision in structured probabilistic argumentation</article-title>
          ,
          <source>Annals of Mathematics and Artificial Intelligence</source>
          <volume>78</volume>
          (
          <year>2015</year>
          )
          <fpage>259</fpage>
          -
          <lpage>301</lpage>
. URL: https://doi.org/10.1007/s10472-015-9483-5. doi:10.1007/s10472-015-9483-5.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Simari</surname>
          </string-name>
          ,
          <article-title>Defeasible logic programming: an argumentative approach</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          <volume>4</volume>
          (
          <year>2004</year>
          )
          <fpage>95</fpage>
          -
          <lpage>138</lpage>
. URL: https://doi.org/10.1017/s1471068403001674. doi:10.1017/s1471068403001674.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Leiva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shakarian</surname>
          </string-name>
          ,
<string-name>
            <given-names>G. I.</given-names>
            <surname>Simari</surname>
          </string-name>
          ,
<article-title>Argumentation-based query answering under uncertainty with application to cybersecurity</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          <volume>6</volume>
          (
          <year>2022</year>
          )
          <fpage>91</fpage>
          . URL: https://doi.org/10.3390/bdcc6030091. doi:10.3390/bdcc6030091.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Leiva</surname>
          </string-name>
          ,
<article-title>Herramientas Prácticas para Argumentación Estructurada Probabilística con Aplicación a Ciberseguridad [Practical Tools for Probabilistic Structured Argumentation with Application to Cybersecurity]</article-title>
          ,
          <source>Ph.D. thesis</source>
          , DCIC,
          <source>Universidad Nacional del Sur (UNS)</source>
          ,
          <year>2022</year>
          . URL: https://repositoriodigital.uns.edu.ar/handle/123456789/6314.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>