<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>EVIR: Workshop on AI for evidential reasoning, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>AI for Evidential Reasoning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ludi van Leeuwen</string-name>
          <email>l.s.van.leeuwen@rug.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roos Schefers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bart Verheij</string-name>
          <email>bart.verheij@rug.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Evidential Reasoning, Bayesian Networks, Belief updating, Hypotheses, Scenarios</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen.</institution>
          <addr-line>Nijenborgh 9, 9747 AG Groningen</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Information and Computing Sciences, Utrecht University.</institution>
          <addr-line>Princetonplein 5, 3584 CC Utrecht</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>9</volume>
      <issue>2025</issue>
      <abstract>
        <p>We summarize the first workshop on AI for Evidential Reasoning (AI4EVIR), held on December 9, 2025 in Turin, Italy. The workshop was co-located with JURIX 2025, the 38th International Conference on Legal Knowledge and Information Systems. Reasoning with evidence to establish relevant facts lies at the heart of legal reasoning. Technologies that allow us to reason about facts are evolving, and new kinds of evidence are becoming available (for example, from digital forensic science). At the same time, reasoning with evidence is a dynamic and complex process: practitioners must decide what evidence to collect and how to interpret it amid vast amounts of data. Selecting evidence and subsequently reasoning with it are complex tasks that may benefit from standardization and assistance by AI, for example to avoid probabilistic and other fallacies. The workshop aimed to bring together researchers working on evidential reasoning, in the broadest sense, as well as those with expertise in forensic science and evidence evaluation, in order to share their progress on handling various problems and to foster discussion and exchange between theoretical and applied perspectives on how AI can contribute to evidential reasoning in legal and investigative contexts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>https://www.ai.rug.nl/~verheij/ (B. Verheij). © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings. The CFP, deadlines, PC, and schedule of the workshop can also be found at https://aludi.github.io/AI4EVIR/.</p>
      <p>Henry Prakken presented on Bayesian reasoning about rare events. His presentation
focused on how reference classes or propositions need to be chosen carefully and
transparently in order to avoid (the appearance of) fallacious reasoning.</p>
      <p>In the second session, Bertram Ludäscher presented a proposal for a chain of LLMs for
trustworthiness, in which the LLMs together would create an argumentation graph that could
then serve as an artifact for further discussion. The artifact serves as externally verifiable
evidence, increasing the trustworthiness of the model.</p>
      <p>Then, Leya Hampson presented a study in which two different modelers independently
created a Bayesian network (BN) model of an entire case, highlighting the differences between
the modelers in going from the qualitative structure of the BN to its quantification.</p>
      <p>Next, Daira Pinto Prieto discussed advances in a qualitative logic framework for
reasoning with evidence, adding work on the certainty-dominance of evidence.
Certainty-dominance is a novel method to compare sets of evidence.</p>
      <p>Helen Qiao presented work comparing how humans and LLMs update on evidence from the
perspectives of the defense and the prosecution, and showed that both are conservative Bayesian
updaters. LLMs were shown to exhibit a recency bias in various roles and presentation styles.</p>
      <p>The talks concluded with a presentation by Henrik Palmer Olsen on evaluating credibility
assessments of Danish asylum cases with LLMs, covering the Prompt Valley of Death, an
LLM-based categorization of credibility assessments, and an analysis of temporal changes in
credibility assessments.</p>
      <p>Throughout the talks, the main themes that arose were the importance of practice and the
extent to which theory should be applied to it; the specification of propositions; and whether the
study of modeling reasoning with evidence should be descriptive or prescriptive. These themes
were further discussed by participants in the wrap-up at the end of the workshop.</p>
      <p>The Program Committee (PC) received a total of 9 submissions. Following a single-blind
reviewing process, each paper was peer-reviewed by at least two PC members. The committee
decided to accept all 9 papers, each containing original work. One paper was withdrawn due to
logistical issues. Two papers are not included in the proceedings at the request of the authors.
The authors of one accepted paper were unable to attend the workshop. There were 7 presentations
and one keynote at the workshop, and there are 6 papers in the proceedings.</p>
      <p>The specifics of the program were as follows.</p>
    </sec>
    <sec id="sec-2">
      <title>Keynote</title>
      <p>• Marouschka Vink - the evaluation of digital findings in forensic casework</p>
    </sec>
    <sec id="sec-3">
      <title>Paper presentations</title>
      <p>• Federico Costantini, Fausto Galvan, Francesco Crisci, Luca Baron and Pier Luca Montessoro - The Quality Assessment of LLM in Digital Forensics
• Anne Ruth Mackor and Henry Prakken - On Reporting Likelihood Ratios of Exhaustive and Non-Exhaustive Hypotheses about Rare Events in Criminal Cases
• Shawn Bowers and Bertram Ludäscher - Towards Trustworthy AI Results using Evidence Structures: From Certificates to Argumentation Frameworks
• Leya Hampson and Ludi van Leeuwen - Investigating the value of qualitative Bayesian networks of complete cases as “double-check” tools on traditional judicial reasoning: An exploratory study
• Aybüke Özgün and Daira Pinto Prieto - A Qualitative Logic for Uncertain Evidence and Belief Comparison
• Mengxuan Helen Qiao, Vanessa Cheung, Leya Hampson and David Lagnado - Recency Effects, Cautious Convictions, and Conservative Updating in GPT-4o’s Legal Decisions (not included in proceedings)
• Henrik Palmer Olsen, Mohammad N S Jahromi, Frederik Bay-Jørgensen, Thomas B Moeslund and Thomas Gammeltoft-Hansen - Managing Fuzziness: Leveraging LLMs for Discovering Credibility Indicators in Asylum Cases (not included in proceedings)
• Mario Guenther and Conrad Friedrich - Probabilifying the Scenario Approach to Legal Proof (unable to present)</p>
      <p>
Organization</p>
      <p>Workshop Chairs
• Ludi van Leeuwen, University of Groningen
• Roos Schefers, Utrecht University
• Bart Verheij, University of Groningen</p>
      <p>Program Committee</p>
      <p>The organisation of the workshop was supported by the Hybrid Intelligence Center, a 10-year
programme funded by the Dutch Ministry of Education, Culture and Science through the
Netherlands Organisation for Scientific Research, https://hybrid-intelligence-centre.nl. The
organisation of the workshop was also supported by the project ‘AI4Intelligence: From Multimodal
Data to Trustworthy Evidence in Court’ with file number KICH1.VE01.20.011 of the research
programme Data and Intelligence, which is partly financed by the Dutch Research Council
(NWO). The organizers would like to thank the JURIX 2025 workshop chairs and organizers for
providing an excellent framework for our workshop.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>