<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>STPIS'25: The 11th International Conference on Socio-Technical Perspectives in IS</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>GenAI summaries of articles: effective and useful?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Bednar</string-name>
          <email>peter.bednar@ics.lu.se</email>
          <email>peter.bednar@port.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Lund University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>STPIS'25: The 11th International Conference on Socio-Technical Perspectives in IS, STPIS'25</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Computing, University of Portsmouth</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>There are many discussions promoting the use of GenAI tools to create summaries of research papers and texts, both for students and academics. These discussions often focus mainly on potential benefits such as saving time and effort, and include advice and how-to instructions. Their expected value, however, depends on how factually reliable and trustworthy the generated summaries are, a topic that tends to be undermined by vague explorations of limitations and pitfalls. This paper looks into snapshot examples of generated summaries and compares their factual propositions with those of the original sources. Summaries were based on content from 7 articles and 5 books. The systematic evaluation of content examines 39 examples of factual errors in propositions, including some examples of citation errors. The purpose of the research presented in this paper is to glean both evidence and some understanding of the factual reliability and trustworthiness of generated summaries. The paper concludes with a more balanced discussion regarding the use of AI generated summaries.</p>
      </abstract>
      <kwd-group>
        <kwd>AI generated summaries</kwd>
        <kwd>Sociotechnical Perspectives on GPT use</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Identifying this kind of error is not trivial: due to the complexity of the subject content, it takes a lot of time, careful reading and study to understand and identify flaws and errors in factual content. The experiences from this research cast serious doubt on how helpful and assistive the promoted features in areas such as summarizing, transcribing, proofreading and enhancing accessibility are in reality.</p>
      <p>In this paper we look closer into the content of GPT generated summaries of academic texts and articles. In particular, we were interested in what these texts said in relation to cybersecurity. We compare the factual content of the summaries with the factual content of the academic texts, and then show examples of deviations and errors for each of the summaries studied.</p>
      <p>Prompts were used with careful instructions and supporting information for the key
cybersecurity topics and sources. The main question was whether or not GPT could
produce factually reliable summaries of the inquired topics without introducing factual
inaccuracies.</p>
      <p>Each of the GPT generated summaries was carefully read several times, and the content was systematically compared with the original source. Simple differences in wording were ignored, and some vagueness was tolerated, as the main focus of this study is on potential issues with relying on the subject-specific and key [factual] content of generated summaries of texts. Such issues would have consequences for academics as well as students who purposefully use this kind of technology, with an explicit focus, to summarize texts on subjects with which they are not intrinsically familiar or of which they do not already have deep knowledge.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        The idea of using AI as a personal assistant to help and support learning and the exploration of intellectual tasks is not new. Related research has been around for quite some time [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7 ref8">1, 2, 3, 4, 5, 6, 7, 8</xref>
        ], including but not limited to: the virtual personal assistant as a discussion partner, idea generation and exploration, the exploration of ambiguity and uncertainty in complex problem spaces, and the development of logical models to support such an agenda. There has also been an ongoing discussion regarding both potential benefits and pitfalls and concerns [13, 14, 15, 16, 17, 18]. The necessity to develop the use of technology, as opposed to developing technology and then using it, is well known and part of the sociotechnical agenda and perspective [22]. It is important to understand the phenomena that are seen as a problem, but complex problems are not easy to explore in the real world [19, 20, 21, 23]. This includes the reality that the purpose of using a tool such as GPT to generate a summary may or may not be politically correct. In section 3 we explore 26 deviations of factual content from 7 articles [24, 25, 26, 27, 28, 29, 30], and in section 4 we explore 13 deviations of referencing and factual content from 5 books [31, 32, 33, 34, 35].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Deviation in factual / information content from sources</title>
      <p>In this section we present examples of issues identified in the GPT generated summaries of articles. The summaries were intended to be generated with the best possible accuracy, and the purpose was to make the content indistinguishable from what it would have been if created by a student or academic. There is plenty of instruction and advice available regarding how GPT can be used to generate such summaries, including strategies on how to minimize factual errors [or ‘hallucinations’] in generated texts [9, 10, 11, 12].</p>
      <p>The process was systematic. First, the texts were selected on the basis of being judged relevant while at the same time having different characteristics such as length, perspective, complexity and focus. Second, all articles [24-30] were explicitly made available [in this experiment both via link and as text copied and pasted]. Third, prompts were developed [mostly variations of “Summarize this text in 200 words, focusing on the main arguments and conclusions related to cybersecurity”]. The sources were explicitly required to be referenced in Harvard APA format (including page numbers for citations).</p>
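      <p>As an illustration only: a prompting step of this kind could also be reproduced programmatically. The sketch below is not the procedure used in this study [which interacted with ChatGPT directly]; it assumes the OpenAI Python client, and the model name and the summarize helper are illustrative placeholders.</p>
      <preformat>
# Illustrative sketch only; the study itself used ChatGPT interactively.
# Assumes the OpenAI Python client (pip install openai); the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Summarize this text in 200 words, focusing on the main arguments "
    "and conclusions related to cybersecurity. Reference the source in "
    "Harvard APA format, including page numbers for citations.\n\n{text}"
)

def summarize(source_text: str) -> str:
    """Request a summary of one source text, mirroring the prompt variations used here."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(text=source_text)}],
    )
    return response.choices[0].message.content
      </preformat>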
      <p>For the book by Enid Mumford [32], GPT was asked to summarize each chapter separately and then to summarize the summaries together. This book, as well as the book by Peter Bednar [31], was also available as electronic source text. The purpose was to minimize the complexity of the task in the hope that this would also minimize the potential for factual errors in the output. Each of the generated summaries investigated in this paper was rather short; while some of the summaries created were larger than 200 words, they tended to be not much more than 500 words in total. The books by Peter Checkland et al. [33, 34, 35] were not made available in electronic format, but they are commonly referenced in many sources and descriptions online, and in our experiment GPT did not hesitate to produce summaries of these books, even without having access to their actual complete source texts.</p>
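      <p>The chapter-wise procedure used for the Mumford book amounts to a two-stage protocol: summarize each chapter, then summarize the summaries. A minimal sketch, under the same assumptions as the previous sketch and reusing its hypothetical summarize helper:</p>
      <preformat>
# Sketch of the chapter-by-chapter procedure used for the Mumford book [32]:
# each chapter is summarized separately, then the collected summaries are
# summarized together. Reuses the illustrative summarize() helper above.

def summarize_book(chapters: list[str]) -> str:
    """Two-stage summary: per-chapter summaries first, then a summary of summaries."""
    chapter_summaries = [summarize(chapter) for chapter in chapters]
    combined = "\n\n".join(chapter_summaries)
    return summarize(
        "The following are chapter summaries of one book. "
        "Summarize them together into a single overall summary:\n\n" + combined
    )
      </preformat>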
      <p>Each summary was then read more than once, and each source was also re-read several times in a systematic way. First the summary was read and the main factual propositions were examined one at a time; each time, the source was read carefully in an effort to find the original proposition supporting the factual statement. Whenever a deviation was identified, the source was examined again to confirm whether or not the factual statement had been missed in the source. Due to the difficulty of verifying and validating information content in potentially complex narratives such as academic texts, the reading and study of the source text required significant effort and time, even when the subject was well known to the reader. The review of sources was done more than three times for each of the deviations identified.</p>
      <p>If the exact factual proposition was not identified [including alternative wording, synonyms etc.], then the source was read again, this time purposefully trying to identify statements with potentially similar meaning, since factual propositions could be expected to have been re-phrased and reworded. Not all issues and errors have been made explicit, as the goal was not to make a comprehensive inquiry into each summary; it was enough to identify some examples of serious academic issues in the material output.</p>
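      <p>The checking procedure can be summarized as the loop sketched below. This is a descriptive sketch of the manual process, not software used in the study; the two find-support helpers are crude stand-ins for careful human reading, included only to make the sketch runnable.</p>
      <preformat>
# Descriptive sketch of the manual verification loop; in the study the
# "find support" steps were careful human reading, repeated three or more
# times per deviation, so the helpers below are crude stand-ins.

def find_support(proposition: str, source: str) -> bool:
    """Stand-in for the first pass: look for the proposition almost verbatim."""
    return proposition.lower() in source.lower()

def find_paraphrased_support(proposition: str, source: str) -> bool:
    """Stand-in for the second pass: statements with potentially similar
    meaning, approximated very roughly here by keyword overlap."""
    keywords = {w for w in proposition.lower().split() if len(w) > 5}
    hits = sum(1 for w in keywords if w in source.lower())
    return bool(keywords) and hits / len(keywords) > 0.8

def verify_summary(propositions: list[str], source_text: str) -> list[str]:
    """Return the factual propositions for which no support was found in the source."""
    deviations = []
    for proposition in propositions:
        if not find_support(proposition, source_text):
            if not find_paraphrased_support(proposition, source_text):
                deviations.append(proposition)
    return deviations
      </preformat>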
      <p>Tables 1 – 7, below, focus on article summaries and provide a collection of examples of issues identified, including a description of each deviation and its source. The error example categories are the following: [1] factual incorrectness; [2] misinterpretation; [3] missed information content; [4] partial incorrectness of information content; [5] fabrication of information content.</p>
      <p>Those listed in the tables are not a comprehensive outline of issues, but examples of
issues for each case.</p>
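      <p>For readers who wish to record such findings systematically, the five error categories could be encoded as follows. This is a minimal sketch of our own; the Deviation record is an illustrative construction, not an artefact of the study.</p>
      <preformat>
# Minimal encoding of the five error categories used in the tables; the
# Deviation record is an illustrative construction, not part of the study.
from dataclasses import dataclass
from enum import IntEnum

class ErrorCategory(IntEnum):
    FACTUAL_INCORRECTNESS = 1
    MISINTERPRETATION = 2
    MISSED_INFORMATION = 3
    PARTIAL_INCORRECTNESS = 4
    FABRICATION = 5

@dataclass
class Deviation:
    """One table row: a summary statement and its assessment against the source."""
    summary_statement: str
    source_check: str
    categories: list[ErrorCategory]

# Example row (from Table 2): a GDPR claim not supported by Strielkina et al. (2018).
row = Deviation(
    summary_statement="The General Data Protection Regulation (GDPR) is used "
                      "within the EU to protect patient data ...",
    source_check="Source does not mention GDPR, does mention HIPAA.",
    categories=[ErrorCategory.FACTUAL_INCORRECTNESS,
                ErrorCategory.PARTIAL_INCORRECTNESS,
                ErrorCategory.FABRICATION],
)
      </preformat>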
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Examples of deviations in the generated summary of Source 2 [25] (Strielkina et al., 2018).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Statement in generated summary</th>
              <th>Deviation from source</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>…</td>
              <td>Source: Does not specifically mention cyberthreats and data leaks at all, and does not talk about data breaches in this context. [4 partial incorrectness and 5 fabrication]</td>
            </tr>
            <tr>
              <td>If hacked, this could suggest incorrect dosages for heart-pacing, causing potential fatalities for patients (Strielkina et al., 2018).</td>
              <td>Source: Not a correct reference; an example of an original source for this statement is: J. Finkle, “U.S. government probes medical devices for possible cyber flaws”, Reuters, 2014. [Online]. Available: https://www.reuters.com/article/uscybersecuritymedicaldevicesinsight/u-sgovernment-probes-medicaldevices-for-possiblecyber-flaws/ [1 factual incorrectness and 5 fabrication]</td>
            </tr>
            <tr>
              <td>However, older systems, such as those predominantly in healthcare, are not often flexible regarding technological adoptions, depicted through impaired functionalities, which reject processing real-time data (Strielkina et al., 2018), reducing efficiency.</td>
              <td>Source: Does not mention reduction of efficiency, older systems or flexibility. [1 factual incorrectness and 5 fabrication]</td>
            </tr>
            <tr>
              <td>Adhering to ethical regulations protects sensitive patient data, which is an utmost priority within healthcare; however, diversity within these regulations globally can complicate the implementation process (Strielkina et al., 2018).</td>
              <td>Source: Does not mention ethical regulation. [5 fabrication]</td>
            </tr>
            <tr>
              <td>The General Data Protection Regulation (GDPR) is used within the EU to protect patient data, whereas the Health Insurance Portability and Accountability Act (HIPAA) is used in the US. Both regulations warrant strict privacy practices and data protection, which can be extremely stringent to adhere to, especially when AI and IoT devices rely strenuously on data (Strielkina et al., 2018).</td>
              <td>Source: Does not mention GDPR, does mention HIPAA. [1 factual incorrectness, 4 partial incorrectness and 5 fabrication]</td>
            </tr>
            <tr>
              <td>The necessity of healthcare services within society is immeasurable because they are unable to have long periods of system downtime. Therefore, choosing the correct moment for maintenance is key so that managing, evaluating and ensuring efficient integration between new technologies can take place (Strielkina et al., 2018).</td>
              <td>Source: Neither maintenance nor system downtime is mentioned. Does not suggest that ‘the necessity of healthcare services is immeasurable because they are unable to have long periods of system downtime’. Also, does not suggest that choosing the moment for maintenance is key to ensuring integration between new technologies, or similar. [5 fabrication]</td>
            </tr>
            <tr>
              <td>Operational impacts can occur from cyberattacks, which decrease productivity, delay treatment, and increase backlog (Strielkina et al., 2018).</td>
              <td>Source: Does not mention productivity or delay of treatment. [5 fabrication]</td>
            </tr>
            <tr>
              <td>Research highlights that many IoT devices in healthcare transmit data over unsecured channels, which can expose patient data through interception by unauthorized parties (Strielkina et al., 2018).</td>
              <td>Source: Does not specifically talk about exposing patient data. Does not say that healthcare devices transmit data over unsecured channels. [5 fabrication]</td>
            </tr>
            <tr>
              <td>Advanced technology in embedded sensors or wearable technologies, such as smartwatches, is becoming more common within technological adoption, allowing dynamic treatments to be remotely tracked with precision (Strielkina et al., 2018).</td>
              <td>Source: Does not mention dynamic treatment or remote tracking. [2 misinterpretation, 3 missed information content, 5 fabrication]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table 4 provides examples of systematic misrepresentation of content from Source 4 [27]: Singh, S., Sheng, Q. Z., Benkhelifa, E., &amp; Lloret, J. (2020). Guest Editorial: Energy Management, Protocols, and Security for the Next-Generation Networks and Internet of Things. IEEE Transactions on Industrial Informatics, 16(5), 3515–3520. https://doi.org/10.1109/tii.2020.2964591</p>
      <table-wrap id="tab4">
        <label>Table 4</label>
        <caption>
          <p>Examples of deviations referring to Source 4 [27] (Singh et al., 2020).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Statement in generated summary</th>
              <th>Deviation from source</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>…</td>
              <td>… [1 factual incorrectness, 2 misinterpretation, 3 missed information, 4 partial incorrectness]</td>
            </tr>
            <tr>
              <td>… because of smaller healthcare facilities' finances, making the knowledge limited.</td>
              <td>… [5 fabrication]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tab7">
        <label>Table 7</label>
        <caption>
          <p>Examples of deviations referring to Source 7 [30] (Kruse et al., 2017).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>No.</th>
              <th>Statement in generated summary</th>
              <th>Deviation from source</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>23</td>
              <td>Phishing: A cybersecurity vulnerability which targets human vulnerabilities and inadequacy to access sensitive data in IT systems (Kruse et al., 2017).</td>
              <td>…</td>
            </tr>
            <tr>
              <td>24</td>
              <td>Phishing: Whilst Kruse et al. (2017) identify absences of staff training and awareness of cyberattacks, incomprehension can be mitigated through regular cybersecurity staff training and awareness programs.</td>
              <td>Source: Does not mention phishing. [5 fabrication]</td>
            </tr>
            <tr>
              <td>25</td>
              <td>Further risks amount to when using IoT and AI systems for predictive analytics. This is because of the vast amounts of data that are personalised for the patient, therefore making it an enticing point for cyberattacks. Moreover, large amounts of data are needed to train the AI systems, and these are usually cloud-based systems, meaning the healthcare department does not need physical hardware. However, if this data is not adequately protected, the data can become accessible to hackers, resulting in a detrimental data breach (Kruse et al., 2017).</td>
              <td>Source: Does not mention predictive analytics. Does not mention training of AI systems. [5 fabrication]</td>
            </tr>
            <tr>
              <td>26</td>
              <td>Compliance with regulatory and ethical concerns are recurring themes which are vital in healthcare operations, as stated by Kruse et al. (2017), who reflect the complexities and precise requirements of HIPAA and GDPR that healthcare facilities must abide by. Both authors relay the importance of data privacy and protection through required frameworks, acknowledging the dynamic regulatory needs required for AI and IoT technologies.</td>
              <td>Source: Does not mention GDPR, AI or IoT. [2 misinterpretation, 4 partial incorrectness, 5 fabrication]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In this section, we presented examples of issues, all of which are cause for concern when it comes to academic research and study: content in summaries which does not exist in the source, topics which are not mentioned, and subjects which are not referred to in the original material. As we can see, in each and every summary there were important and significant factual errors.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Deviation in content of subject descriptions from books</title>
      <p>In this section we present examples of issues identified in the GPT generated summaries of topics with references from books. Also in this case, each of the generated final summaries was rather short and tended to be approximately 400 words in total. The subjects and sources were both well known and easily available to the researcher, which made the identification of the deviations less burdensome, although validation still took a lot of time. Tables 8 – 10, below, focus on content drawing upon books and provide a collection of issues identified, including a description of each deviation and its source.</p>
      <p>Table 8 provides examples of systematic misrepresentation of content from Source 8 [31]: Bednar, P. M. (2023). The socio-technical toolbox. Portsmouth: Craneswater Press.</p>
      <table-wrap id="tab8">
        <label>Table 8</label>
        <caption>
          <p>Examples of deviations referring to Source 8 [31] (Bednar, 2023).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Statement in generated summary</th>
              <th>Deviation from source</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>… in terms of efficiency and data processing, it is also prone to biases and may lack the nuanced understanding of human context (Bednar, 2023, p. 88).</td>
              <td>Source8 [31]: The Variance Matrix is being discussed on p88, which has absolutely nothing to do with the statement made. [1 factual incorrectness, 5 fabrication]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table 9 provides examples of systematic misrepresentation of content from Source 9 [32]: Mumford E. (1983). Designing human systems for new technology: The ETHICS method. Manchester Business School.</p>
      <p>Table 10 provides examples of systematic misrepresentation of content from Sources 10, 11 and 12. Source 10 [33]: Checkland, P., &amp; Holwell, S. (1998). Information, systems and information systems. Chichester, UK: Wiley. Source 11 [34]: Checkland, P., &amp; Poulter, J. (2006). Learning for action. Chichester, UK: Wiley. Source 12 [35]: Checkland, P., &amp; Scholes, J. (1990). Soft systems methodology in action. Chichester, UK: Wiley.</p>
      <table-wrap id="tab10">
        <label>Table 10</label>
        <caption>
          <p>Examples of deviations referring to Sources 10, 11 and 12 [33, 34, 35].</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Statement in generated summary</th>
              <th>Deviation from source</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>… (Checkland &amp; Holwell, 1998, p. 45).</td>
              <td>Source10 [33]: … misinterpretation, 5 fabrication]</td>
            </tr>
            <tr>
              <td>This aligns with Checkland’s Soft Systems Methodology (SSM), which advocates for a more expansive view of problem situations, considering cultural, organizational, and human factors alongside technical aspects (Checkland &amp; Poulter, 2006, p. 34).</td>
              <td>Source11 [34]: This is not the true focus of the discussion on p34. Analysis 2 [Social] is being discussed, and the main focus is on social Norms, Roles and Values. [2 misinterpretation, 5 fabrication]</td>
            </tr>
            <tr>
              <td>As Checkland and Poulter suggest in “Learning for Action”, implementation should be seen as a learning process, where feedback is continuously integrated to refine the system (Checkland &amp; Poulter, 2006, p. 102).</td>
              <td>Source11 [34]: This is not the subject being discussed on this page in the referenced book. [1 factual incorrectness]</td>
            </tr>
            <tr>
              <td>As Checkland and Poulter discuss in “Learning for Action”, the application of AI should be guided by a clear understanding of its capabilities and boundaries (Checkland &amp; Poulter, 2006, p. 115).</td>
              <td>Source11 [34]: There is no discussion or mention of AI or anything related to this on p115 in this book. [1 factual incorrectness, 2 misinterpretation, 5 fabrication]</td>
            </tr>
            <tr>
              <td>Defining objectives for technical performance and human work quality involves identifying specific, measurable goals in each area. Technical objectives might include system reliability, efficiency, and integration with existing processes. Human work quality objectives could focus on ergonomics, user-friendliness, and support for professional development. This dual focus ensures that the system design not only meets technical specifications but also enhances the work experience for employees. Following the guidelines from Checkland’s SSM, these objectives should be developed with a deep understanding of the system’s purpose and the needs of its users (Checkland &amp; Scholes, 1990, p. 29).</td>
              <td>Source12 [35]: There is no such discussion on this page in this book. [1 factual incorrectness, 2 misinterpretation, 5 fabrication]</td>
            </tr>
            <tr>
              <td>As highlighted by Checkland in “Soft Systems Methodology in Action”, the dynamic nature of organizations requires systems that can evolve and respond to changing needs (Checkland &amp; Scholes, 1990, p. 162).</td>
              <td>Source12 [35]: This is not the discussion on this page in the source book. The discussion on p162 focuses on comparing models with reality and implementing changes. [1 factual incorrectness, 2 misinterpretation, 5 fabrication]</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In this section, we presented additional examples of issues, all of which are cause for serious concern when it comes to academic research and study: issues with content related not only to flawed descriptions of subjects, topics and facts, but also evidence of flaws in pure data representations, including completely made-up citations. These are examples of facts and citations, none of which exist in the sources. This is not simply a matter of logical mistakes about topics which are not mentioned, but of completely made-up content and subjects which are not discussed in the original sources.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Discussion</title>
      <p>
        There are in contemporary society plenty of opportunities to develop and apply practices that support smart working [18]. This includes promising potential for using AI technology in clever ways to support professional activity [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref8">1, 2, 3, 8</xref>
        ]. In this paper we presented a snapshot of using ChatGPT to create summaries; we used a complementary collection of sources with different characteristics related to the complexity, quantity and perspective of their content. We tried to get reliable summaries, and while much of the content was reasonable, in each and every one of our examples there were significant academic flaws and factual errors in the summaries created. For academic use, to help people understand a long text, the factual correctness of the content of the output must replicate that of the source. The results were disappointing but not surprising. It is to be expected that when GenAI based technology such as ChatGPT is being used, the factual output and information content cannot be deterministically dependent on the information content of the training material or source. The results presented in this paper are not an outcome of a flaw or bug in the technological solution, but a direct consequence of the model upon which these technological solutions are based. The model behind ChatGPT is purposefully designed to be generative; this is not a coincidence. There are other categories of AI, but discussing them is not within the scope of this paper; while some AI is designed to have deterministic outputs, technologies such as ChatGPT are not.
      </p>
      <p>Many discussions focus mainly on the potential of AI use, while quality assurance of factual content is not explicitly described [e.g. 36]; no evidence of implementation or results is provided. Other research also discusses the potential without explicitly describing the quality assurance of content, but mentions some concerns regarding hallucinations being difficult to detect: “A common problem with generative AI tools is their tendency to fill in the gaps by making things up, a phenomenon known as hallucination. To help address the possibility that it would make up references, the team allowed ChatGPT to access literature search engines so that it could generate correct citations.” [37, p.444] While this mentions the potential to generate correct citations, the success or evaluation of this aspect is not reported; the actual validity or evaluation of the factual content of the analysis in the generated article is also not discussed. Some research [38] does focus on potential and shows an example of a summary which, unfortunately, is not explained, evaluated or justified in the context of the claims regarding potential usefulness, or indeed factually verified. AI generated summaries of complex texts intended for critical use in science and medical research are a subject of great interest. For example, in a discussion describing a long list of benefits and usefulness, the authors explicitly state that “When it comes to published information, AI can be used to accurately summarize most complex subjects, such as medical or scientific research papers.” [39] But also in this article the authors do not provide any justification or evidence supporting the main claims, nor do they provide references to any evaluation of accuracy.</p>
      <p>Such unjustified promises, positions and claims are not unique but widespread, as can be seen in another example where the arguments focus on describing claimed benefits of AI generated summaries and the ease with which they can be created. Also in this article the authors do not provide evidence or evaluation of the claimed accuracy or correctness of the content of generated summaries. They simply say [40]: “By automating the summarization process, we are able to save time and effort in extracting the most important points from lengthy texts. With the advent of AI, the potential for text summarization has reached new heights, as machines are now able to analyze complex language structures and understand the meaning and context behind words. This has paved the way for powerful summarization APIs that can provide us with quick and accurate summaries of even the most complex texts.”, but with no evidence supporting this claim.</p>
      <p>In one example the authors do talk about risks and benefits regarding the use of generated summaries, and mention inaccurate information explicitly as a risk and major concern [41]. However, when describing the evaluation of the quality of the generated summaries, the authors mention semantic and lexical aspects and state that [we] ‘can determine the quality of the summary by factors such as non-redundancy, relevance, coverage, coherence, and readability’ [41]. There is no description of whether (or how) the validity of factual content (or accuracy of information) had been evaluated or tested, or whether factual propositions within summaries were explicitly compared for factual correctness with the original sources.</p>
      <p>Interestingly, even when an article specifically focuses on purposeful misuse and adversarial activities [e.g. 42], there is no particular discussion focusing specifically on issues related to the validity or correctness of content or output, or on the related risks of misplaced trust, dependency and over-reliance on taken-for-granted factual correctness of AI generated output.</p>
      <p>Not all is doom and gloom; there are meaningful propositions discussing the potential usefulness of AI generated output for requirements analysis that is explicitly and mainly limited to preliminary purposes and prototypes [43]. AI agency is also explored further in the context of a work system [44]. The included outline of the agent evaluation framework explicitly addresses a set of criteria: “Those criteria might be called the 6 E’s: efficiency, effectiveness, equity, engagement, empathy, and explainability.” [44, p. 5267]. Even so, this list does not explicitly include trust, honesty or similar concerns. Steven Alter does discuss the facets of work to find possible improvements in work system performance, with a rather comprehensive list of facets [45]. Risks of omissions are mentioned in the context of algorithmic concerns. But when, for example, ‘providing information’ is mentioned, the question is “Could AI provide more meaningful information to work system participants than would otherwise be available?”, and when ‘representing reality’ is mentioned, “Does AI represent reality in a biased way? For example, what about possible bias or omissions in the dataset used to train a neural network?”. These perspectives are examples of a focus on the quality of training data; they do not explicitly include concerns related to, for example, issues due to (AI) hallucination. This is significant, as hallucinations can happen even with appropriate and trustworthy training data, or with explicit source material, as presented in this paper.</p>
      <p>There have been significant efforts to address the trust issue of AI. Explainable AI (XAI) has the potential to enhance decision-making in human-AI collaborations, yet existing research indicates that explanations can also lead to undue reliance on AI recommendations, a dilemma often referred to as the ‘white box paradox’. This paradox illustrates how persuasive explanations for incorrect advice might foster inappropriate trust in AI systems. One study extends beyond the traditional scope of the white box paradox by proposing a framework for examining explanation inadequacy [46]. The authors specifically investigated how accurate AI advice, when paired with misleading explanations, affects decision-making in logic puzzle tasks. Their findings introduce the concept of the ‘XAI halo effect’, where participants were influenced by the misleading explanations to the extent that they did not verify the correctness of the advice, despite its accuracy [46]. This effect reveals a nuanced challenge in XAI, where even correct advice can lead to misjudgment if the accompanying explanations are not coherent and contextually relevant. The study highlighted the critical need for explanations to be both accurate and relevant, especially in contexts where decision accuracy is paramount. This calls into question the use of explanations in situations where their potential to mislead outweighs their transparency or educational value. In the context of explainable AI, it is not necessarily helpful to understand how something was created if you wish to assess whether or not the output is factually correct. However, ensuring secure and trustworthy AI systems is challenging, especially with deep learning models that lack explainability accessible to most users. Therefore, some researchers have proposed the concept of Controllable AI as an alternative to Trustworthy AI and explored the major differences between the two. The aim was to initiate discussions on securing complex AI systems without sacrificing practical capabilities or transparency. The paper provides an overview of techniques that can be employed to achieve Controllable AI; it discusses the background definitions of explainability, Trustworthy AI, and the EU AI Act, describes some principles and techniques of Controllable AI, and outlines potential applications of Controllable AI and its implications for real-world scenarios. These are principles that are desirable but not yet implemented, tested or validated.</p>
      <p>In research that specifically explores the potential as well as the pitfalls of using ChatGPT for serious purposes in healthcare, there are concerns even with successful experiments. In one example, the ability of this tool to interpret laboratory test results was explored [48]. Ten simulated laboratory reports of common parameters were passed to ChatGPT for interpretation, according to reference intervals [RI] and units, using an optimized prompt. The results were subsequently evaluated independently by all research group members with respect to relevance, correctness, helpfulness and safety. There was some success in that ChatGPT recognized all laboratory tests, could detect whether they deviated from the RI, and gave a test-by-test as well as an overall interpretation. The downside was that the interpretations were rather superficial and “not always correct, and, only in some cases, judged coherently”. They concluded that ChatGPT, in its current form, should not be considered for use in the interpretation of an overall diagnostic picture.</p>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>
        There are in contemporary society plenty of opportunities to develop and apply practices that support smart working [18]. This includes promising potential for using AI technology in clever ways to support professional activity [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref8">1, 2, 3, 8</xref>
        ]. Many discussions focus mainly on the potential of AI use, while quality assurance of factual content is not explicitly described [e.g. 36]. It is quite clear that the use of AI tools needs to be based on an understanding of not just the potential benefits but also the weaknesses [13, 15, 16, 17, 18]. In this paper we have explored and identified weaknesses that have the potential to compromise trust as well as safety in the use of GenAI solutions [such as ChatGPT]. Consequently, for all work where trust related to information content is critical, there are serious concerns related to [1] factual incorrectness; [2] misinterpretation; [3] missed information content; [4] partial incorrectness of information content; [5] fabrication of information content.
      </p>
      <p>
        The reasons why such trust is key are not limited to academic use. In particular, when GenAI information output is used for professional decision-making, reliance on trustworthy information content has an impact on how decision making influences safety. There are multiple concerns related to safety in this context: [1] misinformation and disinformation impact decision making, leading to uninformed or misled decision making; [2] legal and ethical risks impact legal consequences, misrepresented sources, evidence from interviews and documents etc.; [3] impact on public health, through recommendations and diagnosis; [4] amplification of bias, such as bigotry, racism, sexism etc., with potential consequential impact on, for example, selection, promotion and value judgement. In this paper we have presented issues showing that the use of existing AI solutions such as GPT for the purpose of summarizing the content of articles has real issues for learning and consequences for academic research practices. But summaries could be made of any kind of document and source content. Using GPT to generate summarized descriptions of academic texts [or any texts] is relatively easy and quick but, in our experience, not trustworthy. Our findings are fully coherent with other research concluding that “Irrespective of how advanced our architectures or training datasets, or fact-checking guardrails may be, hallucinations are ineliminable” [49, p. 13]. This means that both verification and validation are absolutely necessary in every instance where the dependability of information content is critical. But validating the content of generated summaries is very demanding, time-consuming and difficult: to identify issues and deviations that matter, it is not enough to look for keywords or topics covered, because the meaning and factual content of each proposition also has to be checked with care and attention to detail. This is a difficult task even for a person with deep knowledge and understanding of the topic being summarized. For someone interested in exploring a topic previously unknown to them, we can expect validation to be even more demanding. For many, the task of validation is not only going to be difficult and prone to errors, but also out of reach in practice, either because it is excessively time consuming or because it demands knowledge they do not have.
      </p>
      <p>We ask ourselves: if it is easier to produce a reliable and validated summary without the use of GPT, why would anyone bother? Maybe because of the allure of the promise that it could potentially be “good enough”? But when it comes to academic research and expectations related to understanding, quality of information, argument, care of detail, factual coherence and the data referred to in evidence-based research, the promises of potential usefulness and effectiveness are not supported by our findings.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools in the writing of this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Bednar</surname>
            ,
            <given-names>P</given-names>
          </string-name>
          and Imrie,
          <string-name>
            <surname>P.</surname>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Virtual Personal Assistant</article-title>
          .
          <source>ItAIS 2013. Proceedings of 10th Conference of the Italian Chapter of AIS</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bednar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imrie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Personalized Support with 'Little' Data</article-title>
          . In:
          <string-name>
            <surname>Bergvall-Kåreborn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nielsen</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          <article-title>(eds) Creating Value for All Through IT</article-title>
          .
          <source>TDIT 2014. IFIP Advances in Information and Communication Technology</source>
          , vol
          <volume>429</volume>
          . Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43459-8_24
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bednar</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welch</surname>
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Imrie</surname>
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Supporting Business Decision-making: One Professional at a Time</article-title>
          .
          <source>Pages 471 - 482. DOI 10.3233/978-1-61499-399-5-471 in Frontiers in Artificial Intelligence and Applications</source>
          . Volume
          <volume>261</volume>
          :
          <article-title>DSS 2.0 - Supporting Decision Making with New Technologies</article-title>
          . https://ebooks.iospress.nl/publication/36234
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Imrie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bednar</surname>
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>End User Effects of Centralized Data Control</article-title>
          .
          <source>Proceedings of ItAIS</source>
          <year>2014</year>
          <article-title>XI Conference of the Italian Chapter of AIS: Digital Innovation and Inclusive Knowledge in Times of ChangeAt: Genova</article-title>
          , Italy.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Imrie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bednar</surname>
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Continuously evolving end user supporting technologies within personal socio-technical systems</article-title>
          .
          <source>Proceedings of ItAIS</source>
          <year>2015</year>
          ,
          <article-title>12th Conference of the Italian Chapter of AIS</article-title>
          . Rome, Italy,
          <fpage>p295</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Imrie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bednar</surname>
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>The application of Virtual Personal Assistants as tools to facilitate learning</article-title>
          .
          <source>Proceedings of ItAIS</source>
          <year>2016</year>
          ,
          <article-title>13th Conference of the Italian Chapter of AIS</article-title>
          . p171-
          <fpage>180</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Imrie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bednar</surname>
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Continuously evolving end user supporting technologies within personal socio-technical systems</article-title>
          . In:
          <article-title>Re-shaping Organizations through Digital and Social Innovation</article-title>
          . Chapter:
          <volume>23</volume>
          . Publisher: LUISS University Press - Pola Srl. Editors: Agrifoglio,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Caporarello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Magni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            , and
            <surname>Za</surname>
          </string-name>
          , S.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Imrie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Bednar</surname>
            ,
            <given-names>P. M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Security Benefits of Little Data From the SocioTechnical Perspective</article-title>
          .
          <source>International Journal of Systems and Society (IJSS)</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>45</fpage>
          -
          <lpage>53</lpage>
          . http://doi.org/10.4018/IJSS.2018010104
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] AskYourPDF (2023). How to Use ChatGPT to Create a Research Paper Summary. Published on: Oct 1, 2023. https://askyourpdf.com/blog/how-to-use-chatgpt-tocreate-a-research-paper-summary</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Diakopoulos N. (2023). How to use GPT-4 to summarize documents for your audience. Published in Generative AI in the Newsroom, Apr 11, 2023. https://generative-ai-newsroom.com/how-to-use-gpt-4-to-summarize-documentsfor-your-audience-18ecfe2ad6a4</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Gewirtz D. (2024). How to make ChatGPT provide sources and citations. June 28, 2024. https://www.zdnet.com/article/how-to-make-chatgpt-provide-sources-andcitations/</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Jones J. (2023). How to use ChatGPT to summarize a book, article, or research paper. Sept. 11, 2023. https://www.zdnet.com/article/how-to-use-chatgpt-to-summarize-a-bookarticle-or-research-paper/</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Abbas, M., Jam, F.A. &amp; Khan, T.I. (2024). Is it harmful or helpful? Examining the causes and consequences of generative AI usage among university students. Int J Educ Technol High Educ 21, 10 (2024). https://doi.org/10.1186/s41239-024-00444-7</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Scarfe, P., Watcham, K., Clarke, A. D. F., &amp; Roesch, E. B. (2023, October 14). A real-world test of artificial intelligence infiltration of a university examinations system: a “Turing Test” case study. https://doi.org/10.31234/osf.io/n854h</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Walters, W.H. &amp; Wilder, E.I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Sci Rep. 2023 Sep 7;13(1):14045. https://doi.org/10.1038/s41598-023-41032-5</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Dell'Acqua F., McFowland III E., Mollick E.R., Lifshitz-Assaf H., Kellogg K., Rajendran S., Krayer L., Candelon F. and Lakhani K.R. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality (September 15, 2023). Harvard Business School Technology &amp; Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper. Available at SSRN: https://ssrn.com/abstract=4573321 or http://dx.doi.org/10.2139/ssrn.4573321</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Surden H. (2024). ChatGPT, Large Language Models, and Law. 92 Fordham L. Rev. 1941 (2024). Available at: https://ir.lawnet.fordham.edu/flr/vol92/iss5/9</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Bednar, P.M. &amp; Welch, C. (2020). Socio-Technical Perspectives on Smart Working: Creating Meaningful and Sustainable Systems. Inf Syst Front 22, 281–298 (2020). https://doi.org/10.1007/s10796-019-09921-1</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] Checkland, P., &amp; Holwell, S. (1998). Information, systems and information systems. Chichester, UK: Wiley.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Checkland, P., &amp; Poulter, J. (2006). Learning for action. Chichester, UK: Wiley.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Checkland, P., &amp; Scholes, J. (1990). Soft systems methodology in action. Chichester, UK: Wiley.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Mumford E. (1983). Designing human systems for new technology: The ETHICS method. Manchester Business School.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] Bednar, P. M. (2023). The socio-technical toolbox. Portsmouth: Craneswater Press.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Source1: Shah, R., &amp; Chircu, A. (2018). IoT and AI in healthcare: A systematic literature review. Issues in Information Systems, 19(3). https://doi.org/10.48009/3_iis_2018_33-41</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] Source2: Strielkina, A., Illiashenko, O., Zhydenko, M., &amp; Uzun, D. (2018). Cybersecurity of Healthcare IoT-Based Systems: Regulation and Case-Oriented Assessment.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] Source3: Tully, J., Selzer, J., Phillips, J. P., O'Connor, P., &amp; Dameff, C. (2020). Healthcare Challenges in the Era of Cybersecurity. Health Security, 18(3), 228–231. https://doi.org/10.1089/hs.2019.0123</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] Source4: Singh, S., Sheng, Q. Z., Benkhelifa, E., &amp; Lloret, J. (2020). Guest Editorial: Energy Management, Protocols, and Security for the Next-Generation Networks and Internet of Things. IEEE Transactions on Industrial Informatics, 16(5), 3515–3520. https://doi.org/10.1109/tii.2020.2964591</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] Source5: Aldahiri, A., Alrashed, B., &amp; Hussain, W. (2021). Trends in Using IoT with Machine Learning in Health Prediction System. Forecasting, 3(1), 181–206. https://doi.org/10.3390/forecast3010012</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] Source6: Gopalan, S. S., Raza, A., &amp; Almobaideen, W. (2021). IoT Security in Healthcare using AI: A Survey. 2020 International Conference on Communications, Signal Processing, and Their Applications (ICCSPA). https://doi.org/10.1109/iccspa49915.2021.9385711</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] Source7: Kruse, C. S., Frederick, B., Jacobson, T., &amp; Monticone, D. K. (2017). Cybersecurity in healthcare: A systematic review of modern threats and trends. Technology and Health Care, 25(1), 1–10. https://doi.org/10.3233/thc-161263</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] Source8: Bednar, P. M. (2023). The socio-technical toolbox. Portsmouth: Craneswater Press.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] Source9: Mumford E. (1983). Designing human systems for new technology: The ETHICS method. Manchester Business School.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] Source10: Checkland, P., &amp; Holwell, S. (1998). Information, systems and information systems. Chichester, UK: Wiley.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] Source11: Checkland, P., &amp; Poulter, J. (2006). Learning for action. Chichester, UK: Wiley.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] Source12: Checkland, P., &amp; Scholes, J. (1990). Soft systems methodology in action. Chichester, UK: Wiley.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>[36] Roy K., Mukherjee S. and Dawn S. (2023). Automated Article Summarization using Artificial Intelligence Using React JS and Generative AI. Journal of Emerging Technologies and Innovative Research (JETIR), ISSN 2349-5162, Vol. 10, Issue 6, pages k78–k87, June 2023. Available: http://www.jetir.org/papers/JETIR2306A09.pdf</mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>[37] Conroy G. (2023). Scientists used ChatGPT to generate an entire paper from scratch — but is it any good? Nature 619, 443–444 (2023). https://doi.org/10.1038/d41586-023-02218-z</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>[38] Torres-Moreno J-M. &amp; Louët, S. (2021). Artificial Intelligence Has Read the Latest Research For You. https://sciencepod.net/ai-has-read-the-latest-research/</mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>[39] Sciencepod (2022). The Amazing Summarising Ability of AI. https://sciencepod.net/the-amazing-summarising-ability-of-ai/</mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>[43] … Cham. https://doi.org/10.1007/978-3-031-61007-3_19</mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>[44] Alter S. (2023). How Can You Verify that I Am Using AI? Complementary Frameworks for Describing and Evaluating AI-Based Digital Agents in their Usage Contexts. Proceedings of the 56th Hawaii International Conference on System Sciences, 2023. URI: https://hdl.handle.net/10125/103277</mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>[45] Alter S. (2022). Understanding artificial intelligence in the context of usage: Contributions and smartness of algorithmic capabilities in work systems. International Journal of Information Management, Vol. 67.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>