<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Ethical AI Systems and Shared Accountability: The Role of Economic Incentives in Fairness and Explainability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dae-Hyun Yoo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Caterina Giannetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Economics and Management, University of Pisa</institution>
          ,
          <addr-line>Pisa PI 56124</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
<year>2024</year>
      </pub-date>
      <volume>1</volume>
      <abstract>
        <p>This paper presents a principal-agent model for aligning artificial intelligence (AI) behaviors with human ethical objectives. In this framework, the end-user acts as the principal, offering a contract to the system developer (the agent) that specifies desired levels of ethical alignment for the AI system. The developer can exercise varying levels of effort to achieve this alignment, with higher levels - such as those required in Constitutional AI - demanding more effort and posing greater challenges. To incentivize the developer to invest more effort in aligning AI with higher ethical principles, appropriate compensation is necessary. When ethical alignment is unobservable and the developer is risk-neutral, the optimal contract achieves the same alignment and expected utilities as when it is observable. For observable alignment, a fixed reward is uniquely optimal for strictly risk-averse developers, while for risk-neutral developers, it remains one of several optimal solutions. This simple model demonstrates that balancing responsibility between users and developers is crucial for fostering ethical AI. Users seeking higher ethical alignment must not only compensate developers adequately but also adhere to design specifications and regulations to ensure the system's ethical integrity.</p>
      </abstract>
      <kwd-group>
        <kwd>Ethical Alignment</kwd>
        <kwd>Asymmetric Information</kwd>
        <kwd>Principal-Agent Model</kwd>
        <kwd>Responsibility Allocation</kwd>
        <kwd>Constitutional AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As artificial intelligence (AI) systems become increasingly integrated into society and tasked with
making complex decisions on behalf of humans, ensuring the ethical alignment between AI behavior
and human values is essential for fostering trust and collaboration [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the ethical alignment
problem in AI is complicated by the involvement of multiple entities—developers, deployers, and
users—each of whom may have different objectives, incentives, and levels of information [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This
misalignment can lead to conflicts, especially when users delegate the ethical design of AI systems to
developers, who often possess more information but may not fully share the users’ ethical goals. The
ethical alignment challenge in AI systems mirrors the principal-agent problem commonly observed in
economics, where discrepancies arise between the interests of a principal (e.g., the user) and an agent
(e.g., the system developer). In AI, such misalignment can occur due to incomplete information, reward
misspecification, or differences in values [
        <xref ref-type="bibr" rid="ref2 ref4 ref5">2, 4, 5</xref>
        ].
      </p>
      <p>
        As AI systems, such as autonomous vehicles and large language models, take on more decision-making
authority, addressing misalignment with human ethical standards becomes critical. Developers
have the flexibility to exert varying levels of effort when integrating ethical objectives into an AI
system. One approach, known as Constitutional AI, involves a training process in which a language
model is guided by a set of ethical principles, referred to as a "constitution" [
        <xref ref-type="bibr" rid="ref6">6, 7</xref>
        ]. These principles
are systematically instilled in the model throughout its development, shaping its behavior to align
with ethical guidelines. This approach ensures that the AI makes decisions and provides outputs that
reflect these predefined standards, creating a framework for responsible and transparent AI operation.
However, achieving a higher degree of ethical alignment requires significantly more effort, expertise,
and resources from the developer. These increased costs stem from the complexity of embedding stronger
ethical principles, ensuring compliance with evolving guidelines, and addressing unforeseen dilemmas
in AI decision-making. The greater the desired ethical rigor, the more challenging and resource-intensive
the development process becomes, both in terms of technical implementation and ongoing oversight.
      </p>
      <p>To investigate the various possibilities for aligning a system, this paper adapts a basic principal-agent
model from economics [8] to explore how responsibility for ethical AI systems can be distributed
among different stakeholders through economic incentives. By focusing on the contractual relationship
between users and system developers, we analyze optimal reward schemes that incentivize developers
to align AI behaviors with human ethical objectives. This model contributes to the growing discussion
on how to allocate responsibility for ethical AI and offers insights into how economic mechanisms can
be used to mitigate ethical risks in AI deployment.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Principal-Agent Model</title>
      <sec id="sec-2-1">
        <title>2.1. Assumptions</title>
        <p>The variable π represents the observable benefits that arise from deploying ethically aligned AI systems. While influenced by the level of ethical alignment (e), π is not entirely determined by it and takes values within [π̲, π̄]. The relationship between π and e is characterized by a conditional density function f(π | e), where f(π | e) &gt; 0 for all e ∈ E and π ∈ [π̲, π̄]. This introduces uncertainty, as any realization of π can occur for a given level of ethical alignment.</p>
        <p>The level of ethical alignment e, chosen by the system developer, represents the effort made to align AI systems with ethical objectives. The set E encompasses all available ethical alignment levels, with two primary options:</p>
        <p>e₁: high ethical alignment; e₂: low ethical alignment.</p>
        <p>We assume that the effort level e₁, which corresponds to higher ethical alignment, yields greater benefits for the user (principal) but imposes greater challenges on the system developer (agent). These challenges arise due to the increased complexity and resource demands of implementing stronger ethical guidelines, as well as ensuring compliance and addressing unforeseen dilemmas in the AI’s decision-making process.</p>
        <p>This creates a conflict of interests between the user and the developer. More specifically, the distribution of π conditional on e₁ first-order stochastically dominates that of e₂: the distribution functions satisfy F(π | e₁) ≤ F(π | e₂) at all π ∈ [π̲, π̄], with strict inequality on some interval. This implies that the expected benefits from e₁ exceed those from e₂:</p>
        <p>∫ π f(π | e₁) dπ &gt; ∫ π f(π | e₂) dπ</p>
        <p>The system developer is an expected utility maximizer with a Bernoulli utility function u(w, e) over reward (w) and ethical alignment level (e), satisfying:</p>
        <p>∂u/∂w(w, e) &gt; 0 and ∂²u/∂w²(w, e) ≤ 0 for all (w, e) (1)</p>
        <p>u(w, e₁) &lt; u(w, e₂) for all w (2)</p>
        <p>Thus, the developer prefers higher rewards but dislikes a high level of ethical alignment. The choice of e₁ provides greater benefits to the user but imposes more "disutility" on the developer compared to e₂. We focus on a specific utility function commonly used in the literature [8]:</p>
        <p>u(w, e) = v(w) − g(e), where v′(w) &gt; 0, v′′(w) ≤ 0, and g(e₁) &gt; g(e₂) (3)</p>
        <p>The user, assumed to be risk-neutral, seeks to maximize expected returns, receiving the benefits of
ethical alignment minus the rewards paid to the system developer.</p>
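        <p>To make these assumptions concrete, the following minimal Python sketch instantiates one admissible parameterization and verifies the stochastic-dominance ordering numerically. The support [0, 10], the linear densities, the square-root utility, and the effort costs g(e₁) = 1.0 and g(e₂) = 0.4 are our own illustrative choices, not values prescribed by the model.</p>
        <preformat>
import numpy as np

pi = np.linspace(0.0, 10.0, 1001)   # benefit support [pi_low, pi_high] = [0, 10]
dpi = pi[1] - pi[0]

# Conditional densities: f(pi | e1) tilts mass toward high benefits, f(pi | e2) toward low.
f_e1 = pi / 50.0
f_e2 = (10.0 - pi) / 50.0

# Analytic CDFs of the two densities above.
F_e1 = pi**2 / 100.0
F_e2 = pi * (20.0 - pi) / 100.0

# First-order stochastic dominance: F(pi | e1) never exceeds F(pi | e2).
assert np.all(F_e2 - F_e1 >= -1e-12)

# Hence expected benefits are strictly higher under high alignment e1.
E_pi_e1 = np.sum(pi * f_e1) * dpi   # approx. 6.67
E_pi_e2 = np.sum(pi * f_e2) * dpi   # approx. 3.33
assert E_pi_e1 > E_pi_e2

# Developer utility u(w, e) = v(w) - g(e): v concave (risk-averse), g(e1) above g(e2).
v = np.sqrt
g = {"e1": 1.0, "e2": 0.4}
        </preformat>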
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The Optimal Contract with Observable Ethical Alignment Level</title>
        <p>Suppose the user offers a contract specifying the ethical alignment level e ∈ {e₁, e₂} and the system developer’s reward as a function of observed benefits, w(π). The system developer must receive an expected utility at least equal to ū, the reservation utility, if they accept the contract. If they reject it, they receive zero. The developer is assumed to find it worthwhile to align the AI system to the ethical objectives set by the contract.</p>
        <p>The user’s objective is to choose the optimal contract to maximize their expected benefits:</p>
        <p>Max over e ∈ {e₁, e₂} and w(π) of: ∫ (π − w(π)) f(π | e) dπ (4)</p>
        <p>subject to the participation constraint:</p>
        <p>∫ v(w(π)) f(π | e) dπ − g(e) ≥ ū (5)</p>
        <p>The constraint always binds at the solution: if it were slack, the user could lower the reward and the developer would still accept the contract. For a given alignment level e, choosing w(π) to minimize the user’s expected reward cost therefore reduces to: Min over w(π) of ∫ w(π) f(π | e) dπ, subject to constraint (5) holding with equality. (A.1)</p>
        <p>Let γ denote the multiplier on the constraint (5). The optimal reward scheme satisfies: −f(π | e) + γ · v′(w(π)) · f(π | e) = 0 (A.2), or equivalently: γ = 1 / v′(w(π)) for all π. (A.3)</p>
        <p>If the system developer is risk-averse (i.e., v′(w(π)) is decreasing), condition (A.3) can hold only if w(π) is a fixed amount, reflecting a risk-sharing result: the risk-neutral user insures the risk-averse developer by offering a fixed reward w* that satisfies: v(w*) − g(e) = ū. (A.4) Since g(e₁) &gt; g(e₂), it follows that w*₁ &gt; w*₂, meaning higher ethical alignment results in a higher reward. When the developer is risk-neutral (i.e., v(w) = w), a fixed reward is just one of many optimal schemes, provided the expected reward is ū + g(e).</p>
        <p>To determine the optimal e, the user selects the ethical alignment level e ∈ {e₁, e₂} that maximizes: ∫ π f(π | e) dπ − v⁻¹(ū + g(e)). (A.5) The first term represents the gross benefit from the ethically aligned AI system, while the second term represents the reward paid to the developer for the alignment effort. Whether e₁ or e₂ is optimal depends on the trade-off between the incremental benefits of e₁ over e₂ and the disutility imposed on the developer.</p>
        <p>Specifically, if the additional benefits from a higher level of ethical alignment under e₁ outweigh the increased cost and effort required from the developer, then e₁ becomes optimal: by (A.5), this is the case when ∫ π f(π | e₁) dπ − ∫ π f(π | e₂) dπ ≥ v⁻¹(ū + g(e₁)) − v⁻¹(ū + g(e₂)). However, if the marginal benefit is insufficient to cover the greater effort and resource demands of achieving a more stringent ethical standard, then e₂ may be the preferred choice. This balance reflects the fundamental tension between maximizing ethical outcomes and managing the practical limitations faced by developers, particularly in complex frameworks like Constitutional AI, where higher ethical alignment often requires significantly more effort, expertise, and oversight.</p>
        <p>Proposition 1. In the principal-agent model with observable ethical alignment, the optimal contract specifies the level of ethical alignment e* that maximizes the user’s net benefits. The system developer receives a fixed reward w* = v⁻¹(ū + g(e*)) if risk-averse. When the developer is risk-neutral, a fixed reward is one of many possible optimal reward schemes.</p>
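        <p>A minimal numerical sketch of Proposition 1 follows, reusing the illustrative numbers from the sketch above (square-root utility, so v⁻¹(u) = u²; all values are hypothetical rather than taken from the model). It computes the binding fixed rewards w*₁ and w*₂ and the user’s resulting choice of alignment level.</p>
        <preformat>
# Illustrative numbers only (not from the paper): expected benefits, effort costs,
# reservation utility, and the developer's concave utility v(w) = sqrt(w).
E_pi  = {"e1": 6.67, "e2": 3.33}    # E[pi | e], from the previous sketch
g     = {"e1": 1.0,  "e2": 0.4}     # disutility of alignment effort
u_bar = 1.0                          # reservation utility

def v_inv(u):
    """Inverse of v(w) = sqrt(w)."""
    return u ** 2

# Fixed reward making the participation constraint (A.4) bind: v(w*) = u_bar + g(e).
w_star = {e: v_inv(u_bar + g[e]) for e in E_pi}       # w*_1 = 4.00, w*_2 = 1.96

# The user then maximizes (A.5): expected benefit minus the fixed reward.
user_payoff = {e: E_pi[e] - w_star[e] for e in E_pi}
e_star = max(user_payoff, key=user_payoff.get)        # e1 for these numbers
print(w_star, user_payoff, e_star)
        </preformat>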
      </sec>
      <sec id="sec-2-3">
        <title>2.3. The Optimal Contract with Unobservable Ethical Alignment Level</title>
        <p>The optimal contract described in Proposition 1 achieves two objectives: it specifies an efficient level of ethical alignment and insures the system developer against reward risk. However, when the ethical alignment level e is not observable, these objectives conflict, as the developer’s pay must be tied to the uncertain benefits π to incentivize alignment. This generally leads to a welfare loss relative to full observability.</p>
        <p>Suppose the system developer is risk-neutral, so v(w) = w. Under full observability, the optimal alignment level e* solves:</p>
        <p>Max over e ∈ {e₁, e₂} of: ∫ π f(π | e) dπ − g(e) − ū (A.6)</p>
        <p>The user’s benefits are the value of expression (A.6), and the developer receives an expected utility of ū. When the developer’s effort is unobservable, Proposition 2 states that the user can still achieve the full-information payoff.</p>
        <p>Proposition 2. In the principal-agent model with unobservable ethical alignment and a risk-neutral system developer, an optimal contract results in the same ethical alignment level and expected utilities as under full observability.</p>
        <p>Proof. The user offers a contract w(π) = π − α, where α is a fixed payment (the “alignment price”). The developer chooses e to maximize his utility:</p>
        <p>Max over e ∈ {e₁, e₂} of: ∫ w(π) f(π | e) dπ − g(e) = ∫ π f(π | e) dπ − α − g(e) (A.7)</p>
        <p>Since (A.7) differs from the full-information objective (A.6) only by a constant, e* maximizes (A.7), and this contract induces the first-best alignment effort level e*.</p>
        <p>The developer accepts this contract if it provides at least ū in expected utility:</p>
        <p>∫ π f(π | e*) dπ − α − g(e*) ≥ ū (A.8)</p>
        <p>Let α* be the value of α at which (A.8) holds with equality. Rearranging:</p>
        <p>α* = ∫ π f(π | e*) dπ − g(e*) − ū (A.8.1)</p>
        <p>Thus, with w(π) = π − α*, both the user and the developer receive the same payoff as under full observability, with the user’s payoff being α*.</p>
        <p>The intuition behind Proposition 2 is straightforward. When the system developer is risk-neutral, the need for risk-sharing mechanisms is eliminated, allowing for more efficient incentives. In this case, the developer can be fully compensated based on the marginal returns of their effort in aligning the AI system with ethical principles, without incurring any risk-bearing losses. For example, in the context of Constitutional AI, this implies that a risk-neutral developer can focus solely on embedding ethical principles - such as those outlined in a "constitution" - without being deterred by the risks associated with uncertain outcomes. The user as a principal can therefore provide direct incentives to reward the developer’s effort in achieving higher levels of ethical alignment, leading to a more transparent and accountable AI system. Since the developer is indifferent to risk, the compensation structure can be fully aligned with the ethical objectives, enabling a smoother implementation of Constitutional AI without the need to factor in risk-related adjustments.</p>
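        <p>The proof of Proposition 2 can also be checked numerically. The sketch below, using the same hypothetical numbers as the previous sketches, computes the alignment price α* from (A.8.1) and verifies that, facing w(π) = π − α*, the developer chooses the first-best alignment level and is held exactly to the reservation utility ū.</p>
        <preformat>
# Illustrative numbers only, matching the previous sketches.
E_pi  = {"e1": 6.67, "e2": 3.33}    # E[pi | e]
g     = {"e1": 1.0,  "e2": 0.4}     # disutility of alignment effort
u_bar = 1.0                          # reservation utility

# First-best alignment level for a risk-neutral developer, from (A.6).
net  = {e: E_pi[e] - g[e] for e in E_pi}
e_fb = max(net, key=net.get)                          # e1 for these numbers

# "Alignment price" from (A.8.1): leaves the developer exactly u_bar.
alpha_star = E_pi[e_fb] - g[e_fb] - u_bar             # 4.67 here

# Facing w(pi) = pi - alpha*, the developer's expected utility from e is
# E[pi | e] - alpha* - g(e); the first-best level maximizes it by construction.
dev_utility = {e: E_pi[e] - alpha_star - g[e] for e in E_pi}
assert max(dev_utility, key=dev_utility.get) == e_fb  # incentive compatibility
assert round(dev_utility[e_fb] - u_bar, 9) == 0.0     # participation binds exactly
print(f"user payoff (alpha*) = {alpha_star:.2f}")     # full-information payoff
        </preformat>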
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>Our principal-agent model identifies the optimal reward scheme for system developers to align ethical
objectives under specific conditions. In the case where the ethical alignment level is unobservable
and the developer is risk-neutral, the optimal contract leads to the same ethical alignment choice and
expected utilities for both the developer and the user as if the ethical alignment level were observable.
When the ethical alignment level is observable, the optimal contract specifies a fixed reward for the
system developer. This is uniquely optimal if the developer is strictly risk-averse. However, if the
developer is risk-neutral, a fixed reward scheme is one of several possible optimal rewards. Furthermore,
if users desire high levels of ethical alignment in AI systems, they must offer greater compensation to
system developers, as higher ethical alignment comes with increased effort and costs for developers.
This trade-off is particularly relevant in practical scenarios where ethical considerations are paramount,
such as in Constitutional AI frameworks. In these cases, users play a key role in incentivizing developers
to achieve robust ethical standards by providing the necessary financial and contractual incentives.
Ultimately, the model highlights the importance of economic incentives in balancing responsibilities
between users and developers in the creation of ethical AI systems.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion &amp; Conclusion</title>
      <p>This research demonstrates that economic incentives play a crucial role in ensuring the ethical alignment
of AI systems through a reward scheme in a contract. Our findings emphasize that achieving higher
levels of ethical alignment, such as those seen in Constitutional AI, requires greater compensation
for system developers due to the increased effort, complexity, and resources involved. However, even
after developers align AI systems with ethical objectives, users share the responsibility of ensuring
these systems are deployed and utilized ethically. This includes adhering to the system’s design
specifications, regularly monitoring AI outputs and behavior to prevent deviations from ethical standards,
and complying with regulatory frameworks like the EU AI Act [9, 10]. If users identify unethical
outcomes - such as biased decisions - they must take corrective actions, whether by adjusting the
system’s parameters or collaborating with developers to address the issue. Ethical AI is a shared
responsibility, not solely resting on developers. Users must also maintain ongoing oversight to ensure
that AI continues to operate in alignment with ethical principles throughout its lifecycle, particularly
as it interacts with new environments and data.</p>
      <p>Our adaptation of the principal-agent model provides a theoretical framework that is both relevant
and applicable to current discussions on AI governance. By aligning economic incentives with ethical
outcomes, this model offers insights that can inform regulatory approaches, such as those proposed
in the EU AI Act, ensuring that system developers and users are both held accountable. This shared
responsibility can enhance compliance with ethical standards, particularly as the complexity of AI
systems increases.</p>
      <p>While our model offers valuable insights, it is important to acknowledge its limitations. For instance,
it assumes developers are fully rational and respond predictably to incentives, which may not always
hold true in practice. Additionally, the model does not account for other factors that could influence
ethical alignment, such as societal pressures or rapidly evolving technological landscapes.</p>
      <p>In conclusion, this research contributes to the ongoing dialogue about responsibility for ethical
AI and how it should be distributed between developers and users. By offering a concrete economic
model, it helps clarify how incentives can be structured to promote ethical AI while ensuring that all
stakeholders - developers, users, and regulators - actively maintain ethical standards. This is critical for
building trust and accountability as AI becomes increasingly integrated into society.</p>
      <p>Future research should explore more complex and dynamic incentive structures, including multiple
principals, as well as ways to incorporate factors such as societal pressures, evolving regulations, and
technological advancements like AI’s increasing autonomy and adaptability into the framework.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The authors thank Maria Bigoni and Nicola Meccheri for useful comments, and acknowledge support
from the project "Teaming-up with social artificial agents" funded by the Italian Ministry of Education,
University and Research under the Program for Research Projects of National Interest (PRIN), grant no.
2022ALBSWX.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Musto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pellungrini</surname>
          </string-name>
          , E. Purificato, G. Semeraro,
          <string-name>
            <given-names>M.</given-names>
            <surname>Setzu</surname>
          </string-name>
          ,
          <article-title>XAI.it 2024: An Overview on the Future of Explainable AI in the era of Large Language Models</article-title>
          ,
          <source>in: Proceedings of the 5th Italian Workshop on Explainable Artificial Intelligence, co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence</source>
          , Bolzano, Italy, November 25-28,
          <year>2024</year>
          , CEUR.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hadfield-Menell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Hadfield</surname>
          </string-name>
          ,
          <article-title>Incomplete contracting and ai alignment</article-title>
          ,
          <source>Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES '19)</source>
          (
          <year>2019</year>
          )
          <fpage>417</fpage>
          -
          <lpage>422</lpage>
          . doi:10.1145/3306618.3314250.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shavit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brundage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>O'Keefe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Eloundou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hickey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Slama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McMillan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beutel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <article-title>Practices for governing agentic ai systems</article-title>
          ,
          <year>2023</year>
          . URL: https://openai.com/index/practices-for-governing-agentic-ai-systems/, last accessed 2024/07/25.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Phelps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Ranson</surname>
          </string-name>
          ,
          <article-title>Of models and tin men - a behavioural economics study of principal-agent problems in ai alignment using large-language models</article-title>
          ,
          <source>ArXiv abs/2307.11137</source>
          (
          <year>2023</year>
          ). doi:10.48550/arXiv.2307.11137.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hadfield-Menell</surname>
          </string-name>
          ,
          <article-title>Consequences of misaligned ai</article-title>
          ,
          <source>in: Thirty-Fourth International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . doi:10.48550/arXiv.2102.03896.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kadavath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kundu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kernion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goldie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mirhoseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>McKinnon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Drain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tran-Johnson</surname>
          </string-name>
          , E. Perez,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kerr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ladish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Landau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ndousse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lukosuite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lovitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sellitto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Elhage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schiefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mercado</surname>
          </string-name>
          , N. DasSarma, R. Lasenby,
          <string-name>
            <given-names>R.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Johnston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kravec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Showk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lanham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Telleen-Lawton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Conerly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hatfield-Dodds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          , J. Kaplan,
          <article-title>Constitutional ai: Harmlessness from ai feedback</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2212.08073.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>