<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Political Bias in Large Language Models: A Case Study on the 2025 German Federal Election</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Buket Kurtulus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Kruspe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Munich University of Applied Sciences</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>With the increased use of Large Language Models (LLMs) to generate responses to social and political topics, concerns about potential bias have grown. The output of these models can influence social behavior and public discourse, and may affect democratic processes such as national elections. This study evaluates the political alignment of three LLMs (ChatGPT, Grok, and DeepSeek) using the 2025 German Federal Election Wahl-O-Mat as a framework. By comparing model responses to 38 political statements with the official positions of German parties, we assess how different systems align with political identities across the ideological spectrum. We also explore the theoretical foundations of political bias in LLMs, focusing on how prompt language and model characteristics (e.g., scale and regional origin) may influence ideological alignment, and examine relevant ethical considerations. The results reveal a consistent left-leaning tendency across all models, with minimal alignment with far-right positions, largely independent of prompt language. By combining empirical findings with existing theoretical perspectives, this work contributes to a deeper understanding of political bias in LLMs and highlights the importance of transparency in their public use.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>Political Bias</kwd>
        <kwd>Algorithmic bias</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial intelligence (AI) systems increasingly mediate how citizens access and discuss political
information, raising both ethical and technical concerns about whose perspectives these systems surface
and privilege [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. Among these systems, Large Language Models (LLMs) have become ubiquitous
tools for drafting, summarizing, and answering open-ended questions in public-facing settings [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ].
Usage is massive while verification is sporadic: recent figures indicate rapid growth in interactions and
comparatively low rates of fact-checking among German users, echoing worries about over-trust in
model outputs [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ]. If LLMs manifest systematic political leanings, such scale can subtly shape issue
salience and party perceptions.
      </p>
      <p>
        Emerging studies report detectable ideological tendencies in several LLMs, often with liberal or
left-libertarian patterns [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, most evaluations are US-centric; moreover, many analyses emphasize
high-level model mechanics rather than measurement design in an electoral context. We address this
gap by auditing three widely used systems - ChatGPT, Grok, and DeepSeek - against a standard German
voting aid.
      </p>
      <p>Our study uses the 38 statements of the Wahl-O-Mat, a widely used voter decision aid, for the 2025
federal election as a nationally grounded instrument. Each statement is posed in German and English
with a constrained response set (Agree/Neutral/Disagree); we aggregate 100 stochastic runs per item
and compute agreement with official party positions. We visualize response structure (heatmaps, PCA),
examine refusal and variance patterns, and add a concise primer on Germany’s political system to aid
interpretation for non-specialists.</p>
      <p>
        Contributions. (i) A Germany-focused, bilingual audit protocol using a civic, election-proximal
instrument; (ii) a comparative evaluation of ChatGPT, Grok, and DeepSeek with simple, reproducible
agreement metrics; (iii) analysis of refusal behavior and within-item variability; and (iv) an
ethics-oriented discussion of transparency and public use. We keep model-mechanics exposition minimal,
pointing to prior work for background [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ], and release code and prompts for replication.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>This section outlines the theoretical background for understanding how bias arises in large language
models (LLMs). We first give a short overview of LLM architecture and training, then discuss how bias
can emerge—with emphasis on political bias. We also include a brief primer on the German political
system to frame our results.</p>
      <sec id="sec-2-1">
        <title>2.1. Large Language Models</title>
        <p>
          Artificial Intelligence (AI) refers to systems that perform tasks associated with human intelligence, such
as reasoning, learning, and language understanding. The term dates to the Dartmouth Conference of
1956, but recent advances have brought AI into broad public use [
          <xref ref-type="bibr" rid="ref1 ref8 ref9">1, 8, 9</xref>
          ]. One prominent development
is Generative AI (GenAI), which can create new content (text, images, code). LLMs are a subset trained
to understand and produce human-like text; examples include Copilot and GPT-5. They support tasks
such as answering questions, translating languages, and generating code with high fluency [
          <xref ref-type="bibr" rid="ref1 ref8 ref9">1, 8, 9</xref>
          ].
        </p>
        <p>
          LLMs are typically built on the transformer architecture, which processes input by attending to
diferent parts of the sequence and then generates output token by token. Functionally, models encode
the input into high-dimensional representations that capture semantic and contextual relations, and
then decode to produce text. Output quality depends on multiple factors, including prompt formulation,
decoding strategy, model hyperparameters, and—crucially—the scope and composition of training and
fine-tuning data [
          <xref ref-type="bibr" rid="ref1 ref9 ref10">1, 9, 10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-1b">
        <title>2.2. Bias</title>
        <p>
          As LLMs become integrated into public-facing applications, embedded biases raise concerns about
societal and political impact [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Bias can originate from several sources common to machine learning
systems. Because algorithms are developed by humans and trained on historical data, they may reflect
and amplify existing patterns [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Training corpora drawn from search engines, social media, and
digitized texts often contain prejudices and imbalances; models trained on such data tend to reproduce
them [
          <xref ref-type="bibr" rid="ref1 ref11">1, 11</xref>
          ]. Analyses of recent models (including GPT-4) show that LLMs frequently replicate biases
present in their data [
          <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15">12, 13, 14, 15</xref>
          ]. Studies further emphasize that political and social biases are shaped
not only by data but also by modeling choices and optimization procedures [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          Beyond data and algorithms, deployment context and ethical oversight matter. Design
decisions—especially during fine-tuning—can unintentionally encode particular normative viewpoints
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Depending on the environment or public-service setting, outputs may align with prevailing
narratives or amplify specific perspectives [
          <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
          ]. Without robust governance, transparency in development,
and ongoing bias audits, LLMs risk deepening inequalities and undermining public trust.
        </p>
        <p>
          In short, bias in LLMs is not simply a training-data problem but the result of intersecting factors:
data, design, context, and governance [
          <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
          ]. Mitigation therefore combines technical and organizational
measures. On the technical side, strategies include regular bias audits, periodic updates, and adversarial
or counterfactual training to surface and correct unwanted behaviors [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. For instance, [
          <xref ref-type="bibr" rid="ref18 ref19 ref20">18, 19, 20</xref>
          ]
propose multilayered approaches that integrate audits, transparency reports, and debiasing algorithms.
On the organizational side, transparent documentation of data and model behavior, along with clear
governance frameworks, is essential [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Political Bias</title>
        <p>
          Political bias is a specific form of model bias: a systematic tendency to favor certain ideological positions
in outputs. Evidence of such tendencies has been reported for several LLMs (e.g., group-related biases
in GPT-3; centrist tendencies in Google Gemini) and can be reinforced by user interaction, especially
when prompts introduce ideologically charged framing that the model mirrors [
          <xref ref-type="bibr" rid="ref1 ref13 ref21">1, 13, 21</xref>
          ].
        </p>
        <p>
          In democratic contexts, political bias is particularly consequential. Algorithmic predictions and
generated content can reflect the interests or perspectives of those who design or deploy the systems
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. As LLMs are increasingly used in public-facing settings, their potential to shape opinion, influence
behavior, and affect electoral processes raises serious ethical concerns, including the spread of
misinformation and propaganda [
          <xref ref-type="bibr" rid="ref1 ref22 ref23">1, 22, 23</xref>
          ]. More broadly, biased algorithms can intensify social injustices and
erode democratic norms, underscoring the need for accountability and ethically grounded responses
[
          <xref ref-type="bibr" rid="ref16 ref24">16, 24</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. German Political System and Parties</title>
        <p>Germany is a federal parliamentary democracy. The Bundestag (federal parliament) is elected under
mixed-member proportional representation: voters cast one constituency vote for a district candidate
and one party-list vote that determines proportional seat shares. A 5% national threshold (or sufficient
direct mandates) is required for list representation. Coalition governments are typical. Competences
are shared between federation and states (Länder); the Basic Law (Grundgesetz) is the constitution.
Nationwide referendums are not a standard federal instrument.</p>
        <sec id="sec-2-3-1">
          <title>Major parties (alphabetical).</title>
          <p>AfD (Alternative für Deutschland): national-conservative/right-wing populist; positions emphasize
national sovereignty, restrictive migration policy, and skepticism toward aspects of EU integration and
climate policy.</p>
          <p>Bündnis 90/Die Grünen: ecological and progressive; prioritize climate protection, social liberalism,
and European integration. (In graphs, we shorten the name to GRUENE).</p>
          <p>CDU/CSU (Christian Democrats/Christian Social Union): center-right Christian-democratic alliance;
social market economy, incremental climate policy, broadly pro-EU; CSU operates only in Bavaria.</p>
          <p>Die Linke: democratic-socialist; redistribution, public services, social rights; generally critical of military
engagements.</p>
          <p>FDP (Free Democrats): classical-liberal; market-oriented reforms, individual liberties, fiscal restraint,
pro-competition.</p>
          <p>SPD (Social Democrats): center-left; welfare-state orientation, labor rights, negotiated socio-ecological
transition, pro-EU.</p>
          <p>Reading alignments. Under proportional representation and routine coalition-building, parties
tend to align along two broad axes: (i) economic policy (redistribution vs. market liberalism) and
(ii) socio-cultural policy (liberal–cosmopolitan vs. conservative–sovereigntist), with an additional
European integration dimension. We use these coarse orientations to interpret agreement patterns and
low-dimensional structure in our results.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>With the growing influence of LLMs, understanding political bias has become a critical research area. We
summarize three case studies that examine political bias in LLMs from complementary methodological
perspectives.</p>
      <p>
        Rettenberger et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] investigate political bias in open-source LLMs (LLaMA-2/3, Mistral-7B) for the
2024 European Parliament election from a German voter’s perspective using the Wahl-O-Mat in German
and English. They constrain outputs to single-word labels (“Ja/Neutral/Nein”) via an End-of-Input
prompt to suppress evasions/refusals and analyze inter-model variability with Kruskal–Wallis tests and
post hoc Dunn comparisons. Larger models (e.g., LLaMA3-70B) align more with left-leaning parties such
as Bündnis 90/Die Grünen, Die Linke, and Volt, with consistently low agreement for the AfD; German
prompts elicit clearer stances than English. Our study complements this by shifting to the 2025 federal
party set and auditing closed-source and non-Western systems (ChatGPT, Grok, DeepSeek); instead of
suppressing abstention, we measure neutrality and refusals explicitly (counting refusals as mismatches),
run 100 stochastic repetitions per item, and examine response structure via PCA.
      </p>
      <p>
        Choudhary [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] presents a comprehensive analysis of political bias in four popular LLMs: ChatGPT-4,
Perplexity, Google Gemini, and Claude. The authors combine quantitative and qualitative methods,
testing models with three political typology tools—the Pew Research Center’s Political Typology Quiz,
the Political Compass Assessment, and the ISideWith Political Party Quiz. Each model is prompted
with the same questions; responses are standardized and placed on an ideological scale ranging from
“strongly conservative” (Faith and Flag Conservatives) to “strongly liberal” (Progressive Left).
ChatGPT-4 consistently displays liberal tendencies, particularly on social and economic issues, and is classified
on Pew as “Establishment Liberal,” a group comprising 13% of the US public (Figure 1). Perplexity also
leans left overall but shows more conservative tendencies on selected issues, leading to a categorization
as “Outsider Left” (10%). In contrast, Claude and Google Gemini are more centrist, adopting neutral or
moderate stances.
      </p>
      <p>
        Yang et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] conduct a large-scale study evaluating 43 LLMs from 19 model families across four
regions (US, Europe, Asia, Middle East) to assess political bias on a US-centric testing ground while
accounting for model characteristics such as scale, release date, and geographic origin. The corpus spans
open- and closed-source models of various sizes. The authors select 32 politically themed questions
from the American National Election Studies (ANES) and the 2024 Pew Research Center survey, grouped
into eight topics: four highly polarized (presidential elections, abortion, immigration, issue ownership)
and four less polarized (climate change, misinformation, discrimination, foreign policy). A two-step
prompting framework elicits answers to sensitive questions while navigating safety filters. Responses
are analyzed via a preference-scoring scheme (positive = Democratic-leaning; negative =
Republican-leaning). Most models exhibit a left-leaning bias, especially on highly polarized topics. Using the 2024
US presidential election as a benchmark, 76% of models express a stronger preference for the Democratic
candidates (Joe Biden or Kamala Harris), with 35% consistently favoring them. Bias is less distinct on
less polarized topics.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <sec id="sec-4-1">
        <title>4.1. Wahl-O-Mat</title>
        <p>
          The Wahl-O-Mat is a digital tool that helps voters in Germany evaluate how well political parties align
with their personal views. It is typically released ahead of significant elections, such as the 2025 federal
election. Users are presented with 38 political statements spanning topics such as energy, environmental,
and migration policy [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. For each statement, they indicate whether they agree, disagree, or are neutral.
Figure 2 provides a visual example of the interface for readers unfamiliar with the Wahl-O-Mat. After
answering all items, users may prioritize selected statements, which then receive double weight.
        </p>
        <p>
          Once the questionnaire is completed (including any prioritization), the system compares answers
with official party positions and returns percentage match scores for each party. Developed by the
Federal Agency for Civic Education (Bundeszentrale für politische Bildung, bpb), the Wahl-O-Mat is
widely used and regarded as a key voter-information tool; for the most recent federal election it was
accessed over 26 million times [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Large Language Models</title>
        <p>We evaluate three LLMs to enable a geographically and structurally diverse comparison: OpenAI’s
ChatGPT, xAI’s Grok, and DeepSeek. These systems differ in training origin, scale, and integration into
public platforms.</p>
        <p>
          ChatGPT (gpt-3.5-turbo). Selected due to its widespread use and global influence. Developed by
OpenAI, it is primarily trained on English-language data and is widely deployed in consumer and
enterprise settings. We accessed the model via the official OpenAI API [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
        <p>DeepSeek (deepseek-chat). Developed in China, representing a newer presence in the global LLM
landscape and of interest due to a distinct linguistic and sociopolitical development context. We accessed
the model via its official API [28].</p>
        <p>Grok (grok-3-mini). Developed by xAI, closely integrated with the social media platform X (formerly
Twitter) and frequently described as conversational in style. xAI leadership has stated that they were
aiming to make the model less “woke”, i.e. more conservative [29]. We accessed the model through the
official xAI platform [30].</p>
        <p>Together, these models provide a diverse basis for examining potential political bias.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experimental Setup</title>
        <p>To assess potential political bias, we constructed a structured evaluation pipeline using the Wahl-O-Mat
statements as input. The original items (published in German) were translated into English by the
author to enable cross-linguistic comparison; the full list appears in Appendix A. Each statement was
inserted into a standardized prompt template designed to minimize ambiguity and enforce a uniform
response format (agree/neutral/disagree).</p>
        <p>We queried all three models in both German and English. To account for stochasticity, each model was
prompted 100 times per question. Although the temperature parameter was set to 0, repeated runs were
conducted to capture nondeterminism. The choice of 100 repetitions balances computational feasibility
with the need to capture variation in model behavior, allowing us to observe response consistency
across repeated prompts.</p>
        <p>For quantitative analysis, textual outputs were mapped to numeric codes: disagree=0, neutral=1,
agree=2. Data processing and analysis were conducted in Jupyter notebooks. This pipeline supports
multiple comparisons, including agreement scoring against party positions and Principal Component
Analysis (PCA) to explore ideological proximity and language effects.</p>
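        <p>The label-to-code mapping described above can be sketched as follows; this is an illustrative reconstruction rather than the study’s released code, and the exact string normalization is our assumption:</p>

```python
# Map a raw model reply to the numeric codes used in the analysis:
# disagree = 0, neutral = 1, agree = 2; unrecognized output counts as a refusal (-1).
# The normalization rules (lowercasing, stripping a trailing period) are assumptions.
LABELS = {
    "disagree": 0, "stimme nicht zu": 0,
    "neutral": 1,
    "agree": 2, "stimme zu": 2,
}

def encode_response(reply: str) -> int:
    """Return the numeric code for one model reply, or -1 for refusals/noise."""
    text = reply.strip().lower().rstrip(".")
    return LABELS.get(text, -1)

print(encode_response("Stimme nicht zu"))  # 0
print(encode_response("Neutral."))         # 1
print(encode_response("I cannot answer"))  # -1
```

        <p>Both the German and the English labels map onto the same codes, which makes the cross-language comparison direct.</p>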
        <p>To ensure consistency, all models received the same structured prompt in each language. Because
the official Wahl-O-Mat is formulated in German and our study adopts a German voter perspective, we
evaluated the original German phrasing; given that many LLMs are primarily trained and fine-tuned on
English, we repeated the full evaluation in English to probe potential language effects. Thus, all models
were evaluated in both German and English.</p>
        <p>The system prompts were as follows:</p>
        <p>German prompt:</p>
        <p>Bitte gib deine Haltung zu folgender Aussage an. Wähle nur eine der folgenden Optionen:
Stimme zu, Neutral, Stimme nicht zu.</p>
        <p>Antworte bitte nur mit einer der Optionen.</p>
        <p>Aussage: {statement}</p>
        <p>English prompt:</p>
        <p>Please indicate your preference regarding the following statement. Choose one of the
following options: Agree, Neutral, Disagree.</p>
        <p>Please respond with only one of the options.</p>
        <p>Statement: {statement}</p>
        <p>Each prompt was populated with one of the 38 Wahl-O-Mat statements (denoted as statement).</p>
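        <p>The repeated-prompting loop can be sketched as below; <code>query_model</code> is a stand-in for the respective vendor API call (stubbed here), and the statement text is a placeholder rather than an actual Wahl-O-Mat item:</p>

```python
# Repeated prompting: 100 runs per statement at temperature 0, since even
# nominally deterministic settings showed residual nondeterminism.
PROMPT_EN = (
    "Please indicate your preference regarding the following statement. "
    "Choose one of the following options: Agree, Neutral, Disagree.\n\n"
    "Please respond with only one of the options.\n\n"
    "Statement: {statement}"
)

def query_model(prompt: str) -> str:
    # Stub: a real run would call the OpenAI/xAI/DeepSeek API here.
    return "Neutral"

def collect_runs(statements, n_runs=100):
    """Collect the raw replies for each statement over n_runs repetitions."""
    results = {}
    for statement in statements:
        prompt = PROMPT_EN.format(statement=statement)
        results[statement] = [query_model(prompt) for _ in range(n_runs)]
    return results

runs = collect_runs(["<example statement>"], n_runs=3)
print(len(runs["<example statement>"]))  # 3
```

        <p>The German run uses the same loop with the German template; only the prompt text changes.</p>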
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Evaluation</title>
        <p>We compared model-generated responses with official party positions obtained from the Wahl-O-Mat.
All evaluation steps were conducted separately for German and English outputs.</p>
        <sec id="sec-4-4-1">
          <title>4.4.1. Response Aggregation</title>
          <p>As noted above, each model was prompted 100 times per question. Although this repetition count is
not derived from formal statistical criteria, it offers a practical trade-off between variability capture and
runtime cost. The final score for each question is the mean of the numeric encodings over the 100 runs.
This produces a continuous value between 0 and 2 that reflects the model's average tendency toward
agreement. This representation serves as a descriptive aggregate of categorical outputs rather than
assuming a strictly ordinal scale: 0 = Disagree, 1 = Neutral, 2 = Agree.</p>
          <p>In cases of persistent refusals or nonsensical outputs, responses were encoded as -1.
Agreement is computed over all 38 items; refusals (-1) are treated as mismatches.</p>
          <p>In contrast to some prior studies reporting stronger language effects, our models produced similar
aggregates in German and English. This may indicate cross-linguistic robustness; alternatively, it could
reflect limitations in prompt design or sensitivity of the evaluation method.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>4.4.2. Further Analysis</title>
          <p>Beyond agreement scores, we ranked each LLM by its highest-matching party to provide a concise
orientation signal. To explore broader structure, we applied PCA to model and party response vectors.
Finally, we identified potentially controversial topics by examining per-question variability (standard
deviation across runs) and refusal frequency, flagging items with unusually high disagreement or
non-answers.</p>
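          <p>The PCA step can be sketched with a plain SVD; rows are actors (models and parties), columns the 38 items, and the small matrix below is purely illustrative:</p>

```python
import numpy as np

def pca_2d(X):
    """Project each row of X onto the first two principal components.
    Columns (items) are mean-centered first, as is standard for PCA."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Toy example: 3 actors x 5 items with stances encoded 0/1/2.
X = np.array([[2, 2, 0, 1, 2],
              [2, 1, 0, 1, 2],
              [0, 0, 2, 2, 0]], dtype=float)
coords = pca_2d(X)
print(coords.shape)  # (3, 2)
```

          <p>Plotting these two coordinates for all models and parties produces the arrangement shown in Figure 6.</p>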
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Party agreement</title>
        <p>For visual inspection, we assemble model outputs into a matrix and color-code labels (red = Disagree,
yellow = Neutral, green = Agree; where present, -1 refusals appear in gray). Figure 3 shows model
responses; Figure 4 shows party responses.</p>
        <p>Figure 5 reports agreement scores between LLMs and parties for English- and German-prompted
outputs. Unless noted, agreement is computed over all 38 items and refusals (-1) count as mismatches.
A consistent pattern emerges across all systems: the lowest alignment is always with AfD (about 0.11 in
English—roughly 4/38 matches—and about 0.26 in German), while the highest cells are with Bündnis
90/Die Grünen and SPD. At the same time, the top–second gaps are modest, indicating leaning rather
than strong partisanship.</p>
        <p>English prompts. ChatGPT aligns most with Bündnis 90/Die Grünen and Die Linke (both 0.58), SPD
at 0.55, and lowest with AfD (0.11). DeepSeek shows SPD 0.42, Bündnis 90/Die Grünen 0.39, Die Linke
0.37, AfD 0.11. Grok has the strongest English-only alignments overall—Bündnis 90/Die Grünen 0.66,
SPD 0.63—with a comparatively higher CDU/CSU score (0.37) than the other English runs, and AfD
again at 0.11.</p>
        <p>German prompts. Prompting in German produces a clear, broad-based uplift across all parties.
ChatGPT : Bündnis 90/Die Grünen 0.58→0.84, SPD 0.55→0.76, CDU/CSU 0.26→0.45, FDP 0.34→0.45,
AfD 0.11→0.26, Die Linke 0.58→0.66. DeepSeek: SPD 0.42→0.71, Bündnis 90/Die Grünen 0.39→0.63,
CDU/CSU 0.24→0.50, Die Linke 0.37→0.61, AfD 0.11→0.26 (with FDP rising slightly: 0.26→0.29). This
supports the language-effects observation: German anchoring increases apparent agreement and moves
models closer to mainstream party positions, not only to left-leaning ones.</p>
        <p>Selectivity. Orientation is present but not highly selective. For example, ChatGPT (German): Bündnis
90/Die Grünen 0.84 vs. SPD 0.76 (∆ = 0.08); DeepSeek (German): SPD 0.71 vs. Bündnis 90/Die Grünen 0.63
(∆ = 0.08); Grok (English): Bündnis 90/Die Grünen 0.66 vs. SPD 0.63. These small separations indicate
systematic leaning rather than strong partisan alignment.</p>
        <p>Hedging and refusals. The model heatmap contains more yellow cells than the party heatmap,
consistent with greater use of Neutral. Refusals (-1) are concentrated in Grok-English on a small set
of items (visible as gray cells), which helps explain slightly lower agreements for Grok when refusals
count as mismatches.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Dimensionality reduction</title>
        <p>Figure 6 shows a two-dimensional PCA over all 38 questions for models and parties. The arrangement
corroborates the agreement analysis while revealing additional structure. Bündnis 90/Die Grünen,
SPD, and Die Linke cluster together; CDU/CSU and AfD lie apart along the principal spectrum, and
the economically liberal FDP is clearly isolated. All LLM points lie closest to the center-left cluster
but occupy a distinct sector—consistent with their higher neutral/abstention rates and suggesting a
secondary dimension (e.g., caution/safety or “consensus” tendency) that separates models from parties.
The German-prompted runs shift closer to the party manifold, in line with the agreement uplift noted above.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Controversial questions</title>
      <p>Finally, we analyzed which of the 38 statements tended to elicit empty, nonsensical, or refused
answers, and which ones produced high disagreement across runs. The full list of statements appears in
Appendix A.</p>
      <p>Grok is the only model with a notable number of non-answers and also shows the highest within-item
disagreement. DeepSeek exhibits the most consistent behavior, with no pronounced standard deviations
across runs. ChatGPT shows elevated variance for a single question in English and for four questions
in German.</p>
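      <p>The per-item screening behind these observations (Section 4.4.2) combines two statistics: run-to-run standard deviation and refusal count. A minimal sketch with toy data echoing the patterns above; the cutoffs are illustrative choices, not the study's exact thresholds:</p>

```python
import statistics

def flag_controversial(runs, std_cutoff=0.5, refusal_cutoff=5):
    """Flag items whose runs show high variability or frequent refusals (-1).
    Both cutoffs are illustrative."""
    flagged = []
    for item, codes in runs.items():
        refusals = codes.count(-1)
        answered = [c for c in codes if c != -1]
        spread = statistics.pstdev(answered) if len(answered) > 1 else 0.0
        if spread > std_cutoff or refusals >= refusal_cutoff:
            flagged.append(item)
    return flagged

runs = {
    "Q10": [-1] * 60 + [1] * 40,   # mostly refusals
    "Q23": [0] * 50 + [2] * 50,    # high run-to-run disagreement
    "Q26": [2] * 100,              # perfectly stable
}
print(flag_controversial(runs))  # ['Q10', 'Q23']
```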
      <p>There is no single statement that consistently causes problems across all models. For ChatGPT, the
more controversial items were Q5 (English) and Q14, Q23, Q21, and Q10 (German). Two of these concern
asylum policy, a central topic in recent German debates; one addresses the Basic Law (constitution)
and its religious invocation. Grok mainly returned non-answers for Q10 (religion in the constitution),
Q23 (asylum seekers), and Q31 (strike rights). Questions leading to high standard deviation for Grok
involved similar themes as well as fiscal topics (tariffs, debt, taxes, pensions, student aid). Notably, some
highly debated public issues (e.g., Q26 on abortion rights) did not appear particularly difficult for the
models in our setup.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Ethical Implications</title>
      <p>
        The growing use of LLMs in politically sensitive contexts presents significant opportunities alongside
nontrivial risks. On the opportunity side, LLMs can support political communication and
decision-making by surfacing public concerns, tracking shifts in sentiment and ideology, and enabling new
perspectives on historical and political texts [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. When audited and deployed transparently, such tools
can broaden access to information and help citizens compare arguments efficiently.
      </p>
      <p>
        At the same time, ethical challenges are substantial. As discussed in Section 2.2, biases present in
training data and design pipelines can reappear in outputs, reinforcing stereotypes and inequalities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Empirical evidence underscores this risk: for example, GPT-3 associated “Muslim” with “terrorist” in
23% of test cases and “Jewish” with “money” in 5% [31]. Without critical scrutiny, such associations can
propagate harmful narratives through apparently neutral, authoritative prose.
      </p>
      <p>
        A second concern is susceptibility to manipulation and over-trust. Survey data indicate that a large
share of German users report trusting LLM outputs without systematic verification [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Coupled with
model hallucinations and latent leanings, this can shift issue salience or party perceptions in subtle ways.
Experimental work further shows that LLMs can outperform incentivized humans at persuasion in
both truthful and deceptive settings [32], heightening the risk of misinformation and targeted influence
campaigns, particularly around elections [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Privacy and data protection risks also arise. Training and adaptation rely on large-scale corpora
that may include personal or sensitive information; inference-time interactions can reveal political
preferences or profiles. Absent strong governance, this creates potential for unauthorized collection,
leakage, or downstream misuse.</p>
      <p>
        Design directions and safeguards. To mitigate these risks while preserving utility, we outline
practical steps consistent with prior work on accountability and domain adaptation [
        <xref ref-type="bibr" rid="ref1 ref16 ref17 ref24">1, 16, 17, 24</xref>
        ]:
• Transparent audits and documentation. Publish evaluation protocols, datasets (or data
statements), model versions, and known limitations, including language-specific performance notes
and refusal/uncertainty behavior.
• Locale-specific testing. Use nationally grounded instruments (e.g., the Wahl-O-Mat) to audit
stance patterns; report agreement by topic, language, and model, and disclose instability across reruns.
• Multi-model and multi-view presentation. Where feasible, show answers from several systems
side by side or present pro/con rationales, to avoid a single authoritative “voice.”
• Calibrated abstention and uncertainty. Prefer an explicit “cannot answer” with a brief rationale
over confident speculation; surface uncertainty bands (e.g., variability across runs) in user-facing
summaries.
• Democratic safeguards during elections. Time-bound guardrails (rate limits, content provenance
indicators, stricter verification for claims, heightened monitoring of coordinated prompts) reduce
campaign-period risks.
• Privacy-by-design. Minimize retention of interaction logs; avoid storing political inferences;
provide clear user controls and data-use disclosures.
• Governance and oversight. Establish internal review for prompt/policy changes, schedule
periodic re-evaluations, and enable external scrutiny through red-team exercises and bug-bounty-style
reporting.
      </p>
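      <p>
        The “calibrated abstention and uncertainty” point admits a minimal sketch: score each statement by the share of repeated stochastic runs that deviate from the modal answer, and surface that instability alongside the answer itself. The data and helper below are illustrative assumptions, not the study’s pipeline.

```python
# Minimal sketch (hypothetical data): per-statement instability across reruns.
from collections import Counter

# Five hypothetical reruns per Wahl-O-Mat statement; answers are
# "agree", "disagree", or "neutral".
runs = {
    "Q1": ["agree", "agree", "agree", "agree", "agree"],
    "Q2": ["agree", "neutral", "agree", "disagree", "agree"],
    "Q3": ["neutral", "neutral", "neutral", "neutral", "neutral"],
}

def instability(answers):
    """Share of reruns deviating from the modal answer (0.0 = fully stable)."""
    counts = Counter(answers)
    modal_share = counts.most_common(1)[0][1] / len(answers)
    return round(1.0 - modal_share, 2)

report = {q: instability(a) for q, a in runs.items()}
# Q1 and Q3 are stable (0.0); Q2 flips in 2 of 5 reruns (0.4).
```
      </p>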
      <p>
        In sum, these ethical risks highlight why transparency and multi-model safeguards are essential, as
they directly address patterns observed in our analysis. LLMs can support an informed public sphere,
but only when paired with ongoing auditing and privacy-aware governance. Absent these measures,
systematic biases, persuasive capabilities, and user over-trust combine to produce disproportionate
harms, particularly for marginalized groups and during high-stakes democratic events [
        <xref ref-type="bibr" rid="ref1 ref16 ref24">1, 33, 24, 16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>8. Discussion</title>
      <p>Our audit of three widely used LLMs on the 2025 Wahl-O-Mat shows a consistent left-leaning
orientation with the weakest alignment to far-right positions. The pattern is qualitatively stable across
languages, while German prompts systematically nudge models closer to party positions. Selectivity
remains modest (orientation rather than hard partisanship), and the models make frequent use of the
Neutral option. Refusals and higher within-item variance cluster on a narrow set of themes: constitutional
symbolism (religious invocation), asylum/migration, and strike rights. This concentrates caution where
public debate in Germany is most polarized, while other high-salience issues pose fewer difficulties in
our setup. PCA complements the agreement tables: parties arrange along familiar axes, while models sit
in a distinct nearby sector rather than inside any party cluster. We read this as a secondary
“caution/consensus” dimension (greater neutrality/abstention and safety-driven hedging) that is separable
from the classic left–right and market–redistribution spectra.</p>
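      <p>
        The PCA reading above can be illustrated with a dependency-free sketch: center a stance matrix (rows = actors, columns = statements; +1 agree, 0 neutral, -1 disagree), extract the first principal axis by power iteration, and compare projections. The matrix below is toy data, not the audit results.

```python
# Minimal sketch: first principal axis of centered stance vectors via power
# iteration on X^T X (pure Python). All stances below are hypothetical.

def transpose(m):
    return [list(col) for col in zip(*m)]

def center(m):
    """Subtract the column (per-statement) mean from every entry."""
    means = [sum(col) / len(col) for col in transpose(m)]
    return [[x - mu for x, mu in zip(row, means)] for row in m]

def principal_axis(m, iters=100):
    """Leading eigenvector of X^T X, approximated by power iteration."""
    cols = transpose(m)
    v = [1.0] * len(cols)
    for _ in range(iters):
        xv = [sum(a * b for a, b in zip(row, v)) for row in m]    # X v
        w = [sum(a * b for a, b in zip(col, xv)) for col in cols] # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

stances = [
    [ 1,  1, -1,  1],   # hypothetical left-leaning party
    [ 1,  0, -1,  1],   # hypothetical model: similar stances, more neutral
    [-1, -1,  1, -1],   # hypothetical right-leaning party
]
X = center(stances)
axis = principal_axis(X)
coords = [sum(a * b for a, b in zip(row, axis)) for row in X]
# On this axis the model projects close to the left party and far from the
# right party, while its extra neutrality keeps it outside the party cluster.
```
      </p>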
      <p>
        Origin-sensitive divergence (Q26). For Q26 (abortion), the models converge on positions that
track Anglophone/U.S. discourse more closely than prevailing German party and public stances [34].
We view this less as an issue-specific anomaly and more as a provenance effect: training corpora and
preference-tuning pipelines are heavily Anglophone, and moderation norms are often calibrated to U.S.
cultural/legal baselines. In line with prior evidence on geography- and culture-linked biases in LLMs
[
        <xref ref-type="bibr" rid="ref1 ref14">14, 35, 1</xref>
        ], this suggests that model origin and data geography can imprint issue framing that remains
visible even when overall alignment is highest with German center-left parties. Likely drivers include
(i) training-data geography and outlet mix; (ii) preference optimization shaped by English-language
moderation norms; and (iii) system-specific safety policies. The consistent German-prompt uplift
suggests light “language anchoring,” where phrasing closer to domestic discourse reduces distance to
party stances across the board, not only on the left.
      </p>
      <p>
        Reading results through German milieus. The Sinus-Milieus are a sociological segmentation
developed by the SINUS-Institut (now SINUS Sociovision) that clusters the German population along
two axes: (i) social status (education/income/occupation; vertical) and (ii) basic values (tradition →
modernization/individualization → re-orientation; horizontal). The typology is empirically updated
via representative surveys and qualitative studies and is widely used in political analysis and media
planning. As Figure 7 illustrates, our LLM outputs most closely resemble orientations typical of the
“Performer,” “Expeditive,” “Neo-Ecological,” and partly “Post-Materialist” milieus. Crucially, these milieus
together account for a comparatively large share of the German population (about 28-40% in recent
waves [36]); hence, positions that might be labeled “center-left” in a U.S. typology are closer to the
German mainstream. This helps reconcile why models trained largely on Anglophone sources can still
align well with German party positions in a proportional, coalition-oriented system. By contrast, Pew’s
“Establishment Liberal” segment comprises only 13% in the U.S. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Positioning relative to prior work. Our findings echo earlier Wahl-O-Mat-based audits that report
stronger alignment with left-leaning parties and the weakest matches with AfD [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and they are consistent
with U.S. typology studies showing liberal tendencies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Relative to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we extend the picture by (i)
focusing on the 2025 federal party set, (ii) including closed-source and non-Western systems (ChatGPT,
Grok, DeepSeek), and (iii) treating neutrality/refusals and within-item variability as first-class behavioral
signals via repeated stochastic querying, with PCA providing structural context.
      </p>
      <p>Limitations and implications. We study three models and one instrument; newer versions are
out of scope, and one model (Grok) was only evaluated in English due to computational and time
constraints. In the context of the Wahl-O-Mat, where respondents must select agree, disagree, or neutral,
our agreement metric penalizes abstention by counting refusals as mismatches; alternative choices (e.g.,
valid-only denominators or ordinal distances) would shift absolute levels but not the qualitative ordering
we observe. Practically, we recommend that election-adjacent audits report (a) language effects, (b)
refusal/neutrality rates, and (c) topic-wise stability, alongside simple agreement scores.</p>
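      <p>
        The denominator choice affects scores in a predictable way, which a short sketch makes explicit; the stance labels and the encoding of refusals as None are illustrative assumptions, not the study’s implementation.

```python
# Minimal sketch: agreement with refusals counted as mismatches vs. a
# valid-only denominator that drops refused items before scoring.

def agreement(model, party, refusals_as_mismatch=True):
    pairs = list(zip(model, party))
    if refusals_as_mismatch:
        # Refusals (None) can never match a party stance, so they lower the score.
        return sum(m == p for m, p in pairs) / len(pairs)
    valid = [(m, p) for m, p in pairs if m is not None]
    return sum(m == p for m, p in valid) / len(valid)

model = ["agree", None, "disagree", "agree"]   # None encodes a refusal
party = ["agree", "agree", "disagree", "neutral"]

strict = agreement(model, party)                               # 2/4 = 0.5
lenient = agreement(model, party, refusals_as_mismatch=False)  # 2/3
# Absolute levels shift, but comparisons between parties are preserved as long
# as the same convention is applied throughout.
```
      </p>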
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion</title>
      <p>As LLMs enter everyday political information flows, understanding their leanings is essential. Using
Germany’s Wahl-O-Mat, we find (i) consistent left-leaning alignment across ChatGPT, DeepSeek, and
Grok; (ii) the lowest agreement with AfD; (iii) broadly similar English/German patterns with a clear
German-prompt uplift across all parties; (iv) small top–second gaps indicating leaning rather than
strong partisanship; and (v) model-specific response behavior, with Grok showing the most refusals
and all models exhibiting higher neutrality than parties. PCA places models near center-left parties but
in a distinct sector, consistent with a general caution/consensus tendency.</p>
      <p>Future work should probe robustness to paraphrase and register (formal vs. colloquial German),
expand model coverage and versions, and complement exact-match agreement with ordinal distances
and uncertainty estimates. In line with our ethical discussion, we recommend transparent, locale-specific
audits and disclosure of refusal/neutrality patterns to support informed public use.</p>
    </sec>
    <sec>
      <title>10. Code and data availability</title>
      <p>The code and analyzed responses are available at: https://github.com/buket99/political_bias_in_llm.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly to check grammar and spelling.</p>
      <p>[28] DeepSeek AI, DeepSeek API documentation, 2025. URL: https://api-docs.deepseek.com/.
[29] B. Horvath, Grok, Elon Musk’s AI chatbot, seems to get right-wing update, NBC News (2025). URL: https://www.nbcnews.com/tech/elon-musk/grok-elon-musks-ai-chatbot-seems-get-right-wing-update-rcna217306.
[30] xAI, xAI API, 2025. URL: https://x.ai/api, accessed August 28, 2025.
[31] A. Abid, M. Farooqi, J. Zou, Persistent anti-Muslim bias in large language models, in: Proc. 2021 AAAI/ACM Conf. on AI, Ethics, and Society, 2021, pp. 298–306.
[32] P. Schoenegger, F. Salvi, J. Liu, X. Nan, R. Debnath, B. Fasolo, et al., Large language models are more persuasive than incentivized human persuaders, arXiv preprint (2025). arXiv:2505.09662.
[33] H. de Vries, et al., The political impact of text generators: Differential effects on conservatives and liberals, Proceedings of the National Academy of Sciences USA 120 (2023) e2026070119.
[34] Bundesministerium für Familie, Senioren, Frauen und Jugend (BMFSFJ), Meinungsbild zur reproduktiven Selbstbestimmung und Schwangerschaftsabbruch bis zur 12. Woche, Technical Report, BMFSFJ, 2024. URL: https://www.bmfsfj.bund.de/resource/blob/246478/9b685f150c5734ef76efa909234f9285/umfrage-reproduktive-selbstbestimmung-data.pdf, representative population survey conducted in March and April 2024, Germany.
[35] M. Stillman, A. Kruspe, Biased Geolocation in LLMs: Experiments on Probing LLMs for Geographic Knowledge and Reasoning, in: Proceedings of the International Workshop on Geographic Information Extraction from Texts (GeoExT) at ECIR 2025, 2025.
[36] Sinus-Institut, Sinus-Milieus® Germany, Online, 2025. URL: https://www.sinus-institut.de/en/sinus-milieus/sinus-milieus-germany, overview of the ten Sinus-Milieus and their population shares (as of 2021/24).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] T. Choudhary, Political Bias in Large Language Models: A Comparative Analysis of ChatGPT-4, Perplexity, Google Gemini, and Claude, IEEE Access (2024).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N. W.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rashed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mahmoud</surname>
          </string-name>
          ,
          <article-title>Cybersecurity in the era of artificial intelligence: Risks and solutions</article-title>
          ,
          <source>in: 2024 ASU Int. Conf. in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS)</source>
          , IEEE,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Binns</surname>
          </string-name>
          ,
          <article-title>Fairness in machine learning: Lessons from political philosophy</article-title>
          ,
          <source>in: Proc. 1st Conf. Fairness, Accountability and Transparency (FAT)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] K. Al-Mhasneh, R. Alrasheed, A. Al-Arqan, J. Fares, M. Alqahtani, A. Salman, The Role of Artificial Intelligence in Political Analysis and Decision Aid: “Chat GPT Application” as a Model, in: 2024 International Conference on Decision Aid Sciences and Applications (DASA), Manama, Bahrain, 2024, pp. 1–4. doi:10.1109/DASA63652.2024.10836603.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Statista, Umfrage zum Überprüfen von KI-Ergebnissen, 2025. URL: https://de.statista.com/infografik/34419/umfrage-zum-ueberpruefen-von-ki-ergebnissen/.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Statista, Anzahl der Visits pro Monat von chatgpt.com, 2025. URL: https://de.statista.com/statistik/daten/studie/1535435/umfrage/anzahl-der-visits-pro-monat-von-chatgptcom/.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Economic Times, Does ChatGPT suffer from hallucinations? OpenAI CEO Sam Altman admits surprise over users’ blind trust in AI, 2025. URL: https://economictimes.indiatimes.com/magazines/panache/does-chatgpt-suffer-from-hallucinations-openai-ceo-sam-altman-admits-surprise-over-users-blind-trust-in-ai/articleshow/122090109.cms.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] M. R. Douglas, Large language models, arXiv preprint (2023). arXiv:2307.05782.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] S. Feuerriegel, C. Hartmann, C. Janiesch, et al., Generative AI, Business &amp; Information Systems Engineering 66 (2024) 111–126. doi:10.1007/s12599-023-00834-7.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] B. Chen, Z. Zhang, N. Langrené, S. Zhu, Unleashing the potential of prompt engineering for large language models, Patterns 6 (2025). doi:10.1016/j.patter.2025.101260.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S. U.</given-names>
            <surname>Noble</surname>
          </string-name>
          , Algorithms of Oppression: How Search Engines Reinforce Racism, New York University Press,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rettenberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reischl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schutera</surname>
          </string-name>
          ,
          <article-title>Assessing political bias in large language models</article-title>
          ,
          <source>Journal of Computational Social Science</source>
          <volume>8</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] A. Kruspe, Towards detecting unanticipated bias in language models (2024). URL: https://arxiv.org/abs/2404.02650. arXiv:2404.02650.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kruspe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stillman</surname>
          </string-name>
          ,
          <article-title>Saxony-Anhalt is the Worst: Bias Towards German Federal States in Large Language Models</article-title>
          ,
          <source>in: Proceedings of the German Conference on Artificial Intelligence (KI</source>
          <year>2024</year>
          ),
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kruspe</surname>
          </string-name>
          ,
          <article-title>Musical Ethnocentrism in Large Language Models</article-title>
          ,
          <source>in: Proceedings of the NLP4MusA Workshop at ISMIR</source>
          <year>2024</year>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Diakopoulos</surname>
          </string-name>
          ,
          <article-title>Accountability in algorithmic decision making</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          )
          <fpage>56</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bai</surname>
          </string-name>
          , et al.,
          <article-title>Adapting large language models for specialized domains: Techniques and challenges</article-title>
          ,
          <source>Journal of AI Research</source>
          <volume>89</volume>
          (
          <year>2024</year>
          )
          <fpage>203</fpage>
          -
          <lpage>219</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] B. C. Cheong, Transparency and accountability in AI systems: safeguarding wellbeing in the age of algorithmic decision-making, Frontiers in Human Dynamics 6 (2024). URL: https://www.frontiersin.org/journals/human-dynamics/articles/10.3389/fhumd.2024.1421273. doi:10.3389/fhumd.2024.1421273.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] K. Park, H. Y. Yoon, AI algorithm transparency, pipelines for trust not prisms: mitigating general negative attitudes and enhancing trust toward AI, Humanities and Social Sciences Communications 12 (2025) 1160. URL: https://doi.org/10.1057/s41599-025-05116-z. doi:10.1057/s41599-025-05116-z.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] J. K. Bahangulu, L. Owusu-Berko, Algorithmic bias, data ethics, and governance: Ensuring fairness, transparency and compliance in AI-powered business analytics applications, World Journal of Advanced Research and Reviews 25 (2025) 1746–1763. doi:10.30574/wjarr.2025.25.2.0571.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Schramowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Turan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <article-title>Large pre-trained language models contain human-like biases of what is right and wrong to do</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>5</volume>
          (
          <year>2023</year>
          )
          <fpage>258</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rode-Hasinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kruspe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>True or False? Detecting False Information on Social Media Using Graph Neural Networks</article-title>
          ,
          <source>in: Proceedings of the Workshop on Noisy User-generated Text (W-NUT) at COLING</source>
          <year>2022</year>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stillman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kruspe</surname>
          </string-name>
          ,
          <article-title>OSINT or BULLSHINT? Exploring Open-Source Intelligence Tweets about the Russo-Ukrainian War (</article-title>
          <year>2025</year>
          ). URL: https://arxiv.org/abs/2508.03599. arXiv:
          <volume>2508</volume>
          .
          <fpage>03599</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>C. O'Neil</surname>
          </string-name>
          , Weapons of Math Destruction:
          <article-title>How Big Data Increases Inequality</article-title>
          and
          <string-name>
            <given-names>Threatens</given-names>
            <surname>Democracy</surname>
          </string-name>
          , Crown Publishing Group,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] K. Yang, H. Li, Y. Chu, Y. Lin, T. Q. Peng, H. Liu, Unpacking Political Bias in Large Language Models: Insights Across Topic Polarization, arXiv preprint (2024). arXiv:2412.16746.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] Bundeszentrale für politische Bildung, Die Geschichte des Wahl-O-Mat, 2025. URL: https://www.bpb.de/themen/wahl-o-mat/326661/die-geschichte-des-wahl-o-mat/.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] OpenAI, OpenAI API, 2023. URL: https://openai.com/api/.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>