=Paper= {{Paper |id=Vol-3278/paper5 |storemode=property |title=Investigating the Relationship Between Liking and Belief in AI Authorship in the Context of Irish Traditional Music |pdfUrl=https://ceur-ws.org/Vol-3278/paper5.pdf |volume=Vol-3278 |authors=Ken Déguernel,Bob L. T. Sturm,Hugo Maruri-Aguilar |dblpUrl=https://dblp.org/rec/conf/aiia/DeguernelSM22 }} ==Investigating the Relationship Between Liking and Belief in AI Authorship in the Context of Irish Traditional Music== https://ceur-ws.org/Vol-3278/paper5.pdf
Investigating the relationship between liking and
belief in AI authorship in the context of Irish
traditional music
Ken Déguernel1,*, Bob L. T. Sturm1,* and Hugo Maruri-Aguilar2
1 Royal Institute of Technology KTH, Lindstedtsvägen 24 SE-100 44 Stockholm, Sweden
2 Queen Mary University of London, Mile End Road, London E1 4NS, UK


Abstract
Past work has investigated the degree to which human listeners may be prejudiced against music knowing
that it was created by artificial intelligence (AI). While these studies did not find a statistically significant
relationship, the listening experiments were performed with music genres such as contemporary classical
music or free jazz which are fairly welcoming of technology. In this work, we explore this prejudice
in a context where strong opinions on authenticity and technology are typical: Irish traditional music
(ITM). We conduct a listening experiment with practitioners of ITM asking each subject to first listen to a
human performance of music generated by a computer in the style of ITM (this provenance is unknown
to the listener), and then rate how much they like the piece. After rating all six pieces, each subject
listens to each again but rates how likely they believe it is composed by a computer. The results of our
pilot study suggest ITM practitioners tend to rate belief in AI authorship lower the more they rate liking
a tune.

Keywords
Creative AI systems, Appreciation bias, Liking, Expertise, Listening test, Irish traditional music




1. Introduction
One’s experience of music can involve numerous factors, some of which are related to what
one senses (e.g., skill and effort, materials, setting), and some related to what one knows (e.g.,
programmatic information, historic context, authenticity). Music appreciation is influenced by
musical properties [1, 2, 3] modulated by personal [4, 5] and contextual factors [6, 7], such as
socio-cultural contexts [8, 9]. Prejudice and expectations also play a role in one’s engagement
with music. For instance, Canonne [10] showed that the listening experience is drastically
different whether someone thinks they are listening to a composition or an improvisation. And
Kroger and Margulis [11] showed that music appreciation can be biased by a listener’s belief
that a performance is of a renowned musician or a student.
CREAI 2022, Workshop on Artificial Intelligence and Creativity, Nov. 28–Dec. 02, 2022, Udine, Italy
* Corresponding author.
kende@kth.se (K. Déguernel); bobs@kth.se (B. L. T. Sturm); h.maruri-aguilar@qmul.ac.uk (H. Maruri-Aguilar)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   When applying artificial intelligence (AI) to music creation, how does one’s knowledge about
the involvement of AI affect one’s appreciation of the resulting music? Moffat and Kelly [12]
investigated such bias by having music listeners rate their liking of particular pieces of music
in several styles, such as contemporary classical music or free jazz, and asking whether they
thought each music excerpt was composed by a human or by a computer. This study was
unable to find any bias for any of the styles. Pasquier et al. [13] performed more extensive
listening experiments in these directions with many more subjects, but tested only one kind of
music (contemporary string quartets). Their results also suggest that bias against AI in music
is not significant. Moura and Maw [14] investigated attitudes about the involvement of AI in
music using a survey and a behavioral experiment. Interestingly, they found contradictory
results: the survey respondents displayed negative attitudes toward AI in music, but the
experiment participants did not show any significant differences in their responses based on
knowing whether the music came from a human or AI.
   In this paper, we investigate the extent to which bias against AI is present in a context
where computer authorship is considered at odds with the musical practice. To what extent
does the culture and context of the music to which AI is being applied matter when it comes
to a listener’s perception of the results? Inspired by these prior studies, we investigate the
relationship between liking for a musical piece and the belief that it is AI-composed in the
context of Irish traditional music (ITM). Participants of our experiments are active practitioners
of ITM, and are drawn from traditional music programs in Ireland. Our hypothesis is that the
context of ITM is one in which authenticity is so heavily human-centred that practitioners
will show a bias against liking music they believe is authored by AI, or alternatively against
believing music that they like is AI-composed.
   In the next section, we review the details of past experiments in this area [12, 13, 14]. Section
3 briefly presents the context of ITM and its relationship to innovations, both social and
technological. Both of these help motivate the decisions we make in the design and analysis of
our listening experiment, which are described in Sec. 4 and 5, respectively. Section 6 discusses
the results of our experiment. Finally, we conclude with a look towards future iterations of this
experiment.


2. Previous studies investigating bias against AI in music
One of the challenges of studying bias against AI involvement in music is that there is no
standard methodology on how to proceed. Different experimental designs have been proposed,
as described for the following three studies.
   Moffat and Kelly [12] present a two-stage listening experiment conducted with 20 participants.
Each of the six stimuli of the experiment is a one-minute audio recording excerpt of a
musical piece: three are designated computer-generated, and the other three human-composed.
In the first stage, participants rate their liking of the stimulus on a 5-point Likert scale, and
indicate whether they believe it was composed by a human or a computer. In the second
stage, the participants are given information about the authorship of each stimulus, and then
answer a written questionnaire asking how much they enjoyed the music, and whether they
would buy the music, download it, or recommend it to a friend. The authors conclude from
their results that their participants preferred music thought to be human-composed rather than
computer-composed music, but the participants did not change their liking of a music piece
after being told about its origins. That is, there was no evidence that a listener likes a piece less
or more after being told it was created by a computer or a human.
   Pasquier et al. [13] present an experiment reproducing and extending that of Moffat and
Kelly. They use six video-recorded stimuli, three of which are from the same human composer,
and the other three are generated by an AI system designed by that composer [15]. A trial
of this experiment involves watching a video and then giving a 50-point rating along each
of four dimensions: “Good–Bad”, “Like–Dislike”, “Emotional–Unemotional” and “Natural–
Artificial”. All participants did this twice for each stimulus (randomized), but under three
different conditions. In one condition, a participant is never told about human/computer
authorship, and in another condition, a participant is told about authorship. In the third
condition (“informed”) a participant is told about the authorship only after they first rate the
six stimuli. Pasquier et al. [13] report results from 122 participants, and conclude that there is
no significant difference in any of the four dimensions between conditions, or even within the
informed condition.
   Moura and Maw [14] explore listener attitudes to the involvement of AI in music creation.
They conducted an online survey with one group of people consisting of 72 music professionals
and another of 374 non-professionals. Both groups showed some minor degree of questioning
the credibility of musicians using AI, and reported a low likelihood of purchasing music created
by AI. They also conducted a listening test with 86 university students split into two groups.
In one group a subject reads a narrative describing the music they will hear as AI-generated.
In the other group a subject reads a narrative describing human emotions and experiences
reflected in the music. Both groups listened to the same music: two 1.5-minute excerpts of an
AI (co-)composed work. Moura and Maw found no significant differences in responses between
the two groups, and thus concluded that a listener’s perception of a song they like is not affected
by knowing it was generated by AI.


3. The Context of Irish Traditional Music
Irish traditional music (ITM) is a complex genre encompassing many practices, and is an
important part of Irish culture and identity [16, 17, 18]. Due in great part to the folk music
revivals of the mid-20th century, as well as culture-focused organizations of Ireland, not to
mention waves of immigration resulting from dire economic and environmental conditions in
the 19th and 20th centuries, ITM has spread around the globe, and is actively practiced today
by enthusiasts of the music [19].
   The practice of ITM is accompanied by values that emphasize authenticity, etiquette and often
nationalism [18, 20, 21]. For instance, there are strong opinions about the ways in which tunes
should be learned and taught, how tunes should be performed, which tunes and instruments
are acceptable, and so on. Several of these aspects of ITM are implicitly and explicitly codified
in the local and national summer schools and music competitions organized throughout Ireland
each year by cultural organizations, such as Comhaltas1 and Oireachtas na Gaeilge,2 founded to
preserve and promote Irish culture [22, 23].
1 https://comhaltas.ie
2 https://www.antoireachtas.ie/

   Considering the context of ITM and its values, there exists tension around innovation, as
well as how ITM should be used and presented [24]. In a speech delivered at a 1996 academic
conference focused on traditional music in Ireland, the musician and Irish music advocate
Tony MacMahon [25] relayed concerns about the loss of authenticity due to external forces
of innovation by commercialization. Hillhouse [18] highlights the contradiction between
community ownership and authorship inherent to ITM and the importance of intellectual
property in national and international marketplaces. More recently, computer science research
applying AI to modeling and imitating stylistic elements of ITM revealed friction around
notions of authorship and the (im)proper treatment of the tradition [24]. That authenticity is
so important to ITM suggests that there should be a bias on the part of the ITM practitioner
against inauthentic forces, such as AI, coming into play. However, to the best of our knowledge,
this has yet to be explored.


4. Method
The hypothesis we want to test is the following: an ITM practitioner will exhibit an inverse
relationship between liking a piece of music and believing it is authored by AI in the context of
ITM.

4.1. Participants
We drew participants from traditional music programs at the University of Limerick, Ireland.
The first cohort (E1) includes 20 participants (12 women and 8 men, aged 18–64 years, M=36.75,
SD=16.02), who are students, teachers and technicians from the BLAS International Summer
School of Irish Traditional Music and Dance.3 The second cohort (E2) includes 26 participants
(20 women and 6 men, aged 19–60 years, M=25.73, SD=10.14), who are students of degree
programs offered at the Irish World Academy of Music and Dance.4 Each participant was
compensated with a €20 gift card.

4.2. Stimuli
Since we expected participants to be very knowledgeable about ITM, we decided not to use
existing traditional tunes as stimuli for this experiment to prevent familiarity issues. Instead,
we hired professional Irish accordionist Padraig O’Connor5 to select six tunes that he likes from
large collections generated using a particular AI system [26, 27], including one consisting of
58,105 tunes [28].6 The six double jigs selected by O’Connor all have an AABB form, with each
section comprising eight 6/8 bars. Figure 1 shows the notation of one of these tunes.
   O’Connor recorded himself playing each tune on solo accordion, with stylistic ornamentation,
variation, bass, and harmonic content added as he saw fit. We permitted O’Connor to make
minor changes to the notation of each tune as he wanted, but very few changes were made.

3 https://www.blas.ie
4 https://www.irishworldacademy.ie/
5 http://www.paudieoconnor.com/
6 O’Connor participated as one of four judges in the AI Music Generation Challenge 2020, which focused on generating plausible Irish double jigs [27].
Figure 1: Notation of AI-composed double jig No. 8091 [27].


Having a professional ITM practitioner select and perform the stimuli ensures that they are
presented in a realistic setting with an authentic performance. The resulting six stimuli are
about the same duration (M=75.5 seconds, SD=1.51) and tempo (M=106 bpm, SD=1.84). All
stimuli were recorded by O’Connor at his home with the same accordion and single microphone
audio setup to avoid discrepancies in the audio quality. The stimuli were encoded with mp3
format (MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, stereo) and are available online.7

4.3. Procedure
For both cohorts (E1 and E2), the experimental sessions took place in a media lab at the University
of Limerick. We designed and hosted the experiments with a web-based interface built using
jsPsych [29]. All participants were briefed and provided their informed consent before taking
part. Participants used the same computer and headset models. A soundcheck using a recording
of an Irish traditional tune allowed the participants to adjust the volume setting to a comfortable
level.
   The experimental paradigm comprised two tasks in series: the “liking task” and then the
“authorship task”.
· In the “Liking task”, the participant is asked, “How much do you like the tune?”, and is given a
5-point Likert scale anchored by the labels (from left to right) “Don’t like it at all”, “Don’t like
it”, “Neutral”, “Like it” and “Like it a lot”.
· In the “Authorship task”, the participant is asked, “How likely do you believe that the tune is
composed by a computer?”, and is given a 5-point Likert scale anchored by the labels (from left
to right) “Not likely at all”, “Not likely”, “Neutral”, “Likely” and “Very likely”.
These tasks were completed in this order to avoid liking ratings being influenced by a prior
mention of AI. For each task, on-screen instructions were provided. In both tasks, the participants
were encouraged to use the full range of the scale. The rating scale only appeared once the
stimulus had finished playing. The order of stimuli in each task was randomized for each
participant.
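The ordering constraints described above (liking block first, authorship block second, tune order shuffled per participant within each block) can be sketched as follows. This is a minimal illustration in Python, not the jsPsych code used in the experiment; the tune labels, the function name `session_plan` and the seeding scheme are hypothetical.

```python
import random

# Hypothetical labels for the six stimuli; the paper does not name them this way.
TUNES = [f"tune_{i}" for i in range(1, 7)]


def session_plan(participant_id, tunes=TUNES, seed=0):
    """Return the ordered (task, tune) trials for one participant.

    The liking block always precedes the authorship block, and the tune
    order is shuffled independently within each block.
    """
    rng = random.Random(f"{seed}-{participant_id}")  # reproducible per participant
    plan = []
    for task in ("liking", "authorship"):
        order = list(tunes)
        rng.shuffle(order)
        plan.extend((task, tune) for tune in order)
    return plan


if __name__ == "__main__":
    for task, tune in session_plan(participant_id=1):
        print(task, tune)
```

Seeding by participant is only a convenience for reproducing a session plan; the paper does not say how the randomization was seeded.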
7 https://www.kth.se/profile/bobs/page/research-data

   After completing the two tasks, the participants filled out a short questionnaire about
demographics, musical practice, and familiarity with ITM. They answered the following questions:

       • age: “How old are you in years?”
       • gender: “What is your gender?”
       • nationality: “What is your nationality?”
       • education: “What is the highest level of education you have achieved?”
       • mus_pro: “Are you or have you been a professional musician?”
       • irish_fam: “How many Irish traditional tunes can you play/sing from memory?”, with
         options “0–10”, “11–50”, “51–100” and “100+”
       • instrument: “What is (are) your main instrument(s)”
As a final question, the participant was asked to freely describe any strategies they used to
determine whether a tune was composed by a human or a computer.
  Each trial of this experiment lasted about 20 minutes. The experiment was reviewed and
approved by the Ethics Committee of KTH (V-2021-0615).

4.4. Data analysis
We code the demographic data as follows: age is left as an interval variable; gender and
mus_pro are nominal with two levels; nationality is nominal with two levels based on
being Irish; instrument is nominal with two levels based on whether accordion is specified;
education and irish_fam are both ordinal with four levels.
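As an illustration of this coding scheme, the sketch below codes four of the questionnaire variables in Python. The integer codes, the string matching rules and the function names are assumptions made for the example; gender, mus_pro and education are left out because the paper does not list their answer options.

```python
# Ordinal codes for irish_fam, in the order the options appear in the questionnaire.
# The integer values 0..3 are an assumption; any order-preserving coding would do.
IRISH_FAM_LEVELS = {"0-10": 0, "11-50": 1, "51-100": 2, "100+": 3}


def code_irish_fam(answer):
    """Map an irish_fam option to its ordinal level, tolerating en dashes."""
    key = answer.replace("\u2013", "-").strip()
    return IRISH_FAM_LEVELS[key]


def code_participant(raw):
    """Code a few questionnaire answers (all given as strings).

    gender, mus_pro and education are omitted because the paper does not
    list their answer options.
    """
    return {
        "age": int(raw["age"]),  # interval variable, left as is
        "nationality": int(raw["nationality"].strip().lower() == "irish"),
        "instrument": int("accordion" in raw["instrument"].lower()),
        "irish_fam": code_irish_fam(raw["irish_fam"]),
    }
```

For example, a participant answering "Fiddle, button accordion" for instrument would be coded 1 on the two-level instrument variable under this (assumed) substring rule.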
   A two-sample 𝑡-test on demographic data collected during the experiment shows that E1
and E2 differ significantly in age (𝑡 = 2.79, df = 44, 𝑝 < .008). A Mann-Whitney U test
shows a significant difference between E1 and E2 only in mus_pro (𝑈 = 171.0, 𝑝 < 0.03),
but does not show a significant difference in education (𝑈 = 315.5, 𝑝 > 0.16), irish_fam
(𝑈 = 293.5, 𝑝 > 0.44), gender (𝑈 = 304.0, 𝑝 > 0.22), nationality (𝑈 = 196.0, 𝑝 > 0.06),
or instrument (𝑈 = 269.0, 𝑝 > 0.74).
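For readers who want to see what the two test statistics compute, here is a minimal standard-library sketch. It assumes the pooled-variance form of the two-sample t-test, which is consistent with the reported df = 44 = 20 + 26 − 2 but is not stated explicitly in the text, and it computes only the statistics, not the p-values. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs (x, y) with x > y count 1, ties count 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
```

The two U statistics of a pair of samples always sum to the product of the sample sizes, so with 20 and 26 participants any reported U lies between 0 and 520.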
   Our analytical approach involves two stages. In the first stage we model the bivariate
responses of the participants using linear mixed-effects models [30]. In the second stage we
regress on the individual coefficients of the model in the first stage using the demographic
covariates mentioned above.
   Denote by $x_{jt}$ the “Liking” rating given by the $j$-th participant to the $t$-th tune,
and by $y_{jt}$ their rated belief that the same tune was composed by an AI (AIC). One
model relating these responses is

                               $$x_{jt} = \mu + m_j y_{jt} + \beta_t + b_j + \varepsilon_{jt} \qquad (1)$$

where $\mu$ and $m_j$ are the intercept and a participant-based slope, respectively; $\beta_t$ is a fixed effect
of tune and $b_j$ is a random effect of participant; and $\varepsilon_{jt}$ is the residual error. An alternative
mixed-effects model of the bivariate responses casts AIC as a function of reported liking:

                               $$y_{jt} = \mu + m_j x_{jt} + \beta_t + b_j + \varepsilon_{jt}. \qquad (2)$$

The random quantities $b_j$ and $\varepsilon_{jt}$ are considered independent with zero means and variances
$\sigma_J^2$ and $\sigma^2$, respectively. Each model considers tune to be a factor with six levels (and thus a fixed effect),
because we only wish to draw conclusions about the population of participants. In other words,
we wish to generalize our conclusions to ITM practitioners for these six stimuli, and not to the
population of tunes O’Connor would curate and perform from a large collection.
  The difference in interpretation between these models is important. Model (1) poses liking as a
function of the belief of being AI-composed, which can be motivated by current thinking about
how aesthetic appreciation is a function of many factors, such as the value a stimulus has for
a participant, the context of the perception of the stimulus, and the physiological state of the
participant [31]. Model (2) seeks to determine how the factors considered by a participant in
rating their belief a tune is AI-composed relate to or modulate factors they considered in rating
their liking of the tune. If 𝑚𝑗 is significantly different from zero then one might conclude there
to be a significant overlap of these factors. However, since the number of stimuli is small and
the time between the two tasks is short, we expect there to be some contribution of memory
informing AIC for a participant. In other words, their memory of having rated their liking of a
tune a particular way could inform their AIC rating of the tune. Model (2) can thus be motivated
by a more machine-learning oriented goal where one wants to predict a participant’s AIC rating
from their rating of their liking of the tune.
   In the second stage, we attempt to explain the participant coefficients $m_j$ of the first stage
using participant covariates. To this end, denote by $z_j$ the vector of covariate measurements
available for the $j$-th participant. The model is a standard regression

                                     $$m_j = \theta_0 + \theta^T z_j + \varepsilon_j \qquad (3)$$

where $\theta_0$ and $\theta$ are the model coefficients and the $\varepsilon_j$ are the usual independent error terms, assumed
normal with variance $\sigma'^2$. In the above formula, for simplicity, we write $m_j$ to mean the fitted
coefficient stemming from the first stage of the analysis.
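To make the two-stage idea concrete, the following Python sketch uses a deliberately simplified stand-in for the first stage: an ordinary least-squares slope fitted separately for each participant, ignoring the tune effect and the random participant effect of Models (1) and (2). The second stage then regresses those slopes on a single covariate. This is not the mixed-effects fit used in the paper; the function names and data layout are hypothetical.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope undefined")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def two_stage(ratings, covariate):
    """Simplified two-stage analysis.

    ratings:   {participant: [(aic, liking), ...]}, one pair per tune
    covariate: {participant: numeric covariate value}
    Returns (slopes, theta0, theta), where slopes[j] plays the role of m_j
    and theta0, theta are the stage-two intercept and coefficient.
    """
    slopes = {}
    for pid, pairs in ratings.items():
        aic = [p[0] for p in pairs]
        liking = [p[1] for p in pairs]
        slopes[pid] = ols_slope_intercept(aic, liking)[0]  # stage 1
    pids = sorted(slopes)
    theta, theta0 = ols_slope_intercept(                   # stage 2
        [covariate[p] for p in pids], [slopes[p] for p in pids]
    )
    return slopes, theta0, theta
```

A participant who gives the same AIC rating to all six tunes has no defined slope under this simplification, which is one reason a pooled mixed-effects fit is preferable in practice.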
  Finally, we analyze the free-form response about strategies using the method of constant
comparison [32] to identify recurring themes and convergences in the strategies of participants.
This allows us to explore potential explicit bias and identify which elements of ITM practice are
expected to be different when composed by a computer. These, in turn, might reflect a listening
focus in ITM practice that may inform future research in empirical musicology and generative
systems.


5. Results
5.1. Quantitative analysis: Relationship between liking and belief in
     AI-authorship
Figure 2 illustrates a cross-tabulation of bivariate responses for each cohort and the numerical
values of Pearson’s 𝑟 and Kendall’s 𝜏 correlations. Exploratory analysis shows that high “Liking”
scores are associated with low belief in AI “Authorship” (AIC). The negative correlation is
moderately strong in E1 and weaker, though still negative, in E2.
   Considering that the two cohorts are samples of the population of interest drawn from the
same location, we first model the pooled data. The results are shown in the rows “E1+E2” in
Table 1, and the estimates of the participant coefficients are shown in Fig. 3. The number
of significant negative coefficients of Model (1) is 19 while that for Model (2) is 24. When
regressing the 𝑚𝑗 coefficients of Model (1) the only significant factors we find are cohort
(𝜃 = 0.163, 𝑡 = 2.62, 𝑝 < 0.013) and education (𝜃 = −0.073, 𝑡 = −2.17, 𝑝 < 0.04). For
the regression of the coefficients of Model (2), we find three significant factors: education
(𝜃 = −0.116, 𝑡 = −3.46, 𝑝 < 0.002), gender (𝜃 = −0.171, 𝑡 = −2.60, 𝑝 < 0.014) and
instrument (𝜃 = −0.388, 𝑡 = −4.29, 𝑝 < 0.0002).
Figure 2: Tabulation of cases involving “Liking” and “Authorship” for cohorts E1 and E2. 𝑟 and 𝜏
correspond respectively to Pearson’s and Kendall’s correlation. Panel (E1): 𝑟 = −0.412, 𝜏 = −0.361.
Panel (E2): 𝑟 = −0.158, 𝜏 = −0.11.
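The two correlation measures reported in Figure 2 can be computed from paired ratings as sketched below. The paper does not say which variant of Kendall's 𝜏 was used; this sketch assumes the tie-corrected 𝜏-b, a common choice for Likert data with many ties. Function names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects the denominator for tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied on both ratings: contributes nothing
            elif dx == 0:
                ties_x += 1       # tied on x only
            elif dy == 0:
                ties_y += 1       # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom
```

Both functions expect one entry per (participant, tune) response, for example the liking ratings in `x` and the authorship ratings in `y`.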




Figure 3: Estimated fixed effects 𝑚𝑗 with 95% confidence intervals for all participants (ordered by effect
size). Panels: (a) Model (1); (b) Model (2).


   We now fit our models to data of each cohort individually, which is motivated first by the
regression on the coefficients of Model (1), and second by a difference in how participants were
sampled for each cohort at the University of Limerick. More specifically, E1 consists of students of a
two-week-long summer school focused on Irish traditional music, as well as several staff (professors,
administrators, etc.). E2 consists of students enrolled in longer-term educational programs
about Irish traditional music. These cohorts may differ in other unmeasured ways as well, e.g.,
motivations to attend a two-week summer school are different from those to attend a longer
educational program at a university. This could explain why cohort is a significant factor in
the regression of the effects of Model (1) for the pooled data.
   For Model (1) we see that E1 has a much larger number of significant negative 𝑚𝑗 terms
(14 of 20) relative to the number for E2 (4 of 26). For Model (2), E1 has 18 𝑚𝑗 terms that are
significantly less than zero, while E2 has 7. For E1, when regressing on the coefficients of
Model (1), we find significant positive contributions from irish_fam (𝜃 = 0.049, 𝑡 = 2.69,
𝑑𝑓 = 12, 𝑝 < 0.02) and age (𝜃 = 0.003, 𝑡 = 2.45, 𝑑𝑓 = 12, 𝑝 < 0.031). The adjusted
𝑅² of this model is 0.301. When regressing on the coefficients of Model (2) for E1, we find
significant contributions from all covariates except mus_pro: age (𝜃 = 0.005, 𝑡 = 3.72,
𝑑𝑓 = 12, 𝑝 < 0.003), education (𝜃 = −0.048, 𝑡 = −2.30, 𝑑𝑓 = 12, 𝑝 < 0.04), irish_fam
(𝜃 = 0.078, 𝑡 = 4.42, 𝑑𝑓 = 12, 𝑝 < 0.001), gender (𝜃 = −0.12, 𝑡 = −2.28, 𝑑𝑓 = 12,
𝑝 < 0.042), nationality (𝜃 = −0.145, 𝑡 = −2.87, 𝑑𝑓 = 12, 𝑝 < 0.015), and instrument
(𝜃 = −0.248, 𝑡 = −3.9, 𝑑𝑓 = 12, 𝑝 < 0.003). The adjusted 𝑅² of this model is 0.676. When
regressing on the coefficients of either model for E2, we find no significant contributions from
the covariates (adjusted 𝑅² < 0.18).

  Effect                  Cohort   r_m^2   r_c^2   AkaikeIC   BIC       Lik       d_res   m [95% CI]
  liking∼AIC (Model 1)    E1+E2    0.316   0.461    858.04    1042.23   -375.02   224     -0.18 [-0.28,-0.09]
  liking∼AIC (Model 1)    E1       0.263   0.263    388.59     459.80   -166.30    94     -0.35 [-0.50,-0.20]
  liking∼AIC (Model 1)    E2       0.332   0.505    485.13     581.02   -208.56   124     -0.05 [-0.17,0.07]
  AIC∼liking (Model 2)    E1+E2    0.253   0.566   1005.34    1189.57   -448.67   224     -0.31 [-0.47,-0.15]
  AIC∼liking (Model 2)    E1       0.350   0.350    426.91     498.12   -185.46    94     -0.46 [-0.68,-0.23]
  AIC∼liking (Model 2)    E2       0.206   0.732    588.29     684.17   -260.14   124     -0.20 [-0.43,0.03]
Table 1
Analysis results for Models 1 and 2 for both cohorts pooled and separately. The quantities r_m^2 and r_c^2 are
the marginal and the conditional coefficients of determination, respectively. We also report the Akaike
Information Criterion, Bayesian Information Criterion and (log)likelihood in columns AkaikeIC, BIC
and Lik, respectively, as well as the degrees of freedom for the residual d_res. The column m has the
estimate of the average of 𝑚𝑗 with a confidence interval.
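The information criteria in Table 1 can be cross-checked against the log-likelihoods with a short calculation. The sketch below uses the standard definition AIC = 2k − 2 ln L to recover the implied number of parameters k for each row. It then compares k with one candidate reading of the models, which is an assumption and not stated in the paper: one slope per participant, an intercept plus five tune contrasts, and two variance parameters. Under the same reading, the residual degrees of freedom equal the number of observations minus the number of fixed effects, and the tabulated BIC values are matched to within about 0.04 when ln(d_res) is used as the sample-size term.

```python
from math import log

N_TUNES = 6  # six stimuli, each rated once per participant in each task

# (model, cohort, participants, AIC, BIC, log-likelihood, residual df), copied from Table 1
TABLE_1 = [
    ("Model 1", "E1+E2", 46, 858.04, 1042.23, -375.02, 224),
    ("Model 1", "E1", 20, 388.59, 459.80, -166.30, 94),
    ("Model 1", "E2", 26, 485.13, 581.02, -208.56, 124),
    ("Model 2", "E1+E2", 46, 1005.34, 1189.57, -448.67, 224),
    ("Model 2", "E1", 20, 426.91, 498.12, -185.46, 94),
    ("Model 2", "E2", 26, 588.29, 684.17, -260.14, 124),
]


def implied_k(aic, loglik):
    """Parameter count implied by AIC = 2k - 2 ln L."""
    return (aic + 2.0 * loglik) / 2.0


def candidate_counts(n_participants, n_tunes=N_TUNES):
    """One reading of the models (an assumption, not stated in the paper)."""
    n_fixed = n_tunes + n_participants  # intercept + 5 tune contrasts + one slope per participant
    k = n_fixed + 2                     # plus the two variance parameters
    d_res = n_participants * n_tunes - n_fixed
    return k, d_res


def bic_with_dres(k, loglik, d_res):
    """BIC computed with ln(d_res) as the sample-size term (assumed convention)."""
    return k * log(d_res) - 2.0 * loglik


if __name__ == "__main__":
    for model, cohort, j, aic, bic, loglik, d_res in TABLE_1:
        k, d = candidate_counts(j)
        print(model, cohort, round(implied_k(aic, loglik), 2), k, d_res, d,
              round(bic_with_dres(k, loglik, d_res), 2), bic)
```

Agreement of this kind only shows that the tabulated numbers are internally consistent with that reading; it does not confirm how the models were actually specified or fitted.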
   The explanatory variables had interesting imbalances between cohorts which we briefly
describe. Concerning the variable age, E2 was younger than E1, but both cohorts covered about
the same range of age; that is, E2 has a more skewed distribution than E1 in age. Although
we can test equality of means and reject it strongly between cohorts (𝑝 = 4 × 10⁻¹⁰), the
distributions differ in shape as well as in location, so the influence of age on the analysis goes
deeper than a shift in means. Concerning binary variables, mus_pro had opposing proportions
between cohorts, gender and nationality had less pronounced differences between cohorts,
and only instrument had similar patterns in E1 and E2.

5.2. Qualitative analysis: Biases in self-reported strategies
At the conclusion of the listening test, each participant is asked “What strategies did you use to
determine if a tune was composed by a human or a computer?”. Our qualitative analysis of the
free-form responses, carried out across both cohorts, finds five main strategies:
   ∙ 17 participants (37.0%) (5 from E1, 12 from E2) reported listening to repetitions and patterns.
Tunes that were deemed “overly repetitive” were associated with computer authorship.
   ∙ 12 participants (26.1%) (8 from E1, 4 from E2) reported listening to structure and harmony.
In particular, participants listened to how phrases were linked together as well as chord
progressions and cadences. “Unnatural” harmonic and melodic structures or chords following a
structure deemed “very rigid” were associated with computer authorship.
   ∙ 11 participants (23.9%) (4 from E1, 7 from E2) reported listening to variation and ornamentation.
Tunes with more variations in the melody and with more ornamentation/embellishment
were associated with human authorship.
   ∙ 10 participants (21.7%) (4 from E1, 6 from E2) reported listening to familiarity. Tunes that
were deemed to fit with the style of Irish music and sounded familiar in that aspect were
associated with human authorship. By contrast, tunes deemed too similar to existing tunes or
too “generic” were associated with computer authorship.
   ∙ 7 participants (15.2%) (5 from E1, 2 from E2) reported listening to instrumental technique.
Tunes with elements deemed “unnatural” to play on the accordion, such as “unusual” phrasing
or range were associated with computer authorship.
It is also noteworthy that 5 participants tried to use strategies based on audio quality, thinking
that some of the tunes were synthesized (although all tunes were human-performed in the same
conditions). This shows some potential confusion on what is meant by “computer-composed”.

                         Adjectives/Descriptives used
  for computers/AI:  inorganic, unnatural, uncanny valley feeling, simple, weird, rigid, out of place,
                     robotic, unusual, logical, generic, predictable, algorithmic
  for humans:        catchy, flowed, clarity, fluid, like speaking language, alive, emotive, organic,
                     usual, finicky, surprising, sustaining interest, with purpose, familiar, creative,
                     thought out, natural
Table 2
List of adjectives/descriptives used by participants in their self-reported strategies to rate the authorship
of a tune.

    We also looked at the adjectives/descriptives used by the participants when talking about
the tunes they believed to be AI- or human-composed (see Table 2), in order to obtain a basic
sentiment analysis of their reports. When talking about what they believed to be
computer-generated, the descriptives used by participants were almost exclusively negative, with a few
neutral ones. Conversely, when talking about what they believed to be human-composed, the
descriptives used were in the large majority positive. This difference points to a potential
conscious prejudice regarding what a subject believes the capacities of AI or computers are
compared to humans when it comes to ITM composition. As an anecdote, one of
the participants even stated that their strategy was to assume that “the tunes [they] liked better
were composed by humans and the ones [they] disliked were composed by the computer.”


6. Discussion
Our quantitative analysis points to a plausible bias among ITM practitioners for these six
AI-composed tunes: they tend to like a tune more when they deem it unlikely to have been composed
by an AI. Alternatively, the more they report liking a tune the less they report believing the
tune is AI-composed. The difference between our results and those of the previous studies could support our
hypothesis on the importance of the context (both in terms of musical culture and participants)
when it comes to observing such a bias. However, as it stands, this study can only be considered
a pilot study before performing a power analysis of the experiment to determine its likelihood
of producing a Type-I error. Nonetheless, this pilot study provides us with valuable information
regarding further testing of our hypothesis. In particular, expertise and professionalism must be
explicitly defined, accounting for differences between instrumental or vocal traditions within
ITM. Our wording of the professionalism question (mus_pro) makes no distinction between
career performers and a musician who has been paid to play at some event.
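   As a rough illustration of what such a power analysis could look like, the sketch below approximates the power of a two-sided test on a simple correlation using the Fisher z-transformation. It is a minimal sketch under strong assumptions, not the analysis used in this study: it treats all ratings as independent, whereas here each participant rates six tunes, so a multilevel design would call for a simulation-based approach instead. The function names and the effect size of 0.3 are hypothetical and are not taken from our data.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def required_sample_size(r, alpha=0.05, power=0.80):
    """Approximate number of independent observations needed to detect a
    correlation of magnitude r with a two-sided test (Fisher z method)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = STD_NORMAL.inv_cdf(power)           # quantile for the target power
    return ceil(((z_alpha + z_beta) / abs(atanh(r))) ** 2 + 3)


def approximate_power(r, n, alpha=0.05):
    """Approximate power for a correlation of magnitude r with n independent
    observations (ignores the negligible opposite-tail rejection region)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(atanh(r)) * sqrt(n - 3) - z_alpha)


if __name__ == "__main__":
    # Hypothetical effect size, chosen only for illustration.
    assumed_r = 0.3
    n_needed = required_sample_size(assumed_r)
    print(f"assumed r = {assumed_r}: n = {n_needed}, "
          f"power = {approximate_power(assumed_r, n_needed):.2f}")
```

Because ratings from the same participant are correlated, the effective sample size is smaller than the number of ratings, so figures from this approximation should be read as optimistic lower bounds on the number of observations required.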
   The analysis of the self-reported strategies of the participants suggests a conscious bias
against AI authorship with more limited expectations on computer capabilities when it comes
to some musical criteria, and an overall negative sentiment when describing tunes they believed
to be computer-composed. It is interesting to compare that attitude with the reception of
technology in general in ITM. Cawley [23] studied ethnographically the use and reception of
technology as a cultural process in the enculturation of Irish traditional musicians. From the
use of music notation to audio and video recording and the numerous websites providing ITM
educational resources, technology has changed the way ITM is learned, played and discussed.
These technologies are now well-accepted overall and an integral part of the day-to-day life of
Irish traditional musicians. However, a couple of caveats are noted: the “information overload”
arising from the amount of resources available, and the alienation of the tradition that can
come with the use of technology – especially if used without engaging in the “traditional”
social and musical interactions with other musicians. A good illustration of this relationship
with technology can be observed with Tunepal [33], a service enabling people to retrieve the
name and music notation of a tune from a short recording. Tunepal is the most downloaded
traditional music software; it is widely used during sessions and classes and is generally well
received by musicians, with some criticism from a minority pointing out that it sometimes
hinders engagement with ITM practitioners [33].
   It is therefore difficult to argue that a bias against AI authorship emerges only from a general
negative attitude toward technology. The difference in attitude that arises in the reception
of AI music generation shows that this practice might raise some specific issues. A potential
explanation, suggested by our qualitative results, could simply be the belief that current AI
systems are not skilled enough to reach human capabilities, in which case we would be observing
a bias similar to that reported by Kroger and Margulis [11]. Alternatively, such a negative attitude could also stem from a
more Romantic and human-centered notion of creativity as an unconscious process associated
with notions such as “inspiration” or “genius” [34, 35].
   It is also worth noting that AI music generation is not the only use of technology where
tension becomes apparent in ITM. For instance, Tunetracker [36] – software designed to
“surveil” local sessions in a pub by documenting which tunes were played on which days and in
which configuration with other tunes – was met with concerns from practitioners ranging from
the sheer presence of a microphone in a pub (although it was explicitly not used to store or
broadcast recordings), to more complex issues such as the homogenization of the
practice, or being caught by the Irish Music Rights Organisation seeking to collect royalties on
modern, copyright-protected tunes.


7. Conclusion and future work
Overall, the results of our pilot study are encouraging. First, for these six tunes and our
ITM practitioners, our quantitative analysis shows evidence of a negative
association between liking a tune and believing that it is AI-composed. Second, our qualitative
analysis points to a conscious prejudice against the application of AI systems to ITM. The results
from this pilot study encourage us to continue our investigation and to study larger populations.
It would be interesting to conduct an online version of this listening test in order to test our
hypothesis in the general population, and observe how bias might appear in non-practitioners of
ITM. It would also be interesting to test whether such differences hold in other musical traditions
that value authenticity in ways similar to ITM. However, given the potential importance of the
cultural context for the observation of bias, it is difficult to define a standardised experimental
design. Appropriate decisions should be taken to adapt to the different musical traditions and
to the participants. We also want to conduct a more in-depth qualitative study involving ITM
professionals and students using process-oriented research methodology such as Think-Aloud
Protocols involving introspection and retrospection [37] as well as participant-oriented research
with semi-guided interviews [38]. This research would help us better understand
the prejudice ITM practitioners may have against the involvement of AI in music composition or
AI-assisted co-composition, and help inform future work on the ethics of this field.


Acknowledgments
This paper is an outcome of MUSAiC, a project that has received funding from the European
Research Council under the European Union’s Horizon 2020 research and innovation program
(Grant agreement No. 864189). We would like to thank P. Cotter, S. Joyce, N. Keegan, and A.
Dormer for letting us conduct our study at the BLAS Summer School at the Irish World Academy
of Music and Dance, University of Limerick, Ireland. We would like to thank P. O’Connor for
taking part in the creation of the corpus of stimuli and for his interpretation of the tunes. We
would like to thank A. Clemente for discussions about the psychological aspects of this work.


References
 [1] G. Ilie, W. F. Thompson, A comparison of acoustic cues in music and speech for three
     dimensions of affect, Music Perception 23(4) (2006) 319–330.
 [2] V. Salimpoor, D. Zald, R. Zatorre, A. Dagher, A. McIntosh, Predictions and the brain: How
     musical sounds become rewarding, Trends in Cognitive Sciences 19(2) (2015) 86–91.
 [3] A. J. Milne, S. A. Herff, The perceptual relevance of balance, evenness, and entropy in
     musical rhythms, Cognition 203 (2020) 104233.
 [4] I. Lahdelma, T. Eerola, Cultural familiarity and musical expertise impact the pleasantness
     of consonance/dissonance but not its perceived tension, Scientific Reports 10 (2020) 8693.
 [5] M. Orr, S. Ohlsson, Relationship between complexity and liking as a function of expertise,
     Music Perception 22(4) (2005) 583–611.
 [6] E. Brattico, From pleasure to liking and back: Bottom-up and top-down neural routes to the
     aesthetic enjoyment of music, in: J. Huston, M. Nadal, F. Mora, L. Agnati, C. Cela-Conde
     (Eds.), Art, aesthetics and the brain, Oxford University Press, 2015, pp. 303–318.
 [7] A. North, D. Hargreaves, J. Hargreaves, Uses of music in everyday life, Music Perception
     22(1) (2004) 41–77.
 [8] A. North, D. Hargreaves, The social and applied psychology of music, Oxford University
     Press, 2008.
 [9] A. Greasley, A. Lamont, Musical preferences, in: S. Hallam, I. Cross, M. Thaut (Eds.), The
     Oxford Handbook of Music Psychology, Oxford University Press, 2016, pp. 263–284.
[10] C. Canonne, Listening to improvisation, Empirical Musicology Review 13(1–2) (2018).
[11] C. Kroger, E. H. Margulis, “But they told me it was professional”: Extrinsic factors in the
     evaluation of musical performance, Psychology of Music 45(1) (2016) 49–64.
[12] D. C. Moffat, M. Kelly, An investigation into people’s bias against computational creativity
     in music composition, Assessment 13(11) (2006).
[13] P. Pasquier, A. Burnett, N. Gonzalez Thomas, J. B. Maxwell, A. Eigenfeldt, T. Loughin,
     Investigating listener bias against musical creativity, in: Proc. of the 7th International
     Conference on Computational Creativity, 2016, pp. 42–51.
[14] F. T. Moura, C. Maw, Artificial intelligence became Beethoven: how do listeners and music
     professionals perceive artificially composed music, Journal of Consumer Marketing 38(2)
     (2021) 137–146.
[15] A. Eigenfeldt, Corpus-based recombinant composition using a genetic algorithm, Soft
     Computing 16(12) (2012) 2049–2056.
[16] M. Ó Súilleabháin, Irish music defined, The Crane Bag 5(2) (1981) 83–87.
[17] M. Trachsel, Oral and literate constructs of “authentic” Irish music, Éire-Ireland 30(3)
     (1995) 27–46.
[18] A. N. Hillhouse, Tradition and innovation in Irish instrumental folk music, Master’s thesis,
     The University of British Columbia, 2005.
[19] H. O’Shea, The making of Irish Traditional Music, Cork University Press, Cork, Ireland,
     2008.
[20] G. Ó hAllmhuráin, A pocket history of Irish Traditional Music, The O’Brien Press, 1998.
[21] M. D. Nicholsen, Francis O’Neill, music collection, and Irish traditional musicians in
     Chicago, 1898–1921, in: R. T. Cornish, M. Quintelli-Neary (Eds.), Crafting Infinity:
     Reworking Elements of Irish Culture, Cambridge Scholars Publishing, 2012.
[22] S. Spencer, Traditional Irish Music in the twenty-first century: Networks, technology, and
     the negotiation of authenticity, in: S. Brady, F. Walsh (Eds.), Crossroads: Performance
     studies and Irish culture, Palgrave Macmillan, 2009, pp. 58–70.
[23] J. Cawley, The musical enculturation of Irish traditional musicians: An ethnographic study
     of learning processes, Ph.D. thesis, National University of Ireland, Cork, 2013.
[24] R. Huang, B. L. T. Sturm, Reframing “aura”: Authenticity in the application of AI to Irish
     Traditional Music, in: Proc. of the AI Music Creativity conference, 2021.
[25] T. MacMahon, The language of passion, in: Crossroads Conference, 1996.
[26] B. L. Sturm, J. F. Santos, O. Ben-Tal, I. Korshunova, Music transcription modelling and
     composition using deep learning, in: Proc. of the Conference on Computer Simulation of
     Musical Creativity, Huddersfield, UK, 2016.
[27] B. L. T. Sturm, H. Maruri-Aguilar, The AI Music Generation Challenge 2020: Double jigs
     in the style of O’Neill’s “1001”, Journal of Creative Music Systems (2021).
[28] B. L. T. Sturm, 58,105 Irish-style double jigs, 2021. URL:
     http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A1562396.
[29] J. R. de Leeuw, jsPsych: A JavaScript library for creating behavioral experiments in a Web
     browser, Behavior Research Methods 47(1) (2015) 1–12.
[30] J. J. Hox, M. Moerbeek, R. Van de Schoot, Multilevel analysis: Techniques and applications,
     Routledge, 2017.
[31] M. Skov, Aesthetic appreciation: The view from neuroimaging, Empirical Studies of the
     Arts 37(2) (2019) 220–248.
[32] K. Henwood, N. Pidgeon, Grounded theory in psychological research, in: P. Camic,
     J. Rhodes, L. Yardley (Eds.), Qualitative Research in Psychology: Expanding Perspectives
     in Methodology and Design, American Psychological Association, 2003, pp. 131–156.
[33] B. Duggan, B. O’Shea, Tunepal: searching a digital library of traditional music scores,
     OCLC Systems & Services: International digital library perspectives 27(4) (2011) 284–297.
[34] R. Pope, Creativity: Theory, History, Practice, Routledge, 2005.
[35] C. G. Johnson, The creative computer as Romantic hero? Computational creativity
     systems and creative personae, in: Proc. of the International Conference on Computational
     Creativity, 2012, pp. 57–61.
[36] B. Duggan, N. M. Su, TuneTracker: Tensions in the surveillance of traditional music, in:
     Proc. of the ACM Conference on Designing Interactive Systems, 2014, pp. 845–854.
[37] T. Boren, J. Ramey, Thinking aloud: Reconciling theory and practice, IEEE Transactions
     on Professional Communication 43(3) (2000) 261–278.
[38] G. Saldanha, S. O’Brien, Research methodologies in translation studies, Routledge, 2014.