<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Recommendations with Natural-Language Profiles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paul Gagliano</string-name>
          <email>gaglianopa@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Griffin Homan</string-name>
          <email>griffinhomanj@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Douglas Turnbull</string-name>
          <email>dougturnbull@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Venkata S Govindarajan</string-name>
          <email>vgovindarajan@ithaca.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Ithaca College</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Recent research has shown that language models can be used effectively for recommendation. We are interested in using language models to create natural-language (NL) profiles based on users' listening habits for the purpose of providing accurate, interpretable, and steerable recommendations. In this paper, we explore the usage of these NL profiles for recommendation with both a large commercial language model and a smaller open-source model. We find that, though these methods do not perform as well as traditional recommender systems (e.g., matrix factorization) in terms of accuracy, they do produce meaningful recommendations and provide the user the ability to control their recommendations using natural language.</p>
      </abstract>
      <kwd-group>
        <kwd>music recommendation</kwd>
        <kwd>language models</kwd>
        <kwd>natural language profiles</kwd>
        <kwd>popularity bias</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Discovering local music is challenging: most local artists are relatively obscure, and conventional recommendation interfaces struggle to surface them in a way that users trust and are able to influence.</p>
      <p>Localify.org addresses this by providing personalized, locally grounded artist and event recommendations. Building on prior work that characterized the difficulty of recommending long-tail artists and the additional challenge of contextualizing those music recommendations for users [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>], we explore whether language models can be used to generate high-quality local artist recommendations.</p>
      <p>Specifically, we investigate whether the natural language (NL) user profiles (see Figure 1) produced by language models can (1) achieve recommendation quality comparable to more direct approaches that use a list of seed artists as an item-based profile, and (2) remain interpretable [<xref ref-type="bibr" rid="ref3">3</xref>] and potentially steerable [<xref ref-type="bibr" rid="ref4">4</xref>] by users, thus improving trust, scrutability, and adoption. To do so, we compare two item-profile approaches (one that uses a traditional matrix factorization algorithm as a baseline, and one that uses a language model without the added step of creating NL profiles), introduce prompt engineering for composing descriptive and editable NL listening profiles, and perform a detailed error analysis to surface issues like popularity bias that affect recommendation accuracy.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Recent work has reframed recommendation as a language task by using large language models (LLMs), either as complete recommender systems (using a variety of embedding approaches) [<xref ref-type="bibr" rid="ref5">5</xref>] or as agents to mediate between users and traditional recommender systems [<xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>]. One focus of this approach is to leverage the NL medium to first generate NL user profiles from lists of items with which a user is associated, and then use these profiles to rank-order or recommend novel items [<xref ref-type="bibr" rid="ref10 ref3 ref4 ref9">4, 3, 9, 10</xref>].</p>
      <p>The advantage of this approach is that the NL profiles provide a degree of scrutability (the ability to examine) and steerability (the ability to alter) through the reading and manual editing of these profiles. The generation of these NL profiles can be reframed as a prompt engineering task, allowing us to utilize techniques like providing examples (one/few-shot prompting) and providing contrasting item sets to improve quality in terms of both readability and recommendation accuracy [<xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>].</p>
      <p>Our work sits at the intersection of these threads. We evaluate LLMs both as recommenders and as interface layers by embedding user behavior into editable NL profiles, using established prompt engineering techniques, and studying how intrinsic data biases [13, 14], such as artist popularity, affect model behavior in long-tail music recommendation settings.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <p>The evaluation of the experiments in this paper was carried out using data from Localify.org (https://localify.org) users who signed up with their Spotify accounts. This data comprises the “heavy rotation” artists (artists to which a user listens frequently) of 192 Localify users, and contains 3551 artists. Each user must have at least 10 artists in their heavy rotation, but there is no upper limit on the number of artists in this set.</p>
      <p>Henceforth, we will refer to the artists in these heavy rotations as “seed artists”. During evaluation, each user’s seed artists were split into two groups at random. The first group was used as the seed artists in the recommendation, and the second group was placed in the candidate set. A set of artists (i.e., distractors) of the same size as the split seed groups was then taken from the seed artists of other users and placed in the candidate set as well, and the language model was asked to rank the candidate set. The score for each recommendation was found by calculating the area under the ROC curve (AUC) [<xref ref-type="bibr" rid="ref15">15</xref>] of a binary list representing the ranked artists, where 1 (a true positive) represents an artist that was recovered from the user’s seeds, and 0 (a true negative) represents an artist from the distractor set. AUC is a common metric for evaluating the quality of ranked data: it measures the accuracy of the order of a list of recommendations, where randomly ranking items has an expected value of 0.5 and a perfect ranking (all true positives ranked ahead of all true negatives) scores 1.0.</p>
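      <p>The AUC of a ranked binary list can be computed as the fraction of (positive, negative) pairs in which the positive is ranked ahead of the negative. The following is a minimal sketch of that metric for illustration only; the function name is ours, and the paper's released evaluation code may differ.</p>

```python
def ranked_list_auc(ranked_labels):
    """AUC of a ranked binary list: the fraction of (positive, negative)
    pairs in which the positive is ranked ahead of the negative."""
    positives_seen = 0
    correctly_ordered_pairs = 0
    total_pos = sum(ranked_labels)
    total_neg = len(ranked_labels) - total_pos
    for label in ranked_labels:
        if label == 1:  # recovered seed artist (true positive)
            positives_seen += 1
        else:  # distractor (true negative): every positive already seen outranks it
            correctly_ordered_pairs += positives_seen
    return correctly_ordered_pairs / (total_pos * total_neg)
```

      <p>For example, <monospace>ranked_list_auc([1, 1, 0, 0])</monospace> returns 1.0 (a perfect ranking), while a fully inverted list scores 0.0.</p>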
      <p>To ensure anonymity, 20% of all users’ seed artists were removed at random prior to any experiments or processing of the data. This ensures that the sets of user seeds cannot be traced back to the user with which they are associated, even by system administrators.</p>
      <p>In our previous work [<xref ref-type="bibr" rid="ref1">1</xref>], we established Alternating Least Squares Matrix Factorization (MF) [16] as a useful algorithm for local music recommendation, and it is the algorithm currently used by Localify. With the evaluation metric used in this paper, MF achieves an AUC of 0.82. We will use this as a baseline against which to compare the recommendation techniques introduced in this paper.</p>
    </sec>
    <sec id="sec-3b">
      <title>4. Direct Recommendations Without NL Profiles</title>
      <p>To establish a baseline for recommendation performance using language models with our evaluation method, we first generated recommendations by providing a language model directly with a list of seed artists for a user. We then generated recommendations by providing both artist names and their associated genres. The prompt used for these approaches can be seen in B.1, and the results can be seen in Table 1.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption><p>Average AUC (with standard error) for 192 users with item-based profiles.</p></caption>
        <table>
          <thead>
            <tr><th>Model</th><th>Artist Names</th><th>Artist Names &amp; Genres</th></tr>
          </thead>
          <tbody>
            <tr><td>gpt-4o-mini</td><td>0.75 (± 0.02)</td><td>0.77 (± 0.02)</td></tr>
            <tr><td>gemma-3-4b-it</td><td>0.69 (± 0.02)</td><td>0.68 (± 0.02)</td></tr>
            <tr><td>MF (baseline)</td><td>0.82 (± 0.02)</td><td>–</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We ran these experiments on two models: gpt-4o-mini via the OpenAI API, and gemma-3-4b-it running on a local GPU workstation. When we provided the model with artist genres to contextualize the names, neither model’s accuracy improved by a substantial amount. In the case of gemma-3-4b-it, providing genres improved the accuracy in some iterations of the experiment and degraded it in others.</p>
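      <p>The per-user evaluation split described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions (the helper name and return shape are ours, not the released code): half of a user's seeds form the profile, the held-out half plus an equal number of distractors drawn from other users' seeds form the candidate set.</p>

```python
import random

def build_eval_split(user_seeds, other_users_seeds, rng=None):
    """Split one user's seed artists in half and build a candidate set of
    held-out seeds plus an equal number of distractors from other users."""
    rng = rng or random.Random()
    seeds = list(user_seeds)
    rng.shuffle(seeds)
    half = len(seeds) // 2
    profile_seeds, held_out = seeds[:half], seeds[half:]
    # distractors come from other users' seeds, excluding this user's artists
    pool = sorted({a for s in other_users_seeds for a in s} - set(user_seeds))
    distractors = rng.sample(pool, len(held_out))
    candidates = held_out + distractors
    rng.shuffle(candidates)
    # ground truth: 1 if the candidate was held out from this user's seeds
    truth = {artist: int(artist in set(held_out)) for artist in candidates}
    return profile_seeds, candidates, truth
```

      <p>Scoring the model's ranking of <monospace>candidates</monospace> against <monospace>truth</monospace> then yields the per-user AUC described in Section 3.</p>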
    </sec>
    <sec id="sec-4">
      <title>5. Initial Listening Profiles</title>
      <p>Our first experiment with embedding user seeds into a natural language (NL) profile used a naive prompt as a starting point. We asked the model to construct a natural language profile of a user’s listening habits given a list of artists that this user is known to listen to. The prompt for this can be found in B.3. We then passed this profile, along with a list of candidate artists, to the model in a new context window and asked it to rank the candidate set. Though we expected the quality of the recommendations to degrade with the removal of the set of seed artists, the experiments with these initial natural language listening profiles showed no substantial change in accuracy from the experiments that provided the seed artists directly in the prompt (see Table 2).</p>
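      <p>The two-stage pipeline above can be sketched as below. This is a minimal illustration, not the paper's implementation: the <monospace>llm</monospace> callable stands in for any chat-completion client, the function names are ours, and the prompt wording is abridged from B.3 and B.2.</p>

```python
def make_profile(llm, seed_artists):
    """Stage 1: turn a list of seed artists into an NL listening profile."""
    prompt = (
        "You are an expert in describing people's music listening habits. "
        "You are presented with a client who listens to the following artists:\n"
        + "\n".join(seed_artists)
        + "\nGive a textual description of this person's listening habits, "
          "without using artist names."
    )
    return llm(prompt)

def rank_with_profile(llm, profile, candidates):
    """Stage 2: in a fresh context, rank candidates from the profile alone."""
    key = "\n".join(f"{i}: {name}" for i, name in enumerate(candidates))
    prompt = (
        "You are an expert in music recommendation. You are presented with a "
        f"client with the following listening profile:\n{profile}\n"
        "Rank the following artists from most to least recommended:\n" + key
    )
    return llm(prompt)
```

      <p>Note that the second call receives only the generated profile, never the seed artist names, which is what makes the profile the sole interface between the user's history and the recommendation.</p>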
    </sec>
    <sec id="sec-5">
      <title>6. Prompt Engineering</title>
      <p>We performed a series of prompt engineering experiments to try to improve the quality of our natural language profiles. We hypothesized that, in so doing, we would improve the quality of the recommendations. We also wanted to make these profiles as human-readable and concise as possible in the interest of future research on steerability [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
      <sec id="sec-5-1">
        <title>6.1. Example</title>
        <p>Research shows that providing a language model with an example output (one-shot prompting) can improve the quality of the response [<xref ref-type="bibr" rid="ref11">11</xref>]. We tried this approach with two different example listening profiles. The first was human-written, and the second was generated by the language model and then iterated upon to create an optimal prompt. We found that the accuracy of the model degraded slightly when provided with the human-written example, and improved slightly when provided with the optimal generated example. However, both of these changes were within two standard errors of the original score and are not statistically significant. The prompt for this approach can be seen in B.4.</p>
      </sec>
      <sec id="sec-5-2">
        <title>6.2. Contrast</title>
        <p>Research shows that LM-generated summaries can be improved by providing the model with contrasting examples [<xref ref-type="bibr" rid="ref12">12</xref>]. We applied this to our listening profiles by providing the model with the known artists of n random other users from the test set, and asking it to write the profile paying special attention to the difference between the target user and the other users. We tried this both with 3 contrasting users and with 5 contrasting users, and did not notice a substantial improvement with either approach. The prompt for this approach can be seen in B.5.</p>
      </sec>
      <sec id="sec-5-3">
        <title>6.3. First-Person, List</title>
        <p>Our next two experiments with prompt engineering put more of an emphasis on human readability than on recommendation quality. We wanted to see whether the quality would hold up if the listening profile was formatted as a list, and if it was written in the first person. Lists are easy to edit and augment, and users will likely want to write in the first person if they are editing and adding to their listening profile. Both of these experiments improved the accuracy by around 0.01, which is not significant given the standard error.</p>
      </sec>
      <sec id="sec-5-4">
        <title>6.4. All Techniques</title>
        <p>Our final experiments with prompt engineering combined all of these approaches. We also wrote the prompts while keeping in mind OpenAI’s guidelines for prompt engineering using their API (https://platform.openai.com/docs/guides/prompt-engineering). In this experiment, we found an accuracy of around 0.77, which is not a statistically significant improvement over our previous LLM-based approaches. We also tried this experiment with an accompanying system prompt, but did not see a significant difference in the AUC score. As such, although we will continue to use these prompt engineering approaches due to the slight improvement in accuracy, prompt engineering does not improve accuracy by a statistically significant amount in our case. The prompt for this approach can be seen in B.6.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Error Analysis</title>
      <p>To gain insight into why these recommendations are less effective than traditional matrix factorization, we tracked the performance for different users’ seed artist sets across our multiple experiments. Identifying features of seed artist sets that correlate with high recommendation accuracy could inform the development of more effective recommendation techniques. As such, we recorded the scores for individual sets of user seeds across six experiments: the original recommendation experiment with no embedded listening profile, the prompt engineering experiment using examples, the prompt engineering experiment using contrasting items, and the three combined prompt engineering experiments (one of which used a human-written example, one of which used a solid generated example, and one of which used a generated example and a system prompt). The goal was not to see the accuracy of these experiments, but to analyze how individual users’ recommendations were performing under a variety of conditions.</p>
      <fig id="fig2">
        <caption><p>Figure 2: (a) Average Popularity vs. Average AUC for Natural Language Recommendations; (b) Average Popularity vs. AUC for ALS-MF Recommendations.</p></caption>
      </fig>
      <p>We found that the average score for recommendations using a specific user’s seed artists was correlated positively (r = 0.337) with the mean Spotify popularity for those artists (i.e., the more popular the artists a user listens to, the more accurate their recommendations will be). Spotify popularity is reported by the Spotify Web API (https://developer.spotify.com/documentation/web-api/reference/get-an-artist). A graph illustrating this finding can be found in Figure 2a. Each point in the graph represents the average artist popularity for a user’s item-based profile vs. the average accuracy of their recommendations over 6 LLM prompts (see Appendix B). For our baseline ALS-MF evaluation, we found that the correlation coefficient for average artist popularity and AUC was only r = -0.015. A graph illustrating this finding can be found in Figure 2b. This suggests that our LLM-based recommendation approach is more susceptible to popularity bias than our MF baseline.</p>
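      <p>The reported r values are Pearson correlation coefficients over per-user (mean popularity, mean AUC) pairs. A minimal version of this computation (our own helper, shown for clarity rather than taken from the paper's code):</p>

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between, e.g., users' mean artist popularity (xs)
    and their average recommendation AUC (ys)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

      <p>A value near 0.337 indicates a moderate positive association, while -0.015 is effectively no association, which is the contrast between the LLM and MF results above.</p>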
    </sec>
    <sec id="sec-7">
      <title>8. Popularity Bias</title>
      <p>Given the potential for popularity bias revealed by these findings, we perform an experiment specifically focused on determining the influence of artist popularity on LLM-based recommendations.</p>
      <table-wrap id="tbl3">
        <label>Table 3</label>
        <caption><p>Accuracy with NL profiles by persona.</p></caption>
        <table>
          <thead>
            <tr><th>Experiment</th><th>Accuracy (gpt)</th><th>Accuracy (Baseline MF)</th></tr>
          </thead>
          <tbody>
            <tr><td>Full Test Set</td><td>0.76 (± 0.02)</td><td>0.82 (± 0.02)</td></tr>
            <tr><td>Low Popularity</td><td>0.74 (± 0.03)</td><td>0.83 (± 0.02)</td></tr>
            <tr><td>Medium Popularity</td><td>0.76 (± 0.04)</td><td>0.81 (± 0.02)</td></tr>
            <tr><td>High Popularity</td><td>0.81 (± 0.03)</td><td>0.82 (± 0.02)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>To start, we divide the set of artists in our dataset into three roughly equal-sized groups based on their Spotify popularity: low popularity artists with a score below 64, medium popularity artists with a score between 64 and 73, and high popularity artists with a score above 73.</p>
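      <p>Using these thresholds, the persona construction can be sketched as follows (the helper names are ours, and the bucketing is a straightforward reading of the thresholds above):</p>

```python
def popularity_group(score):
    """Bucket a Spotify popularity score (0-100) using the paper's thresholds:
    below 64 -> low, 64-73 -> medium, above 73 -> high."""
    if score < 64:
        return "low"
    if score <= 73:
        return "medium"
    return "high"

def split_personas(seed_artists, popularity):
    """Split one user's seed artists into three popularity personas."""
    personas = {"low": [], "medium": [], "high": []}
    for artist in seed_artists:
        personas[popularity_group(popularity[artist])].append(artist)
    return personas
```

      <p>Each non-empty (and sufficiently large) bucket then acts as an independent user persona for the recommendation experiment.</p>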
      <p>We used these popularity thresholds to divide each user’s seed artists into three groups, each of which is considered an individual user persona for this experiment. After filtering these personas by size, we were left with 110 low popularity personas, 82 medium popularity personas, and 109 high popularity personas. We then used these personas to run our best-performing approach, “All (Generated Example)”, over the three popularity groups. The results of this experiment can be seen in Table 3.</p>
      <p>These results suggest a popularity bias: the score for the recommendations using medium popularity personas is similar to our original experiment score, the score using low popularity personas is slightly lower, and the score using high popularity personas is similar to the MF baseline.</p>
    </sec>
    <sec id="sec-8">
      <title>9. Conclusion and Future Work</title>
      <p>In this paper, we explored a variety of methods for generating recommendations using NL user profiles and language models. It is clear that, with these methods, recommendations made with language models do not perform as well (in terms of recommendation accuracy) as recommendations made with traditional methods. However, these listening profiles provide a potential for user control that would not be possible with conventional methods. As such, we believe that language models could be a useful tool for scrutable and steerable recommendation.</p>
      <p>There are some limitations on the work we were able to do with NL user profiles. There are an infinite number of approaches to prompt engineering we could have tried, and there are many more language models (both commercial-grade and open-source) that we could have used for the experiments. Both are avenues that could lead to better accuracy and more readable listening profiles.</p>
      <p>Future research will include attempting to improve recommendation performance through language model training and fine-tuning. We are also interested in exploring hybrid recommendation, which uses traditional recommendation algorithms to rank-order candidate artists and then uses language models to help explain recommendations [<xref ref-type="bibr" rid="ref2">2</xref>]. Finally, we plan to explore scrutability and steerability, which involves conducting user studies to explore different ways users can benefit from natural language profiles.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This research was supported by NSF Award IIS-2312866. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors. We’d like to thank Joyce Zhou and Thorsten Joachims for their helpful discussions related to this research.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used X-GPT-4 for grammar and spell-checking. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-11">
      <title>A. Experiment Code</title>
      <p>The code used for this paper can be found on GitHub at https://github.com/JimiLab/localify-nlp.</p>
    </sec>
    <sec id="sec-12">
      <title>B. Prompts</title>
      <p>This section enumerates the prompts used for LLM artist recommendation. B.1 is used for item-based profile recommendation. B.2 is the generic prompt for natural language (NL) profile recommendation. B.3 through B.6 are the prompts used to generate NL profiles using different prompt engineering techniques. B.7 is used in conjunction with the recommendation prompts to achieve a model response that can be parsed by our evaluation procedure.</p>
      <sec id="sec-12-1">
        <title>B.1. Prompt used for Direct Recommendations</title>
        <p>You are an expert in music recommendation. Your specialty is in ranking a list of artists by how similar each one is to a different set of artists that someone already knows.</p>
        <p>You are presented with a client who frequently listens to the following artists:</p>
        <p>You are asked to use your expert knowledge of these artists to rank the following artists (on which you are also an expert) in order from most recommended to least recommended:</p>
      </sec>
      <sec id="sec-12-1b">
        <title>B.2. Prompt used for Recommendations with Listening Profiles</title>
        <p>You are an expert in music recommendation. Your specialty is in ranking a list of artists based on a textual listening profile that you are provided.</p>
        <p>You are able to perfectly rank these artists in order of preference, where preference is defined by how much a user matching your provided listening profile will enjoy that artist.</p>
        <p>You are presented with a client with the following listening profile:</p>
        <p>You are asked to use your expert knowledge of musical artists and your complete understanding of the listening profile to rank the following artists (on which you are also an expert) in order from most recommended to least recommended:</p>
      </sec>
      <sec id="sec-12-2">
        <title>B.3. Prompt used for Initial Listening Profiles</title>
        <p>You are an expert in describing people’s music listening habits. You are presented with a client who listens to the
following artists:</p>
        <p>Give a textual description of this person’s listening habits, without using artist names.</p>
      </sec>
      <sec id="sec-12-3">
        <title>B.4. Prompt used for Listening Profiles by Example</title>
        <p>You are an expert in describing people’s music listening habits. You are presented with a client who listens to the following artists:</p>
        <p>Give a textual description of this person’s listening habits, without using artist names. Write this description according to the following example, but the details should correspond to the user I have provided.</p>
        <sec id="sec-12-3-1">
          <title>B.4.1. Human-Written Example</title>
          <p>My music taste is a mix of pop, musical theatre, r&amp;b and dance/electronic. I used to be in theatre, so I love a great singer, and I really value the quality of voice and the skill of the band. As such, I love when artists use real instruments and studio musicians. I do also like music at the other side of the spectrum that’s electronic, but not synth trying to be instruments. I like upbeat music, so I don’t usually listen to sad songs but I will if the vocals are amazing, and I love a dreamy relaxing r&amp;b track. I also love a rap feature on a pop song.</p>
        </sec>
        <sec id="sec-12-3-2">
          <title>B.4.2. Generated Example</title>
          <p>My musical taste weaves velvety jazz-infused neo-soul with driving, synth-heavy electronic grooves that effortlessly blend warmth and futurism. I gravitate toward smoky, upright-bass-led lounge numbers that evoke intimate club corners, alongside pulsating house tracks that ignite late-night dancefloors. I refresh my sets with experimental ambient textures and lo-fi hip-hop beats that layer nostalgic vinyl crackle over head-nodding rhythms, while occasional avant-garde free-jazz injections add daring dissonance. This balance of cozy soulfulness and boundary-pushing sonic exploration speaks to my love of music that soothes and stimulates in equal measure. The result is a listening profile rooted in sophistication and spontaneity, comfort and curiosity, offering a journey that feels both familiar and thrilling.</p>
        </sec>
      </sec>
      <sec id="sec-12-4">
        <title>B.5. Prompt used for Listening Profiles by Contrast</title>
        <p>You are an expert in describing people’s music listening habits. You are presented with a client who listens to the following artists:</p>
        <p>You are also presented with the following other users’ familiar artists:
{other_seeds}</p>
        <p>Give a textual description of this person’s listening habits, without using artist names. Focus on how your client’s listening habits differ from the habits of the other users (what makes them unique). Do not directly mention these other users, and do not pander to the client. Give an accurate description that gives the best possible summary of the artists that they like. The description should be designed such that a third party could use it to make artist recommendations.</p>
      </sec>
      <sec id="sec-12-4b">
        <title>B.6. Prompt used for Listening Profiles by All Techniques</title>
        <p>You are an expert in describing people’s music listening habits. You are experienced in writing clear, concise, and descriptive summaries of people’s listening habits based on a list of artists that they like.</p>
        <p>You are presented with a client who listens to the following artists:</p>
        <p>You are also presented with the following other users’ familiar artists:</p>
        <p>Give a textual description of this person’s listening habits, without using artist names. Focus on how your client’s listening habits differ from the habits of the other users (what makes them unique). Do not directly mention these other users, and do not pander to the client. Give an accurate description that gives the best possible summary of the artists that they like. The description should be designed such that a third party could use it to make artist recommendations, and it should follow the following example as a structure and guide, but the details should pertain to the client it is written for. The description should be written in first-person, from the perspective of the client.</p>
      </sec>
      <sec id="sec-12-5">
        <title>B.7. Post-amble used for Parseable Model Responses</title>
        <p>You must present these recommendations in a very specific way. Each candidate artist that you are recommending has an integer ID associated with them. The key is as follows:</p>
        <p>You must finish your response by listing your recommendations only using these IDs, separated by commas, and surrounded by &lt;&gt;. An example recommendation would be as follows:</p>
        <p>&lt;#,#,#,#,#,#,#,#,#,#,...&gt;</p>
        <p>With each # replaced by the ID of the artist you recommend in that position. You must abide by the following rules:
- Rank all of the artists in the candidate set I provided you, not just however many are in my example recommendation
- Do not include any artists not in that candidate set
- Your response must be formatted as I have said, in a list of comma-separated IDs (governed by the key I provided) and surrounded by angle brackets/chevrons (&lt;&gt;)</p>
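        <p>A model response in this format can be parsed with a small helper like the following (the function name is ours; the released evaluation code may parse responses differently):</p>

```python
import re

def parse_ranked_ids(response):
    """Pull the comma-separated ID list out of the trailing <...> block."""
    matches = re.findall(r"<([\d,\s]+)>", response)
    if not matches:
        raise ValueError("no parseable ranking in model response")
    # use the last match, since the rules require the list to end the response
    return [int(tok) for tok in matches[-1].split(",") if tok.strip()]
```

        <p>For example, a response ending in &lt;3, 1, 2&gt; parses to the ranked ID list [3, 1, 2], which can then be mapped back to artists via the key and scored with AUC.</p>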
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Turnbull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trainor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Homan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Richards</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bentley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Conrad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gagliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raineault</surname>
          </string-name>
          , T. Joachims,
          <article-title>Localify.org: Locally-focused music artist and event recommendation</article-title>
          ,
          <source>in: Proceedings of the 17th ACM Conference on Recommender Systems</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1200</fpage>
          -
          <lpage>1203</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gagliano</surname>
          </string-name>
          , G. Homan,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raineault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ayambem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Turnbull</surname>
          </string-name>
          ,
          <article-title>Localify.org: Contextualizing long-tail music for local artist discovery</article-title>
          ,
          <source>in: LBD Proceedings of the 2024 ISMIR Conference</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wedin</surname>
          </string-name>
          ,
          <article-title>On natural language user profiles for transparent and scrutable recommendation</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          . URL: https://dl.acm.org/doi/10.1145/3477495.3531873.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <article-title>Language-based user profiles for recommendation</article-title>
          ,
          <source>in: WSDM 2024 Workshop on Large Language Models for Individuals, Groups, and Society</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/pdf/2402.15623.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Oramas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarasua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gouyon</surname>
          </string-name>
          ,
          <article-title>Talking to your recs: Multimodal embeddings for recommendation and retrieval</article-title>
          ,
          <source>in: MuRS 2024: 2nd Music Recommender Systems Workshop</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Large language models for recommendation: Past, present, and future</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2024</year>
          . URL: https://dl.acm.org/doi/abs/10. 1145/3626772.3661383.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>iAgent: LLM agent as a shield between user and recommender systems</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2502.14662. doi:10.48550/arXiv.2502.14662. arXiv:2502.14662.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Recommender systems meet large language model agents: A survey</article-title>
          ,
          <source>Foundations and Trends® in Privacy and Security 7</source>
          (
          <year>2025</year>
          )
          <fpage>247</fpage>
          -
          <lpage>396</lpage>
          . URL: http://dx.doi.org/10.1561/3300000050. doi:10.1561/3300000050.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ramos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Rahmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <article-title>Transparent and scrutable recommendations using natural language user profiles</article-title>
          ,
          <source>in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2024</year>
          . URL: https://aclanthology.org/2024.acl-long.753/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Penaloza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gouvert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Charlin</surname>
          </string-name>
          , TEARS:
          <article-title>Textual representations for scrutable recommendations</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2410.19302.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Meskó</surname>
          </string-name>
          ,
          <article-title>Prompt engineering as an important emerging skill for medical professionals: Tutorial</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          (
          <year>2023</year>
          ). URL: https://www.jmir.org/2023/1/e50638/PDF.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Customizing language model responses with contrastive in-context learning</article-title>
          ,
          <source>in: Association for the Advancement of Artificial Intelligence Conference</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2401.17390v1.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mansoury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          ,
          <article-title>The unfairness of popularity bias in recommendation</article-title>
          , arXiv preprint arXiv:1907.13286 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Trainor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Turnbull</surname>
          </string-name>
          ,
          <article-title>Popularity degradation bias in local music recommendation</article-title>
          ,
          <source>in: Proceedings of the MuRS Workshop at the 17th ACM Conference on Recommender Systems (RecSys</source>
          <year>2023</year>
          ),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <article-title>The use of the area under the ROC curve in the evaluation of machine learning algorithms</article-title>
          ,
          <source>Pattern Recognition 30</source>
          (
          <year>1997</year>
          )
          <fpage>1145</fpage>
          -
          <lpage>1159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          ,
          <article-title>Collaborative filtering for implicit feedback datasets</article-title>
          ,
          <source>in: 2008 Eighth IEEE International Conference on Data Mining</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>