<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Tagging Behaviour: The role of recommender systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Angelina Ziesemer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Blustein</string-name>
          <email>jamie@cs.dal.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milene Selbach Silveira</string-name>
          <email>milene.silveira@pucrs.br</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dalhousie University, Faculty of Computer Science</institution>
          ,
          <addr-line>Halifax, NS</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>PUCRS, Faculdade de Informática</institution>
          ,
          <addr-line>Av. Ipiranga 6681, Prédio 32, Porto Alegre - RS</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>PUCRS, Faculdade de Informática</institution>
          ,
          <addr-line>Av. Ipiranga 6681, Prédio 32, Porto Alegre - RS</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper investigates how recommendation affects tagging behaviour with regard to the language adopted in tags. We conducted a study comparing the tags assigned to digital images with and without the support of a recommender system. The results point to an association between the language users adopt when assigning tags and the type of system supporting the task.</p>
      </abstract>
      <kwd-group>
        <kwd>tagging</kwd>
        <kwd>recommendation</kwd>
        <kwd>multilingual</kwd>
        <kwd>digital image</kwd>
        <kwd>annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Tag recommender systems have arisen to help users choose
the tags most likely to lead to better (more accurate, more
efficient, more satisfying, etc.) content retrieval. One popular
approach used for such recommendations is to suggest tags
which co-occur [
        <xref ref-type="bibr" rid="ref1 ref5 ref6">6, 5, 1</xref>
        ]. When users assign recommended
tags, their set of tags becomes more homogeneous [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
However, what has not yet been studied is whether the
resulting homogeneity of tags also applies to the language
adopted for tagging.
      </p>
      <p>
        Understanding how users perform the same task in
different environments can provide insight for designers to
decide among distinct approaches according to user and system
needs. The language used for tagging has several
implications for the dissemination of content on the Web [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In
general, co-occurrence based approaches for recommending
tags do not take into account the user's language, but rather
use the collection of tags that co-occur with a target tag to
recommend other tags.
      </p>
      <p>In this work we report a study investigating whether the
presence of recommendation in a tagging system can change
the natural language users choose for
tagging digital images. The participants from whom tags were
collected were residents of a Portuguese-speaking country, and
they were given no prior instruction regarding the language to use for
tagging. The results show that there is a difference
in the language adopted to assign tags when users are
supported by recommendation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. METHOD</title>
      <p>
        Because we had two experimental conditions to test
(tagging with and without the recommendation aid), we employed a
counterbalanced design. A total of 57 participants, all of them residents of a
Portuguese-speaking country, were partitioned into two groups:
G1 had 33 participants (16 female, 17 male, with a mean
age of 27 years); G2 had 24 participants (10 female,
14 male, with a mean age of 25 years). All participants were
presented with the same images and interfaces and were asked to
perform the same tasks, but each group received the conditions
in reverse order. As a design platform for recommendation,
we used a model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] primarily intended to recommend tags
based on co-occurrence. This approach computes the utility
of tags using a combination of three measures to produce
a ranking of similar tags based on a reference tag (a tag
the user assigned to the image before receiving any
recommendations). As the source for recommending tags we used a
training dataset from Flickr composed of more than 600,000
tags. Participants had to assign tags to seven distinct
images publicly available on Google Images. As is common in
such research, images were classified by content [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Four
images were presented in both the stage without recommendation (NR)
and the stage with the recommender system (RS) for the
purpose of comparing behaviour. The other three images were
presented only in the RS stage to assess whether a lack of
previous experience with an image makes a difference to the tagging
language.
      </p>
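      <p>The idea of ranking candidate tags by their co-occurrence with a reference tag can be sketched as follows. This is a minimal illustration only: it uses a single Jaccard similarity in place of the three measures combined by the model in [1], which are not detailed here, and the names build_cooccurrence and recommend, as well as the toy photo data, are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(tagged_items):
    """Count tag frequencies and pairwise co-occurrences over tagged items."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_items:
        unique = sorted(set(tags))
        tag_counts.update(unique)
        # Store each unordered pair once, with the tags in sorted order.
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


def recommend(reference_tag, tag_counts, pair_counts, top_n=5):
    """Rank candidate tags by Jaccard similarity with the reference tag.

    Assumption: Jaccard stands in for the (unspecified) utility measures.
    """
    scores = {}
    for (a, b), together in pair_counts.items():
        if reference_tag not in (a, b):
            continue
        other = b if a == reference_tag else a
        union = tag_counts[reference_tag] + tag_counts[other] - together
        scores[other] = together / union
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


# Hypothetical toy data standing in for the Flickr training set.
photos = [
    ["praia", "beach", "sea"],
    ["praia", "beach", "sun"],
    ["beach", "sea", "sun"],
    ["praia", "mar"],
]
tag_counts, pair_counts = build_cooccurrence(photos)
suggestions = recommend("praia", tag_counts, pair_counts, top_n=3)
print(suggestions)
```
]]></preformat>
      <p>In this toy data the Portuguese reference tag praia receives the English tag beach as its top suggestion, because nothing in the ranking looks at the language of either tag.</p>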
    </sec>
    <sec id="sec-3">
      <title>2.1 Classifying Languages</title>
      <p>
        To process the language of tags assigned in this
experiment, we used a standalone language identification tool based
on a Naïve Bayes classifier [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This classifier provides a
probability estimate of the natural language from which a
given set of words is drawn. Running the language
identification and inspecting samples of tags together with their
probability estimates, we found that some users tagged
images multilingually. The language classifier was therefore used
to estimate a language score for each image, classifying it as
mainly tagged in English (EN) or Portuguese
(PT), the two main languages used by participants in the
tagging task.
      </p>
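      <p>One way to turn classifier output into a per-image label is sketched below. It assumes, purely for illustration, that each tag comes with a predicted language and a probability, and that an image is labelled with the language carrying the most probability mass. The function name main_language and the aggregation rule are assumptions, not the procedure used with the tool in [3].</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def main_language(tag_predictions, languages=("EN", "PT")):
    """Label an image by the language carrying most probability mass.

    tag_predictions: list of (language, probability) pairs, one per tag,
    as a hypothetical per-tag classifier might return them.
    Returns (label, share of the total EN/PT mass held by that label).
    """
    mass = defaultdict(float)
    for language, probability in tag_predictions:
        if language in languages:
            mass[language] += probability
    total = sum(mass.values())
    if total == 0:
        return None, 0.0  # no tag was identified as EN or PT
    best = max(languages, key=lambda lang: mass[lang])
    return best, mass[best] / total


# Hypothetical predictions for the tags of a single image.
label, score = main_language([("PT", 0.9), ("EN", 0.8), ("EN", 0.7)])
print(label, round(score, 3))
```
]]></preformat>
      <p>Summing probabilities rather than counting tags lets a confidently identified tag weigh more than an ambiguous one, which matters for short tags that are spelled the same in both languages.</p>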
      <p>Our first analysis assessed the effect of using, or not using,
a recommender system on the language used to assign tags.
The set of tags assigned to each image by each participant
was classified as either PT or EN. For the images that were
presented in both stages of this study, Table 1 shows the
proportion of images by main language used in each group
and stage.</p>
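      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Main language</th><th>G2, NR</th><th>G2, RS</th></tr>
          </thead>
          <tbody>
            <tr><td>PT</td><td>63 (65%)</td><td>25 (26%)</td></tr>
            <tr><td>EN</td><td>33 (35%)</td><td>71 (74%)</td></tr>
          </tbody>
        </table>
      </table-wrap>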
      <p>When not using the recommender system, participants
tagged fewer images mainly in EN (G1: mean = 1.54, SD =
1.60; G2: mean = 1.37, SD = 1.71). In the RS
stage, however, more images were tagged mainly in EN (G1: mean =
3.27, SD = 1.30; G2: mean = 2.91, SD = 1.28). A (paired)
Wilcoxon signed-rank test indicated that the mean number of images
tagged mainly in EN changed (p &lt; 0.01) from one stage to
the other for both groups. This behaviour was also found when
we looked at the language used for each image individually,
before and after recommendation (McNemar, p &lt; 0.01), and
also for the images that were tagged only in the RS stage.</p>
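      <p>For readers unfamiliar with the McNemar test, the sketch below computes a two-sided exact p-value from the two discordant counts, that is, the images whose main language differs between the NR and RS stages. The text does not say which variant of the test was used, so the exact binomial form is an assumption, and the counts in the example are hypothetical rather than the study's data.</p>
      <preformat><![CDATA[
```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: images labelled PT in the NR stage and EN in the RS stage
    c: images labelled EN in the NR stage and PT in the RS stage
    """
    n = b + c
    if n == 0:
        return 1.0  # no image changed label, so no evidence of a shift
    k = min(b, c)
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only, not the study's data.
p_value = mcnemar_exact(b=9, c=1)
print(round(p_value, 4))
```
]]></preformat>
      <p>Images that keep the same main language in both stages do not enter the computation; only the images that switch language carry information about the direction of the change.</p>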
      <p>To make sure that the results found in this study were
not driven by the behaviour of a few participants, we looked at
their results individually. We classified users as PT-taggers,
EN-taggers or multilingual taggers (ML-taggers): PT-taggers had
all their images classified as mainly tagged in PT;
EN-taggers, correspondingly, had all their images classified as mainly tagged in EN;
and ML-taggers had a mix of images tagged in EN and PT.
Figure 1 shows the proportion of participants and their
respective tagging behaviour in each stage of this study.</p>
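      <p>The participant-level classification can be sketched as below. The sketch assumes that an EN-tagger is defined symmetrically to a PT-tagger (all images mainly tagged in EN); the function names and the four example participants are hypothetical.</p>
      <preformat><![CDATA[
```python
def classify_tagger(image_languages):
    """Classify a participant from the main language of each image.

    image_languages: list of "PT" / "EN" labels, one per tagged image.
    """
    observed = set(image_languages)
    if observed == {"PT"}:
        return "PT-tagger"
    if observed == {"EN"}:
        return "EN-tagger"
    if observed == {"PT", "EN"}:
        return "ML-tagger"
    raise ValueError("expected a non-empty list of 'PT'/'EN' labels")


def proportions(participants):
    """Share of participants in each tagger category."""
    counts = {"PT-tagger": 0, "EN-tagger": 0, "ML-tagger": 0}
    for labels in participants.values():
        counts[classify_tagger(labels)] += 1
    return {kind: n / len(participants) for kind, n in counts.items()}


# Hypothetical participants, for illustration only.
stage_labels = {
    "p1": ["PT", "PT", "PT", "PT"],
    "p2": ["EN", "EN", "EN", "EN"],
    "p3": ["PT", "EN", "EN", "PT"],
    "p4": ["EN", "PT", "EN", "EN"],
}
shares = proportions(stage_labels)
print(shares)
```
]]></preformat>
      <p>Running the same function on the labels from each stage gives the per-stage proportions that a chart of tagging behaviour would display.</p>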
      <p>At the NR stage, 45% of participants were classified as
PT-taggers. However, this behaviour changed in the RS stage:
only 8% of them kept tagging images mainly with tags in
PT. In the RS stage, the majority of PT-taggers switched
their tagging language and behaved as EN- or ML-taggers.</p>
      <fig id="fig-1">
        <label>Figure 1</label>
        <caption>
          <p>Proportion of participants classified as PT-, EN-, or ML-taggers in the NR and RS stages.</p>
        </caption>
      </fig>
      <p>To try to understand participants' behaviour, we examined
the order of tags assigned in the RS stage. We noticed that
at first some images received reference tags in PT, but the
following reference tags were assigned in EN. We
hypothesize that, as participants received tag recommendations in
EN, they switched the language of their reference tags. However,
individual users' behaviour needs further investigation.</p>
    </sec>
    <sec id="sec-4">
      <title>4. DISCUSSION AND CONCLUSION</title>
      <p>The combination of semi-automatic recommendation and
tag co-occurrence is an interesting and important approach
to recommendation, in part because the users'
reference tags are the seed for recommending other tags. This
approach is useful for producing rich annotations, and is also intended
to improve the user experience by decreasing the effort needed to
assign tags while still allowing users to use personal tags.
In our study we have shown that the recommendation
approach used affects the language chosen for tagging. The results
indicate that the number of images tagged mainly
in EN changed in the RS stage (compared to the NR
stage). Because co-occurrence based approaches use the links
among tags to recommend other tags, the results found here
have several implications for the design of tag
recommender systems. The approach increased the language
homogeneity (EN) of tags, which could result in the cultural
isolation of online indexed content. We are aware that the
training dataset used for recommendation has a
representative quantity of tags in EN and that, consequently, many tags
in EN can co-occur with tags in other languages.
However, the social tag dataset we used represents the natural
imbalance of languages on the Web and how tags are
connected, which reinforces the view that EN functions
as a hub to other languages. On the other hand, the
interface used in the RS stage only recommends tags based on
the reference tags assigned, so users still had the autonomy
to assign their own tags and keep their vocabulary without
any recommended tag. That not all users switched (or converted)
languages is evident when one considers that even in the RS
stage some users continued as PT-taggers. These findings
highlight the need to investigate whether there are distinct
user profiles that are more likely to accept multilingual
recommendations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>A. de CA Ziesemer and J. B. S. de Oliveira</surname>
          </string-name>
          .
          <article-title>Keep querying and tag on: Collaborative folksonomy using model-based recommendation</article-title>
          .
          <source>In Collaboration and Technology</source>
          , pages
          <volume>10</volume>
          –
          <fpage>17</fpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          and W.-T. Fu.
          <article-title>Cultural difference in image tagging</article-title>
          .
          <source>In Proc. SIGCHI</source>
          , pages
          <volume>981</volume>
          –
          <fpage>984</fpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lui</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <article-title>Cross-domain feature selection for language identification</article-title>
          .
          <source>In Intl. Joint Conf. on Natural Language Processing</source>
          , pages
          <volume>553</volume>
          –
          <fpage>561</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ronen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vespignani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pinker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Hidalgo</surname>
          </string-name>
          .
          <article-title>Links that speak: The global language network and its association with global fame</article-title>
          .
          <source>Proc. of the Nat'l Acad. of Sciences</source>
          ,
          <volume>111</volume>
          (
          <issue>52</issue>
          ):E5616–
          <fpage>E5622</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sigurbjörnsson</surname>
          </string-name>
          and R. van Zwol.
          <article-title>Flickr tag recommendation based on collective knowledge</article-title>
          .
          <source>In Proc. WWW</source>
          , pages
          <volume>327</volume>
          –
          <fpage>336</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wartena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brussee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wibbels</surname>
          </string-name>
          .
          <article-title>Using tag co-occurrence for recommendation</article-title>
          .
          <source>In Intelligent Systems Design and Applications</source>
          ,
          <year>2009</year>
          . ISDA'
          <volume>09</volume>
          . Ninth International Conference on, pages
          <volume>273</volume>
          –
          <fpage>278</fpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>