<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Estimating users' areas of research by publications and profiles on social networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petr Saloun</string-name>
          <email>petr.saloun@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adam Ondrejka</string-name>
          <email>adam.ondrejka.st@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Zelinka</string-name>
          <email>ivan.zelinka@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>VSB-Technical University of Ostrava</institution>
          ,
          <addr-line>17. listopadu 15, 70833 Ostrava</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We focus on estimating the research area of a researcher (user) by finding their unique identity in digital libraries and social networks and by analysing the public metadata of their publications and the information published on their social network profiles. Where publication metadata is incomplete, we recover the missing information with information retrieval based on NLP techniques. We estimate the author's domain by extracting keywords from abstracts as well as from information published on social profiles. The result of this work is a design, an original algorithm, and an experimental verification of the algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>digital library</kwd>
        <kwd>identify user</kwd>
        <kwd>social media</kwd>
        <kwd>information retrieval</kwd>
        <kwd>natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>There are situations when we need to find the works of a
specific researcher, for example when organizing a
conference. One of the most common ways to solve this problem
is to search for information about the researcher, either
by looking at his institutions and publications or by
examining the topics he presented at various conferences, and then
to create a profile of the researcher manually. With the boom
of social networking, people began to publish more openly
accessible data than before. This data can be an
interesting complement to what is known about a person's true identity.
Unfortunately, the expansion and emergence of various
social networks has caused considerable fragmentation:
users publish specific information about themselves on the
social network that focuses on the corresponding topic. The fact that
people can share the same name is another obstacle,
so it is necessary to verify that a profile really belongs to the
right person and not to a namesake. The main objective
of this work is to identify researchers on social networks and
in digital libraries. Based on the public information on these
sites, we estimate the area of a person's research. The
results are keywords that serve both as a description of the
person and as input for further research on finding
suitable reviewers of publications presented at conferences and
on detecting copyright violations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. ESTIMATING AREA OF AUTHOR’S RESEARCH</title>
      <p>
        To find the right profiles we used a technique which compares
specific attributes with different weights. Details are described
in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We used a modified version shown in Equation 1
(similar work is mentioned in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math><![CDATA[
\mathrm{sim}_{u,p} =
\begin{cases}
\sum_{i=0}^{n} w_i \, \mathrm{sim}(a_{i,u}, a_{i,p}) & \text{if } \mathrm{sim}(a_{name}) > th_{name} \\
0 & \text{otherwise}
\end{cases}
]]></tex-math>
      </disp-formula>
      <p>where sim(a<sub>name</sub>) is the similarity between the author and user
profile names, th<sub>name</sub> is the threshold value used to decide whether the names are
the same, n is the count of compared attributes, w<sub>i</sub> is the
weight of a compared attribute, a<sub>p</sub> is the set of the user's profile
attributes, a<sub>u</sub> is the set of the user's attributes derived from his publications, and
sim(a<sub>i,u</sub>, a<sub>i,p</sub>) is the similarity between attributes. The text
comparison is done by fuzzy matching to allow for potential
typing errors in attributes.</p>
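      <p>As an illustration only, Equation 1 can be sketched in Python as follows. The attribute names, the weights, the threshold of 0.8 and the use of difflib.SequenceMatcher as the fuzzy matcher are assumptions made for this sketch; the paper does not specify which fuzzy matching method or parameter values were used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] that tolerates small typing errors."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def profile_similarity(author, profile, weights, name_threshold=0.8):
    """Weighted attribute similarity, gated by name similarity (Equation 1).

    author and profile are dicts mapping attribute names to strings.
    weights maps attribute names to w_i. All names here are hypothetical.
    """
    # Gate: if the names are not similar enough, the overall score is 0.
    if text_similarity(author["name"], profile["name"]) <= name_threshold:
        return 0.0
    total = 0.0
    for attribute, weight in weights.items():
        left = author.get(attribute)
        right = profile.get(attribute)
        if left and right:  # an attribute missing on either side adds nothing
            total += weight * text_similarity(left, right)
    return total


# Assumed example data, not taken from the experiment.
weights = {"affiliation": 0.6, "city": 0.4}
author = {"name": "Petr Saloun", "affiliation": "VSB-Technical University of Ostrava", "city": "Ostrava"}
profile = {"name": "Petr Salon", "affiliation": "VSB-Technical University of Ostrava", "city": "Ostrava"}
score = profile_similarity(author, profile, weights)
```
]]></preformat>
      <p>In this sketch the misspelled profile name still passes the name threshold, so the score is the weighted sum of the attribute similarities; a profile with an unrelated name scores 0 regardless of its other attributes.</p>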
      <p>
        As shown in Algorithm 1, the input is the name of the
researcher. Search requests are then sent to all the digital libraries
and the matching publications are downloaded. Each
publication is then categorized by the defined criteria. First,
we eliminate articles that are similar or identical and
occur in multiple libraries. Then we group the
publications by affiliation, using the text similarity
algorithm, and also by their co-authors. This yields groups
of possible unique authors. A problem arises when an
author publishes alone or is active under multiple
affiliations, because the algorithm then splits him into several
groups. To handle this situation, we added a
comparison with the user's connections retrieved from social networks
and with additional information such as skills and experience.
After that we categorize the keywords by social profile
similarity. We find all the profiles associated with the
researcher's name. Then we look for common connections
and affiliations, and if a pair shares at least one, its members
are assigned together, along with the compared social
profiles. The process is repeated for every co-author found
and for referenced publications, taking the previously
found authors as input, so that the results become more accurate.
People with the same name are not merged into one identity
because of the classification by connections and affiliations: it
is highly unlikely that these people have the same
co-authors, friends and jobs. More information about unique
user identity is given in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
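      <p>The grouping step described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: publications are linked when they share a co-author or an affiliation, affiliations are compared by exact case-insensitive match instead of the text similarity algorithm, and the record fields (title, authors, affiliation) are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def group_publications(publications, target_author):
    """Group publications that share a co-author or an affiliation.

    Each resulting group is one candidate identity of target_author.
    Returns a sorted list of groups, each a list of titles.
    """
    parent = list(range(len(publications)))  # union-find forest

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Index publications by every linking key they carry.
    by_key = defaultdict(list)
    for index, publication in enumerate(publications):
        for name in publication["authors"]:
            if name != target_author:
                by_key[("coauthor", name)].append(index)
        if publication.get("affiliation"):
            by_key[("affiliation", publication["affiliation"].lower())].append(index)

    # Publications sharing a key belong to the same candidate identity.
    for indices in by_key.values():
        for other in indices[1:]:
            union(indices[0], other)

    groups = defaultdict(list)
    for index, publication in enumerate(publications):
        groups[find(index)].append(publication["title"])
    return sorted(groups.values())


# Assumed example records, not data from the experiment.
example = [
    {"title": "A", "authors": ["P. Author", "X. Coauthor"], "affiliation": "Univ One"},
    {"title": "B", "authors": ["P. Author", "X. Coauthor"], "affiliation": "Univ Two"},
    {"title": "C", "authors": ["P. Author"], "affiliation": "univ two"},
    {"title": "D", "authors": ["P. Author"], "affiliation": "Univ Three"},
]
identities = group_publications(example, "P. Author")
```
]]></preformat>
      <p>In this example, publication D has no co-author and a different affiliation, so it ends up in a group of its own. This is the situation in which the text above falls back on connections and other information from social profiles.</p>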
      <p>The research domain is obtained by analysing the keywords
of all the publications found and by extracting additional
information from the social profiles. Because keywords are missing or
incompletely chosen in many publications, we had
to use our original technique to obtain additional keywords from
the abstract. We do not describe this technique in detail
because of the page limit of this poster.</p>
      <preformat>Data: Author's first name and last name
Result: User's identities

firstName, lastName ← {user raw input};
for searcher in DigitalLibrariesSearchers do
    publications ← SearchAuthor(firstName, lastName);
end
GroupByPublication(publications);
GroupByAffiliates(publications);
for searcher in SocialNetworkSearchers do
    publications ← SearchAuthor(firstName, lastName);
end
for group in groups do
    for publication in publications do
        groupKeywords += AnalyzePublication(publication);
    end
end
finalGroups = GroupBySocialProfiles(groups);</preformat>
      <p>Algorithm 1: Finding unique author identity on digital
libraries and social networks</p>
    </sec>
    <sec id="sec-4">
      <title>3. EXPERIMENT</title>
      <p>From the digital libraries we chose IEEE Xplore, the ACM
Digital Library and SpringerLink. In this work, the researchers
are searched for on the LinkedIn and ResearchGate social networks.
In the experiment we check whether we can find the unique
identities and research domains of 180 randomly selected
researchers. The search for user identities in digital libraries
was tested on at least 180 researchers by
downloading and analysing about 3100 publications (Table 1). The
researchers were chosen randomly and included people of
different nationalities.</p>
      <p>Initially, users were grouped only
by co-authors and affiliations. 118 authors were grouped
correctly ("R"), a rate of 65 %. 3 authors were assigned other
authors' publications ("POA", 2 % error rate), because
the searched author had publications with namesake
co-authors who were wrongly evaluated as the same person.
59 authors were not merged correctly
("NA"): several identities were created for what
should have been a single author. This was caused by
publications with no or only one co-author and with different affiliations, for which it
was not possible to find a connection between them. The error
rate of this category was 33 %.</p>
      <p>In the next step we added comparisons of authors based on data
found on their social profiles. 132 users were identified
correctly (73 %). The same 3 authors were again assigned
wrong publications for the same reasons, so that error rate
remained 2 %. The only improvements occurred where
one author had been split into two different groups ("NA") and
connections between those groups were found in social profiles,
so the error rate of this category decreased to 25 %.</p>
      <p>Finally, we added
comparisons by keywords between publications with a single
author and publications with multiple authors. 166 users were
identified correctly, raising the correct rate to 92 %. No
author remained split across several groups ("NA";
the error rate of this category decreased to 0 %). Unfortunately,
14 users were assigned wrong publications ("POA"), so this error
rate increased to 8 %. This was caused by errors in keyword
extraction and the resulting incorrect detection of similarity
between researchers and publications.</p>
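      <p>The reported rates can be recomputed from the counts given in the text, as in the short check below. The stage labels are ours, and the "NA" count of 45 for the second stage is not stated explicitly: it is derived as 180 - 132 - 3, which agrees with the reported 25 %. The percentages in the text are these values given as whole numbers; the unrounded rate for the first stage is 65.6 %, reported as 65 %.</p>
      <preformat preformat-type="code"><![CDATA[
```python
TOTAL = 180  # researchers in the experiment

# Counts per stage: grouped correctly (R), assigned other authors'
# publications (POA), not merged into one identity (NA).
stages = {
    "co-authors and affiliations": {"R": 118, "POA": 3, "NA": 59},
    "plus social profiles": {"R": 132, "POA": 3, "NA": 45},  # NA derived, see lead-in
    "plus keywords": {"R": 166, "POA": 14, "NA": 0},
}


def rates(counts, total=TOTAL):
    """Percentage of the total for each category, to one decimal place."""
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


for stage, counts in stages.items():
    print(stage, rates(counts))
```
]]></preformat>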
    </sec>
    <sec id="sec-5">
      <title>4. CONCLUSION</title>
      <p>The goal of our work was to create an algorithm that estimates
the research area of users by finding their identities in digital
libraries and social networks and by analysing the data found. As the
results of our experiment show, the algorithm for
identifying researchers' identities in digital libraries and social networks
was ultimately successful in 92 % of all attempts. This work
is the first step in research on recommending
publications to authors and finding copyright violations. In future work we
would like to combine the authors' domains
detected from publications and from information on the Internet
with the classic full-text search approach. This work serves as input for
further research on finding suitable reviewers of publications
presented at conferences and on detecting
copyright violations.</p>
    </sec>
    <sec id="sec-6">
      <title>5. ACKNOWLEDGMENT</title>
      <p>The following grant is acknowledged for the financial support
provided for this research: Grant of SGS No. SP2014/42,
VSB-Technical University of Ostrava, Czech Republic.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kostkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barla</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Bielikova</surname>
          </string-name>
          .
          <article-title>Social relationships as a means for identifying an individual in large information spaces</article-title>
          . In M. Bramer, editor,
          <source>Artificial Intelligence in Theory and Practice III</source>
          , volume
          <volume>331</volume>
          of
          <series>IFIP Advances in Information and Communication Technology</series>
          , pages
          <fpage>35</fpage>
          –
          <lpage>44</lpage>
          . Springer Berlin Heidelberg,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Raad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chbeir</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Dipanda</surname>
          </string-name>
          .
          <article-title>User profile matching in social networks</article-title>
          . In
          <source>Network-Based Information Systems (NBiS), 2010 13th International Conference on</source>
          , pages
          <fpage>297</fpage>
          –
          <lpage>304</lpage>
          , Sept.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vosecky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <article-title>User identification across social networks using the web profile and friend network</article-title>
          .
          <source>IJWA</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>23</fpage>
          –
          <lpage>34</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>