<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Use of text mining for understanding Peruvian students and faculties' perceptions on bibliometrics training</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>San Borja</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lima - Perú.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad de San Martín de Porres / Av.</institution>
          <addr-line>Las Calandrias s/n, Santa Anita</addr-line>
          ,
          <institution>Lima - Perú. Joel Alhuay-Quispe Universidad San Ignacio de Loyola / Av. la Fontana 550, La Molina, Lima - Perú. Universidad Nacional Mayor de San Marcos / Ca. Germán Amézaga 375</institution>
          ,
          <addr-line>Lima, Lima -</addr-line>
          <country country="PE">Perú</country>
        </aff>
      </contrib-group>
      <fpage>165</fpage>
      <lpage>169</lpage>
      <abstract>
        <p>Background: Studies on bibliometrics and informetrics training have focused on the opinions of teachers and curricular experts; only a few have examined the perceptions of undergraduate students and practitioners. Objective: To understand how librarianship students and professionals perceive the bibliometrics and informetrics training delivered to them. Methods: For data collection, we used a survey with open-ended questions in order to capture the genuine responses of the participants. After applying the automatic term extraction technique, we coded the answers with a data dictionary to quantify the frequency of occurrences. The software programs used at this stage were terMEXt and LIWC. Data analysis was carried out with mean difference statistics and the correlation coefficient. Results: The output of the statistical analysis let us understand how students and practitioners perceive the bibliometrics and informetrics training delivered to them. Conclusion: Text mining techniques facilitate the processing of responses to open-ended questions and contribute a quantitative approach to analyzing people's opinions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Bibliometrics training within librarian academic community</p>
      <p>Consultants, analysts, and research managers are increasingly using approaches and techniques based on bibliometrics and informetrics. Because bibliometrics has been adopted so widely, these actors often lack the training or skills required to use those tools properly. Seminal studies on informetrics training date back to the mid-1970s. Since then, analyses have focused on the opinions of Library and Information Science (LIS) faculty and experts in curricular design and, in general, their findings showed no variation when bibliometrics training in Latin American countries was discussed.</p>
      <p>
        At the beginning of the 1980s,
        <xref ref-type="bibr" rid="ref11">Schader (1981)</xref>
        reported on a growing interest in bibliometrics teaching in higher education, particularly in the medical sciences, because this analytical approach would help improve the curricular contents of information management by considering two dimensions for teaching it: one theoretical and the other applied.
      </p>
      <p>
        In the LIS field,
        <xref ref-type="bibr" rid="ref1">Aiyepeku (1975)</xref>
        , one of the pioneers who studied the approach used for bibliometrics training, examined the inclusion of bibliometrics teaching in the curricula of librarianship schools. Later,
        <xref ref-type="bibr" rid="ref4">Dou, Quoniam, &amp; Hassanaly (1988)</xref>
        proposed teaching bibliometric analysis starting from bibliographical references previously downloaded from indexing databases. To reinforce their argument, they introduced examples based on command-line instructions of the MS-DOS operating system. In the mid-1990s,
        <xref ref-type="bibr" rid="ref13">Ungern-Sternberg (1995)</xref>
        stated that bibliometric methods could be taught through face-to-face seminars and the use of online systems; the author also proposed a curricular method and a content schedule for the design of a course on bibliometrics
        <xref ref-type="bibr" rid="ref14">(Ungern-Sternberg, 1998)</xref>
        .
      </p>
      <p>
        In brief, the first studies on bibliometrics training described proposals for developed countries: the United States
        <xref ref-type="bibr" rid="ref11">(Schader, 1981)</xref>
        , France
        <xref ref-type="bibr" rid="ref4">(Dou et al., 1988)</xref>
        , and Finland
        <xref ref-type="bibr" rid="ref14">(Ungern-Sternberg, 1998)</xref>
        ; nevertheless, experiences of teaching bibliometric methods in LIS schools in Hispanic countries are recent. In Spain, studies found that statistical topics in LIS were taught only in academic programs leading to the professional license (4-6 years), and were not offered in 3-year diploma programs
        <xref ref-type="bibr" rid="ref7">(Jiménez-Contreras &amp; Pulgarín-Guerrero, 1998)</xref>
        .
      </p>
      <p>As observed, the reflections of the cited authors highlight the scheduling of curricular contents or the acquisition of specific skills (e.g., management of bibliographic references for later analysis with statistical tools). However, those studies did not examine the opinions of the beneficiaries of the educational service at the higher education level: undergraduate students and practitioners, the latter because they must select and train students in the final stage of their professional training.</p>
    </sec>
    <sec id="sec-2">
      <title>Research purpose</title>
      <p>This is an exploratory study (it does not include hypotheses) that analyzes the opinions on bibliometrics and informetrics training held by students and practitioners who graduated from Peruvian LIS schools. For that purpose, the researchers worked with text mining-based techniques and tools.</p>
      <p>Methods</p>
    </sec>
    <sec id="sec-3">
      <title>Sample</title>
      <p>We worked with a nonprobabilistic sample and contacted participants by attending academic events, joining mailing lists, or sending invitations by email. We considered the two Peruvian LIS schools: Pontificia Universidad Católica del Perú (PUCP) and Universidad Nacional Mayor de San Marcos (UNMSM).</p>
    </sec>
    <sec id="sec-4">
      <title>Text mining for analyzing opinions</title>
      <p>Text mining as a method for quantitative analysis, applied after standardizing the responses to open-ended questions, has been used in academic and business contexts to understand participants' views and attitudes.</p>
      <p>
        Within the academic landscape, few researchers have worked with students and practitioners as study subjects.
        <xref ref-type="bibr" rid="ref12">Sliusarenko et al. (2013)</xref>
        used text mining to examine written responses to open-ended items in evaluations. With this approach, the authors were able to understand how the responses related to the scores obtained in the evaluations.
        <xref ref-type="bibr" rid="ref6">Freak &amp; Miller (2015)</xref>
        employed it to identify thematic groups in replies about the perceptions of teachers specialized in physical education. Outside the academic boundaries,
        <xref ref-type="bibr" rid="ref15">Yi et al. (2015)</xref>
        analyzed the public perception of a Chinese tourism brand by applying text mining approaches to questionnaire responses. From another perspective, in an effort to automate the analysis of opinions,
        <xref ref-type="bibr" rid="ref8">Kumar &amp; Jain (2015)</xref>
        proposed an automatic evaluation system that uses text mining to analyze views collected in questionnaires used to measure the professional development of lecturers at a higher education institution.
      </p>
      <p>As described, there is evidence that text mining tools facilitate the processing of open-ended questions and make possible a quantitative approach to analyzing people's opinions.</p>
    </sec>
    <sec id="sec-5">
      <title>Data collection</title>
      <p>We designed a questionnaire (see Annex) that had two sections: background data and three open-ended questions exploring librarians' opinions in three primary areas of bibliometrics and informetrics training: shared meanings, contribution to the career, and required conditions for its teaching.</p>
      <list list-type="order">
        <list-item><p>How do you define bibliometrics and informetrics?</p></list-item>
        <list-item><p>What role can bibliometrics and informetrics play within the profession?</p></list-item>
        <list-item><p>What knowledge and skills are required to deal with bibliometric approaches and techniques?</p></list-item>
      </list>
      <p>The researchers administered a print-based version and an online version (Google Forms). Before giving their answers, participants signed an informed consent form. Filling out the questionnaire took 15 minutes.</p>
    </sec>
    <sec id="sec-6">
      <title>Text processing and coding</title>
      <p>
        We transcribed the responses into plain text files. Before creating the data dictionaries for the quantitative content analysis, we used a text mining technique known as automatic term extraction, for which we worked with the software program terMEXt. This application, developed by
        <xref ref-type="bibr" rid="ref3">Barrón Cedeño (2008)</xref>
        , is based on the online service TerMine, created by
        <xref ref-type="bibr" rid="ref5">Frantzi, Ananiadou, &amp; Mima (2000)</xref>
        from the National Centre for Text Mining at the University of Manchester.
      </p>
      <p>Given that the responses to the first question were redundant with the other two, this item was omitted from text processing. We parsed the answers to the second and third questions using the default dictionary of terMEXt, and computed the NC-value to find the multi-word terms with the highest semantic value. Algorithms for automatic term extraction use the NC-value to identify multi-word terms whose meaning sets them apart from other candidate terms. Based on these terms, we built dictionaries for the two questions considered for later analysis.</p>
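      <p>To illustrate the kind of score involved, the following Python sketch computes the C-value of Frantzi et al. (2000), the frequency-and-length component on which the NC-value builds before context information is added. It is a minimal illustration under stated assumptions, not the terMEXt implementation: the function names and the example frequencies are hypothetical, candidate terms are assumed to be already extracted and counted, and the context weighting of the NC-value is left out.</p>
      <preformat><![CDATA[
```python
import math


def contains(longer, shorter):
    """Return True if the word list `shorter` occurs contiguously in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(candidate_freqs):
    """Score candidate terms with the C-value (Frantzi et al., 2000).

    candidate_freqs maps each candidate term (a string of words) to its
    frequency in the corpus. Single-word candidates score 0 because
    log2(1) = 0; the measure targets multi-word terms.
    """
    scores = {}
    for term, freq in candidate_freqs.items():
        words = term.split()
        # Frequencies of the longer candidates that contain this term.
        nesting = [
            other_freq
            for other, other_freq in candidate_freqs.items()
            if other != term and contains(other.split(), words)
        ]
        if nesting:
            # Discount occurrences explained by longer terms.
            adjusted = freq - sum(nesting) / len(nesting)
        else:
            adjusted = freq
        scores[term] = math.log2(len(words)) * adjusted
    return scores


# Hypothetical frequencies, for illustration only (not data from the study).
example_freqs = {
    "information search": 6,
    "information search skills": 2,
    "data analysis": 4,
}

for term, score in sorted(c_value(example_freqs).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.2f}")
```
]]></preformat>
      <p>In this sketch, a candidate that appears mostly inside longer candidates is penalized, while longer candidates receive a logarithmic length bonus; terms with the highest scores would be the natural entries for a data dictionary.</p>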
      <p>After creating both dictionaries, we cleaned the text files in order to process them with the program Linguistic Inquiry and Word Count (LIWC), which counts the number of occurrences, stems, and conditional structures associated with each dictionary category. This procedure allowed us to compute the score used in the correlation analysis for all the variables. The data obtained with LIWC were entered into the Statistical Package for the Social Sciences (SPSS) for later analysis.</p>
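      <p>The dictionary-based counting step can be sketched in Python as follows. This is a simplified stand-in for the word-counting program, not its actual implementation: the category names (evaluator, promoter) come from the results of this study, but the dictionary entries, the sample sentence, the function names, and the convention that a trailing asterisk marks a stem are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import re


def matches(token, entry):
    """Match a token against a dictionary entry; 'promot*' matches any
    token that starts with 'promot' (assumed stem convention)."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def count_categories(text, dictionary):
    """Count, per category, how many tokens of `text` match its entries."""
    tokens = re.findall(r"[a-záéíóúñü]+", text.lower())
    counts = {category: 0 for category in dictionary}
    for token in tokens:
        for category, entries in dictionary.items():
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    return counts


# Hypothetical dictionary and response, for illustration only.
example_dictionary = {
    "evaluator": ["evaluat*", "measure"],
    "promoter": ["promot*", "disseminat*"],
}
example_response = (
    "Bibliometrics helps to evaluate journals and promotes the discipline; "
    "evaluation matters."
)

print(count_categories(example_response, example_dictionary))
```
]]></preformat>
      <p>Applied to each participant's answer, counts of this kind give one numeric score per category, which is the form of data that a correlation analysis requires.</p>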
      <sec id="sec-6-1">
        <title>Data analysis</title>
        <p>To describe the results, we used measures of central tendency and, for dispersion, only the standard deviation. To examine whether variables were associated, we computed a correlation matrix based on Pearson's product-moment coefficient or, if the variables had a high dispersion, Spearman's rho.</p>
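        <p>The two coefficients can be sketched in plain Python as follows. This is an illustrative sketch rather than the SPSS procedure used in the study: the function names and sample values are hypothetical, and no rule for deciding when dispersion is "high" is encoded, because the text does not state one. Spearman's rho is computed here as Pearson's coefficient applied to average ranks.</p>
        <preformat><![CDATA[
```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's coefficient computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical category scores for five respondents (not study data).
scores_a = [1, 2, 3, 4, 5]
scores_b = [1, 4, 9, 16, 25]

print("Pearson:", round(pearson(scores_a, scores_b), 3))
print("Spearman:", round(spearman(scores_a, scores_b), 3))
```
]]></preformat>
        <p>Because Spearman's rho depends only on the ordering of the values, it is less affected by widely dispersed scores than Pearson's coefficient, which is consistent with preferring it for variables with large standard deviations.</p>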
        <sec id="sec-6-1-1">
          <title>Results</title>
          <p>Most of the variables showed a high data dispersion (Table 1). On average, participants were 30 years old and their education lasted five years (the difference between the year they entered university [academic age] and the year they finished their undergraduate studies [professional age] at their center of studies); nevertheless, both variables showed the highest dispersion of all analyzed variables. Word counts for the categories of questions two and three (the variables detailed in the data dictionary, as described in the Methods section) also exhibited a high dispersion. The large standard deviations of the variables were taken into account when carrying out the correlation analysis.</p>
          <p>Regarding the quantitative content analysis, in question two the categories evaluator and promoter showed a small correlation (ρ = 0.364, p = 0.004). We observed a similar pattern for question three: the categories theoretical and analytical skills also exhibited a low correlation (ρ = 0.388, p = 0.002), while the categories analytical skills and information search expertise showed a moderate one (ρ = 0.432, p = 0.001). Table 2 details the correlation matrix for the variables considered in the study.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>Discussion and conclusions</title>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>Text mining and bibliometrics training</title>
        <p>Question two of the questionnaire asked about the
role of bibliometrics and informetrics in the
development of LIS as a discipline. Text mining
techniques led us to identify two categories that helped
us to understand this function: one dimension
oriented to evaluation and the other one to promotion,
which also showed a moderate correlation.</p>
        <p>Indeed, the application of bibliometric tools contributes to disseminating and promoting one of the few contributions originating from LIS: metric studies of information (informetrics). It is true that, because of its applied nature, librarianship receives influences from several disciplines: psychology, sociology, history, and, in recent years, computer science, particularly the specialized field known as human-computer interaction, given the current significance of the development of search interfaces and the implementation of institutional repositories. However, those contributions come from outside the LIS profession. For that reason, it is revealing that practitioners highlight the promoter role of bibliometrics as a driver for renewal within this discipline. This promotion-oriented dimension complements the other role, evaluator, in the sense that bibliometrics becomes a sort of thermometer to measure the internal development of the discipline. In this way, it will be possible to determine whether the library profession is ready to face the challenges set out by the academic community; otherwise, the demand for professionals who are experts in evaluating scientific production will be met by practitioners educated in other careers.</p>
        <p>On the other hand, question three focused on the knowledge and skills required to deal appropriately with bibliometric approaches and techniques. On this issue, text mining let us identify three primary areas for guiding the necessary professional training: knowledge and concepts, analytical skills, and information search expertise. The correlation between the last two was higher than that between the first two; nevertheless, all of them were statistically significant. Those relationships show that the most valued skill of librarians (information search) is associated with the ability to analyze bibliometric studies and indicators, but this link is weaker when it comes to the acquisition of the theoretical foundations of bibliometrics.</p>
        <p>We can verify this fact by looking at the popularity gained by bibliometric indicators, at least the first-generation ones, within the Peruvian LIS community. However, this widespread adoption has not translated into studies or investigations featuring sound theoretical foundations; until now, Peruvian librarians have preferred the applied dimension over the integration of theoretical and applied perspectives, which would lead to bibliometric studies published in specialized journals.</p>
        <p>Challenges for bibliometrics training</p>
        <p>
          Bibliometrics education, as an element of a specialized training program or as part of a curricular plan, needs to be oriented toward a target population with a previous background in issues related to information management or scientific publications. A practical way to carry out an educational program in bibliometrics would be to target managers of information services at university libraries, with courses necessarily taught by recognized experts
          <xref ref-type="bibr" rid="ref9">(Laitinen, 2015)</xref>
          . Also, the methodological proposal for bibliometrics education must emphasize the use of open access software programs
          <xref ref-type="bibr" rid="ref10">(Sanz-Casado et al., 2002)</xref>
          , because of their widespread adoption in Latin American countries and the cost-effectiveness of using these tools, compared with the benefits obtained by working with licensed programs and platforms.
        </p>
        <p>Annex</p>
        <p>Informed consent protocol and survey form [In Spanish]. Available at http://dx.doi.org/10.6084/m9.figshare.3817155</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Aiyepeku</surname>
            ,
            <given-names>W. O.</given-names>
          </string-name>
          <year>1975</year>
          .
          <article-title>Bibliometrics in Information Science curricula</article-title>
          .
          <source>Information Scientist</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>29</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Barrón Cedeño</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Manual para el extractor de término terMEXt</article-title>
          . Retrieved from http://goo.gl/jQjlk2
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Dou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quoniam</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hassanaly</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>1988</year>
          .
          <article-title>Teaching bibliometric analysis and MS/DOS commands</article-title>
          .
          <source>Education for Information</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>411</fpage>
          -
          <lpage>423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Frantzi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mima</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2000</year>
          .
          <article-title>Automatic recognition of multi-word terms</article-title>
          .
          <source>International Journal of Digital Libraries</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>117</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Freak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Magnifying pre-service generalist teachers' perceptions of preparedness to teach primary school physical education</article-title>
          .
          <source>Physical Education and Sport Pedagogy</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Jiménez-Contreras</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pulgarín-Guerrero</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>1998</year>
          .
          <article-title>Bibliometrics-Informetrics and other quantitative subjects in Library and Information Science curricula in Spain</article-title>
          .
          <source>Education for Information</source>
          ,
          <volume>16</volume>
          (
          <issue>4</issue>
          ),
          <fpage>341</fpage>
          -
          <lpage>355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Sentiment analysis and Feedback Evaluation</article-title>
          . In
          <source>MOOCs, Innovation and Technology in Education (MITE), 2015 IEEE 3rd International Conference</source>
          (pp.
          <fpage>433</fpage>
          -
          <lpage>436</lpage>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Laitinen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>The Benefits of Learning Bibliometrics on the Teaching Librarians' Information Literacy</article-title>
          .
          In
          <source>The Third European Conference on Information Literacy (ECIL)</source>
          (p.
          <fpage>159</fpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Sanz-Casado</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suarez-Balseiro</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Zorita</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martín-Moreno</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lascurain-Sánchez</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Metric studies of information: An Approach towards a Practical Teaching Method</article-title>
          .
          <source>Education for Information</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <fpage>133</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Schader</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <year>1981</year>
          .
          <article-title>Teaching bibliometrics</article-title>
          .
          <source>Library Trends</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <fpage>151</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Sliusarenko</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clemmensen</surname>
            ,
            <given-names>L. K. H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ersbøll</surname>
            ,
            <given-names>B. K.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Text mining in students' course evaluations: Relationships between open-ended comments and quantitative scores</article-title>
          .
          In
          <source>CSEDU 2013 - Proceedings of the 5th International Conference on Computer Supported Education</source>
          . SciTePress.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>von Ungern-Sternberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Applications in teaching bibliometrics</article-title>
          .
          In
          <source>61st IFLA General Conference - Conference Proceedings</source>
          . Retrieved from http://hdl.handle.net/10150/106138
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>von Ungern-Sternberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>1998</year>
          .
          <article-title>Teaching Bibliometrics</article-title>
          .
          <source>Journal of Education for Library and Information Science</source>
          ,
          <volume>39</volume>
          (
          <issue>1</issue>
          ),
          <fpage>76</fpage>
          -
          <lpage>80</lpage>
          . http://doi.org/10.2307/40324182
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Yi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          and
          <string-name>
            <surname>Hong Mei</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Research on audience's perception of tourism brand of Guiyang based on the text mining of ROST</article-title>
          .
          <source>Journal of Chongqing Normal University</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ),
          <fpage>126</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>