<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Kibana and ElasticSearch for the Recommendation of Job Offers to Students</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Robles-Gómez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Salvador Ros</string-name>
          <email>sros@scc.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Martínez-Gámez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Hernández</string-name>
          <email>roberto@scc.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Llanos Tobarra</string-name>
          <email>llanos@scc.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Pastor</string-name>
          <email>rpastor@scc.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agustín C. Caminero</string-name>
          <email>accaminero@scc.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesús Cano</string-name>
          <email>jcano@scc.uned.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Control and Communications System Department, ETSI Informática, Universidad Nacional de Educación a Distancia (UNED)</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Student, ETSI Informática, Universidad Nacional de Educación a Distancia (UNED)</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper proposes the use of Kibana and ElasticSearch for the efficient recommendation of job offers to students, so that they can improve their professional careers. Both tools are integrated in a self-developed web system at UNED. With Kibana, users get a graphical representation of the job situation over a time period, which can show them which knowledge companies demand most. The system also lets users search by keyword within the stored job offers. This is intended to give users a picture of the job market, so that they can focus their training on the professional profiles most demanded by employers or identify hiring trends. To store the collected information, ElasticSearch has been deployed, since it supports agile queries over unstructured information such as job descriptions.</p>
      </abstract>
      <kwd-group>
        <kwd>Academic Analytics (AA)</kwd>
        <kwd>Kibana/ElasticSearch</kwd>
        <kwd>Visualization</kwd>
        <kwd>Education</kwd>
        <kwd>Professional Career</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, the production of digital information has grown in the context of
employment. One of the key features of this growth is the existence of multiple massive
sources of information, including social networks (such as Infojobs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or Monster
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) and Internet services, among others. The use of social networking in education as
a motivating tool, and as a means to determine the impact of education on labor market
integration, is not new. For instance, LinkedIn is employed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A key issue
associated with the volume of data is its semantic and structural heterogeneity
(structured, semi-structured, and so on). This adds complexity to the
extraction, processing, and visualization of data, since classical methods cannot be
employed.
      </p>
      <p>This variety of heterogeneous information gives rise to the Big Data
paradigm, which has great potential to revolutionize different dimensions of society.
However, there are still research areas to explore in order to obtain a higher throughput. In the
field of education, the Learning Analytics (LA) area covers this paradigm.</p>
      <p>Copyright © 2017 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.</p>
      <p>Our current work focuses on analyzing students’ learning in order to improve it or to
make recommendations, rather than on the learning process itself. More
specifically, we focus on the field of Academic Analytics (AA) at an
institutional level, in order to help our University analyze students’ learning profiles and
academic performance. It is very important for an institution or university to let
its students check the job offers available for their level of
knowledge. Administrative staff could also carry out this task in order to analyze
the requirements of companies. This can be a starting point for the institution to
perform additional studies aimed at improving the employability of students.</p>
      <p>
        Therefore, this paper proposes a self-developed web system at UNED based on
Kibana and ElasticSearch. It obtains information from several sources of job
offers, such as Infojobs and Monster [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and provides educational counseling for the
successful employment of students. This is a relevant societal challenge, since it
increases students’ chances of graduation and professional development.
      </p>
      <p>
        Kibana [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] allows us to graphically represent a selection of relevant indicators for job
offers. Users of the system can play an active role, since they get a graphical
representation of the job situation with relevant information about the job offers, and can also
search for information of interest by using keywords. This is intended to provide
users with a picture of the job market, so that they can adapt their professional career to the profiles
most demanded by employers or find hiring trends, niche specializations, etc.
      </p>
      <p>
        On the back-end side, job offers are gathered by using web-scraping
techniques and/or APIs, depending on the availability of information in the
sources [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. To manage unstructured or semi-structured information,
ElasticSearch [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is broadly employed in the context of the Big Data paradigm, since it supports
agile queries over content such as job descriptions. At the LASI 2016 conference, a workshop
about Kibana and ElasticSearch was presented. More information can be found in
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>The structure of this paper is as follows: Section II describes the data extraction and
processing procedures. After that, Section III focuses on the data visualization phase
for the recommendation of job offers to students. Finally, Section IV discusses our
conclusions and suggests guidelines for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Data Extraction and Processing</title>
      <p>
        The information hosted on the web is normally unstructured, or at most
semi-structured when access mechanisms have been provided to developers.
Some websites offer APIs (Application Programming Interfaces). These APIs
operate as a bridge between the users’ browser and the REST services, and they return
JSON responses to invocations of the source, using field selectors to choose the returned data. For
instance, the LinkedIn API has many restrictions which limit its use. For this reason,
this data collection method is not the most appropriate in some cases, as stated in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Depending on the user’s permissions, several professional on-line networks allow
getting information such as the current job position, job requirements, summary
description, and so on. This has been the method employed to obtain job offers from the
Infojobs professional network for the current work. In the case of the Monster
professional network, a set of scraping techniques has been employed [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Currently, our
system supports both professional social networks. It is intended to be extended with
additional social networks in the near future, such as LinkedIn, Twitter, and so on.
The state of the art of data extraction and processing can be found
in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
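      <p>As a minimal sketch of the scraping approach, and not the extraction code used in the system, the following Python fragment parses a hypothetical listing page with the standard html.parser module. The markup, the class names, and the field names are assumptions made for illustration; each real source needs its own selectors and must be accessed according to its terms of use.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from html.parser import HTMLParser

# Hypothetical listing markup; real job portals use their own structure.
SAMPLE_HTML = """
<ul>
  <li class="offer"><a href="/offer/1">Java Developer</a><span class="city">Madrid</span></li>
  <li class="offer"><a href="/offer/2">Data Analyst</a><span class="city">Barcelona</span></li>
</ul>
"""


class OfferParser(HTMLParser):
    """Collects one dictionary per list item marked with class "offer"."""

    def __init__(self):
        super().__init__()
        self.offers = []
        self._field = None  # name of the field whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "offer":
            # A new offer starts: create an empty record for it.
            self.offers.append({"title": "", "city": "", "url": ""})
        elif self.offers and tag == "a":
            self.offers[-1]["url"] = attrs.get("href", "")
            self._field = "title"
        elif self.offers and tag == "span" and attrs.get("class") == "city":
            self._field = "city"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        # Text outside a tracked element (e.g. indentation) is ignored.
        if self._field and self.offers:
            self.offers[-1][self._field] += data.strip()


def scrape_offers(html):
    """Return the offers found in an HTML listing page."""
    parser = OfferParser()
    parser.feed(html)
    return parser.offers
```
]]></preformat>
      <p>Calling scrape_offers(SAMPLE_HTML) yields one record per listed offer, which can then be passed on to the cleaning step.</p>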
      <p>In our web system, the administrator can manage the extraction and processing
of the different sources of information, for instance by starting or stopping them, or by modifying
parameters such as the time between captures. Job offers are stored in
ElasticSearch to allow more advanced management.</p>
      <p>
        Once data is retrieved from its source, our self-developed web system carefully
preprocesses it to guarantee the quality of the data collection and its later
aggregation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Each job offer extracted from a source of
information is cleaned and adapted to a new ontology defined in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for
educational counselling based on professional indicators. Depending on the professional
social network, the gathered information can have a different level of quality. We
filter all data to keep in our dataset only the information common to all sources,
so that every job offer has the same indicators.
UNESCO’s ISCED 2011 classification of education levels [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a revision
of the ISCED 1997 classification. These levels of education are
employed for matching job offers to students’ preferences.
      </p>
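      <p>The cleaning step described above can be illustrated with a short Python sketch. It is not the ontology of [9]: the common field names, the per-source field maps, the qualification labels, and the matching rule are all hypothetical. Only the level numbers (3, 6, 7, 8) follow ISCED 2011.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# ISCED 2011 level numbers for a few qualification labels.
# The labels are illustrative; the numbers follow ISCED 2011.
ISCED_2011 = {
    "bachillerato": 3,  # upper secondary education
    "grado": 6,         # Bachelor's or equivalent
    "master": 7,        # Master's or equivalent
    "doctorado": 8,     # Doctoral or equivalent
}

# Hypothetical common indicators kept for every offer.
COMMON_FIELDS = ("source", "title", "province", "category", "isced_level")

# Hypothetical mapping: common field name -> field name used by the source.
MONSTER_MAP = {
    "title": "jobTitle",
    "province": "location",
    "category": "sector",
    "study": "education",
}


def normalize(raw, source, field_map):
    """Map a source-specific record onto the common indicators."""
    offer = {"source": source}
    for common_name, source_name in field_map.items():
        value = raw.get(source_name)
        offer[common_name] = value.strip() if isinstance(value, str) else value
    # Translate the free-text qualification into an ISCED level (None if unknown).
    study = (offer.pop("study", None) or "").lower()
    offer["isced_level"] = ISCED_2011.get(study)
    # Drop everything that is not a common indicator.
    return {name: offer.get(name) for name in COMMON_FIELDS}


def matches_level(offer, student_level):
    """Assumed rule: keep offers asking for at most the student's level."""
    required = offer["isced_level"]
    return required is None or required <= student_level
```
]]></preformat>
      <p>Source-specific extras, such as a salary field present in only one network, are discarded by normalize, so every stored offer exposes the same indicators.</p>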
      <p>In addition, we use a relational database to save the data provided by web
crawling tools in a structured way. We also considered it appropriate to store the
data in a powerful search server, in order to experiment flexibly with the
functionality and performance that this type of product offers when storing and
searching text. Within this category we can mention Apache Solr,
ElasticSearch, or MongoDB. In these kinds of databases, the information can be stored in
the form of JSON documents, which are indexed so that text searches can be
much faster than with relational databases. They are open-source and can be used from
the most common programming languages through REST services. They also offer
additional tools that add value and complete the initial product, such as data
visualization and analysis tools. After studying all of them, we found ElasticSearch
to be the most powerful and agile option for our purposes. ElasticSearch is based on Lucene, so it can be
seen as a powerful search engine for text, which can be deployed easily in a
distributed way. Additionally, it can handle RESTful services in an efficient way.</p>
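      <p>To illustrate how job offers can be stored as JSON documents through the REST interface, the following Python sketch builds a request body for the ElasticSearch bulk endpoint (_bulk), which expects newline-delimited JSON: one action line followed by one document line per offer. The index name job_offers and the id field are assumptions; older ElasticSearch releases also expect a document type, which is omitted here. The sketch only builds the body and does not send it.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json


def bulk_index_payload(offers, index="job_offers"):
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for offer in offers:
        # Action line: index the document under its own identifier.
        lines.append(json.dumps({"index": {"_index": index, "_id": offer["id"]}}))
        # Source line: the job offer itself, stored as a JSON document.
        lines.append(json.dumps(offer, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```
]]></preformat>
      <p>The resulting string would be sent with an HTTP POST to the _bulk endpoint using the content type application/x-ndjson.</p>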
    </sec>
    <sec id="sec-3">
      <title>Data Visualization</title>
      <p>
        The final decision to use ElasticSearch rather than any competitor was also
due to the need to implement a system offering different graphs that allow users
to interpret the labor market. This is where Kibana [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], another product of the
company Elastic, comes in. Kibana eases the analysis and visualization of data thanks to its
intuitive interface, which also makes it possible to share or embed dashboards. Fig. 1
shows an example of job offers stored in ElasticSearch through the Discover tab of
Kibana.
      </p>
      <p>Kibana only queries data held in ElasticSearch, which acts as both the data store
and the analytics engine. Fig. 2 shows a visualization panel, a Dashboard, which has
been fully integrated from Kibana as a frame in our self-developed web system. This
panel is totally flexible and adaptable from the Kibana interface, as shown in Fig. 3.
When any of the graphics of the panel in use is modified in Kibana, the change is
automatically reflected in the associated frame integrated in our web system.</p>
      <p>As an example, Fig. 2 and Fig. 3 show a set of analytics for job offers collected
from the different professional networks that our expert system supports. In both
figures, we can observe the ten provinces with the most job offers. In this case, Madrid and
Barcelona lead the ranking. The second graphic shows the number
of job offers per category and subcategory. The “commercial_y_ventas” category is
the one with the most job offers, and its most popular subcategory is
“comercial”. In Fig. 3, we can observe additional graphics and other relevant information,
such as the number of job offers stored in the system and the date of the last offer
captured by the system.</p>
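      <p>A ranking such as the ten provinces with the most job offers corresponds to a terms aggregation in ElasticSearch, which is the kind of request Kibana issues for such a chart. The Python sketch below builds that request body and, for checking purposes, computes the same ranking in memory. The field name province is an assumption, and it would need to be indexed as a non-analyzed (keyword) field.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def top_provinces_query(size=10):
    """Request body for a terms aggregation over the province field."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "aggs": {
            "top_provinces": {
                "terms": {"field": "province", "size": size}
            }
        },
    }


def top_provinces_local(offers, size=10):
    """The same ranking computed in memory, useful to check results."""
    counts = Counter(o["province"] for o in offers if o.get("province"))
    return counts.most_common(size)
```
]]></preformat>
      <p>The category and subcategory charts would follow the same pattern, with a different field in the terms clause.</p>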
      <p>Finally, users can perform free-text searches from our application by employing the powerful
ElasticSearch engine, in order to look for offers that are relevant
from their point of view. Fig. 4 shows an example. In this case, the
user has performed a search using the word “Informática”, and our system has
searched and presented all the related job offers stored in the ElasticSearch database.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Learning does not take place in an isolated context, but occurs in a social, cultural and
working atmosphere. For this reason, the analysis
and aggregation of external sources of information from current professional
networks is very relevant for our students. This will help students to guide their education and, therefore, their
professional career. For this purpose, several professional sources of information, such as
Infojobs and Monster, are available on the Internet. As a result, a large amount of
data is currently being obtained from these websites in the field of Big Data and, more
specifically, in the topic of Academic Analytics (AA), as stated above.</p>
      <p>
        This work proposes the use of Kibana and ElasticSearch for the recommendation
of job offers to students to improve their professional career. Both
tools are integrated in a self-developed web system. By using the Kibana
interface, users can get a graphical representation of the job situation during a period
of time, which can show them which knowledge companies demand most,
and can search for relevant keywords that fit their needs or preferences. This is
intended to provide users with a picture of the job market that allows them to focus their
training on the professional profiles most demanded by employers. To store the
collected information, ElasticSearch has been deployed for its ability to perform agile
queries over unstructured information, like job descriptions. Both Kibana and
ElasticSearch are fully connected and integrated with our web system, which is used for
linking job offers and users’ preferences. The recommendation step matches the ontology
proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] with the common indicators selected for job offers. In addition,
users can perform searches with some criteria, such as location, category, and so on.
      </p>
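      <p>A keyword search restricted by criteria such as location and category can be expressed as a single ElasticSearch bool query, with the free text in a must clause and the criteria as filters. The following Python sketch builds such a request body. The field names (title, description, province, category) are hypothetical and do not come from the ontology of [9]; the term filters assume non-analyzed (keyword) fields.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def search_query(text, province=None, category=None, size=20):
    """Build a bool query: free text is scored, criteria only filter."""
    must = [{"multi_match": {"query": text, "fields": ["title", "description"]}}]
    filters = []
    if province:
        filters.append({"term": {"province": province}})
    if category:
        filters.append({"term": {"category": category}})
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}
```
]]></preformat>
      <p>For instance, search_query("Informática", province="Madrid") asks for offers mentioning the keyword and restricts the results to a single province without affecting their relevance score.</p>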
      <p>The authors would like to acknowledge the support of the European research project
ERC-2015-STG-679528 POSTDATA, the local project (2014I/PPRO/031) from
UNED and Banco Santander, and the Region of Madrid for the support of the E-Madrid
Network of Excellence (S2013-ICE2715). The authors also acknowledge the support
of SNOLA, a Thematic Network of Excellence
(TIN2015-71669REDT) officially recognized by the Spanish Ministry of Economy and Competitiveness.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Case</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardiner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A LinkedIn Analysis of Career Paths of Information Systems Alumni</article-title>
          .
          <source>Journal of the Southern Association for Information Systems</source>
          , vol.
          <volume>1</volume>
          , no.
          <issue>1</issue>
          , Winter
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nespereira</surname>
            ,
            <given-names>C. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilas</surname>
            ,
            <given-names>A. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Redondo</surname>
            ,
            <given-names>R. P. D.</given-names>
          </string-name>
          :
          <article-title>Scraping and Clustering Techniques for the Characterization of LinkedIn Profiles</article-title>
          .
          <source>In Proc. 4th International Conference on Information Technology Convergence &amp; Services</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ,
          January
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. ElasticSearch. https://www.elastic.co/ (accessed 2017-06-07)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Infojobs. https://www.infojobs.net/ (accessed 2017-06-07)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <article-title>Institute for Statistics of the United Nations Educational, Scientific and Cultural Organization (UNESCO): "International Standard Classification of Education: ISCED 2011"</article-title>
          . http://www.uis.unesco.org/Education/Documents/isced-2011-en.pdf (accessed 2017-06-07)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jacikevicius</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Introduction to Data Quality</article-title>
          , February
          <year>2016</year>
          . http://www.datasciencecentral.com/profiles/blogs/introduction-to-data-quality (accessed 2017-06-07)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Kibana. https://www.elastic.co/products/kibana (accessed 2017-06-07)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Monster. http://www.monster.es/ (accessed 2017-06-07)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Robles-Gómez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ros</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez-Gámez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernández</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caminero</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tobarra</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pastor</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cano</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Defining a Novel Ontology for Educational Counselling based on Professional Indicators</article-title>
          .
          <source>In Proc. Workshop on Applied and Practical Learning Analytics (WAPLA)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          , September,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Vozniuk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodríguez-Triana</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gillet</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <source>CLEO - Workshop on Contextual Learning Analytics Enforcing data Ownership</source>
          .
          <article-title>Building Interactive Learning Dashboards with Logstash, Elasticsearch and Kibana (ELK)</article-title>
          .
          <source>In Learning Analytics Summer Institute 2016 (LASI 2016)</source>
          , Bilbao, Spain, June
          <year>2016</year>
          . https://infoscience.epfl.ch/record/220334 (accessed 2017-06-07)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>