<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Information Management and Big Data: SIMBig overview</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Antonio Lossio-Ventura</string-name>
          <email>jlossioventura@ufl.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hugo Alatrista-Salas</string-name>
          <email>h.alatristas@up.edu.pe</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Health Outcomes &amp; Policy, University of Florida</institution>
          ,
          <addr-line>Florida</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad del Pacífico, Pontificia Universidad Católica del Perú</institution>
          ,
          <addr-line>Lima</addr-line>
          ,
          <country country="PE">Peru</country>
        </aff>
      </contrib-group>
      <fpage>11</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>Big Data and Data Science are popular terms used to describe, respectively, the exponential growth of data and the analysis of that data. The aim of the symposium is to present methods for extracting knowledge from large volumes of data through data science and artificial intelligence techniques. It brings together leading national and international actors in the decision-making field to discuss new technologies dedicated to handling large amounts of information.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Big Data is a popular term used to describe the
exponential growth and availability of data, which
may be structured or unstructured. Data
Science is a field that seeks to extract knowledge or
insights from large volumes of heterogeneous data
(e.g. video, audio, text, image). Data Science
continues and builds on fields such as data
analysis, statistics, machine learning and data mining,
and is similar to Knowledge Discovery in Databases (KDD).</p>
      <p>
        Big Data has emerged over the last 20 years.
For instance, social networks such as Facebook,
Twitter and LinkedIn generate masses of data,
which are available to other
applications. Several domains, including biomedicine,
life sciences and scientific research, have been
affected by Big Data<sup>1</sup>. Therefore, there is a need to
understand and exploit this data. This process can
be carried out thanks to “Data Science”, which is
based on methodologies from Data Mining, Natural
Language Processing, the Semantic Web, Statistics,
etc., and allows us to gain new insight through
data-driven research
        <xref ref-type="bibr" rid="ref1 ref4">(Madden, 2012; Embley and
Liddle, 2013)</xref>
        . A major problem hampering the development of Big
Data Analytics is the need to process
several types of data, such as structured, numeric
and unstructured data (e.g. video, audio, text,
image, etc.)<sup>2</sup>.
      </p>
      <p><sup>1</sup> By 2015, the average amount of data generated annually in
hospitals is 665 TB: https://datafloq.com/read/body-source-big-data-infographic/413.</p>
      <p>The third edition of the Annual International
Symposium on Information Management and Big
Data, SIMBig 2016<sup>3</sup>, seeks to present new
methods from data science and related fields for
analyzing and managing large volumes of data. It brings
together leading national and international actors in the
decision-making field to discuss new technologies
dedicated to handling large amounts of information.</p>
      <p>
        The second edition, SIMBig 2015<sup>4</sup>, was held
in Cusco, Peru, from September 2nd to 4th,
2015. SIMBig 2015 has been indexed on DBLP<sup>5</sup>
        <xref ref-type="bibr" rid="ref3">(Lossio-Ventura and Alatrista-Salas, 2015)</xref>
        and on
CEUR Workshop Proceedings<sup>6</sup>.
      </p>
      <p>
        Our first edition, SIMBig 2014<sup>7</sup>, also took place
in Cusco, Peru, in September 2014. SIMBig 2014
has also been indexed on DBLP<sup>8</sup>
        <xref ref-type="bibr" rid="ref2">(Lossio-Ventura
and Alatrista-Salas, 2014)</xref>
        and on CEUR
Workshop Proceedings<sup>9</sup>.
      </p>
      <p>Scope and Topics
Multilingual Text Processing, Biomedical NLP.
Topics of interest of SIMBig 2016 included, but
were not limited to:</p>
      <p><sup>2</sup> Today, 80% of data is unstructured, such as images,
video, and notes.</p>
      <p><sup>3</sup> http://simbig.org/SIMBig2016/</p>
      <p><sup>4</sup> http://simbig.org/SIMBig2015/</p>
      <p><sup>5</sup> http://dblp2.uni-trier.de/db/conf/simbig/simbig2015.html</p>
      <p><sup>6</sup> http://ceur-ws.org/Vol-1478/</p>
      <p><sup>7</sup> https://www.lirmm.fr/simbig2014/</p>
      <p><sup>8</sup> http://dblp2.uni-trier.de/db/conf/simbig/simbig2014.html</p>
      <p><sup>9</sup> http://ceur-ws.org/Vol-1318/</p>
      <p>Keynote Speakers
For this third edition of SIMBig, we had experts in
different areas, such as Data Science, Information
Retrieval, Natural Language Processing and Data
Mining. Our four invited speakers were:</p>
    </sec>
    <sec id="sec-2">
      <title>Albert Bifet (Prof, PhD)</title>
      <p>Albert Bifet is a Big Data scientist with 10+ years
of international experience in research and in
leading new open source software projects for
business analytics, data mining and machine learning
(Huawei, Yahoo, University of Waikato, UPC). He
obtained a PhD from UPC-BarcelonaTech. He has
worked in Hong Kong, New Zealand and Europe.
At Yahoo Labs, he co-founded Apache SAMOA
(Scalable Advanced Massive Online Analysis) in
2013. Apache SAMOA is a distributed streaming
machine learning (ML) framework that contains a
programming abstraction for distributed streaming
ML algorithms. At the WEKA Machine
Learning group, he has been co-leading MOA (Massive Online
Analysis) since 2008. MOA is the most popular
open source framework for data stream mining,
with more than 20,000 downloads each year. He is
the author of a book on Adaptive Stream Mining
and Pattern Learning and Mining from Evolving
Data Streams. Additionally, he was editor of the
Big Data Mining special issue of SIGKDD
Explorations in 2012. He is also serving as Co-Chair
of the Industrial track of ECML PKDD 2016,
and has served as Co-Chair of BigMine (2014, 2013,
2012) and of the ACM SAC Data Streams Track (2016,
2014, 2013, 2012).</p>
    </sec>
    <sec id="sec-3">
      <title>Fabio Crestani (Prof, PhD)</title>
      <p>Fabio Crestani has been a full professor at the Faculty of
Informatics of USI since January 2007. Before
arriving in Lugano, he was a (full) Professor at the
University of Strathclyde in Glasgow (UK) from
2000. During that time he was a Visiting
Professor at IMAG (France), and spent a year on sabbatical
at UC Berkeley (USA) and Xerox PARC (USA).
In 1997-1999 he was a Postdoctoral Research
Fellow at the University of Glasgow (UK), at the
International Computer Science Institute in
Berkeley (USA), and at the Rutherford Appleton
Laboratory (UK). Earlier, in 1992-97, he was Assistant
Professor at the Department of Information
Engineering of the University of Padova (Italy). Fabio
holds a degree in Statistics from the University of
Padova (Italy) and an MSc and PhD in Computing
Science from the University of Glasgow (UK). His
main areas of research are Information Retrieval,
Text Mining, and Digital Libraries. He has
co-edited 9 books and published over 150
publications in these areas of research. He is
Editor-in-Chief of Information Processing and Management
(Elsevier) and a member of the editorial board of a
number of journals.</p>
    </sec>
    <sec id="sec-4">
      <title>Kevin Bretonnel Cohen (Prof, PhD)</title>
      <p>Kevin Bretonnel Cohen is the Director of the
Biomedical Text Mining Group at the University
of Colorado School of Medicine. His research
covers a wide range of topics in biomedical
natural language processing, ranging from named
entity recognition to software engineering and
evaluation for language processing applications. Since
2008, he has been the chair of the Association for
Computational Linguistics special interest group
on biomedical natural language processing.</p>
    </sec>
    <sec id="sec-5">
      <title>Maguelonne Teisseire (Prof, PhD)</title>
      <p>Maguelonne Teisseire received a PhD degree in
Computing Science from the Méditerranée
University, France, in 1994. Her research interests
focused on behavioral modeling and design. From
1995 to 2008, she was an Assistant Professor of
Computer Science and Engineering at Montpellier
II University and Polytech’Montpellier, France.
She headed the Data Mining Group at the LIRMM
Laboratory, Montpellier, France, from 2000
to 2008. She is currently a Research Professor
at Irstea, and she joined the TETIS lab in March
2009. Her research interests focus on advanced
data mining approaches for
time-ordered data. In particular, she is
interested in text mining and sequential patterns. Her
research is part of different projects supported
by either the national government (RNTL) or
regional projects. She has published numerous
papers in refereed journals and conferences on
either behavioral modeling or data mining.</p>
      <sec id="sec-5-2">
        <title>Track on Social Network and Media Analysis and Mining (SNMAM 2016)</title>
        <p>Online social networks are web platforms that
provide a variety of services. Users may share
locations and community activities, post and tag
photos and other media content, and contact
individuals with similar interests. The rapid growth
of social networks, as well as the rapid increase
in social media consumption and production, have
made the analysis of social media and networks
a hot topic among academic researchers and
industry practitioners alike. SIMBig has become an
important venue that has attracted computer
scientists, computer engineers, software engineers,
and application developers from around the world.
The Social Network and Media Analysis and
Mining (SNMAM) track of SIMBig has provided a
forum that brings together researchers and practitioners
to discuss research trends and techniques related
to social networks and media.</p>
        <sec id="sec-5-2-1">
          <title>Topics of Interest</title>
          <p>We included all the important topics related to
social network and media analysis and mining within
SNMAM. The topics suitable for SNMAM
included:
• Data modeling for social networks and social
media
• Dynamics and evolution of social networks
• Topological, geographical and temporal
analysis of social networks
• Privacy and security in social networks
• Pattern analysis in social networks
• Community structure analysis in social
networks
• Link prediction and recommendation
systems
• Propagation and diffusion of information in
social networks
• Detection of spam, misinformation and
malicious activities in social networks
• Location-based social networks
• Modeling of user behavior and interaction in
social networks
• Information retrieval in social network and
media services
• Business and political impact in social
network and media analysis
• Big data issues in social network and media
analysis
• Monitoring social networks and media
• Analysis of the relationship between social
media and traditional media
• Exploratory and visual data mining of social
network and media data</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Sponsors and Organizers</title>
        <p>We want to thank our wonderful sponsors and
extend our sincere appreciation to them;
without them, our symposium would not be
possible. They showed their commitment to making
our research communities more active. We invite
you to support these community-minded
organizations.</p>
        <sec id="sec-5-3-1">
          <title>Organizing Institutions</title>
          <p>• Universidad Andina del Cusco, Perú<sup>10</sup>
• Universidad del Pacífico, Perú<sup>11</sup>
• University of Florida, USA<sup>12</sup>
• Université de Montpellier, France<sup>13</sup></p>
        </sec>
        <sec id="sec-5-3-2">
          <title>Collaborating Institutions</title>
          <p>• Grupo de Reconocimiento de Patrones e
Inteligencia Artificial Aplicada, PUCP, Perú<sup>14</sup>
• Universidad Nacional Mayor de San Marcos, Perú<sup>15</sup>
• Escuela de Post-grado de la Pontificia
Universidad Católica del Perú<sup>16</sup></p>
          <p><sup>10</sup> http://www.uandina.edu.pe/</p>
          <p><sup>11</sup> http://www.up.edu.pe/</p>
          <p><sup>12</sup> http://www.ufl.edu/</p>
          <p><sup>13</sup> http://www.umontpellier.fr/</p>
          <p><sup>14</sup> http://inform.pucp.edu.pe/~grpiaa/</p>
          <p><sup>15</sup> http://www.unmsm.edu.pe/</p>
          <p><sup>16</sup> http://posgrado.pucp.edu.pe/la-escuela/presentacion/</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>David W</given-names>
            <surname>Embley</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephen W</given-names>
            <surname>Liddle</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Big data-conceptual modeling to the rescue</article-title>
          . In
          <source>Conceptual Modeling, ER'13</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . LNCS, Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Juan Antonio</given-names>
            <surname>Lossio-Ventura</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Alatrista-Salas</surname>
          </string-name>
          , editors.
          <year>2014</year>
          .
          <source>Proceedings of the 1st Symposium on Information Management and Big Data - SIMBig 2014, Cusco, Peru, September 8-10, 2014</source>
          , volume
          <volume>1318</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Juan Antonio</given-names>
            <surname>Lossio-Ventura</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Alatrista-Salas</surname>
          </string-name>
          , editors.
          <year>2015</year>
          .
          <source>Proceedings of the 2nd Annual International Symposium on Information Management and Big Data - SIMBig 2015, Cusco, Peru, September 2-4, 2015</source>
          , volume
          <volume>1478</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Sam</given-names>
            <surname>Madden</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>From databases to big data</article-title>
          .
          <source>IEEE Internet Computing</source>
          , volume
          <volume>16</volume>
          , pages
          <fpage>4</fpage>
          -
          <lpage>6</lpage>
          . IEEE Educational Activities Department, Piscataway, NJ, USA, May.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>