=Paper= {{Paper |id=Vol-1743/o1 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-1743/o1.pdf |volume=Vol-1743 }} ==None== https://ceur-ws.org/Vol-1743/o1.pdf
               Information Management and Big Data: SIMBig overview

Juan Antonio Lossio-Ventura
Health Outcomes & Policy
University of Florida
Florida, USA
jlossioventura@ufl.edu

Hugo Alatrista-Salas
Universidad del Pacífico
Pontificia Universidad Católica del Perú
Lima, Peru
h.alatristas@up.edu.pe



Abstract

Big Data and Data Science are popular terms used to describe, respectively, the exponential growth of data and the analysis of this data. The aim of the symposium is to present methods for extracting knowledge from large volumes of data through data science and artificial intelligence techniques. The symposium brings together leading national and international actors in the decision-making field to discuss new technologies dedicated to handling large amounts of information.

1 Introduction

Big Data is a popular term used to describe the exponential growth and availability of data, which can be structured or unstructured. Data Science is a field that seeks to extract knowledge or insights from large volumes of heterogeneous data (e.g. video, audio, text, image). Data Science is a continuation of fields such as data analysis, statistics, machine learning and data mining, and is similar to Knowledge Discovery in Databases (KDD).

Big Data has emerged over the last 20 years. For instance, social networks such as Facebook, Twitter and LinkedIn generate masses of data, which can be accessed by other applications. Several domains, including biomedicine, life sciences and scientific research, have been affected by Big Data¹. Therefore, there is a need to understand and exploit this data. This can be carried out through "Data Science", which is based on methodologies from Data Mining, Natural Language Processing, Semantic Web, Statistics, etc. These methodologies allow us to gain new insights through data-driven research (Madden, 2012; Embley and Liddle, 2013). A major problem hampering the development of Big Data Analytics is the need to process several types of data, such as structured, numeric and unstructured data (e.g. video, audio, text, image)².

Our third edition of the Annual International Symposium on Information Management and Big Data, SIMBig 2016³, seeks to present new methods of data science and related fields for analyzing and managing large volumes of data. It brings together leading national and international actors in the decision-making field to discuss new technologies dedicated to handling large amounts of information.

The second edition, SIMBig 2015⁴, was held in Cusco, Peru, from September 2nd to 4th, 2015. SIMBig 2015 has been indexed in DBLP⁵ (Lossio-Ventura and Alatrista-Salas, 2015) and in CEUR Workshop Proceedings⁶.

Our first edition, SIMBig 2014⁷, also took place in Cusco, Peru, in September 2014. SIMBig 2014 has also been indexed in DBLP⁸ (Lossio-Ventura and Alatrista-Salas, 2014) and in CEUR Workshop Proceedings⁹.

¹ By 2015, the average amount of data generated annually in hospitals was 665TB: https://datafloq.com/read/body-source-big-data-infographic/413
² Today, 80% of data is unstructured, such as images, video, and notes.
³ http://simbig.org/SIMBig2016/
⁴ http://simbig.org/SIMBig2015/
⁵ http://dblp2.uni-trier.de/db/conf/simbig/simbig2015.html
⁶ http://ceur-ws.org/Vol-1478/
⁷ https://www.lirmm.fr/simbig2014/
⁸ http://dblp2.uni-trier.de/db/conf/simbig/simbig2014.html
⁹ http://ceur-ws.org/Vol-1318/

Scope and Topics

To share new analysis methods for managing large volumes of data, we encouraged participation from researchers in all fields related to Big Data, Data Science, Data Mining, Natural Language Processing and Semantic Web, but also
Multilingual Text Processing and Biomedical NLP.

Topics of interest of SIMBig 2016 included, but were not limited to:

    • Data Science
    • Big Data
    • Data Mining
    • Natural Language Processing
    • Bio NLP
    • Text Mining
    • Information Retrieval
    • Machine Learning
    • Semantic Web
    • Ontologies
    • Web Mining
    • Knowledge Representation and Linked Open Data
    • Social Networks, Social Web, and Web Science
    • Information Visualization
    • OLAP, Data Warehousing
    • Business Intelligence
    • Spatiotemporal Data
    • Health Care
    • Agent-based Systems
    • Reasoning and Logic
    • Constraints, Satisfiability, and Search

2 Keynote Speakers

For this third edition of SIMBig, we invited experts in different areas, such as Data Science, Information Retrieval, Natural Language Processing and Data Mining. Our four invited speakers were:

2.1 Albert Bifet (Prof, PhD)
Albert Bifet is a Big Data scientist with 10+ years of international experience in research and in leading new open source software projects for business analytics, data mining and machine learning (Huawei, Yahoo, University of Waikato, UPC). He obtained a PhD from UPC-BarcelonaTech. He has worked in Hong Kong, New Zealand and Europe. At Yahoo Labs, he co-founded Apache SAMOA (Scalable Advanced Massive Online Analysis) in 2013. Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning group, he has been co-leading MOA (Massive Online Analysis) since 2008. MOA is the most popular open source framework for data stream mining, with more than 20,000 downloads each year. He is the author of a book on Adaptive Stream Mining and Pattern Learning and Mining from Evolving Data Streams. Additionally, he was editor of the Big Data Mining special issue of SIGKDD Explorations in 2012. He is currently serving as Co-Chair of the Industrial Track of ECML PKDD 2016, and has served as Co-Chair of BigMine (2014, 2013, 2012) and of the ACM SAC Data Streams Track (2016, 2014, 2013, 2012).

2.2 Fabio Crestani (Prof, PhD)
Fabio Crestani has been a full professor at the Faculty of Informatics of USI since January 2007. Before arriving in Lugano, he was a (full) Professor at the University of Strathclyde in Glasgow (UK) from 2000. During that time he was a Visiting Professor at IMAG (France), and spent a one-year sabbatical at UC Berkeley (USA) and Xerox PARC (USA). In 1997-1999 he was a Postdoctoral Research Fellow at the University of Glasgow (UK), at the International Computer Science Institute in Berkeley (USA), and at the Rutherford Appleton Laboratory (UK). Earlier, in 1992-1997, he was an Assistant Professor at the Department of Information Engineering of the University of Padova (Italy). Fabio holds a degree in Statistics from the University of Padova (Italy) and an MSc and PhD in Computing Science from the University of Glasgow (UK). His main areas of research are Information Retrieval, Text Mining, and Digital Libraries. He has co-edited 9 books and published over 150 papers in these areas of research. He is Editor-in-Chief of Information Processing and Management (Elsevier) and a member of the editorial board of a number of journals.

2.3 Kevin Bretonnel Cohen (Prof, PhD)
Kevin Bretonnel Cohen is the Director of the Biomedical Text Mining Group at the University of Colorado School of Medicine. His research covers a wide range of topics in biomedical natural language processing, ranging from named entity recognition to software engineering and evaluation for language processing applications. Since 2008, he has been the chair of the Association for Computational Linguistics special interest group on biomedical natural language processing.

2.4 Maguelonne Teisseire (Prof, PhD)
Maguelonne Teisseire received a PhD degree in Computing Science from the Méditerranée University, France, in 1994. Her research interests
focused on behavioral modeling and design. From 1995 to 2008, she was an Assistant Professor of Computer Science and Engineering at Montpellier II University and Polytech’Montpellier, France. She headed the Data Mining Group at the LIRMM Laboratory, Montpellier, France, from 2000 to 2008. She is currently a Research Professor at Irstea, and she joined the TETIS lab in March 2009. Her research focuses on advanced data mining approaches for time-ordered data. In particular, she is interested in text mining and sequential patterns. Her research is part of different projects supported either by the National Government (RNTL) or at the regional level. She has published numerous papers in refereed journals and conferences on either behavioral modeling or data mining.

3 Track on Social Network and Media Analysis and Mining (SNMAM 2016)

Online social networks are web platforms that provide a variety of services. Users may share locations and community activities, post and tag photos and other media content, and contact individuals with similar interests. The rapid growth of social networks, together with the rapid increase in social media consumption and production, has made the analysis of social media and networks a hot topic among academic researchers and industry practitioners alike. SIMBig has become an important venue that has attracted computer scientists, computer engineers, software engineers, and application developers from around the world. The Social Network and Media Analysis and Mining (SNMAM) track of SIMBig has provided a forum that brings together researchers and practitioners to discuss research trends and techniques related to social networks and media.

Topics of Interest

We included all the important topics related to social network and media analysis and mining within SNMAM. The topics suitable for SNMAM included:

    • Data modeling for social networks and social media
    • Dynamics and evolution of social networks
    • Topological, geographical and temporal analysis of social networks
    • Privacy and security in social networks
    • Pattern analysis in social networks
    • Community structure analysis in social networks
    • Link prediction and recommendation systems
    • Propagation and diffusion of information in social networks
    • Detection of spam, misinformation and malicious activities in social networks
    • Location-based social networks
    • Modeling of user behavior and interaction in social networks
    • Information retrieval in social network and media services
    • Business and political impact in social network and media analysis
    • Big data issues in social network and media analysis
    • Monitoring social networks and media
    • Analysis of the relationship between social media and traditional media
    • Exploratory and visual data mining of social network and media data

4 Sponsors and Organizers

We want to thank our wonderful sponsors! We extend our sincere appreciation to our sponsors, without whom our symposium would not be possible. They showed their commitment to making our research communities more active. We invite you to support these community-minded organizations.

4.1 Organizing Institutions
    • Universidad Andina del Cusco, Perú¹⁰
    • Universidad del Pacífico, Perú¹¹
    • University of Florida, USA¹²
    • Université de Montpellier, France¹³

4.2 Collaborating Institutions
    • Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada, PUCP, Perú¹⁴
    • Universidad Nacional Mayor de San Marcos, Perú¹⁵
    • Escuela de Post-grado de la Pontificia Universidad Católica del Perú¹⁶

¹⁰ http://www.uandina.edu.pe/
¹¹ http://www.up.edu.pe/
¹² http://www.ufl.edu/
¹³ http://www.umontpellier.fr/
¹⁴ http://inform.pucp.edu.pe/~grpiaa/
¹⁵ http://www.unmsm.edu.pe/
¹⁶ http://posgrado.pucp.edu.pe/la-escuela/presentacion/

4.3 WTI Organizing Institutions
    • Instituto de Ciências Matemáticas e de Computação, USP, Brasil¹⁷
    • Laboratório de Inteligência Computacional, ICMC, USP, Brasil¹⁸
    • Universidade Federal de São Carlos, Brasil¹⁹



References
David W Embley and Stephen W Liddle. 2013.
  Big data—conceptual modeling to the rescue. In
  Conceptual Modeling, ER’13, pages 1–8. LNCS,
  Springer.
Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas,
  editors. 2014. Proceedings of the 1st Symposium on
  Information Management and Big Data - SIMBig 2014,
  Cusco, Peru, September 8-10, 2014, volume 1318 of
  CEUR Workshop Proceedings. CEUR-WS.org.
Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas,
  editors. 2015. Proceedings of the 2nd Annual
  International Symposium on Information Management
  and Big Data - SIMBig 2015, Cusco, Peru, September
  2-4, 2015, volume 1478 of CEUR Workshop Proceedings.
  CEUR-WS.org.
Sam Madden. 2012. From databases to big data. Volume
  16, pages 4–6. IEEE Educational Activities Department,
  Piscataway, NJ, USA, May.




¹⁷ http://www.icmc.usp.br/Portal/
¹⁸ http://labic.icmc.usp.br/
¹⁹ http://www2.ufscar.br/home/index.php