=Paper= {{Paper |id=Vol-2029/o1 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2029/o1.pdf |volume=Vol-2029 }} ==None== https://ceur-ws.org/Vol-2029/o1.pdf
        Overview of SIMBig 2017: 4th Annual International Symposium on
                     Information Management and Big Data

                 Juan Antonio Lossio-Ventura                           Hugo Alatrista-Salas
                   Health Outcomes & Policy                           Universidad del Pacífico
                     University of Florida                        Av. Salaverry 2020, Jesús María
                         Florida, USA                                       Lima, Peru
                jlossioventura@ufl.edu                            h.alatristas@up.edu.pe


Abstract

SIMBig presents the analysis of new methods for extracting knowledge from large volumes of data through techniques of data science and artificial intelligence. SIMBig gathers national and international researchers in the data science field to discuss new technologies for handling large amounts of information.

1 Introduction

Our fourth edition of the Annual International Symposium on Information Management and Big Data, SIMBig 2017¹, took place in September 2017 in Lima, Peru, at the Universidad del Pacífico. SIMBig 2017 presented new methods from data science and related fields for analyzing and managing large volumes of data. It brought together leading national and international actors in the decision-making field to discuss new technologies for handling large amounts of information.

The best papers of SIMBig 2016 and SIMBig 2015 (eleven papers) have been published by Springer (Lossio-Ventura and Alatrista-Salas, 2017). Our third edition, SIMBig 2016², was held in Cusco, Peru, in September 2016. The second edition, SIMBig 2015, was likewise held in Cusco, Peru, in September 2015. The first edition, SIMBig 2014³, also took place in Cusco, Peru, in September 2014.

SIMBig 2016, 2015, and 2014 have been indexed in DBLP⁴ (Lossio-Ventura and Alatrista-Salas, 2016, 2015; Lossio-Ventura and Alatrista-Salas, 2014) and in the CEUR Workshop Proceedings⁵,⁶,⁷.

Scope and Topics

To share new analysis methods for managing large volumes of data, we encouraged participation from researchers in all fields related to Data Science, Big Data, Data Mining, Natural Language Processing, and Semantic Web. Topics of interest of SIMBig 2017 included but were not limited to:

    • Data Science
    • Big Data
    • Data Mining
    • Natural Language Processing
    • Semantic Web
    • Bio NLP, Healthcare Informatics
    • Text Mining
    • Information Retrieval
    • Machine Learning
    • Ontologies, Knowledge Representation, Linked Open Data
    • Social Networks, Social Web, and Web Science
    • Information visualization
    • OLAP, Data warehousing, Business intelligence
    • Spatiotemporal Data
    • Agent-based Systems

    ¹ http://simbig.org/SIMBig2017/
    ² http://simbig.org/SIMBig2016/
    ³ https://www.lirmm.fr/simbig2014/
    ⁴ http://dblp.uni-trier.de/db/conf/simbig/index.html
    ⁵ http://ceur-ws.org/Vol-1743/
    ⁶ http://ceur-ws.org/Vol-1478/
    ⁷ http://ceur-ws.org/Vol-1318/

2 Keynote Speakers

SIMBig 2017 welcomed five keynote speakers, experts in Data Science, Big Data, Data Mining, Natural Language Processing (NLP), and Semantic Web:




2.1 Regina Barzilay (Professor, PhD)

Dr. Regina Barzilay is a professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. Barzilay’s research on natural languages focuses on the development of models of natural language, and uses those models to solve real-world language processing tasks. Her research in computational linguistics deals with multilingual learning, interpreting text for solving control problems, and finding document-level structure within text.

She is a recipient of various awards, including the NSF Career Award, the MIT Technology Review TR-35 Award, a Microsoft Faculty Fellowship, and several Best Paper Awards at NAACL and ACL. She received her PhD in Computer Science from Columbia University, and spent a year as a postdoc at Cornell University.

2.2 Jiawei Han (Professor, PhD)

Dr. Jiawei Han is Abel Bliss Professor in the Department of Computer Science at the University of Illinois. His research covers data mining, information network analysis, and database systems, with over 700 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Dr. Han has received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. His co-authored textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann) has been adopted worldwide.

Dr. Han is currently the co-Director of KnowEnG, a Center of Excellence in Big Data Computing, funded by the NIH Big Data to Knowledge (BD2K) Initiative. He also served from 2009 to 2016 as the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab.

2.3 Mark A. Musen (Professor, PhD, MD)

Dr. Mark Musen is Professor of Biomedical Informatics and of Biomedical Data Science at Stanford University, where he is Director of the Stanford Center for Biomedical Informatics Research. Dr. Musen conducts research related to intelligent systems, reusable ontologies, metadata for publication of scientific data sets, and biomedical decision support. His group developed Protégé, the world’s most widely used technology for building and managing terminologies and ontologies. He is principal investigator of the National Center for Biomedical Ontology, one of the original National Centers for Biomedical Computing created by the U.S. National Institutes of Health (NIH). He is principal investigator of the Center for Expanded Data Annotation and Retrieval (CEDAR). CEDAR is a center of excellence supported by the NIH Big Data to Knowledge Initiative, with the goal of developing new technology to ease the authoring and management of biomedical experimental metadata.

Dr. Musen directs the World Health Organization Collaborating Center for Classification, Terminology, and Standards at Stanford University, which has developed much of the information infrastructure for the authoring and management of the 11th edition of the International Classification of Diseases (ICD-11). Dr. Musen was the recipient of the Donald A. B. Lindberg Award for Innovation in Informatics from the American Medical Informatics Association in 2006. He has been elected to the American College of Medical Informatics, the Association of American Physicians, and the National Academy of Medicine. He is founding co-editor-in-chief of the journal Applied Ontology.

2.4 Ravi Kumar (PhD)

Dr. Ravi Kumar has been a senior staff research scientist at Google since 2012. Prior to this, he was a research staff member at the IBM Almaden Research Center and a principal research scientist at Yahoo! Research. Dr. Ravi Kumar obtained his PhD in Computer Science from Cornell University. His research interests include Web search and data mining, algorithms for massive data, and the theory of computation.

2.5 Clement Jonquet (Professor, PhD)

Dr. Clement Jonquet is an assistant professor at the University of Montpellier, France, and, since September 2015, a visiting scholar at Stanford University. He is a researcher at the Laboratory of Informatics, Robotics, and Microelectronics of Montpellier (LIRMM), working on (biomedical/agronomical) ontologies, semantic data indexing and annotation, the semantic Web, text mining, and knowledge representation. Dr. Jonquet obtained his PhD in Informatics from the same university in 2006 (on multi-agent systems, grid and service-oriented computing). He then served as a postdoc for 3 years at the Stanford BMIR within Prof. Mark A. Musen’s group, where he worked on semantic annotations of biomedical data using biomedical ontologies in the context of the National Center for Biomedical Ontology (NCBO) project. He contributed actively to the design, evolution and development of the NCBO BioPortal and won the 1st prize at the ISWC Semantic Web Challenge 2010.

3 Track on Social Network and Media Analysis and Mining (SNMAM 2017)

Online social networks are web platforms that provide a variety of services. Users may share locations and community activities, post and tag photos and other media content, as well as contact individuals with similar interests. The rapid growth of social networks, as well as the rapid increase in social media consumption and production, have made the analysis of social media and networks a hot topic amongst academic researchers and industry practitioners alike. SIMBig has become an important venue that has attracted computer scientists, computer engineers, software engineers, and application developers from around the world. Within the general symposium, the Social Network and Media Analysis and Mining (SNMAM) track provided a forum that brought together both researchers and practitioners to discuss research trends and techniques related to social networks and media.

Topics of Interest

We included all the important topics related to social network and media analysis and mining within SNMAM. The topics suitable for SNMAM included:

    • Data modeling for social networks and social media
    • Dynamics and evolution of social networks
    • Topological, geographical and temporal analysis of social networks
    • Privacy and security in social networks
    • Pattern analysis in social networks
    • Crowdsourcing of network data generation and collection
    • Community structure analysis in social networks
    • Link prediction and recommendation systems
    • Propagation and diffusion of information in social networks
    • Location-based social networks
    • Mobile computing and applications on social networks
    • Modeling of user behavior and interaction in social networks
    • Information retrieval in social network and media services
    • Business and political impact in social network and media analysis
    • Monitoring social networks and media
    • Analysis of the relationship between social media and traditional media
    • Exploratory and visual data mining of social networks and media data
    • Ethics and privacy in social network and media services
    • Big data issues in social network and media analysis

4 Track on Applied Natural Language Processing (ANLP 2017)

The availability and size of textual information have grown dramatically in academic, professional, and personal settings. Emails, working papers, scientific articles, and social media publications are some examples of large sources of data that are presented in natural language. This raises a challenge, since natural language is a type of unstructured data that contains ambiguity, among other properties that make it harder to process. In this context, there is a growing interest among companies and organizations in improving access to information and its exploitation in different environments. For these reasons, the applications of Natural Language Processing have become very important today. SIMBig has become an important meeting point of computer scientists, computer engineers, software engineers, and application developers from around the world. The Applied Natural Language Processing (ANLP) track of SIMBig provided a forum that brought together both researchers and practitioners to discuss research trends and techniques related to Natural Language Processing.

Topics of Interest

We included all the important topics related to applied natural language processing within ANLP. The topics suitable for ANLP included:

    • Machine Translation
    • Sentiment Analysis/Opinion Mining
    • Automatic Summarization
    • Plagiarism Detection
    • Language Detection
    • Natural Language Generation
    • Natural Language Interfaces
    • NLP in Informal Texts
    • Question-Answering Systems
    • Content Analysis
    • NLP for Education
    • NLP for Low-Resource Languages
    • Bio-NLP
    • Dialogue Systems
    • Information Retrieval and Extraction
    • Event Detection
    • Text Classification
    • Multilingual NLP
    • Ontology-based NLP

5 Sponsors

We want to thank our wonderful sponsors! We extend our sincere appreciation to our sponsors, without whom our symposium would not be possible. They showed their commitment to making our research communities more active. We invite you to support these community-minded organizations.

5.1 Organizing Institutions

    • Universidad del Pacífico, Perú⁸
    • University of Florida, USA⁹
    • Universidad Andina del Cusco, Perú¹⁰

5.2 Collaborating Institutions

    • Springer¹¹
    • Banco de Crédito del Perú¹²
    • Escuela de Post-grado de la Pontificia Universidad Católica del Perú¹³

5.3 SNMAM Organizing Institutions

    • Instituto de Ciências Matemáticas e de Computação, USP, Brasil¹⁴
    • Laboratório de Inteligência Computacional, ICMC, USP, Brasil¹⁵
    • Universidade Federal de São Carlos, Brasil¹⁶

5.4 ANLP Organizing Institutions

    • Universidad Nacional Mayor de San Marcos, Perú¹⁷
    • Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada, PUCP, Perú¹⁸
    • Instituto de Ciências Matemáticas e de Computação, USP, Brasil
    • Universidade Federal de São Carlos, Brasil

    ⁸ http://www.up.edu.pe/
    ⁹ http://www.ufl.edu/
    ¹⁰ http://www.uandina.edu.pe/
    ¹¹ http://www.springer.com/la/
    ¹² https://www.viabcp.com/wps/portal/
    ¹³ http://posgrado.pucp.edu.pe/la-escuela/presentacion/
    ¹⁴ http://www.icmc.usp.br/Portal/
    ¹⁵ http://labic.icmc.usp.br/
    ¹⁶ http://www2.ufscar.br/home/index.php
    ¹⁷ http://www.unmsm.edu.pe/
    ¹⁸ http://inform.pucp.edu.pe/~grpiaa/

References

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2014. Proceedings of the 1st Symposium on Information Management and Big Data - SIMBig 2014, Cusco, Peru, September 8-10, 2014, volume 1318 of CEUR Workshop Proceedings. CEUR-WS.org. http://ceur-ws.org/Vol-1318.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2015. Proceedings of the 2nd Annual International Symposium on Information Management and Big Data - SIMBig 2015, Cusco, Peru, September 2-4, 2015, volume 1478 of CEUR Workshop Proceedings. CEUR-WS.org. http://ceur-ws.org/Vol-1478.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2016. Proceedings of the 3rd Annual International Symposium on Information Management and Big Data - SIMBig 2016, Cusco, Peru, September 1-3, 2016, volume 1743 of CEUR Workshop Proceedings. CEUR-WS.org. http://ceur-ws.org/Vol-1743.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas. 2017. Information Management and Big Data: Second Annual International Symposium, SIMBig 2015, Cusco, Peru, September 2-4, 2015, and Third Annual International Symposium, SIMBig 2016, Cusco, Peru, September 1-3, 2016, Revised Selected Papers, volume 656. Springer. https://doi.org/10.1007/978-3-319-55209-5.