Information Management and Big Data: SIMBig overview

Juan Antonio Lossio-Ventura
Health Outcomes & Policy
University of Florida
Florida, USA
jlossioventura@ufl.edu

Hugo Alatrista-Salas
Universidad del Pacífico
Pontificia Universidad Católica del Perú
Lima, Peru
h.alatristas@up.edu.pe

Abstract

Big Data and Data Science are popular terms used to describe, respectively, the exponential growth of data and the analysis of that data. The aim of the symposium is to present methods for extracting knowledge from large volumes of data through data science and artificial intelligence techniques. It brings together the main national and international actors in the decision-making field to report on new technologies dedicated to handling large amounts of information.

1 Introduction

Big Data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Data Science is a field that seeks to extract knowledge or insights from large volumes of heterogeneous data (e.g. video, audio, text, image). It continues fields such as data analysis, statistics, machine learning and data mining, and is similar to Knowledge Discovery in Databases (KDD).

Big Data has emerged over the last 20 years. For instance, social networks such as Facebook, Twitter and LinkedIn generate masses of data, which can be accessed by other applications. Several domains, including biomedicine, life sciences and scientific research, have been affected by Big Data (by 2015, the average volume of data generated annually in hospitals was 665 TB: https://datafloq.com/read/body-source-big-data-infographic/413). There is therefore a need to understand and exploit this data. This can be done with Data Science, which is based on methodologies from Data Mining, Natural Language Processing, the Semantic Web, Statistics, etc., and which allows us to gain new insight through data-driven research (Madden, 2012; Embley and Liddle, 2013). A major problem hampering the development of Big Data Analytics is the need to process several types of data: structured, numeric and unstructured (e.g. video, audio, text, image). Today, 80% of data is unstructured, such as images, video and notes.

The third edition of the Annual International Symposium on Information Management and Big Data, SIMBig 2016 (http://simbig.org/SIMBig2016/), seeks to present new methods from data science and related fields for analyzing and managing large volumes of data. It brings together the main national and international actors in the decision-making field to report on new technologies dedicated to handling large amounts of information.

The second edition, SIMBig 2015 (http://simbig.org/SIMBig2015/), was held in Cusco, Peru, from September 2nd to 4th, 2015. SIMBig 2015 has been indexed on DBLP (http://dblp2.uni-trier.de/db/conf/simbig/simbig2015.html) (Lossio-Ventura and Alatrista-Salas, 2015) and on CEUR Workshop Proceedings (http://ceur-ws.org/Vol-1478/).

The first edition, SIMBig 2014 (https://www.lirmm.fr/simbig2014/), also took place in Cusco, Peru, in September 2014. SIMBig 2014 has also been indexed on DBLP (http://dblp2.uni-trier.de/db/conf/simbig/simbig2014.html) (Lossio-Ventura and Alatrista-Salas, 2014) and on CEUR Workshop Proceedings (http://ceur-ws.org/Vol-1318/).

Scope and Topics

To share new analysis methods for managing large volumes of data, we encouraged participation from researchers in all fields related to Big Data, Data Science, Data Mining, Natural Language Processing and the Semantic Web, as well as Multilingual Text Processing and Biomedical NLP. Topics of interest of SIMBig 2016 included but were not limited to:

• Data Science
• Big Data
• Data Mining
• Natural Language Processing
• Bio NLP
• Text Mining
• Information Retrieval
• Machine Learning
• Semantic Web
• Ontologies
• Web Mining
• Knowledge Representation and Linked Open Data
• Social Networks, Social Web, and Web Science
• Information visualization
• OLAP, Data Warehousing
• Business Intelligence
• Spatiotemporal Data
• Health Care
• Agent-based Systems
• Reasoning and Logic
• Constraints, Satisfiability, and Search

2 Keynote Speakers

For this third edition of SIMBig, we had experts in different areas, such as Data Science, Information Retrieval, Natural Language Processing and Data Mining. Our four invited speakers were:

2.1 Albert Bifet (Prof, PhD)

Albert Bifet is a Big Data scientist with 10+ years of international experience in research and in leading new open source software projects for business analytics, data mining and machine learning (Huawei, Yahoo, University of Waikato, UPC). He obtained a PhD from UPC-BarcelonaTech. He has worked in Hong Kong, New Zealand and Europe. At Yahoo Labs, he co-founded Apache SAMOA (Scalable Advanced Massive Online Analysis) in 2013. Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning group, he has co-led MOA (Massive Online Analysis) since 2008. MOA is the most popular open source framework for data stream mining, with more than 20,000 downloads each year. He is the author of a book on Adaptive Stream Mining and Pattern Learning and Mining from Evolving Data Streams. Additionally, he was editor of the Big Data Mining special issue of SIGKDD Explorations in 2012. He is also serving as Co-Chair of the Industrial track of ECML PKDD 2016, and served as Co-Chair of BigMine (2014, 2013, 2012) and of the ACM SAC Data Streams Track (2016, 2014, 2013, 2012).

2.2 Fabio Crestani (Prof, PhD)

Fabio Crestani has been a full professor at the Faculty of Informatics of USI since January 2007. Before arriving in Lugano, he had been a (full) Professor at the University of Strathclyde in Glasgow (UK) since 2000. During that time he was a Visiting Professor at IMAG (France), and spent a sabbatical year at UC Berkeley (USA) and Xerox PARC (USA). In 1997-1999 he was a Postdoctoral Research Fellow at the University of Glasgow (UK), at the International Computer Science Institute in Berkeley (USA), and at the Rutherford Appleton Laboratory (UK). Earlier, in 1992-97, he was Assistant Professor at the Department of Information Engineering of the University of Padova (Italy). Fabio holds a degree in Statistics from the University of Padova (Italy) and an MSc and PhD in Computing Science from the University of Glasgow (UK). His main areas of research are Information Retrieval, Text Mining, and Digital Libraries. He has co-edited 9 books and authored over 150 publications in these areas of research. He is Editor-in-Chief of Information Processing and Management (Elsevier) and a member of the editorial board of a number of journals.

2.3 Kevin Bretonnel Cohen (Prof, PhD)

Kevin Bretonnel Cohen is the Director of the Biomedical Text Mining Group at the University of Colorado School of Medicine. His research covers a wide range of topics in biomedical natural language processing, ranging from named entity recognition to software engineering and evaluation for language processing applications. Since 2008, he has been the chair of the Association for Computational Linguistics special interest group on biomedical natural language processing.

2.4 Maguelonne Teisseire (Prof, PhD)

Maguelonne Teisseire received a PhD degree in Computing Science from the Méditerranée University, France, in 1994. Her research interests focused on behavioral modeling and design. From 1995 to 2008, she was an Assistant Professor of Computer Science and Engineering at Montpellier II University and Polytech'Montpellier, France. She headed the Data Mining Group at the LIRMM laboratory, Montpellier, France, from 2000 to 2008. She is currently a Research Professor at Irstea, and joined the TETIS lab in March 2009. Her research interests focus on advanced data mining approaches for time-ordered data. In particular, she is interested in text mining and sequential patterns. Her research is part of various projects supported at either the national (RNTL) or the regional level. She has published numerous papers in refereed journals and conferences on either behavioral modeling or data mining.

3 Track on Social Network and Media Analysis and Mining (SNMAM 2016)

Online social networks are web platforms that provide a variety of services. Users may share locations and community activities, post and tag photos and other media content, and contact individuals with similar interests. The rapid growth of social networks, together with the rapid increase in social media consumption and production, has made the analysis of social media and networks a hot topic among academic researchers and industry practitioners alike. SIMBig has become an important venue that has attracted computer scientists, computer engineers, software engineers, and application developers from around the world. The Social Network and Media Analysis and Mining (SNMAM) track of SIMBig has provided a forum that brings together researchers and practitioners to discuss research trends and techniques related to social networks and media.

Topics of Interest

We included all the important topics related to social network and media analysis and mining within SNMAM. The topics suitable for SNMAM included:

• Data modeling for social networks and social media
• Dynamics and evolution of social networks
• Topological, geographical and temporal analysis of social networks
• Privacy and security in social networks
• Pattern analysis in social networks
• Community structure analysis in social networks
• Link prediction and recommendation systems
• Propagation and diffusion of information in social networks
• Detection of spam, misinformation and malicious activities in social networks
• Location-based social networks
• Modeling of user behavior and interaction in social networks
• Information retrieval in social network and media services
• Business and political impact in social network and media analysis
• Big data issues in social network and media analysis
• Monitoring social networks and media
• Analysis of the relationship between social media and traditional media
• Exploratory and visual data mining of social network and media data

4 Sponsors and Organizers

We want to thank our wonderful sponsors! We extend our sincere appreciation to them; without them our symposium would not be possible. They showed their commitment to making our research communities more active. We invite you to support these community-minded organizations.

4.1 Organizing Institutions

• Universidad Andina del Cusco, Perú (http://www.uandina.edu.pe/)
• Universidad del Pacífico, Perú (http://www.up.edu.pe/)
• University of Florida, USA (http://www.ufl.edu/)
• Université de Montpellier, France (http://www.umontpellier.fr/)

4.2 Collaborating Institutions

• Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada, PUCP, Perú (http://inform.pucp.edu.pe/~grpiaa/)
• Universidad Nacional Mayor de San Marcos, Perú (http://www.unmsm.edu.pe/)
• Escuela de Post-grado de la Pontificia Universidad Católica del Perú (http://posgrado.pucp.edu.pe/la-escuela/presentacion/)

4.3 WTI Organizing Institutions

• Instituto de Ciências Matemáticas e de Computação, USP, Brasil (http://www.icmc.usp.br/Portal/)
• Laboratório de Inteligência Computacional, ICMC, USP, Brasil (http://labic.icmc.usp.br/)
• Universidade Federal de São Carlos, Brasil (http://www2.ufscar.br/home/index.php)

References

David W. Embley and Stephen W. Liddle. 2013. Big data—conceptual modeling to the rescue. In Conceptual Modeling, ER'13, pages 1–8. LNCS, Springer.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2014. Proceedings of the 1st Symposium on Information Management and Big Data - SIMBig 2014, Cusco, Peru, September 8-10, 2014, volume 1318 of CEUR Workshop Proceedings. CEUR-WS.org.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2015. Proceedings of the 2nd Annual International Symposium on Information Management and Big Data - SIMBig 2015, Cusco, Peru, September 2-4, 2015, volume 1478 of CEUR Workshop Proceedings. CEUR-WS.org.

Sam Madden. 2012. From databases to big data. Volume 16, pages 4–6. IEEE Educational Activities Department, Piscataway, NJ, USA, May.