Overview of SIMBig 2017: 4th Annual International Symposium on Information Management and Big Data

Juan Antonio Lossio-Ventura
Health Outcomes & Policy
University of Florida
Florida, USA
jlossioventura@ufl.edu

Hugo Alatrista-Salas
Universidad del Pacífico
Av. Salaverry 2020, Jesús María
Lima, Peru
h.alatristas@up.edu.pe

Abstract

SIMBig presents the analysis of new methods for extracting knowledge from large volumes of data through techniques of data science and artificial intelligence. SIMBig brings together national and international researchers in the data science field to discuss new technologies dedicated to handling large amounts of information.

1 Introduction

Our fourth edition of the Annual International Symposium on Information Management and Big Data, SIMBig 2017¹, took place in September 2017 at the Universidad del Pacífico in Lima, Peru. SIMBig 2017 presented new methods of data science and related fields for analyzing and managing large volumes of data. It brought together leading national and international actors in the decision-making field to discuss new technologies dedicated to handling large amounts of information.

The best papers of SIMBig 2016 and SIMBig 2015 (eleven papers) have been published by Springer (Lossio-Ventura and Alatrista-Salas, 2017). Our third edition, SIMBig 2016², was held in Cusco, Peru, in September 2016, as was the second edition, SIMBig 2015, in September 2015. The first edition, SIMBig 2014³, also took place in Cusco, Peru, in September 2014.

SIMBig 2016, 2015, and 2014 have been indexed in DBLP⁴ (Lossio-Ventura and Alatrista-Salas, 2016, 2015; Lossio-Ventura and Alatrista-Salas, 2014) and published in CEUR Workshop Proceedings⁵,⁶,⁷.

¹ http://simbig.org/SIMBig2017/
² http://simbig.org/SIMBig2016/
³ https://www.lirmm.fr/simbig2014/
⁴ http://dblp.uni-trier.de/db/conf/simbig/index.html
⁵ http://ceur-ws.org/Vol-1743/
⁶ http://ceur-ws.org/Vol-1478/
⁷ http://ceur-ws.org/Vol-1318/

Scope and Topics

To share new analysis methods for managing large volumes of data, we encouraged participation from researchers in all fields related to Data Science, Big Data, Data Mining, Natural Language Processing, and Semantic Web. Topics of interest for SIMBig 2017 included, but were not limited to:

• Data Science
• Big Data
• Data Mining
• Natural Language Processing
• Semantic Web
• Bio NLP, Healthcare Informatics
• Text Mining
• Information Retrieval
• Machine Learning
• Ontologies, Knowledge Representation, Linked Open Data
• Social Networks, Social Web, and Web Science
• Information visualization
• OLAP, Data warehousing, Business intelligence
• Spatiotemporal Data
• Agent-based Systems

2 Keynote Speakers

SIMBig 2017 welcomed five keynote speakers, experts in Data Science, Big Data, Data Mining, Natural Language Processing (NLP), and Semantic Web:

2.1 Regina Barzilay (Professor, PhD)

Dr. Regina Barzilay is a professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. Barzilay's research focuses on developing models of natural language and using those models to solve real-world language processing tasks. Her research in computational linguistics deals with multilingual learning, interpreting text for solving control problems, and finding document-level structure within text.

She has received various awards, including the NSF CAREER Award, the MIT Technology Review TR-35 Award, a Microsoft Faculty Fellowship, and several Best Paper Awards at NAACL and ACL. She received her PhD in Computer Science from Columbia University and spent a year as a postdoc at Cornell University.

2.2 Jiawei Han (Professor, PhD)

Dr. Jiawei Han is the Abel Bliss Professor in the Department of Computer Science at the University of Illinois. His research covers data mining, information network analysis, and database systems, and he has over 700 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Dr. Han has received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of the ACM and of the IEEE. His co-authored textbook "Data Mining: Concepts and Techniques" (Morgan Kaufmann) has been adopted worldwide.

Dr. Han is currently the co-Director of KnowEnG, a Center of Excellence in Big Data Computing funded by the NIH Big Data to Knowledge (BD2K) Initiative. From 2009 to 2016, he also served as the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab.

2.3 Mark A. Musen (Professor, PhD, MD)

Dr. Mark Musen is Professor of Biomedical Informatics and of Biomedical Data Science at Stanford University, where he is Director of the Stanford Center for Biomedical Informatics Research. Dr. Musen conducts research on intelligent systems, reusable ontologies, metadata for the publication of scientific data sets, and biomedical decision support. His group developed Protégé, the world's most widely used technology for building and managing terminologies and ontologies. He is the principal investigator of the National Center for Biomedical Ontology, one of the original National Centers for Biomedical Computing created by the U.S. National Institutes of Health (NIH). He is also the principal investigator of the Center for Expanded Data Annotation and Retrieval (CEDAR), a center of excellence supported by the NIH Big Data to Knowledge Initiative with the goal of developing new technology to ease the authoring and management of biomedical experimental metadata.

Dr. Musen directs the World Health Organization Collaborating Center for Classification, Terminology, and Standards at Stanford University, which has developed much of the information infrastructure for the authoring and management of the 11th edition of the International Classification of Diseases (ICD-11). Dr. Musen received the Donald A. B. Lindberg Award for Innovation in Informatics from the American Medical Informatics Association in 2006. He has been elected to the American College of Medical Informatics, the Association of American Physicians, and the National Academy of Medicine. He is the founding co-editor-in-chief of the journal Applied Ontology.

2.4 Ravi Kumar (PhD)

Dr. Ravi Kumar has been a senior staff research scientist at Google since 2012. Before that, he was a research staff member at the IBM Almaden Research Center and a principal research scientist at Yahoo! Research. Dr. Kumar obtained his PhD in Computer Science from Cornell University. His research interests include Web search and data mining, algorithms for massive data, and the theory of computation.

2.5 Clement Jonquet (Professor, PhD)

Dr. Clement Jonquet is an assistant professor at the University of Montpellier, France, and, since September 2015, a visiting scholar at Stanford University. He is a researcher at the Laboratory of Informatics, Robotics, and Microelectronics of Montpellier (LIRMM), working on (biomedical/agronomical) ontologies, semantic data indexing and annotation, the Semantic Web, text mining, and knowledge representation. Dr. Jonquet obtained his PhD in Informatics from the same university in 2006 (on multi-agent systems, grid, and service-oriented computing). He then served as a postdoc for 3 years at Stanford BMIR in Prof. Mark A. Musen's group, where he worked on the semantic annotation of biomedical data using biomedical ontologies in the context of the National Center for Biomedical Ontology (NCBO) project. He contributed actively to the design, evolution, and development of the NCBO BioPortal and won 1st prize at the ISWC Semantic Web Challenge 2010.

3 Track on Social Network and Media Analysis and Mining (SNMAM 2017)

Online social networks are web platforms that provide a variety of services. Users may share locations and community activities, post and tag photos and other media content, and contact individuals with similar interests. The rapid growth of social networks, together with the rapid increase in social media consumption and production, has made the analysis of social media and networks a hot topic among academic researchers and industry practitioners alike. SIMBig has become an important venue that has attracted computer scientists, computer engineers, software engineers, and application developers from around the world. Within the general symposium, the Social Network and Media Analysis and Mining (SNMAM) track provided a forum that brought together researchers and practitioners to discuss research trends and techniques related to social networks and media.

Topics of Interest

We included all the important topics related to social network and media analysis and mining within SNMAM. The topics suitable for SNMAM included:

• Data modeling for social networks and social media
• Dynamics and evolution of social networks
• Topological, geographical, and temporal analysis of social networks
• Privacy and security in social networks
• Pattern analysis in social networks
• Crowdsourcing of network data generation and collection
• Community structure analysis in social networks
• Link prediction and recommendation systems
• Propagation and diffusion of information in social networks
• Location-based social networks
• Mobile computing and applications on social networks
• Modeling of user behavior and interaction in social networks
• Information retrieval in social network and media services
• Business and political impact in social network and media analysis
• Monitoring social networks and media
• Analysis of the relationship between social media and traditional media
• Exploratory and visual data mining of social networks and media data
• Ethics and privacy in social network and media services
• Big data issues in social network and media analysis

4 Track on Applied Natural Language Processing (ANLP 2017)

The availability and size of textual information have grown dramatically in different areas, such as academic, professional, and personal settings. Emails, working papers, scientific articles, and social media publications are some examples of large sources of data expressed in natural language. This raises a challenge, since natural language is a type of unstructured data that contains ambiguity, among other properties that make it harder to process. In this context, companies and organizations show a growing interest in improving the accessibility of information and its exploitation in different environments. For all these reasons, applications of Natural Language Processing have become very important today. SIMBig has become an important meeting point for computer scientists, computer engineers, software engineers, and application developers from around the world. The Applied Natural Language Processing (ANLP) track of SIMBig provided a forum that brought together researchers and practitioners to discuss research trends and techniques related to Natural Language Processing.

Topics of Interest

We included all the important topics related to applied natural language processing within ANLP. The topics suitable for ANLP included:

• Machine Translation
• Sentiment Analysis/Opinion Mining
• Automatic Summarization
• Plagiarism Detection
• Language Detection
• Natural Language Generation
• Natural Language Interfaces
• NLP in Informal Texts
• Question-Answering Systems
• Content Analysis
• NLP for Education
• NLP for Low-Resource Languages
• Bio-NLP
• Dialogue Systems
• Information Retrieval and Extraction
• Event Detection
• Text Classification
• Multilingual NLP
• Ontology-based NLP

5 Sponsors

We want to thank our wonderful sponsors! We extend our sincere appreciation to our sponsors, without whom our symposium would not be possible. They showed their commitment to making our research communities more active. We invite you to support these community-minded organizations.

5.1 Organizing Institutions

• Universidad del Pacífico, Perú⁸
• University of Florida, USA⁹
• Universidad Andina del Cusco, Perú¹⁰

5.2 Collaborating Institutions

• Springer¹¹
• Banco de Crédito del Perú¹²
• Escuela de Post-grado de la Pontificia Universidad Católica del Perú¹³

5.3 SNMAM Organizing Institutions

• Instituto de Ciências Matemáticas e de Computação, USP, Brasil¹⁴
• Laboratório de Inteligência Computacional, ICMC, USP, Brasil¹⁵
• Universidade Federal de São Carlos, Brasil¹⁶

5.4 ANLP Organizing Institutions

• Universidad Nacional Mayor de San Marcos, Perú¹⁷
• Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada, PUCP, Perú¹⁸
• Instituto de Ciências Matemáticas e de Computação, USP, Brasil
• Universidade Federal de São Carlos, Brasil

⁸ http://www.up.edu.pe/
⁹ http://www.ufl.edu/
¹⁰ http://www.uandina.edu.pe/
¹¹ http://www.springer.com/la/
¹² https://www.viabcp.com/wps/portal/
¹³ http://posgrado.pucp.edu.pe/la-escuela/presentacion/
¹⁴ http://www.icmc.usp.br/Portal/
¹⁵ http://labic.icmc.usp.br/
¹⁶ http://www2.ufscar.br/home/index.php
¹⁷ http://www.unmsm.edu.pe/
¹⁸ http://inform.pucp.edu.pe/~grpiaa/

References

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2014. Proceedings of the 1st Symposium on Information Management and Big Data - SIMBig 2014, Cusco, Peru, September 8-10, 2014, volume 1318 of CEUR Workshop Proceedings. CEUR-WS.org. http://ceur-ws.org/Vol-1318.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2015. Proceedings of the 2nd Annual International Symposium on Information Management and Big Data - SIMBig 2015, Cusco, Peru, September 2-4, 2015, volume 1478 of CEUR Workshop Proceedings. CEUR-WS.org. http://ceur-ws.org/Vol-1478.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas, editors. 2016. Proceedings of the 3rd Annual International Symposium on Information Management and Big Data - SIMBig 2016, Cusco, Peru, September 1-3, 2016, volume 1743 of CEUR Workshop Proceedings. CEUR-WS.org. http://ceur-ws.org/Vol-1743.

Juan Antonio Lossio-Ventura and Hugo Alatrista-Salas. 2017. Information Management and Big Data: Second Annual International Symposium, SIMBig 2015, Cusco, Peru, September 2-4, 2015, and Third Annual International Symposium, SIMBig 2016, Cusco, Peru, September 1-3, 2016, Revised Selected Papers, volume 656. Springer. https://doi.org/10.1007/978-3-319-55209-5.