=Paper= {{Paper |id=Vol-1942/preface |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-1942/preface.pdf |volume=Vol-1942 }} ==None== https://ceur-ws.org/Vol-1942/preface.pdf
                               Workshop Notes




                          The 3rd international workshop on

 Advances in Bioinformatics and Artificial
   Intelligence: Bridging the Gap (BAI)
                     Melbourne, Australia, August 20, 2017
                http://bioinfo.uqam.ca/IJCAI_BAI2017/

Editors:

Name                                       Affiliation
Wajdi Dhifli                               University of Lille (France)
                                           wajdi.dhifli@univ-lille2.fr
                                           https://sites.google.com/site/wajdidhifli/
Abdoulaye Baniré Diallo                    University of Quebec at Montreal (Canada)
                                           diallo.abdoulaye@uqam.ca
                                           http://labo.bioinfo.uqam.ca
Engelbert Mephu Nguifo                     University Clermont Auvergne (France)
                                           mephu@isima.fr
                                           http://www.isima.fr/mephu
Mohammed Javeed Zaki                       Rensselaer Polytechnic Institute, NY (USA)
                                           zaki@cs.rpi.edu
                                           http://www.cs.rpi.edu/~zaki/
                                       Preface
The goal of the Bioinformatics and Artificial Intelligence (BAI) workshop is to bring
together active scholars and practitioners at the frontiers of Artificial Intelligence (AI) and
Bioinformatics. AI offers a large repertoire of algorithms and methods that constitute
the core of many topics in bioinformatics and computational biology research. The goals of
BAI are twofold:
     - How can AI techniques contribute to bioinformatics research?
     - How can bioinformatics research raise new fundamental questions in AI?
Contributions clearly address at least one of these goals, focusing on AI techniques as
well as on biological problems.

Aims and Scope:

AI has played an increasingly important role in the analysis of sequence, structure and
functional patterns or models from sequence databases. Bioinformatics aims to store,
organize, explore, extract, analyze, interpret, and utilize information from biological data.
The main aim of this workshop is to present the latest results in this exciting area at the
intersection of biology and AI.

AI approaches can revolutionize bioinformatics and computational biology, with
discoveries in basic biology, evolution, metagenomics, systems biology, regulatory
genomics, population genomics and diseases, structural bioinformatics, protein docking,
next-generation sequencing (NGS) data processing, chemoinformatics, etc.

Bioinformatics provides opportunities for developing novel AI methods. Some of the grand
challenges in bioinformatics include protein structure prediction, homology search,
epigenetics, multiple alignment and phylogeny construction, genomic sequence analysis,
gene finding and gene mapping, as well as applications in gene expression data analysis,
drug discovery in the pharmaceutical industry, etc.

Two questions were at the heart of this workshop:
   - How can AI techniques contribute to Bioinformatics research, and in particular
      to solving biological problems?
   - How can Bioinformatics raise new fundamental research problems for AI?

This one-day workshop aims at bringing together scholars and practitioners active in
Artificial Intelligence driven Bioinformatics, to present and discuss their research, share
their knowledge and experiences, and discuss the current state of the art and the future
improvements to advance the intelligent practice of computational biology.

Workshop Topics:

Topics of interest lie at the intersection of AI and Bioinformatics. They include, but are not
limited to, the following inter-linked topics:

Artificial Intelligence:
     - Constraints, satisfiability and search
     - Knowledge representation, reasoning and logic
     - Machine learning and data mining
     - Planning and scheduling
     - Agent-based and multi-agent systems
     - Web and knowledge-based information systems
     - Natural language processing
     - Uncertainty

Bioinformatics:
    - Comparative genomics
    - Evolution and phylogenetics
    - Epigenetics
    - Functional genomics
    - Genome organization and annotation
    - Genetic variation analysis
    - Metagenomics
    - Pathogen informatics
    - Population genetics, variation and evolution
    - Protein structure and function prediction and analysis
    - Proteomics
    - Sequence analysis
    - Systems biology and networks


Workshop Contributions:

This year, each paper submitted to the workshop was carefully peer-reviewed by at least
three members of the program committee. Of the 10 submissions, the 5 papers with the
highest scores were selected. We would like to thank all the PC members and the reviewers
for their reviews, as well as all the authors for their contributions. The workshop followed a
one-day format with two keynote speakers and five oral presentations.

Keynote Speakers:
The first keynote speaker was Dr. Saman Halgamuge, Professor and Director of the Research
School of Engineering at the Australian National University, Canberra, Australia. His talk
was entitled: « Unsupervised Deep Learning: Applications in Metagenomics, Metabolomics
and Drug Characterisation ».

Most existing Deep Learning methods rely on the assumption that all the class labels needed
to apply Supervised Learning are available. Although these types of learning algorithms can
be generalized, their predictive power is heavily constrained when only partial information
about a problem is available. For example, the classes that are available to a classifier are
assumed to be ground truth, and their correctness is not generally questioned. In contrast to
this approach, we propose a learning framework where the number of classes within a dataset
does not need to be known a priori and, more specifically, the entire set of class labels is
not required at the time of training. Instead, we propose to develop a method that will be
able to infer the number of classes based only on the data and generate a more representative
set of classes to train a robust classifier. Furthermore, we will also relax the assumption
that these class labels are ground truth, and allow a degree of uncertainty in their
correctness. An interesting solution for a subclass of these problems is Positive Unlabelled
Learning.

Applying data analytics to microbial ecology has direct benefits for the design of vaccines
and treatments for emerging pathogens, such as the Zika virus. In metagenomic applications,
very little may be known, since we have curated information pertaining to less than 2% of
microbial diversity, and far less for novel variants of viruses. It is therefore not realistic
to assume that one can access all the true underlying (organism) classes of any available data
when analysing these organisms. Moreover, if we also consider the different and unknown number
of effects that viral mutants can have on different hosts, and that these mutations could be
linked to several environmental or geographical factors, we arrive at a complex,
heterogeneous data set where labels are mostly unavailable, or where any pre-existing labels
may be incorrect or not applicable to emerging viral strains. Even so, all these different
types of data are essential to building a near-complete picture of the problem and
understanding these pathogens at a deeper, more intimate level. Statistical analysis of DNA
sequence data has previously assisted us in identifying features that may further be used to
discriminate species in a sample of multiple organisms using unsupervised learning methods.
Methods for increasing the resolution in realising the microbial population structure in a
metagenomic sample are being worked on, and coupling known data with unsupervised learning
has been found to be useful.

Repositioning existing drugs as appropriate medication for medical conditions with which
they were not previously associated can reduce the time, costs and risks of drug development
by identifying new therapeutic effects. Investigating and understanding the interactions
between drugs, as well as how they work on our body, is important in improving the
effectiveness of clinical care. A method based on Positive Unlabelled Learning and Growing
Self Organising Maps was applied to data available in the DrugBank database. It was possible
to infer 589 drug pairs that are likely not to interact with each other. Unsupervised Deep
Learning also contributes to working with multielectrode array data.

The second keynote speaker was Dr. Shoba Ranganathan, Professor of Bioinformatics at
the Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney,
Australia. Her talk was entitled: « A protocol for finding missing proteins ». In the quest to
uncover the entire human proteome, finding "missing proteins" remains the Holy Grail of
scientists. In order to capture existing information, in addition to high-stringency MS data,
we have launched the MissingProteinPedia (MPP; missingproteins.org), as an integrative
biological database. While MPP incorporates automated data collection, novel tools for
functional annotation and collated publications, there is an urgent need to identify a protocol
for evaluating MPP data, to facilitate missing protein annotation jamborees. We will present
how best to evaluate "extraordinary evidence" for missing proteins, with some exciting data
confirming successful uncovering of some missing proteins.

Oral Presentations:

The accepted papers were presented during the workshop.

Workshop Program:

Time            Event

 8:00 -  9:00   Registration and opening

 9:00 - 10:00   Keynote speaker 1: Prof. Saman Halgamuge
                Unsupervised Deep Learning: Applications in Metagenomics,
                Metabolomics and Drug Characterisation.

10:00 - 10:30   Coffee Break

10:30 - 11:00   Oral presentation: Qingyu Chen, Xiuzhen Zhang, Yu Wan, Justin
                Zobel and Karin Verspoor.
                Sequence Clustering Methods and Completeness of Biological
                Database Search.

11:00 - 11:30   Oral presentation: Aidan O'Brien, Piotr Szul, Oscar Luo, Andrew
                George, Robert Dunne and Denis Bauer.
                Breaking the curse of dimensionality for machine learning on genomic
                data.

11:30 - 12:00   Oral presentation: Alexandre Bazin, Didier Debroas and Engelbert
                Mephu Nguifo.
                A De Novo Robust Clustering Approach for Amplicon-Based Sequence
                Data.

12:00 - 14:00   Lunch

14:00 - 15:00   Keynote speaker 2: Prof. Shoba Ranganathan
                A protocol for finding missing proteins.

15:00 - 15:30   Oral presentation: Borut Budna, Martin Gjoreski, Anton Gradišek
                and Matjaz Gams.
                JSI Sound – a machine-learning tool in Orange for simple biosound
                classification.

15:30 - 16:00   Oral presentation: Isabelle Bichindaritz and Thomas Quinn.
                Feature Selection and Deep Learning for Survival Analysis.

16:00 - 16:30   Coffee Break

16:30 - 17:00   Concluding remarks

Program Committee:
Sabeur Aridhi, University of Lorraine (France)
Enrico Coiera, Macquarie University (Australia)
Annie Chateau, University of Montpellier 2 (France)
Sergio Vale Aguiar Campos, Federal University of Minas Gerais (Brazil)
Elisabetta De Maria, University of Nice Sophia-Antipolis (France)
Wajdi Dhifli, University of Lille (France)
Abdoulaye Baniré Diallo, University of Quebec at Montreal (Canada)
Mohamed Elati, University of Evry-Val-d’Essonne (France)
Anna Gambin, University of Warsaw (Poland)
Simon de Givry, INRA Toulouse (France)
Tu-Bao Ho, JAIST School of Knowledge Science (Japan)
Attila Kertesz-Farkas, National Research University Higher School of Economics (Russia)
Chung-Shou Liao, National Tsing Hua University (Taiwan)
Frédérique Lisacek, Swiss Institute of Bioinformatics, Geneva (Switzerland)
Mondher Maddouri, Taibah University (Saudi Arabia)
Osamu Maruyama, Kyushu University (Japan)
Engelbert Mephu Nguifo, University Clermont Auvergne (France)
Claire Nedellec, INRA Jouy-en-Josas (France)
Dave Ritchie, INRIA-LORIA, University Henri Poincaré, Nancy (France)
Sushmita Roy, University of Wisconsin, Madison (USA)
Marcilio de Souto, University of Orleans (France)
Dechang Xu, Harbin Institute of Technology (China)
Mohammed J. Zaki, Rensselaer Polytechnic Institute, NY (USA)


Acknowledgements:
We would also like to thank all authors for contributing to our workshop and for their great
presentations at the workshop. Furthermore, we thank all reviewers and subreviewers for
their time and efforts in helping us build an interesting program.



                                                             The Workshop chairs
                                                             Wajdi Dhifli
                                                             Abdoulaye Baniré Diallo
                                                             Engelbert Mephu Nguifo
                                                             Mohammed Javeed Zaki