<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research Summary: Knowledge Transfer in Artificial Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierre-Alexandre Murena</string-name>
          <email>murena@telecom-paristech.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Telecom ParisTech - Universite Paris Saclay</institution>
          ,
          <addr-line>46 rue Barrault, 75013 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>204</fpage>
      <lpage>208</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This document presents the main research problems addressed during my PhD
studies. All this research is carried out within two teams, DBWeb at Telecom
ParisTech and LInK (Learning and Integration of Knowledge) at AgroParisTech, both
located in Paris, under the supervision of Prof. Jean-Louis Dessalles and Prof. Antoine
Cornuejols.</p>
      <p>My research focuses on learning theory, both from the perspective of symbolic
machine learning and of learning in continuous domains. I aim at finding an
information-theoretic principle guiding information transfer in learning.</p>
      <p>The starting point of this research is the idea that most machine learning
takes a strong stationarity hypothesis for granted. The general framework of
statistical learning (mainly supervised and semi-supervised learning) considers two
data sets: a learning data set (from which the concepts have to be learned) and a
test data set (on which the quality of the learned concepts is evaluated). The key idea
of current learning methods and theories is to assume that training data and test
data are independent and identically distributed (i.i.d.). However, this strong
hypothesis does not hold in many cases: either the data generation process evolves
over time (aging effect, trending effect...) or the data belong to a different
domain. Because similar questions of transfer and domain adaptation had already
been addressed in analogical reasoning, we proposed to use an approach based
on Kolmogorov complexity instead of probabilities. Kolmogorov complexity is a
measure of the information contained inside an object. The use of Kolmogorov
complexity in machine learning is accepted by the community, but mainly from
a stationary point of view (when the key concept does not vary); we proposed
to extend its use to non-stationary environments, in the same way as done in
analogical reasoning. A presentation of these issues is given in section 2.</p>
      <p>The strong similarity between transfer learning and analogical reasoning led
me to consider this issue in my research. Analogical reasoning deals with
situations of the form "'b' is to 'a' as 'd' is to 'c'". Because its value has already been
demonstrated, I focus on Hofstadter's micro-world, made up of strings of
alphabetical characters that can be described with simple concepts like 'predecessor',
'successor' or 'repetition'. The use of Kolmogorov complexity for analogical
reasoning had already been considered, but our approach is slightly different. We
developed a small descriptive language for Hofstadter's problems and converted it
into a binary code, the length of which corresponds to Kolmogorov complexity.
Our work on analogy is presented in section 3. Finally, because of its very global
perspective, our research topic leads naturally to collaborations on various topics
related to learning. This side aspect of my research is presented in section 4.</p>
      <p>Copyright © 2017 for this paper by its authors. Copying permitted for private and
academic purposes. In Proceedings of the ICCBR 2017 Workshops. Trondheim, Norway.</p>
    </sec>
    <sec id="sec-2">
      <title>A global approach to learning</title>
      <p>Statistical machine learning in its current form is often considered to be based
mainly on three inductive principles: Empirical Risk Minimization (ERM),
Bayesianism and Minimum Description Length (MDL). The validity of ERM has been
demonstrated under the strong i.i.d. assumption in several frameworks, all
inspired by the Probably Approximately Correct (PAC) learning framework. Besides,
strong links have been established between Bayesianism and MDL.</p>
      <p>
        When the i.i.d. hypothesis is not verified, these three principles are no longer
valid and have to be replaced by new principles. Exploring this direction, we
considered the most straightforward transfer learning problem, namely the
classification problem, the purpose of which is to associate data to labels. Given source
data X_S associated to their classes Y_S and target data X_T, we aim at finding
the corresponding classes Y_T. The idea is to find a classification function f_S such
that f_S(X_S) = Y_S and to transfer this function into a classification function f_T
applicable to the target data X_T. Because this problem is the same as analogical
reasoning, we used a simplification of the general MDL principle in the context
of analogy [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The mathematical tool used to measure the description length in MDL is
Kolmogorov complexity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The Kolmogorov complexity (also called algorithmic
complexity) of an object x is an information-theoretic measure defined as the minimal
length of a program, defined on a Universal Turing Machine (UTM), the
output of which is the object x. This quantity can be shown to be machine-independent
(up to an additive constant), but not computable.
      </p>
      <p>The idea we developed is to consider an upper bound of this complexity
based on a restricted Turing machine. The choice of a restricted Turing
machine corresponds to an inductive bias, inevitable in any inductive reasoning
(see for example the no-free-lunch theorem). This choice also raises the problem
of machine dependency, which seems crucial in human learning.</p>
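      <p>As a concrete illustration of this idea, the length of a compressed encoding of a string is a computable upper bound on its Kolmogorov complexity (up to an additive constant). The sketch below uses an off-the-shelf compressor in place of a restricted machine; the choice of zlib here is an illustrative assumption, not the restricted machine used in our work, but it plays the same role of inductive bias.</p>

```python
import zlib

def complexity_upper_bound(s: str) -> int:
    """Computable upper bound on K(s): length in bits of a zlib encoding.
    The compressor plays the role of the restricted machine (inductive bias)."""
    return 8 * len(zlib.compress(s.encode("utf-8"), 9))

# A highly regular string compresses far better than a random-looking one:
regular = "AB" * 100
irregular = "qZ3k9fLw1xPo7vRb2nTy8sGd4hJm6cQe0uIa5"
print(complexity_upper_bound(regular))    # small: the repetition is captured
print(complexity_upper_bound(irregular))  # close to the raw length in bits
```

      <p>The bound is loose but machine-dependent by construction, which is exactly the phenomenon discussed above: a different compressor would rank some strings differently.</p>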
      <p>
        Our first contribution is a direct use of analogical MDL in the context of
transfer learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: I presented a two-part MDL equation based on a data
representation called a model. A model is any object which may be used to compress
data. In transfer learning, our purpose is to infer a source model M_S and a target
model M_T:
      </p>
      <p>min over M_S, M_T of</p>
      <p>C(M_S) + C(X_S | M_S) + C(Y_S | M_S, X_S) + C(M_T | M_S) + C(X_T | M_T)   (1)
where C(·) designates Kolmogorov complexity, X the data, Y_S the source labels
and M the models. This equation applies both to continuous data (X is a matrix,
the rows of which correspond to vector data points) and to symbolic data (X is
a sequence of symbols, for example a sequence of letters). The different terms
in the equation present a strong similarity with usual machine learning terms:
C(M) corresponds to a model penalization based on its complexity; C(X|M)
corresponds to a likelihood term, i.e. a fitness of the model toward the data; and
the term C(Y|M,X) corresponds to an empirical risk. In the paper, we also
proposed experimental validations on two toy data sets with a simple
prototype-based model. They present good results and high performance on these data
sets.</p>
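      <p>The stationary part of the objective, C(M) + C(X|M) + C(Y|M,X), can be sketched for a toy prototype-based model. All coding choices below (fixed precision per coordinate, logarithmic residual cost, labels coded as exceptions) are illustrative assumptions, not the coding scheme of the paper; the sketch only shows how the three terms trade off against each other.</p>

```python
import math

def mdl_score(prototypes, proto_labels, X, Y, bits_per_coord=16):
    """Two-part MDL score C(M) + C(X|M) + C(Y|M,X) for a prototype model.
    Illustrative coding: coordinates at fixed precision, residuals with a
    logarithmic cost, labels as exceptions to the nearest prototype's label."""
    d = len(X[0])
    # C(M): k prototypes, each with d coordinates and one label bit
    c_model = len(prototypes) * (d * bits_per_coord + 1)
    c_data = 0.0    # C(X|M): nearest-prototype index + residual
    c_labels = 0.0  # C(Y|M,X): one unit per label exception
    for x, y in zip(X, Y):
        dists = [math.dist(x, p) for p in prototypes]
        j = min(range(len(prototypes)), key=dists.__getitem__)
        c_data += (math.log2(len(prototypes)) if len(prototypes) > 1 else 0.0)
        c_data += d * math.log2(1 + dists[j])
        c_labels += 0.0 if proto_labels[j] == y else 1.0
    return c_model + c_data + c_labels

# Two well-separated classes: a two-prototype model pays more for C(M) but
# describes data and labels so much more cheaply that it wins overall.
X = [(i * 0.01, 0.0) for i in range(20)] + [(5 + i * 0.01, 5.0) for i in range(20)]
Y = [0] * 20 + [1] * 20
one_proto = mdl_score([(2.5, 2.5)], [0], X, Y)
two_protos = mdl_score([(0.1, 0.0), (5.1, 5.0)], [0, 1], X, Y)
print(two_protos < one_proto)
```

      <p>Adding a third prototype would decrease the residual term only marginally while paying the full C(M) increment, which is how the penalization term controls model selection.</p>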
      <p>
        A direct variant of this formula has been proposed for incremental
learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In incremental learning, the system faces a sequence of questions X_1, X_2, ...
and has to find the solution to each problem one by one. The model used to
describe the data can be updated if it is outdated and does not correspond to the
current data anymore. We propose the following simplified MDL objective:
      </p>
      <p>min over M of</p>
      <p>Σ_t C(M_t | M_{σ_t⁻¹({1})}) + C(X_t | M_t) + C(σ_t | M_t, X_t) + C(Y_t | M_t, X_t, σ_t)   (2)
where σ_t is a model association function such that σ_t(u) = 1 if model M_t can
be described with model M_u, and σ_t(u) = 0 otherwise. The consistency of our
framework with existing heuristic state-of-the-art methods has been established,
as well as the validity of a naive algorithm based on the same neural model as
presented for transfer learning.</p>
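      <p>The greedy decision behind such an objective can be sketched as follows: at each step, the current model is kept as long as coding the new batch with it stays cheaper than paying the description cost of a freshly fitted model. The cost functions below are illustrative assumptions, not the coding scheme of the paper; the point is that an update happens exactly when the data drift.</p>

```python
import math

def incremental_update(model_cost, code_length, fit_model, batches):
    """Greedy local optimization in the spirit of a sum-over-time MDL
    objective: at step t, keep the current model M while C(X_t | M) is
    cheaper than paying C(M') + C(X_t | M') for a fresh model M'."""
    current, history = None, []
    for batch in batches:
        candidate = fit_model(batch)
        new_cost = model_cost(candidate) + code_length(batch, candidate)
        if current is None or code_length(batch, current) > new_cost:
            current = candidate  # drift detected: the old model is outdated
        history.append(current)
    return history

# Toy 1-D stream with a mean model and a logarithmic residual cost. The
# stream drifts at the third batch, where the model is updated exactly once.
fit_model = lambda batch: sum(batch) / len(batch)
code_length = lambda batch, m: sum(2 * math.log2(1 + abs(v - m)) for v in batch)
model_cost = lambda m: 16.0
stream = [[0.1, 0.2, 0.0], [0.0, 0.1, 0.2], [10.1, 10.0, 9.9], [10.2, 10.0, 10.1]]
models = incremental_update(model_cost, code_length, fit_model, stream)
print(models)
```

      <p>This illustrates why only local optimizations are possible in the incremental setting: the decision at step t never revisits earlier batches.</p>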
      <p>The successful results obtained with MDL so far encourage future research
tracks. Several problems emerge from the developed framework. First, the
transfer objective (1) is valid for one target only. In practice, several target problems
may occur, hence a multi-target variant of transfer has to be given. In
particular, the i.i.d. hypothesis amounts to assuming infinitely many targets with specific
distributions. I am currently exploring an approach based on Pareto optimality,
implying a prior over the future and thus a new learning concept: concern for
future questions.</p>
      <p>Another question of interest is the theoretical validity of such approaches.
Unlike statistical learning, which has a clear measure of quality (given by the
risk), an approach based on MDL does not present any natural quality measure.
Such a function has to be found before an equivalent of PAC learning can be
developed. Additionally, incremental learning methods do not have access to the
whole objective function (2) at all steps: only local optimizations are possible.
A measure of the impact of this algorithmic simplification appears as a direct
consequence. Finally, we aim at finding a geometric interpretation of these
equations. An interesting track is offered by the domain of information geometry and
probability distribution manifolds. Under some specific conditions, Kolmogorov
complexity may be associated with a probability distribution, hence considered
from the perspective of information geometry.</p>
    </sec>
    <sec id="sec-3">
      <title>Approaches to analogy</title>
      <p>
        Because of its crucial role in my research on learning, I attach great
importance to studying analogical reasoning. Up to now, my research has focused on
Hofstadter's micro-world [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which presents highly general characteristics of
analogical thought. I will work on this research track with Dr. Laurent Orseau.
      </p>
      <p>
        Preliminary work has already given insightful results and promising
perspectives. I chose to work on the development of a small prototype language
generating Hofstadter's analogies (of the form ABC : ABD :: IJK : IJL,
which has to be read "ABD is to ABC what IJL is to IJK"). Among other
specifications, the proposed language had to be generative, i.e. describe a
dynamic generation rather than a static description (as opposed to the description
in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for instance). For example, the string ABC will be generated by the
program alphabet,sequence,3, which can be read as "Consider the alphabet and
take the sequence of its first three letters". Once such a language is defined, it is
turned into a prefix-delimited binary code, the length of which measures an
upper bound of Kolmogorov complexity.
      </p>
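      <p>To make the idea concrete, here is a minimal sketch of such a generative language. The token vocabulary and its semantics are hypothetical simplifications (the actual language is richer); with a fixed-length code over 8 tokens, every token costs 3 bits, and a fixed-length code is trivially prefix-delimited.</p>

```python
import math
import string

# Hypothetical toy vocabulary; the actual language of the paper is richer.
TOKENS = ["alphabet", "sequence", "successor", "repeat", "1", "2", "3", "4"]
BITS_PER_TOKEN = math.ceil(math.log2(len(TOKENS)))  # fixed-length prefix code

def run(program):
    """Interpret a token program, e.g. ['alphabet', 'sequence', '3'] -> 'ABC'."""
    it, base, out = iter(program), string.ascii_uppercase, ""
    for tok in it:
        if tok == "alphabet":
            base = string.ascii_uppercase
        elif tok == "sequence":   # take the first n letters of the base
            out = base[:int(next(it))]
        elif tok == "successor":  # replace the last letter by its successor
            out = out[:-1] + chr(ord(out[-1]) + 1)
    return out

def complexity(program):
    """Code length in bits: an upper bound on Kolmogorov complexity."""
    return len(program) * BITS_PER_TOKEN

print(run(["alphabet", "sequence", "3"]))               # -> ABC
print(run(["alphabet", "sequence", "3", "successor"]))  # -> ABD
print(complexity(["alphabet", "sequence", "3"]))        # -> 9 bits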
      <p>
        A more elaborate and general version of the language has recently been
proposed. This memory-based language offers flexibility in the management of
the user's prior knowledge and offers a simple way to set priorities on operations:
its grammar enables any possible operator, as long as the operator can be put
in long-term memory. The complexity of an element in memory is defined as the
complexity of its depth in memory. A more rigorous presentation of this new
framework, including the considerations on memory, will be presented at the ICCBR
2017 Computational Analogy Workshop [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>We propose a validation of our approach with a human experiment. In an
online survey, we submitted a few of Hofstadter's problems and asked participants
for their most intuitive solution. We showed that the majority answers correspond
to local minima of Kolmogorov complexity. These results are not yet published.</p>
      <p>The proposed approach offers a tool to compare two results of an analogical
problem when the generative instructions are given to the system. A first logical
direction is to provide automatic instruction generation, hence software able to
produce an optimal generative instruction for any complete analogy. Once this
is done, I will have to find a solution to an analogical problem (e.g. find the best
solution to ABC:ABD::IJK:?): because the space of solutions is infinite, it
cannot be explored naively, and search biases have to be found. In order to
address these issues, I am currently working on a Python interpreter for the
developed language.</p>
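      <p>A hedged sketch of what naive instruction generation could look like: enumerate programs of a toy language (hypothetical tokens, not the interpreter under development) in order of increasing code length and return the first that generates the target string. The combinatorial blow-up it suffers is precisely why stronger search biases are needed.</p>

```python
from itertools import product
import string

TOKENS = ["alphabet", "sequence", "successor", "1", "2", "3"]

def run(program):
    """Minimal toy interpreter (hypothetical language, for illustration only)."""
    it, base, out = iter(program), None, ""
    for tok in it:
        if tok == "alphabet":
            base = string.ascii_uppercase
        elif tok == "sequence":   # take the first n letters of the base
            out = base[:int(next(it))]
        elif tok == "successor":  # replace the last letter by its successor
            out = out[:-1] + chr(ord(out[-1]) + 1)
        else:
            raise ValueError("stray token")
    return out

def shortest_program(target, max_len=4):
    """Enumerate programs by increasing length (hence increasing code length
    under a fixed-length token code); return the first generating `target`."""
    for n in range(1, max_len + 1):
        for prog in product(TOKENS, repeat=n):
            try:
                if run(list(prog)) == target:
                    return list(prog)
            except (ValueError, TypeError, IndexError, StopIteration):
                continue  # ill-formed program: skip it
    return None

print(shortest_program("ABD"))  # -> ['alphabet', 'sequence', '3', 'successor']
```

      <p>Even in this six-token language, the search space grows exponentially with program length, so any realistic solver must prune it with biases on the operators.</p>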
    </sec>
    <sec id="sec-4">
      <title>Collaborations and side problems</title>
      <p>In the context of my research, I have the opportunity to collaborate on several
projects related to non-stationarity and transfer.</p>
      <p>
        A first project is led jointly with Dr. Jeremie Sublime (ISEP) and Dr. Basarab
Matei (Universite Paris 13) and concerns collaborative clustering. Classic
clustering consists in grouping similar data together in clusters. Collaborative and
multi-view clustering is a framework in which several clustering algorithms are
involved and try to influence each other. The algorithms do not produce the
same number of clusters, the same underlying model, nor the same final solution.
This problem is closely related to transfer because it involves sharing
information between several different domains. A first contribution has been proposed
using operations research tools to select relevant collaborators in an existing
probabilistic framework [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A brand new approach, based on complexity, is
being developed: we expressed the problem of collaborative clustering in terms
of data compression and worked on minimal assumptions to obtain a tractable
model. This new perspective offers a theoretical background for a wide range of
heuristic state-of-the-art approaches and inspires a new algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
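      <p>The compression view of clustering can be illustrated with a small sketch: describe each point by its cluster index plus the residual to the cluster's center, and compare total code lengths. The coding costs below are illustrative assumptions, not the scheme developed in our work; the sketch only shows that grouping similar data yields a shorter description.</p>

```python
import math

def clustering_code_length(X, assignment, centers):
    """Code length of 1-D data described by a clustering: one cluster index
    per point plus a logarithmic cost for the residual to its center."""
    k = len(centers)
    index_cost = len(X) * (math.log2(k) if k > 1 else 0.0)
    residual_cost = sum(2 * math.log2(1 + abs(x - centers[c]))
                        for x, c in zip(X, assignment))
    return index_cost + residual_cost

X = [0.0, 0.1, 0.2, 9.9, 10.0, 10.1]
good = clustering_code_length(X, [0, 0, 0, 1, 1, 1], [0.1, 10.0])
bad = clustering_code_length(X, [0, 1, 0, 1, 0, 1], [0.1, 10.0])
print(good < bad)  # grouping similar data compresses better
```

      <p>In a collaborative setting, exchanging assignments between algorithms can then be framed as trading description length between views.</p>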
      <p>A related new collaboration has recently been initiated with Dr. Cristina
Manfredotti, Prof. Juliette Dibie (AgroParisTech) and Dr. Fatiha Sais (LRI),
exploring common approaches in transfer learning and structural mappings in
knowledge bases (ontology alignment).</p>
      <p>
        Finally, I am exploring the problem of transfer learning using boosting
algorithms with Prof. Antoine Cornuejols and Sema Akkoyunlu. The use of boosting
may offer new perspectives on transfer in general and be beneficial to my
understanding of transfer mechanisms. A first contribution has been submitted [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Cornuejols, A., Akkoyunlu, S., Murena, P.A., Olivier, R.: Transfer learning by boosting projections from the target domain to the source domain, submitted to NIPS 2017</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Cornuejols, A., Ales-Bianchetti, J.: Analogy and induction: which (missing) link? In: Workshop "Advances in Analogy Research: Integration of Theory and Data from Cognitive, Computational and Neural Sciences". Sofia, Bulgaria (1998)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Hofstadter, D.: The Copycat project: An experiment in nondeterminism and creative analogies. AI Memo 755, Artificial Intelligence Laboratory, Massachusetts Institute of Technology (1984)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Li, M., Vitanyi, P.M.: An Introduction to Kolmogorov Complexity and Its Applications. Springer Publishing Company, Incorporated, 3rd edn. (2008)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Murena, P., Cornuejols, A.: Minimum description length principle applied to structure adaptation for classification under concept drift. In: 2016 International Joint Conference on Neural Networks, IJCNN 2016, Vancouver, BC, Canada, July 24-29, 2016. pp. 2842-2849 (2016)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Murena, P.A., Cornuejols, A., Dessalles, J.L.: Incremental learning with the minimum description length principle, accepted in International Joint Conference on Neural Networks 2017</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Murena, P.A., Dessalles, J.L., Cornuejols, A.: A complexity based approach for solving Hofstadter's analogies, accepted at Computational Analogy Workshop, ICCBR 2017</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Murena, P.A., Sublime, J., Matei, B., Cornuejols, A.: Multi-view clustering with the principle of minimum description length, submitted to ECML 2017</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Sublime, J., Matei, B., Murena, P.A.: Analysis of the influence of diversity in collaborative and multi-view clustering, accepted in International Joint Conference on Neural Networks 2017</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>