<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Solving Analogical Equations Between Strings of Symbols Using Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vivatchai Kaveeta</string-name>
          <email>vivatchai@fuji.waseda.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yves Lepage</string-name>
          <email>yves.lepage@waseda.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IPS, Waseda University</institution>
          ,
          <addr-line>2-7 Hibikino, Wakamatsu-ku, Kitakyushu-shi, 808-0135 Fukuoka-ken</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>67</fpage>
      <lpage>76</lpage>
      <abstract>
        <p>A neural network model to solve analogical equations between strings of symbols is proposed. The method transforms the input strings into two fixed-size alignment matrices. The matrices act as the input of a neural network which predicts two output matrices. Finally, a string decoder transforms the predicted matrices into the final string output. By design, the neural network is constrained by several properties of analogy. The experimental results show a fast learning rate and a high prediction accuracy that beats a baseline algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>Proportional Analogy</kwd>
        <kwd>Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Proportional analogy between sequences of symbols, be they phonemes or
characters, is stated as the relationship between four strings in the form 'A is
to B as C is to D', denoted by A : B :: C : D.<fn id="fn0"><p>This work was supported by JSPS Grant Number 15K00317 (Kakenhi C), entitled
"Language productivity: efficient extraction of productive analogical clusters
and their evaluation using statistical machine translation".</p></fn> Analogical equations are the
following problems: if three strings A, B and C are given, how to coin the fourth
string? Proportional analogies are seen at work to coin new words or new sentences. In
this work, we focus on a type of analogy called analogies of commutation.<fn id="fn1"><p>One can distinguish between four types of analogies between strings of symbols:
repetition (e.g., A : A.A :: B : B.B), reduplication (e.g., cat : caat :: dog : doog),
mirror (e.g., abc : wxyz :: cba : zyxw) and commutation (examples in the text).</p></fn> We
do not deal with semantic analogies like queen : king :: woman : man. Rather,
the computational analogy works directly on the symbolic level. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] gives an
algorithm to solve analogies of commutation on strings. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposes a
similar algorithm. Both algorithms are based on the notion of edit distance. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], this
formalization was successfully applied in the development of an analogy-based
machine translation system. In this work, we refer to these previous
formalizations in designing an appropriate structure for a neural network.
      </p>
        <p>
          Neural networks have been successfully applied to many tasks. Their main
advantage is their ability to learn from examples without predefined knowledge of
the problem. Assuming that an appropriate model structure is used, the network
can estimate the underlying structure of the problem. Although many neural
networks have been proposed for different tasks, no specific neural network seems to
have ever been proposed to solve analogical equations on strings of symbols. [
          <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
          ]
propose networks that generate new images from previous image samples
for training classification models. However, these problems are not expressed in
the form of analogical equations. In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], neural networks generate new images
by solving analogy equations between images. This is similar to our problem
of analogical equations on strings. These successful implementations point at the
possibility of developing a neural network to solve analogical equations on strings.
        </p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Method</title>
      <p>In Sect. 3.1, a method to transform input strings into matrices is introduced.
The matrices are re-sampled into fixed-size matrices in Sect. 3.2. Two filtering
methods are introduced in Sect. 3.3. The neural network is explained in Sect. 3.4.
The output matrices are decoded into a final string by the decoder in Sect. 3.5.
</p>
      <sec id="sec-2-1">
        <title>Alignment Matrices</title>
        <p>
          The usual approach for processing strings of characters with neural networks
is vector encoding. Dictionary-based one-hot vectors are used in [
          <xref ref-type="bibr" rid="ref15 ref4 ref6">4, 6, 15</xref>
          ]. For
the analogy resolution task, vectors could be built at the character level.
Unfortunately, this vector representation presents some problems. First, strings
vary in length. Second, dictionary-based vectors are language-specific. These
limitations restrict the use of one-hot vectors to fixed-length strings in a
specific language.
        </p>
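        <p>As an illustration of the encoding discussed above, here is a minimal
character-level one-hot sketch in Python (the fixed alphabet and the helper
name are our own, for illustration only); it makes both limitations visible:</p>
        <preformat>
import numpy as np

# Dictionary-based one-hot encoding at the character level.
alphabet = "abcdefghijklmnopqrstuvwxyz"   # language-specific dictionary
index = {c: i for i, c in enumerate(alphabet)}

def one_hot(s):
    v = np.zeros((len(s), len(alphabet)))
    for i, c in enumerate(s):
        v[i, index[c]] = 1.0
    return v

print(one_hot("hard").shape)    # (4, 26): the size varies with the string
</preformat>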
        <p>
          Proportional analogy can be processed by calculating similarities through
edit distances. Consequently, a representation for the similarity of strings seems
appropriate. Alignment matrices are widely adopted in the genetic sequence
alignment task [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. They encode a pair of sequences into a matrix where each cell
represents a local matching point. Figure 1a (left) shows the alignment between
the strings ‘harder’ and ‘hard’. Local matching positions with the value of 1.0
are shown as black cells. Unmatched positions with a value of 0.0 are shown as
white cells.
The alignment matrices can vary in dimension depending on the lengths of the
input strings. To feed the alignment data to a neural network, the matrices need
to be re-sampled into fixed-dimension matrices. The re-sampling uses the index
mapping I in (1) and the function f in (2):
I(w, n) = w × z / n (1)
f(s, t, u) = min(I(s + 1, u), t + 1) − max(I(s, u), t) (2)
        </p>
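        <p>To make the construction concrete, here is a minimal Python sketch of
the alignment matrix and of a re-sampling in the spirit of (1) and (2). The
function names are ours, and the max over overlapping source cells is our own
simplification, chosen because the text stresses sharp edges; it is a sketch,
not the authors' exact computation:</p>
        <preformat>
import numpy as np

def alignment_matrix(a, b):
    """Cell (i, j) is 1.0 when a[i] == b[j] (local match), else 0.0."""
    m = np.zeros((len(a), len(b)))
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            if ca == cb:
                m[i, j] = 1.0
    return m

def resample(mat, z):
    """Re-sample an m-by-n matrix to z-by-z with the mapping I(w, n) = w*z/n."""
    m, n = mat.shape
    out = np.zeros((z, z))
    for x in range(z):
        for y in range(z):
            # source cells whose image under I overlaps target cell (x, y)
            s0 = int(x * m / z)
            s1 = max(int(np.ceil((x + 1) * m / z)), s0 + 1)
            t0 = int(y * n / z)
            t1 = max(int(np.ceil((y + 1) * n / z)), t0 + 1)
            out[x, y] = mat[s0:s1, t0:t1].max()   # max keeps edges sharp
    return out

AB = resample(alignment_matrix('harder', 'hard'), 16)   # 16 x 16 matrix
</preformat>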
        <p>Input strings A and B with lengths m and n are denoted as a1a2...am and
b1b2...bn respectively. The original alignment matrix (Fig. 1, left) with
dimension m × n is re-sampled into AB with dimension z × z (Fig. 1, right). The
formula is shown in (2). The matrix AB uses a non-uniform linear
transformation on both axes. Figure 1b illustrates this. The difference with other image
re-sampling methods is shown in Fig. 1c. Our method can generate sharp edges
while keeping the diagonal line visible.
Anomalous black cells appear because of duplicated characters. As they do not
belong to any valid alignment, these cells degrade the prediction quality. We
thus introduce two filtering methods to remedy this.</p>
        <p>
          Mathematical Morphology Originally proposed in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], mathematical
morphology enhances an input image with specified filters and operations. Alignment noise
usually appears in positions off the main diagonal. An example is the cell
on the right of Fig. 1a, which matches the character 'r' in 'hard' with the last
character of 'harder'. Two 3 × 3 filters are shown in Fig. 1b (2, top). The original
matrix is filtered by both filters using grey-scale erosion. The two filtered matrices
are combined by taking their maximum values. With this procedure, black cells
appearing off the diagonal line are filtered out, while diagonal lines keep sharp
endings. Figure 1b (2, bottom) shows this.
        </p>
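        <p>A minimal sketch of this filtering step with SciPy follows. The two
diagonal structuring elements below are our assumption about the filters of
Fig. 1b, which are not recoverable from the text; only the combination by
maximum is taken from the description above:</p>
        <preformat>
import numpy as np
from scipy.ndimage import grey_erosion

# Assumed 3x3 footprints: a full diagonal and a truncated diagonal,
# so that diagonal lines survive erosion and their endings stay sharp.
f1 = np.array([[1, 0, 0],
               [0, 1, 0],
               [0, 0, 1]], dtype=bool)
f2 = np.array([[0, 0, 0],
               [0, 1, 0],
               [0, 0, 1]], dtype=bool)

def morphology_filter(mat):
    # grey-scale erosion with each footprint, combined by the maximum
    return np.maximum(grey_erosion(mat, footprint=f1),
                      grey_erosion(mat, footprint=f2))
</preformat>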
        <p>Diagonal Weight Usually, the valid alignments are located on the main
diagonal line. So, we apply a linear weighting scheme along this diagonal line.
Cells which are further away from the line are gradually less weighted. Figure 1b
(4,top) shows the weighting filter. Equation (3) denotes the value at (x, y) of
the weight-filtered matrix AB. z is the size of the matrix. Figure 1b (4,bottom)
shows the result of this filtering method.</p>
        <p>ABxy = axby × (1 − |x − y| / z) (3)</p>
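        <p>A short sketch of the diagonal weighting of (3), vectorized with
NumPy (the function name is ours):</p>
        <preformat>
import numpy as np

def diagonal_weight(mat):
    """Weight each cell by 1 - |x - y| / z, as in (3)."""
    z = mat.shape[0]
    x = np.arange(z)[:, None]
    y = np.arange(z)[None, :]
    return mat * (1.0 - np.abs(x - y) / z)
</preformat>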
      </sec>
      <sec id="sec-2-2">
        <title>Neural Network Model</title>
        <p>To design an appropriate neural network structure, we rely on the properties of
analogies. An important property is the equivalent forms of an analogy. As described
in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a single form A : B :: C : D has 7 other equivalent forms: A : C :: B : D,
B : A :: D : C, B : D :: A : C, C : A :: D : B, C : D :: A : B, D : B :: C : A,
D : C :: B : A.
        </p>
        <p>Another property is the mirroring of strings. If A : B :: C : D is an analogy,
then Ā : B̄ :: C̄ : D̄ holds too, where Ā represents the mirror of A. The mirror
of string a1a2...am is amam−1...a1. As a result, we get eight additional
equivalent equations. Equivalent prediction flows are shown in Fig. 2. The four boxes in
the corners represent the alignment matrices generated from the input strings
(mirrored versions on the right). Matrix AB is the fixed-dimension alignment matrix
built from strings A and B using the methods explained in Sect. 3.1 to 3.3.
The two input matrices are flattened and concatenated to form the input vector.
Concatenations are represented as circle connections. The merged data are
fed into the neural network. The dashed lines represent the neural network
structure, with parameters shared across all equivalent prediction flows. The
network is trained on all equivalent data flows. Note that the input
alignment matrices need to be fed in the correct orientation.</p>
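        <p>A small helper makes the sixteen equivalent flows explicit (a sketch;
the function name is ours):</p>
        <preformat>
def equivalent_forms(A, B, C, D):
    """The 8 equivalent forms of A : B :: C : D, plus their 8 mirrors."""
    base = [(A, B, C, D), (A, C, B, D), (B, A, D, C), (B, D, A, C),
            (C, A, D, B), (C, D, A, B), (D, B, C, A), (D, C, B, A)]
    mirrored = [tuple(s[::-1] for s in quad) for quad in base]
    return base + mirrored
</preformat>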
        <p>The neural network (AB, AC) ⇢ BD is detailed in Fig. 3. The network predicts
the matrix BD. The input data is the flattened and concatenated representation
of matrices AB and AC. The total number of input nodes is thus 2 × z². The
flow goes into p fully connected hidden layers, where each layer has q nodes. The
output layer has z² nodes. Pairs of predicted output matrices from the network
are decoded into a final string by a decoder algorithm explained below.</p>
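        <p>A minimal PyTorch sketch of this network follows; it mirrors the
structure just described (2 × z² inputs, p hidden ReLU layers of q nodes,
z² outputs), but the sigmoid output activation and the training snippet are
our own assumptions, not stated in the text:</p>
        <preformat>
import torch
import torch.nn as nn

z, p, q = 16, 1, 128            # matrix size, hidden layers, nodes per layer

layers, in_dim = [], 2 * z * z  # flattened, concatenated AB and AC
for _ in range(p):
    layers += [nn.Linear(in_dim, q), nn.ReLU()]
    in_dim = q
layers += [nn.Linear(in_dim, z * z), nn.Sigmoid()]  # predicted BD, in [0, 1]
net = nn.Sequential(*layers)

optimizer = torch.optim.Adam(net.parameters())
loss_fn = nn.MSELoss()          # Eq. (6)

x = torch.rand(8, 2 * z * z)    # placeholder batch of (AB, AC) pairs
t = torch.rand(8, z * z)        # placeholder ground-truth BD matrices
loss = loss_fn(net(x), t)
optimizer.zero_grad(); loss.backward(); optimizer.step()
</preformat>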
      </sec>
      <sec id="sec-2-3">
        <title>Decoder</title>
        <p>From the alignment matrices AB and AC, the neural network produces a pair
of matrices that stand for the alignments BD and CD. The decoder decodes the
pair of matrices into a final string which is a hypothesis for the solution D of
the analogical equation between strings A : B :: C : D (see Algorithm 1).</p>
      </sec>
      <sec id="sec-2-4">
        <title>Algorithm 1: Decoder</title>
        <sec id="sec-2-4-1">
          <title>Input: BD, CD: two re-sampled matrices</title>
        </sec>
        <sec id="sec-2-4-2">
          <title>Input: A, B, C: input strings</title>
        </sec>
        <sec id="sec-2-4-3">
          <title>Input: z: length of output string</title>
        </sec>
        <sec id="sec-2-4-4">
          <title>Data: N : set of number of occurrences for each character Data: V [c, i]: set of likelihood values of character c at each i position</title>
          <p>3
5
6
7
9
10
relationship. The length of D is entirely determined by the lengths of A, B and
C. Another piece of information is the number of occurrences of symbols in the
output string. In (5), |A|a stands for the number of occurrences of symbol a in
string A. (5) applied to all symbols implies (4). Our decoder is based on the
following three pieces of information: the two predicted alignment matrices, the
length of the output string, and the number of occurrences of each symbol.</p>
        <p>|D| = |B| + |C| − |A| (4)
|D|a = |B|a + |C|a − |A|a, ∀ a (5)</p>
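        <p>The two counting constraints (4) and (5) are easy to compute directly;
a small sketch (function names ours):</p>
        <preformat>
from collections import Counter

def length_of_D(A, B, C):
    """Eq. (4): |D| = |B| + |C| - |A|."""
    return len(B) + len(C) - len(A)

def counts_of_D(A, B, C):
    """Eq. (5): |D|_a = |B|_a + |C|_a - |A|_a for every symbol a."""
    n = Counter(B) + Counter(C)
    n.subtract(Counter(A))
    return +n                     # keep strictly positive counts only

print(length_of_D('hard', 'harder', 'smart'))   # 7, the length of 'smarter'
print(counts_of_D('hard', 'harder', 'smart'))   # the multiset of 'smarter'
</preformat>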
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>
          We construct a data set of analogical equations on strings. The data set is
a combination of all 3,370 formal analogies in the Google test set [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] (e.g.,
bright : brightest :: sweet : sweetest)
        </p>
        <p>
          and 2,423 formal analogies in multiple
languages as in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] (e.g., wolf : wolves :: leaf : leaves, (Malay) kawan : mengawani ::
keliling : mengelilingi, as well as Japanese and Chinese examples).
        </p>
        <p>
We randomly selected 10% of the data as our test set;
the rest is used for training. The statistics of the data set are: # of training
samples = 5214, # of test samples = 579, average edit distance = 1.78, average
length of string = 7.04 ± 2.54.
We performed a series of experiments to evaluate the influence of each
parameter (see below). Another experiment determines the highest
accuracy rate. For each experiment, all parameters are held constant except the
parameters under test. The basic parameter settings are: size of alignment
matrices = 16 × 16, re-sampling method = proposed, filtering method = both,
number of hidden nodes = 128, number of hidden layers = 1, loss function =
MSE, activation = ReLU[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], optimizer = Adam[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], and number of epochs =
200. Any alteration to these basic parameters is clearly stated in the description
of each experiment.
Training Time A significant advantage of our design is its ability to learn at a
high speed. We ran experiments to test the influence of various hyper-parameters.
We measured the training times after 200 epochs.
Loss We measured the values of a loss function (or objective function). These
values reflect the ability of the neural network to predict the correct alignment
matrices by comparing the output with the reference ground truth. The Mean
Square Error (MSE) function is given in (6). P is the predicted matrix from the
neural network, and T is the ground truth matrix. Lower values reflect a more
precise prediction, hence better model configurations.
        </p>
        <p>Accuracy We measured the accuracy of our network as the percentage of
correct answers over the total number of test samples (see (7)). If one
or more characters in the output string are different from the reference string,
the prediction is counted as a failure.</p>
        <p>MSE = (1 / z²) Σ_{i=1..z} Σ_{j=1..z} (Pij − Tij)² (6)</p>
        <p>Accuracy = (# of correct answers / total # of test samples) × 100 (7)</p>
        <p>
Size of Alignment Matrices In this experiment, the size of the alignment matrices
is varied from 4 to 32 in successive powers of 2. The experimental results in Fig. 4
show the expected behavior. The bigger the alignment matrices, the higher the
accuracy. The highest rate of 84.11% is obtained with 32 × 32 matrices. Figure 4
(right) shows that the size of the alignment matrices directly contributes to their
ability to output longer solutions. The downside of bigger alignment matrices
is the number of network connections, which causes an explosion in the number of
parameters. Table 1 shows that training times increase with the number
of parameters.</p>
        <p>Matrix Re-sampling Methods We compared our re-sampling method with
nearest neighbor, bilinear and bicubic methods. Results show that our re-sampling
method achieves the highest accuracy.</p>
        <p>Filtering methods As we proposed two filtering methods, we test all
combinations: no filtering, only one of them, or both. This gives four combinations.
The results in Table 1 show that the use of both filtering methods yields the
highest accuracy.</p>
        <sec id="sec-3-1-1">
          <title>Output length 15 Fig. 5. Accuracy against output length</title>
        <p>We observe that the use of mathematical morphology yields a lower accuracy
rate than no filtering at all. This may indicate some limitation of the selected
morphological filters.</p>
          <p>
Number of hidden nodes We set the number of hidden layers to one, but the
number of nodes varies. A higher number of hidden nodes gives the network the
capacity to recognize more complex patterns. Results show that networks with more
hidden nodes yield higher accuracy rates at the expense of training time.
Number of hidden layers Usually, deeper networks are favored for more
complicated patterns, a disadvantage being longer learning times. In this experiment,
each network has the same number of hidden nodes (128) per layer, but a
different number of layers. As expected, a deeper structure yields better recognition
rates. Interestingly, with four hidden layers, the number of parameters is
much lower than that of the large single-layer network of the previous experiment. Yet,
the deeper network achieves a better accuracy rate with a shorter training time.
Benchmark We select an extreme configuration (alignment matrices = 32 × 32,
number of hidden nodes = 1024, number of hidden layers = 2) to compare the
results with the baseline algorithm given in [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. Our proposed neural network achieves
a higher accuracy rate than the baseline: 95.68% against 94.47%. The baseline does
not need any training, while our network needed to be trained for 36 minutes.
The baseline's errors may come from the fact that our dataset contains some samples which do
not comply with the formalization behind the baseline algorithm. Figure 5 shows some
limitations of our network in solving equations with longer strings. Nevertheless,
these results demonstrate the effectiveness of our neural network model for the task.
          </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this work, a neural network design to solve analogical equations of
commutation on strings of symbols has been proposed. We presented several methods to
transform the input strings into a matrix representation and back into an output string.
Two filtering methods to reduce the alignment noise were introduced. The model
parameters were tested in a number of experiments. They show promising results,
as an accuracy of more than 95% was achieved. The comparison to a baseline
system showed a higher accuracy rate on the test set.</p>
      <p>We intend to further improve our neural network in the future. From the
reported experiments, the accuracy degrades with the length of strings.
Improvements in the network structure may help to improve the prediction accuracy.
Also, the decoding algorithm impacts the accuracy of the final string. The
decoding scheme can be further improved.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dosovitskiy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Springenberg</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brox</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Learning to generate chairs with convolutional neural networks</article-title>
          .
          <source>IEEE Conference on Computer Vision and Pattern Recognition</source>
          pp.
          <fpage>1538</fpage>
          -
          <lpage>1546</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gibbs</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McIntyre</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <article-title>The diagram, a method for comparing sequences</article-title>
          .
          <source>European Journal of Biochemistry</source>
          <volume>16</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          (
          <year>1970</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gregor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danihelka</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rezende</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Draw: A recurrent neural network for image generation</article-title>
          .
          <source>In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15)</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised convolutional neural networks for text categorization via region embedding</article-title>
          .
          <source>In: Proceedings of the Advances in neural information processing systems</source>
          . pp.
          <fpage>919</fpage>
          -
          <lpage>927</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In: Proceedings of the 3rd International Conference on Learning Representations</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Recurrent convolutional neural networks for text classification</article-title>
          .
          <source>In: Proceedings of the 29th AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>2267</fpage>
          -
          <lpage>2273</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lepage</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Solving analogies on words: an algorithm</article-title>
          .
          <source>In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics</source>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lepage</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Languages of analogical strings</article-title>
          .
          <source>In: Proceedings of the 18th conference on Computational linguistics-Volume</source>
          <volume>1</volume>
          . pp.
          <fpage>488</fpage>
          -
          <lpage>494</lpage>
          . Association for Computational Linguistics (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lepage</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denoual</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Purest ever example-based machine translation: Detailed presentation and assessment</article-title>
          .
          <source>Machine Translation</source>
          <volume>19</volume>
          (
          <issue>3-4</issue>
          ),
          <fpage>251</fpage>
          -
          <lpage>282</lpage>
          (
          <year>December 2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Miclet</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delhay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Analogy on sequences: a definition and an algorithm</article-title>
          .
          <source>Research report</source>
          , INRIA (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>International Conference on Learning Representations</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>Rectified linear units improve restricted Boltzmann machines</article-title>
          .
          <source>In: Proceedings of the 27th International Conference on Machine Learning (ICML-10)</source>
          . pp.
          <fpage>807</fpage>
          -
          <lpage>814</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Deep visual analogy-making</article-title>
          .
          <source>In: Proceedings of the Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>1252</fpage>
          -
          <lpage>1260</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Serra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Image analysis and mathematical morphology</article-title>
          . Academic press (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Character-level convolutional networks for text classification</article-title>
          .
          <source>In: Proceedings of the Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>649</fpage>
          -
          <lpage>657</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>