<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic analysis of artistic heritage through Artificial Intelligence</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanna Castellano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raffaele Scaringi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gennaro Vessio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Bari Aldo Moro</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Technological improvements have resulted in a large-scale digitization effort in recent years, leading to the increasing availability of large digitized art collections. This provides an opportunity to develop AI systems capable of understanding art, thus supporting art historians and, more generally, the enjoyment of culture. This paper briefly reviews our ongoing project on automatic art heritage analysis through AI. In particular, new graph representation learning approaches, combined with computer vision, are investigated to handle the complexity of the visual arts.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital humanities</kwd>
        <kwd>Deep learning</kwd>
        <kwd>Computer vision</kwd>
        <kwd>Graph representation learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivations and objectives</title>
      <p>
        The mass digitization of cultural heritage, which is continuously increasing [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ],
has offered the scientific community the opportunity to develop computational methods, from
knowledge-based systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to generative models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], in order to address tasks in the art
domain. However, solving these tasks is challenging.
      </p>
      <p>
        To appreciate the complexity of artwork analysis, consider the task of artwork attribute recognition.
Traditionally, this task has been performed using hand-crafted features and classic machine
learning algorithms (e.g., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). However, despite the good results achieved, extracting such features
proved to be very difficult, largely due to the subjective perspective of the individual human
expert. This first limitation has been overcome with the advent of deep learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which can
automate the feature extraction stage thanks to its effective representation learning capability.
One of the first attempts in this direction was made by Karayev et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], who used a
pretrained Convolutional Neural Network to recognize the school of painting of a given artwork.
Nevertheless, artists often paint in different styles and may depict unreal subjects.
Since standard pre-trained deep neural networks are biased towards natural image domains, they
may fail to capture certain aspects fundamental to analyzing cultural heritage. Indeed, as studied
by Cetinic et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], feature representation is vital when an artistic task must be solved.
      </p>
      <p>The growing interest in this research field at the intersection of AI and art calls for further
efforts to achieve technology transfer. Solutions developed for cultural heritage could be
a valuable resource for both art experts and casual users. For example, models that extract meaningful,
art-oriented descriptions could be integrated into smart glasses to let blind people appreciate art.
Moreover, throughout history, natural disasters have damaged many artworks: developing
models capable of reconstructing them could be a great tool for art experts. Our
research at the Department of Computer Science, University of Bari, fits into this context. The
rest of the paper outlines the approach we are exploring to address challenges in this domain.
Finally, an overview of our current research and expected results concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem approach</title>
      <p>
        As is common in the machine and deep learning community, the first step in creating effective
models is to have a large and representative dataset available. A solid starting point is ArtGraph [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
a large Knowledge Graph (KG) on cultural heritage that includes
information about artworks and their authors from different perspectives. Moreover, it can be extended to
include other data, such as textual descriptions, to enrich the encoded knowledge. ArtGraph is
stored in Neo4j, which already provides information retrieval and knowledge discovery
capabilities, even without training learning algorithms, through the Cypher query language. For example,
Fig. 1 shows the subgraph related to “The Last Supper” by Leonardo da Vinci: the metadata
directly associated with the artwork span many different kinds of information, from the materials
with which the artwork was made to the people depicted.
      </p>
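      <p>As a rough illustration of the kind of neighborhood query that Cypher answers over such a graph, the following sketch models a few of the artwork's neighbors as a plain adjacency structure. The node and relation names are simplified guesses for illustration only, not the actual schema.</p>

```python
# Hypothetical mini version of the KG subgraph in Fig. 1, stored as an
# adjacency dict; node and relation names are illustrative, not the real schema.
kg = {
    "Last-Supper": [
        ("madeOf", "plaster"),
        ("hasStyle", "high-renaissance"),
        ("locatedIn", "Santa-Maria-delle-Grazie"),
        ("about", "Jesus-Christ"),
        ("about", "Judas-Iscariot"),
    ],
    "Santa-Maria-delle-Grazie": [("inCity", "Milan")],
    "Milan": [("inCountry", "Italy")],
}

def neighbors(node, relation=None):
    """Targets of edges leaving node, optionally filtered by relation type."""
    return [t for r, t in kg.get(node, []) if relation is None or r == relation]

# Roughly what Cypher would express as a MATCH over a :madeOf relationship:
print(neighbors("Last-Supper", "madeOf"))
print(neighbors("Last-Supper", "about"))
```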
      <p>
        Once the data are available, several research problems can be addressed. As a first step toward
a system that can understand art, we are interested in artwork attribute recognition and, more
specifically, in neuro-symbolic models. Such models are suitable for the project’s goal because they can
jointly exploit different modalities of information related to the images and the metadata stored in the
KG. In particular, Graph Neural Networks, a state-of-the-art approach for extracting
meaningful features from graphs, can be used to obtain such features for downstream tasks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Intimately related to attribute recognition is recognizing the emotion an image evokes in the
observer. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the authors presented a fully transformer-based architecture; in particular,
they used the ArtEmis dataset [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which provides utterances describing the motivation behind
an elicited emotion.
      </p>
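      <p>The core operation behind the Graph Neural Networks mentioned above, aggregating each node's features with those of its neighbors, can be sketched as follows. The mean aggregator and the toy node features are illustrative assumptions; models such as GATs instead learn attention-weighted sums.</p>

```python
# Sketch of one round of GNN-style neighborhood aggregation (mean aggregator).
def aggregate(features, edges):
    """Average each node's feature vector with those of its neighbors."""
    nbrs = {n: [n] for n in features}  # self-loop so a node keeps its own signal
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)  # treat edges as undirected
    dim = len(next(iter(features.values())))
    return {n: [sum(features[m][i] for m in ns) / len(ns) for i in range(dim)]
            for n, ns in nbrs.items()}

# Toy features for three KG nodes:
features = {"artwork": [1.0, 0.0], "artist": [0.0, 1.0], "style": [1.0, 1.0]}
edges = [("artwork", "artist"), ("artwork", "style")]
print(aggregate(features, edges)["artwork"])  # mean of the three vectors
```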
      <p>Another relevant task, which could help historians and art experts, is the recognition
of influences among artists. To this end, it is interesting to develop a method that reconstructs
the history of artistic influences. From a
methodological point of view, this task, too, can be approached through the KG, in which such
relationships are stored, by solving a link prediction task.</p>
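      <p>A hypothetical sketch of the link prediction idea: score a candidate influence edge between two artists by how many KG neighbors they share (the common-neighbors heuristic). The small graph below is invented for illustration; the project's actual approach would learn such scores from KG embeddings.</p>

```python
def common_neighbor_score(graph, a, b):
    """Number of nodes adjacent to both a and b (common-neighbors heuristic)."""
    return len(graph.get(a, set()).intersection(graph.get(b, set())))

# Invented toy graph: each artist maps to the set of its KG neighbors.
influence_kg = {
    "Leonardo": {"high-renaissance", "Florence"},
    "Raphael": {"high-renaissance", "Florence"},
    "Monet": {"impressionism", "Paris"},
}
# Candidate influence links ranked by shared context in the KG:
print(common_neighbor_score(influence_kg, "Leonardo", "Raphael"))  # shares 2 neighbors
print(common_neighbor_score(influence_kg, "Leonardo", "Monet"))    # shares 0 neighbors
```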
      <p>
        Finally, generative algorithms can automate the generation of images and sequences. In this
area, Diffusion Models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and Generative Adversarial Networks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] represent the current
state-of-the-art and can be used for artwork in-painting, whose purpose
is to reconstruct damaged artworks. Some generative methods are even multimodal.
One example is DALL·E 2, developed by OpenAI. This model can generate an image in many
ways based on text prompts.
      </p>
      <fig id="fig1">
        <caption>
          <p>Figure 1: Subgraph related to “The Last Supper” by Leonardo da Vinci, showing the metadata directly associated with the artwork.</p>
        </caption>
      </fig>
      <p>On the other hand, image captioning can generate meaningful
descriptions of artworks. For this purpose, methods based on natural language processing are
crucial and will be explored. Unfortunately, image captioning systems that work well with
natural images often fail when asked to generate output from an art image, because they lack
the richness and depth that a historical background would provide. This is confirmed by the
study conducted by Cetinic [14]. Developing a system that can mimic a human expert is a
long-term goal of this research community.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Current research and expected results</title>
      <p>We are currently developing a tool for artwork attribute prediction, a preliminary version
of which was presented in [15]. The main idea is to exploit the “contextual” information provided by
ArtGraph, in combination with the visual information of the given artwork, extracted respectively
by a Graph Attention Network [16] and a Vision Transformer [17]. More specifically, we are
addressing the problem of predicting the style, genre, and evoked emotion of a painting.
In this way, we aim to devise a practical approach to recognizing these attributes that extends
pure computer vision methods.</p>
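      <p>The fusion idea above can be sketched minimally: concatenate a graph embedding and a visual embedding, then score each attribute class linearly. All vectors and weights below are toy values invented for illustration; in the actual tool the embeddings would come from a Graph Attention Network and a Vision Transformer, and the weights would be learned.</p>

```python
def fuse(graph_emb, visual_emb):
    """Late fusion by concatenation."""
    return graph_emb + visual_emb

def predict(embedding, class_weights):
    """Pick the class whose weight vector best matches the embedding (dot product)."""
    scores = {c: sum(w * e for w, e in zip(ws, embedding))
              for c, ws in class_weights.items()}
    return max(scores, key=scores.get)

graph_emb = [0.2, 0.9]   # hypothetical output of the graph branch
visual_emb = [0.8, 0.1]  # hypothetical output of the visual branch
styles = {"baroque": [1, 0, 0, 1], "high-renaissance": [0, 1, 1, 0]}
print(predict(fuse(graph_emb, visual_emb), styles))
```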
      <p>Regarding long-term objectives, we would like to develop a recommender system acting as
a digitized art gallery, where general users can purchase a specific artwork based on
what they like most. Alternatively, the same system could provide customized tours in
a museum: the tool would return and rank for the user a subset of the artworks,
basing its decisions on meta-information. Along with artwork recommendation, there is another
essential task: artwork captioning. In fact, when returning a selected artwork, a meaningful
description is needed. To this end, our goal is to develop a system capable of generating a
caption that describes the visual content and adds further information,
such as the hidden message the artist wants to communicate to the observer.</p>
      <p>Finally, we would like to develop an end-to-end method for in-painting to consistently support
art experts. In particular, the tool should be able to reconstruct damaged images, varying the
generation based on metadata or text prompts in which the experts specify constraints
for the reconstruction. For example, a user might require that a specific object or
person be present in the reconstructed area. Alternatively, the experts could be interested in
a reconstruction based on a specific painting style. A promising starting point in this direction
is the work of Cipolina-Kun et al. [18].</p>
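      <p>As a toy illustration of the in-painting objective, the sketch below fills unknown pixels (None) of a tiny grayscale grid by repeatedly averaging their known 4-neighbors. This is only a stand-in for the task setup; the research direction described here relies on deep generative models (diffusion models, GANs), not on this naive averaging.</p>

```python
def inpaint(img, rounds=10):
    """Iteratively fill None entries with the mean of their known 4-neighbors."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]  # work on a copy
    for _ in range(rounds):
        for y in range(h):
            for x in range(w):
                if img[y][x] is None:
                    vals = [img[ny][nx]
                            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                            if ny in range(h) and nx in range(w) and img[ny][nx] is not None]
                    if vals:
                        img[y][x] = sum(vals) / len(vals)
    return img

# A "damaged" 3x3 grayscale patch with one missing pixel:
damaged = [[1.0, 1.0, 1.0],
           [1.0, None, 0.0],
           [0.0, 0.0, 0.0]]
print(inpaint(damaged)[1][1])  # center filled from its four known neighbors
```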
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.-A.</given-names>
            <surname>Ypsilantis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ibrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tolias</surname>
          </string-name>
          ,
          <article-title>The met dataset: Instance-level recognition for artworks</article-title>
          ,
          <source>in: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Reshetnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Marinescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <article-title>DEArt: Dataset of European Art</article-title>
          ,
          <source>arXiv preprint arXiv:2211.01226</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Castellano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Digeno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vessio</surname>
          </string-name>
          ,
          <article-title>Leveraging Knowledge Graphs and Deep Learning for automatic art analysis</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>248</volume>
          (
          <year>2022</year>
          )
          <fpage>108859</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models</article-title>
          ,
          <source>arXiv preprint arXiv:2207.13038</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Carneiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Del Bue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Costeira</surname>
          </string-name>
          ,
          <article-title>Artistic image classification: An analysis on the printart database</article-title>
          ,
          <source>in: European conference on computer vision</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>157</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Deep learning</article-title>
          ,
          <source>Nature</source>
          <volume>521</volume>
          (
          <year>2015</year>
          )
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Karayev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trentacoste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hertzmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Winnemoeller</surname>
          </string-name>
          ,
          <article-title>Recognizing Image Style</article-title>
          ,
          <source>in: Proceedings of the British Machine Vision Conference</source>
          , BMVA Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cetinic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lipic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Grgic</surname>
          </string-name>
          ,
          <article-title>Fine-tuning Convolutional Neural Networks for fine art classification</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>114</volume>
          (
          <year>2018</year>
          )
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Representation learning on graphs: Methods and applications</article-title>
          ,
          <source>arXiv preprint arXiv:1709.05584</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Somandepalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kundu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lahiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gratch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <article-title>Understanding of Emotion Perception from Art</article-title>
          ,
          <source>arXiv preprint arXiv:2110.06486</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Achlioptas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ovsjanikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Haydarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Elhoseiny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Guibas</surname>
          </string-name>
          ,
          <article-title>ArtEmis: Affective Language for Visual Art</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>11569</fpage>
          -
          <lpage>11579</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Heng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A survey on generative diffusion model</article-title>
          ,
          <source>arXiv preprint arXiv:2209.02646</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Creswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dumoulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Arulkumaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sengupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Bharath</surname>
          </string-name>
          ,
          <article-title>Generative adversarial networks: An overview</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          <volume>35</volume>
          (
          <year>2018</year>
          )
          <fpage>53</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cetinic</surname>
          </string-name>
          ,
          <article-title>Towards Generating and Evaluating Iconographic Image Captions of Artworks</article-title>
          ,
          <source>Journal of Imaging</source>
          <volume>7</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          S. Aslan, G. Castellano, V. Digeno, G. Migailo, R. Scaringi, G. Vessio,
          <article-title>Recognizing the Emotions Evoked by Artworks Through Visual Features and Knowledge Graph-Embeddings</article-title>
          ,
          <source>in: International Conference on Image Analysis and Processing</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio,
          <article-title>Graph attention networks</article-title>
          ,
          <source>arXiv preprint arXiv:1710.10903</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al.,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>arXiv preprint arXiv:2010.11929</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          L. Cipolina-Kun, S. Caenazzo, G. Mazzei,
          <article-title>Comparison of CoModGans, LaMa and GLIDE for Art Inpainting Completing M.C. Escher’s Print Gallery</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>716</fpage>
          -
          <lpage>724</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>