=Paper=
{{Paper
|id=Vol-3078/paper-40
|storemode=property
|title=Understanding Art with AI: Our Research Experience
|pdfUrl=https://ceur-ws.org/Vol-3078/paper-40.pdf
|volume=Vol-3078
|authors=Giovanna Castellano,Gennaro Vessio
|dblpUrl=https://dblp.org/rec/conf/aiia/CastellanoV21
}}
==Understanding Art with AI: Our Research Experience==
Giovanna Castellano, Gennaro Vessio
Department of Computer Science, University of Bari Aldo Moro, Italy

Abstract

Artificial Intelligence solutions are empowering many fields of knowledge, including art. Indeed, the growing availability of large collections of digitized artworks, coupled with recent advances in Pattern Recognition and Computer Vision, offers new opportunities for researchers in these fields to provide the art community with automatic and intelligent support tools. In this discussion paper, we outline some research directions we are exploring to contribute to the challenge of understanding art with AI. Specifically, our current research is primarily concerned with visual link retrieval, artwork clustering, the integration of new features based on contextual information encoded in a knowledge graph, and the deployment of these methods on social robots to provide new, engaging user interfaces. The application of Information Technology to the fine arts has countless uses, the most important of which concerns the preservation and fruition of our cultural heritage, which, along with other sectors, has been severely penalized by the ongoing COVID pandemic. On the other hand, the artistic domain poses entirely new challenges beyond the traditional ones which, if addressed, can push the limits of current methods toward better semantic scene understanding.

Keywords: Digital Humanities, Visual arts, Artificial Intelligence, Deep Learning

AIxIA 2021 Discussion Papers. giovanna.castellano@uniba.it (G. Castellano); gennaro.vessio@uniba.it (G. Vessio). ORCID: 0000-0002-6489-8628 (G. Castellano), 0000-0002-0883-2691 (G. Vessio). © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Artificial Intelligence is revolutionizing numerous fields of knowledge and has established itself as a key enabling technology. Among the various domains powered by AI-based solutions is the artistic one. In recent years, a large-scale digitization effort has led to the increasing availability of huge collections of digitized artworks. This availability, combined with recent advances in Pattern Recognition and Computer Vision, has opened up new opportunities for researchers in these fields to assist domain experts, particularly art historians, in the study and analysis of the visual arts. Among other benefits, a deeper understanding of the visual arts can favor their enjoyment by an ever wider audience, thus promoting the spread of culture. The visual arts, and more generally our cultural heritage, play a role of primary importance in the economic and cultural growth of our society [1, 2].

The ability to recognize characteristics, similarities and, more generally, patterns within and between digitized artworks, in order to favor deeper study, inherently falls within the domain of human aesthetic perception [3]. Since this perception is highly subjective, and influenced by various factors, not least the emotion the artwork evokes in the observer, it is extremely difficult to conceptualize. However, representation learning techniques, such as those on which state-of-the-art neural network models are based [4], appear to hold promise for automatically extracting meaningful features from artworks starting from their elementary pixel encoding. Such representations can be useful for numerous tasks of artistic interest, such as the automatic categorization of a painting by artist, style and genre (e.g., [5, 6, 7]), or the retrieval of works similar to a given artwork based on visual features, textual descriptions, etc. (e.g., [8, 9, 10]).
Interest is also growing in generative paradigms, which aim to create some form of art without the intervention of human artists in the generation process (e.g., [11, 12, 13]). At the Computational Intelligence Laboratory (CILab) of the Department of Computer Science of the University of Bari Aldo Moro, we are contributing to the interdisciplinary research in this field, which is now very active and fertile, by proposing new techniques, methodologies and tools for the automatic and “intelligent” analysis of the visual arts. The remainder of this discussion paper briefly outlines the main research directions we are currently exploring at CILab.

2. Visual Link Retrieval

One of the building blocks of most analyses in the visual arts is finding similarity relationships, that is, retrieving links between paintings by different artists and painting schools. These relationships can help art historians discover and better understand the influences of and changes from one art movement to another. Indeed, art experts rarely analyze artworks as isolated creations; they typically study paintings within broad contexts involving influences and connections between different schools. Traditionally, this type of analysis is done manually by inspecting large collections of human-annotated photos. However, manually searching through thousands of images, spread across different periods and painting schools, is a very time-consuming and costly process. Along this direction, we have recently proposed a method for visual link retrieval that relies on a deep convolutional neural network to perform feature extraction and a fully unsupervised nearest neighbor mechanism to retrieve links between digitized paintings [14]. The search for visual links is completely unsupervised, making the method especially useful when metadata are scarce, unavailable or difficult to collect.
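To give a concrete flavor of the retrieval step, here is a minimal sketch in NumPy. It assumes each painting has already been mapped to a feature vector by a pre-trained deep CNN (the vectors below are placeholders, and the function name is ours, not the paper's); links are then retrieved as the nearest neighbors of the query under cosine similarity, with no labels involved.

```python
import numpy as np

def retrieve_links(query_vec, gallery, top_k=3):
    """Return the indices (and cosine similarities) of the gallery artworks
    most similar to the query, in descending order of similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = G @ q                       # cosine similarity to every artwork
    order = np.argsort(-sims)[:top_k]  # top-k most similar
    return order, sims[order]
```

Because the comparison is purely geometric in the learned feature space, this step needs no supervision, which matches the setting described above where metadata may be scarce or unavailable.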
It is worth noting that the proposed method not only retrieves the images most similar to the input query, but also allows the user to study historical patterns by analyzing the “influence graph” built on the retrieved links. In fact, by applying graph measures to the network built on the obtained links, the method performs a form of historical knowledge discovery about the artists. As an illustrative example of the system's behavior, Fig. 1 shows some image queries along with the corresponding top visually linked artworks retrieved by the system. The proposed method can benefit not only art historians: enthusiasts can also take advantage of automatic link retrieval when visiting the digital collections of online museums and art galleries, enabling a form of interactive navigation that favors the enjoyment of art.

Figure 1: Query examples and corresponding visually linked paintings.

3. Artwork Clustering

While the approach described in the previous section is suitable for finding visually linked artworks, it is not effective for clustering artworks into different groups, since the data appear to be uniformly distributed within a single homogeneous cluster in the feature space. A model that can cluster artworks without depending on hard-to-collect labels or subjective knowledge can be very useful in many domain applications. For example, it can be used to discover different periods in the production of the same artist. Likewise, it can help experts classify contemporary art, which cannot be richly annotated. To this end, we have proposed a method that uses a pre-trained deep convolutional neural network to perform feature extraction, and a deep embedded clustering model [15], based on an autoencoder neural network, to perform the clustering [16].
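The actual method in [15, 16] jointly refines an autoencoder latent space and the cluster assignments; as a deliberately simplified stand-in that shows only the final clustering stage in isolation, the sketch below runs plain k-means over pre-extracted embeddings (the deterministic initialization and the data are ours, for illustration only).

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means over feature vectors; a simplified stand-in for the
    deep embedded clustering step, which instead learns the latent space
    and the assignments jointly."""
    # Deterministic init from the first k points (random init is more common).
    centers = X[:k].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as its cluster mean (keep it if the cluster empties).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers
```

In the full pipeline the autoencoder reshapes the embedding space so that such centroid-based assignments become meaningful, which is precisely what raw CNN features lack, as noted above.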
The choice of this fully deep pipeline was motivated by the difficulty of applying traditional clustering algorithms and feature reduction techniques both to the high-dimensional input pixel space and to the feature space resulting from the CNN embedding, especially when the inputs are very complex artistic images. Quantitative and qualitative experimental results showed that the proposed method finds well-separated clusters both when considering an overall dataset spanning different periods and when focusing on artworks produced by the same artist. In particular, from a qualitative point of view, the model seems to look not only at stylistic features when grouping artworks, but also, and especially, at semantic attributes relating to the content of the depicted scene. For example, Fig. 2 shows sample images of clusters found in Pablo Picasso's artistic production: the model places semantically related works, such as portraits and still lifes, in the same clusters, despite their different stylistic representations. This capacity seems to hold promise for addressing the well-known “cross-depiction” problem, which still poses a challenge for the research community [17], as it could be exploited to find similarities between artworks regardless of the way they are depicted.

Figure 2: Sample images from the clusters found among Picasso's artworks.

4. Computer Vision & Knowledge Graphs

Our research then moved on from another consideration: much of the work in the literature relies solely on the pixel information inherent in the digitized paintings and drawings. Unfortunately, this approach ignores a large amount of domain knowledge, as well as known relationships and connections between artworks and/or artists, that could increase the quality of existing solutions.
Artworks, in fact, cannot be studied on the basis of their visual appearance alone, but must also be considered in light of various other historical, social and contextual factors that frame them within a more complex picture. Therefore, a knowledge base in which not only artworks, but also a rich plethora of metadata, contextual information, textual descriptions, etc., are unified within a structured framework can be a valuable resource for more powerful information retrieval and knowledge discovery tools in the artistic domain. Such a framework would benefit not only enthusiastic users, who can exploit the encoded information to navigate the knowledge base, but also, and especially, art experts interested in finding new relationships between artworks and/or artists for a better understanding of past and modern art. To fill this gap, we have developed ArtGraph, an artistic knowledge graph (KG) [18]. A KG [19] provides a more expressive and flexible representation for incorporating relationships of arbitrary complexity between art-related entities, which cannot be obtained by considering the visual content alone [20]. The proposed KG integrates the information collected by WikiArt and DBpedia and exploits the potential of the Neo4j database management system, which provides an expressive graph modeling and query language. In this way, the NoSQL database already provides a powerful knowledge discovery framework without explicitly training a learning system: the user can query the graph, for example, to study the influences between artists, to retrieve the works stored in a specific place, and so on. Furthermore, the contextual knowledge encoded in ArtGraph can be integrated with visual features automatically learned by deep neural networks to develop more powerful learning models in the art domain.
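To make this kind of knowledge discovery query concrete, here is a toy in-memory sketch in plain Python. The artist names, relation name and graph contents are invented for illustration and do not reflect ArtGraph's actual schema; in the real system the graph lives in Neo4j and the same traversal would be expressed declaratively in Cypher, as noted in the comment.

```python
# Toy stand-in for an artistic knowledge graph. In Neo4j, a Cypher query like
#   MATCH (a:Artist)-[:INFLUENCED_BY*]->(b:Artist)
#   WHERE a.name = 'Picasso' RETURN DISTINCT b.name
# (labels and relationship names are illustrative) would do this traversal.
influenced_by = {
    "Picasso": ["Cezanne", "El Greco"],
    "Cezanne": ["Pissarro"],
}

def influence_chain(artist, graph):
    """Collect every artist reachable by transitively following influence edges."""
    seen, stack = set(), [artist]
    while stack:
        for other in graph.get(stack.pop(), []):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen
```

The point of the graph database is that such multi-hop questions become single declarative queries, with no learning system in the loop.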
Several tasks could be addressed in this way, such as artwork attribute prediction, multi-modal retrieval and artwork captioning, which are attracting increasing interest in this domain.

5. Social Robotics

As applications of Computer Vision algorithms to artistic tasks become more mature, an interesting real-world use of these techniques is to incorporate them into social robots. Social robotics is an emerging field of research focused on developing a “social intelligence” that aims to maintain the illusion of dealing with a human being [21]. In this context, recent advances in Computer Vision allow researchers to equip robots with new and powerful capabilities. In our research we are using a social robot, Pepper, as a museum tour guide; in particular, we are developing a vision-based approach to support people during a museum visit [22]. Pepper is a semi-humanoid robot on wheels, equipped with several cameras and sensors. Its vision module allows it to perceive the presence of visitors and locate them in space, estimating their age and gender. Additionally, the visual link retrieval module described above gives Pepper the ability to use the image of the painting viewed by a visitor as a visual query to search for visually similar paintings in the museum database. The robot uses these data, together with other information acquired during the dialogue, to advise visitors on similar artworks they might like to see in the museum. Designing the behaviors of a social robot that acts as a museum guide requires equipping it with different skills that provide visitors with an engaging and effective experience during the visit: the robot must detect and locate people in the museum, recognize the artwork the visitor is looking at, profile the user during the visit in order to generate adequate recommendations and, finally, engage people in the interaction through adequate conversation skills.
We have tested the proposed approach in our research laboratory, and preliminary experiments have shown its feasibility.

Links:
WikiArt: https://www.wikiart.org/
DBpedia: https://www.dbpedia.org/
Neo4j: https://neo4j.com/

6. Conclusion

The growing availability of large collections of digitized artworks has given rise to an intriguing new area of research where Computer Vision and the visual arts meet. This new area can be framed as a constantly growing subfield of Digital Humanities, which aims to bring digital technologies and the humanities together. The applications are innumerable, ranging from information retrieval in digital databases to the synthetic generation of new forms of art. Encouraged by the growing literature on the topic, we are working to make our own contribution. We are confident that this exciting field of research will be strengthened in the future by leveraging the rapid advances in Deep Learning. We believe these approaches will continue to evolve rapidly, paving the way for scenarios in which computer systems can analyze and understand the fine arts on their own. In fact, one of the final objectives of this research is the ability of a machine, once a picture has been taken, to autonomously derive an understanding of what the scene depicts, what it metaphorically represents, what its possible historical implications are, and so on, without any human guidance. However, the artistic domain is significantly different from the natural, photo-realistic domain Computer Vision researchers usually work on. First, there is inherent variability between the stylistic and figurative characteristics of the two domains, as well as between works by different artists belonging to the same period, if not between works by the same artist.
Furthermore, the datasets with which Deep Learning models are now pre-trained are affected by “recentism”: they are not representative of situations, ways of being and dressing, iconographic and mythological scenes, etc., from the past, which never existed or simply no longer exist. In other words, our cultural heritage, given its historical background spanning centuries, poses entirely new and intriguing scientific challenges which, if addressed, can push beyond the semantic scene understanding achieved by current models.

Acknowledgments

G.V. acknowledges the financial support of the Italian Ministry of University and Research through the PON AIM 1852414 project.

References

[1] G. Castellano, G. Vessio, Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview, Neural Computing and Applications 33 (2021) 12263–12282.
[2] E. Cetinic, J. She, Understanding and creating art with AI: Review and outlook, arXiv preprint arXiv:2102.09109 (2021).
[3] E. Cetinic, T. Lipic, S. Grgic, A deep learning perspective on beauty, sentiment, and remembrance of art, IEEE Access 7 (2019) 73694–73710.
[4] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, M. S. Lew, Deep learning for visual understanding: A review, Neurocomputing 187 (2016) 27–48.
[5] E. Cetinic, T. Lipic, S. Grgic, Fine-tuning convolutional neural networks for fine art classification, Expert Systems with Applications 114 (2018) 107–118.
[6] S. Liu, J. Yang, S. S. Agaian, C. Yuan, Novel features for art movement classification of portrait paintings, Image and Vision Computing 108 (2021) 104121.
[7] S.-h. Zhong, X. Huang, Z. Xiao, Fine-art painting classification via two-channel dual path networks, International Journal of Machine Learning and Cybernetics 11 (2020) 137–152.
[8] N. Garcia, B. Renoust, Y. Nakashima, ContextNet: representation and exploration for painting classification and retrieval in context, International Journal of Multimedia Information Retrieval 9 (2020) 17–30.
[9] B. Seguin, C. Striolo, F. Kaplan, et al., Visual link retrieval in a database of paintings, in: European Conference on Computer Vision, Springer, 2016, pp. 753–767.
[10] X. Shen, A. A. Efros, M. Aubry, Discovering visual patterns in art collections with spatially-consistent feature learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9278–9287.
[11] A. Elgammal, B. Liu, M. Elhoseiny, M. Mazzone, CAN: Creative adversarial networks, generating art by learning about styles and deviating from style norms, arXiv preprint arXiv:1706.07068 (2017).
[12] X. Gao, Y. Tian, Z. Qi, RPD-GAN: Learning to draw realistic paintings with generative adversarial network, IEEE Transactions on Image Processing 29 (2020) 8706–8720.
[13] Y. Liu, Improved generative adversarial network and its application in image oil painting style transfer, Image and Vision Computing 105 (2021) 104087.
[14] G. Castellano, E. Lella, G. Vessio, Visual link retrieval and knowledge discovery in painting datasets, Multimedia Tools and Applications 80 (2021) 6599–6616.
[15] J. Xie, R. Girshick, A. Farhadi, Unsupervised deep embedding for clustering analysis, in: International Conference on Machine Learning, PMLR, 2016, pp. 478–487.
[16] G. Castellano, G. Vessio, A deep learning approach to clustering visual arts, arXiv preprint arXiv:2106.06234 (2021).
[17] H. Cai, Q. Wu, T. Corradi, P. Hall, The cross-depiction problem: Computer vision algorithms for recognising objects in artwork and in photographs, arXiv preprint arXiv:1505.00110 (2015).
[18] G. Castellano, G. Sansaro, G. Vessio, Integrating contextual knowledge to visual features for fine art classification, in: Workshop on Deep Learning for Knowledge Graphs (DL4KG 2021), 2021.
[19] A. Hogan, E. Blomqvist, M. Cochez, C. d'Amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, et al., Knowledge graphs, ACM Computing Surveys (CSUR) 54 (2021) 1–37.
[20] C. Brahim El Vaigh, N. Garcia, B. Renoust, C. Chu, Y. Nakashima, H. Nagahara, GCNBoost: Artwork classification by label propagation through a knowledge graph, arXiv e-prints (2021).
[21] G. Castellano, B. De Carolis, F. D'Errico, N. Macchiarulo, V. Rossano, PeppeRecycle: Improving children's attitude toward recycling by playing with a social robot, International Journal of Social Robotics 13 (2021) 97–111.
[22] G. Castellano, B. De Carolis, N. Macchiarulo, G. Vessio, Pepper4Museum: Towards a human-like museum guide, in: AVI2CH@AVI, 2020.