               Applying Computer Vision Systems to
                  Historical Book Illustrations:
                   Challenges and First Results

                     Yongho Kim1, Thomas Mandl1, Chanjong Im1,
                        Sebastian Schmideler2, Wiebke Helm2
                  1 University of Hildesheim, Information Science, Germany

                    2 University of Leipzig,
                                        Faculty of Education, Germany
                               mandl@uni-hildesheim.de



       Abstract. Digital humanities still need to unlock the potential of image analysis
       algorithms to a large extent. Modern deep learning image processing can con-
       tribute much to quantifying knowledge about visual components in books. In this
       study, we report on experiments carried out on historical print. The illustrations
       in books offer much for humanities research. Object recognition systems can
       identify the portfolio of objects in book illustrations. In a study with several hun-
       dred books, we applied systems to find illustrations and classify them. The results
       show that persons appear in illustrations within fiction books with a higher
       frequency than in non-fiction books. We also present the classification results for
       an analysis of the printing technology. This expert task can still not be modeled
       perfectly by a CNN. A class activation map analysis can be used to analyze the
       performance qualitatively.

       Keywords: Digital Humanities, Children's Books, Deep Learning, CNN.


1      Introduction

Digital Humanities integrates automatic processing and analysis into research practices
in the Humanities. Image analysis is a growing area within Digital Humanities. The
analysis of books is of great interest to many disciplines. Digital historical corpora al-
low the automatic access to illustrations in books and their analysis in large quantities.
This can lead to innovative research questions and quantitative results [1].
   Historical children's and youth books have not often been the subject of research yet.
Children's books typically contain more images than books for adults. As a conse-
quence, they are of special interest for an analysis of images. In addition, they form a
closed category on the one hand which contains sufficient variety on the other hand [2].
   Illustrated books have played a significant role in knowledge dissemination. The
declining production costs for printed images have led to a growing exposure of more
and more people to rich visual resources. Research in this area could identify trends in
the objects depicted.




             Copyright © 2021 for this paper by its authors. Use permitted under
            Creative Commons License Attribution 4.0 International (CC BY 4.0).




2      State of the Art

The progress in computer vision and image analysis in the last decade was substantial.
Deep learning as a new direction in machine learning can now be considered state of
the art and delivers excellent results in many domains. Deep learning refers to a depar-
ture from feature engineering. Instead, algorithms find a representation space for the
problem at hand by themselves. Neural networks have proven to be very successful for
this task. Early approaches already intended to feed compressed feature spaces into
backpropagation neural networks [3], but recently, new architectures have grown more
complex and successful.
   Convolutional Neural Networks (CNN), the recent state-of-the-art technology, are known
to be very effective in automated feature detection and subsequent classification in
many domains. CNNs are composed of recurring sets of two layers: a convolution layer
and a pooling layer. The CNN combines pixels locally and, by working through many
layers, more complex features can be extracted [4].
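   The following minimal sketch (a generic PyTorch example, not the architecture used
in this study) illustrates the alternating convolution and pooling blocks described above;
the input size of 224x224 pixels is an assumption for illustration.

```python
# Minimal sketch of a CNN with alternating convolution and pooling blocks.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            # each block combines pixels locally (convolution) and downsamples
            # (pooling); stacking blocks yields increasingly complex features
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # assumes 224x224 input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

# e.g. SmallCNN()(torch.randn(1, 3, 224, 224)) yields logits of shape (1, 3)
```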
   The application of computer vision to the digital humanities is a growing area.
However, there are still not very many publications in this domain. One influential ex-
periment in the art domain by Saleh & Elgammal is dedicated to classifying the painter of
artistic works. Such work is highly dependent on the type of paintings in the collection
[5]. An approach to identify objects within artwork has also been presented. Similar to
our approach, it needs to deal with the domain shift and apply current technology to
historic print [6].
   A recent project focuses on research on graphic novels. Current state-of-the-art
CNNs are applied to tasks like author identification with very good success [7]. In ad-
dition, the processing is aimed at measuring the drawing style of a graphic novel in
order to find similar book titles. A study of modern children's books based on information
available in catalogues has analyzed market structures and book formats [8].
   A recent approach shows that the visual analysis of a page structure can be carried
out successfully with CNNs. A system can detect elements on a page and analyze tables
from heterogeneous layouts [9].
   One goal of the research on images in historical children's books concerns the pro-
duction technologies. As a classification problem with few classes, it seems like a chal-
lenge which could be solved with current technology. Nevertheless, the classification of
production technology in the 19th century is still a hard task [10]. Comparable work is
the application of deep networks and transfer learning for material classification: Cimpoi
et al. conducted material classification with deeper structures and transfer learning [11].


3      Data Collections of Digitized Books

This research exploits two collections of digitized books. The first collec-
tion is the Wegehaupt corpus maintained by the Staatsbibliothek in Berlin [12]. The
second data collection is based on the Hobrecker collection [13]. This collection of
books is maintained by the library of the Technical University of Braunschweig.




Both collections are of great interest for cultural research. They contain a rich variety of
different genres of children's books, mainly from Germany and mainly from the 19th cen-
tury: e.g. alphabetization books, picture books, biographies, natural history descriptions
as well as adventure stories. Another resource used is the database Pictura Paedagogica
Online [14]. It contains only images, many of which are extracted from books.


4      Results and Analysis

We present some results for using a system with pre-defined classes and one with a
self-trained model. All models use deep learning systems and in particular some variant
of a CNN architecture.
    After extracting images from the book pages, we use them as input for an object clas-
sification system pre-trained on modern photographs. In previous work, we could see
that the classification results are not perfect and that they differ greatly between books.
Typical performance metrics can lie between 30% and 90% for different books [15],
which shows the heterogeneity of the material and the sensitivity of the systems to
that material.
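    As an illustration of this step, the following hedged sketch runs a detector pre-trained
on modern photographs (here torchvision's Faster R-CNN trained on COCO, chosen as a
generic example rather than the exact system used in this study) over an extracted
illustration; the file path and score threshold are assumptions.

```python
# Sketch: object detection on an extracted illustration with a model
# pre-trained on modern photographs (COCO). Domain shift to historical
# print means the scores should be interpreted with caution.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

def detect_objects(image_path: str, score_threshold: float = 0.5):
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    # keep only confident detections; labels are COCO category indices
    keep = output["scores"] >= score_threshold
    return output["labels"][keep].tolist(), output["scores"][keep].tolist()

# e.g. detect_objects("illustration_001.png")  # hypothetical file name
```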




     Fig. 1: Example of an illustration from the collection with object recognition results.
                           Bertuch (1801). Bilderbuch für Kinder.

    We applied Yolo to the images and recorded the recognized object types. The results of Yolo [16]
have been recorded for a subset of 321 books from the Hobrecker collection. The clas-
sification is not very reliable; a detailed evaluation is ongoing. However, for a statistical
analysis, it seems sufficient. We manually classified the fiction books and non-fiction
books. The analysis shows that there is no difference in the overall number of illustra-
tions for the two classes. However, the fiction books contain more images of humans
and horses and thus a more limited scope of object classes than the non-fiction books.
Non-fiction books displaying many different objects and animals seem to cause
that difference (see Table 1).




                      Table 1. Results of Yolo on the Hobrecker collection.

                      Illustr.       Person          Horse           Chair
Median                per page       per illustr.    per illustr.    per illustr.
fiction books           0.400          1.143           0.020           0.000
non-fiction books       0.526          0.737           0.042           0.000

                      Illustr.       Person          Horse           Chair
Average               per page       per illustr.    per illustr.    per illustr.
fiction books           0.448          1.446           0.103           0.072
non-fiction books       0.566          1.151           0.128           0.035


It needs to be stressed that the distribution of the classes is highly skewed. The most
frequent classes are humans and a few animals. These results currently do not allow a
quantitative tracing of many different motifs through the century.
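   The kind of aggregation behind Table 1 can be sketched as follows. This is an illustrative
reconstruction rather than the authors' script; the example rows and column names are
assumptions, and only the overall procedure (normalizing detection counts per book and
summarizing by median and mean per genre) follows the description above.

```python
# Sketch: summarizing per-book detection counts by genre (illustrative data).
import pandas as pd

# one row per book: genre, page count, illustration count, and per-class
# detection counts (e.g. taken from the recorded Yolo output)
books = pd.DataFrame([
    {"genre": "fiction", "pages": 120, "illustrations": 48, "person": 55, "horse": 1},
    {"genre": "non-fiction", "pages": 95, "illustrations": 50, "person": 37, "horse": 2},
])

books["illustr_per_page"] = books["illustrations"] / books["pages"]
for cls in ["person", "horse"]:
    books[f"{cls}_per_illustr"] = books[cls] / books["illustrations"]

# median and mean per genre, analogous to the two halves of Table 1
summary = books.groupby("genre")[
    ["illustr_per_page", "person_per_illustr", "horse_per_illustr"]
].agg(["median", "mean"])
print(summary)
```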
   For the study of historical print, questions of materiality are of great importance.
Issues of aesthetic design need to be considered in relation to the techniques and print-
ing technologies available. Printing technologies like woodcut, wood engraving and li-
thography allowed different levels of elaboration. Finding the technique is a tedious
task. It is often not stated in the metadata, and it requires experts to identify it from
digitized books.
   Therefore, automatic identification is an important requirement. We trained a
model for distinguishing between three classes and managed to achieve only a reason-
able performance. However, for a statistical analysis, this can be sufficient.
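   A minimal transfer-learning sketch for such a three-class classifier is shown below. The
ImageNet backbone (ResNet-18) and the replaced classification head are illustrative
assumptions, not a description of the model actually trained for Table 2.

```python
# Sketch: transfer learning for a three-class printing-technology classifier
# (e.g. woodcut / wood engraving / lithography).
import torch.nn as nn
from torchvision import models

def build_technique_classifier(num_classes: int = 3) -> nn.Module:
    model = models.resnet18(pretrained=True)  # features learned on ImageNet
    # replace the final layer with a new head for the three technique classes;
    # the backbone can be frozen or fine-tuned depending on the training budget
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```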

          Table 2. Results of 3 models for a 3-class classification of printing technology.
     The first number reports accuracy on the training set, the second number on the test set.




To further analyze the errors and to observe how the algorithm differs from human
experts, we applied a qualitative analysis with class activation mapping (CAM) [17].
The maps show the areas which were relevant for the system to classify the image into a
class. We can see that the algorithm often does not consider content parts of the image
but rather parts of the frame. Experts, in contrast, look at long lines or large areas.
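   A minimal sketch of how such maps can be computed, following the Grad-CAM idea of
[17], is given below. The backbone (ResNet-18) and the chosen layer are assumptions for
illustration, not the configuration used for Figure 2.

```python
# Sketch of Grad-CAM: gradients of the class score with respect to the last
# convolutional feature map are used to weight that map's channels.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(pretrained=True).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["value"] = output
    output.register_hook(lambda grad: gradients.update(value=grad))

model.layer4.register_forward_hook(fwd_hook)  # last convolutional block

def grad_cam(image: torch.Tensor, target_class: int) -> torch.Tensor:
    # image: normalised tensor of shape (1, 3, H, W)
    score = model(image)[0, target_class]
    model.zero_grad()
    score.backward()
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)  # channel weights
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    # upsample the coarse map to the input resolution for visualisation
    return F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                         align_corners=False)[0, 0]
```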




            Fig. 2: Example of a CAM analysis for images from the collection.


5      Outlook

Future research needs to also address stylistic and artistic aspects of illustrations. A
deeper analysis of the content on a page, and in particular of frequent classes (primarily
pictures of humans), offers great potential for advanced analysis tools for digital human-
ists. We intend to develop a scene detection system which allows the study of typical
scene types like family, play, schooling and nature.


Acknowledgements

We thank the Fritz Thyssen Foundation for funding the research project Dis-
tant Viewing. We thank the library of the Technische Universität Braunschweig, the
BBF | Bibliothek für Bildungsgeschichtliche Forschung, Abteilung des DIPF | Leibniz-
Institut für Bildungsforschung und Bildungsinformation, and the Staatsbibliothek Berlin
(Preußischer Kulturbesitz) for providing and facilitating access to their digitized
collections.




References
 1. Bentkowska-Kafel, A. (2015). Debating Digital Art History. In: International Journal for
    Digital Art History. vol. 1 https://doi.org/10.11588/dah.2015.1.21634
 2. Schmideler, S. (2018). Lutherbilder. Ein Streifzug durch die Illustrationsgeschichte der Kin-
    der-und Jugendliteratur des 18. und 19. Jahrhunderts. Die Reformation in der Kinder-und
    Jugendliteratur.
 3. Mandl, T. (2000). Tolerant information retrieval with backpropagation networks. Neural
    Computing & Applications, 9(4), 280-289.
 4. Skansi, S. (2018). Introduction to Deep Learning. Springer.
 5. Saleh, B. & Elgammal, A. (2016). Large-scale Classification of Fine-Art Paintings: Learn-
    ing the Right Metric on The Right Feature. Intl. Journal for Digital Art History, (2).
 6. Crowley, E. & Zisserman, A. (2014). The State of the Art: Object Retrieval in Paintings using
    Discriminative Regions. Proc. British Machine Vision Conference. BMVA Press.
 7. Dunst, A. & Hartel, R. (2018). Auf dem Weg zur Visuellen Stilometrie: Automatische
    Genre- und Autorunterscheidung in graphischen Narrativen. Kritik der digitalen Vernunft.
    5. Tagung „Digital Humanities im deutschsprachigen Raum“ http://dhd2018.uni-
    koeln.de/wp-content/uploads/boa-DHd2018-web-ISBN.pdf
 8. Steiner, A. (2019). Conservatism in an Innovative Field: Children’s Digital Books in Swe-
    den. DHN 2019 Digital Humanities in the Nordic Countries 4th Conference. ceur-
    ws.org/Vol-2364
 9. Lehenmeier, C., Burghardt, M., & Mischka, B. (2020, August). Layout Detection and Table
    Recognition–Recent Challenges in Digitizing Historical Documents and Handwritten Tab-
    ular Data. In International Conference on Theory and Practice of Digital Libraries (pp. 229-
    242). Springer, Cham.
10. Im, C.; Ghauri, J.; Rothman, J. and Mandl, T. (2018). Deep Learning Approaches to Classi-
    fication of Production Technology for 19th Century Books. LWDA, pp. 150-158. http://ceur-
    ws.org/Vol-2191
11. Cimpoi, M., Maji, S., Kokkinos, I. & Vedaldi, A. (2016). Deep filter banks for texture recog-
    nition, description, and segmentation. International Journal of Computer Vision, vol. 118
    (1) pp. 65-94
12. Staatsbibliothek zu Berlin, Preußischer Kulturbesitz. Wegehaupt Digital: https://digital-
    beta.staatsbibliothek-berlin.de/suche?category[0]=Kinder- und Jugendbücher&queryString
    =project%3A"wegehauptdigital".
13. UB TU Braunschweig, Hobrecker Kollektion Online. https://publikationsserver.tu-braun-
    schweig.de/content/collections/childrens_books.xml
14. Jornitz, S., & Kollmann, S. (2004). Ins Bild hinein und aus dem Bild heraus. Anmerkungen
    zu Erfahrungen im Umgang mit einer pädagogischen Bild-Datenbank. MedienPädagogik:
    Zeitschrift für Theorie und Praxis der Medienbildung, 9, 1-17.
15. Mitera, H.; Im, C.; Mandl, T.; Womser-Hacker, C. (2021). Objekterkennung in historischen
    Bilderbüchern: Eine Evaluierung des Potenzials von Computer Vision Algorithmen. In:
    KinderBuch. Reihe Studien zu Kinder- und Jugendliteratur und -medien. J.B. Metzler.
16. Redmon, J.; Divvala, S.; Girshick, R. and Farhadi, A. (2016). You only look once: Unified,
    real-time object detection. Proceedings of the IEEE Conference on Computer Vision and
    Pattern Recognition, pp. 779-788.
17. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-
    cam: Visual explanations from deep networks via gradient-based localization. In Proceed-
    ings of the IEEE International Conference on Computer Vision pp. 618-626.