Deep Learning Approaches to Classification of
        Production Technology for 19th Century Books

             Chanjong Im, Junaid Ghauri, John Rothman, Thomas Mandl

                  University of Hildesheim, Information Science, Germany
                             mandl@uni-hildesheim.de
                            imchan@uni-hildesheim.de


       Abstract. Cultural research is dedicated to understanding the processes of
       knowledge dissemination and the social and technological practices in the book
       industry. Research on children books in the 19th century can be supported by
       computer systems. Specifically, the advances in digital image processing seem
       to offer great opportunities for analyzing and quantifying the visual components
       in the books. The production technology for illustrations in books in the 19th
       century was characterized by a shift from wood or copper engraving to lithog-
       raphy. We report classification experiments which intend to classify images
       based on the production technology. For a classification task that is also
       difficult for humans, the classification quality reaches only around 70%.
       We analyze some further error sources and identify reasons for the low
       performance.


       Keywords: Digital Humanities, Classification, Children Books, Image Pro-
       cessing, CNN.


1      Introduction: Historical Children Book Research

Digital Humanities is having a considerable impact on humanities research related to
text. Many text mining tools have been developed and are currently being applied to
genuine research questions in the humanities. This trend is currently contributing to a
larger variety of methods being used. There has been no comparable paradigm shift in
research related to visual material.
   Historical children's books have not yet often been the subject of digital
humanities (DH) studies. Research on them requires processing of both text and
images. Children's books typically contain more images than books for adults,
which makes them of special interest for image analysis. In addition, they form
a closed category that nevertheless contains sufficient variety.
   Illustrated books have played a significant role in knowledge dissemination. The
declining production costs for printed images have led to a growing exposure of more
and more people to rich visual resources. Research in this area can identify trends in
the objects depicted, as well as the stylistic and aesthetic presentation of the visual
resources [11]. Images also allow the tracing of influence over time by observing
adoptions of content or style.
   For cultural studies, it is necessary to observe efficiently whether the shifts in
production technology in the 19th century have led to aesthetic developments. Modern
technologies like lithography were better suited to mass production and allowed
finer lines.
   This shows that metadata is of great importance for research on children's books.
However, annotated metadata on production technology is not always easily available.
For many books, the title data even provides erroneous information on the production
technology. Some publishers claimed to include copper engravings as a marketing
argument, although in many cases this was not true and the images were produced by,
e.g., lithography.
   As a consequence, one subtask of DH support for cultural studies in general, and
for research on children's books in particular, is the development of robust
classification systems that are capable of identifying the production technology
based exclusively on visual information.


                    Fig. 1: Example of an illustration from the collection


   For humans, the identification of the printing technology used for historic books is
also not a trivial task. Only experts in the area are able to classify images for this goal.
They sometimes need magnifying glasses and the task is often easier when the origi-
nal paper version is available. For the digital version, this classification is typically
even more difficult.
   There are also many different forms of lithography (e.g. chromolithography).
However, for the purposes of our study the identification of lithography in general is
sufficient.


2      State of the Art

To the best of the authors’ knowledge, this is the first attempt to classify the printing
technology using images from books. Previous work has focused on identifying the
positions of text and image blocks within a page. The HBA data challenge for old
books intends to improve algorithms for this task [8]. These books are much older
than those in the Hobrecker collection and also include manuscripts.
   Most research in image processing is currently carried out on photographs. Such
collections differ greatly from the non-realistic drawings and illustrations found
in children's books. It is unclear how CNNs and other models optimized for
photographs perform on these images.
   In the analysis of art, too, there is little work on mining approaches. Few
researchers have processed large amounts of images in this field. One experiment by
Saleh & Elgammal is dedicated to identifying the painter of an artistic work. Such
work is highly dependent on the type of paintings in the collection [10].
   A recent project is focusing on graphic novels. Current state of the art CNNs are
applied to tasks like author identification with very good success [6]. In addition, the
processing is aimed at measuring the drawing style of a graphic novel in order to find
similar books. However, the production technology is not relevant for these modern
books.
   The work that seems to be most similar to the classification of historical printing
technologies is research in the area of texture recognition. Some authors manage to
classify material from visual data. The materials are different types of plastic. The
algorithm learns the typical texture structure of each material based on magnified
data. For a material classification task, narrow structures for a CNN were used [1].
There has also been work on applying deeper networks and transfer learning to
material classification, for example by Cimpoi et al. [4]. Some authors achieved
very successful results, with an accuracy of 99 to 99.9 percent on a specially
constructed dataset, i.e. CUReT [5].


3      Data Collection and Classification

One of the first goals for research on images in historical children's books is the
identification of production technologies. As a classification problem with few
classes, it seems like a challenge that could be solved with current technology.
   Our primary assumption is that differences in printing technology can be observed
only at the detail level. Therefore, the images are not resized. Instead, small parts
of the images are extracted at full resolution in order to retain the minute details
of the printing. It is much less relevant to ensure that all objects are fully
contained in an image, because the problem does not seem to be related to object
identification.
   Consequently, the few available training images were cropped into small parts
without resizing them, which also yields more training examples.
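
   The cropping strategy described above can be illustrated with a minimal sketch.
It assumes a non-overlapping grid of square tiles and uses the crop size of
128 * 128 pixels given in Section 3.1. The function name tile_image and the random
test image are hypothetical and only serve the illustration; the extraction actually
used for the experiments may differ.

```python
import numpy as np


def tile_image(image: np.ndarray, tile: int = 128) -> list:
    """Cut an image array into non-overlapping tile x tile crops.

    No resizing or interpolation is applied, so every crop keeps the
    original scan resolution. Border strips narrower than `tile` are dropped.
    """
    height, width = image.shape[:2]
    crops = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            crops.append(image[top:top + tile, left:left + tile])
    return crops


# Hypothetical 300 x 400 grayscale scan filled with random pixel values.
rng = np.random.default_rng(0)
scan = rng.integers(0, 256, size=(300, 400), dtype=np.uint8)
crops = tile_image(scan, tile=128)
print(len(crops), crops[0].shape)
```

   A single illustration yields several crops in this way, which is how a small
number of labelled images can produce a larger number of training examples.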


3.1    Data Collection
The data collection is based on the Hobrecker collection. Karl Hobrecker was a col-
lector of children books. His books are now archived in the library of the Technical
University of Braunschweig. A subset has been digitized and is available online [13].
The collection is of great interest for cultural research. It contains different types of
children books mainly from the 19th century: e.g. alphabetization books, biographies,
natural history descriptions as well as adventure and travel stories. The scanned
Hobrecker collection of children’s books available for digital analysis contains around
350 books. There is no knowledge about the distribution of printing technologies
within the entire set. It would require too much human effort to manually classify all
books.
    Out of these, 32 books were carefully labeled by experts and are used for the clas-
sification tasks. The experts classified the books into either wood engraving or lithog-
raphy.


               Fig. 2: Example crops obtained using slicing and a contour method


The images in the scanned books were separated from the text by using image
preprocessing techniques as described by Ban [2]. To minimize the loss of pixel
information, all images are cropped into parts of 128 * 128 pixels. In order to
avoid a class imbalance problem, the number of crops is made equal for both classes
by uniformly sampling from each book. Example crops are shown in Fig. 2.
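
   The balancing step can be sketched as follows. The sketch assumes that the crops
are grouped per book and that each class receives the same quota, which is spread as
evenly as possible over the books of that class. The data layout, the function name
sample_balanced and the toy book identifiers are hypothetical.

```python
import random


def sample_balanced(crops_by_book, labels, per_class, seed=0):
    """Draw `per_class` crops for every label, spread evenly over its books.

    crops_by_book: dict book_id -> list of crop identifiers (assumed layout)
    labels:        dict book_id -> class label
    Assumes every book holds at least as many crops as its share of the quota.
    """
    rng = random.Random(seed)
    selected = {}
    for label in sorted(set(labels.values())):
        books = sorted(b for b in crops_by_book if labels[b] == label)
        base, extra = divmod(per_class, len(books))
        picked = []
        for i, book in enumerate(books):
            quota = base + (1 if i < extra else 0)
            # random.sample draws uniformly at random without replacement
            picked.extend(rng.sample(crops_by_book[book], quota))
        selected[label] = picked
    return selected


# Toy data: two wood engraving books and one lithography book.
crops_by_book = {
    "book_a": [f"a_{i}" for i in range(40)],
    "book_b": [f"b_{i}" for i in range(25)],
    "book_c": [f"c_{i}" for i in range(60)],
}
labels = {
    "book_a": "wood_engraving",
    "book_b": "wood_engraving",
    "book_c": "lithography",
}
balanced = sample_balanced(crops_by_book, labels, per_class=30)
print({label: len(items) for label, items in balanced.items()})
```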

   Table 1 shows some general statistics of the 32 books sorted by the two printing
technologies.


                               Table 1. Training set statistics.

       Printing type      Number of books      Number of images    Number of crops
       Wood Engraving     14                   349                 2235
       Lithography        18                   173                 2235


3.2    Classification
Convolutional Neural Networks (CNNs), the current state-of-the-art technology, are
known to be very effective in automated feature detection and subsequent
classification in many domains [e.g. 14,15]. In the approach presented in this paper,
CNNs are used as the processing model for the printing type classification.
   In similar tasks, material classification has been conducted with a narrow CNN
structure [1]. Cimpoi et al. conducted material classification with a deeper
structure and transfer learning [4].
   The major difference between the datasets used in material classification and the
Hobrecker dataset is that the Hobrecker dataset contains many more noisy features,
due to the nature of scanned images from old books, which are often compromised by
age. The datasets used for material classification were created for the main purpose
of material analysis, whereas the material of the books is always paper. However,
the quality and type of the paper differ greatly.
   Two methods are used for the task. The first uses a pre-trained model. We applied
the Inception network as suggested by Szegedy et al. [12] for feature extraction and
fed these features into fully connected neural networks (FCN) as well as Support
Vector Machines (SVM) for classification. The second method uses a slim CNN
architecture, similar to the one used by Ban [2], and trains the model from a
randomized weight initialization. Several modifications are made on top of this
model. Firstly, the input size is increased from 64 * 64 to 128 * 128. Secondly,
two kinds of deeper networks with more convolution and pooling layers are tested.
The architectures are shown in Table 2.
   These architectures were chosen for the following reason. The size of the feature
space to be learned is unclear, so we initially selected two different filter sizes
to account for that.
   For the smaller filter size, two different layer structures were applied. The first
one uses one pooling layer for each convolution layer. The second one uses fewer
pooling layers.


Table 2. CNN architectures used for printing type classification. The value in the brackets
of Conv is the kernel size; for example, Conv(11) is a convolution layer with an (11*11) sized
kernel. * followed by a number gives the number of repetitions of the same layer or group of
layers in the flow. Pool is average pooling with a (2*2) sized kernel and a stride of 2.

Name                             Architecture
Big-Filters                      Conv(11)-Pool-Conv(10)-Pool-Conv(6)-Pool-Conv(3)-Pool-Conv(3)-Pool
Small-Filters-Less Pooling       Conv(3)*5-Pool-Conv(3)*4-Conv(2)-Pool-Conv(6)-Pool-Conv(3)-Pool-Conv(3)-Pool
Small-Filters-Balanced Pooling   Conv(3)-Pool-Conv(4)-Pool-(Conv(3)-Pool)*3-Conv(2)
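
   The layer sequences in Table 2 can be traced with a short sketch that computes the
side length of the feature map after every layer for a 128 * 128 input. The sketch
assumes convolutions with stride 1 and no padding, which is not stated in the paper;
the helper names trace_sizes and conv are hypothetical. The number of filters per
layer is not given in the paper and is therefore not modelled.

```python
def trace_sizes(layers, size=128):
    """Return the feature-map side length before and after each layer.

    layers: sequence of ("conv", kernel) or ("pool", kernel) tuples.
    Assumptions: convolutions use stride 1 and no padding; pooling uses
    the given window with a stride of 2.
    """
    sizes = [size]
    for kind, kernel in layers:
        if kind == "conv":
            size = size - kernel + 1
        elif kind == "pool":
            size = (size - kernel) // 2 + 1
        else:
            raise ValueError(f"unknown layer type: {kind}")
        if size < 1:
            raise ValueError("feature map collapsed below 1x1")
        sizes.append(size)
    return sizes


def conv(kernel, repeat=1):
    """Build `repeat` consecutive convolution layers with the same kernel."""
    return [("conv", kernel)] * repeat


POOL = [("pool", 2)]

ARCHITECTURES = {
    "Big-Filters": (
        conv(11) + POOL + conv(10) + POOL + conv(6) + POOL
        + conv(3) + POOL + conv(3) + POOL
    ),
    "Small-Filters-Less Pooling": (
        conv(3, 5) + POOL + conv(3, 4) + conv(2) + POOL + conv(6) + POOL
        + conv(3) + POOL + conv(3) + POOL
    ),
    "Small-Filters-Balanced Pooling": (
        conv(3) + POOL + conv(4) + POOL + (conv(3) + POOL) * 3 + conv(2)
    ),
}

for name, layers in ARCHITECTURES.items():
    print(name, trace_sizes(layers))
```

   Under these assumptions, all three layer sequences reduce a 128 * 128 crop to a
single 1 * 1 position. This is consistent with unpadded convolutions, but it remains
an inference from the kernel sizes rather than a documented design decision.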


4       Results and analysis

4.1     Results
Unlike the results reported for CUReT [5], the transfer learning results turned out to
be poor, with a maximum accuracy of 58%. This accuracy is achieved when an FCN with
two hidden layers of 512 nodes each is trained on the features produced by the
pre-trained Inception model. Replacing the FCN with an SVM decreased the performance
to 47% for a linear kernel and 52% for a non-linear kernel.
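
   The structure of such a classifier head can be sketched in NumPy. The sketch shows
only the forward pass of a network with two hidden layers of 512 nodes and two output
classes. The ReLU activation, the softmax output, the weight initialization and the
feature dimension of 1024 are assumptions, and random vectors stand in for the
features of the pre-trained model. Training is omitted.

```python
import numpy as np


def init_head(feature_dim, hidden=512, classes=2, seed=0):
    """Randomly initialise weights for two hidden layers plus an output layer."""
    rng = np.random.default_rng(seed)
    dims = [feature_dim, hidden, hidden, classes]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def forward(params, features):
    """Map (n, feature_dim) features to (n, classes) class probabilities."""
    x = features
    for weights, bias in params[:-1]:
        x = np.maximum(x @ weights + bias, 0.0)  # ReLU (assumed activation)
    weights, bias = params[-1]
    logits = x @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Stand-in for features of a pre-trained network; 1024 is an assumed size.
features = np.random.default_rng(1).normal(size=(8, 1024))
params = init_head(feature_dim=1024)
probs = forward(params, features)
print(probs.shape)
```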
   The model from Ban [2] reached an accuracy of 47%, and a similar model adopted from
Mehri et al. [8] reached 51% at most. The Small-Filters-Less Pooling architecture
achieved 61% accuracy, and Big-Filters showed the best performance with 63%
classification accuracy. Small-Filters-Balanced Pooling showed the lowest performance
among the architectures which were tested, with 48%. An overview of the results can
be seen in Table 3.

                          Table 3. Results of the CNN architectures

       Name                                           Accuracy
       Big-Filters                                    63%
       Inception Model with Neural Networks           58%
       Inception Model with linear SVM                47%
       Inception Model with non-linear SVM            52%
       Small-Filters-Less Pooling                     61%
       Small-Filters-Balanced Pooling                 48%


    Some examples of misclassified images are shown in Fig. 3.


                        Fig. 3. Three misclassified examples.


4.2    Analysis and Discussion
There are three potential reasons for the low accuracy we obtained in the results.
   Noisy dataset: The nature of scanned files leads to difficulties in automated
       image extraction and preprocessing. For example, in the left image in
       Fig. 2, a part of a character is cropped from the scanned file. Such crops
       are not eliminated from the dataset because background noise confuses the
       algorithms used in the extraction phase. Moreover, the condition of the
       scanned paper is not equal across books. For example, some books are in
       very good shape, with no sign of text marks printed on top of the images.
       CNNs react very sensitively to this noise when small filters are used along
       with pooling methods. Using larger filters at the initial stage of the
       convolution operations gives the model some flexibility with respect to
       the noise.
   The original assumption about crops might not be correct. Rather, full
       knowledge of an image might be required for classification. This also seems
       to be relevant for humans. Some of the misclassified cropped images were
       presented to experts, and they could not easily distinguish between the
       printing technologies. They needed to see the full image. For future
       experiments, it is intended to work with full images. It seems that the
       experts need to see minute details as shown in the crops, but also the
       overall flow of lines, in order to reach a definite decision. As a
       consequence, future classification experiments will consider the output of
       high and low levels within the CNN.
   Insufficient amount of data: To perform tasks with deep learning, more data is
       needed for training.
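
   One possible way of combining the output of low and high levels within a CNN is
sketched below. It is only an illustration of the general idea and not the planned
implementation. Feature maps from an early and a late layer are reduced by global
average pooling and concatenated into one descriptor. The feature map shapes and the
function names are hypothetical, and random arrays stand in for real layer outputs.

```python
import numpy as np


def global_average_pool(feature_map):
    """Reduce a (height, width, channels) feature map to a (channels,) vector."""
    return feature_map.mean(axis=(0, 1))


def multi_level_descriptor(low_level, high_level):
    """Concatenate pooled low-level and high-level feature maps."""
    return np.concatenate(
        [global_average_pool(low_level), global_average_pool(high_level)]
    )


rng = np.random.default_rng(0)
low = rng.normal(size=(59, 59, 32))   # hypothetical early-layer output
high = rng.normal(size=(4, 4, 128))   # hypothetical late-layer output
descriptor = multi_level_descriptor(low, high)
print(descriptor.shape)
```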


5         Outlook

   To improve the identification of the printing type, more data is being collected.
Labelled images from the Pictura Paedagogica Online (PPO) [3] have been obtained and
will be processed.
   Currently, experiments with object recognition on the collection are being carried
out with YOLO [9], using a model pre-trained on photographs. It is of interest
whether it also works for non-realistic images as we find them in historic books.
First results indicate that the performance differs strongly between individual
classes. Object recognition of satisfying quality would allow humanities scholars to
trace the frequency of objects and the introduction of new knowledge to children’s
books.


Acknowledgements
We thank the Fritz Thyssen Foundation for their funding for the research project Dis-
tant Viewing. We would like to thank the library of the Technische Universität Braun-
schweig for facilitating access to the digitized Hobrecker collection.


References
    1. Andrearczyk, V., Whelan, P. F. (2016). Using filter banks in convolutional neural net-
       works for texture classification. Pattern Recognition Letters, vol. 84, pp. 63-69.
    2. Ban, J. (2018). Image Classification Optimization by Image pre-processing using CNN.
       Master Thesis, Hildesheim University.
    3. Bibliothek für Bildungsgeschichtliche Forschung. Pictura Paedagogica Online.
       http://opac.bbf.dipf.de/virtuellesbildarchiv/
    4. Cimpoi, M., Maji, S., Kokkinos, I., Vedaldi, A. (2016). Deep filter banks for texture
       recognition, description, and segmentation. International Journal of Computer Vision, vol.
       118 (1) pp. 65-94
    5. CUReT Homepage, http://www.cs.columbia.edu/CAVE/software/curet/html/sample.html,
       last accessed 2018/05/31.
    6. Dunst, A., & Hartel, R. (2018). Auf dem Weg zur Visuellen Stilometrie: Automatische
       Genre- und Autorunterscheidung in graphischen Narrativen. Kritik der digitalen
       Vernunft. 5. Tagung „Digital Humanities im deutschsprachigen Raum“.
       http://dhd2018.uni-koeln.de/wp-content/uploads/boa-DHd2018-web-ISBN.pdf
    7. Mehri, M., Gomez-Krämer, P., Heroux, P., Boucher, A., Mullot, R. (2017). A
       texture-based pixel labeling approach for historical books. PAA, pp. 325–364.
    8. Mehri, M., Heroux, P., Gomez-Krämer, P., Mullot, R. (2017). Texture feature
       benchmarking and evaluation for historical document image analysis. International
       Journal on Document Analysis and Recognition (IJDAR), pp. 325–364.
    9. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified,
       real-time object detection. In Proceedings of the IEEE conference on computer vision and
       pattern recognition (pp. 779-788).
10. Saleh, B., and Elgammal, A. (2015). A unified framework for painting classification. In
    Proceedings of the IEEE International Conference on Data Mining Workshop (ICDMW)
    pp. 1254-1261
11. Schmideler, S. (2014). Das bildende Bild, das unterhaltende Bild, das bewegte Bild – Zur
    Codalität und Medialität in der Wissen vermittelnden Kinder- und Jugendliteratur des
    Hybridisierung – Intermedialität – Konvergenz. Hrsg. von Gina Weinkauff u.a. Frankfurt
    am Main (= Kinder- und Jugendkultur, -literatur und -medien 89), pp. 13-26.
12. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D. and Rabinovich, A.
    (2015): Going deeper with convolutions. Proceedings of the IEEE conference on computer
    vision and pattern recognition (pp. 1-9)
13. UB TU Braunschweig, Hobrecker Kollektion Online.
    https://publikationsserver.tu-braunschweig.de/servlets/solr/select?q=category.top%3ADDC%3A398+AND+state%3Apublished
14. Goeau, H., Bonnet, P., and Joly, A. (2017, September). Plant identification based on noisy
    web data: the amazing performance of deep learning (LifeCLEF 2017). In CLEF 2017-
    Conference and Labs of the Evaluation Forum (pp. 1-13).
15. Rouhi, R., Jafari, M., Kasaei, S., and Keshavarzian, P. (2015). Benign and malignant
    breast tumors classification based on region growing and CNN segmentation. Expert Sys-
    tems with Applications, 42(3), 990-1002.