Generic Semantic Segmentation of Historical Maps
Rémi Petitpierre¹, Frédéric Kaplan² and Isabella di Lenardo¹
¹ Institute for Area and Global Studies, EPFL, Lausanne, Switzerland
² Digital Humanities Laboratory, EPFL, Lausanne, Switzerland


                                 Abstract
                                 Research in automatic map processing is largely focused on homogeneous corpora or even individual
                                 maps, leading to inflexible models. Based on two new corpora, the first one centered on maps of
                                 Paris and the second one gathering maps of cities from all over the world, we present a method
                                 for computing the figurative diversity of cartographic collections. In a second step, we discuss
                                 the actual opportunities for CNN-based semantic segmentation of historical city maps. Through
                                 several experiments, we analyze the impact of figurative and cultural diversity on the segmentation
                                 performance. Finally, we highlight the potential for large-scale and generic algorithms. Training
                                 data and code of the described algorithms are made open-source and published with this article.

                                 Keywords
                                 historical map processing, neural networks, semantic segmentation, computer vision, topology


1. Introduction
The creation of large digital databases on urban development is a strategic challenge, which
could open new perspectives in urban planning, environmental sciences, sociology, economics,
and analysis of urban ecosystems in general [16, 22]. Digital geohistorical data can also be
valorized by cultural institutions, for instance in the form of 3D/4D models [28, 29]. When
the data are of good quality, and when large and homogeneous corpora are considered, it is
possible to obtain excellent segmentation results with traditional computer vision and decision
algorithms [37, 38]. However, these algorithms are very specific. They are based on a detailed
and exact knowledge of the processed map and its figuration [10]. These inflexible methods
are therefore only practical for large and very homogeneous cartographic
corpora. Consequently, map vectorization is still largely manual, despite being extremely
time-consuming. In order to process the immense and diverse cartographic collections hosted
by heritage institutions around the world, the development of generic and automatic tools is
required.
   This research intends to lay the foundations for a generic approach to the semantic
segmentation of historical maps. The ambition is to design a system capable of processing map corpora
characterized by both graphical and content heterogeneity. Recent progress on convolutional
neural networks (CNNs) tends to support the idea that genericity can be achieved, in particular
for segmentation tasks [42]. However, the challenges of generic semantic interpretation (or

CHR 2021: Computational Humanities Research Conference, November 17–19, 2021, Amsterdam, The
Netherlands
Email: remi.petitpierre@epfl.ch (R. Petitpierre); frederic.kaplan@epfl.ch (F. Kaplan); isabella.dilenardo@epfl.ch
(I.d. Lenardo)
ORCID: 0000-0001-9138-6727 (R. Petitpierre); 0000-0002-6991-5730 (F. Kaplan); 0000-0002-1747-9164 (I.d.
Lenardo)
                               © 2021 Copyright for this paper by its authors.
                               Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                               CEUR Workshop Proceedings (CEUR-WS.org)


representational flexibility [24]) of graphical objects produced in diverse technical and cultural
contexts remain largely unsolved. In the following sections, we experimentally map the design
space for such a generic processing pipeline and present first working prototypes tested on two
large map corpora.

1.1. Previous Works
Classical segmentation algorithms are based on the specific knowledge of a map collection.
Despite color appearing relatively late in map printing processes [47], this graphical component
is frequently used. Some specialists even consider color-based pre-processing an “essential” task
[36]. The simplest paradigm in this regard is color thresholding [12, 15]. Other studies add a
morphological approach, using region growing algorithms [9, 31, 30], or rely on human feedback
[11]. In deep learning methods, region growing is also used to flood fill and add a semantic layer
to extracted polygons using watershed [41]. Another salient graphical component is texture.
Hatched areas are particularly targeted [56, 4, 39]. Other methods focus on texture energy
[34]. These approaches in general have met some success on textured maps [36, 51, 55]. Unlike
color, however, textures have many degrees of freedom, such as size and rotation. Their use
for segmentation therefore requires fine parametrization.
   Beyond pure graphical marks, some studies focus on detecting morphological features,
such as lines [34], or closed polygons [35]. These approaches are generally confronted with the
problem of incomplete lines, due to the degradation of the document or graphical choices (e.g.
dashed lines) and therefore require the development of reconstruction algorithms [33, 2, 25,
49]. Moreover, the extraction of the geometries is sensitive to information overlay, which is
extremely frequent in cartography. Many specific methods were therefore developed to detect
and eventually remove every disruption, such as the map grid [26], the background texture
[34], the text [1], or the symbols [57]. The detection of these interfering elements generally
requires precise knowledge of their nature and their visual characteristics: size, shape, texture,
color, etc. [19]. Finally, the extraction of the morphology alone is still unable to add a semantic
layer to the map content. For instance, a rectangle might well be a building, but it can equally
well be a courtyard or a basin.
   More recently, CNNs opened up new perspectives for the resolution of semantic segmentation
problems [32]. In 2019, a first successful transfer was presented with the segmentation of parcels
from “Napoleonic” cadastral maps of Venice [41], using a UNet architecture with a ResNet
encoder. In 2020, further results were obtained on the extraction of the railway network from
some USGS maps, using a modified PSPNet as encoder [8]. Heitzler and Hurni also presented
a study on the extraction of building footprints from the Swiss Siegfried national maps
(1872–1949) [21]. Other works focused on extracting specific elements, such as places of
archaeological interest [17], or road types [6]. As this research question is topical, the ICDAR
2021 conference dedicated a competition to the segmentation of building blocks on a corpus
of maps from the Bibliothèque historique de la Ville de Paris [7]. One solution in particular
stood out by proposing to use a DenseNet [23] architecture.
   On the other hand, [54] proposed a method to operationalize cartographic figuration based
on moments of color histograms. The vector projection of these descriptors allows one to
efficiently visualize the figurative conventions of a map or a corpus of maps. Subsequently,
a texture extraction method has also been proposed [53]. Other efforts to operationalize
cartographic figuration have also been carried out concerning graphic map load, i.e. the visual
density of cartographic content [3, 52]. These studies are part of the field of information
theory.

1.2. Research approach
In this research, we seek to better understand the impact of figuration and figurative diversity
on the learning of neural networks, in the context of the semantic segmentation of historical
city maps. Thus, our research questions are the following: 1) How to measure the figurative
diversity in a map corpus? 2) How robust are CNNs when facing high figurative diversity?
   To answer these research questions, we first present a method to operationalize the carto-
graphic figuration and measure the figurative diversity of a map corpus. We demonstrate the
significant variability of our data, in comparison with other map corpora commonly found in
the literature. Then, we propose an effective processing pipeline involving pre-segmentation of
the map frame and semantic segmentation of the map itself. We highlight the potentials and
the limitations of neural networks for solving generic semantic segmentation problems. Finally,
we conduct a set of experiments that allow us to investigate the learning mechanisms deployed by
neural networks and to explore the design space of map semantic segmentation. Specifically,
we seek to qualify the impact of learning transfer on each class. We then aim to evaluate the
prospects for an informed corpus constitution, through examination of cross-cultural
performance and confidence prediction. Lastly, we challenge the importance of graphical cues,
in comparison to non-graphical concepts.

1.3. Dataset
To create an experimental context that challenges the genericity of the semantic segmentation,
we contrast two corpora that present a different kind of variability. The first corpus gathers
330 maps of Paris from the collections of the Bibliothèque nationale de France (BnF) and the
Bibliothèque historique de la Ville de Paris. Most of the maps were published between 1800 and
1950 in the Paris region, and their scale generally ranges between 1:25’000 and 1:2’000.
The second corpus gathers 256 maps of cities from all over the world, including numerous
reference maps. They come from 32 different collections, the main ones being: the BnF,
the Library of Congress, the Harvard Library, the David Rumsey Collection, the University
of Bordeaux, the British Library, the Boston Public Library and the Institut Cartogràfic i
Geològic de Catalunya. In total, the corpus represents 182 different cities in 90 countries.
The distribution across the various regions is balanced, except for Oceania (8 maps only).
The regions with the most maps are Eastern Europe & Central Asia (34), Western Europe
(34), and East Asia (30). Conversely, the regions accounting for the fewest maps are Oceania,
Sub-Saharan Africa (15), and the Middle East (17). The urban form of each map in the World
corpus was manually classified into three categories: regular (95 maps), irregular (68), or mixed
(93). Most maps were published between 1720 and 1950. However, due to historical reasons,
most non-Western publications occurred after 1800.
   Both corpora present specific difficulties. For the Paris corpus (Fig. 1), one of the most
complex issues to address is information cluttering or overlay, in particular the
superimposition of information concerning mobility, such as the road or underground network (Fig. 1
A.2-3), but also the water system, the catacombs, or more simply the administrative
divisions (Fig. 1 B.3-4). For Paris, this intensive use of the city map as a tool for planning
urban works was likely caused by the lack of a proper cadastre before the late 19th century.
Low contrast may also make the map images difficult to read (Fig. 1 B.3, C.4). For the World


Figure 1: Map samples from the Paris corpus (grid of panels; rows A–C, columns 1–4).


corpus, besides the extreme cultural diversity, the complexity is due to differences in scale
(Fig. 2 A.1-2), in formality (Fig. 2 A.3-4, B.1), in the graphical emphasis of monuments (Fig.
2 B.2-4), in the use of writing to represent areas or zones (Fig. 2 C.1-2), and in the
regularity of urban form (Fig. 2 C.3-4, D.1), as well as to figurative specificities, such as field
shadowing (Fig. 2 D.2) or blueprint style (Fig. 2 D.3), to information overlay (Fig. 2 D.4), to digitization
imperfections or poor scanning quality (Fig. 2 E.1-2), and to material alterations (Fig. 2
E.3-4).
   To summarize, we will be comparing two corpora. The first one, centered on Paris, is
culturally and geographically homogeneous, but figuratively diverse, while the second, which
will be called the World corpus, is culturally, geographically, and figuratively heterogeneous.
   To constitute the training set, 330 patches of 1000 × 1000 pixels are randomly cut out
of each map in the Paris corpus. 30 are then selected for the validation set. For the World corpus, 256
random patches are cut out of each map for the training set, and 49 additional patches are cut
for the validation set. Four maps from the World corpus, whose resolution was too low, were
upsampled by a factor of 2 using CNN-based super-resolution [5]. The patches are then manually
annotated using raster-based software. The total annotation time is estimated at 7 weeks
of work. A simple 3-class ontology and an extended 5-class one are defined. The classes
from the simple ontology correspond to: the road network, including railroads and bridges;

Figure 2: Map samples from the World corpus (grid of panels; rows A–E, columns 1–4).

the frame, i.e. the non-geographic content of the document; and the map content, i.e. all the
geographic content that does not belong to the road network. In the extended ontology, the
map content was subdivided into 3 additional classes: the blocks, the water, and the non-built,
which includes all non-aquatic unbuilt land except the road network, i.e. wasteland,
meadows, crops, forests, but also parks and inner courtyards. Tab. 1 summarizes the
distribution of classes in the different corpora and sets. The training patches are open-source
and freely available online [45].
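
The patch sampling described above can be sketched as follows. This is only an illustration, not the authors' code: the function name `random_patch_box` and the uniform choice of the top-left corner are assumptions, and only the patch size of 1000 pixels comes from the text.

```python
import random

def random_patch_box(width, height, size=1000, rng=random):
    """Return (left, top, right, bottom) of a random square patch.

    Hypothetical helper: the top-left corner is drawn uniformly so that
    the whole patch fits inside a map image of width x height pixels.
    """
    if width < size or height < size:
        raise ValueError("the map image is smaller than the patch size")
    left = rng.randint(0, width - size)   # randint is inclusive on both ends
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size

# Example: one patch from a 6000 x 4000 pixel scan.
box = random_patch_box(6000, 4000, rng=random.Random(0))
```

Maps smaller than the patch size are rejected here; the text instead reports that four low-resolution maps were upsampled beforehand.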

Table 1
Proportion of areas covered in the different training and validation sets.
Corpus   Set     Frame    Road netw.   Blocks   Water    Non-built
World    train   0.222    0.123        0.175    0.121    0.359
World    val     0.185    0.114        0.178    0.073    0.450
Paris    train   0.236    0.208        0.338    0.026    0.192
Paris    val     0.166    0.204        0.360    0.045    0.226

Corpus   Set     Frame    Road netw.   Map content
World    train   0.226    0.123        0.651
World    val     0.181    0.204        0.701
Paris    train   0.236    0.206        0.559
Paris    val     0.166    0.201        0.633

  As in [7], we consider the segmentation of the map frame to be a separate
problem, mainly for reasons of scale. This step is therefore carried out beforehand, and the
pre-segmented background class is indicated as +1 in the following. The pre-segmentation simply
takes the form of a mask applied on the input images.


2. Methods
2.1. Operationalization of the figuration
To make cartographic figuration measurable, we extracted 3 sets of descriptors related to color,
texture, and orientation. The color and texture features are based on previous works by Uhl
et al. [54, 53]. First, the mean and standard deviation are computed on the distribution of each
color channel. Then, a 256-bin histogram is extracted for each channel, and the skewness
and kurtosis of the distribution are computed. The mean is used to transcribe the hue and
the color value, while the standard deviation indicates the contrasts. The skewness is a descriptor
of the asymmetry of the color distribution, while the kurtosis is a flattening coefficient of the
curve and therefore also describes the contrasts. Color standard deviation, kurtosis
and skewness are summed over the 3 channels. Second, local binary patterns (LBP, [40]) are
computed, using a radius of 2, on the Otsu-binarized images [43]. A 4-bin histogram is
extracted from the LBP values. LBP are invariant to color value (i.e. brightness) and rotation.
They can help differentiate between edges, corners or flat surfaces at a local scale and thus
characterize the texture at a larger scale. Third, a 24-bin histogram of oriented gradients
(HOG, [13]) is computed. Each bin corresponds to a certain orientation angle of the image
local gradients. The bins are ultimately grouped into 5 categories, summarizing the
orientation of the gradients: vertical (±π), horizontal (±π/2, ±3π/2), diagonal (±π/4, ±3π/4),
regular oblique (±π/6, ±2π/6, ±4π/6, ±5π/6), and irregular oblique (all other orientations).
At this point, there are 14 features in total: 6 (3+3) color descriptors, 4 LBP descriptors, and


Figure 3: t-SNE projection of the figuration descriptive features of the Plan of the City of Moka.


5 HOG descriptors. Together, these descriptors can characterize the cartographic figuration,
as can be seen in Fig. 3.
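
As an illustration of the color descriptors described in this subsection, the following minimal sketch computes the six color features of a sub-patch. It rests on several assumptions: the moments are computed directly on the pixel values rather than on a 256-bin histogram, the kurtosis is the non-excess (Pearson) version, and the function names `moments` and `color_descriptors` are hypothetical. The LBP and HOG descriptors are not covered.

```python
import math

def moments(values):
    """Mean, standard deviation, skewness and Pearson kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # Assumption: a perfectly flat channel gets zero skewness and kurtosis.
        return mean, 0.0, 0.0, 0.0
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return mean, std, skew, kurt

def color_descriptors(pixels):
    """pixels: list of (r, g, b) tuples with values in 0..255.

    Returns 6 features: the mean of each channel (3 values), followed by
    the standard deviation, skewness and kurtosis, each summed over the
    3 channels (3 values).
    """
    per_channel = [moments([p[c] for p in pixels]) for c in range(3)]
    means = [m[0] for m in per_channel]
    std_sum = sum(m[1] for m in per_channel)
    skew_sum = sum(m[2] for m in per_channel)
    kurt_sum = sum(m[3] for m in per_channel)
    return means + [std_sum, skew_sum, kurt_sum]

# Example: a sub-patch made of one black and one white pixel.
features = color_descriptors([(0, 0, 0), (255, 255, 255)])
```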
   These visual features are computed, using 50x50 sub-patches, for each image of both corpora,
as well as for the USGS [8], the Napoleonic cadaster [41], and the ICDAR21 dataset [7], for
comparison purposes. On the one hand, the inter-corpora inter-class correlation is computed
between the Paris and World datasets, as well as the intra-corpus inter-class correlation within
each corpus.
   On the other hand, a 32-bin histogram is computed on the distribution of each feature.
As the distributions can be multimodal, due to repetitive homogeneous figuration, the modes
are extracted by smoothing the histogram with a Savitzky-Golay filter [50] of width 3 and
polynomial degree 1. The local minima are identified on a window of width 3 and the histogram
is split between each mode. To investigate the first research question, we want to determine
how much these features vary in a map corpus. To this end, a κ-coefficient is defined as the
proportionally weighted sum of the kurtosis of each mode of the distribution. In other words,
the κ-coefficient is a measure of the acuteness of the feature distribution in a map corpus, and
can thus characterize the homogeneity of the figuration. As the value of the κ-coefficient can
vary according to the size of the corpus, the bigger sample sets were randomly downsampled,
without replacement, to the size of the smallest sample set (here the Napoleonic cadaster set).
The κ-coefficient was then computed 5000 times for various downsampling schemes and, for
each feature, the median κ-coefficient was retained as an estimator of the real κ-coefficient.
The bias of this recalibration is below ±3.6% for the World, and below ±2.0% for Paris, with
a confidence of 95%. The code of the described algorithms is made open-source and published
with this article [44].
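
The κ-coefficient computation described above can be sketched as follows. This is a simplified reading of the text, not the published implementation [44]. Assumptions: the kurtosis is the non-excess (Pearson) version computed on the samples falling in each mode, the two end bins of the histogram are left unsmoothed, a bin identified as a local minimum opens the next mode, and all function names are hypothetical. A Savitzky-Golay filter of width 3 and degree 1 reduces to a 3-point moving average on interior bins, which is what `smooth` computes.

```python
import random
import statistics

def pearson_kurtosis(values):
    """Fourth standardized moment (3.0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # assumption: a degenerate mode contributes nothing
    return sum((v - mean) ** 4 for v in values) / (n * var ** 2)

def bin_indices(values, bins=32):
    """Return the histogram counts and the bin index of every value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    index = [min(int((v - lo) / width), bins - 1) for v in values]
    counts = [index.count(b) for b in range(bins)]
    return counts, index

def smooth(counts):
    """3-point moving average on interior bins; end bins are unchanged."""
    out = [float(c) for c in counts]
    for i in range(1, len(counts) - 1):
        out[i] = (counts[i - 1] + counts[i] + counts[i + 1]) / 3
    return out

def local_minima(smoothed):
    """Bins lower than their left neighbour and not higher than the right one."""
    return [i for i in range(1, len(smoothed) - 1)
            if smoothed[i - 1] > smoothed[i] <= smoothed[i + 1]]

def kappa(values, bins=32):
    """Kurtosis of each mode, weighted by the share of samples in that mode."""
    counts, index = bin_indices(values, bins)
    cuts = local_minima(smooth(counts))
    modes = {}
    for v, b in zip(values, index):
        mode_id = sum(1 for c in cuts if c <= b)  # a cut bin opens a new mode
        modes.setdefault(mode_id, []).append(v)
    n = len(values)
    return sum(len(m) / n * pearson_kurtosis(m) for m in modes.values())

def median_kappa(values, size, repeats=5000, seed=0):
    """Median kappa over repeated downsampling without replacement."""
    rng = random.Random(seed)
    return statistics.median(kappa(rng.sample(values, size))
                             for _ in range(repeats))
```

On a clearly bimodal sample, κ reflects the peakedness of each mode rather than the low kurtosis of the pooled distribution, which is the reason for splitting the histogram before weighting.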


2.2. Map segmentation
For the semantic segmentation, we use a CNN with a UNet architecture [48] and ResNet
[20] as encoder, implemented in a PyTorch version of the open-source tool dhSegment [42, 27].
The batch size and learning rate parameters are optimized. The method and the results of
the tests are detailed in [46]. The selected parameters are a batch size of 1, as in the original
article by Long and Shelhamer [32], and a learning rate of 5 × 10⁻⁵. The decoder weights are
initialized using the Glorot and Bengio uniform method [18]. The training data are augmented
by side flip and upside-down flip, and by rotation (r ∈ [0, π]). The loss used is cross entropy.
The optimization relies on stochastic gradient descent (SGD). Unless otherwise specified, the
encoder weights are first pretrained on ImageNet [14].
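
For readers unfamiliar with the Glorot and Bengio uniform initialization mentioned above, here is a minimal sketch of the rule: weights are drawn uniformly in [−limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The function names are hypothetical, and a plain list of lists stands in for a weight tensor. In practice a deep learning framework's built-in initializer would be used instead.

```python
import math
import random

def glorot_limit(fan_in, fan_out):
    """Bound of the Glorot uniform distribution."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform(fan_in, fan_out, rng=random):
    """Return a fan_out x fan_in weight matrix drawn in [-limit, limit]."""
    limit = glorot_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

# Example: a small layer with 3 inputs and 3 outputs (limit = 1.0).
weights = glorot_uniform(3, 3, rng=random.Random(0))
```

For a convolutional layer, fan_in and fan_out are usually taken as the number of input or output channels multiplied by the kernel area.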
   The second pretraining is done in a crossed way, Paris being pretrained on the World, and
the World being pretrained on Paris, as described in the following subsection. The CNN is
then trained successively for 150 epochs on the Paris (2+1 and 4+1) datasets and on the
World (2+1 and 4+1) datasets. ResNet101 is used as encoder with LeakyReLU as activation.
   Three metrics are used to quantify performance: intersection over union (IoU), precision,
and recall. In a second step, the confusion matrices between the different classes are computed.
They are normalized by the number of pixels belonging to each class, according to
the ground-truth.
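
The three metrics and the ground-truth normalization can be sketched from a confusion matrix as follows. This is a generic illustration with hypothetical function names, where rows index the ground-truth class and columns the predicted class. With this normalization, the diagonal of the matrix equals the per-class recall.

```python
def confusion_matrix(truth, pred, n_classes):
    """Count pixels per (ground-truth class, predicted class) pair."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm

def class_metrics(cm, k):
    """Return (IoU, precision, recall) for class k."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                   # class k pixels predicted as another class
    fp = sum(row[k] for row in cm) - tp    # other pixels predicted as class k
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return iou, precision, recall

def normalize_by_truth(cm):
    """Divide each row by the number of ground-truth pixels of that class."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in cm]

# Example with flattened label maps and 3 classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, 3)
iou, precision, recall = class_metrics(cm, 1)
```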

2.3. Semantic segmentation experiments
2.3.1. Impact of learning transfer on each class
In a first experiment, the CNN with ResNet50 as encoder was trained for 100 epochs on the
Paris 5-class dataset and, separately, on the World 5-class dataset. After this first training,
the network was re-trained on the Paris set, using the weights trained in the previous step on
the World corpus as initialization. Reciprocally, the network was re-trained on the World set,
using the weights trained on the Paris corpus as initialization.

2.3.2. Analysis of cross-cultural performance and dataset design
To estimate the bias of the validation set, and to investigate cross-cultural performance, an
8-fold cross validation was performed on the World 4+1 dataset. Each time, the CNN was trained
for 100 epochs as described in the subsection on the performance of semantic segmentation.
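
A k-fold split such as the one used here can be sketched as below. The function name, the shuffling seed, and the sample count in the example are assumptions made for illustration; the text does not specify how patches were assigned to folds.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

# Example: 8 folds over a hypothetical set of 40 patches.
splits = list(k_fold_indices(40, 8))
```

Each patch appears in exactly one validation fold, so every patch receives a prediction from a model that did not see it during training.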

2.3.3. Perspectives on confidence prediction
In this experiment, we attempted to create an estimator of confidence at the patch scale. First,
a 10-fold cross validation was performed on the Paris 3-class dataset in order to estimate the
error on each patch of the training set. A simple ResNet50 encoder was used and trained
for 60 epochs each time. The output predictions from the k-th fold are compared with the
ground truth, and an accuracy map is created, in which each pixel takes the value 1 if the
prediction is correct and 0 if it is wrong.
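
The accuracy map defined above, together with the patch accuracy that serves as confidence reference, can be sketched as follows. Label maps are represented as nested lists of class indices, and the function names are hypothetical.

```python
def accuracy_map(prediction, truth):
    """Pixel-wise map: 1 where the predicted class is correct, else 0."""
    return [[1 if p == t else 0 for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(prediction, truth)]

def patch_accuracy(acc_map):
    """Share of correctly predicted pixels in the patch."""
    total = sum(len(row) for row in acc_map)
    return sum(sum(row) for row in acc_map) / total

# Example on a 2 x 3 patch with 3 classes.
truth = [[0, 1, 1],
         [2, 2, 0]]
prediction = [[0, 1, 2],
              [2, 0, 0]]
acc = accuracy_map(prediction, truth)   # [[1, 1, 0], [1, 0, 1]]
score = patch_accuracy(acc)
```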
  A second network, identical to the first one, is then trained on 300 pairs of images and
accuracy maps, and validated on another 30 pairs. Instead of segmenting, the aim of this
network is to predict the accuracy map corresponding to the input image. The output of the
CNN prediction is classified using a global threshold, which is set to match the mean accuracy of
the training set. The confidence index is defined as the patch accuracy, and the reference as the
accuracy previously measured by k-fold cross validation. In order to evaluate the confidence
prediction performance, an 8-fold cross validation was performed on the World 4+1 dataset.
Each time, the CNN was trained for 100 epochs with the same parameters as in Section 2.2.

2.3.4. Importance of graphical cues for learning

Figure 4: Images processed by phasing out visual characteristics. Panels, from left to right: reference, gray, binary, textureless binary.


  This experiment aims to determine what role color, texture, and morphology play in the
CNN performance, in contrast with non-graphical concepts. For this purpose, the images of
the training and validation sets are subjected to 4 different treatments (Fig. 4): reference,
gray, binary, and textureless binary. For the reference, the images are not modified in any
way. For the gray treatment, the RGB color channels of the image are converted to
grayscale. For the binary treatment, the images are converted to grayscale, then
binarized [43]. For the textureless binary treatment, the images are converted to
grayscale, binarized, and texture is extracted with LBP [40] (r = 3); finally, a second
Otsu thresholding is applied. The CNN is trained 5 times for 60 epochs separately on each of
the 4 datasets.
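
The gray and binary treatments can be sketched as follows. This is an illustrative implementation only: the ITU-R BT.601 luma weights used for the grayscale conversion are an assumption (the text does not state the conversion), the function names are hypothetical, and the LBP step of the textureless binary treatment is not covered. The threshold is chosen by maximizing the between-class variance, which is the principle of Otsu's method.

```python
def to_gray(pixels):
    """Convert (r, g, b) tuples to gray levels in 0..255 (BT.601 weights, assumed)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]

def otsu_threshold(gray):
    """Return the gray level t maximizing the between-class variance,
    where pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split at this level
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """Binary treatment: 1 for pixels brighter than the Otsu threshold."""
    t = otsu_threshold(gray)
    return [1 if v > t else 0 for v in gray]

# Example: three dark and three bright pixels.
binary = binarize([10, 12, 11, 200, 205, 210])
```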


3. Results
3.1. Operationalization of the figuration

Table 2
Median overall κ for studied∗ and comparison† datasets. Lower values indicate greater figurative diversity.
                                Paris∗   World∗     Napoleonic†     ICDAR21†    USGS†
                    median κ    1.97     1.99       4.36            5.04        29.1

   The median overall κ was computed on each dataset (Tab. 2). The result is very close for
the two studied corpora, while the Napoleonic cadaster and the ICDAR dataset are already
further away. The USGS map is in a different order of magnitude.
   Fig. 5 and Tab. 3 are the aggregated representations of the correlations of the figurative
features between and within both corpora. The frame class seems to be represented very
similarly in both corpora (ρ = 0.96, Fig. 5), while the non-built class is the most distant
(ρ = 0.81, Fig. 5). In the Paris corpus, the blocks class seems to stand out clearly from the
other classes (ρ̄_Blocks = 0.775, Tab. 3, and Fig. 5), while in the World corpus, it is rather the
road network that stands out (ρ̄_RoadNetwork = 0.888, Tab. 3, and Fig. 5). In general, however,
all classes are more distinct in the Paris corpus (ρ̄_Mean = 0.842 for Paris, ρ̄_Mean = 0.917 for
the World, Tab. 3).


Figure 5: Aggregated correlation heatmap matrix of intra-corpus inter-class correlation (orange for the
World, cyan for Paris) and inter-corpus class correlation (grey diagonal), with r_Pearson ∈ [−1, 1]. Both axes
list the classes Frame, Blocks, Road Network, Water and Non-built. All correlations are significant (p-value < 0.05).


Table 3
Mean intra-corpus inter-class correlations, per class
                                                     Corpus       World   Paris
                                                     Mean         0.917   0.842
                                                     Frame        0.908   0.846
                                                     Blocks       0.924   0.775
                                                     Road net.    0.888   0.862
                                                     Water        0.939   0.877
                                                     Non-built    0.924   0.847


3.2. Map segmentation
The results of the third experiment are summarized in Tab. 4 and Fig. 6. Some example
prediction outputs can be observed in Fig. 7. The clearest finding is the consistent drop in
performance when increasing the number of classes from 2+1 to 4+1. The mean IoU (mIoU)
on the 4+1 classes problems is not sufficient for reliable map segmentation. However, the mIoU
is good on both World and Paris 2+1 corpora. The drop is more noticeable for precision than
for recall. The second clear difference occurs between the Paris and the World corpora, the first
performing better. Again, the disparity is mostly due to a low recall. It is worth noticing that
the top 50% of the World corpus performs very similarly to the average Paris sample.
   For the Parisian corpus, most confusion occurs when non-built is predicted as blocks or, to a
lesser extent, when blocks are predicted as non-built. As is
also noticeable in Tab. 4, non-built is by far the worst-performing class in the 4+1-class
problem. However, water is the class suffering from the lowest precision score. In the 2+1-class
problem, the blocks are sometimes classified as road network, which heavily impacts
the precision of the road network class.


Table 4
Performance achieved, per class and classes mean, on the two datasets, for 2+1 and 4+1 classes problems.
For 2+1 classes problems, precision and recall were also computed on the top-50% of the dataset, selected
by mIoU.
Metric      Class       Paris 2+1   Paris 2+1   World 2+1   World 2+1   Paris 4+1   World 4+1
                                    Top 50%                 Top 50%
IoU         Mean        0.8905      –           0.8055      –           0.6363      0.5595
IoU         Frame       0.9953      –           0.9924      –           0.9810      0.9881
IoU         Blocks      0.9181      –           0.9114      –           0.5657      0.3559
IoU         Road Net.   0.7580      –           0.5147      –           0.7132      0.4682
IoU         Water       –           –           –           –           0.4682      0.3318
IoU         Non-built   –           –           –           –           0.4235      0.6538
Precision   Mean        0.9292      0.9679      0.8544      0.9205      0.7292      0.6986
Precision   Frame       0.9959      0.9969      0.9935      0.9935      0.9838      0.9893
Precision   Blocks      0.9689      0.9774      0.9730      0.9816      0.6872      0.4353
Precision   Road Net.   0.8229      0.9295      0.5967      0.7863      0.7874      0.5348
Precision   Water       –           –           –           –           0.4856      0.7098
Precision   Non-built   –           –           –           –           0.7017      0.8240
Recall      Mean        0.9456      0.9698      0.9062      0.9445      0.8175      0.7187
Recall      Frame       0.9996      0.9990      0.9989      0.9992      0.9971      0.9988
Recall      Blocks      0.9448      0.9736      0.9350      0.9957      0.7618      0.6611
Recall      Road Net.   0.8924      0.9368      0.7848      0.8686      0.8832      0.7898
Recall      Water       –           –           –           –           0.9288      0.3839
Recall      Non-built   –           –           –           –           0.5165      0.7599


[Figure 6 panels: Paris 4+1, World 4+1, Paris 2+1, and World 2+1. Vertical axis: prediction;
horizontal axis: ground truth. Classes: blocks, road network, water, and non-built (4+1 panels);
blocks and road network (2+1 panels).]

Figure 6: Confusion matrix, normalized according to ground-truth. The diagonal corresponds to recall.
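
The normalization described in this caption can be illustrated with a minimal sketch. It assumes the layout of Fig. 6 (rows are predictions, columns are ground truth), so each column is divided by its total. The class names and pixel counts are invented for illustration and are not taken from the experiments.

```python
def normalize_by_ground_truth(counts):
    """Divide each column (one ground-truth class) by its total.

    counts[i][j] = number of pixels predicted as class i whose
    ground-truth class is j (rows: prediction, columns: ground truth).
    """
    n = len(counts)
    col_sums = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [
        [counts[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical pixel counts for three classes (illustrative only).
classes = ["blocks", "road network", "non-built"]
counts = [
    [80, 5, 30],
    [10, 90, 10],
    [10, 5, 60],
]
normalized = normalize_by_ground_truth(counts)

# After normalization every column sums to 1, and the diagonal is the recall.
recall = [normalized[k][k] for k in range(len(classes))]
```

With these counts, every ground-truth class has 100 pixels, so the diagonal reads directly as the share of each class that was correctly recovered.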


[Figure 7 rows: image, prediction, ground truth. Columns: average Paris 2+1, top-50% Paris 2+1,
average World 2+1, top-50% World 2+1.]


Figure 7: Examples of results. Each time, the first example is close to the median mIoU and the second is
part of the top 50%. Values of the mIoU from left to right: 0.8965, 0.9506, 0.8079, and 0.8622.


  For the World corpus, by contrast, water benefits from a relatively high precision but a
very low recall: it is heavily confused with non-built and blocks. Non-built is still sometimes
predicted as blocks, but for this dataset the reverse is more frequent. Both the blocks and road
network classes suffer from low precision.

3.3. Semantic segmentation experiments
3.3.1. Impact of learning transfer on each class
As one can see in Fig. 8, per-class segmentation performance differs markedly between the two
datasets. Although the Parisian corpus generally achieves much higher performance, the World
corpus seems to be better at recognizing non-built land. Transfer learning is quite successful
for the World corpus when pretrained on Paris. The water class in particular seems to be better
recognized (p-value = 0.0063 < 0.05). Transfer learning from the World corpus to Paris is
somewhat less successful. However, the water class again shows a significant improvement
(p-value = 0.0076). Overall, the improvement for the World corpus pretrained on Paris shows a
trend that does not reach the 0.05 significance level (p-value = 0.062).
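
This section does not state which statistical test produced the p-values above. As a stand-in only, the sketch below shows how two sets of five repeated runs could be compared with an exact two-sample permutation test on the difference of mean IoU. The function name and the IoU values are hypothetical, and this is not presented as the test used in the experiments.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(baseline, treated):
    """Exact two-sided permutation test on the difference of means."""
    pooled = list(baseline) + list(treated)
    n = len(treated)
    observed = abs(mean(treated) - mean(baseline))
    total = 0
    extreme = 0
    # Enumerate every way of relabelling n of the pooled runs as "treated".
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical IoU of one class over 5 runs each (illustrative only).
iou_imagenet = [0.30, 0.33, 0.31, 0.35, 0.32]
iou_pretrained = [0.41, 0.44, 0.39, 0.45, 0.42]
p = permutation_p_value(iou_imagenet, iou_pretrained)
```

With two fully separated groups of five runs, only 2 of the 252 relabellings are as extreme as the observed one, so this test returns 2/252 (about 0.008), which is also the smallest value it can return for this sample size.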


[Figure 8: boxplots of the relative IoU difference (vertical axis, ticks from –40% to +60%) for
the classes frame, road network, blocks, water, and non-built. Panels: World wrt. Paris; Paris
pretrained on World wrt. Paris; World pretrained on Paris wrt. World.]


Figure 8: Per-class relative performance of regular training between World and Paris corpora (left), and
per-class relative performance of transfer learning between World and Paris corpora (middle and right).
The relative IoU is computed with regard to the median of the reference IoU. When the pretraining is not
specified, the CNN is generically pretrained on ImageNet. Each experiment was repeated 5 times, the values
are represented as boxplots.
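
The relative IoU described in this caption can be sketched as follows, assuming it is the difference between each run and the median of the reference runs, divided by that median. The function name and the values are hypothetical.

```python
from statistics import median


def relative_iou_difference(iou_runs, reference_runs):
    """Relative difference of each run with regard to the median reference IoU."""
    ref = median(reference_runs)
    return [(iou - ref) / ref for iou in iou_runs]


# Hypothetical IoU values for one class over 5 runs (illustrative only).
reference = [0.40, 0.42, 0.38, 0.41, 0.39]
transfer = [0.48, 0.50, 0.44, 0.52, 0.46]
rel = relative_iou_difference(transfer, reference)
```

Each element of `rel` is one point of a boxplot; here the median reference IoU is 0.40, so a run at 0.48 corresponds to a relative difference of about +20%.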


3.3.2. Analysis of cross-cultural performances and dataset design
In total, the mIoU over the 8 experiments is 0.6112, which is higher than the 0.5595 score
obtained in the previous section for the same 4+1 World set. That means that the average
performance is somewhat better than the performance observed on the validation set. The mIoU
can also be computed on each patch separately, which corresponds to an average of 0.5424. Fig. 9
shows a few examples of the high disparity between the top-50% and the bottom-50% of the World
corpus, which was already noticed in the results of the previous experiment.
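
The two ways of aggregating the mIoU mentioned above generally give different values. The sketch below is an illustration under assumptions: the pooled score accumulates per-class intersections and unions over all patches before dividing, while the per-patch score averages the mIoU of each patch. The helper names and the tiny patches are hypothetical.

```python
def class_counts(pred, truth, n_classes):
    """Per-class intersection and union pixel counts for one patch."""
    inter = [0] * n_classes
    union = [0] * n_classes
    for p, t in zip(pred, truth):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    return inter, union


def miou(inter, union):
    """Mean IoU over the classes that occur (union > 0)."""
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


def pooled_and_per_patch_miou(patches, n_classes):
    """Return (mIoU on pooled counts, mean of per-patch mIoU)."""
    total_i = [0] * n_classes
    total_u = [0] * n_classes
    per_patch = []
    for pred, truth in patches:
        inter, union = class_counts(pred, truth, n_classes)
        per_patch.append(miou(inter, union))
        total_i = [a + b for a, b in zip(total_i, inter)]
        total_u = [a + b for a, b in zip(total_u, union)]
    return miou(total_i, total_u), sum(per_patch) / len(per_patch)


# Two tiny hypothetical patches, flattened to label lists (classes 0 and 1).
patches = [
    ([0, 0, 0, 0], [0, 0, 0, 0]),  # perfectly segmented patch
    ([0, 0, 1, 1], [1, 1, 1, 1]),  # half of class 1 predicted as class 0
]
pooled, per_patch_mean = pooled_and_per_patch_miou(patches, 2)
```

In this toy case the per-patch average is 0.625 while the pooled value is about 0.583, which shows that the two figures are not interchangeable.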
   Maps that have been published by a Western country score 0.5608, while other maps score
0.4911. The region of the represented city also has an impact, with Sub-Saharan (0.6322) and
North African (0.6280) cities scoring best, followed closely by Eastern Europe and Central Asia
(0.5931), South America (0.5913), Western Europe (0.5891), and North America (0.5711). South
Asian cities score lowest (0.3985). In between, one finds the Middle East (0.5048), East Asia
(0.4896), Oceania (0.4685), and Central America (0.4635). The urban form also has a clear
impact, as cities with a more regular (0.5453) or mixed (0.5535) urban form score better than
cities with an irregular (0.5088) urban form. This performance drop is especially noticeable on
the blocks class for regular (0.3548), mixed (0.3027), and irregular (0.2179) urban form.

3.3.3. Perspectives on confidence prediction
The average mIoU over the 10 folds is 0.6993. For this third setting, the correlation between
the obtained confidence index and the reference is 0.571 (p-value = 1.2 × 10^-3) on the
validation set, which represents a moderate to strong dependency.
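
As an illustration of the reported dependency, the sketch below computes a Pearson correlation coefficient between a predicted confidence index and the reference mIoU. The choice of Pearson's r is an assumption, since this section does not name the coefficient, and the values are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical confidence indices and reference mIoU of five maps.
confidence = [0.2, 0.4, 0.5, 0.7, 0.9]
reference_miou = [0.35, 0.50, 0.45, 0.70, 0.80]
r = pearson_r(confidence, reference_miou)
```

A value of r close to 1 would mean that the confidence index ranks maps almost exactly as their true mIoU does; the 0.571 reported above indicates a useful but imperfect predictor.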


[Figure 9 rows: image, prediction, ground truth. Columns: two successful and two unsuccessful
examples.]


Figure 9: Examples of results. Each time, the first example is close to the median mIoU and the second is
part of the top 50%. Values of the mean IoU from left to right: 0.8965, 0.9506, 0.8079, and 0.8622.


3.3.4. Importance of graphical cues for learning
Notably, the removal of color had almost no impact on performance. The median loss is
only 1.3% with regard to the reference mIoU. The binarization of the values resulted in a
drop of 7.2%. Finally, the disappearance of colors and textures led to a 10.4% decrease in
performance. This experiment thus shows that even when most graphical cues are removed,
most of the performance is conserved, and therefore that neural networks may also rely heavily
on more abstract reasoning for image segmentation.
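
The first two ablations can be sketched on a toy image. This is a minimal illustration under assumptions: color removal is implemented as a standard luma conversion (ITU-R BT.601 weights) and binarization as a fixed threshold, neither of which is confirmed as the procedure used in the experiment. The patch, the threshold, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Remove color: weighted luma (ITU-R BT.601 weights) per pixel."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Binarize values: 255 where the pixel is at least `threshold`, else 0."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray_image]


def relative_loss(reference_miou, ablated_miou):
    """Performance drop expressed with regard to the reference mIoU."""
    return (reference_miou - ablated_miou) / reference_miou


# Hypothetical 2x2 RGB patch: paper background, blue water, black ink, red fill.
patch = [
    [(240, 230, 200), (60, 90, 200)],
    [(20, 20, 20), (200, 40, 40)],
]
gray = to_grayscale(patch)
binary = binarize(gray)
```

After binarization the blue, black, and red pixels all collapse to the same value, which gives an intuition of how much figurative information each ablation removes before the network is retrained.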


4. Discussion
As measured through the operationalization of the figuration, both corpora, Paris and the
World, present a much greater figurative diversity than the other datasets used in the literature
(Tab. 2). This validates the interest of the studied datasets. The USGS corpus is far less
diverse than the other datasets, which is logical since it is a born-digital map. The
representation of the different elements is therefore perfectly codified and reproduced. The
ICDAR21 corpus also shows relatively little diversity. It is composed of plates published in
different years, but is still based on a single printed collection, and thus on a unified grammar.
The Napoleonic cadaster, for its part, is known for the high level of formalization of its
cartographic grammar. However, its manual execution explains a certain residual diversity.
Finally, the two studied corpora show a similar level of figurative diversity, although Paris
stems from a single cultural pool. This is consistent with the samples taken from this corpus,
which show a great variety of grammars, a high density of information, and above all a high level
of technicality, which also allows for an important figurative diversity.
   Regarding the performance of semantic segmentation, while the Paris corpus shows excellent
results which already indicate a potential for production (mIoU 0.8905, Tab. 4), the results of
the World corpus still leave room for improvement overall (mIoU 0.8055). However, the best half
(top-50%) of the World corpus performs very similarly to the Paris corpus and may thus also
present an important automation potential (see also Fig. 9). The question therefore lies in
identifying this outperforming half in large map collections, to open automatic vectorization
perspectives. As we demonstrated in the experiment on confidence prediction, the estimation
of a confidence index is possible and could probably solve this problem by identifying promising
maps beforehand. However, we consider that further research is still needed to maximize the
reliability of such an index and that other paradigms should be explored.
   The identification of the most promising maps can also be based on the results of
cross-cultural validation. Indeed, we observed that maps with regular urban forms, like most
colonial cities, or mixed forms, like most European capitals, are better segmented on average.
This is consistent with the results on the importance of non-graphical cues for learning, which
emphasize the crucial importance of non-visual features, such as morphology, topology, and
semantic hierarchy, in CNN performance. In general, we notice that areas urbanized more
recently, such as Africa, perform well. This is consistent with the findings on urban form. The
poorer performance of South and East Asian cities can be explained by visual elements that are
atypical from a Western point of view, by the irregularity of Indian and Islamic urban forms
and, in some cases, by the poor state of preservation of the documents. The low performance of
the Oceanian maps could be explained by the relatively low representation of the region (around
3%) in the sample. The cross-cultural validation also highlights the greater ease of segmenting
maps published in the West, although, as mentioned earlier, non-Western maps are on average
more recent. These cultural biases tend to argue for a differential treatment of non-Western
maps (e.g. Fig. 2 A.3-4, C.1-2).
   The detailed analysis of the results of semantic segmentation brings many additional
insights. First, we notice the particular case of the water class. This class was the least
represented in the training corpora (Tab. 1), in particular for Paris, where it represents less
than 3% of the surface. For the World, its proportion is close to that of the road network
(12%). In addition, this class differs little from the other classes, figuratively (r = 0.94,
Tab. 3). This explains its poor final performance (IoU_Paris = 0.47, IoU_World = 0.33, Tab. 4).
The performance is higher for Paris. However, water seems to be mostly recognized by elimination
of other classes, which is expressed in poor precision (0.49), compared to recall (0.88).
Moreover, the water class is vulnerable to overfitting on the blue color, as demonstrated by the
particular case of the blueprint (Fig. 9), while the results on the importance of graphical cues
for learning show on the contrary that most of the performance relies on more abstract features.
These elements point to an imbalance of the water class in the original sample. This imbalance
is not specific to our corpus, since it is the result of geographical constraints. However, our
research provides a first solution to this limitation. Indeed, the experiment on the impact of
learning transfer on each class demonstrated that the water class could benefit from a
significant transfer (Fig. 8, +18% for Paris, +36% for the World). The constitution of large and
diversified corpora intended for pretraining therefore seems to be justified by this example.
   The second noteworthy result is that the non-built class scores much better for the World
(0.6528), compared to Paris (0.4235), even though the World generally scores lower. For the
World, however, the figuration of this class is not really outstanding (on average, r_Pearson =
0.924). In reality, this class benefits (and suffers) from a catch-all effect. Indeed, a third
(0.33) of the water surfaces and nearly a quarter (0.23) of the blocks surfaces are wrongly
classified as non-built, while the non-built class itself is rarely confused with these classes
(0.03 and 0.12, respectively). This catch-all effect results in a relatively high recall,
compared to Paris, while the impact on precision remains small, as this class represents a large
area (0.359) of the dataset. As can be seen in Fig. 6, this catch-all non-built class is the
main reason for the underperformance of the World dataset. Solutions to this problem might
include separating this class into two smaller and figuratively more specific classes.
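
The catch-all mechanism can be made concrete with a small sketch that derives per-class precision, recall, and IoU from a confusion matrix of pixel counts (rows: prediction, columns: ground truth). The counts are invented to mimic a class that absorbs pixels from its neighbours; they are not the values of Fig. 6.

```python
def per_class_metrics(counts):
    """Precision, recall and IoU per class.

    counts[i][j] = pixels predicted as class i with ground-truth class j.
    """
    n = len(counts)
    metrics = []
    for k in range(n):
        tp = counts[k][k]
        predicted = sum(counts[k])                    # row total
        actual = sum(counts[i][k] for i in range(n))  # column total
        fp = predicted - tp
        fn = actual - tp
        union = tp + fp + fn
        metrics.append({
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
            "iou": tp / union if union else 0.0,
        })
    return metrics


# Hypothetical counts; class order: blocks, water, non-built.
counts = [
    [70, 5, 10],
    [0, 60, 5],
    [30, 35, 85],  # non-built absorbs many blocks and water pixels
]
blocks, water, non_built = per_class_metrics(counts)
```

In this toy matrix the non-built class reaches the highest recall (0.85) because it is rarely predicted as anything else, while the pixels it absorbs lower the recall of blocks (0.70) and water (0.60) and reduce its own precision.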
   For Paris, this same non-built class obtains lower results, though quite close to those
obtained for water, for example. This class is less of a catch-all, as the balance between
confusing and being confused is shifted. In particular, non-built is abundantly confused with
blocks (0.33), even though the figuration of those two classes is the least correlated (0.7).
Conversely, blocks are also sometimes confused with non-built (0.17). This confusion is by far
the most important reason for this poor performance. Several explanations are possible. Notably,
in Paris, this class mainly describes parks, as well as courtyards. Compared to the World, the
topology of the non-built class is less distinctive. Some parks adopt forms that are quite
similar to building blocks, especially in Paris. Moreover, the spatial juxtaposition of inner
courtyards and buildings promotes confusion, especially in this direction, since an unrecognized
inner courtyard will be counted as non-built confused with blocks. The Parisian corpus also
suffers from poor learning transfer from the World, which may be caused by a relatively large
figuration distance between Paris and the World for this class (r_Pearson = 0.81). This is not
surprising, considering that the elements represented in the two corpora are also semantically
quite different, as explained.
   The arguments discussed above do not claim to be exhaustive, because many of the inner
workings of neural networks are not yet understood. However, they do shed some light on a few
points. We consider that maps offer an ideal framework for this discussion on the ability of
neural networks to combine figurative and topological cues, and for opening up avenues of
understanding of their performance. Indeed, maps have a “textbook” figuration, with basic
textures, such as hatching, some colors, and a partly geometric morphology (squares, rectangles,
trapezoids). They are therefore easier to characterize from this point of view. These conditions
are met in very few fields of application of deep learning. In addition, for semantic
segmentation, the ability to visualize the results and understand the errors is particularly
useful for interpretation. This field therefore brings together all the elements that can help
to better understand the performance of neural networks.

4.1. Conclusion
Developing a generic pipeline for processing historical maps will be an important milestone
for massively extracting information from this rich family of cultural heritage documents.
In this research, we made progress in understanding the figurative variance of cartographic
corpora, and established a congruent metric to measure the figurative diversity of a collection
based on the acuity of the multimodal distribution of graphical features. Through several
experiments, we have shown that neural networks are extremely robust in the face of figurative
diversity, even if some map grammars seem more difficult to segment. This high performance
is mostly due to the fact that neural networks can integrate highly abstract reasoning, such
as morphology, topology, and semantic hierarchy, to supplement figurative features, without
excluding the latter. This work and its conclusions pave the way for the generic segmentation
of historical maps, highlighting the weaknesses of learning processes and outlining potential
levers for action.


Acknowledgments
The authors declare no conflicts of interest. We would like to thank our former
collaborators, Raphaël Barman and Nils Hamel, for their support on this project.


References
 [1]   M. G. Arteaga. “Historical map polygon and feature extractor”. In: Proceedings of the
       1st ACM SIGSPATIAL International Workshop on MapInteraction. MapInteract ’13.
       Orlando, Florida: Association for Computing Machinery, 2013, pp. 66–71. doi: 10.1145/
       2534931.2534932.
 [2]   S. Banda, A. Agarwal, C. R. Rao, and R. Wankar. “Contour layer extraction from colour
       topographic map by feature selection approach”. In: 2011 IEEE Symposium on Computers
       Informatics. 2011, pp. 425–430. doi: 10.1109/isci.2011.5958953.
 [3]   R. Barvir and V. Vozenilek. “Developing Versatile Graphic Map Load Metrics”. In: ISPRS
       International Journal of Geo-Information 9.12 (2020), p. 705. doi: 10.3390/ijgi9120705.
 [4]   R. Brügelmann. “Recognition of hatched cartographic patterns”. In: International Archives
       of Photogrammetry and Remote Sensing 31.B3 (1996), pp. 82–87.
 [5]   Bysyk. Github bigjpg. 2019. url: https://github.com/by-syk/bigjpg-app.
 [6]   Y. S. Can, P. J. Gerrits, and M. E. Kabadayi. “Automatic Detection of Road Types
       From the Third Military Mapping Survey of Austria-Hungary Historical Map Series
       With Deep Convolutional Neural Networks”. In: IEEE Access 9 (2021), pp. 62847–62856.
       doi: 10.1109/access.2021.3074897.
 [7]   J. Chazalon, E. Carlinet, Y. Chen, J. Perret, B. Duménieu, C. Mallet, T. Géraud, V.
       Nguyen, N. Nguyen, J. Baloun, L. Lenc, and P. Král. ICDAR 2021 Competition on
       Historical Map Segmentation. 2021. url: https://arxiv.org/abs/2105.13265.
 [8]   Y.-Y. Chiang, W. Duan, S. Leyk, J. H. Uhl, and C. A. Knoblock. Using Historical Maps
       in Scientific Studies: Applications, Challenges, and Best Practices. SpringerBriefs in
       Geography. Cham: Springer International Publishing, 2020.
       doi: 10.1007/978-3-319-66908-3.
 [9]   Y.-Y. Chiang and C. A. Knoblock. “A general approach for extracting road vector data
       from raster maps”. In: IJDAR 16.1 (2013), pp. 55–81. doi: 10.1007/s10032-011-0177-1.
[10]   Y.-Y. Chiang, S. Leyk, and C. A. Knoblock. “A Survey of Digital Map Processing
       Techniques”. In: ACM Comput. Surv. 47.1 (2014), 1:1–1:44. doi: 10.1145/2557423.
[11]   Y.-Y. Chiang, S. Leyk, and C. A. Knoblock. “Efficient and Robust Graphics Recognition
       from Historical Maps”. In: Graphics Recognition. New Trends and Challenges. Ed. by
       Y.-B. Kwon and J.-M. Ogier. Lecture Notes in Computer Science. Berlin, Heidelberg:
       Springer, 2013, pp. 25–35. doi: 10.1007/978-3-642-36824-0_3.
[12]   A. Cordeiro and P. Pina. “Colour map object separation”. In: Remote Sensing: From
       Pixels to Processes. 2006, pp. 243–247.
[13]   N. Dalal and B. Triggs. “Histograms of Oriented Gradients for Human Detection”. In:
       2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
       (CVPR’05). Vol. 1. IEEE Computer Society, 2005, pp. 886–893. doi: 10.1109/cvpr.2005.177.
[14]   J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. “ImageNet: A large-scale
       hierarchical image database”. In: IEEE Conference on Computer Vision and Pattern
       Recognition. 2009, pp. 248–255. doi: 10.1109/cvpr.2009.5206848.
[15]   D. B. Dhar and B. Chanda. “Extraction and recognition of geographical features from
       paper maps”. In: IJDAR 8.4 (2006), pp. 232–245. doi: 10.1007/s10032-005-0010-9.
[16]   H. Ernstson, S. E. van der Leeuw, C. L. Redman, D. J. Meffert, G. Davis, C. Alfsen, and
       T. Elmqvist. “Urban Transitions: On Urban Resilience and Human-Dominated Ecosys-
       tems”. In: Ambio 39.8 (2010), pp. 531–545. doi: 10.1007/s13280-010-0081-9.
[17]   A. Garcia-Molsosa, H. A. Orengo, D. Lawrence, G. Philip, K. Hopper, and C. A. Petrie.
       “Potential of deep learning segmentation for the extraction of archaeological features
       from historical map series”. In: Archaeological Prospection 28.2 (2021), pp. 187–199. doi:
       10.1002/arp.1807.
[18]   X. Glorot and Y. Bengio. “Understanding the difficulty of training deep feedforward
       neural networks”. In: Proceedings of the Thirteenth International Conference on Artificial
       Intelligence and Statistics. Ed. by Y. W. Teh and M. Titterington. Vol. 9. Proceedings of
       Machine Learning Research. Chia Laguna Resort, Sardinia, Italy: PMLR, 2010,
       pp. 249–256. url: http://proceedings.mlr.press/v9/glorot10a.html.
[19]   B. Graeff and R. Carosio. “Automatic Interpretation of Raster-Based Topographic Maps
       by Means of Queries”. In: FIG XXII International Congress Washington, D. C., published
       on CD-ROM (2002).
[20]   K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition.
       2015. url: https://arxiv.org/abs/1512.03385.
[21]   M. Heitzler and L. Hurni. “Cartographic reconstruction of building footprints from his-
       torical maps: A study on the Swiss Siegfried map”. In: Transactions in GIS 24.2 (2020),
       pp. 442–461. doi: 10.1111/tgis.12610.
[22]   K. Hosseini, K. McDonough, D. van Strien, O. Vane, and D. C. S. Wilson. “Maps of
       a Nation? The Digitized Ordnance Survey for New Historical Research”. In: Journal of
       Victorian Culture 26.2 (2021), pp. 284–299. doi: 10.1093/jvcult/vcab009.
[23]   G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. “Densely Connected
       Convolutional Networks”. In: 2017 IEEE Conference on Computer Vision and Pattern
       Recognition (CVPR). 2017, pp. 2261–2269. doi: 10.1109/cvpr.2017.243.
[24]   A. Karmiloff-Smith. “Constraints on representational change: evidence from children’s
       drawing”. In: Cognition 34.1 (1990), pp. 57–83. doi: 10.1016/0010-0277(90)90031-e.
[25]   E. Katona and G. Hudra. “An interpretation system for cadastral maps”. In: Proceedings
       10th International Conference on Image Analysis and Processing. 1999, pp. 792–797. doi:
       10.1109/iciap.1999.797692.
[26]   N. W. Kim, J. Lee, H. Lee, and J. Seo. “Accurate segmentation of land regions in historical
       cadastral maps”. In: Journal of Visual Communication and Image Representation 25.5
       (2014), pp. 1262–1274. doi: 10.1016/j.jvcir.2014.01.001.
[27]   D. H. Laboratory. dhSegment-torch. 2021.
       url: https://github.com/dhlab-epfl/dhSegment-torch.
[28]   R. G. Laycock, D. Drinkwater, and A. M. Day. “Exploring cultural heritage sites through
       space and time”. In: J. Comput. Cult. Herit. 1.2 (2008), 11:1–11:15. doi: 10.1145/1434763.
       1434768.
[29]   S. D. Laycock, P. G. Brown, R. G. Laycock, and A. M. Day. “Aligning archive maps
       and extracting footprints for analysis of historic urban environments”. In: Computers &
       Graphics. Virtual Reality in Brazil 35.2 (2011), pp. 242–249. doi: 10.1016/j.cag.2011.01.
       002.
[30]   S. Leyk. “Segmentation of Colour Layers in Historical Maps Based on Hierarchical Colour
       Sampling”. In: Graphics Recognition. Achievements, Challenges, and Evolution. Ed. by
       J.-M. Ogier, W. Liu, and J. Lladós. Lecture Notes in Computer Science. Berlin,
       Heidelberg: Springer, 2010, pp. 231–241. doi: 10.1007/978-3-642-13728-0_21.
[31]   S. Leyk and R. Boesch. “Colors of the past: color image segmentation in historical
       topographic maps based on homogeneity”. In: Geoinformatica 14.1 (2009), p. 1. doi:
       10.1007/s10707-008-0074-z.
[32]   J. Long, E. Shelhamer, and T. Darrell. “Fully Convolutional Networks for Semantic
       Segmentation”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
       Recognition. 2015, pp. 3431–3440. url: https://arxiv.org/abs/1411.4038.
[33]   C. Mello, D. Costa, and T. d. Santos. “Automatic image segmentation of old topographic
       maps and floor plans”. In: 2012 IEEE International Conference on Systems, Man, and
       Cybernetics (SMC). 2012, pp. 132–137. doi: 10.1109/icsmc.2012.6377689.
[34]   Q. Miao, P. Xu, T. Liu, Y. Yang, J. Zhang, and W. Li. “Linear Feature Separation
       From Topographic Maps Using Energy Density and the Shear Transform”. In: IEEE
       Transactions on Image Processing 22.4 (2013), pp. 1548–1558. doi: 10.1109/tip.2012.
       2233487.
[35]   T. Miyoshi, W. Li, K. Kaneda, H. Yamashita, and E. Nakamae. “Automatic extraction
       of buildings utilizing geometric features of a scanned topographic map”. In: Proceedings
       of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. Vol. 3.
       2004, 626–629 Vol.3. doi: 10.1109/icpr.2004.1334607.
[36]   S. Muhs. “Computational Delineation of Built-up Area at Urban Block Level from
       Topographic Maps: A Contribution to Retrospective Monitoring of Urban Dynamics”. PhD
       thesis. Dresden, Germany: Technische Universität Dresden, 2019.
       url: https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-340364.
[37]   S. Muhs, H. Herold, G. Meinel, D. Burghardt, and O. Kretschmer. “Automatic delin-
       eation of built-up area at urban block level from topographic maps”. In: Computers,
       Environment and Urban Systems 58 (2016), pp. 71–84. doi: 10.1016/j.compenvurbsys.
       2016.04.001.
[38]   S. Muhs, G. Meinel, D. Burghardt, and H. Herold. “Automatisierte Baublockabgrenzung
       in Topographischen Karten”. In: Flächennutzungsmonitoring. ... Flächennutzungsmoni-
       toring V: Methodik, Analyseergebnisse, Flächenmanagement. IÖR Schriften 61. Berlin:
       Rhombos-Verl, 2013, pp. 211–219.
[39]   J.-M. Ogier, R. Mullot, J. Labiche, and Y. Lecourtier. “Technical Map Interpretation: A
       Distributed Approach”. In: Pattern Analysis & Applications 3.2 (2000), pp. 88–103. doi:
       10.1007/pl00010983.
[40]   T. Ojala, M. Pietikainen, and T. Maenpaa. “Multiresolution gray-scale and rotation
       invariant texture classification with local binary patterns”. In: IEEE Transactions on
       Pattern Analysis and Machine Intelligence 24.7 (2002), pp. 971–987.
       doi: 10.1109/tpami.2002.1017623.
[41]   S. Oliveira, I. di Lenardo, B. Tourenc, and F. Kaplan. “A deep learning approach to
       Cadastral Computing”. In: Utrecht, Netherlands, 2019.
       url: https://dev.clariah.nl/files/dh2019/boa/0691.html.
[42]   S. A. Oliveira, B. Seguin, and F. Kaplan. “dhSegment: A generic deep-learning approach
       for document segmentation”. In: 16th International Conference on Frontiers in Hand-
       writing Recognition (ICFHR) (2018), pp. 7–12. doi: 10.1109/icfhr-2018.2018.00011.
[43]   N. Otsu. “A Threshold Selection Method from Gray-Level Histograms”. In: IEEE Trans-
       actions on Systems, Man, and Cybernetics 9.1 (1979), pp. 62–66. doi: 10.1109/tsmc.
       1979.4310076.
[44]   R. Petitpierre. Generic Semantic Segmentation of Historical Maps - GitHub repository. 2021.
       url: https://github.com/RPetitpierre/Generic_Semantic_Segmentation_of_Historical_Maps.
[45]   R. Petitpierre. Historical City Maps Semantic Segmentation Dataset. 2021. doi: 10.5281/
       zenodo.5513639.
[46]   R. Petitpierre. “Neural networks for semantic segmentation of historical city maps: Cross-
       cultural performance and the impact of figurative diversity”. In: CoRR abs/2101.12478
       (2021). url: https://arxiv.org/abs/2101.12478.
[47]   “Procédé d’impression des cartes géographiques en couleur”. In: L’Echo du monde savant,
       journal analytique des nouvelles et des cours scientifiques. 2. Paris, 1840, pp. 401–402.
       url: https://go.epfl.ch/procede_dimpression_1840.
[48]   O. Ronneberger, P. Fischer, and T. Brox. “U-Net: Convolutional Networks for Biomedical
       Image Segmentation”. In: arXiv:1505.04597 [cs] (2015). url: http://arxiv.org/abs/1505.
       04597.
[49]   S. Salvatore and P. Guitton. “Contour line recognition from scanned topographic maps”.
       In: (2004). url: http://dspace5.zcu.cz/handle/11025/1744.
[50]   A. Savitzky and M. J. E. Golay. “Smoothing and Differentiation of Data by Simplified
       Least Squares Procedures.” In: Anal. Chem. 36.8 (1964), pp. 1627–1639.
       doi: 10.1021/ac60214a047.
[51]   D. Schemala. “Semantische Segmentierung historischer topographischer Karten”. PhD
       thesis. Dresden, Germany: Technische Universität Dresden, 2016.
[52]   G. Touya, B. Decherf, M. Lalanne, and M. Dumont. “Comparing image-based methods
       for assessing visual clutter in generalized maps”. In: ISPRS Annals of the
       Photogrammetry, Remote Sensing and Spatial Information Sciences II-3/W5 (2015),
       pp. 227–233. doi: 10.5194/isprsannals-II-3-W5-227-2015.
[53]   J. H. Uhl, S. Leyk, Y.-Y. Chiang, W. Duan, and C. A. Knoblock. “Automated Extraction
       of Human Settlement Patterns From Historical Topographic Map Series Using Weakly
       Supervised Convolutional Neural Networks”. In: IEEE Access 8 (2020), pp. 6978–6996.
       doi: 10.1109/access.2019.2963213.
[54]   J. H. Uhl, S. Leyk, Y.-Y. Chiang, W. Duan, and C. A. Knoblock. “Map Archive Mining:
       Visual-Analytical Approaches to Explore Large Historical Map Collections”. In: ISPRS
       International Journal of Geo-Information 7.4 (2018), p. 148. doi: 10.3390/ijgi7040148.
[55]   J.-M. Viglino and M. Pierrot-Deseilligny. “A vector approach for automatic interpreta-
       tion of the French cadastral map”. In: 7th International Conference on Document Analysis
       and Recognition. Proceedings. 2003, 304–308 vol.1. doi: 10.1109/icdar.2003.1227678.
[56]   J. Wu, P. Wei, X. Yuan, Z. Shu, Y.-Y. Chiang, Z. Fu, and M. Deng. “A New Gabor Filter-
       Based Method for Automatic Recognition of Hatched Residential Areas”. In: IEEE Access
       7 (2019), pp. 40649–40662. doi: 10.1109/access.2019.2907114.
[57]   H. Yamada, K. Yamamoto, and K. Hosokawa. “Directional mathematical morphology
       and reformalized Hough transformation for the analysis of topographic maps”. In: IEEE
       Transactions on Pattern Analysis and Machine Intelligence 15.4 (1993), pp. 380–387.
       doi: 10.1109/34.206957.