<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>VIPERC</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Searching for cultural relationships through deep learning models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo Stacchio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessia Angeli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Lisanti</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gustavo Marfia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bologna, Department for Life Quality Studies</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bologna, Department of the Arts</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Bologna, Department of Computer Science and Engineering</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Family album photo collections may reveal historical insights regarding specific cultures and times. In most cases, such photos are scattered among private homes and only available on paper or photographic film, which makes their analysis very cumbersome. Their study may also become difficult because of the number of photos that such collections contain: manually verifying the characteristics of more than a few hundred photos would take exceedingly long, considering that often no associated descriptions are available. This work falls in the described domain, addressing the problem of dating an image through the analysis of an analog family album photo dataset, namely IMAGO, containing photos shot in the 20th century. Thanks to the IMAGO dataset, it was possible to apply different deep learning-based architectures to date images belonging to photo albums without needing any other source of information. In addition, cross-dataset experiments, which also involved models previously presented in the literature, revealed temporal shifts that may be due to known intercultural influences. In conclusion, deep learning models revealed their potential not only in terms of performance but also in terms of their possible applications to intercultural research.</p>
      </abstract>
      <kwd-group>
        <kwd>family album</kwd>
        <kwd>analog photographs</kwd>
        <kwd>date estimation</kwd>
        <kwd>intercultural influences</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Family albums are an example of vernacular photography that has drawn the attention
of researchers and public institutions. Scholars from different fields agree that such
collections capture salient features of the evolution of local communities
in space and time. However, contributions in this field usually base their findings on the study of
small corpora of photos [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], since the photos are too numerous to be processed manually
at a large scale. Many research initiatives have addressed the problem of processing
and analyzing digital images. It is more difficult to find initiatives focused on analog ones,
mainly because printed images are scattered across numerous public and private collections, vary
in quality, and are worn by prolonged use. In essence, any analysis
employing image processing and computer vision algorithms first requires a time-consuming and
potentially degrading digitization step. Despite the complications and challenges brought
on by analog photographs, they represent an unparalleled source of information regarding the
recent past [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. The clothes that people wear, their hairstyles, the tools and
machinery, the natural landscape, and the overall environment may reflect the culture of a
given time and place. All of these visual features may serve as important cues for estimating the
shooting year [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This work addresses the problem of dating an image, exploiting the IMAGO
collection of family album photos, started in the year 2004 at the University of Bologna [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
The collection contains digitized versions of analog prints with specific characteristics. Each
image portrays at least one person, and most of the photos were shot in a given
area of Italy by Italian citizens.
      </p>
      <p>
        We here perform a dating analysis of the IMAGO collection, exploiting different deep
learning-based architectures, without using any other source of information. Differently from [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we
perform a more thorough analysis: we compare different Convolutional Neural Network
(CNN) architectures for the dating task; we then train a model that combines different
salient image regions to estimate the date; finally, we attempt to verify possible
intercultural influences (i.e., the adoption of different customs and habits in different epochs
and countries) by analyzing the differences in dating that result from a cross-dataset experiment,
in which we employ the datasets from [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>The rest of the paper is organized as follows. Section 2 reviews the state of the art
closest to this contribution. Section 3 describes the considered dataset, along with
its pre-processing and splitting. Sections 4 and 5 present and validate several deep neural
network models applied to the proposed dataset. Section 6 reports and discusses
cross-dataset experiments from an intercultural influence perspective. Finally, Section 7 provides an overall
discussion, along with possible future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Only a few works have so far addressed the dating of collections of vernacular photographs,
including analog ones [
        <xref ref-type="bibr" rid="ref10 ref5 ref7 ref8 ref9">9, 7, 8, 10, 5</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] the authors employed a deep learning
approach to analyze and date 37,921 historical frontal-facing American high school yearbook
photos taken from 1928 to 2010. Here, a CNN architecture was trained to analyze people’s
faces and predict the year in which a photo was taken. In addition, the authors observed a
gender-dependency in the performance of dating models. Along the same line, the authors of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
presented a dataset containing images of students taken from high school yearbooks, covering
the 1950 to 2014 time span (considering 1,400 photos per year). They also resorted to CNNs to
estimate the date of an image and to evaluate color vs. grayscale images, considering the
following features: faces, torsos (i.e., people’s upper bodies including faces), and random regions
of images. The best performance was obtained with the torsos of people. In addition, their results
suggest that human appearance is related to time. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], instead, dating was implemented
through the analysis of images belonging to the years 1930 through 1999. Vernacular and
landscape photos were considered, including at most 25,000 pictures per year. The authors
proposed different baselines relying on CNNs, using regression and classification approaches.
Differently, in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors formulated the date estimation task as an image-retrieval one
where, given a query, the retrieved images are ranked in terms of date similarity. For their study,
they analyzed the same public dataset employed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>The contributions presented so far focused on the dating of vernacular photographs shot
in heterogeneous settings (e.g., landscapes, portraits). To the best of our knowledge: (i) none
has considered a dataset solely containing analog pictures depicting at least one person and
belonging to 20th-century family albums, and (ii) no other work has considered a
cross-dataset and intercultural perspective when approaching the dating task.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset, pre-processing and splitting</title>
      <p>
        The IMAGO collection was introduced in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]<fn><label>1</label><p>The IMAGO dataset is available upon request.</p></fn>. It is a digital collection of Italian analog
family album photos composed of 16,642 labeled images taken between 1845 and 2009, and is used here to focus
on image dating. Fig. 1 reports the number of labeled images available per year in the 1930 to
1999 time frame, showing the imbalance in the number of photos per year (most fall
between 1950 and 1980).
      </p>
      <p>The images available in this interval amount to 15,673. Outside this interval, the
number of available images is too small to be considered. Fig. 2 shows four example images
from the IMAGO dataset, belonging to different decades. Here, it is possible to appreciate what
characterizes each photo (e.g., number of people, clothing, colors, and location), including
one of the main traits, i.e., each portrays at least one person.</p>
      <p>
        The pre-processing phase carried out on the IMAGO dataset aimed at isolating the regions
of interest that could enhance the performance of the deep learning models (more details in
Section 4). Following insights from [
        <xref ref-type="bibr" rid="ref7 ref8">8, 7</xref>
        ] we extracted from each image of the IMAGO dataset,
referred to as FULL-IMAGES, all the face and full-figure crops of the people portrayed, gathered
in the FACES and PEOPLE sets. Note that such patches are always present, since
we are dealing with photos that always include at least one person. In particular, for FACES
and PEOPLE images, we processed each image of the IMAGO dataset using an open-source
implementation of YOLO-FACE [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and YOLO [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], respectively. Then, the FACES and the
PEOPLE images have been constructed accounting for the number of people portrayed in a
photo. Indeed, adopting a fixed-size bounding box may result in the loss of pixels
related to the faces or people’s full figures. For this reason, we rescale the bounding
boxes used to crop a face or a person depending on the number of people portrayed in a photo,
i.e., the greater the number of people, the smaller the bounding box. Fig. 3 shows an IMAGO
full-image sample with the respective crops taken from FACES and PEOPLE
(panels: full-image, face crop, person crop).
      </p>
      <p>
        It is possible to appreciate that PEOPLE images include details that are not present in FACES
ones (e.g., the clothing of a person). Finally, we verified the utility of performing denoising [
        <xref ref-type="bibr" rid="ref13 ref14">13,
14</xref>
        ] and super resolution [15, 16] operations, as all the images derive from scans of analog prints.
Nevertheless, since the overall improvement obtained with such strategies proved
negligible, we opted for an analysis based on the original scans.
      </p>
      <p>The FULL-IMAGES dataset has then been partitioned into three subsets of pictures: 80% of the images are
assigned to training and 20% to testing, and 10% of the training set is then held out as a validation subset.
For each image in the training, validation, and test sets of IMAGO, the faces and the
people portrayed are extracted and added to the corresponding FACES and PEOPLE sets,
respectively. This process guarantees that no face or people crops from the validation or test
sets are observed during the training phase.</p>
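      <p>The splitting procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the image identifiers, the <monospace>crops_by_image</monospace> mapping, the helper names, and the random seed are hypothetical, and the sketch does not reproduce any stratification that the original split may have used.</p>
      <preformat><![CDATA[
```python
import random


def split_images(image_ids, test_frac=0.2, val_frac=0.1, seed=0):
    """Split full-image ids: 80% train / 20% test, then 10% of train as validation."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    test, train = ids[:n_test], ids[n_test:]
    n_val = round(len(train) * val_frac)
    val, train = train[:n_val], train[n_val:]
    return {"train": train, "val": val, "test": test}


def assign_crops(split, crops_by_image):
    """Send every crop to the subset of the full image it was cut from."""
    return {
        name: [crop for img in ids for crop in crops_by_image.get(img, [])]
        for name, ids in split.items()
    }


# Hypothetical toy data: 100 images, each with two face crops.
images = [f"img_{i:03d}" for i in range(100)]
faces = {img: [f"{img}_face{k}" for k in range(2)] for img in images}

split = split_images(images)
face_split = assign_crops(split, faces)
```
]]></preformat>
      <p>Because crops inherit the subset of their source image, no crop of a validation or test photo can end up in the training set.</p>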
    </sec>
    <sec id="sec-4">
      <title>4. Model architectures and training settings</title>
      <p>In this work, we considered single and multi-input deep learning architectures. The former
analyze the FULL-IMAGES, FACES, and PEOPLE images in isolation, while the latter analyze
their combination. In more detail, we employed three well-known CNN architectures
pre-trained on ImageNet [17]: ResNet-50 [18], InceptionV3 [19] and DenseNet121 [20]. Each model
was modified by replacing the top-level classifier with a new classification layer, whose structure
depends on the number of output classes, with randomly initialized weights. In addition, the
pre-trained convolutional layers were fine-tuned with the given input data. As for
the single-input classifiers, one was trained for each set of images and named after
the analyzed patches: full-image, faces and people. For the FACES and PEOPLE images we
evaluated the accuracy not for a single face or person, but by aggregating the activations of all
of those who appeared in a picture. This means that if a picture contained <italic>n</italic> persons, the final
prediction would be obtained by passing to the softmax function the average of the activations
coming from each face or person in that image.</p>
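      <p>The aggregation over the faces or persons of one picture can be sketched as follows. This is a minimal illustration in plain Python: the activation values, the number of classes, and the function names are hypothetical and are not taken from the original implementation.</p>
      <preformat><![CDATA[
```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_image(crop_logits):
    """Average the per-crop activations, then apply softmax once."""
    n = len(crop_logits)
    mean_logits = [sum(col) / n for col in zip(*crop_logits)]
    probs = softmax(mean_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical activations for a photo with n = 3 faces and 4 year classes.
logits = [
    [0.2, 1.5, 0.1, -0.3],
    [0.0, 0.9, 1.1, -0.5],
    [0.4, 1.2, 0.3, 0.0],
]
year_index, probs = predict_image(logits)
```
]]></preformat>
      <p>In this toy example the second face alone would favor the third class, while the averaged activations favor the second one, which is the kind of smoothing the aggregation is meant to provide.</p>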
      <p>For the multi-input classifiers, instead, we defined the Merged model, which combines
the single-input classifiers introduced before, with the aim not only of exploiting different sources
of information but also of learning how to combine them. Hence, a new training session was carried out, as the newly
introduced network was asked to learn how to perform such a combination. In particular, the
pre-trained single-input classifiers were employed, but their classification layers were removed,
preserving the CNN backbones as feature extractors. As a picture could contain more than one person, multiple
FACES and PEOPLE images could stem from a single one in FULL-IMAGES. With such an architecture, the number
of extracted feature vectors therefore depends on the number of faces/people portrayed in
an image, and the average of such feature vectors was computed to combine them with the
vector obtained from the full image. The three resulting
feature vectors were linearly combined through a weighted sum, whose weights were a set
of three real scalars learned during the training phase. The final vector, resulting from the
linear combination, is fed to a fully connected layer with a softmax activation, yielding the final
probability vector used for the classification.</p>
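      <p>The forward pass of the fusion step can be sketched as follows. This is a minimal illustration, not the original implementation: the feature dimension, the feature values, the three fusion weights, and the fully connected parameters are hypothetical and fixed by hand, whereas in the Merged model they are produced by the backbones or learned during training.</p>
      <preformat><![CDATA[
```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fuse(full, faces, people, weights):
    """Weighted sum of the full-image vector and the averaged crop vectors."""
    w_full, w_faces, w_people = weights
    f_faces, f_people = mean_vector(faces), mean_vector(people)
    return [w_full * a + w_faces * b + w_people * c
            for a, b, c in zip(full, f_faces, f_people)]


def classify(fused, fc_weights, fc_bias):
    """Fully connected layer followed by a softmax activation."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(fc_weights, fc_bias)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-dimensional features for a photo with two people.
full = [1.0, 0.0]
faces = [[0.0, 1.0], [0.0, 3.0]]    # averaged to [0.0, 2.0]
people = [[2.0, 2.0], [4.0, 0.0]]   # averaged to [3.0, 1.0]
weights = (0.5, 0.25, 0.25)         # three scalars; fixed here, learned in training

fused = fuse(full, faces, people, weights)
probs = classify(fused, fc_weights=[[1.0, 0.0], [0.0, 1.0]], fc_bias=[0.0, 0.0])
```
]]></preformat>
      <p>Averaging the crop vectors first keeps the fused vector at a fixed size regardless of how many people appear in the photo.</p>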
      <p>Moving to the training settings, we applied random cropping
and horizontal flipping to all the considered patches. Each model was fine-tuned using a weighted cross-entropy loss and an
Adam optimizer with a learning rate of 1e-4 and a weight decay of 5e-4. We set the batch
size to 32 for the training of the full-image classifier and to 64 for the faces and people classifiers.</p>
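      <p>A weighted cross-entropy loss can be sketched as follows. The text does not state how the class weights were chosen, so the inverse-frequency weighting used here is purely an assumption for illustration, as are the function names and the toy labels.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: n_samples / (n_classes * class_count) per class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the weights."""
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


# Hypothetical, imbalanced toy labels: three photos from 1950, one from 1980.
labels = [1950, 1950, 1950, 1980]
weights = inverse_frequency_weights(labels)

# Hypothetical predictions: every photo gets probability 0.5 for each year.
probs = [{1950: 0.5, 1980: 0.5} for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)
```
]]></preformat>
      <p>With this weighting, an error on the rare year costs three times as much as an error on the frequent one, which counteracts the imbalance in the number of photos per year.</p>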
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>
        The results are expressed in terms of time distances, as in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. The time distance defines
the tolerance accepted in predictions with respect to the actual year. For example, if a photo was
labeled with the year 1942 and the model returned 1937 (or 1947), the prediction would be considered
correct if the time distance is set to 5 or more; otherwise, it would count as an
error. In this work, model accuracies were computed considering time distances of 0, 5,
and 10 years. The results are reported in Table 1.
      </p>
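      <p>The accuracy at a given time distance can be computed as in the following minimal sketch. The function name and the toy years are hypothetical and only serve to illustrate the metric.</p>
      <preformat><![CDATA[
```python
def accuracy_at_distance(true_years, predicted_years, d):
    """Fraction of predictions that fall within d years of the labeled year."""
    hits = sum(abs(t - p) <= d for t, p in zip(true_years, predicted_years))
    return hits / len(true_years)


# Hypothetical labels and predictions for four photos.
true_years = [1942, 1942, 1960, 1975]
pred_years = [1937, 1950, 1960, 1981]

scores = {d: accuracy_at_distance(true_years, pred_years, d) for d in (0, 5, 10)}
```
]]></preformat>
      <p>Here the absolute errors are 5, 8, 0, and 6 years, so one prediction out of four is correct at d = 0, two at d = 5, and all four at d = 10.</p>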
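      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Accuracy of the single-input (full-image, faces, people) and multi-input (Merged) classifiers for each backbone, at time distances d = 0, 5, and 10 years.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th rowspan="2">Backbone</th>
              <th colspan="3">full-image (time distance)</th>
              <th colspan="3">faces (time distance)</th>
              <th colspan="3">people (time distance)</th>
              <th colspan="3">Merged (time distance)</th>
            </tr>
            <tr>
              <th>d = 0</th><th>d = 5</th><th>d = 10</th>
              <th>d = 0</th><th>d = 5</th><th>d = 10</th>
              <th>d = 0</th><th>d = 5</th><th>d = 10</th>
              <th>d = 0</th><th>d = 5</th><th>d = 10</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>ResNet-50</td>
              <td>11.31</td><td>62.56</td><td>82.54</td>
              <td>15.01</td><td>58.09</td><td>78.39</td>
              <td>15.77</td><td>62.40</td><td>82.47</td>
              <td>18.71</td><td>67.59</td><td>86.17</td>
            </tr>
            <tr>
              <td>InceptionV3</td>
              <td>10.45</td><td>61.38</td><td>82.82</td>
              <td>14.60</td><td>56.95</td><td>78.46</td>
              <td>12.56</td><td>60.04</td><td>81.39</td>
              <td>17.14</td><td>67.56</td><td>86.30</td>
            </tr>
            <tr>
              <td>DenseNet121</td>
              <td>10.68</td><td>60.77</td><td>82.47</td>
              <td>12.91</td><td>57.81</td><td>79.70</td>
              <td>13.99</td><td>59.69</td><td>81.42</td>
              <td>16.22</td><td>66.67</td><td>86.07</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>It is possible to appreciate that different baseline models (i.e., ResNet-50, InceptionV3,
DenseNet121) return similar accuracies. In addition, Table 1 exhibits different accuracies when
different single-input classifiers, hence different image patches, are considered. In particular,
the faces and the people classifiers slightly outperform the full-image one. These results could
be explained, first, by the model averaging obtained from the ensembling of multiple regions
when FACES and PEOPLE images are considered, as the use of more data allows controlling the
uncertainty and reducing the prediction error [21]. These results may also be due to the fact
that each model exploits different salient cues from people’s appearance (e.g., dresses, hairstyle,
earrings, trousers). When comparing the results of the different approaches, the multi-input
model (Merged) improves on the single-input classifiers. In this case, the performance
improvement can be explained by both the ensembling of multiple regions and the fact that the
Merged model has learned to fuse the features from different classifiers.</p>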
      <p>In the analyses that follow, ResNet-50 was selected as the reference backbone for the models,
since it provided the best trade-off between accuracy and model dimension [22].</p>
      <p>
        To assess the predictive value of human (e.g., faces and people) vs. non-human features in
image dating, we also considered random patches. To study the possible use of non-human
features we created a set of images called RANDOM, comprising eight randomly cropped
regions of 128 × 128 pixels from each image belonging to FULL-IMAGES. Other window sizes
were also tested but returned a lower performance. On this set of images, we fine-tuned
an additional ResNet-50 to study its performance against the other models. The evaluation
followed the same protocol already described for the faces and people classifiers in Section 4.
The accuracies obtained with the single-input random classifier are 11.64 for a time distance
equal to 0 (d = 0), 54.26 for d = 5 and 76.12 for d = 10. It is interesting to observe that, as
also exhibited by the faces and people classifiers, the random one achieved a slightly higher score
than the full-image classifier, considering a time distance equal to 0. However, it
exhibited a lower accuracy than all the other classifiers with greater time distances. Even taking
into consideration the averaging effect, this difference in performance between the random
and the other classifiers may be caused by the different visual characteristics learned for given
time slices. This said, and considering that the time distance normally adopted in historical
analyses is ± 5 years, as reported in the literature [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we did not consider the RANDOM images
and the random classifier in the rest of our study.
      </p>
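      <p>The construction of the RANDOM patches can be sketched as follows. This is a minimal illustration: the function name, the seed, and the image size are hypothetical, and rejecting images smaller than the window is an assumption, since the text does not say how such cases were handled.</p>
      <preformat><![CDATA[
```python
import random


def random_crop_boxes(width, height, size=128, n_crops=8, seed=None):
    """Return n_crops boxes (left, top, right, bottom) of size x size pixels."""
    if width < size or height < size:
        # Assumption: images smaller than the window are rejected.
        raise ValueError("image is smaller than the crop size")
    rng = random.Random(seed)
    boxes = []
    for _ in range(n_crops):
        left = rng.randint(0, width - size)   # inclusive upper bound
        top = rng.randint(0, height - size)
        boxes.append((left, top, left + size, top + size))
    return boxes


# Hypothetical 640 x 480 scan: eight random 128 x 128 windows.
boxes = random_crop_boxes(640, 480, seed=0)
```
]]></preformat>
      <p>Each box can then be passed to any image library that crops by (left, top, right, bottom) coordinates; the patches may overlap, since the windows are sampled independently.</p>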
      <p>We also investigated which cues led the trained models to determine the specific year of a
picture. We applied the Grad-CAM algorithm [23] to delimit the areas exploited by the deep
learning models to perform the classification. Fig. 4 reports the Grad-CAM results for
some correctly classified images.</p>
      <p>In particular, each row corresponds to a specific decade and includes the Grad-CAM of an
IMAGO full-image and of the two corresponding FACES and PEOPLE images, respectively. It is
possible to see that the single-input classifiers focused on different regions. This may explain the
increased accuracy obtained with the multi-input model: different single-input classifiers exploit
different features. From a historical perspective, these visual results may be exploited to verify
whether the highlighted cues correspond to visual factors which are recognized as representative
of a specific period.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Cross-dataset experiments: evidence of intercultural influences?</title>
      <p>
        To study the effects of possible intercultural influences (i.e., the adoption of different customs
and habits in different epochs and countries) we carried out a cross-dataset study. We considered
the datasets reported in [
        <xref ref-type="bibr" rid="ref10 ref5 ref7 ref8">7, 8, 10, 5</xref>
        ]. While [
        <xref ref-type="bibr" rid="ref10 ref5">10, 5</xref>
        ] included vernacular photos in heterogeneous
settings and countries, where often no people are portrayed, [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] analyzed American datasets
comprising people’s faces and torsos. Although such datasets do not include family album
photos, they share some common traits with IMAGO: people in pictures are often posing and
dressed for a specific occasion. In particular, it is possible to extract what characterizes all
of them: people’s faces and torsos. This allowed us to perform a cross-dataset comparison
between the models trained on the IMAGO-FACES and PEOPLE patches and the models
trained on the datasets introduced in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], swapping the evaluation datasets.
To do this, we first fine-tuned the architectures used in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] following the procedures
described in their experimental sections. The dataset introduced in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] considers people’s faces,
while the one introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] offers both people’s faces and torsos. Then, we evaluated these
models on the IMAGO dataset. Vice versa, the faces and people classifiers presented in this
work have been evaluated on the corresponding regions offered in the datasets from [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. For
a fair evaluation, the experiments were carried out on the 1930-1999 time-span for the [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] vs.
IMAGO comparison, and on the 1950-1999 time span for the [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] vs. IMAGO comparison.
In particular, we collected the error between the predicted and the actual year for each picture.
The error distributions are reported in Fig. 5 for the cross-dataset experiments involving face
images.
      </p>
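      <p>The per-picture dating error and its distribution can be sketched as follows. This is a minimal illustration: the sign convention (predicted minus actual year, so that positive values mean overestimation), the 5-year bin width, and the toy years are assumptions made here for clarity.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from statistics import mean


def signed_errors(true_years, predicted_years):
    """Predicted minus actual year: positive means the photo is dated too late."""
    return [p - t for t, p in zip(true_years, predicted_years)]


def error_histogram(errors, bin_width=5):
    """Count errors per bin; the bin labeled b covers [b, b + bin_width)."""
    return Counter((e // bin_width) * bin_width for e in errors)


# Hypothetical labels and cross-dataset predictions for four photos.
true_years = [1950, 1960, 1970, 1980]
pred_years = [1957, 1963, 1968, 1991]

errors = signed_errors(true_years, pred_years)
bias = mean(errors)
hist = error_histogram(errors)
```
]]></preformat>
      <p>A positive mean error, as in this toy example, corresponds to a distribution shifted towards overestimation, while a negative one corresponds to underestimation.</p>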
      <p>
        In particular, in Figs. 5a and 5c the error distributions are shifted towards positive values, while
in Figs. 5b and 5d they are shifted towards negative ones. The models built on top of American datasets [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]
applied to IMAGO-FACES tend to overestimate the image shooting year while the opposite
phenomenon (underestimation) occurs when the model presented in this work is applied to [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In Fig. 5, panel (a) reports [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] on IMAGO-FACES, (b) IMAGO-FACES on [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], (c) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] on IMAGO-FACES, and (d) IMAGO-FACES on [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The same phenomenon appeared considering people’s torsos. This fact could be due
to different reasons. The images contained within the considered datasets have been acquired
from different places and locations, using different cameras and scanning devices, leading to
what is defined as the problem of dataset shift. However, there is another dimension to consider:
the effect of intercultural influences. Indeed, during the second half of the 1900s, people’s
appearance in the USA and Italy influenced each other [24, 25]. Finally, the obtained
results, even if not conclusive, provide clues about possible intercultural influences: the
model trained with Italian pictures underestimates the American ones, while the model trained
with American pictures overestimates the Italian ones. These results are not final but certainly motivate
further investigations on this topic.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and future works</title>
      <p>
        In this work, we analyzed the problem of image dating exploiting the IMAGO dataset, a collection
composed of analog prints belonging to family albums and shot during the 20th century. We
trained and tested single and multi-input deep learning models exploiting different regions
of a given photo to identify its shooting year. We adopted these models to search for cues of
intercultural influences through cross-dataset experiments. We evaluated the models trained on
IMAGO-FACES images and the classifiers trained on the datasets presented in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], following a
cross-dataset configuration. The dating error distributions exhibited an interesting symmetry
that motivates further experiments. This work may benefit from the use of larger and more
balanced amounts of data and a deeper analysis of the different IMAGO image regions. We could
also resort to different sources of historical information (e.g., journals, archival documents) to
approach the dating problem multimodally, mimicking even more closely the process that is usually
carried out by historians in their analyses.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by the University of Bologna with the Alma Attrezzature 2017 grant
and by AEFFE S.p.a. and the Golinelli Foundation with the funding of two Ph.D. scholarships.</p>
      <p>Computing Machinery, New York, NY, USA, 2007, p. 1–es. URL: https://doi.org/10.1145/1281500.1281602. doi:10.1145/1281500.1281602.</p>
      <p>[15] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, C. C. Loy, Y. Qiao, X. Tang, ESRGAN: Enhanced super-resolution generative adversarial networks, 2018. arXiv:1809.00219.</p>
      <p>[16] K. Zhang, Image restoration toolbox, https://github.com/cszn/KAIR, 2019.</p>
      <p>[17] J. Deng, W. Dong, R. Socher, L. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848.</p>
      <p>[18] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, 2015. arXiv:1512.03385.</p>
      <p>[19] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, 2015. arXiv:1512.00567.</p>
      <p>[20] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely connected convolutional networks, 2018. arXiv:1608.06993.</p>
      <p>[21] C. M. Bishop, Pattern recognition and machine learning, Springer, 2006.</p>
      <p>[22] C. Coleman, D. Kang, D. Narayanan, L. Nardi, T. Zhao, J. Zhang, P. Bailis, K. Olukotun, C. Re, M. Zaharia, Analysis of DAWNBench, a time-to-accuracy machine learning performance benchmark, 2019. arXiv:1806.01427.</p>
      <p>[23] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, International Journal of Computer Vision 128 (2019) 336–359. URL: http://dx.doi.org/10.1007/s11263-019-01228-7. doi:10.1007/s11263-019-01228-7.</p>
      <p>[24] S. Gundle, M. Guani, L’americanizzazione del quotidiano. Televisione e consumismo nell’Italia degli anni Cinquanta, Quaderni storici (1986) 561–594.</p>
      <p>[25] Washington Post, How America became Italian, https://www.washingtonpost.com/opinions/how-america-became-italian/2015/10/09/4c93b1be-6ddd-11e5-9bfe-e59f5e244f92_story.html?utm_term=.5a515dec12c5&amp;noredirect=on, 2022.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sandbye</surname>
          </string-name>
          ,
          <article-title>Looking at the family photo album: a resumed theoretical discussion of why and how</article-title>
          ,
          <source>Journal of Aesthetics &amp; Culture</source>
          <volume>6</volume>
          (
          <year>2014</year>
          )
          <fpage>25419</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calanca</surname>
          </string-name>
          ,
          <article-title>Italians posing between public and private. theories and practices of social heritage</article-title>
          ,
          <source>Almatourism-Journal of Tourism, Culture and Territorial Development</source>
          <volume>2</volume>
          (
          <year>2011</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mitman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wilder</surname>
          </string-name>
          ,
          <article-title>Documenting the world: film, photography, and the scientific record</article-title>
          , University of Chicago Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] MoMA, Vernacular photography,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Riba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ramos-Terrades</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lladós</surname>
          </string-name>
          ,
          <article-title>Date estimation in the wild of scanned historical photos: An image retrieval approach</article-title>
          ,
          <source>in: International Conference on Document Analysis and Recognition</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>306</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stacchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Angeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lisanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calanca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marfia</surname>
          </string-name>
          ,
          <article-title>Towards a holistic approach to the socio-historical analysis of vernacular photos</article-title>
          ,
          <source>ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ginosar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rakelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sachs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Efros</surname>
          </string-name>
          ,
          <article-title>A century of portraits: A visual historical record of American high school yearbooks</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision Workshops</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Workman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <article-title>Analyzing human appearance as a cue for dating images</article-title>
          ,
          <source>in: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fernando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Muselet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tuytelaars</surname>
          </string-name>
          ,
          <article-title>Color features for dating historical color images</article-title>
          ,
          <source>in: 2014 IEEE International Conference on Image Processing (ICIP)</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <fpage>2589</fpage>
          -
          <lpage>2593</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Springstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ewerth</surname>
          </string-name>
          ,
          <article-title>“When was this picture taken?” - image date estimation in the wild</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>619</fpage>
          -
          <lpage>625</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Thanh</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , Yolo face implementation, https://github.com/sthanhng/yoloface,
          <year>2018</year>
          .
          Online; accessed 3 August 2020
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Redmon</surname>
          </string-name>
          , YOLO: Real Time Object Detection, https://github.com/pjreddie/darknet/wiki/YOLO:-Real-Time-Object-Detection,
          <year>2019</year>
          .
          Online; accessed 3 August 2020
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>FFDNet: Toward a fast and flexible solution for CNN-based image denoising</article-title>
          ,
          <source>IEEE Transactions on Image Processing</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
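          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Paris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kornprobst</surname>
          </string-name>
          ,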
          <string-name>
            <given-names>J.</given-names>
            <surname>Tumblin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Durand</surname>
          </string-name>
          ,
          <article-title>A gentle introduction to bilateral filtering and its applications</article-title>
          ,
          <source>in: ACM SIGGRAPH 2007 Courses</source>
          , SIGGRAPH '07, Association for
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>