<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Blind Dates: Examining the Expression of Temporality in Historical Photographs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandra Barancová</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Melvin Wevers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nanne van Noord</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Humanities, Amsterdam School of Historical Studies, University of Amsterdam</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Humanities, Media Studies, University of Amsterdam</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Science, Informatics Institute, University of Amsterdam</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>490</fpage>
      <lpage>499</lpage>
      <abstract>
<p>This paper explores the capacity of computer vision models to discern temporal information in visual content, focusing specifically on historical photographs. We investigate the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model. Our experiment consists of three steps: zero-shot classification, fine-tuning, and analysis of visual content. We use the De Boer Scene Detection dataset, containing 39,866 gray-scale historical press photographs from 1950 to 1999. The results show that zero-shot classification is relatively ineffective for image dating, with a bias towards predicting dates in the past. Fine-tuning OpenCLIP with a logistic classifier improves performance and eliminates the bias. Additionally, our analysis reveals that images featuring buses, cars, cats, dogs, and people are more accurately dated, suggesting the presence of temporal markers. The study highlights the potential of machine learning models like OpenCLIP in dating images and emphasizes the importance of fine-tuning for accurate temporal analysis. Future research should explore the application of these findings to color photographs and diverse datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Image dating</kwd>
        <kwd>Computer vision</kwd>
        <kwd>Temporal analysis</kwd>
        <kwd>Historical photographs</kwd>
        <kwd>OpenCLIP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Time plays a crucial role in shaping our understanding and interpretation of the world around
us. Our perception of duration, the sequence of our memories, and the authority with which
historical records organize the past all contribute to our lived experiences and memory. This
perception extends to our interpretation of visual content, where an image’s materiality,
content, and style can convey critical temporal information. Despite its significance, this aspect
of image understanding remains underexplored in artificial intelligence research [
        <xref ref-type="bibr" rid="ref16 ref18">18, 16</xref>
        ].
      </p>
      <p>
        AI models, typically trained on data from limited time periods, possess a narrow understanding of
temporality due to their lack of ‘awareness’ of historical variations. Although efforts have been
made to integrate historical data into language models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and even to encode time explicitly [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ],
these methods primarily focus on text, leaving visual content interpretation largely uncharted
territory. In this paper, we experiment with the task of ‘dating’ images: predicting when an
image was taken based on its visual content.1 We examine how different image aspects
influence a multimodal AI model’s predictive accuracy, uncover structural biases in pre-trained
computer vision models, and explore their effects on predictions. Our research aims to extend
our understanding of the visual representation of time and its influence on image
interpretation. This experiment is situated within a broader goal of developing more temporally-aware
computer vision and multimodal models. For this, cross-pollination between AI and
humanities scholarship on cultural heritage, archiving, and temporality will be needed; as [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] show,
interdisciplinary work in this area has been limited, yet it has the potential to be mutually
beneficial.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        The challenge of automating dating has been addressed across a variety of historical objects,
spanning from photographs [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and artworks [
        <xref ref-type="bibr" rid="ref13 ref8">13, 8</xref>
        ] to archaeological sites [
        <xref ref-type="bibr" rid="ref25 ref9">9, 25</xref>
        ]. With the
increasing digitization of historical documents, many of which lack publication dates,
computational methods have been employed to estimate their creation dates, primarily analyzing
writing styles [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], focusing on both the text and the visual content of the writing. The
automatic dating of historical photographs offers substantial value to archives and museums, but
also to domains like temporal forensic analysis, where dating can serve as evidence. Forensic
applications typically concentrate on an image’s material aspects, using techniques that identify
specific camera models or devices [
        <xref ref-type="bibr" rid="ref1 ref12">12, 1</xref>
        ]. While such methods may be overly meticulous for
large-scale dating of historical images, they underscore the importance of material information
in establishing an image’s capture date.
      </p>
      <p>
        Beyond material aspects, others show that low-level image features like RGB color
derivatives and color angles carry temporal information. Models trained on these features often
surpass human accuracy in dating photographs [
        <xref ref-type="bibr" rid="ref18 ref3">18, 3</xref>
        ]. Research in this domain has seen the
adoption of neural networks for dating photographs, treating it as an ordinal [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], regression [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
classification [
        <xref ref-type="bibr" rid="ref21 ref24 ref5">21, 5, 24</xref>
        ], or retrieval task [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Studies have also started to pay attention to
image content, emphasizing the connection between time and visual elements or semantic cues.
Research indicates that temporal cues can be derived from human appearance features, such
as clothing, hairstyles, and glasses [
        <xref ref-type="bibr" rid="ref21 ref5">21, 5</xref>
        ], or even from architectural elements like windows
to estimate the age of buildings [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. A recent study on a family photo album dataset found
that the accuracy of the model used for the dating task improved as the number of faces and/or
people in a photograph increased [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] – this suggests that certain high-level image features
and visual elements may carry more temporal information than others. Finally, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] recently
turned to generative approaches, synthesizing portrait images for specific decades between
1880 and the present day to distinguish visual markers for these periods.
      </p>
      <p>As we transition our focus from photograph materiality to content, an essential challenge
arises: deciphering how models interpret higher-level input features to predict dates. This
exploration aims to yield deeper insights into the ways in which temporality is encoded in
visual content and how we can enhance computer vision systems’ ability to interpret cultural
artifacts.
1All code used for this study is available on GitHub: https://github.com/CANAL-amsterdam/dating-images/</p>
    </sec>
    <sec id="sec-3">
      <title>3. Image Dating</title>
      <p>
        We use OpenCLIP [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the open source implementation of CLIP [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], to predict when a
photograph was taken. CLIP is a multi-modal language and vision model that has been shown to
have a strong zero-shot capability on diverse vision tasks [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and to outperform a number
of domain-specific models on various vision and language tasks following task-specific
fine-tuning [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Our interest lies in understanding how visual features, particularly objects, are
leveraged for dating purposes, while also evaluating the model’s aptitude for the dating task.
Among the various models that have been used for dating images, we have not yet seen
experiments with large models that have zero-shot capabilities. Exploring the potential of
such models is interesting because of their lesser need for training data, their broader generalizability,
and the possibility of examining a multimodal perspective, based on textual and visual data, on
tasks like dating.
      </p>
      <p>
        Data. Collections of press photographs are available with relatively reliable dates, making
them well-suited for examining the visual representation of time. For this experiment, we have
chosen to use the De Boer Scene Detection dataset, which contains 52,160 digitized historical
press photographs from the De Boer newspaper agency spanning from 1945 to 2005.2 The
images are scanned and cropped photo negatives, the vast majority of which are gray-scale [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
The dataset contains relatively mundane photographs, rather than iconic ones, as well as a wide
variety of different scenes ranging from sporting events to landscapes [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]; this makes it an
interesting case for exploring the visual elements that carry temporal information. Besides the
exact year, each image has a label describing the depicted scene. We excluded images lacking
date information and those taken before 1950 or after 1999, resulting in 39,866 photographs.
These cut-off points were chosen to include data spanning complete decades in our analysis.
Figure 1 illustrates the dataset distribution per year. We split the dataset into a train set (80%)
and a test set (20%) using stratified sampling based on the year, aiming to reduce uneven
distributions across the splits and thereby prevent biases.
      </p>
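      <p>The stratified 80/20 split described above can be sketched as follows (a minimal sketch using scikit-learn; the record and year arrays here are illustrative stand-ins, not the released dataset):</p>

```python
from sklearn.model_selection import train_test_split

def stratified_split(records, years, test_size=0.2, seed=42):
    """Split records 80/20, stratifying on the year label so that each
    year is represented proportionally in both splits."""
    return train_test_split(
        records, years,
        test_size=test_size,
        stratify=years,        # keep the per-year distribution even across splits
        random_state=seed,
    )

# Toy usage: 100 photographs per year for 1950-1999
years = [y for y in range(1950, 2000) for _ in range(100)]
records = list(range(len(years)))
X_train, X_test, y_train, y_test = stratified_split(records, years)
```

      <p>Because the year counts here divide evenly, each year contributes exactly 20% of its images to the test split.</p>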
      <p>
        We structure our experiment in three steps: examining zero-shot classification capabilities,
fine-tuning the model, and assessing the impact of visual content on the model’s dating ability.
Zero-shot Classification. To investigate to what extent OpenCLIP can be used for dating,
we apply zero-shot classification to the test set to predict each photograph’s date. This process
uses a prompt of the form ‘a photograph from the year ⟨year⟩’, where ⟨year⟩ ranges from 1950 up
to 2000. We employ Mean Absolute Error (MAE) for performance evaluation, following [
        <xref ref-type="bibr" rid="ref3 ref15">3, 15</xref>
        ]. We find
an MAE of 15.8, which indicates relatively poor performance considering both the 50-year
range of our dataset and the results others have demonstrated on the dating
task: [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], for example, attained MAE values of 7.12 and 7.48 respectively in
their experiments with the Date Estimation in the Wild (DEW) dataset, which covers the period
1930-1999. Additionally, [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] showed that human participants had an MAE of 10.9 on the DEW
dataset and were thus on average about 11 years off in their predictions. By comparison, the
almost 16 years that the zero-shot model achieves is quite poor.
2The dataset is available at https://zenodo.org/record/7137452
      </p>
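      <p>Mechanically, the zero-shot step scores each image embedding against one text embedding per candidate-year prompt and takes the argmax; MAE is then computed over absolute year differences. A minimal numpy sketch of this scoring step (the random embeddings stand in for OpenCLIP’s encoders, which are not reproduced here):</p>

```python
import numpy as np

def zero_shot_year(image_emb, year_embs, years):
    """Predict a year by cosine similarity between one image embedding and
    the text embeddings of the per-year prompts; return the best year."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = year_embs / np.linalg.norm(year_embs, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per candidate year
    return years[int(np.argmax(sims))]

def mean_absolute_error(true_years, pred_years):
    """MAE in years, as used for evaluation."""
    return float(np.mean(np.abs(np.asarray(true_years) - np.asarray(pred_years))))

# Toy usage with stand-in embeddings: the image is close to the 1963 prompt
years = list(range(1950, 2000))
rng = np.random.default_rng(0)
year_embs = rng.normal(size=(len(years), 512))   # one text embedding per prompt
image_emb = year_embs[13] + 0.01 * rng.normal(size=512)
print(zero_shot_year(image_emb, year_embs, years))  # → 1963
```

      <p>With OpenCLIP, the stand-in arrays would be replaced by the model’s text and image encoder outputs.</p>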
      <p>The error rate distribution reveals a preference in the zero-shot model for predicting dates
earlier than the actual date. Suspecting a correlation with the images’ gray-scale nature, we
tested zero-shot classification on a colorized version of our dataset3, which resulted in the
error distribution leaning slightly more toward the future (see Figure 2a).4 Figure 3 shows two
sample images for which colorization had a large effect in decreasing the date prediction error.
One depicts an outdoor view of a church, the other a group of people in formal wear. Both
images show a large prediction error in the gray-scale variant, albeit in different directions, i.e.
overestimating or underestimating the actual date. We find that colorizing the images improves
the overall zero-shot capabilities of OpenCLIP; however, with an MAE of 13.2 it remains relatively
ineffective for the dating task.</p>
      <p>Fine-tuned Classifier. To overcome the zero-shot limitations, we explore whether
fine-tuning OpenCLIP improves performance on the dating task and eliminates the bias found in
the initial experiment. To this end, we train a logistic classifier using the OpenCLIP image
embeddings. Training a logistic classifier also allows us to focus solely on the temporal
information in the visual content, removing a possible confounding temporal bias introduced into
the model through textual prompting: when using textual prompts for zero-shot classification,
the words used might be better suited to specific historical periods, thereby introducing a bias in
the images corresponding to the prompt. Fine-tuning reduced both the error and the bias, with an
MAE of 6.65 for the classifier trained and evaluated on the original gray-scale images and 6.79 for
the colorized images. The bias between gray-scale and colorized images found in the zero-shot
approach disappears after fine-tuning (Figure 2), displaying a more normal error distribution.5
3We colorized all the photographs using DeOldify: https://github.com/jantic/DeOldify
4A KS-test (KS-statistic: 0.49, p-value: 0.0) supports the difference between the distributions.</p>
      <p>Figure 2: (a) Zero-shot classification; (b) Fine-tuned classification.</p>
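      <p>The fine-tuning step can be sketched as fitting a logistic classifier on frozen image embeddings, with one class per year (a minimal scikit-learn sketch; the clustered synthetic embeddings stand in for OpenCLIP outputs):</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_year_classifier(embeddings, years):
    """Train a logistic classifier mapping frozen image embeddings to year
    labels; the vision backbone itself stays fixed."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings, years)
    return clf

# Synthetic stand-in: each year's embeddings cluster around a distinct centroid
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1950, 1960), 30)        # 10 'years', 30 images each
centroids = rng.normal(size=(10, 64))
X = centroids[years - 1950] + 0.1 * rng.normal(size=(len(years), 64))

clf = fit_year_classifier(X, years)
preds = clf.predict(X)
mae = np.mean(np.abs(preds - years))                # MAE in years
```

      <p>Treating years as classes (rather than a regression target) mirrors the classification framing used in the experiment; the MAE is then computed from the predicted year labels.</p>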
      <p>
        Content Analysis. Upon training a model to predict dates, we investigated the content of
the images to determine whether specific visual features improved or hindered the predictions.
An initial analysis using the available scene labels proved inconclusive, as there were large
differences in the within-scene error rates, as well as a large variety of visual features
represented in individual scenes. We therefore examined the visual features at the object level, using
Detectron2 outputs [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. For the detection task, we used the 80 default object classes, as defined in
COCO, with the default ROI threshold (0.5). Next, we selected only detections with a confidence
score above 0.8 and object types that appeared more than 200 times in the entire dataset, in order
to shorten the long tail. The confidence scores were output by Detectron2 per image. Finally,
we picked 12 classes representing modes of transport and living beings to focus this experiment
on.6 Our motivation in picking these classes was to reduce the granularity of the available
categories so as to identify larger trends; classes like ‘tie’, for example, might be closely related to
‘person’, essentially functioning as a sub-class thereof. An additional motivation for excluding
some of the MS COCO object classes is that they did not all suit the context and/or time span
of our dataset, especially technology like ‘laptop’, ‘cell phone’ or ‘microwave’.
      </p>
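      <p>The detection-filtering steps described above (confidence above 0.8, object types occurring more than 200 times, restriction to the 12 focus classes) can be sketched as follows (hypothetical detection records; Detectron2 itself is not invoked here):</p>

```python
from collections import Counter

# The 12 transport/living-being classes used in the experiment
FOCUS_CLASSES = {"bicycle", "boat", "bus", "car", "motorcycle", "train",
                 "truck", "bird", "cat", "dog", "horse", "person"}

def filter_detections(detections, min_conf=0.8, min_freq=200):
    """Keep detections above the confidence threshold whose object type is
    frequent enough across the dataset, restricted to the focus classes."""
    confident = [d for d in detections if d["score"] > min_conf]
    freq = Counter(d["label"] for d in confident)
    return [d for d in confident
            if freq[d["label"]] > min_freq and d["label"] in FOCUS_CLASSES]

# Hypothetical usage: 'car' is frequent and confident, 'cat' low-confidence,
# 'tie' confident but outside the focus classes
example = ([{"label": "car", "score": 0.9}] * 250
           + [{"label": "cat", "score": 0.5}] * 300
           + [{"label": "tie", "score": 0.95}] * 300)
kept = filter_detections(example)   # only the 'car' detections survive
```
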
      <p>
        A Bayesian regression analysis was conducted to measure the effect of object presence and
absence on the error rate.7 The regression model was defined as follows:
absolute_error ~ 1 + object_present
      </p>
      <p>Here, absolute_error is the outcome variable and object_present is the binary predictor
variable indicating each object’s presence. To model the distribution, we assumed a negative
binomial. We model the errors as counts, where the event is the count of predictions with a
specific error.8
5Also supported by a KS-test (KS-statistic: 0.00, p-value: 0.92).
6‘bicycle’, ‘boat’, ‘bus’, ‘car’, ‘motorcycle’, ‘train’, ‘truck’, ‘bird’, ‘cat’, ‘dog’, ‘horse’, ‘person’
7The analysis was performed using the Python library Bambi and the NumPyro NUTS sampler.</p>
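      <p>The choice of a negative binomial over a Poisson likelihood rests on the error counts being overdispersed, i.e. their variance exceeding their mean. A minimal sketch of this check (synthetic counts; the actual Bambi model is not reproduced here):</p>

```python
import numpy as np

def overdispersion_ratio(counts):
    """Variance-to-mean ratio of count data; a ratio near 1 is consistent
    with a Poisson, while a ratio well above 1 (overdispersion) favors a
    negative binomial likelihood."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# Synthetic overdispersed error counts drawn from a negative binomial
rng = np.random.default_rng(0)
errors = rng.negative_binomial(n=2, p=0.2, size=1000)
ratio = overdispersion_ratio(errors)   # well above 1 for these draws
```

      <p>In the study itself, the model was fit with Bambi’s formula interface using the formula and negative binomial family described above.</p>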
      <p>Figure 4a shows that for modes of transport, the presence of ‘bicycle’, ‘boat’, ‘motorcycle’,
and ‘train’ increases the absolute error, whereas ‘bus’ and ‘car’ decrease it. We
hypothesize that these vehicles are more prone to exterior changes in this period. Of the
animals, we see that ‘bird’ and ‘horse’ increase the error and ‘cat’ decreases it. Our
hypothesis here is that cats might be depicted more often together with humans and in interior
environments, which may include more temporal markers. Finally, we see that having a
‘person’ in an image has the strongest effect, decreasing the MAE from approximately 7.2 to 5.5
(Figure 4b), indicating that depictions of people convey visual cues about time. These results
and hypotheses need further examination, which we intend to undertake in future work.
8Since the variance and mean are not equal, a zero-inflated Poisson was not warranted. See the GitHub for more
information on the models.</p>
      <p>Figure 4: (a) Estimated effects of objects on absolute error (95% HDI, meaning there is a 95%
chance that the true value lies within this range); (b) posterior predictions for the class ‘person’:
a person in the image reduces the absolute error from 7.2 to 5.5.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Discussion</title>
      <p>
        Our exploration with OpenCLIP to date historical press photographs yielded several findings.
Ineffectiveness of Zero-shot Classification. Our first finding is that the zero-shot
classification capability of OpenCLIP does not perform well in dating images. The model
demonstrated a distinct bias towards predicting earlier dates, which we attribute to the gray-scale
nature of our images. This suggests that OpenCLIP may have learned to associate gray-scale
with older photographs, as [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] also concluded in exploring the concept of history in
foundation models including CLIP. We attempted to counteract this bias by colorizing the images,
which mildly improved the model’s accuracy and shifted the bias towards predicting more
recent dates. However, despite these adjustments, the efficacy of zero-shot classification for this
task remained limited.
      </p>
      <p>Improvement through Fine-tuning. Our second finding is that fine-tuning OpenCLIP
using a logistic classifier significantly enhances the model’s performance. The fine-tuned model
effectively eliminates the bias towards past dates seen in the zero-shot approach and offers
comparable accuracy levels for both gray-scale and colorized images. This indicates that the
presence of color in images becomes less significant for dating them when the model is trained
to focus on visual content. Future research could look into generating captions rather than
labels or scenes to provide a more enriched context for each image.</p>
      <p>
        Objects as Temporal Markers. Our third finding, coming from the post-hoc regression
analysis, is that the presence of people in images generally leads to more accurate date
predictions. This echoes the findings of [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. We posit that this could be attributed to the
time-dependent markers humans tend to carry, like fashion and hairstyles, as has previously been
shown in studies on yearbook portraits by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Moreover, we see that the presence
of animals often kept as house pets also reduces the error; we hypothesize that this might be
due to them being photographed indoors or in proximity to humans, settings which might carry more
temporal markers than the natural settings in which animals such as horses or birds are captured. Finally,
certain modes of transportation increase the error rate while others decrease it. We need to
explore to what extent this is related to innovations that led to visual changes over time. All in all,
further investigation is necessary to validate these hypotheses; considering our findings on the
influential role of human figures in the images, it would be worthwhile to explore datasets
containing fewer human figures – in our dataset only 8,674 of the 39,866 photographs contained no
people.9 This could shed light on whether the presence of humans is generally advantageous
for image dating, or whether it is a specific characteristic of our dataset or a manifestation of model
bias.
      </p>
      <p>In conclusion, this study deepens our understanding of how computer vision models interpret
and extract temporal information from historical visual material. It highlights the potential
of OpenCLIP for image dating tasks. It also underscores the importance of model fine-tuning
to counter biases. Future work could test our findings’ generalizability to color images and
datasets from various periods and geographical regions. Such work can be a means to
identifying and using temporal information in visual material better, with the aim of creating more
temporally-aware computer vision and multimodal models. To this end, we see case studies
engaging with specific computer vision/temporal tasks like image dating as important steps
in testing what works in terms of both models and data.
9This is based on outputs from Detectron2 at an object confidence threshold of 0.8.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] F. Ahmed, F. Khelifi, A. Lawgaly, and A. Bouridane. “Temporal Image Forensic Analysis for Picture Dating with Deep Learning”. In: 2020 International Conference on Computing, Electronics &amp; Communications Engineering (iCCECE). Southend, United Kingdom: IEEE, 2020, pp. 109-114. doi: 10.1109/iCCECE49321.2020.9231160.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] E. M. Chen, J. Sun, A. Khandelwal, D. Lischinski, N. Snavely, and H. Averbuch-Elor. “What's in a Decade? Transforming Faces Through Time”. In: Computer Graphics Forum 42.2 (2023), pp. 281-291. doi: 10.1111/cgf.14761.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] B. Fernando, D. Muselet, R. Khan, and T. Tuytelaars. “Color features for dating historical color images”. In: 2014 IEEE International Conference on Image Processing (ICIP). Paris, France: IEEE, 2014, pp. 2589-2593. doi: 10.1109/icip.2014.7025524.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Fiorucci, M. Khoroshiltseva, M. Pontil, A. Traviglia, A. Del Bue, and S. James. “Machine Learning for Cultural Heritage: A Survey”. In: Pattern Recognition Letters 133 (2020), pp. 102-108. doi: 10.1016/j.patrec.2020.02.017.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Ginosar, K. Rakelly, S. M. Sachs, B. Yin, C. Lee, P. Krähenbühl, and A. A. Efros. “A Century of Portraits: A Visual Historical Record of American High School Yearbooks”. In: IEEE Transactions on Computational Imaging 3.3 (2017), pp. 421-431. doi: 10.1109/tci.2017.2699865.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. Hamid, M. Bibi, M. Moetesum, and I. Siddiqi. “Deep Learning Based Approach for Historical Manuscript Dating”. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). Sydney, Australia: IEEE, 2019, pp. 967-972. doi: 10.1109/icdar.2019.00159.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ilharco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wortsman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wightman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Namkoong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          .
          <source>OpenCLIP</source>
          .
          <year>2021</year>
          . doi: 10.5281/zenodo.5143773.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>van Noord</surname>
          </string-name>
          . “
          <article-title>Stylistic Multi-Task Analysis of Ukiyo-e Woodblock Prints”</article-title>
          .
          <source>In: British Machine Vision Conference</source>
          .
          <year>2021</year>
          , p.
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Klassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weed</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Evans</surname>
          </string-name>
          . “
          <article-title>Semi-supervised machine learning approaches for predicting the chronology of archaeological sites: A case study of temples from medieval Angkor, Cambodia”</article-title>
          .
          <source>In: PLOS ONE 13.11</source>
          (
          <year>2018</year>
          ), e0205649. doi: 10.1371/journal.pone.0205649.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.-K.</given-names>
            <surname>Kong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Goh</surname>
          </string-name>
          . “
          <article-title>Deep Ordinal Regression based on Data Relationship for Small Datasets”</article-title>
          .
          <source>In: IJCAI</source>
          (
          <year>2017</year>
          ), pp.
          <fpage>2372</fpage>
          -
          <lpage>2378</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Manjavacas</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fonteyn</surname>
          </string-name>
          . “
          <article-title>MacBERTh: Development and evaluation of a historically pre-trained language model for English (1450-1950)”</article-title>
          .
          <source>In: Proceedings of the Workshop on Natural Language Processing for Digital Humanities</source>
          .
          <year>2021</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bulan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          . “
          <article-title>Device temporal forensics: An information theoretic approach”</article-title>
          .
          <source>In: 2009 16th IEEE International Conference on Image Processing (ICIP)</source>
          .
          <year>2009</year>
          , pp.
          <fpage>1501</fpage>
          -
          <lpage>1504</lpage>
          . doi: 10.1109/ICIP.2009.5414612.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mensink</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Van Gemert</surname>
          </string-name>
          . “
          <article-title>The Rijksmuseum challenge: Museum-centered visual recognition”</article-title>
          .
          <source>In: Proceedings of International Conference on Multimedia Retrieval</source>
          .
          <year>2014</year>
          , pp.
          <fpage>451</fpage>
          -
          <lpage>454</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Riba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ramos-Terrades</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lladós</surname>
          </string-name>
          . “
          <article-title>Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach”</article-title>
          .
          <source>In: Document Analysis and Recognition - ICDAR</source>
          <year>2021</year>
          . Ed. by
          <string-name>
            <given-names>J.</given-names>
            <surname>Lladós</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopresti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Uchida</surname>
          </string-name>
          . Vol.
          <volume>12822</volume>
          . Cham: Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>306</fpage>
          -
          <lpage>320</lpage>
          . doi: 10.1007/978-3-030-86331-9_20.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Springstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ewerth</surname>
          </string-name>
          . “
          <article-title>“When Was This Picture Taken?” - Image Date Estimation in the Wild”</article-title>
          .
          <source>In: Advances in Information Retrieval</source>
          . Ed. by
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Altıngovde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Albakour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Watt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tait</surname>
          </string-name>
          . Vol.
          <volume>10193</volume>
          . Cham: Springer International Publishing,
          <year>2017</year>
          , pp.
          <fpage>619</fpage>
          -
          <lpage>625</lpage>
          . doi: 10.1007/978-3-319-56608-5_57.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wevers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Blanke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Noordegraaf</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          .
          <article-title>An Analytics of Culture: Modeling Subjectivity, Scalability, Contextuality, and Temporality</article-title>
          .
          <year>2022</year>
          . doi: 10.48550/arxiv.2211.07460.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Offert</surname>
          </string-name>
          . “
          <article-title>On the Concept of History (in Foundation Models)”</article-title>
          .
          <source>In: Image 37.1</source>
          (
          <year>2023</year>
          ), pp.
          <fpage>121</fpage>
          -
          <lpage>134</lpage>
          . doi: 10.1453/1614-0885-1-2023-15462.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Palermo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hays</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Efros</surname>
          </string-name>
          . “
          <article-title>Dating Historical Color Images”</article-title>
          .
          <source>In: Computer Vision - ECCV 2012. Lecture Notes in Computer Science</source>
          . Vol.
          <volume>7577</volume>
          . Florence, Italy: Springer,
          <year>2012</year>
          , pp.
          <fpage>499</fpage>
          -
          <lpage>512</lpage>
          . doi: 10.1007/978-3-642-33783-3_36.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hallacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          “
          <article-title>Learning Transferable Visual Models From Natural Language Supervision”</article-title>
          .
          <source>In: Proceedings of the 38th International Conference on Machine Learning</source>
          . PMLR,
          <year>2021</year>
          , pp.
          <fpage>8748</fpage>
          -
          <lpage>8763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Rosin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Guy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Radinsky</surname>
          </string-name>
          . “
          <article-title>Time masking for temporal language models”</article-title>
          .
          <source>In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining</source>
          .
          <year>2022</year>
          , pp.
          <fpage>833</fpage>
          -
          <lpage>841</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Workman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          . “
          <article-title>Analyzing human appearance as a cue for dating images”</article-title>
          .
          <source>In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV)</source>
          . Lake Placid, NY, USA: IEEE,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi: 10.1109/WACV.2016.7477678.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rohrbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Keutzer</surname>
          </string-name>
          .
          <source>How Much Can CLIP Benefit Vision-and-Language Tasks?</source>
          <year>2021</year>
          . doi: 10.48550/arxiv.2107.06383.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stacchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Angeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lisanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calanca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Marfia</surname>
          </string-name>
          . “
          <article-title>IMAGO: A family photo album dataset for a socio-historical analysis of the twentieth century”</article-title>
          .
          <source>In: ACM Transactions on Multimedia Computing, Communications, and Applications</source>
          <volume>18</volume>
          .3s (
          <year>2022</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi: 10.1145/3507918.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Duarte</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ratti</surname>
          </string-name>
          . “
          <article-title>Understanding architecture age and style through deep learning”</article-title>
          .
          <source>In: Cities</source>
          <volume>128</volume>
          (
          <year>2022</year>
          ), p.
          <fpage>103787</fpage>
          . doi: 10.1016/j.cities.2022.103787.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>G.</given-names>
            <surname>Toner</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          .
          <article-title>Language and chronology: text dating by machine learning</article-title>
          .
          <source>Brill</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wevers</surname>
          </string-name>
          . “
          <article-title>Scene Detection in De Boer Historical Photo Collection”</article-title>
          .
          <source>In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence</source>
          . Vienna, Austria: SCITEPRESS - Science and Technology Publications,
          <year>2021</year>
          , pp.
          <fpage>601</fpage>
          -
          <lpage>610</lpage>
          . doi: 10.5220/0010288206010610.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wevers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vriend</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>de Bruin</surname>
          </string-name>
          . “
          <article-title>What to Do with 2.000.000 Historical Press Photos? The Challenges and Opportunities of Applying a Scene Detection Algorithm to a Digitised Press Photo Collection”</article-title>
          .
          <source>In: TMG Journal for Media History 25.1</source>
          (
          <year>2022</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kirillov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-Y.</given-names>
            <surname>Lo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          .
          <source>Detectron2</source>
          . https://github.com/facebookresearch/detectron2.
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>