<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ROGER: Extracting Narratives Using Large Language Models from Robert Gerstmann's Historical Photo Archive of the Sacambaya Expedition in 1928</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mauricio Matus</string-name>
          <email>mmatus@ucn.cl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Urrutia</string-name>
          <email>durrutia@ucn.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Meneses</string-name>
          <email>cmeneses@ucn.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Keith</string-name>
          <email>brian.keith@ucn.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing &amp; Systems Engineering, Universidad Católica del Norte</institution>
          ,
          <addr-line>Antofagasta</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story'24 Workshop</institution>
          ,
          <addr-line>Glasgow (Scotland), 24-March-2024</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Narrative Extraction</institution>
          ,
          <addr-line>Heritage Image Archives, Sacambaya Expedition, Large Language Models, Image Labeling. 1</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>School of Journalism, Universidad Católica del Norte</institution>
          ,
          <addr-line>Antofagasta</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article presents ongoing work on developing a methodology for the systematic analysis and narrative construction of heritage image archives, focusing on Robert Gerstmann's photo archive of the Sacambaya Expedition of 1928. This work combines state-of-the-art artificial intelligence techniques, such as the convolutional neural networks used in computer vision, with Large Language Models (LLMs) for generation purposes. The intent is to establish a practical and accessible framework in this area for institutions and individuals. The proposed method combines human-generated image labels with LLMs to produce narratives that aid researchers and users in their sense-making process as they explore a large archive of images. Through this iterative process, we aim to contribute not only to the understanding of this specific historical photo collection but also to the broader development of scalable solutions for the exploration and interpretation of heritage image archives. We seek to achieve a deeper understanding of the contents and meanings of the analyzed materials, suggesting and highlighting new groupings of these materials and thematic/narrative connections that a human observer may not have considered.</p>
      </abstract>
      <kwd-group>
        <kwd>Narrative Extraction</kwd>
        <kwd>Heritage Image Archives</kwd>
        <kwd>Sacambaya Expedition</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Image Labeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The emergence of new technologies and the availability of a vast photographic archive have
motivated a multidisciplinary project that explores the potential of these two elements to
expand the scope of research across disciplines. This ongoing research in the field of
computational narrative extraction aims to develop a methodology for the analysis and
semiautomatic construction of meaning and narratives from historical image archives using Large
Language Models (LLMs). On the heritage side of this research, we aim to uncover and specify
narratives inherent in large banks of photos for which there is limited information and
dissemination [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>The Sacambaya Expedition photo archive</title>
        <p>
          Robert Gerstmann's photographic work has been preserved in its original physical format since 1964 in Antofagasta, Chile.
This material consists of 43,475 negatives and 15,054 positives in different formats,
representing a period of photographic capture of approximately 40 years [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The images
detail landscapes from the beginning of the 20th century from the heights of Bolivia to
Antarctica, the Pacific islands, and the Andes Mountain range [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In this context, we focus on
the Sacambaya Expedition archive, a part of the photographic work of Robert Gerstmann.
        </p>
        <p>
          In January 1928, Edgar Sanders, a Swiss engineer, established a company in London to
search for an alleged Jesuit treasure hidden in an old monastery located in a ravine in the
province of Inquisiví, Bolivia. The expedition comprised 21 individuals with diverse
professional and military backgrounds, including 19 English citizens, 1 German, and 1 North
American. The team scoured four different locations in the sector for five months, but their
efforts proved fruitless, and they returned to Europe in November of that year [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We note
that most of the records related to this expedition had not been digitized before the present
project. It is estimated that only approximately 15% of these images are digitized and
accessible online [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>Proposed model</title>
        <p>
          This research seeks to exploit the power of LLMs [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ][
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and image processing techniques to
extract narratives from a compendium of historical images. This endeavor aims to augment
the corpus of historical knowledge by providing a narrative context to image archives that
capture specific historical events. The overarching goal of this project is to establish a
comprehensive framework/pipeline, named Robert Gerstmann Repository (ROGER), that
enables efficient exploration, categorization, and semi-automated extraction of narratives
implicit in previously unexamined heritage archives. The proposed methodology is designed
to be iterative and incremental, ensuring clear documentation of progress through each phase.
The efficacy of the methodology is illustrated through a case study centered on a historical
event, utilizing the framework to narrate its story systematically.
        </p>
        <p>
          The ROGER Narrative Pipeline unfolds in a structured, multi-phased approach,
incorporating human expertise and AI in a collaborative narrative pipeline. Initially, images
are labeled through a combination of AI-driven algorithms and human judgment, resulting in
a curated and contextually enriched dataset. Subsequent phases involve the use of AI to
generate descriptive narratives and cluster these into thematic groups. The proposed pipeline
integrates the use of prompt engineering loops [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] — a process where human input is used to
iteratively refine AI outputs — in the narrative extraction process, thus ensuring that the
emerging narratives are not only accurate but also resonate with human interpretative
frameworks. The final phase of the process is the drafting and construction of a coherent
narrative with human feedback from the clustered image data. This interactive and iterative
process between AI and human intelligence is instrumental in producing a polished and
nuanced narrative output, ready for presentation and scholarly exploration.
        </p>
        <p>This paper presents the results from the application of the ROGER Narrative Pipeline
to the 1928 Sacambaya Expedition historical archive, uncovering the implicit narratives
embedded within historical image archives. The proposed approach contributes to the
interdisciplinary dialogue on narrative extraction, advancing our understanding of
computational narrative construction in historical research. This represents a novel initiative
to systematically decipher the stories enshrined in historical visual records. To the best of our
knowledge, this is the first work that has attempted to computationally unearth the stories
contained in any of Robert Gerstmann's historical photo archives.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Technological framework</title>
      <p>
        The computational analysis and interpretation of historical image archives is an emerging
interdisciplinary field that integrates computer vision [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], natural language processing [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
information retrieval [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and historical research methods [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Several recent projects [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
have demonstrated the potential of computational techniques to aid in making sense of
large-scale image archives and constructing historical narratives from them. From a technological
point of view, the process described in the following Methodology section uses two key
technologies to implement a pipeline that takes a set of related heritage photographs as input
and generates a coherent set of narratives as output. These key technologies are Labelbox and
LLMs (e.g., ChatGPT).
      </p>
      <sec id="sec-2-1">
        <title>Labelbox</title>
        <p>Labelbox is a machine learning annotation platform that simplifies the creation and
management of annotated datasets, which are vital for AI development. It supports a variety
of data types and provides tools for both manual and semi-automated annotation, aimed at
increasing efficiency and accuracy. The platform encourages teamwork with its collaborative
features and maintains high-quality annotations with robust quality control measures. Its
user-friendly interface and integrative capabilities with machine learning workflows make it
accessible for users of different skill levels, streamlining the annotation process from start to
finish.
</p>
      </sec>
      <sec id="sec-2-2">
        <title>LLMs and ChatGPT</title>
        <p>
          LLMs are advanced AI systems designed to process and generate human-like text. They have
been trained on a massive amount of data and can understand and generate text in a wide
variety of languages and styles [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ][
          <xref ref-type="bibr" rid="ref15">15</xref>
          ][
          <xref ref-type="bibr" rid="ref16">16</xref>
          ][
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. ChatGPT, developed by OpenAI, is a specific
example of an LLM designed for conversational interactions. It uses the GPT (Generative
Pre-trained Transformer) architecture to understand and respond to user inputs. ChatGPT has
been trained on a wide range of internet text to grasp and mimic human-like conversational
patterns.
        </p>
        <p>
          Sensemaking of archives requires synthesizing across individual images to construct a
higher-level understanding. Computational techniques for visual storytelling aim to build
such narratives from image sequences [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. However, this typically relies on constrained
domains with limited vocabularies [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Our work leverages pre-trained LLMs capable of
open-domain generation to construct narratives for historical image archives. In summary,
our methodology builds upon advances in image recognition and LLMs while innovating in
integrating these techniques for computational sensemaking over historical image archives.
We believe this approach can provide both a macro-level narrative as well as a detailed
understanding grounded in the image contents.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>We present the ROGER Narrative Pipeline for the extraction of computational narratives from
visual datasets. The general methodology is presented in Figure 1. This process commences
with a systematic labeling phase where a set of input images is semantically annotated using a
combination of software tools and human oversight, producing a curated dataset. This
curation involves the enrichment of the images with contextual metadata in the form of image
labels. These labels enhance the depth and relevance of the information that will be used in
narrative construction.</p>
      <p>Central to our pipeline is the integration of an LLM, which generates textual
descriptions and titles from the enriched image data. These descriptions are the bedrock upon
which the narrative structure is built. Subsequently, we use the LLM to perform clustering on
the textual descriptions to organize the images into clusters or coherent thematic groups
(Figure 3), followed by the establishment of a timeline, ordering these clusters and images to
create a draft narrative sequence. Integral to this process are the prompt engineering loops,
where human operators iteratively refine the AI prompts based on the outputs to produce a
final narrative (Figure 4). This iterative process is pivotal, allowing for the human operator's
critical and creative inputs to sculpt the narrative, ensuring structural and thematic integrity.</p>
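      <p>For illustration, the sketch below shows one way such a prompt engineering loop could be implemented; the generate() callable stands in for any LLM request, and all names and prompt wording are illustrative assumptions rather than the actual ROGER implementation.</p>
      <preformat>
# Minimal sketch of a human-in-the-loop prompt engineering loop (assumed
# design, not the authors' code): the operator inspects each draft and either
# accepts it or supplies feedback that is folded into the next prompt.
def prompt_engineering_loop(generate, base_prompt: str) -> str:
    prompt = base_prompt
    while True:
        draft = generate(prompt)          # any LLM call, e.g., a ChatGPT request
        print(draft)
        feedback = input("Feedback (empty line to accept): ").strip()
        if not feedback:
            return draft
        # Iteratively refine: fold the operator's feedback into the next prompt.
        prompt = f"{base_prompt}\n\nRevise the previous output. Feedback: {feedback}"
      </preformat>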
      <p>The final stages of the pipeline revolve around the transformation of the AI-generated
timeline into a narrative draft. This draft undergoes a human-led finalization process, where
narrative/theme experts refine the storyline, ensuring linguistic precision, narrative flow, and
overall coherence. The result is a final narrative that provides a textual representation of the
visual data in a narrative format. The final output contains the ordered list of images, their
titles, their description, and the associated narrative. Furthermore, the output is accompanied
by a detailed cluster list that provides an overview of the narrative elements and their
organization, thereby offering transparency into the narrative structure and content. Through
this methodical and collaborative approach, the pipeline achieves a high-fidelity narrative
extraction from visual inputs, demonstrating the potential for robust human-AI collaboration
in this domain.
</p>
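      <p>As a concrete illustration of this output format, the following data structures sketch the fields described above (ordered images with titles and descriptions, a cluster list, and the narrative text); the field names are assumptions made for exposition, not the pipeline's actual schema.</p>
      <preformat>
# Hypothetical Python data structures for the pipeline's final output,
# inferred from the description above; names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ImageEntry:
    image_id: str
    title: str
    description: str

@dataclass
class Cluster:
    name: str
    image_ids: List[str] = field(default_factory=list)

@dataclass
class NarrativeOutput:
    ordered_images: List[ImageEntry]  # timeline order
    clusters: List[Cluster]           # thematic grouping overview
    narrative: str                    # final narrative summary text
      </preformat>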
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>The main results associated with each stage of our pipeline, applied to a subset of 12 heritage
images, are presented here.</p>
      <sec id="sec-4-1">
        <title>Data curation and sampling</title>
        <p>
          To demonstrate the capabilities of our methodology, we present the results on a
representative subset of 12 pictures from the photo archive of the Sacambaya Expedition in
1928 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ][21]. This archive consists of 545 negative originals in 10 x 15 format. From this
collection, we discarded 45 photographs in the present analysis due to their defective nature
and/or being over- or under-exposed, as they did not provide relevant information for labeling
and subsequent categorization.
        </p>
        <p>
          Prior to this study, knowledge of this photographic archive rested on historical sources and
the account of at least one of its members, published in 1934 [22]. The historical narrative
suggested by these sources coincides with what is seen in the heritage images found in the
archive. The materials comprise a distinctive first phase of photos taken on board a ship, a
second category showing means of transportation and human displacement tasks, and a final
general category of images depicting the excavation work and the logistics that this
semi-industrial operation entails [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Therefore,
these 500 photographs were pre-organized into the following thematic clusters:
        </p>
        <list list-type="bullet">
          <list-item>
            <p>Journey by ship: subgroup identified as “London to Arica” (LTA), with 69 images.</p>
          </list-item>
          <list-item>
            <p>Journey by land: subgroup identified as “Arica to Sacambaya” (ATS), with 187 images.</p>
          </list-item>
          <list-item>
            <p>Excavation sites: subgroup identified as “Sacambaya” (SAC), with 244 images.</p>
          </list-item>
        </list>
        <p>From each of these thematic groups, we intentionally selected 4 images to exemplify
the general progress of the expedition and the methodology employed, and to suggest a temporal
order for the narration of the journey.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Data enrichment through manual labeling</title>
        <p>The enrichment of the original 500 images through the application of annotation tagging
(both classification and object labels) by humans was facilitated by the use of the Labelbox [23]
platform. The use of automatic classification algorithms was considered; however, the
decision was made to prioritize establishing a baseline developed by humans. The procedure
comprises several distinct phases.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Dataset generation</title>
        <p>A corpus of 500 images was curated, encompassing approximately 2.1 GB of data. These
images were standardized to a width of 1920 pixels and subsequently compressed using the
cjpeg software [24] to reduce their file size, culminating in a consolidated dataset of
approximately 370 MB.</p>
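        <p>As a rough sketch of this preprocessing step, the snippet below resizes each scan to a width of 1920 pixels and re-saves it as a compressed JPEG; it uses the Pillow library as a stand-in for the cjpeg tool mentioned above, and the folder names and quality setting are assumptions.</p>
        <preformat>
# Sketch of the standardization/compression step (assumed paths and quality;
# the authors used cjpeg, approximated here with Pillow for brevity).
from pathlib import Path
from PIL import Image

SRC = Path("scans")     # digitized negatives (assumed location)
DST = Path("dataset")   # output folder for the consolidated dataset
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.tif"):
    with Image.open(path) as im:
        scale = 1920 / im.width
        resized = im.resize((1920, round(im.height * scale)))
        resized.convert("RGB").save(DST / (path.stem + ".jpg"), quality=80)
        </preformat>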
        <p>The ontology for the annotation process was crafted following an initial analysis of the
photographs. This ontology comprised two sets of annotations: General Classification and
Specific Objects. The development of this ontology was a critical step in ensuring that the
annotations would be comprehensive and consistently applied across the entire image set by
the research assistants. The detailed categorization was designed to facilitate nuanced analysis
and to support the study's objectives by providing rich, structured data.</p>
        <p>The general classification elements consisted of 22 labels (such as trees, road, city,
boat, excavation, beach, square, etc.) that were defined based on a prior visual analysis by the
authors. The specific elements were grouped into categories: person, animal, object,
transportation, and landscape. Each of these categories encompassed between 5 and 7 labels.
This structured approach to classification allows for a detailed and organized analysis of the
photographs. By having both general and specific categories, the researchers could ensure that
the labeling process was thorough and nuanced, capturing both broad and fine-grained details
within the images.
</p>
      </sec>
      <sec id="sec-4-4">
        <title>Annotator recruitment and annotation process</title>
        <p>Ten undergraduate students from the fields of journalism and computer engineering were
recruited to serve as research assistants. These individuals were selected based on the
criterion of having completed at least 50% of their academic program and having prior
participation in various research projects. Over the course of two weeks, these individuals
were tasked with the systematic manual delineation of bounding boxes and the subsequent
assignment of labels corresponding to the identified objects within the images. Each
photograph was labeled by at least two students and subsequently reviewed by the
researchers. The team identified a total of 4,868 objects, of which 44.1% were classified as
people, 22.17% as landscape details, 15.14% as various objects, with the remaining categories
including animals and transportation. The average number of annotations produced by each
student was 54.4.
</p>
      </sec>
      <sec id="sec-4-5">
        <title>Annotation extraction</title>
        <p>The annotations were subsequently extracted from LabelBox in JSON format, which facilitated
their processing and analysis through the Python programming language. In a detailed
analysis of the dataset, 12 images were selected as representative samples from the collection
of 500, and their annotations were obtained for further investigation. This structured
approach to annotation not only enhances the reliability of the data but also ensures a level of
granularity that is conducive to subsequent computational analysis.</p>
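        <p>As an illustration of this step, the snippet below loads a LabelBox-style JSON export and tallies the object labels with Python; the export schema shown here (a list of rows, each with an "objects" field holding bounding-box annotations) is an assumption and may differ between export versions.</p>
        <preformat>
# Sketch of parsing an exported annotation file and counting object labels.
# The file name and the "objects"/"value" keys are assumptions about the
# export schema, not a documented LabelBox format.
import json
from collections import Counter

with open("labelbox_export.json", encoding="utf-8") as f:
    rows = json.load(f)

label_counts = Counter()
for row in rows:                        # one row per labeled image
    for obj in row.get("objects", []):  # bounding-box annotations
        label_counts[obj["value"]] += 1

print(label_counts.most_common(10))
        </preformat>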
      </sec>
      <sec id="sec-4-6">
        <title>Descriptions and clustering</title>
        <p>Following the manual labeling process, we generated narrative descriptions of each picture in
the dataset. These descriptions are generated using ChatGPT with GPT-4. In particular, we
prompt the model with minimal context about the Sacambaya Expedition, upload the image,
and provide the human-generated labels described previously. Table 1 illustrates the prompt
design used to generate a narrative description and title for the picture shown in Figure 2,
together with the output generated by GPT-4. Our annotators created the tags in Spanish, their
native language, and the tags were left untranslated in the prompt; we found no significant
difference when translating them beforehand.</p>
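        <p>The sketch below illustrates how such a description request could be issued programmatically; the paper describes using ChatGPT with GPT-4, so the OpenAI client call, model name, and prompt wording shown here are assumptions rather than the exact procedure.</p>
        <preformat>
# Illustrative sketch: generate a title and narrative description for one image
# from its human-generated labels (assumed prompt wording and model name).
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CONTEXT = ("Photo archive of the 1928 Sacambaya Expedition in Bolivia, "
           "photographed by Robert Gerstmann.")

def describe_image(image_path: str, labels: list) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": CONTEXT + "\nHuman-generated labels: " + ", ".join(labels)
                         + "\nWrite a short title and a narrative description."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/jpeg;base64," + b64}},
            ],
        }],
    )
    return response.choices[0].message.content
        </preformat>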
        <p>The thematic clusters generated by the LLM (Figure 3) were: Cluster 1: “Maritime Prelude”; Cluster 2: “Expedition Life and Challenges”; Cluster 3: “Industrial and Excavation Efforts”.</p>
        <p>The minimum context information supplied to the model was the name, year, and
purpose of the expedition. The prompt included the names of some key places in the historical
narrative (Bolivia and Sacambaya) and the name and nationality of the photographer (see
Tables 1 to 3). Following the construction of all the narrative descriptions, we used another
prompt to ask the LLM to cluster the images based on their content and reorganize them
chronologically. The prompt used all the image descriptions generated beforehand to generate
these clusters. The prompt is shown in Table 2.</p>
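        <p>Analogously, the clustering step can be sketched as a single call that concatenates all previously generated descriptions into one prompt; the wording below only approximates the prompt in Table 2, and the client and model name are assumptions as in the previous sketch.</p>
        <preformat>
# Sketch of the clustering prompt: group photo descriptions into thematic
# clusters and order them chronologically (assumed wording and model name).
from openai import OpenAI

client = OpenAI()

def cluster_descriptions(descriptions: dict) -> str:
    listing = "\n".join("ID " + i + ": " + text for i, text in descriptions.items())
    prompt = (
        "You are exploring the photo archive of the 1928 Sacambaya Expedition. "
        "Group the following photo descriptions into coherent thematic clusters, "
        "name each cluster, and order the images chronologically. "
        "Reference the photo IDs explicitly.\n\n" + listing
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
        </preformat>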
      </sec>
      <sec id="sec-4-7">
        <title>Timeline and narrative draft</title>
        <p>Following this clustering process, we asked the LLM to generate a timeline of the photos
followed by a draft of the final narrative. We note that our work required extensive prompt
design to ensure that the generated descriptions, clusters, and final narratives were coherent.
We show the final version of the timeline and narrative extraction prompt alongside the
corresponding narrative summary output in Table 3. The final timeline is shown in Figure 4
with the corresponding images. These two elements (timeline and narrative summary)
represent the final output of the proposed pipeline.</p>
        <p>We note that our proposed prompt structure forces the model to hypothesize a
plausible chronological order before generating the narrative draft. The dataset of historical
photos does not contain explicit temporal or spatial information, nor did we share with the
model the thematic grouping described in Section 4.1. Therefore, the LLM must infer the
order of the photos from the general context of the expedition. While a human could also
assist with the timeline generation process and provide further information via prompting, the
number of photos in a full archive makes this generally infeasible and too complex to scale.
Thus, for scalability purposes, we rely on the capabilities of the LLM to make sense of the
underlying temporal and spatial information.</p>
        <p>In general, sorting an unordered set of photos into the correct temporal order is a
complex problem. In this context, GPT-4 initially had issues generating the correct timeline,
which is why we included additional context in our prompt design. In reviewing the results,
we found that the LLM presented historical events in the correct chronological order as long
as it was provided with this minimal contextual information.</p>
        <p>Timeline and narrative extraction prompt (Table 3):</p>
        <p>You are an expert documentary filmmaker and you are
exploring the historical archive of photos of the Sacambaya
Expedition in Bolivia from 1928, taken by the German
photographer Robert Gerstmann. The expedition sought to find
the lost treasure of the jesuits, hidden somewhere near
Sacambaya. You have the following list of descriptions of
photos. Your goal is to generate two outputs: a timeline of these
photos from the provided descriptions so that it forms a
coherent narrative and a written summary of the narrative. The
timeline should explicitly reference the IDs of the photos. The
narrative summary should also reference the IDs of the photos
as needed between brackets. Note that the provided descriptions
might not be shown in the correct order and the IDs are not
associated with any ordering. Please avoid the use of flowery
language, your narrative should be written in an academic style.</p>
        <p>Do not explain your process.</p>
        <p>Narrative summary output generated by GPT-4 (Table 3):</p>
        <p>The Sacambaya Expedition of 1928, documented by German
photographer Robert Gerstmann, embarked on a challenging
quest to uncover the Jesuit's lost treasure in Bolivia's Sacambaya
Valley. It began with maritime scenes (IDs 65 and 60), capturing
the tranquil ocean and coastal topography as the British
expedition approached its destination. Aboard the ship, the crew
is depicted in a moment of camaraderie and preparation (ID 30),
followed by a brief sojourn at a riverine port, indicating the
integration of transportation modes (ID 23). Upon reaching the
Sacambaya Valley, the expedition faced the valley's rugged
terrain (ID 40), fording rivers and navigating hilly landscapes (ID
85), and traversing dense thickets (ID 91). The photographs depict
the challenging nature of their journey, highlighting both the
physical and logistical hurdles encountered. As the expedition
progressed, the team's burden of transporting heavy machinery
through the dry valley is evident (ID 71), reflecting the blend of
hope and hardship. Moments of rest and mechanical endeavors
are captured (ID 46), as well as industrious activities amidst
natural ruins (ID 37), underscoring the expedition's reliance on
technology of the time. The concluding images (IDs 33 and 81)
focus on the excavation and manual labor involved in the treasure
hunt, set against a backdrop of natural and historical landscapes.</p>
        <p>This narrative, woven through Gerstmann's photographs, tells a
story of determination, challenge, and the enduring human spirit
in the pursuit of historical treasures.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>This ongoing research represents an advancement in the field of narrative extraction from
heritage image archives. Through the iterative and incremental development of our
methodology, a framework is established that not only enhances the understanding of
historical events but also contributes to the broader discourse on computational narratives
and artificial intelligence [25].</p>
      <p>This experimental model facilitates a utilitarian approach to solving issues prevalent
in numerous audiovisual archives. The methodology proposed herein aims to achieve quality
control measures concurrently with narration. While beyond this work's scope, future
initiatives will prioritize expanding the search range and implementing an integral quality
control system to regulate labeling accuracy, clustering thoroughness, and narrative
coherence levels. Additionally, we acknowledge the potential of Large Language Models
(LLMs) to autonomously identify a broad spectrum of objects. Consequently, subsequent
projects may explore creating narrations without human support, relying solely on object
identification and computer vision capabilities. However, if this avenue is pursued,
methodological safeguards and explicit discussion of their implications must be incorporated
to ensure a minimum level of confidence in the results, rendering them relevant for
anthropological, historical, and heritage discourse. We also acknowledge the limitations of
this case study and the need for a more comprehensive evaluation, as its primary objective
was to illustrate the methodology rather than validate it on a broader range of data.</p>
      <p>The successes of the proposed method in constructing coherent historical narratives
suggest a potential paradigm shift in how narrative extraction from visual historical records
can be approached. Thus, this ongoing research represents a significant contribution to the
challenge of uncovering narratives concealed within historical image archives. Furthermore,
our aim is to observe significant changes, trends, and prevalent elements in large groups of
visual information that may not be readily apparent through individual observations [26].
This broader perspective facilitates the finding and construction of narratives that extend
beyond individual images.</p>
      <p>These experiments using easily available LLMs demonstrate the need to always
maintain human control in the process, as shown by all the required prompt engineering.
Future work will consist of applying the proposed pipeline to the entire collection of 500
images. We hope that our proposed methodology and technical pipeline streamline the work
of expert catalogers, documentarians, and media creators, who can now have a minimal
foundational basis to explore large, undisclosed photographic collections. Additionally, in the
present case of the archive of the Sacambaya Expedition by Robert Gerstmann, we hope to
have contributed to the historical and heritage enrichment of a part of this little-explored
collection.</p>
      <p>
        In conclusion, we propose that the analysis of a specific photographic collection can
be further enriched through the organization and utilization of information in narratives [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Theories on sensemaking emphasize that sensemaking and narrative are two inherently
interconnected concepts about how people understand the world around them [27]. Given its
replicability, we consider our proposed method to be a contribution to the discovery,
enrichment, and dissemination of the worlds and narratives “hidden” inside photographic
heritage archives.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The authors wish to acknowledge the contribution of the UCN Faculty of Humanities with its
grant “Concurso de Incentivo Productividad Científica”, 2023 initiative, which contributed
financially to the project, and the UCN Library for allowing access to and work on Robert
Gerstmann's photo archive. The authors also wish to thank the team of research assistants,
made up of students from the School of Journalism and the Department of Computing and
Systems Engineering, who carried out the task of manual classification and annotation of the
more than 500 photos of the group under study.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Fornaro</surname>
            ,
            <given-names>Peter</given-names>
          </string-name>
          &amp; Chiquet,
          <string-name>
            <surname>Vera.</surname>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Artificial Intelligence for Content and Context Metadata Retrieval in Photographs and Image Groups</article-title>
          . Archiving Conference 2020, 79-82. doi:10.2352/issn.2168-3204.2020.1.0.79.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alvarado</surname>
          </string-name>
          , Roberto Gerstmann: fotografías, paisajes y territorios latinoamericanos, 1st. ed.Pehuén, Santiago,
          <year>Chile 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Matus</surname>
          </string-name>
          ,
          <article-title>Roberto Gerstmann's last photography</article-title>
          ,
          <source>Video</source>
          ,
          <year>2022</year>
          . URL: https://youtu.be/9nFvhoZd5Os.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sanders</surname>
          </string-name>
          ,
          <article-title>The Story of the Jesuit Gold Mines in Bolivia and of the Treasure Hidden by the Sacambaya River</article-title>
          . (
          <year>1928</year>
          ) Rauner Special Collections Library - Dartmouth College.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pavez</surname>
          </string-name>
          , Imágenes de la revolución industrial: Robert Gerstmann en las Minas de Bolivia (
          <year>1925</year>
          -1936), 1st ed. Plural, La Paz,
          <year>Bolivia 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] Ma, Wenchi, Xuemin Tu, Bo Luo, and
          <string-name>
            <given-names>Guanghui</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>"Semantic clustering based deduction learning for image recognition and classification." Pattern Recognition 124 (</article-title>
          <year>2022</year>
          ):
          <fpage>108440</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Makridakis</surname>
            , Spyros,
            <given-names>Fotios</given-names>
          </string-name>
          <string-name>
            <surname>Petropoulos</surname>
            , and
            <given-names>Yanfei</given-names>
          </string-name>
          <string-name>
            <surname>Kang</surname>
          </string-name>
          .
          <article-title>"Large language models: Their success and impact</article-title>
          .
          <source>" Forecasting</source>
          <volume>5</volume>
          , no.
          <issue>3</issue>
          (
          <year>2023</year>
          ):
          <fpage>536</fpage>
          -
          <lpage>549</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jiho</given-names>
            <surname>Shin</surname>
          </string-name>
          , Clark Tang, Tahmineh Mohati, Maleknaz Nayebi,
          <string-name>
            <given-names>Song</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hadi</given-names>
            <surname>Hemmati</surname>
          </string-name>
          .
          <year>2024</year>
          .
          <article-title>Prompt Engineering or Fine Tuning: An Empirical Assessment of Large Language Models</article-title>
          in
          <source>Automated Software Engineering Tasks. 1</source>
          ,
          <issue>1</issue>
          (
          <year>October 2024</year>
          ),
          <volume>22</volume>
          pages.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Wevers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vriend</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>de Bruin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>What to do with 2.000.000 historical press photos? The challenges and opportunities of applying a scene detection algorithm to a digitised press photo collection</article-title>
          .
          <source>TMG Journal for Media History</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Witte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kappler</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krestel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lockemann</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Integrating wiki systems, natural language processing, and semantic technologies for cultural heritage data management</article-title>
          .
          <source>In Language Technology for Cultural Heritage: Selected Papers from the LaTeCH Workshop Series</source>
          (pp.
          <fpage>213</fpage>
          -
          <lpage>230</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keogh</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shelton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2008</year>
          , June).
          <article-title>Annotating historical archives of images</article-title>
          .
          <source>In Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries</source>
          (pp.
          <fpage>341</fpage>
          -
          <lpage>350</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Jo</surname>
            ,
            <given-names>E. S.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <source>Foreign Relations of the United States Series</source>
          ,
          <fpage>1860</fpage>
          -
          <lpage>1980</lpage>
          :
          <article-title>A Study in New Archival History</article-title>
          . Stanford University.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Lotfi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Beheshti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Farhood</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; Pooshideh,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Jamzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Beigy</surname>
          </string-name>
          ,
          <string-name>
            <surname>H.</surname>
          </string-name>
          <article-title>Storytelling with Image Data: A Systematic Review and Comparative Analysis of Methods and Tools</article-title>
          .
          <source>Algorithms</source>
          <year>2023</year>
          ,
          <volume>16</volume>
          , 135. https://doi.org/10.3390/a16030135.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subbiah</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhariwal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neelakantan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shyam</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sastry</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Askell</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.
          <article-title>Language Models Are Few-shot learners</article-title>
          .
          <source>Advances In Neural Information Processing Systems</source>
          ,
          <volume>33</volume>
          :
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Chowdhery</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosma</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>H. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehrmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.
          <article-title>PaLM: Scaling language modeling with pathways</article-title>
          .
          <source>arXiv preprint arXiv:2204.02311</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Touvron</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavril</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Izacard</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinet</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lachaux</surname>
            , M.-
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lacroix</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , Rozière,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Azhar</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          , et al.
          <article-title>LLaMA: Open and efficient foundation language models</article-title>
          .
          <source>arXiv preprint arXiv:2302.13971</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] OpenAI, GPT-4 Technical Report, arXiv preprint arXiv:2303.08774, 2023. URL: https://doi.org/10.48550/arXiv.2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heo</surname>
            ,
            <given-names>M. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Son</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>K. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B. T.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>GLAC Net: GLocal attention cascading networks for multi-image cued story generation</article-title>
          . arXiv preprint arXiv:
          <year>1805</year>
          .10973.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>T. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferraro</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mostafazadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Misra</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp; Mitchell,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2016</year>
          , June).
          <article-title>Visual storytelling</article-title>
          .
          <source>In Proceedings of the 2016</source>
          conference
          <article-title>of the North American chapter of the association for computational linguistics: Human language technologies</article-title>
          (pp.
          <fpage>1233</fpage>
          -
          <lpage>1239</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Maggiori</surname>
            , Emmanuel, Yuliya Tarabalka, Guillaume Charpiat, and
            <given-names>Pierre</given-names>
          </string-name>
          <string-name>
            <surname>Alliez</surname>
          </string-name>
          .
          <article-title>"Highresolution aerial image labeling with convolutional neural networks</article-title>
          .
          <source>" IEEE Transactions on Geoscience and Remote Sensing</source>
          <volume>55</volume>
          , no.
          <volume>12</volume>
          (
          <year>2017</year>
          ):
          <fpage>7092</fpage>
          -
          <lpage>7103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] D. Buck, Tales of Glitter or Dust, 2000. Accessed December 2023. URL: https://www.thefreelibrary.com/Tales of Glitter or Dust.-a073064246.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] S. Jolly, The Treasure Trail. John Long Limited, London, 1934.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] Labelbox, "Labelbox," 2024. URL: https://labelbox.com.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] G. K. Wallace, "The JPEG Still Picture Compression Standard," Communications of the ACM, vol. 34, no. 4, April 1991, pp. 30-44.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] B. F. Keith Norambuena, T. Mitra, and C. North, "A Survey on Event-Based News Narrative Extraction," ACM Computing Surveys 55, no. 14s (2023): 1-39.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] S. Klingenstein, T. Hitchcock, and S. DeDeo, "The Civilizing Process in London's Old Bailey," Proceedings of the National Academy of Sciences 111 (2014). doi:10.1073/pnas.1405984111.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] Z. Battad and M. Si, "A System for Image Understanding Using Sensemaking and Narrative," The Ninth Advances in Cognitive Systems (ACS) Conference, 2021.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>