<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway&apos;s Digitised Book Collection</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Marie</forename><surname>Roald</surname></persName>
							<email>marie.roald@nb.no</email>
							<affiliation key="aff0">
								<orgName type="department">Research and Special Collections</orgName>
								<orgName type="institution">The National Library of Norway</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Magnus</forename><forename type="middle">Breder</forename><surname>Birkenes</surname></persName>
							<email>magnus.birkenes@nb.no</email>
							<affiliation key="aff0">
								<orgName type="department">Research and Special Collections</orgName>
								<orgName type="institution">The National Library of Norway</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lars</forename><forename type="middle">Gunnarsønn Bagøien</forename><surname>Johnsen</surname></persName>
							<email>lars.johnsen@nb.no</email>
							<affiliation key="aff0">
								<orgName type="department">Research and Special Collections</orgName>
								<orgName type="institution">The National Library of Norway</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway&apos;s Digitised Book Collection</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8A63D4D2FB27A49843AC5F8DEA1434DC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>image retrieval</term>
					<term>computer vision</term>
					<term>embeddings</term>
					<term>vector search</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Digital tools for text analysis have long been essential for the searchability and accessibility of digitised library collections. Recent computer vision advances have introduced similar capabilities for visual materials, with deep learning-based embeddings showing promise for analysing visual heritage. Given that many books feature visuals in addition to text, taking advantage of these breakthroughs is critical to making library collections open and accessible. In this work, we present a proof-of-concept image search application for exploring images in the National Library of Norway's pre-1900 books, comparing Vision Transformer (ViT), Contrastive Language-Image Pre-training (CLIP), and Sigmoid loss for Language-Image Pre-training (SigLIP) embeddings for image retrieval and classification. Our results show that the application performs well for exact image retrieval, with SigLIP embeddings slightly outperforming CLIP and ViT in both retrieval and classification tasks. Additionally, SigLIP-based image classification can aid in cleaning image datasets from a digitisation pipeline.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With the goal of preserving and disseminating Norwegian cultural heritage, the National Library of Norway (NLN) began digitising its collection in 2006. This collection, acquired per the Norwegian Legal Deposit Act 1 , spans various materials, including books, newspapers, journals, posters, radio, movies and more <ref type="bibr" target="#b3">[4]</ref>. Almost all books and most newspapers have already been digitised, barring a few exceptions, and the current focus is on processing newspapers, journals, and non-text-based media <ref type="bibr" target="#b3">[4]</ref>. However, digitisation alone is insufficient to make cultural heritage available; it is also necessary to ensure that the digitised content is easy to view and that access is not overly restricted. Thus, the Bokhylla agreement grants regulated access <ref type="bibr" target="#b10">[11]</ref>, and the online library Nettbiblioteket lets users view collections with an International Image Interoperability Framework (IIIF) <ref type="bibr" target="#b22">[23]</ref>-based viewer and perform full-text searches using Elasticsearch. Finally, NLN offers limited access to the textual content through NB DH-LAB <ref type="bibr" target="#b3">[4]</ref> and corresponding webapps<ref type="foot" target="#foot_0">2</ref>, which provide tools based on text aggregates (e.g. n-grams, collocations and concordances) to facilitate automated and reproducible analysis of the text.</p><p>Currently, these tools have largely been based on text extracted from Analysed Layout and Text Object-Extensible Markup Language (ALTO-XML) files<ref type="foot" target="#foot_1">3</ref> generated by optical character recognition (OCR) models during digitisation <ref type="bibr" target="#b4">[5]</ref>. However, the output XML also contains coordinates for graphical elements. These graphical elements represent non-textual elements in the books, e.g. 
illustrations or decorations. While such elements are an important part of the books, they have been cumbersome to explore, requiring manual inspection. Therefore, an essential missing step for making NLN's digitised collection more accessible is making these graphical elements easier to explore and analyse.</p><p>An approach to making such elements explorable is creating tools for image search, either in the form of exact image retrieval (i.e. recovering a specific image), semantic image retrieval (i.e. recovering images with similar contents), or both. While text-based search engines are commonplace, image search is more complicated <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b27">28]</ref>. Early methods matched images using surrounding text <ref type="bibr" target="#b15">[16]</ref>, but this approach demands high-quality textual descriptions, which can be lacking. Alternatively, exact image retrieval traditionally relies on handcrafted image features for comparison <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b27">28]</ref>. Handcrafting such features can be challenging, and the resulting features typically form dense vectors, which can hinder efficient lookups.</p><p>However, recent technological advancements have simplified the implementation of image search engines. Various tools now implement efficient search indices for dense vectors, such as the hierarchical navigable small worlds (HNSW) index <ref type="bibr" target="#b11">[12]</ref>. Moreover, convolutional neural networks (CNNs) and vision transformers (ViTs) have alleviated the need for handcrafted image features in computer vision <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b6">7]</ref>. Furthermore, there has been an influx of multimodal models, like Contrastive Language-Image Pre-training (CLIP) <ref type="bibr" target="#b18">[19]</ref> and Sigmoid Loss for Language Image Pre-Training (SigLIP) <ref type="bibr" target="#b26">[27]</ref>. 
The recent advances in computer vision and the proliferation of advanced pre-trained computer vision models have empowered the development of new research and tools for exploring and analysing image-based data in the digital humanities <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b19">20]</ref>.</p><p>Previous work on machine learning-driven, computer vision-based image search tools for digital humanities mainly focuses on cleanly digitised materials such as collections of videos, photographs, lantern slides and medieval illuminations <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b16">17]</ref>. However, there is limited work applying such tools to images extracted from the output of automatic layout detection of scanned media, e.g. books and newspapers. Such image collections pose unique challenges. First, the magnitude of data is often larger than for collections of photographs. Second, such data can contain artefacts not found in cleanly digitised materials. For example, detected bounding boxes might be inaccurate, and false positives can occur, where the automatic layout detection mistakenly marks, e.g., tables or blank pages as graphical elements. Avoiding such artefacts can be infeasible, as redoing layout analysis for a collection of sizeable magnitude can be cost-prohibitive and is not guaranteed to succeed. Therefore, a natural next step is exploring machine learning-based image retrieval in the context of NLN's collection of scanned, automatically processed media. This short paper details ongoing work on these challenges, with three primary contributions:</p><p>1. Developing a proof-of-concept image search application for NLN's pre-1900 books. 2. 
Comparing modern image embeddings for image retrieval in NLN's digitised books. 3. Evaluating pre-trained models for fine-tuned classification of image categories.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and related work</head><p>Two traditional approaches for image retrieval are context-based full-text search (querying the images' textual context) and hashing-based approaches for exact image retrieval. The former typically works by using an inverted index to efficiently retrieve relevant images via e.g. term frequency-inverse document frequency (TF-IDF) weighting <ref type="bibr" target="#b23">[24]</ref>, before potentially reranking them based on image features <ref type="bibr" target="#b15">[16]</ref>. The hashing-based alternative works by computing a compact hash, or "fingerprint", that can be used for efficient exact image retrieval <ref type="bibr" target="#b5">[6]</ref>.</p><p>More recent image retrieval approaches compute image similarities using deep learning-based image classification models such as ViTs <ref type="bibr" target="#b6">[7]</ref> or CNNs <ref type="bibr" target="#b7">[8]</ref>. These models first transform an image into an embedding, which is used as input for a logistic regression model. The key insight in using these models for image retrieval is that we can compute image similarities by comparing the embeddings, e.g. with the cosine similarity.</p><p>However, by using classification models, we assume that embeddings learned by training on image-label combinations are informative enough to group images semantically, which can hinder generalisation to out-of-sample images <ref type="bibr" target="#b14">[15]</ref>. Another approach is multimodal models like CLIP and SigLIP. In short, these models work by combining an image transformer and a text transformer to compute image and text embeddings, aligning them to ensure strong cosine similarity for matching pairs. This approach has been successfully applied to e.g. 
image retrieval and zero-shot classification <ref type="bibr" target="#b18">[19]</ref>, and generalises better to out-of-sample images <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b14">15]</ref>.</p><p>During CLIP and SigLIP training, models receive shuffled image-caption pairs and compute probabilities for matches. Such training demands extensive data and computational resources. To circumvent this, it is common to use pre-trained models; the popularity of model repositories like Huggingface Hub <ref type="bibr" target="#b25">[26]</ref> and Torch Hub <ref type="bibr" target="#b0">[1]</ref> has made models trained on massive datasets accessible.</p><p>While methods for efficient sparse vector queries have existed for decades <ref type="bibr" target="#b9">[10]</ref>, querying based on image embeddings requires dense vector queries, which is still an active research topic. However, the recently proposed HNSW index for approximate nearest neighbour search <ref type="bibr" target="#b11">[12]</ref> has gained traction for its accuracy and efficiency. The index consists of a hierarchy of navigable small world graphs <ref type="bibr" target="#b12">[13]</ref>, each built from different data subsets, and querying consists of iteratively traversing the hierarchy, enabling efficient navigation through large datasets.</p><p>Applying modern computer vision to problems in digital humanities has recently gained traction. The term distant viewing is introduced in <ref type="bibr" target="#b1">[2]</ref>, which demonstrates how computer vision methods for clustering and object detection can be applied to image and video data. Building on this, <ref type="bibr" target="#b24">[25]</ref> shows how CNN-based semantic image retrieval can be used to explore trends in newspaper advertisements and illustrations extracted from Delpher, a digitised-materials search engine by the Dutch national library. 
Moreover, <ref type="bibr" target="#b16">[17]</ref> demonstrates how a combination of monomodal image and language models can be used to combine and enrich two manually annotated collections of medieval illuminations, and <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref> show how a CLIP model can be used to explore and label magic lantern slides efficiently, and that it can struggle with zero-shot classification of old illustrations. Using CLIP embeddings, <ref type="bibr" target="#b19">[20]</ref> clusters news videos and employs a graph-based approach for efficient exploration. Machine learning-driven image retrieval tools for libraries and museums, like Maken<ref type="foot" target="#foot_2">4</ref>, Bildsök<ref type="foot" target="#foot_3">5</ref> and Nasjonalmuseet Beta<ref type="foot" target="#foot_4">6</ref>, have also emerged. These previous works highlight computer vision's potential in digital humanities, and thus, evaluating and comparing such models in the context of NLN's digitised book collection is a relevant next step.</p></div>
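The embedding comparison described above reduces to cosine similarity between vectors. A minimal sketch follows; the vectors are random stand-ins for actual model outputs (768-dimensional, as for the ViT and SigLIP embeddings used later), not real image embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query = rng.normal(size=768)                 # stand-in for an image embedding
near = query + 0.1 * rng.normal(size=768)    # slightly perturbed copy
far = rng.normal(size=768)                   # unrelated embedding

# A perturbed copy scores higher than an unrelated vector.
assert cosine_similarity(query, near) > cosine_similarity(query, far)
```

Because cosine similarity ignores vector magnitude, embeddings are often stored L2-normalised so the comparison becomes a plain dot product.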
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Extracting images</head><p>To search the images, they must first be extracted from the digitised book collection. During NLN's digitisation, books are scanned and processed through a pipeline including layout detection and OCR, producing ALTO-XML files<ref type="foot" target="#foot_5">7</ref> named after Uniform Resource Names (URNs). These files contain page information, describing the page in terms of four block types: TextBlock, Illustration, GraphicalElement and CompositeBlock (blocks containing other blocks)<ref type="foot" target="#foot_6">8</ref>. In the ALTO-XML files parsed for this work, all illustrations and graphical elements are tagged as GraphicalElement. Parsing these files, we extracted the page URN, coordinates, and size for each graphical element, in addition to the textual context of each image in the digitised books. For this work, we processed pre-1900 books, creating a sufficiently large, yet manageable subset for testing.</p><p>For each graphical element, we used NLN's IIIF API<ref type="foot" target="#foot_7">9</ref> to download images from URLs following the format in Table <ref type="table">1</ref>, discarding images with aspect-ratio ≥ 50. By integrating ALTO-XML files with the IIIF endpoint (both technologies already utilised by NLN), we obtained images from digitised Norwegian books before 1900.</p></div>
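Since Table 1's concrete URL format is not reproduced here, the sketch below assumes the generic IIIF Image API 2.0 pattern ({base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}); the base URL, the example URN, and the function names are hypothetical illustrations, not NLN's actual endpoint:

```python
# Hypothetical IIIF base URL; the real identifier would be a page URN
# parsed from the ALTO-XML files.
BASE = "https://example.org/iiif"

def iiif_region_url(page_urn: str, x: int, y: int, w: int, h: int) -> str:
    """Build an IIIF Image API 2.0 region URL for one graphical element."""
    # {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    return f"{BASE}/{page_urn}/{x},{y},{w},{h}/full/0/default.jpg"

def keep(w: int, h: int, max_ratio: float = 50.0) -> bool:
    """Discard extreme slivers, mirroring the aspect-ratio >= 50 filter."""
    return max(w, h) / min(w, h) < max_ratio

url = iiif_region_url("URN:NBN:no-nb_example_0001", 120, 340, 800, 600)
assert keep(800, 600) and not keep(5000, 10)
```

Downloading each URL (e.g. with `requests`) and checking `keep` on the block's width and height would then yield the filtered image set.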
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Creating the vector search application</head><p>We computed image embeddings using Huggingface Transformers <ref type="bibr" target="#b25">[26]</ref> with three models: ViT (google/vit-base-patch16-224<ref type="foot" target="#foot_8">10</ref>), CLIP (openai/clip-vit-base-patch32<ref type="foot" target="#foot_9">11</ref>) and SigLIP (google/siglip-base-patch16-256-multilingual<ref type="foot" target="#foot_10">12</ref>). Each pre-trained model's preprocessing pipeline involved resizing images to the model's input shape (224 for ViT and CLIP, and 256 for SigLIP) and scaling the pixel values. For ViT and SigLIP, images were resized to 224 × 224 and 256 × 256 pixels respectively, altering the aspect ratio. For CLIP, the smallest dimension was resized to 224, preserving the aspect ratio, before center-cropping to 224 × 224 pixels. Next, we used the corresponding image transformer and obtained embeddings of sizes 768 (ViT and SigLIP) and 512 (CLIP).</p><p>After computing embeddings, we ingested them into a Qdrant database and used FastAPI to create an application programming interface (API) for efficient querying by images, embedding vectors, image IDs, or context-based text search. Qdrant supports fast K-nearest neighbour search for both dense and sparse vectors. For image-based queries, we used a cosine similarity-based HNSW index, and for context-based full-text queries, we used a dot-product-based inverted index for TF-IDF (details in supplement on GitHub <ref type="foot" target="#foot_11">13</ref>). We used default parameters for all search indices. The vector database and the API are hosted on-premise, exposing only the API to the Internet. The application also includes a frontend, implemented using Flask and HTMX, hosted using Google Cloud Run with 512 MiB RAM and one vCPU.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>The IIIF URL format (columns: Description, Example).</p></div>
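Qdrant's HNSW index returns approximate cosine-similarity neighbours. As a minimal stand-in, the brute-force NumPy sketch below shows the exact top-K semantics that the index approximates; the function name and the random stand-in embeddings are illustrative only:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, index: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact cosine top-K over a matrix of embeddings (one row per image).

    A brute-force stand-in for the approximate HNSW search a vector
    database performs; results match up to the approximation error.
    """
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = index_n @ query_n          # cosine similarity to every image
    return np.argsort(-scores)[:k]      # indices of the k best matches

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(1000, 768))   # stand-in for SigLIP vectors
hits = top_k_cosine(embeddings[123], embeddings, k=5)
assert hits[0] == 123  # an image is always its own nearest neighbour
```

An HNSW index avoids the O(N) scan above by greedily traversing a layered proximity graph, which is why it scales to collections far larger than this toy matrix.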
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Classifying based on embedding vectors</head><p>As the graphical elements stem from NLN's digitisation process, many segmentation anomalies are also tagged as graphical elements. Common examples are blank pages, parts of tables, and text. To estimate the fraction of such regions, we used HumanSignal Label Studio and manually labelled a dataset containing 2000 images as either Blank page, Segmentation anomaly, Illustration or photograph, Musical notation, Map, Mathematical chart or Graphical element (e.g. initial, decorative border, etc.).</p><p>After labelling the data, we fitted regularised logistic regression models (using scikit-learn v1.5.0 <ref type="bibr" target="#b17">[18]</ref>) to classify images based on their embedding vectors. This can be interpreted as a form of transfer learning, fine-tuning the last layer of the transformer model. The embedding vector type (i.e. ViT, CLIP or SigLIP) and the complexity parameter (inverse ridge parameter) were selected using nested cross-validation with 20 outer folds and ten inner folds. Models were selected based on a micro-averaged F1-score (the harmonic mean of micro-averaged precision and sensitivity). We selected the complexity parameter from ten logarithmically spaced values between 10⁻⁴ and 10⁴. Finally, we computed the confusion matrix in the outer cross-validation loop (the evaluation loop). The supplement describes the overall cross-validation algorithm in Algorithms 1 and 2.</p></div>
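A reduced sketch of such nested cross-validation, on synthetic stand-in data rather than the actual embeddings and labels, might look as follows; fold counts are shrunk from 20/10 for brevity, with scikit-learn's GridSearchCV playing the inner loop and cross_val_score the outer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for (embedding vector, manual label) pairs.
X, y = make_classification(n_samples=300, n_features=64, n_informative=16,
                           n_classes=3, random_state=0)

# Inner loop: select the complexity parameter C on a log-spaced grid.
inner = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": np.logspace(-4, 4, 10)},
    scoring="f1_micro",
    cv=3,  # ten inner folds in the paper
)

# Outer loop: unbiased performance estimate of the whole selection procedure.
outer_scores = cross_val_score(inner, X, y, scoring="f1_micro", cv=5)  # 20 folds in the paper
assert 0.0 <= outer_scores.mean() <= 1.0
```

The outer folds never touch the data used for parameter selection, which is what makes the reported F1 score an honest estimate rather than an optimistic one.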
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Evaluating searches</head><p>To evaluate the search, we first manually inspected some example queries before performing a systematic evaluation of exact image retrieval. To simulate exact image retrieval scenarios, we selected the 684 images labelled as Illustration or photograph, Map or Mathematical chart as target images, and applied random cropping (≤ 15 %, independently on all sides), rotation (up to ±10°) and scaling (up to ±20 %, independently for width and height). Then, querying the database with these transformed images, we evaluated the Top-𝑁 accuracy, measuring whether our application retrieved the target image as the first result (Top 1), in the first row of results (Top 5), in the first two rows (Top 10), or among the results at all (Top 50).</p></div>
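The Top-𝑁 accuracy used here can be computed as below; the function name, image IDs and ranked result lists are toy stand-ins for the 684 transformed queries and their search results:

```python
def top_n_accuracy(ranked_results: list[list[str]], targets: list[str], n: int) -> float:
    """Fraction of queries whose target appears among the first n results."""
    hits = sum(target in ranked[:n] for ranked, target in zip(ranked_results, targets))
    return hits / len(targets)

# Toy example: three transformed query images; each target ID is the
# untransformed original that the search should recover.
ranked = [["img1", "img7", "img3"],
          ["img9", "img2", "img4"],
          ["img8", "img5", "img6"]]
targets = ["img1", "img2", "img6"]

assert top_n_accuracy(ranked, targets, 1) == 1/3   # only the first query hits at rank 1
assert top_n_accuracy(ranked, targets, 3) == 1.0   # all targets appear within the top 3
```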
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>Figure <ref type="figure">1</ref> shows screenshots from the application <ref type="foot" target="#foot_12">14</ref> for image searches using full-text (Fig. <ref type="figure">1a</ref>) or image similarity (Figs. <ref type="figure">1b to 1d</ref>). Table <ref type="table">2</ref> shows image-based query results with four different images. For the first row, the query exists in the collection, and all models recover it as the top result. Similarly, for the second row, all models return nautical results, though CLIP is the only model that does not return illustrations with lighthouses. Finally, the third and fourth rows show examples of querying with images outside of the collection, where we see that the returned images are content-wise similar. The fourth row demonstrates an example where the CLIP embedding vectors fail, leading to irrelevant results. Furthermore, the exact image retrieval experiments demonstrate that our application can recover queried transformed images. As demonstrated in Table <ref type="table">3</ref>, SigLIP performed slightly better than ViT and CLIP and retrieved 94 % of the target images in the first two rows of the search and 97 % in all ten displayed rows. See GitHub for code and details.</p><p>The manual image labelling <ref type="foot" target="#foot_13">15</ref> showed that 349/2000 (17 %) of the graphical elements were blank pages and 524/2000 (26 %) were segmentation anomalies (e.g. tables, text, etc.); for the complete label distribution, see Fig. <ref type="figure">2</ref>. Moreover, the logistic regression model performs well, obtaining a cross-validated F1 score of 96 % (𝜎 = 5.1 %). From the cross-validated confusion matrix, we see that only 66/1127 (&lt; 6 %) of all graphical elements were incorrectly classified as either blank pages or segmentation errors, with a marked share of the misclassifications coming from the "Graphical element" class. 
We also observed that the SigLIP embeddings were selected in all 20 outer cross-validation folds, indicating their superiority for this classification task compared to ViT and CLIP. Fig. <ref type="figure">2</ref> also shows the estimated class distribution on the full dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion and conclusion</head><p>These promising results demonstrate that pre-trained computer vision models provide meaningful embeddings. This is notable as our data consists of pre-1900 book images and differs vastly from the training sets of such models, which are typically scraped from the internet. Furthermore, the results indicate that SigLIP embeddings slightly outperform CLIP and ViT for all tasks (even for image classification, which ViT was trained for), in line with prior results showing that multimodal models are more robust to out-of-sample data <ref type="bibr" target="#b14">[15]</ref>.</p><p>While all models perform well for retrieval, CLIP sometimes struggled, particularly if the object of interest was off-centre. In such cases, the object is cropped out during preprocessing, and matches are based on the remaining image. Furthermore, the application performs well for exact image retrieval, even with up to 30 % cropping in both directions and up to ±10° rotation. These results are promising, but more work is still needed to evaluate performance for other degradations (e.g. simulated print and scanning artefacts). Finally, the encouraging image classification results indicate the advantages of adding this methodology to the data ingestion pipeline. Filtering out irrelevant elements can save up to 40 % of storage and improve the search results.</p><p>In conclusion, we found that by combining the tagged graphical elements from the book digitisation process, NLN's IIIF endpoint and recent advances in artificial intelligence, we can create an efficient image search application that facilitates exploring the library's collection in a new way.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future work</head><p>As the current prototype image-search app only supports pre-1900 books, a natural extension is including illustration objects from all of NLN's digitised books and newspapers. Moreover, as one use case we consider is exact image retrieval, an obvious next step is a more thorough analysis of the application's accuracy on this task, e.g. using additional evaluation measurements for recall, and including domain-specific degradation (e.g. simulated halftone and scanning artefacts). Another avenue for future work is comparing deep learning-based similarity measures with simpler, less computation- and storage-intensive approaches like hashing-based methods. Additionally, we want to make the software more adaptable, ultimately creating open-source infrastructure to further these methods' accessibility for other ALTO-XML and IIIF collections.</p><p>Future work should explore the embeddings further, e.g. using CLIP and SigLIP for text-based image retrieval. Additionally, performance could improve by fine-tuning the embeddings on domain-relevant data. Moreover, we have so far only used the embeddings for image retrieval and classification. Using the embeddings as a base to discover clusters, automatically tag images, or create image descriptions is, therefore, an interesting potential next step. Another important direction is digging deeper into what the models consider "similar" through visualisations and empirical experiments. Finally, because deep learning-based embeddings are trained on datasets with known biases <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b13">14]</ref>, examining biases in these embeddings is crucial.
</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 and Figure 2</head><label>1, 2</label><figDesc>Figure 1: Screenshots of the image search application: context-based search for "kat" (old Norwegian for cat) (a) and image-based query with a user-uploaded cat image (c). (b) and (d) show the results when selecting an image in (a) and (c), respectively. The app also has a collapsible sidebar (not shown) that we used for selecting SigLIP embedding vectors.</figDesc><graphic coords="11,77.52,418.69,233.55,197.59" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 4</head><label>4</label><figDesc>Confusion matrix for the classification based on the outer cross-validation loop validation sets; it shows the number of elements with label 𝑎 (columns) classified as label 𝑏 (rows). A perfect classifier will only have nonzero entries on the diagonal.</figDesc><table><row><cell>Predicted class \ True class</cell><cell>Segmentation anomaly</cell><cell>Blank page</cell><cell>Graphical element</cell><cell>Illustration or photograph</cell><cell>Musical notation</cell><cell>Map</cell><cell>Mathematical chart</cell></row><row><cell>Segmentation anomaly</cell><cell>496</cell><cell>5</cell><cell>28</cell><cell>8</cell><cell>2</cell><cell>1</cell><cell>2</cell></row><row><cell>Blank page</cell><cell>11</cell><cell>339</cell><cell>8</cell><cell>1</cell><cell>0</cell><cell>0</cell><cell>0</cell></row><row><cell>Graphical element</cell><cell>14</cell><cell>2</cell><cell>278</cell><cell>15</cell><cell>1</cell><cell>0</cell><cell>2</cell></row><row><cell>Illustration or photograph</cell><cell>1</cell><cell>3</cell><cell>16</cell><cell>558</cell><cell>1</cell><cell>2</cell><cell>5</cell></row><row><cell>Musical notation</cell><cell>1</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>109</cell><cell>0</cell><cell>0</cell></row><row><cell>Map</cell><cell>1</cell><cell>0</cell><cell>0</cell><cell>2</cell><cell>0</cell><cell>41</cell><cell>0</cell></row><row><cell>Mathematical chart</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>8</cell><cell>0</cell><cell>0</cell><cell>39</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://www.nb.no/dh-lab/apper/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://www.loc.gov/standards/alto/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://www.nb.no/maken/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://lab.kb.se/bildsok/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://beta.nasjonalmuseet.no/collection/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://digitalpreservation-blog.nb.no/docs/formats/preferred-formats-en/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">https://www.loc.gov/standards/alto/techcenter/layout.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_7">https://iiif.io/api/image/2.0/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_8">Commit hash: 3f49326eb077187dfe1c2a2bb15fbd74e6ab91e3</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_9">Commit hash: 3d74acf9a28c67741b2f4f2ea7635f0aaf6f0268</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_10">Commit hash: a66c5982c8c396206b96060e2bf837d6731a326f</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_11">https://github.com/Sprakbanken/CHR24-image-retrieval</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_12">https://dh.nb.no/run/bildesok/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_13">The labels and analysis code are available on GitHub</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ansel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Gimelshein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Voznesensky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Berard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Burovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chauhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chourdia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Constable</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Desmaison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>DeVito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ellison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gschwind</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hirsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kalambarkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kirsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lazos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lezcano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Luk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Maher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Puhrsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Reso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Saroufim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Y</forename><surname>Siraichi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Suk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Suo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tillet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<idno type="DOI">10.1145/3620665.3640366</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems</title>
				<meeting>the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems<address><addrLine>La Jolla, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="929" to="947" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Distant Viewing: Analyzing Large Visual Corpora</title>
		<author>
			<persName><forename type="first">T</forename><surname>Arnold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tilton</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqz013</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="3" to="16" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Birhane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">U</forename><surname>Prabhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kahembwe</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2110.01963" />
		<title level="m">Multimodal datasets: misogyny, pornography, and malignant stereotypes</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">NB DH-LAB: A Corpus Infrastructure for Social Sciences and Humanities Computing</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Birkenes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Johnsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kåsen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLARIN Annual Conference Proceedings 2023</title>
				<meeting><address><addrLine>Leuven, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="30" to="34" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">From Digital Library to N-Grams: NB N-gram</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Birkenes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">G</forename><surname>Johnsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Lindstad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ostad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th Nordic Conference of Computational Linguistics</title>
				<meeting>the 20th Nordic Conference of Computational Linguistics<address><addrLine>Vilnius, Lithuania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="293" to="295" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">State of the Art: Image Hashing</title>
		<author>
			<persName><forename type="first">R</forename><surname>Biswas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Blanco-Medina</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2108.11794" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dosovitskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Beyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolesnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Weissenborn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Unterthiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dehghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Minderer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Heigold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Houlsby</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2010.11929</idno>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<meeting><address><addrLine>Vienna, Austria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Deep Residual Learning for Image Recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition<address><addrLine>Las Vegas, NV, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale</title>
		<author>
			<persName><forename type="first">K</forename><surname>Hosseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C S</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Beelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>McDonough</surname></persName>
		</author>
		<idno type="DOI">10.1145/3557919.3565812</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities</title>
				<meeting>the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities<address><addrLine>Seattle, WA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="8" to="19" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">The Art of Computer Programming, Vol. 3: Sorting and Searching</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Knuth</surname></persName>
		</author>
				<meeting><address><addrLine>Reading, MA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Addison-Wesley</publisher>
			<date type="published" when="1997">1997</date>
			<biblScope unit="volume">3</biblScope>
		</imprint>
	</monogr>
	<note>2nd ed.</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><surname>Kopinor</surname></persName>
		</author>
		<ptr target="https://www.kopinor.no/avtaletekster/bokhylla-avtalen-fra-2024" />
		<title level="m">Bokhylla-avtalen</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">A</forename><surname>Malkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Yashunin</surname></persName>
		</author>
		<idno type="DOI">10.1109/tpami.2018.2889473</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="824" to="836" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Approximate Nearest Neighbor Algorithm Based on Navigable Small World Graphs</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Malkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ponomarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Logvinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Krylov</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.is.2013.10.006</idno>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="61" to="68" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Multimodal bias: Assessing gender bias in computer vision models with NLP techniques</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mandal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Little</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Leavy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Multimodal Interaction (ICMI &apos;23)</title>
				<meeting>the 25th International Conference on Multimodal Interaction (ICMI &apos;23)<address><addrLine>Paris, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="416" to="424" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">How Hard Are Computer Vision Datasets? Calibrating Dataset Difficulty to Viewing Time</title>
		<author>
			<persName><forename type="first">D</forename><surname>Mayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cummings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gutfreund</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Katz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barbu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th International Conference on Neural Information Processing Systems</title>
				<meeting>the 37th International Conference on Neural Information Processing Systems<address><addrLine>New Orleans, LA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="11008" to="11036" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Multimedia Search Reranking: A Literature Survey</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Rui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
		<idno type="DOI">10.1145/2536798</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">38</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Is Medieval Distant Viewing Possible?: Extending and Enriching Annotation of Legacy Image Collections Using Visual Analytics</title>
		<author>
			<persName><forename type="first">C</forename><surname>Meinecke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Guéville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Wrisley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jänicke</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqae020</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="638" to="656" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Scikit-Learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">É</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Learning Transferable Visual Models From Natural Language Supervision</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hallacy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Goh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th International Conference on Machine Learning</title>
				<meeting>the 38th International Conference on Machine Learning<address><addrLine>Online</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="8748" to="8763" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">From Clusters to Graphs - Toward a Scalable Viewing of News Videos</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ruth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Burghardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Liebl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Humanities Research Conference</title>
				<imprint>
			<date type="published" when="2023">2023. CHR2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Towards Multimodal Computational Humanities. Using CLIP to Analyze Late-Nineteenth Century Magic Lantern Slides</title>
		<author>
			<persName><forename type="first">T</forename><surname>Smits</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Humanities Research Conference</title>
				<imprint>
			<date type="published" when="2021">2021. CHR2021</date>
			<biblScope unit="page" from="149" to="158" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A Multimodal Turn in Digital Humanities. Using Contrastive Machine Learning Models to Explore, Enrich, and Analyze Digital Visual Historical Collections</title>
		<author>
			<persName><forename type="first">T</forename><surname>Smits</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wevers</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqad008</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1267" to="1280" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The International Image Interoperability Framework (IIIF): A Community &amp; Technology Approach for Web-Based Images</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Snydman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sanderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Cramer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Archiving Conference</title>
				<meeting><address><addrLine>Los Angeles, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="16" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A Statistical Interpretation of Term Specificity and Its Application in Retrieval</title>
		<author>
			<persName><forename type="first">K</forename><surname>Spärck Jones</surname></persName>
		</author>
		<idno type="DOI">10.1108/eb026526</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Documentation</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="11" to="21" />
			<date type="published" when="1972">1972</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">The Visual Digital Turn: Using Neural Networks to Study Historical Images</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wevers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Smits</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqy085</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="194" to="207" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-Art Natural Language Processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Von Platen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jernite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Le</forename><surname>Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Drame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Lhoest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rush</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-demos.6</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Sigmoid Loss for Language Image Pre-Training</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mustafa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kolesnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Beyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision</title>
				<meeting>the 2023 IEEE/CVF International Conference on Computer Vision<address><addrLine>Paris, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="11975" to="11986" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Recent Advance in Content-based Image Retrieval: A Literature Survey</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1706.06064" />
		<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
