<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Contextual 𝑘NN Ensemble Retrieval Approach for Semantic Postal Address Matching</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">El</forename><forename type="middle">Moundir</forename><surname>Faraoun</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">LIASD Paris 8 University</orgName>
								<address>
									<addrLine>2 rue de la liberté</addrLine>
									<settlement>Saint-Denis</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">TEDIES</orgName>
								<orgName type="institution">TALK solutions</orgName>
								<address>
									<addrLine>45 Av. de Paris</addrLine>
									<settlement>Monéteau</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nédra</forename><surname>Mellouli</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">LIASD Paris 8 University</orgName>
								<address>
									<addrLine>2 rue de la liberté</addrLine>
									<settlement>Saint-Denis</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="laboratory" key="lab1">ESILV DVRC</orgName>
								<orgName type="laboratory" key="lab2">Léonard de Vinci group</orgName>
								<address>
									<addrLine>12 Av. Léonard de Vinci</addrLine>
									<settlement>Paris La défense</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stéphane</forename><surname>Millot</surname></persName>
							<affiliation key="aff1">
								<orgName type="laboratory">TEDIES</orgName>
								<orgName type="institution">TALK solutions</orgName>
								<address>
									<addrLine>45 Av. de Paris</addrLine>
									<settlement>Monéteau</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Myriam</forename><surname>Lamolle</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">LIASD Paris 8 University</orgName>
								<address>
									<addrLine>2 rue de la liberté</addrLine>
									<settlement>Saint-Denis</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Contextual 𝑘NN Ensemble Retrieval Approach for Semantic Postal Address Matching</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">3EFD204124CD1A9845934DEAE60FFE38</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Address matching or transport entity alignment</term>
					<term>Recipients/consignees identification or pairing</term>
					<term>Recovery of recipients</term>
					<term>Address retrieval</term>
					<term>Ensemble 𝑘NN retrieval models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The biggest challenge today regarding courier services (delivery of small to medium-sized parcels) is the problem of Address Matching. With the expansion of geographical data and the diversity of formats in which it is received, traditional matching methods are becoming increasingly obsolete due to the lack of conformity of delivery information with postal address writing standards. These new constraints affect parcel delivery quality in terms of deliverables, cost and environmental impact. This research focuses on courier delivery data (i.e. recipients' postal addresses) in the context of matching French postal addresses. We introduce a new ensemble retrieval approach to the problem through a voting system leveraging multiple k-Nearest Neighbors search algorithms, called 𝑘NN-vote, which effectively transforms the Address Matching task into an Address Retrieval task. 𝑘NN-vote returns the top normalized addresses most similar to a given query (a non-normalized delivery address). The system takes advantage of several address representations, in particular pre-trained Transformer-based sentence embeddings. It has been tested on a real database of French delivery addresses. The method meets high expectations, returning exactly matched addresses with a success rate of up to 96% in the top 10 and 86% in the top 1.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The transport Entity Alignment problem, also known as the postal Address Matching (AM) problem, is inherently an NLP task, given that a postal address is mainly structured as a short sentence with a specific arrangement of Named Entities (i.e. attributes or features such as Road Name or Door Number), which places it within the scope of Entity Matching (EM). The task involves effectively processing and comparing the structural components of a pair of addresses (𝑎, 𝑏) for accurate matching (i.e. determining whether 𝑎 and 𝑏 refer to the same real-world object).</p><p>Carriers identify delivery addresses received via EDI (Electronic Data Interchange) by matching them with recipient addresses already registered in their database. Nothing could be simpler at first glance, except that delivery addresses are increasingly received in non-normalized forms. The addresses received are often incorrect and/or noisy, so distinguishing a valid address from an invalid one becomes very challenging. The anomalies present in a delivery address can be: (1) writing errors, including typographic ones, spelling mistakes, repetitions, or the absence of specific address features; (2) address noise, which may involve personal information such as names, phone numbers, or requests for appointments; (3) semantic or contextual errors, which include the presence of features from unrelated addresses, feature replacements (e.g. "avenue" instead of "street"), feature aliases such as abbreviations or acronyms, polysemous features, and addresses represented by their semantic synonyms, typically named zones or parks.</p><p>IAL@ECML-PKDD'24: 8th Intl. Worksh. &amp; Tutorial on Interactive Adaptive Learning, Sep. 9th, 2024, Vilnius, Lithuania el.moundir.faraoun@gmail.com (E. M. Faraoun); n.mellouli@iut.univ-paris8.fr (N. 
Mellouli); Stephane.millot@edies.fr (S. Millot); m.lamolle@iut.univ-paris8.fr (M. Lamolle) Take, for example, the following real delivery address received by a French carrier: "avenue du g n ral leclerc centre commercial auchan 89200 avallon". Here the correct Road Type is "rue" instead of "avenue", and the typographic error "g n ral" is intended as "general". The address also lacks a Door Number. Finally, we note that "centre commercial auchan" is a semantic synonym for the address. Such anomalies distort the structure of an address and prevent it from being paired with a valid address record.</p><p>The AM problem is traditionally solved with a binary "Match/No Match" classification of address pairs <ref type="bibr" target="#b0">[1]</ref> mainly relying on neural network-based methods <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b0">1,</ref><ref type="bibr" target="#b5">6</ref>]; yet, the task is typically framed as matching address records between two tables or deduplicating records within a single table. In the delivery context, however, this matching amounts to searching for information similar to a given request (the received address). Thus, we are dealing with an unsupervised Information Retrieval (IR) problem in which each new address is treated as a query, which may be valid or incorrectly formatted, and for which we try to find valid "candidate" addresses in the database. This formalization is highly relevant since it allows the candidates retrieved for a delivery address to be ranked by contextual similarity. 
Furthermore, the number of "candidate" addresses is relatively small, which reduces computation time compared to aligning all reference address records.</p><p>Our objective in this research is to take advantage of the various possible representations of addresses, in particular Transformer-based sentence embeddings, in the context of Information Retrieval. We propose an ensemble multi-embedding-model approach based on the 𝑘-Nearest Neighbors algorithm (𝑘NN) <ref type="bibr" target="#b6">[7]</ref>, with a voting process between multiple 𝑘NN search models.</p><p>The remainder of this paper is organized into 7 sections. Section 2 reviews the work carried out in relation to address matching. Section 3 formalizes the Address Retrieval problem. We describe our approach in Section 4 and present its experimental settings in Section 5. Results are detailed and discussed in Section 6. We conclude this work by considering its limitations and prospects for improvement in Section 7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work and State-of-the-Art</head><p>The existing solutions for Address Matching fall into two approaches. The first is based on string similarity measures or matching rules <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. However, these methods rely mainly on structural comparisons between addresses, and they quickly become obsolete when faced with addresses that are written differently but retain the same semantic meaning <ref type="bibr" target="#b3">[4]</ref>. In fact, textual similarity distances such as Levenshtein and others <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14]</ref> are used for address matching. These distances depend on the choice of a similarity threshold, which is generally high. Such a score remains very approximate and rules out matching pairs that fall below the chosen threshold. Other methods are based on decision tree matching rules <ref type="bibr" target="#b8">[9]</ref>. These methods improve matching performance but require systematic calibration of the rules by experts, due to the diversity of address writing models.</p><p>A second approach, based on machine learning (ML) or deep learning (DL) architectures, aims to learn the semantic similarity between addresses <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>. These methods mainly rely on vector representations of address elements such as Word2Vec <ref type="bibr" target="#b14">[15]</ref> or FastText <ref type="bibr" target="#b15">[16]</ref>, used as input to ML models (e.g. Random Forest, XGBoost) or DL inference models (e.g. 
ESIM <ref type="bibr" target="#b3">[4]</ref>, ABLC <ref type="bibr" target="#b4">[5]</ref>) for classification. However, obtaining those word embeddings requires a parsing step, i.e. the process of segmenting addresses into their essential features or elements (e.g. Road Number, Road Name or Postal Code). Various parsing techniques have been used for this task: the aforementioned studies used, respectively, CRFs <ref type="bibr" target="#b16">[17]</ref>, heuristic rules <ref type="bibr" target="#b2">[3]</ref>, the Jieba<ref type="foot" target="#foot_0">2</ref> algorithm and the Trie syntax tree algorithm. Nonetheless, these methods often fall short in properly parsing noisy, erroneous addresses. Furthermore, the lack of context between words in an address, due to the static nature of word embeddings, suggests that these methods may fail to match certain ambiguous addresses, such as synonymous or polysemous ones <ref type="bibr" target="#b0">[1]</ref>, and addresses that are too distorted by noise and errors.</p><p>Recently, the advent of pre-trained transformer encoders <ref type="bibr" target="#b17">[18]</ref>, like Roberta <ref type="bibr" target="#b18">[19]</ref>, has transformed various tasks by introducing hyper-contextualized word embeddings <ref type="bibr" target="#b0">[1]</ref>. This breakthrough has enabled state-of-the-art performance through fine-tuning these encoders for specific tasks, particularly in Entity Matching <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref>. In the context of Address Matching, a model named GeoRoberta <ref type="bibr" target="#b0">[1]</ref> has been proposed. It generates geographical knowledge for addresses by fine-tuning a Roberta encoder for the task of address feature tag detection. 
It also obtains a textual encoding of the GoogleMaps API<ref type="foot" target="#foot_1">3</ref> geographical coordinates of addresses based on Geohash<ref type="foot" target="#foot_2">4</ref>. It is worth noting that GeoRoberta is based on a pre-trained Roberta encoder as well. It generates augmented contextualized embeddings for an address pair by combining, at input, elements of both addresses and their Geohash encodings. The output embeddings are afterwards fused with a second augmented pair of addresses, combining the feature tag embeddings and their Geohash tag embeddings. This final fused representation is fed into a matching classification layer for the address matching task. The approach integrates textual and geographical data, leveraging the power of pre-trained transformers, which allows polysemous and synonymous addresses to be matched more efficiently. However, the generation of Geohash coordinates is based on Google geocoding, which is likely to be wrong for certain ambiguous or excessively erroneous addresses.</p><p>We argue that the use of sentence embeddings to represent addresses in the context of similar information retrieval is much better suited in terms of representation quality <ref type="bibr" target="#b5">[6]</ref>. This type of representation relies on training Transformer-based Bi-Encoders <ref type="bibr" target="#b21">[22]</ref> for the Semantic Textual Similarity (STS) task. It succeeds in reducing the distance between two addresses in a latent space even when they are expressed differently. Moreover, it solves the problem of synonymous addresses and allows the resolution of Address Matching through Information Retrieval algorithms <ref type="bibr" target="#b6">[7]</ref>. 
Such a solution was introduced in <ref type="bibr" target="#b5">[6]</ref> by fine-tuning a DistilBert <ref type="bibr" target="#b22">[23]</ref> Bi-Encoder on address pairs and using it to retrieve the top "candidate" addresses for a query, after which a Cross-Encoder fine-tuned for address pair classification is used as a re-ranker of the top candidates. To take this idea further, we propose several types of representations, both vector (sentence and word embeddings) and raw (the textual address content). These give rise to several lists of 𝑘 normalized candidate addresses via the ensemble 𝑘NN algorithms, which we finally re-rank through a vote based on the maximum number of appearances of a given candidate (i.e. term frequency) among the ensemble 𝑘NN models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Address Retrieval formalization</head><p>In this section, we introduce the address structure and define its schema, allowing us to formalize the Address Retrieval (AR) problem. We focus on French reference addresses and consider only French address features. Therefore, the correct structure of any address is the one that follows the official representation model<ref type="foot" target="#foot_3">5</ref> of French postal addresses, namely any address that contains the basic features of that model, which make it possible to precisely identify the geographical point of the recipient. The features of a correct French address are described in Fig. <ref type="figure" target="#fig_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Address model</head><p>Address structure definition: Let 𝑉 be a vocabulary set, which includes all permissible instances of the possible features of a given address. For example, "avenue" might be an instance of the feature RoadType. We define 𝐷 𝑛𝑜𝑟𝑚 as the set of all correctly structured, normalized address sentences. A normalized address, in this case, follows the official model of a French address.</p><p>Within 𝐷 𝑛𝑜𝑟𝑚 , there exists a reference set 𝐷 𝑟𝑒𝑓 such that, ∀ 𝑎 ∈ 𝐷 𝑟𝑒𝑓 , 𝑎 is both normalized and corresponds to an actual real-world location. Thus 𝐷 𝑟𝑒𝑓 is the set of all normalized valid addresses with a real geographical point. </p><formula xml:id="formula_0">• 𝑥 1 is an instance of DoorNumber, • 𝑥 𝑛 is an instance of CityName,</formula><p>• ⪯ is a partial order relation defined on 𝑉 , denoted (𝑉, ⪯), such that for any</p><formula xml:id="formula_1">1 ≤ 𝑖 &lt; 𝑗 ≤ 𝑛, ∃ (𝑥 𝑖 , 𝑥 𝑗 ) ∈ 𝑉 × 𝑉 , and 𝑥 𝑖 ⪯ 𝑥 𝑗 , • 𝑓 ⪯ (𝑥 1 , ..., 𝑥 𝑛 ) ↦ → 𝑎 ∈ 𝐷 𝑛𝑜𝑟𝑚 such that 𝑎 = 𝑥 1 𝑥 2 ...𝑥 𝑛 .</formula><p>Within this formalization framework, 𝑓 ⪯ (•) can be viewed as a grammar allowing us to generate address sentences that are syntactically and semantically correct. Moreover, if 𝑎 ∈ 𝐷 𝑟𝑒𝑓 , then 𝑎 is a normalized address with a real-world location.</p><p>This definition allows us to consider any address that follows the address model ℳ as normalized. That being said, an address can be normalized yet nonexistent. The following examples of French addresses illustrate this point:</p><p>• (i) "16 avenue jean jaures 89000 auxerre" is a normalized, existing address. • (ii) "16 rue jean jaures 89300 joigny" is a normalized but nonexistent address.</p><p>Although the second address is technically correct in its structure, a simple anomaly, such as the replacement of the RoadType, PostalCode and CityName instances, means it does not correspond to a real location. 
In such cases, the ensemble multi-embedding 𝑘NN models are valuable since they take the address's semantic context into account.</p></div>
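To make the grammar 𝑓⪯ concrete, here is a minimal Python sketch. The feature order used is a simplified, hypothetical subset (DoorNumber, RoadType, RoadName, PostalCode, CityName); the official French model defines the full feature list and its ordering.

```python
# Hypothetical, simplified feature order for the address model M.
# The official French specification defines more features and their order.
FEATURE_ORDER = ["DoorNumber", "RoadType", "RoadName", "PostalCode", "CityName"]

def f_order(features):
    """Grammar f_<=: concatenate feature instances following the order
    on V, yielding a normalized address sentence a = x1 x2 ... xn."""
    return " ".join(features[f] for f in FEATURE_ORDER if f in features)

# Example (i) from the text, as a feature dictionary:
f_order({"DoorNumber": "16", "RoadType": "avenue", "RoadName": "jean jaures",
         "PostalCode": "89000", "CityName": "auxerre"})
# -> "16 avenue jean jaures 89000 auxerre"
```

Note that, as in the text, a sentence generated this way is normalized by construction but need not exist in 𝐷 𝑟𝑒𝑓.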
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Address retrieval</head><p>Now that the address structure formalism is defined through the lexicographic order relation on the feature instances of an address, we assume in the remainder of this work that an address is simply a structured sentence with a particular context (a.k.a. an address sentence). We define the problem of Address Retrieval as a problem of semantic search over textual documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Address embedding definition:</head><p>Let 𝑎 be an address sentence. Given a textual encoder 𝐸, an address representation is defined as the output of 𝐸 with 𝑎 as input. We define 𝐸 0 as id(•) (i.e. the identity function) and, therefore, an address representation can be:</p><p>• raw (i.e. the textual content of the address itself) through 𝐸 0 • a vector embedding through a neural encoder 𝐸.</p><p>In the rest of the paper, for the sake of simplicity, we refer to the ensemble of raw and vector model embeddings as multi-embeddings models.</p><p>Contextual 𝑘NN Address Retrieval task: We want to obtain, for a given address query 𝑞 and through an encoder 𝐸, a query representation 𝑒 𝑞 ∈ 𝒳 𝐸 for 𝑘NN retrieval. The neighborhood of 𝑒 𝑞 is then constructed by fetching its 𝑘 nearest neighbors from a set of reference address sentence representations 𝒳 𝐷 𝑟𝑒𝑓 ⊂ 𝒳 𝐸 according to a distance function 𝑑 : 𝒳 𝐷 𝑟𝑒𝑓 × 𝒳 𝐷 𝑟𝑒𝑓 → ℝ. More formally, the 𝑘 nearest neighbors of 𝑒 𝑞 can be obtained by:</p><formula xml:id="formula_2">𝒦 := {𝑖 1 , 𝑖 2 , . . . , 𝑖 𝑘 | 𝑑(𝑒 𝑞 , 𝑒 𝑖 𝑗 ) are the 𝑘 smallest distances, 𝑖 𝑗 ∈ [|𝒳 𝐷 𝑟𝑒𝑓 |]}<label>(1)</label></formula><p>where 𝒦 denotes the set of indices in [|𝒳 𝐷 𝑟𝑒𝑓 |] = {1, ..., |𝒳 𝐷 𝑟𝑒𝑓 |} pointing to the 𝑘 neighbors with the smallest distances (closest to 0).</p><p>Although the distance 𝑑 depends on the encoder chosen for an address representation, our 𝑘NN retrieval model remains generic. For example, if 𝐸 is a Transformer-based Bi-Encoder model, then the distance 𝑑 would be a 𝑐𝑜𝑠𝑖𝑛𝑒-like distance. Roughly speaking, our 𝑘NN model has three parameters: 𝑘, the representation 𝐸 and the distance 𝑑 <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26]</ref>.</p></div>
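Eq. (1) can be sketched in a few lines of Python. The brute-force search below, with a cosine distance standing in for 𝑑, is illustrative only; a production system would typically use an approximate nearest-neighbor index over 𝒳 𝐷 𝑟𝑒𝑓.

```python
import math

def cosine_distance(u, v):
    """A cosine-like distance, as used when E is a Bi-Encoder."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def knn_retrieve(e_q, X_ref, k, d=cosine_distance):
    """Return the indices of the k reference representations in X_ref
    with the smallest distance to the query representation e_q (Eq. 1)."""
    dists = [(d(e_q, e_i), i) for i, e_i in enumerate(X_ref)]
    dists.sort()  # smallest distance first
    return [i for _, i in dists[:k]]
```

For instance, with `X_ref = [[1, 0], [0, 1], [0.9, 0.1]]` and query `[1, 0]`, the two nearest neighbors are indices 0 and 2.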
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Our Approach</head><p>Ensemble voting over multi-embedding 𝑘NN models is a robust technique that exploits the strengths of different embedding methods to improve prediction accuracy. By generating multiple embeddings for the same data and combining the predictions of multiple 𝑘NN models through voting, we can achieve better performance and more reliable results. This approach is particularly useful for our task, in which different embeddings capture different aspects of the addresses. To perform the task of correct address retrieval, we follow these steps: (1) data pre-processing and deduplication for both delivery and reference addresses, (2) offline fine-tuning of different Bi-Encoders on the STS task in order to construct multiple retrieval sets of normalized address embeddings, (3) 𝑘NN retrieval model construction (see Fig. <ref type="figure" target="#fig_1">2</ref>) and (<ref type="formula">4</ref>) online aggregation of the different search results through the design of a vote schema (see Fig. <ref type="figure" target="#fig_2">3</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Data pre-processing</head><p>Before fine-tuning the Bi-Encoders, it was necessary to go through two pre-processing steps followed by a deduplication step:</p><p>• The first step is the cleaning of both delivery and reference addresses; it involves removing accents and punctuation that might be present in the data. • The second step concerns the removal of interfering elements. This step is only applied to the delivery addresses, given that all reference addresses are supposed to be correct and normalized. It removes a set of unnecessary symbols that can be found in non-normalized addresses (e.g. '+', '*', '&amp;', . . . etc.). • The third step is the deduplication of delivery address records. By removing exact duplicates, we ensured that our fine-tuning process was efficient and not biased by redundant data points.</p><p>The final step is dataset creation for the Bi-Encoder fine-tuning. This step includes another cleaning process, explained in detail in Section 5.1.</p></div>
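The three steps above can be sketched as follows. The punctuation list and the symbol set `NOISE` are illustrative assumptions, not the exact configuration used in our pipeline.

```python
import re
import unicodedata

# Hypothetical set of interfering symbols (step 2); the real list is larger.
NOISE = set("+*&#@|_~^")

def clean(address):
    """Step 1: lowercase, strip accents and punctuation, collapse spaces."""
    s = unicodedata.normalize("NFKD", address.lower())
    s = "".join(c for c in s if not unicodedata.combining(c))  # drop accents
    s = re.sub(r"[.,;:!?'\"()\-/]", " ", s)                    # drop punctuation
    return re.sub(r"\s+", " ", s).strip()

def strip_noise(address):
    """Step 2 (delivery addresses only): remove interfering symbols."""
    tokens = []
    for tok in address.split():
        t = "".join(c for c in tok if c not in NOISE)
        if t:
            tokens.append(t)
    return " ".join(tokens)

def deduplicate(addresses):
    """Step 3: remove exact duplicate records, preserving order."""
    return list(dict.fromkeys(addresses))
```

For example, `clean("16, Avenue Jean-Jaurès")` yields `"16 avenue jean jaures"`.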
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Offline fine-tuning of Bi-Encoders</head><p>Here, we have as input a set of (delivery, reference) address pairs. The aim of this step is to fine-tune multiple Bi-Encoders to generate the address sentence vector embeddings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Bi-Encoder</head><p>Bi-Encoders are Siamese Transformer networks, generally fine-tuned on Semantic Textual Similarity tasks for the purpose of generating meaningful sentence embeddings. Typically, a pre-trained transformer model is first chosen as the training base of the Bi-Encoder. We use two types of pre-trained models:</p><p>• "Camembert-base" <ref type="bibr" target="#b26">[27]</ref>, a model specific to the French language, • "XLM-Roberta-base" <ref type="bibr" target="#b27">[28]</ref>, a multilingual model, both of which we adapted to a large corpus of French postal addresses by continuing their training on the Masked Language Modeling (MLM) task. We also used the MLM objective to train a third, small Roberta-based model <ref type="bibr" target="#b18">[19]</ref> from scratch on the same corpus.</p><p>Given an address sentence pair (𝑎, 𝑏), a forward pass of the transformer over each tokenized address generates token embeddings for both 𝑎 and 𝑏. Mean pooling is then applied to each address's token representations, resulting in two fixed-length vectors which serve as our address sentence embeddings. For a specific STS task, the best semantic address matching performance is found through the optimization of an objective function such as the "contrastive loss" <ref type="bibr" target="#b28">[29]</ref>, which is used mainly in neural networks for classification and matching tasks, such as similarity learning. It is often used in Siamese networks to train models to learn similar representations for pairs of similar samples and dissimilar representations for pairs of dissimilar samples. Readers interested in exploring the Bi-Encoder architecture can refer to <ref type="bibr" target="#b21">[22]</ref>.</p><p>In our case, our sentences are postal addresses that are no more than a few words long. 
In addition, all addresses share more or less the same repeating vocabulary, such as road types or city names. All this reduces the diversity of context between dissimilar addresses. This constraint led us to believe that a basic objective function would not create a sufficient gap, in terms of distance, between dissimilar addresses. To overcome this, we decided to use the "Multiple Negatives Ranking Loss" (MNRL) objective function <ref type="bibr" target="#b29">[30]</ref>, which is often used in ranking and information retrieval tasks and is therefore better suited to our similarity search task. This approach is supported by findings in <ref type="bibr" target="#b30">[31]</ref>, which highlight that including multiple negatives in each batch enhances the model's ability to distinguish between dissimilar examples without the need to specifically design hard negative pairs. Finding truly effective negative examples can be challenging and significantly impacts performance, making MNRL's ability to utilize multiple negatives in a straightforward manner highly advantageous, leading to better performance and more robust embeddings.</p><p>Multiple Negatives Ranking Loss definition: Given 𝑁 address sentence embedding pairs [(𝑒 𝑎 1 , 𝑒 𝑏 1 ), ..., (𝑒 𝑎 𝑁 , 𝑒 𝑏 𝑁 )] between query and reference address sentences (𝑎 1 , ..., 𝑎 𝑁 ) and (𝑏 1 , ..., 𝑏 𝑁 ), where each (𝑎 𝑖 , 𝑏 𝑖 ) is labeled as similar and each (𝑎 𝑖 , 𝑏 𝑗 ) with 𝑖 ̸ = 𝑗 is labeled as not similar, the loss function is as follows:</p><formula xml:id="formula_3">− 1 𝑁 𝑁 ∑︁ 𝑖=1 ⎡ ⎣ 𝑆(𝑒 𝑎 𝑖 , 𝑒 𝑏 𝑖 ) − 𝑙𝑜𝑔 𝑁 ∑︁ 𝑗=1 𝑒 𝑆(𝑒𝑎 𝑖 ,𝑒 𝑏 𝑗 ) ⎤ ⎦<label>(2)</label></formula><p>This function allows the model to consider, in a given batch of positive address pairs, for one sample (𝑎 𝑖 , 𝑏 𝑖 ), the 𝑁 − 1 negative pairs (𝑎 𝑖 , 𝑏 𝑗 ) formed using all the normalized reference addresses 𝑏 𝑗 from the other positive pairs. 
This strategy helps the model widen the distance between negative examples (𝑎 𝑖 , 𝑏 𝑗 ), where 𝑆 is the score function (generally 𝑆(𝑒 𝑎 𝑖 , 𝑒 𝑏 𝑖 ) = 𝑐𝑜𝑠𝑖𝑛𝑒(𝑒 𝑎 𝑖 , 𝑒 𝑏 𝑖 )). This loss function helps reduce the impact of the limited context in addresses.</p></div>
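As a sanity check of Eq. (2), here is a plain-Python sketch of the loss for a batch of embedding pairs, with cosine as the score function 𝑆. In practice the loss is computed by the training framework over the model's output tensors; this version only illustrates the arithmetic.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnrl(pairs, S=cosine):
    """Multiple Negatives Ranking Loss (Eq. 2) over a batch of positive
    pairs (e_a_i, e_b_i): every other b_j in the batch serves as one of
    the N-1 in-batch negatives for a_i."""
    N = len(pairs)
    total = 0.0
    for e_a, e_b in pairs:
        pos = S(e_a, e_b)                                          # S(e_a_i, e_b_i)
        lse = math.log(sum(math.exp(S(e_a, b_j)) for _, b_j in pairs))
        total += pos - lse
    return -total / N
```

For a batch of two orthogonal, perfectly aligned pairs, the loss evaluates to log(e + 1) − 1 ≈ 0.313, and it shrinks as positives score higher than in-batch negatives.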
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Retrieval set creation</head><p>Given a dataset of normalized reference address sentences 𝐷 𝑟𝑒𝑓 and a fine-tuned Bi-Encoder 𝐸, we can generate a retrieval sentence embedding set 𝒳 𝐷 𝑟𝑒𝑓 through a forward pass over all address instances of 𝐷 𝑟𝑒𝑓 . This embedding set is later used at inference time for the retrieval of a given query's nearest neighbors.</p></div>
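A sketch of this step; `toy_encode` is a hypothetical hashing stand-in, not the fine-tuned Bi-Encoder, and only illustrates the one-pass construction of 𝒳 𝐷 𝑟𝑒𝑓.

```python
import hashlib

def toy_encode(address, dim=16):
    """Stand-in for the fine-tuned Bi-Encoder E: a hashed bag-of-words
    vector. Illustrative only; real embeddings come from the trained model."""
    v = [0.0] * dim
    for tok in address.split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    return v

def build_retrieval_set(D_ref, encode=toy_encode):
    # One forward pass per normalized reference address -> X_{D_ref}.
    return [encode(a) for a in D_ref]
```

At inference time, the list returned here plays the role of the pre-registered retrieval set searched by each 𝑘NN model.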
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">𝑘NN retrieval models</head><p>𝑘NN-vote is an Ensemble Information Retrieval system based on the search results of multiple 𝑘NN models, all similar in their operation but very different in the representations of the searched addresses on which they are based. In general, an individual 𝑘NN search model is a 𝑘-Nearest Neighbors algorithm which takes as a parameter a distance 𝑑 specific to the type of representation of the searched points (e.g. a Levenshtein or Jaccard distance for a raw textual representation). The algorithm computes all the distances between a query and the search points previously pre-registered in the retrieval reference set 𝒳 𝐷 𝑟𝑒𝑓 (𝒳 𝐷 𝑟𝑒𝑓 = 𝐷 𝑟𝑒𝑓 for the raw textual representation) and returns the list of the 𝑘 most similar points, those having the smallest distance to the query. Table <ref type="table" target="#tab_0">1</ref> shows the different combinations of (encodings, similarities) that can be used in a 𝑘NN search model (𝑘NN Retriever) within the voting system. The table illustrates the possible types of address representation previously mentioned in Section 3.2, that is: (1) the raw textual representation, through which we have different 𝑘NN search models, each with a well-defined type of string distance (see Table <ref type="table" target="#tab_0">1</ref>); and (2) the vector representation, divided into two types:</p><p>• traditional embeddings built by mean pooling the static word embeddings of address elements, such as Word2Vec, • contextual sentence embeddings of postal addresses, fine-tuned for textual similarity. Without any a priori hypotheses about the origin of the errors, we have carried out an empirical search for the best address representation spaces with the appropriate similarity measures. 
We simply applied the various representations and similarity measures in the literature and compared eleven string similarity measures for the raw representations and four vector similarity measures for the static and dynamic embedding representations (see Table <ref type="table" target="#tab_0">1</ref>). Some of the string similarity measures, such as "Ratio" or "Token_set_ratio", are taken from the fuzzywuzzy library<ref type="foot" target="#foot_4">6</ref> as they enable more robust and flexible comparisons by incorporating tokenization and sorting mechanisms. Unlike traditional metrics like Levenshtein and Jaro, which focus solely on character-level edits, fuzzywuzzy's methods account for word order and partial matches, making them more suitable for real-world text data. Together with the chosen sentence embedding models (see Section 4.2), this gives a total of 31 𝑘NN models. The advantage here is to obtain a maximum of individual candidate lists of retrieved addresses in order to, firstly, compare the performance of each 𝑘NN Retriever model and, secondly, use them to identify the candidates common to the lists as the most similar candidates. Fig. <ref type="figure" target="#fig_1">2</ref> shows the architecture of a single 𝑘NN Retriever. We finally define the similarity search process as follows: (1) we convert a query 𝑞 into the desired representation type to obtain 𝑒 𝑞 ; (2) 𝑒 𝑞 is then passed into the 𝑘NN Retriever, which computes the distances between 𝑒 𝑞 and all the representations in the retrieval set in order to return the 𝑘 address indices most similar to 𝑞, ranked by smallest distance. </p></div>
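A raw-representation 𝑘NN Retriever can be sketched as follows. Here difflib's `SequenceMatcher.ratio` is used purely as a stand-in for a fuzzywuzzy measure such as "Ratio"; any of the string similarities in Table 1 could be plugged in as the distance 𝑑.

```python
import difflib

def ratio_distance(a, b):
    """String distance derived from a similarity ratio in [0, 1];
    difflib's ratio stands in here for fuzzywuzzy's `ratio`."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def raw_knn_retriever(query, D_ref, k=3, d=ratio_distance):
    """Raw-representation kNN Retriever: X_{D_ref} = D_ref, and the k
    reference addresses with the smallest string distance are returned
    as (index, similarity) pairs, most similar first."""
    scored = sorted((d(query, a), i) for i, a in enumerate(D_ref))
    return [(i, 1.0 - dist) for dist, i in scored[:k]]
```

For example, querying `"16 av jean jaures 89000 auxerre"` against a reference list containing `"16 avenue jean jaures 89000 auxerre"` ranks that normalized address first despite the abbreviated road type.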
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Ensemble voting retrieval system</head><p>The system is termed "multi-embeddings models" because of its dual approach, leveraging both raw address representations and advanced deep learning (DL) vector text representations for address matching. The core functionality of the system is to return a final list of 𝑚 similar candidate addresses through a voting process. Across the 𝑘NN ensemble models, the vote is based on the maximum number of occurrences of a candidate address for a query. This system relies on two key values: (1) the number of repetitions of each candidate address 𝑖 across the different 𝑘-lists; (2) the similarity scores of the pair (𝑞, 𝑖) from the different 𝑘NN models in which 𝑖 appeared.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.1.">Retrieval Flow</head><p>1. Candidate address lists retrieval: The system begins by retrieving the 𝑘-lists of candidate addresses using the ensemble 𝑘NN retrieval pipeline. Each model in the ensemble provides a list of address indices for a given query.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Voting process:</head><p>• Repetition counting: The first step of the voting process counts the number of repetitions of each candidate address 𝑖 across the different 𝑘-lists. • Grouping and sorting: Candidate indices are then grouped by their repetition counts.</p><p>This creates "bags" of indices, where each bag contains one or more indices pointing to the associated addresses. The bags are then sorted by repetition count. • In-bag max pooling of similarity scores: Within each bag, the system collects, for each address, the similarity scores from the different 𝑘NN models in which it appeared. Max pooling is applied to these scores to obtain the maximum similarity score of each address in the bag. • In-bag ranking: The addresses within each bag are then sorted by their maximum similarity scores.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Final address list retrieval:</head><p>• Final output: All the bags are concatenated, resulting in a sorted list of addresses in which the top candidate has been repeated the most times and has the highest similarity score. • Cut-off value choice: The system sets the value of 𝑚 (the number of neighbors to return) and computes performance metrics to evaluate the effectiveness of the address matching process. The value of 𝑚 is not necessarily equal to 𝑘, since the voting process ranks all the candidates of the combined 𝑘-lists, which naturally produces more than 𝑘 candidates depending on how heterogeneous the 𝑘-lists are. </p></div>
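The voting flow above can be sketched in a few lines. This is our own minimal rendition, not the authors' code; the interface is an assumption, with `scores[j]` mapping each candidate index of model j's 𝑘-list to that model's similarity score:

```python
from collections import defaultdict
from typing import Dict, List

def knn_vote(k_lists: List[List[int]],
             scores: List[Dict[int, float]],
             m: int) -> List[int]:
    """Aggregate the k-lists of the ensemble: count repetitions, group
    candidates into bags by repetition count, max-pool the similarity
    scores within each bag, rank, concatenate, and cut off at m."""
    counts: Dict[int, int] = defaultdict(int)
    pooled: Dict[int, float] = defaultdict(float)   # in-bag max pooling
    for model_scores, k_list in zip(scores, k_lists):
        for i in k_list:
            counts[i] += 1
            pooled[i] = max(pooled[i], model_scores[i])
    bags: Dict[int, List[int]] = defaultdict(list)  # repetition count -> indices
    for i, c in counts.items():
        bags[c].append(i)
    ranked: List[int] = []
    for c in sorted(bags, reverse=True):            # bags sorted by repetitions
        ranked.extend(sorted(bags[c], key=lambda i: pooled[i], reverse=True))
    return ranked[:m]                               # cut-off at m candidates

# Three toy 3-lists voting over five candidate address indices
final = knn_vote(
    k_lists=[[1, 2, 3], [2, 3, 4], [2, 5, 1]],
    scores=[{1: 0.9, 2: 0.8, 3: 0.7},
            {2: 0.95, 3: 0.6, 4: 0.5},
            {2: 0.7, 5: 0.4, 1: 0.85}],
    m=4)
```

Candidate 2 appears in all three lists, so it leads the final ranking regardless of candidates 1 and 3 sharing a bag with it in none.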
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experimental Settings</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Data description</head><p>In our experiments, we use real private postal address data made available by a carrier in the region of Yonne, France. The data consist of two database tables: a table of approximately 1M non-normalized delivery addresses received via EDI and a table of registered recipients containing more than 42K normalized postal addresses. After the de-duplication step mentioned above, and owing to the large number of identical delivery instances, just over 85% of all delivery address instances were de-duplicated, mainly because most deliveries are business addresses. As a result, we are left with just over 147K distinct delivery addresses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset creation:</head><p>We are in an offline training setup (i.e. our ensemble 𝑘NN retriever needs no training but instead exploits the different representations, vector or raw, of postal addresses to search for the most similar ones). That said, creating a dataset of address pairs (i.e. non-normalized query-address, normalized reference-address) is necessary for two reasons: (1) the offline fine-tuning of the different sentence representation models for addresses, and (2) the final performance test of the 𝑘NN-vote system. To do this, we use the recipient keys associated with the records in the two tables to create a dataset of over 147K address pairs. The dataset is then divided into training and test data with respective proportions of 90% and 10%. The same test data is used to evaluate 𝑘NN-vote. A second cleaning is carried out on the training dataset to eliminate non-normalized input addresses likely to degrade the learning quality of the Bi-Encoder, such as addresses containing only the postal code and the city name. Such addresses completely lack the context linking them to their supposed normalized counterparts. Around 0.8% of the training data was affected by this second cleaning. Table <ref type="table" target="#tab_1">2</ref> shows some examples of such addresses.</p></div>
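The second cleaning and the 90/10 split can be illustrated as follows. Both functions are our own sketches: `lacks_context` is a simple token-subset heuristic that assumes the postal code and city of the normalized counterpart are available, which is one possible way to detect the context-less addresses of Table 2:

```python
import random
from typing import List, Tuple

def lacks_context(raw_addr: str, postal_code: str, city: str) -> bool:
    """True when the raw address contains nothing beyond the postal code
    and the city name (e.g. "trichey 89430 trichey"), i.e. no door number
    or road name linking it to its normalized counterpart."""
    return set(raw_addr.split()) <= {postal_code} | set(city.split())

def split_pairs(pairs: List[Tuple[str, str]],
                train_frac: float = 0.9,
                seed: int = 0) -> Tuple[list, list]:
    """Shuffle the (query-address, reference-address) pairs and split 90/10."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_frac)
    return pairs[:cut], pairs[cut:]

flagged = lacks_context("trichey 89430 trichey", "89430", "trichey")
kept = lacks_context("4 rue maillet 89430 trichey", "89430", "trichey")
train, test = split_pairs([(str(i), str(i)) for i in range(100)])
```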
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Bi-encoders fine-tuning parameters</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1.">Fine-tuning Base</head><p>The three chosen base transformers were trained on a corpus of approximately 950K official French postal addresses from the Yonne region, France, and adjacent regions, taken from the official governmental address website <ref type="foot" target="#foot_5">7</ref> . The complete training of the three encoders was carried out over 5 iterations and no parameter optimization was done. The aim was simply to adapt the three language models to postal addresses and use them as a basis for fine-tuning the Bi-Encoders. The "transformers" package from HuggingFace<ref type="foot" target="#foot_6">8</ref> was used to train these language models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2.">Bi-encoders fine-tuning</head><p>The three Bi-Encoders were fine-tuned with the best combination of hyper-parameters presented in Table <ref type="table" target="#tab_2">3</ref>. The Camembert-base and XLM-Roberta-base architectures used for the first two Bi-Encoders are explored in detail in <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b27">28]</ref>; for the third, a custom pre-trained Roberta-small architecture (6 layers, 128 hidden units, 8 heads, 8M parameters) is used. The three Bi-Encoders were fine-tuned on a local server with an NVIDIA Tesla A100 graphics card (20 GB) via the SBERT<ref type="foot" target="#foot_7">9</ref> "sentence-transformers" package. </p></div>
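A Bi-Encoder fine-tuning run with the sentence-transformers package looks roughly as follows. This is a configuration sketch under stated assumptions, not the authors' script: the checkpoint path "camembert-adapted" is a placeholder for the MLM-adapted base model of Section 5.2.1, and the loss choice (in-batch negatives over positive pairs) is our assumption, since the actual hyper-parameters are those of Table 3:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Build a Bi-Encoder from the domain-adapted checkpoint (placeholder path)
word_emb = models.Transformer("camembert-adapted", max_seq_length=64)
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
bi_encoder = SentenceTransformer(modules=[word_emb, pooling])

# Positive pairs: (raw delivery address, normalized reference address)
train_examples = [
    InputExample(texts=["4 rue mailet 89430 trichy",
                        "4 rue maillet 89430 trichey"]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=32)

# Assumed loss: other pairs in the batch serve as negatives
loss = losses.MultipleNegativesRankingLoss(bi_encoder)
bi_encoder.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
```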
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Models evaluation</head><p>To evaluate the proposed voting approach, we compare it with our different individual 𝑘NN models, in addition to the bi-encoder (BI_DistilBert) model proposed by Duarte et al. <ref type="bibr" target="#b5">[6]</ref>, which uses DistilBert Multilingual as the basis for fine-tuning. To remain consistent with the cited research, we consider 𝑘 = 10 neighbors, but we also test other values of 𝑘 for our individual systems. The models were evaluated with two metrics: (1) the existence ratio (ER), i.e. the proportion of correctly predicted positive pairs out of all pairs in the test dataset, and (2) the MRR, i.e. the Mean Reciprocal Rank, which measures the quality of the ranks at which correct query responses appear in information retrieval systems. For a sample of queries 𝑄, with 𝑟𝑎𝑛𝑘 𝑖 the position of the correct searched address for a query 𝑞 𝑖 ∈ 𝑄, 𝑖 = 1, ..., |𝑄|, the MRR is defined as follows:</p><formula xml:id="formula_4">𝑀𝑅𝑅 = (1/|𝑄|) ∑_{𝑖=1}^{|𝑄|} 1/𝑟𝑎𝑛𝑘 𝑖 ,<label>(3)</label></formula><p>The primary objective of the models is to achieve a maximum ER at the exact matching level (i.e. the predicted address is exactly the address sought for the query). In addition, two types of ER are computed:</p><p>(1) the ER of correct predictions at the first rank (top 1) and (2) the ER of correct predictions among the 𝑘 address candidates (top k). We are also interested in the matching ER at the road level (i.e. the predicted address is at least on the correct road of the searched address). This type of ER matters all the more because, in practice, carriers can generally deliver parcels successfully as long as they are on the same road as the delivery address <ref type="bibr" target="#b5">[6]</ref>.</p></div>
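Both metrics are straightforward to compute from the ranked candidate lists; a minimal sketch (our own code, with 1-based ranks as in Eq. (3) and a zero contribution when the correct address is absent from the list):

```python
from typing import List, Sequence

def existence_ratio(results: List[Sequence[int]],
                    gold: List[int], top: int = 10) -> float:
    """ER: share of queries whose correct address index appears
    among the first `top` retrieved candidates."""
    hits = sum(1 for res, g in zip(results, gold) if g in res[:top])
    return hits / len(gold)

def mean_reciprocal_rank(results: List[Sequence[int]],
                         gold: List[int]) -> float:
    """MRR over the sample of queries, Eq. (3); rank is 1-based."""
    total = 0.0
    for res, g in zip(results, gold):
        if g in res:
            total += 1.0 / (list(res).index(g) + 1)
    return total / len(gold)

# Three queries: correct index is 1 for each; found at ranks 1, 2, and never
results = [[1, 2, 3], [2, 1, 3], [9, 8, 7]]
gold = [1, 1, 1]
er = existence_ratio(results, gold, top=3)       # 2 of 3 queries hit
mrr = mean_reciprocal_rank(results, gold)        # (1 + 1/2 + 0) / 3
```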
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Comparison of individual 𝑘NN models</head><p>Our first intent was to compare the individual 𝑘NN systems in order to identify the best-performing model in terms of top k ER at the exact search level (top k exact). The results in Fig. <ref type="figure" target="#fig_4">4a</ref> show the superiority of the 𝑘NN models based on sentence representations, which comes down to the quality of the contextualized embeddings compared, for example, with word embeddings such as Word2Vec or FastText. We also note that models based on raw representations generally perform better than Word2Vec and FastText, probably because of the considerable loss of information in the static embeddings caused by the mean pooling used to create the address vectors.</p><p>Increasing the value of 𝑘 improves the existence ratio across all models, because the larger the list of neighbors, the greater the chance that harder addresses are retrieved. However, the gains in existence ratio vary: about 5% for sentence embeddings, 11% for raw representations and 26% for static embeddings as 𝑘 grows from 5 to 120, as shown in Figure <ref type="figure" target="#fig_4">4a</ref>. This is explained by the accuracy of the sentence embeddings, as the majority of positive pairs are already identified within the first 5 candidate addresses. In contrast, the raw and static embedding models require a very high 𝑘 value, up to 120. In terms of MRR, the results in Figure <ref type="figure" target="#fig_4">4b</ref> are consistent with the existence ratios, as the best models should reach the highest MRR at the lowest possible 𝑘. The fine-tuned sentence-embedding 𝑘NN models retrieve the searched addresses at the highest ranks compared to the other models. 
Furthermore, they remain stable as 𝑘 increases, demonstrating their strong retrieval ability even among the earliest candidates, thanks to their capacity to capture address context. This was expected, as the very purpose of sentence transformers is to learn to reduce the distance between vectors of positive address pairs even when they are syntactically very different, whereas models based on string similarity distances only perform well when the addresses are relatively similar syntactically.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.">kNN multi-embeddings models experiment results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.1.">Multi-embeddings models instances</head><p>We wanted to test the performance of the voting system over the set of individual 𝑘NN models while keeping the flexibility to select different subsets in order to maximize the voting efficiency. Fig. <ref type="figure">5</ref> shows the top k exact ERs of the chosen subsets with the best overall performance. We observe that the subset of sentence-only models generally performs better. If we further exclude from it the 𝑘NN models based on the from-scratch Roberta (camembert + XLM), we see a small increase in ERs at 𝑘 values of 5 and 10. This improvement is due to the original pre-training of Camembert and XLM_Roberta; it shows how much language models pre-trained on large corpora contribute when reused in other tasks such as STS. The voting system with all models is the least efficient, which can be explained by the large differences between the neighbor lists returned by the sentence models and the other models. In other words, it is natural that the 𝑘NN retrievers making the most mistakes in predicting positive pairs weaken the vote's ability to systematically assign a high number of repetitions to the sought-after addresses. This explanation is reinforced when we remove the static vector models from the vote (sentence + raw): we then notice a clear improvement in ERs. As for the MRR results, Fig. <ref type="figure">6</ref> shows that, in general, voting systems based on sentence models recover more positive normalized addresses at the highest ranks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.2.">Discussion</head><p>We now compare our approach with BI_DistilBert, taking into account the addresses found at the road level, also considering the top 1 results, and keeping 𝑘 = 10 for consistency. Table <ref type="table" target="#tab_3">4</ref> shows the best individual 𝑘NN models and voting systems alongside BI_DistilBert. BI_DistilBert performs better than the raw models, but remains below the ER results of the individual 𝑘NN sentence models, for two main reasons. First, its base model, multilingual DistilBert, was not pre-trained on a corpus of postal addresses before the Bi-Encoder fine-tuning; we believe it is important for base language models to learn the structure of a postal address independently of the similarity task. Second, our dataset is more difficult: our addresses contain far more errors and noise. 𝑘NN-vote systems are better overall, supporting our intuition that aggregating results from multiple sources significantly improves similarity search performance. We do, however, note exceptions to the rule. Some individual 𝑘NN models, such as (A) and (B), rank above the (G) and (H) voting systems. This decrease in ERs confirms that aggregation alone does not always guarantee better results and that a high and heterogeneous number of models in the voting process hurts prediction quality. This is why the individual performance of the models used in the vote must also be taken into account. More specifically, the vote is more likely to produce superior results if its aggregation sources are the search models that make the fewest prediction errors. (I) manages to compete with the two best individual 𝑘NN sentence models but brings no improvement, particularly in top 1 exact. 
It is undoubtedly the participation of the Rsent models in the vote that prevents it from standing out from the other search systems, since Rsent is noticeably less accurate than Csent and XLMsent. In conclusion, the best vote is the one that uses the Csent and XLMsent models, with an ER top 1 exact of 86.2% and an ER top 10 exact of 96%, demonstrating the ability of the voting system to retrieve more positive address pairs at the top 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Inference time:</head><p>We measured the retrieval time for 100 address queries to compare the various solutions, as shown in Table <ref type="table" target="#tab_3">4</ref>. Retrieval times for the voting systems (between 51 s and 173 s) are notably longer than those of the individual 𝑘NN models. Although the measurements were made without optimization in an experimental setup, we find these times acceptable for business applications. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusions</head><p>In this work we focused on the problem of matching postal addresses. We first showed that this task can be formalised simply as an information retrieval problem, in which models such as 𝑘NN have been shown to be efficient in both computation time and accuracy. For this purpose, we assumed that an address is a sentence described by a set of entities and, consequently, that it may contain erroneous or noisy elements. Moreover, the positions of the entities have an impact on address recognition. For these reasons, we proposed using different address representation spaces, such as the word embedding space or the sentence embedding space of a pre-trained transformer. Each representation contributes in part to the search for the closest address in its space. In order to aggregate the contributions of the different spaces, we proposed an ensemble of 𝑘NN models based on a voting system, called 𝑘NN-vote. The experimental results show that our system performs very well, achieving an accuracy of around 96% in the top 10 and 86.2% in the top 1. The system proves its value for this type of task, even though the voting algorithm is still quite naive for the time being: it favours addresses with the maximum number of repetitions and re-ranks them solely on the basis of the highest similarity score. Hence, the number of voters directly affects the repetition counts. In addition, the system's focus on an address's highest score, without taking the overall quality of the scores into account, can let a single score dominate even when other scores are more indicative. 
As future work, we are improving the voting process in order to account for and reinforce the potential effectiveness of a model with a lower but more significant score.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: French Postal Address Features</figDesc><graphic coords="4,128.41,65.60,338.47,108.96" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: 𝑘NN Search Model Architecture.</figDesc><graphic coords="8,150.98,101.90,293.33,130.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Ensemble Vote Process.</figDesc><graphic coords="9,25.94,114.41,543.40,154.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>(a) 𝑘NN's top k exact ERs (b) 𝑘NN's top k exact MRRs</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Evaluation of Individual 𝑘NN models with regards to metrics: ER and MRR</figDesc><graphic coords="12,162.25,323.42,270.78,234.54" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :Figure 6 :</head><label>56</label><figDesc>Figure 5: 𝑘NN-vote top k exact ERs</figDesc><graphic coords="13,128.41,160.36,338.46,66.86" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Different Combinations used for 𝑘NN Retriever</figDesc><table><row><cell>Representation</cell><cell>Encoding</cell><cell>Similarity</cell></row><row><cell>Raw</cell><cell>Textual content</cell><cell>Jaro, Jaro-Winkler, Levenshtein, Jaccard, Damerau-Levenshtein, Ratio, Token set ratio, Token sort ratio, Partial ratio, Set ratio, Seq ratio</cell></row><row><cell>Vector</cell><cell>Csent (Camembert Bi-Encoder), XLMsent (Xlm Roberta Bi-Encoder), Rsent (Roberta custom Bi-Encoder), wvavg (Word2Vec word embeddings averaged), ftavg (fastText word embeddings averaged)</cell><cell>Cosine, Euclidean, Correlation, Cityblock</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Examples of Delivery Address Deletion</figDesc><table><row><cell>Received address</cell><cell>Normalized address</cell><cell>Justification for deletion</cell></row><row><cell>trichey 89430 trichey</cell><cell>4 rue maillet 89430 trichey</cell><cell>This address only has the postal code and the name of the city</cell></row><row><cell>89160 89160 sambourg</cell><cell>11 rue d argenteuil 89160 sambourg</cell><cell>Another example where the door number and road name are missing</cell></row><row><cell>xxxx 89240 pourrain</cell><cell>30 route d aillant 89240 pourrain</cell><cell>Here 'xxxx' is used as a placeholder because the expediter only had the recipient's name and needed to fill in something for the incomplete address</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Best Found Fine-tuning Hyper-parameters for Bi-Encoders</figDesc><table><row><cell>Bi-Encoder</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Existence Ratios of the Best Methods</figDesc><table><row><cell>System</cell><cell>Top 1 exact</cell><cell>Top 1</cell><cell>Top 10 exact</cell><cell>Top 10</cell><cell>MRR</cell><cell>Time in s (100 queries)</cell></row><row><cell>(A) Csent_cosine</cell><cell>0.857</cell><cell>0.916</cell><cell>0.959</cell><cell>0.970</cell><cell>0.895</cell><cell>12</cell></row><row><cell>(B) XLMsent_cosine</cell><cell>0.856</cell><cell>0.917</cell><cell>0.954</cell><cell>0.968</cell><cell>0.893</cell><cell>13</cell></row><row><cell>(C) Rsent_cosine</cell><cell>0.799</cell><cell>0.879</cell><cell>0.930</cell><cell>0.956</cell><cell>0.847</cell><cell>6</cell></row><row><cell>(D) Token_set_ratio</cell><cell>0.740</cell><cell>0.829</cell><cell>0.866</cell><cell>0.889</cell><cell>0.791</cell><cell>8</cell></row><row><cell>(E) Ratio</cell><cell>0.730</cell><cell>0.834</cell><cell>0.871</cell><cell>0.891</cell><cell>0.785</cell><cell>5</cell></row><row><cell>(F) BI_DistilBert</cell><cell>0.763</cell><cell>0.793</cell><cell>0.918</cell><cell>0.939</cell><cell>0.826</cell><cell>10</cell></row><row><cell>(G) all models</cell><cell>0.760</cell><cell>0.872</cell><cell>0.950</cell><cell>0.962</cell><cell>0.830</cell><cell>173</cell></row><row><cell>(H) sentence + raw</cell><cell>0.801</cell><cell>0.896</cell><cell>0.957</cell><cell>0.967</cell><cell>0.859</cell><cell>144</cell></row><row><cell>(I) sentence only</cell><cell>0.852</cell><cell>0.920</cell><cell>0.959</cell><cell>0.972</cell><cell>0.894</cell><cell>69</cell></row><row><cell>(J) camembert + XLM</cell><cell>0.862</cell><cell>0.921</cell><cell>0.960</cell><cell>0.972</cell><cell>0.900</cell><cell>51</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://github.com/fxsjy/jieba El Moundir Faraoun et al. CEUR Workshop Proceedings 96-111</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://developers.google.com/maps/documentation/geocoding?hl=fr</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">http://geohash.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://www.upu.int/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://github.com/seatgeek/fuzzywuzzy</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://adresse.data.gouv.fr/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">https://huggingface.co/docs/transformers/index</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_7">https://www.sbert.net/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Thanks to the french ANRT (Association Nationale de la Recherche et de la Technologie) for funding this project under the "Cifre convention for thesis funding" https://www.anrt.asso.fr/ and to the developers of TEDIES, TALK solutions who assisted in this project https://site.tedies.eu/.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Georoberta: A transformer-based approach for semantic address matching</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Guermazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sellami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Boucelma</surname></persName>
		</author>
		<ptr target="https://hal.science/hal-04465164" />
		<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>HAL</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Machine learning innovations in address matching: A practical comparison of word2vec and crfs</title>
		<author>
			<persName><forename type="first">S</forename><surname>Comber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Arribas-Bel</surname></persName>
		</author>
		<idno type="DOI">10.1111/tgis.12522</idno>
		<ptr target="https://doi.org/10.1111/tgis.12522" />
	</analytic>
	<monogr>
		<title level="j">Transactions in GIS</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="334" to="348" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Address validation in transportation and logistics: A machine learning based entity matching approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Guermazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sellami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Boucelma</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-65965-3_21</idno>
		<ptr target="https://doi.org/10.1007/978-3-030-65965-3_21" />
	</analytic>
	<monogr>
		<title level="m">Communications in Computer and Information Science</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">1323</biblScope>
			<biblScope unit="page" from="320" to="334" />
		</imprint>
	</monogr>
	<note>Communications in Computer and Information Science</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A deep learning architecture for semantic address matching</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.1080/13658816.2019.1681431</idno>
		<ptr target="https://doi.org/10.1080/13658816.2019.1681431" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Geographical Information Science</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="559" to="576" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Deep contrast learning approach for address semantic matching</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>She</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.3390/app11167608</idno>
		<ptr target="https://doi.org/10.3390/app11167608" />
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page">7608</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Improving address matching using siamese transformer networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Duarte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oliveira</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-49011-8_33</idno>
		<ptr target="https://doi.org/10.1007/978-3-031-49011-8_33" />
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<biblScope unit="volume">14116</biblScope>
			<biblScope unit="page" from="413" to="425" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Quantitative comparison of nearest neighbor search algorithms</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Rakotondrasoa</surname></persName>
		</author>
		<idno>arXiv, 2023</idno>
		<ptr target="https://arxiv.org/abs/2307.05235" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Fast record linkage for company entities</title>
		<author>
			<persName><forename type="first">T</forename><surname>Gschwind</surname></persName>
		</author>
		<ptr target="https://ieeexplore.ieee.org/document/9006095" />
	</analytic>
	<monogr>
		<title level="m">IEEE Conference Publication</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A new method of chinese address extraction based on address tree model</title>
		<author>
			<persName><forename type="first">K</forename><surname>Mengjun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Qingyun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Mingjun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Acta Geodaetica et Cartographica Sinica</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="page" from="99" to="107" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Binary codes capable of correcting deletions, insertions and reversals</title>
		<author>
			<persName><forename type="first">V</forename><surname>Levenshtein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soviet Phys. Doklady</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">707</biblScope>
			<date type="published" when="1966">1966</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A technique for computer detection and correction of spelling errors</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Damerau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="171" to="176" />
			<date type="published" when="1964">1964</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines</title>
		<author>
			<persName><forename type="first">P</forename><surname>Jaccard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bulletin de La Société Vaudoise Des Sciences Naturelles</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="241" to="272" />
			<date type="published" when="1901">1901</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida</title>
		<author>
			<persName><forename type="first">M</forename><surname>Jaro</surname></persName>
		</author>
		<idno type="DOI">10.2307/2289924</idno>
		<ptr target="https://doi.org/10.2307/2289924" />
	</analytic>
	<monogr>
		<title level="j">Journal of the American Statistical Association</title>
		<imprint>
			<biblScope unit="volume">84</biblScope>
			<biblScope unit="page" from="414" to="420" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">E</forename><surname>Winkler</surname></persName>
		</author>
		<ptr target="https://eric.ed.gov/?id=ED325505" />
		<imprint>
			<date type="published" when="1990">1990</date>
			<publisher>ERIC</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<ptr target="https://arxiv.org/abs/1301.3781" />
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Enriching word vectors with subword information</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.04606</idno>
		<ptr target="https://arxiv.org/abs/1607.04606" />
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">5</biblScope>
		</imprint>
	</monogr>
	<note type="report_type">arXiv</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">An introduction to conditional random fields</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>McCallum</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1011.4088</idno>
		<ptr target="https://arxiv.org/abs/1011.4088" />
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<ptr target="https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html" />
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>NeurIPS</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<ptr target="https://arxiv.org/abs/1907.11692" />
		<title level="m">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Deep entity matching with pre-trained language models</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Suhara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Doan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-C</forename><surname>Tan</surname></persName>
		</author>
		<idno type="DOI">10.14778/3421424.3421431</idno>
		<ptr target="https://doi.org/10.14778/3421424.3421431" />
	</analytic>
	<monogr>
		<title level="j">Proceedings of the VLDB Endowment</title>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="50" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Entity matching with transformer architectures -a step forward in data integration</title>
		<author>
			<persName><forename type="first">U</forename><surname>Brunner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stockinger</surname></persName>
		</author>
		<idno type="DOI">10.5441/002/edbt.2020.58</idno>
		<ptr target="https://doi.org/10.5441/002/edbt.2020.58" />
	</analytic>
	<monogr>
		<title level="m">EDBT</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence embeddings using Siamese BERT-networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/d19-1410</idno>
		<ptr target="https://doi.org/10.18653/v1/d19-1410" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.01108</idno>
		<ptr target="https://arxiv.org/abs/1910.01108" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Billion-scale similarity search with gpus</title>
		<author>
			<persName><forename type="first">J</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jégou</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1702.08734</idno>
		<idno type="arXiv">arXiv:1702.08734</idno>
		<ptr target="https://doi.org/10.48550/arXiv.1702.08734" />
	</analytic>
	<monogr>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Multidimensional binary search trees used for associative searching</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Bentley</surname></persName>
		</author>
		<idno type="DOI">10.1145/361002.361007</idno>
		<ptr target="https://doi.org/10.1145/361002.361007" />
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="509" to="517" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Five Balltree Construction Algorithms</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Omohundro</surname></persName>
		</author>
		<ptr target="https://www.icsi.berkeley.edu/pubs/techreports/TR-89-063.pdf" />
		<imprint>
			<date type="published" when="1989">1989</date>
			<pubPlace>Berkeley, CA</pubPlace>
		</imprint>
		<respStmt>
			<orgName>International Computer Science Institute</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">CamemBERT: a tasty French language model</title>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Muller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Suárez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dupont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>De La Clergerie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Seddah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.645</idno>
		<ptr target="https://doi.org/10.18653/v1/2020.acl-main.645" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="7203" to="7219" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Unsupervised cross-lingual representation learning at scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.02116</idno>
		<ptr target="https://arxiv.org/abs/1911.02116" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Learning a similarity metric discriminatively, with application to face verification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chopra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<idno type="DOI">10.1109/cvpr.2005.202</idno>
		<ptr target="https://doi.org/10.1109/cvpr.2005.202" />
	</analytic>
	<monogr>
		<title level="m">IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR&apos;05)</title>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Efficient natural language response suggestion for smart reply</title>
		<author>
			<persName><forename type="first">M</forename><surname>Henderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Strope</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lukacs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Miklos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kurzweil</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.00652</idno>
		<ptr target="https://arxiv.org/abs/1705.00652" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Batch-softmax contrastive loss for pairwise sentence scoring tasks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chernyavskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ilvovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kalinin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.naacl-main.9</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<meeting>the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2022-07">July 2022</date>
			<biblScope unit="page" from="116" to="126" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
