<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Entity Matching with 7B LLMs: A Study on Prompting Strategies and Hardware Limitations</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ioannis</forename><surname>Arvanitis-Kasinikos</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National and Kapodistrian University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">George</forename><surname>Papadakis</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National and Kapodistrian University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Entity Matching with 7B LLMs: A Study on Prompting Strategies and Hardware Limitations</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">FF0842E0658F800DE6E4EBC90DF10FF7</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Entity Matching</term>
					<term>7B LLMs</term>
					<term>Zero-Shot Prompts</term>
					<term>Few-Shot Prompts</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Entity Matching (EM) is a fundamental task in data management, involving the identification and linking of records that refer to the same real-world entity across different datasets. While Large Language Models (LLMs) have shown promise in addressing complex natural language processing tasks, their substantial computational requirements often limit their practical applicability. In this work, we investigate the use of 7B parameter LLMs with 4-bit quantization for EM tasks executable on commodity hardware. We explore various prompting strategies, including zero-shot, few-shot, and general matching definition prompts, to evaluate their effectiveness in improving EM accuracy. Experiments are conducted on two benchmark product datasets, whose product descriptions present varying levels of complexity and challenge. Our findings demonstrate that 7B parameter LLMs can effectively perform EM, with the Orca2 model consistently outperforming others across different prompting strategies and datasets. The study highlights that few-shot prompting significantly enhances performance over zero-shot approaches, emphasizing the importance of task-specific examples and careful prompt design. We also examine the impact of example order in few-shot prompts and find that it has a substantial effect on model performance. Finally, we examine hardware limitations, demonstrating that effective EM can be achieved with resource-constrained models.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Entity Resolution (ER) constitutes a vital task in data management that involves identifying and linking records from different datasets that refer to the same real-world entity <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. In many domains, including e-commerce, healthcare, and finance, accurate ER is essential for ensuring data quality, enabling effective data integration, and supporting informed decision-making <ref type="bibr" target="#b2">[3]</ref>. However, this task is challenging due to data inconsistencies, incompleteness, and ambiguity across different sources <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>.</p><p>As an example, consider the product descriptions in Figure <ref type="figure">1</ref>. Despite corresponding to the same object (Sony headphones), there are significant variations in product names, attributes, and dimensions. These discrepancies illustrate the challenges in reconciling variations across datasets, particularly when dealing with unstructured text and linguistic differences. Accurate ER in scenarios like this is crucial for product catalog integration, price comparison, and recommendation systems <ref type="bibr" target="#b5">[6]</ref>.</p><p>Due to its quadratic time complexity, ER solutions typically implement the Filtering-Verification framework <ref type="bibr" target="#b6">[7]</ref>. The Filtering step, often called Blocking, significantly reduces the computational cost by restricting the comparisons to the most similar candidate pairs, which are the most likely matches <ref type="bibr" target="#b7">[8]</ref>. The Verification step performs Entity Matching (EM), which essentially determines whether two records are duplicates, describing the same real-world object. 
In the following, we exclusively focus on EM.</p><p>Traditional EM solutions typically rely on rule-based approaches, string similarity metrics, or machine learning algorithms <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>. However, these methods can struggle with complex linguistic variations and contextual understanding, while requiring domain expertise and heavy human involvement <ref type="bibr" target="#b11">[12]</ref>. This is addressed by more recent state-of-the-art approaches that leverage deep learning (DL) techniques <ref type="bibr" target="#b12">[13]</ref>. However, they require substantial amounts of training data, which are rarely available.</p><p>DOLAP 2025: 27th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, co-located with EDBT/ICDT 2025, March 25, 2025, Barcelona, Spain.</p><p>Figure <ref type="figure">1</ref>: Two records with major differences describing the same product.</p><p>Recent advancements in NLP, particularly in Large Language Models (LLMs), offer new possibilities for addressing EM challenges <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>. LLMs possess advanced capabilities for natural language understanding, which allows them to process and interpret complex textual descriptions <ref type="bibr" target="#b15">[16]</ref>. Most importantly, LLM-based EM can be performed in zero-shot settings, requiring no training instances, a characteristic particularly attractive for out-of-the-box solutions.</p><p>In this work, we evaluate the performance of 7B parameter LLMs in entity matching tasks. 
While larger LLMs with hundreds of billions of parameters have shown impressive results <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>, their computational requirements often make them impractical for many real-world applications. By employing 7B LLMs, which excel in natural language understanding and semantic similarity assessment, this work seeks to address EM challenges in real-world datasets with linguistic variations and unstructured text. The focus on 7B parameter LLMs is motivated by their potential for efficient deployment on commodity hardware, which makes them more suitable for practical applications.</p><p>To this end, we perform an extensive experimental evaluation that considers the models' ability to handle different types of EM scenarios. We explore novel zero-shot, few-shot, and general matching definition prompting strategies to assess their effectiveness in improving matching accuracy. Our goal is to bridge the gap between the advanced capabilities of LLMs and the practical constraints of real-world EM applications, potentially paving the way for more efficient and accurate ER techniques in diverse domains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>There is a plethora of recent LLM-based EM methods, because LLMs offer several advantages over traditional EM solutions: (i) contextual understanding, as they capture the context and semantics of entity descriptions better than traditional string matching techniques. (ii) robustness, since LLMs are typically more capable of addressing variations in how entity information is expressed. (iii) zero-shot and few-shot learning, i.e., LLMs can accomplish EM tasks with no or minimal examples of matching decisions. These characteristics render LLMs ideal for most EM tasks, especially those with complex, unstructured product descriptions.</p><p>The seminal work on LLM-based EM <ref type="bibr" target="#b15">[16]</ref> investigated the effectiveness of GPT3-175B in EM, focusing on three key parameters: (i) problem definition, exploring different phrasings such as "Are Product A and Product B the same?" or "Are Product A and Product B equivalent?". (ii) in-context learning, comparing zero-shot with few-shot approaches. The former include no examples in the prompt, while the latter include a couple of examples, which are selected randomly or by experts. (iii) entity serialization, testing the use of all attributes or just a subset of them. Their experimental analysis led to the following conclusions: (i) few-shot learning significantly outperforms zero-shot approaches, (ii) attribute selection yields better results than using all attributes, (iii) problem definition has a substantial impact on performance, (iv) LLM performance is comparable to the state-of-the-art DL-based matching algorithms.</p><p>A detailed study was conducted in <ref type="bibr" target="#b14">[15]</ref>, using six LLMs, three hosted and three open-source ones. 
The experiments explored additional parameters such as problem definition, language complexity, output specification, entity serialization, in-context learning, instructions, and fine-tuning. The experimental results revealed that: (i) no single prompt consistently outperformed all others across different scenarios. (ii) Open-source LLMs showed comparable effectiveness to hosted models. (iii) LLMs performed competitively with deep learning-based matchers, even in zero-shot settings. (iv) Few-shot and instruction-based prompts generally outperformed zero-shot approaches. (v) Fine-tuning significantly improved effectiveness.</p><p>In another line of research, three distinct prompting strategies were explored in <ref type="bibr" target="#b16">[17]</ref>: (i) Match prompts, which contain traditional pair-wise questions. E.g., "Do these two records refer to the same real-world entity? Record 1: [details]. Record 2: [details]. " (ii) Comparison prompts, which ask for the most similar entity to a given reference. E.g., "Which of these two records is more consistent with the given record? " (iii) Selection prompts, which ask the LLM to select the record that matches a given one from a list of candidates.</p><p>The experimental results show that incorporating record interactions through the comparison and selection prompts significantly improves EM performance across various scenarios; among the latter two, the selection prompts are the top performers in most cases. However, they suffer from position bias, because their accuracy decreases when the duplicate record is placed lower in the list of candidates.</p><p>BatchER <ref type="bibr" target="#b17">[18]</ref> aims to reduce the costs for hosted LLMs through batch processing, exploring various methods for question batching and demonstration selection. 
The experimental results demonstrate that batch prompting outperforms match prompts in both effectiveness and cost, with the top performance achieved by diversity-based question batching combined with covering-based demonstration selection.</p><p>These studies collectively demonstrate the potential of LLMs in entity matching tasks, highlighting the importance of prompt engineering, the competitiveness of open-source models, and the effectiveness of batching strategies for improved efficiency. This work builds upon and extends the existing ones by focusing specifically on 7B parameter LLMs with 4-bit quantization. Unlike previous studies that primarily use larger, more resource-intensive models, our work explores the potential of smaller and more accessible LLMs for EM tasks. In this context, we perform a comprehensive evaluation of various novel prompting strategies, including zero-shot, few-shot, and general matching definition approaches, across multiple models and datasets. This approach offers insights into the practical applicability of LLMs in resource-constrained environments, bridging the gap between advanced language models and real-world EM challenges.</p></div>
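For illustration, the three prompt types of [17] can be rendered as plain string templates. The wording of the comparison and selection prompts below is a hypothetical sketch, since the exact phrasing used in [17] is not fully reproduced above:

```python
def match_prompt(r1, r2):
    # Pair-wise question, following the match-prompt pattern of [17].
    return (f"Do these two records refer to the same real-world entity? "
            f"Record 1: {r1}. Record 2: {r2}.")

def comparison_prompt(given, a, b):
    # Ask which of two records is more consistent with a reference record.
    return (f"Which of these two records is more consistent with the given "
            f"record? Given Record: {given}. Record A: {a}. Record B: {b}.")

def selection_prompt(given, candidates):
    # Ask the model to pick the matching record from a candidate list; note
    # that the answer may depend on where the duplicate is placed (position bias).
    listed = " ".join(f"({i + 1}) {c}." for i, c in enumerate(candidates))
    return (f"Which of the following records matches the given record? "
            f"Given Record: {given}. Candidates: {listed}")
```

Note that the selection prompt enumerates its candidates, which is exactly what exposes it to the position bias discussed above.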
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Problem Definition</head><p>Applied after Filtering, Entity Matching is typically formulated as a binary classification problem <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. More formally: Given two records 𝑟1 and 𝑟2, the task is to determine whether they refer to the same entity. This is often expressed as a function 𝑓 (𝑟1, 𝑟2) → {0, 1}, where 1 indicates a match (also called duplicate) and 0 indicates a non-match.</p><p>In LLM-based settings, EM is framed as a natural language inference task. The LLM is provided with descriptions of two records and asked to determine if they refer to the same entity, returning "True" for a match and "False" otherwise.</p><p>In all cases, EM performance is measured with respect to:</p><p>• Precision, i.e., the proportion of correctly identified matches out of all predicted matches.</p><p>• Recall, i.e., the proportion of correctly identified matches out of all actual matches.</p><p>• F-measure, i.e., the harmonic mean of precision and recall, providing a balanced measure of performance.</p><p>• Run-time, i.e., the time taken to complete the ER process.</p><p>The first three measures are defined in [0, 1] with higher values indicating higher effectiveness. For the last one, lower values indicate higher time efficiency.</p></div>
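The four measures above are standard; as a minimal illustration (not the authors' evaluation code), precision, recall and F-measure can be computed from binary matching decisions as follows:

```python
def em_metrics(predictions, gold):
    """Precision, recall and F-measure for binary EM decisions.

    predictions, gold: dicts mapping each candidate pair to True (match)
    or False (non-match), as in the f(r1, r2) -> {0, 1} formulation.
    """
    tp = sum(1 for pair, label in predictions.items() if label and gold[pair])
    fp = sum(1 for pair, label in predictions.items() if label and not gold[pair])
    fn = sum(1 for pair, label in gold.items() if label and not predictions[pair])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```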
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">EM Prompts</head><p>We now present the EM prompts that are examined in our work. The basic zero-shot prompt is presented in Figure <ref type="figure" target="#fig_1">2</ref>(a). The few-shot prompts extend it with a matching and a non-matching example, selected from the same dataset (see Table <ref type="table" target="#tab_0">1</ref>) so that they capture typical variations in product descriptions that are encountered in the full dataset.</p><p>Note that LLM responses to few-shot prompts suffer from position bias <ref type="bibr" target="#b16">[17]</ref>, because the order of examples in the EM prompt might alter the matching decision. This means that in the example of Figure <ref type="figure" target="#fig_1">2</ref>(b), the response for a specific candidate pair might be True (i.e., matching) if the positive example precedes the negative one and False (i.e., non-matching) otherwise. For this reason, we define two types of few-shot prompts:</p><p>1. TF, where the True example is followed by the False one, as in Figure <ref type="figure" target="#fig_1">2</ref>(b).</p><p>2. FT, where the False example is followed by the True one.</p><p>Note that with multiple examples per prompt, as in <ref type="bibr" target="#b16">[17]</ref>, more arrangements are possible. In this work, though, we exclusively consider the two variations of the few-shot EM prompt that involve one example per match type.</p><p>To increase the robustness of LLMs to few-shot EM prompts, we consider two matching approaches that query each candidate pair with both the TF and the FT prompt:</p><p>1. The union approach labels a candidate pair as True if either the TF or FT prompt results in a True response.</p><p>2. The intersection approach labels a candidate pair as True only if both the TF and FT prompts yield a True response.</p></div>
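The union and intersection approaches can be sketched as follows, where `ask_llm` is a hypothetical callable that issues the few-shot prompt with the given example order ("TF" or "FT") and returns the model's binary answer:

```python
def few_shot_decision(ask_llm, pair, mode="intersection"):
    """Combine the answers of the TF and FT few-shot prompts for one pair.

    ask_llm(order, pair) -> bool is a hypothetical callable that queries the
    model with the few-shot prompt whose examples follow the given order.
    """
    tf = ask_llm("TF", pair)
    ft = ask_llm("FT", pair)
    if mode == "union":
        return tf or ft       # match if either example order says True
    return tf and ft          # intersection: both orders must agree on True
```

Note that this doubles the number of LLM calls per candidate pair, which is the run-time cost of the intersection strategy discussed in Section 5.4.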
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Domain-specific Zero-Shot Prompts</head><p>The above prompts are generic enough to apply to any domain. In our experimental analysis, we also consider domain-specific ones, which are crafted for the product matching task. More specifically, we devise a zero-shot prompt that involves general matching definitions, providing the LLM with explicit guidance on how to determine if two records refer to the same product.</p><p>The core assumption of this approach is that the records are described by a clean, aligned schema. This is necessary for building a schema-aware generic definition of duplicate records. In the product matching task, we use four key product attributes: (i) product name, (ii) features, (iii) manufacturer, and (iv) model number. We use them in two different configurations:</p><p>1. The composite domain-specific EM prompt concatenates all four criteria in the above sequence, as in Figure <ref type="figure" target="#fig_3">3</ref>. The goal is to facilitate more nuanced matching decisions. 2. The atomic domain-specific EM prompt uses only the model number as the matching criterion. We selected this attribute because it provides the cleanest and most distinctive values.</p><p>These two configurations were chosen after preliminary tests that suggested that they yield the best performance among all combinations of these four attributes.</p></div>
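The two configurations can be sketched as follows; the wording is illustrative rather than the exact prompt of Figure 3:

```python
# The four matching criteria of the composite domain-specific prompt.
CRITERIA = ["product name", "features", "manufacturer", "model number"]

def domain_prompt(record_a, record_b, atomic=False):
    """Build an illustrative domain-specific zero-shot EM prompt.

    atomic=True keeps only the model number as the matching criterion;
    otherwise all four criteria are concatenated (composite prompt).
    """
    criteria = [CRITERIA[-1]] if atomic else CRITERIA
    rules = "; ".join(f"the records must agree on the {c}" for c in criteria)
    return (f"Two records describe the same product if: {rules}. "
            f"Record A: {record_a} Record B: {record_b} "
            "Answer True or False.")
```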
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experimental Analysis</head><p>Experimental Settings. All experiments were implemented in Python v3.12.0 and Ollama<ref type="foot" target="#foot_0">1</ref> v0.1.22. All experiments were carried out on a server running Ubuntu 22.04.1 LTS, equipped with an Intel Core i7-9700K 8-core @ 3.6 GHz, 32GB RAM and an NVIDIA GeForce GTX 1080 Ti 11GB. Due to the limited size of the available VRAM, our study focuses on 7-billion-parameter LLMs with optimizations such as quantization, which in our case replaces the 32-bit floating-point model weights with 4-bit integers. This reduces the model size, while maintaining reasonable performance levels. In other words, quantization lowers effectiveness, due to the lower precision of the model's weights, but significantly reduces run-times and memory consumption. Therefore, our experimental results are useful for resource-constrained applications, which run LLMs on commodity hardware.</p><p>LLMs. There is a plethora of open-source LLMs, with newer models introduced on a rather frequent basis. During our study, two models were quite popular: Llama 2 <ref type="bibr" target="#b18">[19]</ref>, with 7B parameters and a context length of 4096, as well as Mistral <ref type="bibr" target="#b19">[20]</ref>, with 7.3B parameters. However, preliminary experiments demonstrated that both of them were inappropriate for the EM tasks considered in this work. Llama 2 consistently responded with "True" for every candidate pair, while Mistral failed to provide a response according to the given instructions: it indicated an inability to respond in certain cases or gave explanations for its decisions instead of a "True" or "False" label.</p><p>In their place, we considered the following open-source models, which demonstrated high effectiveness in our preliminary experiments:</p><p>1. Orca2 <ref type="bibr" target="#b20">[21]</ref>. 
Built by Microsoft Research, Orca2 is a family of models fine-tuned on Meta's Llama 2 using synthetic data. 2. OpenHermes. A 7B parameter model fine-tuned on Mistral using fully open datasets, with strong multi-turn chat skills. 3. Zephyr <ref type="bibr" target="#b21">[22]</ref>. A 7B parameter model fine-tuned on Mistral, it achieves results similar to Llama 2 70B Chat in various benchmarks. It is trained on a distilled dataset, improving grammar and chat results. 4. Mistral-OpenOrca<ref type="foot" target="#foot_2">3</ref>. This is a 7B parameter model, fine-tuned on top of Mistral 7B using the OpenOrca dataset. 5. Stable-Beluga<ref type="foot" target="#foot_3">4</ref>. This is a Llama 2 based model fine-tuned on an Orca-style dataset. 6. Llama-Pro <ref type="bibr" target="#b22">[23]</ref>: An 8B parameter expansion of Llama 2 that specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.</p><p>In all cases, we use the default latest model with 4-bit quantization and 7B parameters.</p><p>Datasets. We used two real-world datasets with products that are widely used in the ER literature: (i) 𝐷1 is the Abt-Buy dataset, which comprises product listings from two online retailers, Abt Electronics and Buy.com. (ii) 𝐷2 is the Walmart-Amazon dataset, which contains product listings from two other online retailers, Walmart and Amazon. 𝐷1 primarily focuses on electronic products, while 𝐷2 covers a broader range of product categories, matching diverse entity types. Both datasets present important challenges, such as variations in product names and descriptions across retailers, inconsistent use of model numbers and other identifiers, differences in the level of detail provided for each product, variations in formatting and units (e.g., dimensions, weights), as well as missing or null values in certain fields.</p><p>Their technical characteristics are summarized in Table <ref type="table" target="#tab_0">1</ref>. Note that each dataset comprises two individually clean data sources, whose sizes are reported in column #Entities. Note also that we apply the prompts to the candidate pairs generated by a state-of-the-art blocking method implemented by PyJedAI <ref type="bibr" target="#b23">[24]</ref>, version 0.1.6. Following <ref type="bibr" target="#b24">[25]</ref>, we use kNN-Join, which identifies the 𝑘 nearest neighbors of each entity. We fine-tuned it, maximizing blocking precision for a blocking recall of at least 90%, as reported in the rightmost columns of Table <ref type="table" target="#tab_0">1</ref>. 
This configuration uses cleaning (i.e., stop-word removal and stemming) and cosine similarity in both datasets. For Abt-Buy, 𝑘 was set to 4, while the attribute values were converted into a multiset of character trigrams. For Walmart-Amazon, 𝑘 was set to 2, while the attribute values were converted into a multiset of character four-grams.</p></div>
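As noted above, some models (e.g., Mistral) prepend explanations or refuse to answer, so mapping raw completions to binary labels requires some care. A minimal, hypothetical normalizer illustrates the idea:

```python
def normalize_answer(raw):
    """Map a raw LLM completion to True, False, or None (no usable answer).

    A sketch, not the authors' code: it prefers a completion that starts
    with a True/False token, and otherwise falls back to the first such
    token found anywhere in the text.
    """
    text = raw.strip().lower()
    for token, label in (("true", True), ("false", False)):
        if text.startswith(token):
            return label
    t = text.find("true")
    f = text.find("false")
    if t == -1 and f == -1:
        return None  # e.g., the model declined to answer
    if f == -1 or (t != -1 and t < f):
        return True
    return False
```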
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Zero-Shot Prompting Results</head><p>We now examine the relative performance of the selected LLMs over 𝐷1 and 𝐷2, when coupled with the basic zero-shot EM prompt of Figure <ref type="figure" target="#fig_1">2(a)</ref>.</p><p>We observe that Orca2, OpenHermes, and Zephyr consistently rank as the top three models with respect to F-Measure in both datasets. The last two models switch their ranking positions in the two datasets, whereas Orca2 maintains the lead. The superior performance of Orca2, which demonstrates its robustness under diverse EM settings, can be attributed to its fine-tuning on synthetic data designed for reasoning tasks. This enhances its capability to understand and compare complex product descriptions. OpenHermes is fine-tuned on fully open datasets with strong multi-turn chat skills, leveraging advanced language understanding to perform well. Zephyr's competitive performance probably results from its training on a distilled dataset that improves grammar and chat results, aiding in better interpretation of entity attributes. The lower performance of Mistral-OpenOrca, Stable-Beluga, and Llama-Pro is probably due to less specialized training data or limited model capacity for the specific nuances of EM.</p><p>Note that all models exhibit much higher recall than precision in both datasets. This means that they are prone to label a candidate pair as matching, at the cost of introducing numerous false positives. Orca2 consistently exhibits the highest precision, thus yielding the highest F-Measure, too.</p><p>Note also that all models exhibit markedly lower effectiveness in 𝐷2 compared to 𝐷1. This suggests that 𝐷2 presents greater EM challenges, potentially due to more diverse or complex product descriptions. 
While 𝐷1 is restricted to electronics, 𝐷2 covers a broader range of products and includes more variation in descriptions, attributes, and data quality, rendering EM more difficult. Furthermore, 𝐷1 has a 1:1 matching between its two data sources, whereas 𝐷2 has a much lower ratio of matches, adding another layer of complexity to the task. The substantial performance gap between 𝐷1 and 𝐷2 underscores the significant impact of data characteristics on model effectiveness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Few-Shot Prompting Results</head><p>We now examine the performance of the aforementioned few-shot prompts over 𝐷1 and 𝐷2. We disregard Mistral-OpenOrca, Stable-Beluga, and Llama-Pro, because they exhibited significantly lower effectiveness and less consistent performance in the zero-shot experiments -preliminary experiments verified their poor performance in few-shot settings, too. For brevity, we focus on the top three performing models, namely Orca2, OpenHermes, and Zephyr.</p><p>The results are reported in Figure <ref type="figure" target="#fig_5">5</ref>. Based on preliminary experiments, we randomly select the examples included in the few-shot prompts from the candidate pairs of the same dataset. The same examples are used in all prompts issued on a particular dataset.</p><p>In both datasets, we observe the same patterns as regards the relative performance of TF and FT few-shot prompts: For Orca2, there is a substantial improvement when using the latter; OpenHermes is more robust to position bias, as there is no significant difference between the two prompt strategies; Zephyr works best when coupled with the TF few-shot prompts. These patterns highlight that the impact of position bias on each model is consistent across the two datasets. Note also that with the exception of Orca2 with TF prompts, all models achieve higher recall than precision, remaining more prone to label a candidate pair as matching.</p><p>It is also interesting to compare the union approach with the intersection one. For OpenHermes and Zephyr, the latter yields significantly higher F-Measure: by considering as duplicates only the candidate pairs that are marked as matching by both TF and FT few-shot prompts, the reduction in recall is much lower than the increase in precision (as a result, recall remains much higher than precision for both models). 
This means that considering only the common matches of the TF and FT prompts leads to more accurate decisions. Note that these patterns are consistent for both models over both datasets. This is not the case with Orca2, whose performance varies significantly across the two datasets. In 𝐷1, the same F1 score is achieved for both approaches, because the intersection raises precision by 12%, while reducing recall to the same degree. In 𝐷2, though, the intersection reduces recall by 23% and increases precision by 16%, thus yielding a much lower F-Measure. Note that in both datasets, the recall of the model drops below its precision in combination with the intersection approach, unlike the union one.</p><p>Overall, we can conclude that Orca2 works best when coupled with FT few-shot prompts, while OpenHermes and Zephyr maximize their effectiveness when intersecting the matches of the TF and FT prompts. Among them, the top performers over 𝐷1 and 𝐷2 are Orca2 (F1=0.799) and Zephyr (F1=0.531), respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Domain-specific Zero-Shot Prompting Results</head><p>In this section, we compare the atomic domain-specific prompt with the composite one. As in Section 5.2, we exclusively consider the three top-performing models with respect to the zero-shot prompts: Orca2, OpenHermes, and Zephyr. Their performance is reported in Figure <ref type="figure" target="#fig_6">6</ref>. We observe that in all cases, the atomic prompt outperforms the composite one to a significant extent -the only exception corresponds to Zephyr in 𝐷1, where the composite prompt increases F-Measure by almost 15%. This pattern should be attributed to the short, distinctive and clean values provided by the model number. This way, the atomic prompt reduces the noise from other product attributes like the product name, which are typically associated with long and diverse texts.</p><p>Similar to the above strategies, all LLMs exhibit much higher recall than precision. This means that they remain prone to mark a candidate pair as a match at the cost of introducing false positives -a behavior that permeates all prompt strategies we have examined.</p><p>Among the three models, Orca2 is consistently better, albeit to a minor extent in 𝐷2. This consistent performance underscores Orca2's effectiveness in EM tasks under quite different prompt designs.</p><p>We can conclude that domain-specific zero-shot prompts offer an effective and reliable alternative in datasets with a clean schema of known characteristics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Comparison of Prompting Strategies</head><p>We now compare the three top-performing models (Orca2, OpenHermes, and Zephyr) with respect to effectiveness and time efficiency across the three strategies of EM prompts discussed in Section 4. Note that among the few-shot and domain-specific variants, for each LLM we only consider the one with the highest F-Measure in both datasets. Their performance is reported in Table <ref type="table" target="#tab_1">2</ref>.</p><p>For Orca2, we observe that the FT few-shot prompts are the top performers in 𝐷1. The atomic domain-specific prompts follow at a very close distance in terms of F-Measure, while exhibiting a much lower run-time. This means that the domain-specific prompts offer a significantly better balance between effectiveness and time efficiency. In 𝐷2, this strategy scores the highest F-Measure for a slightly higher run-time than the second best approach (zero-shot prompts). For these reasons, Orca2 works best in combination with the atomic domain-specific prompts.</p><p>Regarding OpenHermes, the differences between the three types of prompts are minor in terms of F-Measure. As expected, the fastest approach in both datasets corresponds to the zero-shot prompts. This configuration also achieves the highest F-Measure in 𝐷1, while in 𝐷2, it ranks second, within a negligible distance from the top (&lt;0.5%). Therefore, we can conclude that the zero-shot prompts are the best choice for OpenHermes.</p><p>For Zephyr, there is a clear winner in the case of 𝐷1: the intersection of few-shot prompts. It exhibits, though, the highest run-time by a large margin. This is expected, as it queries the LLM twice per candidate pair. In the case of 𝐷2, the same strategy takes a minor lead over the composite </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>The F-Measure per dataset reported in the literature for three state-of-the-art EM algorithms.</p><p>domain-specific prompts, which are faster by more than 10%. Due to its consistency, the best choice for Zephyr corresponds to the intersection of TF and FT few-shot prompts. Among the three 7B LLMs, the configuration consistently achieving (almost) the highest effectiveness in both datasets is Orca2 coupled with atomic domain-specific prompts. Its efficiency is also rather high, given that its run-time is marginally higher than that of the fastest (zero-shot) configuration of the other two models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5.">Comparison to Baselines</head><p>To put the performance of the selected 7B LLMs into perspective, we compare it with three state-of-the-art EM approaches from the literature: 1. ZeroER <ref type="bibr" target="#b25">[26]</ref>, an unsupervised approach that requires no labelled data, learning Gaussian mixture models for matching and non-matching candidate pairs. 2. Magellan <ref type="bibr" target="#b28">[29]</ref>, a supervised approach combining binary classifiers with a series of hand-crafted features based on string similarity measures.</p><p>3. DeepMatcher <ref type="bibr" target="#b27">[28]</ref>, a framework leveraging the synergy between language models and Deep Learning classification.</p><p>For each method, we consider its best performance as reported in the literature. The results are reported in Table 3.</p><p>We observe mixed patterns. In 𝐷1, all LLM configurations in Table <ref type="table" target="#tab_1">2</ref>, even the zero-shot prompts, outperform all three baseline methods by a significant margin (&gt;21%). This is remarkable, because the simplest prompt strategy requires neither domain expertise nor the labeling of candidate pairs, unlike Magellan and DeepMatcher, whose performance relies on large training and validation sets, which amount to 60% and 20% of all candidate pairs, respectively.</p><p>The situation is reversed in 𝐷2, where all baseline methods achieve a much better performance. In fact, the highest F-Measure of Orca2 is lower by 16.5% than that of the worst-performing baseline (ZeroER). This should be attributed to the more challenging settings of 𝐷2, which have already been discussed in Section 5.1. Note also that the records in 𝐷2 are noisier, with a much higher portion of missing values. Its records are also longer, an aspect that is crucial for the 7B LLMs considered in this study, due to their limited attention window. 
These settings favor the learning-based functionality of the baseline methods, which take a clear lead over the learning-free functionality of the 7B LLMs. Another reason for the poor performance of the latter is that they emphasize recall at the expense of precision, significantly decreasing their F-Measure in 𝐷2, due to the very low portion of matches in comparison to the total number of entities from each data source. Therefore, more advanced strategies are required for boosting the performance of 7B LLMs on datasets with characteristics similar to those of 𝐷2.</p></div>
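The interplay of recall emphasis and low match prevalence can be made concrete with a short calculation. The first assertion below reproduces an Orca2 row from Table 2 (𝐷2, atomic domain-specific: P=0.434, R=0.708, F=0.538); the prevalence figures that follow are illustrative numbers of our own, not measurements from the datasets.

```python
def f_measure(precision, recall):
    """F-Measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproducing Table 2: Orca2 with atomic domain-specific prompts on D2.
assert round(f_measure(0.434, 0.708), 3) == 0.538

# Illustrative (made-up) numbers: when true matches are rare among the
# candidate pairs, even a modest false-positive rate collapses precision,
# and with it the F-Measure, despite high recall.
matches, non_matches = 800, 4400   # low match prevalence, as in D2
recall, fp_rate = 0.90, 0.20       # a recall-oriented matcher
tp = recall * matches              # 720 true positives
fp = fp_rate * non_matches         # 880 false positives
precision = tp / (tp + fp)
print(round(precision, 2), round(f_measure(precision, recall), 2))  # → 0.45 0.6
```

With a balanced dataset (equal matches and non-matches), the same false-positive rate would leave precision above 0.8, which is why the recall-oriented 7B LLMs fare much better on 𝐷1 than on 𝐷2.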
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions &amp; Future Work</head><p>Focusing on 7B open-source LLMs, we examined the performance of three main prompt strategies: (i) the basic, domain-agnostic zero-shot prompt, (ii) the few-shot prompt with one example per class (matching and non-matching), and (iii) the domain-specific zero-shot prompt. We considered several variants for the last two strategies and applied all of them to two established benchmark datasets for product matching. Testing six popular LLMs, we reached the following conclusions:</p><p>• Few-shot and domain-specific prompting significantly improve over the zero-shot approach, highlighting the value of task-specific prompts.</p><p>• In few-shot prompts, the response of LLMs is generally sensitive to the order of the examples. This suggests that careful prompt engineering is crucial for optimal performance in real-world ER applications.</p><p>• This sensitivity can be addressed by the intersection approach to few-shot prompting, which consistently achieves much better results, increasing precision at a higher rate than it reduces recall.</p><p>• Orca2 consistently outperformed the other LLMs across most prompting strategies and datasets, demonstrating high robustness and effectiveness. In fact, the relative performance of the best models (Orca2 &gt; OpenHermes &gt; Zephyr) remained largely consistent across prompt strategies and datasets, suggesting inherent strengths in their base architectures.</p><p>• The use of 4-bit quantization and 7B-parameter models demonstrated the potential for effective EM with limited computational resources. 
The effectiveness of the considered models is competitive with established, learning-based EM approaches, especially on datasets with a low portion of missing values and short entity descriptions.</p><p>In the future, we plan to explore LLMs' capability in matching entities across different languages and to enhance the interpretability and explainability of LLM decisions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>[details]. (A) Record 1: [details]. (B) Record 2: [details]. " (iii) Selection prompts, which identify a matching entity from a set of candidates. E.g., "Select a record from the following list that refers to the same real-world entity as the given record: Given Record: [details]. Options: 1. [details] 2. [details] 3. [details]..."</figDesc></figure>
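The resource-constrained setup above can be sketched as follows: a domain-agnostic zero-shot prompt is sent to a locally served, 4-bit-quantized 7B model through Ollama's `/api/generate` endpoint (see footnote 1). The prompt wording and the model name `orca2` are illustrative assumptions, not the paper's exact configuration.

```python
import json
import urllib.request

def zero_shot_prompt(record1: str, record2: str) -> str:
    """Domain-agnostic zero-shot EM prompt in the spirit of Figure 2(a);
    the wording here is illustrative, not the paper's verbatim prompt."""
    return (
        "Do the following two records refer to the same real-world entity? "
        "Answer only with Yes or No.\n"
        f"Record 1: {record1}\n"
        f"Record 2: {record2}"
    )

def ask_ollama(prompt: str, model: str = "orca2",
               host: str = "http://localhost:11434") -> str:
    """Send the prompt to a locally running Ollama server (`ollama serve`),
    which hosts quantized 7B models on commodity hardware."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    p = zero_shot_prompt("sony cybershot dsc-w570 16.1mp", "sony dsc-w570 digital camera")
    print(p)  # pass p to ask_ollama(p) once an Ollama server is available
```

Binary yes/no answers keep post-processing trivial: classifying a candidate pair reduces to checking whether the response starts with "Yes".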
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: (a) The basic zero-shot EM prompt, and (b) its few-shot extension.</figDesc><graphic coords="2,309.59,65.61,213.67,131.97" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>(a). It consists of an instruction that describes the input and the desired output. It lacks any examples, thus constitutes a zeroshot EM prompt, which tests the model's ability to generalize to new tasks or domains it has not been trained on. A concise few-shot EM prompt extends the zero-shot one with the examples in Figure 2(b). To provide a balanced context, there are two examples that include a pair of matching entities and a pair of non-matching ones. These examples serve as a form of weak supervision, allowing the LLM to learn from the provided instances and generalize to similar cases. Note that the examples in Figure 2(b) have been carefully selected from dataset 𝐷1 (see Table</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Domain-specific, zero-shot EM prompt for product matching.</figDesc><graphic coords="3,309.59,65.61,213.66,123.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Effectiveness of the zero-shot prompt in Figure 2(a) on top of the selected LLMs over 𝐷 1 (left) and 𝐷 2 (right).</figDesc><graphic coords="4,84.52,137.77,212.09,137.27" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Effectiveness of the few-shot prompts in Figure 2(b) on top of selected LLMs over 𝐷 1 (left) and 𝐷 2 (right). From top to bottom, the TF prompts are presented first, followed by the FT prompts, the Union and the Intersection approaches.</figDesc><graphic coords="5,84.52,65.88,212.10,136.24" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Effectiveness of the atomic and composite domain-specific zero-shot prompts in Figure 2(a) on top of the selected LLMs over 𝐷 1 (left) and 𝐷 2 (right).</figDesc><graphic coords="6,298.66,65.61,212.10,134.06" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Technical characteristics of the datasets used in the experimental analysis.</figDesc><table><row><cell>Dataset</cell><cell>#Entities</cell><cell>Duplicates</cell><cell>Cartesian Product</cell><cell>#Attributes</cell><cell>Candidate Pairs</cell><cell>Bl.Recall</cell><cell>Bl.Precision</cell></row><row><cell>𝐷 1</cell><cell>1,076-1,076</cell><cell>1,076</cell><cell>1.16×10 6</cell><cell>3</cell><cell>4,345</cell><cell>0.924</cell><cell>0.229</cell></row><row><cell>𝐷 2</cell><cell>2,554-22,074</cell><cell>853</cell><cell>5.64×10 7</cell><cell>6</cell><cell>5,163</cell><cell>0.910</cell><cell>0.150</cell></row></table><note>2. OpenHermes 2 . This is a Mistral 7B model fine-tuned with fully open datasets, showcasing strong multi-turn chat</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Best performance per LLM in combination with the top performing variant per prompt strategy across both datasets.</figDesc><table><row><cell>Prompt Strategy</cell><cell cols="4">𝐷 1 Precision Recall F-Measure Run-time</cell><cell cols="4">𝐷 2 Precision Recall F-Measure Run-time</cell></row><row><cell>Zero-shot</cell><cell>0.664</cell><cell>0.956</cell><cell>0.784</cell><cell>32 min</cell><cell>0.397</cell><cell>0.740</cell><cell>0.517</cell><cell>23 min</cell></row><row><cell>FT Few-shot</cell><cell>0.768</cell><cell>0.834</cell><cell>0.799</cell><cell>41 min</cell><cell>0.420</cell><cell>0.515</cell><cell>0.463</cell><cell>33 min</cell></row><row><cell>Atomic Domain-specific</cell><cell>0.689</cell><cell>0.934</cell><cell>0.793</cell><cell>33 min</cell><cell>0.434</cell><cell>0.708</cell><cell>0.538</cell><cell>25 min</cell></row><row><cell></cell><cell></cell><cell></cell><cell>(a) Orca2</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Zero-shot</cell><cell>0.584</cell><cell>0.963</cell><cell>0.727</cell><cell>31 min</cell><cell>0.309</cell><cell>0.864</cell><cell>0.455</cell><cell>23 min</cell></row><row><cell>Intersection Few-shot</cell><cell>0.683</cell><cell>0.718</cell><cell>0.700</cell><cell>40 min</cell><cell>0.378</cell><cell>0.585</cell><cell>0.459</cell><cell>33 min</cell></row><row><cell>Atomic Domain-specific</cell><cell>0.556</cell><cell>0.969</cell><cell>0.707</cell><cell>33 min</cell><cell>0.306</cell><cell>0.876</cell><cell>0.453</cell><cell>25 min</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="2">(b) OpenHermes</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Zero-shot</cell><cell>0.572</cell><cell>0.965</cell><cell>0.718</cell><cell>32 min</cell><cell>0.329</cell><cell>0.942</cell><cell>0.488</cell><cell>24 min</cell></row><row><cell>Intersection 
Few-shot</cell><cell>0.667</cell><cell>0.877</cell><cell>0.757</cell><cell>43 min</cell><cell>0.408</cell><cell>0.761</cell><cell>0.531</cell><cell>34 min</cell></row><row><cell>Composite Domain-specific</cell><cell>0.573</cell><cell>0.960</cell><cell>0.718</cell><cell>39 min</cell><cell>0.372</cell><cell>0.913</cell><cell>0.529</cell><cell>30 min</cell></row><row><cell></cell><cell></cell><cell></cell><cell>(c) Zephyr</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://ollama.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://huggingface.co/stabilityai/StableBeluga2</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments. This work was partially funded by the EU project STELAR (Horizon Europe -Grant No. 101070122).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Data Matching -Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection</title>
		<author>
			<persName><forename type="first">P</forename><surname>Christen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">The Four Generations of Entity Resolution</title>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ioannou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Thanos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Palpanas</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Morgan &amp; Claypool Publishers</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Big data integration</title>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">L</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Srivastava</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>ICDE</publisher>
			<biblScope unit="page" from="1245" to="1248" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An overview of end-to-end entity resolution for big data</title>
		<author>
			<persName><forename type="first">V</forename><surname>Christophides</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Palpanas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stefanidis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="page">42</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Entity resolution in the web of data</title>
		<author>
			<persName><forename type="first">K</forename><surname>Stefanidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Herschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Christophides</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">23rd International World Wide Web Conference</title>
				<imprint>
			<publisher>WWW</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="203" to="204" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Building a broad knowledge graph for products</title>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">L</forename><surname>Dong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICDE</title>
		<imprint>
			<biblScope unit="page">25</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A survey of indexing techniques for scalable record linkage and deduplication</title>
		<author>
			<persName><forename type="first">P</forename><surname>Christen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Knowl. Data Eng</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1537" to="1555" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">A survey of blocking and filtering techniques for entity resolution</title>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Skoutas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Thanos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Palpanas</surname></persName>
		</author>
		<idno>CoRR abs/1905.06167</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Duplicate record detection: A survey</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Elmagarmid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G</forename><surname>Ipeirotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Verykios</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Knowl. Data Eng</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A novel ensemble learning approach to unsupervised record linkage</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jurek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inf. Syst</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="40" to="54" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Automatic record linkage using seeded nearest neighbour and support vector machine classification</title>
		<author>
			<persName><forename type="first">P</forename><surname>Christen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGKDD</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="151" to="159" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Active learning based entity resolution using markov logic</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fisher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Christen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PAKDD</title>
		<imprint>
			<biblScope unit="page" from="338" to="349" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Entity resolution: Past, present and yet-to-come</title>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ioannou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Palpanas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EDBT</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="647" to="650" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The five generations of entity resolution on web data</title>
		<author>
			<persName><forename type="first">K</forename><surname>Nikoletos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ioannou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICWE</title>
		<imprint>
			<biblScope unit="page" from="469" to="473" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Entity matching using large language models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Peeters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<idno>CoRR abs/2310.11244</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Can foundation models wrangle your data?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Narayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Chami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Orr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ré</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB Endow</title>
				<meeting>VLDB Endow</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="738" to="746" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sun</surname></persName>
		</author>
		<idno>CoRR abs/2405.16884</idno>
		<title level="m">Match, compare, or select? an investigation of large language models for entity matching</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Du</surname></persName>
		</author>
		<idno>CoRR abs/2312.03987</idno>
		<title level="m">Cost-effective in-context learning for entity resolution: A design space exploration</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<idno>CoRR abs/2307.09288</idno>
		<title level="m">Llama 2: Open foundation and fine-tuned chat models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Mistral 7B</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Q</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sablayrolles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mensch</surname></persName>
		</author>
		<idno>CoRR abs/2310.06825</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Orca 2: Teaching small language models how to reason</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Corro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mahajan</surname></persName>
		</author>
		<idno>CoRR abs/2311.11045</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Zephyr: Direct distillation of LM alignment</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tunstall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Beeching</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lambert</surname></persName>
		</author>
		<idno>CoRR abs/2310.16944</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<author>
			<persName><forename type="first">C</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Luo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Llama pro: Progressive llama with block expansion</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="6518" to="6537" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">pyjedai: a lightsaber for link discovery</title>
		<author>
			<persName><forename type="first">K</forename><surname>Nikoletos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Koubarakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC Posters, Demos and Industry Tracks</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3254</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Open benchmark for filtering techniques in entity resolution</title>
		<author>
			<persName><forename type="first">F</forename><surname>Neuhof</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fisichella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nikoletos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Augsten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Nejdl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Koubarakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">VLDB J</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1671" to="1696" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Zeroer: Entity resolution using zero labeled examples</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chaba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sawlani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thirumuruganathan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>SIGMOD</publisher>
			<biblScope unit="page" from="1149" to="1164" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">A critical re-evaluation of record linkage benchmarks for learning-based matching algorithms</title>
		<author>
			<persName><forename type="first">G</forename><surname>Papadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kirielle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Christen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Palpanas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="3435" to="3448" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Deep learning for entity matching: A design space exploration</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mudgal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rekatsinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Doan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Deep</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Arcaute</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Raghavendra</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>SIGMOD</publisher>
			<biblScope unit="page" from="19" to="34" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Magellan: Toward building entity matching management systems</title>
		<author>
			<persName><forename type="first">P</forename><surname>Konda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB Endow</title>
				<meeting>VLDB Endow</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1197" to="1208" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
