<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Cherag</forename><surname>Aroraa</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tracy</forename><surname>Holloway King</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jayant</forename><surname>Kumar</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yi</forename><surname>Lu</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sanat</forename><surname>Sharma</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Arvind</forename><surname>Srikantan</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">David</forename><surname>Uvalle</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Josep</forename><surname>Valls-Vargas</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Harsha</forename><surname>Vardhan</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Adobe Inc</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">FD7F37C884453DFD62E6DC2928D68411</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>multi-modal search</term>
					<term>text-image embeddings</term>
					<term>hybrid search techniques</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>As user content and queries become increasingly multi-modal, the need for effective multi-modal search systems has grown. Traditional search systems often rely on textual and metadata annotations for indexed images, while multi-modal embeddings like CLIP enable direct search using text and image embeddings. However, embedding-based approaches face challenges in integrating contextual features such as user locale and recency. Building a scalable multi-modal search system requires fine-tuning several components. This paper presents a multi-modal search architecture and a series of AB tests that optimize embeddings and multi-modal technologies in Adobe Express template search. We address considerations such as embedding model selection, the roles of embeddings in matching and ranking, and the balance between dense and sparse embeddings. Our iterative approach demonstrates how utilizing sparse, dense, and contextual features enhances short and long query search, significantly reduces null rates (over 70%), and increases click-through rates (CTR). Our findings provide insights into developing robust multi-modal search systems, thereby enhancing relevance for complex queries.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>For search over images and multi-modal content, industry search systems traditionally rely on textual and metadata annotations added to indexed images. However, multi-modal embeddings like CLIP <ref type="bibr" target="#b0">[1]</ref> embed image content and text in a shared space, allowing for direct text-to-image and image-to-image search. While pure embedding-based approaches facilitate content understanding, they struggle to integrate contextual features like user locale and recency into retrieved results. Building a production-grade, scalable, multi-modal search system involves carefully tuning several components. This paper describes a series of AB tests conducted to leverage embeddings and other multi-modal technologies in search for Adobe Express templates. These templates are complex multi-modal (and multi-page) documents, containing images, text, and rich metadata (section 2). Figure <ref type="figure" target="#fig_0">1</ref> shows the Adobe Express template search for a head query and a tail query, where the templates are displayed to the user as images and template metadata drives the left rail filters.</p><p>To improve text search for templates, integrating embeddings required decisions as to:</p><p>• Which embedding model(s) to use
• Whether to leverage embeddings for matching (recall), ranking, or reranking
• Whether to use dense or sparse embeddings
• Whether head and tail queries should be treated identically
• Whether embeddings should be used only for null and low recovery or everywhere</p><p>Other than which embeddings to use, these decisions were driven by latency concerns and by constraints on integration with Elasticsearch, the existing inverted index used for Express template search. With an ever-increasing collection of ∼300,000 templates, dense embeddings could not be used for matching due to the number of scoring calculations, which leads to high latency. This restricted dense embeddings to (re)ranking, where only a small (&lt;10K) number of top templates had to be scored, and to scenarios like null and low recovery and long tail queries, where the additional latency was worth the improved relevance. In addition, certain types of queries performed better with keyword search, especially those around design type (e.g. poster, Instagram reel) and format (e.g. still, animated, video).</p><p>To determine the optimal combination, we took an iterative approach with a series of evaluations and AB tests. We started with existing models and single integrations and then built on these to address remaining relevance issues. This paper first gives an overview of the data and models used (section 2) and then discusses the experiments and how each decision was made (section 3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The 1st Workshop on Multimodal Search and Recommendations (CIKM MMSR '24)</head><p>charora@adobe.com (C. Aroraa); tking@adobe.com (T. H. King); jaykumar@adobe.com (J. Kumar); yil@adobe.com (Y. Lu); sanatsha@adobe.com (S. Sharma); asrikantan@adobe.com (A. Srikantan); duvallezeped@adobe.com (D. Uvalle); jvallsvargas@adobe.com (J. Valls-Vargas); hmatadaallam@adobe.com (H. Vardhan)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Models and Data</head><p>This section describes key data and models used for the Express template recall and ranking. The templates themselves contain rich image, text, and metadata. The standard search behavioral data (e.g. impressions, clicks) are available, as well as certain application-specific behavioral data (e.g. number of edits, number of exports). These are briefly described in section 2.1. In addition, we have two types of multi-modal models: two CLIP text-image models (section 2.2) and an intent-based model (section 2.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Template Data</head><p>Express templates are rich objects which contain many visual layers and text boxes. These can also be viewed as images, e.g. those that are displayed in search (Figure <ref type="figure" target="#fig_0">1</ref>). In addition, templates have titles provided by the template designers as well as filter information such as design type, style, mood, region, and price (free/premium). Additional information is inferred about each template, including multi-modal embeddings, user intents, and image tags. Finally, aggregated behavioral data such as impressions, clicks, edits (the number of edits users make to a template in order to personalize it) and exports (the number of times a template is exported after editing) are available. An example template with its data is shown in Figure <ref type="figure" target="#fig_2">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Image-Text CLIP Embeddings</head><p>CLIP <ref type="bibr" target="#b0">[1]</ref> embeds images and text in the same space. This allows for embedding-based search of images using text queries. There are several off-the-shelf CLIP models available. However, for Express template search and for other visual asset search like Adobe Stock, we needed a model that: (1) worked on short text (queries) as well as long text (captions); (2) covered five languages (English, French, German, Japanese, Korean); (3) performed well on high-quality image data for templates, photographs, and illustrations; (4) had a sparse version as well as dense vectors. To meet these requirements, we trained a CLIP-architecture model on Adobe-licensed image-text data. The text model was particularly <ref type="bibr" target="#b1">[2]</ref>.</p><p>There are many ways to improve latency when using embeddings with large numbers of assets. However, approximate methods reduce accuracy because the list of assets whose embeddings are closest to the query embedding is not exact. Once a smaller set of embeddings is selected (e.g. by taking the top n embeddings from the approximate scoring), the dense embeddings can be used to get more accurate scores for a final ranking. We used a sparsification method which allows the embeddings to be used similarly to keywords in the existing index. <ref type="foot" target="#foot_0">1</ref> An example of this is shown in Table <ref type="table" target="#tab_0">1</ref>. The dense embeddings have values for every dimension (2048 dimensions in Table <ref type="table" target="#tab_0">1</ref>). The sparse version, which is derived from the dense one, has more dimensions (8192 in Table 1) but most of them have no values. For a query, only assets which match at least n dimensions are returned. 
In the example, n is set to 1, and so image 2 matches dimension 3 and image 4 matches dimensions 3 and 8192, while image 3 is not matched. The score is the sum of the matched dimension values weighted by the score of that dimension for the query. This sparse encoding for matching and scoring is extremely fast, but comes at the cost of lower accuracy.</p><p>The Adobe-specific model (AdobeCLIP) was evaluated for Adobe Express and Stock content, with the off-the-shelf CLIP versions as baselines. For large-scale evaluation, a stratified sample of held-out search queries was used for semantic search against an index of CLIP and AdobeCLIP embeddings for Express and Stock content. Past clicked assets were considered relevant, and non-clicked items irrelevant. In addition, we selected titles (generally 5-15 words) for Express templates and Stock images and used them as long queries, measuring the position of the asset which originally had that title. These methods provide a lower bound on performance, since many of the non-clicked items are relevant and titles often matched multiple high-ranking images. These two approaches allowed us to quickly compare different versions of AdobeCLIP against one another and against CLIP. Once the AdobeCLIP model outperformed CLIP and earlier AdobeCLIP versions, we manually inspected results for a subset of held-out queries.</p></div>
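The sparse matching and scoring just described can be sketched in a few lines. The dimension values below reproduce the Table 1 example; the function and variable names are illustrative rather than the production implementation.

```python
def sparse_match_and_score(query, assets, min_dims=1):
    """Match assets that share at least min_dims dimensions with the
    query's sparse embedding. Each matched asset is scored as the sum of
    its matched dimension values weighted by the query's value for that
    dimension. Sparse embeddings are dicts mapping dimension to value."""
    results = {}
    for asset_id, emb in assets.items():
        matched = [d for d in query if d in emb]
        if len(matched) >= min_dims:
            results[asset_id] = sum(query[d] * emb[d] for d in matched)
    # Highest score first, as in the ranking step.
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)

# Sparse embedding example from Table 1 (query = image 1).
query = {3: 1.16, 8192: 0.13}
assets = {
    "image 2": {1: 1.12, 3: 0.83},
    "image 3": {2: 0.81, 4: 1.83},
    "image 4": {3: 0.64, 8192: 0.01},
}
ranked = sparse_match_and_score(query, assets, min_dims=1)
```

With min_dims set to 1, image 2 and image 4 are matched while image 3 is not, and the scores reproduce those in Table 1 (0.96 and 0.74).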
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Multi-Modal Creative Knowledge Graph</head><p>In addition to learning representations of the content via AdobeCLIP, we found that mapping the content's intent to discrete nodes improved recall and explainability and allowed for downstream recommendation tasks, similar to <ref type="bibr" target="#b5">[6]</ref>. However, we discovered that self-supervised models like AdobeCLIP, which were trained on asset-caption and asset-query data from Adobe Stock and Adobe Express, failed to accurately map an asset's intent to short discrete labels. To accomplish this, we created a "Creative" Knowledge Graph (CKG) <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref> containing over 100K nodes focusing on Adobe-specific user intents. We then trained a multi-modal transformer (MM-CKG) specializing in mapping assets to these discrete nodes using supervised contrastive training. We mined concepts for events, actions, objects, moods, canvas types, colors, and backgrounds to get a robust understanding of an asset's content. For example, actions has subtypes of run, dance, …; events has subtypes of birthday, graduation, wedding, seasonal, …; in turn events|seasonal has subtypes of Halloween, Thanksgiving, 4th of July, ….</p><p>To train the model, we created sequence-wise self-attention blocks inspired by <ref type="bibr" target="#b9">[10]</ref>. We built our model on top of a base CLIP backbone and added a sequence-wise attention block that takes the hidden states from the last layer of the CLIP backbone and runs them through a couple of multi-headed transformer layers. We utilized the 𝑇_cls and 𝐼_cls outputs from the sequence-wise attention heads as the final representation of the input image and text modalities. </p></div>
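As a rough sketch of the intent-mapping step (not the MM-CKG model itself), mapping an asset embedding to its nearest discrete CKG intent nodes can be done with a cosine-similarity lookup over node embeddings. The node labels and vectors below are toy assumptions for illustration.

```python
import numpy as np

def map_to_intent_nodes(asset_emb, node_embs, node_labels, top_k=2):
    """Map a multi-modal asset embedding to its closest discrete CKG
    intent nodes by cosine similarity. node_embs has one row per intent
    node; all names here are illustrative."""
    a = asset_emb / np.linalg.norm(asset_emb)
    m = node_embs / np.linalg.norm(node_embs, axis=1, keepdims=True)
    sims = m @ a                       # cosine similarity per node
    order = np.argsort(-sims)[:top_k]  # best nodes first
    return [(node_labels[i], float(sims[i])) for i in order]

# Toy 3-d node space for three intents.
node_labels = ["events|birthday", "events|seasonal|halloween", "actions|run"]
node_embs = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.0, 0.1, 1.0]])
asset_emb = np.array([0.9, 0.3, 0.05])  # closest to the birthday node
intents = map_to_intent_nodes(asset_emb, node_embs, node_labels, top_k=1)
```

In production this lookup runs over the 100K-plus CKG nodes, so an approximate nearest-neighbor index would typically replace the brute-force matrix product.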
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.1.">Supervised Contrastive Loss (SupCoLA)</head><p>We devised our loss function with the following requirements:</p><p>1. Alignment to labels: Ensure that the image and text embeddings were close to the label embeddings during training.
2. Ability to handle multiple positives in a batch: Traditional contrastive learning (InfoNCE loss <ref type="bibr" target="#b10">[11]</ref>) assumes that for a given pair in a batch, all other pairs are negatives. However, when learning alignment with labels, multiple rows with the same label may be present in a batch. The loss function should not penalize these rows during loss computation.
3. Ability to have multiple labels per row: Some rows have multiple labels. For example, for the prompt boy is sitting on a beach with his dad for father's day, there are multiple concepts: the creative intent father's day, the scene objects boy and beach, and the background beach background.</p><p>Our resulting loss function, Label-Aligned Supervised Contrastive Loss, is based on SupCon loss <ref type="bibr" target="#b11">[12]</ref>, where we pass image, text and label embeddings as anchor features as well as contrast features.</p><formula xml:id="formula_0">\mathcal{L}^{\mathrm{sup}} = \sum_{i \in I} \mathcal{L}^{\mathrm{sup}}_{i} = \sum_{i \in I} \left( -\frac{1}{|P(i)|} \sum_{p \in P(i)} \left[ \sum_{v \in j(p)} \log \frac{\exp(\mathbf{z}_i \cdot \mathbf{z}_v / \tau)}{\sum_{n \in A(i)} \exp(\mathbf{z}_i \cdot \mathbf{z}_n / \tau)} \right] \right)<label>(1)</label></formula><p>where 𝐼 is the mini-batch, 𝑖 is the index of the anchor sample in the batch, 𝐴(𝑖) ≡ 𝐼 ∖ 𝑖 is the set of all samples 𝑛 in the batch with an index distinct from the anchor 𝑖, 𝑃(𝑖) is the set of positives 𝑝 in the batch, i.e. samples with the same label as the anchor 𝑖, and 𝑗(𝑝) is the set of views of positive 𝑝. Views of sample 𝑝 denote the embeddings for the label, image and text modalities. Why do we have two domain-specific multi-modal embeddings (AdobeCLIP and MM-CKG)? These target and excel at different use cases, both of which are important for template search relevance. 
MM-CKG is better at determining the underlying key intent from a query and at specific scene object detection. AdobeCLIP is better at color and layout understanding.</p></div>
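The core of the supervised contrastive computation in Equation 1 can be sketched as follows, assuming L2-normalized embeddings and omitting the multi-view sum over 𝑗(𝑝) for brevity. This is an illustrative sketch, not the actual SupCoLA implementation.

```python
import math

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of L2-normalized
    embeddings z (a list of vectors). For each anchor i, every other
    sample with the same label is a positive; all other samples in the
    batch form the contrast set, as in Eq. 1. Illustrative sketch only,
    without the multi-view term over j(p)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total = 0.0
    for i, zi in enumerate(z):
        others = [n for n in range(len(z)) if n != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # no same-label sample to pull toward
        denom = sum(math.exp(dot(zi, z[n]) / tau) for n in others)
        loss_i = -sum(
            math.log(math.exp(dot(zi, z[p]) / tau) / denom) for p in positives
        ) / len(positives)
        total += loss_i
    return total

labels = [0, 0, 1, 1]
aligned = supcon_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], labels)
mixed = supcon_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], labels)
```

A batch in which same-label embeddings are already aligned incurs a much smaller loss than one in which they point in different directions, which is exactly the gradient signal that pulls label, image, and text views together.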
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Iterative Experiments</head><p>This section describes the series of on-line experiments conducted to improve the Express template multi-modal search. We focus primarily on experiments involving multi-modal embeddings, but include one experiment that leveraged multi-modal content under the hood while using only text at query time.</p><p>The Express template search uses a standard architecture for relevance (Figure <ref type="figure" target="#fig_4">4</ref>). It is built on Elasticsearch. There is an initial matching (recall) step to retrieve documents which broadly match the user's query. This step uses keyword-style matching against text and metadata and includes an initial low-latency scoring. Matching using sparse AdobeCLIP embeddings for all queries (section 3.4) and dense multi-modal CKG embeddings for long queries (section 3.5) was also added. If not enough results are found, null and low recovery occurs, including a speller (not discussed in this paper; see <ref type="bibr" target="#b12">[13]</ref>) and the use of symbolic CKG intents (section 3.2). The top 10K templates from the initial match set are then reranked using a much broader set of features. This includes dense multi-modal embeddings as well as the usual discrete features such as BM25, locale, language, and aggregated behavioral data. To improve relevance, and hence search click-through rate and export rate, we took an iterative approach to learn how to optimally leverage our multi-modal understanding, especially the multi-modal embeddings. Each of these experiments is discussed in detail below. Since we were redesigning the search system, including the retrieval and ranking platform, some of the experiences were evaluated extensively offline and then launched with end-to-end monitoring, while later experiences were AB tested. <ref type="foot" target="#foot_1">2</ref></p><p>1. Reranking with External Image-Text Model (section 3.1)
2. Null and Low Recovery with Symbolic Multi-Modal Intents (section 3.2)
3. Ranking with Domain-specific Image-Text Model (section 3.3)
4. Recall with Sparse Image-Text Model (section 3.4)
5. Long Query Recall and Ranking with Multi-Modal Model (section 3.5)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Reranking with External Image-Text Model</head><p>Our initial experiment used an external, English-only CLIP model in the rescore stage of the ranker. To do this, we had to determine how many items to rescore. This was largely governed by latency concerns, since we wanted as many items as possible to use the CLIP multi-modal signal. By running load tests, we determined that we could use the CLIP scores for the top 10K templates, where the initial ranking was determined by the existing ranker.</p><p>We also had to determine how to weight the CLIP scores in the rescoring. Since the template ranker at this time was a non-ML, hand-tuned ranker, we determined the weighting based on evaluations of a stratified query sample. The extreme baseline was to use only the CLIP score for the reranking. This had two drawbacks: (1) the top results were not visually diverse enough, especially for broad queries like birthday card or wedding invitation; (2) there was not enough recent content to provide a sense of freshness and seasonality. To determine a suitable weight, we used a divide-and-conquer approach, starting with a 50% balance between the first-round ranker score and the CLIP score and then adjusting. This quickly converged on a weighting of roughly 2/3 for the CLIP score and 1/3 for the first-round ranker score. In AB testing, the click-through rate (CTR) and export rate improved with the CLIP-based reranking (Table <ref type="table" target="#tab_1">2</ref>). <ref type="foot" target="#foot_2">3</ref> There was no change in the null rate, as was to be expected. </p></div>
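The rescoring step can be sketched as a weighted blend of the two scores. The function, the candidate handling, and the example scores are illustrative assumptions; only the roughly 2/3 CLIP and 1/3 first-round weighting comes from the converged experiment.

```python
def rerank_with_clip(candidates, clip_scores, w_clip=2/3, top_k=10_000):
    """Rescore the top_k templates from the first-round ranker with a
    weighted blend of the CLIP text-image score and the first-round
    score. candidates is a list of (template_id, first_round_score),
    best first; clip_scores maps template_id to its CLIP score."""
    head = candidates[:top_k]
    blended = [
        (tid, w_clip * clip_scores.get(tid, 0.0) + (1 - w_clip) * base)
        for tid, base in head
    ]
    return sorted(blended, key=lambda kv: kv[1], reverse=True)

# Toy example: the CLIP signal promotes t2 and t3 over t1.
candidates = [("t1", 0.9), ("t2", 0.8), ("t3", 0.1)]
clip_scores = {"t1": 0.2, "t2": 0.9, "t3": 0.95}
reranked = rerank_with_clip(candidates, clip_scores)
```

Keeping 1/3 of the weight on the first-round score is what preserves visual diversity and freshness, the two drawbacks of the CLIP-only baseline.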
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Null and Low Recovery with Symbolic Multi-Modal Intents</head><p>Due to the broad range of user intents, the limited template collection, and the keyword-based retrieval, users frequently landed on null and low result pages. When the number of results is low (&lt;5 results), engagement with the search results drops significantly (up to 2-3x). To reduce the null and low result rate, we incorporated a recovery mechanism using the symbolic CKG intents. The CKG intents for each template were indexed, and the CKG intents for the query were calculated at query time. If there were &lt;5 results, the CKG query intents were matched against the template intents. For example, the query hot yoga studio opening has the intent yoga and would match all templates with that intent. This resulted in major improvements in CTR and null rate (Table 3).</p></div>
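A sketch of the recovery logic, assuming precomputed intent sets per template and per query; the function and data names are hypothetical.

```python
def search_with_intent_recovery(keyword_results, query_intents,
                                template_intents, min_results=5):
    """If keyword retrieval returns fewer than min_results templates,
    recover by matching the query's CKG intents (computed at query time)
    against the indexed per-template intent sets."""
    if len(keyword_results) >= min_results:
        return keyword_results  # enough results, no recovery needed
    recovered = [
        tid for tid, intents in template_intents.items()
        if intents.intersection(query_intents) and tid not in keyword_results
    ]
    return keyword_results + recovered

# Toy index: the query "hot yoga studio opening" yields the intent yoga.
template_intents = {
    "t1": {"yoga"},
    "t2": {"birthday"},
    "t3": {"yoga", "wellness"},
}
results = search_with_intent_recovery([], {"yoga"}, template_intents)
```

Because the intents are discrete symbols in the index, this recovery path adds no embedding-scoring latency.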
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Ranking with Domain-specific Image-Text Model</head><p>The CLIP model (section 3.1) only worked for English and was not optimized for Express templates and queries. Replacing CLIP with AdobeCLIP to rerank the top 10K Express templates was expected to be on par for English queries and to improve the CTR for non-English queries. Because the recall and first-round ranker constrained the result set, the core relevance, especially for head queries, was unlikely to change significantly, although the torso and tail queries, especially in non-English locales, were expected to be significantly different. The move from CLIP to AdobeCLIP was part of a larger AB test which moved from an older search infrastructure to a newer one which, among other things, allowed for multiple embedding types. The goal of the AB test was to have no negative effects while moving to the new platform. This was borne out (Table <ref type="table" target="#tab_3">4</ref>). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Recall with Sparse Image-Text Model</head><p>None of the previous experiments leveraged the power of embeddings for augmenting the initial match set. The CLIP and AdobeCLIP models only affected the reranking of the search results. The null and low recovery with symbolic CKG multi-modal intents only affected null and low queries and leveraged symbolic intents. Dense embeddings could not be used for the initial match set due to latency constraints. So, we experimented with using the AdobeCLIP sparse embeddings in the match set to augment the existing keyword matches. This required determining how many dimensions to match in the sparse embedding (section 2.2) in order to retrieve enough new relevant documents without retrieving too many irrelevant ones. As is well known in the literature <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, determining accurate relevance thresholds on embeddings using cosine similarity or dot product is not feasible (see <ref type="bibr" target="#b15">[16]</ref> for a recent approach to determining relevance thresholds with embeddings). The sparse embedding approach instead allowed us to require a minimum number of matched dimensions per asset. Once the retrieval approach was determined, the ranking was updated to demote less relevant templates retrieved by the sparse embeddings (Table <ref type="table" target="#tab_4">5</ref>). We considered using only the AdobeCLIP sparse embeddings for the match set, but rejected this because the model performed badly at identifying videos, which are popular queries and important to the business: AdobeCLIP only has the image embedding to match against, and the images from Express video templates are extremely similar to those of still templates.</p></div>
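The recall augmentation and ranking demotion can be sketched as a union of keyword matches and sparse-embedding matches. The demotion factor and all names here are illustrative assumptions, not production values.

```python
def augmented_match_set(keyword_hits, sparse_hits, demotion=0.8):
    """Augment the keyword match set with templates retrieved only via
    sparse-embedding dimension matches, demoting the embedding-only
    additions in ranking since they are, on average, less relevant.
    Both inputs map template_id to a retrieval score."""
    merged = dict(keyword_hits)
    for tid, score in sparse_hits.items():
        if tid not in merged:
            merged[tid] = demotion * score  # embedding-only: demote
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# t1 matches both paths and keeps its keyword score; t2 is a new,
# embedding-only match and enters the set demoted.
keyword_hits = {"t1": 1.0}
sparse_hits = {"t1": 0.9, "t2": 0.9}
merged = augmented_match_set(keyword_hits, sparse_hits)
```

The sparse scores themselves would come from the minimum-dimension matching shown in section 2.2; only the merge-and-demote step is new here.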
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Long Query Recall and Ranking with Multi-Modal Model</head><p>The above experiments improved relevance for head queries, both in matching the intent of the user query and in the quality of the templates shown. In addition, the improved recall from CKG symbolic intents for null and low recovery (section 3.2) and the addition of sparse AdobeCLIP embeddings into the initial match set (section 3.4) resulted in a broad set of related templates being shown when there are few exact matches. However, for more specific user queries, i.e. tail queries, there are often few exact-match templates.</p><p>To address this issue, we targeted longer queries (&gt;=4 words) to use the CKG multi-modal embedding (MM-CKG, section 2.3). The more specific intents of the longer queries work especially well with the domain-specific embeddings, allowing the recall and ranking to find the few templates that exactly match the user intent. The ranking put 1/3 of the weight on MM-CKG and 2/3 on AdobeCLIP. The hypothesis behind this was that AdobeCLIP captures the core relevance matching the query text to the image rendition, while MM-CKG captures the underlying intent of the query and the template. The optimal query length for this experience was determined empirically by manually judging a stratified sample of queries of different lengths, comparing production to the new experience. There was a clear demarcation between queries of &lt;4 words and those of &gt;=4 words. Table <ref type="table">6</ref> shows that for &lt;4 words, both production and MM-CKG largely provide relevant results, i.e. for head queries both approaches work well. However, for &gt;=4 words the new MM-CKG results are significantly better than those in production.</p></div>
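The query-length routing and score blending described above can be sketched as follows. The function names are illustrative; the 4-word threshold and the 1/3 MM-CKG, 2/3 AdobeCLIP weights come from the text.

```python
def route_query(query, min_long_len=4):
    """Route queries of min_long_len or more words to the MM-CKG dense
    path; shorter queries keep the default keyword-plus-sparse path."""
    return "mm_ckg" if len(query.split()) >= min_long_len else "default"

def long_query_score(mm_ckg_sim, adobe_clip_sim):
    """Blend the two domain-specific embedding similarities for long
    queries: AdobeCLIP carries the core text-to-image relevance (2/3)
    and MM-CKG the underlying intent (1/3)."""
    return (1 / 3) * mm_ckg_sim + (2 / 3) * adobe_clip_sim
```

For example, the head query coffee instagram stays on the default path, while the tail query colorful coffee promotion instagram (4 words) is routed to the MM-CKG path.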
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 6</head><p>Relevancy results from human annotation when leveraging MM-CKG for recall and ranking.</p><p>The MM-CKG embedding does not yet have a sparse version of the type available for AdobeCLIP (section 2.2). For this reason, it was not feasible from a latency perspective to use the MM-CKG matching and ranking approach for all queries. However, since long queries are &lt;10% of the query traffic, we determined that the increased latency from calculating cosine similarity scores between the query and all of the templates was worth the improvements to relevance.</p><p>The launched AB test showed statistically significant improvements in CTR and null rate on long queries and prompts, highlighting the usefulness of the hybrid system. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>Multi-modal search experiences in industry applications traditionally depend on textual data in the index, thereby reducing multi-modal search to traditional keyword search. This provides a low-latency experience, since industry search engines are heavily optimized for keyword search. The advent of high-quality multi-modal embeddings like CLIP has provided radically new capabilities. However, in an existing application, such as the Adobe Express template search described in this paper, the available multi-modal capabilities and the existing infrastructure, including strict latency requirements, require a thoughtful, iterative approach to integrating new multi-modal technologies. This paper described five multi-modal experiments in Express template search, each of which built upon the others. This has resulted in significantly lower null and low rates, while improving click-through rates.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Examples of Express template search. Left: head query coffee instagram. Right: tail query colorful coffee promotion instagram</figDesc><graphic coords="2,305.11,65.60,215.98,118.47" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Title: Pink Unicorn Birthday Party Instagram Portrait Post Topics: confetti, fantasy, glitter, gold, kids, sparkle, star, unicorn Mood: happy, joyful; Style: bright Region: all; Language: en-US; Date: 2023-12-12; Behavior: still; License: premium AI-inferred: AdobeCLIP embedding, Multi-modal CKG embedding, CKG symbolic intent, autotags Behavioral: Search impressions, clicks, edits, exports</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Sample Express template data available for search matching and ranking</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: MM-CKG uses supervised contrastive learning with SupCoLA loss for label alignment. This allows the model to bring short labels focusing on the overall intent of the asset closer to the content embeddings.</figDesc><graphic coords="4,180.64,438.93,234.00,169.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: High-level search architecture for Express template multi-modal search</figDesc><graphic coords="5,153.64,542.11,287.98,163.13" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Dense and Sparse representations of embeddings with sample scoring for sparse embeddings. Dense embeddings are shown with 2048 dimensions. Sparse embeddings have more dimensions (here 8192) but most of the dimensions have no values.</figDesc><table><row><cell cols="3">Dense Embedding Example</cell><cell></cell><cell cols="3">Sparse Embedding Example</cell><cell></cell></row><row><cell cols="3">Dim. Img. 1 Img. 2</cell><cell cols="5">Dim. Query = Img. 2 Img. 3 Img. 4</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>Img. 1</cell><cell></cell><cell></cell><cell></cell></row><row><cell>1</cell><cell>0.11</cell><cell>1.23</cell><cell>1</cell><cell>-</cell><cell>1.12</cell><cell>-</cell><cell>-</cell></row><row><cell>2</cell><cell>1.21</cell><cell>0.42</cell><cell>2</cell><cell>-</cell><cell>-</cell><cell>0.81</cell><cell>-</cell></row><row><cell>3</cell><cell>0.15</cell><cell>0.53</cell><cell>3</cell><cell>1.16</cell><cell>0.83</cell><cell>-</cell><cell>0.64</cell></row><row><cell>4</cell><cell>0.22</cell><cell>2.25</cell><cell>4</cell><cell>-</cell><cell>-</cell><cell>1.83</cell><cell>-</cell></row><row><cell>…</cell><cell></cell><cell></cell><cell>…</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="2">2048 2.17</cell><cell>0.64</cell><cell cols="2">8192 0.13</cell><cell>-</cell><cell>-</cell><cell>0.01</cell></row><row><cell></cell><cell></cell><cell cols="4">Sparse Embedding Matching and Scoring</cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell cols="3">Matches for query image 1:</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell cols="4">image 2: 1 dimension (dimension 3)</cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell cols="3">image 3: 0 
dimensions</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell cols="5">image 4: 2 dimensions (dimension 3 &amp; 8192)</cell><cell></cell></row><row><cell></cell><cell></cell><cell cols="2">Scoring for ranking:</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell cols="3">image 2: 1.16 * 0.83 = 0.96</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell cols="4">image 4: 1.16 * 0.64 + 0.13 * 0.01 = 0.74</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>AB Test Results for CLIP Reranking. All results are statistically significant.</figDesc><table><row><cell>Metric</cell><cell>Change</cell></row><row><cell>CTR</cell><cell>+7%</cell></row><row><cell>Export rate</cell><cell>+4%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>AB Test Results for CKG Symbolic Intents Null and Low Recovery. All results are statistically significant.</figDesc><table><row><cell>Metric</cell><cell>Change</cell></row><row><cell>Null rate</cell><cell>−69%</cell></row><row><cell>Null recovery CTR</cell><cell>+300%</cell></row><row><cell>Low rate</cell><cell>−59%</cell></row><row><cell>Low recovery CTR</cell><cell>+30%</cell></row><row><cell>Overall CTR</cell><cell>+7%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>AB Test Results for Platform Move including CLIP-to-AdobeCLIP. Only the null rate change is statistically significant.</figDesc><table><row><cell>Metric</cell><cell>Change</cell></row><row><cell>CTR</cell><cell>+0.0%</cell></row><row><cell>Null rate</cell><cell>−7.7%</cell></row><row><cell>Low rate</cell><cell>−5.0%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>AB Test Results for AdobeCLIP Recall and Ranking. All results are statistically significant.</figDesc><table><row><cell>Metric</cell><cell cols="2">Change</cell></row><row><cell></cell><cell>English</cell><cell>Multilingual</cell></row><row><cell>CTR</cell><cell>+3%</cell><cell>+3%</cell></row><row><cell>Null Rate</cell><cell>−35%</cell><cell>−14%</cell></row><row><cell>Null and Low Recovery Rate</cell><cell>−54%</cell><cell>−54%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 7</head><label>7</label><figDesc>AB Test Results for Long Prompt Understanding.</figDesc><table><row><cell>Metric</cell><cell>Change</cell></row><row><cell>CTR</cell><cell>+17%</cell></row><row><cell>Null Rate</cell><cell>−46%</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Other approaches are described in<ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The template collection is continually expanding and so the maximum recall size varied for each experiment.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">All results are shown as percentage change. We are unable to show exact metrics.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hallacy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Goh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2103.00020</idno>
		<title level="m">Learning transferable visual models from natural language supervision</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Cross-lingual and multilingual CLIP</title>
		<author>
			<persName><forename type="first">F</forename><surname>Carlsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Eisen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rekathati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sahlgren</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.lrec-1.739" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Béchet</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Blache</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cieri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Goggi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Isahara</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mazo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="6848" to="6854" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">SparseEmbed: Learning sparse lexical representations with contextual embeddings for retrieval</title>
		<author>
			<persName><forename type="first">W</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Dudek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bendersky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR &apos;23)</title>
				<meeting>the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR &apos;23)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Product quantization for nearest neighbor search</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jégou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schmid</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="117" to="128" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Kusupati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bhatt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rege</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wallingford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ramanujan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Howard-Snyder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kakade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Farhadi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2205.13147</idno>
		<title level="m">Matryoshka representation learning</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Combining embedding-based and semantic-based models for post-hoc explanations in recommender systems</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">L</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-H</forename><surname>Abel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gouspillou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2401.04474</idno>
		<ptr target="https://arxiv.org/abs/2401.04474" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">J</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Deshmukh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suresh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Arora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sadaphule</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dalal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stefan</surname></persName>
		</author>
		<ptr target="https://patents.google.com/patent/US11645095B2/en" />
		<imprint>
			<date type="published" when="2023">2023. US11645095B2</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Multimodal input contextual font</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zheng</surname></persName>
		</author>
		<ptr target="https://patents.google.com/patent/US11775734B2/en" />
		<imprint>
			<date type="published" when="2023">2023. US11775734B2</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Contextual font recommendations based on user intent</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>King</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.08188</idno>
		<ptr target="https://arxiv.org/abs/2306.08188" />
	</analytic>
	<monogr>
		<title level="m">ECOM23 the SIGIR Workshop on e-Commerce Search</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2112.03562</idno>
		<ptr target="https://arxiv.org/abs/2112.03562" />
		<title level="m">CMA-CLIP: Cross-modality attention CLIP for image-text classification</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Representation learning with contrastive predictive coding</title>
		<author>
			<persName><forename type="first">A</forename><surname>Van Den Oord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1807.03748</idno>
		<ptr target="https://arxiv.org/abs/1807.03748" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Supervised contrastive learning</title>
		<author>
			<persName><forename type="first">P</forename><surname>Khosla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Teterwak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sarna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Isola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maschinot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Krishnan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.11362</idno>
		<ptr target="https://arxiv.org/abs/2004.11362" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Contextual multilingual spellchecker for user queries</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Valls-Vargas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guerin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Arora</surname></persName>
		</author>
		<idno type="DOI">10.1145/3539618.3591861</idno>
		<ptr target="https://doi.org/10.1145/3539618.3591861" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;23</title>
				<meeting>the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;23<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="3395" to="3399" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings</title>
		<author>
			<persName><forename type="first">K</forename><surname>Ethayarajh</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.00512</idno>
		<ptr target="https://arxiv.org/abs/1909.00512" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Length is a curse and a blessing for document-level semantics</title>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">T</forename><surname>Hudson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Moubayed</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.16193</idno>
		<ptr target="https://arxiv.org/abs/2310.16193" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Relevance filtering for embedding-based retrieval</title>
		<author>
			<persName><forename type="first">N</forename><surname>Rossi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Magnani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CIKM 2024</title>
				<meeting>CIKM 2024</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
