<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM SIGIR Workshop on eCommerce, July</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>(Vector) Space is Not the Final Frontier: Product Search as Program Synthesis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jacopo Tagliabue</string-name>
          <email>jacopo.tagliabue@nyu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ciro Greco</string-name>
          <email>ciro.greco@bauplanlabs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bauplan Labs</institution>
          ,
          <addr-line>New York City</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>New York University</institution>
          ,
          <addr-line>New York City</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>27</volume>
      <issue>2023</issue>
      <abstract>
        <p>As ecommerce continues to grow, huge investments in ML and NLP for Information Retrieval are following. While the vector space model has dominated retrieval modelling in product search, even as vectorization itself changed greatly with the advent of deep learning, this position paper argues, in a contrarian fashion, that program synthesis provides significant advantages for many queries and for a significant number of players in the market. We detail the industry significance of the proposed approach, sketch implementation details, and address common objections, drawing from our experience building a similar system at Tooso.</p>
      </abstract>
      <kwd-group>
        <kwd>product search</kwd>
        <kwd>semantic parsing</kwd>
        <kwd>program synthesis</kwd>
        <kwd>large language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>“Now, like all great plans, my strategy is so simple an idiot could have devised it”
– Zapp Brannigan</p>
      <p>
        The explosive growth of ecommerce [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] brought equally impressive innovation in Information
Retrieval (IR) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], with product search now representing 30% to 60% of total online revenues [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3,
4, 5</xref>
        ]. Building on decades of literature in web and document retrieval, product search is typically
modelled as a two-step process: candidate selection (retrieval [6]) and re-ranking [7, 8, 9, 10].
The most widespread model for retrieval is the vector space model (VSM) [11, 12, 13], according
to which relevance is approximated by the distance between a query vector and a product vector
in a suitable space. Even as deep learning drastically altered vectorization [14], it did not call
into question the tenets of the VSM, or the idea that re-ranking is needed to push irrelevant
items wrongfully retrieved down the page [15, 16]. It is important to remember that most real-world
search engines leverage the VSM in one form or another: sparse BM25 retrieval in Elasticsearch
may be implemented very differently from dense retrieval on Redis Vector Search1, but they all
share the core idea of the VSM, namely that relevance is fundamentally approximated by distance
in a vector space.
      </p>
      <p>We argue that program synthesis through semantic parsing provides a principled and viable
alternative to VSM for product search. In this perspective, search queries are (informal)
instructions for knowledge bases, as opposed to points in a vector space2. We shall defend two main
claims:
1. VSM is an indirect representation of meaning that is necessary for large unstructured
documents, such as those in web search; however, under different circumstances, where
search queries are interpreted against product catalogs, direct representation is feasible
and useful;
2. explicit representations unlock a powerful search experience where formal inferences
can be made to improve retrieval, while ranking is used as a device for personalization.</p>
      <p>Historically, ecommerce tech has focused mostly on the challenges of big players, while
a larger market share represented by mid-to-large websites has been neglected [18]. While we
recognize the intrinsic limits of position papers, we believe our contrarian argument will benefit
from the freedom allowed by this format. Our arguments proceed as follows: we first establish
some empirical facts about ecommerce search at the “Reasonable Scale”; we then showcase the
virtues of program synthesis, assuming a semantic oracle. Finally, we show how such a system
can actually be built.</p>
      <p>We believe this work to be valuable for a broad set of practitioners, solving specific use cases
in this segment of the market or working on SaaS solutions3. Even if most of the arguments we
2As we explain below (Fig. 4), our approach is to parse a search query into an intermediate semantic representation, and
then translate the latter into a program, handling the shopping query “as if it were instructions”; program synthesis
may also be performed directly from natural language [17]. We will refer to parsing and synthesis somewhat liberally
below, since it is clear how to move from one to the other.
3As business context for this blooming industry, Algolia and Bloomreach each raised &gt;USD200M in venture
money in the last few years [19, 20], and Coveo raised &gt;CAD200M at IPO [21].
present are theoretical, these ideas have been successfully implemented in a company before
(Tooso), and played an important role in its acquisition by a public market leader (TSX:CVO)4.</p>
    </sec>
    <sec id="sec-3">
      <title>2. An Industry Perspective</title>
      <p>“Hooray! A happy ending for the rich people.” – Dr. Zoidberg</p>
      <p>While the idiosyncrasies of product search have been partially documented before [25, 26],
most ecommerce systems are still designed from the same building blocks as document search:
VSM for retrieval, Machine Learning (ML) for re-ranking using all types of signals. In our
experience, the farther you go from planetary scale retailers, the less product search will
resemble web search.</p>
      <p>Because digital transformation is consistently taking place in the retail industry, most
ecommerce search systems are now deployed outside of Big Tech Retailers. We are going to describe
the mid-long tail of ecommerce implementations as the “Reasonable Scale” (RSc) [27, 28, 29, 30].
While RSc is intended to be a loose concept [18], practitioners typically know it when they see
it [31].</p>
      <p>A number of strategies need to be different at RSc. For instance, instead of several million
SKUs, RSc shops may have 10K to 100K products and still make &gt;100M USD in yearly revenues.
Queries on inventories of this size can easily have result sets of 10 golden items. In this context,
no re-ranking strategy will be able to hide irrelevant products from the user: for the typical
strategy of hiding results in page two5 to work, there should be a page two to begin with. Even
as inventory grows, VSM may go against shoppers’ preference: for price-sensitive items, users
often sort results by price [32]. When this happens, sub-optimal candidate selection can hurt
the experience (Fig. 3)(see also the cases discussed in [33] with regard to prices and sizes).</p>
      <p>To paint a more quantitative picture of the RSc, we can leverage our unique and privileged
position as SaaS practitioners with access to dozens of different real-world deployments. In
particular, there are two main facts that turn out to be crucial for our approach (Section 4):
4While most of these ideas have been developed in 2017-2019, we have updated our arguments to reflect the most
recent advancements in the field.
5“The best place to hide a dead body is page two of Google.”</p>
      <p>
        1. product search mostly deals with short queries in the form of Noun Phrases (NPs)
describing entities and properties (e.g. “red shoes” or “Dell laptop”) [34]. Query examples
from RSc shops can also be found in [35] (Table 1) and [
        <xref ref-type="bibr" rid="ref6">36</xref>
        ];
2. a small number of queries account for a significant portion of the distribution, making
superior relevance for top queries extremely impactful for the overall experience. In the
frequency distribution of a month of anonymous query data sampled from three RSc
shops in two languages, the top 1-to-5% queries account for half of the total individual
queries (Fig. 2).
      </p>
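      <p>To make the second fact concrete, head coverage can be computed directly from a query log. A minimal sketch in Python, with illustrative counts rather than the shops’ actual data:</p>

```python
from collections import Counter

def head_coverage(query_counts, top_fraction):
    """Share of total query volume covered by the top `top_fraction` of distinct queries."""
    counts = sorted(query_counts.values(), reverse=True)
    k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)

# Hypothetical power-law-like query log (illustrative counts, not real shop data).
log = Counter({"shoes": 500, "nike shoes": 300, "dress": 200, "red dress": 60,
               "ski gloves": 20, "prada purple shoes": 5, "dell laptop": 4,
               "blue shoes under 100": 3, "running shoes": 2, "socks": 1})
print(round(head_coverage(log, 0.2), 2))  # the top 20% of distinct queries dominate volume
```

      <p>On real RSc logs, curves of this kind are what motivates prioritizing superior relevance for head queries.</p>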
      <p>The first observation is important as parsing gets harder with longer queries; the second
observation is important as it indicates how to align technological objectives with business
outcomes – i.e., solving parsing for short queries is a very good place to start.</p>
      <p>Taken together, they both re-affirm the peculiarities of product search, but from a novel and
unusual angle: interestingly, both facts are not true for web or big-scale ecommerce search – as
the numbers of users / items get larger and revenues grow into billions, the tail of the query
distribution gets both longer and more important. In other words, while the general linguistic
behavior for users of Amazon or Facebook is also NP-based, the tail is disproportionally more
important: the tail is longer, as big catalogs invite a larger set of inputs, and the tail is more
valuable, as marginal improvements in rare queries translate into sizable monetary gains. While
we believe our approach can be used, under the appropriate circumstances, at any scale, its
novelty and impact are more easily noticeable for RSc deployments.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Searching with an oracle</title>
      <p>
        Originally developed for large documents and long queries, VSM is a useful approximation, as it
provides a retrieval strategy that avoids explicitly modelling meaning, which has long been
thought to be an intractable problem: what would be the logical form [
        <xref ref-type="bibr" rid="ref7">37</xref>
        ] of this Wikipedia
page6? As we argue below, the challenges of explicit representations are eased for product
search: on the query side, real-world data shows that NP-like queries are very impactful (Section
6https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
2); on the item side, products are remarkably different from long documents: products are
well-defined entities, which can be described through a sortal (i.e. the type of object, e.g. “shoes”) and
a few key properties (e.g. color, material, size, brand, price - crucially those more often used by
shoppers [
        <xref ref-type="bibr" rid="ref6 ref8">38, 36</xref>
        ]). In other words, products already come into an IR system as (quasi) structured
information.
      </p>
      <p>What would a search-as-parsing experience look like? We first sketch the general experience
we have in mind through a “parsing oracle” (PO) - i.e. an idealized system that is able to:
• at runtime, return the logical form of a query;
• at indexing time, given a product (as contained in a digital catalog [24]), return its
properties.</p>
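      <p>The two capabilities of the PO can be written down as an interface that any concrete implementation (Section 4) must provide; the method names below are our illustrative assumptions:</p>

```python
from typing import Protocol

class ParsingOracle(Protocol):
    """Idealized parsing oracle with its two capabilities (names are illustrative)."""

    def parse_query(self, query: str) -> str:
        """At runtime: return the logical form of a search query."""
        ...

    def parse_product(self, product: dict) -> dict:
        """At indexing time: return a product's typed properties (sortal, color, brand, ...)."""
        ...
```

      <p>Keeping the two sides behind one interface makes explicit that queries and products must share a single domain of interpretation.</p>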
      <p>
        Under the proposed approach, a query is parsed into a logical form (parsing), which is mapped
to a machine code to be executed over the target domain (synthesis): in Fig. 4 we find lambda
expressions and SQL [
        <xref ref-type="bibr" rid="ref9">39</xref>
        ], but the proposal is broadly compatible with any explicit formalism.
In other words, the meaning of “Prada purple shoes” is neither boolean operators over TF-IDF
weights, nor a BERT-based embedding, but (something like):
λx.[Prada(x) &amp; shoes(x) &amp; purple(x)].7
      </p>
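      <p>Once predicates are typed, mapping such a logical form to executable code is mechanical. A minimal sketch of the synthesis step, where the table name, column names, and the conjunction-of-filters format are our illustrative assumptions rather than a fixed schema:</p>

```python
# Illustrative predicate typing: each typed predicate maps to a column filter.
PREDICATE_TYPES = {"Prada": "brand", "purple": "color", "shoes": "product_type"}

def synthesize_sql(predicates):
    """Translate a conjunction of typed predicates into a SQL query over a catalog table."""
    clauses = [f"{PREDICATE_TYPES[p]} = '{p}'" for p in predicates]
    return "SELECT sku FROM products WHERE " + " AND ".join(clauses)

print(synthesize_sql(["Prada", "shoes", "purple"]))
# SELECT sku FROM products WHERE brand = 'Prada' AND product_type = 'shoes' AND color = 'purple'
```

      <p>The same logical form could just as well be compiled to a graph query or a lambda expression over a knowledge base; the proposal is agnostic about the target formalism.</p>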
      <p>
        Viewing queries as (small) programs to execute has several advantages. First, it provides
the ability to apply filters that are already available and show their application to the user –
this is often desirable but in a VSM-system it requires an additional module to be trained and
maintained. Second, the explicit and easy-to-debug “trace” of the query enables principled
fallback strategies. As an illustration, assume the user issues the query “purple shoes”, which
has no perfect matches. The logical form that (roughly) states “retrieve an object of type shoes,
with purple as a color”, allows us to reason about the next best thing available, and provide
a graceful fallback message (e.g. “we don’t have purple shoes, but we thought you could like
dark red shoes”)8. An explicit parse leads us to recognize that different tokens in the query
have different psychological importance for the shopper: if the retrieval goal revolves around
shoes, the system should retrieve items that are still shoes while never retrieving purple items
that are not shoes. In this perspective, parsing both yields the exact linguistic intent and lays
down possible compositional fallback strategies. Crucially, fallback strategies can be ML-driven,
domain-driven, or heuristic-driven and may change from one deployment to the next: by
turning queries into code, we make it easier to incorporate constraints (including probabilistic
ones) into an interpretable search plan.
7With Prada, shoes, purple as predicates of type brand, sortal, color, respectively.
8Note that while IR explanations are often used to improve recommender systems [
        <xref ref-type="bibr" rid="ref10">40</xref>
        ], search may benefit from
them for similar reasons.
      </p>
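      <p>Such a compositional fallback can be sketched as predicate relaxation that never touches the sortal; the color-substitution table below is a hypothetical, deployment-specific resource:</p>

```python
# Hypothetical per-deployment resource: acceptable substitutes for a missing property value.
NEAREST_COLOR = {"purple": ["dark red", "violet"]}

def fallback_plans(sortal, color):
    """Ordered retrieval plans: exact match first, then color substitutes, then sortal only.
    The sortal (e.g. 'shoes') is never relaxed; only property predicates are."""
    plans = [{"product_type": sortal, "color": color}]
    for alt in NEAREST_COLOR.get(color, []):
        plans.append({"product_type": sortal, "color": alt})
    plans.append({"product_type": sortal})  # last resort: any shoes, never purple non-shoes
    return plans

plans = fallback_plans("shoes", "purple")
```

      <p>Because each plan is explicit, the system can also explain to the shopper which constraint was relaxed.</p>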
      <p>
        To further appreciate the experience, it is useful to contrast what would have happened under
plausible implementations of VSM. Under a sparse vector space, the shopper would typically
either get a No result page, or – as the opposite extreme – receive irrelevant items from an
OR expansion: non-shoes that are purple, shoes that are green9. Under a dense vector space,
retrieval would provide a set of items, but no principled way to cut the set at the right position
(when is a “near” vector not near enough?) or explain its choice. Both are open problems [
        <xref ref-type="bibr" rid="ref11">41</xref>
        ],
and no solution is known, especially given data constraints of the RSc [29].
      </p>
      <p>
        There is another, subtler, way to appreciate the impact of PO on search especially relevant
for SaaS players [
        <xref ref-type="bibr" rid="ref12">42</xref>
        ], whose job is to develop solutions deployed on dozens of independent shops
in several languages and verticals. When you have two shops in the same market (as Shop A
and Shop B below), PO gets you re-usable abstractions. Overlapping parse trees and product
properties can help with cold start scenarios: if a model is matching “Adidas” and “Nike” as
brands with high affinity, it can be ported to a new shop to bootstrap learning (i.e. launching
a new ecommerce without any behavioral data). As an even more extreme form of bootstrapping,
learning can be transferred between (similar) languages when appropriate resources exist: while
VSM models can make good use of multi-language embeddings, the power-law of RSc helps
us here as well, as most retailers would do most business in 1-3 languages.10 Of course,
reimagining search with PO opens up possibilities also outside of the search experience itself: just
to mention two obvious ones, finer-grained analytics (both about queries as expressing shoppers’
intent, and products, as a collection of human-readable properties), and cross-pollination with
data coming in and out of the PIM (Product Information Management).
      </p>
      <p>In this section, we argued that a large portion of the market would benefit from program
synthesis through semantic parsing, if such a system existed. We now show how such a system
can be built.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Building a Semantic Parser</title>
      <p>
        As PO itself has two components – a query and a product parser with a shared domain and
interpretation (in the sense of model theory [
        <xref ref-type="bibr" rid="ref13">43</xref>
        ]) – how do we bootstrap and scale both?
Assuming we use ML to train the parser, the hardest part is obtaining a training set for queries:
while (almost) any untrained human can annotate an ecommerce catalog, producing logical
forms requires a good deal of work by trained linguists. We will therefore break the problem
into pieces, by first assuming we have product representations available to build a training set
for query parsing, and then relaxing this assumption.
      </p>
      <p>
        Fig. 1 showcases the creation of a query dataset for a statistical parser (3)11, starting from
product representations (1) and a small grammar (2): our insight is that, instead of manual
annotation, we can programmatically generate golden triples &lt;query,logical form,SKUs&gt; by
synthesizing jointly queries, their logical form, and the result set, leveraging the isomorphism
9Far from being a theoretical possibility, this is the default experience for all websites using out-of-the-box open source tools like
Elasticsearch, or non-AI SaaS providers.
10Even the fallback strategies mentioned before can be ported: if “sneakers” is fallback for “shoes”, the same strategy
can be applied any time you have “shoes” available in the parse tree.
11The details of the parser are pretty unimportant, as there is substantial evidence that this is a solvable problem
with good enough data [
        <xref ref-type="bibr" rid="ref14">44</xref>
        ].
between product representations and logical forms. Moving the annotation problem away
from logical form helps us leverage further insights on the peculiarities of the RSc. First, it
should be stressed that extracting (most) product features (1, in Fig. 1) is easy: some attributes
come already structured, and statistically accurate labels are easy to obtain thanks to methods
applicable across shops [
        <xref ref-type="bibr" rid="ref15 ref16">45, 46</xref>
        ]. In particular, while recent large language models cannot be
directly used at runtime [
        <xref ref-type="bibr" rid="ref17 ref18">47, 48</xref>
        ], they are ideally suited to be a complementary strategy to more
traditional methods when it comes to entity extraction (or even as an oracle for offline usage
[
        <xref ref-type="bibr" rid="ref19">49</xref>
        ], see the Appendix). Product information is also important for other parts of the business,
which means labeling can piggyback on independently motivated processes (e.g. PIM).12
      </p>
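      <p>The joint generation of golden triples can be sketched as follows: each grammar slot is filled with attribute values found in the catalog, so query, logical form and result set are produced together and are consistent by construction. The toy catalog and the single color-type NP template are illustrative assumptions:</p>

```python
# Toy catalog: product representations with typed attributes (step 1 in Fig. 1).
CATALOG = [
    {"sku": "A1", "color": "purple", "product_type": "shoes"},
    {"sku": "A2", "color": "red", "product_type": "shoes"},
    {"sku": "B1", "color": "purple", "product_type": "bag"},
]

def generate_triples():
    """Generate <query, logical form, SKUs> triples from a one-rule NP grammar: COLOR TYPE."""
    colors = {p["color"] for p in CATALOG}
    types = {p["product_type"] for p in CATALOG}
    triples = []
    for c in sorted(colors):
        for t in sorted(types):
            query = f"{c} {t}"
            logical_form = f"lambda x.[{t}(x) & {c}(x)]"
            skus = [p["sku"] for p in CATALOG
                    if p["color"] == c and p["product_type"] == t]
            triples.append((query, logical_form, skus))
    return triples
```

      <p>Note that over-generation is harmless here: a synthetic query with an empty result set still teaches the parser a valid parse tree.</p>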
      <p>Second, the peculiarities of query distribution simplify the slot filling component (2, in Fig. 1):
even in a SaaS scenario where extreme scalability is paramount, NP queries are easy to generate
and then re-use – the queries “ski trousers”, “running shoes” and “ski gloves” (mentioned in
[35]) share the same logical form. Not only is the grammar simple enough to start with, but since the
final goal is to parse queries through a model trained on these synthetic NPs, we can err on the
side of recall and over-generate (as this will just create training sentences that nobody would use).</p>
      <p>
        Let’s now recap our approach as an actionable list:
1. at indexing time, extract product representations from the catalog to be indexed in a
knowledge base, through heuristics and/or models [
        <xref ref-type="bibr" rid="ref16 ref20">50, 46</xref>
        ];13
2. build a simple NP-focused grammar, to cover a significant part of the distribution. The
process can begin by annotating historical queries with simple logical forms, and then
generalize a grammar to simplify those trees. To give a sense of how this would work, we
selected Shop A and Shop B, multi-brand retailers in the apparel industry with catalog
sizes between 10k and 30k SKUs. We manually annotated historical queries to get a sense
of what grammar captures user behavior. A few hundred parses (respectively, 475 and
459) cover 43% and 25% of the entire query distribution for Shop A and Shop B;
3. use the product representation and the NP-grammar to generate a training set with
synthetic queries and golden parse trees (Fig. 1) – note that it is easy to augment the set
of parsable queries through paraphrases [
        <xref ref-type="bibr" rid="ref14">44</xref>
        ] or prompting [
        <xref ref-type="bibr" rid="ref21">51</xref>
        ];
4. train a standard parsing model [
        <xref ref-type="bibr" rid="ref22 ref23">52, 53</xref>
        ] on this dataset;
5. at runtime, use the parsing model on an incoming query, get the logical form and map
it to an executable code for the target knowledge base: retrieve the products, execute
fallback strategies if relevant.
      </p>
      <p>This strategy has consequences for two important pieces of the search experience, re-ranking
and type-ahead suggestions. Re-ranking in VSM is often needed to hide poor results, and may
even conflict with relevance objectives: e.g., popular products may sometimes outrank others
irrespective of query intent. A structured approach to retrieval allows ranking to be mostly
about personalization: given a relevant result set, which of the following “purple shoes” is
best for this shopper (based on several real-time and historical ranking signals)? Conversely,
ranking rules – both manual and learned – can be applied on a ceteris paribus level: only if
two items are equally relevant can popularity influence their ranking. Query suggestions are
12Product labeling can also be outsourced with no privacy concerns.
13We refer the readers to the Appendix for more details.
known to be important for a good search UX [32]: synthetic queries (Fig. 1) could be used to
suggest new and cold query types, as well as familiarize shoppers with the capabilities of the
parser; for example, suggesting “blue shoes under 100 USD” would gradually educate users in
using the search bar better.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Limitations and answers to common concerns</title>
      <sec id="sec-6-1">
        <title>5.1. Vectors strike back</title>
        <p>
          The explosion of NLP capabilities in recent years has established beyond any reasonable doubt
the virtues of distributional semantics [
          <xref ref-type="bibr" rid="ref24">54</xref>
          ]: it may therefore seem strange to defend program
synthesis for IR use cases. The quality of the vectorized representations for queries and products
increased dramatically (including exciting possibilities such as multi-modal understanding
[
          <xref ref-type="bibr" rid="ref25">55</xref>
          ]), but the problem with the VSM is still present even in the most sophisticated retailers: as
we observe in the result set in Fig. 3, the query “nintendo switch” is retrieving pens. While it
would be tempting to dismiss this as an artefact or an anecdote, it is on the contrary an essential
component of VSM: if relevance is distance in a vector space, there is no cut-off establishing
when far is too far. If we compare the result set to the typical response we would get from a
human assistant14, it is clear that the shared meaning of “nintendo switch” is very different. For
almost-web-scale catalogs, vector search is pragmatically an effective strategy, as the “very
close” products for most queries are enough to fill the first few result pages; for smaller catalogs,
however, the perceived relevance may quickly degrade and the VSM approach has no principled
countermeasure.
        </p>
        <p>As we discuss below, better vector representations are an essential component of any search
engine, and NLP breakthroughs are a welcome addition to the toolkit of any shop. However,
treating relevance solely as a distance calculation is an approximation, and should be recognized
as such: when we switch our attention from lexically-driven to compositionally-driven use
cases, how much value can we now unlock?</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Parsing vs rewriting</title>
        <p>Parsing is hardly the only query processing technique available to RSc shops: for example,
query rewriting is a popular approach to bridge the gap between the user’s intent (“red Nike
sneakers”) and inventory (burgundy Adidas shoes). However, it is important to realize that
the concerns of parsing and rewriting modules are distinct, and possibly complementary: you
can rewrite “sneakers” into “shoes” before parsing it into an object type, but rewriting by itself
does not challenge the fundamental assumption of VSM – effective rewriting may improve
recall, but does not unlock any of the relevance benefits that parsing provides (Section 3). From
an engineering perspective, it’s easy to see how a rewriting module could leave completely
14Anecdotally, note that ChatGPT’s response to the prompt “You are a shopper assistant at Best Buy, the famous
electronic retailer. You work in the video-game section. A shopper comes to you and ask for nintendo switch: what
product do you think she wants to buy?” is “If a shopper comes asking for a Nintendo Switch, it’s most likely that
they are referring to the Nintendo Switch console itself”.
untouched the retrieval machinery of VSM, while parsing requires re-thinking the strategy
entirely.</p>
        <p>
          Moreover, a crucial component of our proposal is the “zero-shot” adaptation obtained through
the loose isomorphism between products in a graph and grammars: since parsing is built through
product understanding, not explicit or implicit behavioral supervision, its sample efficiency
makes it ideal for RSc shops and horizontal scalability (see below); on the other side, modern
NLP-based rewriting through behavioral supervision [
          <xref ref-type="bibr" rid="ref26">56</xref>
          ] is better suited for big retailers.15
        </p>
      </sec>
      <sec id="sec-6-3">
        <title>5.3. Vertical vs horizontal scaling</title>
        <p>When thinking about “scalable” engineering, we think of diminishing marginal effort as we
“scale” along an important dimension. Since most IR is done at Big Tech scale, the implicit
notion of scalability is the B2C one: as a target shop grows in inventory and traffic, the long
tail of queries will expand and rare events become more important (Section 2). In this regime,
data-driven approaches are scalable: the more traffic, the more data, so statistical generalization
is a promising path to diminishing marginal effort – how hard is it to satisfy this shopper’s intent,
given we have already seen k million of them?</p>
        <p>As we hinted in this work, there is another concept of scalability in IR, which becomes evident
in B2B scenarios: if our system is used across multiple RSc shops, the marginal cost that will
dominate the business is deployment cost – how hard is it to get a new shop online, given we have
already put k online? The synthesis approach we championed has been developed mainly targeting
this second notion: if the marginal cost of tagging catalogs is diminishing (see the Appendix), the
cost of understanding queries on newer shops diminishes as well, irrespective of how much traffic
they get. While emphasis has been put on synthesis as the actual implementation mechanism
for our strategy, the broader, and perhaps novel insight, is that query performance is (in certain
cases) a by-product of product understanding and linguistic knowledge, both of which are more
scalable than practitioners typically realize.</p>
      </sec>
      <sec id="sec-6-4">
        <title>5.4. Parser fragility</title>
        <p>
          A critical point that has not been addressed is the “fragility” of parsing-first strategies: since no
parsing model would be perfect, what should we do when it fails? In our experience, the most
natural architecture is a two-tier system, such that, if parsing or program execution fail, the
system would resort to a traditional VSM strategy (e.g. a sparse / dense vector-based retrieval).
Considering the speed of an ML parser, we pay a tiny latency tax for the above-mentioned
benefits. When it comes to deployment, our recommendation is to use program synthesis on top
of a basic VSM retrieval, not as a replacement; philosophically however, our position remains
that VSM is an approximation to relevance, and should be treated as such.
15Five years after the deployment of the system in this paper, it is telling that leading tech retailers are starting to
use a product graph for rewriting as well [
          <xref ref-type="bibr" rid="ref27">57</xref>
          ].
        </p>
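          <p>The two-tier system described above can be sketched as a simple guard around the parsing path; parse_and_execute and vsm_retrieve below are hypothetical stand-ins for the real modules:</p>

```python
def search(query, parse_and_execute, vsm_retrieve):
    """Two-tier retrieval: try the parsing-first path; on any failure (no parse,
    empty result set, execution error) fall back to a traditional VSM retrieval."""
    try:
        results = parse_and_execute(query)
        if results:
            return results, "parser"
    except Exception:
        pass  # parsing or program execution failed: degrade gracefully
    return vsm_retrieve(query), "vsm"

# Illustrative stand-ins for the real modules.
hits, tier = search("purple shoes",
                    parse_and_execute=lambda q: [],       # parser finds no match
                    vsm_retrieve=lambda q: ["sku-42"])    # VSM backstop
```

          <p>Since the parser runs first and is fast, the fallback only adds latency on the (hopefully rare) failure path.</p>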
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>Motivated by query distributions and industry constraints, we argued that program synthesis
(through semantic parsing) is a feasible path for a better search experience at RSc, compared to
VSM alone as a relevance model. We showed that the usual worries associated with explicit
meaning representations are unwarranted, and maintained that the key insight to a novel view
on search is the “isomorphic” structure of (parsed) queries and product structure.</p>
      <p>The representation dichotomy between explicit-but-annotation-heavy representations and the
approximate-but-fully-learnable VSM is indeed a false one, and we sketched how a tiny initial linguistic
structure can help bootstrap a large-scale parsing system. We are confident, through 6 years of experience,
deployments and publications that RSc shops can benefit from it, and we hope this paper will
start a discussion with participants coming from different backgrounds. While this work hardly
constitutes the last word on the topic, it is hopefully a first step in leading the field away from
local optima, and embracing the peculiarities and opportunities of product search.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The dry prose of a scholarly paper cannot do justice to the adventure that is building an
early-stage startup: this paper would not have been possible without Tooso, the company pioneering
search-as-parsing at scale back in 2018-2019. We wish to thank first Mattia, Luca, Andrea,
Alessia, and then everybody else involved in that clumsy, special company: a challenge we were
willing to accept, one we were unwilling to postpone, and one we intended to win.</p>
      <p>Furthermore, we wish to thank Tracy Holloway King, Federico Bianchi, Patrick John Chia
and two anonymous reviewers for useful comments on a previous version of this paper.</p>
      <p>[6] G. Salton, A. Wong, C. S. Yang, A vector space model for automatic indexing, Commun.
ACM 18 (1975) 613–620. URL: https://doi.org/10.1145/361219.361220. doi:10.1145/361219.361220.
[7] B. Mitra, N. Craswell, An introduction to neural information retrieval, Foundations and
Trends® in Information Retrieval 13 (2018) 1–126. URL: https://www.microsoft.com/en-us/
research/publication/introduction-neural-information-retrieval/.
[8] N. Choudhary, N. Rao, S. Katariya, K. Subbian, C. K. Reddy, Anthem: Attentive hyperbolic
entity model for product search, in: WSDM ’22: The Fifteenth ACM International
Conference on Web Search and Data Mining, Phoenix, AZ, USA, February 21-25, 2022, WSDM
’22, Association for Computing Machinery, New York, NY, USA, 2022.
[9] C. Pei, Y. Zhang, Y. Zhang, F. Sun, X. Lin, H. Sun, J. Wu, P. Jiang, W. Ou, D. Pei, Personalized
context-aware re-ranking for e-commerce recommender systems, ArXiv abs/1904.06813
(2019).
[10] R. Li, Y. Jiang, W. Yang, G. Tang, S. Wang, C. Ma, W. He, X. Xiong, Y. Xiao, Y. E. Zhao,
From semantic retrieval to pairwise ranking: Applying deep learning in e-commerce
search, Proceedings of the 42nd International ACM SIGIR Conference on Research and
Development in Information Retrieval (2019).
[11] D. Gillick, A. Presta, G. S. Tomar, End-to-end retrieval in continuous space, arXiv preprint
arXiv:1811.08008 (2018).
[12] V. Karpukhin, B. Oğuz, S. Min, P. Lewis, L. Y. Wu, S. Edunov, D. Chen, W. tau Yih, Dense
passage retrieval for open-domain question answering, ArXiv abs/2004.04906 (2020).
[13] Y. Tay, V. Q. Tran, M. Dehghani, J. Ni, D. Bahri, H. Mehta, Z. Qin, K. Hui, Z. Zhao, J. Gupta,
T. Schuster, W. W. Cohen, D. Metzler, Transformer memory as a diferentiable search index,
2022. arXiv:2202.06991.
[14] R. Nogueira, W. Yang, K. Cho, J. J. Lin, Multi-stage document ranking with bert, ArXiv
abs/1910.14424 (2019).
[15] Y. Yan, Z. Liu, M. Zhao, W. Guo, W. P. Yan, Y. Bao, A practical deep online ranking system
in e-commerce recommendation, in: ECML/PKDD, 2018.
[16] D. Sorokina, E. Cantu-Paz, Amazon search: The joy of ranking products, in:
Proceedings of the 39th International ACM SIGIR Conference on Research and
Development in Information Retrieval, SIGIR ’16, Association for Computing Machinery,
New York, NY, USA, 2016, p. 459–460. URL: https://doi.org/10.1145/2911451.2926725.
doi:10.1145/2911451.2926725.
[17] D. Basin, Y. Deville, P. Flener, A. Hamfelt, J. Nilsson, Synthesis of programs in computational
logic, volume 3049, 2004, pp. 30–65. doi:10.1007/978- 3- 540- 25951- 0_2.
[18] J. Tagliabue, You do not need a bigger boat: Recommendations at reasonable scale in
a (mostly) serverless and open stack, in: Proceedings of the 15th ACM Conference on
Recommender Systems, RecSys ’21, Association for Computing Machinery, New York,
NY, USA, 2021, p. 598–600. URL: https://doi.org/10.1145/3460231.3474604. doi:10.1145/
3460231.3474604.
[19] Techcrunch, Search api startup algolia raises $150 million at $2.25
billion valuation, 2021. URL: https://techcrunch.com/2021/07/28/
search-api-startup-algolia-raises-150-million-at-2-25-billion-valuation/.
[20] Bloomreach, With $175 million in funding, bloomreach is authoring the next
chapter of e-commerce, 2022. URL: https://www.bloomreach.com/en/blog/2022/
with-usd175-million-in-funding-bloomreach-is-authoring-the-next-chapter-of-e-commerce.
[21] S. Marotta, Canada’s latest tech public debut swings amid soft
ipos, 2021. URL: https://www.bloomberg.com/news/articles/2021-11-25/
canada-s-latest-tech-public-debut-swings-amid-slew-of-soft-ipos.
[22] B. Requena, G. Cassani, J. Tagliabue, C. Greco, L. Lacasa, Shopper intent prediction from
clickstream e-commerce data with minimal browsing information, Scientific Reports 10
(2020) 2045–2322. doi:10.1038/s41598- 020- 73622- y.
[23] F. Bianchi, J. Tagliabue, B. Yu, L. Bigon, C. Greco, Fantastic embeddings and how to align
them: Zero-shot inference in a multi-shop scenario, in: Proceedings of the SIGIR 2020
eCom workshop, July 2020, Virtual Event, published at http://ceur-ws.org (to appear), 2020.</p>
      <p>URL: https://arxiv.org/abs/2007.14906.
[24] J. Tagliabue, C. Greco, J.-F. Roy, F. Bianchi, G. Cassani, B. Yu, P. J. Chia, Sigir 2021
e-commerce workshop data challenge, in: SIGIR eCom 2021, 2021.
[25] M. Tsagkias, T. H. King, S. Kallumadi, V. Murdock, M. de Rijke, Challenges and research
opportunities in ecommerce search and recommendations, in: SIGIR Forum, volume 54,
2020.
[26] E. Brenner, J. Zhao, A. Kutiyanawala, Z. Yan, End-to-end neural ranking for
ecommerce product search: an application of task models and textual embeddings, ArXiv
abs/1806.07296 (2018).
[27] J. Tagliabue, Mlops without much ops, 2022. URL: https://towardsdatascience.com/
mlops-without-much-ops-d17f502f76e8.
[28] M. Eric, Mlops is a mess but that’s to be expected, 2022. URL: https://www.mihaileric.com/
posts/mlops-is-a-mess/.
[29] J. Tagliabue, H. Bowne-Anderson, V. Tuulos, S. Goyal, R. Cledat, D. Berg, Reasonable scale
machine learning with open-source metaflow, 2023. arXiv:2303.11761.
[30] P. Molino, C. Ré, Declarative machine learning systems: The future of machine learning
will depend on it being in the hands of the rest of us., Queue 19 (2021) 46–76. URL:
https://doi.org/10.1145/3475965.3479315. doi:10.1145/3475965.3479315.
[31] D. Berg, R. K. Chirravuri, R. Cledat, S. Goyal, F. Hamad, V. Tuulos, Open-sourcing metaflow,
a human-centric framework for data science, 2019. URL: https://netflixtechblog.com/
open-sourcing-metaflow-a-human-centric-framework-for-data-science-fa72e04a5d9.
[32] J. Tagliabue, B. Yu, M. Beaulieu, How to grow a (product) tree: Personalized category
suggestions for eCommerce type-ahead, in: Proceedings of The 3rd Workshop on
eCommerce and NLP, Association for Computational Linguistics, Seattle, WA, USA, 2020, pp.
7–18. URL: https://aclanthology.org/2020.ecnlp-1.2. doi:10.18653/v1/2020.ecnlp- 1.2.
[33] T. H. King, White Roses, Red Backgrounds: Bringing Structured Representations to Search,
Springer International Publishing, Cham, 2023, pp. 191–215. URL: https://doi.org/10.1007/
978-3-031-21780-7_9. doi:10.1007/978- 3- 031- 21780- 7_9.
[34] A. Schade, J. Nielsen, Ecommerce User Experience Vol. 05: Search., 2022. URL: https:
//www.nngroup.com/reports/ecommerce-ux-search-including-faceted-search/.
[35] B. Yu, J. Tagliabue, C. Greco, F. Bianchi, “an image is worth a thousand features”:
Scal</p>
    </sec>
    <sec id="sec-9">
      <title>A. Implementation notes</title>
      <p>As the novelty of our proposal does not lie in new classifiers or NLP pipelines, we briefly
expand here on the implementation strategies sketched in Section 4. We count it as a strength of the
approach that tried-and-tested, off-the-shelf techniques can be successfully used to start:
any improvement to the methods below will make the parser even better.
Once basic tags (COLOR, BRAND, etc.) are defined as the building blocks of the knowledge
base and the logical forms, we need to know where and how each of these attributes can be
found starting from the product catalogs. We favor a declarative approach, where tags are
associated with strategies that are executed in series when parsing the catalog. For example,
Table 1 shows how three tags can be extracted from Shop A.</p>
      <p>We first have a configuration strategy, which simply points to the column in the catalog that
contains the attribute (typical for brands, prices, etc.); this leverages the structured nature of
catalogs, which is a huge simplifying factor when considering product search vis-à-vis web
search. We then have a model strategy, which relies on machine learning to accomplish tagging;
finally, we have a heuristic strategy, which builds on domain knowledge and catalog specifics.</p>
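      <p>To make the declarative approach concrete, the tag-to-strategy mapping of Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not Tooso's implementation: all names (e.g. parse_catalog_row, TAG_STRATEGIES) and the toy color heuristic are our own assumptions.

```python
from typing import Callable, Optional

# A strategy is any callable mapping a catalog row to a tag value (or None).
Strategy = Callable[[dict], Optional[str]]

def configuration(column: str) -> Strategy:
    """Configuration strategy: read the attribute straight from a catalog column."""
    return lambda row: row.get(column)

KNOWN_COLORS = {"red", "blue", "black", "white"}

def color_heuristic(row: dict) -> Optional[str]:
    """Heuristic strategy: recover a color from the product name, if any."""
    return next((w for w in row.get("name", "").lower().split()
                 if w in KNOWN_COLORS), None)

# Declarative, per-shop mapping: tag -> ordered list of strategies
# (a model strategy would slot in here as just another callable).
TAG_STRATEGIES: dict = {
    "BRAND": [configuration("brand")],
    "COLOR": [configuration("color"), color_heuristic],
}

def parse_catalog_row(row: dict) -> dict:
    """Execute the strategies in series; the first non-empty answer wins."""
    tags = {}
    for tag, strategies in TAG_STRATEGIES.items():
        for strategy in strategies:
            value = strategy(row)
            if value is not None:
                tags[tag] = value
                break
    return tags
```

Because every strategy shares the same callable interface, a trained classifier wraps into the series exactly like a configuration or a heuristic, which is what keeps the mapping declarative.</p>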
      <p>
        When discussing how to scale B2B product search across deployments, it is important to realize that
different strategies have different levels of granularity. Configurations are set per shop and
they are deterministic; models can typically be trained across shops (for entire industries, for
example) and can leverage the latest zero-shot classifiers in case no labels are wanted or needed
[
        <xref ref-type="bibr" rid="ref16">46</xref>
        ]; heuristics are more case-specific, but in our experience they retain some degree of re-use:
moreover, heuristics can be used to train new classifiers (using, for example, weak supervision
[
        <xref ref-type="bibr" rid="ref20">50</xref>
        ]), which will in turn reduce the reliance on heuristics.
      </p>
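      <p>As a minimal sketch of the heuristics-to-classifier path, the snippet below mimics the weak-supervision idea of [50]: heuristics act as labeling functions whose majority-voted outputs become training pairs for a future model. The function names, the toy voting scheme, and the MATERIAL heuristics are illustrative assumptions on our part.

```python
from collections import Counter

def label_with_heuristics(text, labeling_functions):
    """Apply each heuristic; None means 'abstain'; majority vote decides."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None

# Two toy heuristics for a MATERIAL tag.
def lf_leather(text):
    return "leather" if "leather" in text.lower() else None

def lf_cotton(text):
    return "cotton" if "cotton" in text.lower() else None

# Weakly labeled pairs that could later train a classifier, reducing
# the runtime reliance on the heuristics themselves.
catalog = ["Leather ankle boots", "Cotton crew t-shirt", "Steel watch"]
training_data = [
    (text, label_with_heuristics(text, [lf_leather, lf_cotton]))
    for text in catalog
]
```

Items on which every heuristic abstains (the None labels) are simply dropped before training, so noisy-but-cheap rules bootstrap a model without any manual annotation.</p>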
      <p>
        Importantly, the very recent progress on large language models promises to greatly simplify
the actual building of a structured knowledge representation, offering even zero-shot graph
building from text [
        <xref ref-type="bibr" rid="ref28">58</xref>
        ]. While LLMs are still too slow, and not yet understood well enough, to
be directly involved in the runtime query path, they are definitely well suited to speed up the
offline component of our method (Fig. 1, sections 1 and 2 from the left).
      </p>
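      <p>As an illustration of how an LLM could slot into the offline component only, the sketch below performs zero-shot attribute extraction from a catalog description. Here llm_complete is a placeholder for any completion API (a stub stands in so the sketch is self-contained), and the prompt wording is our assumption.

```python
import json

# Hypothetical prompt template; the exact wording is an assumption.
PROMPT = (
    "Extract BRAND and COLOR from this product description and "
    "answer with a JSON object only:\n{description}"
)

def extract_attributes(description, llm_complete):
    """Run one offline extraction call and parse the JSON answer."""
    answer = llm_complete(PROMPT.format(description=description))
    return json.loads(answer)

# In the offline pipeline, llm_complete would wrap a real completion API;
# this stub keeps the sketch runnable.
def stub_llm(prompt):
    return '{"BRAND": "Acme", "COLOR": "red"}'

catalog_entry = "Acme red running sneakers with mesh upper"
attributes = extract_attributes(catalog_entry, stub_llm)
```

Since these calls happen at catalog-indexing time rather than on the query path, latency and occasional malformed answers are tolerable: failed parses can simply fall back to the strategies above.</p>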
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cramer-Flood</surname>
          </string-name>
          ,
          <source>Global Ecommerce</source>
          <year>2020</year>
          .
          <article-title>Ecommerce Decelerates amid Global Retail Contraction but Remains a Bright Spot</article-title>
          .,
          <year>2020</year>
          . URL: https://www.emarketer.com/content/global-ecommerce-2020.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ai</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. Narayanan.R</surname>
          </string-name>
          ,
          <article-title>Model-agnostic vs. model-intrinsic interpretability for explainable product search</article-title>
          ,
          <source>in: Proceedings of the 30th ACM International Conference on Information and Knowledge Management</source>
          , CIKM '21,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , p.
          <fpage>5</fpage>
          -
          <lpage>15</lpage>
          . URL: https://doi.org/10.1145/3459637.3482276. doi:
          <volume>10</volume>
          .1145/ 3459637.3482276.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Commerce</surname>
          </string-name>
          ,
          <article-title>How Ecommerce Site Search Can Create a Competitive Advantage</article-title>
          .,
          <year>2021</year>
          . URL: https://www.bigcommerce.com/articles/ecommerce/site-search/#the-effectiveness-of-ecommerce-site-search-.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Alaimo</surname>
          </string-name>
          ,
          <volume>87</volume>
          %
          <article-title>of shoppers now begin product searches online</article-title>
          .,
          <year>2018</year>
          . URL: https://www.retaildive.com/news/87-of-shoppers-now-begin-product-searches-online/530139/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Compton</surname>
          </string-name>
          ,
          <article-title>Searching For ROI In Retail: The Time For A New Site Search Tool Is Now</article-title>
          ,
          <year>2021</year>
          . URL: https://www.forrester.com/blogs/searching-for-roi-in-retail-the-time-for-a-new-site-search-tool-is-now/?categoryid=a89c0000000AKp1AAG.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tagliabue</surname>
          </string-name>
          ,
          <article-title>Language in a (search) box: Grounding language learning in real-world human-machine interaction</article-title>
          ,
          <source>in: Proceedings of the</source>
          <year>2021</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</article-title>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>4409</fpage>
          -
          <lpage>4415</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.348. doi:10.18653/v1/2021.naacl-main.348.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Data recombination for neural semantic parsing</article-title>
          ,
          <source>ArXiv abs/1606</source>
          .03622 (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tagliabue</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Yu,</surname>
          </string-name>
          <article-title>Query2Prod2Vec: Grounded word embeddings for eCommerce, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, Association for Computational Linguistics</article-title>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>154</fpage>
          -
          <lpage>162</lpage>
          . URL: https://aclanthology.org/2021.naacl-industry.20. doi:10.18653/v1/2021.naacl-industry.20.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Improving text-to-sql with schema dependency learning</article-title>
          ,
          <source>ArXiv abs/2103</source>
          .04399 (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Explainable recommendation: A survey and new perspectives</article-title>
          ,
          <source>Found. Trends Inf. Retr</source>
          .
          <volume>14</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          , Choppy:
          <article-title>Cut transformer for ranked list truncation</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '20,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , p.
          <fpage>1513</fpage>
          -
          <lpage>1516</lpage>
          . URL: https://doi.org/10.1145/3397271. 3401188. doi:
          <volume>10</volume>
          .1145/3397271.3401188.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tagliabue</surname>
          </string-name>
          , Applied Research at Reasonable Scale, https://medium.com/the-techlife/applied-research-at-reasonable-scale-8a74d2beed89
          ,
          <year>2022</year>
          . [Online; accessed 19-Feb-2023].
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hodges</surname>
          </string-name>
          , Model Theory, in: E. N.
          <string-name>
            <surname>Zalta</surname>
          </string-name>
          (Ed.),
          <source>The Stanford Encyclopedia of Philosophy</source>
          , Spring 2022 ed., Metaphysics Research Lab, Stanford University,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Building a semantic parser overnight</article-title>
          ,
          <source>in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <source>Association for Computational Linguistics</source>
          , Beijing, China,
          <year>2015</year>
          , pp.
          <fpage>1332</fpage>
          -
          <lpage>1342</lpage>
          . URL: https://aclanthology.org/P15-1129. doi:
          <volume>10</volume>
          .3115/v1/
          <fpage>P15</fpage>
          - 1129.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Karnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jhala</surname>
          </string-name>
          ,
          <article-title>Product classification in e-commerce using distributional semantics</article-title>
          ,
          <source>in: COLING</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>P.</given-names>
            <surname>Chia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Attanasio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Terragni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Magalhães</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tagliabue</surname>
          </string-name>
          ,
          <article-title>Contrastive language and vision learning of general fashion concepts</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>12</volume>
          (
          <year>2022</year>
          ).
          <source>doi:10.1038/s41598-022-23052-9.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of hallucination in natural language generation</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Liang,
          <article-title>Evaluating verifiability in generative search engines</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>A.</given-names>
            <surname>Drozdov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Scharli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Akyuurek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Scales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bousquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Compositional semantic parsing with large language models</article-title>
          ,
          <source>ArXiv abs/2209</source>
          .15003 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Ratner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Ehrenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Fries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          , Snorkel:
          <article-title>Rapid training data creation with weak supervision</article-title>
          ,
          <source>Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases 11</source>
          <volume>3</volume>
          (
          <year>2017</year>
          )
          <fpage>269</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soltan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hamza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Safari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Damonte</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Groves</surname>
          </string-name>
          , Clasp:
          <article-title>Few-shot cross-lingual data augmentation for semantic parsing</article-title>
          ,
          <source>in: AACL-IJCNLP</source>
          <year>2022</year>
          ,
          <year>2022</year>
          . URL: https://www.amazon.science/publications/clasp-few-shot-cross-lingual-data-augmentation-for-semantic-parsing.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Laferty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. C. N.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          ,
          <source>in: Proceedings of the Eighteenth International Conference on Machine Learning</source>
          , ICML '
          <fpage>01</fpage>
          , Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>2001</year>
          , p.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Joint extraction of entities and relations based on a novel tagging scheme</article-title>
          ,
          <source>in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics, Vancouver, Canada,
          <year>2017</year>
          , pp.
          <fpage>1227</fpage>
          -
          <lpage>1236</lpage>
          . URL: https://aclanthology.org/P17-1113. doi: 10.18653/v1/P17-1113.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>Word meaning in minds and machines</article-title>
          ,
          <source>Psychological Review</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Chia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tagliabue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <article-title>“Does it come in black?” CLIP-like models are zero-shot recommenders</article-title>
          ,
          <source>in: Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
          . URL: https://aclanthology.org/2022.ecnlp-1.22. doi: 10.18653/v1/2022.ecnlp-1.22.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Goutam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>QUEEN: Neural query rewriting in e-commerce</article-title>
          ,
          <source>in: The Web Conference 2021</source>
          ,
          <year>2021</year>
          . URL: https://www.amazon.science/publications/queen-neural-query-rewriting-in-e-commerce.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>S.</given-names>
            <surname>Farzana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <article-title>Knowledge graph-enhanced neural query rewriting</article-title>
          ,
          <source>in: Companion Proceedings of the ACM Web Conference 2023</source>
          , WWW '23 Companion, Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , pp.
          <fpage>911</fpage>
          -
          <lpage>919</lpage>
          . URL: https://doi.org/10.1145/3543873.3587678. doi: 10.1145/3543873.3587678.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>V.</given-names>
            <surname>Shenoy</surname>
          </string-name>
          , GraphGPT, https://github.com/varunshenoy/graphgpt,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>