<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards AI-Powered Multi-Modal Generative OLAP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sandro Bimonte</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chant Boyadjian</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Rizzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sana Sellami</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DISI, University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIS, Aix Marseille University</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TSCF, INRAE - University of Clermont Auvergne</institution>
          ,
          <addr-line>Aubière</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Recent advances in AI allow decision makers to move beyond traditional analysis towards sophisticated decision-making tasks that require human intuition and perception. This opens the door to a novel form of OLAP, one that integrates unstructured data (such as text and images) into its analytical workflows. In this work we propose OLAP-AI, an enriched form of the OLAP paradigm that supports, besides the analysis of categorical and numeric data, semantically rich operations over free text and images. OLAP-AI is multi-modal, in that it supports crossed semantic searches between text and images for filtering and grouping. Moreover, it significantly extends aggregation by operating on text and images rather than on numeric data only, and by relying on generative models. To investigate the technical feasibility of the OLAP-AI paradigm, we provide a proof-of-concept built on the open-source vector DBMS Weaviate.</p>
      </abstract>
      <kwd-group>
        <kwd>AI</kwd>
        <kwd>OLAP</kwd>
        <kwd>Multi-modal databases</kwd>
        <kwd>Vector databases</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and motivation</title>
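      <p>[Figure 1: conceptual schema of the enriched ORDER LINE cube; surviving labels include year, month, date, restaurant, city, region, country, restaurant description, restaurant picture, dish, and standard dish picture.]</p>
      <p>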
classification, and sentence extraction for automatic summarization. It ranks each word or sentence
based on its importance to the entire text. Similarly, a few works have studied the use of images in
OLAP analyses. For instance, iCube [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] extends a traditional data warehouse by including a dimension
table that stores intrinsic image features as vectors, which are then used in similarity-based filtering queries.
Overall, these works show an interest in using unstructured data for OLAP analyses to enhance the
decision-making process. However, they do not support a tight integration of multi-modal data within
the same conceptual and technological framework, nor do they extend OLAP operators to cope with
text, images, and numeric data in a seamless way. Moreover, the aggregation functions provided by
these works are limited to basic data summarization methods based on statistical and data mining approaches.
      </p>
      <p>
        We claim that the advent of AI models opens the door to a novel form of OLAP, one that integrates
unstructured data (such as text and images) into its analytical workflows and enables multi-modal
selection and grouping, as well as generative aggregation, of these data. Thus, in this work we propose a novel vision
of the OLAP paradigm, which we call OLAP-AI and which aims to extend the expressive power of OLAP to
support, besides the analysis of categorical and numeric data, semantically rich operations over
free text and images. Our approach can be labeled as multi-modal, in that it supports crossed semantic
searches between text and images for filtering and grouping [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Remarkably, it extends aggregation in
two ways: (i) with respect to classical OLAP, by operating on text and images rather than on numeric
data only; (ii) with respect to the approaches to text/image-based OLAP mentioned above, by relying
on generative models. From a technical point of view, implementing this paradigm shift requires
the integration of multi-modal and generative AI models into DBMSs (Database Management Systems).
One promising direction for achieving this integration is to rely on the emerging technology of
vector databases, which provide a practical foundation for indexing, searching, and aggregating
high-dimensional representations of diverse data types [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To investigate the feasibility of this solution, we
provide a proof-of-concept for OLAP-AI by relying on the vector DBMS Weaviate.
      </p>
      <p>The paper is structured as follows. Section 2 introduces the OLAP-AI paradigm, while Section 3
describes our proof-of-concept implementation. Section 4 closes the paper by discussing the main open
challenges for research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Multi-modal generative OLAP</title>
      <p>This section describes the OLAP-AI paradigm we envision for multidimensional analysis of structured
and unstructured data. We illustrate OLAP-AI with a multidimensional cube that models all orders
placed across a chain’s restaurants; its conceptual schema is shown in Figure 1.</p>
      <p>
        OLAP analyses are commonly based on the multidimensional model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], whose first-class citizens are
facts to be analyzed by means of dimensions. Dimensions are organized in hierarchies with nested levels,
and facts are described by numeric measures to be aggregated by means of basic aggregation functions
(e.g., SUM, AVG, MIN). The instances of levels, called members, are categorical textual values. The
conceptual schema for our ORDER LINE fact features four dimensions:
• date, i.e., the order date, with levels day, month, and year.
• customer, i.e., the account of the customer who placed each order.
• dish, whose members correspond to all the dishes offered by the restaurant chain. Note that each
dish offered by multiple restaurants corresponds to a single member of dish.
• restaurant, whose members are the chain’s restaurants with their geographical locations
and descriptions.
      </p>
      <p>[Figure 2: examples of enriched data; surviving text: the customer review “Sweet, but the price felt slightly high for what it was”.]</p>
      <p>Each order line is described by some numeric measures: quantity, unit price, and amount (computed
as quantity × unit price). This schema supports classical OLAP queries such as “What is the total and
average amount of the orders placed each year by each customer in Paris?”</p>
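      <p>As a baseline for OLAP-AI, this classical query can be sketched as a filter, a grouping, and two aggregation functions over order lines that are assumed to be already joined with their dimensions. This is a minimal plain-Python illustration, not part of the authors' prototype: the row layout and sample values are hypothetical, “in Paris” is read as the city of the restaurant, and the amount of an order is taken as the sum of quantity × unit price over its lines before averaging over orders.</p>
      <preformat>
```python
from collections import defaultdict
from statistics import mean

# Hypothetical flattened ORDER LINE rows: one dict per order line, already
# joined with the date, customer, and restaurant dimensions.
order_lines = [
    {"order_id": 1, "year": 2024, "customer": "c1", "city": "Paris", "quantity": 2, "unit_price": 12.0},
    {"order_id": 1, "year": 2024, "customer": "c1", "city": "Paris", "quantity": 1, "unit_price": 6.0},
    {"order_id": 2, "year": 2024, "customer": "c1", "city": "Paris", "quantity": 1, "unit_price": 10.0},
    {"order_id": 3, "year": 2025, "customer": "c2", "city": "Paris", "quantity": 3, "unit_price": 8.0},
    {"order_id": 4, "year": 2025, "customer": "c2", "city": "Lyon", "quantity": 1, "unit_price": 9.0},
]


def total_and_average_per_year_customer(lines, city):
    """Return {(year, customer): (total, average)} over the orders placed in `city`."""
    # Filter (WHERE city = ...) and compute each order's amount as the sum of
    # quantity * unit_price over its lines.
    order_amount = defaultdict(float)
    order_key = {}
    for line in lines:
        if line["city"] != city:
            continue
        order_amount[line["order_id"]] += line["quantity"] * line["unit_price"]
        order_key[line["order_id"]] = (line["year"], line["customer"])

    # Group orders by (year, customer), as a GROUP BY would.
    groups = defaultdict(list)
    for order_id, amount in order_amount.items():
        groups[order_key[order_id]].append(amount)

    # Aggregate each group with SUM and AVG.
    return {key: (sum(amounts), mean(amounts)) for key, amounts in groups.items()}


print(total_and_average_per_year_customer(order_lines, "Paris"))
# {(2024, 'c1'): (40.0, 20.0), (2025, 'c2'): (24.0, 24.0)}
```
      </preformat>
      <p>Everything in this query is categorical or numeric; the OLAP-AI queries that follow keep this skeleton but replace some filters, groupings, and aggregations with operations over text and images.</p>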
      <p>In OLAP-AI, this multidimensional schema could be enriched with text and images as shown (in
green) in Figure 1. For instance:
• each dish could be associated with a standard picture of that dish;
• each restaurant could have a textual description, and a picture showing whether it is outdoor or
indoor (see Figure 2, right);
• each order line could have two additional non-numeric measures: the picture of the dish actually
served to the customer and a textual review given by the customer (see Figure 2, left).
To illustrate how OLAP-AI leverages multimodal data and generative AI models, we showcase a few
representative queries enabled by the enriched schema:
• Q1. For each city and year, only for the orders of burgers placed in restaurants with a terrace, show
the average amount paid, the average unit price, and the dish picture that is most similar to the
standard burger picture offered by the chain. Here, the filter selects the restaurants with a terrace
by means of a multi-modal semantic search (search stored images based on input text) on the
restaurant pictures. In addition, an aggregation operator is applied to the dish picture measure to
return the burger picture most similar to the standard one (search stored images based on input
image).
• Q2. For each type of restaurant (indoor or outdoor), show a representative image of the burgers
ordered. To classify restaurants as indoor or outdoor, their pictures are analyzed via multi-modal
semantic search (search stored images based on input text). The aggregation operator returns, for
each group of pictures, a single picture that emphasizes the common visual characteristics of the
burgers ordered, computed via a generative model.
• Q3. For each city, year, and restaurant category (fast food or classic), return a summary of
the customer reviews and the average amount paid, but only for restaurants considered expensive.
Restaurants are considered expensive if at least one of their reviews mentions a high price,
while the restaurant category is inferred from the restaurant’s textual description. Both tasks are
achieved via a semantic search (search stored text based on input text). Aggregation involves
both numeric data (the average amount) and text (the summary of customer reviews obtained via
a generative model).</p>
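      <p>The two kinds of semantic search used in Q1 (text-to-image for filtering, image-to-image for aggregation) can be illustrated with cosine similarity over embeddings that live in a shared text-image space. The sketch below is hypothetical: the three-dimensional vectors stand in for real CLIP-style embeddings, the similarity cutoff (0.8) is arbitrary, the function names are ours, and the grouping by city and year is omitted.</p>
      <preformat>
```python
import math

# Hypothetical embeddings in a shared text-image space. In OLAP-AI they would be
# produced by a multi-modal model such as CLIP; here they are tiny hand-written
# vectors so that the example runs on its own.
terrace_query = [0.9, 0.1, 0.0]  # stands for the embedding of the text "restaurant with a terrace"

restaurant_pictures = {  # restaurant_id -> embedding of the restaurant picture
    1: [0.8, 0.2, 0.1],
    2: [0.1, 0.9, 0.2],
    3: [0.7, 0.0, 0.3],
}
dish_pictures = {  # order_line_id -> (restaurant_id, embedding of the served burger picture)
    10: (1, [0.6, 0.6, 0.1]),
    11: (2, [0.2, 0.8, 0.45]),
    12: (3, [0.2, 0.8, 0.5]),
    13: (3, [0.9, 0.1, 0.2]),
}
standard_burger = [0.2, 0.8, 0.45]  # embedding of the chain's standard burger picture


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_filter(query, candidates, min_similarity):
    """Text-to-image search (WHERE): ids whose picture is close to the query text."""
    return {cid for cid, vec in candidates.items()
            if cosine_similarity(query, vec) >= min_similarity}


def most_similar(reference, vectors):
    """Image-to-image aggregation (SELECT): id of the picture closest to `reference`."""
    return max(vectors, key=lambda k: cosine_similarity(reference, vectors[k]))


with_terrace = semantic_filter(terrace_query, restaurant_pictures, 0.8)
served = {oid: vec for oid, (rid, vec) in dish_pictures.items() if rid in with_terrace}
print(sorted(with_terrace), most_similar(standard_burger, served))  # [1, 3] 12
```
      </preformat>
      <p>Order line 11 carries a picture identical to the standard burger, yet it is not returned: restaurant 2 fails the text-to-image filter, and, as with a WHERE clause, filtering happens before aggregation.</p>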
      <p>Table 1 summarizes the use of images and text in each query for filtering (WHERE clause), grouping
(GROUP BY clause), and aggregating (SELECT clause). Note that, in the table, the terms ‘categorical’
and ‘numeric’ refer to the standard way of specifying filtering, grouping, and aggregation in OLAP.
Notably, semantic search is applied over multi-modal data to filter, group, and aggregate multidimensional
data based on information (such as the presence of a terrace) that is not explicitly stored in the cube in
categorical form. Moreover, generative AI models extend classical (numeric) aggregation
to images (e.g., by creating a representative image) and text (e.g., by creating review summaries).</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proof-of-concept</title>
      <p>In this section we give a proof-of-concept for the OLAP-AI paradigm by describing a prototype
implementation of the enriched ORDER LINE cube and of the queries described in the previous section. The
overall data pipeline is sketched in Figure 3.</p>
      <p>Data is stored using Weaviate (https://weaviate.io), an open-source vector database designed for
storing and searching data via machine learning-generated vector embeddings. Weaviate supports
efficient semantic search, i.e., users can retrieve information based on the meaning of queries rather
than on exact keyword matches. With its built-in support for various AI models, Weaviate is well suited
to applications involving natural language processing, image search, and recommendation. Its
scalability and flexibility make it a suitable basis for building AI-powered search and data
analysis solutions.</p>
      <p>
        To implement multi-modal filters, we used CLIP (Contrastive Language-Image Pre-training) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a
multi-modal neural model developed by OpenAI that learns to associate images with their corresponding textual
descriptions; it was trained on a dataset of 400 million image-text pairs collected from the internet. To
compute text and image similarities, we used the near_media and near_text functions of Weaviate.
For generative textual aggregation, we used the generate.near_text function, which combines
semantic text search with a text generation step using an LLM. Finally, since image generation
is not yet supported by Weaviate, for this task we used an external application called ComfyUI.
      </p>
      <p>
        To store the cube data, we have adopted a shattered schema [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and created one document collection
per dimension plus one for the fact, as shown in Figure 4. Weaviate does not natively support joins,
so we used the Pandas library to manually implement them. Images and textual descriptions were
transformed into embeddings using Weaviate functions.
      </p>
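      <p>A minimal sketch of the shattered layout and of a manual join, assuming each collection has been exported as a list of property dictionaries. The collection contents are hypothetical, and the left_join helper is our own stand-in for what pandas.merge(..., how="left") does; unlike Pandas, it leaves the right-hand columns absent rather than filling them with NaN.</p>
      <preformat>
```python
# Hypothetical shattered schema: one collection per dimension plus one for the
# fact, each exported as a list of property dictionaries.
location = [
    {"restaurant_id": 1, "city": "Paris", "restaurant_description": "Burger bar"},
    {"restaurant_id": 2, "city": "Lyon", "restaurant_description": "Bistro"},
]
order_date = [
    {"order_id": 100, "day": "2024-05-01", "month": "2024-05", "year": 2024},
    {"order_id": 101, "day": "2025-01-12", "month": "2025-01", "year": 2025},
]
order_line = [
    {"order_id": 100, "restaurant_id": 1, "amount": 24.0},
    {"order_id": 101, "restaurant_id": 2, "amount": 15.0},
    {"order_id": 102, "restaurant_id": 3, "amount": 9.0},  # no matching restaurant or date
]


def left_join(left, right, key):
    """Hash-based left join on `key`: every left row is kept; matching right
    rows contribute their properties (missing matches add nothing)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        matches = index.get(row[key], [])
        if not matches:
            joined.append(dict(row))  # unmatched row kept as-is
        for match in matches:
            joined.append({**row, **match})  # one output row per match
    return joined


facts = left_join(left_join(order_line, location, "restaurant_id"), order_date, "order_id")
for row in facts:
    print(row)
```
      </preformat>
      <p>Because the vector DBMS offers no join operator, this work moves to the client side; with large collections, the cost of exporting objects and building the hash index becomes part of every query.</p>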
      <p>Overall, to design and feed the enriched LINE ORDER cube with Weaviate, we followed two steps:
1. Multidimensional design. Here, we created the collections of dimensions and facts according
to the shattered schema and configured the vectorization task. In Weaviate, vectorization is
configured during the initialization of a collection using the vectorizer_config parameter,
which accepts an array of vectorizer configurations. The configuration parameters depend on the
chosen module.
2. Data loading and vectorization. We fed the shattered schema with both structured alphanumeric
data and unstructured data in their native forms, together with their corresponding vectors.
A vector is a numeric representation that captures the semantic content of one or more data
fields. These vectors are produced by machine learning models called vectorizers, which
transform unstructured data into embeddings. In Weaviate, unstructured data are stored both in
their original form and as vectors. Similarity searches are performed on these vectors, enabling
comparison and retrieval based on semantic closeness rather than exact matches. This makes it possible
to compute, for example, the similarity between two texts, two images, or even an image and a
text. This feature relies on the multi-modal approach of Weaviate, where different data
modalities are embedded in the same vector space. During
data ingestion, vector embeddings are automatically created and stored alongside the raw data
in the database. For our proof-of-concept, we have configured two types of vectorizers: (i) the
text2vec_ollama module, which relies on the nomic-embed-text model to generate vectors
from the textual fields restaurant_description of Location and review of Order Line; (ii) the
multi2vec_clip module, which applies the CLIP model to the image fields restaurant_image
of Location and dish_picture of Dish.</p>
      <p>To implement the OLAP-AI queries introduced in Section 2, we followed four sequential steps:
1. Filtering, aimed at filtering the data based on conditions applied to hierarchy levels using native
Weaviate operators. This step takes a collection as input and returns only the elements that
satisfy the conditions.
2. Join, which merges the data of the dimension and fact collections required to execute the query. Joins
are implemented using the merge function of the Pandas library. The input is a set of DataFrames
constructed by extracting the properties of objects retrieved from Weaviate collections and by
converting Python dictionaries into lists of tuples. The output is a single DataFrame that combines
all the data.
3. Grouping and numeric aggregation, where the data is grouped by specified hierarchy levels and
numeric aggregation functions (such as average or sum) are applied to selected measures. The
implementation uses the groupby method of Pandas, which takes the merged DataFrame as
input and produces a new DataFrame containing grouped and aggregated results.
4. Multi-modal aggregation. Different methods and tools are used to aggregate data according to
their type. For generative textual aggregation (as in Q3) we use RAG queries, while to find the
most similar image to a given one (as in Q1) we use the semantic search provided by Weaviate.
Generative image aggregation (as in Q2) is handled with ComfyUI, since Weaviate does not
support the integration of generative models for images.</p>
      <p>To illustrate the query implementation, in the remainder of this section we explain how Q3 has
been implemented using the Python client of Weaviate. Notably, this query requires the use of the
prompting methods offered by Weaviate to generate information that is not explicitly stored in the
cube, so as to enable textual grouping and aggregation. Weaviate provides a specific kind of query for
prompting, called RAG (Retrieval Augmented Generation) query. When a collection is initialized, a
generative model can be configured to enable RAG queries that accept prompts. The configuration
includes specifying an API endpoint and the model name, which in our case is llama3:8b. A RAG
query in Weaviate first performs a similarity search to retrieve relevant objects, then applies a prompt
on the search results using a generative AI model. Weaviate supports two main types of RAG queries: (i)
single prompt, which generates one response per returned object, and (ii) grouped task, which generates
a single response for the entire set of returned objects. Here are the steps we followed:
1. Firstly, we implemented the WHERE clause (filtering step). To identify the restaurants
considered expensive, we performed a text semantic search using the near_text operator
to return the orders whose review embedding in the Order Line collection has a small
distance from that of the text “high price”.
2. Query Q3 asks to group order lines by restaurant category (fast food or classic). To this end,
we wrote a single prompt RAG query over the Location collection; in this query, a prompt (shown in
Figure 5) asks the model to classify restaurants as “fast food” or “classic”, and the
near_text operator is applied to the embeddings of textual restaurant descriptions. The result consists of
a collection of objects, each containing the generated classification result and the corresponding
restaurant’s metadata. Then, we mapped each restaurant_id to its corresponding class and stored
the mapping in a dictionary called category_map.
3. Once data has been classified for grouping, three successive left joins must be executed. Firstly, we
joined the restaurants considered expensive with their orders through the restaurant identifiers.
Then, we joined each order in the result with its date, month, and year, using the relevant order
identifiers. Finally, we joined the generated restaurant category (either fast food or classic) with
the result of the second join, using the restaurant identifiers.
4. At this stage, grouping and numeric aggregation could be implemented. We grouped order lines
by city, year, and restaurant category and computed the average amount for each group.
5. Finally, Q3 requires textual aggregation, which was implemented by generating a summary of
the dish reviews through a grouped task RAG query with the near_text operator (the prompt
is shown in Figure 5).</p>
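      <p>Steps 3 and 4 above can be sketched in plain Python once the outputs of the first two steps are available. In this hypothetical sketch, dictionary lookups stand in for the three Pandas left joins, all identifiers, cities, years, and amounts are toy values rather than the paper's data, and the textual summaries of step 5, which require a generative model, are left out.</p>
      <preformat>
```python
from collections import defaultdict
from statistics import mean

# Toy outputs of steps 1 and 2 (hypothetical values, not the paper's data).
expensive_restaurants = {2, 3, 4, 5}  # step 1: restaurants with a review near "high price"
category_map = {2: "Classic", 3: "Fast Food", 4: "Classic", 5: "Classic"}  # step 2

# Toy dimension data: restaurant_id -> city, order_id -> year.
restaurant_city = {1: "Paris", 2: "Paris", 3: "Lyon", 4: "Lyon", 5: "Paris"}
order_year = {100: 2024, 101: 2024, 102: 2025, 103: 2025, 104: 2025}

# Toy ORDER LINE facts: (order_id, restaurant_id, amount).
order_lines = [(100, 2, 30.0), (101, 5, 20.0), (104, 1, 12.0), (102, 3, 9.0), (103, 3, 11.0)]

# Step 3: attach city, year, and category to each order line of an expensive
# restaurant (the dictionary lookups play the role of the joins).
# Step 4: group by (city, year, category) and average the amount.
groups = defaultdict(list)
for order_id, restaurant_id, amount in order_lines:
    if restaurant_id not in expensive_restaurants:
        continue  # restaurant 1 is not expensive in this toy data
    key = (restaurant_city[restaurant_id], order_year[order_id], category_map[restaurant_id])
    groups[key].append(amount)

avg_amount = {key: mean(amounts) for key, amounts in groups.items()}
print(avg_amount)  # {('Paris', 2024, 'Classic'): 25.0, ('Lyon', 2025, 'Fast Food'): 10.0}
```
      </preformat>
      <p>The generated category behaves exactly like a categorical level once category_map exists, which is why grouping on it needs nothing beyond an ordinary GROUP BY.</p>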
      <p>[Figure 5, classification prompt:] “Classify the following restaurant description {restaurant_description} into one of these two types
only: Fast Food or Classic. Return only the classified type.”</p>
      <p>[Figure 5, summarization prompt:] “Provide a short summary of the essential comments, using your own words, starting immediately
with the summary and without adding any prefix or information not present in the reviews.”</p>
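      <p>The difference between the two RAG modes can be mimicked without a database: a single prompt is filled once per retrieved object, whereas a grouped task sends one prompt whose context lists all retrieved objects. In this hypothetical sketch, fake_llm is a keyword rule standing in for llama3:8b, the retrieved descriptions are invented, and only the classification prompt text is taken from Figure 5. Filling the {restaurant_description} placeholder with Python's str.format is our assumption about the mechanics, not Weaviate's own templating code.</p>
      <preformat>
```python
from typing import Callable, Dict, List, Tuple

# Hypothetical objects returned by the similarity-search step of a RAG query.
retrieved = [
    {"restaurant_id": 1, "restaurant_description": "Quick burgers and fries ordered at the counter."},
    {"restaurant_id": 2, "restaurant_description": "Seasonal menu served at the table."},
]

# Classification prompt of Figure 5; {restaurant_description} is a property placeholder.
CLASSIFY = ("Classify the following restaurant description {restaurant_description} "
            "into one of these two types only: Fast Food or Classic. "
            "Return only the classified type.")

prompts_sent: List[str] = []  # records every prompt, to show how many calls each mode makes


def fake_llm(prompt: str) -> str:
    """Stand-in for the generative model: a keyword rule, not a real LLM."""
    prompts_sent.append(prompt)
    return "Fast Food" if "counter" in prompt else "Classic"


def single_prompt(objects: List[dict], template: str,
                  generate: Callable[[str], str]) -> List[Tuple[dict, str]]:
    """One generation per object, with placeholders filled from its properties."""
    return [(obj, generate(template.format(**obj))) for obj in objects]


def grouped_task(objects: List[dict], task: str, properties: List[str],
                 generate: Callable[[str], str]) -> str:
    """One generation for the whole result set, which is passed as context."""
    context = "\n".join("; ".join(f"{p}: {obj[p]}" for p in properties) for obj in objects)
    return generate(f"{task}\n\n{context}")


category_map: Dict[int, str] = {
    obj["restaurant_id"]: label for obj, label in single_prompt(retrieved, CLASSIFY, fake_llm)
}
print(category_map)  # {1: 'Fast Food', 2: 'Classic'}
print(len(prompts_sent))  # 2 (one call per object)

grouped_task(retrieved, "Summarize these descriptions.", ["restaurant_description"], fake_llm)
print(len(prompts_sent))  # 3 (a single extra call for the whole group)
```
      </preformat>
      <p>The call counts make the trade-off visible: a single prompt scales with the number of retrieved objects, whereas a grouped task makes one call but must fit every object into a single context window.</p>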
      <p>One point deserves further consideration. The near_text operator, which in Q3 computes
the distance between the embedding of each review and the embedding of the text “high price”, ranks the
reviews from the nearest to the farthest one. If no threshold were provided, this ranking would also
include reviews that do not actually mention a high price. Thus, to ensure that the retrieved reviews
are truly relevant, we specified a maximum distance (0.55) using the distance parameter. This value
could not be determined automatically; we had to select an appropriate threshold by observing the
distances of reviews known to mention a high price. The results of Q3 are shown in Figure 6. All
restaurants are represented except the one with identifier 1, as it has no review mentioning a high price.
The restaurants with identifiers 5, 4, and 2 were classified as classic, while the one with identifier 3 was
classified as fast food.</p>
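      <p>The threshold issue can be made concrete with a few made-up distances. The sketch assumes cosine distance (1 minus cosine similarity) and mimics a distance-capped nearest-neighbour search on pre-computed values. The calibrate_max_distance rule, which takes the largest distance among reviews already known to mention a high price, is one hypothetical way to systematize the manual inspection described above; it is not the procedure that produced the 0.55 value.</p>
      <preformat>
```python
# Hypothetical cosine distances (1 - cosine similarity) between each review
# embedding and the embedding of the text "high price".
review_distances = {"r1": 0.41, "r2": 0.52, "r3": 0.63, "r4": 0.71, "r5": 0.48}

# Reviews that a human has inspected and knows to mention a high price.
known_high_price = {"r1", "r2", "r5"}


def calibrate_max_distance(distances, known_relevant):
    """Smallest cap that still admits every review known to be relevant."""
    return max(distances[r] for r in known_relevant)


def within_distance(distances, max_distance):
    """Rank reviews nearest first, then keep those within the distance cap."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [rid for rid, dist in ranked if max_distance >= dist]


threshold = calibrate_max_distance(review_distances, known_high_price)
print(threshold, within_distance(review_distances, threshold))  # 0.52 ['r1', 'r5', 'r2']
print(within_distance(review_distances, 0.55))  # ['r1', 'r5', 'r2']
print(within_distance(review_distances, 1.0))   # no effective cap: all five reviews
```
      </preformat>
      <p>Calibrating on positive examples alone says nothing about false positives: any review that happens to fall within the cap is admitted, so a few reviews known not to mention a high price should also be checked against the chosen value.</p>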
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and open challenges</title>
      <p>Recent advances in AI offer a compelling research opportunity to reimagine OLAP beyond its traditional
focus on structured (categorical and numeric) data. The integration of multi-modal and generative AI
models into DBMSs, as supported by vector DBMSs, gave us the opportunity to envision OLAP-AI, a significant
enrichment of the OLAP paradigm that makes it capable of seamlessly handling semantically rich queries
over text, images, and numeric data within a unified analytical framework. Our approach moves beyond
basic summarization techniques, enabling more insightful and flexible decision-making processes. To
assess its feasibility, we have provided a proof-of-concept implementation using Weaviate, a vector database that
supports semantic indexing and retrieval of multi-modal data.</p>
      <p>Several open research challenges remain:
• Formalization: To support our OLAP-AI paradigm, the multidimensional model must be extended
and the OLAP operators must be formally redefined. The use of AI will allow new selection
and grouping levels to be defined at query time (as done, for instance, in Q2 by introducing a
restaurantType level on-the-fly); in the same way, it will allow the separation between measures
and dimensions, which is a foundation of the classical multidimensional model, to be overcome
(e.g., by classifying order lines into popular and unpopular based on their reviews).
• Design and implementation: Traditional OLAP systems rely on structured, relational schemas
with well-defined dimensions and measures. Adopting vector databases, which rely on
high-dimensional embeddings, requires redefining how unstructured data is represented, stored, and
queried. In this work we used a vector database since it natively supports the integration of AI
models. However, recent efforts aim to extend relational databases to support
vectors and AI models; an example is PgAI (https://github.com/timescale/pgai), an
open-source extension of PostgreSQL that integrates AI directly in the DBMS.
• Performance and optimization: While generally well handled by vector-based architectures,
scalability raises some challenges when dealing with large and heterogeneous datasets, particularly in
maintaining consistent performance across diverse data modalities. The design and optimization
of new indexing strategies for multi-modal data (especially when combining textual, visual, and
numeric information) is an active research area, together with the definition, maintenance, and
update of materialized views in a vector database context.
• Querying: OLAP query languages (such as MDX or SQL-based variants) are not natively designed
to express semantic or similarity-based queries. Extending these languages to support vector
search operations, such as nearest-neighbor retrieval or semantic joins, requires careful abstraction
and standardization.
• Risks: Using AI models to produce (possibly generative) outputs in analytical settings raises risks,
such as hallucinations and bias amplification, whose impact on the decision-making process must
be investigated.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>No Generative AI tools have been used during the preparation of this work.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>This work was partially supported by the French projects 22-PEAE-0007, ANR-16-IDEX-0001, and
ANR-24-CHR4-0004-0.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Eigner</surname>
          </string-name>
          , T. Händler,
          <article-title>Determinants of LLM-assisted decision-making</article-title>
          ,
          <source>arXiv 2402.17385</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Golfarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <source>Data Warehouse Design: Modern Principles and Methodologies</source>
          ,
          <publisher-name>McGraw-Hill</publisher-name>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bouakkaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ouinten</surname>
          </string-name>
          , S. Loudcher,
          <article-title>OLAP textual aggregation approach using the Google similarity distance</article-title>
          ,
          <source>International Journal of Business Intelligence and Data Mining</source>
          <volume>11</volume>
          (
          <year>2016</year>
          )
          <fpage>31</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Pérez-Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Berlanga-Llavori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Aramburu-Cabo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <article-title>Contextualizing data warehouses with documents</article-title>
          ,
          <source>Decision Support Systems</source>
          <volume>45</volume>
          (
          <year>2008</year>
          )
          <fpage>77</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Park</surname>
          </string-name>
          , H. Han,
          <string-name>
            <surname>I. Song</surname>
          </string-name>
          ,
          <article-title>XML-OLAP: A multidimensional analysis framework for XML warehouses</article-title>
          ,
          <source>in: Proc. DaWaK</source>
          , Copenhagen, Denmark,
          <year>2005</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , J. Han,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Text cube: Computing IR measures for multidimensional text database analysis</article-title>
          ,
          <source>in: Proc. ICDM</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>905</fpage>
          -
          <lpage>910</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          , P. Tarau,
          <article-title>TextRank: Bringing order into text</article-title>
          ,
          <source>in: Proc. EMNLP</source>
          , Barcelona, Spain,
          <year>2004</year>
          , pp.
          <fpage>404</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Annibal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Felipe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Ciferri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Ciferri</surname>
          </string-name>
          ,
          <article-title>iCube: A similarity-based data cube for medical images</article-title>
          ,
          <source>in: Proc. CBMS</source>
          , Perth, Australia,
          <year>2010</year>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>A survey of multimodal retrieval-augmented generation</article-title>
          ,
          <source>CoRR abs/2504.08748</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Survey of vector database management systems</article-title>
          ,
          <source>The VLDB Journal</source>
          <volume>33</volume>
          (
          <year>2024</year>
          )
          <fpage>1591</fpage>
          -
          <lpage>1615</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          , et al.,
          <article-title>Learning transferable visual models from natural language supervision</article-title>
          ,
          <source>in: Proc. ICML</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>8748</fpage>
          -
          <lpage>8763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chevalier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Malki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopliku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Teste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tournier</surname>
          </string-name>
          ,
          <article-title>Document-oriented models for data warehouses</article-title>
          ,
          <source>in: Proc. ICEIS</source>
          , Rome, Italy,
          <year>2016</year>
          , pp.
          <fpage>142</fpage>
          -
          <lpage>149</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>