<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SIGIR Workshop on eCommerce</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3308558.3313417</article-id>
      <title-group>
        <article-title>Graph Lookalike Model with LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rumana Ferdous Munne</string-name>
          <email>rumanaferdous.munne@riken.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Mostafizur Rahman</string-name>
          <email>mdmostafizu.a.rahman@rakuten.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuji Matsumoto</string-name>
          <email>yuji.matsumoto@riken.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RIKEN Center for Advanced Intelligence Project (AIP)</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rakuten Institute of Technology (RIT), Rakuten Group, Inc.</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>17</volume>
      <issue>2025</issue>
      <fpage>4702</fpage>
      <lpage>4709</lpage>
      <abstract>
        <p>Lookalike modeling is the key to digital marketing, driving product sales and improving ad campaigns by identifying users similar to a given set of seed users. However, this task presents several challenges. Companies often handle hundreds of marketing campaigns daily, targeting a large user base, making it difficult for models that depend solely on high-level features to achieve optimal performance. Additionally, the limited size of seed lists can lead to over-fitting, requiring models to generalize effectively. Traditional methods, using deep learning and graph-based approaches, excel at capturing complex user-item relationships but heavily depend on ID-based data and often overlook valuable textual information, such as user reviews and item descriptions. Moreover, privacy concerns and increasing data regulations further complicate the process, as conventional models frequently rely on sensitive user attributes. To overcome these challenges, we propose a Graph-Lookalike Model (GLoM) that integrates large language models (LLMs) into lookalike modeling. GLoM enhances user targeting by combining advanced representation learning with LLMs, capturing important semantic information in user behavior, sentiments, and preferences, while preserving the graph structure and incorporating auxiliary textual features. Our experiments show that GLoM successfully expands the user base across diverse categories like books, movies, electronics, and automotive, outperforming the baselines.</p>
      </abstract>
      <kwd-group>
        <kwd>Lookalike modeling</kwd>
        <kwd>User targeting</kwd>
        <kwd>Embedding learning</kwd>
        <kwd>Representation learning</kwd>
        <kwd>LLMs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid expansion of the internet has led to a significant increase in digital marketing activities, with a
large number of users interacting with these activities on a daily basis. In this vast online marketplace
with billions of users, it is crucial for marketers to deliver content, ads, or products to the right audience
through recommendation systems or advertising platforms. Lookalike modeling plays a key role in
identifying similar users to a given set of seed users (Figure 1), thus increasing the chances of achieving
specific marketing goals. Leading tech companies like Facebook, Google [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Tencent [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and LinkedIn
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have developed robust platforms for such campaigns, yet the task remains complex.
      </p>
      <p>
Lookalike modeling offers significant economic benefits by identifying high-potential users for marketing
campaigns. Traditional methods, which rely on demographic data or purchasing behavior, often miss latent
users and depend solely on implicit feedback. The lack of generalization, user sentiment understanding,
and insufficient seed users make things more challenging. Scaling these models to meet campaign
needs is another significant challenge. An effective model must capture both explicit and implicit
user traits, incorporate user sentiments, and scale efficiently. Deep learning [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] and graph-based
algorithms [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], commonly used in recommender systems, have shown promise for lookalike modeling.
Graph-based models [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] have demonstrated remarkable capabilities in capturing complex user-item
relationships. However, these models often operate on mapped user/item information for learning. The
use of demographic or location-based data also raises privacy concerns. Moreover, the primary reliance
on mapped data in graph-based models may overlook valuable information, such as the rich textual
content associated with users and items.
      </p>
      <p>An ideal lookalike model should go beyond basic user preferences and incorporate hidden factors
such as user sentiments and user experiences. Large Language Models (LLMs) can play a pivotal role
in analyzing user reviews and ratings to identify patterns of user-user similarity based on text data.
Understanding how users feel about their purchases provides deeper insights while reducing reliance
on private information. The proposed GLoM model addresses these limitations by avoiding the use of
private user data and instead utilizing purchase interactions alongside publicly available data, such as
product reviews and LLM-generated user or item profiles. An effective lookalike model should account
not only for users' preferences but also for factors such as affordability and lifestyle. For instance, a user
purchasing electric car accessories might also be interested in smart home devices, such as Amazon Alexa,
smart thermostats, or energy management hubs. This suggests that while the user actively engages with
automotive products, they may have relevant interests in other domains where they lack a purchase history.
It also highlights the potential for targeting users who can afford premium items but remain overlooked if
the model focuses only on single-domain data. To address these limitations, our approach incorporates
users’ complex cross-domain behaviors and item similarity patterns to deliver more comprehensive and
effective audience expansion for user targeting.</p>
      <p>
        Understanding complex relationships between users and items, as well as identifying users with similar
behaviors, requires effective data structures like Knowledge Graphs (KGs) [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
        ]. KGs represent
information as triples, where each triple encodes a factual relationship. The proposed GLoM model
utilizes a knowledge graph constructed from user-item interactions, generated user and item polarity (i.e.,
whether a user likes or dislikes an item), and user-user similarity connections. LLMs excel at capturing
the semantic meaning of entities, relationships, and text-encoded triples. However, they struggle to model
the structural relationships inherent in graph data. Conversely, Graph Neural Networks (GNNs) are
well-suited for processing graph structures but lack the ability to fully grasp the rich textual semantics
that LLMs handle effectively. By combining the strengths of both LLMs and GNNs, GLoM creates a
comprehensive model that integrates structural and semantic insights, enabling accurate user behavior
analysis and effective lookalike modeling.
      </p>
      <p>GLoM integrates a pre-trained embedding model with a graph convolutional network to identify
lookalike users. During pre-training, user and item profiles generated by LLMs are leveraged with a
knowledge graph. This pre-training and fine-tuning paradigm allows GLoM to extract informative and
transferable knowledge from abundant unlabelled data through self-supervision tasks such as masked
language modeling. This approach is particularly beneficial when the labeled seed list for user targeting
is insufficient, as it avoids the need to train a new model from scratch, maintaining the integrity of the
pre-trained model. GLoM employs three different aggregation techniques (please refer to Sec. 3.6.1) for
node features, enhancing the pre-trained model with effective smoothing. This approach mitigates issues
such as oversmoothing and irrelevant smoothing, thereby improving the precision of lookalike modeling.
The main contributions of our work are as follows:
• We propose a novel two-stage model, GLoM, which leverages LLMs and KGs for the lookalike
audience expansion problem, and we demonstrate its effectiveness and robustness.</p>
      <p>• To the best of our knowledge, GLoM is the first lookalike model that combines the strengths of
LLMs and GNNs.</p>
      <p>• GLoM significantly outperforms state-of-the-art (SOTA) lookalike models across four public
datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>In this section, we review related work on lookalike modeling, focusing on various approaches including
similarity-based methods, clustering techniques, rule-based methods, multi-task learning, and graph-based
models.</p>
      <p>
        Similarity-based methods, such as those proposed by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], expand a given seed list by calculating the
similarity between pairs of seed users and candidate users using predefined metrics like Cosine or Jaccard
similarity. Several studies have explored approaches to lookalike modeling, such as k-means clustering
[
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ], which offers simplicity but faces challenges in capturing complex and high-dimensional
relationships. Clustering-based models are also quite popular when the number of tasks or campaigns
is limited [
        <xref ref-type="bibr" rid="ref13 ref15 ref16">13, 15, 16</xref>
        ]. These models primarily cluster users to generate a candidate set, which is then
filtered using a regression model. However, this approach compromises on precision and algorithmic
complexity to prioritize online performance.
      </p>
      <p>On the other hand, rule-based methods identify similar users based on specific demographic features
or interests, as targeted by marketers. These methods typically rely on user profile mining to infer
interest tags from user behavior [17]. The limitations of similarity-based and rule-based models are: the
former depends on the choice of the similarity function, and the latter captures only high-level features,
often leading to suboptimal performance. GLoM addresses these issues by incorporating semantic
understanding and more complex relationship modeling.</p>
      <p>
        Multi-task learning has also been explored for lookalike modeling. This approach allows for
simultaneous learning across multiple tasks, potentially improving efficiency [18]. However, existing multi-task
methods are generally designed for scenarios with fewer than five tasks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], limiting their applicability
in real-world settings where hundreds of marketing campaigns run daily. Model-based methods train
customized prediction models for each campaign or task, and GLoM falls into this category. For example,
logistic regression (LR) has been utilized to expand audiences, which proved effective [19]. One-stage
methods that train models from scratch for each campaign are time-consuming and prone to overfitting.
More recently, two-stage approaches [
        <xref ref-type="bibr" rid="ref2">20, 2</xref>
        ] have been proposed to pretrain embeddings using data from
all campaigns. Rakuten employed a lookalike model for its advertising platform, relying heavily on
demographics, user/item attributes, and user-item interactions [21, 22, 23]. While it achieves strong
performance, comparison is challenging as the data is not publicly available. However, these methods often
overlook generalization, task relationships, and semantic understanding. In contrast, GLoM efficiently
addresses these challenges.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Graph Lookalike Model (GLoM)</title>
      <sec id="sec-3-1">
        <title>3.1. Preliminaries</title>
        <p>Problem Statement: In a lookalike setting, a list of $n$ seed users $S = (s_1, s_2, \dots, s_n)$ is given to the
model, and the task is to find $m$ users similar to the seed list $S$, where $m \gg n$.</p>
        <p>Knowledge Graph: A Knowledge Graph (KG) is a directed, labeled graph $G = (E, R, T)$, where $E$ is the
set of entities (nodes), $R$ is the set of relations (edge types), and $T \subseteq E \times R \times E$ is the set of triples $(e_1, r, e_2)$
representing directed edges, where $e_1, e_2 \in E$ and $r \in R$.
Our proposed GLoM operates in two stages: a pre-training stage and a graph learning stage. In the
pre-training stage, we use LLM-based user/item profile generation and a knowledge graph to generate
the pre-trained embeddings for the graph learning stage. We construct the knowledge graph using data
from user-item interactions, user-item polarity edges (please refer to Sec. 3.3), and user-user similarity
edges (please refer to Sec. 3.4). We represent a triple as $(h, r, t)$. Hereafter, bold lowercase letters
indicate embeddings, and bold uppercase letters denote matrices. Figure 2 illustrates GLoM's model
architecture.</p>
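        <p>To make the graph construction above concrete, the following minimal sketch (ours, not the authors' implementation) stores the three edge types as plain $(h, r, t)$ triples; the relation names and entity-id scheme are illustrative assumptions.</p>
        <preformat>
from dataclasses import dataclass

# A minimal sketch of the triple store behind GLoM's knowledge graph.
# Relation names ("interacted", "likes", "similar_to") are illustrative;
# the paper specifies the three edge types, not their labels.

@dataclass(frozen=True)
class Triple:
    head: str      # entity id, e.g. "user:42" or "item:B00004CQT3"
    relation: str  # edge type
    tail: str

kg = {
    Triple("user:42", "interacted", "item:B00004CQT3"),  # user-item interaction
    Triple("user:42", "likes", "item:B00004CQT3"),       # polarity edge (Sec. 3.3)
    Triple("user:42", "similar_to", "user:77"),          # similarity edge (Sec. 3.4)
}
        </preformat>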
      </sec>
      <sec id="sec-3-2">
        <title>3.2. User and Item Profile Generation</title>
        <p>In this section, we describe how we create textual descriptions, or profiles, for users and items for GLoM.
These profiles improve the understanding of user and item interaction preferences by adding textual
information related to them. For user and item profile generation, we use two types of information: one is
the input prompt for users or items, $\mathcal{I}_u$ and $\mathcal{I}_v$, and the other is the general prompt guideline, $\mathcal{S}_u$ and $\mathcal{S}_v$.
The user and item profiles are then generated as $P_u = \mathrm{LLM}(\mathcal{I}_u, \mathcal{S}_u)$ and $P_v = \mathrm{LLM}(\mathcal{I}_v, \mathcal{S}_v)$. We discuss the
process in detail in the sections below.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Input Prompt for User</title>
          <p>In the context of user profile generation, Large Language Models can be utilized to effectively encapsulate
the particular types of items that users are likely to purchase. By leveraging collaborative filtering, the
system generates a user profile $P_u$ by first identifying the items $I_u$ with which the user $u$ has interacted. A
subset $\hat{I}_u$ of these items is then uniformly sampled. For each item $i$ in this subset, a textual representation
$T_i = [t_i, P_i, R_{i,u}]$ is created, where $t_i$ is the title, $P_i$ is the previously generated item profile, and $R_{i,u}$ is the
review provided by the user $u$. The input prompt $\mathcal{I}_u$ for generating the user profile is then defined as
$\mathcal{I}_u = f_{\mathrm{concat}}(\{T_i \mid i \in \hat{I}_u\})$, where $f_{\mathrm{concat}}(\cdot)$ organizes these textual attributes into a coherent string. This approach
provides a comprehensive representation of the user's personalized tastes and preferences, ensuring that
the generated profile accurately reflects their real opinions and interests (Figure 3).
[Figure 3: Example input prompt for user profile generation, listing purchased items with their titles, categories,
descriptions, and the user's reviews (e.g., "Wall-E (Mandarin Chinese Edition)", "Spy: Susan Cooper Undercover",
and "Rocky Horror Picture Show VHS").]</p>
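          <p>The sketch below shows one plausible reading of the construction $\mathcal{I}_u = f_{\mathrm{concat}}(\{T_i \mid i \in \hat{I}_u\})$: uniformly sample a subset of purchased items and concatenate each item's title, previously generated profile, and the user's review. The function and field names are our own; the paper does not publish its implementation.</p>
          <preformat>
import random

def build_user_prompt(purchased, k=3, rng=random.Random(0)):
    """Sketch of f_concat: purchased is a list of dicts with keys
    'title', 'item_profile', and 'review' (field names assumed)."""
    sampled = rng.sample(purchased, min(k, len(purchased)))  # uniform subset
    blocks = []
    for item in sampled:  # T_i = [title, item profile, user review]
        blocks.append(
            "Title: " + item["title"] + "\n"
            "Item profile: " + item["item_profile"] + "\n"
            "Review: " + item["review"]
        )
    return "Purchased items:\n\n" + "\n\n".join(blocks)
          </preformat>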
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Input Prompt for Item</title>
          <p>For item profile generation, LLMs can be guided to produce profiles that accurately reflect the appealing
characteristics of items. The textual information of an item $i \in I$ is categorized into four types:
the title $t_i$, the original description $d_i$, the item attributes $A_i = \{a_1, \dots, a_{|A_i|}\}$, and a collection of $k$ user reviews
$R_i = \{r_1, \dots, r_k\}$. The input prompt $\mathcal{I}_v$ for generating the item profile is structured as $\mathcal{I}_v = f_{\mathrm{concat}}(X)$ with
respect to $X = [t_i, d_i, A_i, \hat{R}_i \subseteq R_i]$. The function $f_{\mathrm{concat}}(\cdot)$ combines these various text features into a single string,
ensuring the inclusion of item descriptions or selected user reviews. This ensures that the LLM generates
item profiles that accurately capture the distinct attributes and qualities that make the item appealing to
users (Figure 4).</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. User/Item Profile Generation Example</title>
          <p>This section presents examples of generating user and item profiles using large language models, with a
focus on the Amazon-Movies and TV dataset. While we showcase specific examples from this dataset,
the same approach is applied to other datasets such as Books, Electronics, and Automotive, with slight
variations in the general prompt guideline tailored to the item type. For instance, for books,
the prompt might ask, "What kind of story would the user enjoy?", whereas for automotive, it could
inquire, "What car features are important to the user?".</p>
          <p>In generating user profiles, LLMs (e.g., GPT-4 Turbo) are prompted to summarize the types of items
that would appeal to the user based on their past purchases and reviews. Item profiles incorporate insights
from how different users reviewed the product, with a focus on its unique features.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Fixed-size Embeddings</title>
          <p>To convert user and item profiles into fixed-size embeddings, we utilize the approach outlined in [24].
The method involves creating a fixed-dimensional vector representation for each profile by applying a
model fine-tuned with specific instructions for various tasks. Given a user profile $P_u$ or an item profile $P_v$,
the embeddings $\mathbf{u}$ and $\mathbf{v}$ for the user and item profiles are computed as $\mathbf{u} = f_{\mathrm{emb}}(P_u)$ and $\mathbf{v} = f_{\mathrm{emb}}(P_v)$.
Here, $f_{\mathrm{emb}}(\cdot)$ represents the function that transforms the text inputs into fixed-size vectors while preserving the
contextual information of the generated profiles.</p>
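          <p>As an illustration of the $f_{\mathrm{emb}}(\cdot)$ interface, the sketch below substitutes an off-the-shelf sentence encoder for the instruction-fine-tuned model of [24], which we do not reproduce; any text encoder with a fixed output width fits the same interface.</p>
          <preformat>
# Sketch only: the paper's embedder [24] is instruction-fine-tuned; here we
# stand in a generic sentence encoder to show the fixed-size interface.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim output (assumed stand-in)

def f_emb(profile_text):
    # One fixed-size vector per generated profile.
    return encoder.encode(profile_text, normalize_embeddings=True)

u = f_emb("Enjoys light-hearted family comedies and classic musicals ...")
          </preformat>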
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. User-Item Polarity Edge Creation</title>
        <p>To refine the relationship between users and items, we introduce the concept of user-item polarity edge
creation. This approach is based on the feedback a user $u$ provides about a specific item $v$. The polarity
score $p_{u,v}$ is derived from analyzing the review score by combining the user rating with the sentiment
expressed in the review text. This score is then used to define a polarity edge as $e_{u,v} = (u, v, p_{u,v})$,
representing the strength and direction of the interaction between the user and the item. These polarity
edges are integrated into the KG. To calculate the polarity score, our model utilizes sentiment analysis
to extract sentiment scores from reviews, following the method described by Hartmann et al. [25]. It
calculates the average sentiment score across all reviews and compares the sentiment of a specific review
to this average. If the sentiment score of the user's review for an item is greater than or equal to the average
score, it is counted as the user liking the item; otherwise, it is considered the opposite. By combining
this sentiment analysis with the user rating, the model provides a more contextual understanding of user
feedback, distinguishing between outlier opinions and the general perspective. The model assigns a
weight to both the user rating $r_{u,v}$ and the sentiment score $s_{u,v}$ as: $p_{u,v} = \alpha \cdot r_{u,v} + (1 - \alpha) \cdot s_{u,v}$.</p>
        <p>Here, $\alpha$ represents the weight for the user rating, while $(1 - \alpha)$ represents the weight for the sentiment
score. This equation balances the influence of the numerical rating and the sentiment expressed in the
review, resulting in a more accurate and nuanced assessment of the review's actual value.</p>
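        <p>A minimal sketch of the polarity computation follows. It assumes ratings are rescaled to [0, 1] so both terms share a range (a normalization the text does not spell out) and that the sentiment model of [25] yields a positivity score in [0, 1].</p>
        <preformat>
def polarity_score(rating, sentiment, avg_sentiment, alpha=0.5, max_rating=5.0):
    """p_{u,v} = alpha * rating + (1 - alpha) * sentiment (Sec. 3.3).

    alpha, the rating weight, is a tunable hyperparameter; its value is
    not fixed in the text. Returns the polarity score and the like/dislike
    flag obtained by comparing the review's sentiment to the average.
    """
    r = rating / max_rating                 # assumed [0, 1] rescaling
    p = alpha * r + (1 - alpha) * sentiment
    liked = sentiment >= avg_sentiment      # like if at or above the average
    return p, liked
        </preformat>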
      </sec>
      <sec id="sec-3-4">
        <title>3.4. User-User Similarity Edge Generation</title>
        <p>We define a user-user similarity edge based on the similarity between users $u_i$ and $u_j$, denoted as
$e_{ij} = (u_i, u_j, \mathrm{sim}(u_i, u_j))$. The similarity is calculated based on the cosine similarity of their polarity
scores $p$, which measures the angle between the vectors of polarity scores for items both users
have rated. The similarity function between users $u_i$ and $u_j$ is given by:</p>
        <p>$$\mathrm{sim}(u_i, u_j) = \frac{\sum_{v \in \mathrm{CM}} (p_{i,v} - \bar{p}_i)(p_{j,v} - \bar{p}_j)}{\sqrt{\sum_{v \in \mathrm{CM}} (p_{i,v} - \bar{p}_i)^2}\,\sqrt{\sum_{v \in \mathrm{CM}} (p_{j,v} - \bar{p}_j)^2}} \quad (1)$$</p>
        <p>where $\bar{p}_i = \frac{1}{|\mathrm{CM}|} \sum_{v \in \mathrm{CM}} p_{i,v}$ and $\bar{p}_j = \frac{1}{|\mathrm{CM}|} \sum_{v \in \mathrm{CM}} p_{j,v}$. Here, $\mathrm{CM}$ represents the set of items rated by
both users $u_i$ and $u_j$; $p_{i,v}$ and $p_{j,v}$ are the polarity scores for item $v$ given by users $u_i$ and $u_j$; and $\bar{p}_i$ and $\bar{p}_j$
are the average polarity scores for users $u_i$ and $u_j$, respectively. If the similarity score
between two users exceeds the threshold $\tau_s$, it indicates they share similar online buying behavior. This
similarity computation leverages both the ratings and the sentiment expressed in the ratings, providing a
comprehensive measure of user similarity for collaborative filtering.</p>
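        <p>Eq. (1) is a mean-centered cosine (Pearson-style) similarity over the common items CM. A direct sketch, with each user's polarity scores kept as a dictionary keyed by item, follows.</p>
        <preformat>
import math

def user_similarity(p_i, p_j):
    """sim(u_i, u_j) of Eq. (1); p_i, p_j map item id to polarity score."""
    common = set(p_i).intersection(p_j)             # CM: items rated by both users
    if not common:
        return 0.0
    mi = sum(p_i[v] for v in common) / len(common)  # average polarity of u_i
    mj = sum(p_j[v] for v in common) / len(common)  # average polarity of u_j
    num = sum((p_i[v] - mi) * (p_j[v] - mj) for v in common)
    di = math.sqrt(sum((p_i[v] - mi) ** 2 for v in common))
    dj = math.sqrt(sum((p_j[v] - mj) ** 2 for v in common))
    return num / (di * dj) if di > 0 and dj > 0 else 0.0
        </preformat>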
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Pre-trained Model (PM)</title>
        <p>In the pre-training stage, we aim to design an architecture that can efficiently and jointly reason over
text and structured data. The knowledge graph provides rich information and a solid knowledge base
for solving the lookalike problem, but it lacks language understanding. To address this, we introduce a
pre-trained model $f_{\mathrm{pre}}(\cdot; \theta_{\mathrm{pre}})$, parameterized by $\theta_{\mathrm{pre}}$, which learns the graph structure along with the
generated texts. This model is designed to generate embeddings of entities $e \in E$ in the knowledge graph $G$,
formulated as $\mathbf{h}_e = f_{\mathrm{pre}}(e; \theta_{\mathrm{pre}})$.</p>
        <p>We propose two types of entity representations: interaction-based and profile-based. Interaction-based
representations capture interaction facts, including polarity and similarity information, while profile-based
representations focus on textual and semantic details. These two representations are learned
simultaneously in the same vector space without enforcing unification. The pre-training energy function is
defined as $\mathcal{F} = \mathcal{F}_{\mathcal{I}} + \mathcal{F}_{\mathcal{P}}$, where $\mathcal{F}_{\mathcal{I}}$ is the energy function of interaction-based representations, defined
as $\mathcal{F}_{\mathcal{I}} = \|\mathbf{u} + \mathbf{r} - \mathbf{v}\|$, and $\mathcal{F}_{\mathcal{P}}$ is the energy function of user and item profile-based representations [26].</p>
        <p>To make the learning process of $\mathcal{F}_{\mathcal{P}}$ compatible with $\mathcal{F}_{\mathcal{I}}$, we define $\mathcal{F}_{\mathcal{P}}$ as
$\mathcal{F}_{\mathcal{P}} = \mathcal{F}_{\mathcal{PP}} + \mathcal{F}_{\mathcal{PI}} + \mathcal{F}_{\mathcal{IP}}$,
where $\mathcal{F}_{\mathcal{PP}} = \|\mathbf{u}_p + \mathbf{r} - \mathbf{v}_p\|$, in which head and tail are profile-generation-based representations of user
and item. Also, we have $\mathcal{F}_{\mathcal{PI}} = \|\mathbf{u}_p + \mathbf{r} - \mathbf{v}\|$ and $\mathcal{F}_{\mathcal{IP}} = \|\mathbf{u} + \mathbf{r} - \mathbf{v}_p\|$, where one of $\mathbf{u}$ or $\mathbf{v}$ uses the
profile-generation-based representation and the other uses the interaction-based representation. In this paper, we
generate representations for user/item profiles using the method described in Sec. 3.2.4. To capture the
dynamic intentions of users, we design a link prediction task in the pre-training stage for learning entity
representations, which is defined as:</p>
        <p>$$\mathcal{L}_{\mathrm{pre}} = \sum_{(h,r,t) \in \mathcal{E}} \sum_{(h',r,t') \in \mathcal{E}^{-1}} \left[\gamma + f(h, r, t) - f(h', r, t')\right]_+ \quad (2)$$</p>
        <p>where $(h, r, t)$ are positive triples, which actually exist in the KG, $(h', r, t')$ are negative triples, $\gamma$ is the
margin, and $f(h, r, t)$ is the score function. Here, the score function is defined as $f(h, r, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{1,2}$,
where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the embeddings of head entities, relations, and tail entities, respectively. From
another perspective, the sum of a head and relation embedding lands at a point near the related tail
embedding; therefore, it can be regarded as a query that visits tail entities related to the head entity through the
relation. In this paper, we term it a knowledge query. Later, the Graph Learning Model (GLM) aggregates
the knowledge queries to align neighbor node embeddings in the vector space.</p>
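        <p>The pre-training objective of Eq. (2) is a margin-based ranking loss over corrupted triples, in the spirit of translational KG embedding [8]. A PyTorch sketch is shown below; negative sampling and batching are omitted, and the L1 norm is used for the score.</p>
        <preformat>
import torch

def score(h, r, t):
    # f(h, r, t) = ||h + r - t||: the knowledge-query score of Sec. 3.5.
    return torch.norm(h + r - t, p=1, dim=-1)

def pretrain_loss(h, r, t, h_neg, t_neg, margin=1.0):
    """Eq. (2): hinge on the margin between positive and corrupted triples.
    All arguments are (batch, dim) embedding tensors; corrupted heads and
    tails are assumed to be sampled elsewhere."""
    pos = score(h, r, t)
    neg = score(h_neg, r, t_neg)
    return torch.clamp(margin + pos - neg, min=0).sum()
        </preformat>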
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Graph Learning Model (GLM)</title>
        <p>After the Pre-trained Model, the embeddings are passed through a Graph Convolutional Network for
smoothing. Regarding aggregation, GLoM aggregates knowledge queries instead of neighboring node
features. A knowledge query (for the definition, refer to Sec. 3.5) from a source node $s$ to a destination
node $d$ with a relation $r$, aggregated during the update of the destination node features, is defined
as $q_{(s,r,d)} = \mathbf{h}_s + \mathbf{h}_r$. Here, $\mathbf{h}_s$ and $\mathbf{h}_r$ represent the embeddings of the source node and the relation,
respectively, and the triple $(s, r, d)$ exists within the Knowledge Graph (KG). One of the primary benefits
of aggregating knowledge queries is the alignment of node embeddings within the vector space. In the
update phase, GLoM combines the aggregated knowledge queries with the target node embedding using a
linear transformation.</p>
        <sec id="sec-3-6-1">
          <title>3.6.1. Aggregator</title>
          <p>Here, we propose mean and attention-based aggregators. We redefine the notation of knowledge
queries for multiple layers of GLoM as $q^{l-1}_{(s,r,d)} = \mathbf{h}^{l-1}_s + \mathbf{h}^{l-1}_r$, where $l$ indicates the $l$-th layer,
$q^{l}_{(s,r,d)}$ is the $l$-th query from source $s$ to destination $d$ with relation $r$, and $\mathbf{h}^{l}_s$ and $\mathbf{h}^{l}_r$ represent the $l$-th
embeddings of the source and relation, respectively. The initial embeddings of all nodes and relations
obtained from the PM are utilized, i.e., $\mathbf{h}^{0}_s = \mathbf{e}_s$, $\mathbf{h}^{0}_r = \mathbf{e}_r$. Two types of aggregators are formulated as follows.</p>
          <p>Mean aggregator: This aggregator simply computes the average of neighboring knowledge queries:
$$\mathbf{m}^{l}_d = \mathrm{MEAN}\left(\left\{ q^{l-1}_{(s,r,d)} \;\middle|\; s \in \mathcal{N}(d),\; r \in \mathcal{R}(s,d) \right\}\right) \quad (3)$$
Here, $\mathbf{m}^{l}_d$ is the $l$-th message of knowledge queries, $\mathcal{N}(d)$ is the set of neighbor nodes of node $d$, and $\mathcal{R}(s,d)$
is the set of relations between $s$ and $d$.</p>
          <p>Attention aggregator: Unlike the mean aggregator, the attention aggregator weights each knowledge
query based on its importance:
$$\mathbf{m}^{l}_d = \sum_{s \in \mathcal{N}(d)} \alpha_{(s,r,d)}\, q^{l-1}_{(s,r,d)} \quad (4)$$
$$\alpha_{(s,r,d)} = \frac{\exp(\hat{e}_{(s,r,d)})}{\sum_{s' \in \mathcal{N}(d)} \exp(\hat{e}_{(s',r,d)})} \quad (5)$$
Two types of attention aggregators are offered:
$$(\mathrm{Attn1})\quad \hat{e}_{(s,r,d)} = \left(q^{l-1}_{(s,r,d)}\right)^{\top} \mathbf{h}^{l-1}_d \quad (6)$$
$$(\mathrm{Attn2})\quad \hat{e}_{(s,r,d)} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top} \left(q^{l-1}_{(s,r,d)} \,\|\, \mathbf{h}^{l-1}_d\right)\right) \quad (7)$$
where $\alpha_{(s,r,d)}$ is the normalized attention coefficient, $\mathbf{a} \in \mathbb{R}^{2d}$ is a trainable parameter, $d$ is the dimension
of the embeddings, and $\|$ denotes concatenation. The first method (Eq. (6)) employs the inner product
between the knowledge query and the destination node to calculate attention coefficients, akin to the
attention mechanism of Knowledge Graph Convolutional Networks [27]. The second method (Eq. (7))
introduces trainable parameters that enable automatic adjustment of how knowledge queries are aggregated
based on the loss function.</p>
          <p>After aggregating the knowledge queries, new embeddings for all nodes and relations are obtained by
combining the destination node embedding with the aggregated queries. The update rule is:
$$\mathbf{h}^{l}_d = \mathbf{W}^{l}\left(\mathbf{h}^{l-1}_d + \mathbf{m}^{l}_d\right) + \mathbf{b}^{l}, \qquad \mathbf{h}^{l}_r = \mathbf{W}^{l}\,\mathbf{h}^{l-1}_r + \mathbf{b}^{l} \quad (8)$$
Here, $\mathbf{h}^{l}_d$ and $\mathbf{h}^{l}_r$ are the updated embeddings for target nodes and relations, serving as inputs for the next
layer. $\mathbf{W}^{l}$ and $\mathbf{b}^{l}$ are the trainable parameters for the $l$-th layer. This rule combines the destination node
embedding with the aggregated queries and applies a linear transformation. The same transformation
is applied to relation embeddings, enabling translation between entities and relations. No non-linear
functions are used in this process.</p>
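          <p>A dense-loop sketch of the mean aggregator (Eq. (3)) and the linear update rule (Eq. (8)) follows; a production implementation would use sparse message passing, e.g., via the Deep Graph Library [29].</p>
          <preformat>
import torch

def mean_aggregate(h_nodes, h_rels, edges):
    """Eq. (3): m_d = mean of knowledge queries q = h_s + h_r over d's in-edges.
    edges is an iterable of (s, r, d) index triples from the KG."""
    msg = torch.zeros_like(h_nodes)
    cnt = torch.zeros(h_nodes.size(0), 1)
    for s, r, d in edges:
        msg[d] += h_nodes[s] + h_rels[r]   # aggregate queries, not raw features
        cnt[d] += 1
    return msg / cnt.clamp(min=1)

# Eq. (8): one shared linear map per layer, no non-linearity.
dim = 100
layer = torch.nn.Linear(dim, dim)          # holds W^l and b^l
# h_d_next = layer(h_d + m_d);  h_r_next = layer(h_r)
          </preformat>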
        </sec>
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Lookalike Audience Expansion</title>
        <p>Our goal here is to obtain user embeddings for GLoM and retrieve a target list for each given seed list.
Therefore, we employ the following unsupervised loss function for training the GLM:</p>
        <p>$$\mathcal{L}_{\mathrm{final}} = \sum_{(\mathcal{U},\, \mathcal{I}) \in \mathcal{D}} \sum_{(\mathcal{U},\, \mathcal{I}') \in \mathcal{D}^{-1}} \left[\gamma + f(\mathbf{h}_{\mathcal{U}}, \mathbf{h}_{\mathcal{I}}) - f(\mathbf{h}_{\mathcal{U}}, \mathbf{h}_{\mathcal{I}'})\right]_+ \quad (9)$$
where $(\mathcal{U}, \mathcal{I}) \in \mathcal{D}$ are the positive pairs of various interactions between users and items, and $(\mathcal{U}, \mathcal{I}') \in \mathcal{D}^{-1}$
are the random negative ones. We obtain the user embeddings and use these embeddings with a similarity
threshold $\tau$ to filter the closest users of each seed user, generating the new list as target prospecting for
each marketing campaign as the final output of the GLoM framework.</p>
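        <p>The final retrieval step amounts to a thresholded nearest-neighbor search over the learned user embeddings. The sketch below assumes cosine similarity; the threshold $\tau$ and list size are illustrative values.</p>
        <preformat>
import numpy as np

def expand_seed_list(user_emb, seed_ids, tau=0.8, top_m=1000):
    """Return candidate users whose closest-seed cosine similarity is
    at least tau, ranked best-first (tau and top_m are illustrative).
    user_emb: (num_users, dim) array of GLM user embeddings."""
    X = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    sims = X @ X[seed_ids].T          # cosine similarity to every seed user
    best = sims.max(axis=1)           # each user's closest-seed similarity
    best[seed_ids] = -1.0             # exclude the seeds themselves
    cand = np.where(best >= tau)[0]
    return cand[np.argsort(-best[cand])][:top_m]
        </preformat>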
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>To demonstrate that GLoM improves lookalike model performance by utilizing both interactions and
generated textual information, we conducted the following experiments to address the specified research
questions:
• RQ1: Does GLoM outperform other lookalike approaches?
• RQ2: How do different components contribute to the model’s performance?
• RQ3: How do different user-item profile generations, based on various LLMs, impact pre-training?
• RQ4: How does GLoM perform with limited seed lists?</p>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>We evaluate our model using four Amazon datasets1: Books, Movies (TV and Movies), Electronics, and
Automotive. These datasets include user ratings and reviews, which we preprocess for lookalike modeling.</p>
        <sec id="sec-4-1-1">
          <title>Model Performance on Amazon public datasets. We report the mean results over five runs. The best results are marked with a superscript asterisk (*), and the second-best results are underlined.</title>
          <p>Prec.
items in each dataset for which we expanded the user base from the seed users. For knowledge graph
construction, we follow the method outlined in [28]. Amazon datasets are selected for their comprehensive
user and item attributes, including extensive reviews, compared to other public datasets like MovieLens
and Yelp. For training on the Books, Movies, Electronics, and Automotive datasets, seed:non-seed =
positive samples:negative samples = 1:10. Each dataset contains a seed set and a non-seed set for training,
as well as a set of expanded users for testing. The set of expanded users consists of actual audiences
(positive samples) and other candidate users (negative samples).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baselines</title>
        <p>
          We describe various approaches suitable for lookalike audience expansion. We reference a few baselines
from the deployed lookalike model in WeChat [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The baseline approaches presented here aim to
predict users as potential targets for advertising campaigns and can run on a single GPU. The baselines are as
follows:
Logistic Regression-based Lookalike Model (LR): An end-to-end Logistic Regression (LR) classifier
based on the raw features [19].
        </p>
        <p>Pre-trained Model (PM): We utilize the user embeddings from the PM (please refer to Sec. 3.5)
for retrieving the potential users for our experimental datasets.</p>
        <p>
          GraphSAGE: GraphSAGE is a graph-based learning model that generates node embeddings by sampling
and aggregating features from a node's local neighborhood [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. We employ GraphSAGE [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] as an
end-to-end system to generate user representations for the lookalike setting.
        </p>
        <p>
          LightGCN: LightGCN is a simplified Graph Convolutional Network (GCN) model tailored for
recommendation tasks [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. An end-to-end LightGCN is used as a baseline, similar to GraphSAGE, since GLoM
is designed based on a graph model.
        </p>
        <p>[Table 4: GLoM performance with user/item profiles generated by different LLMs (Gemini-1.0-pro, GPT 3.5 Turbo,
GPT 4 Turbo, LLaMa 3-70B) on the Books, Movies, Electronics, and Automotive datasets.]</p>
        <p>Pinterest: Pinterest’s two-stage approach is employed as one of our baselines [20]. In the first
stage, a global user embedding model is trained to create user embeddings. In the second stage, an
embedding-based scoring model is used to compute an affinity score for each user in relation to a specific
campaign or task.</p>
        <p>
          MetaHeac: It is a state-of-the-art audience expansion model for advertising and has been deployed in
WeChat [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In their paper, they also included LR and Pinterest as baselines.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evaluation Protocols and Model settings</title>
        <p>The models run on a single NVIDIA Tesla V100 GPU. We employ grid search to fine-tune the
hyperparameters, ensuring optimal performance. For the embedding dimensionality $d$, we consider options
from {25, 50, 100, 150, 200}, while the learning rate $\eta$ is chosen from {0.001, 0.01, 0.05, 0.1}, and the
margin $\gamma$ is selected from {1, 5, 10}. To generate accurate user and item profiles, we leverage advanced
models including GPT (3.5/4 Turbo), Gemini-1.0-pro, and LLaMa 3-70B, provided by OpenAI, Google,
and Meta, respectively. We implement our method and baselines using PyTorch 1.8.2 in a Python 3.6
environment, leveraging both PyTorch and the Deep Graph Library [29] for baseline implementations.</p>
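        <p>For concreteness, the grid of Sec. 4.3 can be enumerated as below; the selection criterion (held-out validation, e.g., on PR-AUC) is our assumption of how "optimal performance" was measured.</p>
        <preformat>
from itertools import product

# Hyperparameter grid reported in Sec. 4.3.
grid = {
    "dim":    [25, 50, 100, 150, 200],   # embedding dimensionality d
    "lr":     [0.001, 0.01, 0.05, 0.1],  # learning rate
    "margin": [1, 5, 10],                # margin gamma in Eqs. (2) and (9)
}
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
# best = max(configs, key=lambda c: validate(train(c)))  # placeholders
        </preformat>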
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Performance Comparison (RQ1)</title>
        <p>In this experiment, we evaluate the end-to-end performance of all models using Precision, Recall, and
PR-AUC. The actual users who purchased an item are treated as positive examples, while the remaining
candidate users are treated as negative examples. Table 2 presents the performance of GLoM compared
to other baselines.</p>
        <p>GLoM operates as a two-stage method, with the first stage focusing on pre-training. Notably, the
pre-trained embeddings (PM) outperformed several baseline models and demonstrated competitive
performance against the SOTA lookalike model MetaHeac. This highlights the effectiveness of embedding
pre-training in enhancing lookalike modeling performance.</p>
        <p>
          Key observations from the evaluation are as follows: (1) GLoM consistently achieved superior
performance compared to baseline models, providing strong evidence of its effectiveness. Specifically,
GLoM-Attn1 delivered the best results on the four public datasets. (2) Compared to state-of-the-art
lookalike methods such as MetaHeac and Pinterest, GLoM delivers stronger results. This confirms that
GLoM improves user targeting by capturing semantic information related to user behavior, sentiments,
and preferences while preserving the graph structure. Specifically, GLoM-Attn1 achieved improvements
of 6.40%, 2.76%, 6.75%, and 4.98% over the MetaHeac model on the Books, Movies, Electronics,
and Automotive datasets, respectively, while also significantly outperforming Pinterest. (3) GLoM
outperforms ID-based models. Models like LightGCN and GraphSAGE rely heavily on ID-based
information, which may overlook valuable data such as the rich textual information associated with users
and items. This indicates that GLoM's learned representations effectively capture global collaborative
relationships, going beyond the limitations of ID-based representation techniques. (4) MetaHeac also
performed well in certain cases, such as on Amazon-Electronics. As reported in the original paper [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], it
achieved better performance than Pinterest on Amazon datasets as well. (5) GLoM-Mean and
GLoM-Attn1 exhibited more stable performance compared to other methods, with GLoM-Attn1 achieving the
best results. However, the performance of GLoM-Attn2 was less stable, potentially due to imbalances in
knowledge queries per node. The attention mechanism in GLoM is guided by the pre-training model's
loss function, which presents challenges in designing a specialized attention mechanism for this task.
Figure 5 plots, for the four datasets (Books, Movies, Electronics, and Automotive), how
recall changes as the embedding size (dimension) increases. Note that we plot embedding size
versus recall specifically for GLoM-Attn1. As the embedding dimension grows, recall improves
across all datasets at different rates. For most datasets, an embedding size of 125 performs well. Please
note that increasing the embedding size can slow down the expansion process during real-time marketing
campaigns. However, GLoM can efficiently expand the seed list for the datasets in Table 1 in less than 20
minutes using a single GPU, making it highly effective for rapid campaign scaling.
        </p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Ablation Study (RQ2)</title>
        <p>In this paper, we argue that an effective lookalike model needs to better capture both textual signals
and graph structures. We analyze the effects of user/item profile generation, polarity edges, user-user
similarity, and pre-trained embeddings within our proposed model, GLoM (specifically GLoM-Attn1). To
evaluate these components, we first conduct an ablation study to verify the effectiveness of each module
in the Pre-trained Model (PM) and assess PM’s overall contribution to GLoM.</p>
        <p>We introduce three components in the Pre-trained Model stage, derived from LLMs and raw data:
User/Item Profile Generation (Sec. 3.2), User/Item Polarity Edge Creation (Sec. 3.3), and the
User-User Similarity Edge (Sec. 3.4). First, we remove the User/Item Profile Generation (denoted
as GLoM w/o U/I Profile). Second, we remove the User/Item Polarity Edge (denoted as GLoM w/o
U/I Polarity). Third, we remove the User-User Similarity Edge (denoted as GLoM w/o U/U Similarity).
Lastly, we evaluate the performance of GLoM without the Pre-trained Model. The results are shown
in Table 3. We observe the following key findings: 1) The performance of GLoM w/o U/I Profile
is the worst compared to GLoM w/o U/I Polarity and GLoM w/o U/U Similarity, indicating that the User/Item
Profile Generation using LLMs effectively captures semantic signals and user behavior trends, leading
to improved performance. 2) Between the User/Item Polarity Edge and the User-User Similarity Edge,
the latter proves to be the more critical component for GLoM. 3) GLoM performs better than GLoM w/o PM,
demonstrating
that the pre-training phase generates meaningful user and item embeddings that enhance lookalike model
performance. Overall, the results in Table 3 highlight that all the proposed components are crucial for
constructing an effective lookalike model.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. User-Item Profile Generation using Different LLMs (RQ3)</title>
        <p>In this section, we explore the use of different LLMs, such as LLaMa 3-70B, GPT 3.5 and 4 Turbo, and
Gemini-1.0-pro, for generating user and item profiles in GLoM. By leveraging these models, we aim to
capture deeper semantic relationships between users and items. Table 4 shows the performance of GLoM
when using various large language models for user/item profile generation. We observed that GPT-4
Turbo performed exceptionally well in GLoM, particularly in capturing textual signals, as reflected in the
results achieved by GLoM-Attn1 (Table 4). Its ability to model complex user behaviors and preferences
from textual data led to significant performance gains. In contrast, LLaMa demonstrated slightly lower
performance compared to GPT-4. Gemini-1.0-pro, in turn, showed slightly lower performance
than both GPT and LLaMa. Since some users/items have long reviews, and Gemini-1.0-pro is known for
handling shorter texts better, this could be a contributing factor. These findings highlight the critical role
that model selection plays in optimizing user-item profile generation within GLoM, making it essential to
choose the right LLM based on the task at hand.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.7. Limitations (RQ4)</title>
        <p>In lookalike modeling, seed lists serve as the foundation from which the model identifies and expands
to find similar users. However, one key limitation is the model's heavy dependence on the quality,
representativeness, and size of the seed list. To generalize effectively and ensure diversity, we intentionally
minimize the number of cold users (those with limited data) in the seed list. If the seed list contains
a significant number of cold users, the performance of the GLoM model is expected to degrade. We
conducted experiments to assess GLoM's performance across different seed list sizes to understand
its dependencies. Figure 6 shows GLoM's performance on various metrics with varying seed sizes.
The results indicate that GLoM's performance drops when the seed list contains fewer than 300 users.
However, its performance stabilizes once the seed list exceeds 500 users. Fine-tuning smaller LLMs with
domain-specific data could simplify GLoM's current model complexity.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we address the lookalike problem in advertising platforms by introducing a novel method
called the Graph Lookalike Model (GLoM), which leverages the power of Large Language Models. Our
model is, to the best of our knowledge, the first to successfully integrate LLMs with a graph structure, capturing both textual
and structural information without using sensitive private data. This enables a deeper understanding of
users’ behaviors and preferences to identify similar users for advertising campaigns. We demonstrate
its effectiveness using public Amazon datasets, which provide the key features required for a lookalike
model. Given GLoM’s ability to better understand users, it has the potential for application in other
industrial scenarios, such as recommendation systems, financial risk analysis, wealth management, and
more.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT (GPT-4) to check grammar and
spelling and to paraphrase and reword. After using this tool, the authors reviewed and edited the content
as needed and take full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kanagal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Josifovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Garcia-Pueyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Focused matrix factorization for audience selection in display advertising</article-title>
          ,
          <source>in: 2013 IEEE 29th International Conference on Data Engineering (ICDE)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>386</fpage>
          -
          <lpage>397</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , Y. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <article-title>Learning to expand audience via meta hybrid experts and critics for recommendation and advertising</article-title>
          ,
          <source>in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4005</fpage>
          -
          <lpage>4013</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pardoe</surname>
          </string-name>
          , K. Liu,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Audience expansion for online social network advertising</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Deepfm: a factorization-machine based neural network for ctr prediction</article-title>
          ,
          <source>arXiv preprint arXiv:1703.04247</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <article-title>Deep learning based recommender system: A survey and new perspectives</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>52</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Inductive representation learning on large graphs</article-title>
          , in: I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          Curran Associates, Inc.,
          <year>2017</year>
          . URL: https: //proceedings.neurips.cc/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Lightgcn: Simplifying and powering graph convolution network for recommendation</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>639</fpage>
          -
          <lpage>648</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Durán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13</source>
          , Curran Associates Inc., Red Hook, NY, USA,
          <year>2013</year>
          , pp.
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Takasu</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding via entities type mapping matrix</article-title>
          ,
          <source>in: International Conference on Neural Information Processing</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>114</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Takasu</surname>
          </string-name>
          ,
          <article-title>Leveraging entity-type properties in the relational context for knowledge graph embedding</article-title>
          ,
          <source>IEICE TRANSACTIONS on Information and Systems</source>
          <volume>103</volume>
          (
          <year>2020</year>
          )
          <fpage>958</fpage>
          -
          <lpage>968</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Takasu</surname>
          </string-name>
          , G. Demartini,
          <article-title>Representation learning for entity type ranking</article-title>
          ,
          <source>in: Proceedings of the 35th Annual ACM Symposium on Applied Computing</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2049</fpage>
          -
          <lpage>2056</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ma</surname>
          </string-name>
          , E. Wagh,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ormandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Score look-alike audiences</article-title>
          ,
          <source>in: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>647</fpage>
          -
          <lpage>654</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>A sub-linear, massive-scale look-alike audience extension system a massive-scale look-alike audience extension</article-title>
          ,
          <source>in: Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications</source>
          , PMLR,
          <year>2016</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Teredesai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bindra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pokuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Uppala</surname>
          </string-name>
          ,
          <article-title>Audience segment expansion using distributed in-database k-means clustering</article-title>
          ,
          <source>in: Proceedings of the Seventh International Workshop on Data Mining for Online Advertising</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G. Y.-Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freire</surname>
          </string-name>
          ,
          <article-title>Interactive audience expansion on large scale online visitor data</article-title>
          ,
          <source>in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2621</fpage>
          -
          <lpage>2631</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , T. He,
          <article-title>Hierarchical information propagation and</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>