=Paper=
{{Paper
|id=Vol-3924/short6
|storemode=property
|title=Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems
|pdfUrl=https://ceur-ws.org/Vol-3924/short6.pdf
|volume=Vol-3924
|authors=Matthew Kolodner,Mingxuan Ju,Zihao Fan,Tong Zhao,Elham Ghazizadeh,Yan Wu,Neil Shah,Yozen Liu
|dblpUrl=https://dblp.org/rec/conf/robustrecsys/KolodnerJF0GWSL24
}}
==Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems==
Matthew Kolodner*, Mingxuan Ju, Zihao Fan, Tong Zhao, Elham Ghazizadeh, Yan Wu, Neil Shah and Yozen Liu

Snap, Inc., 2772 Donald Douglas Loop N, Santa Monica, CA 90405, USA
Abstract

Improving recommendation systems (RS) can greatly enhance the user experience across many domains, such as social media. Many RS utilize embedding-based retrieval (EBR) approaches to retrieve candidates for recommendation. In an EBR system, embedding quality is key. According to recent literature, self-supervised multitask learning (SSMTL) has shown strong performance on academic benchmarks for embedding learning and has resulted in overall improvements across multiple downstream tasks, demonstrating greater resilience to the adverse conditions between downstream tasks and thereby increased robustness and task generalization ability through the training objective. However, whether the success of SSMTL in academia as a robust training objective translates to large-scale (i.e., hundreds of millions of users and the interactions between them) industrial RS still requires verification. Simply adopting academic setups in industrial RS might entail two issues. Firstly, many self-supervised objectives require data augmentations (e.g., embedding masking/corruption) over a large portion of users and items, which is prohibitively expensive in industrial RS. Secondly, some self-supervised objectives might not align with the recommendation task, which might lead to redundant computational overheads or negative transfer. In light of these two challenges, we evaluate a robust training objective, specifically SSMTL, in a large-scale friend recommendation system on a social media platform in the tech sector, identifying whether this increase in robustness can work at scale to enhance retrieval in the production setting. Through online A/B testing with SSMTL-based EBR, we observe statistically significant increases in key friend recommendation metrics, with up to 5.45% improvements in new friends made and 1.91% improvements in new friends made with cold-start users. In addition, with a dedicated case study, we demonstrate the benefits of robust training objectives through SSMTL on large-scale graphs, with gains in both retrieval and end-to-end friend recommendation.
RobustRecSys: Design, Evaluation, and Deployment of Robust Recommender Systems Workshop @ RecSys 2024, 18 October, 2024, Bari, Italy.
* Corresponding author.
mkolodner@snap.com (M. Kolodner); mju@snap.com (M. Ju); zfan3@snap.com (Z. Fan); tong@snap.com (T. Zhao); eghazizadeh@snap.com (E. Ghazizadeh); ywu@snap.com (Y. Wu); nshah@snap.com (N. Shah); yliu2@snap.com (Y. Liu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Recommendation systems (RS) have become a crucial component of the user experience [1, 2]. Most industrial RS follow a two-stage process [3]. During the first stage (i.e., the retrieval phase), among hundreds of millions of candidate users/items, the RS usually utilizes several models optimized for recall to select a small set of candidates (e.g., 1,000). During the second stage (i.e., the ranking phase), within this candidate subset, the RS can employ complicated, expensive models optimized for precision to select the top K candidates for the final recommendation. This two-stage process enables recommendation over large quantities of possible users/items and allows for greater flexibility towards key recommendation metrics.

In this two-stage scheme, the retrieval stage is especially important, as it acts as the bottleneck for the candidates provided to the ranker in the second stage. One common approach [4, 5] for the retrieval step is to leverage embedding-based retrieval (EBR). Specifically, EBR learns embeddings for all users and items as vectors in a low-dimensional latent space. These embeddings are learned such that the distance between them reflects their similarity, with more similar items lying closer together in the latent space. As a result, candidates can be retrieved through a nearest-neighbor search across the latent space. In practice, this is done using approximate nearest neighbor methods optimized for large-scale retrieval, such as FAISS [6] and HNSW [7].

Many methods [8, 9, 10, 11] have been proposed for generating high-quality embeddings for EBR, which lead to more relevant candidates and improved metrics for the end-to-end recommendation. In this work, we specifically focus on the friend recommendation EBR setting, where vast amounts of topological information relating users are readily available. Recent works [12, 13, 14] have shown that including this relational information can improve embedding quality. The relational information is commonly modeled with graph neural networks (GNNs), producing embeddings that leverage neighbor information in graphs, such as co-friend relationships. For graph-aware EBR in particular, link prediction has seen success in generating high-quality embeddings [15], where we look to predict the presence of an edge between a query node and a set of candidate nodes.

While link prediction is effective in learning nuanced similarities and distinctions between candidates, there are several other self-supervised graph learning philosophies that can provide high-quality embeddings, such as mutual information maximization [16], generative reconstruction [17], or whitening decorrelation [18]. Based on these general philosophies, many graph-based approaches have been proposed and used to learn embeddings directly, achieving desirable embedding properties without requiring explicit labels. Recently, Ju et al. [19] evaluated combining these self-supervised learning approaches with link prediction in a multitask learning (MTL) setting, demonstrating a larger resilience to the adverse conditions between downstream tasks and thereby increased robustness and generalization ability through the training objective.

However, whether or not the success of SSMTL in academia as a robust training objective translates to large-scale (i.e., hundreds of millions of users and the interactions between them) industrial RS still requires verification. Simply adopting academic setups in industrial RS might result in several issues. Firstly, many self-supervised objectives require data augmentations (e.g., embedding masking/corruption) over a large portion of users and items, which is prohibitively expensive in industrial RS. Furthermore, some self-supervised objectives might not align with the recommendation task, which might lead to redundant computational overheads or negative transfer [20], a phenomenon where performance can worsen as a result of the complexity and potentially opposing nature of the various tasks.

In this work, we investigate whether robust SSMTL training objectives are able to improve link prediction retrieval performance on large-scale graphs with hundreds of millions of nodes and edges. Specifically, we look to find what combination of SSL approaches can improve overall robustness and thereby augment retrieval through complementary yet disjoint information. In our experiments, we find two SSL approaches, based on philosophies from whitening decorrelation (e.g., Canonical Correlation Analysis [21]) and generative reconstruction (e.g., Masked Autoencoders [22]), that are able to augment the performance of link prediction without negative transfer. We deploy the proposed framework in an industrial large-scale friend recommendation system serving a community of hundreds of millions of users. In online A/B testing, we observe significant improvements in key metrics like new friends made, especially with cold-start users on the platform. Our contributions are summarized as follows:

• We demonstrate the effectiveness of robust training objectives such as SSMTL in a large-scale industrial recommendation system.

• We conduct an online study of SSMTL on a massive real-world recommendation system, and observe a statistically significant increase in key metrics, with up to 5.45% improvements in new friends made and 1.91% improvements in new friends made with cold-start users.
Figure 1: In our proposed SSMTL framework, we combine the CCA and MAE SSL methods with the retrieval task in our embedding
generation scheme for EBR. CCA looks to maximize the correlation of two augmented views of the input subgraph while decorrelating
features of a single view. MAE seeks to reconstruct the query user nodes after being propagated through the GNN encoder backbone.
Finally, the retrieval task seeks to predict which candidates share a link with the query user using a categorical cross entropy loss. The
loss of each subtask is weighted and summed to measure the final loss. Embeddings can be generated through the GNN encoder for EBR.
2. Background

2.1. Graph-Aware Embedding-based Retrieval

In a two-stage recommendation system with a retrieval phase followed by a ranking phase, the retrieval phase plays an important role in filtering out the most relevant candidates to lighten the load of the ranker. Since the ranking result is largely dependent on the items retrieved in the retrieval phase, a good retrieval model can drastically improve the final ranking. Embedding-based retrieval (EBR) is a method that has recently been adopted and deployed in many content, product, and friend recommendation systems [4, 23, 24, 12], and has proven to achieve superior results. EBR transforms users and items into embeddings, turning the retrieval problem into a nearest-neighbor search problem in a low-dimensional latent space. These embeddings can be computed in advance and indexed using an approximate nearest neighbor search method such as FAISS [6] or HNSW [7] in order to retrieve the top-k most relevant items efficiently at serving.
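To make this serving pattern concrete, here is a minimal sketch of EBR-style retrieval with FAISS. It uses an exact inner-product index and randomly generated placeholder embeddings; a production deployment would instead index precomputed model embeddings and use an approximate index (e.g., HNSW or an inverted-file index) for sub-linear search.

```python
import numpy as np
import faiss

d = 64                                                   # embedding dimension (placeholder)
item_emb = np.random.rand(100_000, d).astype("float32")  # precomputed candidate embeddings
query_emb = np.random.rand(1, d).astype("float32")       # query user embedding

index = faiss.IndexFlatIP(d)   # exact inner-product (dot-product) index
index.add(item_emb)            # index all candidates ahead of serving

# Retrieve the top-1000 most similar candidates for the query user.
scores, candidate_ids = index.search(query_emb, 1000)
```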
When applying EBR to RS problems, the quality of the embeddings is of utmost importance. In this paper, we use a friend recommendation system as our subject. In scenarios like friend recommendation, where vast amounts of topological information relating users and items is readily available, these embeddings can be augmented with GNNs. Previous work showed that EBR for friend recommendation systems benefits from leveraging graph-aware embeddings [12]. In this setting, nodes contain individual user features while edges map to user-user interactions. This approach complements commonly used graph traversal approaches (e.g., friend-of-friend (FoF) [25]), allowing for retrieval of candidates from any number of hops away from the target.
Here we describe GNNs for generating graph-aware embeddings for EBR. GNNs have demonstrated state-of-the-art performance in many problems containing rich topological information within the graph data [26], such as recommendation and forecasting. Formally, we define a graph $G = (\mathcal{V}, \mathcal{E}, X)$, where $\mathcal{V}$ is the set of $n$ nodes ($|\mathcal{V}| = n$), $\mathcal{E}$ is the set of edges ($\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$), and $X \in \mathbb{R}^{n \times d}$ is a feature matrix of dimension $d$. Many modern GNNs employ a message-passing structure, consisting of an aggregation (AGG) and an update (UPD) function. The goal of this paradigm is for nodes to receive information from their neighbors, collecting messages with the AGG function before updating their own representations with the UPD function; both functions are learnable and permutation-invariant. For some node $u$ at layer $l$, the next message-passing layer can be written as

$$\mathbf{h}_u^{(l+1)} = \mathrm{UPD}^{(l)}\left(\mathbf{h}_u^{(l)}, \mathrm{AGG}^{(l)}\left(\{\mathbf{h}_v^{(l)}, \forall v \in \mathcal{N}(u)\}\right)\right) \quad (1)$$

where $\mathcal{N}(u)$ is the set of neighbor nodes of node $u$. Different message-passing GNN models use different combinations of AGG and UPD functions. As an example of a more complex GNN, graph attention networks (GATs) [27] use an attention mechanism for each pair of nodes $i$ and $j$:

$$\alpha_{ij} = \mathrm{softmax}_j\left(e_{\mathrm{att}}(\mathbf{W}\mathbf{h}_i, \mathbf{W}\mathbf{h}_j)\right) \quad (2)$$

where $\mathbf{W}$ is a linear transformation applied to every node and $e_{\mathrm{att}}$ is the attention function parameterized by a weight vector and a non-linearity. The AGG function is then an attention-weighted sum of a node's neighbor features, while the UPD function is implicitly defined by $\mathbf{W}$ and the non-linearity. Typically, to generate graph-aware embeddings from GNNs, a margin-based ranking loss [13, 12] or a contrastive loss [28] can be used to encourage items that are closer in the graph to be closer in the embedding space.
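To ground Equation 1, the sketch below implements a single message-passing layer with a mean aggregator and a learned linear update. This is a simplified stand-in for the attention-weighted aggregation of a GAT; the layer sizes and toy graph are placeholders.

```python
import torch
import torch.nn as nn

class MeanMessagePassingLayer(nn.Module):
    """One layer of Equation 1: AGG = mean over neighbors, UPD = linear + ReLU."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.update = nn.Linear(2 * in_dim, out_dim)  # UPD sees [h_u ; AGG(neighbors)]

    def forward(self, h: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        src, dst = edge_index                 # directed edges src -> dst
        agg = torch.zeros_like(h)
        agg.index_add_(0, dst, h[src])        # sum incoming neighbor messages per node
        deg = torch.zeros(h.size(0)).index_add_(
            0, dst, torch.ones_like(dst, dtype=torch.float))
        agg = agg / deg.clamp(min=1).unsqueeze(-1)  # mean aggregation
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))

# Toy graph: 4 nodes, edges 0->1, 2->1, 3->1, 1->0.
h = torch.randn(4, 16)
edge_index = torch.tensor([[0, 2, 3, 1], [1, 1, 1, 0]])
out = MeanMessagePassingLayer(16, 16)(h, edge_index)  # shape (4, 16)
```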
2.2. Multitask Learning

Multitask learning (MTL) is an approach in machine learning where a model is trained simultaneously on several tasks. MTL has been extensively explored in recommendation as a way to improve key metrics [29, 30, 31, 32]. The core idea behind multitask learning is to improve the robustness of the model by leveraging the domain-specific information contained in the training signals of related tasks [33, 34]. Hard parameter sharing, one of the most fundamental forms of MTL, uses a shared representation which then branches into multiple heads capable of learning task-specific information [35, 36, 37].

For graph-aware EBR in particular, self-supervised multitask learning (SSMTL) has been proposed as a new approach to MTL, optimizing the embeddings directly to achieve desirable embedding properties without the use of positive or negative labels. In this setting, we combine several self-supervised learning (SSL) methods with a downstream retrieval task to learn both direct and indirect embedding features. Recent work [19] has shown that SSMTL can lead to improved task generalization and embedding quality on several academic benchmarks through the increasingly robust training objective. However, many of the SSL approaches used are constrained by the assumption that global graph information can be inferred from the graph structure. This does not hold in the large-scale recommendation setting, where graphs are constrained to some K-hop neighborhood around a query user in order to fit in memory. As a result, many of these SSL methods may lead to negative transfer due to conflict between the SSL tasks and the target link prediction task, and there remains work to be done to investigate which methods perform best in this large-scale setting.

3. Self-Supervised Multitask Learning for EBR

In the following sections, we describe the details of the SSL methods used in our SSMTL approach, along with our experimental setup and results, highlighting the benefits and impact of including SSMTL-based embeddings in EBR for large-scale industrial recommendation systems.

3.1. Self-Supervised Learning Methods

We identify two self-supervised learning approaches that are scalable and lead to improvements in the large-scale recommendation setting through a more robust training objective.

Canonical Correlation Analysis. Based on work from [21], Canonical Correlation Analysis (CCA) deploys a non-contrastive, non-discriminative SSL method to train the GNN. The self-supervised training objective is described in Equation 3. First, given a subgraph with $N$ nodes, two augmented views of the subgraph are created and fed through the GNN, producing $\mathbf{Z}_A$ and $\mathbf{Z}_B$ where $\mathbf{Z}_A, \mathbf{Z}_B \in \mathbb{R}^{N \times d}$. Each of these embeddings is fed through a task-specific head and then normalized so that each feature has zero mean and $\frac{1}{\sqrt{N}}$ standard deviation, resulting in $\tilde{\mathbf{Z}}_A$ and $\tilde{\mathbf{Z}}_B$. The loss is then computed from Equation 3. The first term seeks to minimize the distance between the same nodes in the two views. The second term enforces that the feature-wise covariance of all nodes is equal to the identity matrix.

$$\mathcal{L}_{\mathrm{CCA}} = \left\|\tilde{\mathbf{Z}}_A - \tilde{\mathbf{Z}}_B\right\|_F^2 + \lambda\left(\left\|\tilde{\mathbf{Z}}_A^\top \tilde{\mathbf{Z}}_A - \mathbf{I}\right\|_F^2 + \left\|\tilde{\mathbf{Z}}_B^\top \tilde{\mathbf{Z}}_B - \mathbf{I}\right\|_F^2\right) \quad (3)$$
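Below is a minimal sketch of the loss in Equation 3, assuming the two augmented views have already been encoded by the GNN and its task-specific head; the trade-off weight and toy shapes are placeholder values.

```python
import torch

def cca_ssl_loss(z_a: torch.Tensor, z_b: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """CCA-style SSL loss of Equation 3 on two views' node embeddings (N x d)."""
    n = z_a.size(0)
    # Normalize each feature to zero mean and 1/sqrt(N) standard deviation
    # (small epsilon guards against constant features in this toy setting).
    z_a = (z_a - z_a.mean(0)) / ((z_a.std(0) + 1e-8) * n ** 0.5)
    z_b = (z_b - z_b.mean(0)) / ((z_b.std(0) + 1e-8) * n ** 0.5)
    invariance = (z_a - z_b).pow(2).sum()   # pull the two views of each node together
    eye = torch.eye(z_a.size(1))
    decorrelation = ((z_a.T @ z_a - eye).pow(2).sum()
                     + (z_b.T @ z_b - eye).pow(2).sum())  # whiten the features
    return invariance + lam * decorrelation

loss = cca_ssl_loss(torch.randn(1024, 64), torch.randn(1024, 64))
```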
Masked Autoencoders. Based on work from [22], this approach leverages a graph masked autoencoder (MAE) that focuses on feature reconstruction. First, an augmented view of the subgraph is created and the features of the query users are masked out. This augmented graph is then fed through the GNN and a task-specific head. The features of the query users are then re-masked and passed through a graph convolution layer. As described in Equation 4, for all masked nodes $\mathcal{V}$, the final loss is equal to the average of the scaled cosine error between the original features $\mathbf{X}$ and the generated features $\mathbf{Z}$. This approach relies only on the local neighborhood surrounding the query node, making it a good option for large-scale SSMTL.

$$\mathcal{L}_{\mathrm{MAE}} = \frac{1}{|\mathcal{V}|} \sum_{v_i \in \mathcal{V}} \left(1 - \frac{\mathbf{x}_i^\top \mathbf{z}_i}{\|\mathbf{x}_i\| \cdot \|\mathbf{z}_i\|}\right)^{y}, \quad y \geq 1 \quad (4)$$
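A sketch of the scaled cosine error in Equation 4, computed over the masked query-user nodes; the exponent value and tensor shapes are placeholders.

```python
import torch
import torch.nn.functional as F

def scaled_cosine_error(x: torch.Tensor, z: torch.Tensor, y: float = 2.0) -> torch.Tensor:
    """Equation 4: average scaled cosine error between original features x
    and reconstructed features z for the masked nodes (both M x d)."""
    cos = F.cosine_similarity(x, z, dim=-1)  # x_i^T z_i / (||x_i|| * ||z_i||)
    return ((1.0 - cos) ** y).mean()         # scaling exponent y >= 1

loss = scaled_cosine_error(torch.randn(256, 64), torch.randn(256, 64))
```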
We note that these two approaches both utilize non-contrastive methods. While experimenting with different SSL tasks, we find that contrastive SSL approaches do not perform well in the production setting due to their assumption that global information is readily available in the original and augmented graphs. This is not necessarily true for large-scale recommendation, where subgraphs are constrained to the K-hop neighborhood surrounding each query node.

3.2. Experimental Setup

3.2.1. Problem Breakdown

We evaluate SSMTL as a robust training objective on an industrial friend recommendation system with hundreds of millions of users and connections. To handle this scale of training, we sample subgraphs containing the k-hop neighborhood around each query user. Following training, the embeddings for EBR can be generated via propagation through the encoder backbone.
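A minimal sketch of this k-hop subgraph sampling as a breadth-first traversal over an adjacency list; the per-node fanout cap is a hypothetical addition of the kind production samplers typically use to bound subgraph size.

```python
from collections import deque

def k_hop_subgraph(adj: dict, query: int, k: int, fanout: int = 20) -> set:
    """Collect node ids within k hops of the query user, capping neighbors per node."""
    visited, frontier = {query}, deque([(query, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj.get(node, [])[:fanout]:  # cap fanout to bound subgraph size
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited

# Toy friendship graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(k_hop_subgraph(adj, query=0, k=2))  # {0, 1, 2, 3}
```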
3.2.2. Retrieval Baseline

The baseline model uses a supervised single-task setup for embedding-based retrieval. We use a GAT as the GNN encoder backbone to obtain embeddings for the query user and each candidate, producing a candidate embedding matrix $\mathbf{z}$. We then compute the dot product between the query user and each candidate and apply a softmax to generate the logits. We then calculate the categorical cross-entropy loss with the true labels $\mathbf{y}$ across the $m = 2$ classes and $n$ candidates, as outlined in Equation 5.

$$\mathcal{L}_{\mathrm{retrieval}} = -\sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij} \log\left(\frac{e^{z_{ij}}}{\sum_{k=1}^{m} e^{z_{ik}}}\right) \quad (5)$$
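A sketch of the retrieval objective in Equation 5, scoring one query user against $n$ candidates; the batch shapes and the way a single dot-product score is expanded into two-class (link / no-link) logits are hypothetical choices for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: one query user scored against n = 100 candidates,
# each classified into m = 2 classes (link vs. no link).
n, d = 100, 64
query = torch.randn(d)
candidates = torch.randn(n, d)

scores = candidates @ query                      # one dot-product score per candidate
logits = torch.stack([-scores, scores], dim=1)   # hypothetical 2-class logits per candidate
labels = torch.randint(0, 2, (n,))               # 1 where the candidate is truly linked

# Categorical cross entropy over m = 2 classes, summed across the n candidates
# to match the double sum in Equation 5.
loss = F.cross_entropy(logits, labels, reduction="sum")
```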
3.2.3. SSMTL Implementation Details

In our SSMTL approach, we use both CCA and MAE in combination with the retrieval baseline as the training objectives. All three methods share the same GAT GNN backbone. The augmented views for CCA and MAE are created separately, with CCA performing edge-drop and feature-drop augmentations while MAE performs edge drop and query-node masking. The task-specific head for CCA is a Linear-ReLU-Linear block, while the task-specific head for MAE is a single linear layer. The final loss with SSMTL is a weighted sum of the individual losses:

$$\mathcal{L}_{\mathrm{combined}} = \alpha\mathcal{L}_{\mathrm{retrieval}} + \beta\mathcal{L}_{\mathrm{CCA}} + \gamma\mathcal{L}_{\mathrm{MAE}} \quad (6)$$

where $\alpha$ is the weight of the retrieval loss, $\beta$ is the weight of the CCA loss, and $\gamma$ is the weight of the MAE loss. In practice, we observed the best performance when the retrieval weight was several orders of magnitude larger than the other loss weights.
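A minimal sketch of the weighted combination in Equation 6; the placeholder scalar losses stand in for the three subtask heads, and the weight values are hypothetical beyond the reported order-of-magnitude relationship.

```python
import torch

# Placeholder subtask losses; in the full pipeline these come from the
# retrieval, CCA, and MAE heads sharing one GAT encoder backbone.
loss_retrieval = torch.tensor(2.3, requires_grad=True)
loss_cca = torch.tensor(0.9, requires_grad=True)
loss_mae = torch.tensor(0.4, requires_grad=True)

# Hypothetical weights; the retrieval weight dominates the SSL weights
# by several orders of magnitude, as observed in practice.
alpha, beta, gamma = 1.0, 1e-3, 1e-3
loss_combined = alpha * loss_retrieval + beta * loss_cca + gamma * loss_mae
loss_combined.backward()  # one backward pass updates the shared backbone
```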
3.3. Results

We evaluated the effectiveness of SSMTL for end-to-end friend recommendation with online A/B testing. The control group used candidates retrieved from the production model trained with the retrieval baseline, while the treatment group used candidates retrieved with the new robust training objective in the SSMTL setting, specifically combining the previous retrieval loss with the whitening-decorrelation and generative-reconstruction objectives.

In the A/B experimental results, we saw statistically significant improvements across several friend recommendation metrics. Specifically, we observed up to 5.45% improvements in new friends made and 1.91% improvements in new friends made with low-degree users in various markets. Overall, these results show that SSMTL provides improved recommendation compared with the single-task setting, in particular helping with candidate generation for low-degree users.

4. Conclusion

In this paper, we evaluate the effectiveness of a robust self-supervised multitask learning objective in embedding-based retrieval. Through online evaluation, we demonstrate that self-supervised methods used in a multitask setting are able to augment the performance of the underlying retrieval task at the scale of over 800 million nodes and edges, providing complementary yet disjoint information that enhances embedding quality. We observe statistically significant gains in the number of friendships made for both high- and low-degree users.
References

[1] Y. Li, K. Liu, R. Satapathy, S. Wang, E. Cambria, Recent developments in recommender systems: A survey, 2023. arXiv:2306.12680.
[2] A. Sun, Y. Peng, A survey on modern recommendation system based on big data, 2024. arXiv:2206.02631.
[3] P. Covington, J. Adams, E. Sargin, Deep neural networks for youtube recommendations, in: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 191–198. URL: https://doi.org/10.1145/2959100.2959190. doi:10.1145/2959100.2959190.
[4] J. Huang, A. Sharma, S. Sun, L. Xia, D. Zhang, P. Pronin, J. Padmanabhan, G. Ottaviano, L. Yang, Embedding-based retrieval in facebook search, CoRR abs/2006.11632 (2020). URL: https://arxiv.org/abs/2006.11632. arXiv:2006.11632.
[5] Y. Gan, Y. Ge, C. Zhou, S. Su, Z. Xu, X. Xu, Q. Hui, X. Chen, Y. Wang, Y. Shan, Binary embedding-based retrieval at tencent, 2023. arXiv:2302.08714.
[6] J. Johnson, M. Douze, H. Jégou, Billion-scale similarity search with gpus, CoRR abs/1702.08734 (2017). URL: http://arxiv.org/abs/1702.08734. arXiv:1702.08734.
[7] Y. A. Malkov, D. A. Yashunin, Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs, CoRR abs/1603.09320 (2016). URL: http://arxiv.org/abs/1603.09320. arXiv:1603.09320.
[8] Y. Zhang, X. Dong, W. Ding, B. Li, P. Jiang, K. Gai, Divide and conquer: Towards better embedding-based retrieval for recommender systems from a multi-task perspective, 2023. arXiv:2302.02657.
[9] G. Linden, B. Smith, J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Computing 7 (2003) 76–80. doi:10.1109/MIC.2003.1167344.
[10] R. Jha, S. Subramaniyam, E. Benjamin, T. Taula, Unified embedding based personalized retrieval in etsy search, 2023. arXiv:2306.04833.
[11] R. Peng, K. Liu, P. Yang, Z. Yuan, S. Li, Embedding-based retrieval with llm for effective agriculture information extracting from unstructured data, 2023. arXiv:2308.03107.
[12] J. Shi, V. Chaurasiya, Y. Liu, S. Vij, Y. Wu, S. Kanduri, N. Shah, P. Yu, N. Srivastava, L. Shi, G. Venkataraman, J. Yu, Embedding based retrieval in friend recommendation, in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 3330–3334. URL: https://doi.org/10.1145/3539618.3591848. doi:10.1145/3539618.3591848.
[13] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, CoRR abs/1806.01973 (2018). URL: http://arxiv.org/abs/1806.01973. arXiv:1806.01973.
[14] P. P.-H. Kung, Z. Fan, T. Zhao, Y. Liu, Z. Lai, J. Shi, Y. Wu, J. Yu, N. Shah, G. Venkataraman, Improving embedding-based retrieval in friend recommendation with ann query expansion, in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 2930–2934.
[15] C. Li, X. Peng, Y. Niu, S. Zhang, H. Peng, C. Zhou, J. Li, Learning graph attention-aware knowledge graph embedding, Neurocomputing 461 (2021) 516–529. URL: https://www.sciencedirect.com/science/article/pii/S0925231221010961. doi:10.1016/j.neucom.2021.01.139.
[16] A. v. d. Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding, arXiv preprint arXiv:1807.03748 (2018).
[17] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16000–16009.
[18] A. Ermolov, A. Siarohin, E. Sangineto, N. Sebe, Whitening for self-supervised representation learning, in: International Conference on Machine Learning, PMLR, 2021, pp. 3015–3024.
[19] M. Ju, T. Zhao, Q. Wen, W. Yu, N. Shah, Y. Ye, C. Zhang, Multi-task self-supervised graph neural networks enable stronger task generalization, in: The Eleventh International Conference on Learning Representations, 2023. URL: https://openreview.net/forum?id=1tHAZRqftM.
[20] L. Torrey, J. Shavlik, Transfer Learning, IGI Global, 2010, pp. 242–264.
[21] H. Zhang, Q. Wu, J. Yan, D. Wipf, P. S. Yu, From canonical correlation analysis to self-supervised graph neural networks, CoRR abs/2106.12484 (2021). URL: https://arxiv.org/abs/2106.12484. arXiv:2106.12484.
[22] Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, J. Tang, Graphmae: Self-supervised masked graph autoencoders, 2022. arXiv:2205.10803.
[23] P. Covington, J. Adams, E. Sargin, Deep neural networks for youtube recommendations, in: Proceedings of the 10th ACM Conference on Recommender Systems, New York, NY, USA, 2016.
[24] T. Koh, G. Wu, M. Mi, Manas hnsw realtime: Powering realtime embedding-based retrieval, 2021.
[25] M. E. J. Newman, Clustering and preferential attachment in growing networks, Physical Review E 64 (2001). URL: http://dx.doi.org/10.1103/PhysRevE.64.025102. doi:10.1103/physreve.64.025102.
[26] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, M. Sun, Graph neural networks: A review of methods and applications, CoRR abs/1812.08434 (2018). URL: http://arxiv.org/abs/1812.08434. arXiv:1812.08434.
[27] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio, Graph attention networks, 2018. arXiv:1710.10903.
[28] Z. Liu, Y. Ma, Y. Ouyang, Z. Xiong, Contrastive learning for recommender system, CoRR abs/2101.01317 (2021). URL: https://arxiv.org/abs/2101.01317. arXiv:2101.01317.
[29] Y. Lu, R. Dong, B. Smyth, Why i like it: multi-task learning for recommendation and explanation, 2018, pp. 4–12. doi:10.1145/3240323.3240365.
[30] J. Ma, Z. Zhao, X. Yi, J. Chen, L. Hong, E. H. Chi, Modeling task relationships in multi-task learning with multi-gate mixture-of-experts, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 1930–1939. URL: https://doi.org/10.1145/3219819.3220007. doi:10.1145/3219819.3220007.
[31] X. Ma, L. Zhao, G. Huang, Z. Wang, Z. Hu, X. Zhu, K. Gai, Entire space multi-task model: An effective approach for estimating post-click conversion rate, 2018. arXiv:1804.07931.
[32] H. Tang, J. Liu, M. Zhao, X. Gong, Progressive layered extraction (ple): A novel multi-task learning (mtl) model for personalized recommendations, in: Proceedings of the 14th ACM Conference on Recommender Systems, RecSys '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 269–278. URL: https://doi.org/10.1145/3383313.3412236. doi:10.1145/3383313.3412236.
[33] A. Argyriou, T. Evgeniou, M. Pontil, Multi-task feature learning, in: B. Schölkopf, J. Platt, T. Hoffman (Eds.), Advances in Neural Information Processing Systems, volume 19, MIT Press, 2006. URL: https://proceedings.neurips.cc/paper_files/paper/2006/file/0afa92fc0f8a9cf051bf2961b06ac56b-Paper.pdf.
[34] R. Caruana, Multitask learning, Machine Learning 28 (1997) 41–75.
[35] P. Guo, C.-Y. Lee, D. Ulbricht, Learning to branch for multi-task learning, 2020. arXiv:2006.01895.
[36] X. Sun, R. Panda, R. S. Feris, Adashare: Learning what to share for efficient deep multi-task learning, CoRR abs/1911.12423 (2019). URL: http://arxiv.org/abs/1911.12423. arXiv:1911.12423.
[37] S. Vandenhende, S. Georgoulis, B. D. Brabandere, L. V. Gool, Branched multi-task networks: Deciding what layers to share, 2020. arXiv:1904.02920.