<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Beauty Beyond Words: Explainable Beauty Product Recommendations Using Ingredient-Based Product Attributes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Siliang Liu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rahul Suresh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amin Banitalebi-Dehkordi</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Accurate attribute extraction is critical for beauty product recommendations and building trust with customers. This remains an open problem, as existing solutions are often unreliable and incomplete. We present a system to extract beauty-specific attributes using end-to-end supervised learning based on beauty product ingredients. A key insight to our system is a novel energy-based implicit model architecture. We show that this implicit model architecture offers significant benefits in terms of accuracy, explainability, robustness, and flexibility. Furthermore, our implicit model can be easily fine-tuned to incorporate additional attributes as they become available, making it more useful in real-world applications. We validate our model on a major e-commerce skincare product catalog dataset and demonstrate its effectiveness. Finally, we showcase how ingredient-based attribute extraction contributes to enhancing the explainability of beauty recommendations.</p>
      </abstract>
      <kwd-group>
        <kwd>attribute extraction</kwd>
        <kwd>beauty recommendation</kwd>
        <kwd>ingredient analysis</kwd>
        <kwd>explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The value of the global beauty and personal care market is estimated to be over $646 billion in 2024 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Product discovery and trust are two of the biggest considerations in Beauty customers’ shopping
journeys in e-commerce stores. Many factors contribute to these problems, such as a lack of personalized
recommendations, inaccurate or incomplete product benefit and/or ingredient information, and a lack of
targeted curation. Having such information accurately listed in the product catalogue is particularly
important for the Beauty category, as these products are topically applied to the skin. Manual curation
and sanitization of such metadata is possible at small scales. However, for larger e-commerce stores
with a large portfolio of products, it is impractical to rely on manual annotation.
      </p>
      <p>The primary objective of our work is to enhance the beauty shopping experience by automatically and
accurately extracting beauty attributes at scale. These attributes not only aid customers in comparing
and refining product choices but also foster trust in the e-commerce stores. Furthermore, the extracted
attributes contribute to building more explainable beauty recommendations, which empower customers
to make informed purchasing decisions.</p>
      <p>We propose a robust and scalable learning-based solution capable of predicting beauty attributes from
product ingredients. To achieve this, we integrate an energy-based implicit strategy to extract 5 skin
types, 11 skin concerns, and 17 attributes commonly preferred across beauty products, as elaborated in
Section 9.1. In summary, the key benefits of our proposed model are:
• Improved accuracy and precision compared to the alternatives
• Explainability through analysis of the attention weights (§5.4)
• Robustness in a low-resource regime via implicit data augmentation (§5.5)
• Flexibility when finetuning previously trained models on new labels (§6.2)</p>
      <p>To the best of our knowledge, there has been no prior study on the extraction of beauty-specific
attributes based on product ingredients. Our contributions are outlined as follows:
• We introduce a novel energy-based implicit model for extracting beauty attributes from product
ingredients and the title. We define implicit vs. explicit models in Section 3.
• Our proposed approach is assessed using skincare products from a major e-commerce store. We
demonstrate its superiority over traditional keyword-based solutions and an explicit classifier
baseline on a test dataset annotated by beauty domain experts.
• We document and extensively discuss the key algorithmic and architectural features that
contribute to explainability, robustness, and flexibility of our proposed model.
• As a use-case study, we illustrate how ingredient-based extracted attributes can enhance the
development of explainable beauty recommendations in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Attribute Value Extraction The problem of product attribute extraction in e-commerce is
traditionally solved using named entity recognition (NER). NER approaches typically use
beginning-inside-outside (BIO) tagging [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] to segment texts. However, NER-based approaches exhibit substantial
limitations due to their reliance on predefined entity types. This rigidity makes it difficult to scale
in dynamic environments where attributes are numerous and constantly evolving, such as in beauty
product recommendations. Other research models the attribute extraction task as a sequential
tagging problem [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] using CRFs and BiLSTMs. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] describes a method that extracts attributes using a
parameterized decoder with pretrained attribute embeddings, through a hypernetwork and a
Mixture-of-Experts (MoE) module. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] also models the attribute to make the prediction task more scalable. Our
work is similar to the solution proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which uses BERT and Bi-LSTMs to model semantic
relations between attributes and product titles on a large-scale dataset. However, the deep learning
modules in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are primarily used as components in the NER pipeline, and the outputs of the model are
still BIO tags. Our work differs in that our proposed model directly outputs the attribute values,
and its architectural design choices are heavily guided by explainability, robustness, and flexibility.
      </p>
      <p>
        In the direction of classification tasks, recent advancements utilize multitask framework and
multimodality [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ]. Furthermore, these models utilize parameter sharing across diferent attribute
prediction tasks, reducing the model’s complexity and encouraging generalization. Each attribute has its
own output layer, allowing the network to predict multiple attributes simultaneously. On the other hand,
prior works have demonstrated that incorporating an implicit method [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] ofers unique benefits.
In particular, when treating product attribute extraction as an implicit classification problem—where
attributes themselves are also part of the input—the model can focus on specific attributes to extract
from the product description. This approach helps the model learn more meaningful and relevant
embeddings from the input which leads to more accurate attribute value extraction.
Beauty Product Recommendation Extant literature provides limited research on beauty product
recommendation that incorporates ingredient analysis [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] directly uses an
ingredientconcern mapping table to provide solutions for users of various skin conditions detected by an object
detection computer vision model. However, this mapping table is often supplied by a third party where
mappings are constructed independently for each ingredient without accounting for the order and the
interactions with other ingredients, leading to inflexible rule-based recommendation methods. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]’s
approach extracts ingredient eficacy based on user reviews and recommends products containing those
ingredients for customers across various age groups. Although this method relies on user-generated
content, it does not align with our fact-based approach, making it inapplicable to our use-case scenario.
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] employs a method based on ingredient similarity using one-hot encoding to recommend products
given a user’s past purchase. However, this work does not leverage ingredient data to predict targeted
skin types and concerns directly, which is the focus of our work.
      </p>
      <sec id="sec-2-1">
        <title>Figure 1</title>
        <p>[Figure 1: Overview of BT-BERT inference. Query attributes (e.g., Dry Skin, Oily Skin, Acne, Sagging, Dark Spot, Cruelty-Free, ...) are combined with extracted product information (Ingredients: Snail Secretion Filtrate, Betaine, Butylene Glycol, 1,2-Hexanediol, Sodium Polyacrylate, ...; Title: COSRX Snail Mucin 96% Power Repairing Essence) and passed through N−1 Transformer encoder layers followed by a final self-attention head to produce the output probability.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. System Overview</title>
      <p>
        We approach the beauty attribute extraction problem as a supervised multi-label classification task. Our
proposed solution features a bidirectional Transformer encoder network similar to BERT [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], with a
slight modification applied to the last attention layer as summarized in Algorithm 1. It is important to
note that the network does not use the feed-forward layers in the last Transformer encoder block and
does not have any additional classifier modules commonly used in downstream learning tasks. Instead,
the logits are directly calculated from the attention values. We refer to our model as BeautyTech-BERT,
or BT-BERT for short.
      </p>
      <p>The model operates by taking as inputs a query attribute, a list of ingredients, and the product title,
and producing the probability for the query attribute. Figure 1 shows an example use-case where the
user is querying six attributes for a product titled “COSRX Snail Mucin Essence”. Based on the product
ingredients, the network will make an inference on whether to label the query attributes true or false.
In this case, since Betaine is an ingredient known for its hydrating properties, the network is likely to
predict true for Dry Skin, meaning this product likely benefits those who have a dry skin type.</p>
      <p>
        Conceptually, our model can be viewed as an energy-based model (EBM) [
        <xref ref-type="bibr" rid="ref19 ref20 ref21">19, 20, 21</xref>
        ], as it assigns a
normalized scalar (or "energy") to each input data point, thereby representing a probability distribution
over the training data. We also denote our model as an implicit model, as it accepts the query attribute
as input and generates a prediction solely for that attribute. This distinguishes it from conventional
multi-label classifiers, where the classifier module and the number of output classes must be explicitly
defined.
      </p>
      <p>Model Input For each product, the query attribute is concatenated with the ingredients and title and passed
to the model. Maintaining the original sequence order of the ingredient list is essential, as it reflects the
standard convention of listing higher-potency ingredients first. We first tokenize the query label and
pad the query tokens up to a length of 3. The product ingredients and title are also tokenized. The entire
sequence is truncated or padded such that the final length is 512. We place the query attribute at the
beginning of the input sequence so that its position is consistent across all input sequences—similar
to the effect of the [CLS] token in BERT when used in downstream tasks—which is important for
computing the logits.</p>
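      <p>As a concrete sketch of this input construction, the following snippet pads the query to 3 tokens, places it before [CLS], and truncates or pads the full sequence to 512. The build_input helper and the pre-split token lists are hypothetical illustrations; a real implementation would use the model's tokenizer.</p>
      <p>
```python
# Sketch of the BT-BERT input layout described above (hypothetical helper;
# plain token lists stand in for real tokenizer output).
MAX_LEN = 512
QUERY_LEN = 3
PAD, CLS, SEP = "[PAD]", "[CLS]", "[SEP]"

def build_input(query_tokens, ingredient_tokens, title_tokens):
    # pad or truncate the query attribute to exactly QUERY_LEN tokens
    query = (query_tokens + [PAD] * QUERY_LEN)[:QUERY_LEN]
    # query first, so its position is consistent across all input sequences
    seq = query + [CLS] + ingredient_tokens + [SEP] + title_tokens
    # truncate or pad the whole sequence to MAX_LEN tokens
    seq = seq[:MAX_LEN]
    return seq + [PAD] * (MAX_LEN - len(seq))

seq = build_input(["dry", "skin"], ["betaine", "glycerin"], ["essence"])
```
      </p>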
    </sec>
    <sec id="sec-4">
      <title>4. Data Preparation</title>
      <p>
        Our proposed method is a supervised learning approach and thus requires labeled training data. We first
collect a dataset of skincare products from publicly available product data [
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ]. For each product,
attribute labels were meticulously annotated by domain experts based on years of scientific ingredient
research. An example is shown in Figure 4. Overall, we collected a total of 11580 data points, of which
9334 (≈ 80%) are dedicated to training and 2246 (≈ 20%) to evaluation. Figure 5 shows the distribution
of products categorized by product types and attributes in our dataset.
      </p>
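      <p>Such a split can be sketched as follows. This is illustrative code, not our pipeline; note that an exact 80% cut of 11580 products gives 9264/2316, whereas our split is 9334/2246 (approximately 80/20).</p>
      <p>
```python
import random

# Sketch of an approximately 80/20 train/evaluation split
# (hypothetical; the actual split is 9334/2246 out of 11580).
def split(products, train_frac=0.8, seed=0):
    items = list(products)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

train, val = split(range(11580))
```
      </p>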
      <p>[Figure 2: Implicit model vs. explicit model. Both embed the input with (sub)word, segment, and positional embeddings. The implicit model takes "Query Attribute [CLS] Ingredients [SEP] Title [PAD]" as input, passes it through N−1 standard encoder blocks (Self-Attention, Layer Norm, Feed Forward, Layer Norm) plus a final self-attention layer, and applies a sigmoid to the first query token's output ([1st_token_out]) to produce the multilabel classification output (Dry Skin, Oily Skin, Sagging, Hydration, Fragrance Free, ...). The explicit model takes "[CLS] Ingredients [SEP] Title [PAD]" through N encoder blocks and feeds [CLS_out] into a fully connected layer with a sigmoid to produce the same set of outputs.]</p>
      <sec id="sec-4-1">
        <title>Algorithm 1: BT-BERT Forward Pass</title>
        <p># extract the last layer's attention, e.g., index -1
# attentions are [batch, heads, seqlen, seqlen]
attentions = outputs["attentions"][-1]
# sum attention values over all heads
# for the first token attending to itself;
# 16 is a hyperparameter multiplication factor
logits = 16 * attentions[:, :, 0, 0].sum(dim=1)
L = binary_cross_entropy_with_logits(logits, labels)</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>
        5.1. Training Details
For all experiments, we train the network end-to-end with a batch size of 8 until convergence. We use
the AdamW optimizer [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] with an initial learning rate of 3 × 10⁻⁵. We follow the standard setup
for training Transformer models by splitting the trainable parameters into two categories: decay and
non-decay parameters. Non-decaying parameters are biases and LayerNorm [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] parameters; all other
parameters are weight decayed. We set β₂ = 0.95 to improve training stability, as recommended
in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
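      <p>The decay/no-decay split described above can be sketched as follows. The parameter names are hypothetical; the returned groups would be passed to torch.optim.AdamW along with betas=(0.9, 0.95).</p>
      <p>
```python
# Sketch of the decay/no-decay parameter grouping (hypothetical names;
# biases and LayerNorm parameters are excluded from weight decay).
def split_decay_groups(param_names, weight_decay=0.01):
    decay, no_decay = [], []
    for name in param_names:
        if name.endswith("bias") or "LayerNorm" in name:
            no_decay.append(name)
        else:
            decay.append(name)
    # these dicts are the param_groups format expected by torch.optim.AdamW
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

groups = split_decay_groups(["encoder.weight", "encoder.bias", "LayerNorm.weight"])
```
      </p>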
      <p>
        We explored a few different training recipes but found them to have negligible impact on the final
model performance, including a cosine annealing learning rate scheduler [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], a linear decay
scheduler, and a weighted loss for addressing the class imbalance issue.
5.2. Baseline Solutions
We evaluated our method against two simple baseline solutions: Fuzzy Search and the explicit model
alternative illustrated in Figure 2.
      </p>
      <p>Fuzzy Search This is a straightforward approach of finding keywords based on edit distance and
other heuristics. Specifically, a predefined list of target keywords is established (see Section 9.6) for
each of the 33 attributes. Subsequently, a product is categorized as possessing a particular attribute if
any of the keywords from the corresponding list are detected within the product information.</p>
      <p>We compare to this baseline as an example of a highly explainable solution, but we are well aware
that it is not state-of-the-art by any means. By examining a few examples, the limitations of the fuzzy
search approach become immediately apparent. First, fuzzy search is unable to discern complex textual
context. For example, it may overlook labeling a product described as free of perfume, silicones,
phthalates, fragrance as ‘Fragrance Free’. Second, it is sensitive to the error tolerance threshold. For instance,
despite a product being described as hydra intensive treatment, the method may not assign the attribute
"Hydration" if the error tolerance is set too low.</p>
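      <p>A minimal sketch of a fuzzy keyword matcher in this spirit, using Python's difflib. The keyword lists and the 0.8 similarity threshold here are illustrative, not our exact heuristics (those are listed in Section 9.6).</p>
      <p>
```python
from difflib import SequenceMatcher

# Illustrative attribute-to-keyword lists (hypothetical subset).
KEYWORDS = {"Hydration": ["hydrating", "moisturizing"], "Pores": ["pore"]}

def fuzzy_match(word, keyword, threshold=0.8):
    # edit-distance-style similarity ratio in [0, 1]
    return SequenceMatcher(None, word.lower(), keyword.lower()).ratio() >= threshold

def extract_attributes(text, keywords=KEYWORDS, threshold=0.8):
    words = text.lower().split()
    found = set()
    for attr, kws in keywords.items():
        if any(fuzzy_match(w, kw, threshold) for kw in kws for w in words):
            found.add(attr)
    return found

attrs = extract_attributes("deeply hydrating pore minimizing serum")
```
      </p>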
      <p>
        Explicit Model A common approach for classification tasks is to train an explicit feed-forward
network on top of a pre-trained rich embedding, similar to the approach described in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. As a
benchmark, we experimented with this approach, where the model receives product information as
input and outputs the likelihood of the 33 labels. Figure 2 highlights the differences between the implicit
and the explicit models. In the explicit model, the classifier’s output dimension is predefined to be the
same as the number of attributes. For this approach, we use the pre-trained weights and tokenizer of
bge-base-en-v1.5 [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] from HuggingFace. We chose bge-base-en-v1.5 as it is considered the
state-of-the-art text embedding model for retrieval, clustering, and reranking tasks in the Massive Text
Embedding Benchmark (MTEB) [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. As is common practice, we freeze the backbone weights and only
update the classifier parameters for four epochs to avoid catastrophic forgetting. We find that training
end-to-end after four epochs provides optimal results compared to other configurations.
5.3. Model Results
We evaluate models on the standard classification metrics: Accuracy, Precision, Recall, and F1-Score.
Although we report recall and F1-score, we prioritize accuracy and precision as the main evaluation
metrics. A higher precision aligns more closely with our acceptable risk threshold by minimizing the
likelihood of recommending products containing unsuitable ingredients to customers with
particularly sensitive skin. This is important as we envision attribute-based beauty recommendations
as one of the direct applications of this work.
      </p>
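      <p>For reference, the four metrics can be computed from binary predictions as follows. This is a pure-Python sketch with toy inputs, not our evaluation code.</p>
      <p>
```python
# Accuracy, precision, recall, and F1 from binary predictions and labels
# (toy example; per-label counts would be aggregated the same way).
def metrics(preds, labels):
    tp = sum(p and l for p, l in zip(preds, labels))
    tn = sum((not p) and (not l) for p, l in zip(preds, labels))
    fp = sum(p and (not l) for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics([1, 1, 0, 0], [1, 0, 0, 1])
```
      </p>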
      <p>Table 1 summarizes the results of label prediction across different methods. We observe that both
learning-based methods significantly outperform the fuzzy search baseline, as expected. The implicit
model performs slightly better than the explicit alternative across all evaluation metrics. Aside from
the quantitative edge, the implicit model offers other qualitative advantages that the explicit model
does not. We discuss these extensively in the following sections.
5.4. Explainability
In this section, we analyze the input tokens with high attention values in the second-to-last layer of the
Transformer encoder block. Top tokens are obtained using Algorithm 2.</p>
      <p>In Table 2, we choose three query attributes—‘Acne’, ‘Fine Lines and Wrinkles’, and ‘Hydration’—and
show that the tokens with high attention values are ingredients that address the target skin concerns. This
means that our model has learned the effects of different ingredients and how they are associated with
different skin concerns and skin types. We chose these labels as they are the most popular filter criteria
for beauty products.</p>
      <p>We also assess the high-attention tokens for each predicted label of a single product and show that
these tokens differ across attributes of a given product. This means that our model has learned to
pay attention to different tokens when it is asked about different attributes. Table 3 demonstrates
some of the examined products.
5.5. Robustness in Low Data Regime
In this section, we present empirical evidence demonstrating the robust performance of BT-BERT even
when the volume of training data is limited. Figure 6 shows the validation accuracy across various
degrees of data scarcity, namely when the model is trained using the full dataset, as well as 1/2, 1/4, and
1/8 of the full training corpus. In each training run, we systematically down-sample the training set
and keep the validation set constant, i.e., it still contains the same 2246 products.</p>
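      <p>The down-sampling protocol can be sketched as follows (hypothetical code): the training set is sampled at a given fraction without replacement, while the validation set is left untouched.</p>
      <p>
```python
import random

# Sketch of the down-sampling protocol; rounding 9334 * 1/8 yields
# the 1167-product training subset used in the 1/8 run.
def downsample(train_set, fraction, seed=0):
    rng = random.Random(seed)
    k = round(len(train_set) * fraction)
    # sample without replacement; the validation set is not modified
    return rng.sample(train_set, k)

train_set = list(range(9334))
subset = downsample(train_set, 1 / 8)
```
      </p>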
      <p>
        Note that for the 1/8 training run, the model is trained with only 1167 products, and yet the validation
accuracy drops by less than 1.25%. We hypothesize that the robust performance of BT-BERT in
such a low-resource regime can be attributed to the fact that it is an energy-based implicit model, as
opposed to an explicit classifier. The same scaling pattern is observed in other energy-based models
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Additionally, we attribute part of this robustness to the implicit data augmentation strategy
employed in training: specifically, each product is paired with all 33 query attributes, exposing our
model to diverse input contexts. We have not yet fully characterized the scaling behaviors of implicit
and explicit models. It is possible that with improved training techniques, the explicit approach could
close the gap in low-resource regimes.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>6.1. Does the choice of logits transformation matter?
Our early experiments indicate that scaling the probability linearly by 16 achieves better results
than not employing any scaling. We explored an alternative scaling formulation using f(a) = log(a/(1 − a)),
where a represents the attention value of the first query token summed over all attention heads. The design
is inspired by probability theory, where a/(1 − a) is commonly referred to as the odds or odds ratio
when a is a probability. Taking the logarithm of the odds ratio is a common transformation used in
logistic regression to convert probabilities into logits.</p>
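      <p>Writing a for the summed attention value of the first query token, the two transformations compare as follows. This is an illustrative sketch; the epsilon clamp is our addition to avoid log(0) when the attention value saturates.</p>
      <p>
```python
import math

# Linear scaling (factor 16) vs. the log-odds alternative
# f(a) = log(a / (1 - a)); the eps clamp is an assumption, not from the paper.
def linear_logit(a, scale=16.0):
    return scale * a

def log_odds_logit(a, eps=1e-8):
    # clamp into (0, 1) so the logarithm is always defined
    a = min(max(a, eps), 1.0 - eps)
    return math.log(a / (1.0 - a))

z_lin = linear_logit(0.5)
z_log = log_odds_logit(0.5)
```
      </p>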
      <p>Additionally, we experimented with using the summation and the average of the attention values from
the first three query tokens before applying the log transformation. However, these variations did
not produce better results. Ultimately, we chose the linear scaling method of multiplying by 16 due to
its simplicity and slightly faster computation times.
6.2. Finetuning on Additional Attributes
In this section, we discuss the adaptability of implicit models in incorporating new labeled attributes as
they become available. We design a scenario mirroring real-world dynamics, where an initial dataset
comprises 30 out of 33 labels, with the remaining 3 labels introduced in a subsequent release. Such
scenarios are commonplace in the beauty industry, where emerging trends and evolving consumer
preferences necessitate the addition of new product attributes. For instance, the advent of clean beauty
as a trend in 2023 [30] underscores the relevance of this work. Through comprehensive analysis and
experimentation, we assess and highlight the implicit model’s efficacy in seamlessly incorporating new
attributes.</p>
      <p>We removed the labels for ‘Fragrance Free’ (generally preferred), ‘Oily Skin’ (skin type), and ‘Acne’
(skin concern) from the full dataset (full) and trained a model on the remaining 30 labels (30). Then,
we added back the removed labels and finetuned the previously trained model on the complete dataset
for only one epoch. Table 4 shows the validation accuracies before and after the finetuning step. When
finetuning on only the three additional labels (3), we observe a significant drop in validation accuracy
for the existing 30 labels in the validation set. We believe this is due to the catastrophic forgetting
problem and could potentially be alleviated by using more advanced finetuning algorithms [31, 32, 33].</p>
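      <p>The architectural point can be illustrated with a toy scorer. This example is entirely hypothetical; it only shows that an implicit model's attribute set can grow at will because there is no per-attribute output head to replace.</p>
      <p>
```python
# An implicit model scores any (query attribute, product) pair, so new
# attributes need no architecture change; an explicit classifier's output
# dimension is fixed at training time. toy_score is a stand-in, not BT-BERT.
def implicit_predict(score_fn, product, attributes):
    # one forward pass per query attribute; the attribute set can grow freely
    return {attr: score_fn(attr, product) for attr in attributes}

def toy_score(attr, product):
    # hypothetical scorer: word overlap between attribute and product text
    return len(set(attr.lower().split()).intersection(product.lower().split()))

old = implicit_predict(toy_score, "oil free hydration essence", ["Hydration"])
new = implicit_predict(toy_score, "oil free hydration essence", ["Hydration", "Oil Free"])
```
      </p>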
      <p>When finetuning with full, we observe only a slight drop in performance when predicting the
existing 30 labels, but the accuracy for the new labels is drastically improved. It is important to note
that this finetuning procedure is impossible with explicit models, since the number of output
classes is different and therefore the classifier must be replaced and retrained.
6.3. Alternating Query Attribute Tokens
In this section, we highlight the benefit of our implicit model at inference time. First, we show that
it can handle similar but not identical query attributes. We take ‘Fine Lines and Wrinkles’ as an example
and replace the query attribute with just the single word ‘Lines’ for a commonly available anti-wrinkle
renewal skin cream. We use Algorithm 2 to extract the high-attention tokens and track how they change
when the attribute tokens are replaced.</p>
      <p>We observed a number of overlapping tokens, especially those addressing lines and wrinkles:
‘#chio’, ‘pu’, ‘soy’, ‘lines’, ‘baku’, ‘re’, and ‘#tino’. We also identified non-overlapping
tokens such as water, after, cleansing, fine, cart, and wr. It is important to note that the
non-overlapping tokens, such as ‘water’ and ‘cleansing’, are more general and not as directly relevant to the
specific skin concern. We believe that this approach can help us better understand the ingredients and
their target uses.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Explainable Beauty Recommendation and Customer Understanding</title>
      <p>Explainable Beauty Recommendation One critical application of ingredient-based attribute
extraction lies in delivering explainable recommendations to beauty customers.</p>
      <p>In the ever-evolving beauty industry, where personalization is key, transparency and clarity in
product suggestions are vital. As illustrated in Figure 3, skincare recommendations are made using a
point-wise approach, where each product is individually assessed based on the customer’s specific skin
type and concerns. Here, the customer has selected “oily” skin and concerns of “acne” and “dullskin”.
The recommended products not only contain ingredients intended to address these issues but are
also compatible with the customer’s stated skin type, enhancing the trustworthiness and relevance
of each suggestion. Each product is annotated with its predicted target skin concerns and skin types,
alongside the ingredients intended to address those concerns, using Algorithm 2 discussed in Section 5.4.
For example, Salicylic Acid is highlighted for its anti-acne properties across various product types
like cleansers, pads, and serums. Furthermore, the system strategically omits products with oil-based
ingredients that could exacerbate oily skin, ensuring that recommendations are appropriate for the
user’s concerns.</p>
      <p>By providing fact-based explanations for recommended products, this approach offers clear and
transparent justifications for the recommendations. As customers purchase and use products with
effective ingredients, they are more likely to achieve the desired skin results, fostering long-term trust
and encouraging repeat engagement with the e-commerce store. This method not only empowers
customers to make informed purchasing decisions but also strengthens their trust in the recommendation
system. The approach is versatile and can be applied broadly across most beauty catalogs, including
haircare and makeup, where ingredients stay on the skin for extended periods. In the context of strategic
and utility-aware recommendations, explainability is crucial for aligning personalized suggestions with
both individual needs and broader objectives. This alignment ultimately enhances customer confidence,
satisfaction, and long-term audience growth.</p>
      <p>Customer understanding Conversely, customer propensity toward specific attributes—such as
preferred skin type, skin concerns, and ingredient preferences—can be inferred from their past
purchases. Our future work focuses on understanding customer skin types and concerns by building upon
existing attribute extraction methodologies. This advancement will enable further refinement of our
recommendation algorithms, particularly in the ranking layer.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>We present an energy-based implicit model for extracting beauty-specific attributes, trained using
end-to-end supervised learning. We empirically show that the implicit approach outperforms traditional
explicit classifiers in terms of accuracy, precision, and other evaluation metrics. Aside from better
performance, we show that the implicit model is explainable, robust in low-data scenarios, and able to
easily incorporate new attributes as they become available. Using the explainability feature of our model, we
propose novel ways to use the predictions without additional training by comparing and contrasting
the high-value tokens across different products and attributes. We have not yet fully characterized the
limits of the model’s capabilities. Currently, we only qualitatively identify the high-attention-value
tokens and discuss how they relate to specific skin concerns and skin types in our attention
analysis. We wish to better quantify the correlations between all predicted ingredients and the attributes.
Although our work focuses on beauty attribute extraction, we believe the simplicity of our approach
and the comprehensiveness of our analysis provide a solid foundation for future research in designing more
capable and explainable models across domains of machine learning. In future work, we will validate the
generated attributes within downstream recommendation systems and conduct a thorough evaluation.
Furthermore, we will assess the impact of explainability on end users through A/B testing.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Appendices</title>
      <p>9.1. Labels for Skincare Products
We define 33 labels for skincare products that include 5 skin types, 11 skin concerns, and 17 attributes
that are generally preferred across beauty products.</p>
      <p>• Target skin types: Dry Skin, Normal Skin, Oily Skin, Combination Skin, Sensitive Skin
• Target skin concerns: Acne, Hydration, Pores, Fine Lines and Wrinkles, Sagging, Dark Spots,
Dullness, Redness, Uneven Texture, Dark Circles, Puffiness
• Generally preferred beauty attributes: 100% Vegan, Cruelty Free, Fragrance Free, Hypoallergenic,
Paraben Free, Mineral Oil Free, Palm Oil Free, Oil Free, Alcohol Free, Sulphate Free, Gluten Free,
Silicone Free, Phthalate Free, Talc Free, Non Comedogenic, Aluminum Free, Fluoride Free.
9.2. Product Information and Labels
Each product comes with a title (item_name), a list of ingredients, and a Boolean label for each attribute. An example
is shown in Figure 4.
9.3. Training Data Label Distribution</p>
      <p>[Figure 5: Label distribution across product types. The horizontal axis lists the 33 labels defined in Section 9.1 and the vertical axis shows the count (0 to 2,400) of products associated with the respective attribute; for instance, one of the attributes is associated with a total of 1809 out of 11580 products. Bars are broken down by product type: ASTRINGENT_SUBSTANCE, SKIN_CARE_AGENT, SKIN_CARE_PRODUCT, SKIN_CLEANING_AGENT, SKIN_CLEANING_WIPE, SKIN_EXFOLIANT, SKIN_MOISTURIZER, SKIN_PROTECTANT, SKIN_SERUM, SKIN_TREATMENT_MASK, SUNSCREEN.]</p>
      <p>
9.4. Learning Curves for Robustness in Low Data Regime Experiment
9.5. Algorithm for Key Token Extraction Based on Attention Values
Algorithm 2 Key Token Extraction Based on Attention Values
# get index of top-k attention per row across all heads
topk_indices = attentions.flatten(0, 1).topk(topk).indices
topk_indices = topk_indices.unique()
# remove non-meaningful tokens
TO_REMOVE = [‘,’, ‘[CLS]’, ‘[SEP]’, ‘(’, ‘)’, ‘[PAD]’]
topk_tokens = [k for k in topk_tokens if k not in TO_REMOVE]
# convert col indices to token strings
topk_tokens = convert_ids_to_tokens(input_ids[topk_indices])
9.6. FuzzySearch Attribute Key Words
For FuzzySearch method, We define keywords for each of the 33 labels.</p>
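      <p>Algorithm 2 can be exercised end to end with a framework-free sketch; the toy attention matrix, token list, and function name below are illustrative assumptions, not the paper's implementation:</p>
      <p>```python
# Illustrative re-implementation of Algorithm 2 without a deep-learning
# framework: `attentions` is a table of scores with one row per (head, query)
# position and one column per key position. We keep the top-k key positions
# per row, deduplicate, map them to token strings, and drop special tokens.
def key_tokens(attentions, tokens, topk):
    TO_REMOVE = {',', '[CLS]', '[SEP]', '(', ')', '[PAD]'}
    indices = set()
    for row in attentions:
        # index of top-k attention values in this row
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        indices.update(ranked[:topk])
    # convert deduplicated column indices to token strings
    picked = [tokens[j] for j in sorted(indices)]
    # remove non-meaningful tokens
    return [t for t in picked if t not in TO_REMOVE]

tokens = ['[CLS]', 'aqua', ',', 'glycerin', 'niacinamide', '[SEP]']
attentions = [
    [0.0, 0.6, 0.1, 0.2, 0.1, 0.0],  # one head attends mostly to "aqua"
    [0.0, 0.1, 0.0, 0.1, 0.8, 0.0],  # another attends to "niacinamide"
]
print(key_tokens(attentions, tokens, topk=2))  # → ['aqua', 'glycerin', 'niacinamide']
```</p>
      <p>Note that, as in Algorithm 2, tokens must be converted from indices before the special-token filter runs, since the filter operates on token strings rather than on positions.</p>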
      <p>• Dry Skin: "dry", "all", "universal".
• Normal Skin: "normal", "all", "universal".
• Oily Skin: "oil", "all", "universal".
• Combination Skin: "combination", "all", "universal".
• Sensitive Skin: "sensitive", "all", "universal".
• Acne: "anti acne", "blackheads", "salicylic acid", "glycolic acid", "benzoyl peroxide", "breakouts treatment", "acne preventing", "skin clarifying".
• Hydration: "dehydration", "dryness", "hydrating", "rehydrate", "soothing", "moisturizing", "nourishing", "softening", "replenishing".
• Pores: "pore", "oil control".
[30] K. McGrath, Did clean beauty go too far?, https://www.allure.com/story/is-clean-beauty-over, 2023. Accessed: 2024-04.
[31] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, arXiv preprint arXiv:2106.09685 (2021).
[32] S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, M.-H. Chen, DoRA: Weight-decomposed low-rank adaptation, arXiv preprint arXiv:2402.09353 (2024).
[33] L. Zhang, A. Rao, M. Agrawala, Adding conditional control to text-to-image diffusion models, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3836-3847.</p>
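      <p>As a rough sketch of how the FuzzySearch baseline of 9.6 could work (difflib, the 0.85 similarity cutoff, and the helper names are assumptions of this illustration, not the paper's implementation), each label's keywords are fuzzily matched against word spans of the product text:</p>
      <p>```python
from difflib import SequenceMatcher

# Illustrative FuzzySearch-style baseline: a label is predicted positive when
# any of its keywords approximately matches a span of the product text.
# A subset of the Appendix 9.6 keyword lists is shown for brevity.
KEYWORDS = {
    "Oily Skin": ["oil", "all", "universal"],
    "Pores": ["pore", "oil control"],
    "Hydration": ["dehydration", "dryness", "hydrating", "rehydrate",
                  "soothing", "moisturizing", "nourishing", "softening",
                  "replenishing"],
}

def fuzzy_match(keyword, text, cutoff=0.85):
    """True if `keyword` approximately matches any same-length word span."""
    kw, words = keyword.lower(), text.lower().split()
    n = len(kw.split())
    spans = [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return any(SequenceMatcher(None, kw, s).ratio() >= cutoff for s in spans)

def predict(text):
    """Return every label with at least one fuzzily matching keyword."""
    return [label for label, kws in KEYWORDS.items()
            if any(fuzzy_match(kw, text) for kw in kws)]

print(predict("Hydrating pore minimizing serum"))  # → ['Pores', 'Hydration']
```</p>
      <p>The fuzzy ratio lets inflected forms ("pores", "hydrating") match their keyword stems while unrelated words stay below the cutoff.</p>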
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] L. Wood, Beauty &amp; personal care - worldwide, https://www.statista.com/outlook/cmo/beauty-personal-care/worldwide, 2024. Accessed: 2024-04.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiticariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Reiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaithyanathan</surname>
          </string-name>
          ,
          <article-title>Domain adaptation of rulebased annotators for named-entity recognition tasks</article-title>
          ,
          <source>in: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          , Cambridge, MA,
          <year>2010</year>
          , pp.
          <fpage>1002</fpage>
          -
          <lpage>1012</lpage>
          . URL: https://aclanthology.org/D10-1098.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Putthividhya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Bootstrapped named entity recognition for product attribute extraction</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>1557</fpage>
          -
          <lpage>1567</lpage>
          . URL: http://www.aclweb.org/anthology/D11-1144.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, CoRR abs/1508.01991 (2015). URL: http://arxiv.org/abs/1508.01991. arXiv:1508.01991.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] G. Zheng, S. Mukherjee, X. L. Dong, F. Li, OpenTag: Open attribute value extraction from product profiles, CoRR abs/1806.01264 (2018). URL: http://arxiv.org/abs/1806.01264. arXiv:1806.01264.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zalmout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Grant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <article-title>Adatag: Multi-attribute value extraction from product profiles with adaptive decoding</article-title>
          ,
          <source>arXiv preprint arXiv:2106.02318</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <article-title>Scaling up open tagging from tens to thousands: Comprehension empowered attribute value extraction from product title</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>5214</fpage>
          -
          <lpage>5223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Daolio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vargas</surname>
          </string-name>
          ,
          <article-title>Product characterisation towards personalisation: Learning attributes from unstructured data to recommend fashion products</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] Q. Wang, L. Yang, J. Wang, J. Krishnan, B. Dai, S. Wang, Z. Xu, M. Khabsa, H. Ma, SMARTAVE: Structured multimodal transformer for product attribute value extraction, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 263-276. URL: https://aclanthology.org/2022.findings-emnlp.20. doi:10.18653/v1/2022.findings-emnlp.20.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F. T.</given-names>
            <surname>Dezaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suresh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banitalebi-Dehkordi</surname>
          </string-name>
          ,
          <article-title>Automated material properties extraction for enhanced beauty product discovery and makeup virtual try-on</article-title>
          ,
          <source>arXiv preprint arXiv:2312.00766</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] Y. Du, I. Mordatch, Implicit generation and modeling with energy-based models, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Florence</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lynch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Ramirez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wahid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Downs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Mordatch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tompson</surname>
          </string-name>
          ,
          <article-title>Implicit behavioral cloning</article-title>
          ,
          <source>in: 5th Annual Conference on Robot Learning</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Afshar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yeon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levitskyy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suresh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banitalebi-Dehkordi</surname>
          </string-name>
          ,
          <article-title>Improving the accuracy of beauty product recommendations by assessing face illumination quality</article-title>
          ,
          <source>arXiv preprint arXiv:2309.04022</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Alashkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>Examples-rules guided deep neural network for makeup recommendation</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>31</volume>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] H.-H. Li, Y.-H. Liao, Y.-N. Huang, P.-J. Cheng, Based on machine learning for personalized skin care products recommendation engine, in: 2020 International Symposium on Computer, Consumer and Control (IS3C), 2020, pp. 460-462. doi:10.1109/IS3C50286.2020.00125.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] Y. Nakajima, H. Honma, H. Aoshima, T. Akiba, S. Masuyama, Recommender system based on user evaluations and cosmetic ingredients, in: 2019 4th International Conference on Information Technology (InCIT), 2019, pp. 22-27. doi:10.1109/INCIT.2019.8912051.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] R. S, H. S, K. Jayasakthi, S. D. A, K. Latha, N. Gopinath, Cosmetic product selection using machine learning, 2022 International Conference on Communication, Computing and Internet of Things (IC3IoT) (2022) 1-6. URL: https://api.semanticscholar.org/CorpusID:248753814.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y. W.</given-names>
            <surname>Teh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osindero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Energy-based models for sparse overcomplete representations</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>4</volume>
          (
          <year>2003</year>
          )
          <fpage>1235</fpage>
          -
          <lpage>1260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] Y. Song, D. P. Kingma, How to train your energy-based models, arXiv preprint arXiv:2101.03288 (2021). URL: https://arxiv.org/abs/2101.03288.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , S. Chopra,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>A tutorial on energy-based learning</article-title>
          ,
          <source>Predicting structured data 1</source>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22] Skillsmuggler, Amazon ratings dataset, 2024. URL: https://www.kaggle.com/datasets/skillsmuggler/amazon-ratings. Accessed: 2024-08-26.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23] C. Feeds, Amazon USA beauty products dataset, 2024. URL: https://data.world/crawlfeeds/amazon-usa-beauty-products-dataset. Accessed: 2024-08-26.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Decoupled weight decay regularization</article-title>
          ,
          <source>arXiv preprint arXiv:1711.05101</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Kiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Layer normalization</article-title>
          ,
          <source>arXiv preprint arXiv:1607.06450</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mustafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          , L. Beyer,
          <article-title>Sigmoid loss for language image pre-training</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>11975</fpage>
          -
          <lpage>11986</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts, arXiv preprint arXiv:1608.03983 (2016).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28] S. Xiao, Z. Liu, P. Zhang, N. Muennighoff, C-Pack: Packaged resources to advance general Chinese embedding, 2023. arXiv:2309.07597.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Magne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          , Mteb:
          <article-title>Massive text embedding benchmark</article-title>
          ,
          <source>arXiv preprint arXiv:2210.07316</source>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2210.07316. doi:
          <volume>10</volume>
          .48550/ARXIV. 2210.07316.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>