Beauty Beyond Words: Explainable Beauty Product Recommendations Using Ingredient-Based Product Attributes

Siliang Liu1,*, Rahul Suresh1 and Amin Banitalebi-Dehkordi1
1 Amazon BeautyTech

Abstract
Accurate attribute extraction is critical for beauty product recommendations and building trust with customers. This remains an open problem, as existing solutions are often unreliable and incomplete. We present a system to extract beauty-specific attributes using end-to-end supervised learning based on beauty product ingredients. A key insight of our system is a novel energy-based implicit model architecture. We show that this implicit model architecture offers significant benefits in terms of accuracy, explainability, robustness, and flexibility. Furthermore, our implicit model can be easily fine-tuned to incorporate additional attributes as they become available, making it more useful in real-world applications. We validate our model on a major e-commerce skincare product catalog dataset and demonstrate its effectiveness. Finally, we showcase how ingredient-based attribute extraction contributes to enhancing the explainability of beauty recommendations.

Keywords
attribute extraction, beauty recommendation, ingredient analysis, explainability

1. Introduction
The value of the global beauty and personal care market is estimated to be over $646 billion in 2024 [1]. Product discovery and trust are two of the biggest considerations in beauty customers' shopping journeys in e-commerce stores. Many factors contribute to these challenges, such as a lack of personalized recommendations, inaccurate or incomplete product benefit and ingredient information, and a lack of targeted curation. Having such information accurately listed in the product catalog is particularly important for the Beauty category, as these products are applied topically to the skin. Manual curation and sanitization of such metadata is possible at small scales. However, for larger e-commerce stores with large product portfolios, it is impractical to rely on manual annotation.

The primary objective of our work is to enhance the beauty shopping experience by automatically and accurately extracting beauty attributes at scale. These attributes not only aid customers in comparing and refining product choices but also foster trust in the e-commerce stores. Furthermore, the extracted attributes contribute to building more explainable beauty recommendations, which empower customers to make informed purchasing decisions.

We propose a robust and scalable learning-based solution capable of predicting beauty attributes from product ingredients. To achieve this, we integrate an energy-based implicit strategy to extract 5 skin types, 11 skin concerns, and 17 attributes commonly preferred across beauty products, as elaborated in Section 9.1. In summary, the key benefits of our proposed model are:
• Improved accuracy and precision compared to the alternatives
• Explainability through analysis of the attention weights (§5.4)
• Robustness in a low-resource regime via implicit data augmentation (§5.5)
• Flexibility when finetuning previously trained models on new labels (§6.2)

Bari'24: Workshop on Strategic and Utility-aware REcommendations held in conjunction with the 18th ACM Conference on Recommender Systems (RecSys), 2024, in Bari, Italy.
* Corresponding author.
celineli@amazon.com (S. Liu); surerahu@amazon.com (R. Suresh); aminbt@amazon.com (A. Banitalebi-Dehkordi)
ORCID: 0009-0007-4561-7548 (S. Liu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

To the best of our knowledge, there has been no prior study on the extraction of beauty-specific attributes based on product ingredients. Our contributions are outlined as follows:
• We introduce a novel energy-based implicit model for extracting beauty attributes from product ingredients and the title. We define implicit vs. explicit models in Section 3.
• Our proposed approach is assessed using skincare products from a major e-commerce store. We demonstrate its superiority over traditional keyword-based solutions and an explicit classifier baseline on a test dataset annotated by beauty domain experts.
• We document and extensively discuss the key algorithmic and architectural features that contribute to the explainability, robustness, and flexibility of our proposed model.
• As a use-case study, we illustrate how ingredient-based extracted attributes can enhance the development of explainable beauty recommendations in Section 7.

2. Related Works
Attribute Value Extraction The problem of product attribute extraction in e-commerce is traditionally solved using named entity recognition (NER). NER approaches typically use beginning-inside-outside (BIO) tagging [2, 3] to segment texts. However, NER-based approaches exhibit substantial limitations due to their reliance on predefined entity types. This rigidity makes it difficult to scale in dynamic environments where attributes are numerous and constantly evolving, such as in beauty product recommendations. Certain research also models the attribute extraction task as a sequential tagging problem [4, 5] using CRFs and BiLSTMs. [6] describes a method that extracts attributes using a parameterized decoder with pretrained attribute embeddings, through a hypernetwork and a Mixture-of-Experts (MoE) module. [7] also models the attribute itself to make the prediction task more scalable. Our work is similar to the solution proposed in [7], which uses BERT and Bi-LSTMs to model semantic relations between attributes and product titles on a large-scale dataset. However, the deep learning modules in [7] are primarily used as components in the NER pipeline, and the outputs of the model are still BIO tags. Our work is different in that our proposed model directly outputs the attribute values, and the architectural design choices are heavily guided by explainability, robustness, and flexibility.

In the direction of classification tasks, recent advancements utilize multitask frameworks and multi-modality [8, 9, 10]. Furthermore, these models utilize parameter sharing across different attribute prediction tasks, reducing the model's complexity and encouraging generalization. Each attribute has its own output layer, allowing the network to predict multiple attributes simultaneously. On the other hand, prior works have demonstrated that incorporating an implicit method [11, 12] offers unique benefits. In particular, when treating product attribute extraction as an implicit classification problem, where the attributes themselves are also part of the input, the model can focus on the specific attribute to extract from the product description. This approach helps the model learn more meaningful and relevant embeddings from the input, which leads to more accurate attribute value extraction.
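To illustrate how training examples are formed under the implicit framing, the following minimal sketch (field names and the attribute list are hypothetical and simplified, not the exact pipeline used in this paper) expands one labeled product into one binary example per query attribute; an explicit classifier would instead consume the product text alone and predict a fixed-size multi-label vector.

from typing import Dict, List, Tuple

ATTRIBUTES = ["Dry Skin", "Oily Skin", "Acne", "Hydration", "Fragrance Free"]

def to_implicit_examples(product: Dict) -> List[Tuple[str, str, int]]:
    """Expand one product record into (query_attribute, product_text, label) pairs."""
    text = product["ingredients"] + " [SEP] " + product["title"]
    return [
        (attribute, text, int(attribute in product["positive_attributes"]))
        for attribute in ATTRIBUTES
    ]

product = {
    "title": "COSRX Snail Mucin 96% Power Repairing Essence",
    "ingredients": "Snail Secretion Filtrate, Betaine, Butylene Glycol",
    "positive_attributes": {"Dry Skin", "Hydration"},
}
for query_attribute, text, label in to_implicit_examples(product):
    print(query_attribute, "->", label)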
Beauty Product Recommendation Extant literature provides limited research on beauty product recommendation that incorporates ingredient analysis [13, 14]. [15] directly uses an ingredient-concern mapping table to provide solutions for users with various skin conditions detected by an object-detection computer vision model. However, this mapping table is often supplied by a third party, with mappings constructed independently for each ingredient without accounting for ingredient order and interactions with other ingredients, leading to inflexible rule-based recommendation methods. [16]'s approach extracts ingredient efficacy based on user reviews and recommends products containing those ingredients for customers across various age groups. Because this method relies on user-generated content, it does not align with our fact-based approach, making it inapplicable to our use-case scenario. [17] employs a method based on ingredient similarity using one-hot encoding to recommend products given a user's past purchases. However, this work does not leverage ingredient data to predict targeted skin types and concerns directly, which is the focus of our work.

Figure 1: Overview of the beauty attribute extraction workflow and the BT-BERT architecture. Our model is identical to the BERT Transformer [18] except in the last layer; the initial N-1 layers remain unmodified. We remove the final MLP from the last layer of the Transformer encoder and directly use the self-attention values to formulate the output probability.

3. System Overview
We approach the beauty attribute extraction problem as a supervised multi-label classification task. Our proposed solution features a bidirectional Transformer encoder network similar to BERT [18], with a slight modification applied to the last attention layer, as summarized in Algorithm 1. It is important to note that the network does not use the feed-forward layers in the last Transformer encoder block and does not have any additional classifier modules commonly used in downstream learning tasks. Instead, the logits are directly calculated from the attention values. We refer to our model as BeautyTech-BERT, or BT-BERT for short.

The model operates by taking as inputs a query attribute, a list of ingredients, and the product title, and producing the probability for the query attribute. Figure 1 shows an example use-case where the user is querying six attributes for a product titled "COSRX Snail Mucin Essence". Based on the product ingredients, the network will make an inference on whether to label the query attributes true or false. In this case, since Betaine is an ingredient known for its hydrating properties, the network is likely to predict true for Dry Skin, meaning this product likely benefits those who have a dry skin type.

Conceptually, our model can be viewed as an energy-based model (EBM) [19, 20, 21], as it assigns a normalized scalar (or "energy") to each input data point, thereby representing a probability distribution over the training data. We also denote our model as an implicit model, as it accepts the query attribute as input and generates a prediction solely for that attribute.
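In our notation (a compact reading of the output, not additional machinery; the scaling factor of 16 and the use of the first query token's self-attention follow Algorithm 1 below), the model assigns a scalar score to an attribute-product pair and converts it to a probability with a sigmoid:

% a: query attribute tokens, x: ingredient and title tokens,
% A_h^{(N)}: self-attention map of head h in the last encoder layer,
% [0, 0]: the first query token attending to itself (cf. Algorithm 1).
s_\theta(a, x) = 16 \sum_{h=1}^{H} A_h^{(N)}(a, x)\,[0, 0],
\qquad
p_\theta(y = 1 \mid a, x) = \sigma\big(s_\theta(a, x)\big)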
This implicit formulation distinguishes our model from conventional multi-label classifiers, where the classifier module and the number of output classes must be explicitly defined.

Model Input For each product, the query attribute is concatenated with the ingredients and title and passed to the model. Maintaining the original sequence order of the ingredient list is essential, as it reflects the standard convention of listing higher-potency ingredients first. We first tokenize the query label and pad the query tokens up to a length of 3. The product ingredients and title are also tokenized. The entire sequence is truncated or padded such that the final length is 512. We place the query attribute at the beginning of the input sequence so that its position is consistent across all input sequences, similar to the effect of the [CLS] token in BERT when using it in downstream tasks, which is important for computing the logits.

4. Data Preparation
Our proposed method is a supervised learning approach and thus requires labeled training data. We first collect a dataset of skincare products from publicly available product data [22, 23]. For each product, attribute labels were meticulously annotated by domain experts based on years of scientific ingredient research. An example is shown in Figure 4. Overall, we collected a total of 11580 data points, where 9334 (≈ 80%) are dedicated to training and 2246 (≈ 20%) to evaluation. Figure 5 shows the distribution of products categorized by product types and attributes in our dataset.

Figure 2: Difference between implicit and explicit models. Left: in implicit models, the model takes the query attribute together with the product ingredients and title as input ("Query Attribute [CLS] Ingredients [SEP] Title [PAD]"); in our case, the output logits come directly from the self-attention values of the last encoder layer. Right: explicit models represent the standard way of fine-tuning BERT, where a classifier is attached to the end of the Transformer and the input is "[CLS] Ingredients [SEP] Title [PAD]".

Algorithm 1 BT-BERT Forward Pass
1: bert_model = AutoModel.from_pretrained(...)
2:
3: function forward(input_ids, labels)
4:     # request attention maps from the backbone
5:     outputs = bert_model(input_ids, output_attentions=True)
6:
7:     # attention of the last encoder layer (index -1)
8:     # attentions are [batch, heads, seqlen, seqlen]
9:     attentions = outputs["attentions"][-1]
10:
11:    # sum over all heads the attention value of the
12:    # first query token attending to itself;
13:    # 16 is a hyperparameter multiplication factor
14:    logits = 16 * attentions[:, :, 0, 0].sum(dim=1)
15:
16:    L = binary_cross_entropy_with_logits(logits, labels)
17:    return L, logits
18: end function

5. Experiments
5.1. Training Details
For all experiments, we train the network end-to-end with a batch size of 8 until convergence. We use the AdamW optimizer [24] with an initial learning rate of 3 × 10⁻⁵. We follow the standard setup for training Transformer models by splitting the trainable parameters into two categories: decay and non-decay parameters.
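A minimal sketch of this parameter split, assuming a standard PyTorch/HuggingFace setup (the checkpoint name and weight-decay strength are illustrative placeholders; the learning rate and beta2 follow the values given in this section):

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # placeholder backbone checkpoint

# biases and LayerNorm parameters are excluded from weight decay
no_decay = {n for n, _ in model.named_parameters()
            if n.endswith("bias") or "LayerNorm" in n}
param_groups = [
    {"params": [p for n, p in model.named_parameters() if n not in no_decay],
     "weight_decay": 0.01},  # decay strength chosen for illustration
    {"params": [p for n, p in model.named_parameters() if n in no_decay],
     "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(param_groups, lr=3e-5, betas=(0.9, 0.95))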
Non-decaying parameters are the biases and LayerNorm [25] parameters; all other parameters are weight-decayed. We set beta2 = 0.95 to improve training stability, as recommended in [26]. We explored a few different training recipes, including a cosine annealing learning rate scheduler [27], a linear decay scheduler, and a weighted loss for addressing the class imbalance issue, but found them to have negligible impact on the final model performance.

Table 1: Model Performance: Explicit vs. Implicit Approach (BT-BERT)
Method          Accuracy  Precision  Recall  F1-Score  Parameters
BT-BERT         0.964     0.987      0.958   0.960     109,360,128
Explicit Model  0.946     0.954      0.904   0.912     109,975,296
Fuzzy Search    0.301     0.287      0.356   0.327     –

5.2. Baseline Solutions
We evaluated our method against two simple baseline solutions: Fuzzy Search and the explicit model alternative illustrated in Figure 2.

Fuzzy Search This is a straightforward approach of finding keywords based on edit distance and other heuristics. Specifically, a predefined list of target keywords is established for each of the 33 attributes (see Section 9.6). Subsequently, a product is categorized as possessing a particular attribute if any of the keywords from the corresponding list are detected within the product information. We compare to this baseline as an example of a highly explainable solution, while being well aware that it is not state-of-the-art by any means. By examining a few examples, the limitations of the fuzzy search approach become immediately apparent. First, fuzzy search is unable to discern complex textual context. For example, it may fail to label a product described as "free of perfume, silicones, phthalates, fragrance" as 'Fragrance Free'. Second, it is sensitive to the error tolerance threshold. For instance, despite a product being described as a "hydra intensive treatment", the method may not assign the attribute 'Hydration' if the error tolerance is set too low.

Explicit Model A common approach for classification tasks is to train an explicit feed-forward network on top of a pre-trained rich embedding, similar to the approach described in [18]. As a benchmark, we experimented with this approach, where the model receives the product information as input and outputs the likelihood of the 33 labels. Figure 2 highlights the differences between the implicit and the explicit models. In the explicit model, the classifier's output dimension is predefined to be the same as the number of attributes. For this approach, we use the pre-trained weights and tokenizer of bge-base-en-v1.5 [28] from HuggingFace. We chose bge-base-en-v1.5 as it is considered the state-of-the-art text embedding model for retrieval, clustering, and reranking tasks in the Massive Text Embedding Benchmark (MTEB) [29]. As is common practice, we freeze the backbone weights and update only the classifier parameters for four epochs to avoid catastrophic forgetting. We find that switching to end-to-end training after four epochs provides the best results among the configurations we tried.
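A minimal sketch of this explicit baseline, assuming the usual HuggingFace identifier for the checkpoint named above (the pooling choice and training-loop details are illustrative; only the linear head is trainable in the first phase):

import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

backbone = AutoModel.from_pretrained("BAAI/bge-base-en-v1.5")
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")

# one output unit per attribute; the head size is fixed at model-definition time
classifier = nn.Linear(backbone.config.hidden_size, 33)

# phase 1: freeze the backbone and train only the classifier head
for p in backbone.parameters():
    p.requires_grad = False

def explicit_logits(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    hidden = backbone(**batch).last_hidden_state   # [batch, seqlen, hidden]
    return classifier(hidden[:, 0])                # [CLS] embedding -> 33 logits

loss_fn = nn.BCEWithLogitsLoss()                   # multi-label objective

Unfreezing the backbone after the head-only phase corresponds to the end-to-end stage described above.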
5.3. Model Results
We evaluate the models on the standard classification metrics: Accuracy, Precision, Recall, and F1-Score. Although we report recall and F1-score, we prioritize accuracy and precision as the main evaluation metrics. A higher precision aligns more closely with our acceptable risk threshold by minimizing the likelihood of recommending products containing unsuitable ingredients to customers with particularly sensitive skin. This is important, as we envision attribute-based beauty recommendations as one of the direct applications of this work.

Table 1 summarizes the results of label prediction across the different methods. We observe that both learning-based methods significantly outperform the fuzzy search baseline, as expected. The implicit model performs slightly better than the explicit alternative across all evaluation metrics. Aside from the quantitative edge, the implicit model offers other qualitative advantages that the explicit model does not. We discuss these extensively in the following sections.

Table 2: Attention analysis for the 'Acne', 'Fine Lines and Wrinkles', and 'Hydration' attributes
Attribute           High Attention Sub-word Tokens          Ingredient
Acne                'sal', '#ic', '#yl', '#ic', 'acid'      Salicylic Acid
                    'alcohol'                               Alcohol
                    'benz', '#oy', '#l', 'per', '#oxide'    Benzoyl Peroxide
                    'beta', '#ine'                          Betaine
Lines & Wrinkles    '#pher'                                 Tocopheryl Acetate
                    '#ito'                                  Palmitoyl
                    'baku', '#chio'                         Bakuchiol
                    're', '#tino'                           Retinol
Hydration           '#yal', '#uron', 'ate'                  Sodium Hyaluronate
                    '#ly', '#cer', '#in'                    Glycerin
                    'ni', '#ac', '#ina', '#mide'            Niacinamide

Table 3: Attention analysis for product attributes
Attribute                 High Attention Sub-word Tokens       Corresponding Ingredient
Product: PanOxyl AM Oil Control Moisturizer, NEW Sheer Formula, Absorbs Excess Oil and Reduces Shine, with Mineral Sunscreen for Acne Prone and Oily And All Skin Tones - 1.7 oz
Dry Skin                  '#yal', '#uron', 'ate'               Sodium Hyaluronate
Sensitive Skin            '#olo'                               Bisabolol
Dark Circles              'but', '#yl', '#ic', '#yla', '#te'   Butyloctyl Salicylate
Product: Good Molecules BHA Clarifying Gel Cream - Facial Cream with Salicylic Acid, Green Tea, and Gotu Kola Extract Soothe and Hydrate - Skincare for Face
Acne                      'sal', '#ic', '#yl', '#ic', 'acid'   Salicylic Acid
Dry Skin                  '#ly', '#cer', '#in'                 Glycerin
Redness                   'allan', '#to'                       Allantoin
Product: I DEW CARE Moisturizer Face Cream - Chill Kitten | Moringa Seed, Prickly Pear, Heartleaf Extract, 24 Hour, Aloe Vera Gel for Dry, Red Skin, Cactus Oil-free, 1.69 Fl Oz
Redness                   'tea', 'ni', '#ac', '#ina', '#mide'  Green Tea, Niacinamide
Fine Lines and Wrinkles   'as', '#cor', '#bic'                 Ascorbic Acid

5.4. Explainability
In this section, we analyze the input tokens with high attention values in the second-to-last layer of the Transformer encoder block. Top tokens are obtained using Algorithm 2. In Table 2, we choose three query attributes ('Acne', 'Fine Lines and Wrinkles', and 'Hydration') and show that the tokens with high attention values correspond to ingredients that address the target skin concerns. This means that our model has learned the effects of different ingredients and how they are associated with different skin concerns and skin types. We chose these labels as they are among the most popular filter criteria for beauty products.

We also assess the high attention tokens for each predicted label of a single product and show that these tokens differ across the attributes of a given product. This means that our model has learned to pay attention to different tokens when it is queried about different attributes. Table 3 demonstrates some of the examined products.
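A minimal sketch of how such an analysis can be reproduced, assuming a HuggingFace-style backbone (the checkpoint name and input template are illustrative stand-ins for the trained BT-BERT setup; the token filtering follows Algorithm 2 in the appendix):

import torch
from transformers import AutoModel, AutoTokenizer

# stand-in checkpoint; in practice this would be the trained BT-BERT weights
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.eval()

SKIP = {",", "[CLS]", "[SEP]", "(", ")", "[PAD]"}

def top_attention_tokens(query_attribute, product_text, topk=10):
    """Return the highest-attention sub-word tokens for one query attribute (cf. Algorithm 2)."""
    enc = tokenizer(query_attribute, product_text, truncation=True,
                    max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_attentions=True)
    attn = out.attentions[-2][0]            # second-to-last layer, [heads, seqlen, seqlen]
    idx = attn.flatten(0, 1).topk(topk).indices.unique()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0][idx].tolist())
    return [t for t in tokens if t not in SKIP]

# hypothetical product text: ingredient list followed by the title
product = "Salicylic Acid, Glycerin, Niacinamide, Water [SEP] Example Clarifying Gel Cream"
for attribute in ["Acne", "Hydration"]:
    print(attribute, "->", top_attention_tokens(attribute, product))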
5.5. Robustness in Low Data Regime
In this section, we present empirical evidence demonstrating the robust performance of BT-BERT even when the volume of training data is limited. Figure 6 shows the validation accuracy across various degrees of data scarcity, namely when the model is trained using the full dataset, as well as 1/2, 1/4, and 1/8 of the full training corpus. In each training run, we systematically down-sample the training set and keep the validation set constant, i.e., it still contains the same 2246 products. Note that for the 1/8 training run, the model is trained with only 1167 products, and yet the validation accuracy drops by less than 1.25%.

We hypothesize that the robust performance of BT-BERT in such a low-resource regime can be attributed to the fact that it is an energy-based implicit model, as opposed to an explicit classifier. The same scaling pattern is observed in other energy-based models [12]. Additionally, we attribute part of this robustness to the implicit data augmentation strategy employed in training: specifically, each product is paired with all 33 query attributes, exposing our model to diverse input contexts. We have not yet fully characterized the scaling behaviors of implicit and explicit models. It is possible that with improved training techniques, the explicit approach can close the gap in low-resource regimes.

6. Discussion
6.1. Does the choice of logits transformation matter?
Our early experiments indicate that scaling the attention value linearly by 16 to form the logit achieves better results than not employing any scaling. We explored an alternative scaling formulation using f(x) = log(x/(1 − x)), where x represents the attention value of the first query token from all attention heads. The design is inspired by probability theory, where x/(1 − x) is commonly referred to as the odds when x is a probability. Taking the logarithm of the odds is a common transformation used in logistic regression to convert probabilities into logits. Additionally, we experimented with using the summation and the average of the attention values from the first three query tokens as x before applying the log transformation. However, these variations did not produce better results. Ultimately, we chose the linear scaling method of multiplying by 16 due to its simplicity and slightly faster computation times.

6.2. Finetuning on Additional Attributes
In this section, we discuss the adaptability of implicit models in incorporating new labeled attributes as they become available. We design a scenario mirroring real-world dynamics, where an initial dataset comprises 30 out of the 33 labels, with the remaining 3 labels introduced in a subsequent release. Such scenarios are commonplace in the beauty industry, where emerging trends and evolving consumer preferences necessitate the addition of new product attributes. For instance, the advent of clean beauty as a trend in 2023 [30] underscores the relevance of this work. Through comprehensive analysis and experimentation, we assess and highlight the implicit model's efficacy in seamlessly incorporating new attributes.

We removed the labels for 'Fragrance Free' (generally preferred), 'Oily Skin' (skin type), and 'Acne' (skin concern) from the full dataset (𝒟full) and trained a model on the remaining 30 labels (𝒟30). Then, we added back the removed labels and finetuned the previously trained model with the complete dataset for only one epoch. Table 4 shows the validation accuracies before and after the finetuning step. When finetuning on only the three additional labels (𝒟3), we observe a significant drop in validation accuracy for the existing 30 labels in the validation set. We believe this is due to the catastrophic forgetting problem, which could potentially be alleviated by using more advanced finetuning algorithms [31, 32, 33].
When finetuning with 𝒟full, we observe only a slight drop in performance when predicting the existing 30 labels, while the accuracy for the new labels is drastically improved. It is important to note that this finetuning procedure is impossible with explicit models, since the number of output classes changes and therefore the classifier must be replaced and retrained.

Table 4: Model performance on partially held-out data. In this experiment, we evaluate the model's ability to incorporate additional labels when they become available.
                   Train 𝒟30   Finetune 𝒟3   Finetune 𝒟full
Acc. on 30 labels  93.9%       82.4%         93.4%
Acc. on 3 labels   59.6%       94.7%         93.5%

6.3. Alternating Query Attribute Tokens
In this section, we highlight a benefit of our implicit model at inference time. First, we show that it can handle similar but not identical query attributes. We take 'Fine Lines and Wrinkles' as an example and replace the query attribute with the single word 'Lines' for a commonly available anti-wrinkle renewal skin cream. We use Algorithm 2 to extract the high attention tokens and track how they change when the attribute tokens are replaced. We observed a number of overlapping tokens, especially those addressing lines and wrinkles: '#chio', 'pu', 'soy', 'lines', 'baku', 're', and '#tino'. We also identified non-overlapping tokens such as 'water', 'after', 'cleansing', 'fine', 'cart', and 'wr'. It is important to note that the non-overlapping tokens, such as 'water' and 'cleansing', are more general and not as directly relevant to the specific skin concern. We believe that this approach can help us better understand the ingredients and their target uses.

7. Explainable Beauty Recommendation and Customer Understanding
Explainable Beauty Recommendation One critical application of ingredient-based attribute extraction lies in delivering explainable recommendations to beauty customers. In the ever-evolving beauty industry, where personalization is key, transparency and clarity in product suggestions are vital. As illustrated in Figure 3, skincare recommendations are made using a point-wise approach, where each product is individually assessed based on the customer's specific skin type and concerns. Here, the customer has selected "oily" skin and the concerns "acne" and "dull skin". The recommended products not only contain ingredients intended to address these issues but are also compatible with the customer's stated skin type, enhancing the trustworthiness and relevance of each suggestion. Each product is annotated with its predicted target skin concerns and skin types, alongside the ingredients intended to address those concerns, using Algorithm 2 discussed in Section 5.4. For example, Salicylic Acid is highlighted for its anti-acne properties across various product types such as cleansers, pads, and serums. Furthermore, the system strategically omits products with oil-based ingredients that could exacerbate oily skin, ensuring that recommendations are appropriate for the user's concerns.

By providing fact-based explanations for recommended products, this approach offers clear and transparent justifications for the recommendations. As customers purchase and use products with effective ingredients, they are more likely to achieve the desired skin results, fostering long-term trust and encouraging repeat engagement with the e-commerce store. This method not only empowers customers to make informed purchasing decisions but also strengthens their trust in the recommendation system.
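As an illustration of this point-wise use of the extracted attributes, the following sketch is hypothetical (data structures, field names, and the threshold are ours, not a description of the production system); it assumes per-attribute probabilities from BT-BERT and supporting ingredients obtained via Algorithm 2:

from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class ScoredProduct:
    title: str
    # per-attribute probability from BT-BERT, e.g., {"Oily Skin": 0.91, "Acne": 0.88}
    attribute_probs: Dict[str, float]
    # supporting high-attention ingredients per attribute (cf. Algorithm 2)
    evidence: Dict[str, List[str]] = field(default_factory=dict)

def recommend(products: List[ScoredProduct], skin_type: str,
              concerns: Set[str], threshold: float = 0.5):
    """Point-wise selection: keep products compatible with the stated skin type
    that address at least one stated concern, with ingredient-level evidence."""
    results = []
    for p in products:
        if p.attribute_probs.get(skin_type, 0.0) < threshold:
            continue  # not predicted as suitable for the stated skin type
        matched = {c for c in concerns if p.attribute_probs.get(c, 0.0) >= threshold}
        if matched:
            results.append((p.title, matched, {c: p.evidence.get(c, []) for c in matched}))
    return results

# hypothetical product and scores, mirroring the example in the text
catalog = [ScoredProduct(
    title="Example BHA Clarifying Gel Cleanser",
    attribute_probs={"Oily Skin": 0.91, "Acne": 0.88, "Dullness": 0.64},
    evidence={"Acne": ["Salicylic Acid"], "Dullness": ["Niacinamide"]},
)]
print(recommend(catalog, skin_type="Oily Skin", concerns={"Acne", "Dullness"}))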
This approach is versatile and can be applied broadly across most beauty catalogs, including haircare and makeup, where ingredients stay on the skin for extended periods. In the context of strategic and utility-aware recommendations, explainability is crucial for aligning personalized suggestions with both individual needs and broader objectives. This alignment ultimately enhances customer confidence, satisfaction, and long-term audience growth.

Figure 3: Skincare recommendation with explainable ingredients for each attribute.

Customer Understanding Conversely, customer propensity toward specific attributes, such as preferred skin type, skin concerns, and ingredient preferences, can be inferred from past purchases. Our future work focuses on understanding customer skin types and concerns by building upon existing attribute extraction methodologies. This advancement will enable further refinement of our recommendation algorithms, particularly in the ranking layer.

8. Conclusion
We present an energy-based implicit model for extracting beauty-specific attributes, trained using end-to-end supervised learning. We empirically show that the implicit approach outperforms traditional explicit classifiers in terms of accuracy, precision, and the other evaluation metrics. Aside from better performance, we show that the implicit model is explainable, robust in low-data scenarios, and able to easily incorporate new attributes as they become available. Using the explainability feature of our model, we propose novel ways to use the predictions without additional training by comparing and contrasting the high-attention tokens across different products and attributes.

We have not yet fully characterized the limits of the model's capabilities. Currently, our attention analysis only qualitatively identifies the tokens with high attention values and discusses how they relate to specific skin concerns and skin types. We wish to better quantify the correlations between all predicted ingredients and the attributes. Although our work focuses on beauty attribute extraction, we believe the simplicity of our approach and the comprehensiveness of our analysis provide a solid foundation for future research on designing more capable and explainable models in all domains of machine learning. In future work, we will validate the generated attributes within downstream recommendation systems and conduct a thorough evaluation. Furthermore, we will assess the impact of explainability for end users through A/B testing.

9. Appendices
9.1. Labels for Skincare Products
We define 33 labels for skincare products that include 5 skin types, 11 skin concerns, and 17 attributes that are generally preferred across beauty products.
• Target skin types: Dry Skin, Normal Skin, Oily Skin, Combination Skin, Sensitive Skin
• Target skin concerns: Acne, Hydration, Pores, Fine Lines and Wrinkles, Sagging, Dark Spots, Dullness, Redness, Uneven Texture, Dark Circles, Puffiness
• Generally preferred beauty attributes: 100% Vegan, Cruelty Free, Fragrance Free, Hypoallergenic, Paraben Free, Mineral Oil Free, Palm Oil Free, Oil Free, Alcohol Free, Sulphate Free, Gluten Free, Silicone Free, Phthalate Free, Talc Free, Non Comedogenic, Aluminum Free, Fluoride Free.

9.2. Product Information and Labels
Each product comes with a title, a list of ingredients, and a Boolean label for each attribute. An example is shown in Figure 4.

Figure 4: Sample Pandas dataframe with the product ingredient list (Full Ingredients) and title (item_name) for each product.
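To make this record format concrete, here is a minimal sketch of such a dataframe (rows and label values are illustrative, not taken from the dataset; column names follow Figure 4):

import pandas as pd

# Illustrative rows only; attribute columns hold Boolean labels.
df = pd.DataFrame([
    {
        "item_name": "COSRX Snail Mucin 96% Power Repairing Essence",
        "Full Ingredients": "Snail Secretion Filtrate, Betaine, Butylene Glycol, 1,2-Hexanediol",
        "Dry Skin": True,
        "Hydration": True,
        "Fragrance Free": True,
    },
    {
        "item_name": "Example Clarifying Gel Cleanser",  # hypothetical product
        "Full Ingredients": "Water, Salicylic Acid, Glycerin",
        "Dry Skin": False,
        "Hydration": False,
        "Fragrance Free": True,
    },
])
print(df[["item_name", "Dry Skin", "Hydration", "Fragrance Free"]])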
9.3. Training Data Label Distribution

Figure 5: Label distribution across product type in our dataset (x-axis: labels, y-axis: product count, grouped by product types such as SKIN_CARE_AGENT, SKIN_MOISTURIZER, SKIN_SERUM, and SUNSCREEN). The height of each bar indicates the number of products associated with the respective attribute. For instance, there are a total of 1809 out of 11580 products for Dry Skin.

9.4. Learning Curves for Robustness in Low Data Regime Experiment

Figure 6: Validation accuracy versus training steps when training on various dataset sizes (Full, Half, Quarter, One-Eighth).

9.5. Algorithm for Key Token Extraction Based on Attention Values

Algorithm 2 Key Token Extraction Based on Attention Values
1: function GetTopAttentionTokens(input_ids, attentions, topk)
2:     # input_ids is a tensor of shape (seqlen,)
3:     # attentions is a tensor of shape (heads, seqlen, seqlen)
4:
5:     # get the indices of the top-k attention values per row, across all heads
6:     topk_indices = attentions.flatten(0, 1).topk(topk).indices
7:     topk_indices = topk_indices.unique()
8:
9:     # convert column indices to token strings
10:    topk_tokens = convert_ids_to_tokens(input_ids[topk_indices])
11:
12:    # remove non-meaningful tokens
13:    TO_REMOVE = [',', '[CLS]', '[SEP]', '(', ')', '[PAD]']
14:    topk_tokens = [k for k in topk_tokens if k not in TO_REMOVE]
15:    return topk_tokens
16: end function

9.6. FuzzySearch Attribute Keywords
For the FuzzySearch method, we define keywords for each of the 33 labels.
• Dry Skin: "dry", "all", "universal".
• Normal Skin: "normal", "all", "universal".
• Oily Skin: "oil", "all", "universal".
• Combination Skin: "combination", "all", "universal".
• Sensitive Skin: "sensitive", "all", "universal".
• Acne: "anti acne", "blackheads", "salicylic acid", "Glycolic Acid", "Benzoyl Peroxide", "breakouts treatment", "acne preventing", "skin clarifying".
• Hydration: "dehydration", "dryness", "hydrating", "rehydrate", "soothing", "moisturizing", "nourishing", "softening", "replenishing".
• Pores: "pore", "oil control".
• Fine Lines and Wrinkles: "wrinkle", "anti-aging", "anti aging", "wrinkle treatment", "wrinkles treatment", "skin cell renewal", "skin-cell-renewal", "plumping", "refine skin texture", "refine-skin-texture", "repairing", "fine line", "replenishing", "octinoxate", "octisalate", "avobenzone".
• Sagging: "firming", "wrinkle", "anti aging", "skin cell renewal".
• Dark Spots: "hyperpigmentation", "melasma", "dyschromia", "brown spot", "age spot", "dark spot", "brightening", "even toning", "color correction", "lightening", "antioxidant", "oxygenating", "whitening".
• Dullness: "even toning", "dull skin", "lightening", "brightening", "colour correction", "skin cell renewal", "rejuvenating", "exfoliating", "plumping".
• Redness: "redness", "anti inflammatory", "soothening", "soothing", "redness reduction", "redness removal", "oxygenating".
• Uneven Texture: "uneven texture", "uneven skin".
• Dark Circles: "puffiness", "dark circles", "color correction", "lightening", "antioxidant", "radiant skin", "brightening".
• 100% Vegan: "vegetarian", "plantbased", "vegan", "animalbyproductfree".
• Cruelty Free: "crueltyfree".
• Fragrance Free: "unscented", "fragrancefree".
• Hypoallergenic: "preservativefree", "latexfree", "chemicalfree", "formaldehydefree", "slesfree".
• Paraben Free: "preservativefree", "slesfree", "slsfree", "parabenfree".
• Mineral Oil Free: "palmoilfree", "mineraloilfree".
• Palm Oil Free: "palmoilfree".
• Oil Free: "oilfree", "palmoilfree", "mineraloilfree".
• Alcohol Free: "alcoholfree".
• Sulphate Free: "sulfatefree".
• Gluten Free: "glutenfree".
• Silicone Free: "siliconefree".
• Phthalate Free: "phthalatefree".

References
[1] L. Wood, Beauty & personal care - worldwide, https://www.statista.com/outlook/cmo/beauty-personal-care/worldwide, 2024. Accessed: 2024-04.
[2] L. Chiticariu, R. Krishnamurthy, Y. Li, F. Reiss, S. Vaithyanathan, Domain adaptation of rule-based annotators for named-entity recognition tasks, in: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Cambridge, MA, 2010, pp. 1002–1012. URL: https://aclanthology.org/D10-1098.
[3] D. Putthividhya, J. Hu, Bootstrapped named entity recognition for product attribute extraction, in: EMNLP, 2011, pp. 1557–1567. URL: http://www.aclweb.org/anthology/D11-1144.
[4] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, CoRR abs/1508.01991 (2015). URL: http://arxiv.org/abs/1508.01991. arXiv:1508.01991.
[5] G. Zheng, S. Mukherjee, X. L. Dong, F. Li, OpenTag: Open attribute value extraction from product profiles, CoRR abs/1806.01264 (2018). URL: http://arxiv.org/abs/1806.01264. arXiv:1806.01264.
[6] J. Yan, N. Zalmout, Y. Liang, C. Grant, X. Ren, X. L. Dong, AdaTag: Multi-attribute value extraction from product profiles with adaptive decoding, arXiv preprint arXiv:2106.02318 (2021).
[7] H. Xu, W. Wang, X. Mao, X. Jiang, M. Lan, Scaling up open tagging from tens to thousands: Comprehension empowered attribute value extraction from product title, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 5214–5223.
[8] A. Cardoso, F. Daolio, S. Vargas, Product characterisation towards personalisation: Learning attributes from unstructured data to recommend fashion products, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 80–89.
[9] Q. Wang, L. Yang, J. Wang, J. Krishnan, B. Dai, S. Wang, Z. Xu, M. Khabsa, H. Ma, SMARTAVE: Structured multimodal transformer for product attribute value extraction, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 263–276. URL: https://aclanthology.org/2022.findings-emnlp.20. doi:10.18653/v1/2022.findings-emnlp.20.
[10] F. T. Dezaki, H. Arora, R. Suresh, A. Banitalebi-Dehkordi, Automated material properties extraction for enhanced beauty product discovery and makeup virtual try-on, arXiv preprint arXiv:2312.00766 (2023).
[11] Y. Du, I. Mordatch, Implicit generation and modeling with energy-based models, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019.
[12] P. Florence, C. Lynch, A. Zeng, O. A. Ramirez, A. Wahid, L. Downs, A. Wong, J. Lee, I. Mordatch, J. Tompson, Implicit behavioral cloning, in: 5th Annual Conference on Robot Learning, 2021.
[13] P. Afshar, J. Yeon, A. Levitskyy, R. Suresh, A. Banitalebi-Dehkordi, Improving the accuracy of beauty product recommendations by assessing face illumination quality, arXiv preprint arXiv:2309.04022 (2023).
[14] T. Alashkar, S. Jiang, S. Wang, Y. Fu, Examples-rules guided deep neural network for makeup recommendation, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
[15] H.-H. Li, Y.-H. Liao, Y.-N. Huang, P.-J. Cheng, Based on machine learning for personalized skin care products recommendation engine, in: 2020 International Symposium on Computer, Consumer and Control (IS3C), 2020, pp. 460–462. doi:10.1109/IS3C50286.2020.00125.
[16] Y. Nakajima, H. Honma, H. Aoshima, T. Akiba, S. Masuyama, Recommender system based on user evaluations and cosmetic ingredients, in: 2019 4th International Conference on Information Technology (InCIT), 2019, pp. 22–27. doi:10.1109/INCIT.2019.8912051.
[17] R. S, H. S, K. Jayasakthi, S. D. A, K. Latha, N. Gopinath, Cosmetic product selection using machine learning, in: 2022 International Conference on Communication, Computing and Internet of Things (IC3IoT), 2022, pp. 1–6. URL: https://api.semanticscholar.org/CorpusID:248753814.
[18] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). arXiv:1810.04805.
[19] Y. W. Teh, M. Welling, S. Osindero, G. E. Hinton, Energy-based models for sparse overcomplete representations, J. Mach. Learn. Res. 4 (2003) 1235–1260.
[20] Y. Song, D. P. Kingma, How to train your energy-based models, arXiv preprint arXiv:2101.03288 (2021). URL: https://arxiv.org/abs/2101.03288.
[21] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, F. Huang, A tutorial on energy-based learning, Predicting Structured Data 1 (2006).
[22] Skillsmuggler, Amazon ratings dataset, 2024. URL: https://www.kaggle.com/datasets/skillsmuggler/amazon-ratings, accessed: 2024-08-26.
[23] C. Feeds, Amazon USA beauty products dataset, 2024. URL: https://data.world/crawlfeeds/amazon-usa-beauty-products-dataset, accessed: 2024-08-26.
[24] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101 (2017).
[25] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer normalization, arXiv preprint arXiv:1607.06450 (2016).
[26] X. Zhai, B. Mustafa, A. Kolesnikov, L. Beyer, Sigmoid loss for language image pre-training, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11975–11986.
[27] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts, arXiv preprint arXiv:1608.03983 (2016).
[28] S. Xiao, Z. Liu, P. Zhang, N. Muennighoff, C-Pack: Packaged resources to advance general Chinese embedding, 2023. arXiv:2309.07597.
[29] N. Muennighoff, N. Tazi, L. Magne, N. Reimers, MTEB: Massive text embedding benchmark, arXiv preprint arXiv:2210.07316 (2022). URL: https://arxiv.org/abs/2210.07316. doi:10.48550/ARXIV.2210.07316.
[30] K. McGrath, Did clean beauty go too far?, https://www.allure.com/story/is-clean-beauty-over, 2023. Accessed: 2024-04.
[31] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, arXiv preprint arXiv:2106.09685 (2021).
[32] S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, M.-H. Chen, DoRA: Weight-decomposed low-rank adaptation, arXiv preprint arXiv:2402.09353 (2024).
[33] L. Zhang, A. Rao, M. Agrawala, Adding conditional control to text-to-image diffusion models, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3836–3847.