<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Twice - Twitter Content Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xianjing Liu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Behzad Golshan</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kenny Leung</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aman Saini</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivek Kulkarni</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ali Mollahosseini</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jef Mo</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>In this short paper, we describe a new model for learning content-based tweet embeddings that are generically useful as signals for a variety of downstream predictive tasks. In contrast to prior approaches that only leverage cues from the raw text, we take a holistic approach and propose Twice, a model for learning tweet embeddings that (a) leverages cues beyond the raw text (including media) and (b) attempts to yield representations optimized for overall similarity, a combination of topical, semantic, and engagement similarity. Offline evaluations suggest that our model yields richer and superior embeddings compared to the benchmark models on tasks drawn from both academic datasets and Twitter products.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Recommendation</kwd>
        <kwd>Embedding</kwd>
        <kwd>NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Similarity – tweets which are similar in meaning should</title>
        <p>be close in the embedding space. An example pair of
A rich representation of a tweet that captures nuances semantically similar sentences is “The quick brown fox
in meaning is critical for most predictive models at Twit- jumped over the lazy dog” and “The brown fox leaped
ter (including Topics, Health, Recommendations etc.). over the lazy dog”. (b) Topic Similarity – tweets that are
Consequently, there is an urgent need for models that about the same topic should be close in the embedding
can encode or summarize a tweet’s content into a dense space. An example pair here would be “Don Bradman is
representation – a representation that can then be used the greatest cricket player of all time” and “India won the
in various downstream models. Some key surface areas Cricket World Cup” since both tweets are about “Cricket”.
which will be using tweet embeddings include Home (c) Engagement Similarity – tweets that share engagement
Timeline, Notifications, Topics, and potentially Health audiences are deemed similar.
models. An important requirement is that the tweet em- In this paper, we present Twice – a model that
atbedding be generically useful on a variety of predictive tempts to capture the above notions of tweet similarity.
tasks and not necessarily be useful for only a specific In contrast to most prior work which only seeks to
emapriori task. Finally, we expect downstream models to bed raw tweet text using standard pre-trained language
only consume the embeddings, without having to worry models, Twice models tweets holistically leveraging not
about the inner workings of the underlying model. only raw tweet text, but also incorporating cues from the</p>
        <p>Taking a bird’s eye view, the need is simply for a associated media, and hyperlinks. We evaluate Twice on
tweet representation/embedding that captures similarity a suite of ofline benchmark tasks and demonstrate that
between tweets by embedding them in a vector space. our proposed model significantly outperforms several
Specifically, tweets that are “similar” must be close in the baseline approaches.
vector space and tweets that are not “similar” should
ideally be far in this vector space with respect to a suitable
metric. Zooming in, for practical modeling it is useful to 2. Related Work
attempt to operationalize this vague notion of “similar”
and attempt to be more specific here. We can attempt to
capture the following notions of similarity: (a) Semantic</p>
      </sec>
      <sec id="sec-1-2">
        <title>Our work is very closely related to work in the area</title>
      <p>Our work is very closely related to work in the area of learning sentence embeddings, which seeks to learn dense representations of sentences and capture sentence similarity. One of the earliest works on learning dense embeddings of sentences is that of Le and Mikolov [<xref ref-type="bibr" rid="ref1">1</xref>], which generalized the Skipgram word-embedding models [<xref ref-type="bibr" rid="ref2">2</xref>] to learn sentence embeddings and paragraph embeddings. With the rise of convolutional and recurrent neural network models, several approaches to learn sentence embeddings were proposed [<xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7">3, 4, 5, 6, 7</xref>]. Additionally, a couple of these works sought to learn representations of tweets by applying these networks to Twitter text [<xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>]. Finally, with the introduction of Transformers and pre-trained language models, the current state-of-the-art approaches use pre-trained language models coupled with contrastive loss functions to learn sentence embeddings [<xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref15 ref8 ref9">8, 9, 10, 11, 12, 13, 14, 15</xref>]. All of these approaches only look at embedding generic sentences and are not attuned to embedding tweets, where deeper semantic cues and multi-modal content can be used to obtain rich representations. Our model Twice, in turn, builds on these works but also incorporates cues specific to tweets (like media and hyperlinks) to yield rich embeddings of tweets.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Twice</title>
      <p>
        At its core, Twice is a Bert model [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] trained on a
multi-task loss function. Figure 1 shows the architecture
of the Twice model. In particular, we consider the
following three tasks, each attempting to capture notions of
similarity as noted in Section 1. More specifically, we
optimize a standard Bert model on the following tasks:
• Topic Prediction: The task is to predict the
concept topics associated with the tweet. This
enables the representation to capture topical
similarity. In particular, we optimize binary cross
entropy loss since this is a multi-label prediction
setting where a tweet may be associated with
multiple topics. For instance, the tweet "America
is heading back to the Moon, folks. No
astronauts, but likely to glean loads of data." belongs
to "Space" and "Science" topics. The total number
of concept topics in this task is 419.
• Engagement prediction: Given a
representation of the user (obtained by encoding a user
biography) and a tweet, the task is to predict if
the user engages with the tweet. This task is
essentially identical to the task in the well-known
Clip (Contrastive Language-Image Pre-Training)
model [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] except that instead of embedding
images using an image encoder, we embed the user
biographies using a standard Bert encoder. The
loss function used is identical to the one described
in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Training on this task enables us to
capture tweet similarity based on user engagement
patterns, which is particularly useful when downstream
products seek to maximize user engagement.
• Language prediction: Because we desire
multilingual support, we would like tweet
representations to also encode language cues, so that tweets
of the same language tend to be closer than ones
from different languages. Therefore, we
explicitly train on the task of predicting the language
of the tweet and use the standard cross-entropy
loss function for this task.
      </p>
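      <p>For concreteness, the sketch below illustrates how these three task losses could be averaged into the full objective. It is a minimal illustration that assumes pre-computed task logits and user/tweet embeddings from the Bert encoders; the tensor names and the temperature value are our own assumptions rather than the production configuration.</p>
      <preformat>
# Minimal sketch of the Twice multi-task objective: binary cross-entropy for
# multi-label topic prediction, a Clip-style symmetric contrastive loss for
# engagement prediction, and cross-entropy for language prediction.
# Shapes, names, and the temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def twice_loss(topic_logits, topic_labels,
               tweet_emb, user_emb,
               lang_logits, lang_labels,
               temperature=0.07):
    # Topic prediction: multi-label, so BCE over the 419 concept-topic logits.
    topic_loss = F.binary_cross_entropy_with_logits(topic_logits, topic_labels)

    # Engagement prediction: Clip-style loss over a batch of (user bio, tweet)
    # pairs; the i-th user engaged with the i-th tweet, all others are negatives.
    tweet_emb = F.normalize(tweet_emb, dim=-1)
    user_emb = F.normalize(user_emb, dim=-1)
    logits = tweet_emb @ user_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    engagement_loss = 0.5 * (F.cross_entropy(logits, targets) +
                             F.cross_entropy(logits.t(), targets))

    # Language prediction: standard multi-class cross-entropy.
    lang_loss = F.cross_entropy(lang_logits, lang_labels)

    # The full loss is simply the average of the three task losses.
    return (topic_loss + engagement_loss + lang_loss) / 3.0
</preformat>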
      <p>
        The full loss function is simply the average of the above
three loss functions. Finally, to obtain a dense
representation of the tweet, we simply use the representation of
the [CLS] token. Twice leverages cues from the entire
tweet and not just the raw tweet text. In particular, in
addition to the raw text, we also leverage media cues by
obtaining media annotations for any associated media as
described in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. These media annotations are simply
concatenated to the raw text via a separator token before
being input to the model. Similarly, when a tweet has
hyperlinks, we extract the first 100 tokens of the
webpage title and description as encoded in the linked HTML
page. These features are also appended to the input.
Training procedure. Twice is trained on a dataset of
200 million tweets sampled over a 90 day interval. We
also associate with these tweets the users who engaged
with them (which the Clip task component requires). The
model was optimized using standard Adam with weight
decay as the optimization procedure and trained for 5
epochs until convergence.
      </p>
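      <p>The following is a minimal sketch of this input-construction and [CLS]-pooling step, assuming a HuggingFace Bert tokenizer and encoder; the checkpoint name, the use of the separator token for concatenation, and the helper names are illustrative assumptions rather than the exact production pipeline.</p>
      <preformat>
# Sketch of Twice input construction: tweet text, media annotations, and the
# first 100 tokens of the linked page's title/description are joined with
# separator tokens, and the [CLS] vector is used as the tweet embedding.
# Checkpoint, tokenizer, and helper names are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

def build_input(tweet_text, media_annotations, link_title_and_description):
    # Keep only the first 100 tokens of the webpage title/description.
    link_tokens = tokenizer.tokenize(link_title_and_description)[:100]
    link_text = tokenizer.convert_tokens_to_string(link_tokens)
    # Concatenate the cues with separator tokens.
    parts = [tweet_text, " ".join(media_annotations), link_text]
    return f" {tokenizer.sep_token} ".join(p for p in parts if p)

def embed_tweet(text):
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = encoder(**batch)
    # Dense tweet representation = hidden state of the [CLS] token.
    return out.last_hidden_state[:, 0, :]

example = build_input(
    "America is heading back to the Moon, folks.",
    ["moon", "rocket launch"],
    "Example page title - a short description of the linked article",
)
embedding = embed_tweet(example)
</preformat>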
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <sec id="sec-3-1">
        <title>We evaluate Twice both quantitatively and qualitatively each of which we discuss below.</title>
        <sec id="sec-3-1-1">
          <title>4.1. Quantitative Evaluation</title>
          <p>
Setup. We quantify the effectiveness of tweet
embeddings in capturing content similarity by measuring their
performance on three benchmark tasks – tasks that
reflect how well embeddings capture notions of similarity
noted in Section 1:
• SemEvalPIT [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. This is an
academic benchmark consisting of about 1000
pairs of tweets with similarity scores obtained
by human judgments. We measure the
performance of embeddings on this task by computing
the Spearman correlation of similarity scores
obtained for these tweet pairs in embedding space
with human judgments. Higher correlations
indicate better alignment with human-derived similarity judgments.
• Recalling Favorites (Favs). In order to
measure the effectiveness of these embeddings in
downstream predictive models of engagement,
we consider the task of recalling (based on just
a top-k nearest-neighbor lookup) which tweets a
given user favorites from more than 5k candidate
tweets, given their past engagement history.
Higher scores reflect better embeddings.
• Topic Assignment Precision (Topics). We
compute the precision of topic assignments on a
test set of topical tweets. Higher precision suggests
better encoding of topical similarity. Once again,
here we base our decisions only on a k-NN classifier.
          </p>
          <p>Our rationale for restricting ourselves to a very simple
nearest-neighbor approach based on the cosine similarity
of tweet embeddings is the intuition that higher-quality
representations would inherently demonstrate a higher
degree of “ease of extraction” of the predictive
information. It is precisely for this reason that one needs
to use simple models as opposed to very complex deep
predictive models. We made a design choice to use a
k-NN-based approach, which supported quick
implementation; simple shallow models are another alternative.
Finally, we summarize model performance by reporting
the harmonic mean over tasks.</p>
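          <p>As a concrete illustration of this protocol, the sketch below computes the Spearman correlation of cosine similarities against human judgments (as in SemEvalPIT), a k-NN based topic-assignment precision, and the harmonic-mean summary. The function signatures and the default value of k are assumptions made for the example.</p>
          <preformat>
# Sketch of the embedding-only evaluation protocol: cosine similarity in the
# embedding space, Spearman correlation against human similarity judgments,
# k-NN topic assignment, and a harmonic-mean summary over tasks.
# Signatures and the default k are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr
from sklearn.neighbors import KNeighborsClassifier

def semeval_correlation(emb_a, emb_b, human_scores):
    # Cosine similarity of each tweet pair vs. the human similarity score.
    sims = np.sum(emb_a * emb_b, axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1))
    return spearmanr(sims, human_scores).correlation

def topic_precision(train_emb, train_topics, test_emb, test_topics, k=5):
    # Assign each test tweet the topic voted by its k nearest neighbors and
    # report the fraction of correct assignments.
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(train_emb, train_topics)
    return float(np.mean(knn.predict(test_emb) == test_topics))

def harmonic_mean(scores):
    scores = np.asarray(scores, dtype=float)
    return len(scores) / np.sum(1.0 / scores)
</preformat>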
          <p>Baselines.</p>
          <p>
            We consider the following baseline models:
• Bert: This is just the standard pre-trained Bert model [<xref ref-type="bibr" rid="ref16">16</xref>] and serves as the simplest but strong baseline that one could use to embed tweets.
• Simcse: Simcse [<xref ref-type="bibr" rid="ref13">13</xref>] is a state-of-the-art sentence embedding approach that learns sentence embeddings in an unsupervised manner. The main idea of Simcse is to pass a tweet through Bert twice (with dropout enabled), which yields two different (noisy) representations of the same tweet; these two views are treated as a positive pair, and all other examples in the batch are treated as negative examples. The objective is to maximize the cosine similarity between the representations of the positive pair and to minimize it between the tweet and the negative examples. Note that we train Simcse on Twitter data.
• Hashspace: Hashspace is a Bert-based model trained on the task of hashtag prediction, where the model is optimized to predict the correct hashtag associated with a tweet from a set of 100K hashtags. Hashspace, as currently deployed at Twitter, only uses the raw tweet text as cues to make predictions and does not model tweets holistically.
• Topicspace: Topicspace addresses two main limitations of Hashspace. First, in contrast to Hashspace, we model tweets holistically and leverage cues from media and hyperlinks as well. Second, we simplify the predictive task. Instead of learning to predict a label from a universe of 100K labels (hashtags), we only learn to predict one or more topics from a space of 419 concept topics. The intuition is that to capture similarity, it is sufficient to capture fairly broad topics rather than extremely fine-grained hashtags. By making these two changes, we note that we can learn a model with a better fit to the data, yielding richer representations.
• Clip: The original Clip in [<xref ref-type="bibr" rid="ref17">17</xref>] is a neural network trained on 400 million (image, text) pairs. Our Clip model is identical to the original Clip model in architecture, but it is trained using a multi-modal method in which Bert encoders embed the tweet text and the user biography (the engagement prediction task described in Section 3).
          </p>
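          <p>For concreteness, a minimal sketch of the Simcse-style unsupervised objective described above follows: each tweet is encoded twice with dropout active, the two views form a positive pair, and the other in-batch examples act as negatives. The encoder interface and the temperature are illustrative assumptions.</p>
          <preformat>
# Sketch of the Simcse-style unsupervised contrastive objective: encode each
# tweet twice with dropout enabled, treat the two views as a positive pair,
# and treat all other in-batch examples as negatives.
# Encoder interface and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def simcse_loss(encoder, input_ids, attention_mask, temperature=0.05):
    encoder.train()  # keep dropout active so the two passes differ
    z1 = encoder(input_ids=input_ids,
                 attention_mask=attention_mask).last_hidden_state[:, 0]
    z2 = encoder(input_ids=input_ids,
                 attention_mask=attention_mask).last_hidden_state[:, 0]
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    # Positives on the diagonal, in-batch negatives elsewhere.
    sim = z1 @ z2.t() / temperature
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)
</preformat>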
      <sec id="sec-3-2">
        <title>Results. Table 2 shows the average precision score of</title>
        <p>Results. Table 1 shows the results of our evaluation on using the Twice embeddings on the tasks of Spam
deour benchmark suite. Based on these results we can make tection and ToS violation classification. We compare the
the following conclusions: (a) First, observe that just us- result with the benchmark models Bert and Hashspace.
ing standard models like Bert for embedding tweets does The results show that Twice outperforms both the
stannot yield superior embeddings. It is imperative to learn dard Bert and Hashspace models in both the Spam
deembeddings from Twitter data. (b) Second, state-of-the- tection and ToS violation tasks.
art unsupervised methods for sentence embedding
perform worse than supervised methods which is inline with 4.3. Qualitative Evaluation and Analysis
prior work on sentence embeddings as well. (c) Topic- of Usage in Content Recommenders
space significantly outperforms Hashspace overall. This
is because Topicspace leverages cues from beyond tweet In order to evaluate our model qualitatively, we also built
text and also uses a simpler but more intuitive task. (d) a web page where one can enter a tweet ID and see the
Models like Topicspace and Clip which solely optimize nearest neighbors to the given tweet from a given
prefor a specific notion of similarity tend to perform signifi- determined universe of tweets – a scenario that reflects
cantly better on the corresponding evaluation tasks than the usage of embeddings for candidate generation in
conother models simply because the underlying represen- tent recommenders. Figure 2 shows the nearest neighbors
tations are optimized to capture that specific similarity for a couple of seed tweets as a demonstration. Note that
notion over others. (e) Finally, note that our proposed the nearest neighbors reflect the broad topic of the seed
model Twice generally outperforms all of these base- tweet and are similar in content to the seed tweet
suggestline approaches overall. While the mean performance of ing that our model is able to capture content similarity
Twice and Clip are identical, note that Twice outper- between tweets.
forms Clip on both SemEvalPIT and the Topics tasks While the process outlined in this section can be used
significantly with a slight drop on the Favs task. for candidate generation in content recommenders (by</p>
        <p>To summarize, all in all Twice demonstrates superior ifnding tweets similar to a user’s interests and past
enperformance over prior production models and yields im- gagements), through a qualitative analysis conducted
proved embeddings of tweets by leveraging cues beyond ofline, we have identified a list of challenges that need to
the tweet text. be addressed in-order to ensure good quality candidates
are returned.</p>
        <sec id="sec-3-2-1">
          <title>4.2. Quantitative Evaluation and Analysis of Usage in Health Products</title>
        <p>Setup. We evaluate the performance of content embeddings on our health platform. We use Twice embeddings as features in a shallow model to predict Spam and Terms of Service (ToS) violations. The spam prediction task is a binary classification task to predict whether a tweet is spam.</p>
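        <p>A minimal sketch of this setup follows: frozen tweet embeddings are the only features of a shallow binary classifier, evaluated with average precision. The choice of logistic regression and the function names are assumptions for illustration, not the exact health-platform model.</p>
        <preformat>
# Sketch of the health-product setup: frozen tweet embeddings are the only
# features fed to a shallow binary classifier (here, logistic regression)
# for spam / ToS-violation prediction, scored by average precision.
# The classifier choice and function names are illustrative assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

def train_shallow_classifier(train_embeddings, train_labels):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_embeddings, train_labels)
    return clf

def evaluate(clf, test_embeddings, test_labels):
    scores = clf.predict_proba(test_embeddings)[:, 1]
    # Average precision, as reported for the Spam and ToS tasks.
    return average_precision_score(test_labels, scores)
</preformat>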
        <p>Results. Table 2 shows the average precision score of using the Twice embeddings on the tasks of Spam detection and ToS violation classification. We compare the results with the benchmark models Bert and Hashspace. The results show that Twice outperforms both the standard Bert and Hashspace models on both the Spam detection and ToS violation tasks.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Qualitative Evaluation and Analysis of Usage in Content Recommenders</title>
        <p>In order to evaluate our model qualitatively, we also built a web page where one can enter a tweet ID and see the nearest neighbors to the given tweet from a given predetermined universe of tweets – a scenario that reflects the usage of embeddings for candidate generation in content recommenders. Figure 2 shows the nearest neighbors for a couple of seed tweets as a demonstration. Note that the nearest neighbors reflect the broad topic of the seed tweet and are similar in content to the seed tweet, suggesting that our model is able to capture content similarity between tweets.</p>
        <p>While the process outlined in this section can be used for candidate generation in content recommenders (by finding tweets similar to a user’s interests and past engagements), through a qualitative analysis conducted offline we have identified a list of challenges that need to be addressed in order to ensure good-quality candidates are returned.</p>
        <p>• Seed Selection. Using tweets with which users have positively engaged as seeds to fetch more interesting content is a natural choice. However, some seed tweets may have very little content, which makes them unsuitable for retrieving candidate tweets. Examples include tweets that contain frequent phrases like “good morning”, everyday greetings, and daily life updates.</p>
        <p>Note that these challenges are independent of the underlying tweet representation itself and may significantly hinder the quality of candidates even if the tweet representation model is of superior quality. To address these challenges, we build various filters and apply them to the candidate pool; we also carefully design and pick desirable seed tweets in order to generate high-quality candidates for the user.</p>
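        <p>As an illustration of how seed selection and filtering could sit in front of the nearest-neighbor lookup, the sketch below drops low-content seeds (e.g., short everyday greetings) before retrieving candidates; the thresholds and the phrase list are assumptions, not the deployed filters.</p>
        <preformat>
# Sketch of embedding-based candidate generation with a simple seed filter:
# low-content seeds (short greetings, everyday phrases) are dropped before
# the nearest-neighbor lookup. Thresholds and phrases are assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

GREETING_PHRASES = ("good morning", "good night", "happy friday")

def is_good_seed(tweet_text, min_tokens=5):
    text = tweet_text.lower().strip()
    if any(text.startswith(p) for p in GREETING_PHRASES):
        return False
    return len(text.split()) >= min_tokens

def generate_candidates(seed_text, seed_embedding, corpus_embeddings, k=10):
    # Skip seeds with too little content; otherwise return the ids of the
    # k nearest tweets in embedding space.
    if not is_good_seed(seed_text):
        return []
    index = NearestNeighbors(n_neighbors=k, metric="cosine")
    index.fit(corpus_embeddings)
    _, neighbor_ids = index.kneighbors(np.asarray(seed_embedding).reshape(1, -1))
    return neighbor_ids[0].tolist()
</preformat>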
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>In this paper, we proposed a model for embedding tweets
that goes beyond just modeling tweet text. Our goal
has been to develop generically useful rich
representations of tweets that can be used in a variety of
downstream predictive models at Twitter. To that end, we
have demonstrated through offline evaluation that our
proposed model outperforms the benchmark models on
various tweet products. As next steps, we seek to
validate our model using online A/B tests in various product
surfaces which serve as the ultimate litmus test.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Distributed representations of sentences and documents</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>26</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wieting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Livescu</surname>
          </string-name>
          ,
          <article-title>Towards universal paraphrastic sentence embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:1511.08198</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vijayaraghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <article-title>Tweet2vec: Learning tweet embeddings using character-level cnn-lstm encoder-decoder</article-title>
          ,
          <source>in: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1041</fpage>
          -
          <lpage>1044</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhingra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fitzpatrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muehl</surname>
          </string-name>
          , W. Cohen,
          <article-title>Tweet2vec: Character-based distributed representations for social media</article-title>
          ,
          <source>in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>269</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korhonen</surname>
          </string-name>
          ,
          <article-title>Learning distributed representations of sentences from unlabelled data</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1367</fpage>
          -
          <lpage>1377</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , S.-y. Kong,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Limtiaco</surname>
          </string-name>
          ,
          <string-name>
            R. S. John,
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guajardo-Cespedes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tar</surname>
          </string-name>
          , et al.,
          <article-title>Universal sentence encoder</article-title>
          ,
          <source>arXiv preprint arXiv:1803.11175</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          , W. Ma, R. Liu,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          ,
          <article-title>An empirical survey of unsupervised text representation methods on twitter data</article-title>
          ,
          <source>in: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Representation learning for short text clustering</article-title>
          ,
          <source>in: International Conference on Web Information Systems Engineering</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kayal</surname>
          </string-name>
          ,
          <article-title>Unsupervised sentence-embeddings by manifold approximation and projection</article-title>
          ,
          <source>in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics:</source>
          Main Volume,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Simcse: Simple contrastive learning of sentence embeddings</article-title>
          ,
          <source>in: EMNLP (1)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <article-title>Whiteningbert: An easy unsupervised sentence embedding approach</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>Sentence embeddings using supervised contrastive learning</article-title>
          ,
          <source>arXiv preprint arXiv:2106.04791</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hallacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          , G. Goh,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          , et al.,
          <article-title>Learning transferable visual models from natural language supervision</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>8748</fpage>
          -
          <lpage>8763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Leung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haghighi</surname>
          </string-name>
          ,
          <article-title>CTM - a model for large-scale multi-view tweet topic classification</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track</source>
          , Association for Computational Linguistics, Hybrid: Seattle, Washington + Online,
          <year>2022</year>
          , pp.
          <fpage>247</fpage>
          -
          <lpage>258</lpage>
          . URL: https://aclanthology.org/2022.naacl-industry.28. doi:10.18653/v1/2022.naacl-industry.28.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          ,
          <article-title>SemEval-2015 Task 1: Paraphrase and semantic similarity in Twitter (PIT)</article-title>
          ,
          <source>in: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source>
          , Association for Computational Linguistics,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . URL: https://www.aclweb.org/anthology/S15-2001. doi:10.18653/v1/S15-2001.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>