<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Profiling Cryptocurrency Influencers using Few-shot Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hamna Muslihuddeen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pallapothula Sathvika</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shalaka Sankar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shreya Ostwal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anand Kumar M</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Technology, National Institute of Technology Karnataka</institution>
          ,
          <addr-line>Surathkal</addr-line>
          ,
          <country country="IN">India 575025</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
<p>This research provides a novel method for identifying cryptocurrency influencers on social media in a low-resource environment. The analysis focuses on English-language Twitter messages and divides influencers into impact categories ranging from minimal to massive. With a maximum of 10 English tweets per user, the dataset consists of 32 people per category. The proposed system is evaluated using the Macro F1 measure by comparing it to two baseline models: user-character Logistic Regression and t5-large (bi-encoders) using zero-shot and label-tuning few-shot methods. The findings show that the suggested approach operates effectively in low-resource environments and has the potential to support further in-depth studies of influencer profiling.</p>
      </abstract>
      <kwd-group>
        <kwd>low-resource</kwd>
        <kwd>cryptocurrency</kwd>
        <kwd>few-shot</kwd>
        <kwd>zero-shot</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Cryptocurrencies are digital or virtual tokens that use cryptography to safeguard their transactions and limit the generation of new tokens. They operate independently of a central authority or middleman, such as a bank or government, because they are decentralised. In addition to being stored in digital wallets, cryptocurrency is frequently traded on online exchanges. Cryptocurrencies have no government or physical backing, and the market forces of supply and demand determine their price.</p>
<p>The rapidly rising ubiquity and dissemination of online information such as social media text and news improve user accessibility to financial markets; however, modeling these vast streams of irregular, temporal data poses a challenge (Ramit Sawhney, Shivam Agarwal, Megh Thakkar, Arnav Wadhwa, and Rajiv Ratn Shah, 2021). The authors address the difficulty of effectively modeling large volumes of online information, such as social media text and news, which have irregular patterns and evolve over time, and introduce a novel model, HTLSTM, that uses hyperbolic geometry to better capture the unique characteristics of online information streams, especially in the context of finance. With cryptocurrencies becoming more and more popular, it is very common to find people or organisations with sizable online followings who are able to influence the thoughts and behaviours of their followers with regard to cryptocurrencies. These people or organisations are known as cryptocurrency influencers. They may include traders, analysts, investors, journalists, or cryptocurrency specialists. Since their followers' buying and selling decisions are influenced by their thoughts and suggestions, crypto influencers can have a big impact on the acceptance and value of cryptocurrencies. Many cryptocurrency influencers express their views, analyses, and opinions about various cryptocurrencies and blockchain-related projects on social media sites like Twitter, YouTube, and Instagram.</p>
<p>While some influencers are renowned for their precise market predictions and analyses, others are known for their outspoken and divisive viewpoints. To advertise their goods and services, some influencers also work with cryptocurrency initiatives and businesses. But it is vital to remember that not all cryptocurrency influencers are reliable or trustworthy, and some may even engage in dishonest or deceptive behaviour. As a result, it is crucial for people to conduct their own research and use caution when acting on cryptocurrency influencers' advice. But since not everyone can afford this, a solution that can profile crypto influencers in real time in a matter of milliseconds must be developed. This requires processing as little data as possible in order to get fast and accurate results.</p>
<p>Making use of the few-shot learning method is one strategy for solving this issue. Few-shot learning is a form of machine learning that entails teaching a model to recognise new classes of data from a very small sample size. In the context of cryptocurrency influencers, this means training a model to recognise influencers based on a sparse sample of their tweets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Survey</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the topic is author profiling, the process of identifying characteristics of an author based on their writing, such as gender, age, native language, and personality type. The work specifically focuses on the gender and age aspects of author profiling in social media, using everyday language to reflect basic social and personality processes. Author profiling applies computational tools and linguistic analysis to written materials in order to predict and identify features of their authors, such as demographics, personality traits, and behaviour. It can be used in areas like marketing, social media analysis, and forensic linguistics.
In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors discuss few-shot and zero-shot learning in the context of author profiling. They explain that few-shot learning aims to train classifiers with little training data, while zero-shot learning does not use any labelled data. They also describe how the entailment approach can be used for zero-shot text classification, relying on neural language models such as BERT trained on large NLI datasets. The authors assess their framework using two tests that determine an author's gender and age based on their written work. They contrast their strategy with a number of established baselines and show that their framework produces competitive performance, especially in situations where only a small amount of labelled data is available.
      </p>
<p>Overall, the study makes a significant contribution to the field of author profiling and emphasises the great potential of zero-shot and few-shot learning for this task. By utilising these methods, models for author profiling can be created more accurately and effectively while overcoming the problem of scarce labelled data.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the authors suggest an approach for active few-shot learning known as FASL (Fast Active Selection for Labelling), which combines few-shot learning and active learning. FASL seeks to choose the most informative examples for labelling in order to enhance the performance of the few-shot learning model. The process entails first training a few-shot learning model on a limited set of labelled examples, then selecting the most informative examples for labelling from a huge pool of unlabelled examples. The few-shot learning model is then retrained using the labelled examples chosen earlier. The authors assess their approach on a number of few-shot learning datasets and compare it to other state-of-the-art techniques. They show that FASL performs competitively on some datasets and outperforms other approaches on others. They also offer a thorough examination of the influence of various selection procedures as well as the efficacy of the active selection approach.
In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the authors adopt a machine learning model based on text analysis, using TF-IDF for feature extraction and a logistic regression model for training. The idea of linear regression is to fit a straight line through historical data and use this line to predict new data, whereas the objective of logistic regression is to calculate the likelihood that an observation belongs to a given class. A logistic (sigmoid) function is used in the logistic regression model to transform the linear regression equation and limit the output to a range between 0 and 1, which enables us to interpret the result as a probability.
      </p>
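<p>As a concrete illustration of this transformation, the sigmoid squashing of a linear score can be sketched in a few lines; the weights and input below are invented for illustration only.</p>

```python
import numpy as np

def sigmoid(z):
    # Map any real-valued score into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# A linear score w.x + b, squashed into a probability.
w, b = np.array([0.8, -0.4]), 0.1
x = np.array([2.0, 1.0])
p = sigmoid(w @ x + b)  # probability of the positive class
print(round(float(p), 4))
```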
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors apply active learning with random forest, a state-of-the-art multi-class classifier. The suggested strategy uses an effective active learning algorithm to maximise the combined entropy of a group of samples while minimising information redundancy. The technique performs better than the basic batch mode of active learning when used to adaptively classify undersea mines.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], it is noted that adequate hyper-parameter tuning is crucial for the effective use of SVM classifiers. A number of techniques have been employed for this issue, including grid search, random search, estimation of distribution algorithms (EDAs), and bio-inspired metaheuristics. The conclusion is backed by experimental findings: according to the set standards, EDAs are the best techniques for optimising SVM classifier hyperparameter settings. It is crucial to keep in mind that the effectiveness of the remaining algorithms depends on the precise values of the user-defined parameters used to control them.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], term frequency-inverse document frequency (TF-IDF) is used to examine the applicability of key terms to corpus documents. The application of the algorithm to various numbers of documents is the main topic of the study. First, the steps required for TF-IDF implementation are explained along with their working principle. The results are then presented, and the strengths and shortcomings of the TF-IDF algorithm are contrasted in order to verify the conclusions drawn from using it.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the authors examine the use of Word2Vec to identify implicit linkages in multi-participant Computer-Supported Collaborative Learning chat sessions. Word2Vec is a powerful and recent Natural Language Processing semantic model used to determine text cohesion and document similarity. In this study, the intensity of the semantic ties between two utterances is measured by cohesion scores: the higher the score, the more similar the two utterances are to one another. With Word2Vec, the context before and after each word occurrence in the training dataset is used to compute each embedding. As a result, words that frequently appear together in comparable contexts are represented closer together in the embedded space, while words that rarely co-occur in similar contexts are represented in different areas of this space.
      </p>
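<p>The cohesion scoring described here can be sketched as cosine similarity between averaged utterance vectors. The tiny 4-dimensional embeddings below are invented stand-ins; a real Word2Vec model would supply learned vectors (e.g. 200-dimensional) per word.</p>

```python
import numpy as np

# Toy stand-ins for trained Word2Vec embeddings.
emb = {
    "bitcoin":  np.array([0.9, 0.1, 0.0, 0.2]),
    "ethereum": np.array([0.8, 0.2, 0.1, 0.1]),
    "pizza":    np.array([0.0, 0.9, 0.8, 0.0]),
}

def utterance_vector(words):
    # Average the embeddings of the words in an utterance.
    return np.mean([emb[w] for w in words], axis=0)

def cohesion(u, v):
    # Cosine similarity: the higher the score, the more
    # similar the two utterances are to one another.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a, b, c = (utterance_vector([w]) for w in ("bitcoin", "ethereum", "pizza"))
print(round(cohesion(a, b), 3), round(cohesion(a, c), 3))
```

Related terms ("bitcoin", "ethereum") score much higher than unrelated ones, which is the property the study exploits.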
    </sec>
    <sec id="sec-3">
      <title>3. Problem Statement</title>
      <p>It is important to recognise that not all cryptocurrency influencers are trustworthy or honest,
and that some may engage in misleading or manipulative behaviour. People should therefore
do their homework and exercise caution while adopting the advice of these influencers.</p>
      <p>
        To solve this problem, we aim to develop a low-resource model that can categorise cryptocurrency influencers on social media into five different groups based on their level of influence: null, nano, micro, macro, and mega. Our focus is on English-language Twitter messages, and our goal is to create a strong model that, using the few-shot learning technique [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], can precisely profile and categorise cryptocurrency influencers on social media. By doing this, we intend to give people a useful tool that will help them make wise decisions when interacting with cryptocurrency influencers on social media [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
<p>The project involves two major parts: data processing and developing the model. Under data processing, we perform feature extraction to obtain as much information as possible from the limited dataset. Following feature extraction, we develop the model based on few-shot learning, which makes use of active learning.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
<p>The dataset used in our few-shot learning task consists of two JSON files: train_text.json and train_truth.json. The train_text file contains 160 JSON objects, each of which contains a Twitter user ID and the corresponding user tweets. The number of tweets per user varies between 2 and 12. The train_truth file contains 160 JSON objects, each of which contains a Twitter user ID and the profiled class label. The class labels used are null, nano, micro, macro, and mega.</p>
<p>The dataset is relatively small, with only 32 users under each of the five class labels, resulting in a total of 160 entries. This presents a challenge for the few-shot learning task, as the model must learn to recognize and classify users based on a limited amount of training data. Therefore, it is important to carefully select and pre-process the data to ensure that the model can effectively learn and generalize from it.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data Pre-processing</title>
        <p>Data cleaning and feature extraction are important steps in preparing the dataset for few-shot
learning. In our approach, we perform the following steps to preprocess the data:</p>
        <p>Combine tweets: To simplify the data and create a single input sequence for each user, we
combine all the tweets of a particular user into a single sentence.</p>
        <p>Remove punctuation: We remove all punctuation marks from the text, as they do not
provide meaningful information for our task.</p>
<p>Convert emojis and emoticons: Emojis and emoticons are often used to communicate thoughts and can be perceived as more effective than text on social media. We therefore convert all emojis and emoticons into their corresponding text representations to retain this information and to ensure consistency in the data.</p>
        <p>Replace hyperlinks: Some tweets contain hyperlinks to other websites. For valid links, we
replace the hyperlink with the data scraped from the website or the title of the website. For
invalid links, we replace the hyperlink with a blank space.</p>
        <p>Overall, these preprocessing steps help to standardize the data and remove irrelevant
information, allowing the model to focus on the key features that are important for classifying
cryptocurrency influencers. By cleaning and processing the data in this way, we can improve
the model’s performance and accuracy.</p>
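<p>The cleaning steps above can be sketched roughly as follows. The emoji-to-text map here is a hypothetical stub (a real pipeline might use a full lookup library such as the `emoji` package), and the scraping of valid links is simplified to blanking every URL.</p>

```python
import re
import string

# Hypothetical emoji/emoticon-to-text map; a real pipeline would
# use a full lookup rather than this two-entry stub.
EMOJI_TEXT = {"🚀": " rocket ", ":)": " smile "}

def preprocess(tweets):
    # 1. Combine all tweets of a user into one input sequence.
    text = " ".join(tweets)
    # 2. Convert emojis/emoticons to text before stripping punctuation.
    for emo, word in EMOJI_TEXT.items():
        text = text.replace(emo, word)
    # 3. Replace hyperlinks with a blank space (the paper additionally
    #    scrapes valid links for their titles, which is omitted here).
    text = re.sub(r"https?://\S+", " ", text)
    # 4. Remove remaining punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse extra whitespace.
    return " ".join(text.split())

print(preprocess(["Buy $BTC now 🚀", "see https://example.com :)"]))
```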
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Feature extraction</title>
<p>Feature extraction is a crucial step in data preprocessing that aims to reduce the dimensionality of raw data by selecting and transforming relevant features to improve the accuracy and efficiency of machine learning algorithms. In our task of profiling cryptocurrency influencers on social media, we adopt various feature extraction techniques to capture the most important characteristics of the dataset and enable effective analysis, modeling, and decision-making.</p>
        <p>
          We first encode the preprocessed tweet of each Twitter user into a vector using TF-IDF [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. In
natural language processing, the TF-IDF (term frequency-inverse document frequency) sentence
transformation approach is used to assess the significance of a given sentence within a corpus or
document. It determines each word’s relevance in a sentence by comparing its frequency in the
sentence to its rarity across the entire document or corpus. By doing so, the original sentences
can be changed into vector representations that reflect the semantic significance of the words
included within them. This approach makes it simpler to assess how similar two sentences are.
        </p>
        <p>
          In addition to TF-IDF encoding, we also extract various other features that are unique to
our problem statement and help in improving the model’s performance. We keep count of the
number of tweets per user, the number of hyperlinks mentioned in the tweets, and the number
of valid and invalid hyperlinks. We also count the number of cryptocurrency-related terms used
in the text, which is a critical aspect of our task, given that we are profiling cryptocurrency
influencers. Finally, we also include the word2vec [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] embedding of size 200 of the names of the
most popular cryptocurrencies to capture the relationships and similarities between the various
cryptocurrencies.
        </p>
<p>When we combine all the selected features, our feature matrix consists of 973 columns and 160 rows, with each row representing a unique Twitter user. In general, the number of columns in the feature matrix equals the total number of words in the dataset plus the additional columns we have added, and the number of rows equals the total number of Twitter users in the dataset. This feature matrix serves as the input to our machine learning algorithm, which uses it to train and make predictions on new and unseen data. By adopting a comprehensive feature extraction approach, we can capture the most critical and relevant information from the dataset, which helps us to better understand and analyze the dynamics of cryptocurrency influencers on social media.</p>
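<p>The assembly of the final matrix can be sketched as follows. The texts and hand-crafted counts below are invented toy values; the real matrix would have 973 columns (vocabulary plus additional features) and 160 rows.</p>

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer

users_text = ["bitcoin pump incoming", "sold all my ethereum", "hello world"]

# Hand-crafted per-user features (invented values):
# [n_tweets, n_links, n_valid_links, n_crypto_terms]
extra = np.array([
    [10, 3, 2, 5],
    [ 7, 1, 1, 2],
    [ 2, 0, 0, 0],
])

tfidf = TfidfVectorizer().fit_transform(users_text)
# Final matrix: vocabulary columns + additional feature columns.
X = hstack([tfidf, csr_matrix(extra)]).tocsr()
print(X.shape)  # (n_users, vocab_size + 4)
```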
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Classification model using Few-Shot Learning</title>
        <p>
          In order to execute few-shot learning in our project, we employed active learning. A key component of the active learning approach is selecting the most informative samples from a dataset, either for expert annotation or for use by the machine learning algorithm itself [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In active learning, the algorithm actively asks the user or expert for labels on the particular samples that are most likely to increase the model's accuracy. Several sampling techniques are available for choosing the most informative samples for training; only a few of them are effective on a dataset of this size:
• Uncertainty Sampling: retraining the model on the most uncertain data points is expected to improve the model's accuracy.
• Diversity Sampling: representative test data are gathered using unsupervised learning, and the model's predictions for these samples are fed back to the algorithm as training data.
        </p>
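<p>As a brief illustration of the first strategy, uncertainty sampling can be implemented with least-confidence scores on predicted probabilities. The data and model below are synthetic stand-ins, not the paper's actual setup.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a labelled set and an unlabelled pool.
X, y = make_classification(n_samples=200, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)
X_lab, y_lab, X_pool = X[:40], y[:40], X[40:]

model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Least-confidence uncertainty: 1 - max class probability per sample.
proba = model.predict_proba(X_pool)
uncertainty = 1.0 - proba.max(axis=1)

# Indices of the 10 most uncertain pool samples to query next.
query = np.argsort(-uncertainty)[:10]
print(query.shape)
```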
        <p>
          To perform active learning, we used diversity sampling, with k-means clustering as the unsupervised method for choosing the samples. We divided the data into two parts, namely the initial training data and the test data, to imitate the real-life scenario where the test data is unsupervised. The model is trained on the initial training data, with multinomial logistic regression [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] chosen as the base model. As there are 5 classes, the test data is divided into 5 clusters. From each cluster we choose a representative sample by finding the data point nearest to the training dataset. That data point is added to the training data along with the model's prediction for it, and is removed from the test data. We keep doing this until we have gathered the required number of representative samples. The steps involved are as follows:
1. Initial splitting of the dataset into training and testing sets; the desired split should contain more testing data. We used two different approaches to split the data. First, we used the conventional method of 80% training data and 20% testing data. The second method finds the median number of tweets in each class of user and assigns users whose number of tweets is greater than or equal to the median to the training set and the rest to the test set.
2. Iterative clustering is performed on the testing data to collect the samples. The number of samples should be at least 15% of the data.
3. The collected samples are appended to the training data and are used to retrain the model.
4. The remaining data is used to test the model.
        </p>
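<p>The steps above can be condensed into the following sketch. The synthetic data, the cluster count of 5, and the 15% target mirror the description; details such as the exact distance computation to the training set are our own assumptions.</p>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 5-class stand-in for the real 160-user feature matrix.
X, y = make_classification(n_samples=160, n_features=20, n_classes=5,
                           n_informative=10, random_state=0)
X_tr, y_tr = X[:32].copy(), y[:32].copy()      # initial training data
X_pool, y_pool = X[32:].copy(), y[32:].copy()  # "unsupervised" test pool

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

target = int(0.15 * len(X_pool))  # gather at least 15% of the pool
collected = 0
while collected < target:
    # Step 2: cluster the remaining pool into one group per class.
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_pool)
    picks = []
    for c in range(5):
        members = np.where(km.labels_ == c)[0]
        # Representative = cluster member nearest to the training data.
        d = np.linalg.norm(X_pool[members][:, None, :] - X_tr[None, :, :],
                           axis=2).min(axis=1)
        picks.append(members[np.argmin(d)])
    # Step 3: pseudo-label the picks with the current model and move
    # them from the pool into the training set, then retrain.
    X_tr = np.vstack([X_tr, X_pool[picks]])
    y_tr = np.append(y_tr, model.predict(X_pool[picks]))
    X_pool = np.delete(X_pool, picks, axis=0)
    y_pool = np.delete(y_pool, picks, axis=0)
    collected += len(picks)
    model.fit(X_tr, y_tr)

# Step 4: evaluate on the remaining pool.
acc = model.score(X_pool, y_pool)
```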
<p>Along with active learning, we also implemented transfer learning in order to compare the two and choose the better model. To implement transfer learning, we used logistic regression as both the pre-trained model and the learning model. Logistic regression is a suitable classification model here since it allows us to extract coefficients for the learning model. First, we pre-trained the logistic regression model with 12.5% of the dataset to extract the weights. The weights of the first layer are then set to those of the pre-trained model, and the layer is frozen to fix these pre-defined weights. We then retrain the model with another 12.5% of the dataset, and the remaining dataset is used for testing.</p>
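<p>A rough scikit-learn analogue of this procedure is sketched below, with warm-starting standing in for weight reuse (true layer freezing would require a neural-network framework). The data and stratified split are synthetic; the 12.5% portions mirror the description.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=160, n_features=20, n_classes=5,
                           n_informative=10, random_state=0)
# Stratified index sets: 12.5% (20 samples) for pre-training,
# another 12.5% for retraining, and the remainder for testing.
pre = np.concatenate([np.where(y == c)[0][:4] for c in range(5)])
fit = np.concatenate([np.where(y == c)[0][4:8] for c in range(5)])
rest = np.setdiff1d(np.arange(len(y)), np.concatenate([pre, fit]))

# warm_start=True makes the second fit start from the coefficients
# learned in the first fit instead of re-initializing them.
model = LogisticRegression(warm_start=True, max_iter=1000)
model.fit(X[pre], y[pre])        # pre-training on 12.5% of the data
pretrained = model.coef_.copy()  # extracted weights
model.fit(X[fit], y[fit])        # retraining, warm-started

acc = model.score(X[rest], y[rest])
```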
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Hyperparameters</title>
        <p>
          The performance of active learning was compared across three classifiers: Logistic Regression, the Support Vector Machine classifier [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and the Random Forest classifier [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In Logistic Regression, the model was initialized with the following parameters: "multi_class", set to multinomial, and "solver", set to 'lbfgs', which stands for Limited-memory Broyden-Fletcher-Goldfarb-Shanno. This solver is most suitable for multiclass problems with small to medium-sized datasets. In the Random Forest classifier, the only hyperparameter used was "n_estimators", which was set to 40; this value was obtained by trial and error.
        </p>
        <p>
          The hyperparameters defined in the SVM classifier [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] are the kernel, degree, random state, and gamma. The kernel used here was "poly" due to the higher dimensionality and linear separability, and the degree for the poly kernel was set to 5. The random seed governing the various random processes within the algorithm, including random initialization, shuffling of data, and random sampling, is set to 0 to ensure that these processes produce the same results when the code is run again. The gamma hyperparameter is set to 'auto', which means the value of gamma is determined automatically from the training data: the 'auto' option calculates gamma as 1 / n_features, where n_features is the number of features in the input data.
        </p>
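<p>In scikit-learn, the three initializations described in this section might look as follows. Note that recent scikit-learn versions apply the multinomial formulation automatically when the lbfgs solver is used, so the multi_class setting is implied rather than passed explicitly here.</p>

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Logistic Regression with the lbfgs solver (multinomial by default
# for this solver in recent scikit-learn versions).
log_reg = LogisticRegression(solver="lbfgs")

# Random Forest with n_estimators = 40, found by trial and error.
forest = RandomForestClassifier(n_estimators=40)

# Polynomial-kernel SVM of degree 5; gamma="auto" = 1 / n_features,
# random_state=0 fixes the seed for reproducibility.
svm = SVC(kernel="poly", degree=5, random_state=0, gamma="auto")
```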
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
<p>We evaluated the performance of our created model using a separate test dataset. To guarantee the validity of our results, we compared the performance of our model with the accuracy and Macro F1 scores attained by logistic regression, SVM, and Random Forest, each of which is a well-established machine learning approach for classification problems.</p>
<p>By comparing the accuracy of our model with that of logistic regression, random forest, and SVM, we were able to validate our method and evaluate its effectiveness in identifying cryptocurrency influencers on social media. Through this comparison, we ensured that our model is reliable and robust and that it can correctly classify influencers into the five categories of null, nano, micro, macro, and mega (see Table 2).</p>
<p>This evaluation procedure provides a quantitative assessment of the accuracy and effectiveness of our developed model and demonstrates its ability to perform well in a low-resource setting when compared to well-established machine-learning algorithms. In addition to accuracy, we also utilized the F1 score as an evaluation metric. The F1 score is a commonly used performance measure in classification tasks, which considers both the precision and recall of the model's predictions.</p>
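<p>As an illustration of the metric, the Macro F1 score over the five influence classes can be computed with scikit-learn; the true and predicted labels below are invented for illustration.</p>

```python
from sklearn.metrics import f1_score

# Invented true vs. predicted labels for the five influence classes.
y_true = ["null", "nano", "micro", "macro", "mega", "nano"]
y_pred = ["null", "nano", "micro", "mega", "mega", "nano"]

# Macro F1 averages the per-class F1 scores, weighting each of the
# five classes equally regardless of its frequency.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(round(macro_f1, 3))
```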
<p>From the F1 scores obtained by our developed model, as recorded in Table 2, we can confidently conclude that combining Logistic Regression with Active Learning outperforms plain logistic regression and the other methods in a low-resource situation. This is a significant finding, as it indicates that our model is effective in profiling cryptocurrency influencers on social media using limited resources, and can be a valuable tool for researchers and individuals seeking to make informed decisions when engaging with influencers.</p>
<p>Along with comparing the models, Table 3 depicts the accuracy obtained for the different types of test-train dataset splits. Upon observing it, we can say that the conventional 40-60 split is a better approach to splitting the dataset than taking the median as the threshold for the split.</p>
      <p>Overall, the use of the F1 score provides a more comprehensive and robust evaluation of our
developed model’s performance and further confirms its superiority over logistic regression in
a low-resource setting.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
<p>In conclusion, the task of profiling cryptocurrency influencers on social media and categorizing related aspects of their influence is a challenging one, especially when working with low-resource settings. However, by focusing on English Twitter posts and making use of few-shot learning, it is possible to extract valuable insights and information about these influencers. It is important to continue developing and refining techniques for analyzing social media data in order to better understand the influence of cryptocurrency influencers and their impact on the wider industry. By doing so, we can gain a deeper understanding of the dynamics of the cryptocurrency world and make more informed decisions about its future.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>First and foremost, we would like to express our sincere gratitude to CLEF for giving us the
chance to take part in this prestigious competition. We deeply value their support in making it
possible for us to demonstrate our abilities on this platform. We would also like to take this
opportunity to thank our professor and guide Dr Anand Kumar for their guidance, support, and
encouragement throughout the entire process. Their mentorship and expertise were invaluable
in helping us to shape the direction of our research and to bring our ideas to fruition.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Inches</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <source>Overview of the Author Profiling Task at PAN 2013. CLEF Conference on Multilingual and Multimodal Information Access Evaluation</source>
          .
          <fpage>352</fpage>
          -
          <lpage>365</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Mara</given-names>
            <surname>Chinea-Rios</surname>
          </string-name>
          , Thomas Müller,
          <string-name>
            <surname>Gretel Liz De La Peña Sarracén</surname>
          </string-name>
          , Francisco Rangel, Marc Franco-Salvador.
          <article-title>Zero and Few-Shot Learning for Author Profiling In: NLDB</article-title>
          , pp.
          <fpage>333</fpage>
          -
          <lpage>344</lpage>
          ,
          <year>2022</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Müller</surname>
          </string-name>
          , Guillermo Pérez-Torró, Angelo Basile, Marc Franco-Salvador.
          <article-title>Active Few-Shot Learning with FASL</article-title>
          In: NLDB, pp.
          <fpage>323</fpage>
          -
          <lpage>333</lpage>
          ,
          <year>2022</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>"Application of Logistic Regression in WEB Vulnerability Scanning,"</article-title>
          <source>2018 International Conference on Sensor Networks and Signal Processing (SNSP)</source>
          ,
          <source>Xi'an, China</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>486</fpage>
          -
          <lpage>490</lpage>
          , doi: 10.1109/SNSP.2018.00097.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yadegar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kong</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>"Efficient batch-mode active learning of random forest,"</article-title>
          <source>2012 IEEE Statistical Signal Processing Workshop</source>
          (SSP), Ann Arbor, MI, USA,
          <year>2012</year>
          , pp.
          <fpage>596</fpage>
          -
          <lpage>599</lpage>
          , doi: 10.1109/SSP.2012.6319769.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rojas-Domínguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Padierna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Carpio Valadez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Puga-Soberanes</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Fraire</surname>
          </string-name>
          ,
          <article-title>"Optimal Hyper-Parameter Tuning of SVM Classifiers With Application to Medical Diagnosis,"</article-title>
          <source>in IEEE Access</source>
          , vol.
          <volume>6</volume>
          , pp.
          <fpage>7164</fpage>
          -
          <lpage>7176</lpage>
          ,
          <year>2018</year>
          , doi: 10.1109/ACCESS.2017.2779794.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Qaiser</surname>
            ,
            <given-names>Shahzad</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>Ramsha</given-names>
          </string-name>
          .
          (
          <year>2018</year>
          ).
          <article-title>Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents</article-title>
          .
          <source>International Journal of Computer Applications</source>
          .
          <volume>181</volume>
          , doi: 10.5120/ijca2018917395.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gutu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dascalu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruseti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rebedea</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Trausan-Matu</surname>
          </string-name>
          ,
          <article-title>"Unlocking the Power</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>