<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detection of Violent Events in Social Media: DA-VINCIS 2023</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Braulio Hernández-Minutti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesus-Alejandro Olivares-Padilla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Valerio-Carrera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Omar Juárez Gambino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Politécnico Nacional (IPN) - Escuela Superior de Cómputo (ESCOM), J.D. Batiz e/ M.</institution>
          <addr-line>O. de Mendizabal s/n, Mexico City, 07738</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the participation of the ESCOM NLP group in the DA-VINCIS 2023 shared task, which proposes detecting violent events in tweets. Two tracks were defined: identification of violent events and recognition of categories of violent events. We trained machine learning methods and proposed an ensemble scheme that boosts performance. Our best models ranked 11th and 9th in the first and second tracks, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Violent event detection</kwd>
        <kwd>Multimodal information</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Violence is perceived as a severe problem in society. Some consequences of violence are
depression, anxiety, and post-traumatic stress disorder. Detecting these events helps to prevent
the terrible effects mentioned above.</p>
      <p>
        Social networks allow users to communicate quickly and effectively. The media have exploited
these features to publish news and increase their audience. Twitter has become the media’s
favorite for posting news; nearly 85% of trending topics are reported to be headlines or persistent
news stories [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Therefore, it has become relevant to detect violent events that are reported
through this media.
      </p>
      <p>
        Violent behaviour in social networks has been studied from different aspects. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors
proposed methods to distinguish violent from non-violent radicals based on their user profiles.
Hate speech is another facet of social media violence. Transfer learning has been used for
automatic detection of these events [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], violent events reported on Twitter were used
to train methods for automatic detection. Pretrained transformers, ensembles, and multi-task
learning were the main approaches to tackling the task.
      </p>
      <p>
        Considering the impact of violence and the importance of its early detection, a task was
co-located at the IberLEF 2023 evaluation forum [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. DA-VINCIS 2023 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] invites participants to
create solutions that automatically detect violent events on Twitter.
      </p>
      <p>In the 2023 edition, two tracks were proposed: violent event identification (binary classification)
and violent event category recognition (multilabel classification). Text and images were provided
for both tracks so that unimodal or multimodal information could be used.</p>
      <p>In this paper, we describe our participation in the DA-VINCIS task. The rest of this paper is
organized as follows. Section 2 describes the task and the corpus. Section 3 describes the method
we used. Section 4 explains the performed experiments and the obtained results. Section 5
shows our conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task and corpus description</title>
      <p>The DA-VINCIS task aims to determine whether or not a tweet describes a violent incident by
analyzing its textual and visual information. Two tracks were featured:
1. Identification of violent events: This involves determining whether a given tweet is
associated with a violent incident or not. It is a binary classification task.
2. Recognition of categories of violent events: Recognizing the crime category to which a
given tweet belongs. This is a multilabel classification task.</p>
      <p>To accomplish the task, a significant corpus was collected from Twitter. A total of 4,731
tweets were collected and annotated, which were distributed among the different phases of the
competition, for both training and testing. This included 3,578 tweets for the various training
phases and 1,153 for testing. Each tweet includes textual information and an associated image
that complements the tweet’s context. This detail allows contestants to build models that
interpret not only the textual content but also the visual content of the tweet, adding a layer of
information. In the experiments to be described in Section 4, only textual information will be
used, so the use of images is proposed for future work.</p>
      <p>Tweets were labeled with the following categories:
• Accident: An unforeseen event or action resulting in unintentional harm to people or things.
• Murder: Deprivation of life.
• Robbery: Voluntary appropriation or destruction of another person’s property without
the right to it or the consent of the person who can legally dispose of it.
• None of the above: Selected when no crime is reported in the tweet. It is worth noting that
tweets under this category were also collected using keywords associated with violent
events.</p>
      <p>All categories will be utilized for the second track, while for the first track, only two categories
will be considered, namely "none-of-the-above" versus the rest, i.e., violent events.</p>
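      <p>As a minimal illustration (the concrete label encoding used by the organizers may differ), the collapse from the four categories to the binary track-1 labels can be sketched as:</p>

```python
# Minimal sketch of collapsing the four annotation categories into the
# binary labels of track 1. The label encoding shown here is illustrative;
# the shared task may use a different concrete format.
NONE = "None of the above"

def to_binary(categories):
    # A tweet counts as a violent event iff any category other than
    # "None of the above" applies to it.
    return int(any(c != NONE for c in categories))

print(to_binary(["Robbery"]))  # 1 (violent event)
print(to_binary([NONE]))       # 0 (no violent event)
```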
      <p>The challenge was conducted on the CodaLab platform. The shared task was divided into
two stages:
• Development phase. Participants were provided with labeled training data, including
2,996 tweets and labeled validation data comprising 582 tweets. During this phase, which
lasted approximately two months, participants could submit predictions for the validation
set and receive immediate feedback on the CodaLab site.
• Final phase. Participants were given the same labeled training data set as in the previous
development phase and unlabeled test data consisting of 1,153 tweets. They were allowed
to upload up to five submissions per day, ten in total during the entire competition.
Performance on the test set was used to rank participants. No immediate feedback was
provided on the CodaLab site during this phase.</p>
      <p>For track 1, the evaluation measures considered were recall, precision, and F1 score,
specifically for the violent-incident class. For track 2, macro average recall, precision, and F1 score
were considered. In both cases, the primary evaluation measure was the F1 score.</p>
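      <p>Using scikit-learn metrics, the two scoring setups can be sketched as follows; the labels are toy values, not task data:</p>

```python
# Sketch of the two evaluation setups with scikit-learn metrics.
from sklearn.metrics import f1_score

# Track 1: binary labels, scored on the violent-incident (positive) class.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
f1_track1 = f1_score(y_true, y_pred, pos_label=1)

# Track 2: multilabel indicator matrix, scored with the macro average
# over the crime categories.
y_true_ml = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred_ml = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
f1_track2 = f1_score(y_true_ml, y_pred_ml, average="macro")
print(round(f1_track1, 4), round(f1_track2, 4))  # 0.6667 0.8889
```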
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>Before applying machine learning techniques to text data, it is essential to perform adequate
preprocessing. This process aims to clean and transform the text into a structured form,
facilitating the extraction of relevant features and improving the quality of the results.</p>
      <sec id="sec-3-1">
        <title>3.1. Tokenization and Lemmatization</title>
        <p>The first stage of data preprocessing in text analysis is tokenization. This technique involves
dividing the text into smaller units known as tokens. Tokens can be individual words or even
short phrases, depending on the desired level of granularity.</p>
        <p>Once the tokens have been obtained, lemmatization is applied. This phase aims to reduce
words to their base form, also known as the lemma. For example, the words "running," "runs,"
and "ran" would be lemmatized to their base form, "run." This helps reduce vocabulary variability
and ensures that different forms of the same word are treated as a single entity.</p>
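        <p>A minimal sketch of these two steps is shown below; a production pipeline would rely on a Spanish NLP library such as spaCy, and the tiny lemma dictionary is a hypothetical stand-in:</p>

```python
# Minimal sketch of tokenization followed by lemmatization. A production
# pipeline would use a Spanish NLP library (e.g. spaCy); the tiny lemma
# dictionary here is a hypothetical stand-in for illustration only.
import re

LEMMAS = {"running": "run", "runs": "run", "ran": "run"}

def tokenize(text):
    # Lowercase and split into word tokens.
    return re.findall(r"\w+", text.lower())

def lemmatize(tokens):
    # Map each token to its base form when known, else keep it as-is.
    return [LEMMAS.get(tok, tok) for tok in tokens]

print(lemmatize(tokenize("She runs daily; yesterday she ran.")))
# ['she', 'run', 'daily', 'yesterday', 'she', 'run']
```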
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Removal of Stop Words</title>
        <p>Another important step in data preprocessing is the removal of stop words. Stop words are
common words that do not provide significant information for analysis and can be safely
removed without affecting the overall meaning of the text. The removed stop words include
articles, prepositions, conjunctions, and common pronouns in Spanish.</p>
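        <p>The step can be sketched as follows, with a small illustrative sample of Spanish stop words (in practice a full list, e.g. the one shipped with NLTK or spaCy, would be used):</p>

```python
# Sketch of Spanish stop-word removal. The set below is a small
# illustrative sample; a real pipeline would use a full stop-word list.
SPANISH_STOP_WORDS = {"el", "la", "los", "las", "de", "en", "y", "un", "una"}

def remove_stop_words(tokens):
    # Keep only tokens that carry content, dropping function words.
    return [t for t in tokens if t not in SPANISH_STOP_WORDS]

tokens = ["el", "robo", "de", "un", "vehiculo", "en", "la", "ciudad"]
print(remove_stop_words(tokens))  # ['robo', 'vehiculo', 'ciudad']
```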
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Text representation</title>
        <p>Once the data preprocessing is completed, selecting relevant features, appropriately representing
the text, and applying machine learning algorithms to obtain robust predictive models are
essential.</p>
        <p>Regarding text representation, three different vector representation techniques were used:
frequency-based representation, in which each entry counts the occurrences of a word in the
text; binary representation, in which 1 is assigned if a word is present in the text and 0
otherwise; and TF-IDF (Term Frequency-Inverse Document Frequency), which assigns each word
a weight based on its frequency in the text and its rarity across the overall corpus.</p>
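        <p>The three representations can be obtained with scikit-learn vectorizers, as the following sketch shows on a toy corpus:</p>

```python
# Sketch of the three text representations with scikit-learn vectorizers.
# The toy corpus below is illustrative only, not data from the task.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["robo violento reportado", "accidente reportado", "robo robo"]

freq = CountVectorizer()               # frequency-based: raw term counts
binary = CountVectorizer(binary=True)  # binary: 1 if present, 0 otherwise
tfidf = TfidfVectorizer()              # TF-IDF: frequency weighted by rarity

X_freq = freq.fit_transform(corpus).toarray()
X_bin = binary.fit_transform(corpus).toarray()
X_tfidf = tfidf.fit_transform(corpus).toarray()

robo = freq.vocabulary_["robo"]
# "robo" appears twice in the third document: count 2, binary 1.
print(X_freq[2][robo], X_bin[2][binary.vocabulary_["robo"]])
```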
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Data Partitioning and Model Optimization</title>
        <p>Next, the data were divided into training and development sets, with 90% used for training and
the remaining 10% for development. Subsequently, the machine learning models to be evaluated in
this study were selected. The considered models were Logistic Regression, Support Vector
Machines, Naive Bayes, Multilayer Perceptron, and XGBoost. Then, an exhaustive hyperparameter
search was performed using the grid search technique combined with cross-validation. This
strategy allowed exploring different combinations of hyperparameters and identifying those
that optimized the performance of each model.</p>
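        <p>The search procedure can be sketched for one of the models (an SVM) as follows; the data and the parameter grid are illustrative, not those reported in Section 4:</p>

```python
# Sketch of the hyperparameter search for one model (an SVM), combining
# grid search with 5-fold cross-validation. The synthetic data and the
# parameter grid are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
# 90% training / 10% development split, as described above.
X_train, X_dev, y_train, y_dev = train_test_split(
    X, y, test_size=0.10, random_state=0)

grid = GridSearchCV(
    SVC(probability=True),
    param_grid={"C": [1, 10], "kernel": ["rbf", "poly"]},
    cv=5,           # 5-fold cross-validation
    scoring="f1",   # F1 was the primary evaluation measure
)
grid.fit(X_train, y_train)
print(grid.best_params_)
```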
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Model Evaluation and Ensemble Approach</title>
        <p>Finally, once the models were fine-tuned, their performance was evaluated using evaluation
metrics such as precision, recall, and F1 score, with the latter being the primary metric of
interest in this case. Additionally, an ensemble approach was implemented, combining multiple
models using the soft voting technique.</p>
        <p>In the ensemble approach, the soft voting technique was adopted. This method involves
calculating the probability of belonging to each class for each model and conducting a weighted
vote to make the final decision. By combining the predictions of multiple models, the ensemble
approach aims to leverage the strengths and diversity of each model, leading to improved
overall performance and robustness. For track 1, this voting scheme was used directly, so the
final prediction was based on the collective decision of multiple models. For track 2, a
multioutput classifier was employed because of the multilabel nature of the task.</p>
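        <p>A sketch of both prediction schemes follows, on synthetic data and with GaussianNB standing in for the Naive Bayes variant used in the experiments:</p>

```python
# Sketch of the two prediction schemes: a soft-voting ensemble (track 1)
# and a multi-output wrapper (track 2). The data are synthetic, and
# GaussianNB is an illustrative stand-in for the Naive Bayes model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, random_state=0)

# Track 1: soft voting averages the predicted class probabilities of the
# member models and picks the class with the highest average.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True)),
                ("nb", GaussianNB())],
    voting="soft",
)
ensemble.fit(X, y)

# Track 2: one binary classifier is fit per output label.
Y_multi = np.column_stack([y, 1 - y])  # two illustrative binary labels
multi = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y_multi)
print(multi.predict(X[:1]).shape)  # one prediction per label
```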
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and results</title>
      <p>
        As mentioned in the previous section, experiments were conducted with three different text
vector representations and various machine learning algorithms. A comprehensive search for
the best parameters and performance was performed for each vector representation using grid
search and 5-fold cross-validation. Additionally, a soft voting ensemble approach was applied.
In the case of track 2, a multi-output classifier was also utilized to accommodate the specific
characteristics of the expected results. All experiments were implemented with the scikit-learn
machine learning library [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The results of these experiments for each track are explained below.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Violent event identification</title>
        <p>This track involves the binary detection of violent incidents in tweets. The preprocessing
steps explained in Section 3 were applied. After several experiments, it was found that the
best text representation was the frequency-based representation. The following parameters
were used for each model:
• Logistic Regression: multi_class='ovr', C=1, max_iter=100, penalty='l2', solver='newton-cg'
• Support Vector Machine: C=10, degree=2, gamma='scale', kernel='poly', probability=True
• Naive Bayes: alpha=1
• Multilayer Perceptron: hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42
• XGBClassifier: booster='gbtree', objective='binary:logistic', random_state=0, eta=0.3,
sampling_method='uniform'</p>
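        <p>For concreteness, the configurations above map onto scikit-learn estimators roughly as follows; the Naive Bayes variant shown (MultinomialNB) is an assumption, and XGBClassifier, which comes from the separate xgboost package, is omitted:</p>

```python
# The reported track-1 configurations instantiated as scikit-learn
# estimators, shown only to make the parameter list concrete. The Naive
# Bayes variant (MultinomialNB) is an assumption; the deprecated
# multi_class option of LogisticRegression and the external XGBClassifier
# are omitted. No training data is shown here.
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

models = {
    "LR": LogisticRegression(C=1, max_iter=100, penalty="l2",
                             solver="newton-cg"),
    "SVM": SVC(C=10, degree=2, gamma="scale", kernel="poly",
               probability=True),
    "NB": MultinomialNB(alpha=1),
    "MLP": MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000,
                         random_state=42),
}
```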
        <p>As can be seen in Table 1, the Support Vector Machine was the best individual classifier.
However, the ensemble of these methods outperformed every individual classifier.</p>
        <p>[Table 1: results for each algorithm (LR, SVM, NB, MLP, XGB) and the ensemble; the numeric scores were not recovered during extraction.]</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Violent event category recognition</title>
        <p>For the second track, the multilabel detection of violent events in a tweet, the same
preprocessing as in track 1 was applied. Subsequently, a multioutput classifier was
implemented, which is used when dealing with a classification problem with multiple labels or
output tasks. Unlike traditional classifiers that focus on a single label or task, the multioutput
classifier can predict multiple labels simultaneously. Each output represents a different
classification label or task, which is particularly useful when the tasks are interdependent
or when predicting multiple related features or labels. Another issue that was considered
was class imbalance. In particular, the classes in this task were highly imbalanced. Therefore,
the parameter class_weight='balanced' was set when creating the machine learning models. This
parameter handles the imbalance by setting class weights inversely proportional to the class
frequencies. Next, the grid search was conducted, and it was found that the best representation
of the text was the binary representation. The following parameters were obtained for each model:
• Logistic Regression: warm_start=False, multi_class='auto', class_weight='balanced', C=5000,
max_iter=50000, penalty='l2', solver='lbfgs'
• Support Vector Machine: C=0.5, shrinking=False, class_weight='balanced', gamma='scale',
kernel='rbf', probability=True
• Naive Bayes: alpha=1
• Multilayer Perceptron: hidden_layer_sizes=(1000, 500), max_iter=10000, random_state=0,
learning_rate='constant', solver='lbfgs'
• XGBClassifier: booster='gbtree', objective='reg:logistic', random_state=0, eta=0.3,
sampling_method='uniform'</p>
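        <p>The effect of class_weight='balanced' can be illustrated on a toy 9:1 label set: each class receives the weight n_samples / (n_classes * class_count), so rarer classes weigh more:</p>

```python
# Sketch of what class_weight='balanced' computes: each class receives
# the weight n_samples / (n_classes * class_count), so the minority
# class gets the larger weight. The 9:1 toy label set is illustrative.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 90 + [1] * 10)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(weights)  # the minority class receives the larger weight
```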
        <p>In particular, for this subtask, the logistic regression model achieved the best results on the
development set (10% of the training corpus). The results of all models are shown in Table 2.</p>
        <p>[Table 2: results for each algorithm (LR, SVM, NB, MLP, XGB) and the ensemble; the numeric scores were not recovered during extraction.]</p>
        <p>However, it can be observed that some models obtained results below expectations. Therefore,
it was decided not to include Naive Bayes and the Multilayer Perceptron in the soft voting ensemble.
The voting algorithms included in the ensemble are Logistic Regression, Support Vector Machine,
and XGB. The results are shown in the last row of Table 2.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results of models in the final phase file</title>
        <p>The trained models shown in the previous subsections were used on the dataset provided for
the final phase. This dataset consisted of 1,153 tweets and was used for both tracks. The same
preprocessing steps explained in the preprocessing subsection were applied to this dataset.
Finally, predictions for each instance were made using the corresponding models for each track.</p>
        <p>In Figure 1, the results of the participants in the contest for track 1 are shown. It can be
observed that we obtained the 11th place in the contest (ESCOM team), keeping in mind that
the leading metric is the F1-score.</p>
        <p>Similarly, in Figure 2, the results of the participants in the contest for subtask 2 are shown. It
can be observed that we obtained the 9th place in the contest (ESCOM team), keeping in mind
that the leading metric is the F1-score.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and future work</title>
      <p>Violence has negative consequences for people. Detecting violent acts helps the authorities to
take action and reduce the damage caused by such acts. In this paper, we report our participation
in the DA-VINCIS 2023 task. Two tracks were proposed: violent event identification and violent
event category recognition. Different text representations and machine learning methods were
tried. Classifiers were fine-tuned, and a soft voting ensemble scheme was used to improve
performance. Our proposal obtained 11th place in the first track and 9th place in the
second. As future work, we propose using the images provided in tweets in order to include
additional features. Pretrained language models could also be explored.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This research was funded by CONAHCYT-SNI and Instituto Politécnico Nacional (IPN), through
grants SIP-20232011 and EDI.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          , S. Moon,
          <article-title>What is Twitter, a social network or a news media?</article-title>
          ,
          <source>in: Proceedings of the 19th international conference on World wide web, ACM</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>591</fpage>
          -
          <lpage>600</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolfowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hasisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weisburd</surname>
          </string-name>
          ,
          <article-title>Faces of radicalism: Differentiating between violent and non-violent radicals by their social media profiles</article-title>
          ,
          <source>Computers in Human Behavior</source>
          <volume>116</volume>
          (
          <year>2021</year>
          )
          <fpage>106646</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ali</surname>
          </string-name>
          , U. Farooq,
          <string-name>
            <given-names>U.</given-names>
            <surname>Arshad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shahzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Beg</surname>
          </string-name>
          ,
          <article-title>Hate speech detection on Twitter using transfer learning</article-title>
          ,
          <source>Computer Speech &amp; Language</source>
          <volume>74</volume>
          (
          <year>2022</year>
          )
          <fpage>101365</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Arellano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Villaseñor</given-names>
            <surname>Pineda</surname>
          </string-name>
          , M. Montes y Gómez,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sanchez-Vega</surname>
          </string-name>
          ,
          <article-title>Overview of DA-VINCIS at IberLEF 2022: Detection of aggressive and violent incidents from social media in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          )
          <fpage>207</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), CEUR-WS.org</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jarquín-Vásquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. I. Hernández</given-names>
            <surname>Farías</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Arellano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          , L. VillaseñorPineda, M. Montes y Gómez,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sanchez-Vega</surname>
          </string-name>
          ,
          <article-title>Overview of DA-VINCIS at IberLEF 2023: Detection of aggressive and violent incidents from social media in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , E. Duchesnay,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>