<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Synonymous variation of 'War' in the British National Corpus using Sketch Engine: a linguistic analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zoriana Rybchak</string-name>
          <email>zoriana.l.rybchak@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olha Kulyna</string-name>
          <email>olha.v.kulyna@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandery street 12, Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MoDaST-2024: 6th International Workshop on Modern Data Science Technologies</institution>
          ,
          <addr-line>May 31 - June 1, 2024, Lviv-Shatsk</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article presents an intelligent system designed to analyze and control speech based on user-defined criteria, with the objective of enhancing communication skills through insightful data analysis. Leveraging Python libraries such as PyAudio, Vosk, Pandas, and Plotly, the system enables audio recording, speech-to-text conversion, data management, and visualization of speech patterns. The study explores effective speech recognition methods and algorithms for audio processing and text analysis, including keyword detection and segment analysis. Visualizations generated by the system offer users a clear understanding of their speech dynamics over time. The software features an intuitive interface to ensure widespread usability. Key functionalities include speech recording, processing, unwanted word management, audio playback, and chart creation. This research contributes a comprehensive speech analysis application utilizing modern techniques to provide actionable insights for improving spoken language proficiency.</p>
      </abstract>
      <kwd-group>
        <kwd>British National Corpus</kwd>
        <kwd>synonym</kwd>
        <kwd>linguistic analysis</kwd>
        <kwd>Sketch Engine</kwd>
        <kwd>CQL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The field of corpus linguistics is a highly significant area within linguistics and related
disciplines [1; 2; 3; 4; 5; 6; 7].</p>
      <p>
        Studies from numerous fields adopt the term ‘corpus’ to refer to a collection of text data,
but they treat entire texts as singular entities rather than systematically analysing
collections of texts to generalize linguistic findings across the entire corpus or specific
subsets within it [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The aim of this article is to conduct a linguistic analysis of synonymous
variations of the word 'war' as found in the British National Corpus (BNC) using Sketch
Engine. The goal is to explore how this critical term is used across different contexts and to
identify patterns or shifts in its usage.</p>
      <p>ORCID: 0000-0002-5986-4618 (Z. Rybchak); 0000-0002-2334-0660 (O. Kulyna)</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>The novelty of this research lies in its utilization of the Sketch Engine, a powerful tool
for corpus linguistics, to delve into the nuanced variations of the word 'war' within a large
and diverse corpus like the BNC. This method allows for a comprehensive examination of
linguistic contexts where 'war' appears and offers insights into the semantic and
pragmatic aspects of its usage.</p>
      <p>The hypothesis of this study is that 'war' exhibits significant synonymy across the BNC,
appearing in various linguistic forms and contexts that reflect the multifaceted nature of
conflicts and warfare in the English language. Through this analysis, the researchers expect
to uncover distinctive patterns of synonymous usage, shedding light on how language
shapes and reflects attitudes towards conflict and related phenomena.</p>
      <p>We have defined the following tasks for this study:</p>
      <p>1. Identify a synonym series for the word "war" within the context of texts using Sketch
Engine.</p>
      <p>2. Analyze different synonymous options and their usage contexts to reveal nuances in
the meaning of the word "war."</p>
      <p>3. Determine the most commonly used synonyms.</p>
      <p>4. Understand the contextual shades in which these synonyms are used and their
impact on the perception of the texts in which they are employed.</p>
      <p>One key benefit of corpus-based analyses is the ability to generalize findings across a
diverse range of texts, providing a more representative and nuanced understanding of
language usage. Unlike traditional approaches that rely on individual examples or limited
data sets, corpus linguistics enables researchers to identify patterns that may not be
immediately apparent, contributing to a more comprehensive picture of how language
evolves and adapts over time. This holistic view is crucial for advancing our understanding
of semantic variation and pragmatic nuances surrounding important concepts like 'war'.</p>
      <p>Moreover, the incorporation of computational tools like the Sketch Engine enhances the
efficiency and accuracy of linguistic analyses. By automating the process of data retrieval
and analysis, researchers can focus more on interpreting results and drawing meaningful
conclusions from the corpus. This synergy between computational techniques and linguistic
inquiry underscores the interdisciplinary nature of corpus linguistics, which bridges
theoretical insights with practical applications in diverse fields such as lexicography,
discourse analysis, and sociolinguistics.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Corpus analysis can be done by integrating computational methods of natural language
processing.</p>
      <p>
        J. Dunn stated that the use of text classification and text similarity models demonstrates
how we can enhance our capabilities in conducting corpus linguistics on extensive
databases [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These computational techniques are gaining significance as corpora expand
beyond the scope of traditional linguistic analysis methods.
      </p>
      <p>For our research we use keyword extraction, which automatically extracts the
most pertinent information from text using various tools and machine learning algorithms.
The software can be customized to identify keywords that match our specific
requirements, which allows us to experiment with the sample keyword extractor tool provided.</p>
      <p>
        The British National Corpus (BNC) is a collection of 100 million words sampled from
written and spoken English across various sources [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. It aims to represent a diverse range
of British English from the later part of the 20th century. For our study we used the BNC
XML Edition. The BNC is available in Sketch Engine, which offers a complete set of
tools: word sketch, thesaurus, keywords, word list, n-grams, concordance, trends and
text type analysis. Our research was limited to the word sketch, thesaurus and concordance.
The word sketch examines the collocates and contextual words associated with a particular
word. It provides a concise summary of the word’s grammatical and collocational patterns on
a single page, with the findings grouped by grammatical relation. The thesaurus in
Sketch Engine automatically generates a list of synonyms or words that belong to
the same semantic category (semantic field). This list is created by analysing the contexts in
which these words appear within the chosen text corpus. The concordance tool in Sketch
Engine offers a wide range of search options: it can locate words, phrases, tags, documents,
text types or corpus structures and presents the results in context as a concordance. Users
can sort, filter, count and further process the concordance to achieve their desired
outcomes.
      </p>
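      <p>The idea behind a concordance can be illustrated with a minimal keyword-in-context (KWIC) sketch. This is not Sketch Engine's implementation: the function name kwic, the simple regular-expression tokenizer, and the context width are assumptions made purely for illustration. The sample sentence is one of the BNC examples quoted in this paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    # Naive tokenization: lowercased runs of word characters (an assumption;
    # real corpus tools rely on much richer tokenization and annotation).
    tokens = re.findall(r"\w+", text.lower())
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


sentence = "The campaign is fought on a national, and party, basis."
for left, hit, right in kwic(sentence, "campaign"):
    print(f"{left:>20} | {hit} | {right}")
```
]]></preformat>
      <p>Each row places the search word in the centre with a fixed number of words on either side, which is the layout a concordance uses to make recurring contexts easy to compare.</p>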
      <p>The Corpus Query Language (CQL) is the query language used in Sketch Engine. It
enables users to search for lexical patterns and to set search criteria that go
beyond what the standard user interface allows.</p>
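      <p>As a rough illustration of what such queries look like, the sketch below assembles CQL token expressions as plain strings. The helper cql_token is hypothetical and is not part of Sketch Engine; the attribute names lemma, tag and word are standard CQL attributes, while the tag pattern used for nouns is an assumption that should be checked against the tagset of the corpus in use.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def cql_token(**attributes):
    """Build one CQL token expression from attribute/value pairs."""
    conditions = [f'{name}="{value}"' for name, value in attributes.items()]
    return "[" + " & ".join(conditions) + "]"


# Lemma "war" restricted to noun tags. The pattern N.* is an assumption
# about the tagset, not a value taken from the paper.
war_noun = cql_token(lemma="war", tag="N.*")

# "war" followed by "and" and any noun, i.e. a coordination pattern.
war_and_noun = " ".join([war_noun, cql_token(word="and"), cql_token(tag="N.*")])

print(war_noun)
print(war_and_noun)
```
]]></preformat>
      <p>The resulting strings can be pasted into the CQL field of the concordance search; each bracketed expression matches one token, and a sequence of expressions matches consecutive tokens.</p>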
    </sec>
    <sec id="sec-3">
      <title>3. Analysis and discussion</title>
      <p>We opened Sketch Engine and selected the British National Corpus (BNC). First, we
created a profile for the word "war": in Word Sketch, we entered the lemma "war" and
specified the part of speech as noun. The noun "war" occurs 21,541 times. The word sketch
groups its collocates by grammatical relation: "war" as a modifier of another word; verbs
used with "war" as an object and as a subject; other nouns joined to "war" by the
conjunction "and"; prepositional phrases; and adjectives.</p>
      <sec id="sec-3-1">
        <p>Next, we visualized the table using the "Show Visualization" button.</p>
        <p>Figure 2: Visualisation</p>
        <p>Next, we could see how the collocations were used in context and explore the metadata by clicking on the icon marked with "I."</p>
      </sec>
      <sec id="sec-3-2">
        <title>We present usage examples for consideration:</title>
        <p>A civil war in the United States in the final decade of this century leads to the formation
of a breakaway group.</p>
        <p>I think Mao was quite keyed up on the whole situation, I think he realized that to win
the war they had to erm adjust the mass support very carefully, and I think that's basically
what this I think that's why two months later they er they er gave up this document cos he
was worried then they'd lose the middle peasants' support.</p>
        <p>Across the country, more than 5m of Mozambique's 16m people have been displaced by
the war between President Joaquim Chissano's government forces and the Renamo
rebels. Now, as the peace seems to hold, families are beginning to go home.</p>
        <p>Next, our objective was to generate a synonym series for the word "war" using the
Thesaurus button. We initially displayed the first 50 results and then narrowed the
search down to 20 results.</p>
        <p>For convenience, we downloaded all the data into a folder on the desktop. We also
created a visualization of the executed search.</p>
      </sec>
      <sec id="sec-3-3">
        <p>Let's look at examples in specific contexts. For the analysis, we examined the word "war" and each of the first 20 nouns from the synonym series using Sketch Engine's Difference and Concordance tools.</p>
        <p>The word "conflict" is used 7,075 times in various contexts.</p>
        <p>Note that the entry for "conflict" yields the synonym series "revolution," "invasion,"
and "rebellion."</p>
        <p>It would be interesting to explore the Concordance:</p>
      </sec>
      <sec id="sec-3-4">
        <title>Examples:</title>
        <p>Thanks to our parliamentary system and the stability that it has given us, the British
people have been spared the horrors of revolution, civil war and invasion for more than 300
years.</p>
        <p>At the same time, it is also clear that there is not a strict and invariable relation
between war, particularly defeat in war, and political revolution.</p>
        <p>Therefore the exclusion of non-commercial ventures currently contained in the Transfer
Regulations is in conflict with the EC Acquired Rights Directive and the exception is likely
to be meaningless.</p>
        <p>Let's move on to the word "campaign," which is used 10,267 times.</p>
        <p>Again, we encounter three synonyms: "revolution," "invasion," "rebellion."</p>
      </sec>
      <sec id="sec-3-5">
        <title>Examples:</title>
        <p>Even in eastern Europe the active anti-semitic campaigns, which were to stimulate the
mass emigration of the Jews, still lay in the future.</p>
        <p>The campaign is fought on a national, and party, basis.</p>
        <p>Despite this the two parties began almost immediately to undertake joint campaigns.</p>
        <p>Let's move on to the next word "action," used 25,180 times, which also includes the
synonymous series: revolution, invasion, rebellion.</p>
        <p>The word "crisis" is used 6,440 times and underscores its synonymy with revolution,
invasion, rebellion, and conflict.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Examples:</title>
        <p>Thus during the worst crisis in British industrial history neither the labour movement
nor its radical Left were able to take advantage of the situation.</p>
        <p>The annual national rate of
destruction of tropical rainforest has increased by 147 per cent since the Third World
debt crisis began in 1982, according to an analysis by Friends of the Earth (FoE) of figures
published by the UN Food and Agriculture Organisation (FAO).</p>
        <p>Other issues remained
unresolved, including terms of trade and the debt crisis, action to combat global warming,
and the means of safeguarding tropical forests.</p>
        <p>Let's move on to the word "situation," used
19,576 times, which offers additional synonyms: revolution, invasion, rebellion.</p>
      </sec>
      <sec id="sec-3-7">
        <title>We conduct a Concordance and proceed to the examples:</title>
      </sec>
      <sec id="sec-3-8">
        <title>Examples:</title>
        <p>There is one interesting situation in which the rule is broken, which can also be
interpreted along the above lines.</p>
        <p>Erm but the the point about today's discu discussions I don't call them interviews
because er it's a self employed situation.</p>
        <p>The climate is right, and we believe it could be sound financial management in a very
difficult situation, it's been referred to, should we borrow.</p>
        <p>The next word, "development," is used 32,898 times and complements the synonym
series with the words "invasion," "revolution," and "rebellion."</p>
      </sec>
      <sec id="sec-3-9">
        <title>We analyze the concordance and extract examples.</title>
      </sec>
      <sec id="sec-3-10">
        <title>Examples:</title>
        <p>In other regions we see scattered developments, again of figures which appear more or
less subsidiary to the whole design.</p>
        <p>The filling motifs – petals joined in twos, urn-peltae, squares with guilloche knots and
floral scrolls – are all to be found, in various stages of development, elsewhere in the region.</p>
        <p>Indeed, a close examination of the mosaics from Yorkshire and Humberside seems to
reveal notable contrasts with the west in the number and importance of individual
workshops as well as in the significance of planned developments (i.e. strategies of mosaic
building desired by clients or, apparently, followed by mosaicists).</p>
        <p>Let's look at and analyze "operation," used 15,564 times. It also highlights additional
synonyms: revolution, invasion, rebellion.</p>
      </sec>
      <sec id="sec-3-11">
        <title>We extract examples from the Concordance.</title>
        <p>Next, we selected the "One-click dictionary" option and gained access to dictionaries.</p>
        <p>By analyzing and using examples, we identify words that, according to their meaning and
contexts, do not correspond to the word "war": event, action, development, change, policy,
life, movement, education, business, project, activity, market, situation.</p>
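        <p>The filtering step described above can be sketched in a few lines. The frequency table below contains only those thesaurus candidates whose counts are reported in this section, so it is an incomplete illustration and not the full thesaurus output; the function name retained_candidates is hypothetical, and the exclusion set reproduces the manual judgement stated above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Frequencies of thesaurus candidates as reported in this section
# (only the candidates whose counts are stated in the text).
candidate_frequencies = {
    "conflict": 7075,
    "campaign": 10267,
    "action": 25180,
    "crisis": 6440,
    "situation": 19576,
    "development": 32898,
    "operation": 15564,
}

# Words judged not to correspond to "war" after reading the concordances.
excluded = {
    "event", "action", "development", "change", "policy", "life", "movement",
    "education", "business", "project", "activity", "market", "situation",
}


def retained_candidates(frequencies, rejected):
    """Drop rejected words and sort the rest by descending frequency."""
    kept = {word: n for word, n in frequencies.items() if word not in rejected}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


for word, count in retained_candidates(candidate_frequencies, excluded):
    print(f"{word}: {count}")
```
]]></preformat>
        <p>High raw frequency alone does not make a word a good synonym candidate: "development" and "action" are the most frequent items in the table, yet both are removed by the manual check.</p>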
        <p>Synonym series of the word "war":</p>
      </sec>
      <sec id="sec-3-12">
        <title>We have access to a variety of dictionaries.</title>
      </sec>
      <sec id="sec-3-13">
        <title>We checked the meaning with Cambridge Dictionary.</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This pie chart provides a breakdown of ten distinct categories of synonyms for ‘war’
according to the findings:</p>
      <p>Operation represents 21.8% of all recorded synonyms, indicating planned military
actions or maneuvers with specific objectives.</p>
      <p>Attack accounts for 15.2% of synonyms, reflecting offensive actions aimed at causing
harm or damage.</p>
      <p>Campaign represents 14.4% of synonyms, signifying organized military operations with
specific objectives.</p>
      <p>War accounts for 10.1% of synonyms, reflecting large-scale armed conflicts between nations or groups.</p>
      <p>Conflict accounts for 9.9% of collected synonyms, representing various disputes or
disagreements, ranging from interpersonal to societal issues.</p>
      <p>Crisis accounts for 9% of all synonyms, reflecting critical situations marked by instability
and potential escalation.</p>
      <p>Battle represents 6% of synonyms, signifying engagements characterized by intense
combat and strategic maneuvers.</p>
      <p>Revolution comprises 5% of synonyms, denoting organized movements to overthrow
established political or social systems.</p>
      <p>Rebellion represents 1.5% of synonyms, indicating acts of resistance or defiance against
authority.</p>
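      <p>A quick arithmetic check of the shares listed above can be written as follows. The figures are copied from the text; the variable names are illustrative. The nine listed shares add up to 92.9%, so 7.1% is not itemised. Attributing that remainder to the tenth category mentioned in the text is an inference and not a figure reported by the study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from decimal import Decimal

# Shares (in percent) exactly as listed in the text.
reported_shares = {
    "operation": Decimal("21.8"),
    "attack": Decimal("15.2"),
    "campaign": Decimal("14.4"),
    "war": Decimal("10.1"),
    "conflict": Decimal("9.9"),
    "crisis": Decimal("9"),
    "battle": Decimal("6"),
    "revolution": Decimal("5"),
    "rebellion": Decimal("1.5"),
}

# Decimal keeps the one-decimal figures exact, avoiding binary float noise.
total = sum(reported_shares.values())
remainder = Decimal("100") - total

print(f"listed categories: {len(reported_shares)}")
print(f"sum of listed shares: {total}%")
print(f"share not itemised in the text: {remainder}%")
```
]]></preformat>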
      <p>Accuracy and context are essential aspects of conducting linguistic analysis, particularly
when examining the synonym series of the word "war" in the British National Corpus (BNC)
using Sketch Engine. This analysis presents several challenges that require careful
consideration:</p>
      <p>Defining the parameters for constructing the synonym series of "war" poses a significant
challenge due to the diverse array of words used in contexts related to conflict, each with
subtle variations in meaning.</p>
      <p>Adjusting search parameters within Sketch Engine, such as selecting appropriate
subcorpora and time frames and refining constraints, is crucial. Improper settings can yield
misleading or inaccurate results, impacting the integrity of the analysis.</p>
      <p>Analyzing the usage of each word within the synonym series across different contexts is
essential for grasping their semantics and nuanced meanings. However, aligning these
contexts perfectly with the intended theme of "war" can prove challenging.</p>
      <p>The vast amount of textual data contained in the British National Corpus necessitates
thorough processing and analysis. This undertaking demands considerable time and
patience to extract meaningful insights and draw valid conclusions. It is important to
acknowledge that the examples provided in the analysis may not always align seamlessly
with the thematic focus on "war," thus affecting the precision of the findings. Absolute
accuracy cannot be guaranteed due to the inherent complexities of linguistic data analysis.
Navigating these challenges requires a meticulous approach to ensure the reliability and
validity of the research findings derived from linguistic analysis within large-scale corpora
like the BNC.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In conclusion, this study not only contributes to the theoretical foundations of corpus
linguistics but also offers practical insights into the usage and representation of 'war' within
the English language. By elucidating the synonymous variations of this critical term, we aim
to provide valuable perspectives on how language functions as a dynamic and adaptive
system, reflecting and shaping human experiences of conflict and warfare. This research
underscores the enduring relevance of corpus linguistics as a powerful tool for exploring
the intricacies of language in all its richness and complexity.</p>
      <p>The findings of this study are illustrated in the accompanying pie chart, which breaks
down ten distinct categories of synonyms for 'war' based on their prevalence within the
corpus analysis: operation represents 21.8% of all recorded synonyms, indicating planned
military actions or maneuvers with specific objectives; attack accounts for 15.2% of
synonyms, referring to offensive actions aimed at causing harm or damage; campaign
represents 14.4% of synonyms, indicating organized military operations with specific
objectives; war represents 10.1% of synonyms, reflecting large-scale armed conflicts
between nations or groups; conflict accounts for 9.9% of synonyms, representing various
disputes or disagreements ranging from interpersonal to societal issues; crisis represents
9% of synonyms, reflecting critical situations marked by instability and potential escalation;
battle represents 6% of synonyms, signifying engagements characterized by intense combat and
strategic maneuvers; revolution comprises 5% of synonyms, denoting organized
movements to overthrow established political or social systems; rebellion accounts for 1.5%
of synonyms, indicating acts of resistance or defiance against authority.</p>
      <p>This detailed analysis not only enhances our understanding of linguistic diversity but
also highlights how language shapes our perception of complex societal issues (in our
research, 'war').</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Soderqvist</surname>
          </string-name>
          ,
          <article-title>Evidentiality across age and gender: A corpus-based study of variation in spoken British English</article-title>
          , in
          <source>Research in Corpus Linguistics</source>
          , V.
          <volume>5</volume>
          . doi: 10.32714/ricl.05.02.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Kwary</surname>
          </string-name>
          ,
          <article-title>A corpus and a concordancer of academic journal articles</article-title>
          , in
          <source>Data in Brief</source>
          , V.
          <volume>16</volume>
          ,
          <year>2018</year>
          . doi: 10.1016/j.dib.2017.11.023.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Starko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rysin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shvedova</surname>
          </string-name>
          ,
          <article-title>Ukrainian Text Preprocessing in GRAC</article-title>
          , in
          <source>2021 IEEE 16th International Conference on Computer Sciences and Information Technologies</source>
          , Lviv, Ukraine,
          <year>2021</year>
          . doi: 10.1109/CSIT52700.2021.9648705.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shvedova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rysin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Starko</surname>
          </string-name>
          ,
          <article-title>Handling of Nonstandard Spelling in GRAC</article-title>
          , in
          <source>2021 IEEE 16th International Conference on Computer Sciences and Information Technologies</source>
          , Lviv, Ukraine. Vol.
          <volume>2</volume>
          . P.
          <fpage>105</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>2021</year>
          . doi: 10.1109/CSIT52700.2021.9648834.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Starko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rysin</surname>
          </string-name>
          ,
          <article-title>VESUM: A Large Morphological Dictionary of Ukrainian As a Dynamic Tool</article-title>
          , in
          <source>COLINS-2022: 6th International Conference on Computational Linguistics and Intelligent Systems</source>
          , May 12-13,
          <year>2022</year>
          . URL: https://ceur-ws.org/Vol-3171/paper8.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Lahjouji-Seppälä</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabus</surname>
          </string-name>
          &amp; R. von Waldenfels,
          <article-title>Ukrainian standard variants in the 20th century: stylometry to the rescue</article-title>
          , in
          <source>Russ Linguist</source>
          ,
          <year>2022</year>
          . doi: 10.1007/s11185-022-09262-9.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Egbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Biber</surname>
          </string-name>
          ,
          <article-title>Lexical dispersion and corpus design</article-title>
          , in
          <source>International Journal of Corpus Linguistics</source>
          , V.
          <volume>25</volume>
          , I.
          <issue>1</issue>
          ,
          <year>2020</year>
          . doi: 10.1075/ijcl.18010.egb.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Crosthwaite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ningrum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schweinberger</surname>
          </string-name>
          ,
          <article-title>Research trends in corpus linguistics: a bibliometric analysis of two decades of Scopus-indexed Corpus Linguistics Research in Arts and Humanities</article-title>
          , in
          <source>International Journal of Corpus Linguistics</source>
          ,
          <year>2022</year>
          . doi: 10.1075/ijcl.21072.cro.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dunn</surname>
          </string-name>
          ,
          <source>Natural Language Processing for Corpus Linguistics</source>
          . Cambridge University Press,
          <year>2022</year>
          . doi: 10.1017/9781009070447.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <source>British National Corpus</source>
          . URL: www.natcorp.ox.ac.uk/corpus.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>