<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Medical Causation in Defining Emotions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bektemyssova Gulnara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabdemov Aidos</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Engineering and Information Security, Processing homogeneous and heterogeneous data, International Information Technology University</institution>
          ,
          <addr-line>st. Manasa 34/1, Almaty</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <fpage>138</fpage>
      <lpage>143</lpage>
      <abstract>
        <p>Emotions are an essential component of human nature: they can describe a person's health and help determine the causes of that condition. Health plays a vital role in forming a person's emotional condition and, conversely, any emotion can describe the state of a person's health. This approach can provide medical personnel with important information about patients: their emotions, their state of health, and the cause-and-effect relationships between them. However, the creation of such a model is hampered by the lack of large labeled datasets. Thus, the study's main goal is to create a dataset that contains information about the emotional state of a person and the causal medical relationships that affect it. We conduct comprehensive data collection and analysis, using state-of-the-art models for emotion assessment, medical entity extraction, and determination of cause-and-effect relationships.</p>
      </abstract>
      <kwd-group>
        <kwd>eHealth</kwd>
        <kwd>Emotion recognition</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Cause-effect</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With the progress of online communication, emotional information has become
valuable not only for social research but also for medical analysis.
Severe, life-threatening symptoms such as coughing, breathing difficulties, heart
failure, and fatigue affect a person's state and lead to various
feelings and emotions, ranging from surprise to anger or from fear to joy. Such
emotions can, for example, help detect the effect of a treatment and the condition of a person.</p>
      <p>
        There are several problems in research on emotion cause extraction. The
most notable is the lack of data for emotion cause analysis. The problem of
emotion cause extraction was first defined in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Building on that work, the authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] released a novel dataset that has become a
benchmark for emotion cause extraction research. The task was also studied
in more recent work, where it was addressed
as a clause-level binary classification problem [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3-5</xref>
        ]. The next problem is
the small size of the annotated corpus. Consequently, many deep learning models
are not applicable to emotion cause extraction. The last problem in our research
is defining the relationship between causes and health. Recent improvements
in medical text extraction [
        <xref ref-type="bibr" rid="ref10 ref11 ref6 ref7 ref8 ref9">6-11</xref>
        ] were made possible by applying
machine learning techniques to medical named entity recognition (NER) and
relation extraction (RE), using models such as Conditional Random Fields and Long
Short-Term Memory networks. However, medical text mining still has limitations.
To tackle this problem, the recent BioBERT model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which is based on the powerful BERT model [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], outperformed previous work
and became a state-of-the-art benchmark for NER and RE tasks.
      </p>
      <p>Deep learning opens up a wide range of research opportunities in many fields,
including complex ones such as medicine. However, deep learning requires large amounts
of data. Our work aims to build a model for analyzing human emotional behavior
according to the medical and other causes of an emotion. The main problem we
tackle is the lack of quality data. For this purpose, we decided to create a corpus
for our future research on this topic. The paper is organized as follows.
Section 3 discusses the creation of the novel corpus, including algorithmic and
implementation details. Section 4 reviews the results, and Section 5 concludes and
discusses future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Construction of emotion cause corpus</title>
      <p>
        In this section, we first describe the linguistic phenomenon in emotion expressions,
which served as the inspiration for developing the annotated dataset. We then introduce
the details of the annotation scheme, followed by the construction of the dataset.
Research and datasets on emotion causes are currently scarce, which makes this
work relevant. To date, there are two studies [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Emotion-cause pair extraction corpus</title>
      <p>The ECPE corpus was constructed from the ECE corpus, in which each utterance
is associated with one emotion and its cause. The ECE corpus consists of Chinese news
comprising 20,000 articles. After irrelevant instances were removed, 2,105
instances with a cause relation remain. The emotion cause is annotated as &lt;cause&gt; and the
emotion as &lt;keywords&gt;. Of these instances, 97.2% have one emotion cause; the other
2.8% have more than one.</p>
      <p>Examples from the data:
• With cause: &lt;keywords&gt;sadness&lt;keywords&gt;, &lt;cause&gt;sadness&lt;cause&gt;, because of sadness excessive
• Without cause: &lt;keywords&gt;fear&lt;keywords&gt;, &lt;cause&gt;&lt;cause&gt;, there are lingering palpitations, she still
has lingering palpitations.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Emotion-stimulus data</title>
      <p>
        The Emotion-stimulus data (ESD) corpus [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] consists of 820 rows of data, each including
an emotion tag and an emotion cause. The data are annotated in an XML-like format: &lt;cause&gt; and &lt;\cause&gt;
delimit the emotion cause, whereas the &lt;emotion type&gt; tag names the emotion.
The corpus was built with the FrameNet tool [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], mapped onto Ekman’s six
emotion classes [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and finally verified by human annotators.
      </p>
      <p>Example from the data: &lt;fear&gt;People are becoming more and more concerned,
&lt;cause&gt;about the healthiness of their diet and way of life&lt;\cause&gt;.&lt;\fear&gt;</p>
    </sec>
    <sec id="sec-5">
      <title>2.3 Custom Web Dataset</title>
      <p>
        Although data were collected from the available corpora, they are still
insufficient for extensive analysis. Additional data were therefore collected from
the “Psychiatric Treatment Adverse Reactions” (PsyTAR) dataset [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and from medical
forums, 4000 items in total; these form the Custom Web Dataset (CWD). The difficulty is that
these data are not annotated for causal relationships. Annotating them is the main task,
and it determines the following steps. We split the emotion cause extraction task into two
subtasks in order to obtain a set of emotion clauses
and a set of cause clauses for each document.
      </p>
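      <p>
        For cause relevance, we decided to use a keyword matching pattern.
According to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], there are six linguistic groups of keywords that are closely
correlated with causes, as shown in Table 1. The manually collected corpus
is filtered using these keywords.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Group</th><th>Cue words</th></tr>
          </thead>
          <tbody>
            <tr><td /><td>‘for’, ‘as’</td></tr>
            <tr><td /><td>‘because’, ‘so’, ‘but’, ‘after’</td></tr>
            <tr><td /><td>‘to think about’, ‘to talk about’</td></tr>
            <tr><td>V: Epistemic markers</td><td>‘to hear’, ‘to see’, ‘to know’, ‘to exist’</td></tr>
            <tr><td>VI: Others</td><td>‘is’, ‘say’, ‘at’, ‘can’</td></tr>
          </tbody>
        </table>
      </table-wrap>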
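      <p>The keyword-matching filter for cause relevance can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the cue words come from Table 1, whereas the function names <monospace>find_cause_cues</monospace> and <monospace>filter_cause_candidates</monospace>, the whole-word regular-expression matching, and the choice to list verbs without the infinitive marker “to” are assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Cue words from Table 1. Verbs are listed without the infinitive
# marker "to" (an assumption made for this sketch).
CAUSE_CUES = [
    "for", "as",
    "because", "so", "but", "after",
    "think about", "talk about",
    "hear", "see", "know", "exist",
    "is", "say", "at", "can",
]

# One case-insensitive pattern matching any cue as a whole word or phrase.
_CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(cue) for cue in CAUSE_CUES) + r")\b",
    re.IGNORECASE,
)


def find_cause_cues(clause):
    """Return the cue words found in a clause, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in _CUE_PATTERN.finditer(clause)]


def filter_cause_candidates(clauses):
    """Keep only the clauses that contain at least one cue word."""
    return [clause for clause in clauses if find_cause_cues(clause)]


# Hypothetical sample clauses, used only to exercise the filter.
sample = [
    "I feel much better after taking the headache medicine.",
    "Great weather today!",
]
print(filter_cause_candidates(sample))
```
]]></preformat>
      <p>Because several cue words (‘is’, ‘at’, ‘can’) are very frequent, a filter of this kind is permissive: it keeps most clauses and mainly discards those with no causal marker at all.</p>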
      <p>
        For emotion relevance, we collected data from two datasets, the Twitter Emotion
Corpus (TEC) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and CrowdFlower (CF) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], with a total of 61,051 tweets,
and trained four models on them: Naive Bayes (NB), Support Vector Machine
(SVM), BERT, and multi-label BERT.
      </p>
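      <p>To make the simplest of these baselines concrete, the following sketch shows a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. It is illustrative only: the paper does not describe the authors' implementation or features, and the class name <monospace>NaiveBayesEmotionClassifier</monospace>, the whitespace tokenizer, and the four toy training sentences are assumptions made for the example. The SVM and BERT models would require external libraries and are not shown.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesEmotionClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per emotion
        self.word_counts = defaultdict(Counter)  # token counts per emotion
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; the real training data are the TEC and CF tweets.
TRAIN_TEXTS = [
    "i am so happy today",
    "what a joyful happy day",
    "i am sad and lonely",
    "this is a sad sad story",
]
TRAIN_LABELS = ["joy", "joy", "sadness", "sadness"]

classifier = NaiveBayesEmotionClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(classifier.predict("happy day"))
```
]]></preformat>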
      <p>After preprocessing, the number of examples per emotion decreased
significantly because of substantial noise in the data. As a result, we manually selected from
the filtered data 800 examples per emotion for training and 200 examples per
emotion for the test set.</p>
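      <p>We use the F1 score for evaluation, which is calculated as the harmonic mean of precision (P) and recall (R):</p>
      <disp-formula><tex-math><![CDATA[F_1 = \frac{2 \cdot P \cdot R}{P + R}]]></tex-math></disp-formula>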
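      <p>As a worked illustration of the metric, the sketch below computes precision, recall, and F1 for a single emotion class from gold and predicted labels, and averages the per-class scores. The paper does not state how scores are aggregated over emotions, so the macro average in <monospace>macro_f1</monospace> is an assumption, as are the function names and the toy labels.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f1(gold, predicted, label):
    """Compute precision, recall and F1 for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(gold, predicted):
    """Average the per-class F1 scores over all labels in the gold data."""
    labels = sorted(set(gold))
    scores = [precision_recall_f1(gold, predicted, label)[2] for label in labels]
    return sum(scores) / len(scores)


# Hypothetical gold and predicted labels for four test sentences.
gold = ["joy", "joy", "sadness", "fear"]
predicted = ["joy", "sadness", "sadness", "fear"]
print(precision_recall_f1(gold, predicted, "joy"))
print(macro_f1(gold, predicted))
```
]]></preformat>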
      <p>After training the given models, we found that multi-label BERT outperforms
the other models. The results for the other models are described in Table 2.</p>
      <p>Example from the data: &lt;happy&gt;I feel much better after &lt;cause&gt;taking the
headache medicine.&lt;\cause&gt;&lt;\happy&gt;</p>
      <p>By combining the three corpora, we obtained a single dataset consisting of ECPE, ESD,
and CWD, described in Table 3. Since the main goal is to identify the medical
cause of a particular emotion, medical relevance is then applied to the assembled
dataset.</p>
      <p>Table 3, Sum row: 615, 371, 612, 882, 778, 242; total 3500.</p>
      <p>For medical relevance, we decided to use BioBERT, which significantly
outperforms previous state-of-the-art models on several types of medical
text mining tasks: question answering (MRR improved by 12.24%), named entity
recognition (F1 improved by 0.62%), and medical relation extraction (F1 improved by 2.80%).</p>
      <p>After applying medical relevance, we obtained the result set R by subtraction of the given sets.
The amount of data related to medicine decreased from 3500 to 986
data units, which is about 28% of all data. The final data are annotated in an XML-like
format: &lt;cause&gt; and &lt;\cause&gt; delimit a general emotion cause, whereas
&lt;mcause&gt; and &lt;\mcause&gt; delimit a medical emotion cause. The emotion itself is marked with an &lt;emotion type&gt;
tag.</p>
      <p>Examples from the data:
• For a medical cause: &lt;happy&gt;I feel much better after &lt;mcause&gt;taking the
headache medicine.&lt;\mcause&gt;&lt;\happy&gt;
• For another cause: &lt;sad&gt;I am sad &lt;cause&gt;nobody wants to do it like I have
done it for them.&lt;\cause&gt;&lt;\sad&gt;</p>
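      <p>Records in this annotation format can be read back with a few regular expressions. The sketch below is one possible reader, written under the assumption that every record has a single outer emotion tag, at most one cause span, and closing tags written with a backslash as in the examples; the function name <monospace>parse_record</monospace> and the returned dictionary keys are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Outer tag names the emotion; closing tags use a backslash, as in the examples.
_RECORD = re.compile(r"<\s*(\w+)\s*>(.*)<\s*\\\s*\1\s*>", re.DOTALL)
# Inner span is either a general cause or a medical cause.
_CAUSE = re.compile(r"<\s*(m?cause)\s*>(.*?)<\s*\\\s*\1\s*>", re.DOTALL)
# Any opening or closing tag, used to recover the plain sentence.
_ANY_TAG = re.compile(r"<\s*\\?\s*\w+\s*>")


def parse_record(annotated):
    """Split one annotated sentence into emotion, cause type, cause and text."""
    record = _RECORD.search(annotated)
    if record is None:
        raise ValueError("no emotion tag found")
    emotion, body = record.group(1), record.group(2)
    cause = _CAUSE.search(body)
    # Drop the tags and normalise whitespace to get the plain sentence.
    text = " ".join(_ANY_TAG.sub(" ", body).split())
    return {
        "emotion": emotion,
        "cause_type": cause.group(1) if cause else None,
        "cause": cause.group(2).strip() if cause else None,
        "text": text,
    }


example = (r"<happy>I feel much better after <mcause> taking the "
           r"headache medicine. <\mcause> <\happy>")
print(parse_record(example))
```
]]></preformat>
      <p>Keeping the cause type as a separate field makes it straightforward to select only the medical causes, for example by filtering on a cause type of <monospace>mcause</monospace>.</p>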
    </sec>
    <sec id="sec-6">
      <title>Conclusion and discussion</title>
      <p>In this paper, we presented our work on medical causation in defining emotions.
The lack of data for building and training a model was the driving force for creating
the corpus. We also described the medical emotion cause extraction method used to capture
the required data, which consists of three main steps: emotion relevance, cause relevance,
and medical relevance. For emotion and medical relevance, state-of-the-art BERT
models were used. Cause relevance, however, relies on the keyword matching
method, which needs improvement in future work. The resulting corpus will help us create
the first model for analyzing and extracting emotion causes related to health and
other events. We believe that the proposed work will help to better investigate
treatment effects and to understand the real state of a person’s health.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chen</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Huang</surname>
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Emotion cause detection with linguistic constructions</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on Computational Linguistics (COLING)</source>
          , pages
          <fpage>179</fpage>
          -
          <lpage>187</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gui</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>Q.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhou</surname>
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Event-driven emotion cause extraction with corpus construction</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1639</fpage>
          -
          <lpage>1649</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Li</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>D.</given-names>
          </string-name>
          , and Zhang Y.:
          <article-title>A co-attention neural network model for emotion cause analysis with emotional context awareness</article-title>
          .
          <source>In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <fpage>4752</fpage>
          -
          <lpage>4757</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Xu</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diao</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>L.</given-names>
          </string-name>
          , and Xu L.:
          <article-title>Extracting emotion causes using learning to rank methods from an information retrieval perspective</article-title>
          .
          <source>IEEE Access</source>
          . (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Yu</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rong</surname>
            <given-names>W.</given-names>
          </string-name>
          , Zhang Z.,
          <string-name>
            <surname>Ouyang</surname>
            <given-names>Y.</given-names>
          </string-name>
          , and Xiong Z.:
          <article-title>Multiple level hierarchical network-based clause selection for emotion cause extraction</article-title>
          .
          <source>IEEE Access</source>
          ,
          <fpage>9071</fpage>
          -
          <lpage>9079</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Habibi</surname>
            <given-names>M.</given-names>
          </string-name>
          et al.:
          <article-title>Deep learning with word embeddings improves biomedical named entity recognition</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>33</volume>
          ,
          <fpage>37</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bhasuran</surname>
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Natarajan</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Automatic extraction of gene-disease associations from literature using joint ensemble learning</article-title>
          .
          <source>PLoS One</source>
          ,
          <volume>13</volume>
          , e0200699 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Giorgi</surname>
            <given-names>J.M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bader</surname>
            <given-names>G.D.</given-names>
          </string-name>
          :
          <article-title>Transfer learning for biomedical named entity recognition with neural networks</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>34</volume>
          ,
          <fpage>4087</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wang</surname>
            <given-names>X.</given-names>
          </string-name>
          et al.:
          <article-title>Cross-type biomedical named entity recognition with deep multi-task learning</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>35</volume>
          ,
          <fpage>1745</fpage>
          -
          <lpage>1752</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lim</surname>
            <given-names>S.</given-names>
          </string-name>
          and Kang J.:
          <article-title>Chemical-gene relation extraction using recursive neural network</article-title>
          .
          <source>Database</source>
          . (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Yoon</surname>
            <given-names>W.</given-names>
          </string-name>
          et al.:
          <article-title>Collabonet: collaboration of deep neural networks for biomedical named entity recognition</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>20</volume>
          ,
          <issue>249</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lee</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>So</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          ,
          <source>Bioinformatics</source>
          , Volume
          <volume>36</volume>
          ,
          <fpage>1234</fpage>
          -
          <lpage>1240</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Devlin</surname>
            <given-names>J.</given-names>
          </string-name>
          et al.:
          <article-title>Bert: pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), Minneapolis, MN, USA. pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Xia</surname>
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ding</surname>
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Emotion-cause pair extraction: A new task to emotion analysis in texts</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <fpage>1003</fpage>
          -
          <lpage>1012</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ghazi</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inkpen</surname>
            <given-names>D.</given-names>
          </string-name>
          , and Szpakowicz S.:
          <article-title>Detecting emotion stimuli in emotion-bearing sentences</article-title>
          .
          <source>In International Conference on Intelligent Text Processing and Computational Linguistics (CICLing)</source>
          , pages
          <fpage>152</fpage>
          -
          <lpage>165</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ghazi</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inkpen</surname>
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Szpakowicz</surname>
            <given-names>S.</given-names>
          </string-name>
          :
          <source>Computational Linguistics and Intelligent Text Processing Lecture Notes in Computer Science</source>
          , Vol.
          <volume>9042</volume>
          ,
          <fpage>152</fpage>
          -
          <lpage>165</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Fillmore</surname>
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petruck</surname>
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruppenhofer</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>FrameNet in Action: The Case of Attaching</article-title>
          .
          <source>IJL</source>
          <volume>16</volume>
          (
          <issue>3</issue>
          ),
          <fpage>297</fpage>
          -
          <lpage>332</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ekman</surname>
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>An argument for basic emotions</article-title>
          .
          <source>Cognition &amp; Emotion</source>
          <volume>6</volume>
          (
          <issue>3</issue>
          ),
          <fpage>169</fpage>
          -
          <lpage>200</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zolnoori</surname>
            <given-names>M.</given-names>
          </string-name>
          et al.:
          <article-title>The PsyTAR dataset: From patients generated narratives to a corpus of adverse drug events and effectiveness of psychiatric medications</article-title>
          .
          <source>Data in Brief</source>
          <volume>24</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. Twitter Emotion Corpus
          <year>2012</year>
          . http://saifmohammad.com/WebPages/SentimentEmotionLabeledData.html
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>CrowdFlower</surname>
          </string-name>
          .
          <year>2016</year>
          . https://www.figureeight.com/data/sentiment-analysis-emotion-text/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>