<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Anaphora Resolution from Social Media Text</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani</institution>
          ,
          <addr-line>Pilani, Rajasthan</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>Anaphora resolution for social media text is an essential yet difficult task for text understanding. An important characteristic of anaphora is that it creates a connection between the antecedent and the anaphor embedded in the anaphoric sentence. This paper outlines the methods used to locate anaphors and their antecedents in a given text. The texts are social media tweets from the SocAnaRes-IL 2022 track, which was part of FIRE 2022. The proposed model uses a Neural Co-reference Network for the anaphora resolution.</p>
      </abstract>
      <kwd-group>
<kwd>Co-reference Resolution</kwd>
        <kwd>Anaphora Resolution</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Neural Co-reference</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The developed model is able to provide the correct resolution for 73 anaphors out of the 285
anaphors attempted.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The tasks of co-reference resolution [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], textual entailment [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], learning textual similarity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
and discourse relation sense classification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have all demonstrated the huge success of neural
techniques. The shell noun dataset and a part of ARRAU that has pronominal abstract anaphora
in any form are used to train a neural mention-ranking model for the resolution of unrestricted
abstract anaphora [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. On the shell noun dataset, the model achieves state-of-the-art results
for the unrestricted abstract anaphora resolution task, but it remains inadequate for pronominal
anaphors. The findings imply that frameworks for pronominal and nominal anaphors should be
learnt separately. With syntactic information, the model can select plausible candidates, but it
cannot distinguish between candidates that carry the same type of syntactic information. Without
syntactic information, it learns deeper features that help it choose the correct antecedent without
restricting the number of possibilities. Therefore, to increase performance, the model must first be
compelled to choose suitable candidates and then learn, on a bigger training dataset, features
that separate them. A model can be developed that selects candidates from the broader context as
well as from the sentences that contain the antecedent [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The pairwise model has been used in the majority of earlier work on bridging anaphora
resolution, which makes the assumption that gold mention information is available [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
Without knowing any gold mention information, the antecedent for a given anaphor can be
determined by bridging anaphora resolution as question-answering based on context [
        <xref ref-type="bibr" rid="ref10">10</xref>
]. In
order to produce a significant amount of "quasi-bridging" training data, the proposed
question-answering architecture makes use of transfer learning and a novel technique.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        The training dataset provided by FIRE 2022 [
        <xref ref-type="bibr" rid="ref1">1</xref>
] was provided in four different languages: English,
Hindi, Malayalam, and Tamil. Each language set contained about 100-700 text documents, and
each document contained a tweet or a series of tweets. Each document was structured as a
tab-delimited value file with three columns:
1. The first column contains the words/tokens.
2. The second column contains the markables (here, the markables are the anaphors and
the antecedents).
3. The third column contains the antecedent markable id.
      </p>
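      <p>As a minimal illustration of reading one such document (the helper name and the policy of skipping malformed rows are our own assumptions, not part of the track's tooling), the three columns can be parsed as follows:</p>
      <preformat>
def read_markable_doc(path):
    """Read one document: token, markable, antecedent markable id per row."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = line.rstrip("\n").split("\t")
            if len(record) == 3:  # keep only rows matching the 3-column layout
                token, markable, antecedent_id = record
                rows.append({"token": token,
                             "markable": markable,
                             "antecedent_id": antecedent_id})
    return rows
      </preformat>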
      <p>This tab-delimited value file format was unsuitable for NLP tasks such as entity recognition
and vectorization. Since it was generated from tweet data, it also contained unusable information
such as tweet ids and, in some cases, had not been cleaned: some rows contained entire
sentences with words separated by ’\n’ tokens. The data therefore first had to be pre-processed
into a more usable format, and the structure of the table had to be disrupted to accommodate
the denoising and new tokenization. We generated a tokenized list of each tweet with new
indices, reflecting the removal of unnecessary tokens and the addition of word tokens in samples
where tokenization had been done incorrectly. This made the data better suited to our proposed
model.</p>
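      <p>A minimal sketch of this denoising and re-tokenization step is given below; the digit-run heuristic for tweet ids and the split on literal ’\n’ separators are our own illustrative assumptions:</p>
      <preformat>
import re

def clean_and_tokenize(rows):
    """Drop noise tokens and re-index, splitting rows that hold whole sentences."""
    tokens = []
    for row in rows:
        token = row["token"]
        if re.fullmatch(r"\d{8,}", token):  # long digit run: likely a tweet id
            continue
        # Some rows contain entire sentences with words joined by literal '\n'.
        for part in token.split("\\n"):
            part = part.strip()
            if part:
                tokens.append(part)
    return list(enumerate(tokens))  # fresh (index, token) pairs
      </preformat>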
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Technique</title>
      <p>
        We utilised a statistical model pretrained on the English language as part of the NeuralCoref
network by huggingface. The first step of the process is to extract words or phrases that refer
to real entities. These include proper nouns such as names and objects, or possessive phrases
such as ’My brother’. The model then trains a set of features for each of these entities or ’mentions’. This is
done by taking an initial set of word embeddings and training them on the OntoNotes corpus, a
large manually annotated corpus used for coreference resolution tasks. This method of learning
features helps with coreference resolution by segregating word vectors along attributes
such as gender, which helps the model identify antecedents for personal pronouns.
However, it does not perform as well on coreference involving things and pronouns such as ’it’
or demonstrative pronouns such as ’this’ and ’that’. To address these issues, the feature vector has
to take into account contextual information surrounding the entity word or phrase. This
contextual information is added by averaging the feature vectors of the words around the
entity word and adding further features based on phrase length, word location, speaker
information, etc. Once these vectors are ready, the model trains two separate neural networks to
identify antecedents. The first is a classification network that scores each ’mention’ to classify
whether it is the first instance of an entity or whether it has an antecedent. The second
network generates a score for each ’mention’ paired with every possible antecedent in the text, to identify
the most likely pair. The scores generated by both networks are non-probabilistic max-margin
objective scores. Once both these networks are trained, the representations for each mention are
fed into them, and the most probable antecedent is determined from their scores. Figure
1 shows the mechanism for scoring the co-references using the neural network architecture [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
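      <p>For reference, this is how the NeuralCoref pipeline is typically invoked from spaCy (the example sentence is our own; en_core_web_sm is one common model choice):</p>
      <preformat>
import spacy
import neuralcoref

# Load a pretrained English pipeline and attach NeuralCoref to it.
nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)

doc = nlp("My brother lives in Pilani. He loves it there.")
print(doc._.has_coref)       # True if any coreference cluster was found
print(doc._.coref_clusters)  # e.g. [My brother: [My brother, He]]
print(doc._.coref_resolved)  # text with anaphors replaced by their antecedents
      </preformat>
      <p>The second scoring network can be pictured roughly as a feed-forward scorer over pairs of mention representations; the sketch below is an illustrative PyTorch analogue, not NeuralCoref's exact layers or loss implementation:</p>
      <preformat>
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Scores a (mention, candidate antecedent) pair; higher means more likely."""
    def __init__(self, dim):
        super().__init__()
        self.ffnn = nn.Sequential(
            nn.Linear(2 * dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, mention, antecedent):
        # Concatenate the two representations and emit a raw (non-probabilistic)
        # score; training would minimise a max-margin ranking loss over these.
        return self.ffnn(torch.cat([mention, antecedent], dim=-1))
      </preformat>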
    </sec>
    <sec id="sec-5">
      <title>5. Results and Evaluation</title>
      <p>The results in Table 1 can be improved by applying several pre-processing procedures,
such as removing unnecessary number strings and applying precise tokenization so that ’\n’
tokens in tweets are recognised.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>In our three-step model, the first step, entity detection, has minimal scope for improvement.
However, the feature generation step could be built upon further. NeuralCoref uses contextual
information surrounding the entity word to improve its representation vector. This includes
averaging the vectors of the surrounding words as well as taking integer representations of
factors such as speaker and location. This could be improved further by using different methods
of adding context to the feature representation. With a larger dataset, the third step, the
classification neural networks, could also be improved by adding layers of complexity. At the
current dataset size, using large and complex language models does not yield better results, but
that is liable to change as the number of datapoints increases.</p>
      <p>The current model operates on a small dataset that requires several pre-processing
steps, such as removal of unnecessary numerical strings and accurate tokenization for the
detection of ’\n’ in the tweets. Thorough pre-processing of the dataset should lead directly to a
jump in the accuracy of the model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>Anaphora resolution from social media text in Indian languages (SocAnaRes-IL)</article-title>
          , http://78.46.86.133/SocAnaRes-IL22/ (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>NeuralCoref 4.0: Coreference resolution in spaCy with neural networks</article-title>
          , https://github.com/huggingface/neuralcoref (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning for mention-ranking coreference models</article-title>
          ,
          <source>arXiv preprint arXiv:1609.08667</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gauthier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rastogi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <article-title>A fast unified model for parsing and sentence understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1603.06021</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thyagarajan</surname>
          </string-name>
          ,
          <article-title>Siamese recurrent architectures for learning sentence similarity</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>30</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rutherford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Demberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <article-title>A systematic study of neural discourse models for implicit discourse relation</article-title>
          ,
          <source>in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marasović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Born</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Opitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <article-title>A mention-ranking model for abstract anaphora resolution</article-title>
          ,
          <source>arXiv preprint arXiv:1706.02256</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poesio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maroudas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hitzeman</surname>
          </string-name>
          ,
          <article-title>Learning to resolve bridging references</article-title>
          ,
          <source>in: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <article-title>A deterministic algorithm for bridging anaphora resolution</article-title>
          ,
          <source>arXiv preprint arXiv:1811.05721</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <article-title>Bridging anaphora resolution as question answering</article-title>
          ,
          <source>arXiv preprint arXiv:2004.07898</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>