<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Automatic Information Extraction and Inferencing System from Online News Sources for Substance Abuse Cases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Judith George Joseph</string-name>
          <email>judithgeorgejoseph123@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jestin Joy</string-name>
          <email>jestinjoy@fisat.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sreeraj M</string-name>
          <email>sreeraj.sac@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanjay Govind</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shijas Muhammed T P</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tibi Sunni</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Assistant Professor, Department of Computer Science and Engineering, Federal Institute of Science and Technology (FISAT)</institution>
          ,
          <addr-line>Kerala</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Assistant Professor, Department of Computer Science, Sree Ayyappa College</institution>
          ,
          <addr-line>Alappuzha, Kerala</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Science and Engineering, Federal Institute of Science and Technology (FISAT)</institution>
          ,
          <addr-line>Kerala</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>ISIC'21: International Semantic Intelligence Conference</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>The rising number of substance abuse cases is a serious situation that demands significant attention. Insights gained from reported substance abuse cases can greatly help law enforcement authorities and policy makers. The unstructured nature of the publicly available data is a challenge. Computational techniques can be used to extract and summarise these unstructured data efficiently. The proposed system extracts news reports on substance abuse related crimes from Malayalam online newspapers. The extracted data is then processed using Natural Language Processing (NLP) techniques to generate a set of information that can help in drawing valuable inferences. Results show that the proposed system provides good accuracy for the data extraction task.</p>
      </abstract>
      <kwd-group>
        <kwd>Information extraction</kwd>
        <kwd>NER</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Data Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>
          The United Nations Office on Drugs and Crime (UNODC) reports[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] that approximately 5 per cent of the world’s population used an illicit drug in 2010 and that 27 million people can be classified as problem drug users.
        </p>
        <p>
          Alcohol and illicit drug use cause around 39 deaths per million population. In addition to causing death, substance abuse is responsible for significant morbidity, and the treatment of drug addiction places a tremendous burden on society. The significant rise in reported drug abuse cases is a serious public health threat. Handling this problem needs the intervention of government, law enforcement and the public health sector. A World Health Organization (WHO) study[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] estimates that the four major causes of death from illicit drug use are AIDS, suicide, overdose and trauma. Based on this, the median number of deaths is estimated to be 194,058 as per the estimates for the year 2000. Illicit drug use also causes premature deaths in young adults and adversely affects their overall health.
        </p>
        <p>
          Substance use is a problem in India too. The Ministry of Social Justice and Empowerment, Government of India report[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] “Magnitude of Substance Use in India 2019” paints a dismal picture. After alcohol, cannabis and opioids are the most commonly used substances in India, with about 2.8% of the population reported as users. More than 30 lakh (3 million) of the people with opioid use disorders are from the Indian states of Uttar Pradesh, Punjab, Haryana, Delhi, Maharashtra, Rajasthan, Andhra Pradesh and Gujarat. The enforcement activities report[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] of the Excise Department, Government of Kerala, states that 7099 cases were registered under the Narcotic Drugs and Psychotropic Substances Act during 2019.
        </p>
        <p>
          Though governments publish[<xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>] data regarding substance abuse cases, it is not easy to get detailed region-wise information. For example, detailed information regarding the size, type and location of registered cases is not easy to find. This information is, however, available in the public domain through news reports. The problem with these news reports is that they are not in a structured format. Various techniques[<xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>] have been explored for extracting structured information from unstructured textual data. Information extraction is the process of extracting information from unstructured data. It is extensively used in mining medical, business and law documents. The Internet being a rich source of unstructured textual data, web mining is also an active research area. The proposed system extracts structured information from news reports. News reports on substance abuse cases published in the online editions of popular Malayalam newspapers are used for this purpose. These are then processed using Natural Language Processing (NLP) techniques like Named Entity Recognition (NER) to extract structured information. This information helps in finding, for example, the places where more cases are reported, the most commonly used drug, and the amount of each drug as reported in the news.
        </p>
        <sec id="sec-1-1-1">
          <title>2. Related Works</title>
          <p>
            Information extraction from unstructured data has been studied in the literature[<xref ref-type="bibr" rid="ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9</xref>]. This involves extracting data from medical text and from business and law documents. Most of the research revolves around English as the language. We have not come across much research[<xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>] on information extraction from unstructured Malayalam text. This is mainly due to the unavailability of public datasets and of computational techniques for processing the text. Works related to extracting drug related information from unstructured text are discussed below.
          </p>
          <p>
            Lybarger, Yetisgen et al.[<xref ref-type="bibr" rid="ref8">8</xref>] studied the extraction of substance abuse information from clinical notes. They proposed a neural network architecture for automatic extraction of substance abuse information from clinical notes. A discrete model was also tried for extracting the information. These clinical notes were stored with information about patients’ substance abuse history. The model was trained to find the presence of substance events such as alcohol, drug or tobacco use. A Maximum Entropy (MaxEnt) model was used for classifying the status. Other entities such as amount, frequency and exposure history were extracted using a Conditional Random Fields (CRF) model. The neural multi-task model predicted all entities for all substances.
          </p>
          <p>
            Khmael Rakm Rahem and Nazlia Omar proposed a rule-based approach[<xref ref-type="bibr" rid="ref9">9</xref>] for extracting drug related crime information from online newspaper articles. The task involved extracting information such as drug name, nationality and location, and assessing the quantity and price of the drug. A set of grammatical and heuristic rules was used for this purpose. Data from the Malaysian National News Agency (BERNAMA) is used in the system. The system achieved a precision of 0.96 on drug names, 0.83 on crime location and 0.87 on drug quantity. However, information regarding the dataset size and testing is missing in the paper.
          </p>
          <p>
            Rexy Arulanandam, Bastin Tony Roy Savarimuthu and Maryam A. Purvis proposed a system[<xref ref-type="bibr" rid="ref12">12</xref>] for extracting crime information from newspaper articles. Named Entity Recognition (NER) coupled with Conditional Random Fields (CRF) is used to find the crime location in a sentence. 70 articles from the Otago Daily Times are used for evaluating the system. The LBJ NER Tagger is found to be the best tagger, with a precision of 0.98. Accuracy varies from 84% to 90% for New Zealand articles for the task of identifying locations in sentences and classifying them into crime location sentences.
          </p>
          <p>
            Eiji Aramaki, Yasuhide Miura, Masatsugu Tonoike et al.[<xref ref-type="bibr" rid="ref13">13</xref>] proposed a system for extracting adverse drug events and effects from clinical records. Results of a study on 3,012 discharge summaries show that 7.7% of records include adverse event information, and 59% of them can be extracted automatically.
          </p>
          <p>The authors have not come across any similar system for extracting information from Malayalam news articles.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>3. Design and Implementation</title>
          <p>The proposed system consists of two phases. In the first phase, relevant data is crawled from the web and fed to the Named Entity Recognition (NER) module to create a model for recognizing named entities. This phase is shown in Figure 1.</p>
          <p>Figure 1: NER Training (blocks shown: Web Crawler, NER Training)</p>
          <p>
            This phase is not an easy task, since NER has to be performed on Malayalam text. Malayalam[<xref ref-type="bibr" rid="ref14">14</xref>] is a language spoken in the Indian state of Kerala. It is one of the 22 scheduled languages of India and is spoken by 37,919,870 people. Malayalam generally follows a subject-object-verb (SOV) word order. It is a heavily agglutinated and inflected language, which makes the NER task difficult. Different techniques have been explored for Malayalam NER[<xref ref-type="bibr" rid="ref15 ref16 ref17 ref18 ref19">15, 16, 17, 18, 19</xref>]. Most of these are based on statistical techniques. This study also uses a statistical technique for NER.
          </p>
          <p>
            The statistical model provided by spaCy is used in this study. Tagged data is fed to the NER system for training. A transition-based approach[<xref ref-type="bibr" rid="ref20">20</xref>] is used for NER.
          </p>
        </sec>
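        <p>As an illustration of how tagged data can be prepared for a transition-based recognizer, the sketch below converts character-offset annotations into per-token BILUO tags (Begin, In, Last, Unit, Out), the tagging scheme used by spaCy's NER component. It is a minimal sketch and not the code used in this study: the whitespace tokenizer, the function names, the entity labels and the English sample sentence are assumptions made for illustration, and the (start, end, label) span layout is only assumed to resemble what an annotation tool exports.</p>
        <preformat preformat-type="python">
```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label), end is exclusive


def tokenize(text: str) -> List[Tuple[int, int, str]]:
    """Split on whitespace and keep character offsets (a simplifying assumption)."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


def biluo_tags(text: str, spans: List[Span]) -> List[str]:
    """Return one BILUO tag per whitespace token for the given entity spans."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        # Indices of the tokens that lie completely inside the annotated span.
        inside = [i for i, (s, e, _) in enumerate(tokens) if s >= start and end >= e]
        if not inside:
            continue  # span does not align with token boundaries; skipped in this sketch
        if len(inside) == 1:
            tags[inside[0]] = "U-" + label
        else:
            tags[inside[0]] = "B-" + label
            for i in inside[1:-1]:
                tags[i] = "I-" + label
            tags[inside[-1]] = "L-" + label
    return tags


# Invented sample sentence and hypothetical labels, for illustration only.
sample = "Police seized 2 kg of ganja from Kochi"
annotations = [(14, 18, "QUANTITY"), (22, 27, "DRUG"), (33, 38, "LOCATION")]
words = [word for _, _, word in tokenize(sample)]
print(list(zip(words, biluo_tags(sample, annotations))))
```
        </preformat>
        <p>For the sample sentence the tags are O, O, B-QUANTITY, L-QUANTITY, O, U-DRUG, O, U-LOCATION. Malayalam text would need a tokenizer suited to its agglutinative morphology, which this sketch does not attempt.</p>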
        <p>The NER model uses a word embedding strategy based on subword features and Bloom embeddings. CNN filter sizes are chosen with beam search. 1D convolutional filters are applied over the input text to predict how the upcoming words may change the current entity tags.</p>
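        <p>The idea behind Bloom (hashed) embeddings can be sketched as follows: instead of storing one vector per vocabulary entry, each subword feature is hashed with several seeds into a small shared table and the selected rows are summed. The code is a simplified, pure-Python illustration under stated assumptions and is not spaCy's implementation: the table size, vector width, number of hashes, the chosen features (lowercased form, three-character prefix and suffix) and all function names are hypothetical.</p>
        <preformat preformat-type="python">
```python
import hashlib
import random
from typing import List

N_ROWS = 500    # assumed table size, far smaller than a real vocabulary
WIDTH = 8       # assumed embedding width
N_HASHES = 4    # assumed number of hash functions applied to each feature

# Randomly initialised table; in a real model these rows would be learned.
_rng = random.Random(0)
TABLE = [[_rng.uniform(-0.1, 0.1) for _ in range(WIDTH)] for _ in range(N_ROWS)]


def row_index(feature: str, seed: int) -> int:
    """Map a feature string to a table row using a seeded hash."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8,
                             salt=str(seed).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N_ROWS


def subword_features(word: str) -> List[str]:
    """Lowercased form plus a short prefix and suffix (illustrative choice)."""
    lower = word.lower()
    return ["norm=" + lower, "prefix=" + lower[:3], "suffix=" + lower[-3:]]


def embed(word: str) -> List[float]:
    """Sum the hashed rows of every subword feature into one vector."""
    vector = [0.0] * WIDTH
    for feature in subword_features(word):
        for seed in range(N_HASHES):
            row = TABLE[row_index(feature, seed)]
            vector = [v + r for v, r in zip(vector, row)]
    return vector


print(len(embed("Kochi")), embed("Kochi") == embed("kochi"))
```
        </preformat>
        <p>Because every word maps to some rows of the table, unseen word forms still receive a vector, and forms that share a prefix or suffix share part of their representation.</p>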
        <p>In the second phase, the trained model is used to extract information. A rule base is also used for this purpose. This phase is shown in Figure 2. The NER model helps to identify the entities relevant to the information extraction task. Name, age, place, drug name, amount and size are considered in the proposed system. Tagged sentences are then fed to the processing module, which processes the information based on handcrafted rules. A snapshot of the rules used in the proposed system is given below.</p>
        <list list-type="order">
          <list-item><p>The name of the person and the drug appear in the initial part of the news item.</p></list-item>
          <list-item><p>If money occurs just before the drug’s name, then it is assigned to the corresponding drug.</p></list-item>
          <list-item><p>The first occurrence of a location is assigned as the location.</p></list-item>
          <list-item><p>The person’s age is close to the name of the person.</p></list-item>
          <list-item><p>The amount of drug carried by the offender is close to the drug name.</p></list-item>
        </list>
        <p>Figure 3: NER output for processed sample news items</p>
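        <p>A minimal sketch of how handcrafted rules of this kind might be applied to NER output is given here. It assumes that entities arrive as records holding a label, the matched text and a token position. The label names, the Entity and infer_case names, the reading of "close to" as the nearest entity, and the reading of "just before" as at most two tokens earlier are all assumptions made for illustration; this is not the processing module of the proposed system.</p>
        <preformat preformat-type="python">
```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entity:
    label: str      # hypothetical labels: PERSON, AGE, LOCATION, DRUG, QUANTITY, MONEY
    text: str
    position: int   # token index of the entity in the news item


def first(entities: List[Entity], label: str) -> Optional[Entity]:
    """Earliest entity with the given label, or None."""
    matches = [e for e in entities if e.label == label]
    return min(matches, key=lambda e: e.position) if matches else None


def nearest(entities: List[Entity], label: str,
            anchor: Optional[Entity]) -> Optional[Entity]:
    """Entity with the given label that is closest to the anchor entity."""
    matches = [e for e in entities if e.label == label]
    if anchor is None or not matches:
        return None
    return min(matches, key=lambda e: abs(e.position - anchor.position))


def infer_case(entities: List[Entity]) -> Dict[str, Optional[str]]:
    person = first(entities, "PERSON")              # rule 1: name appears early
    drug = first(entities, "DRUG")                  # rule 1: drug appears early
    location = first(entities, "LOCATION")          # rule 3: first location is used
    age = nearest(entities, "AGE", person)          # rule 4: age close to the name
    quantity = nearest(entities, "QUANTITY", drug)  # rule 5: amount close to the drug
    # Rule 2: money counts only when it occurs just before the drug name
    # (assumed here to mean one or two tokens earlier).
    money = None
    if drug is not None:
        before = [e for e in entities
                  if e.label == "MONEY" and drug.position - e.position in (1, 2)]
        money = before[0] if before else None
    found = {"person": person, "age": age, "location": location,
             "drug": drug, "quantity": quantity, "money": money}
    return {key: (ent.text if ent else None) for key, ent in found.items()}


# Invented sample entities, for illustration only.
news = [Entity("LOCATION", "Kochi", 0), Entity("PERSON", "Rahul", 3),
        Entity("AGE", "24", 4), Entity("QUANTITY", "2 kg", 8),
        Entity("DRUG", "ganja", 9), Entity("LOCATION", "Aluva", 14),
        Entity("PERSON", "Inspector Anil", 17)]
print(infer_case(news))
```
        </preformat>
        <p>In this invented example the second person name, which stands for a law enforcement officer, is ignored only because it occurs later in the item. Position-based rules of this kind depend on the news story following the expected writing style.</p>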
      </sec>
      <sec id="sec-1-2">
        <title>3.1. Dataset</title>
        <p>
          Though trained models exist for languages like English, a publicly available tagged dataset for the Malayalam language is non-existent for this task. Data is extracted from the online editions of the Malayalam news sites Malayala Manorama, Mathrubhumi, Mangalam, News18 Malayalam, Deshabhimani and Media One. Tagging for NER was done using a web frontend based on doccano[<xref ref-type="bibr" rid="ref21">21</xref>], which is an open source text annotation tool. It can be used to tag data for various tasks like named entity recognition, text summarization and sentiment analysis. Data collected for training were from the period January 2017 to December 2019.
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>3.2. Implementation</title>
        <p>Processing of the data is done using the Python programming language. The spaCy NER module is used for named entity recognition, which forms the important component of the system. The availability of pretrained statistical models and support for a large number of languages make spaCy a good choice for text processing.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussion</title>
      <p>The proposed system involves passing the news item to the NER module and processing it using the rule-based system. Figure 3 shows the result of the NER module for sample news items. This is then fed to the processing module for inference. Output from the inference module is given in Figure 4.</p>
      <p>For evaluating the system, 50 news articles related to substance abuse were collected. These news articles are from the period January 2020 to March 2020. The collected articles were manually verified to be about substance abuse cases. They were then fed to the system, and the accuracy of the entities identified was recorded. Accuracy is found by manually matching the actual entities with the predicted entities. For example, results indicate that for 42 of the 50 news articles considered, the location entity is predicted correctly.</p>
      <p>Table 1 lists the accuracy achieved by the system for the given 50 news articles.</p>
      <p>Results indicate that the system could identify the entities location and drug with reasonably good accuracy. Although the system could identify most entities correctly, it is the rule-based system that marks them as relevant. The reduction in accuracy for other entities is due to the failure of the rule base to correctly match the entity. Rules are framed manually after going through news stories. We cannot be sure of each and every news story following the same writing style. This is a major drawback of the system. For example, the accuracy for the entities quantity, person and date is the lowest. Most news stories lack quantity and date information in a standard format. Person information is also difficult to identify, since news stories sometimes lack it and sometimes include additional person names, such as those of law enforcement authorities, making it difficult for the system to identify the person correctly.</p>
    </sec>
    <sec id="sec-2">
      <title>5. Conclusion</title>
      <p>An automated system for generating valuable information out of online news articles can reduce the colossal amount of effort that must be put in to do the same by other means. The data provided by the system can aid statistical research and study, the generation of key inferences for investigations, background studies for formulating action plans, etc. Since the system processes news reports on crimes related to substance abuse, the information provided is significant and relevant, as the issue is an ongoing, serious social threat.</p>
      <p>However, in a broad sense, the services provided by the current version of the system are limited, which also opens an opportunity for future enhancement. At present the system provides only the key aspects mentioned in the news. It can be modified into a full-fledged inferencing system, which would increase its clarity. The proposed system can also be enhanced so that it responds to user queries.</p>
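      <p>The accuracy measure used in the evaluation, the fraction of articles for which an entity is predicted correctly, can be computed with a few lines of Python. This is a hedged sketch: the entity_accuracy helper and the record layout are hypothetical, and the toy lists only reproduce the one figure stated in the text (42 of 50 articles with a correct location), not the contents of Table 1.</p>
      <preformat preformat-type="python">
```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]  # one record of extracted entities per article


def entity_accuracy(gold: List[Record], predicted: List[Record], entity: str) -> float:
    """Fraction of articles whose predicted value for `entity` equals the manual one."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for g, p in zip(gold, predicted)
                  if g.get(entity) == p.get(entity))
    return correct / len(gold)


# Toy data: 50 articles, location predicted correctly for the first 42 of them.
gold = [{"location": "place-%d" % i} for i in range(50)]
predicted = [{"location": ("place-%d" % i) if i in range(42) else None}
             for i in range(50)]
print(entity_accuracy(gold, predicted, "location"))  # 0.84
```
      </preformat>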
    </sec>
    <sec id="sec-3">
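      <title>Acknowledgments</title>
      <p>The authors would like to thank Adam Shamsudeen for providing the required dataset and tagging frontend.</p>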
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>UNODC</surname>
          </string-name>
          , Atlas on substance use (
          <year>2010</year>
          ),
          <year>2011</year>
          . URL: https://www.who.int/publications/i/item/9789241500616.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Degenhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Warner-Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lynskey</surname>
          </string-name>
          , Illicit drug use,
          <year>2020</year>
          . URL: https://www.who.int/publications/cra/chapters/volume1/1109-1176.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>National Drug Dependence Treatment Centre (NDDTC)</collab>
          ,
          <source>Magnitude of substance use in India - 2019</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>Excise Department, Government of Kerala</collab>
          ,
          <source>Month wise details of enforcement activities during 2019</source>
          ,
          <year>2020</year>
          . URL: https://excise.kerala.gov.in/enforcement-activities-2/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alawad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Blair</given-names>
            <surname>Christian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Penberthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mumphrey</surname>
          </string-name>
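          ,
          <string-name>
            <given-names>X.-C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Coyle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tourassi</surname>
          </string-name>
          ,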
          <article-title>Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>27</volume>
          (
          <year>2020</year>
          )
          <fpage>89</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baumgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ittycheriah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Factoring fact-checks: Structured information extraction from fact-checking articles</article-title>
          ,
          <source>in: Proceedings of The Web Conference</source>
          <year>2020</year>
          ,
          <year>2020</year>
          , pp.
          <fpage>1592</fpage>
          -
          <lpage>1603</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Milosevic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gregson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          ,
          <article-title>A framework for information extraction from tables in biomedical literature</article-title>
          ,
          <source>International Journal on Document Analysis and Recognition (IJDAR)</source>
          <volume>22</volume>
          (
          <year>2019</year>
          )
          <fpage>55</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lybarger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yetisgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostendorf</surname>
          </string-name>
          ,
          <article-title>Using neural multi-task learning to extract substance abuse information from clinical notes</article-title>
          ,
          <source>in: AMIA Annual Symposium Proceedings</source>
          , volume
          <volume>2018</volume>
          , American Medical Informatics Association,
          <year>2018</year>
          , p.
          <fpage>1395</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Rahem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Omar</surname>
          </string-name>
          ,
          <article-title>Drug-related crime information extraction and analysis</article-title>
          ,
          <source>in: Proceedings of the 6th International Conference on Information Technology and Multimedia</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <fpage>250</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mohandas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Govindaru</surname>
          </string-name>
          ,
          <article-title>Domain specific sentence level mood extraction from malayalam text</article-title>
          ,
          <source>in: 2012 International Conference on Advances in Computing and Communications</source>
          , IEEE,
          <year>2012</year>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Jayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          , et al.,
          <article-title>Sentiment extraction for malayalam</article-title>
          ,
          <source>in: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI)</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <fpage>1719</fpage>
          -
          <lpage>1723</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Arulanandam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. T. R.</given-names>
            <surname>Savarimuthu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Purvis</surname>
          </string-name>
          ,
          <article-title>Extracting crime information from online newspaper articles</article-title>
          ,
          <source>in: Proceedings of the Second Australasian Web Conference</source>
          , volume
          <volume>155</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Aramaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tonoike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ohkuma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Masuichi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Waki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ohe</surname>
          </string-name>
          ,
          <article-title>Extraction of adverse drug effects from clinical records</article-title>
          ,
          <source>MedInfo</source>
          <volume>160</volume>
          (
          <year>2010</year>
          )
          <fpage>739</fpage>
          -
          <lpage>743</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G. F.</given-names>
            <surname>Simons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Fennig</surname>
          </string-name>
          , Ethnologue: Languages of Asia, SIL International, Dallas,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sreeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Pillai</surname>
          </string-name>
          ,
          <article-title>Towards an efficient malayalam named entity recognizer analysis on the challenges</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>171</volume>
          (
          <year>2020</year>
          )
          <fpage>2541</fpage>
          -
          <lpage>2546</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Malarkodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Devi</surname>
          </string-name>
          ,
          <article-title>A deeper study on features for named entity recognition</article-title>
          ,
          <source>in: Proceedings of the WILDRE5-5th Workshop on Indian Language Data: Resources and Evaluation</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Jayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rajeev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <article-title>A hybrid statistical approach for named entity recognition for malayalam language</article-title>
          ,
          <source>in: Proceedings of the 11th Workshop on Asian Language Resources</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ajees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Idicula</surname>
          </string-name>
          ,
          <article-title>A named entity recognition system for malayalam using neural networks</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>143</volume>
          (
          <year>2018</year>
          )
          <fpage>962</fpage>
          -
          <lpage>969</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thottingal</surname>
          </string-name>
          ,
          <article-title>Finite state transducer based morphology analysis for malayalam language</article-title>
          ,
          <source>in: Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ballesteros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawakami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <article-title>Neural architectures for named entity recognition</article-title>
          ,
          <source>arXiv preprint arXiv:1603.01360</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nakayama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kubo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Taniguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liang</surname>
          </string-name>
          , doccano: Text annotation tool for human,
          <year>2018</year>
          . URL: https://github.com/doccano/doccano.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>