<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NTTMU System in the 2nd Social Media Mining for Health Applications Shared Task</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Chen-Kai Wang</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science and Information Engineering, National Taitung University</institution>
          ,
          <addr-line>Taitung</addr-line>
          ,
          <country country="TW">Taiwan, R.O.C.</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Graduate Institute of Biomedical Informatics, Taipei Medical University</institution>
          ,
          <addr-line>Taipei, Taiwan</addr-line>
          ,
          <country country="TW">R.O.C.</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute of Information Science</institution>
          ,
          <addr-line>Academia Sinica, Taipei</addr-line>
          ,
          <country country="TW">Taiwan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this study, we describe our methods for automatically classifying Twitter posts that describe adverse drug reactions and medication intake. We developed classifiers using linear support vector machines (SVM) and Naïve Bayes Multinomial (NBM) models. We extracted features to build our models and conducted experiments to examine their effectiveness as part of our participation in the AMIA 2017 Social Media Mining for Health Applications shared task. For both tasks, the best-performing models on the test sets were trained using NBM with n-gram, part-of-speech and lexicon features, and achieved F-scores of 0.295 and 0.615, respectively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p></p>
      <p>Lexicon-based features: We used the ADR lexicon compiled in our previous work [5] to mark the presence of
lexicon entries in each tweet and derived two binary features: one indicates the presence of drug names and the
other the presence of ADR mentions.</p>
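      <p>The two lexicon-based binary features can be sketched as follows. This is a minimal illustration rather than the system's actual implementation: the small DRUG_NAMES and ADR_TERMS sets, the whole-word matching rule and the function names are assumptions made for the example, and the ADR lexicon from [5] is not reproduced here.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical miniature lexicons, used only for illustration.
DRUG_NAMES = {"ibuprofen", "seroquel", "humira"}
ADR_TERMS = {"headache", "nausea", "weight gain"}


def contains_any(text, terms):
    """Return True if any term occurs in text as a whole word or phrase."""
    lowered = text.lower()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, lowered):
            return True
    return False


def lexicon_features(tweet):
    """Map a tweet to the two binary lexicon features."""
    return {
        "has_drug_name": int(contains_any(tweet, DRUG_NAMES)),
        "has_adr_mention": int(contains_any(tweet, ADR_TERMS)),
    }
```
]]></preformat>
      <p>For example, under these assumed lexicons, calling lexicon_features on "Seroquel gave me a terrible headache" sets both features to 1, while a tweet that mentions neither a drug nor an ADR term yields 0 for both.</p>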
      <p>In addition to the above features, we tried to exploit a likely-positive dataset [6] and employed different term
weighting methods, such as the transformed weight-normalized complement Naïve Bayes (TWCNB) [7]. TWCNB combines a
Naïve Bayes classifier with weighted features based on term frequency, inverse document frequency, length
normalization and complement class weighting. Unfortunately, none of these additions achieved a significant
improvement over the above feature sets. We will report the details in the Results section.</p>
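      <p>To make the TWCNB factors concrete, the sketch below chains a log term-frequency transform, inverse document frequency weighting, length normalization, complement class weighting and weight normalization. It follows the commonly described formulation of TWCNB and is not the configuration used in our experiments: the smoothing constant alpha, the pre-tokenized input and all function names are assumptions for the example.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def transform(doc_counts, idf):
    """Apply the log TF transform, IDF weighting and L2 length normalization."""
    weighted = {t: math.log(1 + c) * idf.get(t, 0.0) for t, c in doc_counts.items()}
    norm = math.sqrt(sum(v * v for v in weighted.values()))
    if norm == 0:
        return weighted
    return {t: v / norm for t, v in weighted.items()}


def train_twcnb(docs, labels, alpha=1.0):
    """Fit normalized complement-class weights from tokenized documents."""
    counts = [Counter(d) for d in docs]
    vocab = sorted({t for c in counts for t in c})
    doc_freq = Counter(t for c in counts for t in c)
    idf = {t: math.log(len(docs) / doc_freq[t]) for t in vocab}
    vectors = [transform(c, idf) for c in counts]

    weights = {}
    for cls in set(labels):
        # Complement weighting: accumulate term mass from all OTHER classes.
        comp = Counter()
        for vec, label in zip(vectors, labels):
            if label != cls:
                comp.update(vec)
        total = sum(comp.values()) + alpha * len(vocab)
        log_theta = {t: math.log((comp[t] + alpha) / total) for t in vocab}
        # Weight normalization: scale by the sum of absolute log weights.
        norm = sum(abs(v) for v in log_theta.values())
        weights[cls] = {t: v / norm for t, v in log_theta.items()}
    return weights, idf


def predict(tokens, weights, idf):
    """Pick the class whose complement matches the document least."""
    vec = transform(Counter(tokens), idf)
    scores = {
        cls: sum(v * w.get(t, 0.0) for t, v in vec.items())
        for cls, w in weights.items()
    }
    return min(scores, key=scores.get)
```
]]></preformat>
      <p>Because the weights are estimated from the complement of each class, the prediction takes the minimum score rather than the maximum. Terms unseen in training receive zero weight in this sketch.</p>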
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>For Task 1, configuration 14, which adopts the NBM algorithm with all proposed features, achieved the highest
F-measure (49.92%). For Task 2, the same configuration (denoted as 4 in Table 2) also achieved the highest F-measure
(63.34%).</p>
      <p>In this paper, we gave a brief introduction to our systems based on the SVM and NBM algorithms and conducted
experiments to study the effectiveness of different features and preprocessing steps. We observed that the best
configurations for both tasks were based on the spell-checked and dosage-replaced tweets along with n-gram, POS
and lexicon features.</p>
    </sec>
    <sec id="sec-4">
      <title>PSB SMM4H Shared Task 1 Results</title>
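      <table-wrap>
        <table>
          <thead>
            <tr><th>P</th><th>R</th><th>F</th></tr>
          </thead>
          <tbody>
            <tr><td>0.213</td><td>0.433</td><td></td></tr>
            <tr><td>0.362</td><td>0.249</td><td>0.295</td></tr>
            <tr><td>0.226</td><td>0.403</td><td>0.29</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption>
          <title>PSB SMM4H Shared Task 2 Results</title>
        </caption>
        <table>
          <thead>
            <tr><th>P</th><th>R</th><th>F</th></tr>
          </thead>
          <tbody>
            <tr><td>0.69</td><td>0.554</td><td>0.614</td></tr>
            <tr><td>0.644</td><td>0.588</td><td>0.615</td></tr>
            <tr><td>0.662</td><td>0.572</td><td>0.614</td></tr>
          </tbody>
        </table>
      </table-wrap>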
      <p>G. Holmes, A. Donkin, and I. H. Witten, "Weka: A machine learning workbench," in Intelligent Information
Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on, 1994, pp.
357-361.</p>
      <p>J. Jonnagaddala, T. R. Jue, and H.-J. Dai, "Binary classification of Twitter posts for adverse drug reactions,"
presented at the Proceedings of the Social Media Mining Shared Task Workshop at the Pacific Symposium
on Biocomputing, Big Island, Hawaii, 2016.</p>
      <p>R. T.-H. Tsai, H.-C. Hung, H.-J. Dai, Y.-W. Lin, and W.-L. Hsu, "Exploiting Likely-Positive and Unlabeled
Data to Improve the Identification of Protein-Protein Interaction Articles," presented at the 6th InCoB - Sixth
International Conference on Bioinformatics, 2007.</p>
      <p>M. Timonen, "Term Weighting in Short Documents for Document Categorization, Keyword Extraction and
Query Expansion," 2013.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarker</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <article-title>"Portable automatic text classification for adverse drug reaction detection via multi-corpus training,"</article-title>
          <source>J Biomed Inform</source>
          , vol.
          <volume>53</volume>
          , pp.
          <fpage>196</fpage>
          -
          <lpage>207</lpage>
          ,
          <year>Feb 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rouhizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <article-title>"Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System,"</article-title>
          <source>BioNLP</source>
          <year>2017</year>
          , pp.
          <fpage>136</fpage>
          -
          <lpage>142</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Owoputi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>"Improved part-of-speech tagging for online conversational text with word clusters,"</article-title>
          <source>in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>