    Multi-modal Sentiment Analysis using Super
     Characters Method on Low-power CNN
                Accelerator Device

               Baohua Sun1 , Lin Yang1 , Hao Sha1 , and Michael Lin1

Gyrfalcon Technology Inc., 1900 McCarthy Blvd Suite 208, Milpitas, CA, 95035, US
                        baohua.sun@gyrfalcontech.com



       Abstract. In recent years, NLP research has witnessed record-breaking
       accuracy improvements from DNN models. However, power consumption is
       one of the practical concerns for deploying NLP systems. Most current
       state-of-the-art algorithms are implemented on GPUs, which are not
       power-efficient, and the deployment cost is also very high. On the
       other hand, the CNN Domain Specific Accelerator (CNN-DSA) is in mass
       production, providing low-power and low-cost computation. In this
       paper, we implement the Super Characters method on the CNN-DSA. In
       addition, we modify the Super Characters method to utilize the
       multi-modal data in the CL-Aff shared task, i.e. text plus tabular
       data.

       Keywords: Super Characters · Squared English Word · Two-dimensional
       Embedding · Text Classification · Multi-modal Sentiment Analysis.


1    Introduction

The need to classify sentiment based on multi-modal input arises in many
different problems in customer-related marketing fields. Super Characters [3] is
a two-step method for sentiment analysis. It first converts text into images,
then feeds the images into CNN models to classify the sentiment. Sentiment
classification performance on large text contents from customer online comments
shows that the Super Characters method is superior to other existing methods.
The Super Characters method also shows that models pretrained on a larger
dataset help improve accuracy when the CNN model is finetuned on a smaller
dataset. Compared with a from-scratch trained Super Characters model, the
finetuned one improves the accuracy from 95.7% to 97.8% on the well-known
Chinese dataset of Fudan Corpus. Squared English Word (SEW) [6] is an extension
of the Super Characters method to Latin languages. With the wide availability
of low-power CNN accelerator chips [4] [5], the Super Characters method has
great potential for large-scale deployment, thanks to its power savings and
fast inference speed; it is also easy to deploy. Recent work extends its
applications to chatbots [8], image captioning [9], and tabular data machine
learning [7].


 Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License
 Attribution 4.0 International (CC BY 4.0). In: N. Chhaya, K. Jaidka, J. Healey, L. H. Ungar, A. Sinha
 (eds.): Proceedings of the 3rd Workshop of Affective Content Analysis, New York, USA, 07-
 FEB-2020, published at http://ceur-ws.org

    The CL-AFF Shared Task [1] is part of the Affective Content Analysis work-
shop at AAAI 2020. It builds upon the OffMyChest dataset [2], which contains
12,860 samples of training data and 5,000 samples of testing data. Each sample
is a multi-modal input containing both text and tabular data. The text input is
an English sentence from Reddit. The tabular data is the corresponding log
information for each sentence, such as wordcount, created_utc time, etc. Each
sample has six sets of binary classification labels: EmotionDisclosure?(Yes|No),
InformationDisclosure?(Yes|No), Support?(Yes|No), EmotionSupport?(Yes|No),
InformationSupport?(Yes|No), and GeneralSupport?(Yes|No). In this paper, we
apply Super Characters to this data set to classify the multi-modal input.


2     Super Characters for Multi-modal Sentiment Analysis
      and Low-Power Hardware Solution
For multi-modal sentiment analysis, we can simply split the image into two
parts: one for the text input, and the other for the tabular data, so that both
can be embedded into the Super Characters image. The CNN accelerator chip comes
with a Model Development Kit (MDK) for CNN model training: the two-dimensional
Super Characters images are fed into the MDK to obtain a fixed-point model. The
Software Development Kit (SDK) then loads the model into the chip and sends
commands to the CNN accelerator, such as reading an image or forward-passing
the image through the network to get the inference result. The advantage of
using the CNN accelerator is its low power: it consumes only 300mW for an RGB
input image of size 3x224x224 at a speed of 140fps. Compared with other models
using GPUs or FPGAs, this solution implements the heavy-lifting DNN
computations in the CNN accelerator chip, while the host computer is only
responsible for memory read/write to generate the designed Super Characters
image. This has shown good results in system implementations for NLP
applications [10].
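    To make the pipeline concrete, the following minimal sketch (in Python,
using the PIL library; the layout and the simplified glyph drawing are
illustrative assumptions, not the MDK/SDK API) composes a multi-modal Super
Characters image with a text region on top and a tabular region at the bottom:

    from PIL import Image, ImageDraw

    CANVAS = 224  # the chip input is a 3x224x224 image

    def compose_multimodal_image(text_words, tabular_fields, text_rows=5, cols=8):
        """Sketch: text occupies the top text_rows rows; tabular data the rest."""
        img = Image.new("L", (CANVAS, CANVAS), color=255)   # white background
        draw = ImageDraw.Draw(img)
        cell = CANVAS // cols                               # e.g. 28x28 per word
        for i, word in enumerate(text_words[: text_rows * cols]):
            r, c = divmod(i, cols)
            # Simplified: real SEW squeezes all letters of the word into the cell.
            draw.text((c * cell + 2, r * cell + 8), word, fill=0)
        y = text_rows * cell                                # tabular region below
        for name, value in tabular_fields.items():
            draw.text((2, y), f"{name}={value}", fill=0)
            y += cell
        return img

The host computer only generates such images; the forward pass itself runs on
the accelerator through the SDK.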


3     Experiments
3.1   Data Exploration
The training data set has 12,860 samples with 16 columns. The first ten columns
are attributes, including sentenceid, author, nchar, created_utc, score,
subreddit, label, full_text, wordcount, and id. The other six columns are
labels for each of the tasks of Emotion_disclosure, Information_disclosure,
Support, Emotion_support, Information_support, and General_support. Each task
is a binary classification problem based on the ten attributes, so there will
be 60 models to be trained for a 10-fold validation. The test data set has
5,000 samples with only the ten columns of attributes. The system runs will
give labels on these test samples based on the 10-fold training.
    For the training data, there are only 3,634 unique ids among the 12,860
training samples, while for the testing data this number is only 2,443 out of
the whole testing dataset of 5,000, meaning some of the records may come from
the same discussion thread. The unique authors number 7,556 for training and
3,769 for testing, which means some authors are active enough to have published
more than one comment.
    Based on this, we have considered including author names in the multi-modal
model as well, since a comment may be biased by the personality of its author.
The maximum length of an author’s name is 20 characters, if SEW [6] is used to
project the names onto a two-dimensional embedding. On the other hand, nchar,
which indicates the number of characters in the full text, has a maximum value
of 9,993, and the maximum wordcount is 481. The column “label” has 37 unique
values, which are different combinations of strings like “husband”, “wife”,
“boyfriend”, “girlfriend”, and their abbreviations like “bf” and “gf”. The
column “subreddit” is a categorical attribute with values in (“offmychest”,
“CasualConversation”). After converting the Unix time in the column
“created_utc”, we found that the records were generated from 2017 to 2018. The
column “score” has integers ranging from -44 to 1838, with 251 unique values.
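    These statistics can be reproduced with a few lines of pandas (a sketch;
the CSV file name is hypothetical, and the column names follow the attribute
list above):

    import pandas as pd

    train = pd.read_csv("offmychest_train.csv")   # hypothetical file name
    print(train["id"].nunique())                  # 3634 unique ids in 12,860 rows
    print(train["author"].nunique())              # 7556 unique authors
    print(train["nchar"].max())                   # 9993
    print(train["wordcount"].max())               # 481
    print(train["label"].nunique())               # 37
    print(pd.to_datetime(train["created_utc"], unit="s").dt.year.unique())  # 2017, 2018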

3.2   Design of the Super Characters Image
The sentence length distribution is given in Figure 1; the layout design for
the full text is based on it. Since we render the English words using the
SEW [6] method, the size of each English word on the Super Characters image is
(224/N)*(224/N) pixels if the whole image is set to 224x224, where N is an
integer giving the number of words per row. The dimension is set to 224x224
because of the chip specification.
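    For illustration, the glyph size and text capacity follow directly from the
choice of N (a trivial computation, shown only to make the layout arithmetic
explicit):

    IMAGE_SIZE = 224

    for n in (7, 8):
        glyph = IMAGE_SIZE // n          # side length of one word glyph
        print(f"N={n}: {glyph}x{glyph} pixels per word, "
              f"up to {n * n} words if every row holds text")
    # N=7: 32x32 pixels per word, up to 49 words if every row holds text
    # N=8: 28x28 pixels per word, up to 64 words if every row holds text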




         Fig. 1. Histogram of sentence length counted by number of words.




(a) Design Option One: 7 words per row, max 7 rows, with only full text
information embedded in the image. Example: full text=“If it were me, and I
cared about a person, I would absolutely read their book to show support, but I
can also understand struggling to get started on/through something I have
absolutely no interest in.”.

(b) Design Option Two: 8 words per row, max 5 rows, with all attributes except
id and sentenceid. Example: author=“Laseyguy”, wordcount=32, created
time=“2018-07-26 19:49:22”, subreddit=“offmychest”, score=2426, nchar=102,
label=“husband”, and full text=“I think it’s safe to say that we’ve all been
there - that realization that this isn’t what we signed up for - and that life
will never be the same again”.

(c) Design Option Three: 7 words per row, max 6 rows, with four attributes
(subreddit, wordcount, score, label) and the full text. Example:
subreddit=“CasualConversation”, wordcount=37, score=2, label=“girlfriend”, and
full text=“The best thing that you can do is keep at it don’t give up hope, and
try not to lose sight of what is important - you health and the health of your
loved ones (and kity)!”.

(d) Design Option Four: Same design as Design Option Three except that the full
text is augmented. Spaces are added to the front of the full text sentence in
order to get augmented Super Characters images. Each record yields multiple
augmented Super Characters images until the added blanks make the sentence
length equal 42 (7 words per row, max 6 rows).

                        Fig. 2. Demonstrations of design options.

Design Option One In this design setting, we only include the full text
information and ignore the other attributes. If N=7, each row has 7 words, and
each word occupies (224/7)*(224/7)=32*32 pixels. In this setting we can hold up
to 49 words of full text. For records with more than 49 words, the full text is
truncated at the 49th word; only 0.86% of the training data and 1.98% of the
testing data have to cut the sentence at 49 words. An example of this design
setting is in Figure 2a.
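    The truncation percentages above can be checked directly from the
wordcount attribute (a sketch; wordcounts is any list-like collection of
per-record word counts):

    def truncation_rate(wordcounts, cut_length):
        """Fraction of records whose full text must be cut at cut_length words."""
        wc = list(wordcounts)
        longer = sum(1 for w in wc if w > cut_length)
        return longer / len(wc)

    # e.g. truncation_rate(train["wordcount"], 49) is about 0.0086 on training data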


Design Option Two If N=8, each row has 8 words, and each word occupies
(224/8)*(224/8)=28*28 pixels. If we set the cut length to 40, we will have 5
rows for the full text, and the other 3 rows will not be used for text;
instead, the 224*(3*28) pixel area is used for the tabular data given in the
attributes other than “full text”. For records with more than 40 words, the
full text is truncated at the 40th word; only 2.03% of the training data and
4.14% of the testing data have to cut the sentence at 40 words. We have the
option to use the bottom part of the image to embed the other attributes. The
id and sentenceid should be unrelated to the prediction, so these two
attributes are not included. One example having the full text, author,
wordcount, created_utc, subreddit, score, nchar, and label is given in
Figure 2b.
    However, the 10-fold training accuracy with this design is not good. This
is partially because some of the attributes do not contribute to prediction but
instead add more noise. For example, the created time may not be very related
to the prediction tasks, yet it occupies a good portion of the embedding area
of the image. In addition, since most wordcounts are below twenty, the
two-dimensional embeddings of the full text would have better resolution if the
cut length were smaller than 40: the font size would be larger and easier for
the CNN to learn.


Design Option Three This design setting sets the cut length of the full text
sentence to 42 and leaves the space of the last row for some important
attributes: subreddit, wordcount, score, and label. An example of this design
setting is in Figure 2c.


Design Option Four This is data augmentation for Design Option Three. For a
small data set, we need more data with the same semantic meaning, generated
from the raw labeled data without adding any noise. For Super Characters, the
text is projected onto the image. Adding some spaces at the front should not
change the semantic meaning, while at the same time it increases the number of
generated Super Characters images. For each sentence whose length is less than
42, we add one space at the front and then generate the Super Characters image.
This process iterates until the length of the sentence with the added spaces
reaches 42. An example of this design setting is in Figure 2d.
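    A minimal sketch of this augmentation (assuming a sentence is given as a
list of words, and each added leading blank counts as one position toward the
cut length of 42):

    MAX_LEN = 42  # 7 words per row, 6 rows for text

    def augment_by_front_padding(words):
        """Return padded variants: one more leading blank per variant,
        until the padded length reaches MAX_LEN."""
        variants = []
        n_pad = 0
        while len(words) + n_pad <= MAX_LEN:
            variants.append([""] * n_pad + list(words))
            n_pad += 1
        return variants

    # A 40-word sentence yields 3 images: 0, 1, or 2 leading blanks.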

3.3   Experimental Results
After comparison, only Design Option One and Design Option Four are kept for
the entire 10-fold training and validation.
    For the system runs, submissions are limited to a maximum of 10 runs. So
only the first five 10-fold models of both Design Option One and Design Option
Four are tested against the 5,000 testing samples and submitted. The details of
these 10 system runs are given in Tables 1–6.


                System Runs        Accuracy Precision Recall F1
           Design Option One fold0 68.98%    33.33% 1.29% 2.48%
           Design Option One fold1 69.21%    33.33% 0.51% 1.01%
           Design Option One fold2 69.21%    33.33% 0.51% 1.01%
           Design Option One fold3 69.21%    33.33% 0.51% 1.01%
           Design Option One fold4 69.21%    33.33% 0.51% 1.01%
           Design Option Four fold0 68.98%   44.19% 4.88% 8.80%
           Design Option Four fold1 64.65%   42.99% 47.30% 45.04%
           Design Option Four fold2 70.94%   55.38% 26.48% 35.83%
           Design Option Four fold3 70.08%   51.66% 35.99% 42.42%
           Design Option Four fold4 71.34%   59.40% 20.31% 30.27%

Table 1. System Run details on 10-fold validation for the task of Emotion_disclosure.




                System Runs        Accuracy Precision Recall F1
           Design Option One fold0 65.93%    59.21% 33.88% 43.10%
           Design Option One fold1 66.14%    61.84% 29.13% 39.61%
           Design Option One fold2 65.25%    54.65% 51.14% 52.83%
           Design Option One fold3 63.20%    51.13% 74.95% 60.79%
           Design Option One fold4 65.48%    53.63% 68.74% 60.25%
           Design Option Four fold0 67.90%   63.38% 37.19% 46.88%
           Design Option Four fold1 67.95%   64.31% 35.74% 45.95%
           Design Option Four fold2 65.80%   66.44% 20.50% 31.33%
           Design Option Four fold3 66.19%   54.61% 66.25% 59.87%
           Design Option Four fold4 66.75%   55.64% 62.32% 58.79%

Table 2. System Run details on 10-fold validation for the task of
Information_disclosure.



   In general, Design Option Four is a little better than Design Option One,
but these results are still not good; they are only a little better than
constantly predicting one class. The results on this OffMyChest data are not as
good as on the AffCon19 CL-AFF shared task, and compared with Super Characters
on the Wikipedia data set, the accuracy on this data is lower as well.

                System Runs        Accuracy Precision Recall F1
           Design Option One fold0 77.95%    64.18% 27.04% 38.05%
           Design Option One fold1 78.82%    61.83% 40.25% 48.76%
           Design Option One fold2 78.19%    58.10% 46.23% 51.49%
           Design Option One fold3 76.06%    51.62% 70.13% 59.47%
           Design Option One fold4 75.02%    50.12% 63.21% 55.91%
           Design Option Four fold0 78.43%   59.57% 43.08% 50.0%
           Design Option Four fold1 79.53%   62.95% 44.34% 52.03%
           Design Option Four fold2 79.13%   70.23% 28.93% 40.98%
           Design Option Four fold3 78.58%   81.94% 18.55% 30.26%
           Design Option Four fold4 78.41%   56.79% 57.86% 57.32%

    Table 3. System Run details on 10-fold validation for the task of Support.


                System Runs        Accuracy Precision Recall F1
           Design Option One fold0 73.35%    73.33% 22.22% 34.11%
           Design Option One fold1 72.33%    57.97% 40.40% 47.62%
           Design Option One fold2 72.64%    59.68% 37.37% 45.96%
           Design Option One fold3 72.64%    75.0% 18.18% 29.27%
           Design Option One fold4 73.27%    62.50% 35.35% 45.16%
           Design Option Four fold0 72.10%   61.90% 26.26% 36.88%
           Design Option Four fold1 72.64%   66.67% 24.24% 35.56%
           Design Option Four fold2 72.33%   62.79% 27.27% 38.03%
           Design Option Four fold3 75.47%   62.65% 52.53% 57.14%
           Design Option Four fold4 71.38%   56.45% 35.35% 43.48%

Table 4. System Run details on 10-fold validation for the task of Emotion_support.


                System Runs        Accuracy Precision Recall F1
           Design Option One fold0 66.14%    55.41% 66.13% 60.29%
           Design Option One fold1 68.03%    61.96% 45.97% 52.78%
           Design Option One fold2 68.03%    57.43% 68.55% 62.5%
           Design Option One fold3 67.92%    72.34% 27.64% 40.0%
           Design Option One fold4 66.67%    66.67% 27.64% 39.08%
           Design Option Four fold0 71.16%   65.69% 54.03% 59.29%
           Design Option Four fold1 71.16%   65.69% 54.03% 59.29%
           Design Option Four fold2 68.65%   58.82% 64.52% 61.54%
           Design Option Four fold3 68.24%   66.67% 35.77% 46.56%
           Design Option Four fold4 69.18%   69.84% 35.77% 47.31%

Table 5. System Run details on 10-fold validation for the task of Information_support.




    Several methods could be used to further improve the accuracy. First, a
pretrained model may help, since for this shared task the number of training
examples is relatively small for learning the complex definitions of these 6
tasks.

                System Runs        Accuracy Precision Recall F1
           Design Option One fold0 78.93%     0.0%     0.0% 0.0%
           Design Option One fold1 79.25%     0.0%     0.0% 0.0%
           Design Option One fold2 73.27%    19.35% 9.09% 12.37%
           Design Option One fold3 77.67%    36.84% 10.61% 16.47%
           Design Option One fold4 79.56%    66.67% 3.03% 5.80%
           Design Option Four fold0 76.73%   16.67% 3.03% 5.13%
           Design Option Four fold1 79.25%   50.0%    1.52% 2.94%
           Design Option Four fold2 76.42%   36.36% 18.18% 24.24%
           Design Option Four fold3 80.82%   72.73% 12.12% 20.78%
           Design Option Four fold4 77.99%   25.0%    3.03% 5.41%

Table 6. System Run details on 10-fold validation for the task of General_support.




Second, other data augmentation methods could be introduced to further boost
the accuracy, for example, replacing words with their synonyms. Third, the data
set is skewed; we can balance it by upsampling, as sketched below.
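    A sketch of the third point, using scikit-learn's resample (the helper
below is an illustration under these assumptions, not the procedure used in
our runs):

    import pandas as pd
    from sklearn.utils import resample

    def upsample_minority(df, label_col):
        """Balance a binary-labeled frame by upsampling the minority class."""
        counts = df[label_col].value_counts()
        minority, majority = counts.idxmin(), counts.idxmax()
        up = resample(df[df[label_col] == minority], replace=True,
                      n_samples=int(counts[majority]), random_state=0)
        return pd.concat([df[df[label_col] == majority], up])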


4    Conclusion

In this paper, we proposed a modified version of Super Characters in order to
make it work on multi-modal data. In the case of this AffCon CL-AFF shared
task, the multi-modal data includes text data and tabular data. In addition, we
deploy the models on low-power CNN chips, which demonstrates the feasibility of
applying DNN models with consideration of real-world practical concerns such as
power and speed. The Super Characters method is relatively new and is starting
to attract attention in application scenarios. Pretrained models on a large
corpus would be very helpful for the Super Characters method, as the success of
pretraining has been observed for NLP models like ELMo and BERT. For
fine-tuning on small datasets, data augmentation should further boost the
generalization capability.


References

1. Jaidka, K.; Singh, I.; Lu, J.; Chhaya N.; and Ungar, L. 2020. A report of the CL-Aff
  OffMyChest Shared Task at Affective Content Workshop @ AAAI. In Proceedings of
  the 3rd Workshop on Affective Content Analysis @ AAAI (AffCon2020). New York,
  New York. February.
2. Asai, A.; Evensen, S.; Golshan, B.; Halevy, A.; Li, V.; Lopatenko, A.; Stepanov, D.;
  Suhara, Y.; Tan, W.-C.; and Xu, Y. 2018. Happydb: A corpus of 100,000 crowd-
  sourced happy moments. In Proceedings of LREC 2018. Miyazaki, Japan: European
  Language Resources Association (ELRA).
3. Sun, B.; Yang, L.; Dong, P.; Zhang, W.; Dong, J.; and Young, C. 2018. Super char-
  acters: A conversion from sentiment classification to image classification. Proceedings
  of EMNLP2018 workshop WASSA2018.

4. Sun, B.; Yang, L.; Dong, P.; Zhang, W.; Dong, J.; and Young, C. 2018. Ultra Power-
  Efficient CNN Domain Specific Accelerator with 9.3 TOPS/Watt for Mobile and
  Embedded Applications. Proceedings of the IEEE Conference on Computer Vision
  and Pattern Recognition Workshops, pages 1677–1685.
5. Sun, B.; Liu, D.; Yu, L.; Li, J.; Liu, H.; Zhang, W. and Torng, T. 2018. MRAM Co-
  designed Processing-in-Memory CNN Accelerator for Mobile and IoT Applications.
  NeurIPS 2018 Workshop MLPCD, 2018.
6. Sun, B.; Yang, L.; Chi, C.; Zhang, W.; Lin, M. 2019. Squared English Word:
  A Method of Generating Glyph to Use Super Characters for Sentiment Analysis.
  Proceedings of AAAI 2019 AffCon Workshop.
7. Sun, B.; Yang, L.; Zhang, W.; Lin, M.; Dong, P.; Young, C. and Dong, J. 2019.
  SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured
  Tabular Data. Proceedings of the IEEE Conference on Computer Vision and Pattern
  Recognition 2019 Workshop Precognition.
8. Sun, B.; Yang, L.; Lin, M.; Young, C.; Dong, J.; Zhang, W. and Dong, P. 2019.
  SuperChat: Dialogue Generation by Transfer Learning from Vision to Language using
  Two-dimensional Word Embedding and Pretrained ImageNet CNN Models. CVPR
  2019 Workshop Language and Vision.
9. Sun, B.; Yang, L.; Lin, M.; Young, C.; Dong, P.; Zhang, W. and Dong, J. 2019. Su-
  perCaptioning: Image Captioning Using Two-dimensional Word Embedding. CVPR
  2019 Workshop VQA.
10. Sun, B.; Yang, L.; Lin, M.; Young, C.; Dong, P.; Zhang, W. and Dong, J. 2019.
  System Demo for Transfer Learning across Vision and Text using Domain Specific
  CNN Accelerator for On-Device NLP Applications. IJCAI 2019 Workshop Bringing
  Semantic Knowledge into Vision and Text Understanding (Tusion).