Detecting Location-Indicating Phrases in User Utterances for Chat-Oriented Dialogue Systems

Hiromi Narimatsu, Hiroaki Sugiyama, Masahiro Mizukami
NTT Communication Science Laboratories
{narimatsu.hiromi, sugiyama.hiroaki, mizukami.masahiro}@lab.ntt.co.jp

Abstract

This paper establishes a method that detects words or phrases indicating locations in Japanese spoken language for a chat-oriented dialogue system. Although conventional methods for detecting such words or phrases focus on named entities (NEs), humans frequently use non-NE words to signify locations. For example, we can say "I went to that famous tower in Paris" instead of "I went to the Eiffel Tower" if we forget its proper name. Since conventional NE recognizers extract only Paris as a location from this utterance, they cannot understand it correctly, because the phrase "that famous tower in Paris" denotes the location here. Such insufficient understanding may lead a system to ask "Where did you go in Paris?" next, which easily results in a dialogue breakdown. To correctly understand location phrases, we focus on a conditional random field (CRF)-based model, a representative method for NE extraction. Since there is no chat corpus in which such location-indicating phrases are annotated, we first created a corpus by annotating location-indicating phrases in actual human-human chat-oriented dialogues. Then, we evaluated with this corpus how well the model works. The evaluation shows that human utterances include various location phrases other than NEs, and that a CRF-based model trained on the newly annotated corpus detects the target phrases with high accuracy.

User  : I played tennis at a park.
System: (understands only play tennis.)
System: Oh, you played tennis. Where did you play?
User  : Hmm, I played at a park close to my home...

Figure 1: Example dialogue in which the system failed to detect a location phrase, causing a dialogue breakdown.

User  : I went to the capital of France last week.
System: (understands only France as the location.)
System: Oh great, where did you go in France?
User  : Hmm, I visited the capital...

Figure 2: Example dialogue in which the system misunderstood a location phrase, causing a dialogue breakdown.
1 Introduction

Recently, chat-oriented dialogue systems have been attracting attention for their social and entertainment aspects [Bickmore and Picard, 2005; Ritter et al., 2011; Higashinaka et al., 2014; Otsuka et al., 2017]. In a chatting situation, whether the system precisely understands users' utterances is a significant problem. Although such systems need to grasp the meaning of words or phrases in utterances [Higashinaka et al., 2015], this is difficult because the domain of a chat is not limited.

In this study, we focus on the understanding of location phrases. Locations are frequently used as the background of a dialogue, which should be shared between talkers. In addition, location phrases are important in slot filling-based conversational agents [Han et al., 2013]. An example is a system that fills 5W1H (who, what, when, where, why, how) slots through conversation; the target words or phrases are extracted from user utterances. Since the targets of the when and where slots tend to appear at the beginning of a dialogue, the system needs to detect whether they are included in an utterance.

For the purpose of detecting locations in sentences and documents, previous work has adopted named entity (NE) recognition. However, humans often use and understand location words or phrases other than NEs in chatting situations. We describe two such cases using Figure 1 and Figure 2.

The first case is that humans use and understand a common word as a location. In the example shown in Figure 1, a park represents a location, but it is not a named entity. If the system takes a 5W1H information extraction strategy, it is important to detect it as a location. However, NE recognizers usually fail to detect it as a location, which leads to a dialogue breakdown.

The second case is that humans use various phrases to express a location. For instance, the two utterances "I went to Paris" and "I went to the capital of France" have identical meanings. However, while conventional NE recognizers correctly extract Paris as the location in the first utterance, they extract only France as the location in the second, even though the whole phrase "the capital of France" is the correct location phrase. Such insufficient detection also results in a dialogue breakdown, as shown in Figure 2.

The simplest way to detect these phrases as locations is to build a location phrase list as a dictionary and match target phrases against it, but this can cause misdetections such as park in "Can I park my car?" in the first case. Moreover, location phrases include not only words but also phrases such as "the capital of France" and "the electronics shop near XX station," as shown in the second case. Therefore, simply adding these location phrases to a list is not effective.

To overcome these difficulties, we conduct this research as follows. First, we newly annotate location-indicating phrases in human-human chat-oriented dialogues, because no such corpus is available. Then, we evaluate location phrase detection accuracy using the chat corpus. We focus on a CRF-based model, a representative method for NE extraction, and compare three models: one trained only on NEs, one trained on the chat corpus, and one trained on both. The evaluation results show that humans express locations with various phrases other than NEs, and that training a CRF-based model on the chat corpus is effective for detecting them.

2 Related Work

For the purpose of grasping the meaning of words or phrases, there are two types of related work.

The first type is the named entity task initiated by the Defense Advanced Research Projects Agency (DARPA) [DARPA, 1995] at the Sixth Message Understanding Conference (MUC-6). It identified seven types of NEs: person, organization, location, and numeric expressions such as date, time, and money. Sekine et al. proposed an extended named entity hierarchy [Sekine et al., 2002]. There are many NE recognition approaches [Sekine et al., 1998], and the scheme using conditional random fields (CRFs) [Lafferty et al., 2001] has been the primary one [Nadeau and Sekine, 2007]. The characteristic of a CRF is that it estimates the sequence probability while modeling the relations between the n-th word and its prior and posterior words and their features, i.e., part-of-speech (POS) tags and character types. Approaches using bi-directional LSTMs and RNNs have also been proposed for this task [Chiu and Nichols, 2015; Lample et al., 2016; Wang et al., 2017]. They obtain higher performance than CRF-based methods, but they need a certain amount of training data to produce stable results. Although these approaches detect NEs with high accuracy, the NE locations they target are different from location phrases in chats.

The second type is information extraction for task-oriented dialogue systems [Lee et al., 2010; Eric et al., 2017; Bordes et al., 2017]. Basically, this is a slot-filling task, which assumes that the target words or phrases that fill the slots are predefined. For example, in a restaurant reservation task, slots are prepared for the date, the location, and the number of people, and they are filled through a dialogue by checking words in the user's utterances against a predefined list of words and phrases. Although this approach is effective when the word and phrase list can be prepared in advance, it is unsuitable for chatting situations, in which the target words or phrases cannot be predefined.
3 Location Phrase Dataset

To examine what kinds of words or phrases other than NEs are used as locations, we analyze human utterances in chats. Since no chat data with location phrase annotations are available, we create a corpus by annotating location words or phrases in human-human chat-oriented dialogues.

3.1 Location phrase annotation

We use chat dialogues collected through human-human text-based chats and annotate location words or phrases in them. The dialogue data were collected in a previous study [Meguro et al., 2009], and the dialogues were conducted without limiting the topic or contents. We use 600 dialogues and 24,888 utterances in the dataset. Each dialogue consists of about 40 utterances.

We extract location-indicating phrases by manual annotation. To define the annotation instructions, we examined 10 chat dialogues containing about 400 utterances and extracted the characteristics of location phrases. These are example location phrases:

Example 1
  I went to the capital of France yesterday.
  I ate at a ramen shop near my office.

In these examples, the underlined phrases the capital of France and a ramen shop near my office are the target locations of the utterances. Although France and ramen shop are also location words or phrases, they are only parts of the target locations. Therefore, we assume that the whole phrase that indicates a location is extracted as a single location.

We then determined the instructions as follows:

1. Annotate a sequence of words (including modifiers) as a single location, such as the capital of France instead of France.
2. Annotate words or phrases that can identify a location, such as the area around the tower and the place where I ate ramen.
3. Regard words or phrases that evoke "location" even only slightly as annotation targets. (This definition helps to avoid overlooking any words.)
4. Clarify the ambiguity of each annotation by attaching one of the labels shown in Table 1. (This helps to remove superfluous annotations that may be produced by the third instruction.)

We assumed that most location phrases can be intuitively understood as locations, but a human may be unable to decide whether a phrase is a location, or where the phrase should be segmented. Therefore, we defined the ambiguity labels shown in Table 1. These labels help to measure system performance precisely by removing phrases that humans cannot easily judge.

Table 1: Ambiguity labels.

Label  Criteria
L1     The words/phrase was annotated without any hesitation.
L2     The words/phrase was annotated without certainty about its segmentation.
L3     The words/phrase was annotated, but the annotator had no confidence that it is a location.
L4     Both L2 and L3 apply.

To decide the number of annotators, we first verified the annotation agreement using the first 30 dialogues. We employed two annotators and gave them the above instructions and the entire sequential dialogues. Table 2 shows the annotation agreement results. We calculated the agreement score v by

  v = (number of phrases detected by both annotators) / (number of phrases detected by the reference annotator).

The score using all the detected phrases is shown as all, and that using only label L1 is shown as L1. The agreement scores using the L1 data exceeded 0.89 in both directions. Since this score is high enough to rely on the data of a single annotator, one annotator worked on the remaining 570 dialogues in accordance with the above instructions.

Table 2: Annotation agreement.

Reference    Detector     v (all)  v (L1)
Annotator 1  Annotator 2  0.87     0.89
Annotator 2  Annotator 1  0.83     1.0
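As a concrete illustration of the agreement score v defined above, the following is a minimal Python sketch. It assumes that each annotator's output is represented as a set of phrase spans; the span representation and the function name are ours, not the paper's.

```python
# Minimal sketch of the agreement score v described in Section 3.1.
# Assumption: each annotator's annotations are a set of location-phrase spans,
# represented here as (dialogue_id, start_char, end_char) tuples.

def agreement_score(reference_spans, detector_spans):
    """v = |phrases detected by both| / |phrases detected by the reference annotator|."""
    reference = set(reference_spans)
    detector = set(detector_spans)
    if not reference:
        return 0.0
    return len(reference & detector) / len(reference)

# Toy example: two annotators over the same dialogues.
annotator1 = {("d01", 10, 28), ("d01", 55, 70), ("d02", 3, 12)}
annotator2 = {("d01", 10, 28), ("d02", 3, 12)}

print(agreement_score(annotator1, annotator2))  # annotator 1 as reference
print(agreement_score(annotator2, annotator1))  # annotator 2 as reference
```

Computing the score in both directions, as in Table 2, shows how much of each annotator's phrase set is covered by the other.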
3.2 Dataset analysis

We analyzed the annotated data by counting the ambiguity labels. The total number of location words or phrases annotated in this work is 4,202. Table 3 shows the number and ratio of the ambiguity labels attached to these phrases. The L1 row shows that about 70% of the location phrases were annotated without any ambiguity, and the L2 row shows that about 25% were annotated with segmentation ambiguity. The other labels are much rarer than L1 and L2.

Table 3: Number of phrases with each ambiguity label.

Label    L1      L2      L3      L4      all
Number   2914    1025    216     47      4202
(Ratio)  (0.69)  (0.24)  (0.05)  (0.01)

We then analyzed the characteristics of the sentences with each ambiguity label by examining representative examples. Figures 3, 4, and 5 show three example sentences assigned to labels L1, L2, and L3, respectively. For label L1, where a human understands the words or phrases as locations without any ambiguity, there are many location phrases other than NEs, such as general nouns and phrases containing modifiers. For label L2, where a human annotated the words with uncertainty about the segmentation point, there are words that make the location vague, such as around and about. For label L3, where a human annotated the words with little confidence, there are words for which a unique location is difficult to identify, and words contained in phrases that represent entities other than locations.

1 [JP] 電車の中で隣の人とおしゃべりしました。
  [EN] I talked with the person next to me in the train.
2 [JP] 暇なときはよく電気屋にいきます。
  [EN] I often go to electronics shops in my free time.
3 [JP] 水が美味しいところに行きたい。
  [EN] I want to go to a place where the water is delicious.

Figure 3: Representative examples assigned to L1.

1 [JP] 国内を三地域ほど旅をしました。
  [EN] I travelled to about three areas in Japan.
2 [JP] X というお店にいきました。
  [EN] I went to the shop named X.
3 [JP] 京都の辺りは暖かいです。
  [EN] The area around Kyoto is warm.

Figure 4: Representative examples assigned to L2.

1 [JP] 私は実家暮らしです。
  [EN] I am living at home.
2 [JP] ファミレスよりファーストフードにいきます。
  [EN] I go to fast food restaurants more often than to family restaurants.
3 [JP] イタリア料理を良く作ります。
  [EN] I often make Italian food.

Figure 5: Representative examples assigned to L3.

From these results, we focus on detecting the location phrases assigned L1, because the difference between understanding only Kyoto as the location and understanding area around Kyoto as the location phrase is not large. In addition, the location phrases assigned L3 differ from the others because they are parts of phrases denoting other entities; since such phrases are understood as those other entities, we assume it is not necessary to detect them as locations. Furthermore, the phrases assigned L3 include phrases that cannot identify a specific location, such as fast food restaurants, and humans do not always understand them as locations. Therefore, we use the location phrases assigned L1 as the evaluation target.
4 Location Phrase Detection Using the Annotated Dataset

To detect target location phrases other than NEs, we develop a new model using the dataset newly annotated in Section 3. We use a CRF [Lafferty et al., 2001] to detect location phrases by training on word sequences with their features and tags. We take the CRF-based approach because its performance is stable and it works with less data than neural network-based methods.

We use grammatical and superficial features: the original words, the POS tag estimated for each word a priori, and five character types: hiragana, katakana, kanji, mark, and tag. Table 4 shows the features for the example input sentence "[JP] 昨日、エッフェル塔に登ったよ。([EN] I went to the Eiffel Tower yesterday.)", where the underlined words represent the location. First, the sentence is split into words using a Japanese morphological analyzer, JTAG [Fuchi and Takagi, 1998], and POS tags are estimated simultaneously. Char type represents the character type, which is determined from the Unicode symbols. The LOC-tags are labeled using BIO tags: B-LOC is attached to the first word of a location phrase, I-LOC to its subsequent words, and O to all other words that are not part of a location phrase. The BIO tags are the estimation targets. Here, the i-th word is represented as x_i. To train and estimate the tag of the i-th word x_i, we use the features of x_{i-2}, ..., x_{i+2}.

Table 4: Features and the LOC-tag (the estimation target) for the input sentence [JP] 昨日、エッフェル塔に登ったよ。/ [EN] I went to the Eiffel Tower yesterday. The underlined words represent the location.

Word                  Char type  POS   LOC-tag
(sentence-start tag)  tag        bos   O
昨日 (yesterday)       kanji      noun  O
、 (,)                 mark       noun  O
エッフェル (Eiffel)     katakana   noun  B-LOC
塔 (Tower)             kanji      noun  I-LOC
に (to)                hiragana   pp    O
登っ (go)              kanji      verb  O
た (-ed (past))        hiragana   verb  O
よ (expression)        hiragana   sep   O
。 (.)                 mark       sep   O
(sentence-end tag)    tag        eos   O
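To make the feature design above concrete, the following sketch builds the window features of x_{i-2}, ..., x_{i+2} (word, character type, POS) with BIO location tags and fits a CRF. It is a minimal sketch under stated assumptions rather than the authors' implementation: the paper does not name a CRF toolkit, so the third-party sklearn-crfsuite package stands in here; the character-type function is a rough Unicode-range approximation; and the tokens and POS tags are written by hand instead of being produced by JTAG.

```python
# Sketch of the CRF features in Section 4: word, POS, and character type
# in a +/-2 window around each token, with BIO location tags as targets.
import sklearn_crfsuite  # third-party stand-in for the unspecified CRF toolkit

def char_type(word):
    """Rough character-type feature based on Unicode ranges (illustrative)."""
    c = word[0]
    if "\u3040" <= c <= "\u309f":
        return "hiragana"
    if "\u30a0" <= c <= "\u30ff":
        return "katakana"
    if "\u4e00" <= c <= "\u9fff":
        return "kanji"
    return "mark"

def token_features(tokens, pos_tags, i):
    """Features of x_{i-2}, ..., x_{i+2} for the i-th token."""
    feats = {}
    for offset in range(-2, 3):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset}]"] = tokens[j]
            feats[f"pos[{offset}]"] = pos_tags[j]
            feats[f"ctype[{offset}]"] = char_type(tokens[j])
        else:
            feats[f"word[{offset}]"] = "BOS" if j < 0 else "EOS"
    return feats

def sentence_features(tokens, pos_tags):
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]

# Toy sentence corresponding to Table 4 (tokens and POS tags are hand-written
# here; the paper obtains them from the JTAG morphological analyzer).
tokens = ["昨日", "、", "エッフェル", "塔", "に", "登っ", "た", "よ", "。"]
pos    = ["noun", "mark", "noun", "noun", "pp", "verb", "verb", "sep", "sep"]
tags   = ["O", "O", "B-LOC", "I-LOC", "O", "O", "O", "O", "O"]

X_train = [sentence_features(tokens, pos)]
y_train = [tags]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train)[0])
```

In the paper's setting, X_train and y_train would be built from the annotated utterances, and the NE, Dial, and NE+Dial models described next would differ only in which tagged data the CRF is fit on.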
5 Evaluation

We evaluated the performance of location phrase detection using the new model described in Section 4, comparing it with a conventional model trained only on NEs.

5.1 Experimental setup

We compared the following three models:

NE       A CRF trained on the NE location tags annotated in 1995 Mainichi newspaper articles.
Dial     A CRF trained on the location tags newly annotated in our text dialogue data.
NE+Dial  A CRF trained on both the NE and Dial datasets.

For the NE data, we used only the B-LOC, I-LOC, and O location tags instead of all NE tags in this experiment. All 24,888 annotated utterances were used as test data for the evaluation. For the Dial evaluation, we calculated the scores by 5-fold cross-validation. For the NE+Dial evaluation, we combined both of the above datasets and trained the CRF on them. We evaluated the detection performance using precision, recall, and f-measure, the harmonic mean of precision and recall. If a detected phrase only partially matched the annotated one, it was counted as incorrect, because extracting a partially matched phrase such as Paris in "that famous tower in Paris" easily leads to a dialogue breakdown.
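The exact-match scoring described above can be read as phrase-level precision and recall over spans decoded from the BIO tags. The sketch below is one way to implement it (the helper names are ours): a span that only partially overlaps the annotated one, such as Paris inside "that famous tower in Paris", contributes nothing to the correct count.

```python
# Sketch of phrase-level precision/recall/f-measure with exact span matching,
# as used in Section 5.1: partially matched phrases count as incorrect.
def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end) location spans."""
    spans, start = set(), None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a span at the end
        if tag == "B-LOC":
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag != "I-LOC" and start is not None:
            spans.add((start, i))
            start = None
    return spans

def phrase_prf(gold_tags, pred_tags):
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# "I went to that famous tower in Paris": detecting only "Paris" scores zero.
gold = ["O", "O", "O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "I-LOC"]
pred = ["O", "O", "O", "O", "O", "O", "O", "B-LOC"]
print(phrase_prf(gold, pred))  # (0.0, 0.0, 0.0)
```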
5.2 Results

Table 5 shows the results. The all scores represent the detection performance over the annotated location phrases in all utterances, and the L1 scores represent the performance over only the phrases annotated with the L1 ambiguity label. The recall scores for all labels indicate that only 22% of the location phrases in human-human chat dialogues are NEs, and that Dial can detect non-NE location phrases when trained on a suitable dataset. The precision scores show that the correctness of the phrases detected by Dial improves by 0.33 points over NE, and the overall f-measure therefore improves by 0.47 points. The L1 results clearly indicate that humans use various phrases other than NEs as locations in chatting situations. Finally, the combined model NE+Dial reached an f-measure of 0.80 for all and 0.87 for L1.

Table 5: Location phrase detection performance.

Label  Model    Precision  Recall  f-Measure
all    NE       0.58       0.22    0.32
       Dial     0.91       0.70    0.79
       NE+Dial  0.87       0.74    0.80
L1     NE       0.66       0.03    0.07
       Dial     0.89       0.67    0.76
       NE+Dial  0.91       0.84    0.87

To demonstrate the effectiveness of training on the newly annotated data, we analyzed the detected location phrases and compared the results of the two models NE and Dial. Figure 6 shows example utterances in which Dial successfully detected location phrases that NE missed; the underlined words or phrases represent the location phrases. Although humans understand sea, mountain, and home as locations, these terms are not detected by NE because they are not location NEs. However, they were correctly detected as locations by training on the chat corpus annotated in this study.

1 [JP] 海も山もあるのでいろいろできました。
  [EN] I could do many things because there are both the sea and mountains.
2 [JP] 私は実家暮らしです。
  [EN] I am living at home.
3 [JP] 私の近所の図書館にも子供がたくさんいます。
  [EN] There are many children at the library in my neighborhood.

Figure 6: Examples of location phrases that Dial successfully detected. Dial detected the underlined words and phrases as locations, but NE did not detect any locations.

Figure 7 shows example utterances in which NE successfully detected location phrases that Dial missed; the underlining is used as in Figure 6. The words Florence, Palma, and Bologna are named locations. Famous place names are of course included in the NE data, whereas Dial contains only the famous place names that happen to appear in the annotated dialogue data. Therefore, combining the training data of NE and Dial effectively improves the detection performance. However, some less famous named locations cannot be detected by either NE or Dial, so adding more named locations may be necessary when further accuracy is required.

1 [JP] フィレンツェのステーキはオススメです。
  [EN] The steak in Florence is my recommendation.
2 [JP] パルマやボローニャは本当においしいものがたくさんある。
  [EN] There are many delicious foods in Palma and Bologna.

Figure 7: Examples of location phrases that NE successfully detected. NE detected the underlined words and phrases, but Dial did not detect any locations.

In summary, Dial extracts location words and phrases that are not named entities, including phrases such as the library in my neighborhood, by training on the features of words and word sequences. Since the phrases detected by NE and Dial differ from each other, the combined model NE+Dial is effective for detecting both. The results also show that a CRF trained on NE data together with even a small dialogue dataset is effective for detecting location phrases in chat-oriented dialogues.

6 Conclusion

We addressed the importance of understanding location phrases in chatting situations. To verify the performance of the conventional CRF-based NE extraction model on phrases that indicate locations in chats, we created a new corpus of annotated location phrases in textualized human-human chat-oriented dialogues. Our evaluation using the corpus shows that a conventional NE recognizer is insufficient for understanding location phrases in chatting situations, but that the conventional CRF method becomes effective for detecting location-indicating phrases in chats when trained on the target words and phrases newly annotated in this study.

In future work, we will further annotate the essential location phrases within the phrases assigned the L2, L3, and L4 ambiguity labels, and evaluate the performance in more detail. We will then implement the detection function in a 5W1H-based chat-oriented dialogue system and evaluate its effectiveness. Some dialogue examples using this location phrase detection are described in Appendix A. Finally, we will extend this work to the other 5W1H targets besides locations.

A Appendix

We show some dialogue examples using location phrase detection. In the case of Figure 8, the system conducts a dialogue by choosing one sentence from many options. Although the similarity scores between the user utterance and the option sentences are high, the system can filter out the options whose locations differ from the user's.

In the case of Figure 9, the system correctly understands mountain as a location and asks "which mountain" to identify the location in detail. In fact, the phrase the mountain near Mt. Fuji can easily make a system misunderstand Mt. Fuji alone as the location. Showing its correct understanding to users may therefore make the system look smarter.

In the case of Figure 10, the system rephrases the location phrase as the correct NE. Detecting a location phrase that is not an NE can thus be used to identify the location and rephrase it, as a smart agent would. Such rephrasing may make users feel the intelligence of the system.
System: お昼はどこで食べましたか? (Where did you eat lunch today?)
User  : 東京行きの新幹線で食べました。(I ate lunch on the bullet train bound for Tokyo.)
System: Thinking as follows... These options are similar to the user's utterance, but they should be filtered out because the location is not the bullet train.
        Option 1: Lunch in Tokyo is great!
        Option 2: There are many restaurants in Tokyo station.
System: 新幹線、ってことは駅弁ですか? (The bullet train! So you mean you ate an ekiben (box lunch)?)
User  : そうそう。京都駅で駅弁買ったんだよ。(That's right! I bought the box lunch at Kyoto station.)

Figure 8: Example dialogue using the location phrase detection.

System: 夏休みにどこへ行きましたか。(Where did you go last summer?)
User  : 山に行きました。(I went to a mountain.)
System: Thinking as follows... Understands a mountain as a location, but it is not a named entity.
System: 山行ったんですね。どちらの山に行かれたんですか? (Oh, sounds good! Which mountain did you go to?)
User  : 富士山の近くの山に登りました。(I climbed a mountain near Mt. Fuji.)
System: Thinking as follows... Understands the mountain near Mt. Fuji as the location.
System: 富士山じゃないんですね。(Oh, so it's not Mt. Fuji.)

Figure 9: Example dialogue using the location phrase detection.

User  : 先週末、パリにあるタワーに登りました。(I went to the tower in Paris last weekend.)
System: Thinking as follows... Understands the tower in Paris as the location and identifies it as the Eiffel Tower.
System: あっ、エッフェル塔ですね。どうでしたか。(Oh, that's the Eiffel Tower, isn't it? How was it?)
User  : そうそう。すごく眺めが良かった。(Yes! The view was great!)

Figure 10: Example dialogue using the location phrase detection.
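As a toy illustration of the option filtering described for Figure 8, the sketch below discards candidate responses whose detected location differs from the one detected in the user utterance. The detector is stubbed out here; in the paper's setting it would be the CRF-based NE+Dial model, and all function and variable names are illustrative.

```python
# Toy sketch of filtering response options by location, as in Figure 8:
# options whose detected location differs from the user's are discarded.
def filter_options_by_location(user_location, options, detect_location):
    """Keep options whose detected location is absent or matches the user's."""
    kept = []
    for option in options:
        loc = detect_location(option)
        if loc is None or loc == user_location:
            kept.append(option)
    return kept

def toy_detector(text):
    """Stub standing in for the CRF-based location phrase detector."""
    for phrase in ("Tokyo station", "Tokyo", "the bullet train"):
        if phrase in text:
            return phrase
    return None

# Location detected from "I ate lunch on the bullet train bound for Tokyo."
user_location = "the bullet train"
options = [
    "Lunch in Tokyo is great!",
    "There are many restaurants in Tokyo station.",
    "So you mean you ate an ekiben (box lunch)?",
]
print(filter_options_by_location(user_location, options, toy_detector))
# -> ['So you mean you ate an ekiben (box lunch)?']
```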
References

[Bickmore and Picard, 2005] Timothy W. Bickmore and Rosalind W. Picard. Establishing and maintaining long-term human-computer relationships. ACM Transactions on Computer-Human Interaction (TOCHI), 12(2):293–327, 2005.

[Bordes et al., 2017] Antoine Bordes, Y-Lan Boureau, and Jason Weston. Learning end-to-end goal-oriented dialog. In Proc. of the 5th International Conference on Learning Representations (ICLR), 2017.

[Chiu and Nichols, 2015] Jason P.C. Chiu and Eric Nichols. Named entity recognition with bidirectional LSTM-CNNs. arXiv preprint arXiv:1511.08308, 2015.

[DARPA, 1995] DARPA. Proc. of the Sixth Message Understanding Conference (MUC-6). Morgan Kaufmann Publishers, Columbia, MD, USA, 1995.

[Eric et al., 2017] Mihail Eric, Lakshmi Krishnan, Francois Charette, and Christopher D. Manning. Key-value retrieval networks for task-oriented dialogue. In Proc. of the 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL), pages 37–49, 2017.

[Fuchi and Takagi, 1998] Takeshi Fuchi and Shinichiro Takagi. Japanese morphological analyzer using word co-occurrence. In Proc. of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL), pages 409–413, 1998.

[Han et al., 2013] Sangdo Han, Kyusong Lee, Donghyeon Lee, and Gary Geunbae Lee. Counseling dialog system with 5W1H extraction. In Proceedings of the SIGDIAL 2013 Conference, pages 349–353, 2013.

[Higashinaka et al., 2014] Ryuichiro Higashinaka, Kenji Imamura, Toyomi Meguro, Chiaki Miyazaki, Nozomi Kobayashi, Hiroaki Sugiyama, Toru Hirano, Toshiro Makino, and Yoshihiro Matsuo. Towards an open-domain conversational system fully based on natural language processing. In Proc. of the 25th International Conference on Computational Linguistics (COLING), pages 928–939, 2014.

[Higashinaka et al., 2015] Ryuichiro Higashinaka, Kotaro Funakoshi, Masahiro Araki, Hiroshi Tsukahara, Yuka Kobayashi, and Masahiro Mizukami. Towards taxonomy of errors in chat-oriented dialogue systems. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 87–95, 2015.

[Lafferty et al., 2001] John Lafferty, Andrew McCallum, and Fernando C.N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. of the Eighteenth International Conference on Machine Learning (ICML), pages 282–289, 2001.

[Lample et al., 2016] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360, 2016.

[Lee et al., 2010] Cheongjae Lee, Sangkeun Jung, Kyungduk Kim, Donghyeon Lee, and Gary Geunbae Lee. Recent approaches to dialog management for spoken dialog systems. Journal of Computing Science and Engineering, 4(1):1–22, 2010.

[Meguro et al., 2009] Toyomi Meguro, Ryuichiro Higashinaka, Kohji Dohsaka, Yasuhiro Minami, and Hideki Isozaki. Analysis of listening-oriented dialogue for building listening agents. In Proceedings of the 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 124–127, 2009.

[Nadeau and Sekine, 2007] David Nadeau and Satoshi Sekine. A survey of named entity recognition and classification. Named Entities: Recognition, Classification and Use, pages 3–26, 2007.

[Otsuka et al., 2017] Atsushi Otsuka, Toru Hirano, Chiaki Miyazaki, Ryuichiro Higashinaka, Toshiro Makino, and Yoshihiro Matsuo. Utterance selection using discourse relation filter for chat-oriented dialogue systems. In Dialogues with Social Robots, pages 355–365. Springer, 2017.

[Ritter et al., 2011] Alan Ritter, Colin Cherry, and William B. Dolan. Data-driven response generation in social media. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 583–593, 2011.

[Sekine et al., 1998] Satoshi Sekine, Ralph Grishman, and Hiroyuki Shinnou. A decision tree method for finding and classifying names in Japanese texts. In Proc. of the 6th Workshop on Very Large Corpora, 1998.

[Sekine et al., 2002] Satoshi Sekine, Kiyoshi Sudo, and Chikashi Nobata. Extended named entity hierarchy. In Proc. of the 3rd International Conference on Language Resources and Evaluation (LREC), pages 52–57, 2002.

[Wang et al., 2017] Chunqi Wang, Wei Chen, and Bo Xu. Named entity recognition with gated convolutional neural networks. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, pages 110–121. Springer, 2017.