=Paper= {{Paper |id=Vol-230/paper-6 |storemode=property |title=HCP with PSMA: A Robust Spoken Language Parser |pdfUrl=https://ceur-ws.org/Vol-230/06-hasan.pdf |volume=Vol-230 |dblpUrl=https://dblp.org/rec/conf/ijcai/HasanMK07 }} ==HCP with PSMA: A Robust Spoken Language Parser== https://ceur-ws.org/Vol-230/06-hasan.pdf
                       HCP with PSMA: A Robust Spoken Language Parser

                         Monirul Hasan, Venkatesh Manian, Christel Kemke
                Department of Computer Science, University of Manitoba, Winnipeg, Canada
                              {kmhasan, venkat, ckemke}@cs.umanitoba.ca




                         Abstract

“Spoken language” is a field of natural language processing which deals with transcribed speech utterances. The processing of spoken language is considerably more complex than processing standard, grammatically correct natural language, and requires special treatment of typical speech phenomena called “disfluencies”, like corrections, interjections and repetitions of words or phrases. We present a parsing technique that utilizes a Hybrid Connectionist Parser (HCP) extended with a Phrase Structure Matching Algorithm (PSMA) for the syntactic processing (parsing) of spoken language. The HCP is a hybrid of a connectionist and a symbolic approach, which builds a neural network dynamically based on a given context-free grammar and an input sentence. It has an advantage over traditional parsers due to the use of graded activation and activation passing in the network. These features were exploited in techniques to detect and repair disfluencies, in combination with the Phrase Structure Matching Algorithm, which operates on the graphical structure of the network. This approach to spoken language parsing is purely syntactic and – in contrast to other work – does not require a modified grammar for spoken language, information derived from the speech input, or any pre-marking of disfluencies. We implemented the combined HCP and PSMA parser and tested it on a standard corpus for spoken language, the HCRC Map Task Corpus. The experiments showed very good results, especially for a purely syntactic method, which both detects and corrects disfluencies.

1   Introduction
Over the last 50 years, researchers from different fields have worked on the problem of describing natural languages so that they can be processed by formal means, and thus by computers. Dale et al. [2000] classify these approaches into three groups: symbolic methods that represent the language with formal grammars, empirical methods that rely on statistical data rather than explicit handcrafted rules, and connectionist approaches that use neural networks for processing natural language. Rather than relying on one particular approach, researchers are also trying techniques that combine different approaches. For example, Christiansen and Chater [1985] state that the symbolic approach and the connectionist approach complement each other rather than merely provide alternative representations.
   The Hybrid Connectionist Parser (HCP) we chose for our work utilizes both a symbolic component and a connectionist component. In the following section, we discuss previous work on spoken language processing. Then, we present the Hybrid Connectionist Parser, which is the basis for the spoken language parser. Next, we describe the combination of the HCP and the Phrase Structure Matching Algorithm for spoken language parsing. The paper concludes with a summary of our experimental evaluation of this method using spoken language sentences from the HCRC Map Task Corpus [Anderson et al., 1992].

2   Spoken Language Processing
In the following two sections, we give a brief introduction to phenomena typical of spoken language, and then cite relevant work in the area of spoken language processing.

2.1 Spoken Language
Spoken language is marked by the presence of pauses, interjections and repetitions of words or phrases. These “features” or “speech repairs” are categorized into three classes:

Fresh starts. The speaker restates his/her previous utterance. For example: “I need to send … let’s see … how many boxes can one engine take?”

Modification repairs. The speaker does not restate what he/she utters but replaces or repeats a word or a phrase. For example: “You can carry them both … tow both on the same engine.”

Abridged repairs. The speaker stops or pauses while uttering a phrase and then continues what they started to say. For example: “We need to um … manage to get the bananas.”

   Natural language parsers, in general, are not capable of handling such erroneous input. Parsers need special capabilities to deal with these disfluencies.
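Modification repairs often repeat or replace a word or phrase verbatim. For contrast with the grammar-based method developed later in this paper, a naive pattern-matching detector for such verbatim repetitions can be sketched as follows (an illustrative baseline of our own; the function and its parameters are not part of the HCP or PSMA):

```python
def find_word_repetitions(tokens, max_phrase_len=3):
    """Flag immediately repeated words or phrases, e.g. the
    "tow both ... tow both" pattern of a modification repair.
    Returns (start_index, repeated_phrase) pairs."""
    hits = []
    for n in range(max_phrase_len, 0, -1):            # prefer longer phrases
        for i in range(len(tokens) - 2 * n + 1):
            if tokens[i:i + n] == tokens[i + n:i + 2 * n]:
                hits.append((i, tokens[i:i + n]))
    return hits

print(find_word_repetitions("you can tow both tow both on the same engine".split()))
# -> [(2, ['tow', 'both'])]
```

Such surface matching catches exact repetitions only; fresh starts and non-verbatim replacements require the structural information that the parser-based approach below provides.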
   Approaches to detecting and repairing disfluencies in spoken language can be categorized according to the kind of information they need and the method using this information. Some researchers use syntactic information based on grammar rules, some use prosodic signals, which indicate an interruption in fluent speech, and others use pattern matching to detect repetitions of words or phrases, or combinations of these methods. In the following, we present major work done on detecting and/or repairing disfluencies in spoken language.

2.2 Earlier Work on Spoken Language Processing
Hindle [1983] used edit signals (a phonetic signal) to indicate the presence of a repair in speech. He formulated four rules, namely the surface copy editor, the category copy editor, the stack copy editor, and the handling of fresh restarts. His method had remarkable results in experiments, showing only 3% failure. However, assuming the presence of an edit signal limited the applicability of this approach.
   Dowding et al. [1993] developed a natural language system called GEMINI. They used both syntactic and semantic information in a bottom-up parser to generate structures. Then they used an utterance-level parser to derive a semantically correct structure. When the utterance-level parser fails, a correction phase is used to detect and repair. Their training set contained 178 repairs; the system was able to detect 89 repairs, and 81 of those repairs were corrected.
   Lavie [1996] developed a new parsing algorithm, GLR*, for spoken language. This parser skipped words to find a maximal set of words that form a complete structure. The system did not perform any speech repair detection and correction.
   Wermter and Weber [1997] described a flat screening method to handle utterances in their system SCREEN. The system was hybrid in that it used a flat representation for syntactic and semantic analysis and ANNs to deal with ambiguous structures and repairs in the utterances. The system dealt with pauses, interjections and repetitions of a word or phrase.
   McKelvie [1998] used symbolic rules to implement repairs. He developed a set of grammar rules for fluent speech and later extended it to handle disfluent patterns. However, McKelvie’s work does not explicitly report the number of repairs detected and the number of false alarms caused.
   Core [1999] designed a parser to work with metarules such as the non-interference metarule, the editing term metarule and the repair metarule. These metarules were used to specify different patterns of speech repair. His system was able to detect 237 out of 495 speech repairs using word and part-of-speech tag information. The correct reparandum start was detected for 187 out of 237 repairs using only word matching, and for 183 out of 237 using word and part-of-speech similarities.
   Bear et al. [1992] worked with different sources of information to detect and correct repairs in speech. They used a pattern matching technique, the syntactic and semantic information of a parser, and acoustic information. They concluded that no single source of information is sufficient to deal with repairs.
   Nakatani and Hirschberg [1993] determined repairs by using acoustic and prosodic cues. They tested 202 utterances containing 223 repairs. Of these, 186 repairs were detected, a recall of 83.4%. 177 of the repairs were detected with the help of word fragments and pause duration. In another experiment, the word fragments were not considered, resulting in 174 detected repairs with a 78.1% recall.
   Shriberg et al. [1997] showed that prosodic information can provide better results for detecting repairs in speech than methods that use lexical information. Shriberg et al. used features like the duration of pauses, the duration of vowels, the fundamental frequency, and the signal-to-noise ratio to detect repairs. These measurements were present in the speech data used for training. Finally, they used a decision tree to distinguish the interruption point that follows a filled pause, repetitions, repairs and false starts from other points.
   Heeman and Allen [1999] viewed the problems of recognizing part-of-speech tags and discourse markers and of detecting disfluent parts in speech as intertwined. They solved the problem of detecting and correcting repairs, and of finding utterance ends and discourse markers, at the speech recognition phase. With this system, Heeman and Allen resolved 66% of speech repairs with a precision of 74%.
   In contrast to the research described above, we propose a parsing method based on purely syntactic information: the Hybrid Connectionist Parser (HCP) with an added component, the Phrase Structure Matching Algorithm (PSMA). This combined method detects disfluencies (typically corrections and repetitions) based on the parser’s inability to continue a once started parse tree, and repairs those disfluencies by overlapping partially matching structures of the initial parse tree and the newly generated sub-tree. The method does not require any pre-marking of the interruption point, or additional phonetic or prosodic information. In the next sections, we describe the basic principles of the HCP and the augmentation of the HCP with the PSMA, and illustrate how this method is used to parse spoken language.

3   The Hybrid Connectionist Parser
Given a Context-Free Grammar (CFG), the Hybrid Connectionist Parser (HCP) constructs the underlying Artificial Neural Network (ANN) dynamically. The parser works online, incrementally reading words from the input sentence and merging partially completed networks, starting with mini-networks created from the rules of the given CFG. This idea of constructing the ANN dynamically is relatively uncommon but more suitable for processing dynamic structures of unlimited size. Previously, [Elman 1990; Sharkey 1992; Wermter and Weber 1997; Jain and Waibel 1990] used Recurrent Neural Networks (RNN) to imitate the recursive generative capability of natural languages. In this section, we outline the main characteristics of the HCP. A more detailed description can be found in [Kemke 1996, 2001a, 2001b, 2002].
3.1 Representation
The mini-networks in the HCP represent the grammar rules of the given CFG. A mini-network is a two-layer network with one root node, representing the syntactic category on the left-hand side (LHS) of the grammar rule, and one or more child nodes, representing the right-hand side (RHS) of the rule. The link weight between each of the child nodes and the root node is 1/n, where n is the number of child nodes. An example mini-network for the grammar rule A → A1 A2 … An is shown in Figure 1.

   Figure 1: Mini-network representing a grammar rule (root node A connected to child nodes A1, A2, …, An with link weights 1/n)

In the general case, the units in the network are simple threshold units with the threshold value set to 1.0. Thus, each root node gets partially activated when one or more of its child nodes are fully activated, but gets fully activated (and fires) only when all its children are fully activated. Thus, the HCP simulates a deterministic, incremental parser. In order to make the representation and use of these mini-networks more efficient, we introduced complex mini-networks, which can represent rules with the same LHS but different RHS. This is typical for natural language grammars, which often have slightly different RHS for the same category on the LHS. For this purpose, we use hypothesis nodes. When there are multiple RHS for the same LHS, each RHS is represented through a hypothesis node, which is connected to the root node representing the LHS. The link weight for each of these new connections is set to 1.0. All other nodes not belonging to a hypothesis node are connected to it through negative link weights to facilitate mutual inhibition. Thus, the root node for the LHS gets activated when at least one of its children is fully activated. Figure 2 illustrates this idea for the sample grammar rule:

VP → V (H1) | V NP (H2) | V PP (H3) | V NP PP (H4)

   Figure 2: Complex mini-network with hypothesis nodes (root node VP connected to hypothesis nodes H1–H4 with weight 1.0; child nodes V, NP, PP connected with positive weights such as 0.5 and 1/3, and with negative weights of -0.2 for mutual inhibition)

3.2 The Algorithm
The HCP starts with a set of mini-networks generated from the context-free grammar. The input is read word by word, and the parser proceeds incrementally. After processing each word from the input sentence, it searches the existing mini-networks to see if this word can be bound to a child node of one of those mini-networks. While searching for such networks, it has to be ensured that the order of the terms in the grammar rules is maintained. In other words, if we are to bind to a node Ak of a network representing the rule A → A1 A2 … An, we need to ensure that the nodes Ai with i < k have already been fully activated and bound. This can be achieved by introducing proper inhibition mechanisms, similar to the ones described above for complex mini-networks. When all the child nodes in a mini-network are activated, the root node of that network becomes fully activated. If the root node of a network is fully activated, the search for a network to which this root node can be bound starts again. If some input has already been processed, and thus incomplete networks have been generated, the fully activated network can be bound to such an existing, partially activated network, in a process called “merging”. The root node of the later network has to match the left-most unbound node of the earlier, partially activated network. Alternatively, a new mini-network can be created in parallel, if the root node is the left-most node on the RHS of the mini-network. After processing all input words, the parse is successful if at least one fully activated network exists whose root node is labeled with the start symbol of the grammar. This network corresponds to a complete parse tree for the input sentence. In the case of structurally (syntactically) ambiguous sentences, the parser will derive several complete, fully activated networks.

3.3 Special Features
The HCP has some advantages over conventional parsers for CFGs, due to the online, incremental, dynamic construction of an underlying neural network. The algorithm exhibits parallelism in the way nodes are bound in each stage; thus the HCP can be ported to a parallel architecture. It is also possible to represent CFGs in a condensed form for processing by the HCP. Examples are the complex mini-networks described above, and optional terms on the RHS of a grammar rule, which can simply be added to a network using a link weight of 0, and thus are accepted by the parser but not required.
   Some characteristics of this parsing method make it particularly useful for spoken language processing. If an input word or the root node of a fully activated network can neither be bound to an existing network nor create a new mini-network, the parser cannot continue to derive a complete parse tree. This characteristic is used to detect interruption points and disfluencies in spoken language. The use of graded activation and activation passing enables the parser to maintain information about partially accepted as well as expected structures, and this can be used to process incomplete and superfluous structures, reflecting repetitions and corrections in spoken language.
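The binding and activation scheme of a simple mini-network, as described in Sections 3.1 and 3.2, can be sketched in a few lines (a simplified illustration under our own naming; this is not the actual HCP implementation):

```python
class MiniNetwork:
    """Sketch of a simple HCP mini-network for a rule LHS -> RHS.
    Each child carries link weight 1/n; the root is a threshold
    unit that fires once its activation reaches 1.0."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.weight = 1.0 / len(rhs)   # link weight 1/n per child
        self.bound = 0                 # children bound so far, left to right

    def activation(self):
        return self.bound * self.weight

    def complete(self):
        return self.activation() >= 1.0   # root fully activated: it fires

    def try_bind(self, category):
        """Bind a word's category (or a fired root node) to the
        left-most unbound child, enforcing the rule's term order."""
        if self.bound < len(self.rhs) and self.rhs[self.bound] == category:
            self.bound += 1
            return True
        return False

np_net = MiniNetwork("NP", ["DT", "NOM"])
np_net.try_bind("DT")            # "a" -> DT binds
print(np_net.activation())       # 0.5: root partially activated
np_net.try_bind("NOM")
print(np_net.complete())         # True: root may now be bound or merged upward
```

Hypothesis nodes and the inhibition links of complex mini-networks are omitted here; the sketch only illustrates graded activation and the left-to-right binding order.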
4   Augmenting the HCP with the PSMA
We augmented the HCP with a Phrase Structure Matching Algorithm (PSMA), extending the concept of the “Graph Matching Algorithm” by Kemke [2001b]. We developed the PSMA to correct disfluent structures present in an utterance by detecting structural similarities between the reparandum and its alteration. The PSMA is triggered when an incomplete structure is detected during parsing with the HCP. This leads to an inconsistent state of the parsing process, which cannot be continued, since a newly generated sub-tree can neither be merged into the initial incomplete parse tree nor otherwise be expanded. The PSMA takes the partial parses generated by the HCP as input and corrects the disfluency by trying to find a structural match between the new sub-tree and the initial parse tree. The following example illustrates the idea.
   The sentence “I have a … I have got a picket fence” is an example of a fresh start. The speaker meant to say “I have got a picket fence”, but inadvertently started with “I have a” and then started the sentence over with “I have got a picket fence.”
   The syntactic parsing of the initial, false start:

   S[NP[PRP[I]] VP[HV[have]] NP[DT[a] NOM[ ]]]

reveals that the parser is trying to recognize a Noun Phrase NP starting with “a” as the next completed structure, and expects a Nominal NOM as the next category (Figure 3).
   In this parsing situation, a nominal is thus expected as the root node of the next completed sub-tree. The HCP continues to parse and derives a new, fully activated sub-tree, which in this example is a complete sentence structure. The algorithm detects that a nominal is expected but not present at the expected position in the parsed input sentence. Instead, the root node is labeled with the sentence symbol S. Thus, it is assumed that a disfluency has occurred, and this point in the parse tree is interpreted as the interruption point. The PSMA takes the complete sentence structure returned by the HCP:

   S[NP[PRP[I]] VP[HV[have] VP[VBP[got] NP[DT[a] NOM[picket fence]]]]]

and tries to match the root node S of this structure with the previous incomplete structure shown in Figure 3.

   Figure 3: Partial parse of "I have a"

   The PSMA searches backward through the initial, incomplete parse tree, which contains the unbound, not activated node (NOM in Figure 3). When it can confirm a significant structural similarity, it overlaps the matching structures and thus creates a new, corrected parse tree. At this point, the previous structures are discarded and the new, combined structure is retained.
   To find a match between the new sub-structure and the initial, incomplete one, the PSMA starts at the bottom right of the initial parse tree and, whenever a match with the label of the root node of the new structure fails, moves step by step towards the parent nodes. In the example, the PSMA starts the comparison with the node “a” and its category DT in the initial parse tree. The root node of the new structure is S, which does not match DT. The PSMA moves to the next higher level, NP, which also fails to match. The third attempt, to match with VP, is also unsuccessful. Finally, the PSMA moves to the highest node S and finds a match of the initial structure with the new, complete structure. The two structures are then merged through an overlapping process, in which the newly generated structure takes precedence over the initial, earlier created structure. This reflects the assumption that speakers correct themselves in later parts of the utterance, and thus the later parts override earlier parts of the input. The result is a new, complete parse tree (Figure 4). The comparison of the two structures through top-down traversal confirms a structural match to a high degree. This makes the PSMA a robust and reliable method to correct the found disfluency.

   Figure 4: Full parse of "I have got a picket fence"

   In the example, we see that the PSMA has detected the interruption point based on the fact that the parse could not be continued as expected. The search and match procedure found a substantial overlap and a proper replacement for the incorrect structure. The PSMA does not require, however, that a phrase is repeated exactly to correct a disfluency. A successful match between a parent node in the initial parse tree and the root node of the new sub-tree is sufficient for the PSMA to merge the two structures.
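The upward search illustrated in this example can be sketched as follows (a simplified illustration; the function name and path encoding are ours, not taken from the PSMA implementation):

```python
def psma_match(path_up, new_root_label):
    """Sketch of the PSMA's upward search. path_up lists the category
    labels on the path from the unbound node of the initial parse tree
    up to its root, e.g. ['DT', 'NP', 'VP', 'S'] for "I have a".
    Returns the index of the first ancestor whose label matches the
    root of the newly parsed sub-tree, or None if no match exists."""
    for depth, label in enumerate(path_up):
        if label == new_root_label:
            return depth          # overlap here: the new structure overrides
    return None

# "I have a ... I have got a picket fence": the comparison fails at
# DT, NP and VP, and succeeds at the top node S of the initial tree.
print(psma_match(["DT", "NP", "VP", "S"], "S"))   # -> 3
```

The merging step itself, in which the new sub-tree overwrites the matched portion of the initial tree, is omitted here; the sketch only shows how the interruption point's ancestors are compared against the new root label.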
4   Evaluation                                                     complete sub-tree from the HCP, the PSMA started imme-
                                                                   diately to search and match with the initial tree, in order to
We tested the method with the HCRC Map Task Corpus by              perform an overlapping of structures to correct the disflu-
Anderson et al. [1992]. This corpus contains transcribed           ency in the utterance. In the incremental processing tests,
dialogues regarding path finding in a terrain, between an          the PSMA detected 89 out of 121 repair instances with a
instruction giver with a route marked map and an instruction       recall rate of 73.5%. Of these 89 repairs, the PSMA was
follower with an unmarked map. Since it is based on spon-          successfully able to correct 80 of them.
taneous spoken language, the Map Task Corpus contains
disfluencies typical for natural conversation. An advantage
of the Map Task Corpus is the restriction to a particular do-
                                                                   5   Future Work
main of conversation, which made the grammar and lexicon           As our results have shown, the incremental processing
development for our tests with the HCP and PSMA rela-              method has a promising prospect. We would like to extend
tively fast and easy.                                              our work with this method in the following directions.

4.1 Evaluation Metrics                                             Parallelizing the HCP. The HCP is implicitly parallel in
                                                                   nature. When each node is activated, we have the choice of
The merit of the speech repair tool is calculated using the        binding it to (several) other nodes, or to create a new mini-
measures of precision and recall – these two metrics have          network for it. These options are mutually exclusive, i.e.
been used earlier [Core, 1999; Heeman and Allen, 1999;             they cannot be present in the same parse tree, and could thus
Nakatani and Hirschberg, 1993] and are a standard meas-            be executed in parallel. We are considering a parallel im-
urement for disfluency detection.                                  plementation of the HCP to speed up the computation. Al-
Precision                                                          though speeding up the parser does not improve the recall
The precision is calculated by Tp/(Tp+Fp), where Tp (True          rate, it makes the parser more applicable for practical use.
positive) is the total number of repairs that are detected as      Removing useless mini-networks. In our implementation,
repairs and Fp (false positive) is the total number of non-repairs that are incorrectly detected as repairs.

Recall
The recall is calculated as Tp/(Tp+Fn), where Fn (false negative) is the total number of repairs that are not detected as repairs. In our tests based on the HCRC Map Task Corpus, all examples contained at least one disfluency. We thus did not calculate the precision, as there were no false positives in the test set.

4.2 Experiments
We implemented the PSMA integrated with the HCP in two different ways: using a post-processing method, in which the PSMA starts to work only after the HCP has created all possible structures, and an incremental method, in which the PSMA is activated as soon as parsing cannot be continued and a disfluency is assumed.

In our first set of experiments, we let the HCP finish parsing the full utterance; the PSMA then takes over and tries to fix the disfluencies marked by interruption points. We conducted this experiment in three stages. In the first stage, we had 55 test sentences, each with a single repair instance; the parser was able to correct 38 of them. In the second stage, we expanded the grammar and ran tests on 71 sentences; the parser corrected 28 of them. In the third stage, we increased the number of repair instances per sentence; of the 98 sentences we tested, the parser corrected 42.

In the first set of experiments, it turned out that the post-processing method had problems correcting utterances with more than two repair instances. We subsequently tested an incremental version of the combined HCP and PSMA, which brought about better results.

In this second set of experiments, the PSMA started as soon as a disfluency was detected. After receiving a new

the HCP can merge and create new mini-networks as needed. None of these mini-networks were removed during run time, even though they might no longer have been needed after a certain point in time. After processing sufficient input incrementally, we can detect when a partial parse tree / mini-network is no longer useful. Removing such mini-networks can reduce excessive memory requirements and at the same time speed up the whole computation.

Using a refined and extended grammar. The choice of grammar can be a determining factor in robust parsing. Fine-tuning the grammar rules can improve the repair-detection capability of the parser. Verb subcategorization [Jurafsky and Martin, 2000], for example, can be employed to refine the grammar rules. We would like to use a more elaborate grammar notation, and to run further tests with a larger set of grammar rules and an expanded lexicon, to see how the spoken language parser improves and scales up.

6 Conclusion
We have extended a Hybrid Connectionist Parser (HCP) with a Phrase Structure Matching Algorithm (PSMA) in order to parse spoken language. This approach combines symbolic and connectionist methods and features: in particular, symbolic features such as syntactic categories as nodes, and neural-network features such as graded activation and threshold-dependent processing in the HCP. The neural-network features are well suited to dealing with incomplete and inconsistent structures, which makes the parser useful for spoken language processing. Typical phenomena of spoken language are disfluencies, in particular corrections and modifications of utterances during spontaneous speech. The HCP detects such disfluencies, and the PSMA repairs them based on a structural match and the overlap of comparable structures. The experimental results we obtained, with a detection and correction rate of approximately 70%, are good for a purely syntactic method that does not require any prosodic, phonetic or other information.

The method, however, seems to depend on the tuning of the grammar, which has to accurately reflect the linguistic input structures. Scalability and performance should therefore improve with a more elaborate grammatical notation, and potentially by adding further information derived from the speech signal, such as hesitations, pauses or fillers. Overall, based on these first experimental results, the approach has promising prospects.

Acknowledgments
The work reported in this paper is partly based on Venkatesh Manian's M.Sc. thesis in the Department of Computer Science at the University of Manitoba. He is now affiliated as a software architect with a local company.

This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

References
[Aho et al., 1986] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques and Tools. Pearson Education, 1986.
[Anderson et al., 1992] A. Anderson, M. Bader, E. Bard, E. Boyle, G. M. Doherty, S. Garrod, S. Isard, J. Kowtko, J. McAllister, J. Miller, C. Sotillo, H. S. Thompson and R. Weinert. The HCRC Map Task Corpus. Language and Speech, 34:351–366, 1992.
[Bear et al., 1992] J. Bear, J. Dowding, and E. Shriberg. Integrating Multiple Knowledge Sources for Detection and Correction of Repairs in Human-Computer Dialog. Proc. 30th Annual Meeting of the Association for Computational Linguistics, 56–63, 1992.
[Core, 1999] M. G. Core. Dialog Parsing: From Speech Repairs to Speech Acts. Ph.D. Thesis, University of Rochester, New York, 1999.
[Dale et al., 2000] R. Dale, H. Moisl, and H. Somers (Eds.). Handbook of Natural Language Processing. Marcel Dekker, New York, 2000.
[Dowding et al., 1993] J. Dowding, J. M. Gawron, D. Appelt, J. Bear, J. Cherny, R. Moore and D. Moran. Gemini: A Natural Language System for Spoken Language Understanding. Proc. 31st Annual Meeting of the Association for Computational Linguistics, 54–61, Columbus, Ohio, 1993.
[Elman, 1990] J. L. Elman. Finding Structure in Time. Cognitive Science, 14:179–211, 1990.
[Heeman and Allen, 1999] P. Heeman and J. Allen. Speech Repairs, Intonational Phrases and Discourse Markers: Modeling Speakers' Utterances in Spoken Dialogue. Computational Linguistics, 25:527–571, December 1999.
[Hindle, 1983] D. Hindle. Deterministic Parsing of Syntactic Non-fluencies. Proc. 21st Annual Meeting of the Association for Computational Linguistics, 123–128, 1983.
[Jain and Waibel, 1990] A. N. Jain and A. H. Waibel. Incremental Parsing by Modular Recurrent Connectionist Networks. In D. S. Touretzky (Ed.): Advances in Neural Information Processing Systems 2, Morgan Kaufmann, San Mateo, CA, 1990.
[Jurafsky and Martin, 2000] D. Jurafsky and J. H. Martin. Speech and Language Processing. Prentice Hall, 2000.
[Kemke, 1996] C. Kemke. A Hybrid Approach to Natural Language Parsing. In von der Malsburg, von Seelen, Vorbrueggen, Sendhoff (Eds.): Artificial Neural Networks, Proc. ICANN'96, 875–880, Bochum, Germany, 1996.
[Kemke, 2001a] C. Kemke. Connectionist Parsing with Dynamic Neural Networks – or: Can Neural Networks make Chomsky Happy? Technical Report MCCS-01-327, CRL, NM State University, 2001.
[Kemke, 2001b] C. Kemke. Generative Connectionist Parsing with Dynamic Neural Networks. Proc. 2nd Workshop on Natural Language Processing and Neural Networks, 31–37, Tokyo, Japan, 2001.
[Kemke, 2002] C. Kemke. A Constructive Approach to Parsing with Neural Networks – The Hybrid Connectionist Parsing Method. Advances in Artificial Intelligence, LNAI 2338, 310–318, Springer, 2002.
[Lavie, 1996] A. Lavie. GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language. Ph.D. Thesis, Carnegie Mellon University, 1996.
[Manian, 2005] V. Manian. Integrating a Connectionist Parser and a Graph Matching Algorithm for Handling Disfluencies in Spoken Language Processing. M.Sc. Thesis, University of Manitoba, 2005.
[McKelvie, 1998] D. McKelvie. The Syntax of Disfluency in Spontaneous Spoken Language. Human Communication Research Centre, HCRC/RP-95, Edinburgh, May 1998.
[Nakatani and Hirschberg, 1993] C. Nakatani and J. Hirschberg. A Speech-First Model for Repair Detection and Correction. Proc. 31st Annual Meeting of the Association for Computational Linguistics, 46–53, 1993.
[Sharkey, 1992] N. Sharkey. Connectionist Natural Language Processing. Intellect, Oxford, England, 1992.
[Shriberg, 1997] E. Shriberg, R. Bates, and A. Stolcke. A Prosody-only Decision-Tree Model for Disfluency Detection. Proc. 5th European Conference on Speech Communication and Technology, 2383–2386, Rhodes, Greece, 1997.
[Wermter and Weber, 1997] S. Wermter and V. Weber. SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks. Journal of Artificial Intelligence Research, 6:35–85, 1997.
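The per-stage correction rates implied by these counts can be computed as a small arithmetic check (an illustration of the reported numbers, not part of the original evaluation code):

```python
# (corrected, tested) pairs for the three stages reported above
stages = [(38, 55), (28, 71), (42, 98)]

for corrected, tested in stages:
    print(f"{corrected}/{tested} = {corrected / tested:.0%}")
# 38/55 = 69%, 28/71 = 39%, 42/98 = 43%
```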