Segmenting Sequences Semantically Using Petri Net Transducers for the Translation from Sequential Data to Non-Sequential Models

Markus Huber1,2(B) and Matthias Wolff1
1 Brandenburg University of Technology Cottbus-Senftenberg, Germany
2 InnoTec21 GmbH, Leipzig, Germany
markus.huber@b-tu.de

Abstract. In previous work we presented an extension and generalisation of finite state transducers (FSTs) to so-called Petri net transducers (PNTs). These are applicable to any form of transforming sequential input signals into non-sequential output structures – which can be used to represent the semantics of the input – by realising a weighted relation between partial languages, i.e. assigning one weight per related input-output pair. This paper extends the framework by an additional weighted output where every node of the structure carries a weight. Moreover, the output structure induces segments on all intermediate transducers and thus on the input, giving a more detailed view on how the single weight emerged during the transformation. We extend the definitions of PNTs, show the resulting changes for the operation of language composition, and discuss the impact on other PNT algorithms. The theory is accompanied by an example of translating a keyboard input stream into a structural representation of commands to be executed.

Keywords: Petri net transducer · Weighted labelled partial order · Concurrent semiring · Natural language understanding · Cognitive systems

1 Introduction

In [16] we introduced a special form of Petri nets for the weighted translation of partial languages. We showed how Petri net transducers (PNTs) are a generalisation of finite state transducers (FSTs) [2] and discussed several operations, including language composition, where a cascade of transducers translates from a source language into a target language by means of intermediate languages.
In [23,14,15] we proposed the use of PNTs in the field of semantic dialogue modelling for the translation of utterances into meanings. We first give a brief overview of the area of research within which our current work is developed.

Cognitive systems (including speech dialogue systems) mainly consist of three parts allowing them to interact with their environments: perceptor, behaviour control, and actuator [7]. Of these, perceptor and actuator each comprise a hierarchy of transforming units and bidirectional communication between the particular levels (cf. [21]). The responsibility of the perceptor hierarchy is translating input signals into semantic representations where the relevant parts of the input are related to semantic categories relevant to the system. While a low-level signal – e.g. the name Wolfgang Amadeus Mozart when spoken by a person – is sequential, its semantics is, in general, non-sequential: although there is an order relation between Wolfgang and Amadeus, there is no order between those and Mozart.

We use the concept of feature-values-relations (FVRs) [10,9] as semantic representations. In figure 1 the semantic structure for the name Wolfgang Amadeus Mozart can be seen, where the semantic categories are depicted as ellipses and the parts from the signal as rectangles. The order relation between Wolfgang and Amadeus is retained by the order of their semantic categories. Although the category of middle name exists, examples such as Pippilotta Viktualia Rullgardina Krusmynta Efraimsdotter Långstrump or Oscar Zoroaster Phadrig Isaac Norman Henkle Emmannuel Ambroise Diggs should motivate the need for a more general approach to repetition. Note that although the example obeys a tree-like form, FVRs are neither limited to trees nor need to be single rooted. Which semantic categories and which parts of an input signal are of relevance for a concrete system depends on its scope of action.
Therefore semantics is clearly not a function of (only) the input signal and especially not a function of its syntax. By assigning weights to the nodes of an FVR it is possible to represent the confidence of each single part of the semantics, as opposed to an overall weight.

Examples of non-sequential semantic representations are concept relational trees [1], whose structure is constrained to binary trees, and abstract meaning representation [17] and feature structures [18], which can have a more general structure. All of them are single rooted, do not provide a convenient way to preserve order information of repeated parts, and are computed from syntactical analysis. Furthermore, none of them carry weights on individual nodes, and only the last two have an overall weight assigned. [20] uses relational trees where connections between entities relevant for the system are represented. The authors state that ‘in most cases a relation link is analogous to a syntactic dependency link’, and again these tree-like structures are single rooted, do not cover the preservation of order information, and have only a single weight assigned.

We use labelled partial orders (LPOs) to represent FVRs, and PNTs to represent sets of LPOs and sets of pairs of LPOs. The transducer operation of language composition allows building hierarchical bidirectional translation systems, and since PNTs are a generalisation of FSTs we can utilise results from existing research in the field of speech technology [8,25] to build on. The resulting transducer cascade allows bottom-up processing of input signals to output decoded meaning, and top-down processing using prior knowledge and expectations. Systems as described in [20] all have a gap between syntax and semantics since they first translate input signals into syntactic representations and do their semantic analysis afterwards. Our approach needs no premature decisions because we directly translate from the signal level into semantic representations. Also the priming of the perceptor hierarchy by simply appending another transducer which represents expected semantics is (to our knowledge) a unique feature.

Figure 1. Semantic representation of ‘Wolfgang Amadeus Mozart’.

Figure 2. Segmented input sequence and weighted output structure

The semantic structure – the output of the perceptor – serves as input to the behaviour control, which computes the reaction of the cognitive system. These reactions also include interactions with the environment to request additional information for an incomplete structure. New input is incorporated into the already collected information. In [5] several operations on FVRs are described which admit an algebraic structure on the set of FVRs. In [20,19] a similar approach is described where relational trees are used to represent the information state of the system and some operations are described using so-called tree regular expressions. However, there is a complete lack of algebraic investigation.

The computed output of the behaviour control is again a semantic structure which gets translated by the actuator hierarchy into an output signal. According to system theory the actuator should use the same formalism as the perceptor (cf. [8]). Since PNTs provide the operation of inversion – by swapping the input and output label of each transition – they constitute a good choice. In contrast to PNTs, tree transducers [12] and DAG transducers [18] are both derivation-tree oriented and in general not closed under language composition and inversion. Neither allows semantic priming in general, but both can be used for generation with some reservations.

Speech dialogue systems must operate under real-time conditions.
There is a period of a few hundred milliseconds for analysing the input, computing the reaction, and generating the output. Systems such as the Unified Approach to Signal Synthesis and Recognition (UASR) [8,24] have shown that FSTs can fulfil these requirements. We are confident that PNTs are also capable of meeting these demands and are currently developing an embedded cognitive user interface for intuitive interaction with arbitrary electronic devices [3], where we have already applied first results.

In previous work we also proposed a variant of PNTs capable of handling output LPOs of arbitrary width (rhPNTs [11]), which are needed to produce general structures as in figure 1. For the work presented here we do not use that extension, but we will describe another enhancement of PNTs which is capable of segmenting the input (sensor signal, speech or some text) into semantic units and assigning meaningful weights – e.g. likelihoods or probabilities – to these segments. Figure 2 shows a simple example of an input text and the decoded semantics. The scenario is that a user types the command cp PNTs ATAES/ on the keyboard, trying to copy a file named PNTs to a folder named ATAED. The input – a sequence of character events – is translated into a semantic structure for executable commands, including correction of the destination of the copy-command. So the system executes the user’s intention, not the command they typed.

The paper is organised as follows: In section 2 we recall basic definitions and introduce in section 3 the used translation formalism, always demonstrating the concepts on our example. Thereby subsections 3.1 and 3.2 reuse the definitions from [16] and subsection 3.2 extends the definitions from [11]. In section 4 we introduce the new notion of segments of weighted LPOs and extend Petri net transducers to make use of them.
Afterwards, we introduce segmenting Petri net transducers fusing the concepts given so far, and finally give a brief conclusion and outlook on future work in section 5.

2 Mathematical Preliminaries

By N0 we denote the set of non-negative integers, by N the set of positive integers. The set of all multisets over a set X is the set N0^X of all functions f : X → N0. Addition + on multisets is defined by (m + m′)(x) = m(x) + m′(x). The relation ≤ between multisets is defined through m ≤ m′ :⇔ ∃m″ (m + m″ = m′). We write x ∈ m if m(x) > 0. A set A ⊆ X is identified with the multiset m satisfying (m(x) = 1 ⇔ x ∈ A) ∧ (m(x) = 0 ⇔ x ∉ A). A multiset m satisfying m(a) > 0 for exactly one element a we call a singleton multiset and denote it by m(a)a.

Given a binary relation R ⊆ X × Y over the sets X, Y and sets A ⊆ X and B ⊆ Y, we denote the image of A by R(A) = {y ∈ Y | ∃x ∈ A : (x, y) ∈ R} and the preimage of B by R⁻¹(B) = {x ∈ X | ∃b ∈ B : (x, b) ∈ R}. We denote by Dom(R) = R⁻¹(Y) the domain of R and call R(X) the image of R. For X′ ⊆ X and Y′ ⊆ Y the restriction of R onto X′ × Y′ is denoted by R|X′×Y′. Given a binary relation R ⊆ X × Y and a binary relation S ⊆ Y × Z for sets X, Y, Z, their composition is defined by R ∘ S = {(x, z) | ∃y ∈ Y ((x, y) ∈ R ∧ (y, z) ∈ S)} ⊆ X × Z. For a binary relation R ⊆ X × X over a set X, we denote R¹ = R and Rⁿ = R ∘ Rⁿ⁻¹ for n ≥ 2. The symbol R⁺ denotes the transitive closure ⋃_{n∈N} Rⁿ of R.

Let A be a finite set called an alphabet. A word over A is a finite sequence of symbols from A. For a word w its length |w| is the number of its symbols. The symbol ε denotes the empty word satisfying |ε| = 0 and is the neutral element w.r.t. concatenation of words: wε = εw = w. By A* we denote the set of all words over A, including the empty word. A step over A is a multiset over A. A step sequence over A is an element of (N0^A)*.
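The multiset operations above can be sketched as follows (our own illustration, not part of the paper's formalism), representing multisets over X as Python Counters:

```python
from collections import Counter

# Multisets over a set X as Counters: m(x) is the multiplicity of x.
m1 = Counter({"a": 2, "b": 1})
m2 = Counter({"a": 1})

# Addition: (m + m')(x) = m(x) + m'(x).
m_sum = m1 + m2

# The order m <= m' holds iff some m'' satisfies m + m'' = m',
# i.e. iff m(x) <= m'(x) for every x.
def multiset_leq(m, mp):
    return all(m[x] <= mp[x] for x in m)

# Membership: x in m iff m(x) > 0.
assert "a" in m1 and m1["a"] == 2
assert multiset_leq(m2, m1)
assert m_sum["a"] == 3
```

Counter already implements the pointwise addition used above; only the ≤ relation needs to be spelled out.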
A directed graph is a pair G = (V, →), where V is a finite set of nodes and → ⊆ V × V is a binary relation over V, called the set of edges. For a node v ∈ V its preset is the set •v = →⁻¹({v}) and its postset the set v• = →({v}). A path is a sequence of (not necessarily distinct) nodes v1 . . . vn (n > 1) such that vi → vi+1 for i = 1, . . . , n − 1. A path v1 . . . vn is a cycle if v1 = vn. A directed graph is called acyclic if it has no cycles. For an acyclic directed graph G = (V, →) its maximal nodes form the set Max(G) = {v ∈ V | v• = ∅}, its minimal nodes the set Min(G) = {v ∈ V | •v = ∅}. An acyclic directed graph (V, →′) is an extension of an acyclic directed graph (V, →) if → ⊆ →′.

An irreflexive partial order over a set V is a binary relation < ⊆ V × V which is irreflexive (∀v ∈ V : v ≮ v) and transitive (< = <⁺). A reflexive partial order over a set V is a binary relation ≤ ⊆ V × V which is reflexive (∀v ∈ V : v ≤ v), transitive (≤ = ≤⁺) and antisymmetric (∀v, w ∈ V : v ≤ w ∧ w ≤ v =⇒ v = w). We identify a finite partial order ∼ over V with the directed graph (V, ∼). Given a partial order po = (V, ∼) we call two nodes v, v′ ∈ V independent if v ≁ v′ and v′ ≁ v. By co∼ ⊆ V × V we denote the set of all pairs of independent nodes of V.

A monoid is a triple (S, ∗, n) where S is a set, ∗ is a binary closed operation on S (∀a, b ∈ S : a ∗ b ∈ S), and n is the neutral element w.r.t. ∗ (∀a ∈ S : a ∗ n = a = n ∗ a). The operation is often written by juxtaposition (ab = a ∗ b). If ∗ is idempotent (∀a ∈ S : aa = a) the monoid is called idempotent. If ∗ is commutative (∀a, b ∈ S : ab = ba) the monoid is called commutative. If there exists an absorbing element 0 ∈ S (∀a ∈ S : a0 = 0 = 0a) the monoid is called a monoid with zero. A monoid is called ordered if it is equipped with a reflexive partial order ≤ such that the operation ∗ is monotone (∀a, b ∈ S : a ≤ b =⇒ (∀c ∈ S : c ∗ a ≤ c ∗ b ∧ a ∗ c ≤ b ∗ c)).
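The relation and partial-order notions above can be sketched in a few lines (an illustration under our own encoding of relations as sets of pairs): the transitive closure R⁺ arises from iterated composition, and two nodes are independent if they are ordered in neither direction.

```python
# A finite binary relation as a set of edges; the transitive closure R+
# is computed by repeatedly composing the relation with itself.
def transitive_closure(edges):
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Order for 'Wolfgang Amadeus Mozart': Wolfgang < Amadeus, Mozart unordered.
lt = transitive_closure({("Wolfgang", "Amadeus")})

def independent(v, w, order):
    # v and w are independent iff neither v < w nor w < v holds.
    return (v, w) not in order and (w, v) not in order

assert independent("Wolfgang", "Mozart", lt)
assert not independent("Wolfgang", "Amadeus", lt)
```

The example mirrors figure 1: Mozart is independent of both given names, while Wolfgang and Amadeus are ordered.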
For a given monoid (S, ∗, n) the operation ∗ defines a binary relation on S via a ≤∗ b :⇔ ab = b. If this relation is a reflexive partial order, the monoid is called naturally ordered. An idempotent and commutative monoid (S, ∗, n) is naturally ordered. Moreover, if S is equipped with the reflexive partial order ≤∗, then ∀a, b ∈ S : ab = sup{a, b}, where the supremum is taken w.r.t. ≤∗.

3 Input and Output

We use LPOs extended by weights from an algebraic structure called a bisemiring to represent the input and output structures as depicted in figure 2, and use PNTs by means of the non-sequential semantics of Petri nets for the weighted translation of LPOs.

3.1 Weighting of Structures

Except for definition 2 this subsection equates the state from [16]. A labelled partial order (LPO) over a set X is a 3-tuple (V, <, l), where (V, <) is an irreflexive partial order and l : V → X is a labelling function on V. In most cases, we only consider LPOs up to isomorphism, i.e. only the labelling of nodes is of interest, but not their names. Two LPOs (V, <, l) and (V′, <′, l′) are isomorphic if there is a bijective renaming function I : V → V′ satisfying l(v) = l′(I(v)) and v < w ⇔ I(v) <′ I(w). In figures, we only show the labels of nodes, and normally omit transitive arrows.

If an LPO lpo is of the form ({v}, ∅, l), then it is called a singleton LPO. A step-wise linear LPO is an LPO (V, <, l) where the relation co< is transitive. The maximal sets of independent nodes are called steps. The steps of a step-wise linear LPO are linearly ordered. Thus, step-wise linear LPOs can be identified with step sequences. A step-linearisation of an LPO lpo is a step-wise linear LPO lpo′ which is an extension of lpo. The set of series-parallel LPOs (sp-LPOs) is the smallest set of LPOs containing all singleton LPOs (over a set X) and being closed under sequential and parallel product.
For two LPOs lpo1 = (V1, <1, l1) and lpo2 = (V2, <2, l2), where V1 ∩ V2 = ∅ is assumed, their sequential product is defined by lpo1 ; lpo2 = (V1 ∪ V2, <1 ∪ <2 ∪ V1 × V2, l1 ∪ l2) and their parallel product is defined by lpo1 ∥ lpo2 = (V1 ∪ V2, <1 ∪ <2, l1 ∪ l2). For an LPO lpo we denote by SP(lpo) the set of all series-parallel extensions of lpo and by SPmin(lpo) the set of all minimal series-parallel extensions of lpo in SP(lpo). If lpo is an extension of lpo′, we write lpo ≤ lpo′.

A semiring is a quintuple S = (S, ⊕, ⊗, 0, 1), where (S, ⊕, 0) is a commutative monoid, (S, ⊗, 1) is a monoid with zero where 0 is the absorbing element, and ⊗ (the S-multiplication) distributes over ⊕ (the S-addition) from both sides. If ⊗ is commutative, then the semiring is called commutative. A bisemiring is a six-tuple S = (S, ⊕, ⊗, ⊙, 0, 1), where (S, ⊕, ⊗, 0, 1) is a semiring and (S, ⊕, ⊙, 0, 1) is a commutative semiring. The binary operation ⊙ on the set S is called the S-parallel multiplication. If ⊕ is idempotent, the bisemiring is called idempotent. A concurrent semiring is an idempotent bisemiring (S, ⊕, ⊗, ⊙, 0, 1) satisfying

∀a, b, c, d ∈ S : (a ⊙ b) ⊗ (c ⊙ d) ≤⊕ (a ⊗ c) ⊙ (b ⊗ d). (CS)

A weighted LPO (wLPO) over an alphabet A and a bisemiring S = (S, ⊕, ⊗, ⊙, 0, 1) is a quadruple (V, <, l, ν) such that (V, <, l) is an LPO over A and ν : V → S is an additional weight function. We use all notions introduced for LPOs also for wLPOs. The weight of sp-wLPOs is defined in the obvious way: the total weight is computed from the weights of the nodes by applying ⊗ to the sequential product and ⊙ to the parallel product of sub-wLPOs. This is well-defined, since the set of sp-wLPOs as well as the sub-structure (S, ⊗, ⊙) of a bisemiring (S, ⊕, ⊗, ⊙, 0, 1) form an sp-algebra, admitting an sp-algebra homomorphism from the set of sp-wLPOs into the bisemiring. For an sp-wLPO wlpo its weight is denoted by ω(wlpo).
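This homomorphic weight computation can be sketched recursively. The sketch below assumes a particular reading of the structure in figure 2 (CMD above cp, with the SRC/FILE/PNTs branch parallel to the DST/DIR/ATAED branch); in the concurrent semiring ([0, 1], max, ·, ·, 0, 1) used in the example, ⊗ and ⊙ are both the ordinary product.

```python
# sp-terms as nested tuples: ("seq", t1, t2), ("par", t1, t2), or a leaf weight.
# In the concurrent semiring ([0,1], max, ·, ·, 0, 1) the sequential (⊗)
# and the parallel (⊙) multiplication coincide with the ordinary product.
def weight(term):
    if isinstance(term, (int, float)):
        return term
    op, left, right = term
    w1, w2 = weight(left), weight(right)
    return w1 * w2  # ⊗ and ⊙ coincide in this semiring

# Assumed sp-structure of figure 2: CMD ; cp ; ((SRC;FILE;PNTs) ∥ (DST;DIR;ATAED))
src = ("seq", 1.00, ("seq", 0.50, 0.63))   # SRC ; FILE ; PNTs
dst = ("seq", 1.00, ("seq", 1.00, 0.01))   # DST ; DIR ; ATAED
run = ("seq", 1.00, ("seq", 0.81, ("par", src, dst)))
print(weight(run))
```

In a semiring where ⊗ and ⊙ differ, the `return` line would dispatch on `op`; here the single product suffices.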
The weight for general wLPOs is computed as the sum of the weights of all its series-parallel extensions. Condition (CS) ensures that less restrictive wLPOs yield bigger weights, so in the case of using concurrent semirings it suffices to consider only minimal series-parallel extensions.

Definition 1 (sp-Weight of wLPOs). Let wlpo = (V, <, l, ν) be a wLPO over a concurrent semiring. Then its sp-weight is defined by ω_sp(wlpo) = ⊕_{wlpo′ ∈ SPmin(wlpo)} ω(wlpo′).

Figure 3 shows on the left the semantic structure from our example. The weights are drawn from the concurrent semiring ([0, 1], max, ·, ·, 0, 1) – a trivial extension of the Viterbi semiring. The sp-weight is easy to compute since the structure is an sp-wLPO.

For wLPOs over the same bisemiring having isomorphic underlying LPOs we provide a new sum operation. The result is again a wLPO with an isomorphic underlying LPO, but all nodes carry the sum of the weights of their corresponding nodes.

Definition 2 (Sum of wLPOs). Let wlpo = (V, <, l, ν) and wlpo′ = (V′, <′, l′, ν′) be two wLPOs over the bisemiring S = (S, ⊕, ⊗, ⊙, 0, 1) having isomorphic underlying LPOs and let I : V → V′ be the corresponding bijective renaming function. Then their sum is defined by wlpo ⊕ wlpo′ = (V, <, l, ν ⊕ (ν′ ∘ I)), where the sum of the weight functions is declared pointwise.

Figure 3. Semantic structure as wLPO and sum of wLPOs

Figure 3 also shows a second wLPO, representing the same semantics, which gets summed up with the first one. The new weights equal the maximum of the original weights.

3.2 Translation of Structures

We use the concept of transducers to translate between LPOs by augmenting weighted transitions with input and output symbols.
A run is represented as a wLPO over the transitions, which can then be projected onto the input resp. output symbols. This subsection equates the states from [16,11] and extends the notions from [11] with weights. Additionally, definitions 5, 7 and 8 provide formalisations of some concepts from [16,11].

A place/transition Petri net (PT-net) is a 4-tuple N = (P, T, F, W), where P is a finite set of places, T is a finite set of transitions disjoint from P, F ⊆ (P × T) ∪ (T × P) is the flow relation and W : (P × T) ∪ (T × P) → N0 is a flow weight function satisfying W(x, y) > 0 ⇔ (x, y) ∈ F. A marking of a PT-net assigns to each place p ∈ P a number m(p) ∈ N0 of tokens, i.e. a marking is a multiset over P representing a distributed state. A marked PT-net is a PT-net N = (P, T, F, W) together with an initial marking m0. For (transition) steps τ over T we introduce the two multisets of places •τ(p) = Σ_{t∈T} τ(t)W(p, t) and τ•(p) = Σ_{t∈T} τ(t)W(t, p). A transition step τ can occur in m if m ≥ •τ. If τ occurs in m, the resulting marking m′ is defined by m′ = m − •τ + τ•. We write m —τ→ m′ to denote that τ can occur in m and that its occurrence leads to m′. A step execution in m is a finite step sequence τ1 . . . τn over T such that there are markings m1, . . . , mn with m —τ1→ m1 —τ2→ . . . —τn→ mn.

We use LPOs over T to represent single non-sequential runs of PT-nets, i.e. the labels of an LPO represent transition occurrences. For a marked PT-net N = (P, T, F, W, m0) an LPO lpo = (V, <, l) over T is an LPO-run if each step-linearisation of lpo is a step execution of N in m0. If an LPO-run lpo = (V, <, l) occurs in a marking m, the resulting marking m′ is defined by m′ = m − Σ_{v∈V} •l(v) + Σ_{v∈V} l(v)•. We denote the occurrence of an LPO-run lpo by m —lpo→ m′.

A Petri Net Transducer is a PT-net where each transition is augmented with an input and an output symbol and additionally carries a weight drawn from a bisemiring.

Definition 3 (Petri Net Transducer).
A Petri Net Transducer (PNT) over a bisemiring S = (S, ⊕, ⊗, ⊙, 0, 1) is defined as a tuple N = (P, T, F, W, pI, pF, Σ, σ, ∆, δ, ω), where

– (P, T, F, W, m0) with m0 = pI is a marked PT-net (called the underlying PT-net), pI ∈ P is the source place satisfying •pI = ∅ and pF ∈ P is the sink place satisfying pF• = ∅,
– Σ is a set of input symbols and σ : T → Σ ∪ {ε} is the input mapping,
– ∆ is a set of output symbols and δ : T → ∆ ∪ {ε} is the output mapping and
– ω : T → S is the weight function.

A wLPO wlpo = (V, <, l, ν) over T is a wLPO-run of N

– if the underlying LPO lpo = (V, <, l) is an LPO-run of N with pI —lpo→ pF and
– if ν(v) = ω(l(v)) holds for all v ∈ V.

We denote by wLPO(N) the set of all wLPO-runs of N. A PNT can be used to translate a partial language into another partial language, relating so-called input words to so-called output words. Input and output words are defined as LPOs (V, <, l) with a labelling function l : V → A ∪ {ε} for some input or output alphabet A. Such LPOs we call ε-LPOs. For each ε-LPO (V, <, l) we construct the corresponding ε-free LPO (W, <|W×W, l|W) with W = V \ l⁻¹(ε) by deleting ε-labelled nodes together with their adjacent edges. Since partial orders are transitive, this does not change the order between the remaining nodes.

Definition 4 (Input and Output Labels of Runs). Let N = (P, T, F, W, pI, pF, Σ, σ, ∆, δ, ω) be a PNT and let wlpo = (V, <, l, ν) ∈ wLPO(N). The input label of wlpo is the LPO σ(wlpo) corresponding to the ε-LPO (V, <, σ ∘ l). The output label of wlpo is the LPO δ(wlpo) corresponding to the ε-LPO (V, <, δ ∘ l). For LPOs u over Σ and v over ∆, we denote by wLPO(N, u) the subset of all wLPOs wlpo from wLPO(N) with input label σ(wlpo) = u, and by wLPO(N, u, v) the subset of all wLPOs from wLPO(N, u) with output label δ(wlpo) = v.

The input language of a PNT is the set of all input labels of its weighted LPO-runs. Its elements are called input words.
Output language and output words are defined analogously. A PNT assigns weights to all pairs of LPOs u over Σ and v over ∆ based on the weights of its wLPO-runs (cf. [16,13]). Concerning the semantics, only the input-output behaviour of PNTs is relevant. Since transitions may also have empty input and/or empty output, there are always (infinitely) many PNTs having the same semantics. For practical application, such PNTs are equivalent (cf. [16]).

For our example the PNT Nw generating the output word cp PNTs ATAED/ is shown in the upper part of figure 4 on the following page. Note that the symbols themselves are sequences of symbols and that the typo of the user was corrected. This was possible since the system had already done several translations of the original input sequence. First the sequence was translated into a set of sequences where every input symbol created alternatives according to the keyboard layout. If you type an S on a QWERTZ keyboard it could be that your intention was to type a D, since the two keys are adjacent. Figure 5 on the next page shows a PNT which realises this correction for the letter S. The next step was translating from the many sequences of characters into sequences of known words. The system could filter out many sequences since it knows only about some commands, files and folders. Since there was no folder named ATAES but there exists a folder ATAED, the shown sequence of words was generated. The PNT No in the lower part translates between the two words depicted below it. Since there is a file PNTs and also a folder with the same name, the weight for FILE is also less than 1.

Note that the input and output languages of a PNT N are extension closed, since wLPO(N) is extension closed. This allows No from figure 4 to also accept the two step-linearisations cp PNTs ATAED/ and cp ATAED/ PNTs as input. Indeed these would be translated into step-linearisations of the depicted output word with an additional edge between PNTs and ATAED.
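The character-level correction step can be sketched as follows. This is an illustration only: the neighbour set matches figure 5, but the weighting scheme (a fixed probability for the typed key and a uniform share of the remainder for its neighbours) is our assumption, not taken from the paper.

```python
# Weighted alternatives for a typed key, in the spirit of figure 5:
# the typed key keeps a high weight, adjacent keys share the remainder.
# Neighbour sets and the 0.88 mass are illustrative assumptions.
NEIGHBOURS = {"S": ["A", "W", "E", "D", "X", "Y"]}  # QWERTZ layout

def alternatives(key, p_hit=0.88):
    alts = [(key, p_hit)]
    rest = NEIGHBOURS.get(key, [])
    if rest:
        p = (1.0 - p_hit) / len(rest)
        alts += [(k, p) for k in rest]
    return alts

for sym, w in alternatives("S"):
    print(f"S:{sym}/{w:.2f}")
```

With six neighbours, the remainder 0.12 splits into 0.02 per neighbour, reproducing the arc weights S:S/0.88 and S:D/0.02 etc. shown in figure 5.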
We now introduce the central transducer composition operation of language composition. It allows putting several transducers in a row to translate from the input of the first one to the output of the last one, utilising the intermediate ones.

Figure 4. Two PNTs with input and output words

We consider a fixed concurrent semiring S = (S, ⊕, ⊗, ⊙, 0, 1) and a PNT Na over S with input alphabet Σa and output alphabet ∆a and a PNT Nb over S with input alphabet Σb = ∆a and output alphabet ∆b. We assume Na and Nb to be disjoint. We define a binary relation $ ⊆ Ta × Tb on the Cartesian product of the transitions from Na and Nb by ta $ tb :⇔ δa(ta) = σb(tb) and say that ta and tb build a composable pair. Any transitions ta ∈ Ta with empty output (δa(ta) = ε) and tb ∈ Tb with empty input (σb(tb) = ε) are called non-invasive. Any invasive transition which is not part of a composable pair is said to be futile. The construction corresponds to the parallel product of Na and Nb and merging each composable pair of transitions (ta, tb) to a new transition with input symbol σa(ta) and output symbol δb(tb), weight ωa(ta) ⊙ ωb(tb) and connections •ta + •tb and ta• + tb•. Moreover, we keep all non-invasive transitions of Na and Nb unchanged, and omit all other transitions of Na or Nb, i.e. all futile transitions. Figure 6 on the following page shows the result of the language composition of Nw and No from figure 4. The nodes filled with dots establish the parallel product, the greyed transitions are the result of merging transitions from composable pairs, and all white transitions are non-invasive ones from No. The greyed nodes also represent Nw inside the result.
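The merging step of this construction can be sketched as follows. This is our own simplification: transitions are reduced to (input, output, weight) triples, the net structure (places and arcs) is omitted, and ⊙ is the product of the example semiring.

```python
# Merging composable pairs (ta, tb) with δa(ta) = σb(tb): the new transition
# keeps ta's input and tb's output; its weight is ωa(ta) ⊙ ωb(tb), here the
# ordinary product. "" stands for ε.
def compose_transitions(Ta, Tb):
    merged = [(sa, db, wa * wb)
              for (sa, da, wa) in Ta
              for (sb, db, wb) in Tb
              if da == sb and da != ""]   # composable pair; ε-ε is non-invasive
    non_invasive = [t for t in Ta if t[1] == ""] + \
                   [t for t in Tb if t[0] == ""]
    return merged, non_invasive

# Nw emits words with ε input; No maps words to themselves and emits categories.
Ta = [("", "cp", 0.81), ("", "PNTs", 0.63)]
Tb = [("cp", "cp", 1.0), ("PNTs", "PNTs", 1.0), ("", "CMD", 1.0)]
merged, kept = compose_transitions(Ta, Tb)
assert ("", "cp", 0.81) in merged       # as in figure 6: ε:cp/0.81
assert ("", "CMD", 1.0) in kept         # non-invasive transition of No
```

Invasive transitions that find no partner (the futile ones) simply appear in neither of the two result lists, matching their omission in the construction.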
To provide a formal definition of language composition we set for i ∈ {a, b}

– for the construction of the parallel product of Na and Nb
  • Pk = {p◦I, p◦F} as the new source and sink places,
  • Tk = {t◦I, t◦F} as the splitting and joining transitions,
  • Fk = {(p◦I, t◦I), (t◦I, pIa), (t◦I, pIb), (pFa, t◦F), (pFb, t◦F), (t◦F, p◦F)} as the corresponding flow relation,
– for the merged transitions
  • T^m = $ using the composable pairs as new transitions,
  • πi : T^m → Ti as the projections onto the first resp. second component of a composable pair, where for any set X we also write πi to mean (idX, πi) or (πi, idX),
  • Fi^m = {(p, t) ∈ Pi × T^m | πi(p, t) ∈ Fi} ∪ {(t, p) ∈ T^m × Pi | πi(t, p) ∈ Fi} as the merged flow relation part from Ni,
  • Wi^m = Σ_{(p,t)∈Pi×T^m} Wi(πi(p, t)) πi(p, t) + Σ_{(t,p)∈T^m×Pi} Wi(πi(t, p)) πi(t, p) as the merged flow weight function part from Ni,
  • ω^m : T^m → S as the merged weight function with ω^m(t) = ωa(πa(t)) ⊙ ωb(πb(t)),
– for the keeping of non-invasive transitions
  • Ti^n as the non-invasive transitions from Ni,
  • Fi^n = Fi|(Pi×Ti^n)∪(Ti^n×Pi) as the non-invasive flow relation of Ni.

Figure 5. A PNT suggesting alternatives for a typed S

Figure 6. Language composition of Nw and No with output word

Definition 5 (Language Composition). Let Na = (Pa, Ta, Fa, Wa, pI, pF, Σa, σa, ∆a, δa, ωa) and Nb = (Pb, Tb, Fb, Wb, pI, pF, Σb, σb, ∆b, δb, ωb) with ∆a = Σb be two PNTs over the same concurrent semiring S = (S, ⊕, ⊗, ⊙, 0, 1).
Then, using the notations from above, the language composition Na ◦ Nb is the PNT N = (P, T, F, W, pI, pF, Σ, σ, ∆, δ, ω) over S, where

– P = Pk ∪ Pa ∪ Pb and T = Tk ∪ T^m ∪ Ta^n ∪ Tb^n,
– F = Fk ∪ Fa^m ∪ Fb^m ∪ Fa^n ∪ Fb^n,
– W|Fk ≡ 1, W|Fa^m ≡ Wa^m, W|Fb^m ≡ Wb^m, W|Fa^n ≡ Wa, W|Fb^n ≡ Wb,
– pI = p◦I and pF = p◦F,
– Σ = Σa and ∆ = ∆b,
– σ|Tk ≡ ε, σ|T^m ≡ σa ∘ πa, σ|Ta^n ≡ σa, σ|Tb^n ≡ ε,
– δ|Tk ≡ ε, δ|T^m ≡ δb ∘ πb, δ|Ta^n ≡ ε, δ|Tb^n ≡ δb,
– ω|Tk ≡ 1, ω|T^m ≡ ω^m, ω|Ta^n ≡ ωa and ω|Tb^n ≡ ωb.

Since input and output languages of PNTs are extension closed, it is possible that the operation of language composition propagates dependencies from one PNT to the other. Consider two PNTs Na and Nb as in the above definition, an output word u′ of Na, an input word u of Nb with u′ ≤ u, and the output word v into which Nb translates u. Then Nb also accepts u′ as input word and translates it into an output word v′ ≤ v. This effect can also be seen on the output word in figure 6, where the language composition took over the order between PNTs and ATAED/ from Nw whereas the corresponding transitions from No were unordered.

The problem is discussed in more detail in [11], where solutions based on separating input and output processing are proposed. All those solutions have in common that the processing of weights, i.e. the merging of transitions, no longer occurs where the output is produced. To overcome this we use another approach from [11], namely Hierarchical Petri Net Transducers. We extend them by augmenting the transitions with weights drawn from a bisemiring and introduce the additional concept of shared transitions. A Hierarchical Petri Net Transducer is a PNT together with a refinement operation to substitute transitions by other PNTs. This induces sets of transitions which can be used to eliminate some edges from the input and output labels of runs by restricting the order relation.
With this post processing transitions can be isolated from one another solving the problem of propagating unwanted order during language composition operation. Definition 6 (Hierarchical Petri Net Transducer). For a given PNT N0 = (P0 , T0 , F0 , W0 , pI , pF , Σ0 , σ0 , ∆0 , δ0 , ω0 ) over a bisemiring S = (S, ⊕, ⊗, , 0, 1) a hierarchical PNT (hPNT) over S is a 6-tuple H = (N0 , N , ρ, Fc ,Wc , Ts ), where – N0 is the initial PNT, – N = {N1 , . . . , Nk } is a family of refinement PNTs over S whereat N0 , N1 , . . . , Nk are pairwise disjoint, – ρ : T0 → {1, . . . , k} is a partial refinement function which is injective (ρ(t) = ρ(t 0 ) =⇒ t = t 0 ) and associates transitions from the initial PNT with PNTs from N . A transition t ∈ T0 , if t ∈ Dom(ρ) holds, is refined by the PNT Nρ(t) . Any transition t ∈ T0 for which t 6∈ Dom(ρ) holds is called simple. Any simple transition t must have empty input and output σ0 (t) = ε = δ0 (t) as well as neutral weight ω0 (t) = 1. S S – Fc ⊆ (P0 × ki=1 Ti ) ∪ ( ki=1 Ti × P0 ) is the crossing flow relation which allows to connect transitions from refinement PNTs to places from the initial PNT, – Wc is the corresponding crossing flow weight function analogous to PT-nets and – Ts ⊆ T0 is the set of shared transitions which allow to relax the isolation of transi- tions. The crossing components are initially empty and only needed for holding information when computing the language composition of a PNT and an hPNT where the result is again an hPNT (see definition 8). The same holds for the set of shared transitions. In figure 7 on the next page one can see the hPNT Ho where the transitions t1 , t2 , and t3 are refined by the PNTs N1 resp. N2 resp. N3 . The refinement operation is defined in an obvious way. Any non-simple transition is substituted with two new ε-transitions with neutral weight 1. 
One is called the down-going refinement transition and inherits only the incoming arcs, and the other is called the up-going refinement transition and inherits only the outgoing arcs. The down-going refinement transition is connected to the source place of the refinement PNT and the up-going refinement transition to its sink place. For the formal procedure of refining we refer to [11].

Figure 7. hPNT H_o, refinement PNTs N_i for transitions t_i (i = 1, 2, 3)

The result of refining is a PNT N called the interface of H, and the set of runs of N is by definition the set of runs of H. We use all other notations introduced for PNTs as well as definition 4 also for hPNTs by means of their interfaces. The hPNT H_o in figure 7 has the same input and output words as the PNT N_o from figure 4 since the interface of H_o is equivalent to N_o. But hierarchical PNTs have additional input and output labels where the order relation is restricted. Dependencies between different refinement PNTs are excluded except for shared transitions. Dependencies between the initial PNT and refinement PNTs other than by the refinement itself and through shared transitions are excluded as well. We focus on the output labels. Figure 8 shows an output word of H_o where the nodes are grouped to visualise the allowed connections. The narrower lines represent the refinement paths.

We use the notations from [11] where •T and T• are the sets of down-going resp. up-going refinement transitions and set T_r = •T ∪ T• to be the set of all refinement transitions. We use K = ρ(T_0) as the image of ρ and K_0 = K ∪ {0}. Then we have r = ⋃_{i∈K_0} l⁻¹(T_i ∪ T_r ∪ T_s) × l⁻¹(T_i ∪ T_r ∪ T_s) as the set of all allowed relations.

Definition 7 (Relaxed Output Label of Runs). Let H = (N_0, N, ρ, F_c, W_c, T_s) be an hPNT and let wlpo = (V, <, l, ν) ∈ wLPO(H).
The relaxed output label of wlpo is the LPO δ_r(wlpo) corresponding to the ε-LPO (V, (<|_r)⁺, δ ∘ l). For LPOs u over Σ and v over Δ, we denote by wLPO_{O,r}(H, u, v) the subset of all wLPOs wlpo from wLPO(H, u) with relaxed output label δ_r(wlpo) = v.

We consider a fixed concurrent semiring S and a PNT N_a over S with input alphabet Σ_a and output alphabet Δ_a, and an hPNT H over S with its interface N_b, input alphabet Σ_b = Δ_a, and output alphabet Δ_b. We assume N_a and N_b to be disjoint.

Figure 8. Output word of H_o

Using the notations introduced before definition 5 as well as the notations from above of definition 7, we additionally set

– the right partial language composition N_a |∘ N_b to be the PNT N = (P, T, F, W, p_I, p_F, Σ, σ, Δ, δ, ω) with
• P = P_b and T = T^m ∪ T_b^n,
• F = F_b^m ∪ F_b^n,
• W|_{F_b^m} ≡ W_b^m, W|_{F_b^n} ≡ W_b,
• p_I = p_{I_b} and p_F = p_{F_b},
• Σ = Σ_a and Δ = Δ_b,
• σ|_{T^m} ≡ σ_a ∘ π_a, σ|_{T_b^n} ≡ ε,
• δ|_{T^m} ≡ δ_b ∘ π_b, δ|_{T_b^n} ≡ δ_b,
• ω|_{T^m} ≡ ω^m, ω|_{T_b^n} ≡ ω_b and
– the remaining flow relation F(N_a |∘ N_b) = F_a^m as the non-invasive flow relation of N_a.

Definition 8 (Hierarchical Language Composition). Let N_a = (P_a, T_a, F_a, W_a, p_I, p_F, Σ_a, σ_a, Δ_a, δ_a, ω_a) be a PNT over the concurrent semiring S = (S, ⊕, ⊗, ⊙, 0, 1), H be an hPNT over S, and N_b be the interface of H with Δ_a = Σ_b. Then, using the notations from above, the hierarchical language composition N_a ∘ H is the hPNT H′ = (N_0′, N′, ρ′, F_c′, W_c′, T_s′) where

– N_0′ = N_a ∘ N_0,
– N′ = {N_1′, ..., N_k′} where N_i′ = N_a |∘ N_i for every i = 1, ..., k,
– ρ′ = ρ,
– F_c′ = F_c ∪ ⋃_{i=1,...,k} F(N_a |∘ N_i),
– W_c′|_{F_c} ≡ W_c, W_c′|_{F(N_a |∘ N_i)} ≡ W_a|_{F(N_a |∘ N_i)} for every i = 1, ..., k and
– T_s′ = T_s ∪ T_a^n.

The hierarchical language composition corresponds to the language composition of the PNT and the interface of the hPNT. Technically, the above definition only states where to put the new places and transitions. Consider now the language composition N_w ∘ H_o.
The output word of the result would be the same as for N_w ∘ N_o depicted in figure 6, but the relaxed output word would be the one from figure 8, where the order between PNTs and ATAED is restricted away.

4 Adding and Spreading the Weight

The introduced framework allows for weighted relations between input and output, but neither input nor output is itself a weighted object. After processing there is no way to see which part of a run had what impact on the result. Since we want a more detailed view of the weight, we want to relate an input to a weighted output. Therefore we propose an additional definition of weighted output labels of runs for PNTs where the weight function is recomputed during the deletion of ε-labelled nodes. For this recomputation a wLPO is segmented by its labels. This also influences the definition of equivalence for PNTs and results in technical restrictions for PNT-algorithms. Language composition of several PNTs can propagate the segments from one end to the other. The application to hPNTs leads to Segmenting Petri Net Transducers. These new ideas are introduced in two steps in the next subsections.

4.1 Segments of Weighted Structures

A wLPO is segmented by means of its labels. Every segment contains only one non-ε-labelled node, but ε-labelled nodes can be part of more than one segment. The weight of a single segment is computed as its sp-weight.
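The idea of an sp-weight can be pictured as evaluating a series-parallel term over the concurrent semiring (S, ⊕, ⊗, ⊙, 0, 1), combining node weights with the sequential multiplication along series composition and with the parallel multiplication along parallel composition. The following Python sketch is purely illustrative and not from the paper: the term encoding (`Leaf`, `Seq`, `Par`) and the default choice of ordinary multiplication for both operations are our assumptions.

```python
# Illustrative sketch (our encoding, not from the paper): evaluating a
# series-parallel term of weights. The parameters `seq` and `par` stand in
# for the sequential and parallel multiplications of a concurrent semiring.

from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:           # a single weighted node
    weight: float

@dataclass
class Seq:            # series composition of two sub-terms
    left: 'Term'
    right: 'Term'

@dataclass
class Par:            # parallel composition of two sub-terms
    left: 'Term'
    right: 'Term'

Term = Union[Leaf, Seq, Par]

def sp_weight(t: Term, seq=lambda a, b: a * b, par=lambda a, b: a * b):
    """Fold a series-parallel term bottom-up with the two multiplications."""
    if isinstance(t, Leaf):
        return t.weight
    op = seq if isinstance(t, Seq) else par
    return op(sp_weight(t.left, seq, par), sp_weight(t.right, seq, par))
```

For example, with both operations taken as ordinary multiplication, the term `Seq(Leaf(0.5), Par(Leaf(1.0), Leaf(0.8)))` evaluates to 0.4; passing different `seq` and `par` operations models other semirings.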
For a given ε-wLPO wlpo = (V, <, l, ν) and for any subset V′ ⊆ V of nodes we set

– V′_{≠ε} = {v ∈ V′ | l(v) ≠ ε} to be the set of all non-ε-labelled nodes from V′,
– V^ε_{V′} = V′ ∪ {v ∈ V \ V_{≠ε} | ∃v′ ∈ V′ : v < v′} to be the set of all nodes from V′ and all their ε-labelled predecessors,
– V_{V′} = V′ ∪ {v ∈ V | ∃v′ ∈ V′ : v < v′} to be the set of all nodes from V′ and all their predecessors,
– po_{V′} = (V′, <|_{V′×V′}) to be the corresponding partial order on V′,
– Min(V′) = Min(po_{V′}) to be the set of all minimal nodes from V′ and
– Max(V′) = Max(po_{V′}) to be the set of all maximal nodes from V′.

Definition 9 (Segments of ε-wLPOs). Let wlpo = (V, <, l, ν) be an ε-wLPO. Let N be a set. Then, using the notations from above, we compute the segments seg(wlpo) of wlpo with the following procedure:

1. Initialise N with all nodes from V.
2. Let w = (N, <|_{N×N}, l|_N, ν|_N) be the current ε-wLPO.
3. For every minimal non-ε-labelled node v ∈ Min(N_{≠ε}) put N^ε_{{v}} into seg(wlpo).
4. Set N to be N \ N_{Min(N_{≠ε})}.
5. If N_{≠ε} ≠ ∅ then continue with step 2.
6. Put N into seg(wlpo).

For a non-ε-labelled node v ∈ V_{≠ε} we say that the wLPO seg(v) = (S_v, <|_{S_v×S_v}, l|_{S_v}, ν|_{S_v}), where S_v is the one set from seg(wlpo) with Max(S_v) = {v}, is the segment of v. The set from step 6 is called the ε-segment of wlpo.

For an ε-wLPO (V, <, l, ν) we construct the corresponding ε-free wLPO (V_s,
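The segmentation procedure of Definition 9 can be sketched in a few lines of Python. This is an illustrative encoding, not from the paper: nodes are arbitrary hashable values, the strict partial order is assumed to be given as a transitive set of pairs (u, v) meaning u < v, and the empty string stands in for the label ε.

```python
# Illustrative sketch of the segmentation procedure of Definition 9.
# `nodes`: the node set V; `order`: transitive set of pairs (u, v) with u < v;
# `label`: dict mapping each node to its label, '' standing for eps.

def segments(nodes, order, label):
    segs = []
    remaining = set(nodes)
    while True:
        non_eps = {v for v in remaining if label[v] != ''}
        if not non_eps:
            break  # step 5 failed: no non-eps-labelled nodes left
        # step 3: minimal non-eps-labelled nodes of the current sub-order
        minimal = {v for v in non_eps
                   if not any((u, v) in order for u in non_eps)}
        for v in minimal:
            # the segment of v: v together with its eps-labelled predecessors
            segs.append({v} | {u for u in remaining
                               if label[u] == '' and (u, v) in order})
        # step 4: remove the minimal nodes together with all their predecessors
        drop = set(minimal)
        for v in minimal:
            drop |= {u for u in remaining if (u, v) in order}
        remaining -= drop
    segs.append(remaining)  # step 6: the (possibly empty) eps-segment
    return segs
```

For the order 1 < 2, 1 < 3 with l(1) = ε, l(2) = a, l(3) = b, both computed segments {1, 2} and {1, 3} contain the shared ε-labelled node 1, matching the remark above that ε-labelled nodes can belong to more than one segment; the ε-segment is empty here.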