Physical synthesis for CPLD architectures Sid-Ahmed Senouci Mentor Graphics, Grenoble, France Abstract—In this paper, we present a new synthesis feature namely, “Xor matching”, and the foldback product term synthesis for Complex Programmable Logic Devices (CPLD) architecture that is based on PAL-like macrocells. Our goal is to use the Xor gate and the foldback terms, (or shareable expander − Altera equivalent terminology [17]), available in each macrocell for minimizing the number of macrocells required to implement a circuit. We propose two innovative approaches: the first, is a very fast algorithm which always gives a match for a function onto the Xor gate of the CPLD device, when one exists; the second approach, is based on answering a fundamental problem: determine if a given foldback cluster can be assigned to a PAL block. A foldback cluster is defined as a set of functions and sub-functions that result in the same foldback which is created by the foldback decomposition algorithm. A suite of test cases (MCNC) were tested with device-fitting algorithms targeting the Atmel CPLD device (ATF15xx series) which implemented the corresponding hardware resources. I. Introduction The growth of the programmable logic market is a non-debatable success in the IC world over the last decade. CPLDs and FPGAs share this success in a balanced way and seem to be perfectly adapted to complementary needs. Most FPGAs have logic blocks based on look-up-tables (LUTs) , and some have multiplexer-based logic blocks. CPLDs are based on Programmable Array Logic style macrocells. Each macrocell can implement any Boolean function of up to k inputs and with no more than m product terms. Figure 2 shows the structure of an Atmel’s CPLD macrocell. It consists of Xor gate, foldback (or shareable expander − Altera equivalent terminology [17]), cascade (or parallel expander − Altera equivalent terminology [17]) and a flip-flop (Architecture of the macrocell is detailed in section II.1). An FPGA offers high logic density, high capacity and somewhat unpredictable delays, as a critical path may need to go through multiple levels of logic cells connected by programmable interconnections. On the other hand, CPLD offers lower logic density and predictable delays, as a critical path may need to go through fewer levels of macrocells. It is well known that fast decoders, finite state machines are the favorite application field of CPLD whilst FPGA products have become aggressive in offering efficient high complexity logic and in embedding RAM or hard blocks in their architectures. In the last decade, the synthesis problem for FPGAs has been widely addressed. Many LUT-based technology mapping, placement and floor-planning algorithms have been presented [5,6,7]. However, only some studies have been proposed on the synthesis problem for CPLDs, and consequently, very few mapping algorithms have been published [1,2,3,4]. [1] presented an approach that allows existing multi-level synthesis techniques to be adapted to implement circuits that are well-suited for CPLD architectures. The TEMPLA flow includes three phases: optimal tree mapping, heuristic partial collapsing−, and bin packing. The objective of this algorithm is to minimize the number of PLAs. In [2] PLAmap was developed to minimize the delay of mapped circuits. The PLAmap algorithm breaks the technology mapping problem into three phases: labeling, mapping and packing. [3] proposed k_m_flow as a technology mapper for single-output PLA-like macrocells, with both inputs and product term. The k_m_flow algorithm consists of two phases: labeling the network and mapping the network into k/m-macrocells. In [4] area/depth is the primary goal where the algorithm takes advantange of existing LUT mapper for single-output PLAs and packing for multiple-output PLAs. Most these algorithms are based on PLA architecture, without taking into account the macrocell architecture. This paper focuses on innovative techniques in the synthesis flow for complex programmable logic devices (CPLDs) especially for those containing foldbacks, which are like an inverted product term, and an Xor block such as (Pterm ⊕ Sum of Pterms). For Xor synthesis, the detected Xor templates become "don't touch" subfunctions similar to other macros such as arithmetic blocks generated by parameterized macrogenerators. Furthermore the foldback decomposition, which is a pseudo factorization, is performed if and only if a global cost evaluation in terms of the potential number of macrocells shows a gain. As foldback product terms can only be used within a Programmable Array Logic (PAL), the combinational logic using a foldback product term has to be put in the same PAL (Fig 2). Thus the foldback cluster is created. This cluster assignment is somwhat similar to the cone-based approach used for timing-driven floorplan [8], but here the clusters give priority to foldback gains. The aim of the Xor and foldback detection procedures is to reduce the number of macrocells required to implement a circuit. We propose a very fast Xor matching algorithm based on algebraic theory[9]. We have also successfully reduced the number of macrocells by applying the foldback decomposition. Finally, we address the assignment problem on the PAL block. The rest of the paper is organized as follows: after having introduced the CPLD features we are dealing with, the global flow is presented in section 2. In section 3, we give basic definitions, notations and problem formulation of 1 the Xor detection. Definitions, foldback decompostion algorithm and assignment problem are presented in section 4. Experimental results and conclusions are presented in sections 5 and 6. II. General approach II.1 CPLD features P A L B lo ck A Foldbacks Regional 16 M acro cells 16 (Inputs and Feedbacks Bus) 1 to 1 6 P A L B lo ck N I/O P in s Switch Matrix I/O P in s 40 16 Global Bus P A L B lo ck B P A L B lo ck N - 1 I/O P in s I/O P in s GOE GCK GCLEAR F ig 1 : A T F 1 5 x x F a m ily o f C P L D Atmel’s ATF15xx family of CPLDs [16] will be used as an example throughout this paper. The architecture is a classical CPLD structure partitioned into PAL blocks. Each PAL consists of a set of macrocells and a switch matrix block. All PAL blocks are linked together via the global bus (Fig 1), which contains all input and I/O pin signals as well as the buried feedback signals from all macrocells. The switch matrix in each PAL block receives as its inputs all signals from the global bus in their true and inverted form. These signals can be selected as inputs to the individual PAL blocks by the fitter software. Each input signal or feedback signal has access to a few multiplexers from switch matrix block, thus greatly limiting the available opportunities to route a global bus signal. Regional Foldback Bus FOLDBACK CASCADE MATRIX SWITCH 40 16 PT1 Product Term MUX PT2 XOR Gate PT3 D/T Q OR Gate ... PT4 CK PT5 CE Switch Matrix Global Bus Outputs Fig 2 : ATF15xx Macrocell Figure 2 illustrates the macrocell architecture. Each macrocell consists of product terms, a product term select multiplexer, OR gate, Xor gate, Cascade logic (or parallel expander), foldback bus, a flip-flop/latch and an output buffer. The product term select multiplexer (PTMUX) allocates five product terms as needed to the macrocell. Within a single macrocell, all the product terms can be routed to the OR gate. For the Cascade capability, the number of product terms can be extended from neighboring macrocells, up to 40 as long as space is available in the same PAL. An Xor gate exists at the output of the macrocell for each Boolean function. One input to the Xor gate 2 comes from the OR sum product terms (SOP). The other Xor gate input can be a product term (PT). Thus he the macrocell’s Xor gate allows efficient implementation of any Boolean function that can be expressed as (PT ⊕ SOP ) . Each macrocell can also generate a foldback product term. This signal goes to the regional bus and is available to all the macrocells in a given PAL block. The foldback is an inverse polarity of one of the macrocell’s product terms. Lastly, the flip-flop has very flexible data and control functions and can be configured for D and T operation. II.2 The global design flow The global design flow takes as input a netlist which is the output of a higher level compiler (VHDL, CUPL, Verilog or ABEL Compiler) as well as user constraints (Fig 3). Open Abel Format Flip-Flop Processing User constraints Minimization/Polarity Selection Mapping Xor Matching Foldback detection Cluster assignement Final Place/Route Fig 3: Global design flow After a classic polarity selection phase based on product term number minimization (ESPRESSO), a mapping step is called. Thereafter, the Xor matching and Foldback decomposition stages are considered. The first step detects Xor expressions that are of interest for CPLD synthesis, namely expressions having the following form (PT ⊕ SOP ) , where PT is a Product Term and SOP a Sum of Product terms Boolean expression. The second step is a Foldback product term detection/selection. It will be decided at this point if the use of a foldback product term appears to be suitable for the initial functions or subfunctions. These approaches will be explained in detail in this paper. The Xor matching and Foldback decomposition are performed if and only if a global cost evaluation in terms of potential number of macrocells shows a gain. After the foldback detection step, foldback clusters are created. As foldback product terms can only be used within a PAL, the logic utilizing a foldback product term has to be put in the same PAL. Unfortunately, the presence of foldback clusters may lead to a difficult fitting process. Nevertheless, it is interesting to use the foldback physical pattern and to perform minimization on logic as required. The assignment of foldback clusters consists of two phases: first, control and size limitation of the foldback clusters are processed. The second phase is to check if each cluster input is affected to one and only one input of the PAL block. This will lead to the final placement/routing that will not be considered at all in this paper. III. Xor Matching for CPLD architectures Xor operators commonly appear in high level descriptions and are preserved through the RTL synthesis path. It may happen, particularly in the CPLD world, that Xor expressions are flattened and one of the tasks of this paper is to reintroduce them in order to take full advantage of the available Xor gates of the CPLD device. This study is therefore completely different from the basic problem of expressing Boolean functions using the Xor operator. It is well known that a large amount of published literature exists on Reed-Muller expressions [15]. In this section we present a matching algorithm for the Xor gate of the CPLD device based on the necessary 3 conditions theory [9]. III.1 Problem formulation Definition: Consider the Xor_CPLD expressed in the following form PT ⊕ f , where PT is a product term of a macrocell and f is a sum of product terms of a macrocell (Fig 2). Example : Let F be an Xor Boolean function : F = ab ⊕ (cd + gh ) (i) After flattening and minimization , F can be expressed as : F = a cd + b cd + a gh + b gh + ab c g (ii) + ab c h + ab d g + ab d h The macrocell architecture is based on five product terms. In (i), use 1 macrocell to implement F, but in (ii), use 2 macrocells to implement F. After the Xor extraction, we reduce the number of macrocells required to implement F, and the gain is equal to one macrocell. The problem selected here as expressed above is rather a rewriting of the Boolean function equivalent to PT ⊕ f in which such a form has been hidden by a flattening process. Given a Boolean function, F is a sum of product terms expression. We assume that F is expressed in minimum support. In this section we try to use the new approach to detect if F can be implemented by an Xor_CPLD. F can be expressed as follows: F = Pt ⊕ g (1) Where g is a sum of product terms expression. The input variable supports of Pt and g are disjoint. This corresponds to the CPLD physical pattern. By applying this restriction, we obtain a much faster algorithm. The aim of the procedure presented in this subsection is to extract a product term Pt having a maximal number of literals called “Xor cofactors” from the Boolean function F. III.2 Necessary conditions verified by literals of a XOR cofactor Let F be a minimized sum of product terms expression. n F= ∑ mi where mi is the product term. (2) i =1 Let us suppose that F can be implemented by an Xor_CPLD, then F has a decomposition as shown in (1). Where Pt is Xor cofactor. The problem reduces to finding Pt, which is an Xor cofactor. Pt = x1 x 2 ...x p (3) F = ⎛⎜ x1 x 2 ...x p ⎞⎟ ⊕ g (4) ⎝ ⎠ Proposition III.2.1: If Pt is an Xor cofactor, then all the inputs of Pt have the same occurrence in a decomposition of F as in (2). ( ) Let xi , x j be two inputs of Pt and Occ( xi ), Occ xi be respectively the occurrence of the literals xi , xi in (2). ( ) An input occurrence be expressed as follows: SumOcc xi = Occ xi + Occ xi ( ) ( ) From (4) we have: F = ⎛⎜ xi x j ...x p ⎞⎟ g + ⎛⎜ xi x j ...x p ⎞⎟g (5) ⎝ ⎠ ⎝ ⎠ With g being independent of all inputs of Pt : k h g = ∑ p i and g = ∑ l i i =1 i =1 where p i and l i are product terms then : SumOcc ( x i ) = SumOcc( x j ) = k + h ⎧ ⎫ Proposition III.2.2: Xor cofactor Pt exists if, ⎨∀xi , xi ∈ Lt ( Pt ), F x = g ⎬ . ⎩ i ⎭ Where Lt ( Pt ) is a set of literals of Pt . We compute Fx or F according to this literal xi appears under a i xi direct or complemented form in Pt . We know g is independent of all inputs of Pt and using (4) we then have: Fxi = Fx j = ... = Fx p = g 4 III.3 Xor_CPLD matching algorithm The approach used here consists of two phases. If F = Pt ⊕ g , then obvious necessary conditions exist dealing with the occurrence of the literals of Pt. A first phase aims at detecting these necessary conditions of identifying the Pt literals. Having identified the variable candidates, a second phase based on ROBDDs (Reduced Ordered Binary Decision Diagram) verifies if F is equivalent to a Pt ⊕ g expression. III.3.1 Binary decision diagram We assume here that the reader is familiar with binary decision diagrams and all related numerous BDD packages [9][10]. Example1: The ROBDD templates of a ⊕ b (Fig 4) Example2: We begin the construction of the ROBDD following an order of variables which starts with the Pt variables. The ROBDD templates of Pt ⊕ g is shown in (Fig 5). X1 a 0 1 Inverter b Xp 0 1 Fig 4 ROBDD of g Fig 5 III.3.2 Algorithm Based on this theory, we describe the algorithm that checks whether F can be implemented by an Xor_CPLD gate. The Boolean function F is minimized once. Let S(F) be the set of inputs of F. We perform this check only if we need more than one macrocell to implement F. The matching steps are done as follows: ( ) 1. ∀x i ∈ S ( F ) , compute SumOcc xi and use SumOcc value to sort S(F) in classes. Each class is an Xor cofactor candidate. 2. Select Xor cofactor candidate as one with greater number of inputs, then we construct a ROBDD for this candidate using a variable ordering such that the inputs of the Xor cofactor candidate appear on the top of the ROBDD (Fig 5). 3. Check for each input x i of the Xor cofactor candidate if Fx = g . This check is performed i by verifying in the ROBDD template (Fig 5) if the node g is reached by only direct (complemented) from the nodes x i ,..., x p −1 and by direct edge and an inverted edge from the node x p . 4. To check if F is equivalent to a F = Pt ⊕ g expression. If so, report a match and quit. Otherwise return to step 2. IV. Foldback synthesis A foldback is a product term PT of a macrocell which is inverted and fed back into the PAL logic array for use by any or all of the macrocells in the same PAL block (Fig 2) (i.e this signal is available to all macrocells within the same PAL block). The foldback terms in each PAL can also generate additional fan-in sum terms with small additional delays. IV.1 Basic definitions: Definition 1: Cost of a Boolean expression Consider a Boolean function F expressed as a sum of product terms. The cost of the function F, denoted as Cost (F), is the number of product terms. 5 Definition 2: foldback candidate A foldback candidate is a Boolean expression having the form ( x1 + x 2 + x 3 + ... + x k ) where x i is an input variable or a subfunction (shared or not) under a direct or complemented form and denoted by the term Node FB which is always inverted. Definition 3: Cost of foldback The cost of a foldback subfunction selected as an admissible foldback candidate is 1. This means that it corresponds to the foldback physical pattern. We detect a foldback factor of two or more product terms of a set of functions. Example1: Let F be a minimized sum of the product terms expression. F = abde + abce + abe f + gh + hlm + np and Cost (F) = 6 yields a foldback factoring as follows: F = abe(d + c + f ) + gh + hlm + np , Node FB = (c d f ) and Cost ( Node FB ) = 1 then: F = abe Node FB + gh + hlm + np and Cost (F) = 4 After the foldback decomposition the gain is equal to 2 product terms. Example 2: The following case does not generate a foldback factor. F = abe + abcd = ab(e + cd ) In this case we do not have the foldback form since, the expression (e + cd ) is not a foldback factor. IV.2 Foldback detection algorithm The algorithm proceeds like a “pseudo” factorization with a filtering of all the factors which are not the foldback form ( x1 + x 2 + x 3 + ... + x k ) , where x i is an input variable or a subfunction (shared or not) under a direct or inverted form. We describe the foldback detection algorithm as follows: 1. Form cliques of functions by specifying number of literals in common. For each clique of functions. 2. For each function in clique, Create all foldback product terms for both onset and offset expressions and for each such foldback product term a. Determine if the identical foldback product term is derived from other product terms of this function or from product terms of other functions of the clique. b. Form an association with this foldback of each function of the clique which will factor into this foldback. When all foldback product terms of the clique are determined, 3. For each foldback in the clique, For each function in the clique, a. Determine if the function factors into that foldback, onset, offset, or both onset and offset. b. If the use of the foldback is indicated, determine whether or not a reduction in the number of product terms occurs by the use of the foldback for instance, if foldback factoring is designated for the onset specification, compare the onset implementation with the expander against the offset implementation without foldback to determine if use of the foldback results in a reduction in the number of product terms. c. If there is a reduction in the number of product terms through use of the foldback, accumulate the number of saved product terms for all functions of the clique. 4. Select that expander which yields the optimal savings in number of product terms in the clique a. Create a foldback node for that foldback. b. For each function in the clique which yields a product term reduction by use of that foldback, form the appropriate function implementation having that foldback node as a literal, both onset, offset when appropriate. c. Remove the designated expander from the list of clique foldbacks. d. Perform step 2 on newly created function implementations, that is, determine if the newly factored function, with foldback implementation, can be factored again into foldbacks which may have a common use in the clique, allowing factoring into multiple foldbacks. 6 5. Return to step 4. IV.3 Foldback cluster assignment The preparation of the fitting process needs to answer a basic question whether a given foldback cluster candidate can be assigned to the PAL block. IV.3.1 Definitions Definition 1: The PAL block architecture is defined as the number of inputs, denoted as I, the number of macrocells, denoted as M and the number of outputs, denoted as O. Definition 2: In a PAL block, each pin signal is routed to X Muxs. This means that there are X possibilities for routing each pin signal into a PAL block. Definition 3: In a PAL block, each feedback is routed to Y Muxs , providing Y possibilities for routing this signal in to the PAL block. Definition 4: The number of inputs of the PAL block is equal to the number of Muxs of the switch matrix block. Definition 5: The foldback cluster, denoted as CL FB , is defined as the set of functions and subfunctions that have the same foldback. Let IN (CL FB ) be the set of inputs or subfunction of CL FB , MC (CL FB ) as the number of macrocells and OUT (CL FB ) as the number of outputs. IV.3.2 Problem formulation The assignment problem for the foldback cluster can be formally stated as follows: given a cluster of the functions and the basic PAL block , the cluster is assigned to a PAL block. This problem is solved by checking two conditions: first that the PAL block capacity is respected. Secondly, that if each input (or output, or feedback) of cluster is affected by one and only one input (or output, or feedback) of the PAL block. Is means finding a free Mux of the switch matrix in the PAL block to route the input (or output, or feedback). We denote an input (or output, or feedback) of the foldback cluster as an element. Definition: Let ELT (CL FB ) be the set of elements of CL FB . Property: If a cluster of functions has been assigned to a PAL block, then all the functions of this cluster have the same foldback. Proposition: A CL FB is said to be assigned to the PAL block if and only if: (1). IN (CL FB ) ≤ I , MC (CL FB ) ≤ M and OUT (CL FB ) ≤ O (2). Each element of CL FB is assigned to one and only one Mux of the switch matrix in a PAL block. Proposition (1) is a quick and easy check. For proposition (2) we propose the problem formulation. We have a set of elements of CL FB and a set of Muxs of the PAL block. These are the two kinds of nodes in our bipartite graph. We know which Mux can be routed with elements of flodback cluster (Definition 2 and Definition 3). This defines the edges of our bipartite graph. Our aim in the maximum cardinality matching problem is assigning elements to Muxs in such a way that as many Muxs as possible are used by an element that can handle it. One element can be assigned to at most one Mux and we can assign at most one element to one Mux. At this point, the problem can be expressed more naturally in graph theory terms. A bipartite graph G = (U, V, E) with vertex sets U , V and edge set E, where U is a set of inputs of CL FB , V is a set of Mux of switch matrix in PAL block and E = {(u , v) u ∈ U, v ∈ V} . If (u, v) is an edge, i.e, (u,v) ∈ E, then there exists a possibility of routing the vertex u by the vertex v (Definition 2 and Definition 3). Obviously, the problem reduces to a classical bipartite matching problem and the solution consists in performing a maximum matching algorithm in a graph G [10,11]. In conclusion, the foldback cluster can be assigned to the PAL block if there exists a maximum cardinality of a matching in a graph G equal to ELT (CL FB ) . V. Experimental results The techniques presented above have been implemented in C language and incorporated in new Atmel EDA tools. To assess the results produced by Atmel fitters, these were compared to Altera’s MAX+PLUS II tool (version 9.5). Table 1 shows the analysis of the results for MCNC benchmark circuits. For each benchmark, 2 indications are given. For Atmel, the first one gives the number of macrocells (NberMC) and the second gives the number of foldbacks (NberFB). The results of Altera are obtained by trying two different synthesis options with optimization for area (index 0) : Normal and Fast. NberLC values and NberSE values indicate respectively the number of logic cells and the number of the shareable expanders. For comparison we used the ATF15xx device from Atmel [16] and the MAX7000 device from Altera [17]. The two devices have the same logical capacity. Then, a logic cell (LC) in a MAX7000 is basically equivalent to a macrocell (MC) in a ATF15xx (Fig 2). 7 ATMEL ALTERA Bench synthesis Normal synthesis Fast synthesis NberMC NberFB NberLC NberSE Gain(%) NberLC NberSE Gain(%) 5xp1 12 9 16 9 33 15 7 25 alu4 81 30 141 29 74 149 26 84 apex3 142 34 154 43 8 168 41 18 apex4 207 36 248 23 20 250 21 21 clip 16 26 22 3 37 19 10 19 cps 163 44 175 88 7 157 62 -3 duke2 45 25 52 28 16 51 10 13 ex5 64 18 63 23 -1,5 67 8 4 misex3 96 30 225 77 134 220 54 129 rd84 26 17 32 06 23 32 18 23 seq 173 36 266 122 54 237 67 37 t481 7 6 6 6 -14 17 5 143 table3 104 28 139 102 34 127 25 22 table5 90 23 125 77 39 118 34 31 vg2 10 13 16 4 60 16 0 60 Total avrage Gain 34% 39% Table 1: Comparison with Altera tool On average, when the circuits in Table1 are fitted using the Altera tool, they require 34% to 39% more logic cells than the Atmel fitters targeted to the ATF15xx. VI. Conclusion The overall design flow for a CPLD is organized around processing for innovative features. The Xor synthesis based on algebraic matching and foldback processing creates initial foldback clusters. Once the clusters are defined, the clustering step explores a bipartite matching method of cluster assignment to PALs. This flow has been fully implemented and validated on the ATMEL flow. In conclusion, it has been shown during the last two decades that successful topic synthesis is a combination of fundamental successful achievements (ROBDD ...) and device-specific features requiring permanently innovative heuristics. VII. References [1] J. H. Anderson, S. D. Brown, “Technology Mapping for Large Complex PLDs, ” in Proc. 35 th ACM/IEEE Design Automation Conference 1998, pp 698-703. [2] D. Chen, J. Cong, M. Ercegovac and Z. Huang, “Performance-Driven Mapping for CPLD Architectures, ” IEEE Trans. on Computer-Aided Design, oct. 2003,Vol. 22, No. 10, pp. 1424-1431. [3] J. Cong, H. Huang, and X. Yuan, “Technology mapping for k/m-macrocell based FPGA’s,” in Proc. ACM/SIGDA Int. Symp. Field Programmable Gate Arrays, San Jose, CA, Feb 2000, pp. 51-59. [4] S.L. Chen, T.T Hwang and C.L Liu “A Technology Mapping Algorithm for CPLD Arcitectures, ” in Proc. IEEE. Int. Conf. on Fiel-Programmable Technology, Hong Kong, Dec. 2002, pp. 204-210. [5] J. Cong and Y. Ding, “FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs, ” IEEE Trans. On Computer-Aided Design, Jan. 1994, Vol. 13, No. 1, pp. 1-12. [6] K. C. Chen, J. Cong, Y. Ding, A. Kahng, and P. Trajmar, “DAG-Map: Graph-based FPGA technology mapping for delay optimization,” IEEE Design Test Comput., pp. 7–20, Sept.1992. [7] H.Eisenmann and F.Johannes, “Generic Global Placement and Floorplanning,” in Proc.35 th ACM/IEEE Design Automation Conference 1998. [8] S.A Senouci, A. Amoura, H. Krupnova and G. Saucier, “Timing-Driven Floorplanning on Programmable Hierarchical Targets,” in Proc. ACM/SIGDA Int. Symp. Field Programmable Gate Arrays, San Jose, CA, 1998, pp. 85-92. 8 [9] R. Murgai, B.K. Brayton, A.S Vincentelli, “An improved synthesis algorithm for multiplexor-based PGA’s,” in Proc.29th ACM/IEEE DAC 1992, pp 380-386. [10] H. Alt, N. Blum, K. Mehlhorn, M. Paul, “Computing a maximum cardinality matching in a bipartite graph in time O(n1.5(m log n)0.5) ,” Information Processing Letters, Vol. 37, No. 4, 237-240, 1991 [11] N. Sherwani, Algorithms for VLSI Physical Design Automation, Third edition. 1999 by Kluwer Academic Publishers. [12] D. Kania, “A technology mapping algorithm for PAL-based devices using multi-output function graphs,” in Proc. 26th Euromicro Conf., Sept. 2000, pp. 146–153. [13] V. Solovjev and M. Chyzy, “The Universal Algorithm for Fitting Targeted to Complex Programmable Logic Devices,” in Proc. 25th Euromicro Conf., Sept. 1999, pp. 286–289. [14] S. Krishnamoorthy and R. Tessier , “Technology Mapping Algorithms for Hybrid FPGAS Containing Lookup Tables and PLAs, ”IEEE Trans. on Computer-Aided Design, may. 2003,Vol. 22, No. 5, pp. 545-559. [15] V. Ciriani, “Logic Minimization using Exclusive OR Gates ,” in Proc 38st ACM/IEEE Design Automation Conference 2001. [16] The ATF15xx Family Data Sheet, Atmel Corporation, 2000. [17] MAX 7000B Programmable Logic Device Family, the Altera Data Book, Altera Corporation, 2000. 9