AutoModelGen: A Generic Data Level Implementation of ModelGen Andrew Smith and Peter McBrien Dept. of Computing, Imperial College London, Exhibition Road, London SW7 2AZ Abstract. The model management operator ModelGen translates a sche- ma expressed in one modelling language into an equivalent schema ex- pressed in another modelling language, and in addition produces a map- ping between those two schemas. AutoModelGen is a generic data level implementation of ModelGen that meets these desiderata. Our approach is distinctive in that (i) it takes a generic approach that can be applied to any modelling language, and (ii) it does not rely on knowing the mod- elling language in which the source schema is expressed in. Key words: ModelGen, Model Management, Data Transformation, Data Integration, Meta Modelling 1 Introduction ModelGen is a model management [1] operator that translates a schema in a source data modelling language (DML), for example XML Schema, into an equivalent schema in a target DML, for example SQL, and also generates a mapping between the two schemas. To date, no implementation of ModelGen generates both a target schema and a mapping between the source and target schemas [2]. In this demonstration we present an implementation of ModelGen that automatically creates a data level mapping that describes how instances of the source schema should be translated [3]. Further distinguishing features of our approach are that (1) the translations are made on a Universal Meta Model (UMM) that has previously been shown to able to represent schemas from a large number of data modelling languages, and (2) the mappings created are bidirectional i.e. we also create a mapping from the target to the source schema. Fig. 1 gives an overview of our approach. In step (1) the source schema Ss is transformed into an equivalent schema, Shdm−s expressed in the UMM. In step (2), a series of information preserving [4] transformations are applied to Shdm−s to transform it into Shdm−t , that matches the structure of a schema in the target DML. In step (3) the constructs in Shdm−t are transformed into their equivalents in the target DML to create St . We will first discuss the overall architecture of our system and then discuss the details of the two algorithms used in step (2). 66 Proceedings of CAiSE’08 Forum Fig. 1. Overview of the approach taken 2 Architecture AutoModelGen is a tool that creates schemas and both-as-view (BAV) transformations [5] in the AutoMed data integration system [6]. AutoMed allows for schemas to be stored in both the native modelling language of a data source (eg XML or SQL/relational) and in AutoMed’s UMM called the hypergraph data model (HDM) [7]. The HDM uses three modellings constructs (nodes, edges and constraints) to represent the constructs of a high level DML [8]. HDM nodes and edges have associated data values (called their extent). Constraints place restrictions on the data values that may appear in the extent. Each variant of a high level DML construct has a particular representation in the HDM. For example, the set of HDM constraints generated by a nullable SQL column will be different those generated by a not null column. This is important when it comes to identifying whether a group of HDM constructs matches a particular construct in the target DML. A BAV information preserving [4] mapping is made up of a sequence of trans- formations called a pathway, where each transformation either adds, deletes or renames a single schema object (such as a single SQL column, SQL primary key definition, XML element, etc), thereby incrementally generating a new schema from an old schema. The extent of the schema object being added or deleted is defined as a query on the extents of the existing schema objects. BAV transformations can be grouped into information preserving compos- ite transformations (CT), that act as templates of a fragment of a path- way, describing common patterns of transformation steps. For example the CT id node expand is useful when the target DML has explicit keys (such as a key attribute in ER or SQL models) but the source model has implicit keys (such as in XML Schema). 3 Algorithms AutoMatch inspects a given HDM schema Shdm−x and determines which of the nodes, edges and constraints match a construct in the target DML. AutoTransform searches for a schema in which all the HDM schema objects match the structure of the target DML by repeatedly applying CTs to the schema objects in Shdm−x that AutoMatch identifies as not matching constructs in the Proceedings of CAiSE’08 Forum 67 target DML. The set of possible schemas created in this way is called the world space [9] of the problem. It can be represented as a graph whose nodes are indi- vidual HDM schemas and whose edges are the CTs needed to get from one node in the graph to the next. To limit the number of possible CTs that have to be performed at each node of the world space graph, CTs must satisfy certain pre- conditions before they can be executed. In our algorithm the preconditions rely on the structure of the HDM schema, in particular the constraints, surrounding the schema object that the CT is to be applied to. Fig. 2. An example world space graph The world space graph for the example in the demonstration is shown in Fig. 2. Each node is labelled with a schema name (S, S0 , . . .) and a list of the unidentified schema objects e1,e2,. . . in that schema, or the word Solution. All the constructs in a Solution node match those of the target model. Above each node is a list of CTs that meet the preconditions for the unidentified schema objects in that schema. Those CTs that meet the preconditions most closely are sorted to the top of the list and executed first. AutoTransform performs a depth first search on the world space graph starting from the initial state, by executing the CT at the head of the list, until a solution or a dead end is reached. If a dead end is reached the algorithm back tracks to the last node in the world space graph where an untried CT/edge combination exists, and executes the next CT in the list for that node. If all the edges on all the nodes have been tried without finding Solution then AutoTransform has failed. 4 Execution of the Tool The current prototype of the tool is capable of translating between schemas represented in the XML, ER and SQL DMLs, and of materialising the data instances of a schema in one DML as instances of a second DML. In a typical execution of the tool the following steps are performed: 1. A Source schema is imported into AutoMed, and then translated into the HDM. 2. The AutoMatch and AutoTransform algorithms are run on the newly generated HDM schema, hdm, to generate a new schema hdm0 , where the hdm0 schema is one that matches the structure of the target DML. 68 Proceedings of CAiSE’08 Forum 3. The hdm0 schema is translated into a schema in target DML. 4. This target schema along with its data is then materialised. Fig. 3. Equivalent ER, SQL and XML Schemas created by AutoModelGen Since the result of the tool’s output is a set of schemas and BAV mappings held in AutoMed, the standard AutoMedtoolkit may be used to view results of the tool’s execution. Fig. 3 shows a screen shot from the AutoMed GUI featuring an ER schema and then in an anti-clockwise direction, the SQL and XML equivalents of the schema generated automatically by AutoModelGen. References 1. Bernstein, P.A., Halevy, A.Y., Pottinger, R.: A vision of management of complex models. SIGMOD Record 29(4) (2000) 55–63 2. Bernstein, P.A., Melnik, S.: Model management 2.0: manipulating richer mappings. In: SIGMOD Conference. (2007) 1–12 3. Smith, A., McBrien, P.: A generic data level implementation of modelgen. In: BNCOD. (2008) To appear 4. Hull, R.: Relative information capacity of simple relational database schemata. SIAM J. Comput. 15(3) (1986) 856–886 5. McBrien, P., Poulovassilis, A.: Data integration by bi-directional schema transfor- mation rules. In: ICDE. (2003) 227–238 6. M. Boyd, S. Kittivoravitkul, C. Lazanitis, P.J. McBrien and N. Rizopoulos: Au- toMed: A BAV Data Integration System for Heterogeneous Data Sources. In: CAiSE04. Volume 3084 of LNCS., Springer Verlag (2004) 82–97 7. McBrien, P., Poulovassilis, A.: A general formal framework for schema transforma- tion. In: Data and Knowledge Engineering. Volume 28. (1998) 47–71 8. Boyd, M., McBrien, P.: Comparing and transforming between data models via an intermediate hypergraph data model. J. Data Semantics IV (2005) 69–109 9. Weld, D.S.: An introduction to least commitment planning. AI Magazine 15(4) (1994) 27–61