Sketches, Views and Pattern-Based Reasoning Ralph L. Wojtowicz Baker Mountain Research Corporation (ralphw@bakermountain.org) Shepherd University (rwojtowi@shepherd.edu) Abstract—The mathematical theory of sketches provides a aligning knowledge models that differ due to simple renaming graphical framework for describing and relating knowledge and more complex reformulations of concepts. representations and their models. Maps between sketches can extract domain-specific context from a sketch, express knowledge Examples of knowledge representations include relational dynamics and be used to manage representations created for algebra and its implementation in database languages (such distinct applications or by different analysts. There are precise as SQL), entity-relation-attribute diagrams, ontologies, data connections between classes of sketches and fragments of first- specifications and sketches. Our interest in the latter results order, infinitary predicate logic. EA sketches are a particular class from its graphical nature, deep connections between sketch that is related to entity-attribute-relation diagrams and can be theory and logic and a rich notion of contextual view of a implemented using features available in many relational database sketch. Sketch theory has proved to be a valuable tool in systems. In this paper we illustrate sketch theory through devel- opment of a simple human terrain model. We apply the theory to mathematical logic and the theory of computer programming an example of aligning sketch-based knowledge representations languages. Its relationship to other semantic technologies, and compare the approach to one using OWL/RDF. We describe therefore, warrants further exploration. the computational infrastructure that is available for working with sketches and outline research challenges. The purpose of this paper is to introduce the sketch data model to researchers and practitioners of other semantic-based technologies and to describe a program for its application. We I. I NTRODUCTION seek to give an overview of the theory through discussion and examples without focusing on the mathematical details. We use the term knowledge representation to refer to a The following themes emerge. (1) An ontology or sketch mathematical model of the concepts that we use to understand, is a presentation of knowledge. Different presentations of reason about and navigate our environment. It evolves in the same knowledge are possible. The theory of a sketch response to new experiences, concept formulation and the is the formal mathematical object that such presentations mission at hand. Ownership, membership, amicability, people generate. (2) Alignment of distinct knowledge representations and plans are examples of interrelated entities in this network. and derivation of views of particular ones are more appro- We use decision space to refer to a sets of individuals and priately formulated using theories than presentations. (3) The relationships that our knowledge representation organizes. This sketch model emphasizes the distinction between a knowledge space is more dynamic, densely populated and uncertain than representation and its models. Instances, incompleteness and the knowledge representation. The concept of ownership, for uncertainty may be more appropriately incorporated in models example, encompasses a list of ephemeral connections between rather than in knowledge representations themselves. (4) The individuals and their possessions. Our understanding of owner- software infrastructure available for working with sketches ship persists while instances of this relationship come and go. currently is meager compared to that which has been developed Moreover, different people can share a common understanding around other semantic technologies such as OWL/RDF. of ownership even if the instances of this relationship that they observe have little or no overlap. They apply the same knowledge representation to distinct models. A. Concept of Operations Different knowledge representations may characterize the Figure 1 illustrates an example concept of operations that same concept in distinct ways. Renaming the concept ‘own- shows how the sketch data model might be used in a decision ership’ as Eigentum or propriété, for example, results in a support system. Later in this paper we discuss details of partic- new presentation of the concept. A complex idea may, more ular aspects of the data pipeline. Data from distributed sources generally, be decomposed into distinct, simpler concepts by is marshaled into local data models S that are expressed as different people. Finally, as we build a knowledge model to sketches. The local sketches are aligned using sketch maps into organize our observations of a greater range of phenomena, a common parent T called a theory. T is the sketch generated we frequently derive and extract parts of it that are suitable by the local sketches taking into account potential overlaps. for context-based reasoning about particular situations. Parent sketches evolve over time as local ones are modified and new data sources come online. Within a particular mission A mathematical formulation of knowledge should dis- context, a view V of the system knowledge representation tinguish between the knowledge representation and decision T is extracted. The problem of mathematically characterizing space models. It should support evolution of the former and the context from event and decision histories is a challenging the dynamics and uncertainty that are characteristics of the one and is an active area of research for applications such latter. The mathematical framework should support derivation as Internet search. A view is then a sketch equipped with a of context-specific views of a knowledge representation and sketch map into the current parent theory. Models of sketches a decision space. Finally, it should provide mechanisms for (including views) are distinct from the sketches themselves. STIDS 2013 Proceedings Page 117 They include the observed instances that populate the classes sketch of human terrain knowledge. The vertices Person, For- and relations that symbols in the sketch represent. Uncertain- eign, Coalition, Resident, Village and TribalElement represent ties and partial information are accounted for in the model, not classes of entities. Individuals who populate these classes in the sketch. Data artifacts relevant to a view are analyzed to are typically not represented in G (although it is possible estimate statistical metrics for potential future states. Figure 1 to include them explicitly). They instead occur in semantic is conceptual and necessarily incomplete. It does not show, for models of the sketch and may be realized as, for example, example, the roles of user interface components, visualization rows in database tables. The SeenIn vertex represents a relation tools and query and reasoning engines, nor of event and between the two classes to which it has edges. It represents the decision history archives which would be built into a real situation in which foreigners may be observed in one or more command decision system. local villages. Instances of this relation, like individuals who populate the classes, occur in models of the sketch instead of data data being represented in the sketch itself. source source Person Knowledge S1 S2 representations: sketches has TribalElement Resident theories T1 T2 ··· Foreign settled by Coalition context liv es in SeenIn V1 V2 ··· view based in Village M1 M2 M3 ··· models place Fig. 2. Part of a graph representing human terrain knowledge potential futures Intuitively, the edges of G represent functions. As we dis- cuss below, however, edges may model incomplete or uncertain cost/reward information. We intend the edge from Resident to Village, for volatility example, to model a situation in which each resident of an area of interest is associated with a unique home village. Each Decision spaces: sketch models instance of the class represented by the Coalition vertex is Fig. 1. Concept of operations for the sketch data model to be based in a specified village. Moreover, each village is settled by a unique tribal element. The two edges from SeenIn represent the process of identifying a foreigner and a village in which he or she was observed. Multiple or no observations B. Historical Background of a particular individual are possible. Note that in a sketch, relations (properties) are modeled by vertices rather than edges Category theory is a mathematical field introduced by as is the idiom in OWL/RDF. A relation vertex, however, is the Eilenberg and Mac Lane in the 1940s to manage transforma- domain of edges that specify the types of entities that it links. tions between certain geometric and algebraic structures. It saw That is, the types of the variables that occur as the domain and explosive growth after Kan discovered the unifying concept range of (binary) relations must be specified. of adjoints in 1958 [22]. During subsequent decades it has been applied across diverse areas of computer science and Various features of the human terrain that we seek to mathematics including statistics [9], linguistics [8], dynamic represent are not captured by the graph alone. We express these systems, semantics of programming languages [3], topology using extra structures called diagrams, cones and cocones. In and, in particular, logic [20] where it provides a non-set- Section II-A below we give a general discussion of the way in theoretic foundation for mathematics. The theory of sketches which these constraints are specified using graph maps. In this is a subdomain of category theory developed by C. Ehresmann paper we describe examples of how these concepts are used in 1968 [11]. It was almost exclusively a tool of the French but do not define them precisely. For details, see [2]. school of category-theorists until publication of [2], [3]. A category is a collection of objects (e.g., sets, probability spaces The triangle involving Resident, Village and TribalElement or vector spaces) and maps between them (e.g., functions, is an example of a diagram. It expresses the intuition that the stochastic matrices or linear transformations). In this paper tribal element of a resident R should coincide with the tribal we use categories to construct models of sketches. Sketches element that has settled the village in which R lives. This themselves form a category having rich structure [13]. semantics is imposed on models of the sketch by including an appropriate diagram in the sketch constraints and by the mathematical definition of sketch model. The constraint can II. S KETCHES , M ODELS AND M APS be implemented, for example, using database triggers if the instances are stored in database tables. A sketch is a graph-based knowledge representation. It consists of an underlying directed graph G together with The Foreign, Coalition and Resident classes are to be extra structures that impose semantic constraints on models. construed as subclasses of Person. Again, this intent is not Figure 2 shows part of the underlying graph of a simple captured by the graph alone. We express subtype relations by STIDS 2013 Proceedings Page 118 including particular cone constraints in our formulation of the B. Semantic Models of Sketches sketch. One such cone would be included for each of the three Individuals who populate the classes of a sketch are not subtypes that occurs in Figure 2. Cones, like diagrams and typically represented in the sketch itself. They are elements cocones, impose mathematical requirements on models. of models of the sketch. As we discuss below, this framework Finally, we may intend the classes Foreign, Coalition and clarifies the distinction between the syntax of a knowledge Resident to be mutually exclusive and to exhaust the possible representation and its semantics. We can use this formulation classifications of Person instances. This feature is not captured to introduce partiality (i.e., missing data) and uncertainty into by the graph but can be included in the sketch using three cones models rather than requiring these features to be part of the (to assert the subtype constraints) and a cocone to assert the syntax. First, however, we describe deterministic, set-based disjoint union constraint. models. A (set-based) model of a graph G is an assignment of a set M (v) to each vertex v of G and a function M (e) : M (A) → M (B) to each edge e : A → B of G. There are no A. Sketch Maps further restrictions on models of a graph. A map H → G from a graph H to a graph G is a pair A model of a sketch S is a model of its underlying graph of functions that assigns a G vertex to each H vertex and a G that satisfies the restrictions that are represented by the G edge to each H edge in a way that respects the source constraints (diagrams, cones and cocones) of S. In this paper and target information for edges in the two graphs. Graph we seek to give an overview of the theory and do not give a maps play important roles in defining and applying sketches. precise definition of these constraints or their semantics. For First, each of the three types of semantic constraints (diagrams, details, see [2]. Figure 3 shows a model of a fragment of the cones and cocones) is defined as a type of graph map from a human terrain sketch that was shown in Figure 2. The Resident base graph B to the underlying graph G of the sketch. and TribalElement vertices are interpreted as sets of instances. The edge labeled ‘has’ is interpreted as a function between B !G the sets. To constitute a model of the sketch, the functional interpretations of the edges lives in and settled by must be The three classes of constraints are distinguished by the shapes consistent with that of ‘has’. of their base graphs. Second, maps between sketches are defined to be maps between the underlying graphs that preserve Amina Dhulbahante the semantic constraints. To illustrate this idea, observe that a Faysal Isaaq graph map Bashir ! G′ Said Darod G between the underlying graphs of two sketches S and S ′ M (Resident) M (TribalElement) M (has) converts each S constraint B −→ G into a graph map Fig. 3. Functional model of a fragment of the sketch shown in Figure 2. B !G ! G′ via composition of graph maps. If G → G′ is a sketch map, By varying the semantic category in which sketch models then this composite is also an S ′ constraint. Maps between take their values, we may represent lack of information and sketches give a rigorous, general framework for addressing uncertainty. The edges of the underlying graph may, for knowledge model dynamics, alignment and views. For exam- example, represent partial functions rather than total functions. ple, if a knowledge model S ′ subsumes another model S, we Recall that a partial function from a set X to a set Y is a can express this fact using a sketch map. function X ′ → Y for some subset of X and that composition g ◦ f of partial functions is associative (like composition of S ! S′ total functions) and is defined by further restricting the domain of f . Figure 4 shows a partial function model of a fragment of Not every sketch map expresses a parent-child relationship, the human terrain sketch that was shown in Figure 2. In this however. Those that do are called monomorphisms and satisfy example, the tribal element membership of Faysal is unknown. a condition that generalizes the notion of a one-to-one function. A sketch map can, alternatively, merge distinct vertices or Amina edges to eliminate redundancy such as equivalent classes that Dhulbahante Faysal have been given distinct names. Isaaq Bashir Said Darod Alignment of intersecting sketches S1 and S2 in a common parent S is expressed by the following diagram of sketch maps M (Resident) M (TribalElement) M (has) !!! S """"" " #" !!! Fig. 4. Partial functional model of a fragment of the Figure 2 sketch. S1 #""" " S2 """ !!! !!! S0 In Figure 5 we illustrate a probabilistic model of the edge where S0 is a sketch representing the intersection of the two ‘has’ that occurred in Figure 2. In this model, each point of knowledge models. We discuss an example in Section II-E. the source object (which for ‘has’ is the set M (Resident)) STIDS 2013 Proceedings Page 119 is mapped to a probability function on the target object (the The definition of map between models requires the two paths set M (TribalElement) in this case). That is, in this semantic to define the same function. That is, the village of a resident category, edges are interpreted as stochastic matrices (all who occurs in both models should be the same in both models. entries are non-negative and columns sum to 1). Composition Of course, not every map of models represents an extension or is matrix multiplication. subsumption relationship. As with alignment of sketches, we can express alignments of models using maps. For example, Amina 0.8 0.2 Dhulbahante alignment of intersecting models M1 and M2 in a common Faysal 1.0 parent M is expressed by the following diagram of model Isaaq Bashir 1.0 maps where M0 is a model representing the intersection. Said Darod 1.0 M (Resident) M (TribalElement) !!# M $"""""" M (has) !!!! M1 %### !!& M2 ### Fig. 5. Probabilistic model of a fragment of the Figure 2 sketch. !!!! M0 There is a rich literature investigating classes of sketches, as distinguished by the types of semantic constraints they include, D. Presentations and Theories and classes of semantic models. Examples include linear, finite product, finite limit, EA (entity-attribute) and mixed sketches. A sketch (or an OWL ontology) is a compact presentation The expressiveness of the class of sketch imposes requirements of the much larger body of knowledge T that it generates. For on the classes of structures that may be employed in semantic example, if an ontology defines a class A, a subclass A′ of models. Just as various OWL dialects are associated with A and a property P that is defined on A, then we can derive different fragments of the predicate calculus, so too are classes a subproperty P ′ by restricting P to A′ . This restriction P ′ of sketches. may or may not be explicitly defined in the ontology. It is part of the larger body of knowledge T that the ontology is C. Maps of Models designed to present. Sketch theory defines and provides tools The theory of sketches also provides a notion of maps for analyzing this generated body of knowledge. between semantic models. We call these model maps. They can be used to represent model dynamics, comparisons and The theory T of a sketch S is the sketch that S generates combinations. For example, the fact that different people by recursive application of the constructions supported by may populate our tabulations of the Resident class that is the type of sketch. These constructions can include property represented by the corresponding vertex of Figure 2, should chains (i.e., composition), property inverses (i.e., reciprocals), not require us to change the syntax of our knowledge represen- property restrictions, products (ordered pairs) and coproducts tation. In other words, our understanding of the concepts and (unions) of classes, and extraction of subclasses and subprop- relationships of the human terrain does not necessarily change erties. The constructions are specified as types of diagrams, when we observe a new individual to add to our information cones and cocones since these are the concepts used to specify system. This modular approach to information and knowledge semantic constraints in sketches. T is usually much larger management is a strength of the sketch framework. than S. It can be infinite even if S is finite. Consequently, when we write down a knowledge representation, we almost As with models themselves, model maps can introduce never write down T . We formulate a presentation S instead. partiality and uncertainty. We focus on deterministic maps. Let M and M ′ be models of a sketch S. For each vertex In Figure 6 we compute a small example. The underlying v of S, the models have corresponding sets M (v) and M ′ (v) graph G of the sketch has two vertices and two edges. of individuals. A map τ from M to M ′ is a collection of To make the example a bit more concrete, assume that P functions τv ! M ′ (v) represents a class of people and E represents a class of elected M (v) officials who serve political districts. The edge r represents between these sets of instances. In order to be a map of models, an assignment of elected officials to people while u identifies these functions must be consistent with the functions in the elected officials as particular instances of people. We impose models themselves that arise from edges in the underlying one semantic constraint: the property chain (composite r ◦ u) sketch graph. Two models M and M ′ of the Figure 2 sketch, of the two edges indicated in the triangle should coincide with for example, each have associated sets of Resident and Village the identity function on elected officials. That is, each elected instances. If M ′ subsumes M by, for example, adding new res- official serves his or her own political district. The finite graph idents, then the new model should maintain the data about the G could, potentially, generate an infinite family of property previously-known residents. This is expressed by the following chains: r ◦ u, u ◦ r, u ◦ r ◦ u, r ◦ u ◦ r, etc. The semantic diagram in which we use τ to denote the two functions τVillage constraint has the effect of truncating this list so that the only and τResident . distinct properties are those shown in the path graph on the τ right side of Figure 6. The path graph is the underlying graph M (Resident) ! M ′ (Resident) of the theory T1 of the sketch S1 whose underlying graph is M(lives in) M ′ (lives in) shown on the left side of Figure 6 and whose only constraint " " is the diagram shown in the center. The derived edge u ◦ r M (Village) ! M ′ (Village) connects each person to his or her elected representative. τ STIDS 2013 Proceedings Page 120 u id id u E P u S1 V $$ S2 E P ### $$ E P r id r ! $###m1 $ m2 $% ! r E T1 $$ T u◦r $$ # 2 $$ ### Fig. 6. The two-vertex graph G (left) together with the diagram D (center) & '## generate the theory T1 (right). The graph and diagram form a sketch S1 . T Fig. 9. Alignment of the presentations (e.g., sketches or ontologies) S1 and S2 into the common knowledge representation T using the union (pushout) operation on sketches. E. Alignment A body of knowledge may be presented in different ways even within a fixed formalism (e.g., ontologies or sketches). sketches and maps. This illustrates a particular case of the People use terms differently and use different words to describe data alignment step shown at the top left of Figure 1. the same concepts. The notion of theory of a sketch provides The need to use structures generated from the available a framework for formulating the alignment problem. Consider presentations, rather than using the presentations alone, is the elected officials example discussed above. An alternative evident in the alignment example discussed in Chapter 10 presentation is shown in Figure 7. This sketch has only a of [14]. In this example, two OWL ontologies are aligned using single vertex C that represents a class of citizens. It has a OWL statements. The first ontology is defined below. single edge e that represents the connection of each citizen to his or her elected official. The semantic constraint (indicated ex1:Mother rdfs:subClassOf ex1:HomeDweller. by the center diagram) again asserts that each elected official ex1:Father rdfs:subClassOf ex1:HomeDweller. ex1:Son rdfs:subClassOf ex1:HomeDweller. represents himself or herself. The theory generated by this ex1:Daughter rdfs:subClassOf ex1:HomeDweller. sketch has one vertex and two edges. It is shown below right. ex1:hasChild rdf:type owl:ObjectProperty. ex1:hasSon rdfs:subPropertyOf ex1:hasChild. e id ex1:hasDaughter rdfs:subPropertyOf ex1:hasChild. C C C e C The second is defined with prefix ex2. It overlaps with but is e C e visibly not equivalent to the first. e ex1:Relative rdf:type owl:class. Fig. 7. An alternative formulation of the knowledge model shown in Figure 6. ex1:Mother rdfs:subClassOf ex1:Relative. The one-vertex graph G (left) together with the diagram D (center) generates ex1:Father rdfs:subClassOf ex1:Relative. the theory T2 (right). The graph and diagram form a sketch S2 . ex1:Child rdfs:subClassOf ex1:Relative. ex1:hasParent rdfs:type owl:ObjectProperty. We seek to align these two formulations of the same We align the two using the OWL statements below. concepts. To do this we can not use the presentations S1 and S2 ex1:Mother owl:equivalentClass ex2:Mother. themselves. We must use theories. Although we can identify ex1:Father owl:equivalentClass ex2:Father. the vertex P of the S1 with the vertex C of S2 , the problem ex1:Son rdfs:subclassOf ex2:Child. is that there is no edge in S1 that corresponds to the edge e ex1:Daugher rdfs:subclassOf ex2:Child. ex1:hasChild owl:inverseOf ex2:hasParent. of S2 . The appropriate edge occurs in the theory T1 of S1 not in the sketch S1 itself. Figure 8 illustrates how the alignment problem is formulated using sketches. The task is to find a Although the Mother and Father classes coincide, there sketch V and sketch maps into the theories generated by the are no appropriate classes in ex2 to identify with ex1:Son two presentations that we seek to align. V can be construed as or ex1:Daughter. The classes occur in a knowledge rep- the overlap between the two theories. It is a view (as defined resentation generated by ex2 not in ex2 itself. Similarly, ex1:hasChild and ex2:hasParent have no corresponding in the next section) of both presentations. element in the other’s ontology. They are identified with S1 S2 elements constructed from the other. V" ! "" !! "" !!m m2 "" F. Contexts and Views ! "!! 1 # ! T1 T2 Decision making uses both general knowledge and specifics of the decision space to balance the expected costs and risks Fig. 8. Formulation of the alignment problem using sketches. To align of a program of actions. It focuses on the components that presentations S1 and S2 (i.e., sketches or ontologies), we compute a sketch V and maps m1 and m2 into the theories generated by the presentations. are most relevant to a mission, its goals and tasks. It must efficiently and effectively manage the available data. Context carries information about intent. Views are imple- The sketch framework supports an operation called pushout mentations of context in a knowledge representation. A view of which is essentially the union accounting for overlaps between a database is derived using a query. In SQL implementations, sketches. With this union operation we can align the two a view is typically a single (virtual) table. The view update presentations that we have been discussing into a single problem addresses the question of how to determine an appro- knowledge representation T . Figure 9 shows the resulting priate update to the state of the total database when a view STIDS 2013 Proceedings Page 121 is modified. The influential paper [1] developed the constant first-order logic is undecidable, NP-complete for propositional complement approach to view updates. [4], [12] defined the logic, and P-complete and linear for propositional Horn logic notion of lens that characterized a class of updates. At about (a property exploited in the Prolog language) [26]. the same time, [15] described update strategies for particular update classes. Sketches were introduced into the study of data A sketch is an alternative, graphical way of presenting a semantics in order to better understand database dynamics; in logical theory [2], [20]. In the sketch data model, we express particular, the view update problem [24]. [19] uses sketches to relationships using diagrams, cones and cocones in directed extend the lens concept and classes of view update strategies. graphs instead of with formulas and terms. Logical inference employs graph properties associated with constraints. A sketch Within the sketch data model, views are particular sketch with no constraints is like a logical signature with no axioms. maps and, in general, are much more expressive than a single derived table. Specifically, a view of a knowledge representa- M (A) tion S is a sketch V together with a sketch map V !T Q M (P ) pb M (C) where T is the theory generated by S (see Section II-D). Consider, for example, the human terrain sketch shown in M (B) Figure 2. One view of interest is obtained by restricting the Fig. 11. Universal mapping property that characterizes pullback cones SeenIn relation to the subrelation in which the village is one in which coalition personnel are based. This view involves A pullback cone, for example, is characterized by the property subclasses of each of the classes (in addition to the subrelation illustrated in Figure 11 which shows a model M of such a of SeenIn that we mentioned). None of these subclasses occur constraint (see [2]). If the two outer functions from Q to M (C) in the sketch itself but they are generated by applying cone are equal, then there is a unique function from Q to M (P ) and cocone constraints. for which the two paths from Q to M (A) are equal as are A challenge to implementing this framework in a decision- the two from Q to M (B). Such graph definitions are called making context is using the available event history and other universal mapping properties [22]. From these we derive other context-specific data to construct an appropriate sketch view in inference rules such as: If the function from M (B) to M (C) is a semi-autonomous manner. The graphical nature of sketches a subtype (is a) relationship, then so is the edge from M (P ) may facilitate the adaptation of recent techniques developed to M (A) [22]. for context sensitive Internet search [6], [16], [21], [28] that Sketches, like logics, are developed with varying levels use the graph structure of the Web. of fidelity. Linear sketches are the least expressive. Finite limit, finite sum, EA (entity-attribute) and mixed sketches are G. Support for n-ary Relations richer. Despite the distinct character of logical and sketch- A limitation of OWL/RDF described by practitioners is its based inference, they share deep connections. For various lack of direct support for representing n-ary relations. Such classes of sketches, there are algorithms for constructing properties can be represented directly in sketches. To express logical theories that have equivalent categories of models (see a relation R that may hold among n entities that have types D.2.2 of [20]). Reasoning about a knowledge model expressed A1 , · · · , An , we introduce vertices for each of these n + 1 as a sketch, therefore, may be achieved either directly using classes and we include n edges R → Ai . Any axioms that the the computational category theory techniques discussed below relation is intended to satisfy would then be formulated using in IV or indirectly by converting to a first-order theory and diagrams, cones and cocones. using a predicate calculus reasoner. R The problem of pattern-based reasoning with sketches is similar to the ontology alignment problem [18] that is solved A1 ··· An mathematically via the theory (or syntactic category) of a sketch [20]. We may align, for example, the human terrain Fig. 10. Sketch for an n-ary relation R among entities of types A1 , . . . , An sketches that are shown in Figures 13 and 2 via a sketch map from the latter to the former. Refinement of the knowledge base to represent levels in a tribal hierarchy (e.g., ethnic III. L OGICAL I NFERENCE groups, tribes, clans and factions) is accomplished with a sketch map from the Figure 2 sketch to a new sketch that The graphical nature of the sketch data model supports would include additional edges and constraints. The simple implementation of pattern-matching reasoning capabilities that business knowledge representation shown in Figure 12 can be emulate the process of experienced decision makers [23]. mapped to a sub-sketch of our human terrain model. In classical logic, we express properties and relationships as terms and formulas that are recursively-constructed from basic components. Inference is formulated as rules for deriving IV. S OFTWARE I NFRASTRUCTURE valid expressions. Like models of physical phenomena, logics The software infrastructure available for working with are developed with varying levels of fidelity based on their sketches is meager relative to that associated for other semantic intended applications. Examples include classical, descriptive, models (e.g., OWL/RDF). The Easik tool1 is the most mature. modal and linear logics. Expressiveness, however, comes at the expense of higher computational complexity: inference for 1 http://mathcs.mta.ca/research/rosebrugh/Easik STIDS 2013 Proceedings Page 122 Employee Division The Easik tool supports read and write operations between to knowledge representations and XML files, SQL files and SQL ed home of ign (MySQL or PostgreSQL) database connections. In Easik, a ass sketch is implemented as a database schema. Each sketch Consultant Full-Time entity (graph node) is a table created according to the schema. works at Office Values that populate the tables form a model of the sketch. Each table has an implicit, integer-valued primary key. In Fig. 12. Part of a sketch representing business structure knowledge general, a primary key constraint on a table expresses the fact that the values in one or more columns together are a unique It provides a graphical interface for building a collection of identifier of a row. This does not preclude the possibility sketches and views. It implements procedures for reading and that two rows may refer, for example, to a single individual. writing sketches to and from XML and SQL files. It also Attributes are table columns. For example, an entity B with provides an interface to models maintained in PostgreSQL no attributes or outgoing edges is implemented in PostgreSQL and MySQL databases. Easik does not implement a reasoning as follows where the id column is an automatically-generated engine. Figure 13 shows a sketch that is similar to the one (i.e., SERIAL), key. whose graph is shown in Figure 2. It was developed using CREATE TABLE B ( id SERIAL PRIMARY KEY ); Easik. The screen shot illustrates convenient abbreviations for various semantic constraints. The decorated arrows to Person, A foreign key constraint specifies that the values in one for example, indicate subtype relationships. These are imple- or more columns must match the values occurring in some e mented as cones. Vertices connected to the + symbol form row of another table. We implement a sketch edge A −→ B a kind of cocone. Its base consists of the vertices Coalition, as a foreign key contained in the A-table and referencing the Foreign and Resident. The paths connected to CD indicate a primary key of the B-table. In PostgreSQL this is expressed diagram constraint. as follows. CREATE TABLE A ( id SERIAL PRIMARY KEY, e INTEGER NOT NULL REFERENCES B (id) ON DELETE CASCADE ON UPDATE CASCADE ); Insertion of a row into the A-table, therefore, involves specifying values for the columns that are introduced into that table for the edges having domain A. Deletion of a B-row can impact the A-table if an A-row references the B-row via e the edge A −→ B. Foreign keys serve to implement relations (object properties) in the sketch data model since a relation is simply an entity having edges to the nodes that correspond to Fig. 13. Human terrain sketch implemented using the Easik software tool the types of its participants. Category theory, despite its abstract nature, is highly Although subtype relations (monic edges) are a particular computational. All semantic constraints, for example, can be kind of cone constraint, they can be implemented as foreign computed from four basic types: cones can be expressed keys with unique references in the codomain table. In general, using product cones and equalizer cones; cocones can be however, sketch constraints are implemented using triggers. expressed using coproducts and coequalizers. One freely- A trigger for a database table or view executes a specified available implementation of these and related computations function whenever certain events occur. The simple diagram is written in the ML programming language and is described shown in Figure 14, for example, asserts that semantics of the in [25]. This work and similar research activities could provide composite edge f followed by g should equal that of h. In a basis for implementing a reasoning engine for a sketch-based PostgreSQL we express this as follows. information system. CREATE FUNCTION commutativeDiagram0() If the only constraints of the sketch are diagrams (i.e., the RETURNS trigger AS $commutativeDiagram0$ sketch has neither cones nor cocones), then the theory T (when DECLARE _cdTarget1 CONSTANT INTEGER := NEW.h; T is, in fact, finite) generated by the sketch may be computed _cdTarget2 CONSTANT INTEGER := (SELECT B.g FROM B via the left Kan extension algorithm which generalizes the WHERE B.id = NEW.f); Todd-Coxeter procedure from group theory [7], [27]. If the BEGIN IF _cdTarget1 IS DISTINCT generated sketch is infinite, the algorithm, of course, does not FROM _cdTarget2 terminate. Its complexity in the cases when it does terminate THEN RAISE EXCEPTION has not been characterized and is highly sensitive to small ’Commutative diagram constraint failure’; variations in the sketch [5]. END IF; RETURN NEW; END; V. I MPLEMENTING S KETCH M ODELS IN DATABASES $commutativeDiagram0$ LANGUAGE plpgsql; CREATE TRIGGER commutativeDiagram0 Features available in major relational database systems in- BEFORE INSERT ON A cluding PostgreSQL and MySQL provide an interface between FOR EACH ROW EXECUTE the mathematical theory of sketches and their application. PROCEDURE commutativeDiagram0(); STIDS 2013 Proceedings Page 123 It is a trigger that fires before insertion of a row into [2] M. Barr and C. Wells. Toposes, Triples and Theories. Springer. 1985 the A-table to confirm that the values entered in the foreign [3] M. Barr and C. Wells. Category Theory for Computing Sciences. key columns for f and h satisfy the commutativity constraint Prentice-Hall. 1990 taking into account the value in the g-column in an appro- [4] A. Bohannon, J. Vaughan and B. Pierce. Relational Lenses: A Language priate row of the B-table. Cone and cocone constraints are for Updatable Views. In Proc. of the 25th ACM SIGMOD-SIGACT- SIGART Symposium on Principles of Database Systems. ACM. 2006 all similarly implemented. All EA sketch constraints can be [5] J. J. Cannon, L. A. Dimino, G. Havas and J. M. Watson. Implementation constructed from these [22]. and Analysis of the Todd-Coxeter Algorithm. Mathematics of Computa- f tion. 27(123):463–490. July 1973 B A [6] H. Cao, D. Jiang, J. Pei, E. Chen and H. Li. Towards Context-Aware cd Search by Learning a Very Large Variable Length Hidden Markov g Model from Search Logs. In Proceedings of the 18th World Wide Web h Conference (WWW’09). pp. 191–200. 2009 C [7] S. Carmody, M. Leeming, R. F. C. Walters. The Todd-Coxeter Procedure and Left Kan Extensions. J. Symbolic Computation. 19:459–488. 1995 Fig. 14. Diagram to implement using a SQL trigger [8] C. Casadio, P. J. Scott and R. A. G. Seely, Eds. Language & Gram- mar: Studies in Mathematical Linguistics and Natural Language. CLSI A research question that arises is how might one utilize the Publications. 2005 sketch data model in a context of large-scale, distributed data. [9] N. N. Čencov. Statistical Decisions Rules and Optimal Inference. Amer- Broader demand for a scalable system that supports views and ican Mathematical Society. 1982 integrity constraints are well-known. Megastore, Tenzing, and [10] E. F. Codd. Derivability, Redundancy, and Consistency of Relations Spanner are Google products developed to meet this demand. Stored in Large Data Banks. IBM Research Report RJ599. 1969 Apache Cassandra is an open-source alternative. [11] C. Ehresmann. Esquisses et types de structures algébriques. Bul. Inst. Politehn. Iaşi. 14:1–14. 1968 [12] J. Foster et al. Combinators for Bi-Directional Tree Transformations: VI. C ONCLUSION A Linguistic Approach to the View Update Problem. ACM Transactions OWL/RDF and related semantic web technologies have on Programming Languages and Systems. 29. 2007 established tenable positions in the intelligence, defense and [13] J. W. Gray. The category of sketches as a model for algebraic semantics. In Categories in Computer Science and Logic. V. 92 of Contemporary security domains. The sketch data model, however, integrates a Mathematics. AMS. pp. 109–135. 1989 variety of features that can be leveraged. These include its deep [14] J. Hebeler, M. Fischer, R. Blace and A. Perez-Lopez. Semantic Web connections with infinitary predicate logic, the ability to imple- Programming. Wiley Publishing, Inc. 2009 ment sketches and their models using major relational database [15] S. J. Hegner. An Order-Based Theory of Updates for Closed Database systems, its graphical nature and, perhaps most significantly, Views. Annals of Math. and Artificial Intelligence. 40:63–125. 2004 its sophisticated notion of view of an information system. It [16] T. Joachims. Optimizing Search Engines Using Clickthrough Data. is possible to formulate OWL constructs using sketches. The SIGKDD’02, Alberta Canada. 2002. classes of sketches that can be expressed in the dialects of [17] M. Johnson and R. Rosebrugh. Sketch Data Models, Relational Schema OWL2 is an open question. We have illustrated these concepts and Data Specifications. Electr. Notes Theor. Comp. Sci. 61(6):1–13. 2002 and their application to a simple ontology alignment problem. [18] M. Johnson and R. Rosebrugh. Ontology Engineering, Universal Al- gebra, and Category Theory. In Theory and Applications of Ontology: The sketch data model also clarifies the formulation of cer- Computer Aplications. Springer-Verlag. pp. 565–576. 2010 tain challenges that we encounter in applications of OWL/RDF. [19] M. Johnson, R. Rosebrugh and R. J. Wood. Lenses, Fibrations and In the sketch approach, uncertainty and lack of information are Universal Translations. Mathematical Structures in Computer Science. aspects of models of the sketch. They are not features of the 22:25–42. 2012 knowledge representation itself. Moreover, sketches (and OWL [20] P. Johnstone. Sketches of an Elephant. Oxford University Press. 2005 ontologies) are more appropriately construed as presentations [21] A. Kustarev, Y. Ustinovesky and P. Serdyukov. Measuring Usefulness of the larger bodies of knowledge that they generate. In sketch of Context for Context-Aware Ranking. ACM WWW 2012 Companion, Lyon, FR. 2012 theory, this larger knowledge base is called the theory of [22] S. Mac Lane. Categories for the Working Mathematician. 2nd Ed. a sketch. Except in simple cases of renaming, alignment of Springer-Verlag. 1999 presentations involves maps into theories. The extent to which [23] J. Morrison, R. T. Kelly, R. A. Moore and S. G. Hutchins. Implications procedures for generating a theory from a sketch can support of Decision Making Research for Decision Support and Displays. In partial-automation of alignment problems is an open research Decision Making Under Stress: Implications for Training and Simulation. question. J. A. Cannon-Bowers and E. Salas, Eds. pp. 375–406. 1998 [24] R. Rosebrugh and R. J. Wood. Relational Databases and Indexed Finally, the concept of view of a knowledge representation Categories. Category Theory 1991: Proceedings of an International is formulated as a sketch map to a theory. This generalizes the Summer Category Theory Meeting Held June 23–30. Vol. 13 of CMS notion of view of a database. Recent techniques developed for Conference Proceedings. AMS. pp. 391–407. 1991 context sensitive Internet search exploit the graph structure of [25] D. E. Rydeheard and R. M. Burstall. Computational Category Theory. the Web and search histories. The extent to which these tech- Prentice-Hall. 1988 niques and the graphical nature of sketches can be exploited to [26] A. Troelstra and H. Schwichtenberg. Basic Proof Theory. Cambridge University Press. 2000 support semi-automated extraction of context-relevant views of [27] R. L. Wojtowicz and N. S. Yanofsky. Quantum Kan Extensions and a knowledge representation is another open research challenge. Applications. DOI Contract D11PC20232 Final Report. 2013 [28] B. Xiang, D. Jiang, J. Pei, X. Sun, E. Chen and H. Li. Context- R EFERENCES Aware Ranking in Web Search. ACM SIGIR’10 29–23 July, Geneva, [1] F. Bancilhon and N. Spyratos. Update Semantics of Relational Views. Switzerland. 2010 ACM Transactions on Database Systems. 6:557–575. 1981 STIDS 2013 Proceedings Page 124