=Paper= {{Paper |id=Vol-293/paper-6 |storemode=property |title=Semantic Enterprise Technologies |pdfUrl=https://ceur-ws.org/Vol-293/paper6.pdf |volume=Vol-293 |authors=Massimo Ruffolo,Luigi Guadagno,and Inderbir Sidhu,pages 70-84 |dblpUrl=https://dblp.org/rec/conf/semweb/RuffoloGS07 }} ==Semantic Enterprise Technologies== https://ceur-ws.org/Vol-293/paper6.pdf
FIRST - First Industrial Results of Semantic Technologies




                                   Semantic Enterprise Technologies

                                  Massimo Ruffolo1,2, Inderbir Sidhu2, Luigi Guadagno2
                          1 ICAR-CNR - Institute of High Performance Computing and Networking
                                            of the Italian National Research Council
                                                  2 fourthcodex inc.
                                 University of Calabria, 87036 Arcavacata di Rende (CS), Italy
                                                 e-mail: ruffolo@icar.cnr.it
                                       e-mail: {lguadagno,isidhu}@fourthcodex.com
                                      WWW home page: http://www.fourthcodex.com



                        Abstract. Nowadays enterprises require information technologies that leverage
                        structured and unstructured information to provide a single integrated view of
                        business problems, in order to foster better business process management and
                        decision making. The growing interest in semantic technologies is due to the
                        limitations of existing enterprise information technologies in meeting these
                        challenging new needs. Semantic Web Technologies (SWT), the current open-standard
                        approaches to semantic technologies based on semantic web languages, provide some
                        interesting answers to novel enterprise needs by allowing domain knowledge to be
                        used within applications. However, SWT are not well suited to the enterprise
                        domain because of some drawbacks and a lack of compatibility with enterprise-class
                        applications. This paper presents the new Semantic Enterprise Technologies (SET)
                        paradigm, founded on the idea of Semantic Models, which are executable, flexible
                        and agile representations of domain knowledge. Semantic Models are expressed by
                        means of the Codex Language, obtained by combining Disjunctive Logic Programming
                        (Datalog plus disjunction) and Attribute Grammars, both extended with object-
                        oriented and two-dimensional capabilities. Semantic Models make it possible to
                        exploit domain knowledge for managing both structured and unstructured
                        information. Since the Codex Language derives from the database field, it allows
                        SET to provide advanced semantic capabilities well suited to enterprises. The
                        paper briefly discusses differences and interoperability issues between SET and
                        SWT, and also presents the SET Reference Architecture (SETA), an application
                        example and the business value of SET.


                1     Introduction
                   Nowadays enterprise knowledge workers need information technologies that leverage
                structured and unstructured information to provide a single integrated view of
                business problems, in order to foster better business process management and decision
                making. They want answers to specific business requirements rather than documents and
                reports; they want to search document bases by concept, not simply by keyword; to query
                databases and unstructured information repositories in a uniform way that takes into
                account the meaning of data and information; and to obtain application integration and
                interoperability by exploiting semantic-aware services.
                    The growing area of Semantic Technologies [2] could answer these novel needs
                by providing a new class of enterprise-grade, semantically enabled business
                applications. In this scenario, semantics means the use of domain knowledge to affect
                computation by allowing the meaning of associations among pieces of information to be
                known and processed at execution time. Semantic Technologies must be able to:
                  – Provide computable semantic representations of domain knowledge, and reasoning
                     capabilities over them, so that software can perform useful tasks such as
                     finding hidden relationships in a complicated web of objects.
                  – Handle very large-scale knowledge bases containing both structured and unstruc-
                     tured information.
                  – Automate the capture of events and entities in order to connect people, places,
                     and events using information in different formats coming from many different
                     sources.
                  – Assist human monitoring and analysis of situations, workflows, collaboration and
                     communication.
                  – Facilitate interoperability by exploiting enterprise concepts to link applications,
                     data sources, and services in easy-to-use composite views, providing real-time
                     interaction, analysis, and decision support.
                  – Deliver their capabilities as value-added components that are easy to embed in
                     existing enterprise applications and integration architectures.
                These challenges exceed the capabilities and performance capacity of current open-
                standard approaches to semantic technologies based on semantic web languages (e.g.,
                OWL and RDF) [16]. Such approaches have the benefit of enabling interoperability over
                the Web, but they suffer from the following important drawbacks when applied to
                enterprise domains: (i) they lack compatibility with relational databases, the most
                widely adopted enterprise information technology for representing, storing and
                managing structured enterprise data; (ii) they allow for "connecting the ontological
                dots" by using a predefined ontological data model that helps to discover and infer
                new knowledge (when proper reasoners are available). But 'real world' enterprise
                business practices are not neatly predefined; in fact, they are more likely to be
                dynamic, emergent, or even chaotic. In other words, fully articulated knowledge models
                (ontologies) are not necessary for recognizing relevant facts in ever-growing knowledge
                bases, inferring useful related information, or ensuring that enough knowledge is
                available just in time to improve decision making; (iii) they do not propose
                mechanisms for directly handling the huge amount of unstructured information already
                available.
                    This paper describes the novel Semantic Enterprise Technologies (SET) paradigm,
                founded on the idea of Semantic Models (SM), which are executable, flexible and agile
                representations of domain knowledge (ranging from simple taxonomies equipped with a
                few simple descriptors to very rich ontologies equipped with complex business rules
                and descriptors). SM are represented by means of a new language, called the Codex
                Language, obtained by combining Disjunctive Logic Programming (Datalog plus
                disjunction) [4,6,9] and Attribute Grammars, both extended with object-oriented [3,12]
                and two-dimensional capabilities [13,14,15]. SM make it possible to exploit domain
                knowledge for managing both structured (e.g., relational databases) and unstructured
                information (e.g., document repositories).
                    SET overcome the above-mentioned SWT limitations through the following
                fundamental set of features: (i) they originate from the database world, so they
                offer query languages and reasoning approaches well suited to the relational model. In
                fact, the Codex Language is based on the Closed World Assumption (CWA) and the Unique
                Name Assumption (UNA), whereas semantic web languages are based on the Open World
                Assumption (OWA) and do not adopt the UNA. To make SET interoperable with SWT, a
                translation approach that takes into account the different semantics of the Codex
                Language and OWL has been defined; in this way SET can be considered complementary to
                SWT; (ii) the Codex Language can represent Semantic Models composed of "just enough"
                taxonomies and rules or, if required, of complex ontologies that also contain
                relationships, constraints and axioms, so the knowledge representation process fits
                very dynamic enterprise environments; (iii) they provide powerful unstructured
                information management mechanisms that allow concept annotation and extraction from
                unstructured sources (semantic enterprise metadata acquisition) via a pattern-based
                approach. Thus precise semantic information extraction, classification and search,
                enabling semantic indexing and querying of enterprise knowledge, are also possible.
                    To realize the technological and business advantages of SET, as already happens
                for existing enterprise-class information technologies, a reference architecture is
                required that provides the framework for applying SET features to enterprise domains.
                This paper proposes SETA (the SET Reference Architecture), which describes the
                technologies and architecture that enable the use of semantics in an enterprise.
                SETA aims at transforming multiple sources (structured and unstructured) and bits
                of dynamic information, with domain and concept coverage, from disparate enterprise
                systems into useful knowledge that fosters better enterprise performance.
                    SET have already been applied to contact-center software, CRM applications,
                asset and content management repositories, news and media delivery services, health
                care organizations and more. Their distinctive features help shape the future of
                new knowledge-powered computing solutions in many traditional areas such as
                Competitive Intelligence, Document and Content Management, CRM, Text Analytics and
                Information Extraction. The application of SET to real cases shows that they can
                improve the value creation capabilities of enterprises by allowing the definition and
                execution of more efficient and effective business processes and decision making.
                    The remainder of this paper is organized as follows. Section 2 presents the
                structure of Semantic Models and describes the Codex Language; Section 3 compares the
                Codex Language with OWL and outlines the interoperability approach; Section 4
                describes the SET Reference Architecture; Section 5 sketches an application of SET to
                health care risk management; Section 6 briefly describes the business value of SET;
                and Section 7 concludes the paper.


                2     Semantic Models

                    SET are based on the concept of Semantic Model. An SM is a flexible and agile
                representation of domain knowledge. Semantic Models can be made up of either just
                small pieces of domain knowledge (e.g., small taxonomies equipped with a few
                rules) or rich and complex ontologies (obtained, for example, by translating existing
                ontologies and adding rules and descriptors), which give, respectively, a weak or a
                rich and detailed representation of a domain. More formally, an SM is a seven-tuple
                of the form:
                                            $$SM = \langle H_C, H_R, O, T, A, M, D \rangle$$
                where:
                 – H_C and H_R are sets of class and relation schemas. Each schema consists of a
                   set of attributes, and the type of each attribute is a class. Partial orders
                   defined on both H_C and H_R allow the representation of concept and relation
                   taxonomies (with multiple inheritance).
                 – O and T are sets of class and relation instances, also called objects and tuples.
                 – A is a set of axioms, represented by special rules expressing constraints (rules
                   that must always hold) on the represented knowledge.
                 – M is a set of reasoning modules, i.e., logic programs consisting of (disjunctive)
                   rules that allow reasoning about the represented and stored knowledge, so that
                   new knowledge not explicitly declared can be inferred.
                 – D is a set of descriptors (i.e., production rules in a two-dimensional object-
                   oriented attribute grammar) enabling the recognition, within unstructured docu-
                   ments, of instances of the classes (concepts) in H_C, so that these instances can
                   be annotated, extracted and stored.
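                As a purely illustrative aid, the seven-tuple can be mirrored by a plain data
                structure. The Python sketch below is not part of SET or the Codex Language: the
                names ClassSchema, SemanticModel, is_subclass, reason and consistent are
                hypothetical, axioms are modelled as Boolean checks, reasoning modules as functions
                that derive new tuples, and descriptors are left opaque.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ClassSchema:
    """Hypothetical schema for an element of H_C or H_R."""
    name: str
    attributes: dict[str, str]        # attribute name -> type (a class name)
    parents: tuple[str, ...] = ()     # partial order (multiple inheritance)


@dataclass
class SemanticModel:
    """Illustrative stand-in for SM = <H_C, H_R, O, T, A, M, D>."""
    H_C: dict[str, ClassSchema] = field(default_factory=dict)   # class schemas
    H_R: dict[str, ClassSchema] = field(default_factory=dict)   # relation schemas
    O: dict[str, tuple] = field(default_factory=dict)           # oid -> (class, args)
    T: set[tuple] = field(default_factory=set)                  # relation tuples
    A: list[Callable[["SemanticModel"], bool]] = field(default_factory=list)  # axioms
    M: list[Callable[["SemanticModel"], set]] = field(default_factory=list)   # modules
    D: list[object] = field(default_factory=list)               # descriptors (opaque)

    def is_subclass(self, c: str, d: str) -> bool:
        """Reflexive-transitive closure of the class partial order in H_C."""
        return c == d or any(self.is_subclass(p, d) for p in self.H_C[c].parents)

    def reason(self) -> None:
        """Apply every reasoning module until no new tuple is derived (fixpoint)."""
        changed = True
        while changed:
            changed = False
            for module in self.M:
                new = module(self) - self.T
                if new:
                    self.T |= new
                    changed = True

    def consistent(self) -> bool:
        """The knowledge base is consistent when every axiom in A holds."""
        return all(axiom(self) for axiom in self.A)
```

                For example, declaring a manager schema with person as its parent makes
                is_subclass("manager", "person") true, and a module returning the transitive pairs
                of a relation lets reason() saturate T with the implied tuples.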
                SM are represented by means of the novel and highly expressive Codex Language,
                described in the following.


                2.1     The Codex Language

                    The Codex Language brings together the expressiveness of ontology languages
                and the power of disjunctive logic rules. It combines notions coming from Disjunctive
                Logic Programming (Datalog plus disjunction) and Attribute Grammars, both extended
                with object-oriented and two-dimensional capabilities. The attribute grammars allow
                one to express intuitively the patterns for recognizing instances of ontology concepts
                in structured and unstructured data. The language was designed this way because, in
                order to leverage and apply Semantic Models in the enterprise, it is important to find
                and retrieve appropriate data from all kinds of data sources, such as schema-based
                structured databases, unstructured documents, and semi-structured documents
                containing implicit structure.
                    To understand why the Codex Language goes beyond the modeling capabilities of
                most ontology languages and also makes it possible to describe how to recognize
                instances of ontology concepts in data, consider the following example. Imagine a
                financial analyst tasked with researching some corporation, say, Microsoft. An
                ontology could describe that a company may be owned by individuals or boards, and the
                knowledge base could contain the facts that Microsoft is a company and that it is
                owned by Bill Gates. Besides representing all of this information, the Codex Language
                can also express the rules and patterns for identifying instances of Microsoft Corp.
                in documents. For example, a document may refer to Microsoft as “the company owned by Bill
                Gates”. The rules for recognizing the concept of company ownership can easily be
                expressed in the Codex Language, allowing the identification of instances of
                companies owned by Bill Gates, or by anyone else, in structured and unstructured data
                sets. Hence, when a document mentions “the company owned by Bill Gates”, it is
                possible to determine that the document is really talking about Microsoft Corp.
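                The Microsoft example can be approximated as a textual pattern joined with stored
                ownership facts. This is a simplified Python illustration, not Codex syntax: the fact
                tables OWNED_BY and COMPANY_NAMES, the regular expression and the function
                resolve_companies are hypothetical, and a real descriptor would cover many more
                linguistic variants (for instance, a sentence-initial "The company").

```python
import re

# Hypothetical knowledge-base facts: (company oid, owner name).
OWNED_BY = {("microsoft", "Bill Gates")}
COMPANY_NAMES = {"microsoft": "Microsoft Corp."}

# Pattern for the paraphrase "the company owned by <Capitalized Name>".
OWNERSHIP = re.compile(r"the company owned by ((?:[A-Z][a-z]+ ?)+)")


def resolve_companies(text: str) -> list[str]:
    """Return the companies a text refers to through an ownership paraphrase."""
    found = []
    for match in OWNERSHIP.finditer(text):
        owner = match.group(1).strip()
        # Join the recognized owner name with the stored ownership facts.
        for company, who in OWNED_BY:
            if who == owner:
                found.append(COMPANY_NAMES[company])
    return found
```

                Calling resolve_companies on a sentence containing "the company owned by Bill
                Gates" yields Microsoft Corp., while an owner absent from OWNED_BY yields nothing.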
                    The Codex Language supports the typical ontology constructs, such as class,
                object (class instance), object identity, (multiple) inheritance, relation and tuple
                (relation instance). It also supports powerful reasoning by means of reasoning
                modules, which are modular logic programs containing a set of (disjunctive) rules. The
                language augments these typical ontology modeling constructs with a mechanism that
                enables the description of patterns and rules over an ontology for identifying
                meaningful data in any data source. This part of the Codex Language extends classical
                attribute grammars with two-dimensional and object-oriented capabilities, allowing
                the expression of concept descriptors. A descriptor is a rule that “describes” how to
                recognize instances (objects) of a concept, either in unstructured documents, by means
                of a complex (two-dimensional) composition of other objects, or in structured sources
                (e.g., databases, structured files), by means of ad-hoc queries and reasoning tasks.
                When a descriptor matches within an unstructured document, the document can be
                annotated with respect to the related concept and, moreover, an instance of the
                matching class can be created in the knowledge base. To empower unstructured
                information management, the Codex Language can also exploit sophisticated Natural
                Language Processing capabilities.
                    In the Codex Language a class can be thought of as an aggregation of individuals
                (objects or class instances) that have the same set of properties (attributes). A class
                is defined by a name (which is unique) and an ordered list of attributes identifying
                the properties of its instances. Each attribute is identified by a name and has a type
                specified as a built-in or user-defined class. For instance, the classes country and
                person can be declared as follows:
                  class country (name:string).
                  class person (name:string, age:integer, nationality:country).

                The ability to specify user-defined classes as attribute types (nationality:country)
                allows the description of complex objects, i.e., objects recursively made of other
                objects (e.g., a person could have a parent who is also a person). The language also
                supports the definition of special classes, called collection classes, that "collect"
                individuals that belong together because they share some properties. Instances of
                these classes can either be declared explicitly, as for normal classes, or specified
                by a rule that defines the shared properties intensionally.
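                As a loose analogy only (it does not reproduce Codex semantics), the country and
                person declarations and an intensionally defined collection class can be
                approximated in Python with typed records and a filtering rule. Country, Person,
                collection and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Country:
    name: str


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    nationality: Country   # user-defined class as attribute type: a complex object


def collection(instances: Iterable[Person], rule: Callable[[Person], bool]) -> list[Person]:
    """Intensional collection class: its members are the instances satisfying a rule."""
    return [x for x in instances if rule(x)]


italy, usa = Country("Italy"), Country("USA")
people = [Person("Mario", 37, italy), Person("Bill", 35, usa)]
italians = collection(people, lambda p: p.nationality == italy)
```

                Here italians collects only the Mario instance; the collection is never declared
                member by member but follows from the rule, which is the point of an intensional
                definition.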
                    Objects (class instances) are declared by asserting new facts. Objects are
                unambiguously identified by their object identifier (oid) and belong to a class. Two
                instances of the class manager can be declared as follows:
                  bill:manager("Bill", 35, usa, 50000).
                  mario:manager("Mario", 37, italy, 40000).

                Here, the strings "Bill" and "Mario" are the values of the attribute name, while
                bill and mario are the object identifiers (oids) of these instances (each instance
                is identified by a unique oid). Instance arguments can be specified either by object
                identifiers (usa and italy) or by a nested class predicate (complex term), which works
                like a function.
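                The role of oids can be illustrated with a small object store in which arguments
                either hold plain values or point to other objects. In this hypothetical Python
                sketch, Oid, store, declare and deref are illustrative names, and the manager
                attribute list (name, age, nationality, salary) is an assumption inferred from the
                facts above, since the declaration of manager is not shown.

```python
from typing import NamedTuple


class Oid(NamedTuple):
    """Marks an argument as a reference to another object (an oid)."""
    name: str


store: dict[str, tuple[str, tuple]] = {}   # oid -> (class name, arguments)


def declare(oid: str, cls: str, *args) -> None:
    """Assert an object fact; each oid identifies exactly one object."""
    if oid in store:
        raise ValueError(f"duplicate oid: {oid}")
    store[oid] = (cls, args)


def deref(value):
    """Replace oid arguments by the objects they identify, recursively."""
    if isinstance(value, Oid):
        cls, args = store[value.name]
        return (cls, tuple(deref(a) for a in args))
    return value


declare("usa", "country", "USA")
declare("italy", "country", "Italy")
declare("bill", "manager", "Bill", 35, Oid("usa"), 50000)
declare("mario", "manager", "Mario", 37, Oid("italy"), 40000)
```

                Dereferencing Oid("mario") expands the italy reference into the nested country
                object, and declaring a second object with the oid bill is refused.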
                    Relationships among objects are represented by means of relations, which, like
                classes, are defined by a unique name and an ordered list of attributes (each with a
                name and a type). Relation instances (tuples) are specified by asserting a set of
                facts. For instance, the relation managed_by, and a tuple asserting that project
                newgen is managed by bill (note that newgen and bill are oids), can be declared as follows:
                   relation managed_by(proj:project, man:manager).
                        managed_by(newgen, bill).
                The Codex Language makes it possible to specify complex rules and constraints over
                the ontology constructs, merging, in a simple and natural way, the declarative style
                of logic programming with the navigational style of ontologies. Additionally, rules
                and constraints are organized as reasoning modules, benefiting from the advantages of
                modular programming. Finally, to check the consistency of a knowledge base, the user
                can specify global integrity constraints called axioms. For example, the following
                axiom expresses that each project can have only one manager:
                   ::- managed_by(proj:A, man:M1), managed_by(proj:A, man:M2), M1 <> M2.
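                Under the CWA and UNA, this axiom can be checked directly over the stored tuples: its
                body is true exactly when some project appears in two tuples with different manager
                names. The Python sketch below mirrors that reading; violations, assert_tuple and
                the project apollo are hypothetical, and a Codex engine would evaluate the axiom
                declaratively rather than through hand-written code.

```python
from collections import defaultdict

# Tuples of managed_by(proj, man) as asserted facts.
managed_by = {("newgen", "bill")}


def violations(tuples):
    """Bindings (A, M1, M2) that satisfy the body of
    ::- managed_by(proj:A, man:M1), managed_by(proj:A, man:M2), M1 <> M2."""
    managers = defaultdict(set)
    for proj, man in tuples:
        managers[proj].add(man)
    return sorted(
        (proj, m1, m2)
        for proj, ms in managers.items()
        for m1 in ms
        for m2 in ms
        if m1 != m2            # UNA: different names denote different objects
    )


def assert_tuple(tuples, proj, man):
    """Return the enlarged set of tuples, or refuse it if the axiom would fail."""
    candidate = tuples | {(proj, man)}
    if violations(candidate):
        raise ValueError(f"axiom violated: {proj} would have more than one manager")
    return candidate
```

                Adding a tuple for a new project succeeds, whereas asserting a second manager for
                newgen raises an error, which is the behavior a relational user expects.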
                A descriptor can be viewed as an object-oriented production rule $p \in \Pi$ in a
                two-dimensional attribute grammar defined on a formal context-free grammar
                $G = \langle \Sigma, V_N, A \in V_N, \Pi \rangle$ over the alphabet $\Sigma$. In the
                Codex Language the domain of the attributes is the set of classes declared in the
                Semantic Model, whereas the alphabet $\Sigma$ consists of class names and object
                identifiers. More formally, a descriptor $d$ is a pair $\langle h, b \rangle$ such
                that $h \rightarrow b$, where $h$ is the head of $d$ and $b$ is its body. The
                following example declares the company class, along with an instance and a descriptor
                for this instance:
                   class company (name:string, nationality:country, market:market_area).
                        acme: company("Acme Inc.", usa, rocket_skiis).
                         -> .
                The descriptor head  represents the object (or the class of objects) that the user
                desires to recognize within unstructured documents. The descriptor body represents
                the rule “describing” the structure of the objects in the head in terms of regular
                expressions (or other objects).
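                The head/body reading of a descriptor can be approximated outside the Codex
                Language. In this hypothetical Python sketch the body is restricted to a regular
                expression over plain text, so it ignores the two-dimensional composition of other
                objects that descriptors support; Descriptor, Annotation, apply_descriptors and the
                pattern used for acme are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Descriptor:
    head_class: str      # class of the object the head denotes
    head_oid: str        # object to recognize (the descriptor head)
    body: re.Pattern     # body: here only a regular expression (simplification)


@dataclass
class Annotation:
    concept: str
    oid: str
    span: tuple[int, int]   # character offsets of the match in the document


def apply_descriptors(text: str, descriptors: list[Descriptor]) -> list[Annotation]:
    """Annotate every match of a descriptor body with the concept of its head."""
    annotations = []
    for d in descriptors:
        for m in d.body.finditer(text):
            annotations.append(Annotation(d.head_class, d.head_oid, m.span()))
    return annotations


acme = Descriptor("company", "acme", re.compile(r"\bAcme(?: Inc\.)?"))
found = apply_descriptors("Acme Inc. sells rocket skis.", [acme])
```

                Each Annotation records which concept and object a text span refers to, which is
                the information needed both to annotate the document and to store the recognized
                instance in the knowledge base.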
                    As another example, consider the following fragment of the Codex Language
                representing the extraction of a table containing stock indexes:
                collection class italian_stock_market_index_row(
                    stockMarketIndex: stock_market_index, [value]){}
                         ->
                             {L:=add(L,V)}+.
                collection class italian_stock_market_index_table
                    ([italian_stock_market_index_row]){}
                         ->  {L:=add(L,X)}+ DIRECTION = "vertical".
                In the example, the collection class italian_stock_market_index_row represents
                table rows composed of a stock market index and a sequence of numeric values; the
                collection class italian_stock_market_index_table contains the vertical sequence of
                rows that constitute the table.
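                The row/table structure can be roughly approximated on plain text by treating each
                line as a candidate row and consecutive matching lines as the vertical sequence. This
                Python sketch is a simplification under stated assumptions: it ignores real page
                geometry, the names in STOCK_MARKET_INDEXES stand in for stock_market_index objects,
                and match_row, match_table and any sample values are hypothetical.

```python
import re

# Hypothetical stand-ins for stock_market_index objects.
STOCK_MARKET_INDEXES = ("FTSE MIB", "FTSE Italia All-Share")

NUMBER = re.compile(r"-?\d+(?:[.,]\d+)*")


def match_row(line: str):
    """Row descriptor: a known index followed by one or more numeric values."""
    for index in STOCK_MARKET_INDEXES:
        if line.startswith(index):
            values = NUMBER.findall(line[len(index):])
            if values:                  # the "+" in the body: at least one value
                return (index, values)
    return None


def match_table(lines: list[str]) -> list[list[tuple]]:
    """Table descriptor: maximal vertical runs of consecutive matching rows."""
    tables, current = [], []
    for line in lines:
        row = match_row(line.strip())
        if row:
            current.append(row)
        elif current:                   # a non-row line closes the current table
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables
```

                Composing the table descriptor from the row descriptor, rather than matching the
                whole table with one pattern, mirrors how the collection class for the table is
                built from objects of the row class.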
                    The Codex language allows the expression of very complex patterns that utilize the
                ability to treat any unstructured or semi-structured document as a two-dimensional
                plane and exploit the full expressive power of semantic models. The above example
                in particular shows the ability to focus on complex and very specific information for
                extraction from unstructured documents, which in this case happens to be a table of
                stock indexes related to Italian companies only.
                    It is noteworthy that descriptors can be expressed using visual support and that
                the instances of concepts matching the descriptors can be extracted and stored in the
                Knowledge Base. These instances can also be serialized as XML, RDF, etc., to be used
                in analytical and/or Web (Semantic Web) based applications.
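                One possible serialization of an extracted instance is RDF in the N-Triples syntax.
                The Python sketch below is hypothetical: the base IRI under example.org, the mapping
                of attributes to predicates and the function names are assumptions, and only
                literal-valued attributes are handled (oid-valued attributes would become IRIs).

```python
# Hypothetical namespace; example.org is reserved for documentation.
BASE = "http://example.org/codex/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def literal(value) -> str:
    """Render an attribute value as an N-Triples literal."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(oid: str, cls: str, attributes: dict) -> str:
    """One rdf:type triple plus one triple per (literal-valued) attribute."""
    subject = f"<{BASE}{oid}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}{cls}> ."]
    for name, value in attributes.items():
        lines.append(f"{subject} <{BASE}{name}> {literal(value)} .")
    return "\n".join(lines)
```

                For instance, to_ntriples("acme", "company", {"name": "Acme Inc."}) produces a type
                triple for acme followed by one triple carrying its name.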


                3     The Semantic Enterprise and the Semantic Web

                    To motivate why the Semantic Enterprise Technologies paradigm could represent an
                opportunity for improving current semantic technology capabilities, it is important
                to compare and contrast the semantic enterprise approach with the semantic web
                approach. For example, in the closed world of an enterprise, reasoning under the
                closed world and unique name assumptions offers certain advantages over the open
                world assumption that is necessary when working with the Web. Another important
                aspect to consider is the need to perform semantic reasoning over enterprise data
                residing in relational databases. It is also important to be able to integrate rules
                with ontologies in order to maximize the return on the ontology-building effort. The
                benefits of integrating logic programming and OWL have already been described in
                [1,8,11]. In this section we describe in detail the advantages of the SET paradigm and
                its interoperability with semantic web technologies.


                3.1     Closed World and Unique Name Assumptions

                    Enterprise databases are founded on the Unique Name Assumption (UNA) and the
                Closed World Assumption (CWA). The semantics of these databases are intuitive and
                familiar to their users. Under the CWA, only the information explicitly stored in (or
                derived from) the Knowledge Base is known to be true, so a CWA-based reasoner treats
                as false any information not declared in the Knowledge Base. The UNA assumes that the
                names of information elements stored in the Knowledge Base are unique, so two
                elements with different names are necessarily distinct. The Codex Language is based
                on the closed world and unique name assumptions, which makes it highly suitable for
                modeling and reasoning in the enterprise. Conversely, OWL is based on the Open World
                Assumption (OWA), which is better suited to the Semantic Web.
                    A Semantic Model can be seen as an extension of the enterprise database. To
                understand the effect that the semantic assumptions of a database can have on the
                behavior of a query over exactly the same data, consider the databases shown in
                Fig. 1. When database (A) is queried for the company that has its headquarters only
                in the USA, a CWA-based reasoner returns Intel, whereas an OWA-based reasoner is
                unable to answer the query because there is no explicit statement that Intel has its
                headquarters only in the USA. Database (B) represents the constraint that each
                project can have at most one leader and
                that a project leader can lead many projects. The DL representation is the following
                TBox: PROJECT, MANAGER, hasLeader: PROJECT → MANAGER, and ABox: PROJECT(p1),
                PROJECT(p2), MANAGER(John), MANAGER(Chris), hasLeader(p1, John), hasLeader(p2, Chris).
                To represent the database constraint that a project may have at most one leader, we
                impose the following DL constraint: $\top \sqsubseteq {\leq 1}\,\mathit{hasLeader}$.
                In a relational database, when a user attempts to insert the fact that John is also a
                leader of project p2, the constraint is violated and the database rejects the
                insertion. However, if we assert the same fact in a DL ABox as hasLeader(p2, John),
                the system does not report any violation. When the system is queried for the leader
                of project p2, the OWA-based reasoner infers that both Chris and John are leaders of
                the project and, because of the lack of UNA, that Chris = John. To obtain the same
                behavior as the database in this case, the user must explicitly declare that
                Chris ≠ John. This is obviously counterintuitive for the typical enterprise user.




                                             Fig. 1. Two simple relational databases
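                The contrast on database (B) can be reproduced in a few lines. In this Python sketch,
                ClosedWorldDB behaves like a relational table keyed on the project (CWA and UNA),
                while NoUNAReasoner is a deliberately minimal stand-in for one consequence of DL
                reasoning without the UNA: to satisfy the at-most-one restriction it infers that two
                leaders of the same project denote the same individual, tracked with a union-find
                structure. Both classes are hypothetical and do not model a real OWL reasoner.

```python
class ClosedWorldDB:
    """CWA + UNA: a second, differently named leader for a project is rejected."""

    def __init__(self) -> None:
        self.has_leader: dict[str, str] = {}     # key: project -> its only leader

    def insert(self, project: str, leader: str) -> None:
        current = self.has_leader.get(project)
        if current is not None and current != leader:   # UNA: names are distinct
            raise ValueError(f"constraint violated: {project} is already led by {current}")
        self.has_leader[project] = leader


class NoUNAReasoner:
    """No UNA: two leaders of one project are inferred to be the same individual."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}         # union-find over individual names
        self.has_leader: dict[str, str] = {}     # first recorded leader per project

    def find(self, name: str) -> str:
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            name = self.parent[name]
        return name

    def assert_fact(self, project: str, leader: str) -> None:
        if project in self.has_leader:
            # <= 1 hasLeader can only hold if both names denote one individual.
            a, b = self.find(self.has_leader[project]), self.find(leader)
            self.parent[a] = b                   # record the inferred equality a = b
        else:
            self.has_leader[project] = leader

    def same(self, x: str, y: str) -> bool:
        return self.find(x) == self.find(y)
```

                Asserting hasLeader(p2, John) after hasLeader(p2, Chris) raises an error in
                ClosedWorldDB, whereas NoUNAReasoner silently concludes that Chris and John are the
                same individual.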
                Negative Queries Negative queries are queries for data where the query condition
                does not hold. For example, an airline database may contain all the pairs of airports
                connected by its flights. The CEO of the airline might want to query for all the
                airports not connected to its hub airport. An OWA-based reasoner would have trouble
                answering such a query, since the database does not, and for all practical purposes
                cannot, contain information about every airport that is not served. In the absence of
                asserted or explicitly derived negative information, the OWA-based reasoner cannot
                answer such a query. On the other hand, answering such a query is trivial for
                CWA-based systems, which assume that if there is no record of a flight between two
                airports, then there is no flight between them. Answering queries containing negative
                criteria in an intuitive way usually requires some form of closed-world reasoning.
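                Under the CWA the airline query reduces to a set difference over the stored flight
                pairs, i.e., negation as failure. The airport codes, flights and function names in
                this Python sketch are hypothetical.

```python
# Hypothetical data: the flight table is taken as the complete record (CWA).
airports = {"HUB", "AAA", "BBB", "CCC"}
flights = {("HUB", "AAA"), ("BBB", "HUB")}


def connected(a: str, b: str) -> bool:
    """A flight in either direction connects two airports."""
    return (a, b) in flights or (b, a) in flights


def not_connected_to(hub: str) -> set[str]:
    """Negation as failure: no stored flight record means no connection."""
    return {a for a in airports if a != hub and not connected(hub, a)}
```

                With this data, not_connected_to("HUB") returns only CCC; an open-world reasoner
                could not draw that conclusion without explicit negative facts.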
                    From the above discussions and examples it follows that CWA and UNA are
                better suited for the enterprise domain because their semantics are more intuitive for
                the users. Also, the reasoning based on such assumptions produces more useful new
                information. So an approach to knowledge representation coming directly from the
                database world is able to preserve the database semantics allowing the user to query
                and reason on databases in a more natural way.

                3.2     Rules and Integrity Constraints
                Rules It is important to integrate rules with ontologies in order to fully capture the
                knowledge of an enterprise or a domain. Enterprises need to realize a return on their
                investment in building ontologies, and as such, ontologies must be perceived not
                just as a documentation technique for describing a domain, but also as a language and
                system that can capture and execute domain rules expressed over the ontology. This
                has also been recognized by the Semantic Web community, which is actively working
                on adding rules to the Semantic Web language stack [7]. The Rule Interchange Format
                (RIF) working group of the World Wide Web Consortium (W3C) is currently working on
                standardizing such a language. Responding to popular demand, the Semantic Web Rule
                Language (SWRL) has been proposed. However, as the authors point out, SWRL has been
                designed as a first-order language and straightforwardly integrated with OWL as a
                simple extension. As a result, SWRL is trivially undecidable and does not address
                nonmonotonic reasoning tasks, such as expressing integrity constraints. The Codex
                Language, on the other hand, has built-in support for rules. These rules can operate
                on the concepts, relationships, and constraints defined in the ontology, as well as on
                any other atoms and predicates desired by the user. The logical underpinning of Codex
                is Datalog, which provides closed-world semantics, such as default negation and the
                unique name assumption, as well as support for recursive queries. As a consequence of
                being Datalog-based, Codex also supports all of the semantics of SQL, easily
                extending the benefits of semantic technologies to enterprise databases.
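To illustrate what recursion plus default negation means operationally, the Python sketch below evaluates a small Datalog program bottom-up: the recursive rules are iterated to their least fixpoint, and the negated atom is evaluated only once that predicate is complete (stratified negation). The predicates are hypothetical, airport facts are assumed to be implied by the flight table, and this shows Datalog semantics in general, not the Codex or DLV engine.

```python
from itertools import product

# Datalog program being evaluated (hypothetical predicates):
#   reachable(X, Y)   :- flight(X, Y).
#   reachable(X, Y)   :- flight(X, Z), reachable(Z, Y).
#   unreachable(X, Y) :- airport(X), airport(Y), X != Y, not reachable(X, Y).

def reachable(flights):
    """Least fixpoint of the two recursive rules, computed naively bottom-up."""
    reach = set(flights)  # first rule: every direct flight is reachable
    while True:
        # second rule: join flight(X, Z) with reachable(Z, Y)
        new = {(x, y) for (x, z) in flights for (z2, y) in reach if z == z2}
        if new <= reach:  # nothing new was derived: fixpoint reached
            return reach
        reach |= new

def unreachable(flights):
    """Default negation, evaluated only once 'reachable' is complete:
    a pair holds exactly when it was never derived."""
    reach = reachable(flights)
    airports = {a for pair in flights for a in pair}  # assumed airport(X) facts
    return {(x, y) for x, y in product(airports, airports)
            if x != y and (x, y) not in reach}

FLIGHTS = {("A", "B"), ("B", "C"), ("D", "A")}
print(sorted(unreachable(FLIGHTS)))
# [('A', 'D'), ('B', 'A'), ('B', 'D'), ('C', 'A'), ('C', 'B'), ('C', 'D')]
```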
                Integrity Constraints In OWL, domain and range restrictions constrain the types
                of objects that can be related by a role. Also, participation restrictions specify that
                certain objects have relationships to other objects.
                    For example, we can state that each student must have a student number as
                $\mathit{Student} \sqsubseteq \exists \mathit{hasStudentNumber}.\mathit{StudentNumber}$
                [10]. However, even though this restriction can be expressed, its semantics are quite
                different from those of an equivalent constraint in a relational database. A
                relational database will not allow the user to insert a student without specifying a
                student number. Because of its open-world assumption, OWL will accept a Student
                without a student number: it simply assumes that the student has a student number
                that has not yet been added to the ABox. Straightforward specification and enforcement
                of integrity constraints is extremely important for enterprises, as such constraints
                are an essential part of their domain and business model. In the Codex Language,
                such constraints are trivial to specify and apply.
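The database-style (closed-world) reading of the student-number constraint can be sketched in Python as follows; the Student record, ConstraintViolation, and function names are hypothetical. The check rejects the insertion outright instead of assuming that the missing value exists somewhere.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Student:
    name: str
    student_number: Optional[str] = None  # None: no number was supplied

class ConstraintViolation(Exception):
    """Raised when an insertion would break an integrity constraint."""

def insert_student(db: List[Student], s: Student) -> None:
    """Closed-world reading of 'every Student has a student number':
    a missing value is rejected, not assumed to exist elsewhere."""
    if s.student_number is None:
        raise ConstraintViolation(f"{s.name}: missing student number")
    db.append(s)

def violations(students: List[Student]) -> List[str]:
    """Check already-loaded data and report every offending student."""
    return [s.name for s in students if s.student_number is None]

db: List[Student] = []
insert_student(db, Student("Ann", "S-001"))
try:
    insert_student(db, Student("Bob"))
except ConstraintViolation as err:
    print("rejected:", err)  # rejected: Bob: missing student number
```

An OWL reasoner given the same axiom and a Bob without a number would report no inconsistency, which is exactly the behavioral gap described above.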

                3.3     Disjunctive Reasoning
                    The Codex Language is developed on top of the DLV system [4,9] that allows to
                exploit the answer set semantics and stable model semantics [6] for disjunctive logic
                programs. The possibility to exploit disjunction (in the head of rules) and constraints
                enable to express reasoning tasks for solving complex real problems. The disjunction
                allows to compute a set of models (search space) that can contain the possible solu-
                tion for a given problem, whereas constraints allow to choice the solution by adopting
                a brave or cautious reasoning approach. In the brave reasoning are considered the
                solutions that are true in at least one model, in the cautious reasoning a solution to
                be considered acceptable must be true in all the models. Disjunction allows to express
                very rich business rules and to model reasoning task able to solve different kinds of
                problems, such as planning under incomplete knowledge, constraint satisfaction, and
                abductive diagnosis. To better explain the disjunctive capabilities of the Codex
                Language, a team-building example follows.
                      module(teamBuilding){
                           (r) inTeam(E, P) ∨ outTeam(E, P) :- E:employee(),P:project().
                           (c1) :- P:project(numEmp:N),not #count{E:inTeam(E, P)}=N.
                           (c2) :- P:project(numSk:S),not #count{Sk:E:employee(skill:Sk), inTeam(E, P)} ≥ S.
                           (c3) :- P:project(budget:B),not #sum{Sa,E:E:employee(salary:Sa), inTeam(E, P)} ≤ B.
                           (c4) :- P:project(maxSal:M),not #max{Sa:E:employee(salary:Sa), inTeam(E, P)} ≤ M. }
                The reasoning module contains a disjunctive rule r that guesses whether each employee
                is included in the team or not, generating the search space under the answer set
                semantics. The constraints c1, c2, c3, and c4 model the project requirements (team
                size numEmp, at least numSk distinct skills, total salary within the budget, and no
                salary above maxSal), filtering out the solutions that do not satisfy them. In this
                way, knowledge encoded in the Semantic Model can be exploited to solve complex
                business problems.
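For a single project, the guess-and-check behavior of the teamBuilding module can be mimicked by brute-force enumeration, as in the Python sketch below. The employees, skills, salaries, and requirement values are hypothetical, each employee is assumed to have exactly one skill, and a real answer set solver such as DLV searches far more cleverly than enumerating all 2^n guesses. The last lines compute the brave and cautious consequences for inTeam.

```python
from itertools import product

# Hypothetical data: employee -> (skill, salary); one skill per employee.
EMPLOYEES = {
    "ann": ("java", 50),
    "bob": ("sql", 40),
    "carl": ("java", 45),
    "dana": ("ux", 60),
}
# Hypothetical requirements for one project P (numEmp, numSk, budget, maxSal).
PROJECT = {"numEmp": 2, "numSk": 2, "budget": 100, "maxSal": 55}

def satisfies(team, p):
    """True if no constraint c1 to c4 is violated by this guess."""
    skills = {EMPLOYEES[e][0] for e in team}
    salaries = [EMPLOYEES[e][1] for e in team]
    return (len(team) == p["numEmp"]                        # c1: team size
            and len(skills) >= p["numSk"]                   # c2: distinct skills
            and sum(salaries) <= p["budget"]                # c3: budget
            and max(salaries, default=0) <= p["maxSal"])    # c4: max salary

def answer_sets(p):
    """Rule r guesses inTeam/outTeam for every employee; c1 to c4 prune."""
    names = sorted(EMPLOYEES)
    teams = []
    for guess in product([True, False], repeat=len(names)):
        team = frozenset(n for n, in_team in zip(names, guess) if in_team)
        if satisfies(team, p):
            teams.append(team)
    return teams

teams = answer_sets(PROJECT)
# Brave: inTeam(E, P) true in at least one answer set.
brave = set().union(*teams)
# Cautious: inTeam(E, P) true in every answer set
# (simplification: empty result when there are no answer sets).
cautious = set.intersection(*map(set, teams)) if teams else set()
print(sorted(brave), sorted(cautious))  # ['ann', 'bob', 'carl'] ['bob']
```

With these data two teams survive the constraints; bob belongs to both, so inTeam(bob, P) is a cautious consequence, while ann and carl are only brave consequences.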


                3.4     Interoperability With the Semantic Web

                    OWL is currently the standard language with which the Semantic Web community is
                trying to realize its vision [16]. Many dictionaries, thesauri, and ontologies
                expressed in this language are already available. Many companies and organizations
                have invested in building semantic resources, such as enterprise models and domain
                ontologies, that they want to use for building semantic applications. All these
                organizations therefore face the problem of reusing these OWL-based resources and
                making them interoperable with existing databases and applications.
                    The SET paradigm addresses the interoperability problem with SWT by providing
                an import-export approach that makes it possible to "translate" OWL ontologies into
                Semantic Models (without descriptors) and vice versa. When a Semantic Model is
                obtained from an existing OWL ontology, descriptors can be added to enable SET to
                exploit the model for semantic applications.
                    Making OWL and the Codex Language interoperable is a complex problem, because
                it requires translating from OWL (based on description logics) to the Codex Language
                (based on Disjunctive Logic Programming). Problems related to combining DL and DLP
                have been addressed by many authors [5,8,10], who have presented several methods to
                translate OWL-DL into logic programming. All these methods first require defining
                which fragment of OWL is to be handled. For example, [10] gives a detailed
                description of how a considerable fragment of OWL-DL can be processed within logic
                programming systems. To this end, the author derives an enhanced characterization of
                Horn-SHIQ, the description logic for which this translation is possible, and explains
                how the generated Datalog programs can be used in a standard logic programming
                paradigm without sacrificing soundness or completeness of reasoning. For the
                translation from OWL to the Codex Language, fragments of the two languages have been
                identified for which semantic equivalence between the original OWL ontology and the
                resulting Semantic Model is guaranteed. In this context, semantic equivalence means
                obtaining the same entailments in the source and in the target language despite
                their different semantic assumptions. A more detailed discussion
                of the OWL-to-Codex Language translation is beyond the scope of this paper; further
                details can be found in [3].
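To make the idea of a translatable fragment concrete, the Python sketch below maps a few Horn-style OWL axioms, written in a hypothetical tuple encoding, to Datalog rule strings using the familiar "description logic programs" style of mapping. It is an illustration only, not the translation used in [3] nor the Horn-SHIQ procedure of [10]; anything outside its small fragment, such as an existential restriction on the right-hand side of a subclass axiom, is rejected.

```python
import itertools

def pred(name: str) -> str:
    """Datalog predicate names start with a lowercase letter."""
    return name[0].lower() + name[1:]

def body_atoms(cls, var, fresh):
    """Translate a class expression occurring on the left of SubClassOf."""
    if isinstance(cls, str):                      # atomic class A -> a(X)
        return [f"{pred(cls)}({var})"]
    if cls[0] == "and":                           # A and B -> a(X), b(X)
        return [atom for c in cls[1:] for atom in body_atoms(c, var, fresh)]
    if cls[0] == "some":                          # some P.A -> p(X, Y1), a(Y1)
        _, prop, filler = cls
        y = fresh()
        return [f"{pred(prop)}({var}, {y})"] + body_atoms(filler, y, fresh)
    raise NotImplementedError(f"outside the Horn fragment: {cls!r}")

def translate(axiom):
    """Map one axiom (hypothetical tuple encoding) to one Datalog rule."""
    counter = itertools.count(1)

    def fresh():
        return f"Y{next(counter)}"

    kind = axiom[0]
    if kind == "SubClassOf":
        _, sub, sup = axiom
        if not isinstance(sup, str):  # existentials/disjunctions in the head
            raise NotImplementedError(f"outside the Horn fragment: {sup!r}")
        return f"{pred(sup)}(X) :- {', '.join(body_atoms(sub, 'X', fresh))}."
    if kind == "ObjectPropertyDomain":
        _, p, c = axiom
        return f"{pred(c)}(X) :- {pred(p)}(X, Y)."
    if kind == "ObjectPropertyRange":
        _, p, c = axiom
        return f"{pred(c)}(Y) :- {pred(p)}(X, Y)."
    if kind == "SubObjectPropertyOf":
        _, p, q = axiom
        return f"{pred(q)}(X, Y) :- {pred(p)}(X, Y)."
    if kind == "TransitiveObjectProperty":
        _, p = axiom
        return f"{pred(p)}(X, Z) :- {pred(p)}(X, Y), {pred(p)}(Y, Z)."
    raise NotImplementedError(f"unsupported axiom type: {kind}")

print(translate(("SubClassOf", ("some", "hasDiagnosis", "Cancer"), "Patient")))
# patient(X) :- hasDiagnosis(X, Y1), cancer(Y1).
```

Note that the student-number axiom from Section 3.2 falls outside this fragment: its existential head has no direct Datalog counterpart.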


                4     SETA: The SET Reference Architecture
                    The SET Reference Architecture describes the technologies and architecture that
                enable the use of semantics in an enterprise. The various entities enabling semantic
                applications in an enterprise are shown in Fig. 2.




                         Fig. 2. Semantic Enterprise
                         Fig. 3. Ontology Management
                In this figure we identify the high-level components composing the Semantic
                Enterprise. These include tools and technologies for:
                  – modeling and managing Semantic Models
                  – specifying rules over a Semantic Model
                  – extracting metadata from unstructured and semi-structured sources
                  – storing and indexing the extracted metadata
                  – reasoning over extracted metadata and existing structured data
                The modeling environment usually includes a graphical user interface (GUI) for
                creating, modifying, and managing Semantic Models. This interface also allows one to
                specify rules over the Semantic Model. These rules may include the concept
                descriptors, and the reasoning engine uses the constraints and relationships together
                with these rules to perform its reasoning tasks. Including both structured and
                unstructured data is vital to enabling the semantic enterprise. This requires two
                important features from the architecture: (i) use of existing structured data
                sources, and (ii) extraction of useful information from unstructured data sources.
                The technologies comprising the semantic architecture should be able to use existing
                enterprise databases to provide semantic capabilities: enterprises hold large amounts
                of data, so it is not feasible to convert all of it into a different format just to
                enable the semantic enterprise. So far, all of the technologies in the
                enterprise have only been able to use structured data to provide decision-making
                capabilities and to enable various applications. With the availability of ontologies
                and reasoning capabilities, it is now possible to leverage unstructured and
                semi-structured data as well, since the semantics allow us to assign the correct
                interpretation. A large amount of information is locked up in such unstructured
                sources, and it should be included in any decision-making process and application.
                Extracting useful information from unstructured data requires a multi-pronged
                approach, which includes:
                  – Natural Language Processing (NLP) including Part of Speech (POS) tagging,
                    sentence splitting, entity extraction, and other NLP related capabilities.
                  – Concept recognition and extraction.
                  – Supervised and unsupervised classification.
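As a rough illustration of the first two items above, the Python sketch below chains a naive sentence splitter with dictionary-based concept recognition. The lexicon and sample text are hypothetical, and production pipelines would rely on proper tokenizers, POS taggers, and trained classifiers rather than regular expressions.

```python
import re

# Hypothetical lexicon: surface form -> concept from a Semantic Model.
LEXICON = {
    "lung": "organ",
    "cancer": "disease",
    "tumor": "disease",
    "chemotherapy": "therapy",
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def recognize(sentence):
    """Dictionary-based concept recognition over lowercase word tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]

def annotate(text):
    """Pair every sentence with the concepts recognized in it."""
    return [(s, recognize(s)) for s in split_sentences(text)]

for sentence, concepts in annotate(
        "Patient admitted for lung cancer. Chemotherapy started on 12/03."):
    print(sentence, concepts)
```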
                    Supporting the Semantic Enterprise is a well-defined process that requires the
                management of Semantic Models. This goes beyond ontology modeling and includes
                full ontology life-cycle management: ontology versioning, combination of ontologies
                into a higher-level ontology, ontology comparison, and the addition of rules and
                descriptors to ontologies, as shown in Fig. 3.


                5     An e-Health Application

                    This section shows an application of SET to a real case. The application
                supports hospital wards in monitoring errors and risk causes in lung cancer care. It
                was developed within a project aimed at providing some Italian hospitals with
                healthcare risk management capabilities.
                    The application takes as input Electronic Medical Records (EMRs) and risk
                reports coming from different hospital wards. An EMR is generally a flat text
                document (usually about 3 pages long) written in Italian. EMRs are weakly structured:
                for example, the patient's personal data appear at the top of the document, and
                clinical events (e.g. medical exams, surgical operations, diagnoses) are introduced
                by a date. Risk reports, filled in at the end of the clinical process, are provided
                to patients by wards to collect information about errors with or without serious
                outcomes, adverse events, and near misses.
                    The goal of the application is to extract semantic metadata about oncology
                therapies and errors, together with their temporal data. The application extracts
                personal information (name, age, address), diagnosis data (diagnosis time, kind of
                tumor, body part affected by the cancer, cancer progression level), and care and
                therapy information. The extracted information is used to build, for each treated
                patient, an instance of the lung cancer clinical process. The acquired process
                instances are then analyzed with data and process mining techniques to discover
                whether errors follow recurring patterns in the phases of drug prescription,
                preparation, or administration.
                    The application was built by representing a medical Semantic Model of lung
                cancer that contains: (i) concepts and relationships referring to the disease, its
                diagnosis, and its treatment in terms of surgical operations and chemotherapies with
                their associated side effects. Concepts related to persons (patients), body parts, and
                risk causes are also represented. All the concepts related to the cancer come from the
                ICD-9-CM disease classification system, whereas the chemotherapy drug taxonomy is
                inspired by the Anatomical Therapeutic Chemical (ATC) classification system; and (ii)
                a set of descriptors enabling the automatic acquisition of the above-mentioned
                concepts from Electronic Medical Records (EMRs). The following excerpt of the medical
                Semantic Model describes (and allows the extraction of) the patient's name, surname,
                age, and disease.
                class anatomy ().
                    class body_part (bp:string) isa {anatomy}.
                        class organ isa {body_part}.
                            lung: organ("Lung").
                            ->.
                        ...
                    ...
                class disease (name:string).
                    tumor: disease("Tumor").
                    ->.
                    cancer: disease("Cancer").
                    ->.
                    ...
                relation synonym (d1:disease,d2:disease)
                    synonym(cancer,tumor).
                    ...
                class body_part_disease () isa {disease}.
                    lung_cancer: body_part_disease("Lung cancer").
                    -> CONTAIN  & 
                    ...
                collection class patient_data (){}
                    collection class patient_name (name:string){}
                         ->   {Y := X;}
                            SEPBY .
                    collection class patient_surname (surname:string){}
                            ->
                                  {Y:=X;} SEPBY .
                        collection class patient_age (age:integer){}
                            -> {Y := $str2int(Z);}
                                SEPBY .
                    ...
                collection class patient_data (name:string, surname:string,
                                               age:integer, diagnosis:body_part_disease){}
                         ->
                             CONTAIN {X:=X1}
                            & {Y:=Y1} & {Z:=Z1} & .
                ...
                The classes diagnosis section and hospitalization section used in the above
                descriptors represent text paragraphs containing personal data and diagnosis data,
                recognized by dedicated descriptors that are not shown for lack of space. The
                extraction mechanism follows a WOXM (Write Once, eXtract Many) fashion: the same
                descriptors can be used to extract metadata about patients affected by lung cancer
                from unstructured EMRs with different layouts. Moreover, descriptors are obtained
                either by automatic writing methods (as, for example, for the cancer and tumor
                concepts) or by visual composition (as for patient data).
                    Metadata extracted using the Semantic Model are stored as collection class
                instances in a knowledge base. For the simple excerpt of the Semantic Model shown
                above, the extraction process generates the following patient_data class instance
                for an EMR: "@1": patient_data("Mario","Rossi","70",lung_cancer).
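The WOXM idea can be approximated outside the Codex Language with the Python sketch below, where regular expressions stand in for descriptors and a synonym table plays the role of synonym(cancer, tumor). The patterns, the sample EMR texts, and the PatientData type are hypothetical; Codex descriptors are grammar-based and considerably richer than regular expressions. The same patterns extract the same instance from two differently arranged records, and the age is converted to an integer, in the spirit of the $str2int call in the patient_age descriptor.

```python
import re
from typing import NamedTuple, Optional

class PatientData(NamedTuple):
    name: str
    surname: str
    age: int
    diagnosis: str

# Hypothetical descriptor-like patterns (regular expressions, not Codex syntax).
NAME_RE = re.compile(r"Patient:\s+(?P<name>[A-Z][a-z]+)\s+(?P<surname>[A-Z][a-z]+)")
AGE_RE = re.compile(r"\bage\s+(?P<age>\d+)", re.IGNORECASE)
DIAG_RE = re.compile(r"\b(?P<organ>lung)\s+(?P<disease>cancer|tumor)\b", re.IGNORECASE)
SYNONYMS = {"cancer": "cancer", "tumor": "cancer"}  # mirrors synonym(cancer, tumor)

def extract(emr: str) -> Optional[PatientData]:
    """Apply the same patterns to any record layout (Write Once, eXtract Many)."""
    name, age, diag = NAME_RE.search(emr), AGE_RE.search(emr), DIAG_RE.search(emr)
    if not (name and age and diag):
        return None  # every part of the instance must be recognized
    disease = SYNONYMS[diag["disease"].lower()]
    return PatientData(name["name"], name["surname"],
                       int(age["age"]),  # string-to-integer conversion
                       f"{diag['organ'].lower()}_{disease}")

emr_a = "Patient: Mario Rossi, age 70.\nDiagnosis: lung tumor."
emr_b = "Diagnosis: Lung cancer.\nAge 70. Patient: Mario Rossi"
print(extract(emr_a))
print(extract(emr_a) == extract(emr_b))
```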
                    The application can process many EMRs and risk reports in a single run and
                stores the extracted metadata in XML format.
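A batch run could, for example, serialize its results with Python's standard xml.etree.ElementTree module, as sketched below. The XML vocabulary (a metadata root, one patient_data element per instance, one child element per attribute) is hypothetical; the paper does not specify the schema the application actually uses.

```python
import xml.etree.ElementTree as ET

FIELDS = ("name", "surname", "age", "diagnosis")

def to_xml(instances):
    """Serialize extracted instances (dicts with an id and the attributes
    of patient_data) into a single XML document string."""
    root = ET.Element("metadata")
    for inst in instances:
        el = ET.SubElement(root, "patient_data", id=inst["id"])
        for field in FIELDS:
            ET.SubElement(el, field).text = str(inst[field])
    return ET.tostring(root, encoding="unicode")

doc = to_xml([{"id": "@1", "name": "Mario", "surname": "Rossi",
               "age": 70, "diagnosis": "lung_cancer"}])
print(doc)
```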


                6     The Business Value of Semantic Enterprise Technologies

                    The SET paradigm enables value creation from different perspectives. From the
                technological point of view, the value creation capabilities of the SET paradigm can
                be explained by introducing the knowledge-powered computing vision. This vision is
                founded on transforming enterprise information into knowledge and then converting
                that knowledge into software via a portfolio of embeddable semantic components.
                Semantic Models transform information into knowledge and can be leveraged to embed
                knowledge directly into software, making it actionable. In this way, semantic-aware
                enterprise applications can be obtained. To provide value, SET cannot be separate,
                external, or isolated; rather, they have to interoperate with the complex portfolios
                of applications and information repositories owned by enterprises, in order to
                enhance their performance.
                    From the strategic point of view, turning knowledge into software makes it
                possible to deliver a new generation of knowledge-powered features that better assist
                in driving business processes and making decisions. New categories of knowledge
                workers can leverage such enhanced features to obtain functionalities and domain
                knowledge interchanges that create information intelligence, where knowledge-powered
                applications deliver more precise answers with adaptive responses. Better
                performance can thus be achieved by improving decision support and decision making,
                exception handling, emergency response, compliance, risk management, situation
                assessment, and command and control.
                    From the tactical point of view, it is important to note that SET features
                enable a new way of using data and information. SET make it possible to leverage new
                information resources, such as e-mail, web pages, forums, blogs, wikis, CRM
                transcripts, search logs, and organizational documents, as already happens for
                databases. Exploiting executable Semantic Models enables a better understanding of
                the interrelationships and shared context of existing structured and unstructured
                enterprise information. As a result, better-informed users can work smarter, with
                better business process execution and monitoring, decision-making, and planning.


                7     Conclusion

                    This paper presented the Semantic Enterprise Technologies (SET) paradigm. SET
                are based on the concept of the Semantic Model, an executable, flexible, and agile
                representation of domain knowledge (ranging from simple taxonomies equipped with a
                few simple descriptors to very rich ontologies equipped with complex business rules
                and descriptors) that can be exploited for managing both structured (e.g. relational
                databases) and unstructured information (e.g. document repositories). Semantic
                Models are expressed in the Codex Language, obtained by combining Disjunctive Logic
                Programming (Datalog plus disjunction) and Attribute Grammars, both extended with
                object-oriented and two-dimensional capabilities. SET overcome the limitations of
                Semantic Web Technologies (SWT) in enterprise domains. SET provide mechanisms to
                address several important modeling problems that frequently arise in enterprises and
                that are hard, if not impossible, to solve using OWL alone, but can easily be
                addressed using the Codex Language. SET are interoperable with SWT thanks to a
                translation
                mechanism. Translation makes it possible to import portions of existing OWL
                ontologies for use in semantic enterprise applications, and to export portions of
                Semantic Models to Semantic Web applications. By leveraging the enhanced semantic
                features of SET, enterprises can transform information into knowledge in order to
                achieve better business performance.

                References
                1. Baader F., Calvanese D., McGuinness D.L., Nardi D., Patel-Schneider P.F., eds. The
                   Description Logic Handbook: Theory, Implementation, and Applications. Cambridge
                   University Press, 2003.
                2. Davies J., Studer R., Warren P. Semantic Web Technologies: Trends and Research in
                   Ontology-based Systems. Wiley, July 11, 2006, ISBN-13: 978-0470025963.
                3. Dell’Armi T., Gallucci L., Leone N., Ricca F., Schindlauer R. OntoDLV: an ASP-based
                   System for Enterprise Ontologies. Proceedings of the 4th International Workshop on
                   Answer Set Programming, Porto, Portugal, September 8–13, 2007.
                4. Eiter T., Gottlob G., Mannila H. Disjunctive Datalog. ACM TODS 22(3) (1997) 364–418.
                5. Eiter T., Lukasiewicz T., Schindlauer R., Tompits H. Combining Answer Set Program-
                   ming with Description Logics for the Semantic Web. In: Principles of Knowledge Repre-
                   sentation and Reasoning: Proceedings of the Ninth International Conference (KR2004),
                   Whistler, Canada. (2004) 141–151.
                6. Gelfond M., Lifschitz V. Classical Negation in Logic Programs and Disjunctive Databases.
                   New Generation Computing 9 (1991) 365–385.
                7. Horrocks I., Patel-Schneider P.F., Boley H., Tabet S., Grosof B., Dean M. SWRL: A
                   Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission (2004).
                   http://www.w3.org/Submission/SWRL/.
                8. Krötzsch M., Hitzler P., Vrandečić D., Sintek M. How to reason with OWL in a logic
                   programming system. In Proceedings of the Second International Conference on Rules and
                   Rule Markup Languages for the Semantic Web, RuleML2006, pp. 17–26. IEEE Computer
                   Society, Athens, Georgia, November 2006.
                9. Leone N., Pfeifer G., Faber W., Eiter T., Gottlob G., Perri S., Scarcello F. The DLV
                   System for Knowledge Representation and Reasoning. ACM TOCL 7(3) (2006) 499–562.
                10. Motik B. Reasoning in Description Logics using Resolution and Deductive Databases.
                   PhD Thesis, University of Karlsruhe, 2006.
                11. Motik B., Horrocks I., Rosati R., Sattler U. Can OWL and Logic Programming Live
                   Together Happily Ever After? 5th International Semantic Web Conference, Athens, GA,
                   USA, November 5-9, 2006, LNCS 4273.
                12. Ricca F., Leone N. Disjunctive Logic Programming with types and objects: The DLV+
                   System. Journal of Applied Logic, Elsevier, Volume 5, Issue 3, September 2007, Pages
                   545-573.
                13. Ruffolo M., Leone N., Manna M., Saccà D., Zavatto A. Exploiting ASP for Semantic
                   Information Extraction. In proceedings of the ASP’05 workshop - Answer Set Program-
                   ming: Advances in Theory and Implementation, University of Bath, Bath, UK, 27th–29th
                   July 2005.
                14. Ruffolo M., Manna M. A Logic-Based Approach to Semantic Information Extraction.
                   In proceedings of the 8th International Conference on Enterprise Information Systems
                   (ICEIS’06), Paphos, Cyprus, May 23-27, 2006
                15. Ruffolo M., Manna M., Oro E. Object Grammar. Internal Report of High Performance
                   Computing and Networking Institute of the Italian National Research Council. 2007
                16. Smith M. K., Welty C., McGuinness D. L. OWL web ontology language guide. W3C
                   Candidate Recommendation (2003) http://www.w3.org/TR/owl-guide/.