Supporting FrameNet Project with Semantic Web technologies

Paulo Hauck1, Regina Braga1, Fernanda Campos1, Tiago Torrent2, Ely Matos2, José Maria N. David1

1 Pós Graduação em Ciência da Computação
2 Projeto FrameNet Brasil – Universidade Federal de Juiz de Fora (UFJF)
Campus Universitário s/n – Juiz de Fora – MG – Brazil

{paulo.hauck, regina.braga, fernanda.campos, tiago.torrent, jose.david, ely.matos}@ufjf.edu.br

Abstract. The FrameNet Project is being developed by ICSI at Berkeley, with the goal of documenting the English language lexicon based on Frame Semantics. For Brazilian Portuguese, the FrameNet-Br Project, hosted at UFJF, follows the same theoretical and methodological perspective. This work presents a service-based infrastructure that combines Semantic Web technologies with FrameNet-like databases, under the hypothesis that the application of technologies such as ontologies, linked data, and web services can contribute to building and reusing lexical resources based on Frame Semantics. The contributions are related to enriched semantics, data reliability and natural language processing.

1. Introduction

FrameNet is a lexicography project under development at the International Computer Science Institute (ICSI) with the goal of documenting the English language lexicon based on the concepts of Frame Semantics [FILLMORE, 1982]. The FrameNet-Br Project is derived from FrameNet, and focuses on the documentation of linguistic frames in Brazilian Portuguese [SALOMÃO, 2011].

There are several works related to the FrameNet Project. Some of them aim to improve data reusability by using technologies that facilitate the reuse of the information contained in the FrameNet database. Among these technologies, one of the most prominent is the Semantic Web. The use of Semantic Web technologies emphasizes characteristics such as reuse and acquisition of new knowledge.
In the FrameNet context, the Semantic Web can improve the use of lexical data because (i) the formalism provided by ontologies allows formal detailing and definition of shared concepts and the use of inference machines for data validation and implicit information discovery; (ii) linked data can promote greater integration of FrameNet data with other information bases, like DBpedia and GeoNames; and (iii) Web Services allow the integration of tools, independently of both programming languages and operating systems.

On the other hand, the interface between lexical resources and ontologies, the OntoLex Interface [Huang et al, 2010], has recently been explored with the aim of understanding how the associations between lexical and formal semantics can contribute to the improvement of machine reading, an activity that is key to data mining, automatic translation and text summarization.

This paper presents a service-based infrastructure, named FSI (FrameNet Semantic Infrastructure), which combines Semantic Web technologies with FrameNet structure and data. This work is therefore concerned with the benefits that can be obtained by applying Semantic Web technologies in the context of FrameNet, both in the documentation process and in frame-based searches. The main objective is to build an infrastructure based on Semantic Web concepts to support the development of FrameNet-like resources, as well as their use and applications. This infrastructure aims to provide two interactive interfaces: one focused on the interaction with other software tools, through a service layer, and another to support direct user interaction. It allows the maintenance of data, also taking advantage of the benefits of using ontologies for this task.
The specific goals, derived from the main objective, are: (i) to provide greater formalism to the FrameNet data, by using ontologies to describe their structures; (ii) to promote the use of FrameNet data by external tools through Web Services; (iii) to provide tools that help in frame documentation and also in sentence annotation; (iv) to reduce the probability of human errors during sentence annotation, using validations based on inference machines; and (v) to provide the user with a new experience in querying FrameNet data, by using linked data and, hence, enabling the discovery of new information. These goals are fully explored by Hauck [2014]. This article focuses particularly on goals (i), (ii) and (v).

Besides this introduction, the paper is organized into the following sections. Section 2 briefly presents the main concepts related to frames and the FrameNet Project. Section 3 discusses related work. Section 4 presents the FSI infrastructure and a case study. Finally, section 5 presents the conclusions.

2. Frame Semantics and FrameNet

Frame Semantics proposes that human knowledge is not composed of isolated pieces of information, but is rather based on a set of related concepts. This knowledge is specified in complex structures, called frames. These frames constitute a complex system of related concepts, so that in order to understand one of them it is necessary to understand the structure in which the entire frame fits [FILLMORE, 1982].

FrameNet [RUPPENHOFER et al, 2011] is a lexical resource for the English language, based on the theory of Frame Semantics. As a lexical resource, it focuses on lexical units, the concepts or scenes evoked by these units (represented by frames), and the relations among these frames. The whole project can be seen as an information base, used successfully in applications such as information extraction, machine translation and valence dictionaries.
It is also being expanded to other languages such as German (http://www.laits.utexas.edu/gframenet/), Japanese (http://jfn.st.hc.keio.ac.jp/), French (https://sites.google.com/site/anrasfalda/) and Spanish (http://sfn.uab.es:8080/SFN/). A version for Brazilian Portuguese has also been developed, called FrameNet-Br [SALOMÃO et al., 2013].

A frame is a structure composed of Frame Elements (FEs), which are the participants and props of the scene described by the frame. If a scene is expressed by a sentence, a specific word in the sentence is said to be the "target", which evokes the frame. Each part of the sentence that belongs to the syntactic locality of the target word expresses a Frame Element. The process of defining which part corresponds to each Frame Element is called "annotation" and is, together with frame creation, the main task involved in FrameNet development.

According to Ruppenhofer et al. [2010], there are factors that call for the creation of a new frame, such as differences in perspective, variation in the argument structure, causative-inchoative alternation and ontological distinction of FEs. To address the last of these factors, FrameNet adopted the definition of Semantic Types for some FEs. The Semantic Type assigned to an FE indicates the type of filler expected for that FE and, in an annotated sentence, one can expect the filler of an FE to be an instance of the assigned Semantic Type.

3. Related Works

Several previous works were considered during the FSI specification, including the use of ontologies to formalize the structure of frames and their relationships [MOREIRA, 2012] [NUZZOLESE et al, 2011] [SCHEFFCZYK et al., 2008]; the construction of a service-oriented infrastructure combined with a formal model for the description of its data [VEGI et al, 2011; 2012]; and the development of a tool to support the documentation of frames and the annotation of sentences [LEENOI, 2011].

Scheffczyk et al. [2008] propose the construction of ontologies in OWL-DL from the transcription of information expressed by FrameNet frames. The ontologies are used to formally describe the structure of a frame. FSI is also based on the idea of creating an ontology to formalize the structure of the frames and their relations, in order to obtain a higher level of data reliability and to allow other tools to take advantage of these data, considering their formalism. However, we aim to extend the use of ontologies, validating not only the structure of a frame, but also its relationships with other frames and the relationships between FEs. This choice follows from the semantic definition of frames, according to which a frame depends not only on its components, such as FEs and Lexical Units (LUs), but also on its relations with other frames.

In Nuzzolese et al. [2011], data from FrameNet were semi-automatically transformed into linked data, using ontologies. According to the authors, this transformation enables greater data integration with other related databases. Like Nuzzolese et al. [2011], FSI also uses linked data. However, FSI uses a vocabulary already available in the infrastructure, obtained from the integration of data from annotated sentences with data from other related databases. The advantage of FSI in this case, besides the expressive power of ontologies to define the formal vocabulary of these data, also lies in the use of domain ontologies to allow greater expressiveness, as well as in the use of external resources connected through linked data, forming a richer knowledge network.

Moreira [2012] revisits some of the limitations that Ovchinnikova et al. [2010] had already pointed out in FrameNet, such as low lexical coverage, incompleteness of the network of relations, inconsistencies in the sets of inherited properties, lack of axiomatization, as well as the fact that FrameNet makes no explicit distinction between roles and types, an important feature for ontologies. Moreira [2012] then proposes that elements of the FrameNet structure be formalized so as to avoid mistakes in using them. FSI extends Moreira's [2012] work by creating an ontology for those elements and also for the data derived from annotation.

Leenoi et al. [2011] used ontologies to formalize part of the data from Thai FrameNet, and also built tools to support the documentation of frames and the annotation of sentences. For FSI, we also developed tools to support the documentation of frames and the annotation of sentences. Our main distinguishing feature is that we use semantic information to assist the user in documentation and annotation, ensuring greater data reliability, since, by using inference techniques, the ontology allows the user to notice data inconsistencies.

Vegi et al. [2012] propose an infrastructure for managing and sharing design patterns using metadata descriptions based on a formal vocabulary, and a communication interface to be used by external tools. As in Vegi et al. [2012], formal vocabularies for data representation were created in FSI, but with greater expressiveness, through the use of OWL and SWRL rules. In addition, FSI also uses the SOA protocol, thereby promoting greater availability for integration with other tools.

4. FrameNet Semantic Infrastructure

In this section, we present the FSI architecture. FSI is based on SOA principles, and uses Semantic Web concepts together with FrameNet data in order to contribute to the maintenance of FrameNet and to the applicability of these data to other activities related to NLP (Natural Language Processing).
Two ontologies were created for the FSI implementation: i) the FrameNet metadata ontology, named ONTO-FRAME-BR, which semantically describes the data structure that makes up the frames and the semantic relations between them, and ii) ONTO-ANNOTATION-BR, which covers sentence annotation. FSI aims to reuse existing domain ontologies, which serve as a source for the definition of the Semantic Type of Frame Elements. This provides semantic expressiveness to the fragments of the scene referenced by each Frame Element. The linked data approach [BERNERS-LEE et al, 2001] is also exploited by FSI, connecting each fragment of a scene, represented by an FE, to a Web resource, so that it is possible to get new information from these resources.

4.1 Ontologies

The Copa 2014 FrameNet Brasil Project (COPA2014) [TORRENT et al., 2014] is a frame-based, domain-specific trilingual electronic dictionary built to be used by tourists, journalists and the staff involved in the organization of the FIFA World Cup 2014 in Brazil. COPA2014 uses the whole FrameNet infrastructure. We used COPA2014 as a basis to implement and validate FSI. The domain ontologies used for this validation were the PROTON ontology [TERZIEV et al., 2005], which covers various domains but details the tourism domain in depth, and the SWAN Soccer Ontology [MÖLLER, 2004], which covers the soccer domain.

Onto-Frame-BR aims to provide a semantic basis for the data and metadata. It makes FrameNet data readable by computer engines through the formalism imposed by the ontology. It also contributes to data reliability, since the ontology ensures the semantic validity of the data.

To build this ontology, we carried out a reverse engineering process on the COPA2014 project database. The entities that compose the database model and are strictly related to the representation of the Frame were initially mapped as ontology classes. Each relationship between these entities was mapped as an object property.
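The mapping step described above can be illustrated with a minimal sketch. The entity and relationship names below are hypothetical and are not taken from the COPA2014 database; the sketch only assumes that each entity becomes an owl:Class and each relationship becomes an owl:ObjectProperty with a domain and a range, written out as Turtle-like triples.

```python
# Hypothetical schema: these entity and relationship names are illustrative
# assumptions, not the actual COPA2014 database model.
ENTITIES = ["Frame", "FrameElement", "LexicalUnit"]
RELATIONSHIPS = [
    # (property name, source entity, target entity)
    ("hasFrameElement", "Frame", "FrameElement"),
    ("evokesFrame", "LexicalUnit", "Frame"),
]


def schema_to_triples(entities, relationships, prefix="fn:"):
    """Map entities to owl:Class and relationships to owl:ObjectProperty."""
    triples = []
    for name in entities:
        triples.append((prefix + name, "rdf:type", "owl:Class"))
    for prop, source, target in relationships:
        if source not in entities or target not in entities:
            raise ValueError(f"relationship {prop} refers to an unknown entity")
        triples.append((prefix + prop, "rdf:type", "owl:ObjectProperty"))
        triples.append((prefix + prop, "rdfs:domain", prefix + source))
        triples.append((prefix + prop, "rdfs:range", prefix + target))
    return triples


def to_turtle(triples):
    """Render triples as one Turtle-style statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


ONTOLOGY_TRIPLES = schema_to_triples(ENTITIES, RELATIONSHIPS)
print(to_turtle(ONTOLOGY_TRIPLES))
```

A mapping of this kind only yields a skeleton of classes and properties; restrictions and rules still have to be added by hand.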
Next, it was necessary to refine the ontology, according to the FrameNet documentation [RUPPENHOFER et al., 2010]. The first step was to define existential and universal restrictions on classes, in order to validate individuals based on the minimum requirements for their existence. As an example, Figure 1 shows the restrictions for the ontological class FrameElement.

Figure 1: Classes and Restrictions in Protégé.

The next step was the separation between frame-to-frame relations and frame-internal relations, since they were grouped together in the COPA2014 database. This separation was made in order to prevent relations from being assigned incorrectly, and also to ensure that the semantic definition of these relations is consistent with that of Ruppenhofer et al. [2010]. However, some semantic definitions could not be fully specified using only OWL. Thus, SWRL rules were used with the aim of either classifying individuals or identifying implicit relationships that could not be captured using OWL alone.

Figure 2: Perspective_on restrictions.

Figure 3: Rule to verify the inheritance of causative frame.

Ruppenhofer et al. [2010] and Leenoi et al. [2011] describe seven possible frame-to-frame relations and their restrictions. Considering this documentation, and in order to adequately represent the structure of FrameNet, the semantics of these relations were defined in FSI. To help in the identification of frames that violate these or any other restrictions defined by SWRL rules, we created an InvalidFrame class for those individuals.

As an example of one of the relations defined in the ontology, we have the Perspective_on relation, which is described as a relation between a neutral frame and another, non-neutral frame. This relation occurs when a neutral frame can adopt more than one viewpoint. Thus, FEs may vary according to the viewpoint adopted, and two or more viewpoints cannot coexist in the same frame.
To express this restriction, an equivalent property of this relation was described in the ontology as irreflexive, without the need to create SWRL rules (Figure 2).

As an example of SWRL rule creation, we have the Causative_of and Inchoative_of relations. Causative frames should inherit from the Transitive_action frame, while Inchoative frames should inherit from the Event, State or Gradable_attributes frames. As shown in Figure 3, the rule for the Causative_of relation checks whether a frame is defined as causative of another frame and also inherits from a frame whose name is different from Transitive_action. If so, the rule classifies the target frame of the Causative_of relation in the Invalid_Frame class.

As with the frame-to-frame relations, Ruppenhofer et al. [2010] and Leenoi et al. [2011] also describe possible relations between FEs inside the same frame. In order to support these relations, semantic descriptions were also specified in the ontology. Figure 4 shows a summary of all SWRL rules created to support frame-internal relations.

Figure 4: SWRL rules.

As a result of this process, Onto-Frame-BR was specified. This ontology differs from the ontologies defined in Leenoi et al. [2011], Nuzzolese et al. [2011] and Scheffczyk et al. [2008], especially in the detailed semantics of the relations between frames and between FEs. In Nuzzolese et al. [2011] and Scheffczyk et al. [2008], these relations are either not expressed or are expressed only as part of the vocabulary, without restrictions or rules to validate them. Only Leenoi et al. [2011] discuss the relations between frames, but the authors do not make clear whether these relations were treated in the ontology or only reported. Furthermore, the authors provide no means to obtain or reproduce the ontology. A partial view of Onto-Frame-BR, presenting its main classes and relations, is shown in Figure 5.

Figure 5: Onto-Frame-BR main classes and relations.
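The checks described for Perspective_on, Causative_of and Inchoative_of can be approximated outside a reasoner with a short sketch. This is not the SWRL rule set of FSI: the frame names Frame_A to Frame_F are hypothetical, and the sketch assumes that the frame declared as causative or inchoative (the first argument of the relation) is the one to be flagged, which is only one reading of the rule described above.

```python
# Parents named in the text for each relation; everything else is flagged.
ALLOWED_PARENTS = {
    "Causative_of": {"Transitive_action"},
    "Inchoative_of": {"Event", "State", "Gradable_attributes"},
}


def find_invalid_frames(relations, inherits_from):
    """Return the frames that would be classified as invalid.

    relations: iterable of (relation name, source frame, target frame).
    inherits_from: dict mapping a frame to the set of frames it inherits from.
    """
    invalid = set()
    for relation, source, target in relations:
        # Irreflexivity: a frame cannot be a perspective on itself.
        if relation == "Perspective_on" and source == target:
            invalid.add(source)
        allowed = ALLOWED_PARENTS.get(relation)
        if allowed is None:
            continue
        # Mirrors the rule in the text: flag the frame when it inherits
        # from a frame whose name is outside the allowed set.
        parents = inherits_from.get(source, set())
        if any(parent not in allowed for parent in parents):
            invalid.add(source)
    return invalid


# Hypothetical data for illustration only.
RELATIONS = [
    ("Causative_of", "Frame_A", "Frame_B"),
    ("Causative_of", "Frame_C", "Frame_D"),
    ("Inchoative_of", "Frame_B", "Frame_E"),
    ("Perspective_on", "Frame_F", "Frame_F"),
]
INHERITS_FROM = {
    "Frame_A": {"Transitive_action"},
    "Frame_B": {"State"},
    "Frame_C": {"Event"},
}

INVALID = find_invalid_frames(RELATIONS, INHERITS_FROM)
print(sorted(INVALID))
```

In this sketch, a frame with no recorded inheritance is not flagged, since the check only fires when an offending parent is actually present.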
The Onto-Annotation-BR ontology was also developed, with the aim of complementing the Onto-Frame-BR ontology by covering semantic annotation, i.e., defining the participation of fragments as FEs and identifying the frame. This ontology allows the representation of the annotated sentences produced in the project. In order to validate the semantics of annotations, two SWRL rules were created, as well as an InvalidAnnotatedSentence class for classifying sentences with invalid annotations. In this way, the two ontologies defined in this work (Onto-Frame-BR and Onto-Annotation-BR) provide a means to validate the semantics of annotations. Furthermore, external linked data resources can be associated with the annotated sentence fragments identified in these ontologies. These ontologies can be obtained at http://www.ufjf.br/framenetbr-eng/projects/fsi/.

4.2 Architecture

Figure 6 shows an overview of the infrastructure with its main components. FSI is divided into three layers: i) the Data Layer, where the data processed by the infrastructure, such as ontologies, linked data resources, service annotations and access control information, are stored; ii) the Service Layer, whose purpose is to provide an interface to external software tools (developed in any programming language); and iii) the Portal, where a user interface is provided. This paper focuses on the description of the Service Layer.

Figure 6: FSI main components.

As stated before, FSI uses a set of ontologies to provide a formal structure and semantics for the data stored in the infrastructure. These ontologies include ONTO-FRAME-BR and ONTO-ANNOTATION-BR, described in section 4.1. The other ontologies are related to the domains that are represented by the frames stored in the database. These domain ontologies allow the definition of semantic restrictions on the FEs in a way that makes it possible to evaluate whether the annotations respect the semantics of the frame that is evoked.
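A minimal sketch of such an evaluation follows, assuming a Play frame from the soccer domain in which the FE Host accepts fillers typed as City or Country and the FEs Squad1 and Squad2 accept fillers typed as Squad or Country. The type assignments for the fillers and the assignment of fragments to FEs are hypothetical, and a plain dictionary lookup stands in for the ontology reasoner.

```python
# Semantic types accepted by each FE of the Play frame (illustrative).
FE_ALLOWED_TYPES = {
    ("Play", "Host"): {"City", "Country"},
    ("Play", "Squad1"): {"Squad", "Country"},
    ("Play", "Squad2"): {"Squad", "Country"},
}

# Hypothetical typing of sentence fragments; in FSI this information
# would come from the domain ontologies, not from a hard-coded table.
FILLER_TYPES = {
    "The Brazilian Team": {"Squad"},
    "the USA": {"Country"},
    "Toronto": {"City"},
    "yesterday": {"Time"},
}


def invalid_fillers(frame, annotation):
    """Return (FE, filler, reason) for each filler that breaks a restriction.

    annotation: dict mapping an FE name to the sentence fragment filling it.
    """
    problems = []
    for fe, filler in annotation.items():
        allowed = FE_ALLOWED_TYPES.get((frame, fe))
        if allowed is None:
            problems.append((fe, filler, "unknown frame element"))
        elif not (FILLER_TYPES.get(filler, set()) & allowed):
            problems.append((fe, filler, "semantic type mismatch"))
    return problems


VALID = invalid_fillers(
    "Play",
    {"Squad1": "The Brazilian Team", "Squad2": "the USA", "Host": "Toronto"},
)
INVALID = invalid_fillers(
    "Play",
    {"Squad1": "The Brazilian Team", "Host": "yesterday"},
)
print(VALID)
print(INVALID)
```

An annotation whose list of problems is empty respects the semantic types of the evoked frame; any other annotation would be a candidate for an invalid-annotation class.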
For example, Figure 7 shows, for the soccer domain, the representation of the frame Play, in which the FEs Squads, Squad1 and Squad2 are related to the ontological type Squad, which is defined in the Soccer domain ontology. This ontology also defines a restriction whereby instances of these FEs may also be instances of the term Country described in the ontology. The same holds for the FE Host. However, in this case, City and Country are both ontological types that can be accepted as instances of this FE.

For the representation of the fragments that instantiate the FEs, we used linked data sources [Berners-Lee et al, 2001]. Thus, each fragment is connected to at least one term from an external database, providing more information through navigation between these connections. Figure 8 presents an example of this approach, considering the annotation "The Brazilian Team faces the USA in Toronto". Parts of the annotation, such as "The Brazilian Team" and "the USA", are connected by an equivalence relation to resources of an external linked data dataset that represent these teams. Based on these resources, we can get new information from the semantic network that is formed by linked data sets. As an example, we can get the name of the coach or even the names of the players of these teams, taking advantage of the links to external sources.

Figure 7: Use of Domain ontologies to restrict the semantic type of FEs.

Figure 8: Fragments of annotations using linked data.

The FSI functionalities are available through services, based on the SOA architecture.
Therefore, four services were developed with the aim of providing a communication interface for external tools:

i) Access Service: controls the access of external tools to FSI functionalities, avoiding changes in the ontology data;
ii) Visualization Service: responsible for the several data formats in which the ontology data can be provided, including the visualization of frames and their structures;
iii) Ontology and Linked Data Service: responsible for providing an interface to access and modify the ontology data. This service is the most important feature of FSI. It has several methods to obtain FrameNet elements such as frames, LUs (lexical units), sentences and annotations;
iv) Discovery Service: responsible for providing information about services and their methods, including semantic annotations.

4.3. Usage Scenario

In this section, we present a usage scenario showing how the interface provided by the Service Layer can be used by external tools in NLP activities. To illustrate this scenario, we used the Cadmos tool (Character-centered Annotation of Dramatic Media Objects) [CATALDI et al., 2011]. It is a framework to support the annotation of multimedia resources based on the use of ontologies and on the identification of scenes. During the description of a scene, terms and expressions with ambiguous meanings and different possible interpretations may appear. To tackle this issue, Cadmos provides a disambiguation process that uses various lexical resources, including FrameNet and WordNet. In Cadmos, the generated annotations are stored as RDF triples and associated with ontologies for domain delimitation. Since WordNet and FrameNet are supported for scene identification, FSI may be used in the frame disambiguation process, as shown in Figure 9.
One of the advantages of using FSI in this context is the use of semantic information that can be obtained from FEs, since these elements may be assigned to the domain ontology, making it possible to better identify the context in which the frame can be applied. In addition, it would also be possible to take advantage of the FrameNet annotation data stored in FSI, which are associated with external linked data sources. These connections can enrich the media annotations, for example, by assigning an annotated sentence element from FrameNet to a Cadmos annotation element.

Figure 9: Terms Disambiguation process in Cadmos with FSI frames data.

Figure 10: Service Flow Execution in order to obtain FSI frames and FEs data.

Figure 10 details the interaction flow between Cadmos and the methods of the Ontology and Linked Data Service of FSI to obtain the frames and their FE data in the disambiguation process.

5. Conclusions

Several authors have contributed to improving the access to lexical resources such as FrameNet, as well as their use in different applications and the sharing of related information. Those efforts benefit from Semantic Web technologies, such as ontologies and linked data. These technologies, applied to FrameNet, can provide formalization of the frame structure using both formal vocabularies and ontological classes. This work follows this approach by combining: i) the use of ontologies that describe the structure of frames and the semantic relations between these frames, associated with the use of domain ontologies for semantic constraints on FEs; ii) the use of linked data to enrich the annotation of sentences; and iii) the access to data through a Service Layer that enables the integration of FSI with other services and applications.
The main contributions of the work are: i) the construction of an infrastructure, based on Semantic Web and SOA technologies, to foster the access to lexical resources and to bring more reliability to the documentation of frames and the annotation of sentences; ii) the construction of ONTO-FRAME-BR, which formally represents the frame structure and deals with the semantics of the relations between frames and between their elements, supporting the frame documentation process and providing the user with evidence of possible errors; iii) the construction of ONTO-ANNOTATION-BR, which helps structure the process of sentence annotation so that sentence fragments can be both related to FEs documented in ONTO-FRAME-BR and used as linked data; iv) the possibility of using domain ontologies to relate external linked data resources to fragments of annotated sentences.

Some limitations may also be highlighted, related both to the technology and to the scope adopted. Among them, we list: i) the limitations of OWL and SWRL in treating inheritance relations between frames; ii) the fact that only the semantic aspects of sentence annotation were accounted for in FSI.

Despite these points to be improved, we believe that the work achieved its objectives by providing an infrastructure that contributes to FrameNet both in regard to maintenance issues and in offering semantic information that can be used by external users and tools.

Acknowledgments

We would like to thank FAPEMIG, CNPq and CAPES for their support.

References

BERNERS-LEE, T., HENDLER, J., LASSILA, O. The Semantic Web - A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, 01/05/2001. Available at: . Accessed: 16 April 2013.

CATALDI, M., DAMIANO, R., LOMBARDO, V., PIZZO, A., SERGI, D. Integrating commonsense knowledge into the semantic annotation of narrative media objects. In: Artificial Intelligence Around Man and Beyond. Springer Berlin Heidelberg, 2011.

FILLMORE, C. J. Frame Semantics. In: The Linguistic Society Of Korea (org.). Linguistics in the morning calm. Seoul: Hanshin, 1982.

HAUCK, P. FSI: Uma infraestrutura de apoio ao Projeto Framenet utilizando Web Semântica. 142p. Master's dissertation in Computer Science. Universidade Federal de Juiz de Fora, 2014.

HUANG, C. et al. Ontology and the Lexicon. Cambridge, MA: Cambridge University Press, 2010.

LEENOI, D., JUMPATHONG, S., PORKAEW, P., SUPNITHI, T. Thai Framenet Construction and Tools. International Journal on Asian Language Processing 21(2), p. 71-82. 2011.

MOREIRA, A. Proposta de um framework apoiado em ontologias para a detecção de frames. 194p. Doctoral thesis in Linguistics. Universidade Federal de Juiz de Fora, 2012.

MÖLLER, K. SWAN Soccer Ontology. 2004. Available at: http://sw.deri.org/2005/05/swan/soccer/ontology/soccer.owl. Accessed: 12 Sep. 2013.

NUZZOLESE, A. G., GANGEMI, A., PRESUTTI, V. Gathering lexical linked data and knowledge patterns from FrameNet. In: Proceedings of the sixth international conference on Knowledge capture (K-CAP '11). ACM, New York, USA, p. 41-48. 2011.

OVCHINNIKOVA, E. et al. Data-driven and ontological analysis of FrameNet for natural language processing. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). ELRA. Valletta, Malta. 2010.

RUPPENHOFER, J., ELLSWORTH, M., PETRUCK, M. R. L., JOHNSON, C. R., SCHEFFCZYK, J. FrameNet II: extended theory and practice. 2010. Available at: . Accessed: 12 Jan. 2013.

SALOMÃO, M. M. M., TORRENT, T. T., SAMPAIO, T. F. A Linguística Cognitiva Encontra a Linguística Computacional: notícias do projeto Framenet Brasil. Cadernos de Estudos Linguísticos 55 (1), p. 7-32. 2013. (in Portuguese)

SCHEFFCZYK, J., BAKER, C. F., NARAYANAN, S. Ontology-Based reasoning about lexical resources. In: Ontologies and Lexical Resources for Natural Language Processing, Cambridge Studies in Natural Language Processing. Cambridge University Press, Cambridge. 2008.

TERZIEV, I., KIRYAKOV, A., MANOV, D. Base upper-level ontology (BULO) Guidance. Deliverable of EU-IST Project IST. 2005.

TORRENT, T. T., SALOMÃO, M. M. M., CAMPOS, F. C. A., BRAGA, R. M. M., MATOS, E. E. S., GAMONAL, M. A., GONÇALVES, J. A., SOUZA, B. C. P., GOMES, D. S., PERON, S. R. Copa 2014 FrameNet Brasil: a frame-based trilingual electronic dictionary for the Football World Cup. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations. Dublin, Ireland, p. 10-14. 2014.

VEGI, L. F. M. Technical description of Dublin Core application profile to analysis patterns (DC2AP). 2012. Available at: . Accessed: 22 Jan. 2013.

VEGI, L. F. M., PEIXOTO, D. A., SOARES, L. S., LISBOA FILHO, J., OLIVEIRA, A. P. An infrastructure oriented for cataloging services and reuse of analysis patterns. In: International Workshop On Reuse In Business Process Management, 2, 2011, Clermont-Ferrand, France. Proceedings of BPM 2011 Workshops, LNBIP vol. 100, Part 4. Berlin: Springer, 2012. p. 338-343.