Semantic Link Prediction through Probabilistic Description Logics Kate Revoredo1 , José Eduardo Ochoa Luna2 , and Fabio Gagliardi Cozman2 1 Departamento de Informática Aplicada, Unirio Av. Pasteur, 458, Rio de Janeiro, RJ, Brazil 2 Escola Politécnica, Universidade de São Paulo, Av. Prof. Mello Morais 2231, São Paulo - SP, Brazil katerevoredo@uniriotec.br,eduardo.ol@gmail.com,fgcozman@usp.br Abstract. Predicting potential links between nodes in a network is a problem of great practical interest. Link prediction is mostly based on graph-based features and, recently, on approaches that consider the se- mantics of the domain. However, there is uncertainty in these predictions; by modeling it, one can improve prediction results. In this paper, we pro- pose an algorithm for link prediction that uses a probabilistic ontology described through the probabilistic description logic crALC. We use an academic domain in order to evaluate this proposal. 1 Introduction Many social, biological, and information systems can be well described by net- works, where nodes represent objects (individuals), and links denote the rela- tions or interactions between nodes. Predicting a possible link in a network is an interesting issue that has recently gained attention, due to the growing inter- est in social networks. For instance, one may be interested on finding potential friendship between two persons in a social network, or a potential collaboration between two researchers. Thus link prediction [12, 20] aims at predicting whether two nodes (i.e. people) should be connected given that we know previous infor- mation about their relationships or interests. A common approach is to exploit the network structure, where numerical information about nodes is analyzed [12, 20, 9]. However, knowledge about the objects represented in the nodes can improve prediction results. For instance consider that the researchers Joe and Mike do not have a publication in common, thus they do not share a link in a collaboration network. Moreover, graph features do not indicate a potential link between them. However, they have published in the same journal and they both teach the same course in their respectively universities. This information can be an indication of a potential collaboration between them. Given this, approaches that are based on the semantics related to the domain of the objects represented by the nodes [21, 18] have been proposed. In some of them, an ontology modeling the domain and the object interests were used in the prediction task. However, there is uncertainty in such predictions. Often, it is not possible to guarantee the relationship between two objects (nodes). This is maybe due to the fact that information about the domain is incomplete. Thus, it would be interesting if link prediction approaches could handle the probability of a link conditioned on the information about the domain. In our example, knowing that the probability of the relationship between Joe and M ike conditioned on the knowledge of them publishing in the same journal and teaching the same course is high implies a link between them in the network; otherwise, a link is not suggested. In graph-based approaches, probabilistic models learned through machine learning algorithms were used for link prediction. Some examples of probabilistic models are Probabilistic Relational Model (PRM) [6], Probabilistic Entity Relationship Model (PERM) [7] and Stochastic Relational Model (SRM) [22]. On approaches based on semantic we claim that ontologies must be used to model the domain. Therefore, to model uncertainty, probabilistic approaches, such as probabilistic ontologies, must be considered. An ontology can be represented through a description logic [2], which is typi- cally a decidable fragment of first-order logic that tries to reach a practical bal- ance between expressivity and complexity. To encode uncertainty, a probabilistic description logic (PDL) must be contemplated. The literature contains a number of proposals for PDLs [8, 10, 19, 13]. In this paper we adopt a recently proposed PDL, called Credal ALC (crALC) [4, 16, 5], that extends the popular logic ALC [2]. In crALC one can specify sentences such as P (Professor|Researcher) = 0.4, indicating the probability that an element of the domain is a Professor given that it is a Researcher. These sentences are called probabilistic inclusions. Exact and approximate inference algorithms that deal with probabilistic inclusions have been proposed [4, 5], using ideas inherited from the theory of Relational Bayesian Networks (RBN)[11]. In this paper, we propose to use a probabilistic ontology defined with the PDL crALC for semantic link prediction. The paper is organized as follows. Section 2 reviews basic concepts of PDLs and crALC. Section 3 presents our algorithm for semantic link prediction through the PDL crALC. Experiments are discussed in Section 4, and Section 5 con- cludes the paper. 2 Probabilistic Description Logics and crALC Description logics (DLs) form a family of representation languages that are typi- cally decidable fragments of first order logic (FOL) [2]. Knowledge is expressed in terms of individuals, concepts, and roles. The semantic of a description is given by a domain D (a set) and an interpretation ·I (a functor). Individuals represent objects through names from a set NI = {a, b, . . .}. Each concept in the set NC = {C, D, . . .} is interpreted as a subset of a domain D. Each role in the set NR = {r, s, . . .} is interpreted as a binary relation on the domain. Several probabilistic descriptions logics (PDLs) have appeared in the litera- ture. Heinsohn [8], Jaeger [10] and Sebastiani [19] consider probabilistic inclusion axioms such as PD (Professor) = α, meaning that a randomly selected object is a Professor with probability α. This characterizes a domain-based semantics: prob- abilities are assigned to subsets of the domain D. Sebastiani also allows inclusions such as P (Professor(John)) = α, specifying probabilities over the interpretations themselves. For example, one interprets P (Professor(John)) = 0.001 as assigning 0.001 to be the probability of the set of interpretations where John is a Professor. This characterizes an interpretation-based semantics. The PDL crALC is a probabilistic extension of the DL ALC that adopts an interpretation-based semantics. It keeps all constructors of ALC, but only allows concept names on the left hand side of inclusions/definitions. Additionally, in crALC one can have probabilistic inclusions such as P (C|D) = α or P (r) = β for concepts C and D, and for role r. If the interpretation of D is the whole domain, then we simply write P (C) = α. The semantics of these inclusions is roughly (a formal definition can be found in [5]) given by: ∀x ∈ D : P (C(x)|D(x)) = α, ∀x ∈ D, y ∈ D : P (r(x, y)) = β. We assume that every terminology is acyclic; no concept uses itself. This as- sumption allows one to represent any terminology T through a directed acyclic graph. Such a graph, denoted by G(T ), has each concept name and role name as a node, and if a concept C directly uses concept D, that is if C and D appear respectively in the left and right hand sides of an inclusion/definition, then D is a parent of C in G(T ). Each existential restriction ∃r.C and value restriction ∀r.C is added to the graph G(T ) as nodes, with an edge from r and C to each restriction directly using it. Each restriction node is a deterministic node in that its value is completely determined by its parents. Example 1. Consider a terminology T1 with concepts A, B, C, D. Suppose P (A) = 0.9, B v A, C v B t ∃r.D, P (B|A) = 0.45, P (C|B t ∃r.D) = 0.5, and P (D|∀r.A) = 0.6. The last three assessments specify beliefs about partial overlap among concepts. Suppose also P (D|¬∀r.A) = ² ≈ 0 (conveying the existence of exceptions to the inclusion of D in ∀r.A). Figure 1 depicts G(T ). Fig. 1. G(T ) for terminology T in Example 1 and its grounding for domain D = {a, b}. The semantics of crALC is based on probability measures over the space of interpretations, for a fixed domain. Inferences, such as P (Ao (a0 )|A) for an ABox A, can be computed by propositionalization and probabilistic inference (for exact calculations) or by a first order loopy propagation algorithm (for approximate calculations) [5]. 3 Link Prediction by using crALC In this section we describe how to apply the PDL crALC for semantic link prediction. We borrowed some syntax from the graph-based approach where each node (a person in a social network) is represented by A, B, C, and we are interested in defining whether a link between A and B is suitable given there is no link between these nodes. Interests between the nodes are modeled through a probabilistic ontology represented by the PDL crALC. The prediction link task can be described as: Given: • a network defining relationship between objects; • an ontology represented by crALC describing the domain of the objects; • the ontology role that defines the semantic of the relationship between objects; • the ontology concept that describes the network objects. Find: • a revised network defining relationship between objects. The proposed algorithm for link prediction receives a network of a specific domain. For instance, in a collaboration network the nodes represent researchers and the relationship can have the semantic ”has a publication with” or ”is ad- vised by”. Therefore, the ontology represented by crALC describes the domain of publications between researchers, having concepts like Researcher, Publication, StrongRelatedResearcher and NearCollaborator and roles like hasPublication, hasSameInstitution and sharePublication. This ontology can be learned automat- ically through a learning algorithm as the ones proposed in [15, 17]. Thus, the nodes represent instances of one of the concepts described in the PDL crALC and the semantic of the links is described by one of the roles in the PDL crALC. These concept and role must be informed as inputs to the proposed algorithm. The link prediction algorithm is described in Algorithm 1. The algorithm starts looking for all pairs of instances of the concept C defined as the concept that provides the semantic for the network nodes. For each pair it checks whether the corresponding nodes exist in the network (this can be improved by exploring graph-based properties). If not the probability of the link is calculated through the probability of the defined role conditioned on evidence. The evidence is provided by the instances of the ontology. As many instances the ontology have the better is the inference performed. The inference is performed through the Relational Bayesian network build from ontology O. If the probability inferred is greater than a threshold then the corresponding link Require: a network N , an ontology O, the role r( , ) representing the semantic of the network link, the concept C describing the objects of the network and a threshold. Ensure: a revised network Nf 1: define Nf as N 2: for all pair of instances (a, b) of concept C do 3: if does not exist a link between nodes a and b in the network N then 4: infer probability P (r(a, b)|evidences) using the RBN created through the ontology O 5: if P (r(a, b)|evidences) > threshold then 6: add a link between a and b in network Nf 7: end if 8: end if 9: end for Algorithm 1: Algorithm for link prediction through crALC. is added to the network. Alternatively, when the threshold to be considered is not known a priori, a rank of the inferred links based on their probability is done and the top-k, where k would be a parameter, are chosen. 4 Preliminary Results Experiments were run over a collaborative network of researchers. Data was ga- thered from the Lattes curriculum platform 3 , the public repository for Brazilian curriculum researchers. In this platform, every researcher has a unique Lattes code that allows one to link to other researchers according to: shared publica- tions, advising tasks, and examination board participations. Given this collabo- rative network we are interested in predicting further links among researchers in order to either promote further collaborations (suitable co-workers to research tasks would be suggested) or gather information about research groups. Due to form-filling errors there are many missing links among researchers; thus, we are unable to completely state co-working relationships using only the Lattes platform. To tackle link prediction we firstly have collected information about 1200 researchers and learned a probabilistic ontology [15, 17], represented by the PDL crALC, for modeling their research interests. A simplified probabilistic ontology 3 http://lattes.cnpq.br/ is given by: P (Publication) = 0.3 P (Board) = 0.33 P (sharePublication) = 0.22 P (wasAdvised) = 0.05 P (hasSameInstitution) = 0.14 P (sameExaminationBoard) = 0.31 ResearcherLattes ≡ Person u(∃hasPublication.Publication u∃advises.Person u ∃participate.Board) P (PublicationCollaborator | Researcher u ∃sharePublication.Researcher) = 0.91 P (SupervisionCollaborator | Researcher u ∃wasAdvised.Researcher) = 0.94 P (SameInstitution | Researcher u ∃hasSameInstitution.Researcher) = 0.92 P (SameBoard | Researcheru ∃sameExaminationBoard.Researcher) = 0.92 P (NearCollaborator | Researcher u ∃sharePublication.∃hasSameInstitution. ∃sharePublication.Researcher) = 0.95 FacultyNearCollaborator ≡ NearCollaborator u ∃sameExaminationBoard.Researcher P (NullMobilityResearcher | Researcher u ∃wasAdvised. ∃hasSameInstitution.Researcher) = 0.98 StrongRelatedResearcher ≡ Researcher u (∃sharePublication.Researcher u ∃wasAdvised.Researcher) InheritedResearcher ≡ Researcher u (∃sameExaminationBoard.Researcher u ∃wasAdvised.Researcher) In this probabilistic ontology concepts and probabilistic inclusions denote mu- tual research interests. For instance, a PublicationCollaborator inclusion refers to Researchers who shares a Publication, thus relates two nodes (Researcher) in a col- laboration graph. Therefore, the concept Researcher and the role sharePublication are inputs to the algorithm we proposed in Algorithm 1. To perform inferences and therefore to obtain link predictions, a proposition- alization step (a resulting relational Bayesian network) is required. In addition, a collaboration graph, based on shared publications, was also defined. Statistical information was computed accordingly. Figure 2 depicts col- laborations among 303 researchers. Several relationships and clusterings can also be observed. If we carefully inspect this collaboration graph (Figure 3 shows a subgraph obtained from Figure 2) we could be interested, for instance, in predicting links among researchers from different groups. Thus, in Figure 3 one could further investigate whether a link between re- searcher R (red octagon node) and the researcher B (blue polygon node) is suitable. In order to infer this, the probability of a possible link between R and Nilton César Furtado Canto Iberê Luiz Caldas Vanessa Oliva Araujo Luiz Henrique Alves Monteiro André Alves Ferreira Renato Teodoro Ramos Henrique Schützer Del Nero José Guilherme de Souza Chaui Mattos Berlinck Eduardo Massad 1 1 Fabiane Bizinella Nardon 1 Atila Madureira Bueno 1 1 1 1 Paulo Bandiera Paiva Marilene de Pinho Santos Mazza 1 Vivian Rodrigues Fiales 1 1 Denis Roberto Zamignani Paulo Sérgio Licciardi Messeder Barreto José Roberto Castilho Piqueira 1 1 1 1 1 Eduardo Moacyr Krieger 1 Griselda Esther Jara de Garrido 1 1 1 Sergio Shiguemi Furuie 1 Cecil Chow Robilotta 1 Andre Fabio Kohn Jose Jaime da Cruz 1 Silvio Ikuyo Nabeta Luiz Lebensztajn Renato Carlson Pedro Pereira de Paula Emilio Del Moral Hernandez Mauricio Barbosa de Camargo Salles Maurício Caldora Costa 1 Ivan Eduardo Chabu Marcos Antonio Ruggieri Franco Roberto Moura Sales Sérgio Luís Lopes Verardi 1 Mario Leite Pereira Filho 1 1 1 1 1 1 Yasmara Conceicao De Polli Migliano Wanderlei Marinho da Silva 1 1 Bernardo Pinheiro de Alvarenga 1 1 Antonio Carlos de Jesus Paes 1 Paulo Roberto do Lago Helene João Cyro André 1 1 1 Luiz Eduardo Teixeira Ferreira 1 Ricardo Amorim Einsfeld Fabio Henrique Pereira 1 Djonny Weinzierl 1 1 1 José Roberto Cardoso 1 1 Pedro Afonso de Oliveira Almeida Luiz Fernando Campos Ramos Martha 1 1 Viviane Cristine Silva José Augusto Pereira da Silva 1 1 Jefferson Bettini Tulio Nogueira Bittencourt 1 1 José Alberto Cerri Gilson Fujii Marcos Cramer Esteves Fernando Rebouças Stucchi 1 1 Ricardo Hauch Ribeiro de Castro 1 1 Samuel Marcio Toffoli 1 Jaime Tupiassú Pinho de Castro José Umberto Arnaud Borges Eduardo Parente Prado Reginaldo Muccillo Edson Roberto Leite Marco Antonio Meggiolaro Jose Deodoro Trani Capocchi Joaquim Bento Cavalcante Neto 1 Ing Hwie Tan Carlos Renato Rambo 1 1 1 1 Neylor Antunes Stevo 1 Julio Cezar Adamowski 1 1 1 1 1 Emílio Carlos Nelli Silva Reinaldo Morabito Neto Carlos Shiniti Muranaka Murilo Daniel de Mello Innocentini Douglas Gouvêa 1 Victor Carlos Pandolfelli Guilherme Canuto da Silva Gracinda Marina Castelo da Silva 1 1 Kazuo Nishimoto 1 1 Jose de Anchieta Rodrigues Helmut Born 1 1 1 1 Maria Cristina Motta de Toledo 1 1 Jun Okamoto Junior Giuliana Ratti Ronaldo Câmara Cozza 1 Frank Patrick Missell 1 Raimundo Carlos Silvério Freire Jose Renato Coury Wildor Theodoro Hennies 11 Walman Benicio de Castro 1 Antonio Carlos Vieira Coelho Fernando Resende Marjorie Benegra 1 Leonardo Tolaine Massetto Marcos de Sales Guerra Tsuzuki 1 Paulo Carlos Kaminski Viviane Carillo Ferrari Rafael Giuliano Pileggi 1 Alexandre Magno Barros de Freitas Rodrigo Magnabosco 1 Julio Carlos Teixeira Alex Kenya Abiko 1 1 Silvio Burrattino Melhado Lilia Mascarenhas Sant Agostino 1 Izabel Fernanda Machado 1 1 1 Thiago Melanda Mendes 1 Lucas Antonio Moscato 1 André Rocha Studart Geraldo Caixeta Guimarães Tathyana Moratti Flávio Buiochi 1 1 1 Julio Cesar Klein das Neves Mário Sousa Takeashi Suzilene Real Janasi Vahan Agopyan 1 1 1 Heliana Comin Vargas Newton Kiyoshi Fukumasu 1 1 1 1 1 Vanessa Gomes da Silva 1 1 Raul Gonzalez Lima 1 Paulo Eigi Miyagi Marcio Gustavo Di Vernieri Cuppari 1 1 1 Roberto Cesar de Oliveira Romano Fábio Cardoso Chagas 1 1 Henrique Kahn Anna Helena Reali COSTA 1 1 1 1 Márcia Terra da Silva 1 1 1 1 Carina Ulsen 1 1 Silvia Maria de Souza Selmo 1 Fernando Jose Gomes Landgraf 1 1 1 1 1 Dairo Hernán Mesa Grajales Heloísa Cristina Fernandes Giuseppe Pintaúde 1 1 Renato Seiji Tavares 1 Jose Daniel Biasoli de Mello 1 Sérgio Cirelli Angulo 11 Miguel Aloysio Sattler 1 1 1 1 1 1 1 1 1 Mauro Zilbovicius 1 Flavio Beneduce Neto 1 Carlos Torres Formoso Andrea Murillo Betioli Roberto Martins de Souza 1 1 1 1 1 1 Deniol Katsuki Tanaka Orestes Marraccini Goncalves Daniel Rodrigues 1 1 Ivan Gilberto Sandoval Falleiros 1 1 1 1 1 1 Humberto Naoyuki Yoshimura Vanderley Moacyr John 1 1 1 1 Helio Goldenstein 1 1 1 Ricardo Fuoco 1 1 1 1 1 1 Mercia Maria Semensato Bottura de Barros 1 1 Anne Komatsu Agopyan 1 1 Adriana Gouveia Rodrigo 1 Erika Paiva Tenório de Holanda 1 1 1 1 1 1 1 1 1 Francisco Ferreira Cardoso 1 1 Cristiane Ueda 1 Ubiraci Espinelli Lemes de Souza 1 1 Andre Paulo Tschiptschin 1 Racine Tadeu Araujo Prado Neide Matiko Nakata Sato Amilton Sinatora 1 1 1 1 1 Artemária Coêlho de Andrade Miguel Angelo de Carvalho 1 1 Adriano Gameiro Vivancos 1 1 1 1 1 Ailton Camargo Sara Aida Rodriguez Pulecio 1 Brunoro Leite Giordano 1 1 Maristela Gomes da Silva 1 1 1 1 1 Cleber Marcos Ribeiro Dias 1 Henrique Lindenberg Neto Renato Rufino Xavier Marcela Paula Maria Zanin Meneguetti Antonio Domingues de Figueiredo Wilson Luiz Guesser 1 Wang Shu Hui 1 1 Abel André Cândido Recco 1 1 11 Arnaldo Manoel Pereira Carneiro 1 Bruno Luís Damineli 1 1 1 1 1 Mônica de Fátima Lobão Granero Neusa Alonso-Falleiros 1 Paulo Roberto Mei 1 Fernando Henrique Sabbatini Cristian Camilo Viáfara Arango Eliana Kimie Taniguti 1 Linilson Rodrigues Padovese Marcelo Nelson Páez Carreño Marco Giulietti Maria Elena Santos Taqueda 1 Stephan Wolynec 1 1 Idalina Vieira Aoki 1 Marco Antônio Penido de Rezende Maria Aparecida Godoy Soler Pajanian Cláudia Nascimento de Jesus Inés Pereyra 1 1 Célia Aparecida Lino dos Santos 1 1 Celso Pupo Pesce 1 1 Cláudio Vicente Mitidieri Filho 1 Henrique Eisi Toma 1 Gerson dos Santos 1 1 1 Joao Batista de Almeida e Silva Leni Campos Akcelrud 1 Patricia Alexandra Antunes 1 Fred Borges da Silva 1 1 1 1 Geraldo Narciso da Rocha Filho 11 Lucia Helena Terra Guerra 1 1 1 1 Viviane Miranda Araujo 1 Leoberto Costa Tavares Adnei Melges de Andrade 1 1 Tiago Martinello 1 11 Susana Ines Cordoba de Torresi Assis Vicente Benedetti Joao Paulo Sinnecker Vitor Angelo Fonseca Deichmann Patricia Hatsue Suegama Attilio Converti 1 1 1 1 1 1 1 André Rolim Baby Elfriede Marianne Bacchi 1 Carmen Cecilia Tadini 1 1 1 1 1 1 1 Carmen Cecilia Tadini 1 Antonio Riul Júnior John Paul Hempel Lima Luis Fernando Peffi Ferreira 1 1 1 1 1 1 1 1 1 Fernando Josepetti Fonseca 1 Salvador Pinillos Gimenez Rudolf Theoderich Bühler 1 1 Hercílio Gomes de Melo 1 1 1 1 1 Milene Galeti 1 1 Rodrigo Andrade Martinez 1 1 Paulo Sergio Santos 1 Victor Sonnenberg 1 Elizabete Wenzel de Menezes 1 1 1 1 1 1 1 1 1 1 1 Luciano Mendes Camillo Roberto Guardani 1 1 1 1 1 Antônio Otávio de Toledo Patrocínio 1 1 1 1 1 Claudio Augusto Oller do Nascimento 1 1 1 Flavio Cesar Cunha Galeazzo Marcelo Antonio Pavanello 1 Osvaldo Chiavone Filho Javier Telis Romero Pedro Vitoriano de Oliveira 1 1 1 1 Miguel Alexandre Novak 1 Sebastiao Gomes dos Santos Filho 1 João Antonio Martino 1 1 1 Pedro de Alcântara Pessôa Filho 1 1 Rodrigo Fernando Bianchi 1 Pricila Veiga dos Santos 1 Carlos Eduardo Calmanovici Aparecido Sirley Nicolett 1 1 1 1 Ely Antonio Tadeu Dirani 1 Cinthia Tiemi Muranaka Frank Herbert Quina Isolda Costa 1 1 1 1 1 Patrícia Ponce Antonio Luis Pacheco Rotondaro 1 1 1 1 Susy Frey Sabato Marystela Ferreira Suzana Caetano da Silva Lannes Alan Rodrigo Navia Mitiko Saiki 1 1 1 1 Marcello Bellodi Jorge Andrey Wilhelms Gut 1 Jose Mauricio Pinto Michelly de Souza Michele Rodrigues Aurea Yuki Sugai 1 1 Maysa Terada Isabela Bellot de Souza Will 11 11 Paulo Cesar de Morais Antonio Esio Bresciani Marina Magnani Efraim Cekinski Rocio del Pilar Bendezú Hernández Rinaldo Gregorio Filho 1 Risomá Chaves Marcel Dupret Lopes Barbosa Fernando Morais dos Reis Amilcar Machulek Junior Katia Ribeiro Antonio Carlos Silva Costa Teixeira Everaldo Carlos Venancio Sylvana Cavedon Presti Migliavacca 1 Airton Jose de Luna Esleide Lopes Casella 1 Reinaldo Giudici 1 1 1 1 1 1 1 1 Pedro Henrique Hermes de Araujo Tah Wun Song Heizir Ferreira de Castro Fabio Bentes Freire Alexandre Eliseu Stourdze Visconti Ariovaldo Bolzan Teresa Cristina Brazil de Paiva Celso Ricardo Denser Pamboukian 1 Aldo Tonso 1 1 Mateus Meneghesso da Conceição 1 1 1 1 1 Fabiano Henrique Marques 1 1 1 1 Andreas Karoly Gombert Mari Cleide Sogayar Valéria Mendes Rodas Irene Fernandes Soraia Attie Calil Jorge Felipe Santiago Chambergo Alcalde Carlos Augusto Pereira Patricia Barros dos Santos Ari José Scattone Ferreira Fig. 2. Collaboration graph among researchers. B is calculated, P (link(R, B)|E), where E denotes evidence about researchers such as publications, institution, examination board participations and so on. The role sharePublication is the one defining the semantic of the links in the graph. Therefore, it is through it that we must calculate P (link(R, B)|E). Since the concept PublicationCollaborator is defined by the role sharePublication and considering as evidence Researcher(R) u ∃hasSameInstitution.Researcher(B) one can infer P (link(R, B)|E) through: Viviane Cristine Silva José Augusto Pereira da Silva Ivan Eduardo Chabu Giuseppe Pintaúde Célia Aparecida Lino dos Santos Bernardo Pinheiro de Alvarenga Newton Kiyoshi Fukumasu Cristian Camilo Viáfara Arango Yasmara Conceicao De Polli Migliano Fabio Henrique Pereira Roberto Moura Sales Daniel Rodrigues 1 Helio Goldenstein 1 1 1 1 Marcio Gustavo Di Vernieri Cuppari Wilson Luiz Guesser 1 1 1 1 1 Wanderlei Marinho da Silva Jose Jaime da Cruz 1 1 1 Julio Cesar Klein das Neves 1 Mauricio Barbosa de Camargo Salles 1 Renato Rufino Xavier 1 1 1 Sérgio Luís Lopes Verardi 1 1 1 Amilton Sinatora 1 1 Linilson Rodrigues Padovese Marcelo Nelson Páez Carreño José Roberto Cardoso 1 1 1 1 1 Pedro Pereira de Paula Renato Carlson Paulo Roberto Mei 1 1 1 1 1 Andre Paulo Tschiptschin 1 1 Roberto Martins de Souza 1 Andrea Murillo Betioli 1 Silvio Ikuyo Nabeta 1 1 1 Ricardo Fuoco Vanessa Gomes da Silva 1 Miguel Angelo de Carvalho Carlos Shiniti Muranaka Inés Pereyra 1 Antonio Carlos de Jesus Paes 1 Antonio Domingues de Figueiredo Celso Pupo Pesce Artemária Coêlho de Andrade Thiago Melanda Mendes Ivan Gilberto Sandoval Falleiros Marcos Antonio Ruggieri Franco Emilio Del Moral Hernandez Deniol Katsuki Tanaka Orestes Marraccini Goncalves Djonny Weinzierl Mario Leite Pereira Filho Henrique Kahn Vahan Agopyan Luiz Lebensztajn Cleber Marcos Ribeiro Dias Maurício Caldora Costa 1 1 1 Brunoro Leite Giordano 1 1 Sérgio Cirelli Angulo 1 Anne Komatsu Agopyan 1 1 Maria Elena Santos Taqueda 1 Racine Tadeu Araujo Prado 1 1 1 1 1 Silvio Burrattino Melhado Márcia Terra da Silva Mário Sousa Takeashi Neide 1 Matiko Nakata Sato Vanderley Moacyr John 1 Fernando Resende 1 Fred Borges da Silva 1 Carina Ulsen 1 1 Silvia Maria de Souza Selmo 1 1 1 1 Roberto Cesar de Oliveira Romano Viviane Miranda Araujo Carlos Torres Formoso Ubiraci Espinelli Lemes de Souza 1 1 1 1 1 1 1 1 1 Maristela Gomes da Silva Marcela Paula Maria Zanin Meneguetti Miguel Aloysio Sattler Rafael Giuliano Pileggi Mercia Maria Semensato Bottura de Barros 1 1 1 Adriana Gouveia Rodrigo 1 Henrique Lindenberg Neto 1 1 Antonio Carlos Vieira Coelho 1 Bruno Luís Damineli Alex Kenya Abiko Francisco Ferreira Cardoso 1 Heliana Comin Vargas Arnaldo Manoel Pereira Carneiro 1 Heloísa Cristina Fernandes Joao Paulo Sinnecker Antonio Riul Júnior Fernando Henrique Sabbatini 1 John Paul Hempel Lima 1 1 1 Rinaldo Gregorio Filho Patricia Alexandra Antunes Leni Campos Akcelrud 1 Paulo Sergio Santos 1 1 1 1 1 Mauro Zilbovicius 1 1 1 1 1 Ely Antonio Tadeu Dirani 1 Marco Antônio Penido de Rezende Cláudia Nascimento de Jesus 1 Everaldo Carlos Venancio Tathyana Moratti 1 1 1 Gerson dos Santos Fernando Josepetti Fonseca 1 Antônio Otávio de Toledo Patrocínio 1 1 1 1 Marystela Ferreira Adriano Gameiro Vivancos 1 1 Leonardo Tolaine Massetto Rodrigo Fernando Bianchi 1 1 Eliana Kimie Taniguti Vitor Angelo Fonseca Deichmann Adnei Melges de Andrade Paulo Cesar de Morais Miguel Alexandre Novak Anna Helena Reali COSTA Erika Paiva Tenório de Holanda Cláudio Vicente Mitidieri Filho Maria Aparecida Godoy Soler Pajanian Rodrigo Andrade Martinez Fig. 3. Collaboration subgraph. P (PublicationCollaborator(R) |Researcher(R) u∃hasSameInstitution.Researcher(B)) = 0.57. If we took a threshold of 0.60, the link between R and B would not be included. One could gain more evidence, such as information about nodes that in- directly connect these two groups (Figure 3), denoted by I1 , I2 . The inference would be P (PublicationCollaborator(R) |Researcher(R) u∃sharePublication(I1 ).∃sharePublication(B) u∃sharePublication(I2 ).∃sharePublication(B)) = 0.65. Because more information was provided the probability inferred was different. The same threshold now would preserve the link. Other inferences are possible by considering the suggestion of links between surrounding nodes, i.e. nodes directly linked to the two nodes R and B , denoted by R1 , . . . , Rk , and B1 , . . . , Bn respectively. For each i = 1, ..., k and j = 1, ..., n, calculates P (link(Ri , Rj )|E) and P (link(Bi , Bj )|E). As a rule, if we are interested in discovering whether A and B could be linked, probabilistic inference P (link(A, B)) should be performed. In a more general framework, graph information could be useful to deal with a large number of link predictions. Note that graph adjacency allow us to address probabilistic inference for promising nodes. In a naive approach, each pair of nodes in the collaboration graph would be evaluated so, multiple probabilistic relational Bayesian inference calls would be required. On the other hand, if graph-based information is used, such naive scheme could be improved. In our approach, two nodes are probabilistically evaluated if there is a path between them (number of incoming/outgoing edges, number of mutual friends, node distances are also considered). Thus, numerical graph- based information guides the inference process in the relational Bayesian network (linked to the probabilistic ontology). In addition, other candidates sharing any kind of evidence are also evaluated, i.e., interests based features (linked to onto- logical knowledge) allow us to further explore link prediction. Alternatively, by completing an overall link predicting task we can devise further functionalities to the resulting collaboration network. The resulting graph can be considered as being a probabilistic network, i.e., probabilities inferred for each link could be denote strenght of the relationship. 5 Conclusion We have presented an approach for predicting links that resorts both to graph- based and ontological information. Given a collaborative network, e.g., a social network, we encode interests and graph features through a crALC probabilistic ontology. In order to predict links we resort to probabilistic inference. Prelimi- nary results focused on an academic domain, and we aimed at predicting links among researchers. These preliminary results showed the potential of the idea. Previous combined approaches for link prediction [3, 1] have focused on ma- chine learning algorithms [14]. In such schemes, numerical graph-based features and ontology-based features are computed; then both features are input into a machine learning setting where prediction is performed. Differently from such approaches, in our work we adopt a generic ontology (instead of a hierarchi- cal ontology, expressing only is-a relationships among interests). Therefore, our approach uses more information about the domain to help the prediction. Acknowledgements The third author is partially supported by CNPq. The work reported here has received substantial support by FAPESP grant 2008/03995-5. References 1. W. Aljandal, V. Bahirwani, D. Caragea, and H.W. Hsu. Ontology-aware classifica- tion and association rule mining for interest and link prediction in social networks. In AAAI 2009 Spring Symposium on Social Semantic Web: Where Web 2.0 Meets Web 3.0, Standford,CA, 2009. 2. F. Baader and W. Nutt. Basic description logics. In Description Logic Handbook, pages 47–100. Cambridge University Press, 2002. 3. D. Caragea, V. Bahirwani, W. Aljandal, and W. H. Hsu. Ontology-based link prediction in the livejournal social network. In Symposium on Abstraction, Refor- mulation and Approximation, 2009. 4. F. G. Cozman and R. B. Polastro. Loopy propagation in a probabilistic descrip- tion logic. In Sergio Greco and Thomas Lukasiewicz, editors, Second International Conference on Scalable Uncertainty Management, Lecture Notes in Artificial In- telligence (LNAI 5291), pages 120–133. Springer, 2008. 5. F. G. Cozman and R. B. Polastro. Complexity analysis and variational inference for interpretation-based probabilistic description logics. In Conference on Uncertainty in Artificial Intelligence, 2009. 6. N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In Proc. 16th Int. Joint Conference on Artificial Intelligence, pages 1300– 1309, 1999. 7. D. Heckerman, C. Meek, and D. Koller. Probabilistic entity-relationship models, prms, and plate models. In In Proceedings of the 21st International Conference on Machine Learning, 2004. 8. J. Heinsohn. Probabilistic description logics. In International Conf. on Uncertainty in Artificial Intelligence, pages 311–318, 1994. 9. W.H. Hsu, A.L. King, M.S.R. Paradesi, T. Pydimarri, and T. Wneinger. Collab- orative and structural recommendation of friends using weblog-based social net- work analysis. In Proceedings of Computational Approaches to Analysing WebLogs (AAAI), 2006. 10. M. Jaeger. Probabilistic reasoning in terminological logics. In Principals of Know- ledge Representation (KR), pages 461–472, 1994. 11. M. Jaeger. Relational Bayesian networks: a survey. Linkoping Electronic Articles in Computer and Information Science, 6, 2002. 12. D. Liben-Nowell and J. Kleinberg. The link prediction problem for social net- works. In Proceedings of the twelfth international conference on Information and Knowledge Management, pages 556–559, New York, NY, USA, 2003. ACM. 13. T. Lukasiewicz and U. Straccia. Managing uncertainty and vagueness in description logics for the semantic web. Web Semant., 6:291–308, November 2008. 14. T. Mitchell. Machine Learning. McGraw-Hill, New York, 1997. 15. J. Ochoa-Luna, K. Revoredo, and F.G. Cozman. Learning sentences and assess- ments in probabilistic description logics. In Bobillo, F., et al. (eds.) Proceedings of the 6th International Workshop on Uncertainty Reasoning for the Semantic Web, volume 654, pages 85–96, Shangai, China, 2010. CEUR-WS.org. 16. R. B. Polastro and F. G. Cozman. Inference in probabilistic ontologies with at- tributive concept descriptions and nominals. In 4th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW) at the 7th International Semantic Web Conference (ISWC), Karlsruhe, Germany, 2008. 17. K. Revoredo, J. Ochoa-Luna, and F. Cozman. Learning terminologies in proba- bilistic description logics. In Antônio da Rocha Costa, Rosa Vicari, and Flavio Tonidandel, editors, Advances in Artificial Intelligence SBIA 2010, volume 6404 of Lecture Notes in Computer Science, pages 41–50. Springer / Heidelberg, Berlin, 2010. 18. M. Sachan and R. Ichise. Using semantic information to improve link prediction results in network datasets. International Journal of Computer Theory and Enge- neering, 3:71–76, 2011. 19. F. Sebastiani. A probabilistic terminological logic for modelling information re- trieval. In ACM Conf. on Research and Development in Information Retrieval (SIGIR), pages 122–130, 1994. 20. B. Taskar, M. FaiWong, P. Abbeel, and D. Koller. Link prediction in relational data. In Proceedings of the 17th Neural Information Processing Systems (NIPS), 2003. 21. T. Wohlfarth and R. Ichise. Semantic and event-based approach for link predic- tion. In Proceedings of the 7th International Conference on Practical Aspects of Knowledge Management, 2008. 22. K. Yu, W. Chu, S. Yu, V. Tresp, and Z. Xu. Stochastic relational models for discriminative link prediction. In Proceedings of the Neural Information Processing Systems (NIPS), 2006.