=Paper= {{Paper |id=Vol-1398/SocInf2015_Paper4 |storemode=property |title=Influential Analysis in Micro Scholar Social Networks |pdfUrl=https://ceur-ws.org/Vol-1398/SocInf2015_Paper4.pdf |volume=Vol-1398 |dblpUrl=https://dblp.org/rec/conf/ijcai/WeigangDSL15 }} ==Influential Analysis in Micro Scholar Social Networks== https://ceur-ws.org/Vol-1398/SocInf2015_Paper4.pdf
     Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015)
     July 27th, 2015 - Buenos Aires, Argentina




                          Influential Analysis in Micro Scholar Social Networks

              Li Weigang1,2, Icaro Araújo Dantas1, Ahmed Abdelfattah Saleh2, Daniel L. Li3
              1 TransLab, Department of Computer Science, University of Brasilia, Brasilia, Brazil
              2 PPMEC, Department of Mechanical Engineering, University of Brasilia, Brasilia, Brazil
              3 Coleman Research, Raleigh, North Carolina-NC, USA
      weigang@unb.br, icaro.a.dantas@gmail.com, ahmdsalh@yahoo.com, daniel.lezhi.br@gmail.com

Abstract

Scholarly citation is a basic activity in the scientific community. Several academic search engines, such as Google Scholar and Microsoft Academic Search, have been developed on the Web. An efficient and flexible querying method is essential for researchers to follow trends within topics related to their research field. In this paper, we propose a procedure to construct Micro Scholar Social Networks (MSSN) from Google Scholar and then develop a querying and ranking method to find the influential researchers or articles in an MSSN. An extension to the Follow Model (Extended Follow Model) is proposed and applied to describe the paper-citation and author-follow relationships. It is also coupled with different ranking algorithms, namely PageRank, AuthorRank and InventorRank, to study an MSSN in Air Traffic Management. The case study shows that the Extended Follow Model is robust and efficient for ranking and mining a heterogeneous academic network. Although the study was done on Google Scholar, the proposed data mining method is applicable to other academic search engines.

1   Introduction

With the development of Internet technology and applications, there are at least 114 million English-language scholarly documents accessible on the Web [Khabsa and Giles, 2014]. The term “scholarly documents” here refers to journal and conference papers, books, dissertations and theses, technical reports and working papers. The number of scholarly documents accessible through the Web differs from one academic search engine to another; Google Scholar1, for example, comprises nearly 100 million scholarly documents and also offers advanced search for general consulting. In Google Scholar, the most cited paper is “A short history of SHELX” [Sheldrick, 2007] with 49,792 citations. The authors who cited this paper form a special society, or network. Understanding the relations in this society is valuable to researchers.

In this massive academic network, efficient querying models for academic search engines or databases are crucial for researchers who need to follow the development trends of a specific research topic in a particular scientific field.

There are two problems that should be studied in depth: 1) developing an efficient method and system (at the Web tool level) to construct a Micro Scholar Social Network (MSSN) for a specific topic or field from large scholar social networks, such as Google Scholar, Microsoft Academic Search, Web of Science or others; 2) developing efficient mining algorithms to analyze this MSSN for scholars' diverse objectives.

In the literature, some research has developed mining methods for heterogeneous information networks [Sun et al. 2012]. Ahmedi et al. focused on the study of the properties of co-authorship networks [Ahmedi et al., 2011].

In recent years, many researchers have proposed solutions to these problems. Liu et al. [2005] demonstrated AuthorRank. AMiner has been developed by [Tang et al., 2008] as a scholar platform with a database and search interface. W-entropy was proposed to measure the influence of the members of social networks [Weigang et al., 2011]. Sandes et al. [2012] introduced the concept of the Follow Model for the development of advanced queries on social networks. Du et al. [2015] demonstrated a way of analyzing the importance of nodes in heterogeneous networks.

In this paper, the Extended Follow Model (EFM), an extension to the Follow Model presented by [Sandes et al, 2012], is proposed. EFM is applied to describe the paper-citation and author-follow relationships. It is also coupled with different ranking algorithms, namely PageRank, AuthorRank and InventorRank, to study an MSSN in Air Traffic Management. The case study shows that the Extended Follow Model is robust and efficient for ranking and mining a heterogeneous academic network. Although the MSSN used in this study was constructed using Google Scholar, the study is applicable to other academic search engines as well.

1 http://scholar.google.com/
Copyright © 2015 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.








2   Micro Scholar Social Networks (MSSN)

This section introduces the concept of Micro Scholar Social Networks (MSSN) and explains its basic elements and relations using Google Scholar as an example. In an MSSN, authors and publications are considered the main objects. However, in addition to authors, there are also journal editors, book editors and conference chairs. As for publications, there are journal papers, conference papers, books, book chapters, reports and others. In this study, we use paper to refer to all different types of publications, and author for all contributors. The specification of other objects will be considered in our future data mining studies.

The term “Micro Scholar” refers to a specific research field, while the term “Social Network” refers to the network constructed from the relations between papers and authors of that field. Figure 1 shows the MSSN constructed from 165 papers and 249 authors of the “Air Traffic Management (ATM)” research field. As seen in figure 1, ATM-MSSN is constructed of directed graphs, whose vertices are papers (sub-graph a) or authors (sub-graph b), while the edges represent the relations among those elements.

Figure 1: ATM-MSSN of Google Scholar; a) 165 papers and their citations; b) Follow relations of 249 authors.

2.1 Elements and Relations in MSSN

The basic elements in an MSSN are papers, authors, venues and publishers. This paper focuses on the information related to papers and authors. Citation and co-authoring are the core relations between papers and authors. As such, an MSSN is constructed using the citation relations among papers and authors, in addition to the co-author relations among different authors. These relations among MSSN elements can be explained as follows:

Relations between papers
 • Citation relation: Citations of a paper p are those papers that were cited by p.
 • Cited In relation: Cited Ins of a paper p are those papers that cited p.
 • Both-cited relation: Two papers a and b are considered Both-Cited in the case that paper a cites b and paper b cites a.

Follow relation between authors
The citation relation can be extended to describe the relation among authors. When a paper cites another paper, its authors are in effect citing authors with prior studies relevant to their paper. As such, the follow relation, as per the Follow Model introduced by [Sandes et al, 2012], can be used to describe the citation relation between authors.
 • Followee (Citation relation): Citations of a paper p are those papers that were cited by p. The authors of these papers are followees of the authors of p.
 • Follower (Cited In relation): Cited Ins of a paper p are those papers that cited p. The authors of these papers are followers of the authors of paper p.
 • R-Friends (Both-cited relation): Two papers a and b are considered Both-Cited in the case that paper a cites b and paper b cites a. The authors of those two papers are R-Friends.
 • Self-Following (Self-Citation): An author has cited one of his own papers.

Co-author relation between authors
Another important relation among authors is that of co-authorship. Any particular paper has one or more authors, and the relationship among those authors can be referred to as co-author. An author may be a co-author of several authors in one or more papers. This paper presents a weighting formula to assign a representative weight to each author depending on the order of authorship in different papers.

2.2 Types of MSSN

Scientific papers are characterized by multiple attributes (e.g. authors, venue, publish time, editor of the journal, editor of the book, conference chairs, etc.) in addition to various relations among these attributes. As such, an MSSN is considered a kind of heterogeneous information network that contains multiple types of elements and links [Sun et al., 2012], [Kim and Leskovec, 2012].

According to the nature of the elements used to construct them, MSSNs can be divided into two types: i) Homogenous MSSN, with vertices (nodes) created using the same elements (i.e. papers or authors); and ii) Heterogeneous MSSN, where the network vertices include different classes of elements (i.e. papers and authors) with their subsequent relations.

2.2.1 Constructing Homogenous MSSN

Regarding the relations in an MSSN, there are citation relations (citation, cited in and both cited) between papers, as well as co-author relations and follow relations (followee, follower, r-friends) between authors. These relations form three sets of homogenous MSSNs.

A. ATM-MSSN-Papers
Figure 1(a) shows an MSSN of the Air Traffic Management research topic, which is represented by a graph whose vertices are papers and whose edges are the citation relations. The data reflected in this graph was collected from Google Scholar on January 26, 2015.

To create the graph, a citation relation matrix $P_c$ is introduced. $P_c$ is a square matrix of size (N × N), where N is the total number of papers in the ATM-MSSN. $pc_{ij}$ is an element








of the $P_c$ matrix, with i, j = 1, 2, …, N. As such, if a paper i cited a paper j then $pc_{ij} = 1$.

B. ATM-MSSN-Authors
Figure 1(b) shows a graph with follow relations among 249 authors. A close relation can be observed between the two subfigures: subfigure 1(b) is based on 1(a), the only difference being that one paper can be written by more than one author.

The author relation matrix $A$ is introduced to create this graph. $A$ is a square matrix of size (M × M), where M is the total number of authors in the ATM-MSSN. $a_{ij}$ is an element of the $A$ matrix, with i, j = 1, 2, …, M. As such, $a_{ij}$ represents the number of times an author i follows an author j.

C. ATM-MSSN-CoAuthors
Figure 2(a) shows the co-author relations among 249 authors. An author can be a co-author with more than one author.

The co-author relation matrix $A_c$ is introduced to create this graph. $A_c$ is a square matrix of size (M × M), where M is the total number of authors in the ATM-MSSN. $ac_{ij}$ is an element of the matrix $A_c$, with i, j = 1, 2, …, M. As such, $ac_{ij}$ represents the number of papers in which authors i and j are co-authors.

Figure 2: a) Homogenous ATM-MSSN-CoAuthors; b) Heterogeneous ATM-MSSN.

2.2.2 Constructing Heterogeneous MSSN

A heterogeneous MSSN is formed of multiple elements and/or relations. As such, a heterogeneous MSSN can be constructed using multiple elements (e.g. papers and authors) connected by a particular relationship, and/or multiple relations for the same element class (e.g. co-author and follow relations for authors).

A. ATM-MSSN-Author-Paper graph
Figure 2(b) shows ATM-MSSN, which is a heterogeneous MSSN constructed using two classes of elements, papers and authors. Author nodes are represented by red circles, while paper nodes are represented by blue triangles. The citation relation connects the paper nodes, while the follow relation connects the authors. The two classes of nodes are then connected together using a weighted co-authorship relation, where every author-paper edge has a weight that reflects the level of involvement (order) of this author in the authorship of that paper, as seen in equation 1. According to figure 2(b), a matrix $PA$ can be used to present the relations between authors and papers in the ATM-MSSN model. $PA$ is a matrix of size (M × N), where M is the total number of authors in the model and N is the total number of papers. The matrix element $pa_{ij}$ represents the weight of author i in writing the paper j. In other words, if a paper j has only one author i then $pa_{ij} = 1$. For a paper j written by more than one author, the value of $pa_{ij}$ depends on the order of the authors who wrote that paper [Du et al., 2015]; the equation is modified as

    [Equation (1): author-order weight formula, not recoverable from the extracted text]    (1)

B. ATM-MSSN-Author-CoAuthor graph
Another type of heterogeneous graph is one constructed using the same class of element (e.g. authors) and then connected by different types of relations (e.g. follow and co-author relations).

Therefore, constructing the MSSN graphs is the core step in building an efficient data-mining model that is used to perform complex analytical queries. As such, a suitable heterogeneous and/or homogenous graph representation is selected to achieve the intended goal of the mining study. These graphs could be extended to include various attributes (conferences, journals, publishers, etc.) in order to develop a data-mining model that is capable of analyzing the relations among these attributes.

3   Extended Follow Model and Querying

In this section we extend the Follow Model, introduced by [Sandes et al. 2012], as the best way to model MSSNs and perform effective queries. In addition, PageRank and other ranking methods can be coupled with the Extended Follow Model (EFM) to perform advanced queries.

3.1 Extended Follow Model (EFM)

An MSSN can be best described in the form of a directed graph $G = (V, E)$, where the vertex set $V$ represents the papers and/or authors, while the directed edges $E: V \times V$ represent the relations between them. For a heterogeneous MSSN, there are more types of relations; $E$ can be noted as $E_a$, $E_p$ or $E_a \cup E_p$. The author follow relation $(v, u) \in E_a$ means that author v follows author u, and the graph $G_a = (V_a, E_a)$ presents the author relationship. Similarly, $(a, b) \in E_p$ means that paper a cites paper b, and the graph $G_p = (V_p, E_p)$ presents the paper relationship.

As such, the Extended Follow Model (EFM) can efficiently describe the relations between the MSSN classes as mentioned in section 2.1. Two authors can be related as followee, follower or r-friends, while two papers are related as cited (followee), cited in (follower) or both-cited (r-friends).

Using these relations, one can construct data subsets for








big data querying. These data subsets can be extracted using the following functions:

    $f_{out}(u) = \{ v \mid (u, v) \in E_a \}$,    (2)

where $f_{out}(u)$ is the followee function that gives the subset $V_{out}$ of all followees $v$ of author $u$, with $V_a \to V_{out}$ and $V_a \subset V$; $|f_{out}(u)|$ is the number of elements (authors) in the followee subset of author $u$; $f_{out}^{p}(u) = \{ p(v) \mid (u, v) \in E_a \}$, where $p(v)$ is a value of the author $v$, such as the order in the subset or the h-index of the author; and $f_{out}^{w}(u) = \{ w(v) \mid (u, v) \in E_a \}$, where $w(v)$ is a weight value of the link between the authors $u$ and $v$, such as the number of citations.

    $f_{in}(u) = \{ v \mid (v, u) \in E_a \}$,    (3)

where $f_{in}(u)$ is the follower function that gives the subset $V_{in}$ of all followers $v$ of author $u$, with $V_a \to V_{in}$ and $V_a \subset V$; $|f_{in}(u)|$ is the number of elements (authors) in the follower subset of the author; $f_{in}^{p}(u) = \{ p(v) \mid (v, u) \in E_a \}$, where $p(v)$ is a value of the author $v$, such as the order in the subset or the h-index of the author; and $f_{in}^{w}(u) = \{ w(v) \mid (v, u) \in E_a \}$, where $w(v)$ is a weight value of the link between the authors $u$ and $v$, such as the number of citations.

    $f_r(u) = f_{out}(u) \cap f_{in}(u)$,    (4)

where $f_r(u)$ is the r-friend function that gives the subset $V_r$ of all r-friends of author $u$, with $V_a \to V_r$ and $V_a \subset V$; $|f_r(u)|$ is the number of elements (authors) in the r-friend subset of the author; $f_r^{p}(u) = \{ p(\cdot) \}$, where $p(\cdot)$ is a value of the r-friends of author $u$, such as the order in the subset or the h-index of the author; and $f_r^{w}(u) = \{ w(\cdot) \}$, where $w(\cdot)$ is a weight value of the link between the author $u$ and his r-friends, such as the number of co-authored papers.

With these basic definitions, EFM has both numeric $|f(\cdot)|$ and symbolic $f(\cdot)$ representations for more sophisticated relationships between users.

The Follow Model is also characterized by three properties: reverse relationship, compositionality, and extensibility [Sandes et al. 2012, Weigang et al., 2014]. Joining functions allow us to create many other relationship functions. For example: $f_{in} f_{out}(u)$ represents the followers of followees of $u$; $f_{in}^{2}(u)$ represents the followers of followers of $u$; $f_{out}^{2}(u)$ represents the followees of followees of $u$; $f_r^{2}(u)$ represents the r-friends of r-friends of $u$.

In this research, besides being applied as a querying method for MSSN-AUTHOR, EFM is also used in MSSN-PAPER through three functions: 1) $f_{out}(p)$ is a function that gives all papers which are cited by paper $p$; 2) $f_{in}(p)$ is a function that gives all the papers which cited the paper $p$; and 3) $f_r(p)$ is a function that gives the both-cited papers of $p$, which are papers that cited $p$ and were cited by $p$.

3.2 Querying Google Scholar using Follow Model

EFM can be applied for querying in an MSSN from Google Scholar or Microsoft Academic Search to satisfy the needs of users of these academic search engines. For example:

 • An author of a paper $p$ may be interested in the papers that cited his paper. In this case the Follow Model can be used: $P(p) = f_{in}(p)$, where $P(p)$ is the set of all papers that cited the paper $p$ (Cited Ins).
 • The same author may be interested in listing the authors who follow him; according to the Follow Model, $A(p) = f_{in}(p)$, where $A(p)$ is the set of authors that cite paper $p$ (i.e. followers of the author).
 • One of many other interesting queries in the same context is obtaining the list of papers cited by those papers that cited $p$, or in other words, the list of followees of the author's followers $A(p)$. The Follow Model can simplify this query using $A(p) = f_{out} f_{in}(p)$.
 • In addition, an author may be interested in the set of papers that cite a particular paper $p_i$ (cited by his paper as well) in addition to his paper $p$. As such, $P(p_i) = f_{in}(p) \cap f_{in}(p_i)$, where $P(p_i)$ is the set of all papers that cited his paper $p$ in addition to $p_i$.
 • Other users may be interested in the citations of paper $p$. Using the Follow Model, $P(p) = f_{out}(p)$, where $P(p)$ is the set of the papers that were cited by $p$.
 • Other queries include finding the set of top-x (x may be 5 or more) papers, in terms of number of Cited Ins, among the papers cited by $p$. The Follow Model can present this query as $P(p) = f_{in}^{top5} f_{out}(p)$, where $f_{out}(p)$ is a function containing the set of papers ($P_c$) that were cited by paper $p$, and $f_{in}^{top5}$ is a function generating the top 5 papers of the set $P_c$ that have the highest number of Cited Ins.
 • Also, finding the set of top-x papers, in terms of the influence of the paper or the authors, among the papers that were cited by the papers that cited $p$. The Follow Model can present this query as $P(p) = f_{out}^{top10} f_{in}(p)$, where $f_{in}(p)$ is a function containing the set of papers ($P_c$) that cited paper $p$, and $f_{out}^{top10}$ is a function generating the top 10 papers, among those cited by papers of the set $P_c$, that have the highest influence. Different ranking algorithms, explained in the following section, can be used to determine the influence of papers and authors.

4   Influential Scholar Ranking Models

Ranking algorithms can be used to find the influential scholars in an MSSN. This section presents three ranking methods: PageRank, AuthorRank and InventorRank. All these models are presented in the form of the Extended Follow Model.

4.1 PageRank and AuthorRank

PageRank [Brin and Page, 1998] can be presented in the form of EFM as follows:

    $PR(i) = (1 - d) + d \left( \frac{\sum f_{in}^{p}(i)}{|f_{out}(f_{in}(i))|} \right)$    (5)








where $i$ is an author, $f_{in}^{p}(i)$ is the set of the values of all authors who linked to (followed) $i$, and $\sum f_{in}^{p}(i)$ is the sum of the values in this set. $|f_{out}(f_{in}(i))|$ is the number of followees of the followers of $i$.

On the other hand, AuthorRank is an indicator of the impact of an individual author in the network [Liu et al. 2005]. This algorithm is considered an improvement of the PageRank algorithm, in which the weights represent the number of times an author was co-author with another. Using EFM, AuthorRank can be represented as follows:

    $AR(i) = (1 - d) + d \left[ \sum \left( \frac{f_{in}^{w}(i)}{\sum f_{out}^{w}(f_{in}(i))} \cdot f_{in}^{p}(i) \right) \right]$    (6)

where $f_{in}^{w}(i)$ represents the weights of the links from the followers of $i$, and $\sum f_{out}^{w}(f_{in}(i))$ represents the sum of the link weights of the followees of those followers.

Note that AuthorRank was not modified to use this technique due to the fact that this method is restricted to classification of a network that contains only authors.

5    ATM-MSSN Ranking Case Study

In this section, the ranking case study in ATM-MSSN is described in detail, to demonstrate how EFM can be coupled with three ranking algorithms (PageRank, AuthorRank and InventorRank) to achieve effective querying. It also demonstrates the application of SJR to obtain influential rankings.

In figure 2(b), the heterogeneous ATM-MSSN includes a total of 249 authors and 165 papers. Citation relationships between the papers (where an article cites others) and between the authors (an author cites / follows others) are illustrated. There is also another kind of relationship between authors and articles, which is defined as the co-authoring relationship.
4.2 InventorRank                                                                      For all these tests, the parameters of the models are defined

                                                                                   parameter d was set at 0.5. For the parameters, αOO , αOQ e αQO ,
In the study of the data model of the inventor-ranking                             by the following pattern: For PageRank and AuthorRank,
framework, [Du et al. 2015] demonstrates how to perform

                                                                                   I and I% were both set as 0.5. The classification values
analysis of important nodes in heterogeneous networks. EFM                         the values were 0.4, 0.4 and 0.2 respectively. The
is used to present one of the three rules described by [Du et al.
2015] for determining influential authors based on                                 were        all       obtained       from       SJR´s       site:
co-authorship. Where, highly ranked authors tend to                                http://www.scimagojr.com/. If there is no journal, it is
co-author with other highly ranked authors. The first rule of                      considered as null classification and does not influence the
InventorRank is determined using the following equation:                           ranking calculation.

 1 E  F GH46)' . ) . I 7  1 E1   I . |) E|JK 7
                            %
                                                                                   5.1 Ranking authors and papers without SJR
                                                                                   The first result obtained is related to the author´s ranking in
where Ri(k) is the rank of author k, fr(k) is the set of the all                   ATM-MSSN. The ranking results are diferent due to the
co-authors of k. See other rules in [Du et al. 2015].                              diferent characteristics of each model.
                                                                                      From Table 1, it is possible to observe that each profile at
4.3 Adjusting PageRank and Inventor Rank with                                      the top of the ranking reflects the characteristics that most
     SJR                                                                           affect the model.
Based on the fact that articles are usually published in some                               Table 1. Author´s Ranking for ATM-MSSN
events or journals, González-Pereira [2010] proposes a way
to classify the influence of a journal, based on the weight of                     Ranking     AuthorRank        InventorRank         PageRank
citations and eigenvector centrality, in heterogeneous net-                           1         J F Butler          E Ferons          A R Odoni
works, this model is called SCImago Journal Rank (SJR).
                                                                                       2       H N Psaraftis         L Kang           D Trivizas
Siebelt et al. [2010] e Macedo et al. [2010] are examples of
some researchers that suggest using SJR to calculate the                               3          G Roger           J P Clarke       H N Psaraftis
importance of an article. It is suggested that, like Du et al.                         4            Dear           B Delcairet         E P Gilbo
[2015] used the classification of journals for classifying
authors, it is possible to use periodicals classification as a                         5       B G Sokkappa         W D Hall             Dear
mean for calculating journals or authors importance. As such                           6          L Gippo            H Idris            G Roger
this paper proposes adding journals weights in the previously                          7           M Cini           R Bhuva          D J Bertsimas
defined classification algorithm. Thus, modified equations
can be as follows:                                                                     8       S S Patterson        AR Odoni          A Hormann
                                                                                       9         C F Dayl          R Hoffman         S S Patterson
PageRank:
            SR(i) = PR(i) * SCImago_Rank                              (8)              10       G Andreatta       D J Bertsimas     B G Sokkappa

          1% M = 1% M * SCImago_Rank
InventorRank:
                                                                       (9)           Author A R Odoni appears first according to PageRank
                                                                                   because his papers have the most citations in ATM-MSSN, at








a total of 13 citations (followees). E Ferons, best ranked by the
InventorRank model, co-authored with A R Odoni, B Delcairet, H Idris,
J P Clarke and W D Hall, all well ranked authors in ATM-MSSN. This shows
that although A R Odoni received many citations, a total of 89 (followers),
it was not significant enough to affect his ranking because those that work
with him are not the best ranked in the network. In the case of AuthorRank,
J F Butler received top ranking because his papers received most of the
citations from A R Odoni, G Andreatta and B G Sokkappa.
   Table 2 lists the top 5 papers by PageRank and InventorRank. For
InventorRank, the paper's ranking is affected mostly by the authors'
influence, and vice versa. This correlation is not observed in PageRank.

 Table 2. Top 5 papers by PageRank and InventorRank

5.2 Ranking authors and papers considering SJR
When analyzing the best papers ranked by InventorRank, it is observed that
the position of the authors' influence dictates the paper's ranking.
PageRank does not yield similar results and relations, because it is not
well structured to evaluate heterogeneous networks. The tests below analyze
how these models operate when adding new characteristics to the network.
   From Table 3, it is possible to observe that the InventorRank differed
in the ranking of a few others, which indicates that its results obtained in
Table 1 may not be representative. Table 3 shows that the weight of SJR for
the journals where E Ferons' articles were published did not contribute to
his rankings. Instead, A R Odoni, M O Ball and others are well ranked.
   Comparing Table 2 to Table 4, we observe that PageRank is altered
significantly when considering SJR. This was a result of either the quality
of the journals where the articles were published, or in some cases where
the articles were not even published as they were simply graduate
dissertations or internal reports within the institution.

    Table 3. Top 10 Authors Ranking Considering SJR

Ranking   Name
   1      A R Odoni
   2      M O Ball
   3      W D Hall
   4      D J Bertsimas
   5      E Ferons
   6      R Hoffman
   7      G Lulli
   8      H Idris
   9      J P Clarke
  10      G L Nemhauser

   On the other hand, the InventorRank demonstrates its robustness even with
the addition of different features to the network. Its ranking of papers
remained consistent and very similar to the first classification in Table 2,
where SJR was not taken into consideration.

    Table 4. Top 5 Papers Considering SJR in Ranking

6    Conclusion
This paper presents the construction of Micro Scholar Social Networks (MSSN)
for a specific research topic using the Google Scholar academic search
engine. The Extended Follow Model (EFM) was proposed as a comprehensive way
of creating an efficient data-mining model for querying homogeneous and
heterogeneous MSSNs. By means of these advantages, EFM was coupled with
ranking algorithms to achieve full querying and ranking models for scholarly
documents of Google Scholar.
   Comparing the results of such algorithms shows that InventorRank is a
much more robust and accurate model, especially when considering the amount
of information used for classification and ranking. InventorRank can be
easily adapted to adding new features to the network, in addition to its
flexibility resulting from the ability to set the








degree of importance of each term of the algorithm. On the other hand,
changes in the network directly affect the classification ability of the
PageRank algorithm.
   It is worth mentioning that the Extended Follow Model provides a simple
and efficient means for representing several existing ranking models. Such
representation facilitates the coding routine for developers, as complex
equations are represented as algorithms that are easy to understand and to
code.
   This study also showed that the ranking system can be further modified to
consider the level of influence of the journal where the paper is published.
However, such modification requires caution, as models such as PageRank are
very sensitive to alterations and may incorrectly classify articles of high
quality.
   Thanks to the mentioned authors and their papers in ATM-MSSN from Google
Scholar.

References
[Brin and Page, 1998] Sergey Brin and Lawrence Page. The anatomy of a
   large-scale hypertextual Web search engine. Computer Networks and ISDN
   Systems, 30(1): 107-117, 1998.
[Ahmedi et al., 2011] Lule Ahmedi, Lejla Abazi-Bexheti, Arbana Kadriu. A
   Uniform Semantic Web Framework for Co-authorship Networks. In Proceedings
   of IEEE Ninth International Conference on Dependable, Autonomic and
   Secure Computing, pages 958-965, Sydney, Australia, 2011. IEEE.
[Du et al., 2015] Yong-ping Du, Chang-qing Yao, Nan Li. Using Heterogeneous
   Patent Network Features to Rank and Discover Influential Inventors. To
   appear in Frontiers of Information Technology & Electronic Engineering,
   2015. doi:10.1631/FITEE.1400394
[González-Pereira et al., 2010] Borja González-Pereira, Vicente P.
   Guerrero-Bote, and Félix Moya-Anegón. A new approach to the metric of
   journals' scientific prestige: The SJR indicator. Journal of
   Informetrics, 4(3): 379-391, 2010.
[Hirsch, 2005] Jorge E. Hirsch. An index to quantify an individual's
   scientific research output. Proceedings of the National Academy of
   Sciences of the United States of America, 102(46): 16569-16572, 2005.
[Khabsa and Giles, 2014] Madian Khabsa and C. Lee Giles. The Number of
   Scholarly Documents on the Public Web. PloS one, 9(5): e93949, 2014.
[Kim and Leskovec, 2012] Myunghwan Kim and Jure Leskovec. Multiplicative
   Attribute Graph Model of Real-World Networks. Internet Mathematics,
   8(1-2): 113-160, 2012.
[Liu et al., 2005] Xiaoming Liu, Johan Bollen, Michael L. Nelson, Herbert
   Van de Sompel. Co-authorship networks in the digital library research
   community. Information Processing & Management, 41(6): 1462-1480, 2005.
[Macedo et al., 2010] Luciana G. Macedo, Mark R. Elkins, Christopher Maher,
   Anne M. Moseley, Robert D. Herbert, Catherine Sherrington. There was
   evidence of convergent and construct validity of Physiotherapy Evidence
   Database quality scale for physiotherapy trials. Journal of Clinical
   Epidemiology, 63(8): 920-925, 2010.
[Sandes et al., 2012] Edans F. Sandes, Li Weigang and Alba C. de Melo.
   Logical Model of Relationship for Online Social Networks and Performance
   Optimizing of Queries. In Proceedings of Web Information Systems
   Engineering, pages 726-736, Paphos, Cyprus, November 2012. Springer
   Berlin Heidelberg.
[Sheldrick, 2007] George M. Sheldrick. A short history of SHELX. Acta
   Crystallographica Section A: Foundations of Crystallography, 64(1):
   112-122, 2007.
[Siebelt et al., 2010] Michiel Siebelt, Teun Siebelt, Peter Pilot, Rolf
   Bloem, M. Bhandari and Rudolf Poolman. Citation analysis of orthopaedic
   literature; 18 major orthopaedic journals compared for Impact Factor and
   SCImago. BMC Musculoskeletal Disorders, 11(1), 4, 2010.
[Sun et al., 2012] Yizhou Sun, Jiawei Han, Charu C. Aggarwal, Nitesh V.
   Chawla. When will it happen?: relationship prediction in heterogeneous
   information networks. In Proceedings of the fifth ACM international
   conference on Web search and data mining, pages 663-672, New York, NY,
   USA, 2012. ACM.
[Tang et al., 2008] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang,
   Zhong Su. Arnetminer: extraction and mining of academic social networks.
   In Proceedings of the 14th ACM SIGKDD international conference on
   Knowledge discovery and data mining, pages 990-998, August 2008. ACM.
[Weigang et al., 2014] Li Weigang, Edans F. Sandes, Jianya Zheng, Alba C. de
   Melo, and Lorna Uden. Querying dynamic communities in online social
   networks. Journal of Zhejiang University - Science C, 15(2): 81-90, 2014.
[Weigang et al., 2011] Li Weigang, Jianya Zheng, and Daniel Li. W-entropy
   index: the impact of members on social networks. In Proceedings of the
   Web Information Systems and Mining Conference, LNCS 6987 (I), pages
   226-233, Taiyuan, China, 2011. Springer Berlin Heidelberg.
[Yu and Van de Sompel, 1965] Pingkang Yu and H. Van de Sompel. Networks of
   scientific papers. Science, 169: 510-515, 1965.
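
As a worked illustration of Section 4, the PageRank recurrence over follow sets and the SJR adjustment of Equation (8) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `pagerank_efm` and `apply_sjr`, the toy follow graph, the SJR values and the iteration count are all hypothetical, and treating a missing journal as a neutral factor of 1.0 is one possible reading of a "null classification" that does not influence the ranking. The value d = 0.5 follows Section 5.

```python
def pagerank_efm(follows, d=0.5, iterations=50):
    """Iterate PR(i) = (1 - d) + d * sum(PR(j) / |f_out(j)|) over followers j of i.

    `follows` maps each node to the set of nodes it follows (cites).
    """
    nodes = set(follows) | {v for targets in follows.values() for v in targets}
    # f_out(n): nodes that n follows; f_in(n): nodes that follow n.
    f_out = {n: set(follows.get(n, ())) for n in nodes}
    f_in = {n: set() for n in nodes}
    for src, targets in f_out.items():
        for dst in targets:
            f_in[dst].add(src)

    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Every j in f_in[i] follows at least i, so len(f_out[j]) >= 1.
        rank = {
            i: (1 - d) + d * sum(rank[j] / len(f_out[j]) for j in f_in[i])
            for i in nodes
        }
    return rank


def apply_sjr(rank, sjr):
    """Scale each score by a journal weight, in the spirit of Equation (8).

    Assumption: nodes without an SJR value keep their score (factor 1.0).
    """
    return {n: score * sjr.get(n, 1.0) for n, score in rank.items()}


# Hypothetical toy network: A and B follow C, and C follows A.
ranks = pagerank_efm({"A": {"C"}, "B": {"C"}, "C": {"A"}})
# Hypothetical journal weights; B has no journal.
adjusted = apply_sjr(ranks, {"A": 2.0, "C": 0.5})
```

In this toy graph C is ranked first before the adjustment, while the hypothetical journal weights move A to the top, which mirrors the sensitivity of PageRank to SJR noted in Section 5.2.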




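
The co-authorship weighting that Section 4.1 attributes to AuthorRank can be illustrated in the same way. The sketch below is a simplification and an assumption, not the formulation of [Liu et al. 2005]: it normalizes raw co-authorship counts per author and uses them as edge weights in a PageRank-style recurrence. The function name `author_rank`, the toy counts and the iteration count are hypothetical.

```python
from collections import defaultdict


def author_rank(coauthored, d=0.5, iterations=60):
    """Weighted PageRank-style score over an undirected co-authorship graph.

    `coauthored` maps a pair of authors to the number of papers they wrote
    together. Each author j passes a share of its score to co-author i in
    proportion to counts[j][i] / (total co-authorship count of j).
    """
    counts = defaultdict(dict)
    for (a, b), n in coauthored.items():
        # Co-authorship is symmetric, so record the count in both directions.
        counts[a][b] = counts[a].get(b, 0) + n
        counts[b][a] = counts[b].get(a, 0) + n
    totals = {a: sum(nbrs.values()) for a, nbrs in counts.items()}

    rank = {a: 1.0 for a in counts}
    for _ in range(iterations):
        rank = {
            i: (1 - d)
            + d * sum(rank[j] * counts[j][i] / totals[j] for j in counts[i])
            for i in counts
        }
    return rank


# Hypothetical data: A wrote 3 papers with B and 1 paper with C.
scores = author_rank({("A", "B"): 3, ("A", "C"): 1})
```

With these hypothetical counts, A receives the highest score, and B outranks C because A shares more papers with B.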