=Paper=
{{Paper
|id=Vol-2591/paper-03
|storemode=property
|title=New Datasets and a Benchmark of Document Network Embedding Methods for Scientific
                        Expert Finding
|pdfUrl=https://ceur-ws.org/Vol-2591/paper-03.pdf
|volume=Vol-2591
|authors=Robin Brochier,Antoine Gourru,Adrien Guille,Julien Velcin
|dblpUrl=https://dblp.org/rec/conf/birws/BrochierGGV20
}}
==New Datasets and a Benchmark of Document Network Embedding Methods for Scientific
                        Expert Finding==
<pdf width="1500px">https://ceur-ws.org/Vol-2591/paper-03.pdf</pdf>
<pre>
                                                BIR 2020 Workshop on Bibliometric-enhanced Information Retrieval


 New Datasets and a Benchmark of Document Network
   Embedding Methods for Scientific Expert Finding

          Robin Brochier, Antoine Gourru, Adrien Guille, and Julien Velcin

                               Université de Lyon, Lyon 2
                                    ERIC EA3083
                      {firstname}.{lastname}@univ-lyon2.fr


        Abstract. The scientific literature is growing faster than ever. Finding an expert
        in a particular scientific domain has never been as hard as today because of the
        increasing amount of publications and because of the ever growing diversity of
        expertise fields. To tackle this challenge, automatic expert finding algorithms rely
        on the vast scientific heterogeneous network to match textual queries with poten-
        tial expert candidates. In this direction, document network embedding methods
        seem to be an ideal choice for building representations of the scientific literature.
        Citation and authorship links contain major complementary information to the
        textual content of the publications. In this paper, we propose a benchmark for ex-
        pert finding in document networks by leveraging data extracted from a scientific
        citation network and three scientific question & answer websites. We compare the
        performances of several algorithms on these different sources of data and further
        study the applicability of embedding methods on an expert finding task.


1     Introduction

Many tools offer to search and filter the vast data sources available on the Web. In par-
ticular, there is a multitude of platforms directed to the scientific community. From the
simple search engine for publications to the social network for researchers, all consume
and produce valuable data for searching scientific content of interest. Expert finding
is one of the most challenging problems, with applications in both academia and
industry. To tackle this challenge, recent advances in document network embedding
(DNE) have the potential to inspire new unsupervised models that can deal with the
heterogeneous network of documents of the scientific literature. However, the design
of such efficient algorithms heavily depends on the development of strong evaluation
frameworks.
    In this paper, we propose a methodology and provide 4 datasets that extend the lim-
ited scope of expertise retrieval evaluation frameworks. Furthermore, we provide exper-
iment results computed with unsupervised methods and we extend document network
embedding algorithms to this specific task.
    Our contributions are the following:

    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
    License Attribution 4.0 International (CC BY 4.0). BIR 2020, 14 April 2020, Lisbon, Portugal.



    – we provide 4 datasets for expert finding extracted from a scientific publication net-
      work and three question & answer (Q&A) websites and make them publicly avail-
      able 1 ;
    – we describe an evaluation methodology based on the ranking of expert candidates
      given a set of labeled document queries;
    – we report experiment results that give some insights on this expert finding task;
    – we explore and analyze the use of state-of-the-art document network embedding
      algorithms for expert finding and we show that further research is needed to bridge
      the gap between DNE methods and expert finding.

    The rest of the paper is organized as follows. In Section 2, we survey related works.
We detail in Section 3 our evaluation methodology, the datasets we extracted, the evalu-
ation measures and the algorithms we use. In Section 4 we show and analyze the results
of our experiments. Finally, in Section 5, we discuss our findings and provide future
directions.


2     Related Works
In this section, we first present a formal definition for expert finding. Then we present
algorithms of the literature that address expert finding. Finally, we describe recent meth-
ods for document network embedding that have the potential to deal with this particular
task.

2.1    Formal definition of expert finding
The concept of expert finding can cover a large range of tasks. The main principle be-
hind expertise retrieval is the search for candidates given a query. To match these two,
an algorithm will be provided with some data to link the output space, a ranking of can-
didates, with the input space, which is often a textual content. However, many different
types of data can be considered to address this challenge. To fairly compare algorithms,
we choose a fixed structure for the data which reflects common use cases. Furthermore,
although supervised methods benefit from labeled fields of expertise associated with the
candidates, they are beyond the scope of this paper, which focuses on unsupervised methods
only. Our goal is to compare methods that do not require annotations, which can be costly.
     Early works in expert search [8] usually consider a small set of topical queries. The
direct namings of these topics are used to retrieve a list of candidates by leveraging
a collection of documents they published (e.g., emails, scientific papers). This type of
evaluation is used across several public datasets [15, 16, 26].
     More recently, the concept of expert finding has been merged into the wider con-
cept of entity retrieval [1]. As more and more complex data are produced on the Web,
expert finding becomes a particular application of entity search. At the same time, Q&A
websites such as Stack Overflow 2 generate and make publicly accessible a big amount
of questions with expert answers, collaboratively curated by their users. Several works
 1
     https://github.com/brochier/expert_finding
 2
     https://stackoverflow.com/



address the search for experts in such websites [20,28]. Often, the task consists in either
finding the exact list of users who answered a specific question or ranking the answers
according to the user votes. In the first case, the task involves considering the evolution
of the users across time and, in the second case, the task involves understanding the
intrinsic quality of a written answer. Nevertheless, [25] reviews several models for ex-
pert finding in Q&A websites. Their experiments show that matrix factorization-based
methods perform better than tree based and ranking based methods.
     In this paper, we adopt the document-query methodology recently proposed in [3].
The expert search is performed given a set of queries that are particular textual instances
of some expert topics (or fields of expertise). Given a query, an algorithm should rank
first the candidates that are associated to the same fields of expertise. We provide 4
datasets for which we annotated experts and document queries. Each dataset consists
of candidates and documents linked by authorship relations (candidate-document)
and by response relations (document-document, e.g., citation or answer). A
query is therefore one of the documents (e.g. a scientific paper or a question) for which
we aim to retrieve some experts of the topics depicted in it. This configuration reflects
many real case scenarios such as (1) the automatic search for scientific reviewers, (2) the
recommendation of expert users in Q&A websites or even (3) the retrieval of interesting
profiles for job offers.


2.2   Algorithms for expert finding

Numerous works have addressed automatic expertise retrieval. We describe here the
main approaches and some interesting recent methods. P@noptic Expert [7] creates
meta-documents for a candidate by concatenating the contents of all documents she
produced. In this manner, ranking the candidates given a query becomes a similarity
search between the query representation and the meta-documents representations. A
voting model [14] computes the similarities between the query and the documents. The
algorithm then aggregates these scores at the candidate level by using a fusion tech-
nique such as the reciprocal rank [27]. A propagation model [21] takes advantage of the
links between candidates and documents to propagate the similarities between the query
and the documents. Using random walks with restart [17], the iterative propagation of
the scores converges in a few steps to a stationary distribution over the candidates.
WISER [6] models each candidate as a small, weighted, sub-graph of the Wikipedia
Knowledge Graph. Information derived from these graphs and traditional document
retrieval techniques are combined to identify experts w.r.t a query. Note that methods
leveraging external data are out of the scope of our benchmark. LT Expertfinder is an
evaluation framework for expert finding [11] based on an interactive tool. It integrates
various existing algorithms (such as [1]) in a user-friendly way. The underlying corpus
used by this tool is the ACL Anthology Network. However, it does not include a well-
established ground truth to assess who are the experts. Indeed, the evaluation is purely
done in an online manner since the user has to evaluate the degree of expertise based on
several features, such as the author’s citations, h-index, keywords, etc. Recent works [9,23]
propose ad hoc embedding techniques, whereas in this work we are interested in
measuring the performance of conventional network embedding techniques.



2.3   Document network embedding

Network embedding [12, 19] provides an efficient approach to represent nodes in a low
dimensional vector space, suitable for solving various machine learning tasks. Recent
techniques extend NE for document networks. Text-Associated DeepWalk (TADW)
[24] extends DeepWalk to deal with textual attributes. Yang et al. prove, following the
work in [13], that Skip-Gram with hierarchical softmax can be equivalently formulated
as a matrix factorization problem. TADW then consists in constraining the factoriza-
tion problem with a pre-computed representation of the documents by using Latent
Semantic Analysis (LSA) [10]. Graph2Gauss (G2G) [2] is an approach that embeds
each node as a Gaussian distribution instead of a vector. The algorithm is trained by
passing node attributes through a non-linear transformation via a deep neural network
(encoder). GVNR-t [4] is a matrix factorization approach for document network em-
bedding, inspired by GloVe [18], that simultaneously learns word, node and document
representations by optimizing a least-square objective over a co-occurrence matrix of
the nodes constructed by truncated random walks. IDNE [5] introduces a Topic-Word
Attention mechanism, trained from the connections of a document network, to represent
documents as mixtures of topics.
    DNE algorithms do not directly apply to expert finding data since they are not de-
signed to handle multiple types of nodes, in particular candidate nodes. In this paper,
we show (1) two methods to extend their applicability to the task of expert finding and
(2) the impact of their representations when they are used as document representations
for traditional expert finding algorithms.


3     Evaluation Methodology

We present in this section the evaluation methodology that we follow to assess the
performance of several algorithms for expert finding. We first describe the task we seek
to solve, then we describe the datasets that we extracted and explain how we annotated
them in order to assess the quality of the algorithms’ outputs. Finally, we detail the
models used in our experiments.


3.1   Ranking expert candidates from document queries

Expert finding is a complex task that can be formalized in multiple ways. Early works
define this task as a ranking problem given several topic-queries, where the names of
these topics are directly used as queries to retrieve the expert candidates. However, in
many real world applications, a user is asked to provide a specific and detailed query.
In a Q&A website for instance, a user usually exposes the problem she faces in full
detail and does not necessarily know the exact naming of the fields of expertise needed
to solve her problem. Furthermore, querying an algorithm with a small set of topic-
queries can lead to poor evaluation measures due to the usually small number of fields
of expertise associated with the dataset. For these reasons, we follow the document-query
evaluation methodology proposed in [3] by processing 4 datasets for which a set of
document-queries is manually annotated.



    The expert finding task in this paper is a ranking problem. Given a document labeled
with a ground truth set of fields of expertise, an algorithm is queried to rank a set
of candidates, among which a subset of experts are associated with the same set of
labels. The data provided to the algorithms consists of a corpus $D$ of $n_d$ documents,
a set $C$ of $n_c$ candidates, a network of authorship with adjacency matrix
$A_{dc} \in \mathbb{N}^{n_d \times n_c}$ and a network of documents with adjacency matrix
$A_{dd} \in \mathbb{N}^{n_d \times n_d}$. Figure 1 shows a hypothetical dataset of the kind
used in this paper. The ranking is performed in an unsupervised
setting, that is, no ground truth labels of expertise are given to the algorithms. The
number of labeled documents (the queries) can be smaller than $n_d$ and the number of labeled
candidates (experts) can be smaller than $n_c$ (i.e., not all documents and candidates are
labeled).


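[Figure 1: diagram with a row of expertise labels, a row of candidates C1 to C5, the
authorship links $A_{dc}$, a row of documents D1 to D6, and the document links $A_{dd}$.]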


Fig. 1: Hypothetical example of an expert finding dataset of the kind we use in this paper.
5 candidates are authors of 6 documents. The 6 documents are connected to each other by
citation in a scientific corpus, or by answers within the same post in a Q&A website. Among
the candidates, 3 are known to be experts in stars and/or in circles. 4 documents are
associated with these 2 fields of expertise as well. In our evaluation methodology, we query
an algorithm with these 4 documents and expect a ranking of candidates that will match
each document’s fields of expertise. As an example, a perfect algorithm might generate
the rankings $D_1 \mapsto C_3\, C_4\, C_5\, C_1\, C_2$ and $D_6 \mapsto C_4\, C_5\, C_3\, C_2\, C_1$.


    To evaluate the candidate scores provided by the algorithms, we compare the result-
ing rankings with the ground truth fields of expertise. If a document is associated with
three different labels, we expect the algorithm to rank first all experts associated to at
least one of these labels. We report the area under the ROC curve (AUC), the precision
at 10 (P@10) and the average precision (AP), and we compute their standard deviation
across the queries. That is, we evaluate the robustness of the algorithms against the
variety of document-queries.
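
The three measures named above can be computed per query and then summarised across
queries. The following is a minimal Python sketch, not the authors' evaluation code: the
function names, the toy scores and the use of the population standard deviation are
assumptions made for illustration, and every query is assumed to have at least one
expert and one non-expert among the candidates.

```python
from statistics import mean, pstdev

def auc(scores, relevant):
    """Probability that a random expert outranks a random non-expert
    (ties count for one half), which equals the area under the ROC curve."""
    pos = [s for s, r in zip(scores, relevant) if r]
    neg = [s for s, r in zip(scores, relevant) if not r]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at_k(scores, relevant, k=10):
    """Fraction of experts among the k best-scored candidates."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum(relevant[i] for i in order[:k]) / k

def average_precision(scores, relevant):
    """Mean of the precision values measured at the rank of each expert."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            total += hits / rank
    return total / hits

# Two toy queries: candidate scores and ground-truth expert flags (made up).
queries = [
    ([0.9, 0.8, 0.3, 0.7, 0.1], [True, False, False, True, False]),
    ([0.2, 0.6, 0.4], [False, True, False]),
]
aps = [average_precision(s, r) for s, r in queries]
ap_mean, ap_std = mean(aps), pstdev(aps)  # summary across queries
```

Reporting the standard deviation next to the mean, as done here for AP, is what allows
robustness across document-queries to be compared between algorithms.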


3.2   Datasets

We consider 4 datasets. The first one is an extract of DBLP [22] in which a list of 199
experts in 7 fields is annotated [26] by human judgment 3 . Our dataset only considers
the annotated experts and the other candidates that are close in the co-authorship net-
work which explains the relatively small size of our network compared to the original
one. In addition to the expert annotations, our evaluation framework requires document
annotations since we adopt the document-query methodology for expertise retrieval. We
asked two PhD students in computer science to independently label 20 randomly
drawn documents per field of expertise (140 in total). Then, only the labels on which
the two annotators agreed were kept, leaving 114 annotated papers. The mean Cohen’s
kappa coefficient across the labels is 0.718. An advantage of our methodology is that
we can evaluate the algorithms on more queries (114 documents) than the traditional
method (7 labels). This allows us to assess the robustness of the algorithms by
computing the standard deviations of the ranking metrics across all queries. However, one
might object that these 7 labels do not form a representative set of expertise as they
are too broad. For this reason, we seek a finer granularity of expertise by using
well-known question & answer websites.
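
For readers unfamiliar with the agreement measure reported above, Cohen's kappa compares
the observed agreement between two annotators with the agreement expected by chance. The
sketch below is illustrative only: the function name and the two annotation vectors are
hypothetical, and it assumes that kappa is computed per label on binary decisions and then
averaged, which is one plausible reading of "mean across the labels".

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two annotators giving one categorical label per item."""
    n = len(a)
    categories = set(a) | set(b)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical decisions for one label ("does this paper belong to the field?").
annotator_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 8 of 10 items while chance agreement is 0.5, so
kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.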
    While scientific publication networks are easy to find on the Web, scientific expertise
annotations are rarely available for both authors and publications. We use data
downloaded in June 2019 from Stack Exchange 4 to create expert finding datasets
collected from three communities closely related to research. Academia 5 is dedicated to
academics and higher education. Mathoverflow 6 gathers professional mathematicians
and is widely used by researchers. Stats 7 (also known as Cross Validated) addresses
statistics, machine learning and data mining issues. For each dataset, we first keep
questions with at least 10 user votes that have at least one answer with 10 user votes or more.
We build the networks by linking questions with their answers and by linking answers
with the users who published them. The fields of expertise are the tags associated with
the questions. Only the tags that occur at least 50 times are kept. We annotate an expert
with the tags of a question if her answer to that question received at least 10 votes. Note
that the tags are first provided by the users who ask the questions but they are thereafter
verified by experienced users.
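
The filtering and annotation rules described above can be sketched as follows. This is a
hedged illustration rather than the authors' preprocessing script: the dictionary layout
of a question (`id`, `votes`, `tags`, `answers`), the function name and the toy records are
all hypothetical, and the sketch assumes that tag frequencies are counted over the retained
questions, which the text does not specify.

```python
from collections import Counter

MIN_VOTES = 10      # vote threshold stated in the text
MIN_TAG_COUNT = 50  # tag frequency threshold stated in the text

def build_annotations(questions, min_votes=MIN_VOTES, min_tag_count=MIN_TAG_COUNT):
    # Keep questions with enough votes that have at least one well-voted answer.
    kept = [q for q in questions
            if q["votes"] >= min_votes
            and any(a["votes"] >= min_votes for a in q["answers"])]
    # Fields of expertise: tags that are frequent enough (assumed: among kept questions).
    tag_counts = Counter(t for q in kept for t in q["tags"])
    labels = {t for t, c in tag_counts.items() if c >= min_tag_count}
    query_labels, expert_labels = {}, {}
    for q in kept:
        tags = labels.intersection(q["tags"])
        if not tags:
            continue
        query_labels[q["id"]] = tags
        # A user becomes an expert for these tags if her answer is well voted.
        for a in q["answers"]:
            if a["votes"] >= min_votes:
                expert_labels.setdefault(a["user"], set()).update(tags)
    return labels, query_labels, expert_labels

# Toy records; the tag threshold is lowered to 1 so the example yields labels.
questions = [
    {"id": "q1", "votes": 12, "tags": ["bayesian"],
     "answers": [{"user": "u1", "votes": 15}, {"user": "u2", "votes": 3}]},
    {"id": "q2", "votes": 4, "tags": ["bayesian"],
     "answers": [{"user": "u2", "votes": 20}]},
]
labels, query_labels, expert_labels = build_annotations(questions, min_tag_count=1)
```

Here only `q1` passes the question filter, so `u1` is the single annotated expert and `u2`
receives no label despite a highly voted answer to a discarded question.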
    The general properties of our 4 datasets are presented in Table 1. The annotations
and the preprocessed datasets are made publicly available.

 3
   https://lfs.aminer.cn/lab-datasets/expertfinding/#expert-list
 4
   https://archive.org/details/stackexchange
 5
   https://academia.stackexchange.com/
 6
   https://mathoverflow.net/
 7
   https://stats.stackexchange.com/




                      Table 1: General properties of the datasets.
             # candidates # documents # labels # experts # queries   label example
DBLP              707         1641        7       199       114 ’information extraction’
Stats            5765        14834       59      5765      3966 ’maximum-likelihood’
Academia         6030        20799       55      6030      4214 ’recommendation-letter’
Mathoverflow     7382        38532       98      7382     10614 ’galois-representations’


3.3   Algorithms
We run the experiments with 4 baseline algorithms and 4 document network embedding
algorithms. The latter are adapted with two aggregation schemes in order to deal with
the candidates, since they are primarily designed for document networks only. These
aggregations are arbitrary and are deliberately the most straightforward way to run DNE
algorithms on bipartite author-document networks. We further discuss these choices
in Section 4.

Baselines We run the experiments with the same models as in [3], using the tf-idf
representations and the cosine similarity measure. Also, we add a random model to
have reference metrics:

 – Random model: we randomly draw scores between 0 and 1 for each candidate;
 – P@noptic model [7]: we concatenate the textual content of each document as-
   sociated to the candidates, use their tf-idf representations and compute the cosine
   similarity to produce the scores;
 – Voting model [14]: we use the reciprocal rank to aggregate the scores at the candi-
   date level;
 – Propagation model [21]: we concatenate the two adjacency matrices $A_{dc}$ and
   $A_{dd}$ to construct a transition matrix between candidates and documents such that
   $A = \begin{pmatrix} A_{dd} & A_{dc} \\ A_{dc}^\top & 0 \end{pmatrix}$.
   The initial scores are the cosine similarities between the tf-idf
   representations of the query and the documents. The scores are propagated
   iteratively until convergence with a restart probability of 0.5.
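
The voting and propagation baselines listed above can be sketched with NumPy as follows,
starting from query-document similarities that are assumed to be already computed. This is
an illustrative reading, not the implementation used in the experiments: the function
names, the column normalisation of the block matrix, the convergence tolerance and the toy
matrices are assumptions. Only the block structure of A, the reciprocal-rank fusion and the
restart probability of 0.5 come from the text.

```python
import numpy as np

def voting_scores(doc_sims, A_dc):
    """Reciprocal-rank fusion: each document votes 1/rank for its authors."""
    order = np.argsort(-doc_sims)                 # best document first
    ranks = np.empty(len(doc_sims))
    ranks[order] = np.arange(1, len(doc_sims) + 1)
    return A_dc.T @ (1.0 / ranks)

def propagation_scores(doc_sims, A_dd, A_dc, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart over the joint document-candidate graph."""
    n_d, n_c = A_dc.shape
    A = np.block([[A_dd, A_dc], [A_dc.T, np.zeros((n_c, n_c))]]).astype(float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0                 # avoid dividing by zero for isolated nodes
    P = A / col_sums                              # column normalisation (assumption)
    start = np.concatenate([doc_sims, np.zeros(n_c)])
    start = start / start.sum()                   # assumes at least one non-zero similarity
    x = start.copy()
    for _ in range(max_iter):
        x_next = (1 - restart) * (P @ x) + restart * start
        converged = np.abs(x_next - x).sum() < tol
        x = x_next
        if converged:
            break
    return x[n_d:]                                # keep the candidate entries only

# Toy data: 3 documents, 2 candidates; candidate 0 wrote documents 0 and 1.
A_dc = np.array([[1, 0], [1, 0], [0, 1]])
A_dd = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
doc_sims = np.array([0.9, 0.5, 0.1])             # made-up cosine similarities to the query
votes = voting_scores(doc_sims, A_dc)
walk = propagation_scores(doc_sims, A_dd, A_dc)
```

In both cases candidate 0, who authored the two documents most similar to the query,
receives the higher score.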

   We also run the voting and propagation models using document representations
produced by IDNE in place of the tf-idf vectors. The document network provided to
IDNE has adjacency matrix $A_d = A_{dc} A_{dc}^\top + A_{dd}$.

Extending DNE algorithms for expert finding DNE methods usually operate in net-
works of documents, with no candidate nodes. To apply them in the context of expert
finding, we propose two straightforward approaches:

 – pre-aggregation: as in the P@noptic model, meta-documents are generated by
   aggregating the documents produced by each candidate. Furthermore, an adjacency
   matrix of a meta-network between candidates and documents is constructed.
   We compute a candidate network as $A_c = A_{dc}^\top A_{dc}$ and a document network as
   $A_d = A_{dc} A_{dc}^\top + A_{dd}$. The meta-network is then
   $A = \begin{pmatrix} A_d & A_{dc} \\ A_{dc}^\top & A_c \end{pmatrix}$.
   The candidate and document representations are then generated with the DNE
   algorithms by treating this meta-network and the concatenation of the documents
   and meta-documents as an ordinary document network. The scores of the candidates
   are generated by cosine similarity between the representation of the
   document-query and the representations of the candidates;
 – post-aggregation: in this setting, we first train the DNE algorithm on the network
   of documents defined by $A_d = A_{dc} A_{dc}^\top + A_{dd}$. Once the representations are
   generated for all documents, a representation for a candidate is computed by averaging
   the vectors of all documents associated with her. The scores are then computed by
   cosine similarity.
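
The matrix constructions behind the two aggregation schemes can be sketched with NumPy as
follows. This is a minimal illustration under stated assumptions, not the authors' code:
the function names and toy matrices are hypothetical, the document vectors stand in for the
output of any DNE algorithm, and the averaging assumes a binary authorship matrix.

```python
import numpy as np

def document_network(A_dd, A_dc):
    """A_d = A_dc A_dc^T + A_dd: documents linked by shared authors or by responses."""
    return A_dc @ A_dc.T + A_dd

def meta_network(A_dd, A_dc):
    """Pre-aggregation: block adjacency over documents first, then candidates."""
    A_d = document_network(A_dd, A_dc)
    A_c = A_dc.T @ A_dc                           # candidates linked through co-authored documents
    return np.block([[A_d, A_dc], [A_dc.T, A_c]])

def post_aggregate(doc_vectors, A_dc):
    """Post-aggregation: average the vectors of each candidate's documents."""
    counts = A_dc.sum(axis=0).astype(float)
    counts[counts == 0] = 1.0                     # candidates without documents keep a zero vector
    return (A_dc.T @ doc_vectors) / counts[:, None]

def cosine_scores(query_vector, candidate_vectors):
    """Cosine similarity between one query vector and every candidate vector."""
    norms = np.linalg.norm(candidate_vectors, axis=1) * np.linalg.norm(query_vector)
    norms[norms == 0] = 1.0
    return candidate_vectors @ query_vector / norms

# Toy data: 3 documents, 2 candidates; candidate 0 wrote documents 0 and 1.
A_dc = np.array([[1, 0], [1, 0], [0, 1]])
A_dd = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
M = meta_network(A_dd, A_dc)                      # 5 x 5 input for pre-aggregation

doc_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # stand-in for DNE output
cand_vectors = post_aggregate(doc_vectors, A_dc)
scores = cosine_scores(doc_vectors[0], cand_vectors)          # document 0 used as the query
```

With pre-aggregation the DNE algorithm would be trained on `M` together with the documents
and meta-documents, whereas with post-aggregation it is trained on
`document_network(A_dd, A_dc)` alone and the candidates are only introduced afterwards.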

    We run the experiments with 4 document network embedding algorithms, using the
authors’ implementations. For all methods, the dimension of the representations is set
to 256:
    – TADW [24]: we follow the original paper by using 20 iterations and a penalty term
      λ = 0.2;
    – GVNR-t [4]: we use γ = 10 random walks of length t = 40, a sliding window of
      size l = 5 and a threshold xmin = 2 with 4 iterations;
    – Graph2gauss (G2G) [2]: we make sure the loss function converges before the max-
      imum number of iterations;
    – IDNE [5]: we run all experiments with nt = 32 topic vectors with 5000 balanced
      mini-batches of 16 positive samples and 16 negative samples.


4     Experiment Results
Tables 2 to 5 show the experiment results. In the following, we compare the performance
of the aggregation schemes against the baseline algorithms, we highlight the
interesting results obtained when using the baselines with document representations
pre-computed by a DNE algorithm, and finally we make some observations on the
differences between the datasets. Note that the implementation of TADW, provided by the
authors, could not scale to Mathoverflow.

4.1    Baselines versus DNE aggregation schemes
For all datasets, the propagation model generally performs better than the other
algorithms, particularly in terms of precision. Both aggregation schemes yield poor
results and neither of the two appears to be better than the other. GVNR-t is
the best algorithm among the document network embedding models. We believe that,
although DNE algorithms are well suited for document network representation learning, the
gap between simple tasks such as node classification and link prediction and the task of
expert finding is too big for our naive aggregation schemes to perform well. In particular,
the network structure changes significantly between a homogeneous network and a
heterogeneous network. Moreover, expert finding algorithms often benefit from
information about the centrality of the candidates and documents. DNE algorithms do not
particularly preserve this information, and neither do our aggregation schemes.

Fig. 2: Effect of IDNE representations on the propagation model. Using document network
embeddings significantly damages the rankings.
(a) Propagation model with tf-idf representations: the curve has a nice shape, which means
the rankings of candidates are good even for the last ranked experts.
(b) Propagation model with IDNE representations: the first ranked candidates are good but
the algorithm tends to wrongly rank last many true experts.


4.2   Using DNE as document representations for the baselines

Since the baseline algorithms perform well, we study the possibility of applying them using
a DNE algorithm for the representations of the documents. We only report the results
with the representations computed with IDNE but we observe the same behaviors with
other DNE models. First, these representations consistently improve the voting model,
which achieves the best results in terms of AUC on Stats and Mathoverflow. Then, the most
surprising effect is the significant decrease in performance of the propagation model. While
the precision for the first ranked candidates is not affected, the AUC score drops
significantly for the three Q&A datasets. We believe that document network embeddings
capture too long-range dependencies between the documents in the network, which are
then exaggerated by the propagation procedure. Figure 2 shows the effect
of the representations used with the propagation model on the ROC curve.


4.3   Differences between the datasets

The results achieved by the algorithms on all three Stack Exchange datasets are consistent.
However, the algorithms do not behave the same on DBLP. First, DNE methods score closer
to the baselines on DBLP. In the Q&A datasets, the interactions are more isolated,
i.e. there are more users having fewer interactions. This difference in network properties
might disadvantage DNE methods, which are usually trained on scale-free networks
whose degree distribution follows a power law. Moreover, on DBLP the propagation method
does not suffer from the decrease in performance induced by the IDNE representations.
We hypothesize that the low number of expertise fields associated with this
dataset largely reduces the effect described in the previous section.


5    Discussion and Future Work
In this paper, we provide experiment materials for expert finding with the help of four
annotated datasets and further report results based on several baseline algorithms.
Moreover, we study the ability of document network embedding methods to tackle the expert
finding challenge. We show that DNE algorithms cannot be trivially adapted to achieve
state-of-the-art scores. However, we reveal that document network embeddings can
improve the voting model but degrade the propagation model.
    In future work, we would like to find an efficient way to bridge the gap between
DNE algorithms and expert finding. To do so, taking the heterogeneity into account
should help to better capture the real similarity between a document and a candidate.
Furthermore, a deeper analysis of the interplay between the candidates and the text
content of the documents appears necessary to better understand the task of
expert finding.




Table 2: Mean scores with their standard deviations on DBLP
                           AUC            P@10              AP
random                 49.47 (09.80)   05.00 (06.66)   07.09 (03.81)
panoptic (tf-idf)      74.06 (12.94)   22.37 (16.35)   23.24 (12.55)
voting (tf-idf)        78.60 (11.97)   26.05 (15.76)   28.24 (13.92)
propagation (tf-idf)   79.26 (13.09)   33.07 (19.61)   34.66 (18.21)
pre-agg TADW           65.84 (12.94)   15.61 (11.63)   17.26 (08.78)
pre-agg GVNR-t         76.90 (11.46)   19.04 (11.70)   21.39 (09.61)
pre-agg G2G            72.87 (12.75)   15.70 (11.62)   18.53 (09.37)
pre-agg IDNE           78.08 (11.27)   20.18 (11.85)   22.00 (09.87)
post-agg TADW          68.01 (13.37)   16.32 (11.57)   18.01 (08.97)
post-agg GVNR-t        73.91 (13.93)   18.86 (12.19)   20.57 (10.33)
post-agg G2G           68.94 (15.23)   16.23 (12.02)   18.21 (09.76)
post-agg IDNE          76.87 (13.36)   19.04 (14.57)   21.57 (10.96)
voting (IDNE)          82.23 (11.08)   34.82 (18.46)   37.27 (16.16)
propagation (IDNE)     82.44 (16.14)   44.47 (22.91)   47.01 (22.06)


   Table 3: Mean scores with standard deviations on Stats
                           AUC            P@10              AP
random                 50.01 (02.24)   04.52 (07.02)   04.96 (02.81)
panoptic (tf-idf)      79.47 (06.22)   13.45 (13.39)   15.22 (05.62)
voting (tf-idf)        84.96 (05.22)   52.53 (16.13)   31.01 (06.58)
propagation (tf-idf)   86.33 (05.64)   91.53 (13.44)   44.09 (07.70)
pre-agg TADW           63.07 (07.70)   11.42 (12.34)   08.45 (03.87)
pre-agg GVNR-t         70.67 (09.49)   21.12 (20.99)   12.43 (07.30)
pre-agg G2G            63.63 (07.62)   12.93 (12.06)   07.81 (04.15)
pre-agg IDNE           65.07 (09.05)   13.37 (13.48)   09.40 (05.19)
post-agg TADW          68.74 (07.02)   13.67 (12.59)   09.99 (04.37)
post-agg GVNR-t        66.56 (08.61)   22.47 (15.92)   10.75 (05.42)
post-agg G2G           62.53 (07.44)   11.95 (11.86)   07.48 (04.13)
post-agg IDNE          65.63 (08.57)   13.34 (13.13)   09.38 (04.94)
voting (IDNE)          86.94 (04.91)   53.91 (18.06)   32.18 (08.33)
propagation (IDNE)     67.62 (10.11)   90.43 (15.20)   33.07 (08.93)


          Table 4: Mean scores with standard deviations on Academia
                                     AUC            P@10            AP
          random                 50.02 (01.78)   05.93 (08.07) 06.09 (02.72)
          panoptic (tf-idf)      81.54 (04.36)   18.35 (18.76) 22.93 (07.14)
          voting (tf-idf)        85.88 (03.47)   57.99 (15.87) 37.66 (05.83)
          propagation (tf-idf)   88.02 (03.32)   99.01 (03.57) 54.04 (05.44)
          pre-agg TADW           61.47 (06.16)   11.09 (12.04) 09.29 (03.53)
          pre-agg GVNR-t         64.22 (09.69)   25.67 (23.27) 13.07 (07.54)
          pre-agg G2G            61.54 (05.38)   14.30 (12.91) 08.74 (03.69)
          pre-agg IDNE           58.74 (07.49)   10.21 (11.58) 08.41 (03.99)
          post-agg TADW          71.94 (04.63)   14.44 (12.87) 12.68 (04.37)
          post-agg GVNR-t        61.22 (06.24)   20.70 (14.59) 10.19 (04.21)
          post-agg G2G           58.87 (05.79)   12.80 (12.06) 08.12 (03.67)
          post-agg IDNE          59.97 (07.40)   10.61 (11.19) 08.76 (04.17)
          voting (IDNE)          86.79 (03.90)   55.81 (17.35) 37.13 (07.58)
          propagation (IDNE)     61.35 (08.56)   95.02 (10.15) 31.27 (08.21)


        Table 5: Mean scores with standard deviations on Mathoverflow
                                     AUC            P@10              AP
          random                 49.98 (01.62)   06.44 (08.28)   06.53 (03.06)
          panoptic (tf-idf)      81.87 (04.46)   21.95 (19.15)   22.95 (07.54)
          voting (tf-idf)        86.80 (03.23)   61.11 (18.68)   40.10 (08.27)
          propagation (tf-idf)   88.08 (03.38)   93.68 (12.16)   49.58 (08.90)
          pre-agg TADW                NA              NA              NA
          pre-agg GVNR-t         65.34 (09.22)   44.02 (28.31)   16.88 (08.55)
          pre-agg G2G            66.84 (08.99)   22.95 (17.81)   12.49 (05.70)
          pre-agg IDNE           67.01 (09.26)   22.96 (17.84)   13.40 (06.02)
          post-agg TADW               NA              NA              NA
          post-agg GVNR-t        63.84 (07.59)   41.81 (22.68)   14.96 (06.25)
          post-agg G2G           65.06 (09.09)   22.43 (16.94)   11.78 (05.51)
          post-agg IDNE          66.74 (09.10)   21.92 (17.21)   13.11 (05.87)
          voting (IDNE)          88.71 (03.76)   68.46 (18.53)   43.53 (09.90)
          propagation (IDNE)     69.38 (09.65)   92.35 (13.88)   39.62 (09.89)
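

The three columns reported in Tables 2 to 5 (AUC, P@10 and AP) can be illustrated
with a minimal sketch. It assumes the textbook definitions of these metrics applied
to a ranked list of binary relevance labels for one query, and a population standard
deviation across queries. The function names are hypothetical and the evaluation
code actually used for the tables may differ in details such as tie handling or the
standard deviation estimator.

```python
from statistics import mean, pstdev


def precision_at_k(ranked_relevance, k=10):
    """Fraction of the top-k ranked candidates that are relevant."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Mean of the precision values taken at the rank of each relevant candidate."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def roc_auc(ranked_relevance):
    """Probability that a relevant candidate is ranked above a non-relevant one.

    Assumes a strict ranking (no tied scores).
    """
    n_pos = sum(ranked_relevance)
    n_neg = len(ranked_relevance) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")
    negatives_seen, inverted_pairs = 0, 0
    for rel in ranked_relevance:
        if rel:
            # Every non-relevant candidate ranked above this one is a misordered pair.
            inverted_pairs += negatives_seen
        else:
            negatives_seen += 1
    return 1.0 - inverted_pairs / (n_pos * n_neg)


def summarize(per_query_scores):
    """Mean and standard deviation over queries, expressed as percentages."""
    return 100 * mean(per_query_scores), 100 * pstdev(per_query_scores)


if __name__ == "__main__":
    # Toy ranking for one query: 1 marks a true expert, 0 a non-expert.
    ranking = [1, 0, 1, 0, 0]
    print(precision_at_k(ranking, k=2), average_precision(ranking), roc_auc(ranking))
    print(summarize([0.5, 1.0]))
```

Each table cell would then correspond to summarize applied to the per-query values
of one metric for one method.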


References

 1. Balog, K., Serdyukov, P., De Vries, A.P.: Overview of the TREC 2010 entity track. Tech. rep.,
    Norwegian University of Science and Technology, Trondheim (2010)
 2. Bojchevski, A., Günnemann, S.: Deep gaussian embedding of graphs: Unsupervised induc-
    tive learning via ranking. In: International Conference on Learning Representations. pp. 1–13
    (2018)
 3. Brochier, R., Guille, A., Rothan, B., Velcin, J.: Impact of the query set on the evaluation of
    expert finding systems. Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced In-
    formation Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018)
    co-located with the 41st International ACM SIGIR Conference (2018)
 4. Brochier, R., Guille, A., Velcin, J.: Global vectors for node representations. In: The World
    Wide Web Conference. pp. 2587–2593. ACM (2019)
 5. Brochier, R., Guille, A., Velcin, J.: Inductive document network embedding with topic-word
    attention. In: Proceedings of the 42nd European Conference on Information Retrieval Re-
    search. Springer (2020)
 6. Cifariello, P., Ferragina, P., Ponza, M.: Wiser: A semantic approach for expert finding in
    academia based on entity linking. Inf. Syst. 82, 1–16 (2019)
 7. Craswell, N., Hawking, D., Vercoustre, A.M., Wilkins, P.: P@noptic expert: Searching for
    experts not just for documents. In: Ausweb Poster Proceedings, Queensland, Australia.
    vol. 15, p. 17 (2001)
 8. Craswell, N., de Vries, A.P., Soboroff, I.: Overview of the TREC 2005 enterprise track. In:
    TREC. vol. 5, pp. 1–7 (2005)
 9. Dargahi Nobari, A., Sotudeh Gharebagh, S., Neshati, M.: Skill translation models in expert
    finding. In: Proceedings of the 40th international ACM SIGIR conference on research and
    development in information retrieval. pp. 1057–1060. ACM (2017)
10. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by
    latent semantic analysis. Journal of the American Society for Information Science 41(6),
    391–407 (1990)
11. Fischer, T., Remus, S., Biemann, C.: Lt expertfinder: An evaluation framework for expert
    finding methods. In: Proceedings of the 2019 Conference of the North American Chapter of
    the Association for Computational Linguistics (Demonstrations). pp. 98–104 (2019)
12. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings
    of the 22nd ACM SIGKDD international conference on Knowledge discovery and data min-
    ing. pp. 855–864. ACM (2016)
13. Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Ad-
    vances in neural information processing systems. pp. 2177–2185 (2014)
14. Macdonald, C., Ounis, I.: Voting for candidates: adapting data fusion techniques for an expert
    search task. In: Proceedings of the 15th ACM international conference on Information and
    knowledge management. pp. 387–396. ACM (2006)
15. Macdonald, C., Ounis, I., Soboroff, I.: Overview of the TREC 2007 blog track. In: TREC.
    vol. 7, pp. 31–43 (2007)
16. Mislevy, R.J., Riconscente, M.M.: Evidence-centered assessment design. In: Handbook of
    test development, pp. 75–104. Routledge (2011)
17. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order
    to the web. Tech. rep., Stanford InfoLab (1999)
18. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In:
    Proceedings of the 2014 conference on empirical methods in natural language processing
    (EMNLP). pp. 1532–1543 (2014)
19. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In:
    Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery
    and data mining. pp. 701–710. ACM (2014)
20. Riahi, F., Zolaktaf, Z., Shafiei, M., Milios, E.: Finding expert users in community question
    answering. In: Proceedings of the 21st International Conference on World Wide Web. pp.
    791–798. ACM (2012)
21. Serdyukov, P., Rode, H., Hiemstra, D.: Modeling multi-step relevance propagation for ex-
    pert finding. In: Proceedings of the 17th ACM conference on Information and knowledge
    management. pp. 1133–1142. ACM (2008)
22. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: extraction and mining of
    academic social networks. In: Proceedings of the 14th ACM SIGKDD international confer-
    ence on Knowledge discovery and data mining. pp. 990–998. ACM (2008)
23. Van Gysel, C., de Rijke, M., Worring, M.: Unsupervised, efficient and semantic expertise
    retrieval. In: WWW. vol. 2016, pp. 1069–1079. The International World Wide Web Confer-
    ences Steering Committee (2016)
24. Yang, C., Liu, Z., Zhao, D., Sun, M., Chang, E.: Network representation learning with rich
    text information. In: Twenty-Fourth International Joint Conference on Artificial Intelligence
    (2015)
25. Yuan, S., Zhang, Y., Tang, J., Hall, W., Cabotà, J.B.: Expert finding in community question
    answering: a review. Artificial Intelligence Review 53(2), 843–874 (2020)
26. Zhang, J., Tang, J., Li, J.: Expert finding in a social network. In: International Conference on
    Database Systems for Advanced Applications. pp. 1066–1069. Springer (2007)
27. Zhang, M., Song, R., Lin, C., Ma, S., Jiang, Z., Jin, Y., Liu, Y., Zhao, L., Ma, S.: Expansion-
    based technologies in finding relevant and new information: THU TREC 2002: Novelty track
    experiments. NIST Special Publication SP 251, 586–590 (2003)
28. Zhao, Z., Zhang, L., He, X., Ng, W.: Expert finding for question answering via graph reg-
    ularized matrix completion. IEEE Transactions on Knowledge and Data Engineering 27(4),
    993–1004 (2014)



</pre>