
ECNU at 2018 eHealth Task 2: Technologically Assisted Reviews in Empirical Medicine

Huaying Wu

Tingting Wang

Jiayi Chen

Su Chen

Qinmin Hu

Liang He

Department of Computer Science & Technology, East China Normal University, Shanghai, 200062, China
Department of Computer Science, Ryerson University, Toronto, Canada

The 2018 CLEF eHealth Task 2 consists of two sub-tasks that support writing systematic reviews in evidence-based medicine. Participants are required to retrieve relevant documents from a medical database for each query (sub-task 1) and to re-rank the documents, taking the results of the Boolean search as the starting point (sub-task 2). We adopt BM25 with query expansion to capture the basic query-document relationship, and use a customized Paragraph2Vector, trained on the training set of the Boolean search, to represent queries and documents. To compute the relevance score of a given query-document pair, we use cosine similarity and logistic regression in our experiments. Finally, we find that the combination performs better.

Keywords: Query expansion, Paragraph2Vector, Health Information Retrieval

The ECNUica team participates in CLEF 2018 eHealth Task 2: TAR in Empirical Medicine, which poses a ranking problem based on query-document similarity in Systematic Reviews. A Systematic Review contains multiple stages: Boolean Search for each query, Title and Abstract Screening, and Document Checking. The task focuses on the first and second stages of the process.

In the Boolean Search stage, participants need to perform a basic binary classification of each document for every query. A Boolean query, built by experts from the relevant information, is submitted to a medical database containing details of medical studies, and each document must be classified as relevant or irrelevant. The database returns a set of potentially relevant studies. In the following steps, participants decide which ones are indeed relevant by screening titles, abstracts and full documents.

There are two sub-tasks in this task. The first is to find highly relevant documents for each given query. The second is to re-rank the documents retrieved in the first step, as given by experts. Building on our work over the past two years, we handle the data with learning-to-rank [6], word2vec [1, 7], and the relevance-based relation between a query and documents [3], which performed well in other runs [4, 5, 7].

2 Methods

In sub-task 1, we choose the BM25 algorithm to obtain a baseline for the Boolean search. Furthermore, query expansion based on MeSH and pseudo relevance feedback (PRF) is applied to obtain better results. In sub-task 2, we employ Paragraph2Vector to represent queries and documents for similarity calculation.

2.1 Boolean search

Query Expansion In this stage, we perform query expansion to improve retrieval precision. To find the best-performing configuration, we compare expansion with PRF, MeSH, and PRF + MeSH.

- PRF returns the top-10 relevant features for each query.
- The MeSH database is applied to extract medical terms from titles.

We take the DescriptorName field from the raw data as the keywords of a document, since it describes the theme of the document with a series of words, and use the words from MeSH as expansion. We do not use any part of the protocol in either task. Thus, the query we use in both tasks contains the title and objective from the original query, plus the expansion from DescriptorName or MeSH. The results show that both PRF and MeSH can improve performance.
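The PRF step above can be illustrated with a minimal sketch. It is not the Terrier implementation used for the runs: the function name `expand_query_prf`, the tf x idf term score and the tokenized toy documents are all assumptions made for illustration. It selects the highest-scoring terms from the top-ranked (feedback) documents and appends them to the query.

```python
import math
from collections import Counter


def expand_query_prf(query_terms, feedback_docs, collection_docs, n_terms=10):
    """Return query_terms plus the n_terms best-scoring terms from feedback_docs.

    Hypothetical helper: every document is a list of tokens, and
    feedback_docs are the top-ranked documents of a first retrieval pass.
    """
    n_docs = len(collection_docs)
    # Document frequency of each term over the whole collection.
    df = Counter()
    for doc in collection_docs:
        df.update(set(doc))
    # Term frequency accumulated over the feedback documents only.
    tf = Counter()
    for doc in feedback_docs:
        tf.update(doc)
    # Score candidates by feedback frequency times a smoothed IDF (assumed scoring).
    scores = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))
        for term, count in tf.items()
        if term not in query_terms
    }
    # Sort by decreasing score, breaking ties alphabetically for determinism.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]
    return list(query_terms) + best


collection = [
    ["liver", "biopsy", "fibrosis"],
    ["liver", "ultrasound", "fibrosis"],
    ["heart", "failure"],
    ["heart", "ultrasound"],
]
expanded = expand_query_prf(["liver"], collection[:2], collection, n_terms=2)
```

With `n_terms=10` the sketch mirrors the top-10 setting described above; MeSH-based expansion would instead add terms looked up in the MeSH vocabulary.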

Model Training In the model selection stage, we compare the results of BM25, DFR BM25 and PL2. For each algorithm, we run experiments with the method alone (BM25, DFR BM25, PL2), the method with PRF, the method with MeSH, and the method with both PRF and MeSH. A one-hot representation is used for every query when calculating the relevance score.
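To make the baseline concrete, the following is a minimal sketch of BM25 scoring over tokenized documents. It is an illustration only: the runs were produced with the Terrier platform, whose BM25 formulation differs in details, and the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the smoothed IDF are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against query_terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency of each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed, non-negative IDF (an assumed variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["liver", "fibrosis", "biopsy"],
    ["heart", "failure"],
    ["liver", "liver", "ultrasound"],
]
scores = bm25_scores(["liver", "fibrosis"], docs)
```

In this toy collection the first document matches both query terms and therefore scores highest, while the second matches none and scores zero.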

2.2 Ranking

Paragraph2Vector Le and Mikolov proposed the paragraph vector [2], an unsupervised algorithm that learns fixed-length representations from variable-length pieces of text. With this method, we use the Paragraph2Vector model to map all selected documents from words to fixed-length vectors.

Under this framework, we first need to learn vector representations of words [1]. The objective of the word vector model is to maximize the average log probability of each word given its context.
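For reference, this objective can be written out in the notation of [2]. The symbols are those of that paper and are not defined elsewhere in this text: T is the number of training words, k the context window size, W the word vector matrix, h a concatenation or average of the context word vectors, and U, b the softmax parameters.

```latex
\frac{1}{T}\sum_{t=k}^{T-k} \log p\left(w_t \mid w_{t-k}, \ldots, w_{t+k}\right),
\qquad
p\left(w_t \mid w_{t-k}, \ldots, w_{t+k}\right) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}},
\qquad
y = b + U\,h\left(w_{t-k}, \ldots, w_{t+k}; W\right)
```

In the paragraph vector model, h additionally takes the vector of the paragraph that contains the context window.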

After training the word vectors, we use the softmax function as the activation function to learn the softmax weights and the paragraph vectors on documents.

Logistic Regression Using all CLEF 2017 eHealth training and testing queries and the CLEF 2018 eHealth training queries as the training dataset, we train a logistic regression (LR) model as a classifier. For each query-document pair, the LR classifier computes the relationship between them: the text is the input of the model, and the output is a relevance score.
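A minimal sketch of such a classifier is given below. It rests on several assumptions: the paper does not state how a query-document pair is turned into features, so the element-wise product in the hypothetical `pair_features` is illustrative only, the two-dimensional vectors stand in for real paragraph vectors, and plain batch gradient descent replaces whatever solver was actually used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pair_features(query_vec, doc_vec):
    """Assumed pair representation: element-wise product of the two vectors."""
    return [q * d for q, d in zip(query_vec, doc_vec)]


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias


def relevance_score(weights, bias, x):
    """Probability of relevance for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy vectors standing in for paragraph vectors of a query and four documents.
query_vec = [1.0, 0.0]
train_docs = [[0.9, 0.1], [0.8, 0.3], [-0.7, 0.2], [-0.9, 0.5]]
labels = [1, 1, 0, 0]
X = [pair_features(query_vec, d) for d in train_docs]
weights, bias = train_logistic_regression(X, labels)
score_pos = relevance_score(weights, bias, pair_features(query_vec, [0.85, 0.2]))
score_neg = relevance_score(weights, bias, pair_features(query_vec, [-0.8, 0.1]))
```

Documents can then be re-ranked for each query by sorting them in decreasing order of `relevance_score`.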

3 Experiments

3.1 Dataset

For sub-task 1, we are provided with a test set consisting of 20 topics of Diagnostic Test Accuracy (DTA) reviews, with the following data:

- Topic-ID.
- The title of the review, written by Cochrane experts.
- A part of the protocol.
- The entire PubMed database.

For sub-task 2, we are provided with different data for the same reviews, as follows:

- The Boolean query manually constructed by Cochrane experts.
- The set of PubMed Document Identifiers (PIDs) returned by running the query in MEDLINE.

For training, we use the CLEF eHealth 2017 queries and documents (both the training and testing parts) and the CLEF eHealth 2018 queries and documents (training part only).

3.2 Runs

We submit three runs for each sub-task, described as follows.

In Sub-task 1:

- ECNU TASK1 RUN1 BM25: The result retrieved on the entire PubMed dataset by the Terrier platform with the BM25 model and pseudo relevance feedback.
- ECNU TASK1 RUN2 LR: Rerank all documents with a Logistic Regression classifier and Paragraph Vector.
- ECNU TASK1 RUN3 COMBINE: A combination of the previous two runs.

In Sub-task 2:

- ECNU TASK2 RUN1 TFIDF: Rerank the PIDs with a vector space model. Each document is represented as a vocabulary-size vector in which each dimension is the tf-idf score of a certain word. We use cosine similarity to rerank the documents.
- ECNU TASK2 RUN2 LR: A Logistic Regression classifier is used to rerank documents based on Paragraph Vector.
- ECNU TASK2 RUN3 COMBINE: A combination of the previous two runs.

Table 1. Evaluations in Task1 and Task2

Run             recall@100%   num rels
Task2 Combine   0.992         3964

Summary of Runs Run 3 of each sub-task shows better performance in training. We list some results in Table 1, which shows that BM25+PRF performs best compared to the other methods. Both MeSH and PRF are employed for query expansion, but during the experiments performance declines when we apply them simultaneously.
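The TF-IDF run of sub-task 2 can be sketched as follows. This is a simplified illustration, not the submitted system: the function names, the IDF formula and the choice to include the query when computing document frequencies are assumptions, and sparse dictionaries are used in place of vocabulary-size vectors (the cosine value is the same, since absent terms have weight zero).

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Turn tokenized texts into sparse tf-idf vectors (dict: term -> weight)."""
    n = len(texts)
    df = Counter()
    for tokens in texts:
        df.update(set(tokens))
    # Assumed IDF variant: log(N / df) + 1, so shared terms keep a positive weight.
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in texts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query, docs):
    """Return the PIDs in docs sorted by decreasing cosine similarity to query.

    docs maps a PID to its token list; ties are broken by PID.
    """
    pids = list(docs)
    vectors = tfidf_vectors([query] + [docs[p] for p in pids])
    q_vec = vectors[0]
    scored = [(cosine(q_vec, vec), pid) for pid, vec in zip(pids, vectors[1:])]
    return [pid for _, pid in sorted(scored, key=lambda s: (-s[0], s[1]))]


docs = {
    "p1": ["heart", "failure", "therapy"],
    "p2": ["liver", "fibrosis", "ultrasound"],
    "p3": ["liver", "biopsy"],
}
ranking = rerank(["liver", "fibrosis"], docs)
```

The PIDs `p1` to `p3` are placeholders; in the task they would be the PubMed identifiers returned by the Boolean query.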

We use ap, recall@100, rels found and num rels as evaluation metrics, where ap is the average precision over the ranked documents, recall@100 is the recall over the top-100 documents, num rels is the total number of relevant documents, and rels found (in sub-task 1) is the number of relevant documents we find in the experiments.
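For readers unfamiliar with these metrics, a minimal sketch of average precision and recall at a cutoff is given below. The function names and the toy ranking are illustrative and are not taken from the official evaluation script of the task.

```python
def average_precision(ranking, relevant):
    """Mean of precision@rank over the ranks at which a relevant PID appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranking, start=1):
        if pid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant PIDs that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    found = sum(1 for pid in ranking[:k] if pid in relevant)
    return found / len(relevant)


ranking = ["p2", "p1", "p3", "p4"]
relevant = {"p2", "p3"}
ap = average_precision(ranking, relevant)
```

Here the relevant documents sit at ranks 1 and 3, so ap is (1/1 + 2/3) / 2, and `len(relevant)` plays the role of num rels.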

4 Conclusions and Future Work

In the CLEF eHealth 2018 Task 2 TAR, the ECNUica team takes advantage of the Paragraph2Vector model. When combined with a statistical method, logistic regression with TF-IDF shows better performance than the LR method alone or TF-IDF alone. Although the representations of queries and documents can be used to compute similarities by cosine distance, many aspects of our method still need improvement. In future work, we will focus on extracting more features from text and on better methods.

Acknowledgement

We thank reviewers for their review comments on this paper.

References

1. Mikolov, Sutskever, Chen, et al.: Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems 26, 3111-3119 (2013).

2. Le Q.V., Mikolov: Distributed Representations of Sentences and Documents. 2014, 4:II-1188.

3. Lavrenko, Croft W.B.: Relevance based language models. International ACM SIGIR Conference, 120-127 (2001).

4. Yang, He, Hu Q.M., He, Haacke E.M.: ECNU at 2015 eHealth Task 2: User-centred Health Information Retrieval. Proceedings of the ShARe/CLEF eHealth Evaluation Lab (2015).

5. Yang, He, Hu Q.M., He: ECNU at 2015 CDS Track: Two Re-ranking Methods in Medical Information Retrieval. Proceedings of the 2015 Text Retrieval Conference (2015).

6. Yang, He, Hu Q.M., He, Liu H.Y., Wang Y.Y., Luo H.: ECNU at 2016 eHealth Task 3: Patient-centred Information Retrieval (2016).

7. Chen J.Y., Chen, Song, Liu H.Y., Wang Y.Y., Hu Q.M., He, Yang: ECNU at 2017 eHealth Task 2: Technologically Assisted Reviews in Empirical Medicine (2017).

8. Suominen H., Kelly L., Goeuriot L., Kanoulas E., Azzopardi L., Spijker R., Li D., Névéol A., Ramadier L., Robert A., Zuccon G., Palotti J.: Overview of the CLEF eHealth Evaluation Lab 2018. CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS), Springer, September 2018.

9. Kanoulas E., Spijker R., Li D., Azzopardi L.: CLEF 2018 Technology Assisted Reviews in Empirical Medicine Overview. CLEF 2018 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS, September 2018.