=Paper= {{Paper |id=Vol-2068/esida4 |storemode=property |title=Towards Faster Annotation Interfaces for Learning to Filter in Information Extraction and Search |pdfUrl=https://ceur-ws.org/Vol-2068/esida4.pdf |volume=Vol-2068 |authors=Carlos A. Aguirre,Shelby Coen,Maria F. De La Torre,William H. Hsu,Margaret Rys |dblpUrl=https://dblp.org/rec/conf/iui/AguirreCTHR18 }} ==Towards Faster Annotation Interfaces for Learning to Filter in Information Extraction and Search== https://ceur-ws.org/Vol-2068/esida4.pdf
Towards Faster Annotation Interfaces for Learning to Filter in Information Extraction and Search

Carlos A. Aguirre, Dept. of Computer Science, caguirre97@ksu.edu
Shelby Coen, Dept. of Electrical and Computer Engineering, shelby88@ksu.edu
Maria F. De La Torre, Dept. of Computer Science, marifer2097@ksu.edu
William H. Hsu, Dept. of Computer Science, bhsu@ksu.edu
Margaret Rys, Dept. of Industrial and Manufacturing Systems Engineering, malrys@ksu.edu

Kansas State University, Manhattan, KS, United States
ABSTRACT
This work explores the design of an annotation interface for a document filtering system based on supervised and semi-supervised machine learning, focusing on usability improvements to the user interface that increase the efficiency of annotation without loss of precision, recall, and accuracy. Our objective is to create an automated pipeline for information extraction (IE) and exploratory search for which the learning filter serves as an intake mechanism. The purpose of this IE and search system is ultimately to help users create structured recipes for nanomaterial synthesis from scientific documents crawled from the web. A key part of each text corpus used to train our learning classifiers is a set of thousands of documents that are hand-labeled for relevance to nanomaterials search criteria of interest. This annotation process becomes expensive as the text corpus is expanded through focused web crawling over open-access documents and the addition of new publisher collections. To speed up annotation, we present a user interface that facilitates and optimizes the interactive steps of document presentation, inspection, and labeling. We aim to transfer these improvements in usability and response time to other classification learning domains, for text documents and beyond.

Author Keywords
annotation, document categorization, human-computer interaction, information retrieval, information extraction, machine learning

ACM Classification Keywords
Information systems → Information retrieval → Users and interactive retrieval; Retrieval tasks and goals → question answering, document filtering, information extraction; Machine learning → Supervised learning → Supervised learning by classification

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ESIDA'18, March 11, Tokyo, Japan.

INTRODUCTION
This paper addresses the task of learning to filter [7] for information extraction and search, specifically by developing a user interface for human annotation of documents. These documents are in turn used to train a machine learning system to filter documents by conformance to pre-specified formats and topical criteria. The purpose of filtering in our extraction task context centers around question answering (QA), a problem in information retrieval (IR), information extraction (IE), and natural language processing (NLP) that involves formulating structured responses to free-text queries. Filtering for QA tasks entails restricting the set of source documents from which answers to specific queries are to be extracted.

Our overarching goal is to make manual annotation more affordable for researchers by reducing annotation time. This leads to the technical objectives of optimizing the presentation, interactive viewing, and manual annotation of objects without loss of precision, accuracy, or recall. This annotation is useful in many scientific and technical fields where users seek a comprehensive repository of publications, or where large document corpora are being compiled. In these fields, machine learning is applied to select and prepare data for various applications of artificial intelligence, from cognitive services such as question answering to document categorization. Ultimately, annotation is needed not only to deal with the cold start problem [10] of personalizing a recommender system or learning filter, but also to keep previous work up to date with new document corpora [4]. Manual annotation is expensive because it requires expertise in the topic and because of the time the process takes. Currently, there are fields such as materials science and bioinformatics where annotation is needed to produce ground truth for learning to filter [2]. For this we have created a lightweight PDF annotation tool to classify documents based on relevance.
This annotation tool was developed with the goal of being more efficient and accurate than normal document annotation. The task is to filter documents based on content relevance, potentially reducing the size of the result set returned in response to a search query. This can boost the precision of search while also supporting information extraction for data mining by returning selected documents that are likely to contain domain-specific information, such as recipes for synthesizing a material of interest [6]. This can include passages and snippets of recipes to be extracted for the synthesis of materials of interest. Analogous to this is annotating documents by category tagging. In this paper, classification is used to determine the eligibility (by format) and relevance of a candidate document, and annotation refers to the process of determining both eligibility and relevance. The purpose of this paper is to describe this annotation tool and test it with a relatively large subject group.

Background
In recent years, the growth of available electronic information has increased the need for text mining to enable users to extract, filter, classify, and rank relevant structured and semi-structured data from the web. Document classification is crucial for information retrieval of existing literature. Machine learning models based on global word statistics such as TF-IDF, linear classifiers, and bag-of-words support vector machine classifiers have shown remarkable efficiency at document classification. The broad goal of our research is to extract figures and instructions from domain-specific scientific publications to create organized recipes for nanomaterial synthesis, including raw ingredients, quantity proportions, manufacturing plans, and timing. This task involves classification and filtering of documents crawled from the web.

The filtering task is framed in terms of topics of interest, specifically a dyad (pair) consisting of a known material and morphology. This in turn supports question answering (QA) tasks defined over documents that are about this query pair. For example, a nanomaterials researcher may wish to know the effective concentration and temperature of surfactants and other catalysts to achieve a chemical synthesis reaction for producing a desired nanomaterial. [6]

Collecting information about a document's representation involves syntactic and semantic attributes, domain ontology, and tokenization. Through the process of linguistically parsing sentences and paragraphs, semantic analysis extracts key concepts and words relevant to the target domain topic, which are then compared to the taxonomy. In our work, this extraction involves inference and supervised learning to determine different sections using metadata attributes such as font, text size, and spatial location, along with natural language processing. Data and knowledge retrieval depends on finding documents that contain information about the synthesis of nanomaterials. Our approach is to use annotation-based learning, along with TF-IDF and a bag-of-words classifier, to obtain relevant documents. This approach
requires tagging and manual classification of documents to train the classifier-learning algorithm.

The document corpora that this paper focuses on are in the area of chemistry, specifically the synthesis of nanomaterials. We have constructed a custom web crawler to retrieve and filter documents in this area of research. The filtering process checks for the presence of a gazetteer (a list of words) in the documents, using TF-IDF weights as described in [1]. Gazetteers in information extraction (IE) are so named as generalizations of the geographical dictionaries used in maps and atlases. This process is only intended to filter documents based on their vocabulary; other criteria might be needed to determine the relevance of the documents. Because metadata in these documents is not always available, a learning-to-filter algorithm is necessary.
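As an illustration of this vocabulary-based pre-filter, the sketch below checks crawled documents for gazetteer terms and weights them with TF-IDF. The gazetteer entries, tokenizer, and acceptance threshold are illustrative placeholders rather than the actual list and settings used by our crawler.

    import math
    import re
    from collections import Counter

    # Hypothetical gazetteer: a handful of nanomaterial-synthesis terms, for illustration only.
    GAZETTEER = {"nanoparticle", "synthesis", "surfactant", "morphology", "precursor"}

    def tokenize(text):
        """Lowercased word tokens; a stand-in for the crawler's real tokenizer."""
        return re.findall(r"[a-z]+", text.lower())

    def gazetteer_tfidf(docs):
        """TF-IDF weights, restricted to gazetteer terms, for each document."""
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        df = Counter(t for toks in tokenized for t in set(toks) & GAZETTEER)
        weights = []
        for toks in tokenized:
            tf = Counter(t for t in toks if t in GAZETTEER)
            weights.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
        return weights

    def passes_filter(doc_weights, min_terms=2):
        """Keep a document if it mentions at least min_terms distinct gazetteer terms."""
        return len(doc_weights) >= min_terms

    docs = ["Hydrothermal synthesis of a silver nanoparticle with a surfactant template.",
            "Quarterly budget report for the facilities committee."]
    for doc, w in zip(docs, gazetteer_tfidf(docs)):
        print(passes_filter(w), w)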
Need for a Fast Annotator
While many search engines provide a mechanism for explicit relevance feedback, past work on rapid annotation has mostly focused on markup for chunk parsing and other natural language tasks. For example, the Basic Rapid Annotation Tool (BRAT) of Stenetorp et al. [11] is designed to provide assistance in marking up entities and relationships at the phrase level. Meanwhile, fast annotators designed for information extraction (IE) are often focused on a knowledge capture task that is ontology-informed, as in the Melita framework of Ciravegna et al. [3] and the MMAX tool of Müller and Strube [8].

We seek to produce a reconfigurable tool for explicit relevance feedback for learning to filter that can make use of not only text features, but also domain-specific features (such as named entities detected using a gazetteer) and metadata features (such as formatting for sidebars, equations, graphs, photographs, other figures, and procedures). The longer-term goal for intelligent user interface design is to incorporate user-specific cues for relevance determination. These include actions logged from the user interface, such as scrolling and searching within the document, but may be extensible to gaze tracking data such as scan paths and eye fixations. [5]

Manual annotation of training data brings a high cost in time due to the number of training examples needed. Challenges in human annotation extend from time consumption to inconsistency in labeled data. Variety in the annotators' domain expertise, among other human factors, can create inaccurate and problematic training data. In the present work, an annotator user interface was developed to optimize the human annotation process by providing previews of document pages and highlighting relevant keywords. The increase in speed, the user interface design, and annotator biases are studied through an experiment with 43 inexperienced annotators.
Figure 1. Screenshot of the Fast Annotator, showing the highlighting of the gazetteer list and the general layout of the program.
METHODOLOGY
The objective is to create a tool for faster annotation (the Fast Annotator) that does not compromise the accuracy of normal annotation (Manual Annotation). The document corpus used in this experiment is composed of documents retrieved from the web. Since these documents are only filtered by vocabulary, multiple types of documents are present in the corpus. Because the validity of a document's content is important, the relevant documents are restricted to scientific peer-reviewed papers. Since these types of documents usually have to meet publication standards, which often include a structured layout, we expect the relevant documents to be well formatted. Verifying the layout of such documents is typically an easy task for a human annotator. To take advantage of this, the annotator first has to determine whether the document looks like a scientific paper. Therefore, our classification categories can be separated into papers and non-papers. In the case that the document is a scientific paper, the annotator still has to determine the relevance of its content to the synthesis of nanoparticles. This process cannot be automated, since the input source files are in PDF, and much of the metadata found in these source files is oriented toward printing or rendering rather than reading and classifying.

In the case that the document is not a scientific paper, it is automatically considered not relevant; however, further refinement of the class label is needed. The purpose of this subsidiary classification task is to help identify low-level features and those that can be identified by modern feature extraction algorithms, such as deep learning autoencoders.

There are three sub-categories that help determine the type of a non-paper document: poster/presentation, form, and other; these are the subclass labels. The poster/presentation category holds documents that can be described as graphics, informational posters, or presentation talks. The form category holds documents that are online application forms, surveys, or journal petitions. The third and final category contains all documents that cannot be classified as any of the above, along with any documents that are not in the language of our research (English), since those are out of scope. Documents that are scientific posters or presentations, but whose content is relevant, are still considered not relevant, for the purpose of simplifying the task for human annotators and the validation of content.
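A minimal sketch of this two-level labeling scheme follows. The enum values and helper functions are hypothetical names that simply restate the categories described above; they are not taken from the annotator's actual data model.

    from enum import Enum

    class Label(Enum):
        """The five labels shown to annotators; the first two apply to papers,
        the remaining three refine the non-paper class."""
        RELEVANT = "relevant"        # peer-reviewed paper about nanomaterial synthesis
        IRRELEVANT = "irrelevant"    # peer-reviewed paper, but off-topic
        POSTER = "poster"            # graphics, informational posters, presentation talks
        FORM = "form"                # online application forms, surveys, journal petitions
        OTHER = "other"              # anything else, including non-English documents

    def is_paper(label):
        """First level of the scheme: paper vs. non-paper."""
        return label in (Label.RELEVANT, Label.IRRELEVANT)

    def is_relevant(label):
        """Only on-topic, well-formatted papers are kept as relevant training documents."""
        return label is Label.RELEVANT

    assert is_paper(Label.IRRELEVANT) and not is_relevant(Label.FORM)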
Manual Annotation
To evaluate the performance of the Fast Annotator, we have to compare it with the standard way of classifying documents without an annotation tool. We call Manual Annotation the classification of documents without the Fast Annotator. We consider the time it takes the annotator to open the document, mentally determine its classification, and physically classify it. Manual Annotation depends entirely on the procedure by which the annotator classifies the documents. Because of this, we have created a procedure that is to be followed by all the annotators.
Using an online stopwatch, the procedure to classify a document is to start the timer, then open the document in the machine's default PDF renderer. Once the document is classified, the annotator moves the file to the corresponding directory and pauses the timer. This ensures that the time for decision making and physical annotation is taken into account, while also matching the way the Fast Annotator records its time.

Fast Annotator
The Fast Annotator (Figure 1) was designed following a loose application of the classical Nielsen heuristics [9]. While designing the Fast Annotator, questions such as consistency of the user experience, feedback on the user's input, simplicity, shortcuts, and other heuristics were considered. For consistency, all papers are shown to the user in the same way: the first page is in the central window, and the other pages (up to five) are shown as thumbnails on the left side of the screen. The thumbnails have two purposes: to help the user look ahead in the annotation process by showing a preview of the pages, and to help the user become familiar with the UI. Since thumbnails are a very common feature of document readers, the user can visually start with something similar to their previous experience and move to a new experience as they read toward the right. Only the first five pages of the document are shown; this is to increase the speed of PDF rendering, with the hope of decreasing the final time for annotation.

The Fast Annotator shows the status of the annotation process (paper i out of n) on the bottom left side of the screen, and every time a paper is classified, a "loading" message appears to let the user know that the operation is processing. This gives the user a sense of task progress, as the user can see how many papers are done, and a sense of feedback speed, as the loading message appears right after any classification button is pressed.

The Fast Annotator shows the gazetteer list as a "Keywords list" and highlights all of its words inside the document. The button layout is designed to reflect the distinction between the types of documents (whether the document is a paper or not, and then further classification based on relevance).

The procedure for using the Fast Annotator is simpler for the annotator than Manual Annotation. Normally, the annotating user has to start the program and choose the directory where all the documents are located, but for the experiment this location was predetermined, so the user only had to start the program. Since the program keeps track of the time spent on each document in the background, the user does not have to keep track of the time as they did in Manual Annotation. Once the program starts, it queues all the documents in the specified directory, so the user does not have to open each file. The user simply clicks the category to classify the document, and the next file is queued by the program right away.
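The sketch below is a command-line stand-in for this workflow: it queues every PDF in a directory, times each decision in the background, and logs the chosen label. The directory name, label keys, and CSV log are illustrative assumptions, not the Fast Annotator's actual implementation; in the real tool the prompt is replaced by rendered page previews and classification buttons, but the queueing and timing logic is analogous.

    import csv
    import time
    from pathlib import Path

    LABELS = {"1": "relevant", "2": "irrelevant", "3": "poster", "4": "form", "5": "other"}

    def annotate_directory(pdf_dir, log_path="annotations.csv"):
        """Queue every PDF once, time each decision in the background, and log the label."""
        queue = sorted(Path(pdf_dir).glob("*.pdf"))
        with open(log_path, "w", newline="") as log:
            writer = csv.writer(log)
            writer.writerow(["document", "label", "seconds"])
            for i, pdf in enumerate(queue, start=1):
                start = time.monotonic()                      # timing is invisible to the user
                print(f"[{i}/{len(queue)}] {pdf.name}")       # progress: paper i out of n
                key = input("1=relevant 2=irrelevant 3=poster 4=form 5=other > ").strip()
                writer.writerow([pdf.name, LABELS.get(key, "other"),
                                 round(time.monotonic() - start, 2)])

    if __name__ == "__main__":
        annotate_directory("documents")   # hypothetical directory of crawled PDFs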
Preliminary Experiment Design: Best-of-3, Large Batch
In a preliminary exploratory experiment to assess the feasibility of learning to filter from text features in the materials informatics domain, we created two large batches of files for testing Manual Annotation and an earlier version of the Fast Annotator. The earlier version of the Fast Annotator is functionally the same, with the difference that it has a few more button categories and, visually, the button layout is located on the left side of the screen rather than on the right as in the current version. Each large batch contained 1260 files, consisting of 12 smaller batches of 105 files each (105 being the least common multiple of 3, 5, and 7, chosen for ease of experimenting with Best-of-3, Best-of-5, and Best-of-7 inter-annotator agreement).
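Assuming that Best-of-k denotes a simple plurality vote over k annotators' labels (our reading of the design rather than a definition stated here), the aggregation can be sketched as follows:

    from collections import Counter

    def best_of_k(labels):
        """Plurality label among k annotators; ties go to the first label seen."""
        return Counter(labels).most_common(1)[0][0]

    # Three annotators labeling the same document:
    print(best_of_k(["relevant", "relevant", "form"]))   # -> relevant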
Training data for supervised inductive learning was generated by creating a bag-of-words representation over the 7633 unique tokens occurring in all small batches, after stop word removal and stemming.
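The sketch below illustrates this preprocessing with a toy stop word list and a crude suffix stripper standing in for a real stemmer such as Porter's; neither matches the vocabulary or stemmer actually used to produce the 7633-token representation.

    import re
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "with"}  # toy list

    def crude_stem(token):
        """Rough suffix stripping standing in for a real stemmer such as Porter's."""
        for suffix in ("ations", "ation", "ing", "ed", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    def bag_of_words(doc):
        tokens = re.findall(r"[a-z]+", doc.lower())
        return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

    def vocabulary(docs):
        """Union of stemmed, non-stop-word tokens over all batches."""
        vocab = set()
        for doc in docs:
            vocab.update(bag_of_words(doc))
        return sorted(vocab)

    def vectorize(doc, vocab):
        counts = bag_of_words(doc)
        return [counts.get(term, 0) for term in vocab]

    docs = ["Synthesis of gold nanoparticles by citrate reduction.",
            "The annual report of the standards committee."]
    vocab = vocabulary(docs)
    print(len(vocab), vectorize(docs[0], vocab))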
In this and other preliminary experiments, we noted that the variance of annotation time for 1 to 3 annotators was high, suggesting that an experiment using 20 to 50 annotators would be more conducive to testing the hypothesis that the Fast Annotator requires less user time than Manual Annotation, without loss of precision, recall, and accuracy.

Speedup Experiment Design: Best-of-43, Small Batch
For this experiment, conducted using 50 documents and a participant pool of 43 users, the focus was on assessing speedup. Ground truth was designated to be the previous annotation given by one of two subject matter experts.

As described earlier, the Fast Annotator was designed to produce results at a faster rate than doing the classification manually. The background information, layout, and survey were considered when organizing the design.

The subjects were volunteer Kansas State University industrial engineering students. They had no background knowledge about the synthesis of nanomaterials or about how to use our annotation tool.

When preparing for the execution of the experiment, the information provided to our subjects was standardized so that measurements of how they categorized the data would be accurate. A background summary of our project was provided, covering the creation of nanomaterials and how the annotations would be used as training data in a normal environment. The subjects' objective was to complete the annotation as efficiently as possible, following the definitions and reasoning for the different categories described earlier: relevant, irrelevant, form, poster, and other.

Later, half of the students started with the Fast Annotator and the other half started with Manual Annotation. This separation is to account for the learning curve of annotating in a topic in which the subjects were not experts.
The task for each annotator was to annotate a total of 50 documents with each type of annotation. Each document corpus was previously annotated by experts in the field. The document corpora had equal representation of document categories for both Manual Annotation and the Fast Annotator.

After the experiment, students were asked to take a completely confidential survey. This survey started with questions that analyzed the outcomes of the data and then asked for feedback on improvements to the Fast Annotator.

RESULTS

Preliminary Experiment: Best-of-3, Large Batch
In the preliminary experiment, the focus was on generalization quality rather than on the statistical significance of speedup in the annotator. Tables 1 and 2 show the results: accuracy, weighted average precision, average recall, F1 score, and area under the (receiver operating characteristic, or ROC) curve, under 10-fold cross-validation, for Manual Annotation and the Fast Annotator. Bold face indicates the better of the two sets of results.

Table 1. Results for Manual Annotation.
   Inducer      Acc       Prec      Rec       F1       AUC
   Logistic     75.2%     0.711     0.752     0.709    0.640
   J48          78.3%     0.782     0.784     0.783    0.688
   IB1          79.9%     0.788     0.799     0.792    0.712
   NB           74.2%     0.790     0.742     0.757    0.759
   RF           79.4%     0.801     0.795     0.736    0.841

Table 2. Results for the Fast Annotator.
   Inducer      Acc       Prec      Rec       F1       AUC
   Logistic     69.3%     0.764     0.693     0.719    0.664
   J48          77.9%     0.785     0.779     0.782    0.668
   IB1          83.8%     0.824     0.838     0.827    0.695
   NB           71.7%     0.789     0.718     0.742    0.785
   RF           83.3%     0.825     0.833     0.788    0.862

The inducers compared in [1] were as follows (a sketch of the cross-validation protocol follows the list):
    • Logistic: Logistic Regression
    • J48: C4.5 decision tree
    • IB1: Nearest Neighbor
    • NB: Discrete Naïve Bayes
    • RF: Random Forests
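The sketch below reproduces the evaluation protocol (10-fold cross-validation over a document-term matrix) using approximate scikit-learn counterparts of the inducers above, which appear to be Weka's names for these learners. The data is synthetic, so the scores will not match Tables 1 and 2.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for the labeled bag-of-words matrix; not the study's data.
    X, y = make_classification(n_samples=500, n_features=200, n_informative=20, random_state=0)

    inducers = {
        "Logistic": LogisticRegression(max_iter=1000),
        "J48 (~C4.5)": DecisionTreeClassifier(),
        "IB1 (1-NN)": KNeighborsClassifier(n_neighbors=1),
        "NB": BernoulliNB(),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    }

    for name, model in inducers.items():
        scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")   # 10-fold CV
        print(f"{name:12s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")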
The average time required for Manual Annotation was 18,413.4 seconds versus 5,246.8 seconds for the Fast Annotator (a 251% speedup), with statistically insignificant gains in precision and AUC, slightly lower accuracy, and lower recall.

Speedup Experiment: Best-of-43, Small Batch
As described in the design specification, accuracy was assessed based on user annotations relative to expert ground truth. For N = 43, the accuracy of Manual Annotation classifications is 0.639 ± 0.125 (mean ± standard deviation), while the accuracy of Fast Annotator classifications is 0.726 ± 0.114. The null hypothesis that the Fast Annotator is less accurate than Manual Annotation is rejected with p < .00002071 (2.071 × 10^-5) at the 95% level of confidence using a paired, one-tailed t-test. Meanwhile, for N = 42 (due to one misrecorded time for participant #23), the time taken to process a batch of 50 documents using Manual Annotation is 1070.41 ± 361.45 seconds, while the Fast Annotator time is 663.77 ± 468.14 seconds. The null hypothesis that the Fast Annotator is slower than Manual Annotation is rejected with p < .0000537 (5.37 × 10^-5) at the 95% level of confidence using a paired, one-tailed t-test.
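A sketch of this test on synthetic per-participant times is shown below (SciPy 1.6 or later is assumed for the alternative argument); the generated data only mimics the reported means and standard deviations and is not the study's data.

    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(0)
    # Synthetic per-participant batch times in seconds; the real study had N = 42 usable pairs.
    manual = rng.normal(loc=1070.0, scale=360.0, size=42)
    fast = rng.normal(loc=664.0, scale=460.0, size=42)

    # One-tailed paired t-test with alternative "fast - manual < 0", i.e. the Fast Annotator is faster.
    result = ttest_rel(fast, manual, alternative="less")
    print(f"t = {result.statistic:.2f}, one-tailed p = {result.pvalue:.2e}")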
We received good feedback from the survey, with 89.74% of the users indicating that highlighting the keywords helped them determine the type of document. We also found that, on average, 97.44% of users only needed the first 3 pages to classify a document.

CONCLUSIONS

Summary of Results
The speedup trend observed in the preliminary experiment is upheld with lower variance but a much lower margin of victory: a 38% gain in speed using the Fast Annotator. Observed over 43 participants, however, the accuracy of the Fast Annotator is also conclusively higher.

The positioning of buttons, the reduction of classification categories, and the overall layout, along with the highlighting of keywords, can account for the increase in accuracy, as ease of use and learnability may have affected annotators' ability to make a category classification decision. This may also be attributable to prior background expertise and interest.

Future Work
One priority in this continuing work is to isolate improvements to the user interface, such as highlighting and document previewing, from other causes of speedup and increased filtering precision and recall. These other causes include UI-independent optimizations such as document pre-fetching. It is important to be able to differentiate these causes in order to fairly attribute the observed improvement in performance measures for the system.

Attributing annotation speedup to specific user interface changes versus user-specific causes is a challenging open problem. To provide a cognitive baseline, collecting and analyzing survey data regarding annotators' expertise and interest in the domain topic could reveal an effect on the speed and accuracy of the results. A related problem is that of accounting for users' expertise as subject matter experts and their experience with the Fast Annotator: in our earliest experiments [1], we obtained greater speedups (251%, as mentioned above) that may be attributable to greater familiarity with the fast UI, since the original annotators were the UI developers. Although the hypothesized trends were supported by the experiment reported in this paper, using novice participants who were given only a rubric, these trends are lower in magnitude and significance. We hypothesize a learning curve that may be useful to model.

Further analysis of the user interface design is planned for the Fast Annotator. To draw conclusions and test new tools, technology such as gaze tracking and gaze prediction [5]
could be used to expand the features available for relevance determination, and also to personalize and tune the interface for faster response. One particular application of this technology is to procedurally automate the layout of annotation interface elements for user experience (UX) objectives, particularly the efficiency of explicit relevance feedback and multi-stage document categorization.

Information extraction from text and learning to filter documents (especially from text corpora) are already actively studied problems in different scientific fields, and our project aims to aid in this area. As technology progresses, however, machine learning for information retrieval, information extraction, and search is being applied to more types of media, such as video and audio. An efficient video or audio annotator would increase the range of application of enabling technologies, such as action recognition, to different fields.

Finally, we are investigating applications of this type of human-in-the-loop information filtering in other problem domains, such as network traffic monitoring in cyberdefense and anomaly detection. We hypothesize that reinforcement learning to develop policies for UI personalization can yield improvements in filtering quality of the kind reported in this paper.

ACKNOWLEDGMENTS
The authors thank Yong Han and David Buttler of Lawrence Livermore National Laboratory for helpful feedback and assistance with ground truth annotation, and Tessa Maze for help with editing.

This work was funded by the Laboratory Directed Research and Development (LDRD) program at Lawrence Livermore National Laboratory (16-ERD-019). Lawrence Livermore National Laboratory is operated by Lawrence Livermore National Security, LLC, for the U.S. Department of Energy, National Nuclear Security Administration under Contract DE-AC52-07NA27344. This work was also originally supported in part by the U.S. National Science Foundation (NSF) under grants CNS-MRI-1429316 and EHR-DUE-WIDER-1347821.
REFERENCES
1.  Aguirre, C. A., Gullapalli, S., De La Torre, M. F., Lam, A., Weese, J. L., & Hsu, W. H. (2017). Learning to Filter Documents for Information Extraction using Rapid Annotation. Proceedings of the 1st International Conference on Machine Learning and Data Science. IEEE Press.
2.  Baumgartner Jr., W. A., Cohen, K. B., Fox, L. M., Acquaah-Mensah, G., & Hunter, L. (2007). Manual curation is not sufficient for annotation of genomic databases. Bioinformatics, 23(13), i41-i48. doi:10.1093/bioinformatics/btm229
3.  Ciravegna, F., Dingli, A., Petrelli, D., & Wilks, Y. (2002). User-System Cooperation in Document Annotation Based on Information Extraction. Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2002), Lecture Notes in Computer Science 2473 (pp. 122-137). Berlin, Germany: Springer.
4.  Fiorini, N., Ranwez, S., Montmain, J., & Ranwez, V. (2015). USI: a fast and accurate approach for conceptual document annotation. BMC Bioinformatics, 16(83).
5.  Karaman, Ç. Ç., & Sezgin, T. M. (2017). Gaze-based predictive user interfaces: Visualizing user intentions in the presence of uncertainty. International Journal of Human-Computer Studies, 111, 78-91. doi:10.1016/j.ijhcs.2017.11.005
6.  Kim, E., Huang, K., Saunders, A., McCallum, A., Ceder, G., & Olivetti, E. (2017). Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. Chemistry of Materials, 29(21), 9436-9444. doi:10.1021/acs.chemmater.7b03500
7.  Lang, K. (1995). NewsWeeder: Learning to Filter Netnews. Proceedings of the 12th International Conference on Machine Learning (ICML 1995) (pp. 331-339). San Francisco, CA, USA: Morgan Kaufmann.
8.  Müller, C., & Strube, M. (2001). MMAX: A tool for the annotation of multi-modal corpora. Proceedings of the 2nd IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems (pp. 45-50).
9.  Nielsen, J. (1994). Enhancing the explanatory power of usability heuristics. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 1994) (pp. 152-158). New York, NY, USA: ACM. doi:10.1145/191666.191729
10. Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and Metrics for Cold-Start Recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 253-260). New York, NY, USA: ACM.
11. Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J. (2012). BRAT: a Web-based Tool for NLP-Assisted Text Annotation. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012): Demonstrations (pp. 102-107). Association for Computational Linguistics.