=Paper=
{{Paper
|id=Vol-1888/paper1
|storemode=property
|title=Do "Future Work" sections have a purpose? Citation links and entailment for global scientometric questions
|pdfUrl=https://ceur-ws.org/Vol-1888/paper1.pdf
|volume=Vol-1888
|authors=Simone Teufel
|dblpUrl=https://dblp.org/rec/conf/sigir/Teufel17
}}
==Do "Future Work" sections have a purpose? Citation links and entailment for global scientometric questions==
Do “Future Work” sections have a purpose? Citation links and entailment for global scientometric questions

Simone Teufel
Computer Laboratory, University of Cambridge, and Computer Science Department, Tokyo Institute of Technology

Abstract. Which tasks in digital libraries might be interesting and addressable for us as a community now or in the near future, given the recent developments in NLP? This paper attempts to make a guess (instead of actually attempting to answer the question in its title). What would we need to do today if we, at some point in the future, wanted to be able to answer this question objectively, quantifiably, and on a large text base? Many similar questions in the same vein exist, such as where the research of a field as a whole is headed, where the currently most contested research issues in a field are, which new ideas emerged in the past 5 years, and which ones of these are game-changers. I will argue that we might well be able to offer support for the answering of such questions sometime in the future, particularly if we are willing to turn our attention to entailment and inference in scientific writing.

1 Introduction

Something always irks me when I read “future work” sections; be it the “future work” sections of papers I review, those of papers I read for my research, or the ones I have to compose when writing my own papers. Questions arise such as: what exactly is the purpose of these sections, and what is the status of the ideas in them? How should I interpret these descriptions in other papers, and how should I write my own?

In terms of communicative function, what a “future work” section is supposed to accomplish is less clear-cut than, for instance, a “methods” section. Ideas we express there don’t really “count” in terms of the contributions of the paper: the research ideas we present there are hypothetical (they are not yet done), so no real contribution can be claimed for them. Lacking a potential reward, why would we offer up our unprotected secret research plans to anybody who might want to snatch them? On the other hand, social convention requires that we write something; we can’t just leave the space empty. True, occasionally one has some ongoing work that one urgently wants to tell the community about, or a particularly clever idea that fits neatly. But I suspect that many of the ideas I read about are mechanically written possible avenues for future research, which get forgotten almost as soon as the paper is published. Maybe they get taken up by somebody – more likely not. (I know this because I sometimes do this myself.)

But “future work” sections could also in principle be like a market for ideas, a notice board where we announce our true intentions, and where we compete with our readers – just who manages to beat the time and bring out the next paper on this hot future topic? Perhaps the main inspiration for a large proportion of papers ever written was indeed such an open research question previously posed in a paper the authors read?

I wish that an enterprising social scientist, bibliometrician or historian of science would do this research, and I also wish he or she would (be able to) use the large data repositories we now have at our fingertips when doing bibliographic searches. I would personally like a quantifiable answer, one that is precise and supported by ample textual historical evidence. Could this possibly be an application of large-scale digital libraries in combination with natural language technology?
Maybe not quite yet, but thinking about what we would need in order to answer this question could provide some near-future goals for NLP. The short answer to what we would need to do is to match up descriptions of planned research in an earlier set of papers with research contributions of papers later in time (either by the same or by different authors). The detailed answer is more complicated and concerns technologies such as paraphrase detection, citation link processing, citation block detection, entailment detection and many more.

1.1 A Simple Example

Suppose we find the following sentence in the “future work” section of a paper.

(1) We intend to investigate more sophisticated ways of document representation and of extracting a citation’s context.

We then start a search for this task, and given that the communicative purpose of a scientific paper (its knowledge claim) is generally quite clearly marked, let’s say we find a paper (by the same authors) in the future of the first paper (say, a year later), with the following title:

(2) Context Matters: Towards Extracting a Citation’s Context Using Linguistic Features

We can’t help concluding that the original intention of the authors, stated in sentence (1) above, must have been real, as it was obviously followed up by exactly the kind of research predicted. Manually finding examples where other researchers take up a suggestion from a previous “future work” section is a bit harder to do. Of course, in many cases we might never find a good match.

It is clear that a search engine that could provide a social scientist with objective, hard data on how many times a plausible match occurs would have to be pretty intelligent. It is definitely addressing a text understanding problem. (My definition of “text understanding” is “any process that obtains new knowledge, i.e., something that is true and relevant but that isn’t explicitly stated in the text”.) My guess is, however, that such a system wouldn’t have to be impossibly intelligent, given today’s NLP technology.

1.2 Paraphrases, pragmatic effects, and concreteness

The first component we would need is very robust paraphrase detection. Research in paraphrase detection is long-standing, with many successful approaches [3, 7, 12, 31]. But we would need capability going beyond paraphrase detection. In particular, we will need to draw inferences. Understanding how human inference works, and sometimes being able to automate some of these steps, is an important part of text understanding, and thus of artificial intelligence – a worthwhile exercise in its own right. For the task at hand, being able to perform inference would also happen to be extremely useful.

The closest we have come as a field to a shared inference task is the RTE [11] and its precursor, FRACAS [10]. In these tasks, systems have to verify or reject a proposed logical entailment hypothesis between two sentences. While FRACAS works with hand-crafted sentence pairs and concentrates on entailments that follow from the manipulation of negation, quantifiers and scope, RTE uses naturally occurring text. The inferences in RTE generally also require world knowledge (this is the case for FRACAS to a much lower degree), and because world knowledge is often not shared exactly across humans, the notion of entailment becomes somewhat defeasible; i.e., instead of “strictly logical” entailment, a weaker notion of “plausible” entailment (one that would generally be accepted by most humans) is used.
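To make the matching step from the example in Section 1.1 concrete, and to show where off-the-shelf similarity stops short of the entailment capability discussed above, here is a minimal sketch that scores a future-work statement against the titles or claim sentences of candidate later papers with a sentence encoder. The model name, the candidate sentences and the threshold-free ranking are illustrative assumptions on my part, not part of the original proposal.

```python
# A minimal sketch (not from the paper): rank candidate later papers by how
# closely their titles/claims match a "future work" statement.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf encoder

future_work = ("We intend to investigate more sophisticated ways of document "
               "representation and of extracting a citation's context.")

candidate_claims = [
    "Context Matters: Towards Extracting a Citation's Context Using Linguistic Features",
    "A Neural Model for Part-of-Speech Tagging of Historical Texts",
    "Improving Document Representation with Weighted Citation Contexts",
]

fw_vec = model.encode(future_work, convert_to_tensor=True)
cand_vecs = model.encode(candidate_claims, convert_to_tensor=True)
scores = util.cos_sim(fw_vec, cand_vecs)[0]

# Cosine similarity only captures paraphrase-like overlap; the argument in this
# paper is precisely that a real system would also need an entailment/inference
# check on the highest-ranked candidates.
for claim, score in sorted(zip(candidate_claims, scores.tolist()),
                           key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {claim}")
```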
This has more recently been complemented by visual entailment tasks, such as the one based on image captions that underlies the SNLI, the Stanford Natural Language Inference corpus [5]. This is a large-scale task consisting of around 570,000 sentence pairs. The possible hypotheses (entailed ones, non-entailed ones, and possibly-entailed ones) were elicited from naive humans who did not see the original image, but only had access to the caption as the basis on which to form their hypotheses. Of course, such a task definition excludes cognitive, communicative and other abstract actions, events and objects, which are an essential part of the inference necessary for understanding real-world text such as scientific papers, but it does allow for the training of supervised machine learning algorithms such as neural networks, which require vast training material.

Recognising and processing presuppositions, and implicatures in general, can also help immensely. Apart from knowing what is logically entailed in a text, knowing what is implicated is another important piece of the inference puzzle. Implicatures are defined as those statements about the world that are assumed to be true by the speaker and transmitted “along with” their literal message, without being explicitly stated.

(3) a. Miller et al. did not manage to verify whether saturation was reached.
    b. Miller et al. did not verify whether saturation was reached.

In contrast to entailment, implicatures cannot be cancelled by negation. If I state sentence (3-a) above, we understand that Miller et al. attempted the verification, but that the test was inconclusive. If I state sentence (3-b) above, I trigger a very different meaning, namely that they didn’t want to verify, and probably didn’t even run the test. Please also note that the presupposition “they had the intention of verifying” survives even if we turn the negative sentence into a positive one. This is a remarkable difference, in that sentences (3-a) and (3-b) are truth-conditionally equivalent; the only thing that distinguishes them is the verb “manage to”. Presuppositions are distinguished from general implicatures in that they are lexically triggered.

(4) a. Miller attempted to model how X works.
    b. Miller modelled how X works.

Another example concerns conversational implicatures. Stating sentence (4-a) above conversationally implicates that Miller didn’t manage to satisfactorily model how X works. (If they had, in the author’s opinion, then sentence (4-b) would have been more appropriate.) This follows from the Gricean maxims [16], according to which one should always state the strongest relevant and true statement one possibly can: “modelling” (i.e., attempting to model and then succeeding) is stronger than simply “attempting to model”.

There are few works that model pragmatic reasoning effects computationally on a large scale, but some resources such as presupposition trigger lexicons have been created, and some automatic research exists on detecting presuppositions and implicatures between pairs of verbs, e.g. [29]. In Cambridge, there is some recent work on determining how the inference involved in interpreting “let alone” sentences can be automated [27], which also involves decisions on possible presupposition links between two statements.
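As a small illustration of how lexically triggered inferences of this kind could be operationalised, the sketch below matches a handful of implicative constructions (“did not manage to”, “managed to”, “attempted to”) and returns the defeasible inference each licenses, using the examples in (3). The trigger list and inference glosses are my own toy assumptions, not one of the existing presupposition trigger lexicons cited above.

```python
import re

# Toy trigger list (an illustrative assumption, not an established lexicon).
# Patterns are checked in order; the first match wins, so the negated form
# takes priority over the bare implicative verb it contains.
TRIGGERS = [
    (r"did not manage to (\w+)",
     "the authors attempted to {0} but did not succeed"),
    (r"managed? to (\w+)",
     "the authors attempted to {0} and succeeded"),
    (r"attempted to (\w+)",
     "the authors tried to {0}; full success is not claimed"),
]

def triggered_inference(sentence: str):
    """Return the defeasible inference licensed by the first matching trigger."""
    for pattern, gloss in TRIGGERS:
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            return gloss.format(match.group(1))
    return None

print(triggered_inference(
    "Miller et al. did not manage to verify whether saturation was reached."))
# -> the authors attempted to verify but did not succeed
print(triggered_inference(
    "Miller et al. did not verify whether saturation was reached."))
# -> None (no trigger fires: no attempt is implied)
```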
Overall, I believe that such works can contribute towards the question in the title of this paper – finding links between people’s proclaimed intentions of doing something (as stated in “future work” sections) and the reports of actually having done that thing (in a later research paper). They can potentially also help find similarities between methods used in different papers, and determine what the exact difference might be in cases where a contrast between some methods exists but where the nature of the difference is not spelled out by the authors.

Scientific writing, like all text, contains many implicatures. According to the Gricean maxims, we would insult the reader’s intelligence (or sound pompous) if we told them more than they need to know to make the inference themselves, and the search space for possible inferences is very, very large. In my opinion, any work on automatic inference should therefore start with the clear-cut, small-step cases, such as presuppositions and implicatures, where there is normally very high human agreement both that these statements have objectively been asserted into the discourse and about what the implicature is. This is opposed to other types of inference, where much more world knowledge is needed to make the inference.

Finally, I propose that judging the “abstractness” of a “future work” item would also be useful. This could serve to distinguish the following pair of sentences:

(5) a. In future work we intend to evaluate the algorithm as part of a dialogue understanding system on state-of-the-art benchmarks.
    b. ... it is not known exactly how well the model will perform in the real world. Future work will examine installing models in real world applications.

The first avenue for future work (sentence (5-a)) is quite concrete, and the chances are high that the researchers, having already implemented their algorithm, might conceivably next perform the rather concrete action of running an evaluation. In sentence (5-b), the future work suggested sounds abstract and vague, and like work of a different kind altogether, for which the research team is probably not equipped and which they probably have no real intention of doing. Research in metaphor classification has contributed some methods for addressing the task of judging the abstractness of phrases [30, 26].

1.3 Global Scientometric Questions

Predicting which “future work” suggestion is taken up in later work is an exciting task. An additional question is whether, if some such suggestion does get taken up, it is taken up by the authors of the original paper or by somebody else. Given an acceptably precise matching procedure, a natural gold standard for the prediction task offers itself, because in citation-based research we can manipulate the time scale of the papers we take into account, in order to simulate a “future” that is to be predicted.

There are various related scientometric questions we might ask about the development of a field, such as describing the emergence and development of a scientific field [4, 19], determining schools of thought [22, 1], finding the occurrence of scientific revolutions and paradigm shifts [20, 13], identifying the scientific areas where the most innovation currently occurs [9, 6], and detecting the emergence of scientific ideas [21, 23].
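As for the citation-based gold standard sketched above, one way it could be set up is by time-slicing a citation-linked corpus: take future-work sentences from papers published up to a cut-off year and pair them with later papers that cite them, leaving the judgement of whether a suggestion was actually realised to annotators or to a matching model. The corpus format and field names below are hypothetical and only meant to make the time-slicing idea concrete.

```python
from dataclasses import dataclass, field

# Hypothetical minimal record for a paper; a real digital-library dump would
# of course be much richer (sections, full text, metadata).
@dataclass
class Paper:
    paper_id: str
    year: int
    future_work_sentences: list
    cites: list = field(default_factory=list)  # ids of cited papers

def build_gold_candidates(papers, cutoff_year):
    """Pair future-work sentences from papers published up to cutoff_year
    with later papers that cite them. Each pair is only a candidate instance;
    whether the later paper realises the suggestion still has to be judged."""
    by_id = {p.paper_id: p for p in papers}
    candidates = []
    for later in papers:
        if later.year <= cutoff_year:
            continue
        for cited_id in later.cites:
            source = by_id.get(cited_id)
            if source is None or source.year > cutoff_year:
                continue
            for sentence in source.future_work_sentences:
                candidates.append((source.paper_id, sentence, later.paper_id))
    return candidates

papers = [
    Paper("P1", 2015, ["We intend to investigate more sophisticated ways of "
                       "extracting a citation's context."]),
    Paper("P2", 2016, [], cites=["P1"]),
]
print(build_gold_candidates(papers, cutoff_year=2015))
# one candidate pair: P1's future-work sentence paired with the later citing paper P2
```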
These scientometric questions have traditionally been answered manually or purely quantitatively, and more recently with the help of natural language processing technology such as automatic sentiment detection and citation function classification [25, 15, 24, 28, 18, 2, 14, 8, 17]. My argument here is that we need not stop there. In my opinion, these existing methods could gainfully be complemented with computational linguistics research studying textual entailment, text understanding and the pragmatics of scientific writing.

References

1. Allen, B.: Referring to schools of thought: An example of symbolic citations. Social Studies of Science 27(6), 937–949 (1997)
2. Athar, A., Teufel, S.: Detection of implicit citations for sentiment detection. In: Proceedings of the ACL-12 Workshop on Discovering Structure in Scholarly Discourse. Jeju Island, South Korea (2012)
3. Barzilay, R., Lee, L.: Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1. pp. 16–23. Association for Computational Linguistics (2003)
4. Bettencourt, L.A., Ulwick, A.W.: The customer-centered innovation map. Harvard Business Review 86(5), 109 (2008)
5. Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326 (2015)
6. Boyack, K.W.: Thesaurus-based methods for mapping contents of publication sets. Scientometrics 111(2), 1141–1155 (2017)
7. Callison-Burch, C.: Syntactic constraints on paraphrases extracted from parallel corpora. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 196–205. Association for Computational Linguistics (2008)
8. Catalini, C., Lacetera, N., Oettl, A.: The incidence and role of negative citations in science. Proceedings of the National Academy of Sciences 112(45), 13823–13826 (2015)
9. Chen, C., Ibekwe-SanJuan, F., Hou, J.: The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. Journal of the Association for Information Science and Technology 61(7), 1386–1409 (2010)
10. Cooper, R., Crouch, D., Van Eijck, J., Fox, C., Van Genabith, J., Jaspars, J., Kamp, H., Milward, D., Pinkal, M., Poesio, M., et al.: Using the framework. Tech. Rep. LRE 62-051 D-16, The FraCaS Consortium (1996)
11. Dagan, I., Glickman, O., Magnini, B.: The PASCAL recognising textual entailment challenge. In: Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, pp. 177–190. Springer (2006)
12. Das, D., Smith, N.A.: Paraphrase identification as probabilistic quasi-synchronous recognition. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1. pp. 468–476. Association for Computational Linguistics (2009)
13. De Langhe, R.: Towards the discovery of scientific revolutions in scientometric data. Scientometrics 110(1), 505–519 (2017)
14. Ding, Y., Zhang, G., Chambers, T., Song, M., Wang, X., Zhai, C.: Content-based citation analysis: The next generation of citation analysis. Journal of the Association for Information Science and Technology 65 (2014)
15. Garzone, M., Mercer, R.E.: Towards an automated citation classifier. In: Proceedings of the 13th Biennial Conference of the CSCSI/SCEIO (AI-2000). pp. 337–346 (2000)
16. Grice, H.P.: Logic and conversation. In: Cole, P., Morgan, J.L. (eds.) Syntax and Semantics, Vol. 3: Speech Acts, pp. 41–58. Academic Press (1975)
17. Jha, R., Jbara, A.A., Qazvinian, V., Radev, D.R.: NLP-driven citation analysis for scientometrics. Natural Language Engineering 23(1), 93–130 (2017)
18. Kaplan, D., Iida, R., Tokunaga, T.: Automatic extraction of citation contexts for research paper summarization: A coreference-chain based approach. In: Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries. pp. 88–95. Association for Computational Linguistics (2009)
19. Kiss, I.Z., Broom, M., Craze, P.G., Rafols, I.: Can epidemic models describe the diffusion of topics across disciplines? Journal of Informetrics 4(1), 74–82 (2010)
20. Kuhn, T.S.: The Structure of Scientific Revolutions, 2nd enl. ed. University of Chicago Press (1970)
21. Kuhn, T., Perc, M., Helbing, D.: Inheritance patterns in citation networks reveal scientific memes. Physical Review X 4(4), 041036 (2014)
22. McCain, K.W.: Cocited author mapping as a valid representation of intellectual structure. Journal of the American Society for Information Science 37(3), 111 (1986)
23. McKeown, K., Daume, H., Chaturvedi, S., Paparrizos, J., Thadani, K., Barrio, P., Biran, O., Bothe, S., Collins, M., Fleischmann, K.R., et al.: Predicting the impact of scientific concepts using full-text features. Journal of the Association for Information Science and Technology 67(11), 2684–2696 (2016)
24. Nakov, P., Schwartz, A., Hearst, M.: Citances: Citation sentences for semantic analysis of bioscience text. In: SIGIR'04 Workshop on Search and Discovery in Bioinformatics (2004)
25. Nanba, H., Okumura, M.: Towards multi-paper summarization using reference information. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99). pp. 926–931 (1999)
26. Neuman, Y., Assaf, D., Cohen, Y., Last, M., Argamon, S., Howard, N., Frieder, O.: Metaphor identification in large texts corpora. PLoS ONE 8(4), e62343 (2013)
27. Razuvayevskaya, O., Teufel, S.: Finding enthymemes in real-world texts: a feasibility study. Argument and Computation (2017)
28. Teufel, S., Siddharthan, A., Tidhar, D.: Automatic classification of citation function. In: Proceedings of EMNLP-06 (2006)
29. Tremper, G., Frank, A.: A discriminative analysis of fine-grained semantic relations including presupposition: Annotation and classification. Dialogue & Discourse 4(2), 282–322 (2013)
30. Turney, P.D., Neuman, Y., Assaf, D., Cohen, Y.: Literal and metaphorical sense identification through concrete and abstract context. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 680–690. Association for Computational Linguistics (2011)
31. Xu, W., Ritter, A., Grishman, R.: Gathering and generating paraphrases from Twitter with application to normalization. In: Proceedings of the Sixth Workshop on Building and Using Comparable Corpora. pp. 121–128 (2013)