<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Ontology-Grounded Topic Modeling for Climate Science Research</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jennifer Sleeman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Finin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milton Halem</string-name>
          <email>halemg@cs.umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Computer Science and Electrical Engineering, University of Maryland, Baltimore County</institution>
          ,
<addr-line>Baltimore, MD 21250</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In scientific disciplines where research findings have a strong impact on society, reducing the amount of time it takes to understand, synthesize and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the topics generated and makes them easier to understand. We apply and evaluate this method to the climate science domain. The result improves the topics generated and supports faster research understanding, discovery of social networks among researchers, and automatic ontology generation.</p>
      </abstract>
      <kwd-group>
        <kwd>topic modeling</kwd>
        <kwd>ontology</kwd>
        <kwd>climate science</kwd>
        <kwd>explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The authoritative source for conveying the latest climate research findings, recommendations and mitigation steps
is the Intergovernmental Panel on Climate Change (IPCC) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The reports produced by the IPCC are published every
five years and are composed of four separate volumes: Physical Science Basis; Impacts, Adaptations and Vulnerability;
Mitigation of Climate Change; and Synthesis Reports. Each IPCC volume has eight to twenty-five chapters and each
chapter cites between 800 and 1200 external research documents. The IPCC reports not only provide a comprehensive
assessment of climate science; analysis of the 30-year series of reports also shows how the scientific field has
evolved and continues to evolve.
      </p>
      <p>For a new climate scientist, absorbing this information in order to perform research or make policy contributions
can be daunting. However, if machine understanding of these reports could be used to summarize, synthesize and
model the knowledge, the researcher’s task would become easier. We propose that making this process more efficient
could improve the overall scientific research contributions that follow.</p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref14 ref15 ref16 ref17">17, 15, 16, 14</xref>
        ] we described a process by which we converted 25 years of IPCC reports
and their cited articles into raw text. We treated these two document collections, the report chapters and the scientific
research papers they cite, as two different domains. We used a topic modeling cross-domain approach to show how
these two domains interacted and how the cited research in one report influenced the subsequent reports. This allows
us to more accurately predict how the field of climate science is evolving.
      </p>
      <p>
        We quickly discovered that the standard topic modeling approaches did not work as well as we hoped on the text
in the report domain, and were even less effective on the cited research documents from the scientific domain. One
reason is that scientific literature is written more formally and typically contains more phrases that provide the context
through which one understands the literature [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Many phrases in a given scientific domain do not follow the usual
pattern of compositional semantics in which their meaning can be obtained by combining the meanings of their words.
Rather, they have a specific meaning in the domain that must be learned. In climate science, for example, black carbon
refers not to carbon whose color is black, but to the sooty material emitted from gas and diesel engines, coal-fired
power plants and other sources that burn fossil fuel.
      </p>
      <p>This work will be published as part of the book “Emerging Topics in Semantic Technologies. ISWC 2018 Satellite Events”, E. Demidova, A.J. Zaveri, E. Simperl (Eds.), ISBN: 978-3-89838-736-1, 2018, AKA Verlag Berlin. Copyright held by the authors.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Topic modeling has a long history of relevance to natural language processing, often used to model large collections
of text documents applied to problems such as document summarization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], classification [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], recommendation
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and search [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] made a significant footprint in natural
language research. In LDA [
        <xref ref-type="bibr" rid="ref1 ref2">2, 1</xref>
        ], every document is assumed to be a mixture of topics
represented as a probability distribution, and each topic is a probability distribution over the terms in the vocabulary
that is formed from the full collection of documents. Topics are drawn from a Dirichlet distribution.
      </p>
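      <p>To make this concrete, the generative picture can be sketched with a toy collapsed Gibbs sampler for LDA in plain Python. This is an illustrative sketch, not the implementation used in this work; the tiny corpus, topic count K, and hyperparameters are our own examples.</p>

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    # counts: document-topic, topic-word, topic totals
    ndk = [[0] * K for _ in docs]
    nkw = [defaultdict(int) for _ in range(K)]
    nk = [0] * K
    z = []  # one topic assignment per token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = k | everything else)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # per-document topic mixtures; each row is a probability distribution
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return theta

docs = [["carbon", "forcing", "carbon"], ["nino", "ocean", "nino"],
        ["carbon", "forcing"], ["ocean", "nino"]]
theta = lda_gibbs(docs, K=2)
```

Each row of `theta` is the inferred topic mixture for one document, matching the "mixture of topics represented as a probability distribution" described above.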
      <p>Topic modeling is often used to find word vectors that best represent the themes in a collection of documents. Each
word vector contains a set of words or word phrases, where each word/word phrase has an associated probability that
represents that word or word phrase’s contribution in representing a particular theme. The approach most frequently
used is to extract words from documents, removing commonly occurring words and symbols, and to generate a
‘collection vocabulary’ from the set of words. Sometimes the vector is a set of singleton words; however, word n-grams,
where ‘n’ is the number of words that make up the phrase, are also used.</p>
      <p>
        It has been shown that when word phrases are used in topic models, the topics tend to be more relevant to the
collection of documents [
        <xref ref-type="bibr" rid="ref19 ref20 ref3 ref8">19, 8, 20, 3</xref>
        ]. We have found this to be particularly important when the collection of documents
pertains to a scientific discipline [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. However, knowing what sort of phrases one should extract and use to train the
topic model is a problem. The standard bag of words approach is often used, where each word from the document is
treated as a singleton.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Topic Modeling for Scientific Domains</title>
      <p>
        Particularly for scientific domains, phrases often convey special meaning that is lost by treating them as single
words. The challenge with using word n-grams is knowing which sequences carry such meaning, making automatic
phrase or word n-gram extraction problematic. The fact that key phrase extraction is still an active area of
research [
        <xref ref-type="bibr" rid="ref10 ref24">10, 24</xref>
        ] makes clear that this remains a challenging problem.
      </p>
      <p>
        To understand this challenge further, we used tools from NLTK [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to find common, meaningful phrases. Some aligned
with concepts in a space science domain (e.g., ‘active galactic nucleus’), but other, less relevant ones were also found,
such as ‘aboard the Hubble’ and ‘central region of’. Stop word removal can filter some, but not all, of these words,
and many n-grams would require human judgment to filter. In this example there were also missed phrases, such as
‘Chandra X-Ray Observatory’, where instead ‘Ray Observatory’, ‘the observatory’, and ‘- Ray Observatory’ were
found.
      </p>
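      <p>Extractors of this kind typically score candidate n-grams with co-occurrence statistics such as pointwise mutual information (PMI), which is why fragments like ‘aboard the Hubble’ can surface alongside real concepts. A minimal stdlib sketch (the toy token stream is illustrative, not our actual data):</p>

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Rank bigrams by pointwise mutual information: log p(x,y) / (p(x) p(y))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue  # discard rare candidates
        scores[(x, y)] = math.log((c / (n - 1)) /
                                  ((unigrams[x] / n) * (unigrams[y] / n)))
    return sorted(scores, key=scores.get, reverse=True)

tokens = ("active galactic nucleus in the survey the active galactic "
          "nucleus near the hubble the telescope").split()
top = pmi_bigrams(tokens)  # repeated concept phrases surface at the top
print(top)
```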
      <p>Instead, we ground the topic modeling process on a domain-specific ontology seeded with predefined key word
phrase concepts obtained from domain-specific sources such as domain experts, and by data mining semi-structured
sources. In particular, we found the IPCC glossaries and domain experts to be good sources for defining climate-related
word phrase concepts. This grounding process contextualizes the topic model such that the topics are more relevant to
the domain that is being modeled. For example, given a climate change domain ontology, if a document being used
to train the topic model included text unrelated to climate change, those words would potentially have a lower weight
than the words which represented the ’known’ or ’seed’ concepts found in the ontology.</p>
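      <p>One simple way to realize this grounding during preprocessing is to greedily merge glossary phrases into single tokens and assign them a higher weight. The sketch below is ours; the function name, glossary entries, and boost factor are illustrative assumptions, not the paper’s actual code or weights.</p>

```python
def ground_tokens(tokens, glossary, boost=1.25, max_len=10):
    """Merge glossary phrases into single tokens (greedy longest match)
    and return (token, weight) pairs, boosting known ontology concepts."""
    phrases = {tuple(p.split()) for p in glossary}
    out, i = [], 0
    while i < len(tokens):
        match = None
        # try the longest candidate phrase first, down to bigrams
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                match = n
                break
        if match:
            out.append(("_".join(tokens[i:i + match]), boost))
            i += match
        else:
            out.append((tokens[i], 1.0))
            i += 1
    return out

glossary = {"black carbon", "radiative forcing", "el nino southern oscillation"}
tokens = "black carbon and radiative forcing alter the climate".split()
weighted = ground_tokens(tokens, glossary)
```

Seed concepts found in the text survive as single weighted tokens, while words unrelated to the ontology keep the default weight.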
      <p>
        Table 1 shows examples from the data mining approach and from a bi/trigram extractor. Using a sample of climate
change data, all bigrams and trigrams were annotated. The documents were processed using the data mining approach
and also using an n-gram extraction approach. The extractor approach was able to recover only 6% of the word phrases,
all of which could be represented using our glossary data mining approach. Similar results were found for space
science data. Though more recent research [
        <xref ref-type="bibr" rid="ref10 ref24">10, 24</xref>
        ] may significantly improve upon a typical bi/trigram extractor,
for scientific data, seeding the ontology with known concepts that are readily available through published glossaries
provides a more accurate set of concepts.
      </p>
      <p>Glossaries typically provide key concepts that are relevant to a particular domain and consist of words and phrases
whose length can be anywhere from two to ten words. For example, in the climate change community, the phrase
‘soil moisture’ implies something much more meaningful than ‘soil’ and ‘moisture’ alone. Furthermore, the singleton
words might be much more frequently found than the phrase. This is often an important artifact in the topic model. For
example, ‘black carbon’ is a significant concept in climate change because of its impact on the research at a certain
period of time. The word ‘black’ may not frequently occur with other words but among climate change literature
the word ‘carbon’ occurs quite frequently with other words. An example of this is shown in Figure 1, where the
phrase ‘black carbon’ has a significantly lower occurrence across the Physical Science, Impact and Synthesis books
for Assessment Report 3 than the single word ‘carbon’. The single word ‘black’ not only appears within the phrase
‘black carbon’ but also within the phrases ‘black spruce’ and ‘black-footed ferrets’, among others.</p>
    </sec>
    <sec id="sec-4">
      <title>Approach</title>
      <p>Our approach entails modeling the structure of the reports and their citations in the ontology. There were five IPCC
assessment reports, AR1-AR5, each of which follows a similar structure consisting of four distinct books: Physical
Science Basis, Impacts, Adaptations and Vulnerability, Mitigation of Climate Change, and Synthesis Reports. Each
book has between 11 and 25 chapters and a chapter typically cites between 800 and 1200 external documents. The
ontology consists of a similar structure as shown in Figure 2. We then obtain a list of concepts of importance from
domain experts and domain glossaries. Figures 3 and 4 show example predefined seed concepts represented in our
IPCC ontology.</p>
      <p>Fig. 2: A Partial IPCC Ontology Used for Guiding Topic Modeling.</p>
      <p>
        In addition, acronyms are mapped to the actual phrase and treated as the same concept. For example, ‘ENSO’ is
treated as the same concept as ‘El Nino Southern Oscillation’. The ontology is then read into memory for the
preprocessing step of the topic modeling phase. As we perform preprocessing of text, we use the ontology concepts
for weighting concepts we find in the text. We do this for both the report data and the citation research papers. This
retrieval process is described in more detail in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. After we perform the topic modeling phase, we update the domain
ontology with the concepts associated with each chapter of the book, the topics generated with word probabilities, and
cross-domain mappings between the reports domain and the research paper domain. Figures 3 and 4 show a small
subset of the concepts extracted from the First Assessment Period, Chapter 1.
      </p>
      <sec id="sec-4-1">
        <title>Ontologically Represented Topics</title>
        <p>Since ontological word phrases are used to ground the topic modeling process, topics are also represented by an
ontological structure. Topics are implicitly linked into the ontological structure, along with documents. The IPCC
reports have citation information for each chapter in the report. In our work we built a topic model for the citations and
another topic model for the reports. We used the ontology to convey how the two topic models were related, with the
topics generated from the two models serving as a bridge between them. For
example, Figure 5 shows a set of common concepts from two mapped topics captured ontologically.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Ontologically Represented Social Networks</title>
        <p>Since we captured the citation information ontologically, we can use this information to discover interesting social
communities. For example, a class in the ontology called ‘Publication’ can have many ‘Authors’. When a number of
authors are referenced together across multiple publications, this can give insight into a social relationship between
these authors. Specifically, if the same authors are cited in three different chapters that relate to ‘Black Carbon’, these
authors form a relationship with a common node concept ‘Black Carbon’. The ontology can also be used to observe
social networks based on relationships between authors who cite authors that are also cited in the same chapter.</p>
        <p>Figure 6 shows an example in which ‘Callaghan’ was an author in a paper cited in one chapter and author ‘Abbs’
was also cited in the same chapter.</p>
        <p>A relationship was found between ‘Abbs’ and ‘Callaghan’ because both were cited in the same chapter and one
cited the other within their own cited paper. The ontological representation
for citations can also shed light, in general, on which authors are cited across books and chapters.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimentation and Evaluation</title>
      <p>We used our ontology-guided topic modeling approach to model both the IPCC reports and the citations. We first
converted the reports and citations to raw text, extracted meta-information regarding the citations, and used the
ontology to link this information together. When a predefined concept from our ontology was found, we weighted the
phrase higher than non-ontology concepts by 5%, 10%, 25%, and 50%. To understand how ontological concept-based
topic modeling differs from standard bag of words topic modeling, topic models were built using both approaches. The
ontological grounded topic model was compared to a bag of words model that used the same data set, the same stop
word removal, but without the ontology concepts grounding the modeling. Therefore it did not contain word phrases
as single tokens, though the words that make up those phrases could still appear individually. Perplexity was used to evaluate the two models.</p>
      <p>In this experiment, the IPCC ‘Physical Science’ books for assessment periods one through five were used. There
were 61 documents in total used in this topic model, with 11 documents in AR1, 11 documents in AR2, 14 documents
in AR3, 11 documents in AR4, and 14 documents in AR5. The assessment reports were used for this experiment and
each chapter was treated as a document. For perplexity evaluations, a held-out set was used for each assessment period
beginning with AR2. One way to understand the differences between these two models is by simply observing the
topics.</p>
      <p>For example, in Table 2 the two topics highlighted appear to be the ‘radiative forcing’ topic for each model. The
ontology guided model provides a richer set of words since the topic is composed of phrases. The same is also true for
the ‘El Nino’ topic for each model. Though both models provide visually relevant topics, the ontology-guided model
provides concepts that are more specific to the scientific domain.</p>
      <p>A common metric used to evaluate topic models is perplexity. Perplexity measures how well a probability
distribution predicts a held-out sample. A lower perplexity indicates the model is better at prediction. Perplexity was
measured across assessment periods, where assessment period t supplies the held-out documents and the model is trained
on the preceding t-1 assessment periods, with t ranging from 2 to 5. Each experiment compared the ontology-grounded and non-ontology-grounded
methods. In Figure 7, AR4 was used to build the topic model and AR5 was used as the held-out test set. Similar
perplexity was measured for the other assessments.</p>
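      <p>For reference, perplexity is the exponentiated negative average log-likelihood of the held-out tokens. A minimal computation, assuming the topic model supplies a log-probability for each held-out token (the inputs below are illustrative):</p>

```python
import math

def perplexity(token_log_probs):
    """perplexity = exp(-(1/N) * sum_i log p(w_i)); lower is better."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# a model assigning uniform probability 1/100 to every held-out token
# has perplexity exactly 100
lp = [math.log(1 / 100)] * 50
```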
      <p>Given the size of this data set, the number of topics that best represents this data set is between two and six
topics. This was confirmed by visualizing the topics. When the topic size grows too large, topics tend to overlap much
more. The difference in perplexity for the same held-out data set is shown, with perplexity measures (lower is better)
indicating the ontologically grounded topic modeling method may improve perplexity which in turn means it may
offer better predictability for scientific data. As a means of reference, the training set is also used as a held-out set so
as to show the difference in scale between the ontological method and the non-ontological method. This provides a
general idea as to the performance of these two approaches.</p>
      <p>We performed experiments to compare the ontologically-grounded word phrase approach with a standard
approach, both of which use the same method for stop word removal. With each experiment the top N words are used
for a given set of topics. Given a second topic model example comparing the ontology-grounded model and the
non-ontology-grounded model, as shown in Table 3, the topics in the ontology-grounded model are more closely related to
terminology found in scientific research papers when compared with the non-ontology grounded model. For example,
the first four words ‘temperature’, ‘anthropogenic’, ‘carbon dioxide’, and ‘radiative forcing’ are more descriptive than
‘change’, ‘ocean’, ‘level’, and ‘global’.</p>
      <p>Using Google to search on the combination of words, two different sets of documents (examining the top three
documents) are returned. With the word phrases contained in [‘temperature’, ‘anthropogenic’, ‘carbon dioxide’,
‘radiative forcing’] the top documents included a Wikipedia page related to ‘Radiative Forcing’ and two IPCC report
chapters, plus a number of Google Scholar research paper suggestions. With the four words contained in [‘change’,
‘ocean’, ‘level’, ‘global’], the top three results included pages related to ‘sea level rise’, the first hosted by NASA,
a second page on the same concept hosted by NOAA, and the third hosted by EPA. There were no Google Scholar
suggestions. This further supports the assertion that by grounding the topic model with concepts from the ontology, the
topics created are more context-specific and hence more fine-grained than the standard approach. For scientific data,
this is an important point, as this level of detail provides the context needed to really understand scientific documentation.</p>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        Typically topic models use a bag-of-words approach and more recently 1-hot encoded bags of words, leaving it
to the implementer to decide what that bag of words contains. Since the early 2000s, research has focused on ways
of improving topic modeling by adding context [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], labeled topics [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and phrases [
        <xref ref-type="bibr" rid="ref19 ref20 ref3 ref8">19, 8, 20, 3</xref>
        ].
      </p>
      <p>
        Early research explored ways of using word n-grams [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] to improve different tasks such as text processing,
classification, named entity recognition and knowledge base population. It is reasonable to believe word n-grams
would produce better topics and research has shown this to be true. This idea of discovering word phrases in topic
modeling was proposed by Wang et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] in 2007 and was based on using n-grams in topic models which was
proposed by Wallach [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Later work followed that proposed extensions to identifying topical phrases [
        <xref ref-type="bibr" rid="ref20 ref3 ref8">8, 20, 3</xref>
        ].
Work by Jameel et al. in 2013 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] combined n-gram models with temporal documents and provided a foundation for using
ontological concepts to ground the topic modeling process.
      </p>
      <p>
        Recent developments in topic modeling have started exploring its applicability to scientific concepts. Hall et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
address how scientific ideas have changed over time by modeling temporal changes with a dynamic topic model (DTM), using probability
distributions over the ACL Anthology, a public repository of papers from Computational Linguistics journals,
conferences and workshops. Their work proposes extensions to their model by integrating topic modeling with the
citations as done in this paper. Tang et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] investigate the use of topic modeling to identify extreme events based
on numerical atmospheric model simulations. They associate text terms with statistical ranges of numerical variables.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions, Insights and Future Work</title>
      <p>
        More recent work [
        <xref ref-type="bibr" rid="ref10 ref24">10, 24</xref>
        ] related to key-phrase identification could be used in conjunction with domain
glossaries to automatically populate the ontology. Since many scientific domains define glossaries as part of the document
collection, using a heuristic to parse the glossaries is both feasible and effective for constructing ontology concepts.
An ontology-grounded word phrase approach for topic modeling results in topics that contain word phrases, which
better represent the scientific information. Perplexity measures support the ontology-grounded method for this
specific IPCC scientific data set use case. The added benefit of guiding this process with an ontology is that the topics
and documents are linked to an ontological representation which could be used to support knowledge base population
and question answering systems for climate scientists. This approach turns the simple bag-of-words topic modeling
approach into a powerful knowledge understanding tool.
      </p>
      <p>We plan to apply this technique to other domains to gain more experience and to further test and evaluate the idea. We
are collecting glossaries and concept lists for the cybersecurity domain and plan to develop topic models using them.
We also hope to explore their use to enhance word embeddings.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work was partially supported by a grant of computational resource services from the Microsoft AI for Earth
program and a gift from the IBM AI Horizons Network.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          :
          <article-title>Probabilistic topic models</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>55</volume>
          (
          <issue>4</issue>
          ),
          <fpage>77</fpage>
          -
          <lpage>84</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>the Journal of machine Learning research 3</source>
          ,
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>El-Kishky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voss</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Scalable topical phrase mining from text corpora</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>305</fpage>
          -
          <lpage>316</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Scientific literacy: A systemic functional linguistics perspective</article-title>
          .
          <source>Science education 89(2)</source>
          ,
          <fpage>335</fpage>
          -
          <lpage>347</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Studying the history of ideas using topic models</article-title>
          .
          <source>In: Proceedings of the conference on empirical methods in natural language processing</source>
          . pp.
          <fpage>363</fpage>
          -
          <lpage>371</lpage>
          . Association for Computational Linguistics (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. IPCC:
          <article-title>Intergovernmental Panel on Climate Change</article-title>
          , https://www.ipcc.ch/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jameel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lam</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>An n-gram topic model for time-stamped documents</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <fpage>292</fpage>
          -
          <lpage>304</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lindsey</surname>
            ,
            <given-names>R.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Headden III</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stipicevic</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>A phrase-discovering topic model using hierarchical pitman-yor processes</article-title>
          .
          <source>In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</source>
          . pp.
          <fpage>214</fpage>
          -
          <lpage>222</lpage>
          . Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>NLTK: the natural language toolkit</article-title>
          .
          <source>CoRR cs.CL/0205028</source>
          (
          <year>2002</year>
          ), http://arxiv.org/abs/cs.CL/0205028
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mahata</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuriakose</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Key2Vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)</source>
          , pp.
          <fpage>634</fpage>
          -
          <lpage>639</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>McAuliffe</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.:</given-names>
          </string-name>
          <article-title>Supervised topic models</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>121</fpage>
          -
          <lpage>128</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ramage</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nallapati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora</article-title>
          .
          <source>In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1</source>
          . pp.
          <fpage>248</fpage>
          -
          <lpage>256</lpage>
          . Association for Computational Linguistics (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Rosen-Zvi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steyvers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>The author-topic model for authors and documents</article-title>
          .
          <source>In: Proceedings of the 20th conference on Uncertainty in artificial intelligence</source>
          . pp.
          <fpage>487</fpage>
          -
          <lpage>494</lpage>
          . AUAI Press (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sleeman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cane</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Discovering scientific influence using cross-domain dynamic topic modeling</article-title>
          .
          <source>In: 2017 IEEE International Conference on Big Data (Big Data)</source>
          . pp.
          <fpage>1325</fpage>
          -
          <lpage>1332</lpage>
          (Dec
          <year>2017</year>
          ). https://doi.org/10.1109/BigData.2017.8258063
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sleeman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cane</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Dynamic topic modeling to infer the influence of research citations on IPCC assessment reports</article-title>
          .
          <source>In: Big Data Challenges, Research, and Technologies in the Earth and Planetary Sciences Workshop, IEEE Int. Conf. on Big Data. IEEE</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sleeman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cane</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Modeling the evolution of climate change assessment research using dynamic topic models and cross-domain divergence maps</article-title>
          .
          <source>In: AAAI Spring Symposium on AI for Social Good</source>
          . AAAI Press (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Sleeman</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Dynamic Data Assimilation for Topic Modeling (DDATM)</article-title>
          .
          <source>Ph.D. thesis</source>
          , UMBC (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monteleoni</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Can topic modeling shed light on climate extremes?</article-title>
          <source>Computing in Science &amp; Engineering</source>
          <volume>17</volume>
          (
          <issue>6</issue>
          ),
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.M.:</given-names>
          </string-name>
          <article-title>Topic modeling: beyond bag-of-words</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on Machine learning</source>
          . pp.
          <fpage>977</fpage>
          -
          <lpage>984</lpage>
          . ACM (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danilevsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desai</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taula</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A phrase mining framework for recursive construction of a topical hierarchy</article-title>
          .
          <source>In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>437</fpage>
          -
          <lpage>445</lpage>
          . ACM (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.:</given-names>
          </string-name>
          <article-title>Collaborative topic modeling for recommending scientific articles</article-title>
          .
          <source>In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>448</fpage>
          -
          <lpage>456</lpage>
          . ACM (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Topical n-grams: Phrase and topic discovery, with an application to information retrieval</article-title>
          .
          <source>In: Seventh IEEE International Conference on Data Mining (ICDM 2007)</source>
          . pp.
          <fpage>697</fpage>
          -
          <lpage>702</lpage>
          . IEEE (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khudanpur</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Combining nonlocal, syntactic and n-gram dependencies in language modeling</article-title>
          .
          <source>In: EUROSPEECH</source>
          . Citeseer
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>WikiRank: Improving keyphrase extraction based on background knowledge</article-title>
          . arXiv preprint arXiv:1803.09000
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>