<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SDGi Corpus: A Comprehensive Multilingual Dataset for Text Classification by Sustainable Development Goals</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mykola Skrynnyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gedion Disassa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Krachkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janine DeVera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>United Nations Development Programme</institution>
          ,
          <addr-line>New York</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>5</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>We introduce SDGi Corpus (SDG Integration Corpus), the most comprehensive multilingual collection of texts labelled by Sustainable Development Goals (SDGs) to date. SDGi Corpus is a text dataset for multi-label classification that contains over 7,000 examples in English, French and Spanish. Leveraging years of SDG reporting at the international and subnational levels, we hand-picked texts from Voluntary National Reviews (VNRs) and Voluntary Local Reviews (VLRs) from more than 180 countries to create an inclusive dataset that provides both focused and broad perspectives on the SDGs. This paper reports on the dataset creation effort and the use cases we envision for it. We also establish baselines for text classification by SDG using traditional machine learning (ML) and deep learning (DL) approaches. These illustrate the opportunities and challenges of using this dataset for social good. The dataset is available on Hugging Face as UNDP/sdgi-corpus.</p>
      </abstract>
      <kwd-group>
        <kwd>Dataset</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Sustainable Development Goals</kwd>
        <kwd>Supervised Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>as increase transparency in reporting. The amount of unstructured information of analytical
interest is vast, highlighting the need for an automated solution for SDG classification. However,
the efforts to develop a viable solution have been scattered, and no single benchmark for testing
SDG classifiers has been adopted by the community thus far.</p>
      <p>Against this backdrop, we introduce SDGi Corpus, the most comprehensive multilingual
collection of texts labelled by SDGs to date. The dataset is designed to fill the gap in the
availability of SDG-labelled data, facilitate the development of SDG classification systems and
enable a consistent comparison among them. While we envision this dataset being used as a
standard benchmark for multi-label (and multi-class) text classification by SDG, we also
encourage researchers and practitioners to use it for topic modelling, text mining and quantitative text
analysis more broadly. The dataset includes rich metadata that allows users to slice and dice the data to
answer various SDG-related questions.</p>
      <p>Our contribution is two-fold. First and foremost, we introduce the dataset to encourage the ML
community to develop state-of-the-art (SOTA) solutions for SDG classification. Secondly, we
report preliminary results for a series of supervised and zero-shot ML experiments, illustrating
the non-triviality of the SDG classification task in general as well as the challenges of overfitting
and out-of-domain generalisation for our dataset in particular.</p>
      <p>The remainder of this paper is structured as follows. Section 2 discusses existing research on
SDG classification. Section 3 introduces SDGi Corpus, describing the data sources and curation
process as well as providing key descriptive statistics. In section 4, we report preliminary results
for a series of experiments using the dataset. The final section 5 is used to draw conclusions
and discuss avenues for future improvements.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The body of work that explores text classification by SDG is relatively small and can be
characterised by two distinct approaches: keyword-based and ML-based<sup>2</sup>. Thus, a number of
initiatives directed their efforts to creating curated lists of keywords or queries to be able
to link texts to SDGs in a rules-based fashion. The most popular of these are the SIRIS [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Elsevier
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Aurora [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Auckland [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] systems. These systems differ in matching logic<sup>3</sup> and SDG
coverage, with the original Elsevier and Auckland systems excluding SDG 17. These approaches
mostly consider SDG classification from bibliometric or scientometric perspectives, although
LinkedSDG [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and SDG Mapper [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] adopted a similar methodology in a policy context. While
keyword-based systems were instrumental in popularising SDG classification, their systematic
evaluations revealed issues with both accuracy and robustness [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
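In their simplest form, such rules-based systems reduce to keyword lookups. The sketch below is purely illustrative: the keyword lists are invented for this example and are not taken from SIRIS, Elsevier, Aurora or any other published system, which use far richer curated vocabularies and query logic.

```python
# Illustrative keyword lists only; NOT taken from any published SDG system.
SDG_KEYWORDS = {
    1: {"poverty", "social protection"},
    7: {"renewable energy", "electricity access"},
    13: {"climate change", "greenhouse gas"},
}

def match_sdgs(text):
    """Return the set of SDGs whose keywords appear as substrings of the text."""
    lowered = text.lower()
    return {sdg for sdg, terms in SDG_KEYWORDS.items() if any(t in lowered for t in terms)}

match_sdgs("Investments in renewable energy cut greenhouse gas emissions.")  # → {7, 13}
```

Even this toy version shows the brittleness discussed above: a text about "solar power" would match nothing unless that exact phrase is added to the vocabulary.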
      <p>
        Attempts to classify text by SDG using ML algorithms have adopted methodologies as diverse
as semantic search [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ], topic modelling [15, 16] and text classification [17, 18, 19, 20, 21].
Despite the variety of methods used by ML-based approaches, one recurrent theme in the literature
is the lack of readily available SDG-labelled datasets. Most studies using supervised learning
are based on different sets of data collected by their respective authors. Examples include
training an XGBoost model on a combination of 200 SDG-specific reports, SDG descriptions and
a few hundred manually labelled project descriptions [17]; fine-tuning BERT on a collection
of several hundred news items, articles and policy briefs from the IISD "SDG Knowledge Hub" website<sup>4</sup>
[18]; and fine-tuning DistilRoBERTa using a set of more than 8,000 paragraphs manually labelled
by unidentified experts [19]. Several studies have also relied on keyword-based systems to
label data for supervised learning [20, 21]. Some of these studies have resulted in open- or
closed-source products for SDG classification, including SDG Prospector based on [19] and
the text2sdg package in R based on [21].
      </p>
      <p><sup>2</sup>We use these terms to refer to the way systems can be applied to classify text by SDG and not to the way they were
derived. For example, the SIRIS [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] system was enriched by matching terms using word embeddings. This, however,
does not change the fact that the final system is keyword-based.</p>
      <p><sup>3</sup>Queries allow for a more complex keyword-matching logic, but they are fundamentally rules-based classification
systems with all the associated limitations.</p>
      <p>To the best of our knowledge, there has been only one focused effort to create a public
SDG-labelled dataset, namely the OSDG Community Dataset (OSDG-CD) [22]. OSDG-CD is an
English-only collection of passages from policy documents and reports labelled with respect to
SDGs by online volunteers. The original release of the dataset included examples for the first
16 SDGs only, but examples for SDG 17 have been added recently. At the time of writing, the
dataset includes over 40,000 examples evaluated by over 1,400 volunteers.</p>
      <p>OSDG-CD has several major limitations that render it inappropriate as an SDG classification
benchmark. Firstly, SDG labels are assigned by online volunteers rather than domain experts. This
can lead to systematic biases that cannot be mitigated by the fact that the same example is
evaluated by multiple volunteers. Secondly, each example is evaluated by several volunteers
with respect to one SDG only, i.e., every volunteer is presented with a binary question to
determine if an example is relevant to a pre-defined SDG. Consequently, each text in the dataset
has only one associated label, which reduces the SDG task to a multi-class classification problem.
Finally, a significant number of examples have been judged not relevant by the majority of
labellers, making the effective size of the dataset much smaller.</p>
    </sec>
    <sec id="sec-3">
      <title>3. SDGi Corpus</title>
      <p>To enable a consistent comparison among existing solutions and facilitate the development
of SOTA models for SDG classification, we introduce SDGi Corpus, the most comprehensive
multilingual collection of texts labelled by SDGs to date. This section explains the origin of the
data, details the data curation process and provides descriptive statistics for the dataset.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Sources</title>
        <p>Although readily available datasets for SDG classification are few, textual information linked
to SDGs is abundant in reporting documents such as Voluntary National Reviews (VNRs) and
Voluntary Local Reviews (VLRs). VNRs are voluntary, multi-stakeholder and government-led
reports submitted to the High-level Political Forum on Sustainable Development (HLPF). The
rationale behind VNRs is to enable countries to share their experiences in the implementation
of the 2030 Agenda. In a similar vein, VLRs are subnational reviews from local and regional
governments that are increasingly engaged in the SDG implementation process. Unlike VNRs,
VLRs were not envisioned in the 2030 Agenda but emerged as a popular means of communication
about SDG localisation.</p>
        <p><sup>4</sup>Note that this website is managed by an independent think tank and is not part of the United Nations System.</p>
        <p>[Figure 1: (a) Raw example; (b) Edited example with redacted content]</p>
        <p>To construct SDGi Corpus, we web-scraped more than 350 VNRs and close to 200 VLRs, all
in PDF format, from the United Nations<sup>5</sup>. Our dataset includes extracts from reviews published
before December 2023, covering 8 years of SDG reporting in total. While we collected all
available reports, our initial release only includes examples from reports written in English,
French or Spanish, which together constitute over 90% of the original document set<sup>6</sup>.</p>
        <p>Being review documents, VNRs are typically structured in such a way that each chapter
discusses progress in achieving a specific SDG. However, not all countries cover all SDGs<sup>7</sup>.
VLRs are similar reports produced by subnational governments and are therefore more likely
to focus on SDG 9 "Industry, innovation and infrastructure" and SDG 11 "Sustainable cities and
communities" than on other SDGs. It is also common to find in VLRs case studies or examples of projects
that contribute to one or more SDGs. The next subsection describes how we leveraged
the structure of the documents to extract examples for SDGi Corpus.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Document Curation</title>
        <p>To create SDGi Corpus, we manually analysed each document, searching for and extracting specific
parts clearly linked to SDGs. Our curation process can be summarised in 4 steps as follows:</p>
        <list list-type="order">
          <list-item><p>Manually examine a given document to identify SDG-labelled content.</p></list-item>
          <list-item><p>Extract pages containing relevant content to SDG-specific folders.</p></list-item>
          <list-item><p>Edit extracted pages to redact irrelevant content before and after the relevant content.</p></list-item>
          <list-item><p>For content linked to multiple SDGs, fill out a metadata sheet.</p></list-item>
        </list>
        <p>While creating examples, we made as few assumptions and judgements about the data as
possible. Our goal was not to label examples but to extract content labelled by the authors of
VNRs and VLRs, who are assumed to be domain experts. To identify relevant content, we relied
on visual and textual clues that indicated SDG relevance, such as chapter and section titles
and SDG icons. SDG-labelled content varies greatly in size and relevance. Some examples are
chapters from VNRs that span dozens of pages, whereas others are short paragraphs discussing
an SDG-related project or initiative.</p>
        <p>The PDF pages containing relevant content were extracted into one of the 17 SDG-specific
folders or a dedicated folder for multi-labelled examples. Whenever pages contained clearly
separable content, they were split further into multiple examples. However, a large number of
long sections were left intact to preserve cohesion, while statistical tables and appendices were
excluded altogether.</p>
        <p>The next step was to edit every example to mask noisy or irrelevant content. Since examples
come from PDFs, the structure and layout of pages differ greatly. For all examples, contents
from the sections that precede and follow the relevant part were masked out. For a small portion
of the data, especially shorter examples of up to 4 pages, we also made a reasonable effort to
redact other irrelevant or noisy content within the section itself, including headers, footers,
tables, figures, image credits, etc. Figure 1 illustrates this masking process with an example for
SDG 14<sup>8</sup>. Finally, we extracted text from PDFs using the pypdfium2 package in Python.</p>
        <p>Overall, our dataset represents data extracted from a variety of real-world layouts, which
results in a fair degree of noise in the extracted texts. Given the public nature and intended
use of the reviews, VNRs and VLRs are extremely unlikely to contain any sensitive Personally
Identifiable Information (PII).</p>
        <p><sup>5</sup>VNRs and VLRs are available on the HLPF and UN DESA websites, respectively.</p>
        <p><sup>6</sup>We will explore the possibility of extending SDGi Corpus to include data in all six official languages of the United
Nations in future updates to the dataset.</p>
        <p><sup>7</sup>For instance, land-locked countries commonly exclude SDG 14 "Life Below Water" from their VNRs.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Train/Test Split</title>
        <p>One peculiarity of the dataset is that single-labelled examples may give away their true label in
their content. For instance, many sections on SDG 2 start with text containing "Zero hunger"
or contain "Zero hunger", "SDG 2" or "Goal 2" in the header of every page. This makes
it easy to train a seemingly accurate classifier that simply overfits the data. We consider
this both a limitation of the dataset and a technical challenge for the ML community, in that
training a generalisable classifier becomes non-trivial despite the training data containing abundant
information about different aspects of all 17 SDGs.</p>
        <p>To partially mitigate this issue, we deviate from the standard practice of creating train/test
splits randomly, adopting an adversarial approach instead. We first fit a shallow
neural network to classify examples using embeddings from OpenAI’s text-embedding-ada-002.
Then, we calculate the cross-entropy loss for every example in the dataset. The loss values are
used as sampling weights. We sample 20% of the examples into the test set, stratifying by
language, text size bucket and a binary variable indicating whether an example has one label or
more.</p>
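The loss-weighted sampling step can be sketched as follows. This is a minimal illustration only: it assumes per-example losses have already been computed, omits the stratification by language, size and label count described above, and the function name, weight handling and seed are our own choices rather than the paper's implementation.

```python
import random

def adversarial_test_split(example_ids, losses, test_frac=0.2, seed=42):
    """Sample a test set where high-loss (harder) examples are more likely to be drawn.

    Draws examples one at a time without replacement, with probability
    proportional to each example's cross-entropy loss.
    """
    rng = random.Random(seed)
    n_test = round(len(example_ids) * test_frac)
    pool = list(example_ids)
    weights = list(losses)
    test_set = []
    for _ in range(n_test):
        # pick one remaining example, weighted by its loss
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        test_set.append(pool.pop(idx))
        weights.pop(idx)
    return pool, test_set  # (train ids, test ids)

# hypothetical losses: examples 3 and 4 are "hard" and thus likelier to land in the test set
train, test = adversarial_test_split([0, 1, 2, 3, 4], [0.1, 0.1, 0.1, 2.0, 2.0], test_frac=0.4)
```

Sampling without replacement keeps the split sizes exact while still biasing the test set towards examples the shallow model found difficult.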
        <p>As a result, the train and test sets follow noticeably different label distributions, with
SDGs 5, 6 and 7 underrepresented and SDGs 8, 10 and 11 overrepresented in the test
set compared to the train set, as seen in Figure 2. At the same time, the test set contains
more challenging examples that simpler models struggle with.</p>
        <p><sup>8</sup>The same raw PDF can be used to create examples for SDG 15 and 16 by masking other parts of the page.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Descriptive Statistics</title>
        <p>In total, SDGi Corpus contains 7,350 examples, with 5,880 (80%) and 1,470 (20%) examples in
the train and test sets respectively. The dataset is dominated by examples in English (71.9%),
followed by Spanish (15.9%) and French (12.2%). To allow different yet consistent uses of the
dataset, we include a grouping variable for text size buckets. Examples shorter than 512 tokens<sup>9</sup>
are considered short (S), those between 512 and 2048 tokens are medium (M) and those longer than 2048 tokens are
long (L)<sup>10</sup>. There are 1,662 short, 3,013 medium and 2,675 long examples. Most examples (89.1%)
in the dataset have just one label, but those that are multi-labelled have anywhere between 2 and
17 labels. The average text length in the corpus is 1,382 tokens, although this varies slightly
from language to language. Admittedly, this is considerably longer than the expected input length
of most transformer models. We refer the reader to Table 1 for more details.</p>
        <p><sup>9</sup>Token counts here and in Table 1 are based on OpenAI’s cl100k_base tokenisation after replacing all numbers with
a "NUM" placeholder value.</p>
        <p><sup>10</sup>When referring to all sizes, we use "X". Similarly, when referring to all languages, the "XX" designation is used.</p>
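The bucketing rule above can be reproduced in a few lines. In this sketch, the number masking uses a simple regex and a crude whitespace tokeniser stands in for cl100k_base; the function name and the 512/2048 thresholds follow the text, everything else is an assumption for illustration.

```python
import re

def size_bucket(text, count_tokens):
    """Assign an S/M/L size bucket from the token count of the number-masked text."""
    masked = re.sub(r"\d+", "NUM", text)  # replace all numbers with the "NUM" placeholder
    n = count_tokens(masked)
    if n < 512:
        return "S"   # short
    if n <= 2048:
        return "M"   # medium
    return "L"       # long

# whitespace tokeniser as a stand-in for OpenAI's cl100k_base
approx_count = lambda s: len(s.split())

size_bucket("word " * 100, approx_count)  # → "S"
```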
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>In this section, we present preliminary results for multi-label classification using SDGi Corpus<sup>11</sup>.
We test several popular modelling approaches, combining sparse and neural representations
with support vector machines (SVMs), feedforward neural networks (FFNs) and graph neural
networks (GNNs) [23]. In addition, a GPT architecture [24] is employed for zero-shot learning.</p>
      <sec id="sec-4-1">
        <title>4.1. Data Preparation</title>
        <p>For sparse representations, we apply standard preprocessing such as lowercasing and removing
numbers and punctuation. We also remove stop words using a combined list of English, French
and Spanish stop words but do not lemmatise or stem tokens. Texts are vectorised using Term
Frequency-Inverse Document Frequency (TF-IDF). Models using this type of features are
denoted with BOW. Neural representations are derived from text-embedding-ada-002 and are
1536-dimensional. Models based on embeddings are prefixed with "Ada".</p>
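The preprocessing for sparse features can be sketched as follows. This is a minimal illustration: the tiny stop word set stands in for the combined English/French/Spanish list used in the experiments, and the function name is our own.

```python
import re
import string

# Illustrative subset only; the paper uses a full combined EN/FR/ES stop word list.
STOP_WORDS = {"the", "and", "of", "le", "la", "et", "el", "los", "de"}

def preprocess(text):
    """Lowercase, strip numbers and punctuation, drop stop words; no stemming/lemmatisation."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)                                   # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))   # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]

preprocess("The 17 Goals, and the 2030 Agenda.")  # → ['goals', 'agenda']
```

The resulting token lists would then be fed to a TF-IDF vectoriser to produce the BOW features.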
        <p>For the GNN, we transform texts into graphs. Our approach is similar to [25]. We treat unique
tokens as nodes and connect tokens that co-occur within a window of size 2 with directed edges.
Only the 30,000 most common tokens are kept in the vocabulary. For each graph, up to 10,000
nodes and the 10,000 edges with the highest weight are used.</p>
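The co-occurrence step of this graph construction can be sketched as below; vocabulary truncation and the node/edge caps described above are omitted for brevity, and the function name is our own.

```python
from collections import Counter

def build_cooccurrence_graph(tokens, window=2):
    """Build a directed co-occurrence graph from a token sequence.

    Nodes are unique tokens; a directed edge (src, dst) is created whenever
    dst follows src within the given window, weighted by co-occurrence count.
    """
    nodes = set(tokens)
    edges = Counter()
    for i, src in enumerate(tokens):
        for dst in tokens[i + 1 : i + window]:  # window=2 -> only the next token
            edges[(src, dst)] += 1
    return nodes, edges

nodes, edges = build_cooccurrence_graph(["sdg", "goal", "sdg", "goal"])
# edges[("sdg", "goal")] == 2, edges[("goal", "sdg")] == 1
```

The edge weights would then be used to keep only the highest-weight edges when capping graph size.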
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Models</title>
        <p>SVM. Support vector machines used to be the SOTA models for text (and document) classification
[26, 27]. They are fast and simple architectures with a proven track record of robust performance
when combined with sparse word representations. Our experiments are based on linear SVMs.</p>
        <p>FFN. We use a shallow neural network with one hidden layer of 100 neurons, ReLU
activation and the Adam optimiser. To reduce overfitting, an early stopping mechanism is used. We
combine the SVM and FFN with both sparse representations and embeddings from OpenAI.</p>
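A baseline along these lines can be assembled with scikit-learn. The sketch below is not the paper's exact configuration: the toy texts and SDG labels are invented for illustration, and one linear SVM per label (one-vs-rest) stands in for the full multi-label setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Invented toy examples with hypothetical SDG labels.
texts = [
    "ending hunger and improving food security",
    "access to clean water and sanitation for all",
    "renewable energy and affordable electricity",
    "sustainable agriculture reduces hunger",
]
labels = [[2], [6], [7], [2]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)  # multi-label indicator matrix

# BOW-style pipeline: TF-IDF features + one linear SVM per SDG label
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
model.fit(texts, y)
pred = model.predict(["policies for zero hunger"])  # indicator row over labels {2, 6, 7}
```

Swapping `LinearSVC` for `MLPClassifier` would give the FFN variant of the same pipeline.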
        <p>GNN. Graph neural networks are flexible architectures that have only recently been applied to
text with some success [25]. We use a GNN model based on the GraphSAGE architecture [28]. The
model consists of an embedding layer, which embeds every node (token) into a 128-dimensional
vector, and two SAGEConv layers that propagate information from the neighbouring nodes.
The first layer outputs a 256-dimensional vector, while the second outputs a 17-dimensional
vector. The final prediction is made by reading out the representations of all nodes and averaging
them. Depending on the vocabulary size, our models have between 1 and 4 million parameters.</p>
        <p>GPT. GPT-family models are currently among the most powerful when it comes to
generative AI. We use GPT-3.5 Turbo (16k), relying on its zero-shot learning capabilities to
classify text by SDG without any training data. We manually experiment with several prompt
designs to maximise performance.</p>
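A zero-shot prompt along these lines might look like the sketch below. This is our own illustrative template, not the prompt actually used in the experiments.

```python
def build_zero_shot_prompt(text):
    """Construct an illustrative zero-shot SDG classification prompt for a chat model."""
    return (
        "You are an expert on the UN Sustainable Development Goals (SDGs).\n"
        "Read the text below and list every SDG (1-17) it is relevant to.\n"
        "Respond with a comma-separated list of numbers only, e.g. '3, 5'.\n"
        "Do not assign an SDG unless the text clearly discusses it.\n\n"
        f"Text:\n{text}"
    )

prompt = build_zero_shot_prompt("Our city expanded affordable public transport.")
```

The final constraint in the template reflects the overassignment behaviour discussed in the results: without it, chat models tend to list many marginally related goals.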
        <p>We optimise the SVM and FFN models by grid search, using various settings for the regularisation
term as well as the TF-IDF transformation, if applicable. For the GNN, we manually test several
hyperparameter settings, preferring a model with a smaller size. When training the GNN, we use
20% of the training data for validation and early stopping.</p>
        <p><sup>11</sup>The codebase for the experiments is available in the UNDP-Data/dsc-sdgi-corpus repository on GitHub.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
        <p>The main results of the experiments are shown in Table 2. The test set performance varies
greatly depending on the subset of data, but even the highest scores are relatively low, with
no model exceeding an 81.0 macro-average F1 score. This illustrates the overfitting problem we
indicated earlier. Secondly, the Ada-based models perform better on shorter texts, especially
when coupled with an FFN. The best score on the full dataset is achieved by BOW FFN, but Ada
FFN and GNN are only slightly behind. Notably, the GNN model significantly outperforms the
others for Spanish and French texts. It may be more robust when training on small training sets
due to the way the graph representation is constructed. GPT-3.5 performance is subpar, and our
experiments showed that it tends to greatly overassign labels to texts, calling for more careful
prompt engineering or even few-shot learning to constrain this behaviour.</p>
        <p>We also report generalisation performance on IISD News, an out-of-domain dataset. This
is a collection of English-only news articles from IISD used in [18]. We find generalisation
performance lacking for all models, though much less so for the Ada-based SVM.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We introduced SDGi Corpus, the most comprehensive multilingual collection of texts labelled
by SDGs to date. The dataset contains over 7,000 examples in English, French and Spanish,
hand-picked from VNRs and VLRs. We provided an overview of the data creation effort and
set baselines for multi-label classification on different subsets of the dataset using SVMs, FFNs,
GNNs and GPT-3.5 Turbo. Our experiments demonstrate that there is no "one-size-fits-all"
solution to SDG classification and different models are better at different tasks. Overfitting
remains a major challenge, but overcoming it will lead to more robust models. Our future efforts
will be directed to adding examples in more languages as well as extending SDGi Corpus with
examples from other types of documents. We encourage researchers and practitioners to use
SDGi Corpus for developing novel SDG classification systems for social good.</p>
      <p>[15] M. LaFleur, Art Is Long, Life Is Short: An SDG Classification System for DESA Publications, SSRN Electronic Journal (2019). URL: https://www.ssrn.com/abstract=3400135. doi:10.2139/ssrn.3400135.</p>
      <p>[16] M. T. LaFleur, Using large language models to help train machine learning SDG classifiers, Working Paper 180, 2023. URL: https://desapublications.un.org/working-papers/using-large-language-models-help-train-machine-learning-sdg-classifiers.</p>
      <p>[17] A. Pincet, S. Okabe, M. Pawelczyk, Linking Aid to the Sustainable Development Goals – a machine learning approach, OECD Development Co-operation Working Papers 52, 2019. URL: https://www.oecd-ilibrary.org/development/linking-aid-to-the-sustainable-development-goals-a-machine-learning-approach_4bdaeb8c-en. doi:10.1787/4bdaeb8c-en.</p>
      <p>[18] J. E. Guisiano, R. Chiky, J. De Mello, SDG-Meter: A Deep Learning Based Tool for Automatic Text Classification of the Sustainable Development Goals, in: N. T. Nguyen, T. K. Tran, U. Tukayev, T.-P. Hong, B. Trawiński, E. Szczerbicki (Eds.), Intelligent Information and Database Systems, volume 13757, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2022, pp. 259–271. URL: https://link.springer.com/10.1007/978-3-031-21743-2_21. doi:10.1007/978-3-031-21743-2_21.</p>
      <p>[19] J.-B. Jacouton, R. Marodon, A. Laulanié, The Proof is in the Pudding. Revealing the SDGs with Artificial Intelligence, Research Document 262, Agence Française de Développement (AFD), 2022. URL: https://www.afd.fr/en/ressources/proof-pudding-revealing-sdgs-artificial-intelligence.</p>
      <p>[20] M. Vanderfeesten, R. Jaworek, L. Keßler, AI for mapping multi-lingual academic papers to the United Nations’ Sustainable Development Goals (SDGs), Technical Report, Zenodo, 2022. URL: https://zenodo.org/record/5603019. doi:10.5281/ZENODO.5603019.</p>
      <p>[21] D. U. Wulf, D. S. Meier, R. Mata, Using novel data and ensemble models to improve automated labeling of Sustainable Development Goals, 2023. URL: http://arxiv.org/abs/2301.11353. arXiv:2301.11353 [cs].</p>
      <p>[22] OSDG, UNDP IICPSD SDG AI Lab, PPMI, OSDG Community Dataset (OSDG-CD), 2021. URL: https://zenodo.org/record/5550238. doi:10.5281/ZENODO.5550238.</p>
      <p>[23] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The Graph Neural Network Model, IEEE Transactions on Neural Networks 20 (2009) 61–80. doi:10.1109/TNN.2008.2005605.</p>
      <p>[24] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models are Few-Shot Learners, 2020. URL: http://arxiv.org/abs/2005.14165. arXiv:2005.14165 [cs].</p>
      <p>[25] L. Huang, D. Ma, S. Li, X. Zhang, H. Wang, Text Level Graph Neural Network for Text Classification, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3442–3448. URL: https://www.aclweb.org/anthology/D19-1345. doi:10.18653/v1/D19-1345.</p>
      <p>[26] D. D. Lewis, Y. Yang, T. G. Rose, F. Li, RCV1: A New Benchmark Collection for Text Categorization Research, J. Mach. Learn. Res. 5 (2004) 361–397.</p>
      <p>[27] S. Wang, C. Manning, Baselines and Bigrams: Simple, Good Sentiment and Topic Classification, in: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Jeju Island, Korea, 2012, pp. 90–94. URL: https://aclanthology.org/P12-2018.</p>
      <p>[28] W. L. Hamilton, R. Ying, J. Leskovec, Inductive representation learning on large graphs, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 1025–1035.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>United</given-names>
            <surname>Nations</surname>
          </string-name>
          ,
          <source>The Sustainable Development Goals Report 2023: Special Edition</source>
          ,
          <source>Technical Report</source>
          , United Nations, New York,
          <year>2023</year>
          . URL: https://unstats.un.org/sdgs/report/2023/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Elsevier</surname>
          </string-name>
          ,
          <article-title>The Power of Data to Advance the SDGs: Mapping research for the Sustainable Development Goals</article-title>
          ,
          <source>Technical Report, Elsevier</source>
          ,
          <year>2020</year>
          . URL: https://www.elsevier.com/connect/report-mapping-research-to-advance-the-sdgs.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>OCallaghan</surname>
          </string-name>
          , QS World University Rankings: Sustainable Development Goals,
          <year>2021</year>
          . URL: https://www.topuniversities.com/university-rankings/world-university-rankings/sustainable-development-goals.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>[4] THE reporters</article-title>
          ,
          <source>Impact Rankings</source>
          <year>2023</year>
          : methodology,
          <year>2023</year>
          . URL: https://www.timeshighereducation.com/world-university-rankings/impact-rankings-2023-methodology.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Duran-Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Massucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Quinquillà</surname>
          </string-name>
          ,
          <article-title>A controlled vocabulary defining the semantic perimeter of Sustainable Development Goals</article-title>
          ,
          <year>2019</year>
          . URL: https://zenodo.org/record/3567768. doi:10.5281/ZENODO.3567768.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Jayabalasingham</surname>
          </string-name>
          ,
          <source>Identifying research supporting the United Nations Sustainable Development Goals</source>
          ,
          <year>2019</year>
          . URL: https://data.mendeley.com/datasets/87txkw7khs/1. doi:10.17632/87TXKW7KHS.1.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vanderfeesten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Otten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Spielberg</surname>
          </string-name>
          ,
          <article-title>Search Queries for "Mapping Research Output to the Sustainable Development Goals (SDGs)"</article-title>
          <source>v5.0.2</source>
          ,
          <year>2020</year>
          . URL: https://zenodo.org/record/3817444. doi:10.5281/ZENODO.3817444.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mu</surname>
          </string-name>
          ,
          <article-title>Mapping research to the Sustainable Development Goals (SDGs)</article-title>
          , preprint, In Review,
          <year>2023</year>
          . URL: https://www.researchsquare.com/article/rs-2544385/v2. doi:10.21203/rs.3.rs-2544385/v2.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          UN DESA, LinkedSDG,
          <year>2019</year>
          . URL: https://linkedsdg.officialstatistics.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          European Commission, Joint Research Centre,
          <article-title>Mapping EU policies with the 2030 agenda and SDGs: fostering policy coherence through text based SDG mapping</article-title>
          , Publications Office, LU,
          <year>2023</year>
          . URL: https://data.europa.eu/doi/10.2760/87754.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Armitage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mikki</surname>
          </string-name>
          ,
          <article-title>Mapping scholarly publications related to the Sustainable Development Goals: Do independent bibliometric approaches get the same results?</article-title>
          ,
          <source>Quantitative Science Studies</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>1092</fpage>
          -
          <lpage>1108</lpage>
          . URL: https://direct.mit.edu/qss/article/1/3/1092-1108/96106. doi:10.1162/qss_a_00071.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vanderfeesten</surname>
          </string-name>
          ,
          <article-title>Evaluation on accuracy of mapping science to the United Nations' Sustainable Development Goals (SDGs) of the Aurora SDG queries</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2021</year>
          . URL: https://zenodo.org/record/4964606. doi:10.5281/ZENODO.4964606, version number: v1.0.2.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Galsurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vempaty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sushkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kapto</surname>
          </string-name>
          ,
          <source>Semantic Searching for Efficient Assessment of Sustainable Development in National Plans</source>
          ,
          <year>2017</year>
          . URL: https://api.semanticscholar.org/CorpusID:86858304.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sovrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vitali</surname>
          </string-name>
          ,
          <article-title>Deep Learning Based Multi-Label Text Classification of UNGA Resolutions</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Theory and Practice of Electronic Governance</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>686</fpage>
          -
          <lpage>695</lpage>
          . URL: http://arxiv.org/abs/2004.03455. doi:10.1145/3428502.3428604, arXiv:2004.03455 [cs, stat].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>