<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1016/J.EIAR.2021.106632</article-id>
      <title-group>
        <article-title>Environmental impact assessment reports in Wikidata and a Wikibase</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Finn Årup Nielsen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivar Lyhne</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Darío Garigliotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Annika Butzbach</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilia Ravn Boess</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katja Hose</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lone Kørnøv</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalborg University</institution>
          ,
          <addr-line>Aalborg</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Wien</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Technical University of Denmark</institution>
          ,
          <addr-line>Kongens Lyngby</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>3249</volume>
      <fpage>78</fpage>
      <lpage>85</lpage>
      <abstract>
        <p>Environmental impact assessment (EIA) is a required process for projects in many countries, which results in the preparation and publication of a detailed report. Such EIA reports describe impacts on, e.g., humans and the environment. In this paper, we describe our efforts in modeling the metadata of EIA reports and their description of environmental impacts and mitigations with Wikidata and Wikibase. We show that it is possible to record the bibliographic metadata of Danish EIA reports and in doing so link it to the rest of the Wikidata knowledge graph, allowing multilingual search. With our dedicated instance of a Wikibase, we show how EIA reports and their associated projects along with activities, impacts, recipients, and mitigations can be represented and how we can show aggregated views of the data based on SPARQL templates so that users can explore the data and make efficient use of it.</p>
      </abstract>
      <kwd-group>
        <kwd>Environmental impact assessment</kwd>
        <kwd>Wikibase</kwd>
        <kwd>Wikidata</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>environmental data, and within the DREAMS project, the EIA reports have now been stored as PDF
files and made available.</p>
      <p>Part of the DREAMS project is to establish a system capturing the specific environmental effects and
mitigations that are described in the EIA reports, an effect assessment tool we refer to as CAUSA.
The relations for environmental effects and mitigations are often not just a simple triple relation, but
may be of higher order and form a small chain or graph. The project also explores how the Sustainable
Development Goals (SDGs) can be integrated into EIA processes [3] and sets out to establish links between
impacts identified through the EIA process and the SDGs. We have annotated a set of EIA reports with
the cloud service tagtog,<sup>5</sup> which can handle PDF files and link relations between annotated text pieces
[4]. The annotation can be exported as JSON, and we have further converted it to a spreadsheet format.</p>
      <p>Wikidata [5] is a knowledge base closely linked to Wikipedia. Through the Wikicite initiative, see, e.g.,
[6], metadata about publications have been added to Wikidata, so that a considerable and growing part
of Wikidata now describes publications. According to Wikicite.org, the estimated fraction of Wikidata items about
publications grew to 42% in 2022, corresponding to almost 42 million items.<sup>6</sup> As of 2023, over 38 million
Wikidata items are tagged as scholarly articles.<sup>7</sup></p>
      <p>Wikidata is continuously converted to a Semantic Web representation that is downloadable as well
as query-able through the Wikidata Query Service (WDQS) using the SPARQL query language.<sup>8</sup> The
conversion to a Semantic Web representation also entails a complex structure with qualifiers and
references [7].</p>
      <p>Various tools have taken advantage of the data in Wikidata available through WDQS. One such tool
is Scholia, which was originally designed to display the metadata of scientific publications [8]. It has
now been extended to display information about, e.g., taxa, biological pathways, and chemicals, see,
e.g., [9]. Currently, Scholia supports neither languages other than English, nor Wikibases with
SPARQL endpoints other than the WDQS, nor specialized data that would not be appropriate to enter
into Wikidata. In our case, these limitations are obstacles, as we would like to support Danish and specialized
data from EIA reports.</p>
      <p>Wikibase<sup>9</sup> is the software that runs Wikidata. Knowledge workers may download the software and
run it locally or use a cloud service where Wikibase is set up. One cloud Wikibase service provider is
Wikimedia Deutschland which provides its service from https://wikibase.cloud. Wikibase instances
may be used by organizations to manage their data — either public or private.</p>
      <p>The Wikibase set up at Wikibase.cloud is bundled together with a triple store, a Blazegraph instance,
corresponding to the Wikidata Query Service for Wikidata. Data from the Wikibase are continually
piped into the triple store so the data in a Wikibase.cloud Wikibase instance is available via SPARQL
queries to the endpoint.</p>
      <p>Compared to Wikidata, the curation in a separate Wikibase can be tailored to the data of the
organization, with specialized properties and items that would not usually appear in Wikidata. Furthermore,
editing may be restricted to those with authorization.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Contributions</title>
      <p>In this paper, we describe our efforts to model EIA reports within Wikidata and Wikibase, together with
supporting ontologies, and to model the environmental effects with what we call an effect pattern.
We also present a new method for displaying the information from the wiki via SPARQL queries, the
Wikibase pages, and a simple webpage with JavaScript. Our efforts are available in the DREAMS Wiki,
which is a Wikibase.cloud instance at https://dreams.wikibase.cloud/.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Related research</title>
      <p>Garrido and Requena described an ontology for environmental impact assessment [10]. The ontology
features classes such as Impact, VisualImpact, NoiseImpact, TrafficChanges, and Amphibian. An OWL
file is available,<sup>10</sup> and the authors also described a web application to display the ontology.</p>
      <p>The SuperPattern Ontology has been suggested for the representation of individual scientific claims
[11]. This ontology is a scheme with five slots: classes for context, subject, qualifier, relation, and object.
The ontology has been used to express a scientific claim such as “strong static magnetic field generally
affects cell cortex in the context of dejellied fertilizable stage VI Xenopus laevis oocyte” [12].</p>
      <p>Wikidata can represent cause and effect between entities with the has effect (P1542) and has cause
(P828) properties. These data and the textual data in Wikipedia have been used to build a knowledge
graph of events and consequences [13]. Other articles describe the use of Wikibase, e.g., for a food composition
knowledge base [14], as an enslavement resource [15], and for biological collection information [16].</p>
    </sec>
    <sec id="sec-4">
      <title>4. Ontologies</title>
      <p>We established several ontologies on the DREAMS Wiki. The items of these ontologies are typically
linked hierarchically with a subclass of property. The ontologies we consider are project types, report
types, significance types, taxa, effects, and SDGs.</p>
      <p>An ontology of project types was created from the itemized hierarchy of the Danish environment
assessment law. For instance, the project type “elektricitetstransportprojekt (bilag 2, 3c)” refers to
the project description of the electricity transport in appendix 2, item 3.c of the Danish law. The
corresponding item in Directive 2011/92/EU is part of annex II, item 3.b. Each report links to one or
more project types. The project types are linked together with subclass of with the project item as the
top entity. Some entities do not correspond to entities in the law, e.g., (the generic) electricity transport
project aggregates two items from different appendixes in the law.</p>
      <p>Our taxa ontology records species and other taxa, but also some aggregated entities, such as “annex
IV species or species group”, which is an aggregated group of protected species and species groups. A
small ontology for report types was created with just four entities: report, environment assessment
report, environmental report, and combined environmental and environmental assessment report.
Another small ontology records the significance types, where instances are, e.g., “not
significant and negative impact” or “significant and positive impact”. We also created an ontology of possible
effects. Items in this ontology are, for example, “wind turbine noise”, subclass of “noise”, subclass of
“physical/environmental determinant”, subclass of “human health impact”, finally subclass of “impact”.
An ontology for SDGs is set up for the 17 SDGs and a selection of the corresponding targets (each as a
subclass of an SDG). Effects of an activity may link to one or more targets. The relevant
SDG targets were identified based on the SDG targets provided in [17].</p>
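      <p>As a minimal sketch of how such a subclass of hierarchy can be traversed, the following Python snippet follows the example chain above from “wind turbine noise” up to “impact”. The dictionary <monospace>SUBCLASS_OF</monospace> and the function <monospace>superclasses</monospace> are illustrative assumptions; in the DREAMS Wiki the hierarchy is stored as Wikibase statements, and an item may have more than one parent, which this simplified sketch does not cover.</p>
      <preformat><![CDATA[
```python
# Hypothetical in-memory copy of part of the effect ontology:
# each key is an item label, each value its direct "subclass of" parent.
SUBCLASS_OF = {
    "wind turbine noise": "noise",
    "noise": "physical/environmental determinant",
    "physical/environmental determinant": "human health impact",
    "human health impact": "impact",
}


def superclasses(label, subclass_of=SUBCLASS_OF):
    """Return all ancestors of `label`, nearest first, following 'subclass of'."""
    chain = []
    seen = {label}  # guards against accidental cycles in the data
    current = subclass_of.get(label)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = subclass_of.get(current)
    return chain


print(superclasses("wind turbine noise"))
```
]]></preformat>
      <p>An effect annotated as “wind turbine noise” can in this way also be counted under every broader class up to “impact”.</p>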
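      <p>Our ontologies in the wiki were set up from information from experts recorded in tabular format in
documents and gradually expanded and modified based on annotated EIA reports. We record possible
aliases for ontology entities in the DREAMS Wiki, e.g., noise, which in Danish is “støj”, would have
aliases such as “støjpåvirkning”, “støjen”, “Støj” and “støjbidrag” reflecting inflections and compounds.
We use these aliases for semi-automated entity linking.</p>
      <p><sup>10</sup> https://github.com/julian-garrido/EAonto/blob/master/eia.owl</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Description</th><th>DREAMS Wiki property</th><th>Wikidata property</th><th>Example value</th></tr>
          </thead>
          <tbody>
            <tr><td>class</td><td>P2</td><td>P31</td><td>environmental impact assessment report</td></tr>
            <tr><td>title</td><td>P1</td><td>P1476</td><td>“Energiparken ved Vennerslund Energi- og Naturpark”</td></tr>
            <tr><td>main topic</td><td>-</td><td>P921</td><td>Vennerslund, wind farm, PV system</td></tr>
            <tr><td>country of origin</td><td>-</td><td>P495</td><td>Denmark</td></tr>
            <tr><td>language</td><td>P11</td><td>P407</td><td>Danish</td></tr>
            <tr><td>publication date</td><td>P6</td><td>P577</td><td>“2020”</td></tr>
            <tr><td>number of pages</td><td>P27</td><td>P1104</td><td>“441”</td></tr>
            <tr><td>project type</td><td>P16</td><td>-</td><td>photovoltaic system project, project with wind turbine</td></tr>
            <tr><td>Miljørapporter ID</td><td>P40</td><td>P10930</td><td>377355e9-9db6-4163-aad0-cf0e5a0004e5</td></tr>
            <tr><td>Wikidata ID</td><td>P3</td><td>-</td><td>Q112506586</td></tr>
            <tr><td>citations</td><td>-</td><td>P2860</td><td>Dansk pattedyrs atlas, Behavior of Scandinavian Bats during . . .</td></tr>
          </tbody>
        </table>
      </table-wrap>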
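      <p>The alias-based entity linking mentioned in this section could be sketched roughly as follows. The Danish aliases for noise are those quoted in the text; the lookup functions, the exact-match strategy, and the use of a plain label instead of a wiki Q-identifier are assumptions made for illustration and do not describe the actual DREAMS tooling.</p>
      <preformat><![CDATA[
```python
# Hypothetical alias table; the key "noise" stands in for a wiki Q-identifier.
ALIASES = {
    "noise": ["støj", "støjpåvirkning", "støjen", "Støj", "støjbidrag"],
}


def build_lookup(aliases):
    """Map each case-folded alias to its ontology entity."""
    lookup = {}
    for entity, names in aliases.items():
        for name in names:
            lookup[name.casefold()] = entity
    return lookup


def link_entity(text, lookup):
    """Return the entity whose alias equals the annotated text, else None.

    None signals that a curator has to link the string manually,
    which is the "semi" in semi-automated.
    """
    return lookup.get(text.strip().casefold())


lookup = build_lookup(ALIASES)
print(link_entity("Støjen", lookup))
print(link_entity("vibrationer", lookup))
```
]]></preformat>
      <p>Recording inflections and compounds as aliases keeps the matching step a simple dictionary lookup, at the cost of maintaining the alias lists by hand.</p>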
    </sec>
    <sec id="sec-5">
      <title>5. Describing EIA reports</title>
      <p>We represent metadata about an EIA report on a page in the DREAMS Wiki, see Table 1. The properties
we consider are, e.g., instance of for the class of the report, title, author, language (always Danish),
publication date, project type, and an identifier for the report in Danmarks Miljøportal. Some of these
property values are set up from information obtained for entry in Danmarks Miljøportal, while others
have been entered manually, optionally and not systematically. We have currently recorded 632
reports in the DREAMS Wiki.<sup>11</sup></p>
      <p>We also represent the EIA report in Wikidata with properties such as instance of, title, main subject,
country of origin (always Denmark), language (always Danish), date of publication, number of pages,
and Miljørapporter File ID that links to the report at Danmarks Miljøportal. The DREAMS Wiki has a
link to the report entity in Wikidata. Some EIA reports contain references — often a highly diverse
set of references, where the target of the reference may include databases, various reports, scholarly
articles, and books. We have added some references to Wikidata.</p>
      <p>Geographic entities described in a report might not yet exist in external resources, e.g., “Veddum Kaer” (Veddum marsh)
is not (yet) found in Wikidata. Other resources, such as OpenStreetMap or GeoNames, also return no
results. Reports may describe projects that are pointlike (e.g., small plants), lines (e.g., railway tracks),
or areas (e.g., wind turbine park). Projects may also detail multiple alternative solutions, e.g.,
alternative routes for a new railway line.</p>
      <p>Document identifiers are not commonly used in Danish EIA reports. When the EIA reports are
represented in Danmarks Miljøportal, they are each assigned a unique identifier, which we can use.
Wikidata and the DREAMS Wiki each have a property for that identifier.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Describing environmental effects</title>
      <p>Inspired by the SuperPattern Ontology, we developed a scheme for the DREAMS Wiki to represent the
individual environmental effect relations described in an EIA report. Our initial experiments defined a
scheme with subject class, object class, context class, and relation type properties closely following
the SuperPattern Ontology scheme. However, to incorporate recipient and mitigation in a chain of
environmental effects, we extended the scheme with recipient and mitigation classes and omitted
the context class. Furthermore, we omit the relation type as the relationships between subject, object,
recipient, and mitigation are implicitly given, e.g., the relation from the subject would in most cases
correspond to “affects” in the SuperPattern Ontology scheme. We call our scheme an effect pattern. This
effect pattern also contains an identifier (the Q-identifier of the Wikibase page), a link to the entity of
the EIA report, a quote from the PDF, a page number, an indication of the significance of the effect, and a link to the
entity of the project phase class (e.g., planning, construction, operation and/or decommissioning). The
four essential parts of the environmental effect chain are subject (usually a physical construction or an
activity), object (also called impact), recipient, and mitigation. Each of these four parts may be represented
with both a string and an entity link. From the manual annotation extracted from tagtog, we get the
string. Using the ontology of effects, we link the string to entities either manually or semi-automatically,
the latter with the aliases recorded in the wiki.</p>
      <p><sup>11</sup> SPARQL query to the DREAMS Wiki Query Service: https://tinyurl.com/2jygv2o5</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Property</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>instance of</td><td>item class</td></tr>
            <tr><td>described with</td><td>report entity</td></tr>
            <tr><td>subject text</td><td /></tr>
            <tr><td>subject class</td><td /></tr>
            <tr><td>object text</td><td /></tr>
            <tr><td>object class</td><td /></tr>
            <tr><td>recipient text</td><td /></tr>
            <tr><td>recipient class</td><td /></tr>
            <tr><td>degree of impact text</td><td /></tr>
            <tr><td>degree of impact</td><td /></tr>
            <tr><td>mitigation text</td><td /></tr>
            <tr><td>mitigation class</td><td /></tr>
            <tr><td>project type</td><td /></tr>
            <tr><td>project phase</td><td /></tr>
            <tr><td>quote</td><td /></tr>
            <tr><td>page</td><td>page number in PDF</td></tr>
            <tr><td>pattern number</td><td>number used during tagtog input</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Mitigation in one effect pattern can be a subject in another effect pattern, e.g., a road (subject)
causes noise (object/impact) which affects neighbors (recipient) and can be mitigated by a noise wall
(mitigation). The noise wall (subject in another effect pattern), in turn, causes a visual impact (object)
that affects neighbors (recipient).</p>
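      <p>The chaining of effect patterns through mitigations can be illustrated with a small data structure. This is a sketch only: the class <monospace>EffectPattern</monospace>, its field names, and the helper <monospace>follow_ups</monospace> are hypothetical and reduced to the four essential parts; they are not the property names used in the DREAMS Wiki.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EffectPattern:
    """One effect chain; field names are illustrative, not wiki property names."""
    subject: str
    obj: str                          # the object, also called the impact
    recipient: str
    mitigation: Optional[str] = None  # not every pattern has a mitigation


def follow_ups(pattern: EffectPattern, patterns: List[EffectPattern]) -> List[EffectPattern]:
    """Return the patterns whose subject is the mitigation of `pattern`."""
    if pattern.mitigation is None:
        return []
    return [p for p in patterns if p.subject == pattern.mitigation]


# The road and noise wall example from the text.
road = EffectPattern("road", "noise", "neighbors", mitigation="noise wall")
wall = EffectPattern("noise wall", "visual impact", "neighbors")

for p in follow_ups(road, [road, wall]):
    print(p.subject, "->", p.obj, "->", p.recipient)
```
]]></preformat>
      <p>Matching on plain strings is a simplification; with entity links, the comparison would be made on the linked class instead.</p>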
      <p>Table 2 displays an example of an effect pattern from an EIA report about a specific photovoltaic
system and wind turbine project. The effect pattern represents that the report describes that the specific
planned photovoltaic system creates a light impact (reflections) that affects the neighbors with a “not
significant and negative impact”. Any effect is mitigated with a vegetation zone and the effect is present
in the operational phase of the project. We have currently recorded 4303 effect patterns in the DREAMS
Wiki.<sup>12</sup></p>
    </sec>
    <sec id="sec-7">
      <title>7. Displaying information from a wikibase</title>
      <p>Inspired by Scholia, we created a system for displaying aggregated information from the DREAMS Wiki
via SPARQL queries to the endpoint associated with our Wikibase. Instead of a Python/Flask-based
approach like Scholia, we employ a serverless single-page Web application (SPA). We mirror Scholia’s
URL patterns to distinguish different aspects of the data. However, as an SPA we use the URI fragment
to control which aspect and entity to show. For instance, the URI fragment #report/Q153 will display
entity Q153 (the report Vindmøller ved Bjørnstrup) as a report.</p>
      <p>Whereas Scholia stores template SPARQL queries in separate Jinja2 files read during application
startup, our present system uses pages on the wiki itself to store the templates with SPARQL queries. For
instance, the wikipage Dreams:report defines a template for a report corresponding to a URI fragment
regular expression pattern of #report/Q\d+, while Dreams:report-index corresponds to an index page
for reports where the URI fragment pattern is #report. Likewise, the Dreams:projecttype wikipage
defines the template for a project type. Other aspects we have defined on the wiki besides report
and projecttype are pattern, effect, recipient, sdg, location, taxon, and author. The Web application
can display faceted aspects. For instance, #projecttype/Q264/report will show reports for wind
turbine projects. The DREAMS Wiki Query Service can perform federated SPARQL queries against
the Wikidata Query Service. We have used that to display “Related works from co-citations” from the
citation information present in Wikidata.</p>
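      <p>The mapping from URI fragments to template pages described above could be sketched as a small route table. The fragment patterns and the page titles <monospace>Dreams:report</monospace>, <monospace>Dreams:report-index</monospace>, and <monospace>Dreams:projecttype</monospace> come from the text; the route table, the function <monospace>resolve</monospace>, and the assumed pattern for project types are illustrative. The actual application is written in JavaScript, and faceted fragments are left out here because the text does not name their template pages.</p>
      <preformat><![CDATA[
```python
import re

# Fragment patterns and wiki page titles as named in the text; the route
# table itself and the projecttype pattern are illustrative assumptions.
ROUTES = [
    (re.compile(r"#report/(Q\d+)"), "Dreams:report"),
    (re.compile(r"#report"), "Dreams:report-index"),
    (re.compile(r"#projecttype/(Q\d+)"), "Dreams:projecttype"),
]


def resolve(fragment):
    """Return (template page title, entity id or None) for a URI fragment.

    Returns None when no route matches the whole fragment.
    """
    for pattern, page in ROUTES:
        match = pattern.fullmatch(fragment)
        if match:
            entity = match.group(1) if match.groups() else None
            return page, entity
    return None


print(resolve("#report/Q153"))
print(resolve("#report"))
```
]]></preformat>
      <p>Because the route table is plain data, a new aspect needs only one more entry and a corresponding template page on the wiki.</p>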
      <p><sup>12</sup> SPARQL query to the DREAMS Wiki Query Service: https://tinyurl.com/2kczebk3</p>
      <p>When a user visits a new “page” on the SPA, the application reads the URI fragment, determines
which template should be downloaded from the wiki, and then builds the HTML page from the
template. Currently, the SPA recognizes wiki section headers, which it converts to HTML headings, and
SPARQL queries embedded in a MediaWiki template. The SPARQL queries define a target entity that
is interpolated with the entity identifier extracted from the URI fragment. The interpolated SPARQL
queries are then sent off to the SPARQL endpoint of the wiki. The response from the endpoint is
converted by JavaScript to an HTML table, similar to the method in Scholia. Like Scholia, we use jQuery and
SpryMedia’s DataTables library<sup>13</sup> to display the tables. The DataTables library can be configured, so we
have switched the language to Danish. The SPARQL query service of Wikibase.cloud currently has
relatively slow response times when multiple simultaneous queries are issued, hence we have limited
the number of tables shown on each page, usually to just one or two.</p>
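      <p>The processing steps in this paragraph can be sketched in Python under several assumptions: that queries sit in a MediaWiki template called <monospace>SPARQL</monospace> with a <monospace>query</monospace> parameter, that the target entity is marked by the placeholder <monospace>TARGET</monospace>, and that the endpoint defines the <monospace>wd:</monospace> prefix. None of these details are given in the text, and the real application performs these steps in JavaScript. The endpoint response is assumed to follow the standard SPARQL JSON results format.</p>
      <preformat><![CDATA[
```python
import re
from html import escape

# Hypothetical wikitext of a template page; the {{SPARQL|query=...}} template
# name and the TARGET placeholder are assumptions, not the DREAMS conventions.
WIKITEXT = """== Related statements ==
{{SPARQL|query=
SELECT ?s ?p WHERE { ?s ?p wd:TARGET . }
}}
"""

HEADER = re.compile(r"^==[ \t]*(.+?)[ \t]*==[ \t]*$", re.MULTILINE)
QUERY = re.compile(r"\{\{SPARQL\|query=(.*?)\n\}\}", re.DOTALL)


def parse_template(wikitext):
    """Split a template page into section headings and raw SPARQL queries."""
    headings = HEADER.findall(wikitext)
    queries = [q.strip() for q in QUERY.findall(wikitext)]
    return headings, queries


def interpolate(query, entity_id):
    """Insert the entity identifier from the URI fragment into the query."""
    if not re.fullmatch(r"Q\d+", entity_id):
        raise ValueError(f"not a Q-identifier: {entity_id!r}")
    return query.replace("TARGET", entity_id)


def results_to_html(results):
    """Render a SPARQL JSON results object as a plain HTML table."""
    variables = results["head"]["vars"]
    header = "".join("<th>" + escape(v) + "</th>" for v in variables)
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are rendered as empty cells.
        values = [binding[v]["value"] if v in binding else "" for v in variables]
        cells = "".join("<td>" + escape(value) + "</td>" for value in values)
        rows.append("<tr>" + cells + "</tr>")
    return "<table><tr>" + header + "</tr>" + "".join(rows) + "</table>"


headings, queries = parse_template(WIKITEXT)
print(headings)
print(interpolate(queries[0], "Q153"))
```
]]></preformat>
      <p>Validating the identifier before interpolation keeps arbitrary fragment text out of the query, which seems a sensible precaution whenever a query is assembled from a URL.</p>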
      <p>Figure 1 shows a screenshot from a page on the Web application displaying effect patterns for
photovoltaic system projects. Each blue text fragment is linked to another page of the Web application,
displaying other aspects of the data. The “Citat” (quote) column displays an extracted paragraph, where
the effect pattern description originates. The last column in the table is a link to the PDF. With the
page number for the quote, we are able to make a URL that will direct the browser to the page with the
quote. With property path SPARQL queries we can aggregate information across the hierarchy of our
ontologies, e.g., aggregation of all effect patterns from energy industry projects.</p>
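      <p>One way to build a link that opens a PDF at the page of the quote is the <monospace>#page=N</monospace> open parameter, which common browser PDF viewers understand. Whether the DREAMS application uses exactly this mechanism is an assumption; the function name and the example host below are placeholders.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urldefrag


def pdf_page_url(pdf_url, page):
    """Append the '#page=N' open parameter understood by common PDF viewers."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any fragment already present so that only one remains.
    base, _old_fragment = urldefrag(pdf_url)
    return f"{base}#page={page}"


# example.org is a placeholder host, not the real report location.
print(pdf_page_url("https://example.org/report.pdf", 12))
```
]]></preformat>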
    </sec>
    <sec id="sec-8">
      <title>8. Discussion</title>
      <p>Our manual annotation from tagtog did not correspond exactly to the scheme we settled on for
representing the environmental effects in the wiki. Hence, manual and time-consuming editing in the
wiki has been necessary, particularly around information about recipients. Manual extraction of effect
patterns from EIA reports does not scale. Machine learning and large language models could help with
relationship extraction and entity linking. However, the integration of machine learning or language
models with the Wikibase.cloud system is not straightforward.</p>
      <p>The use of on-wiki SPARQL templates makes the development of the Web application for displaying
the wiki content more agile. Adding a new aspect requires only the definition of a new wikipage with
SPARQL queries, whereas a Flask/Jinja2 approach requires software developer involvement.</p>
      <p>Although developed for Danish EIA reports, the approach should also work for environmental impact
assessment in other countries. Wikibase is multilingual and some entities in the DREAMS Wiki have
been translated from Danish to English. Future development could explore how to improve the response
time of the display of data in the web application as well as ensuring that search and navigation within
the Web application are efficient for the user.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This research is part of the DREAMS project, Digitally Supporting Environmental Assessment for the
Sustainable Development Goals. This research is made possible by the Innovation Fund of Denmark
(Grant agreement number, 0177-00021B DREAMS).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>European Commission</collab>,
          <collab>COWI</collab>,
          <collab>Milieu Ltd</collab>,
          <article-title>Environmental impact assessment of projects: guidance on the preparation of the environmental impact assessment report (Directive 2011/92/EU as amended by 2014/52/EU)</article-title>,
          <year>2017</year>.
          doi:<pub-id pub-id-type="doi">10.2779/41362</pub-id>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>