<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Management at Scale: TERN's Implementation for Ecological Data Integration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Junrong Yu</string-name>
          <email>junrong.yu@uq.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier Sanchez Gonzalez</string-name>
          <email>j.sanchezgonzalez@uq.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edmond Chuc</string-name>
          <email>edmond@kurrawong.ai</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Siddeswara Mayura Guru</string-name>
          <email>s.guru@uq.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>KurrawongAI</institution>
          ,
          <addr-line>72 Yundah St, Shorncliffe, QLD 4017</addr-line>
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The University of Queensland</institution>
          ,
          <addr-line>Indooroopilly, QLD, 4068</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>The Terrestrial Ecosystem Research Network (TERN) is Australia's National Collaborative Research Infrastructure for collecting, collating, and publishing key terrestrial ecosystem parameters across space and time. TERN observes the ecosystem across multiple scales using satellite remote sensing, drones, in-situ sensors, and human observations. In addition, TERN publishes data from partnering institutes. Its data management practices must therefore deal with heterogeneous data, and harmonising data from these varied sources is a challenge.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>TERN uses controlled vocabularies to describe and represent all data-related artefacts. These include
platforms, sensors/instruments, observable properties, methods, people and organisations. Most
vocabularies are developed internally, while subsets representing platforms, instruments, and observed
properties are imported from authoritative external sources, including GCMD and CF metadata
conventions. Controlled vocabularies are essential digital assets of the TERN data infrastructure and are used
to describe data and related artefacts consistently. Hence, vocabulary development and management
are crucial to TERN's data management strategy for describing, indexing, and retrieving data-related artefacts.</p>
      <p>
        TERN develops vocabularies to achieve three key objectives: (1) Support consistent machine-readable
descriptions of digital artefacts including parameters, feature types, methods, platforms, instruments,
observable properties and measurement units; (2) Improve data discoverability through applications
like the TERN Data Discovery Portal [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], EcoPlots [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and EcoImages [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; (3) Facilitate interoperability
with other data systems through explicit data representations.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Implementation and Technical Architecture</title>
      <p>
        The TERN data infrastructure creates and curates two kinds of vocabularies: SKOS-based vocabularies
comprising concepts and concept schemes [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and instances of ontology classes defined in the TERN Ontology
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Most of the vocabularies are SKOS-based. However, when an entry represents an individual data artefact, we
describe it as an instance of a class. For example, most platforms, sensors, people and organisations
are ontology-based vocabularies.
      </p>
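      <p>As a minimal illustration of the two kinds of vocabulary entry, the sketch below represents RDF triples as plain Python tuples. The example.org IRIs, the Platform class IRI and the labels are hypothetical placeholders, not identifiers from the TERN vocabularies or the TERN Ontology; only the RDF, RDFS and SKOS namespace IRIs are real.</p>
      <preformat>
```python
# Sketch only: RDF triples as (subject, predicate, object) tuples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "https://example.org/"  # hypothetical namespace, not a TERN namespace

triples = [
    # A SKOS-based entry: a concept that belongs to a concept scheme.
    (EX + "parameter/canopy-cover", RDF_TYPE, SKOS + "Concept"),
    (EX + "parameter/canopy-cover", SKOS + "prefLabel", "canopy cover"),
    (EX + "parameter/canopy-cover", SKOS + "inScheme", EX + "scheme/parameters"),
    # An ontology-based entry: an instance of a class (class IRI is assumed).
    (EX + "platform/site-001", RDF_TYPE, EX + "ontology/Platform"),
    (EX + "platform/site-001", RDFS_LABEL, "Example monitoring site"),
]


def kind_of(subject, graph):
    """Classify a subject by its rdf:type triples."""
    types = {o for s, p, o in graph if s == subject and p == RDF_TYPE}
    if SKOS + "Concept" in types:
        return "skos-concept"
    return "class-instance" if types else "untyped"


print(kind_of(EX + "parameter/canopy-cover", triples))
print(kind_of(EX + "platform/site-001", triples))
```
      </preformat>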
      <p>
        Our semantic approach addresses common vocabulary management challenges: terminology
inconsistencies across projects, manual validation workflows, and limited machine readability [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It enables
automated data validation, federated search across datasets, and sustainable vocabulary preservation
through open services.
      </p>
      <p>
        The vocabulary infrastructure follows FAIR principles (Findable, Accessible, Interoperable, Reusable)
using established guidelines for machine-readable vocabulary development [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Our collection spans
ecological research activities with significant scale: 20 concept schemes covering different ecological
data aspects, 138 collections containing 12,603 concepts, 302 research platforms across Australia and New
Zealand, and metadata for 72 organisations. The largest concept scheme covers ecological parameters
with 6,470 concepts organised in hierarchies of up to 6 levels.
      </p>
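      <p>To make the notion of hierarchy levels concrete, the following sketch counts levels by walking broader relations upwards. It assumes each concept has at most one broader concept, which is a simplification (SKOS permits several), and the labels are hypothetical examples rather than entries from the parameter scheme.</p>
      <preformat>
```python
def depth(concept, broader):
    """Number of levels from a concept up to its top concept (top concept = 1)."""
    level = 1
    seen = {concept}
    while concept in broader:
        concept = broader[concept]
        if concept in seen:  # guard against accidental cycles
            raise ValueError("cycle in broader relations")
        seen.add(concept)
        level += 1
    return level


# Hypothetical labels; each key maps a concept to its single broader concept.
broader = {
    "canopy cover": "vegetation cover",
    "vegetation cover": "cover",
}

all_concepts = set(broader) | set(broader.values())
max_depth = max(depth(c, broader) for c in all_concepts)
print(max_depth)
```
      </preformat>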
      <p>The technical architecture (Figure 1) uses GraphDB as the RDF triple store for vocabulary storage
and retrieval. VocBench 3.0 provides the primary editing interface where ecologists create, modify, and
deprecate vocabulary terms through a collaborative workflow with automated SHACL validation and
editorial review processes. Approved changes are directly committed to GraphDB with named graph
versioning for release management.</p>
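      <p>The kind of check that automated validation performs can be sketched in plain Python. This is a simplified stand-in and not SHACL itself: a real deployment would express such rules as SHACL shapes and run them with a SHACL engine. The two rules shown (exactly one preferred label, at least one definition) and the example.org IRIs are assumptions for illustration, and labels are plain strings without language tags.</p>
      <preformat>
```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def validate_concept(iri, graph):
    """Return a list of human-readable violations for one concept."""

    def values(predicate):
        return [o for s, p, o in graph if s == iri and p == predicate]

    problems = []
    if len(values(SKOS + "prefLabel")) != 1:
        problems.append("expected exactly one skos:prefLabel")
    if not values(SKOS + "definition"):
        problems.append("missing skos:definition")
    return problems


# Hypothetical data: the first concept is valid, the second breaks both rules.
graph = [
    ("https://example.org/c/1", SKOS + "prefLabel", "ground cover"),
    ("https://example.org/c/1", SKOS + "definition", "Proportion of ground covered."),
    ("https://example.org/c/2", SKOS + "prefLabel", "cover"),
    ("https://example.org/c/2", SKOS + "prefLabel", "coverage"),
]

for iri in ("https://example.org/c/1", "https://example.org/c/2"):
    print(iri, validate_concept(iri, graph))
```
      </preformat>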
      <p>
        DUMA, a React-based application, handles people and organisation vocabularies through REST
APIs that write to GraphDB with integrated SHACL validation. Apache Airflow DAGs automate the
synchronisation of external vocabulary sources into our system. The TERN Linked Data viewer [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] provides
public access to all vocabularies, with its front end and back end supported by Prez [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which is
developed by KurrawongAI [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Approved versions are published to Research Vocabularies Australia
(RVA) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for broader community access through external APIs.
      </p>
      <p>For downstream applications, all vocabularies are indexed in Elasticsearch through scheduled Airflow
DAGs that maintain daily synchronisation across the infrastructure. When leveraging existing
vocabularies from sources like GCMD or CF standard names, we create local versions linked via
exactMatch relationships to retain control while preserving interoperability.</p>
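      <p>The local-copy pattern can be sketched as follows, again with triples as plain tuples. The local namespace, the helper names and the external IRI are hypothetical; the sketch only shows how a locally minted concept keeps a link back to its source through skos:exactMatch.</p>
      <preformat>
```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
LOCAL_NS = "https://example.org/local/"  # hypothetical local namespace


def localise(external_iri, label, slug):
    """Mint a local concept that mirrors an external one via skos:exactMatch."""
    local_iri = LOCAL_NS + slug
    return [
        (local_iri, SKOS + "prefLabel", label),
        (local_iri, SKOS + "exactMatch", external_iri),
    ]


def resolve_external(local_iri, graph):
    """Follow skos:exactMatch from a local concept back to its source IRIs."""
    return [o for s, p, o in graph if s == local_iri and p == SKOS + "exactMatch"]


# Hypothetical external IRI standing in for a GCMD or CF term.
graph = localise(
    "https://example.org/external/air_temperature",
    "air temperature",
    "air-temperature",
)
print(resolve_external(LOCAL_NS + "air-temperature", graph))
```
      </preformat>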
      <p>The system tracks editorial status through flags (Draft, Published, Under Revision, Deprecated) with
automated provenance documentation for all vocabulary modifications.</p>
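      <p>A status workflow of this kind can be sketched as a small state machine that records a provenance entry for every change. The four status names come from the text above; the allowed transitions, the record fields and the function name are assumptions made for illustration, not a description of the deployed system.</p>
      <preformat>
```python
from datetime import datetime, timezone

# Assumed transitions: the four statuses are given, the allowed moves are not.
ALLOWED = {
    "Draft": {"Published"},
    "Published": {"Under Revision", "Deprecated"},
    "Under Revision": {"Published", "Deprecated"},
    "Deprecated": set(),
}


def change_status(term, new_status, editor):
    """Apply a status change and append a provenance record, or raise ValueError."""
    current = term["status"]
    if new_status not in ALLOWED[current]:
        raise ValueError(f"{current} to {new_status} is not allowed")
    term["status"] = new_status
    term["provenance"].append(
        {
            "from": current,
            "to": new_status,
            "by": editor,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return term


term = {"label": "canopy cover", "status": "Draft", "provenance": []}
change_status(term, "Published", "editor@example.org")
print(term["status"], len(term["provenance"]))
```
      </preformat>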
    </sec>
    <sec id="sec-4">
      <title>4. Applications</title>
      <p>
        The vocabularies are used in multiple TERN applications. They drive data discovery in
the TERN Data Discovery Portal (TDDP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a gateway to access all TERN published data. The TDDP
enables users to search based on platforms, instruments, parameters, people and organisations.
Users can view all controlled vocabularies from each metadata record. EcoPlots [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a data integration
platform for site-based systematic surveys, uses controlled vocabularies to map source data to standard
vocabularies to enable harmonisation and integration. EcoImages [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], an image repository for
ecology-based image collections, uses vocabularies to map data sources to drive harmonisation and integration.
All TERN vocabularies are available through SHaRED (TERN data submission tool) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for data
librarians and researchers publishing datasets, with over 750 users from more than 70 organisations
using existing vocabularies in their data submissions and sometimes contributing vocabulary terms as
well. Vocabularies are integrated with the metadata authoring editor so that data librarians can tag data
with terms from the pre-defined lists. If none of the controlled lists is suitable, the tool enables them to create a
new term, which is then reviewed and published.
      </p>
      <p>
        All proposed vocabularies are available in RDF format and programmatically accessible for maximum
reuse. Several external organisations from Australia’s state and federal government agencies reuse
TERN feature types and observable properties vocabularies. The Biodiversity Data Repository (BDR)
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], developed in partnership with TERN, utilises TERN controlled vocabularies to represent data they
collect from industries and state and federal government agencies. Additionally, TERN has developed
controlled vocabularies for the Ecological Monitoring System Australia (EMSA) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] project, covering
various aspects of ecological field surveys. These EMSA vocabularies are employed in field survey
datasets to provide essential data context, with datasets hosted by the BDR as one of their primary
vocabulary systems. The vocabulary system has saved time by avoiding manual harmonisation of
different term labels across projects, with nearly 2,800 metadata records and 10 million field observations
now linked through standardised vocabularies. Government agencies leverage TERN vocabularies in
their data publications and representation, while researchers consistently provide feedback to improve
vocabulary-based data search and management. TERN vocabularies continue to be widely adopted
across ecosystem communities, contributing to standardised data practices.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>TERN’s semantic-enabled vocabulary system has significantly improved data representation,
discoverability and interoperability in multiple TERN applications. The vocabularies are also used by other systems run
by Australian federal and state government agencies. The machine-readable format enables consistent
descriptions of all digital artefacts and supports automated data validation and federated search. The
system and processes developed have significantly lowered the barrier for the community to represent
terms with their associated meaning.</p>
      <p>What made this work? Three things: getting domain scientists involved early to describe terms
and definitions, developing semantic-enabled systems and processes to manage vocabularies, and
automating quality checks with SHACL validation. Robust publication processes also helped, keeping
vocabularies available through open services and from multiple endpoints.</p>
      <p>Based on user feedback requesting better vocabulary discovery, including cross-domain examples
like agriculture, we plan to integrate Large Language Models (LLMs) into our vocabulary management
workflow to enhance semantic inference and search capabilities. This integration will enable more
intuitive vocabulary discovery, where users searching for broad concepts like “cover” can automatically
retrieve related terms such as “vegetation cover,” “ground cover,” and “canopy cover,” while searches for
specific terms like “vegetation cover” will surface semantically similar vocabularies including “plant
cover” and “forest cover.” The LLM-enhanced system will leverage natural language processing to
understand user intent and provide contextually relevant vocabulary recommendations.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT for grammar and spelling
checks and for paraphrasing and rewording. After using this tool/service, the author(s) reviewed and edited the
content as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. Z.</given-names>
            <surname>Amara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hemam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Djezzar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maimour</surname>
          </string-name>
          ,
          <article-title>Semantic web technologies for internet of things semantic interoperability</article-title>
          , in: Y. Maleh,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alazab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gherabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tawalbeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Abd El-Latif</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information, Communication and Cybersecurity</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Terrestrial</given-names>
            <surname>Ecosystem Research</surname>
          </string-name>
          <article-title>Network (TERN)</article-title>
          ,
          <source>TERN Data Discovery Portal</source>
          ,
          <year>2025</year>
          . URL: https://portal.tern.org.au/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Terrestrial</given-names>
            <surname>Ecosystem Research</surname>
          </string-name>
          <article-title>Network (TERN)</article-title>
          ,
          <source>EcoPlots</source>
          ,
          <year>2025</year>
          . URL: https://ecoplots.tern.org.au/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Terrestrial</given-names>
            <surname>Ecosystem Research</surname>
          </string-name>
          <article-title>Network (TERN)</article-title>
          ,
          <source>EcoImages</source>
          ,
          <year>2025</year>
          . URL: https://ecoimages.tern.org.au/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miles</surname>
          </string-name>
          , S. Bechhofer,
          <article-title>SKOS simple knowledge organization system reference</article-title>
          ,
          <source>W3C Recommendation</source>
          ,
          <year>2009</year>
          . URL: https://www.w3.org/TR/skos-reference/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Terrestrial</given-names>
            <surname>Ecosystem Research</surname>
          </string-name>
          <article-title>Network (TERN)</article-title>
          ,
          <source>TERN Ontology</source>
          ,
          <year>2025</year>
          . URL: https://github.com/ternaustralia/ontology_tern.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Muri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pulieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Raho</surname>
          </string-name>
          , et al.,
          <article-title>Assessing semantic interoperability in environmental sciences: variety of approaches and semantic artefacts</article-title>
          ,
          <source>Scientific Data</source>
          <volume>11</volume>
          (
          <year>2024</year>
          )
          <article-title>1055</article-title>
          . URL: https://doi.org/10.1038/s41597-024-03669-3. doi:10.1038/s41597-024-03669-3.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. J. D.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gonzalez-Beltran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magagna</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-C. Marinescu</surname>
          </string-name>
          ,
          <article-title>Ten simple rules for making a vocabulary FAIR</article-title>
          ,
          <source>PLOS Computational Biology</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . URL: https://doi.org/10.1371/journal.pcbi.1009041. doi:10.1371/journal.pcbi.1009041.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Terrestrial</given-names>
            <surname>Ecosystem Research</surname>
          </string-name>
          <article-title>Network (TERN)</article-title>
          ,
          <source>TERN Linked Data</source>
          ,
          <year>2025</year>
          . URL: https://linkeddata.tern.org.au/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>KurrawongAI</surname>
          </string-name>
          , Prez,
          <year>2025</year>
          . URL: https://github.com/RDFLib/prez.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] KurrawongAI, KurrawongAI,
          <year>2025</year>
          . URL: https://kurrawong.ai/.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] Australian Research Data Commons, Research Vocabularies Australia,
          <year>2025</year>
          . URL: https://vocabs.ardc.edu.au/.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Terrestrial</given-names>
            <surname>Ecosystem Research</surname>
          </string-name>
          <article-title>Network (TERN)</article-title>
          ,
          <source>TERN SHaRED Data Submission Tool</source>
          ,
          <year>2024</year>
          . URL: https://shared.tern.org.au/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <article-title>Australian Government Department of Climate Change, Energy, the Environment and Water</article-title>
          ,
          <source>Biodiversity Data Repository</source>
          ,
          <year>2025</year>
          . URL: https://bdr.gov.au/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Terrestrial</given-names>
            <surname>Ecosystem Research</surname>
          </string-name>
          <article-title>Network (TERN)</article-title>
          ,
          <source>Ecological Monitoring System Australia</source>
          ,
          <year>2025</year>
          . URL: https://emsa.tern.org.au/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>