=Paper=
{{Paper
|id=Vol-3743/paper1
|storemode=property
|title=Constructing a Knowledge Graph of Historical Mining Data
|pdfUrl=https://ceur-ws.org/Vol-3743/paper1.pdf
|volume=Vol-3743
|authors=Basel Shbita,Namrata Sharma,Binh Vu,Fandel Lin,Craig A. Knoblock
|dblpUrl=https://dblp.org/rec/conf/geold/ShbitaSVLK24
}}
==Constructing a Knowledge Graph of Historical Mining Data==
Basel Shbita∗, Namrata Sharma, Binh Vu, Fandel Lin and Craig A. Knoblock

Information Sciences Institute, University of Southern California, Marina del Rey, California, USA

===Abstract===
The interpretation and analysis of historical mining data pose significant challenges due to its heterogeneous and scattered nature across various archival sources. This data is crucial for identifying new sources of critical minerals, understanding past resource utilization, and aiding in future project development, necessitating spatial, temporal, and semantic integration for comprehensive domain insights. In this paper, we detail our methodology for constructing, modeling, and semantically enriching a knowledge graph (KG) centered on historical mining data. Leveraging a custom ontology and semantic web technologies, we transform digitized archival records into a temporally and spatially aware, semantically rich KG. The resulting KG facilitates advanced temporal and spatial analyses through SPARQL queries, and enhances semantic richness by linking to additional data on the web. We demonstrate the application of our KG in the nuanced analysis of historical mining data and the generation of grade and tonnage models for two critical minerals: nickel and zinc. Our evaluation highlights the KG’s effectiveness in spatial and temporal interpretation of mining data, underscores the strengths of our entity linking method with an open knowledge base, and details the performance analysis of query execution. We also make the resulting KG available as open linked data.

'''Keywords:''' knowledge graphs, geospatial linked data, data semantics, mineral mining data, semantic web

===1. Introduction===
Understanding historical mining data is a pursuit of geoscience research and a necessity for informed decision-making in resource management and environmental conservation.
The ability to accurately analyze and interpret this data helps identify new sources of critical minerals and illuminates past resource utilization activities. With the increasing demand for mineral resources [1, 2], there is a growing need to draw upon historical mining data to make informed decisions about current and future mining projects.

Historical mining data is often heterogeneous in nature, existing in varied forms and scattered across numerous archival sources. In many cases, Subject Matter Experts (SMEs) and organizations such as the United States Geological Survey (USGS) are pivotal in organizing these data. They bring indispensable knowledge to critical tasks, such as mineral assessments [3]. However, historical mining data is scattered across multiple sources, ranging from quantitative ore details in mine reports to spatial layers in existing databases. It lacks structured organization and suffers from issues of quality, accuracy, and completeness [4, 5].

6th International Workshop on Geospatial Linked Data, co-located with the 21st Extended Semantic Web Conference (ESWC 2024), May 26–30, 2024, Hersonissos, Greece. ∗ Corresponding author. Email: shbita@isi.edu (B. Shbita); nsharma4@isi.edu (N. Sharma); binhvu@isi.edu (B. Vu); fandelli@isi.edu (F. Lin); knoblock@isi.edu (C. A. Knoblock). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Knowledge graphs (KGs) are a popular way to represent information so that it can be easily interpreted by both humans and machines. For transforming historical records into structured, queryable formats, KGs offer an effective solution: they combine the expressivity, interoperability, and standardization of the semantic web stack, and thus provide a strong foundation for querying and analysis.
The evolution and significance of the integration of geospatial data on the web and its extension to linked data has been extensively discussed by Janowicz et al. [6]. Recent technological advances [7] have greatly facilitated the integration of geospatial data from historical archives and maps, transforming these diverse datasets into structured, coherent knowledge bases [8, 9, 10, 11, 12]. A similar transformation is useful in addressing the challenges of mining data analysis, as it enables the consolidation of disparate data sources into a single, accessible, and simple linked data representation: a KG that can be materialized into triples, i.e., Resource Description Framework (RDF) data.

To address this task, this paper presents a methodology for the construction, modeling, and augmentation of a KG dedicated to historical mining data. Our approach involves creating a KG from diverse data sources, formulating a custom domain ontology (semantic model) to represent the data, and utilizing open knowledge from the web to enrich and contextualize data about mineral commodities. Moreover, by integrating additional attributes from geospatial databases, such as MRDS [13], we support spatial queries and visualizations pertaining to specific geo-locations, allowing the development of dedicated downstream applications.

The resulting KG unlocks significant value in generating grade and tonnage plots for mineral resources and commodities. These plots are pivotal, illustrating the relationship between the grade (mineral content) of a deposit and the available tonnage (quantity of ore), which are crucial for assessing the economic viability of mining projects and for accurate reporting of mineral resources and reserves. Figure 1 showcases a grade-tonnage model derived from our KG, highlighting the technology’s ability to support sophisticated analyses and facilitate the extraction of domain insights with ease.

Figure 1: Grade-tonnage model of nickel mineral deposits built from a KG query (SPARQL) response, categorized by their Critical Minerals Mapping Initiative (CMMI) deposit classification. Specific sites are marked to illustrate the variability in grade and tonnage among these deposit types.

Through a well-crafted SPARQL [14] query, grade and tonnage data for various minerals can be efficiently retrieved from the KG. SPARQL is a powerful RDF query language and protocol designed for querying and manipulating RDF data (triples) in the KG, enabling users to precisely extract and analyze information based on specific criteria and the most current data. Furthermore, the KG’s flexible structure allows users to tailor queries to specific minerals, geographic areas, or time frames, showcasing its adaptability and comprehensive support for the economic assessments of mining projects.

We evaluate the application of our KG through a detailed analysis of historical mining sites, focusing on two critical minerals: zinc and nickel. Our evaluation demonstrates the KG’s ability to generate grade and tonnage plots using SPARQL queries to visualize this data. It highlights the flexibility and robustness of our system in handling complex queries across multiple dimensions, for example by aggregating mineral sites by their past resource classifications or geochemical deposit classification, or by restricting them to specific geographic regions. Additionally, we conduct a rigorous evaluation of our entity linking method, demonstrating its effectiveness in accurately matching commodities to their corresponding entities in an external knowledge base.

We list the contributions of this paper as follows:
# We present a pipeline for the integration of extracted quantitative, spatial, and semantic information from historical mining data archives, resulting in an integrated KG.
# We introduce a method to identify and retrieve instances of a given type from a publicly available KG, specifically to match commodities against entities in open linked data.
# We assess the applicability of the resulting KG by designing queries to automatically generate grade-tonnage models for two critical minerals: zinc and nickel.
# We make the resulting KG publicly available in the form of linked data (queryable RDF via a SPARQL endpoint).<sup>1</sup>

<sup>1</sup> https://minmod.isi.edu/sparql

The rest of the paper is organized as follows: Section 2 outlines our comprehensive methodology for constructing, linking and augmenting the KG, detailing our entity linking process and the integration of diverse data sources to populate the KG. Section 3 presents our evaluation framework, employing real-world datasets focused on two critical minerals. Section 4 describes related work. Finally, Section 5 concludes the paper by summarizing our findings and outlining future work.

===2. Constructing the Knowledge Graph===
This section describes the methodological framework for constructing the KG for historical mining data. Our methodology is characterized by the formulation of a unique semantic model (Section 2.1), linking with external linked data on the web (Section 2.2), and finally the materialization of the data into a KG (Section 2.3). Each component is meticulously planned to ensure the integration of semantic relationships, spatial data, and temporal dimensions, thereby facilitating a robust analysis of historical and contemporary mining data.

====2.1. Defining the Semantic Model====
Central to the semantic enrichment and structural integrity of our KG is developing a custom ontology and semantic model tailored to the unique characteristics and relationships inherent to historical mining data. This model is designed to capture the domain-specific attributes and complex relationships essential for accurately representing mining information about mineral commodities, such as mineral sites and inventories.
Utilizing RDF as a foundational framework, our approach leverages its structural flexibility and suitability for representing diverse metadata forms, including using existing standards that adhere to universally accepted conventions. This is particularly beneficial for supporting spatial queries, a capability enhanced by integrating the OGC GeoSPARQL standard [15]. The use of this standard enriches our model by providing a vocabulary for representing geospatial data on the web, facilitating qualitative spatial reasoning and quantitative spatial computations.

As illustrated in Figure 2, the semantic model delineates the primary entities and their relationships. The model identifies :MineralSite entities, characterized by spatial information (:LocationInfo) encoded using the GeoSPARQL namespace and Well-Known Text (WKT) notation for describing geometries. Mineral deposits, representing natural occurrences of minerals, are linked to mineral sites. Each site may contain multiple :MineralInventory items, representing the quantity (:Ore) and grade (:Grade) of commodities (:Commodity) present. The model allows for aggregating these commodities by their :ResourceCategory (an enumeration from a predefined list, e.g., indicated, measured, inferred), and for associating each site with a specific :DepositType, which is also selected from a predefined list, according to the CMMI standards [16].

Figure 2: Semantic model of the mining data structure. Cardinalities (in blue) show one-to-many node relationships. Circular nodes represent instances, rectangular nodes represent literals (or enumerations). The “:” denotes our namespace.

Including missing information or expert inputs on deposit types underscores the KG’s ability to facilitate data classification and aggregation by spatial or attribute data. Each inventory item is associated with a :Reference to :Document provenance, ensuring data veracity and enabling effective interaction with the KG.
The owl:sameAs property enhances the integration with external sources (e.g., geoKB) and within our KG (:MineralSite instances). Furthermore, inventory items are tagged with dcterms:date properties to indicate temporal aspects, adhering to the Dublin Core Metadata Initiative and W3C recommendations, enhancing the model’s comprehensiveness.

====2.2. Entity Linking with geoKB====
Our approach to enriching the KG with links to additional sources involves a simple entity linking process with geoKB<sup>2</sup>, the Geoscience Knowledge Base developed by the USGS. This process is pivotal in enriching our KG with validated and scientifically relevant data, leveraging the extensive earth systems science portfolio within geoKB. Figure 3 demonstrates how the nickel mineral is depicted in geoKB, highlighting the depth of linked data that enriches our KG with details on mineral species and their historical classifications.

Figure 3: Illustration of the nickel mineral species in geoKB, showcasing the depth of information our entity linking method accesses, enriching our KG with metadata from external sources.

At the heart of our methodology is a constrained search within geoKB, using SPARQL queries designed to find commodities classified under certain instance types. This approach narrows the scope of potential matches to enhance their relevance. Listing 1 presents a query aimed at fetching candidate mineral commodity instances of nickel from geoKB, marking the first step in our entity linking process. The query filters for entities that are instances of (P1) mineral commodities (Q406), as seen in line 3 (where gkbt and gkbi represent namespaces for predicates and instances within geoKB, respectively). The FILTER clause (line 4) conducts a case-insensitive search to align the entity labels with the commodity string in question, exemplifying our strategy for extracting semantically related instances.
<pre>
1 SELECT ?entity ?entityLabel WHERE {
2   ?entity rdfs:label ?entityLabel.
3   ?entity gkbt:P1 gkbi:Q406. # instance of mineral commodity
4   FILTER(CONTAINS(LCASE(?entityLabel), "nickel")) }
</pre>
Listing 1: A SPARQL query example targeting the nickel mineral commodity in geoKB, serving as a foundational step for entity linking.

Following retrieval, we apply the Jaccard [17] similarity measure for set-based comparison between the commodity strings and geoKB entity labels. This process, based on the intersection over the union of the derived sets, helps determine the top instance among the candidates. The selected instance is then leveraged to enrich our KG, infusing it with additional semantics and metadata from sources like Wikidata [18], the Geoscience Ontology [19], and geoKB itself. Consequently, our KG is augmented with comprehensive information on mineral species and historical mineral classifications, among other data, thereby enhancing its semantic richness and inter-connectedness.

<sup>2</sup> https://geokb.wikibase.cloud/

====2.3. Transforming the Data into Triples====
Our methodology for populating the KG leverages data sourced from semi-structured (tabular format) and structured (JSON files) data related to mining activities and mineral commodities. The initial step in the KG construction involves meticulous data cleaning and normalization, including entity deduplication and URI (Uniform Resource Identifier) mapping to ensure each entity, such as a mineral site, inventory, or document, is uniquely identifiable and accessible on the web. URIs are generated using a hash function (e.g., MD5) to create a de-referenceable URI from the unique combination of the defining attributes of an instance, including feature type, location data, and temporal information. For instance, a :MineralInventory instance’s URI is constructed using a concatenation of its commodity URI, category, the referring mineral site URI, and any associated document URIs.
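
The URI construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the base namespace `BASE_NS`, the helper names `mint_uri` and `inventory_uri`, the field separator, and the sorting of document URIs are all hypothetical choices; only the use of a hash such as MD5 over the concatenated defining attributes comes from the text.

```python
import hashlib
from typing import Iterable

# Hypothetical base namespace; the paper does not state the real URI prefix.
BASE_NS = "https://example.org/kg/"


def mint_uri(kind: str, *parts: str) -> str:
    """Derive a deterministic URI from an entity's defining attributes."""
    # Strip surrounding whitespace so trivially different inputs hash alike.
    cleaned = [p.strip() for p in parts]
    # Join with a control character (an assumption) so that ("ab", "c")
    # and ("a", "bc") do not collapse into the same key.
    key = "\x1f".join([kind, *cleaned])
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{BASE_NS}{kind}/{digest}"


def inventory_uri(commodity_uri: str, category: str, site_uri: str,
                  document_uris: Iterable[str]) -> str:
    """URI for an inventory item: commodity, category, site, and documents."""
    # Sorting the document URIs (an assumption) makes the result independent
    # of the order in which sources were listed.
    return mint_uri("inventory", commodity_uri, category, site_uri,
                    *sorted(document_uris))
```

Because the URI is a pure function of the defining attributes, re-running the transformation on the same record yields the same identifier, which is what makes deduplication by URI possible. MD5 is used here only as an identifier hash, not for any security purpose.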
A similar approach is taken for :MineralSite and :Document URIs, using their respective source identifiers and bibliographic data. In our KG, entities such as commodities and deposit types are linked to external knowledge bases or predefined lists, like the CMMI, while other entities are represented as blank nodes within our namespace.

The transformation of data into RDF triples is facilitated by automated tools like D-REPR [20] and SAND [21], ensuring seamless integration and update of information. Geospatial data are materialized using dedicated namespaces, notably GeoSPARQL, which augments the KG’s capacity for spatial analysis. Additionally, the ontology supports OWL [22] for representing complex inter-entity relationships and attributes. A validation layer, crucial for maintaining data integrity and consistency, is implemented through an automated system using SHACL [23]. This validation ensures that our KG not only accurately represents the data but also adheres to the predefined semantic model (Section 2.1), enabling reliable and sophisticated queries.

===3. Evaluation and Discussion===
Our evaluation framework includes qualitative and quantitative analyses over a KG built from a dataset covering two mineral commodities, focusing on the KG’s adherence to the semantic model and completeness (Section 3.1), entity linking with geoKB (Section 3.2), and its utility and performance in advanced data analysis (Section 3.3), particularly in generating grade and tonnage models.

The extensive dataset we use covers over 50 NI 43-101 technical reports (International Strategic Mineral Inventory reports) on nickel (2001 to 2021) and zinc (2002 to 2019), supplemented by spatial data from the MRDS (Mineral Resources Data System) and USMIN (US Mineral Deposit Database) databases.
Sources such as Mudd and Jowitt’s compiled work on zinc from 2017 [24], and on nickel from 2022 [25], provide additional data and extensive coverage on various commodities as well, offering a rich blend of geospatial, geological, and economic data from various global locations.

====3.1. Evaluation on the Semantic Model====
Our evaluation confirms the KG’s adherence to the semantic model, reflecting accurate domain representation in compliance with RDF standards for enhanced query performance and data interoperability. Our resulting KG characteristics are described in Table 1. The resulting KG hosts a significant number of instances and blank nodes, suggesting a rich network of connected data.

Diving deeper, the KG encapsulates vital entities including :MineralSite, :MineralInventory, :Commodity, :DepositType, and :Reference. The entities form a network that reflects the complex interactions and interplay between various facets of mining data. For instance, :MineralSite entities are geospatially positioned through :LocationInfo relationships, while each site can encompass multiple :MineralInventory records, detailing the reserves in terms of quantity and grade for each :Commodity. Moreover, the KG adheres to CMMI standards for deposit classification and manages data provenance through :Reference links to detailed source citations, as expected.

{| class="wikitable"
|+ Table 1: Historical mining data knowledge graph characteristics.
! Characteristic !! Count
|-
| Total Triples || 2,397,708
|-
| Distinct Classes || 16
|-
| Instances (Non-literals) || 226,267
|-
| Geospatial Instances || 2,884
|-
| Blank Nodes || 1,518,981
|}

The granular representation of the data in the resulting graph, with 1,112 zinc and 1,132 nickel reserve and resource measurements, alongside 3,809 zinc and 2,021 nickel mineral site instances, demonstrates the KG’s quantitative depth and utility. This structured model captures key attributes, ensuring interoperability and alignment with semantic data representation best practices.
The approach we present is complete and follows linked data principles by:
* Generating URIs as names for things, without modifying previously published identifiers
* Maintaining existing relations (predicates) between instances (“backward compatibility”)
* Generating machine-readable structured data
* Using standard namespaces and semantics (e.g., OWL, Dublin Core, GeoSPARQL)
* Linking to additional resources on the web (e.g., geoKB)

====3.2. Evaluation on Entity Linking====
To evaluate the effectiveness of our proposed entity linking method, we used a dataset of 135 extracted commodity labels mapped by human experts to geoKB. This dataset serves as a benchmark for determining the success of our entity linking method. To provide a comprehensive evaluation, we contrasted our approach against three distinct baseline methods within geoKB. The first two baselines involve a generalized string search strategy with SPARQL, followed by label comparison based purely on textual relevance: one uses the Jaro [26] string similarity measure and the other the Jaccard measure. The third baseline adopts a constrained (instance-based) search strategy combined with the Jaro similarity measure. It is similar in part to our proposed method, refining the search scope yet still leveraging Jaro’s well-regarded efficiency in measuring string similarities for short texts, such as names.

This multifaceted comparison shows substantial gains in matching accuracy with our proposed method (instance-based constrained search followed by the Jaccard similarity measure for set-based comparisons) over the baselines. Specifically, our method has demonstrated significant performance enhancements, showcasing the advantage of a constrained search strategy combined with the precision of Jaccard similarity.
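
To make the ranking step concrete, the following Python sketch scores candidate labels against a commodity string with Jaccard similarity (intersection over union of the derived sets) and sorts them. It is a minimal sketch under assumptions: the paper does not say how the sets are derived, so lowercase word tokens are used here, and the function names and the candidate identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tokenize(label: str) -> Set[str]:
    """Turn a label into a set of lowercase word tokens (an assumed choice)."""
    return set(label.lower().replace("-", " ").split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A intersect B| / |A union B| over the token sets."""
    set_a, set_b = tokenize(a), tokenize(b)
    if not set_a and not set_b:
        return 0.0  # two empty labels carry no evidence of a match
    return len(set_a & set_b) / len(set_a | set_b)


def rank_candidates(commodity: str,
                    candidates: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank candidate entities (id -> label) by similarity to the commodity."""
    scored = [(entity_id, jaccard(commodity, label))
              for entity_id, label in candidates.items()]
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with the hypothetical candidates `{"Q1": "Nickel", "Q2": "nickel sulfate"}`, calling `rank_candidates("nickel", ...)` places `Q1` first, because its token set equals that of the query while `Q2` shares only one of two tokens. In the described method the candidates would come from the instance-constrained geoKB query rather than a hand-written dictionary.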
This approach not only refines the selection of potential matches but also ensures a high degree of textual similarity between the commodity strings and the linked geoKB entities. The results are summarized in Table 2.

{| class="wikitable"
|+ Table 2: Evaluation results for the entity linking experiments with geoKB.
! Method !! MRR !! Hits@1 !! Hits@3 !! Hits@5
|-
| String search, then Jaro || 0.557 || 0.459 || 0.659 || 0.659
|-
| String search, then Jaccard || 0.648 || 0.637 || 0.659 || 0.659
|-
| Instance search, then Jaro || 0.801 || 0.689 || 0.926 || 0.956
|-
| Instance search, then Jaccard (proposed) || 0.940 || 0.904 || 0.978 || 0.978
|}

The results underscore the superior performance of our proposed entity linking approach, with a Mean Reciprocal Rank (MRR) of 0.940 and Hits@1, Hits@3, and Hits@5 rates of 0.904, 0.978, and 0.978, respectively. These metrics notably surpass those of the baseline methods, thereby affirming the utility of combining constrained search with the Jaccard similarity measure. The MRR value, being substantially higher than that of the baselines, indicates that our method consistently identifies the most relevant geoKB entity at the top rank. The high Hits@1 value signifies that the correct entity is identified as the top match in a significant majority of cases, a critical metric for applications relying on precision. Similarly, the near-perfect Hits@3 and Hits@5 scores suggest that if the top match is not the exact entity, it is highly likely to be within the top 3 or 5 candidates, offering a valuable safety net for ensuring data quality in the KG. These results collectively justify our method’s design, which meticulously tailors the search and comparison phases to optimize for both accuracy and relevance in entity linking.

====3.3. Evaluation on Querying the KG====
To assess the query performance, utility, and effectiveness of our KG in extracting relevant information for creating grade-tonnage models, we executed a series of SPARQL queries.
These queries, aimed at testing the KG’s ability to retrieve grade and tonnage data under diverse constraints and scenarios, were conducted using RDF triples hosted on Apache Jena<sup>3</sup>, a lightweight and programmable environment with geospatial query support. The baseline query, shown in Listing 2, is crucial for fetching grade and tonnage data along with site and inventory identifiers, and serves as the foundation for developing more complex queries. By building on this foundational query, we introduced three distinct constraint types in our subsequent queries (textual, temporal, and spatial), thereby not only testing the KG’s flexibility in meeting varied query requirements but also showcasing its capability to provide precise and contextually relevant data across different analytical dimensions.

<pre>
SELECT ?ms ?mi ?ms_name ?mi_cat ?ore ?grade
WHERE {
  ?ms :mineral_inventory ?mi .
  OPTIONAL { ?ms rdfs:label|:name ?ms_name . }
  ?mi :category ?mi_cat .
  ?mi :ore [ :ore_value ?ore;
             :ore_unit ?ore_unit] .
  ?mi :grade [ :grade_value ?grade;
               :grade_unit ?grade_unit] . }
</pre>
Listing 2: Baseline SPARQL query for grade and tonnage data.

The first type of query retrieves grade and tonnage data for all inventories associated with a specified commodity. It demonstrates the KG’s capability to filter data based on commodity type, which is essential for users interested in specific mineral insights. Listing 3 shows the added clause needed to retrieve entries for a given mineral commodity name, nickel in the example.

<pre>
?mi :commodity/:name "nickel"@en .
</pre>
Listing 3: SPARQL clause for filtering by commodity type: this clause filters inventory items to retrieve data specific to the nickel commodity, demonstrating how to tailor queries for a particular mineral.

The second type of query retrieves grade and tonnage data under a temporal constraint on the provenance of the documents from which the data originated. This query filters grade and tonnage data based on the publication date of the source documents. Such a query is useful for researchers interested in how grade and tonnage estimates have changed over time, or who want to analyze data within a specific timeframe. Listing 4 shows the added clause needed to retrieve inventory items pertaining to specific time ranges. In this case we are fetching inventories from documents published between 2000 and 2010.

<sup>3</sup> https://jena.apache.org/

<pre>
?mi :reference/:document [ dcterms:date ?date ] .
FILTER(?date >= "2000"^^xsd:gYear && ?date <= "2010"^^xsd:gYear) .
</pre>
Listing 4: SPARQL clause for temporal filtering: this clause applies a temporal filter to select inventory items based on their document’s publication year between 2000 and 2010, showcasing the KG’s ability to analyze historical data over a specific time range.

Utilizing Apache Jena’s support for GeoSPARQL, the third query retrieves tonnage data from inventories at mineral sites within a certain distance from a given point. It exemplifies the KG’s spatial querying capabilities, which are crucial for geographical analyses and decision-making. Listing 5 shows the added clause needed to retrieve mineral sites, with inventory items, that are within a specific distance from a given point in WKT format. In this example we are searching for mines that are within 500 miles of the given coordinates.

<pre>
?ms :location_info/:location ?loc_wkt .
FILTER(geof:distance(?loc_wkt, "POINT(-118.57 47.56)"^^geo:wktLiteral, unit:mile) < 500)
</pre>
Listing 5: SPARQL clause for spatial proximity filtering: this clause leverages GeoSPARQL to find mineral sites within a 500-mile radius of a specified point, exemplifying spatial querying capabilities for geographical analysis.
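
The way the three constraint clauses extend the baseline query can be sketched as simple string composition in Python. This is an illustrative sketch only: the helper names (`build_query`, `commodity_clause`, `temporal_clause`, `spatial_clause`) are hypothetical, PREFIX declarations are omitted just as in the listings, and values are interpolated without escaping, which a real client would need to handle.

```python
# Triple patterns of the baseline query (Listing 2), one per entry.
BASELINE_PATTERNS = [
    "?ms :mineral_inventory ?mi .",
    "OPTIONAL { ?ms rdfs:label|:name ?ms_name . }",
    "?mi :category ?mi_cat .",
    "?mi :ore [ :ore_value ?ore ; :ore_unit ?ore_unit ] .",
    "?mi :grade [ :grade_value ?grade ; :grade_unit ?grade_unit ] .",
]


def commodity_clause(name: str) -> str:
    """Textual constraint (Listing 3): restrict to one commodity name."""
    return f'?mi :commodity/:name "{name}"@en .'


def temporal_clause(start_year: int, end_year: int) -> str:
    """Temporal constraint (Listing 4): document year within a closed range."""
    return ("?mi :reference/:document [ dcterms:date ?date ] .\n  "
            f'FILTER(?date >= "{start_year}"^^xsd:gYear && '
            f'?date <= "{end_year}"^^xsd:gYear)')


def spatial_clause(lon: float, lat: float, miles: float) -> str:
    """Spatial constraint (Listing 5): sites within a distance of a point."""
    return ("?ms :location_info/:location ?loc_wkt .\n  "
            f'FILTER(geof:distance(?loc_wkt, "POINT({lon} {lat})"'
            f"^^geo:wktLiteral, unit:mile) < {miles})")


def build_query(*extra_clauses: str) -> str:
    """Append any number of constraint clauses to the baseline patterns."""
    body = "\n  ".join(BASELINE_PATTERNS + list(extra_clauses))
    return ("SELECT ?ms ?mi ?ms_name ?mi_cat ?ore ?grade\n"
            "WHERE {\n  " + body + "\n}")
```

Calling `build_query(commodity_clause("nickel"), temporal_clause(2000, 2010))` yields the baseline query narrowed to nickel inventories from documents dated 2000 to 2010. Keeping each constraint as an independent clause mirrors how the listings are presented: any combination can be appended without touching the baseline.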
The geof and unit namespaces are standard namespaces utilized for specifying distance measurements and units, respectively.

Table 3 presents a summary of the query-time performance, including average, minimum, and maximum times, effectively showcasing the efficiency of our KG when operating under various query constraints. This efficiency is underscored by the execution of hundreds of similar queries across a diverse range of values for each constrained scenario, further demonstrating the robustness and adaptability of our system in handling retrieval tasks.

{| class="wikitable"
|+ Table 3: Query time statistics (in milliseconds).
! Query Constraint Type !! Avg !! Min !! Max
|-
| Textual || 450 || 369 || 649
|-
| Temporal/Numeric || 438 || 388 || 607
|-
| Spatial || 708 || 501 || 811
|}

The results outlined in Table 3 showcase the KG’s performance across different query constraints, with query times measured in milliseconds. The average query time for textual constraints was notably efficient at 450 ms, reflecting the rapid response to straightforward textual searches. Temporal queries, with an average time of 438 ms, highlight the KG’s adept handling of quantitative and temporal data retrieval, facilitating temporal analysis. Spatial queries, while more computationally intensive due to the nature of geospatial data processing, still performed admirably with an average time of 708 ms. This demonstrates the system’s capacity to efficiently manage spatial reasoning tasks, a crucial aspect for mining data analysis where geographical context is vital.

These results are not only indicative of the KG’s robust performance but also validate our methodological choices and architecture. The swift response times, especially for spatial queries, are a testament to the efficiency of integrating GeoSPARQL and our custom semantic model, facilitating advanced spatial analyses.
Furthermore, the accurate retrieval of information across all query types confirms the KG’s utility in supporting complex queries for critical tasks such as generating grade-tonnage models, as we demonstrate in Figure 1.

The application of SPARQL queries against our KG exemplifies the invaluable insights gained from the fusion of semantic web technology with spatial visualization techniques, enabling the straightforward interpretation of otherwise complex geographic data. For example, Figure 4 presents a detailed visualization of nickel mineral sites across the United States, categorized according to CMMI standards and overlaid on a topographic map, demonstrating the expansive coverage of our KG. By integrating this classification with other geospatial data, such as geological formations and stratigraphy information, we can significantly enhance the multidimensional analytical capabilities available to SMEs, allowing for predictive modeling of mineral potential and helping to identify unexplored areas with high resource prospects.

Figure 4: Nickel mineral sites in the US by CMMI classification on a topographic map background. Example of a spatial visualization based on data derived from the KG and retrieved via SPARQL, showcasing nickel mine distribution.

By structuring historical and current mining data within a KG, we enable powerful query capabilities through SPARQL, facilitating the retrieval and representation of complex data sets easily and quickly. The query results above establish high confidence in our model, showing that we can easily and effectively answer complex queries in a robust manner. Furthermore, the integration of our commodity data with geoKB enhances our KG’s utility by enabling federated SPARQL queries, which allow us to fetch additional data from external sources such as Wikidata. This capability significantly broadens the scope of our analysis, providing access to a wealth of information that complements our existing datasets.
Overall, we demonstrated that our approach and the proposed pipeline can be effectively used to automatically construct effective and contextualized open KGs and linked data from historical and contemporary mining data, as well as support both temporal and spatial analysis.

===4. Related Work===
Recent advancements in geology and earth science data analysis have been significantly propelled by the application of machine learning techniques, which have enabled the enhancement of data mining and extraction for geology and mineral data [27]. These developments have shown considerable promise in various applications, ranging from prospectivity mapping to knowledge organization in the natural sciences [28, 29]. However, the full utilization of semantic and spatial relationships in historical mining data remains largely underexplored, indicating a gap in the current research landscape.

In the domain of geoscientific knowledge graphs, our work complements existing knowledge bases such as geoKB and the Geoscience Ontology [19], by addressing the nuanced gaps in the semantic enrichment and spatial analysis of historical mining data, areas often overlooked in the broader context of such applications. This gap presents a unique opportunity to contribute to the field by leveraging semantic web technologies with spatial and temporal data analysis to enrich our understanding of historical mining activities and their implications for contemporary and future mining endeavors.

===5. Conclusion and Future Work===
In this paper, we introduce a comprehensive approach for constructing, modeling, and enriching a Knowledge Graph (KG) that captures the spatial and temporal dynamics, along with the complex semantic relationships, within historical mining data. Our approach enriches our understanding of historical mining operations and resource utilization and provides invaluable insights for academic research and practical applications in the mining industry.
Looking ahead, the continued development of our KG opens several promising avenues for further exploration. Plans are underway to integrate a broader range of data covering additional critical minerals, along with more diverse historical datasets, which should expand the analytical capabilities of our KG. Furthermore, integrating advanced machine learning algorithms with our KG could reveal novel insights from historical data and inform the mining sector’s strategic planning and operational efficiency. Further enhancements could also improve the semantic enrichment process to increase the accuracy and relevance of the extracted and linked information, providing more detailed insights into the complex history of mining data. Additionally, applying our methodologies to other historical contexts, such as assessing environmental impacts, represents a promising direction for extending the impact of our work beyond the mining domain.

Acknowledgments

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112390132 and Contract No. 140D0423C0093. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA); or its Contracting Agent, the U.S. Department of the Interior, Interior Business Center, Acquisition Services Directorate, Division V. We thank Dr. Graham W. Lederer (United States Geological Survey) and Dr. Simon M. Jowitt (University of Nevada Reno, Nevada), who provided insights and expertise in geology that greatly assisted the research.

References

[1] K. J. Schulz, Critical mineral resources of the United States: economic and environmental geology and prospects for future supply, Geological Survey, 2017.
[2] S. M. Fortier, N. T. Nassar, G. W. Lederer, J. Brainard, J. Gambogi, E. A. McCullough, Draft critical mineral list—Summary of methodology and background information—US Geological Survey technical input document in response to Secretarial Order No. 3359, Technical Report, US Geological Survey, 2018.
[3] C. J. Green, G. W. Lederer, H. L. Parks, M. L. Zientek, Grade and tonnage model for tungsten skarn deposits—2020 update, Technical Report, US Geological Survey, 2020.
[4] W. C. Day, The Earth Mapping Resources Initiative (Earth MRI): Mapping the Nation’s critical mineral resources, Technical Report, US Geological Survey, 2019.
[5] A. H. Hofstra, V. Lisitsin, L. Corriveau, S. Paradis, J. Peter, K. Lauzière, C. Lawley, M. Gadd, J.-L. Pilote, I. Honsberger, et al., Deposit classification scheme for the Critical Minerals Mapping Initiative Global Geochemical Database, Technical Report, US Geological Survey, 2021.
[6] K. Janowicz, S. Scheider, T. Pehle, G. Hart, Geospatial semantics and linked spatiotemporal data–past, present, and future, Semantic Web 3 (2012) 321–332.
[7] Y.-Y. Chiang, S. Leyk, C. A. Knoblock, A survey of digital map processing techniques, ACM Computing Surveys (CSUR) 47 (2014) 1–44. doi:10.1145/2557423.
[8] M. Alirezaie, M. Längkvist, M. Sioutis, A. Loutfi, Semantic referee: a neural-symbolic framework for enhancing geospatial semantic segmentation, Semantic Web 10 (2019) 863–880. doi:10.3233/SW-190362.
[9] Z. Li, Y.-Y. Chiang, S. Tavakkol, B. Shbita, J. H. Uhl, S. Leyk, C. A. Knoblock, An automatic approach for generating rich, linked geo-metadata from historical map images, Association for Computing Machinery, New York, NY, USA, 2020, pp. 3290–3298. doi:10.1145/3394486.3403381.
[10] J. H. Uhl, S. Leyk, Z. Li, W. Duan, B. Shbita, Y.-Y. Chiang, C. A. Knoblock, Combining remote-sensing-derived data and historical maps for long-term back-casting of urban extents, Remote Sensing 13 (2021) 3672. doi:10.3390/rs13183672.
[11] B. Shbita, C. A. Knoblock, W. Duan, Y.-Y. Chiang, J. H. Uhl, S. Leyk, Building spatio-temporal knowledge graphs from vectorized topographic historical maps, Semantic Web 14 (2023) 527–549. doi:10.3233/SW-222918.
[12] Y.-Y. Chiang, M. Chen, W. Duan, J. Kim, C. A. Knoblock, S. Leyk, Z. Li, Y. Lin, M. Namgung, B. Shbita, et al., GeoAI for the digitization of historical maps, in: Handbook of Geospatial Artificial Intelligence, CRC Press, 2023, pp. 217–247.
[13] E. McFaul, G. Mason, W. Ferguson, B. Lipin, US Geological Survey mineral databases; MRDS and MAS/MILS, Technical Report, US Geological Survey, 2000.
[14] World Wide Web Consortium, et al., SPARQL 1.1 overview, Technical Report, World Wide Web Consortium, 2013.
[15] N. J. Car, T. Homburg, GeoSPARQL 1.1: Motivations, details and applications of the decadal update to the most important geospatial LOD standard, ISPRS International Journal of Geo-Information 11 (2022) 117.
[16] K. D. Kelley, D. L. Huston, J. M. Peter, Toward an effective global green economy: The Critical Minerals Mapping Initiative (CMMI), SGA News 8 (2021) 1–5.
[17] P. Jaccard, The distribution of the flora in the alpine zone. 1, New Phytologist 11 (1912) 37–50.
[18] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Communications of the ACM 57 (2014) 78–85.
[19] B. Brodaric, S. M. Richard, The geoscience ontology, in: AGU Fall Meeting Abstracts, volume 2020, 2020, pp. IN030–07.
[20] B. Vu, J. Pujara, C. A. Knoblock, D-REPR: a language for describing and mapping diversely-structured data sources to RDF, in: Proceedings of the 10th International Conference on Knowledge Capture, 2019, pp. 189–196.
[21] B. Vu, C. A. Knoblock, SAND: A tool for creating semantic descriptions of tabular sources, in: European Semantic Web Conference, Springer, 2022, pp. 63–67.
[22] D. L. McGuinness, F. Van Harmelen, et al., OWL web ontology language overview, W3C recommendation 10 (2004) 2004.
[23] World Wide Web Consortium, et al., Shapes constraint language (SHACL), Technical Report, World Wide Web Consortium, 2017.
[24] G. M. Mudd, S. M. Jowitt, T. T. Werner, The world’s lead-zinc mineral resources: scarcity, data, issues and opportunities, Ore Geology Reviews 80 (2017) 1160–1190.
[25] G. M. Mudd, S. M. Jowitt, The new century for nickel resources, reserves, and mining: Reassessing the sustainability of the devil’s metal, Economic Geology 117 (2022) 1961–1983.
[26] M. A. Jaro, Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida, Journal of the American Statistical Association 84 (1989) 414–420.
[27] Y. Qun, X. Linfu, L. Yongsheng, W. Rui, W. Bo, D. Ke, W. Jianbang, Mineral prospectivity mapping integrated with geological map knowledge graph and geochemical data: A case study of gold deposits at Raofeng area, Shaanxi province, Ore Geology Reviews (2023) 105651.
[28] Y. Zhu, W. Zhou, Y. Xu, J. Liu, Y. Tan, et al., Intelligent learning for knowledge graph towards geological data, Scientific Programming 2017 (2017).
[29] C. Wang, X. Ma, J. Chen, J. Chen, Information extraction and knowledge graph construction from geoscience literature, Computers & Geosciences 112 (2018) 112–120.