<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Term Frequency Analysis for Semantic Modeling of Geological Fault Knowledge in the Energy Industry</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Fabio</forename><forename type="middle">C</forename><surname>Cordeiro</surname></persName>
							<email>fabio.cordeiro@petrobras.com.br</email>
							<affiliation key="aff0">
								<orgName type="department">Petrobras Research and Development Center</orgName>
								<address>
									<addrLine>Avenida Horácio de Macedo, 950</addrLine>
									<postCode>21941-915</postCode>
									<settlement>Rio de Janeiro</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Getulio Vargas Foundation</orgName>
								<address>
									<addrLine>Praia de Botafogo, 190</addrLine>
									<postCode>22250-900</postCode>
									<settlement>Rio de Janeiro</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yuanwei</forename><surname>Qu</surname></persName>
							<affiliation key="aff1">
								<orgName type="department" key="dep1">SIRIUS center</orgName>
								<orgName type="department" key="dep2">Department of Informatics</orgName>
								<orgName type="institution">University of Oslo</orgName>
								<address>
									<addrLine>Gaustadalléen 23B</addrLine>
									<postCode>0373</postCode>
									<settlement>Oslo</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Term Frequency Analysis for Semantic Modeling of Geological Fault Knowledge in the Energy Industry</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">F1D2FB2F778E94DBAB3E6EF187DB431A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T20:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Term Frequency Analysis</term>
					<term>Geological Fault</term>
					<term>Knowledge Modeling</term>
					<term>Ontology Development</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Understanding geological faults is crucial for the oil and gas industry, as it affects the production performance of reservoirs. Nevertheless, the fragmented and ambiguous nature of geological fault information hinders efficient information retrieval. Formal geological ontologies offer a solution by enabling domain-specific data integration. One challenge that persists in ontology development is defining a set of relevant terms with good coverage used in the domain community. Based on the TF-IDF method, we conduct a term frequency study of fault-related concepts in recent academic paper abstracts. We select papers from diverse journals and evaluate terms with geologists and ontologists. The results align with experts' knowledge and contribute to the construction of a vocabulary list for the geological fault knowledge model and pave the path for thorough ontological analyses of geological faults, which facilitates data retrieval and mitigates semantic ambiguity. Future work includes improving the quality of the generated vocabulary list, implementing the proposed corpus internally, and considering more in-house technical documents for a more comprehensive coverage.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>A comprehensive understanding of geological faults is crucial for the oil and gas industry, as faults directly influence reservoir quality: they can either cause leaks or maintain a seal <ref type="bibr" target="#b0">[1]</ref>. In addition to the oil and gas industry, faults also play an essential role in the mining, geothermal, and construction industries <ref type="bibr" target="#b1">[2]</ref>. However, the geological data required for interpretation is often scattered across various sources and managed by different disciplines, which makes geological information retrieval a complex challenge <ref type="bibr" target="#b2">[3]</ref>. Furthermore, the geological knowledge derived from such dispersed and sparse data is often fraught with ambiguity <ref type="bibr" target="#b3">[4]</ref>.</p><p>The term 'fault' can represent a spatial arrangement structure, an abstract 2D plane, or a 3D deformed volume <ref type="bibr" target="#b4">[5]</ref>. However, in textual documents, all these concepts are expressed simply as 'fault.' This ambiguity in geological fault knowledge and terminology significantly hinders the efficiency of geological information retrieval from complex databases. Consequently, there is a growing demand within the oil and gas industry for integrated geological data and information models capable of enhancing the retrieval process. One key way to address this demand is the formalization of geological knowledge.</p><p>Building formal geological ontologies stands out as a promising solution for domain-specific data integration and retrieval in the oil and gas industry. In semantic technologies, an ontology is a formal (machine-readable), explicit specification of a conceptualization (an abstract view of the world that we want to represent) that is shared and agreed upon by the domain community <ref type="bibr" target="#b5">[6]</ref>. 
These ontologies establish a semantic foundation that enhances search engines' capability to recognize and interpret domain-specific terminology and relationships.</p><p>Within the geological community, various ontologies have been proposed, ranging from a core ontology for geology <ref type="bibr" target="#b6">[7]</ref>, fracture ontology <ref type="bibr" target="#b7">[8]</ref>, plastic rock deformation ontology <ref type="bibr" target="#b8">[9]</ref>, geological map ontology <ref type="bibr" target="#b9">[10]</ref>, geological time ontology <ref type="bibr" target="#b10">[11]</ref>, fault ontology <ref type="bibr" target="#b4">[5]</ref>, and deep-marine deposits <ref type="bibr" target="#b11">[12]</ref>, to risks associated with the petroleum reservoir <ref type="bibr" target="#b12">[13]</ref>. A notable industrial case is Petrobras, the Brazilian oil company, which is developing a specialized search engine for geoscientific technical reports empowered by ontology and knowledge graphs <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>.</p><p>The development of an ontology requires the involvement of ontologists, domain experts, and users to define the purpose, scope, requirements, etc. <ref type="bibr" target="#b15">[16]</ref>. During the conceptualization and formalization stages, domain experts bear the primary responsibility for selecting and providing knowledge and terminology resources. Yet assessing the comprehensiveness of the selected terms and concepts is challenging, particularly in a domain as semantically ambiguous as geology. There is also a need to convince non-domain users that the selected terms cover the domain knowledge well and are generally accepted within the geology community. To address this challenge during ontology development, approaches such as term frequency analysis and information extraction from domain documents are recommended <ref type="bibr" target="#b16">[17]</ref>. 
This method has been applied in various geological information and knowledge modeling tasks, such as subsurface energy <ref type="bibr" target="#b17">[18]</ref>, mineral exploration <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>, and geological natural hazards <ref type="bibr" target="#b20">[21]</ref>. However, the essential yet ambiguous concept of 'fault' has not yet undergone a term frequency analysis.</p><p>In this paper, to support the knowledge modeling of geological faults for the energy industry, we conduct a term frequency study (Sect. 2) of the 'fault' concept and its related terms in academic paper abstracts. Compared to the verbose full paper, the abstract contains the most important concepts of the research objectives. Cao et al. <ref type="bibr" target="#b21">[22]</ref> compared topics extracted from academic paper abstracts and from full texts and found that the similarity between the results is higher when more documents are analyzed. To balance the focus and extension of the fault concept, the selected papers come from journals that range from domain-specific, domain-related, and industry-related to general-domain. We listed the renowned geoscience journals with good impact factors and citation scores, and specialists then chose the most relevant ones for each domain. The extracted terms are subsequently presented to geologists and ontologists to assess their alignment with the domain's understanding (Sect. 3). The evaluation shows promising results. The entire corpus for this study is publicly available.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>To identify terms relevant to the Geological Fault Domain, we adapted the methodology of Garcia et al. <ref type="bibr" target="#b17">[18]</ref>. Our approach consisted of the following steps: (i) the selection of scientific journals within the domain of interest; (ii) the compilation of a comprehensive corpus comprising abstracts; (iii) the application of TF-IDF analysis to determine the primary keywords for the Geological Fault Domain; and (iv) a final evaluation of these keywords by domain experts against a pre-established ontology (Figure <ref type="figure" target="#fig_0">1</ref>). TF-IDF (Term Frequency-Inverse Document Frequency) is a well-established method in information retrieval for evaluating the importance of keywords within a corpus. Essentially, it identifies terms that are frequent within a specific document but relatively rare across the entire corpus. This approach helps highlight the words and expressions that best describe a particular document. Unlike Garcia et al. <ref type="bibr" target="#b17">[18]</ref>, we included in the analysis sets of documents with several levels of domain focus, including general academic papers. If only one domain is analyzed, important expressions can seem common even though they are relatively rare compared to documents on different subjects. Using several degrees of focus allows us to highlight important terms and expressions for every domain.</p><p>The TF-IDF score for each term 𝑡 in a document 𝑑 is calculated as the product of two main components: Term Frequency (𝑡𝑓) and Inverse Document Frequency (𝑖𝑑𝑓). The term frequency, denoted as 𝑡𝑓 (𝑡, 𝑑), is the number of occurrences of term 𝑡 in document 𝑑. 
The inverse document frequency is given by the formula:</p><formula xml:id="formula_0">\mathrm{idf}(t) = \log\left[\frac{n}{\mathrm{df}(t)}\right] + 1</formula><p>where:</p><p>• 𝑛 represents the total number of documents in the document set.</p><p>• 𝑑𝑓 (𝑡) is the number of documents in the document set that contain the term 𝑡.</p><p>Ultimately, the TF-IDF score for a term 𝑡 in document 𝑑 is computed as:</p><formula xml:id="formula_1">\mathrm{TF\text{-}IDF}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t) = \mathrm{tf}(t, d) \cdot \left(\log\left[\frac{n}{\mathrm{df}(t)}\right] + 1\right)</formula><p>For our TF-IDF calculation, we used abstracts of academic papers as our documents. Abstracts offer distinct advantages, as they condense essential information and vocabulary into concise sentences and are accessible for a wide range of papers, including those not available as open access. Our initial step involved the selection of relevant scientific papers. The TF-IDF method compares the vocabulary of a specific set of texts against the broader corpus. To ensure a diverse vocabulary and to balance the focus and scope of fault-related terms, we initially chose scientific journals within the Geological Fault Domain. Gradually, we expanded our selection to encompass papers from broader knowledge areas. Figure <ref type="figure" target="#fig_1">2</ref> illustrates the distribution of papers across different knowledge areas and highlights our chosen journals. Once we had compiled the corpus containing all abstracts, we preprocessed it by removing stop words, applying stemming (a technique that reduces words to their root or base form), and converting all text to lowercase. Subsequently, we calculated the TF-IDF scores for all words and expressions across all documents, considering single words (1-gram) as well as two-word (2-gram) and three-word (3-gram) expressions. Summing the TF-IDF values across the documents within our domain of interest resulted in a list of the most significant keywords.</p></div>
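<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scoring described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the toy documents, and the function names (tokenize, ngrams, tf_idf) are hypothetical, the natural logarithm is assumed because the text does not state the base, and stemming is omitted to keep the example dependency-free.</p>
<p>
```python
import math
from collections import Counter

# Hypothetical toy stop-word list; the paper does not list the one it used.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words (no stemming)."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def ngrams(tokens, max_n=3):
    """Return every 1-, 2- and 3-gram expression as a space-joined string."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Return one dict per document: term mapped to tf(t, d) * (log(n / df(t)) + 1)."""
    term_counts = [Counter(ngrams(tokenize(doc))) for doc in documents]
    n = len(documents)
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())  # each document counts once per term
    # Assumption: natural logarithm; the source only writes "log".
    return [{t: tf * (math.log(n / df[t]) + 1) for t, tf in counts.items()}
            for counts in term_counts]


# Hypothetical toy corpus of three "abstracts".
docs = ["The fault zone is wide.", "The basin is deep.", "A fault in the basin."]
scores = tf_idf(docs)
```
</p>
<p>In this toy corpus the 1-gram "fault" appears in two of the three documents, so it receives a lower weight in the first document than the 2-gram "fault zone", which appears in only one.</p></div>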
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results and Evaluation</head><p>Results. We compiled lists of the most important terms and expressions for all subdomains in Figure <ref type="figure" target="#fig_1">2</ref>; Tables <ref type="table" target="#tab_1">1 and 2</ref> show the keywords for the 'Geological fault domain' and the 'Geological fault with tectonics domain', respectively. We calculated the TF-IDF value of each word in the vocabulary (and of each 2-gram and 3-gram expression) for every paper. We then summed the TF-IDF values over the documents of the same domain and sorted the terms by this sum.</p><p>Besides the lists of important keywords for the geological fault domain, this paper presents two noteworthy outcomes:</p><p>1. We have assembled a corpus comprising 4,879 scientific papers that focus on various geoscience domains.</p><p>2. We have demonstrated a methodology for compiling documents and extracting important keywords from them.</p><p>Evaluations. In collaboration with geologists, we analysed the alignment between the high-frequency terms in our results and the domain knowledge. Among the top-20 high-frequency 1-gram, 2-gram, and 3-gram terms in the Geological Fault Domain (Table <ref type="table" target="#tab_0">1</ref>), 49 of 60 terms are closely related to the knowledge of faults; in the results for the Geological Fault with Tectonics Domain (Table <ref type="table" target="#tab_1">2</ref>), only 40 of 60 terms are closely related to the knowledge of faults. 
It is worth noting that geologists identified some terms as "noisy," as they are used to describe faults or are related to specific study areas.</p><p>In addition to the relevance checks, the discussion between geologists and ontologists produced some interesting analysis results. In Tables <ref type="table" target="#tab_1">1 and 2</ref>, the terms shear zone and fault zone are, in the geologists' view, interchangeable in the context of brittle deformation. The terms damage zone and fault core denote two components of a fault zone. The term fault rock shares a certain level of similarity with fault core, but the two are not necessarily the same. The pull apart basin is the result of a normal fault, and a thrust fault is a type of low angle fault. These distinctions, while clear to geologists in an academic context, highlight potential semantic ambiguities in everyday usage. Such distinctions prove invaluable when geologists seek specific data and information from databases. We also noticed that our data sources are academically biased, which contributes to the presence of certain noisy terms.</p></div>
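<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregation and evaluation steps reported above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: the function names (top_terms, relevance_ratio), the domain labels, and the per-document scores are hypothetical placeholders, and only the counts 49 of 60 and 40 of 60 are taken from the text.</p>
<p>
```python
from collections import defaultdict


def top_terms(doc_scores, doc_domains, domain, n, k=20):
    """Sum TF-IDF over one domain's documents and return the k best n-gram terms."""
    totals = defaultdict(float)
    for scores, doc_domain in zip(doc_scores, doc_domains):
        if doc_domain != domain:
            continue
        for term, value in scores.items():
            if len(term.split()) == n:  # keep only expressions with n words
                totals[term] += value
    # Sort by descending summed score; ties are broken alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]


def relevance_ratio(relevant, total):
    """Share of listed terms that the experts judged closely related to faults."""
    return relevant / total


# Hypothetical per-document TF-IDF scores and domain labels.
doc_scores = [
    {"fault": 1.5, "zone": 2.0, "fault zone": 2.1},
    {"fault": 1.0, "basin": 0.5},
    {"earthquak": 3.0},
]
doc_domains = ["fault", "fault", "tectonics"]

ranking = top_terms(doc_scores, doc_domains, "fault", n=1, k=2)
ratio_fault = relevance_ratio(49, 60)      # counts reported for Table 1
ratio_tectonics = relevance_ratio(40, 60)  # counts reported for Table 2
```
</p>
<p>With the counts reported in the text, the two ratios are roughly 0.82 and 0.67, which makes the gap between the two domains explicit.</p></div>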
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>This paper proposes a TF-IDF-based approach to quantify the frequency of geological fault terms during the conceptualization phase of ontology development. The experiment has yielded interesting results for both geologists and ontologists. Our experiment contributes to developing basic terminology for creating knowledge models of geological faults, which will facilitate the retrieval of geological information and data from various sources. The experimental results also provide a basis with good knowledge coverage for ontological analysis to help geologists and ontologists deconstruct the semantically overloaded term 'fault' and its various hidden meanings. In future research, we plan to (1) refine the identified terms to improve the quality of the vocabulary list; (2) incorporate more industrial and technical documents for a more comprehensive analysis; and (3) conduct ontological analyses of the terms and implement the proposed corpus to support the development of Petrobras' in-house search engine.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Methodology for identification of keywords for the Geological Fault Domain</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Distribution of papers by knowledge area and scientific journal. The numbers in parentheses are the total number of abstracts extracted.</figDesc><graphic coords="4,160.97,114.08,270.84,270.28" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>List of the 60 most relevant terms after calculating and summing TF-IDF for all abstracts of "Geological Fault Domain" journals. They are split into single-word terms (1-gram) and expressions with two (2-gram) and three (3-gram) words (domain experts marked the less relevant terms in red).</figDesc><table><row><cell>1-gram</cell><cell>Σ TF-IDF</cell><cell>2-gram</cell><cell>Σ TF-IDF</cell><cell>3-gram</cell><cell>Σ TF-IDF</cell></row><row><cell>fault</cell><cell>48.59</cell><cell>shear zone</cell><cell>5.58</cell><cell>strike slip fault</cell><cell>2.7</cell></row><row><cell>zone</cell><cell>16.29</cell><cell>strike slip</cell><cell>5.34</cell><cell>fold thrust belt</cell><cell>1.86</cell></row><row><cell>deform</cell><cell>15.24</cell><cell>fault zone</cell><cell>5.07</cell><cell>fault bend fold</cell><cell>1.29</cell></row><row><cell>fractur</cell><cell>15.16</cell><cell>normal fault</cell><cell>4.37</cell><cell>fault slip data</cell><cell>1.2</cell></row><row><cell>fold</cell><cell>14.57</cell><cell>damag zone</cell><cell>4.3</cell><cell>pull apart basin</cell><cell>1.14</cell></row><row><cell>slip</cell><cell>13.93</cell><cell>slip fault</cell><cell>2.93</cell><cell>fault propag fold</cell><cell>1.1</cell></row><row><cell>structur</cell><cell>13.83</cell><cell>fault core</cell><cell>2.88</cell><cell>fault damag zone</cell><cell>0.84</cell></row><row><cell>shear</cell><cell>12.3</cell><cell>fractur network</cell><cell>2.49</cell><cell>anisotropi magnet suscept</cell><cell>0.74</cell></row><row><cell>thrust</cell><cell>11.62</cell><cell>fold thrust</cell><cell>2.35</cell><cell>pre exist structur</cell><cell>0.61</cell></row><row><cell>strike</cell><cell>8.67</cell><cell>nw se</cell><cell>2.24</cell><cell>strike slip shear</cell><cell>0.6</cell></row><row><cell>stress</cell><cell>8.49</cell><cell>fault slip</cell><cell>2.12</cell><cell>dextral strike slip</cell><cell>0.57</cell></row><row><cell>model</cell><cell>8.23</cell><cell>fault rock</cell><cell>2.12</cell><cell>tecton shear stress</cell><cell>0.5</cell></row><row><cell>rock</cell><cell>8.08</cell><cell>thrust belt</cell><cell>2</cell><cell>fault stress regim</cell><cell>0.49</cell></row><row><cell>tecton</cell><cell>7.97</cell><cell>pre exist</cell><cell>2</cell><cell>thrust fault stress</cell><cell>0.49</cell></row><row><cell>strain</cell><cell>7.33</cell><cell>thrust fault</cell><cell>1.97</cell><cell>virtual outcrop model</cell><cell>0.49</cell></row><row><cell>basin</cell><cell>6.92</cell><cell>fault band</cell><cell>1.85</cell><cell>actual contact area</cell><cell>0.45</cell></row><row><cell>kinemat</cell><cell>6.79</cell><cell>fault segment</cell><cell>1.77</cell><cell>damag zone width</cell><cell>0.45</cell></row><row><cell>normal</cell><cell>6.55</cell><cell>deform band</cell><cell>1.74</cell><cell>dip normal fault</cell><cell>0.44</cell></row><row><cell>detach</cell><cell>6.29</cell><cell>fault propag</cell><cell>1.65</cell><cell>slip shear zone</cell><cell>0.44</cell></row><row><cell>seismic</cell><cell>6.15</cell><cell>ne sw</cell><cell>1.6</cell><cell>low angl fault</cell><cell>0.43</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>List of the 60 most relevant terms after calculating and summing TF-IDF for all abstracts of "Geological fault with tectonics" journals. They are split into single-word terms (1-gram) and expressions with two (2-gram) and three (3-gram) words (domain experts marked the less relevant terms in red).</figDesc><table><row><cell>1-gram</cell><cell>Σ TF-IDF</cell><cell>2-gram</cell><cell>Σ TF-IDF</cell><cell>3-gram</cell><cell>Σ TF-IDF</cell></row><row><cell>fault</cell><cell>85.02</cell><cell>strike slip</cell><cell>10.34</cell><cell>strike slip fault</cell><cell>5.47</cell></row><row><cell>zone</cell><cell>27.33</cell><cell>fault zone</cell><cell>9.07</cell><cell>fold thrust belt</cell><cell>2.76</cell></row><row><cell>slip</cell><cell>27.01</cell><cell>normal fault</cell><cell>7.73</cell><cell>fault damag zone</cell><cell>1.48</cell></row><row><cell>deform</cell><cell>24.68</cell><cell>shear zone</cell><cell>7.33</cell><cell>pull apart basin</cell><cell>1.39</cell></row><row><cell>earthquak</cell><cell>24.21</cell><cell>slip fault</cell><cell>5.81</cell><cell>fault bend fold</cell><cell>1.34</cell></row><row><cell>structur</cell><cell>23.06</cell><cell>damag zone</cell><cell>5.03</cell><cell>fault slip data</cell><cell>1.26</cell></row><row><cell>seismic</cell><cell>21.05</cell><cell>nw se</cell><cell>3.5</cell><cell>fault propag fold</cell><cell>1.23</cell></row><row><cell>stress</cell><cell>18.62</cell><cell>fold thrust</cell><cell>3.45</cell><cell>fault stress regim</cell><cell>1.04</cell></row><row><cell>thrust</cell><cell>18.52</cell><cell>thrust belt</cell><cell>3.4</cell><cell>dextral strike slip</cell><cell>0.94</cell></row><row><cell>fold</cell><cell>18.12</cell><cell>ne sw</cell><cell>3.34</cell><cell>later strike slip</cell><cell>0.93</cell></row><row><cell>fractur</cell><cell>17.92</cell><cell>fault slip</cell><cell>3.24</cell><cell>philippin sea plate</cell><cell>0.93</cell></row><row><cell>shear</cell><cell>17.55</cell><cell>fault core</cell><cell>3.17</cell><cell>seismic reflect profil</cell><cell>0.92</cell></row><row><cell>strike</cell><cell>16.99</cell><cell>slip rate</cell><cell>3.15</cell><cell>anisotropi magnet suscept</cell><cell>0.91</cell></row><row><cell>tecton</cell><cell>16.29</cell><cell>thrust fault</cell><cell>3.08</cell><cell>strike slip motion</cell><cell>0.82</cell></row><row><cell>model</cell><cell>16</cell><cell>tibetan plateau</cell><cell>2.89</cell><cell>apatit fission track</cell><cell>0.82</cell></row><row><cell>basin</cell><cell>14.78</cell><cell>pre exist</cell><cell>2.87</cell><cell>normal fault earthquak</cell><cell>0.79</cell></row><row><cell>km</cell><cell>13.31</cell><cell>stress field</cell><cell>2.76</cell><cell>southern qilian shan</cell><cell>0.76</cell></row><row><cell>ruptur</cell><cell>13.24</cell><cell>fault segment</cell><cell>2.73</cell><cell>play import role</cell><cell>0.76</cell></row><row><cell>strain</cell><cell>12.9</cell><cell>strain rate</cell><cell>2.73</cell><cell>north china craton</cell><cell>0.75</cell></row><row><cell>region</cell><cell>12.15</cell><cell>fractur network</cell><cell>2.64</cell><cell>ne sw trend</cell><cell>0.74</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was partially supported by the Norwegian Research Council via SIRIUS (237898), PeTWIN (294600) and Petrobras Research Center (CENPES).</p><p>Code availability: https://github.com/fabiocorreacordeiro/GeoscienceCorpus</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Insight into petrophysical properties of deformed sandstone reservoirs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Torabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Fossen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Braathen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AAPG Bulletin</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page" from="619" to="637" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Industrial geological information capture with geostructure ontology</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kharlamov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Giese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Workshop on Semantic Industrial Information Modelling (SemIIM 2022) co-located with 19th Extended Semantic Web Conference</title>
				<meeting>the 1st International Workshop on Semantic Industrial Information Modelling (SemIIM 2022) co-located with 19th Extended Semantic Web Conference<address><addrLine>ESWC</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Toward the geoscience paper of the future: Best practices for documenting and sharing research from data to software to provenance</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Gil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>David</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Demir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">T</forename><surname>Essawy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">W</forename><surname>Fulweiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Goodall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Karlstrom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Mills</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Oh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Earth and Space Science</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="388" to="415" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Fault fictions: Systematic biases in the conceptualization of fault-zone architecture</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">K</forename><surname>Shipton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Comrie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kremer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lunn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Caine</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Special Publications</title>
		<imprint>
			<biblScope unit="volume">496</biblScope>
			<biblScope unit="page" from="125" to="143" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>Geological Society</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Abel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Giese</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.07059</idno>
		<title level="m">Geofault: A well-founded fault ontology for interoperability in geological modeling</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">What is an ontology?</title>
		<author>
			<persName><forename type="first">N</forename><surname>Guarino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Oberle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Handbook on ontologies</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The geocore ontology: a core ontology for general use in geology</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Abel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>dos Santos Alvarenga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; Geosciences</title>
		<imprint>
			<biblScope unit="volume">135</biblScope>
			<biblScope unit="page">104387</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Ontology of fractures</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Aydin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>McGuinness</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Structural Geology</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="251" to="259" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Semantic modeling of plastic deformation of polycrystalline rock</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Babaie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Davarpanah</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; Geosciences</title>
		<imprint>
			<biblScope unit="volume">111</biblScope>
			<biblScope unit="page" from="213" to="222" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Ontology-driven representation of knowledge for geological maps</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mantovani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Piana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lombardo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; Geosciences</title>
		<imprint>
			<biblScope unit="volume">139</biblScope>
			<biblScope unit="page">104446</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A geologic timescale ontology and service</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Cox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Richard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Earth Science Informatics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="5" to="19" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">GeoReservoir: An ontology for deep-marine depositional system description</title>
		<author>
			<persName><forename type="first">F</forename><surname>Cicconeto</surname></persName>
		</author>
		<ptr target="https://lume.ufrgs.br/bitstream/handle/10183/220455/001124842.pdf?sequence=1&amp;isAllowed=y" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F D</forename><surname>Silva</surname></persName>
		</author>
		<ptr target="https://www.maxwell.vrac.puc-rio.br/58981/58981.PDF" />
		<title level="m">ResRiskOnto: an application ontology for risks in the petroleum reservoir domain</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Busca semântica (tipo google) para recuperação mais inteligente de informação de reservatórios e exploração</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Romeu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">C</forename><surname>Cordeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D C</forename><surname>Rodrigues</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">D S M</forename><surname>Gomes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M A</forename><surname>Alexandre</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<ptr target="https://petroles.puc-rio.ai/index_en.html" />
		<author>
			<orgName>PUC-Rio/ICA</orgName>
		</author>
		<title level="m">Petrolês - corpus for the oil and gas industry</title>
				<imprint>
			<publisher>Petrobras</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">F</forename><surname>Noy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>McGuinness</surname></persName>
		</author>
		<title level="m">Ontology development 101: A guide to creating your first ontology</title>
				<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">The NeOn Methodology framework: A scenario-based methodology for ontology development</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Suárez-Figueroa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gómez-Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fernández-López</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Ontology</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="107" to="145" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">What geologists talk about: Towards a frequency-based ontological analysis of petroleum domain terms</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">H</forename><surname>Rodrigues</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lopes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D S A</forename><surname>Kuchle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Abel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ONTOBRAS</title>
		<imprint>
			<biblScope unit="page" from="190" to="203" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Prospecting information extraction by text mining based on convolutional neural networks - a case study of the Lala copper deposit, China</title>
		<author>
			<persName><forename type="first">L</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jianping</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Jie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="52286" to="52297" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">GeoDocA - Fast analysis of geological content in mineral exploration reports: A text mining approach</title>
		<author>
			<persName><forename type="first">E.-J</forename><surname>Holden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wedge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Duuring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Beardsmore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Ore Geology Reviews</title>
		<imprint>
			<biblScope unit="volume">111</biblScope>
			<biblScope unit="page">102919</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Text visualization for geological hazard documents via text mining and natural language processing</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Earth Science Informatics</title>
		<imprint>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">A comparison study of topic modeling based literature analysis by using full texts and abstracts of scientific articles: a case of COVID-19 research</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liao</surname></persName>
		</author>
		<idno type="DOI">10.1108/LHT-03-2022-0144</idno>
		<ptr target="https://www.emerald.com/insight/content/doi/10.1108/LHT-03-2022-0144/full/html" />
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="543" to="569" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
