<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Interestingness from COVID-19 Data: Ontology and Transformer-Based Methods</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nihar</forename><surname>Sanda</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Indian Institute of Information Technology Dharwad (IIIT Dharwad)</orgName>
								<address>
									<region>Karnataka</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Kavi</forename><surname>Mahesh</surname></persName>
							<email>drkavimahesh@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Indian Institute of Information Technology Dharwad (IIIT Dharwad)</orgName>
								<address>
									<region>Karnataka</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Interestingness from COVID-19 Data: Ontology and Transformer-Based Methods</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2E8323B7C86E2B9E7B44D17DF4E26FC6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-06-19T14:15+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Ontology</term>
					<term>Semantic Annotation</term>
					<term>Association Rule Mining</term>
					<term>COVID-19</term>
					<term>Interesting Patterns</term>
					<term>Transformer Models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Data interestingness is relevant to most information systems, and significant implicit facts are hidden in healthcare data. Existing data interestingness techniques rely on standard data mining methodologies that lack the semantic aspect of the data. Data interestingness is a useful functionality for analyzing large data corpora: finding significant patterns makes the data more convenient and understandable for end users. In this study, our primary goal is to identify interesting patterns using ontology-based mining techniques and to process them with BioClinicalBERT and CovidBERT to identify the interesting rules in the mined corpora. Further, we use a semantic similarity measure to compare the models by their similarity index and to analyze each model's understanding of the rules. The experimental results show that the proposed method operates on structured healthcare data using a domain ontology. Finally, as a use case, we demonstrate the proposed approach by paraphrasing the rules for decision-makers.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The volume of healthcare data generated during the COVID-19 pandemic has a significant impact on tabulating, summarizing, and indexing the facts that could help healthcare workers plan for and prevent the spread of the disease <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Given the accessibility of existing biomedical knowledge repositories, the current coronavirus pandemic highlights the need for automatic relation extraction techniques. In recent years, a substantial body of work has supported the use of patterns in prediction models <ref type="bibr" target="#b2">[3]</ref>.</p><p>Machine learning models are rapidly becoming a powerful resource in healthcare. However, the quality of these models depends on the availability of high-quality training data. Besides being large, the training sets must be robust and accurate. Obtaining comprehensive and accurate real-world data for machine learning in healthcare is challenging due to the privacy and ethical issues associated with such data.</p><p>Rule mining automatically mines logical rules from a given knowledge base (KB). For example, interesting rule mining methods find rules such as "If A is the husband of B, and A lives in the USA, then B also lives in the USA". Rules of this type are mined with a certain confidence and are needed to complete a KB. Rules are widely used in data and ontology alignment and for fact-checking purposes.</p><p>The Resource Description Framework (RDF) relies on graph-based structures. The description given by a graph illustrates the relationships between the entities. Also, the information is decentralized, so connecting two graphs creates a new graph.
RDF follows an open-world assumption: facts that are stated are considered true, and facts that are not stated are considered unknown.</p><p>Motivation. With the abundance of healthcare data available, it is critical for decision-makers to use it for predictive and preventive measures. Semantic data mining using ontology and transformer-based methods can reveal hidden inferences from data. This encourages decision-makers to keep track of data points that are relevant or interesting. The main focus of this paper is to infer interesting facts from two COVID-19 corpora using the proposed interestingness framework. More precisely, our contributions are as follows:</p><p>• We define a framework for data interestingness using domain ontology.</p><p>• We propose a novel technique to identify interesting rules using ontology and transformer-based methods.</p><p>• We compare the performance of two BERT-based models (BioClinicalBERT and CovidBERT) for interestingness in COVID-19 data.</p><p>Further, we demonstrate the usage of the proposed approach for paraphrasing the rules for decision-makers. The remainder of the paper is organized as follows: Section 2 reviews the literature. Section 3 presents the preliminaries, the data, and the Ontology-based Data Interestingness (OBDI) framework used in this study. Section 4 discusses the results from the two COVID-19 corpora and compares their semantic nature. Section 5 concludes the paper by outlining future research directions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature Study</head><p>Information extraction aims at automatically extracting information from unstructured data sources. Applications include information retrieval, opinion mining, sentiment analysis, question answering, and machine translation.</p><p>In computer science, domain-specific tasks require an ontology as a data and semantic model <ref type="bibr" target="#b0">[1]</ref>. An ontology generally consists of an agreed understanding (i.e., semantics) of a specific field, axiomatized and explicitly expressed in a computer resource as a logical theory <ref type="bibr" target="#b3">[4]</ref>.</p><p>Association Rule Mining (ARM) is a central topic in data mining research. Its goal is to discover interesting correlations, patterns, and associations between groups of items in transaction databases. Telecommunication networks, market and risk management, and inventory control all use association rules. Finding interesting association rules is a popular and current topic in data mining <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>.</p><p>In the state of the art, several interestingness measures have been proposed, with ontology being less explored. An ontology that uses the semantic web, where data is represented as Resource Description Framework (RDF) triples (subject, predicate, object), makes the data machine-understandable. This enables the system to infer knowledge using the underlying schema of the ontology <ref type="bibr" target="#b7">[8]</ref>. The publication "Attention is all you need" <ref type="bibr" target="#b8">[9]</ref> presented the Transformer architecture in 2017. The Transformer is an encoder-decoder architecture. The BERT model has since produced state-of-the-art results in a variety of NLP tasks. It is a different kind of transfer learning.
BERT's primary operating mode is transfer by fine-tuning, similar to the approach used by ULMFiT. Additionally, BERT can be used in a feature-extraction mode, like ELMo. Other work proposes an early detection model that uses chatbot language resources and descriptive questions to extract interesting facts. Three distinct models, CT-BERT, BERTweet, and RoBERTa, are tuned on COVID-19-related text data to distinguish between fake and real news <ref type="bibr" target="#b9">[10]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Outcomes of Literature</head><p>These successful studies demonstrate that ontologies can be used to improve the performance and enhance the usability of complex data analytics systems. The transformer models used in this study were pre-trained on biomedical data, giving them a deeper understanding of the terminology used in biomedicine. We use these state-of-the-art transformer-based methods to generate rule embeddings, cluster the embeddings, and analyze the clusters with semantic scores to find the interesting rules.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Preliminaries, Data and Methods</head><p>This section explains the preliminary definitions and dataset with the proposed OBDI methodology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Preliminaries</head><p>Ontology and ARM methods closely work towards the data interestingness <ref type="bibr" target="#b0">[1]</ref>. In data mining literature, association rule mining is widely used for rule generation based on frequent patterns. This section aims to provide the readers with the necessary background knowledge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1. Association Rule: A technique used to mine frequent patterns in data. The discovered patterns define the relationships between the items.</head><p>We call X → Y an association rule. To obtain strong association rules, we compute the support and confidence as given in Equations 1 and 2. Rules are defined using our domain information.</p></div>
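<div xmlns="http://www.tei-c.org/ns/1.0"><formula>\mathrm{Support}(X \rightarrow Y) = \frac{\text{number of attribute sets containing both } X \text{ and } Y}{\text{total number of attribute sets}}<label>(1)</label></formula><formula xml:id="formula_0">\mathrm{Confidence}(X \rightarrow Y) = \frac{\text{number of attribute sets containing both } X \text{ and } Y}{\text{number of attribute sets containing } X}<label>(2)</label></formula></div>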
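<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equations 1 and 2 can be illustrated with a minimal Python sketch. It assumes that each record is a set of items and that X and Y are item sets; the function names and the toy records are hypothetical and are not taken from the KATrace or COP corpora.</p>
<p>
```python
from typing import FrozenSet, List

ItemSet = FrozenSet[str]


def support(records: List[ItemSet], x: ItemSet, y: ItemSet) -> float:
    """Share of all records that contain every item of both X and Y."""
    if not records:
        raise ValueError("at least one record is required")
    both = x | y
    hits = sum(1 for record in records if both.issubset(record))
    return hits / len(records)


def confidence(records: List[ItemSet], x: ItemSet, y: ItemSet) -> float:
    """Share of the records containing X that also contain Y."""
    both = x | y
    with_x = sum(1 for record in records if x.issubset(record))
    if with_x == 0:
        return 0.0  # X never occurs, so the rule has no supporting evidence
    with_both = sum(1 for record in records if both.issubset(record))
    return with_both / with_x


# Hypothetical toy records (illustrative item names only).
RECORDS: List[ItemSet] = [
    frozenset({"ILI", "Diabetic", "Covid-19"}),
    frozenset({"ILI", "Covid-19"}),
    frozenset({"ILI"}),
    frozenset({"Diabetic"}),
]

RULE_SUPPORT = support(RECORDS, frozenset({"ILI"}), frozenset({"Covid-19"}))
RULE_CONFIDENCE = confidence(RECORDS, frozenset({"ILI"}), frozenset({"Covid-19"}))
```
</p>
<p>On these toy records, the rule {ILI} → {Covid-19} holds in 2 of 4 records (support 0.5) and in 2 of the 3 records that contain ILI (confidence of about 0.67).</p></div>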
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 2. Ontology: An Ontology O is defined as O = (Tbox + Abox, G).</head><p>The Tbox defines the schema of an ontology. The Abox refers to RDF triples at the instance level. G is a labeled graph structure produced by connecting the relations with concepts. Figure <ref type="figure" target="#fig_0">1</ref> illustrates the importance of ontology.</p><p>Definition 3. Data Interestingness: Our notion of data interestingness is derived by integrating domain ontology with data in RDF and user interest rules.</p><p>Table 1 (COP dataset): Data Download: data access may be requested from HFWS. Data Instances: 120000.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data</head><p>In this work, we used two COVID-19 corpora from the Indian state of Karnataka. <ref type="foot" target="#foot_0">1</ref> <ref type="foot" target="#foot_1">2</ref> The dataset statistics are given in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Embeddings</head><p>We generate the embeddings using the BioClinicalBERT and CovidBERT models in this study.</p><p>BioClinicalBERT is a model that is initialized from BioBERT (BioBERT-Base v1.0 + PubMed 200K + PMC 270K) and then trained on the MIMIC III <ref type="bibr" target="#b10">[11]</ref> notes. These MIMIC notes consist of electronic health records from ICU patients of a hospital. For the pretraining of this model, the authors used a batch size of 32, a maximum sequence length of 128, and a learning rate of 5 × 10⁻⁵. The models were trained for 150,000 steps using all MIMIC notes.</p><p>CovidBERT is a model trained by Deepset on AllenAI's CORD-19 dataset, which consists of scientific articles about coronaviruses. The model uses the BERT WordPiece vocabulary. Then, using the sentence-transformers library, it is fine-tuned on the SNLI and MultiNLI datasets to construct universal sentence embeddings using the average pooling technique and a softmax loss <ref type="bibr" target="#b11">[12]</ref>.</p></div>
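<div xmlns="http://www.tei-c.org/ns/1.0"><p>The average pooling step mentioned above can be sketched as follows. This is a minimal illustration, assuming that per-token vectors and an attention mask are already available from a BERT-style encoder; the function name mean_pool and the toy values are hypothetical, and no model is loaded here.</p>
<p>
```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep == 1]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical 2-dimensional token vectors; the last token is padding.
TOKENS = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
MASK = [1, 1, 0]
RULE_EMBEDDING = mean_pool(TOKENS, MASK)
```
</p>
<p>The padded token is ignored, so the pooled vector is the mean of the first two token vectors. Real encoders produce vectors with several hundred dimensions per token.</p></div>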
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Methodology</head><p>The proposed OBDI framework, shown in Figure <ref type="figure" target="#fig_1">2</ref>, is an ontology-based mining framework that uses semantic similarity to determine the interestingness of rules. OBDI's goal is to automatically generate rules and knowledge from datasets to improve the efficiency of future decision-making. OBDI's logic is as follows: RDF data instances are created from a dataset and a domain ontology. These data are backed by the domain experts' knowledge and by ontology concepts. Interesting rules are formulated, as shown in Tables <ref type="table" target="#tab_1">2</ref>, <ref type="table" target="#tab_2">3</ref> and <ref type="table" target="#tab_3">4</ref>, using the ontology and the experts' knowledge. The OBDI methods include the IntApriori method proposed in <ref type="bibr" target="#b12">[13]</ref>. Importantly, the generated rules are processed by BERT models to obtain semantic scores that determine a rule's importance and degree of interest.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results and Discussions</head><p>This section discusses the semantic association rules generated from COVID-19 data and how they are processed by BioClinicalBERT and CovidBERT to identify the similarity-based interesting rules. Based on a detailed analysis of the current state of the art, BioClinicalBERT and CovidBERT are used in this study.</p><p>Given a set of rule embeddings, we find clusters of closely related embeddings. This mapping is facilitated by BioBERT embeddings <ref type="bibr" target="#b13">[14]</ref> of the rules generated by ontology-based mining. This reduces the search space of rules to the most interesting ones. The cluster centroid is considered the interesting point, denoted I. The rules that match the cluster's I value are termed the most interesting ones. Focusing on the rules in a particular cluster, we use BioClinicalBERT and CovidBERT <ref type="bibr" target="#b14">[15]</ref> embeddings and a text summarization model to find the best-matched rules by generating a summary of the cluster. Further, this summary is treated as a paraphrase for decision-makers to guide future actions.</p></div>
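<div xmlns="http://www.tei-c.org/ns/1.0"><p>The centroid-based selection described above can be sketched in Python. This is a minimal illustration under stated assumptions: plain K-means with Euclidean distance, centroids seeded with the first k points, and hypothetical 2-dimensional vectors and rule labels standing in for the real rule embeddings. It is not the implementation used in the study.</p>
<p>
```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def kmeans(points: List[Vector], k: int,
           iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd iterations; the first k points seed the centroids (an assumption)."""
    if k not in range(1, len(points) + 1):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return centroids, assignment


def most_interesting(rule_ids: List[str], points: List[Vector],
                     centroids: List[List[float]],
                     assignment: List[int]) -> Dict[int, str]:
    """For each cluster, return the rule whose embedding lies closest to the centroid."""
    result: Dict[int, str] = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            best = min(members, key=lambda i: math.dist(points[i], centroid))
            result[c] = rule_ids[best]
    return result


# Hypothetical rule labels and embeddings forming two well-separated groups.
RULE_IDS = ["A", "B", "C", "D", "E", "F"]
POINTS = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.0), (5.2, 5.0), (0.1, 0.1), (5.1, 5.1)]
CENTROIDS, ASSIGNMENT = kmeans(POINTS, k=2)
INTERESTING = most_interesting(RULE_IDS, POINTS, CENTROIDS, ASSIGNMENT)
```
</p>
<p>In this toy setting each cluster yields one rule, the one nearest to its centroid. A library implementation with a more careful initialization would normally replace the hand-written loop.</p></div>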
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Semantic Association Rules from OBDI</head><p>The goal of the OBDI framework is to generate interesting rules, given the data and the domain ontology. The COP and COKPME ontologies are used for generating semantic association rules <ref type="bibr" target="#b12">[13]</ref>. Our previous studies describe the design and implementation of COP and COKPME <ref type="foot" target="#foot_2">5</ref>. Table <ref type="table" target="#tab_1">2</ref> shows the semantic association rules of the KATrace COVID-19 dataset. These rules are then used by BioClinicalBERT and CovidBERT to identify the interesting rules.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Rules in Ontology</head><p>With the object and data properties defined in the COP ontology, the relationships inferred by the reasoner are the starting point for interesting fact generation. We define a set of rules that operate on the COP ontology for interesting fact generation. A few of the rules are shown in Figure <ref type="figure" target="#fig_2">3</ref>.</p><p>The rules are generated using ontology-based methods. A few of the rules with higher confidence are listed in Table <ref type="table" target="#tab_1">2</ref>. The generated rules are semantically annotated so that decision-makers can interpret them and take the appropriate actions. The results show the patient's age, status, the location from which the patient traveled, and the treatment provided.</p><p>Tables <ref type="table" target="#tab_3">3 and 4</ref> describe the rules associated with each interesting index (I). The cluster centroid values are used as interesting data points, as are the embedding values that point to specific rules. Clustering using K-means <ref type="bibr" target="#b15">[16]</ref> is applied to both the CovidBERT and BioClinicalBERT embedding sets. For each model, K-means produced five centroid points, all five of which were treated as interesting points (I). Interesting rules are extracted from the rules pointing to the I value. The output of K-means clustering on CovidBERT and BioClinicalBERT embeddings is shown in Figure <ref type="figure" target="#fig_3">4</ref>.</p><p>Figure <ref type="figure" target="#fig_5">5</ref> depicts the distribution plot of the average of the word embeddings obtained by the two models, BioClinicalBERT and CovidBERT. The model's embedding distribution is also typical. The distributed rules demonstrate the model's understanding of the input rules. Table <ref type="table" target="#tab_4">5</ref> also describes the ontology relationships distributed across the rule embeddings.
treatmentProvided and sufferFrom are among the most frequently identified ontology relationships.</p><p>The box-and-whisker plot of the semantic scores obtained from the two models, BioClinicalBERT and CovidBERT, is shown in Figure <ref type="figure" target="#fig_6">6</ref>. Compared with the BioClinicalBERT model, the CovidBERT model shows much more variation in the semantic score (CovidBERT scores range from 0.0032 to 0.9993, whereas BioClinicalBERT scores range from 0.7031 to 0.9999). This demonstrates the two models' different levels of comprehension. The BioClinicalBERT model calculates high cosine similarity values between these rules, implying that they are very similar. The minimum, maximum, and mean similarity scores are given in Table <ref type="table" target="#tab_5">6</ref>. The results show that the methodology learns to generate interesting facts based on simple linguistic features of the COVID-19 corpora, which are embedded as textual data using the BioClinicalBERT and CovidBERT models. The paraphrased summary of the identified interesting rules is as follows:</p><p>• Patients with {ILI, Diabetic} are highly prone to COVID-19 infection.</p><p>• Patients below the age of 35 are all advised {MPHQ}, so healthcare facilities should be reserved for higher age groups.</p><p>• Many health workers are infected and admitted to their own hospitals, creating a shortage of resources.</p><p>• The most widely documented symptom in the COVID-19 dataset is the common flu.</p><p>Decision-makers can use these paraphrased rules for preventive and predictive analysis.</p></div>
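<div xmlns="http://www.tei-c.org/ns/1.0"><p>The minimum, maximum, and mean similarity statistics reported above can be sketched as follows. The sketch assumes that the semantic score is the cosine similarity between pairs of rule embeddings; the function names and the toy vectors are hypothetical and do not reproduce the reported values.</p>
<p>
```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_summary(embeddings: List[Sequence[float]]) -> Dict[str, float]:
    """Min, max, and mean cosine similarity over all unordered pairs of embeddings."""
    scores = [cosine_similarity(u, v) for u, v in combinations(embeddings, 2)]
    if not scores:
        raise ValueError("at least two embeddings are required")
    return {"min": min(scores), "max": max(scores), "mean": mean(scores)}


# Hypothetical 2-dimensional rule embeddings.
TOY_EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
SUMMARY = similarity_summary(TOY_EMBEDDINGS)
```
</p>
<p>Running the same summary once per model gives the kind of side-by-side comparison shown in the similarity score table: a model whose scores cluster near 1 treats the rules as nearly identical, while a wide range indicates more varied judgments.</p></div>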
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This article proposes a novel methodology for ontology-based mining of interesting facts from the COVID-19 data corpora. The mined ontology-based rules are used as input to the transformer-based models BioClinicalBERT and CovidBERT to identify interesting rules. The aggregated rule embeddings are clustered. Next, using the cluster centroid, the interestingness index (I) is derived, and the rule matching it is reported as the most interesting rule. Further, with the similarity scores from both models, the rules are compared by their similarity index. We observed that BioClinicalBERT outperformed CovidBERT on the similarity score by giving high relevance to the generated rules. As future work, we plan to compare the model-generated rules with domain expert rules to justify our claims. Another possible extension of this work is to use it in applications such as the state-of-the-art COVID-19 Sentiment Analysis Toolkit.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Ontology Illustration</figDesc><graphic coords="4,89.29,84.19,416.72,107.92" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: OBDI Methodology</figDesc><graphic coords="5,177.17,349.11,240.95,141.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Rules for COP ontology</figDesc><graphic coords="6,177.17,84.19,240.94,62.36" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Word Embeddings Avg. Cluster plot from COVID and Clinical BERT Model</figDesc><graphic coords="8,297.64,300.37,198.43,131.23" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>(a) Word Embedding Average plot from CovidBERT (b) Word Embedding Average plot from ClinicalBERT</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Word Embeddings Avg. Distribution from COVID and Clinical BERT Model</figDesc><graphic coords="8,89.29,303.21,198.43,131.23" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Box and Whisker Plot of Semantic Scores from BioClinicalBERT and CovidBERT.</figDesc><graphic coords="9,198.43,232.60,198.42,141.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>COVID-19 Dataset Descriptions</figDesc><table><row><cell>Dataset Name</cell><cell>KATrace</cell></row><row><cell>Description</cell><cell>The data is collected from the HFWS web portal and is curated and stored in a spreadsheet by Siva Athreya and other researchers at the Indian Statistical Institute, Bangalore.</cell></row><row><cell>Attributes</cell><cell>Case ID, age, diagnosedOn, gender, city, cluster, reason, nationality, and status.</cell></row><row><cell>Data Download</cell><cell>www.isibang.ac.in/ athreya/incovid19/</cell></row><row><cell>Data Instances</cell><cell>71000</cell></row><row><cell>Dataset Name</cell><cell>COP</cell></row><row><cell>Description</cell><cell>The data is collected from the HFWS as part of the funded project.</cell></row><row><cell>Attributes</cell><cell>Case ID, age, Date, diagnosis, prescription for, drug store, district.</cell></row><row><cell>Data Download</cell><cell>Data access may be requested from HFWS.</cell></row><row><cell>Data Instances</cell><cell>120000</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Few Semantic Association Rules from COVID-19 KATrace Data Corpora</figDesc><table><row><cell>#</cell><cell>Semantic Association Rules of KATrace COVID-19 Dataset</cell></row><row><cell>R1</cell><cell>{sufferfromComorbidOthers}: (age, 27, 'Covid-19 (Suspect))⇒ (hasDiagnosedFor, Breathlessness(Influenza like Illness,)) (prescribedWith, Medicine Prescribed with Home Quarantine)</cell></row><row><cell>R2</cell><cell>{hasDiagnosedFor}: (age, above 65, Severe Acute Respiratory Infection) ⇒ (suspectedReasonOfCatchingCovid-19, Contact with other patients)</cell></row><row><cell>R3</cell><cell>{gender}: (Male, Female) (travelledFrom, TJ Congregation from 13th to 18th March in Delhi) ⇒ (suspectedReasonOfCatchingCovid-19, Family contact)</cell></row><row><cell>R4</cell><cell>{gender}: (Male, Female) ⇒ {(currentStatus ,cured) , (location, From Maharastra)</cell></row><row><cell>R5</cell><cell>{sufferFrom}(sneezing and an itchy, runny or blocked nose) ⇒ {prescribedWith}(Allergy Drugs)</cell></row><row><cell>R6</cell><cell>{sufferFrom}(sneezing and an itchy, runny or blocked nose, sore throat ) ⇒ {sufferFrom}(Allergy Drugs, Cough Syrup)</cell></row><row><cell>R7</cell><cell>{sufferFrom}(Sweating, Headache, Muscle aches, Loss of appetite, Dehydration, General weakness, sore throat) ⇒ {sufferFrom}( Fever Drugs, Cough Syrup )</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Cluster Rule Summary with Interesting -I Value from CovidBERT</figDesc><table><row><cell>Score (I) Rule No.</cell></row><row><cell>Cluster</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Cluster Rule Summary with Interesting-I Value from BioClinicalBERT</figDesc><table><row><cell>Score (I) Rule No.</cell></row><row><cell>Cluster</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Relations Summary in each Rule</figDesc><table><row><cell>Relations</cell><cell>Count</cell></row><row><cell>prescribedwith</cell><cell>50</cell></row><row><cell>sufferfrom</cell><cell>812</cell></row><row><cell>hascategory</cell><cell>877</cell></row><row><cell>livesin</cell><cell>651</cell></row><row><cell>treatmentprovided</cell><cell>883</cell></row><row><cell>residesat</cell><cell>86</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6</head><label>6</label><figDesc>Semantic Similarity Scores for CovidBERT and BioClinicalBERT</figDesc><table><row><cell>Statistic</cell><cell>Semantic Score (BioClinicalBERT)</cell><cell>Semantic Score (CovidBERT)</cell></row><row><cell>Min</cell><cell>0.703100</cell><cell>0.003200</cell></row><row><cell>Max</cell><cell>0.999900</cell><cell>0.999300</cell></row><row><cell>Mean</cell><cell>0.933196</cell><cell>0.622179</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://karunadu.karnataka.gov.in/hfw/pages/home.aspx</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.isibang.ac.in/ athreya/incovid19/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">https://bioportal.bioontology.org/ontologies/COKPME</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgments</head><p>This work was supported in part by the Department of Health and Family Welfare Services (HFWS), Government of Karnataka, India. We also extend our special thanks to the E-Health section of HFWS, Government of Karnataka, India, for providing all the necessary support and encouragement. Also, we would like to thank two anonymous reviewers for commenting on earlier versions of this paper.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Graph analytics applied to covid19 karnataka state dataset</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mahesh</surname></persName>
		</author>
		<idno type="DOI">10.1145/3459955.3460603</idno>
		<ptr target="https://doi.org/10.1145/3459955.3460603" />
	</analytic>
	<monogr>
		<title level="m">2021 The 4th International Conference on Information Science and Systems</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="74" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Ontology is what makes data interesting: Interestingness framework for covid-19 corpora</title>
		<author>
			<persName><forename type="first">C</forename><surname>Abhilash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mahesh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Information Science</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Bringmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nijssen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimmermann</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1111.6191</idno>
		<title level="m">Pattern-based classification: A unifying perspective</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="https://en.wikipedia.org/w/index.php?title=FAIR_data&amp;oldid=1038845392" />
		<title level="m">FAIR data - Wikipedia, the free encyclopedia</title>
				<imprint>
			<date type="published" when="2021-08-24">24 August 2021</date>
		</imprint>
	</monogr>
	<note>Wikipedia contributors</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">An improved algorithm for mining association rules in large databases</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">H</forename><surname>Al-Zawaidah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">H</forename><surname>Jbara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Marwan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">World of Computer science and information technology journal</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="311" to="316" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Fast algorithms for mining association rules</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 20th int. conf. very large data bases, VLDB</title>
				<meeting>20th int. conf. very large data bases, VLDB</meeting>
		<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="volume">1215</biblScope>
			<biblScope unit="page" from="487" to="499" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Latent association rule cluster based model to extract topics for classification and recommendation applications</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">F</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Domingues</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">V</forename><surname>Sundermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">O</forename><surname>De Carvalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">O</forename><surname>Rezende</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">112</biblScope>
			<biblScope unit="page" from="34" to="60" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The semantic web</title>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lassila</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific American</title>
		<imprint>
			<biblScope unit="volume">284</biblScope>
			<biblScope unit="page" from="34" to="43" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">COVID-19 fake news detection using ensemble-based deep learning model</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IT Professional</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="32" to="37" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">MIMIC-III, a freely accessible critical care database</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">J</forename><surname>Pollard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-W</forename><forename type="middle">H</forename><surname>Lehman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ghassemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Moody</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szolovits</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">Anthony</forename><surname>Celi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">G</forename><surname>Mark</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific Data</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1" to="9" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">CIRO: COVID-19 infection risk ontology</title>
		<author>
			<persName><forename type="first">S</forename><surname>Egami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yamamoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ohmukai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Okumura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page">e0282291</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Ontology-based interestingness in COVID-19 data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Abhilash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mahesh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Research Conference on Metadata and Semantics Research</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="322" to="335" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">BioBERT: a pre-trained biomedical language representation model for biomedical text mining</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yoon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>So</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="1234" to="1240" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Alsentzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Murphy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Boag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-H</forename><surname>Weng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>McDermott</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.03323</idno>
		<title level="m">Publicly available clinical BERT embeddings</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A comparative analysis of data sets using machine learning techniques</title>
		<author>
			<persName><forename type="first">C</forename><surname>Abhilash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rohitaksha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Biradar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Advance Computing Conference (IACC)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="24" to="29" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
