<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Stop-word based contextual auditing to identify inconsistencies in SNOMED</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Rashmi</forename><surname>Burse</surname></persName>
							<email>rashmi.burse@ucdconnect.ie</email>
						</author>
						<author>
							<persName><forename type="first">Gavin</forename><surname>Mcardle</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Michela</forename><surname>Bertolotto</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">University College Dublin</orgName>
								<address>
									<settlement>Belfield, Dublin</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Stop-word based contextual auditing to identify inconsistencies in SNOMED</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">80147F11C78B9CFFF74BD115AF653320</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T21:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>SNOMED</term>
					<term>Quality Assurance</term>
					<term>Lexical Auditing</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>SNOMED is one of the most widely adopted Clinical Terminology systems. However, incomplete representations and modelling inconsistencies in SNOMED are preventing healthcare applications from exploiting its full potential. This paper presents a novel stop-word based contextual auditing method to identify potential inconsistencies in the modelling of SNOMED concepts. The results of a pilot study show that the method is promising. The percentage of identified missing attribute relationships using this method is as high as 69.56% and that of identified missing hierarchical relationships is 28.26%. The auditing method proposed in this paper can act as a supplementary Quality Assurance check in the International Health Terminology Standards Development Organization's effort to improve the quality of SNOMED.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Incomplete, inconsistent and erroneous representations of Clinical Terminology (CT) systems limit their expressiveness and have a variety of repercussions, including retrieval of incomplete or incorrect result sets. Missing relationships result in partially defined concepts, which obstruct the divulgence of rich inferential knowledge. For example, in the March 2020 International Edition of SNOMED, the concept Insomnia with sleep apnea (disorder) has only one parent, Insomnia (disorder). The hierarchical link to Sleep apnea (disorder) is absent. Sleep apnea (disorder) has a role group containing three attribute relationships which are missing from the concept Insomnia with sleep apnea (disorder), preventing it from capturing all the information relevant to defining this condition. If someone executes a query to retrieve all patients suffering from Sleep apnea (disorder), the patients suffering from Insomnia with sleep apnea (disorder) would not be retrieved because of the missing hierarchical relationship between Sleep apnea (disorder) and Insomnia with sleep apnea (disorder). The query will therefore yield inaccurate, partial results. Given the critical nature of medical data, effective Quality Assurance (QA) of CT systems is imperative. However, the development of effective auditing methods for the QA of CT systems is a major challenge and an ongoing process in the health informatics domain. In spite of continual research efforts, the healthcare community is still striving to hone its auditing techniques for two major reasons: (a) the huge size of CT systems makes it impractical to audit each and every concept manually; (b) the diverse nature of clinical data has led to a variety of conflicting modelling styles, making it impossible to develop a "one size fits all" solution that can be applied to all CT systems. 
Taking these constraints into consideration, the best way forward is to develop efficient auditing techniques that highlight regions of a CT system with a high concentration of errors. Such areas can then be presented to authors and curators of a CT system for manual inspection. The main objective of such techniques is to direct the limited available resources to these error-dense areas and identify the maximum number of inconsistencies with minimal effort.</p><p>With this objective, we present a novel method based on lexical analysis of concept names containing stop-words. It is our hypothesis that stop-words, which have been disregarded by other lexical auditing methods, can prove to be rich sources of information to identify problematic areas. The pilot version of this method is restricted to the stop-words "and" and "with" due to their conjunctive nature. However, we plan to expand our analysis to other stop-words in the future. The proposed method identifies two types of inconsistencies: missing hierarchical relationships (i.e., a SNOMED concept exists that is lexically equivalent to, or a lexical variant of, any of the subjects appearing before or after the stop-word but is not assigned as a parent of the concept) and missing attribute relationships (i.e., in the case of a missing hierarchical relationship, the attribute relationship(s) of the identified lexically ideal parent is/are not included as a role group in the modelling of the concept). The proposed method promotes semantic completeness by identifying missing attribute relationships to refine a concept and ensures consistency in structural modelling by identifying missing hierarchical relationships. An additional advantage of our method over other auditing methods is that it not only identifies inconsistencies but also provides a list of suggested corrections for each identified inconsistency. 
The aim of our method is to highlight areas with a high concentration of errors in order to save experts and curators time and effort on manual auditing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Bodenreider et al. <ref type="bibr" target="#b15">[15]</ref> developed a method to identify missing elements in SNOMED by targeting concepts containing binary antonymous adjectives such as (acute, chronic), (unilateral, bilateral), (primary, secondary), and (acquired, congenital). The proposed method extracted adjectival modifiers from the targeted concepts ([MOD][CONTEXT]) and created new terms by experimenting with various combinations of modifiers and contexts. Bodenreider et al. <ref type="bibr" target="#b14">[14]</ref> exploited the lexical features of concepts to identify missing hyponymic relationships. The method selected concepts conforming to a modifier+noun form ([MOD][NOUN]), where the modifier was usually an adjectival modifier further describing the noun. They intuitively assumed that modifier+noun should be a hyponym of the noun, e.g. acute appendicitis should be a child of appendicitis, and identified missing hyponymic relationships. Pacheco et al. <ref type="bibr" target="#b20">[20]</ref> assumed that non-attributed concepts were underspecified and employed a semantic indexing method to suggest attribute relationships for such concepts. The method derived sub-words from a non-attributed concept's Fully Specified Name (FSN) with the help of MorphoSaurus <ref type="bibr" target="#b18">[18]</ref>. The derived sub-words were compared with the concept's parent(s). Common sub-words appearing in both the child and parent concepts were eliminated. The concepts containing the remaining sub-words were then searched and chosen as eligible candidates to refine the non-attributed concept.</p><p>Agrawal and Elhanan <ref type="bibr" target="#b5">[5]</ref> examined five types of inconsistencies among concepts whose FSNs were lexically similar, i.e., differed by only one word. The method created similarity sets consisting of concepts that differed from a base descriptor by one word. E.g. 
for the base descriptor "upper limb stretching", Prophylactic upper limb stretching (procedure), Therapeutic upper limb stretching (procedure), and Prophylactic lower limb stretching (procedure) constituted a similarity set. The method was applied to Procedure sub-hierarchy of SNOMED. 5 samples each consisting of 50 similarity sets were created and each sample was examined for hierarchical, attribute assignment, attribute target value, group, and definitional inconsistencies. Bodenreider <ref type="bibr" target="#b13">[13]</ref> claimed that the root cause for all inconsistencies in CT systems was concepts modeled with faulty logical definitions. With this notion they recreated logical definitions from the lexical features of a concept name and inferred hierarchical relationships among these newly defined concepts. The newly obtained hierarchy was then compared with the original SNOMED hierarchy to detect differences. Schulz et al. <ref type="bibr" target="#b22">[22]</ref> detected ambiguities in hierarchy tags, attribute relationships, and IS-A relationships based on the lexical features of SNOMED concepts and made some valuable suggestions for the curators of SNOMED. Rector and Iannone <ref type="bibr" target="#b21">[21]</ref> focused on finding concepts from the findings and diseases sub-hierarchies of SNOMED that should be classified as chronic or acute according to CORE problem list but currently are not and studied the effect of this misclassification on post-coordination queries. Ceusters et al. <ref type="bibr" target="#b16">[16]</ref> scrutinized concepts containing negation words like absence, negation, and not and misclassification caused due to these words. 
They introduced four categories into which negative relationships can be classified, suggested that SNOMED should be aligned with an Upper Level Ontology (ULO) like Basic Formal Ontology (BFO), and introduced a new "lacks" relationship to correctly classify such negative concepts.</p><p>Agrawal et al. <ref type="bibr" target="#b7">[7]</ref> reported the results of a study that statistically concluded that the complexity of a concept, and thereby the chance of identifying errors in it, increases with the length (number of words) of the concept name and the number of parents of the concept. Agrawal <ref type="bibr" target="#b4">[4]</ref> proposed an auditing method based on the hypothesis that if two concepts are lexically similar then their structural and logical modelling should also be similar. E.g. the concepts Acute injury of anterior cruciate ligament (disorder) and Acute injury of posterior cruciate ligament (disorder) are lexically similar as they differ by only one word and hence have similar structural and logical modelling. Both concepts have the same number of hierarchical relationships and the same number and type of attribute relationships, differing only in the target values (anterior and posterior). Many variations of this method, including simple similarity sets <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b12">12]</ref>, positional similarity sets <ref type="bibr" target="#b8">[8,</ref><ref type="bibr" target="#b9">9]</ref>, and similarity sets created with machine learning tools <ref type="bibr" target="#b10">[10,</ref><ref type="bibr" target="#b11">11]</ref>, were developed and applied to different versions and sub-hierarchies of SNOMED. Cui et al. 
<ref type="bibr" target="#b17">[17]</ref> proposed a hybrid method combining the structural and lexical aspects of a CT system and identified four lexical patterns in non-lattice subgraphs that suggested potential missing hierarchical relationships and potential missing concepts.</p><p>To summarize, all the lexical auditing methods applied so far work on one of the following principles: (a) counting the length of a concept name to estimate its complexity and thereby calculate the probability of potential inconsistencies harbored by it; (b) performing lexico-syntactic and morphosyntactic analysis on the concept names to identify missing concepts/relationships; (c) applying normalization techniques and LVG algorithms to deal with variation in concept names; (d) looking for lexical similarity among concept names to check for inconsistencies in their structural and logical modelling.</p><p>The intent and focus of all the aforementioned methods is on medical jargon and its lexical variants. As a result, these methods scrutinized fixed parts of speech like adjectival modifiers, nouns, and verbs and found repeatedly occurring stop-words like "and", "or", "with" etc. to be a hindrance. To improve the efficiency of their algorithms, these methods ignored a list of such stop-words <ref type="bibr" target="#b2">[3]</ref>. The stop-words that are disregarded and eliminated by all the aforementioned studies can prove to be rich sources of information to identify problematic areas. They can serve as effective indicators to identify concepts harboring potential inconsistencies. The stop-word list <ref type="bibr" target="#b2">[3]</ref> eliminated by these studies serves as a major motivation for our approach. In this work we present a unique method that targets concepts containing the stop-words "and" and "with" to identify two types of inconsistencies: missing hierarchical relationships and missing attribute relationships. 
The pilot version of this method is restricted to the stop-words "and" and "with" due to their conjunctive nature. However, we plan to expand our analysis to other stop-words <ref type="bibr" target="#b2">[3]</ref> in the future. To the best of our knowledge, there is no lexical method developed so far that targets stop-words to audit CT systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Materials and Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Materials</head><p>In this pilot study, the proposed method will be applied to the Disorder sub-hierarchy of SNOMED's March 2020 International Edition. However, the proposed method is quite generic and can be applied to other hierarchies of SNOMED as well as other CT systems. We have chosen this sub-hierarchy because after performing a preliminary inspection, we found many concepts in the disorder sub-hierarchy containing the stop-words "and" and "with" that were either missing hierarchical relationships or were assigned inconsistent hierarchical relationships that varied in granularity and were missing attribute relationships. There are almost 7000 eligible concepts, containing "and" or "with", that need to be systematically assessed and it is our hypothesis that the proposed method will highlight erroneous concepts that require manual auditing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Method</head><p>The proposed method is based on four assumptions and identifies two types of inconsistencies. Lexical variants in this work are considered to be concept FSNs conforming to the lexical structure "subject + syndrome", and terms appearing before and after "and" or "with" will hereafter be referred to as subjects. Inconsistencies are defined as follows. Missing hierarchical relationship: a SNOMED concept exists that is lexically equivalent to, or a lexical variant of, any of the subjects but is not assigned as a parent of the concept. Missing attribute relationship (role group): in the case of a missing hierarchical relationship, the attribute relationship(s) of the identified lexically ideal parent is/are not included as a role group in the modelling of the concept.</p><p>The assumptions made in this study are based on the observation that concepts containing "and" and "with" are expected to have at least two parents and at least two role groups. The first assumption is also supported by a semantic rule proposed during the early formative years of SNOMED <ref type="bibr" target="#b19">[19]</ref>. Mendonca et al. <ref type="bibr" target="#b19">[19]</ref> conducted a thorough analysis of SNOMED concepts containing conjunctions like "and", "and/or", "either/or", "neither/nor" and came to the conclusion that if a SNOMED concept contains the word "and", it should be treated as a "logical and" and the properties of the subjects appearing before and after the conjunction must be present in the concept. All other cases that entertain the idea of exclusivity, allowing the presence of either one or both subjects, should be represented using the more lenient "and/or" conjunction. Fig. <ref type="figure">1</ref> illustrates the example of a concept Pneumonia and influenza (disorder) which has two parents, Influenza (disorder) and Pneumonia (disorder). The names of the parents are lexically equivalent to the subjects. 
It has two role groups, one belonging to each of the parent disorders: role group 1, belonging to Influenza (disorder), contains three attribute relationships (pathological process - infectious process, causative agent - influenza virus, finding site - structure of respiratory system), and role group 2, belonging to Pneumonia (disorder), contains two attribute relationships (associated morphology - inflammation and consolidation, finding site - lung structure). Fig. <ref type="figure">2</ref> illustrates the individual disorder concepts Pneumonia (disorder) and Influenza (disorder) along with their role groups. The diagrammatic representations of concepts were downloaded from IHTSDO's SNOMED browser <ref type="bibr" target="#b1">[2]</ref>. Based on this observation and the semantic rules mentioned in <ref type="bibr" target="#b19">[19]</ref>, we present Assumptions 1 and 2.</p><p>Assumption 1 Concepts containing the stop-word "and" should have at least two parents and the parents must either be lexically equivalent or must be lexical variants of the subjects appearing before and after "and".</p><p>Assumption 2 Concepts containing the stop-word "and" should have at least two role groups, and the role groups should be equivalent to the role groups of each individual concept corresponding to the subjects appearing before and after "and".</p><p>Fig. <ref type="figure" target="#fig_1">3</ref> illustrates the example of a concept Ornithosis with pneumonia (disorder) which has four parents, including Ornithosis (disorder) and Pneumonia (disorder), and two role groups, one for each individual disorder parent corresponding to a subject. Fig. <ref type="figure" target="#fig_2">4</ref> illustrates the individual concept Ornithosis (disorder) along with its role group. The other parent, Pneumonia (disorder), along with its role group is already illustrated in Fig. <ref type="figure">2</ref> (b). Based on this observation, we present Assumptions 3 and 4.</p><p>Assumption 3 Concepts containing the stop-word "with" should have at least two parents and the parents must either be lexically equivalent or must be lexical variants of the subjects appearing before and after "with".</p><p>Assumption 4 Concepts containing the stop-word "with" should have at least two role groups, and the role groups should be equivalent to the role groups of each individual concept corresponding to the subjects appearing before and after "with".</p><p>We formulated a set of rules based on the aforementioned assumptions which form the backbone of our algorithm. The developed algorithm identifies missing hierarchical relationships and missing attribute relationships, and also makes corrective suggestions by listing lexically ideal concepts using the four assumptions.</p></div>
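<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rules derived from Assumptions 1-4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the Concept class, the index dictionary, the helper names split_fsn, candidates and audit, and the placeholder parent are all hypothetical, and the lookup only considers concepts tagged (disorder). Treating a subject as satisfied as soon as one of its candidate concepts is already a parent is also a choice made for this sketch.</p>
<p>
```python
import re
from dataclasses import dataclass, field

STOP_WORDS = ("and", "with")   # stop-words covered by the pilot study
MAX_WORDS = 3                  # pilot limit on FSN length, hierarchy tag excluded


@dataclass
class Concept:
    """Toy stand-in for a SNOMED concept (hypothetical structure)."""
    fsn: str
    parents: set = field(default_factory=set)         # FSNs of direct parents
    role_groups: list = field(default_factory=list)   # frozensets of (attribute, value)


def split_fsn(fsn):
    """Return the two subjects around the stop-word, or None if not eligible."""
    name = re.sub(r"\s*\([^)]*\)\s*$", "", fsn)       # drop the hierarchy tag
    words = name.split()
    if len(words) != MAX_WORDS or words[1].lower() not in STOP_WORDS:
        return None
    return words[0], words[2]


def candidates(subject, index):
    """Lexically equivalent concept plus the 'subject + syndrome' variant."""
    names = (f"{subject} (disorder)", f"{subject} syndrome (disorder)")
    return [index[n.lower()] for n in names if n.lower() in index]


def audit(concept, index):
    """Return (suggested missing parents, suggested missing role groups)."""
    subjects = split_fsn(concept.fsn)
    if subjects is None:
        return [], []
    current = {p.lower() for p in concept.parents}
    missing_parents, missing_groups = [], []
    for subject in subjects:
        found = candidates(subject, index)
        # Assumption of this sketch: one matching parent satisfies a subject.
        if any(c.fsn.lower() in current for c in found):
            continue
        for cand in found:
            missing_parents.append(cand.fsn)
            missing_groups.extend(
                g for g in cand.role_groups if g not in concept.role_groups
            )
    return missing_parents, missing_groups


# Toy data: role group of Scleritis (disorder) as described in the text.
scleritis_group = frozenset({
    ("Associated morphology", "Inflammatory morphology"),
    ("Finding site", "Scleral structure"),
})
index = {
    c.fsn.lower(): c
    for c in (
        Concept("Scleritis (disorder)", role_groups=[scleritis_group]),
        Concept("Episcleritis (disorder)"),  # role groups not given in the text
    )
}
target = Concept(
    "Scleritis and episcleritis (disorder)",
    parents={"Placeholder parent (disorder)"},  # hypothetical existing parent
)
suggested_parents, suggested_groups = audit(target, index)
print(suggested_parents)
print(suggested_groups == [scleritis_group])
```
</p>
<p>In this toy run both Scleritis (disorder) and Episcleritis (disorder) are suggested as parents, and the role group of Scleritis (disorder) is suggested as a missing role group. A four-word FSN such as Myopathy and diabetes mellitus (disorder) is skipped by split_fsn, which mirrors the three-word restriction of the pilot.</p></div>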
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results and Discussion</head><p>Table <ref type="table" target="#tab_0">1</ref> displays the number of eligible concepts containing the keywords "and" and "with" which were found in the disorder sub-hierarchy of SNOMED's International Edition March 2020 release. The pilot study is limited to concepts containing a maximum of three words (excluding the hierarchy tag, (disorder)) in their Fully Specified Names (FSNs). From Table <ref type="table" target="#tab_0">1</ref>, we can see that of the 76747 active concepts in the Disorder sub-hierarchy, 6989 contain the stop-words "and" or "with", and 92 of these have a maximum of three words in their FSN.</p><p>Out of the 92 concepts, 26 concepts (28.26%) were identified to be missing one or more parent(s) according to the lexical rules stated in Assumptions 1-4. For 3 of the 26 concepts, all suggested parents belonged to the Finding sub-hierarchy. Currently, these concepts are dropped from the analysis due to lack of medical expertise to check conformance with the guidelines <ref type="bibr" target="#b0">[1]</ref>, but they will be covered in future work after developing appropriate rules for such cases.</p><p>Out of the remaining 23 concepts, 16 concepts (69.56%) were found to be missing attribute relationships. Table <ref type="table" target="#tab_1">2</ref> reports the statistics of the results related to missing hierarchical relationships and Table <ref type="table" target="#tab_2">3</ref> reports the statistics of the results related to missing attribute relationships that were obtained by our method. 
In Tables 2 and 3, the second column (#) displays the number of concepts belonging to the category described by the first column (Description), the third column (Percentage) displays the count in terms of percentage and the fourth and fifth columns display the "and" and "with" concept distribution of the count respectively. Table <ref type="table" target="#tab_3">4</ref> lists the top three missing parents and missing attribute relationships identified by our method. In Table 4, the first column represents the identified concept containing the stop-word "and" or "with", the second column displays the suggested missing hierarchical relationship, i.e. missing parent, and the third column represents its corresponding attribute relationship that should ideally be present but is missing in the identified concept.</p><p>The results of this preliminary experiment show the potential of our approach. The percentage of identified missing hierarchical relationships using our method is 28.26% and that of identified missing attribute relationships is as high as 69.56%. Fig. <ref type="figure" target="#fig_3">5</ref> illustrates a diagrammatic example of Scleritis and episcleritis (disorder), one of the identified concepts with missing hierarchical and attribute relationships. According to Assumption 1, Scleritis and episcleritis (disorder) is missing the parents Scleritis (disorder) and Episcleritis (disorder). As a result, it is also missing the attribute relationships Associated morphology - inflammatory morphology (morphologic abnormality) and Finding site - Scleral structure (body structure), associated with Scleritis (disorder). Fig. <ref type="figure">6</ref> illustrates a diagrammatic example of the suggested parent Scleritis (disorder) and highlights the suggested missing attribute relationships that need to be added as an additional role group to complete the modelling of Scleritis and episcleritis (disorder).</p><p>Fig. 6. Diagrammatic representation of suggested parent "Scleritis (disorder) SCTID: 78370002"</p><p>Since the pilot implementation of this method has a limited scope, the following limitations are noted. Currently the method only processes FSNs containing a maximum of three words (excluding the hierarchy tag), therefore concepts containing composite-word disorder names like Myopathy and diabetes mellitus (disorder) (4 words) and Hepatitis A and Hepatitis B (disorder) (5 words) are not considered in spite of being suitable candidates. Currently, the approach does not consider concepts containing "and/or" due to their complexity <ref type="bibr" target="#b19">[19]</ref>. Lexical variants are generated based only on the pattern "subject + syndrome", e.g. for osteochondrodysplasia with osteopetrosis (disorder) the parent osteochondrodysplasia syndrome (disorder) is suggested. As a result, other variants are neither identified as existing parents nor included in the suggested parent list. Due to lack of medical expertise on the team to verify the guidelines <ref type="bibr" target="#b0">[1]</ref>, we have for now disregarded cases where the suggested parent for a disorder belongs to the Finding sub-hierarchy. E.g. the suggestion of isoimmunization (finding) as a missing parent for the concept pregnancy with isoimmunization (disorder) has not been considered for further analysis. However, in spite of these limitations the method has shown promising potential, and we hope to improve the accuracy of results further by addressing the aforementioned limitations.</p></div>
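<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reported percentages can be reproduced from the counts given in the text and in Tables 1-3. The short check below is only an illustrative sketch; the variable and function names are hypothetical. It suggests that the figure of 69.56% is consistent with truncating 16/23 to two decimal places, whereas rounding would give 69.57%.</p>
<p>
```python
import math

eligible = 92         # FSNs containing "and" or "with", at most three words (Table 1)
missing_parent = 26   # concepts for which parents were suggested (Table 2)
dropped = 3           # concepts whose suggested parents were all in the Finding sub-hierarchy
retained = missing_parent - dropped   # 23 concepts kept for attribute analysis
missing_attr = 16     # concepts with suggested attribute relationships (Table 3)


def pct(part, whole):
    """Share of part in whole, as a percentage."""
    return 100.0 * part / whole


def truncate2(value):
    """Cut a value to two decimal places without rounding."""
    return math.floor(value * 100) / 100


hier = pct(missing_parent, eligible)   # 26 / 92
attr = pct(missing_attr, retained)     # 16 / 23

print(f"missing hierarchical relationships: {hier:.2f}%")
print(f"missing attribute relationships (rounded): {attr:.2f}%")
print(f"missing attribute relationships (truncated): {truncate2(attr)}%")
```
</p>
<p>The denominators differ: the hierarchical percentage is taken over the 92 eligible concepts, while the attribute percentage is taken over the 23 concepts retained after the Finding sub-hierarchy cases were dropped.</p></div>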
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>Incomplete and inconsistent representations of CT systems cause retrieval of incorrect or partially correct result sets. Given the critical nature of medical data, the repercussions of such inaccurate results could be serious, ranging from incorrect decision making in Clinical Decision Support Systems to predicting misleading trends in Population Health Management and Predictive Analytics. Thus, it is very important to implement effective QA measures for CT systems to identify any inconsistencies right at the source. In this paper, we presented a unique lexical stop-word based contextual auditing method to identify two types of inconsistencies: missing hierarchical relationships and missing attribute relationships. A pilot version of this method has given promising results. The percentage of identified missing attribute relationships using our method is as high as 69.56% and that of identified missing hierarchical relationships is 28.26%. Our method has an additional advantage over other QA methods in that it not only identifies inconsistencies but also provides a list of potential suggestions for each identified inconsistency. Our method contributes to the improvement of a CT system in the following ways:</p><p>1. It helps to produce a complete CT system by adding the suggested relationships to the CT system. 2. It ensures better extraction of inferential knowledge which is otherwise not divulged due to incomplete relationships and partially defined concepts. 3. It ensures retrieval of complete information in result sets, which will facilitate informed decision making.</p><p>As future work we propose to improve our algorithm to identify composite disorder names such as Diabetes Mellitus. This will allow the algorithm to be applied to any FSN irrespective of its length. We plan to work on all the identified limitations. 
We also plan to widen the range of stop-words used in our analysis to include "of", "due to", "to", etc. Finally, we will expand the technique to process FSNs containing multiple stop-words instead of a single stop-word. E.g. Disorder due to and following burn of wrist (disorder).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .Fig. 2 .</head><label>12</label><figDesc>Fig. 1. Diagrammatic representation of SNOMED concept "Pneumonia and influenza (disorder) SCTID: 195878008"</figDesc><graphic coords="6,134.77,300.36,345.76,85.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Diagrammatic representation of SNOMED concept "Ornithosis with pneumonia (disorder) SCTID:81164001"</figDesc><graphic coords="7,134.77,115.83,210.71,141.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Diagrammatic representation of SNOMED concept "Ornithosis (disorder) SCTID: 75116005"</figDesc><graphic coords="7,134.77,485.31,269.00,85.04" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. Diagrammatic representation of SNOMED concept "Scleritis and episcleritis (disorder) SCTID: 267659002"</figDesc><graphic coords="9,134.77,413.94,345.84,76.16" type="bitmap" /></figure>
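<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Number of eligible concepts</figDesc><table><row><cell>Description</cell><cell>#</cell></row><row><cell>Total concepts in Disorder sub-hierarchy (active only)</cell><cell>76747</cell></row><row><cell>Concepts containing stop-words "and" and "with" (FSN length - any)</cell><cell>6989</cell></row><row><cell>Concepts containing stop-words "and" and "with" (FSN length - 3)</cell><cell>92</cell></row></table></figure>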
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Results for missing hierarchical relationships</figDesc><table><row><cell>Description</cell><cell>#</cell><cell>Percentage</cell><cell># "and" Concept</cell><cell># "with" Concept</cell></row><row><cell>Concepts for which parents were suggested (including finding sub-hierarchy concepts)</cell><cell>26</cell><cell>28.26%</cell><cell>10</cell><cell>16</cell></row><row><cell>Concepts for which parents were suggested (excluding finding sub-hierarchy concepts)</cell><cell>23</cell><cell>25%</cell><cell>10</cell><cell>13</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Results for missing attribute relationships</figDesc><table><row><cell>Description</cell><cell>#</cell><cell>Percentage</cell><cell># "and" Concept</cell><cell># "with" Concept</cell></row><row><cell>Concepts for which missing attribute relationships were suggested</cell><cell>16</cell><cell>69.56%</cell><cell>9</cell><cell>7</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Top three missing relationship suggestions</figDesc><table><row><cell>Concept</cell><cell>Suggested Parent</cell><cell>Suggested Attribute Relationship</cell></row><row><cell>Cataplexy and narcolepsy</cell><cell></cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://confluence.ihtsdotools.org/pages/viewpage.action?pageId=71172245" />
		<title level="m">SNOMED Clinical Finding/Disorder</title>
				<imprint>
			<date type="published" when="2020-07-29">2020/07/29</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="https://browser.ihtsdotools.org/?" />
		<title level="m">SNOMED CT Browser</title>
				<imprint>
			<date type="published" when="2020-07-30">2020/07/30</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">PubMed Help</title>
		<imprint/>
	</monogr>
	<note>Internet</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="https://www.ncbi.nlm.nih.gov/books/NBK3827/table/pubmedhelp.T.stopwords/" />
		<title level="m">National Center for Biotechnology Information (US)</title>
		<meeting><address><addrLine>Bethesda (MD)</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005. 2020/07/28</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Evaluating lexical similarity and modeling discrepancies in the procedure hierarchy of SNOMED CT</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Medical Informatics and Decision Making</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Contrasting lexical similarity and formal definitions in SNOMED CT: Consistency and implications</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Elhanan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of biomedical informatics</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page" from="192" to="198" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Dissimilarities in the logical modeling of apparently similar concepts in SNOMED CT</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Elhanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Halper</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AMIA ... Annual Symposium proceedings. AMIA Symposium</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="212" to="216" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Identifying inconsistencies in SNOMED CT problem lists using structural indicators</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Perl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Elhanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AMIA ... Annual Symposium proceedings. AMIA Symposium</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="17" to="26" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Identifying problematic concepts in SNOMED CT using a lexical approach</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Perl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Elhanan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Studies in health technology and informatics</title>
		<imprint>
			<biblScope unit="volume">192</biblScope>
			<biblScope unit="page" from="773" to="777" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A contextual auditing method for SNOMED CT concepts</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Perl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ochs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Elhanan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Data Min. Bioinform</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="372" to="391" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A machine learning approach for quality assurance of SNOMED CT</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Qazi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Bioinformatics and Biomedicine</title>
				<imprint>
			<publisher>BIBM</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="792" to="798" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Detecting modeling inconsistencies in SNOMED CT using a machine learning technique</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Qazi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Methods</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Analysis of the consistency in the structural modeling of SNOMED CT and core problem list concepts</title>
		<author>
			<persName><forename type="first">A</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Revelo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Bioinformatics and Biomedicine</title>
				<imprint>
			<publisher>BIBM</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="292" to="296" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Identifying missing hierarchical relations in SNOMED CT from logical definitions based on the lexical features of concept names</title>
		<author>
			<persName><forename type="first">O</forename><surname>Bodenreider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICBO/BioCreative</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Lexically-suggested hyponymic relations among medical terms and their representation in the UMLS</title>
		<author>
			<persName><forename type="first">O</forename><surname>Bodenreider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Burgun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Rindflesch</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Assessing the consistency of a biomedical terminology through lexical knowledge</title>
		<author>
			<persName><forename type="first">O</forename><surname>Bodenreider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Burgun-Parenthoine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Rindflesch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International journal of medical informatics</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">1-3</biblScope>
			<biblScope unit="page" from="85" to="95" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Negative findings in electronic health records and biomedical ontologies: A realist approach</title>
		<author>
			<persName><forename type="first">W</forename><surname>Ceusters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">L</forename><surname>Elkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International journal of medical informatics</title>
		<imprint>
			<biblScope unit="volume">76</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="S326" to="S333" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT</title>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Case</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Bodenreider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">Q</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Medical Informatics Association: JAMIA</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="788" to="798" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Morphosaurus-design and evaluation of an interlingua-based, cross-language document retrieval engine for the medical domain</title>
		<author>
			<persName><forename type="first">K</forename><surname>Marko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schulz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Hahn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Methods of information in medicine</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="page" from="537" to="545" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Reproducibility of interpreting &quot;and&quot; and &quot;or&quot; in terminology systems</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Mendonça</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">E</forename><surname>Campbell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Spackman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings. AMIA Symposium</title>
				<meeting>AMIA Symposium</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="790" to="794" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Detecting underspecification in SNOMED CT concept definitions through natural language processing</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Pacheco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Stenzhorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nohama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schulz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AMIA ... Annual Symposium proceedings. AMIA Symposium</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="492" to="496" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Lexically suggest, logically define: Quality assurance of the use of qualifiers and expected results of post-coordination in SNOMED CT</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Rector</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Iannone</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of biomedical informatics</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="199" to="209" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Lexical ambiguity in SNOMED CT</title>
		<author>
			<persName><forename type="first">S</forename><surname>Schulz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Martínez-Costa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Miñarro-Giménez</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>JOWO</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
