<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Building A Knowledge Graph for Audit Information</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Naser</forename><surname>Ahmadi</surname></persName>
							<email>naser.ahmadi@eurecom.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">EURECOM</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hansjorg</forename><surname>Sand</surname></persName>
							<email>hsand@kpmg.com</email>
							<affiliation key="aff1">
								<orgName type="institution">KPMG</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paolo</forename><surname>Papotti</surname></persName>
							<email>papotti@eurecom.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">EURECOM</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Building A Knowledge Graph for Audit Information</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BBEB16522633BFC0E5259D07041A9A94</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>knowledge graph</term>
					<term>auditing</term>
					<term>text</term>
					<term>taxonomy</term>
					<term>structured data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present our insights from the experience of creating a knowledge graph (KG) for the auditing domain. We discuss the main challenges in building such KG starting from text and unstructured data and present an overview of our solution. The proposed approach follows a standard pipeline when it first extracts entities from auditing documents and then finds relationships among them. However, the process is especially challenging because auditing entities are in most cases non-named entities, which are hard to model in the graph and to identify in text. From our experience, we finally derive a set of observations on the limits of automatic methods for the construction of audit KGs and a possible direction to address them.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>A Knowledge Graph (KG) is a structured representation of information that stores real-world entities as nodes and the relationships between them as edges. KGs represent data with large collections of interconnected entities. Usually, types (classes) describe the entities (e.g., entity Paris is a city, France is a country), while predicates describe their relationships (a city isCapital of a country) and their properties (France has a population:62M). RDF KGs organize information in the form of triples, with a predicate expressing a binary relation between a subject and an object. KGs store large amounts of triples, or facts; e.g., the English version of DBpedia stores 850 million facts. The syntactic and semantic structures of knowledge in KGs are useful in building applications, such as Question Answering <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> and Semantic Search <ref type="bibr" target="#b2">[3]</ref>.</p><p>Manually building a KG is a very expensive process. For this reason, research has been conducted on KG creation both in academia <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref> and in industry <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>. However, when applied to textual documents in the financial domain, these methods fall short. Indeed, the KGs for legal and audit enterprises are very different from those derived from Wikipedia pages. While most of the KGs in the literature are encyclopedic, covering objects and facts in the real world, some enterprises hold information that is mostly composed of non-named entities and abstract topics, making their KGs closer to a commonsense KG. See the examples that highlight the difference in Figure <ref type="figure" target="#fig_0">1</ref>. 
The latter category is much harder to build automatically, and most efforts rely on humans, usually in a crowdsourcing fashion, such as ConceptNet <ref type="bibr" target="#b10">[11]</ref> and ATOMIC <ref type="bibr" target="#b11">[12]</ref>. The specific and technical nature of enterprise content is one of the biggest challenges in creating financial KGs <ref type="bibr" target="#b12">[13]</ref> in general, and an audit KG in our setting.</p><p>External commonsense resources, such as ConceptNet, are used in some of the relevant methods, but they are not a direct solution to the KG construction problem. Many terms are domain-specific, so they are either missing from the existing resources or their modeling in the commonsense KGs does not match the level of detail that is needed in the enterprise setting. For example, in an accounting dictionary AIM stands for Alternative Investment Market and goodwill is "a type of tangible assets that occurs when a buyer acquires an existing business", while these words have very different meanings in a general dictionary. We also remark on the challenge of modeling the above definition of goodwill with non-named entities in the KG: what are the right noun phrases to add? Can the properties expressed in the sentence be represented with binary relationships?</p><p>In our work, we are developing tools for automating different parts of a framework for continuous creation and curation of KGs. However, we face many challenges that make the automatic creation of such data structures much harder than in other settings. We start with an example of a KG we are creating in our collaboration with KPMG and then explain the difficulties and the opportunities in building an audit KG.</p></div>
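<div xmlns="http://www.tei-c.org/ns/1.0"><p>The triple model described in the introduction can be illustrated with a minimal sketch. The facts reuse the Paris and France examples from the text; the predicate names (type, isCapital, population), the Triple tuple and the objects_of helper are illustrative assumptions, not an actual RDF vocabulary or library API.</p>
<p><code lang="python">
```python
from collections import namedtuple

# A fact is a (subject, predicate, object) triple, as in RDF.
# The field is called "obj" to avoid shadowing Python's built-in "object".
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Illustrative facts based on the examples in the text; the predicate
# names are assumptions and do not come from a real vocabulary.
facts = [
    Triple("Paris", "type", "City"),
    Triple("France", "type", "Country"),
    Triple("Paris", "isCapital", "France"),
    Triple("France", "population", "62M"),
]


def objects_of(subject, predicate):
    """Return every object linked to the subject through the predicate."""
    return [t.obj for t in facts if t.subject == subject and t.predicate == predicate]


capital_target = objects_of("Paris", "isCapital")
```
</code></p>
<p>Named entities such as Paris fit this model directly. The difficulty discussed in the text is that a definition such as the one of goodwill does not decompose as cleanly into subjects, predicates and objects.</p></div>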
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Audit Knowledge Graph</head><p>We introduce a very high-level KG based on node entities and only two kinds of relationships between entities. This KG is different from traditional entity-centric knowledge graphs and is motivated by the text data and taxonomies that are available in the KPMG corpus of textual documents. The design of the KG also takes the target applications into account. Figure <ref type="figure" target="#fig_1">2</ref> shows a sample with a few sentences from two documents (left) and a fragment of a taxonomy for the audit process (right). In our corpus there are thousands of documents of variable size, from very short ones (only one sentence) to quite large documents with dozens of paragraphs. The taxonomies can vary in size but are on the order of a hundred nodes, each composed of a short sentence. These can be considered the starting point of the KG construction, and several other nodes are derived from them. In Figure <ref type="figure" target="#fig_2">3</ref>, there is only one kind of node, representing entities. Those are very generic texts: they can be single words, paragraphs or long documents. The relationships across them are represented by directed edges, and the nodes are connected in many-to-many relationships. We consider two kinds of relationships. The first one is containment; in the example, E6 is contained in E4. This could be a word contained in a document, for example, or a sub-element in a hierarchy (e.g., the relation between IEC 27001 and Audit process in the hierarchy in Figure <ref type="figure" target="#fig_1">2</ref>). The second one is description; for example, E2 could be a topic that describes document E8. 
We remark that all manually defined edges are given the same weight of 1, but in the KG edges can be weighted with a value between 0 and 1 for uncertain relationships (according to the confidence given by an automatic tool, for example).</p><p>The above example representation is very generic and simplified; we introduce it to give a feeling of the kind of graph that we are interested in. In our deployed KG, however, the nodes are of six different types:</p><p>• Document nodes are (possibly long) texts containing one or more paragraphs. For example, in Figure <ref type="figure" target="#fig_1">2</ref> two paragraphs are shown on the left side; those correspond to two D nodes.</p><p>• Taxonomy nodes are auditing concepts following a hierarchical structure. For example, every process step can be represented as a path from the root node to a leaf, e.g., Audit programme → ISO 19001 → Initial audit.</p><p>• Caption nodes are client-specific short documents that are described by taxonomy nodes, i.e., a describes edge goes from a taxonomy node to a caption node.</p><p>• Topic nodes are terms with one or multiple related entities; e.g., "risk treatment" and "audit process" are topics in the describes relationship with the Risk treatment in audit process step. Entities are associated in an isIn relationship with a topic.</p><p>• Entity nodes contain n-gram terms that are representative of relevant items, names and concepts in the audit domain. Every entity is the representative of a family of words, where a family includes (with isIn relationships) synonyms and abbreviations that can be used to express such an entity in documents.</p><p>• Word nodes are words in an entity, their synonyms or other variations. 
E.g., auditing, adt and prc are words for the entity audit process.</p><p>There are two main design choices behind our representation.</p><p>First, we use several node types and very few relationship types, as the latter are harder to extract automatically from text. We found that NLP analysis of the text can identify the two relationships (relatively simple from a semantic viewpoint), while for the entity types the task is simplified by the awareness of their provenance, i.e., some types can be mostly derived from the source of extraction. However, obtaining such types and relationships automatically from text documents is a difficult task, as we discuss in the next section.</p><p>Second, some node types are inspired by the target users. The proposed representation has been validated by experts and is used for one text matching application at the firm. This application exploits the rich granularity of the text representation in the KG. Indeed, the different types enable the immediate characterization of a new text, say a customer document, in terms of entities (with entity and word nodes) and more abstract concepts (sets of entities). We found this freedom crucial given the challenge of fixing the right abstraction for the expression of non-named entities in the KG.</p></div>
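<div xmlns="http://www.tei-c.org/ns/1.0"><p>The representation described in this section (six node types, two relationship types, edge weights between 0 and 1) can be sketched as a small data structure. This is only an illustration under stated assumptions, not the deployed KPMG system: the class AuditKG, its method names, the node identifiers with "entity:" and "topic:" prefixes, and the confidence value 0.7 are all hypothetical.</p>
<p><code lang="python">
```python
from dataclasses import dataclass, field

# The six node types and two relationship types named in the text.
NODE_TYPES = {"document", "taxonomy", "caption", "topic", "entity", "word"}
EDGE_TYPES = {"isIn", "describes"}


@dataclass
class AuditKG:
    """Hypothetical in-memory graph; not the authors' implementation."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source, relation, target, weight)

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, source, relation, target, weight=1.0):
        # Manually defined edges keep the default weight of 1;
        # automatic tools can pass their confidence instead.
        if relation not in EDGE_TYPES:
            raise ValueError(f"unknown relation: {relation}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        if not (1.0 >= weight >= 0.0):
            raise ValueError("weight must lie between 0 and 1")
        self.edges.append((source, relation, target, weight))

    def targets(self, source, relation):
        """Return the targets reached from source through relation."""
        return [t for s, r, t, _ in self.edges if s == source and r == relation]


kg = AuditKG()
kg.add_node("adt", "word")
kg.add_node("prc", "word")
kg.add_node("entity:audit process", "entity")
kg.add_node("topic:audit process", "topic")
kg.add_node("Risk treatment in audit process", "taxonomy")

kg.add_edge("adt", "isIn", "entity:audit process")              # manual edge, weight 1
kg.add_edge("prc", "isIn", "entity:audit process", weight=0.7)  # assumed tool confidence
kg.add_edge("entity:audit process", "isIn", "topic:audit process")
kg.add_edge("topic:audit process", "describes", "Risk treatment in audit process")
```
</code></p>
<p>Keeping the relationship vocabulary to two labels, as the text argues, means that most of the modeling effort goes into the node types, which can often be assigned from the provenance of each node.</p></div>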
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Limits and Opportunities of Automatic Methods</head><p>Given the nature of the auditing content, automatic methods for encyclopedic KG construction are not very effective <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17]</ref>. We experimented extensively with such methods, but the results were far from the required quality <ref type="bibr" target="#b17">[18]</ref>. We list five main challenges. (1) Auditing entities are not standard named entities, such as France and IBM. (2) Non-named entities are expressed as noun phrases that can be recognized as subjects in sentences but are hard to organize in a structured graph. For example, should "tangible asset" be modeled with one or two entities? (3) Many of these entities are often used in the form of acronyms or abbreviations. (4) Given the richness of human language, there are many variations of noun phrases expressing the same concept. (5) There is no training data in this domain, and general corpora miss the subtle differences in the audit domain <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b14">15]</ref>. While some of these challenges apply to KG construction in general, we found that they are especially hard for existing tools in this setting.</p><p>As the project moved forward, different parts of the KG have been manually defined by the domain experts at KPMG. For example, a list of potential entities has been identified with traditional NLP tools and then manually revised by a human team. This process identified some of the opportunities to introduce automatic methods to help in the KG construction. 
Moreover, the manually crafted portions of the KG offered us some ground truth for the evaluation of the proposed algorithms <ref type="bibr" target="#b19">[20]</ref>.</p><p>In our pipeline, the first task is the automatic identification of nodes and the second task is the identification of relationships across the different nodes. We first tackle the task of generating the entity nodes, or key short phrases, that act as subjects and objects. Starting from those, we generate families of words for each entity node. The goal is to find a group of semantically equivalent words, including abbreviations and acronyms, and to associate them with the representative entity given only the documents <ref type="bibr" target="#b19">[20]</ref>. Words and representative entities are related with isIn relationships. When evaluated against the ground truth written by the experts, we found that the proposed unsupervised technique for mapping words and entities can achieve high precision, but only limited recall, with the latter varying between 0.55 and 0.4 depending on the language at hand, i.e., English is easier than German <ref type="bibr" target="#b19">[20]</ref>.</p><p>We then propose a method to identify relationships of type describes between nodes, and we conduct experimental campaigns on the discovery of relations between documents and taxonomy nodes <ref type="bibr" target="#b20">[21]</ref>. Our method exploits a deep learning approach for the unsupervised modeling of the entities as vectors in the presence of free text and structured data <ref type="bibr" target="#b21">[22]</ref>. Such vectors are then used in the unsupervised matching step. In particular, we report promising results in matching documents and taxonomy nodes, which is a challenging task for existing methods because of the long textual content in our entities. 
Compared to the manually created relationships, the unsupervised method obtains 0.6 F-measure when looking at top-3 matches <ref type="bibr" target="#b20">[21]</ref>.</p><p>While our initial results are promising, we need better methods that involve the experts in the KG building process with simple interfaces <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24]</ref>. The design of human-in-the-loop solutions is at the core of our current efforts. The knowledge graphs built with the human-in-the-loop solutions we work on will support a broad range of scenarios in financial and economic settings:</p><p>• Automated classification of financial records in data ingestion and analysis pipelines.</p><p>• Automated classification of financial transaction documents to support automated transaction processing.</p><p>• Automated metadata tagging for documents and sub-documents in legal and accounting corpora to improve the reliability of semantic search engines.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Examples of knowledge triples from encyclopedic and commonsense KGs [14].</figDesc><graphic coords="1,314.71,273.82,179.17,58.51" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: An example of KPMG's documents (left) and an audit taxonomy (right).</figDesc><graphic coords="2,89.29,192.19,208.36,78.13" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Generic KG with one node type and two kinds of relationships.</figDesc><graphic coords="2,97.22,427.89,187.51,143.34" type="bitmap" /></figure>
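<div xmlns="http://www.tei-c.org/ns/1.0"><p>The unsupervised matching step of Section 3, where documents and taxonomy nodes are represented as vectors and the top-3 candidates are inspected, can be sketched as follows. The text does not state which similarity measure is used, so cosine similarity is an assumption here. The three-dimensional toy vectors, the node named "Unrelated step" and the function names are likewise hypothetical; in practice the vectors would come from a learned embedding model.</p>
<p><code lang="python">
```python
import math


def cosine(u, v):
    """Cosine similarity between two equally sized vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_matches(doc_vector, taxonomy_vectors, k=3):
    """Rank taxonomy nodes by similarity to the document and keep the best k."""
    scored = [(cosine(doc_vector, vec), name) for name, vec in taxonomy_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy embeddings, invented for illustration only.
taxonomy_vectors = {
    "Initial audit": [1.0, 0.0, 0.0],
    "Risk treatment": [0.0, 1.0, 0.0],
    "Audit programme": [0.7, 0.7, 0.0],
    "Unrelated step": [0.0, 0.0, 1.0],
}
doc_vector = [0.9, 0.1, 0.0]

matches = top_k_matches(doc_vector, taxonomy_vectors, k=3)
candidate_edges = [("doc", "describes", name, score) for name, score in matches]
```
</code></p>
<p>Each candidate can then be stored as a weighted describes edge, with the similarity score acting as the confidence, and handed to a domain expert for confirmation in a human-in-the-loop setting.</p></div>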
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An introduction to question answering over linked data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Unger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Freitas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Reasoning Web International Summer School</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="100" to="140" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Core techniques of question answering systems over knowledge bases: a survey</title>
		<author>
			<persName><forename type="first">D</forename><surname>Diefenbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Maret</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge and Information systems</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="529" to="569" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Semantic search on text and knowledge bases</title>
		<author>
			<persName><forename type="first">H</forename><surname>Bast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Buchhold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Haussmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="119" to="271" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Toward an architecture for never-ending language learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Carlson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Betteridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kisiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Settles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">R</forename><surname>Hruschka</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1306" to="1313" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">DBpedia-A crystallization point for the web of data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Web Semantics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="154" to="165" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">YAGO: A core of semantic knowledge unifying wordnet and wikipedia</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kasneci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
			<publisher>WWW</publisher>
			<biblScope unit="page" from="697" to="706" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Wikidata: A free collaborative knowledgebase</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comm. of the ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">KATARA: a data cleaning system powered by knowledge bases and crowdsourcing</title>
		<author>
			<persName><forename type="first">X</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Morcos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Ilyas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ouzzani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ye</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>SIGMOD</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">From data fusion to knowledge fusion</title>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">L</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Heitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Horn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Murphy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="881" to="892" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Building, maintaining, and using knowledge bases: a report from the trenches</title>
		<author>
			<persName><forename type="first">O</forename><surname>Deshpande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Lamba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tourn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Subramaniam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajaraman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Harinarayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Doan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>SIGMOD</publisher>
			<biblScope unit="page" from="1209" to="1220" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Conceptnet 5.5: An open multilingual graph of general knowledge</title>
		<author>
			<persName><forename type="first">R</forename><surname>Speer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Havasi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">31</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">ATOMIC: an atlas of machine commonsense for if-then reasoning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Bras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Allaway</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bhagavatula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lourie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Rashkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Roof</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Choi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>AAAI, AAAI Press</publisher>
			<biblScope unit="page" from="3027" to="3035" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A high precision pipeline for financial knowledge graph construction</title>
		<author>
			<persName><forename type="first">S</forename><surname>Elhammadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">V S</forename><surname>Lakshmanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Simpson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Huai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">COLING</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="967" to="977" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Safavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Koutra</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2104.05837</idno>
		<title level="m">Relational world knowledge representation in contextual language models: A review</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Kejriwal</surname></persName>
		</author>
		<title level="m">Domain-specific knowledge graph construction</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Expert-guided entity extraction using expressive rules</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kejriwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1353" to="1356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Domain-specific knowledge graphs: A survey</title>
		<author>
			<persName><forename type="first">B</forename><surname>Abu-Salih</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Network and Computer Applications</title>
		<imprint>
			<biblScope unit="volume">185</biblScope>
			<biblScope unit="page">103076</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Fonduer: Knowledge base construction from richly formatted data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hsiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hancock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rekatsinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Levis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ré</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD, ACM</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1301" to="1316" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Domain-specific knowledge graph construction for semantic analysis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="250" to="260" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Ahmadi</surname></persName>
		</author>
		<title level="m">A framework for the continuous curation of a knowledge base system</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
		<respStmt>
			<orgName>EURECOM</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Unsupervised matching of data and text</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ahmadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papotti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE, IEEE</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Creating embeddings of heterogeneous relational datasets for data integration tasks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Cappuzzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thirumuruganathan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">How to invest my time: Lessons from human-in-the-loop entity extraction</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Dragut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vucetic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGKDD, ACM</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2305" to="2313" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Large-scale relation extraction from web documents and knowledge graphs with human-in-the-loop</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ristoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Gentile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Alba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gruhl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Welch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Web Semant.</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
