<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Semantic Enrichment of Social Media Resources for Adaptation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Oliver</forename><surname>Schimratzki</surname></persName>
							<email>oliver.schimratzki@gmx.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Friedrich Schiller University of Jena</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fedor</forename><surname>Bakalov</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Friedrich Schiller University of Jena</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adrian</forename><surname>Knoth</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Friedrich Schiller University of Jena</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Birgitta</forename><surname>König-Ries</surname></persName>
							<email>birgitta.koenig-ries@uni-jena.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Friedrich Schiller University of Jena</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Semantic Enrichment of Social Media Resources for Adaptation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">9ED4B7153F242D25F86B457EF8489F0F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:44+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With more and more dynamic content available on the web, we need systems that aggregate and filter information from different sources to provide us with only the information we are really interested in. In this paper, we present one such system, the CompleXys portal, aimed at users interested in complexity or subtopics thereof. It accesses a large variety of different information sources, among them calendars, news sites and blogs, semantically annotates and categorizes the retrieved content and displays only relevant content to the user.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The amount of dynamic content available on the web is rapidly growing. It becomes more and more difficult for users to keep track of all relevant information, in particular since it becomes more and more overwhelming to manually separate relevant from irrelevant content. Even if a user has identified a variety of news sites and blogs that often contain information she is interested in, those sites will also contain lots of information the user is not interested in. Purely syntactic filtering based, e.g., on keywords, as offered by today's tools, offers only a partial solution. What is really needed is semantic filtering, i.e., filtering based on some "understanding" of the content. This will allow for higher precision, i.e., fewer irrelevant articles displayed, and higher recall, i.e., fewer relevant articles discarded, and will thus increase user confidence in the tools.</p><p>In this paper, we present the CompleXys portal, an information site that will provide users with personalized access to information related to the topic of complexity. CompleXys harvests information from a large variety of sites, ranging from event calendars to blogs and news sites. It semantically annotates the retrieved content. These annotations are then used to categorize the retrieved items and to decide whether they are sufficiently related to complexity or should be discarded. In the future, CompleXys will use the categorization for a more fine-grained personalization, displaying the most relevant items most prominently and providing recommendations to the user.</p><p>In the remainder of this paper, after a brief discussion of related work in Section 2, we take a closer look at CompleXys and the underlying technologies: Section 3 provides an overview of the CompleXys architecture.
We then focus on the most interesting part of this architecture, namely the semantic content annotator, which is presented in Section 4. Finally, Section 5 contains a summary and an outlook on our future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>In this paper we describe an architectural solution and an approach to providing personalized access to the variety of resources residing on the Web and in intranets. To achieve this, we combine approaches and technology from three areas of research, namely content aggregation, semantic content annotation, and content-based recommender systems.</p><p>Content aggregation, though a relatively new field, has already reached a state of maturity. Apart from the multitude of research proposals <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>, there exist a number of industry standards and commercial applications of content aggregators. The Really Simple Syndication (RSS)<ref type="foot" target="#foot_0">1</ref> and Atom<ref type="foot" target="#foot_1">2</ref> formats have been successfully used by a large number of Web and desktop applications for aggregating various types of content, including, but not limited to, calendar information, news, blog entries, and podcasts. The iCalendar<ref type="foot" target="#foot_2">3</ref> format is used by many applications for aggregating appointments and events from multiple calendar systems. Personal Web portals like iGoogle<ref type="foot" target="#foot_3">4</ref> and My Yahoo!<ref type="foot" target="#foot_4">5</ref> allow their users to place different types of content harvested through RSS and Atom feeds on their personal pages. Portals like Technorati<ref type="foot" target="#foot_5">6</ref> aggregate information on more or less specific topics. RSS filtering tools like Feed Rinse<ref type="foot" target="#foot_6">7</ref> allow the user to define keyword-based filters on RSS feeds to get rid of irrelevant items.
These tools work, however, on a purely syntactic level.</p><p>The field of semantic content annotation mainly deals with the challenges related to the availability of well-formed metadata for unstructured text resources, which is essential for achieving high recall and precision in information retrieval. A number of approaches to semantic content annotation have been reported in the literature <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b6">7]</ref>. GATE <ref type="bibr" target="#b3">[4]</ref> has become one of the most widely used open source frameworks for implementing natural language processing (NLP) tasks. The framework empowers developers to implement components such as tokenizers, sentence splitters, part-of-speech taggers, gazetteers, semantic taggers, and components for identifying relationships among the entities in a text. A number of NLP systems leverage GATE and its components for semantic tagging of content; these include, but are not limited to, the KIM platform <ref type="bibr" target="#b13">[14]</ref>, MUSE <ref type="bibr" target="#b11">[12]</ref>, and Ont-O-Mat <ref type="bibr" target="#b7">[8]</ref>.</p><p>The availability of machine-processable content metadata is one of the most essential requirements for content-based recommender systems <ref type="bibr" target="#b12">[13]</ref>. These systems recommend relevant content to the user based on the semantic description of available resources and the user's personal preferences. The relevant content is selected by analyzing the content metadata and the user's profile and identifying the items that match the user's individual interests. A number of systems leveraging this approach have been proposed. CHIP <ref type="bibr" target="#b16">[17]</ref>, for instance, is capable of recommending artworks from multiple museum collections to the user.
For recommendation, the system leverages the semantic description of artworks and the user's personal interests in the domain of cultural heritage, which it identifies based on the user's explicit ratings of artworks and the semantic relations among the art topics. Other examples of systems leveraging a similar recommendation approach are the Personal Reader Framework <ref type="bibr" target="#b2">[3]</ref> and the Personal Learning Assistant <ref type="bibr" target="#b5">[6]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Overall Architecture</head><p>The CompleXys portal aggregates a multitude of different sources from the Internet, categorises the retrieved content, applies semantic annotation and finally presents the filtered and personalised results to the user. Figure <ref type="figure" target="#fig_0">1</ref> shows a schematic overview of CompleXys' architecture and its data flow from left to right, which basically implements the Input-Processing-Output model. On the input side, the harvester retrieves arbitrary content and stores a mangled version in the crawler database. Since the particular source of each entry is known, this step also provides content type indexing for free.</p><p>The crawler database is fully generic and hence supports any kind of input source. Table <ref type="table" target="#tab_0">1</ref> shows the underlying schema. The DBMS will increment the unique key id for every newly retrieved entry. All later processing steps make use of this key: querying the crawler database for new content simply means querying for all ids higher than the last known or processed id.<ref type="foot" target="#foot_7">8</ref> The table contains already fetched items from a potentially large variety of sources: id is supposed to be monotonically increasing, while internal id holds a suitable hash sum (e.g., MD5) of the cached resource. The source field specifies the origin of the item stored in content.</p><p>The content itself is blindly stored as text (BLOB); semantic parsing is delayed to subsequent stages in the processing pipeline. The source column contains the SIOC content type and serves as a type indicator to the processing modules.</p><p>The crawler is idempotent, that is, it can be run several times without storing already known content again. This property is achieved by internal id, another column set to be a unique key.
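This insert-if-unknown behaviour can be sketched in a few lines. The following is a minimal in-memory illustration only; the real system relies on the DBMS rejecting duplicate values of the UNIQUE internal id column, and the class and method names here are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory sketch of the crawler store: a unique key on the content hash
// (internal id) makes repeated inserts of already known content no-ops,
// while id grows monotonically for every genuinely new entry.
class CrawlerStore {
    private long nextId = 1;                                       // monotonically increasing id
    private final Map<String, Long> byHash = new LinkedHashMap<>(); // internal id -> id

    /** Returns true if the entry was new and stored, false if already known. */
    boolean insert(String internalId, String source, String content) {
        if (byHash.containsKey(internalId)) {
            return false; // duplicate hash: the DBMS would reject this insert
        }
        byHash.put(internalId, nextId++);
        return true;
    }

    long lastId() { return nextId - 1; }
}
```

Querying for new content then reduces to the SELECT statement shown in footnote 8, with the last processed id as the bound.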
For each retrieved entry, the crawler calculates a suitable MD5 hash and stores both the content and its derived hash in the database. If this entry has already been fetched, the DBMS will prevent inserting a duplicate MD5 hash into internal id and consequently avoid storing known content again. Obviously, finding an appropriate way of calculating the MD5 hash is crucial. CompleXys currently has built-in support for two different source types: it can directly retrieve calendar events from SQL databases and arbitrary HTML input from the web via RSS, but more resource types can be added via crawler plugins. Generating a suitable hash representing the content usually differs among sources and is hence individually implemented in each such crawler plugin.</p><p>The SQL crawler connects to specified source databases in the University's network and harvests information about upcoming events potentially of interest to the user. Since CompleXys strictly adheres to UTF-8 character encoding throughout the whole processing pipeline, the crawler is responsible for converting any source-specific encoding, e.g., from Latin1 to UTF-8. This way, subsequent processing modules do not have to take care of different character encodings.</p><p>The retrieved SQL calendar events are normalised into a standardised template as shown in Figure <ref type="figure">2</ref>. The crawler finally calculates the appropriate MD5 hash for an event by concatenating the source prefix (a constant arbitrary string), the event's primary key in the foreign database and the provided last-update timestamp. This way, the MD5 hash of the concatenation is different for each event from every source database. Moreover, updates to already retrieved events have a different timestamp, and consequently, a new MD5 hash together with the updated content will end up in the crawler database.
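The per-event hash computation just described can be sketched as follows; the prefix string and parameter names are illustrative, only the concatenation scheme (source prefix, foreign primary key, last-update timestamp) is taken from the text:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the per-event hash: concatenate a constant source prefix, the
// event's primary key in the foreign database, and its last-update timestamp,
// then digest with MD5. An update changes the timestamp and therefore yields
// a new hash, so the updated content is stored as a new row.
final class EventHash {
    static String md5(String sourcePrefix, long foreignKey, String lastUpdate) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                (sourcePrefix + foreignKey + lastUpdate).getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest)); // 32-char hex string
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e); // practically never happens
        }
    }
}
```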
Whenever the CompleXys portal encounters multiple entries for the same source URI in its crawler database, newer rows are updates to already known events.</p><p>In addition to SQL calendar events, the harvester has an HTTP crawler for arbitrary HTML content. URLs are extracted from RSS feeds specified in a static configuration file (see Figure <ref type="figure">3</ref>).</p><p>Whenever possible, the crawler tries to use the print version of a document to remove navigation menus, advertisements and other unrelated noise. If the source already provides a more structured representation, e.g., the iCal format, that representation is used instead. Like the SQL crawler, the HTTP crawler wraps retrieved content into a SIOC<ref type="foot" target="#foot_8">9</ref> schema as depicted in Figure <ref type="figure" target="#fig_1">4</ref>, generates a suitable MD5 hash and tries to store the result in the crawler database. Again, this insert will fail if the content is already known.</p><p>At this stage, all entries in the crawler database are simply unstructured raw text. Unless already provided by the source, no semantic information is available yet. However, semantic annotation is required to decide whether a given content item is of interest to the user. The next section explains in detail how this is done.</p><p>Once semantic annotation has been provided, relevant items are displayed to the user of the CompleXys portal, categorized into appropriate domains. We are currently working on integrating our approach to personalization into CompleXys.
This will allow the information provided to be adapted to individual user needs: only information relevant to a specific user (and not to complexity in general) will be provided, the most important information will be displayed most prominently, related information (and possibly related users) will be recommended, etc. Underlying this adaptation is a user interest model, realized as an overlay over the domain model, that collects user interests based on the interactions of the user with the system and also allows the user to make manual adaptations <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>.</p></div>
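For reference, the newsfeed wrapper shown in Fig. 4 can be restated as a compact, self-contained sketch. NewsItem is a hypothetical holder for the fields an RSS entry provides, and the mismatched closing tag in the figure (content instead of sioc:content) is corrected here:

```java
// Hypothetical plain holder for the fields an RSS entry provides.
final class NewsItem {
    String link, title, pubDate, category, content;
}

// Wraps a harvested newsfeed item into a SIOC post with Dublin Core
// metadata, mirroring the wrapper code shown in Fig. 4.
final class SiocWrapper {
    static String wrap(NewsItem n) {
        return "<sioc:Post rdf:about=\"" + n.link + "\">\n"
             + "\t<dcterms:title>" + n.title + "</dcterms:title>\n"
             + "\t<dcterms:created>" + n.pubDate + "</dcterms:created>\n"
             + "\t<sioc:topic rdfs:label=\"" + n.category + "\"/>\n"
             + "\t<sioc:content>\n<![CDATA[" + n.content + "]]>\n\t</sioc:content>\n"
             + "</sioc:Post>";
    }
}
```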
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Semantic Content Annotators</head><p>The Semantic Content Annotators serve the purpose of extracting semantic data from incoming text documents and of annotating this data back to the resources. Furthermore, they are meant to decide whether a given resource is relevant to the topic of complexity and to categorize it by means of corresponding topical concepts. Both the annotation and the categorization task rely on an ontology that represents the domain knowledge space of complexity. It is implemented as a SKOS<ref type="foot" target="#foot_9">10</ref> taxonomy and shallowly organized in two hierarchical levels: ten main categories and 297 appendant terms. Furthermore, some terms are interconnected by the relation type related to express either topical closeness between two terms or an ambiguity of belonging, i.e., when a term could be assigned to more than one main category. Figure <ref type="figure" target="#fig_2">5</ref> shows an excerpt of the model as a taxonomy circle<ref type="foot" target="#foot_10">11</ref>. The main categories are displayed in the inner circle, while the outer circle contains examples of their appendant terms. The connections between some of the terms exemplify the use of the related relationship.</p><p>Figure <ref type="figure" target="#fig_3">6</ref> visualizes the architectural composition of the Semantic Content Annotator. It is structured as a parallel working pipeline, utilizing the standard Java concurrency package<ref type="foot" target="#foot_11">12</ref> for its implementation. The current pipeline consists of five components, which are called CompleXys Tasks. The Crawled Content Reader and the Content Writer take care of an internally valid data structure and of persistency tasks. In contrast, the inner CompleXys Tasks are the actual processing units. They analyze the resources, extract the semantic data and finally annotate it back to the text.
The analysis is based on existing NLP services from various contexts. These services are called via intermediate GATE modules, so that the CompleXys Tasks need not care about the technical details of the annotation, but just have to adjust the modules according to their needs and evaluate the results. The Crawled Content Reader is the first component of the pipeline; its main purpose is to gather the documents from the input data store and to prepare them for the succeeding tasks. It wraps the new resources into the internally used GATE data format, embeds them into the corresponding persistency layer and sends them into an output queue for further processing in the pipeline.</p><p>The Onto Gazetteer Annotator searches the text for keywords that are listed in the gazetteer files and annotates found terms with the corresponding taxonomy concepts. The frequency of occurring annotations can then be used as a simple indicator for the categorization. The central element of this component is the OntoGazetteer, a semantic tagger included in the information extraction system ANNIE<ref type="foot" target="#foot_12">13</ref>. It is not directly applicable to the SKOS CompleXys taxonomy, but can make use of a derived, rule-based version. To this end, every main category of the domain model gets its own .lst gazetteer file, wherein all subordinate terms are listed one per line. A file mappings.def defines the mapping rules from the .lst files to SKOS concepts. However, the expressiveness of the gazetteer data is very limited, so the relationships cannot be transformed.</p><p>The KEA Annotator also categorizes a document into the concepts of the CompleXys domain model. It is based on the Keyphrase Extraction Algorithm KEA<ref type="foot" target="#foot_13">14</ref>, which analyzes texts in order to identify the most important words or word groups in each one.
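The Onto Gazetteer Annotator's frequency indicator described above amounts to counting, per main category, how often its listed terms occur in a text. The following is an illustrative stand-in using plain substring search; the real system obtains these annotations from GATE's gazetteer components rather than by scanning strings itself:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Count, per main category, how often its gazetteer terms occur in a text.
// The most frequently hit category is a simple indicator for categorization.
final class GazetteerCounter {
    static Map<String, Integer> countByCategory(String text, Map<String, List<String>> gazetteer) {
        String lower = text.toLowerCase();
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, List<String>> e : gazetteer.entrySet()) {
            int n = 0;
            for (String term : e.getValue()) {
                int from = 0;
                while ((from = lower.indexOf(term.toLowerCase(), from)) >= 0) {
                    n++;                  // count every occurrence of the term
                    from += term.length();
                }
            }
            counts.put(e.getKey(), n);
        }
        return counts;
    }
}
```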
The idea of leveraging this behavior for the task of semantic data extraction is that KEA is implicitly capable of scoring terms according to their importance in the text. While the OntoGazetteer is capable of answering the question "Do taxonomy terms occur in the text, and how often?", KEA goes one step further and additionally tries to answer "Are these terms relevant for the text?". In order to do so, it utilizes additional factors such as the relative term occurrence in a single text compared to the occurrence in all processed texts, or the SKOS related relationships as weight-boosting functions. To ensure that the keyphrases can be matched to the domain model, it simply uses the CompleXys taxonomy as a controlled vocabulary for the extraction process. To enable this functionality, the older KEA GATE plugin was manually adapted to the new KEA version 5.0, which supports controlled indexing. As a categorization model, CompleXys is trained with the CiteULike-180 data set<ref type="foot" target="#foot_14">15</ref>. First evaluations indicate that a well-adjusted KEA Annotator is capable of outperforming the competing OntoGazetteer solution in terms of precision.</p><p>The Open Calais Annotator utilizes the OpenCalais<ref type="foot" target="#foot_15">16</ref> metatagging web service to semantically annotate named entities, events and facts in the text. The data obtained this way is not yet used for the domain categorization, but links the resources to Calais' large external knowledge base. Exploiting these relations has great potential for further improving the categorization, but also for other features such as enriching the displayed resources in the front end with additional information.</p><p>Finally, the Content Writer ensures that every document is correctly stored in the Semantic DB before the pipeline terminates.
It also checks whether a document has actually exceeded the critical threshold of Onto Gazetteer or KEA annotations that marks relevance for the domain of complexity. If a document fails to pass this test, it is deleted. Furthermore, the annotations of a document are counted and mapped to their corresponding main categories. The document is ultimately regarded as a member of the most frequently occurring categories.</p></div>
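The Content Writer's relevance test and category assignment can be sketched as follows; the threshold value and all names are illustrative, only the discard-below-threshold and most-frequent-category logic is taken from the text:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Relevance check in the spirit of the Content Writer: a document whose
// total annotation count stays below the threshold is discarded; otherwise
// it is assigned to the most frequently annotated main categories.
final class RelevanceFilter {
    static List<String> categorize(Map<String, Integer> countsByCategory, int threshold) {
        int total = countsByCategory.values().stream().mapToInt(Integer::intValue).sum();
        if (total < threshold) {
            return List.of(); // not relevant for the complexity domain: delete
        }
        int max = countsByCategory.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countsByCategory.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey()); // every top-ranked category is kept
            }
        }
        return result;
    }
}
```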
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>In this paper, we have described the CompleXys portal, an information system about complexity, as an example of a system that automatically aggregates, semantically annotates and filters content stemming from a wide variety of sources. We believe that in times of a rapidly growing amount of content being dynamically created at ever increasing rates, such systems are an absolute necessity to ensure that users do not "drown in information" and at the same time do not miss relevant information. Only with such intelligent support will we be able to take advantage of this information revolution.</p><p>Up to now, the parts of CompleXys dealing with information harvesting, annotating and filtering have been implemented. A first evaluation shows that CompleXys indeed reaches reasonable precision and recall with acceptable runtime. For more details, please refer to <ref type="bibr" target="#b14">[15]</ref>. Right now, we are working on integrating our approach to personalization into CompleXys. Once this has been done, the portal will be launched as an information site for the members of the research focus area "Analysis and Management of Complex Systems" at our university and for the general public.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Overview of the CompleXys portal. Resources are fetched and stored in the crawler database, then semantically annotated and finally presented to the user if they match the user's personal preferences and interests.</figDesc><graphic coords="3,134.78,368.78,345.67,141.37" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Wrapper code for encapsulating newsfeed items into SIOC and DublinCore.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. The CompleXys taxonomy</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 6 .</head><label>6</label><figDesc>Fig. 6. The Semantic Content Annotators</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Database layout of the crawler database.</figDesc><table><row><cell>Column</cell><cell>Type</cell><cell>Modifiers</cell></row><row><cell>id</cell><cell>bigint</cell><cell>UNIQUE</cell></row><row><cell>source</cell><cell>character varying(255)</cell><cell></cell></row><row><cell>content</cell><cell>text</cell><cell></cell></row><row><cell>internal id</cell><cell>character varying(255)</cell><cell>UNIQUE not null</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Fig. 2.</head><label>2</label><figDesc>Standardised Ruby template for calendar events. All occurrences of params are substituted by values retrieved from a SQL based event management system.</figDesc><table><row><cell>def fill_template(params)</cell></row><row><cell>"&lt;sioc:Item rdf:about=\"#{params[:source]}-#{params[:their_id]}\"&gt;\n" +</cell></row><row><cell>"\t&lt;vevent:dtstart&gt;#{params[:date]}&lt;/vevent:dtstart&gt;\n" +</cell></row><row><cell>"\t&lt;dcterms:creator&gt;#{params[:speaker]}&lt;/dcterms:creator&gt;\n" +</cell></row><row><cell>"\t&lt;vevent:location&gt;#{params[:affi]}&lt;/vevent:location&gt;\n" +</cell></row><row><cell>"\t&lt;dcterms:title&gt;#{params[:title]}&lt;/dcterms:title&gt;\n" +</cell></row><row><cell>"\t&lt;dcterms:abstract&gt;&lt;![CDATA[#{params[:abstract]}]]&gt;\n\t&lt;/dcterms:abstract&gt;" +</cell></row><row><cell>"\t&lt;vevent:url&gt;#{params[:url]}&lt;/vevent:url&gt;" +</cell></row><row><cell>"\t&lt;vevent:dtend&gt;#{params[:endtime]}&lt;/vevent:dtend&gt;\n" +</cell></row><row><cell>"\t&lt;dcterms:modified&gt;#{params[:lastupdate]}&lt;/dcterms:modified&gt;\n" +</cell></row><row><cell>"&lt;/sioc:Item&gt;"</cell></row><row><cell>end</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://web.resource.org/rss/1.0/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://tools.ietf.org/html/rfc4287</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://tools.ietf.org/html/rfc5545</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://www.google.com/ig</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://my.yahoo.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://technorati.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">http://www.feedrinse.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">SELECT * FROM crawler WHERE id &gt; already seen</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">http://sioc-project.org/ontology</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">http://www.w3.org/TR/skos-reference/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">The complete ontology can be found at http://www.minervaportals.de/o/complexys.rdf.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_11">http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_12">http://gate.ac.uk/sale/tao/splitch6.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_13">http://www.nzdl.org/Kea/index.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_14">11.01.2010: http://maui-indexer.googlecode.com/files/citeulike180.tar.gz</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16" xml:id="foot_15">http://www.opencalais.com/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A Hybrid Approach to Identifying User Interests in Web Portals</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bakalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>König-Ries</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nauerz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 9th Int. Conf. on Innovative Internet Community Systems</title>
				<meeting>of the 9th Int. Conf. on Innovative Internet Community Systems</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">IntrospectiveViews: An interface for scrutinizing semantic user models</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bakalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>König-Ries</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nauerz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 18th Int. Conf. on User Modeling, Adaptation, and Personalization</title>
				<meeting>of the 18th Int. Conf. on User Modeling, Adaptation, and Personalization</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The Personal Publication Reader: Illustrating web data extraction, personalization and reasoning for the semantic web</title>
		<author>
			<persName><forename type="first">R</forename><surname>Baumgartner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Henze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Herzog</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2nd European Semantic Web Conference</title>
				<meeting>of the 2nd European Semantic Web Conference</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">GATE: A framework and graphical development environment for robust nlp tools and applications</title>
		<author>
			<persName><forename type="first">H</forename><surname>Cunningham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Maynard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bontcheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tablan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL&apos;02)</title>
				<meeting>of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL&apos;02)</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Indexing a web site with a terminology oriented ontology</title>
		<author>
			<persName><forename type="first">E</forename><surname>Desmontils</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jacquin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Emerging Semantic Web</title>
				<imprint>
			<publisher>IOS Press</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="181" to="197" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Personalization in distributed e-learning environments</title>
		<author>
			<persName><forename type="first">P</forename><surname>Dolog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Henze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Nejdl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sintek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 13th Int. World Wide Web Conf</title>
				<meeting>of the 13th Int. World Wide Web Conf</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Semantic search</title>
		<author>
			<persName><forename type="first">R</forename><surname>Guha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>McCool</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 12th Int. Conf. on World Wide Web</title>
				<meeting>of the 12th Int. Conf. on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">S-CREAM - Semi-automatic CREAtion of Metadata</title>
		<author>
			<persName><forename type="first">S</forename><surname>Handschuh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ciravegna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 13th Int. Conf. on Knowledge Engineering and Knowledge Management</title>
				<meeting>of the 13th Int. Conf. on Knowledge Engineering and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Semantic annotation, indexing, and retrieval</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kiryakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Popov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Terziev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Manov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ognyanoff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="49" to="79" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">MyPortal: robust extraction and aggregation of web content</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kowalkiewicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kaczmarek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Abramowicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 32nd Int. Conf. on Very Large Data Bases</title>
				<meeting>of the 32nd Int. Conf. on Very Large Data Bases</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Lalmas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Murdock</surname></persName>
		</author>
		<title level="m">Proc. of the Workshop on Aggregated Search held in conj. with the 31st Int. ACM SIGIR Conf</title>
				<meeting>of the Workshop on Aggregated Search held in conj. with the 31st Int. ACM SIGIR Conf</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">MUSE: a MUlti-Source Entity recognition system</title>
		<author>
			<persName><forename type="first">D</forename><surname>Maynard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tablan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bontcheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cunningham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wilks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers and the Humanities</title>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Content-based recommendation systems</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Pazzani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Billsus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Adaptive Web: Methods and Strategies of Web Personalization</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="325" to="341" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">KIM - a semantic platform for information extraction and retrieval</title>
		<author>
			<persName><forename type="first">B</forename><surname>Popov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kiryakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ognyanoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Manov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kirilov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Natural Language Engineering</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">3-4</biblScope>
			<biblScope unit="page" from="375" to="392" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">An approach for semantic enrichment of social media resources for context dependent processing</title>
		<author>
			<persName><forename type="first">O</forename><surname>Schimratzki</surname></persName>
		</author>
		<ptr target="http://www.minerva-portals.de/publications/theses/an-approach-for-semantic-enrichment-of-social" />
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
		<respStmt>
			<orgName>University of Jena</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Diploma Thesis</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Aggregate documents: making sense of a patchwork of topical documents</title>
		<author>
			<persName><forename type="first">M</forename><surname>Shilman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 8th ACM Symp. on Document Engineering</title>
				<meeting>of the 8th ACM Symp. on Document Engineering</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Recommendations based on semantically enriched museum collections</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Stash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Aroyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gorgels</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rutledge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schreiber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Web Semantics: Science, Services and Agents on the World Wide Web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="283" to="290" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
