<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Semantic Pen - A Personal Information Management System for Pen Based Devices</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Akila</forename><surname>Varadarajan</surname></persName>
							<email>akilav@umich.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Computer Science</orgName>
								<orgName type="institution">The University of Michigan - Dearborn</orgName>
								<address>
									<addrLine>4901, Evergreen Road</addrLine>
									<postCode>48080</postCode>
									<settlement>Dearborn</settlement>
									<region>MI</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nilesh</forename><surname>Patel</surname></persName>
							<email>patelnv@umich.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Computer Science</orgName>
								<orgName type="institution">The University of Michigan - Dearborn</orgName>
								<address>
									<addrLine>4901, Evergreen Road</addrLine>
									<postCode>48080</postCode>
									<settlement>Dearborn</settlement>
									<region>MI</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">William</forename><surname>Grosky</surname></persName>
							<email>wgrosky@umich.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Computer Science</orgName>
								<orgName type="institution">The University of Michigan - Dearborn</orgName>
								<address>
									<addrLine>4901, Evergreen Road</addrLine>
									<postCode>48080</postCode>
									<settlement>Dearborn</settlement>
									<region>MI</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Semantic Pen - A Personal Information Management System for Pen Based Devices</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3F08D8764B2BC5B09B82A2F860A62168</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The onset of Semantic Web technology has promised a new vision of Personal Information Management (PIM). With the advent of pen-based computing, PIM faces new challenges: usability and flexibility are important constraints in the pen-based environment. We present Semantic Pen, an augmented pen-based PIM system that merges the efficiency of the Semantic Web with the usability of pen-based devices. The architecture consists of an intuitive user interface that captures digital ink, a Hidden Markov Model (HMM) that extracts personal information, and a Resource Description Framework (RDF) data model for flexible organization and semantic querying of data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Personal Information Managers (PIMs) have become increasingly common. The usage model of PIM systems has gone beyond scheduling reminders and simple record maintenance. The Semantic Web, through ontological reasoning based on the Resource Description Framework (RDF) <ref type="bibr" target="#b0">[1]</ref>, has proven to be an efficient solution for PIM. The Haystack project <ref type="bibr" target="#b1">[2]</ref> is well known for applying Semantic Web technologies to create a fully flexible and customizable PIM portal for organizing germane information. The Gnowsis semantic desktop <ref type="bibr" target="#b2">[3]</ref> targets data integration, including data from third-party applications. Semex <ref type="bibr" target="#b3">[4]</ref> focuses on personalized desktop search. Chandler <ref type="bibr" target="#b4">[5]</ref> is an interpersonal information manager that supports data sharing in addition to managing email, calendar and other general information. The Retsina Calendar Agent <ref type="bibr" target="#b5">[6]</ref> is a distributed meeting scheduling agent that works in conjunction with Microsoft Outlook 2000 and the Semantic Web.</p><p>While most research on PIM using the Semantic Web is centered on desktops and notebooks, there is a need to extend these concepts to pen-based computing. Pen-based systems have empowered users by providing the most natural form of input modality, known as digital ink. Since its introduction, researchers have shown increasing interest in using it to ease user-interface-centric tasks. Wilcox et al. designed Dynomite <ref type="bibr" target="#b6">[7]</ref>, a system for organizing telephone numbers and other tasks by assigning properties to ink words. Scribbler <ref type="bibr" target="#b7">[8]</ref> is another tool that enables searching ink words, symbols or simple sketches by matching raw strokes instead of recognized text. Marquee <ref type="bibr" target="#b8">[9]</ref> is a logging tool with which users can correlate their personal notes and keywords with a videotape during recording. Microsoft's OneNote 2003 and Journal help users capture, customize and organize ink documents.</p><p>We present Semantic Pen, which aims to combine the efficacy of the Semantic Web with the usability of pen-based devices to provide a next-generation, highly intuitive and intelligent PIM system. Semantic Pen has a simple and attractive user interface comparable to leading note-taking tools. In addition, our system is composed of two core modules: (1) an Automatic Data Extraction (ADE) wizard and (2) an Association wizard. The system architecture of Semantic Pen is shown in Figure 1. ADE is the heart of the system and extracts the data by means of Hidden Markov Models (HMM) <ref type="bibr" target="#b9">[10]</ref>. Additional details, such as the name of the person for an extracted email address, can be included semi-automatically through the ADE wizard. This wizard displays a smart name/place list generated by our intelligent noun filter algorithm. The user can either choose a name from the list or enter a new one. The extracted personal information is then automatically stored in a commercial information management tool such as Microsoft Outlook. Once the personal information is extracted, the Association wizard helps associate the data with items in the existing data repository. Our approach uses the popular RDF framework Jena <ref type="bibr" target="#b10">[11]</ref> to store and retrieve the data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Semantic Pen</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Automatic Information Extraction using Hidden Markov Model (HMM)</head><p>An HMM is a finite state automaton with stochastic state transitions and symbol emissions. We use the model of Freitag and McCallum <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b9">10]</ref> to extract personal data from the ink notes. Once the states of the HMM (Prefix, Target, Suffix and Background) are decided, the document is parsed and taxonomized to obtain the emission vocabulary of the HMM. We generate a set of intuitive term-feature pairs (t, f), where t is the intuitive term and f is an identified feature that creates the appropriate intuition about that term <ref type="bibr" target="#b12">[13]</ref>.</p><p>The possible formats and constraints for intuitive term features such as Email ID, Phone No., Date and Proper Noun are identified and defined in a database. We then calculate WFM to classify the intuitive terms. For a term t, WFM(t) is computed as follows:</p><formula xml:id="formula_0">WFM(t) = \frac{N_c(t)}{N_c(f)}</formula><p>where N_c(f) represents the total number of constraints for the word feature f. For example, the '@' symbol and a domain name are some of the constraints for an email address. N_c(t) is defined as:</p><formula xml:id="formula_1">N_c(t) = \sum_{x=0}^{N_c(f)} M(t, c_x)</formula><p>where M(t, c_x), the matching function, equals 1 if the term t contains a matching constraint c_x and 0 otherwise. WFM(t) is calculated for each candidate word feature f in turn. If WFM(t) equals 1 for some f, the term t is of the suspected word feature type f. If WFM(t) is less than 1 for every f, the term is not of any suspected word feature type.</p><p><ref type="table" target="#tab_0">Table 1</ref> describes how we define the emission vocabulary for the HMM by means of WFM and Bikel's classification of word features <ref type="bibr" target="#b12">[13]</ref>. Once the emission vocabulary is obtained from the intuitive word features, the Viterbi algorithm <ref type="bibr" target="#b11">[12]</ref> is used to identify the most likely state sequence for a particular document. Finally, the HMM outputs the strings that are likely to be the personal data that need to be stored.</p></div>
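<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the WFM computation described above, the following minimal Python sketch scores a term against per-feature constraint lists. The constraint sets, the regular expressions and the names CONSTRAINTS, wfm and classify are assumptions made for this example only; the constraints actually used by the system are defined in its database and are not reproduced here.</p>
<eg>
```python
import re

# Hypothetical constraint sets, one list of predicates per word feature.
# The real system keeps its formats and constraints in a database.
CONSTRAINTS = {
    "Email ID": [
        lambda t: "@" in t,                                          # has the '@' symbol
        lambda t: re.search(r"@[\w-]+(\.[\w-]+)+$", t) is not None,  # ends with a domain name
    ],
    "Phone No.": [
        lambda t: re.search(r"\d{3}-\d{4}$", t) is not None,         # ends with ddd-dddd
        lambda t: all(ch.isdigit() or ch in "()-" for ch in t),      # digits, parentheses, hyphens only
    ],
}


def wfm(term, feature):
    """Return N_c(t) / N_c(f): the fraction of the feature's constraints the term matches."""
    constraints = CONSTRAINTS[feature]
    matched = sum(1 for constraint in constraints if constraint(term))  # N_c(t)
    return matched / len(constraints)                                   # divided by N_c(f)


def classify(term):
    """Return the first feature whose WFM equals 1, or None if no feature matches fully."""
    for feature in CONSTRAINTS:
        if wfm(term, feature) == 1.0:
            return feature
    return None


if __name__ == "__main__":
    for example in ["bob@umich.edu", "586-779-6320", "Michigan"]:
        print(example, classify(example))
```
</eg>
<p>Under these assumed constraints, classify("bob@umich.edu") returns "Email ID", whereas a term such as "Michigan" scores below 1 for every feature and returns None.</p></div>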
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Personal Information Association using RDF</head><p>The next step is to create suitable associations between the new data and the existing elements in the database. This algorithm is currently under development. We define two components, instances and associations. The instances are the actual objects that need to be associated, such as the email address of Bob or the web page of an institution "XYZ". The associations are the relationships that might exist between two instances. Consider "Bob works at XYZ". Here, works at is an association between the instances Bob and XYZ that binds them together. There might be another such association, for example "Steve works at XYZ". In that case a link is automatically created between Bob and Steve, and the user is prompted to supply a suitable association between these two instances.</p><p>Initially, all possible instances, such as contact information, task lists and web page links, will be extracted by the system. Associations among these instances will be obtained semi-automatically by running the Association wizard. The instances and associations are then stored in a separate database and represented by means of ontologies using the Resource Description Framework (RDF). The RDF framework Jena <ref type="bibr" target="#b10">[11]</ref> is chosen to store and retrieve the RDF data. In addition, when a new item is added to the database externally, our system will alert the user to run the Association wizard to form suitable associations.</p><p>Our interface will identify the associated instances by querying the RDF database and generate associations such as: (i) a calendar entry is related to a file that was modified at that date and time, (ii) a bookmarked web page contains information about a workshop in a task item, and (iii) a contact is the author of a particular document. The user will be allowed to choose an association from an existing list or to define a new one.</p></div>
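<div xmlns="http://www.tei-c.org/ns/1.0"><p>The "Bob works at XYZ" example above can be sketched with plain subject-association-object triples. This is only an assumed reading of how candidate links might be derived (two instances are paired when they share the same association with the same object); it uses ordinary Python tuples rather than the Jena API, and the function name candidate_links and the extra "Alice" triple are hypothetical.</p>
<eg>
```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (subject, association, object) triples based on the example in the text.
triples = [
    ("Bob", "works at", "XYZ"),
    ("Steve", "works at", "XYZ"),
    ("Alice", "works at", "ABC"),
]


def candidate_links(triples):
    """Pair up subjects that share the same association with the same object."""
    groups = defaultdict(list)
    for subject, association, obj in triples:
        groups[(association, obj)].append(subject)

    links = []
    for (association, obj), subjects in groups.items():
        # Every pair of subjects in a group is a candidate link for the user to confirm.
        for first, second in combinations(sorted(subjects), 2):
            links.append((first, second, association, obj))
    return links


if __name__ == "__main__":
    for link in candidate_links(triples):
        print(link)
```
</eg>
<p>Here candidate_links(triples) yields a single candidate, linking Bob and Steve through the shared association "works at XYZ"; in the system described above, the user would then be prompted to choose or define the association for that pair.</p></div>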
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experimental Results</head><p>A NEC Versa Lite Pad Tablet was used to test our system. The note-taking interface is developed using the Agilix Infinotes <ref type="bibr" target="#b13">[14]</ref> .NET component. To test our Automatic Data Extraction (ADE) wizard, we collected meeting notes from 25 people. Each collected note was about 250-500 words in length and contained a mixture of email addresses, phone numbers, date-time information, proper nouns and hyperlinks. The data extraction results were analyzed off-line via the ADE wizard. Our application uses the recognized ASCII text from the meeting notes for all manipulations. The built-in Microsoft handwriting recognizer that comes with the tablet is used to translate ink data to ASCII text. Since the Association wizard is still under development, only the experimental results for ADE are presented in this paper.</p><p>The ink-to-text recognition accuracy plays a major role in the performance of the ADE wizard. Previous work has reported higher recognition accuracy for numbers, owing to their low individuality <ref type="bibr" target="#b14">[15]</ref>, and our analysis supports this finding. In our experiment, the recognition rate for numbers was 88.9%, compared with just 58.8% for letters. Similarly, non-cursive handwriting had the highest recognition rate, 82.2%, compared with about 73.2% for cursive handwriting. Printed handwriting had the worst recognition rate, 21.1%.</p><p>In addition to the recognizer's inaccuracy, we found that emission vocabulary symbols failed to be identified because of unnecessary white space inserted by the recognizer. Our system handles this white space to improve the overall precision of the data extraction system.</p><p>The results of our phase I activity, extracting key personal information using the HMM, are measured using the standard precision and recall measures, defined as:</p><formula>\mathrm{Precision} = \frac{R}{R + R_i} \times 100, \qquad \mathrm{Recall} = \frac{R}{R + R_m} \times 100</formula><p>where R is the number of relevant records retrieved, R_i is the number of irrelevant records retrieved and R_m is the number of relevant records missed. <ref type="table" target="#tab_1">Table 2</ref> lists the precision and recall obtained for each type of extracted data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Architecture of Semantic-Pen</figDesc><graphic coords="2,199.67,304.63,216.13,132.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Emission vocabulary for HMM</figDesc><table><row><cell>Intuitive Word Feature</cell><cell>Example formats</cell></row><row><cell>Email ID</cell><cell>bob@umich.edu, bob@yahoo.com, bob@xyz.org</cell></row><row><cell>Phone No.</cell><cell>(586)-779-6320, 586-779-6320, 779-6320</cell></row><row><cell>Date</cell><cell>09/01/06, 09-01-06, 09/01/2006, Sep-1-06</cell></row><row><cell>Time</cell><cell>12.30 pm, 12:30 a.m, 12.30 AM</cell></row><row><cell>Proper Noun (Name or Place)</cell><cell>Bob, Michigan</cell></row><row><cell>URL</cell><cell>www.umich.edu</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2.</head><label>2</label><figDesc>Precision vs. recall for automatic data extraction</figDesc><table><row><cell>Data Extracted</cell><cell>Precision (%)</cell><cell>Recall (%)</cell></row><row><cell>Email address</cell><cell>90.15</cell><cell>89.34</cell></row><row><cell>Phone Number</cell><cell>96.57</cell><cell>96.87</cell></row><row><cell>Schedule Information (Date and Time)</cell><cell>88.26</cell><cell>89.75</cell></row><row><cell>URL</cell><cell>91.12</cell><cell>92.34</cell></row><row><cell>Proper Noun (Name or Place)</cell><cell>93.23</cell><cell>89.23</cell></row></table></figure>
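<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the precision and recall definitions used in the experimental results can be written as two small Python functions. This is a sketch of the formulas only; the function names and the counts in the example are made up for illustration and are not the counts behind the reported results.</p>
<eg>
```python
def precision(relevant, irrelevant):
    """Precision in percent: R / (R + R_i) * 100."""
    return 100.0 * relevant / (relevant + irrelevant)


def recall(relevant, missed):
    """Recall in percent: R / (R + R_m) * 100."""
    return 100.0 * relevant / (relevant + missed)


if __name__ == "__main__":
    # Illustrative counts only: 90 relevant records retrieved,
    # 10 irrelevant records retrieved, 30 relevant records missed.
    print(precision(90, 10))
    print(recall(90, 30))
```
</eg>
<p>With these illustrative counts, the retrieved set is mostly relevant (high precision) while a quarter of the relevant records are missed (lower recall), which is why the two measures are reported side by side.</p></div>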
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">RDF Vocabulary Description Language 1.0: RDF Schema</title>
		<author>
			<persName><forename type="first">R</forename><surname>Guha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Brickley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">W3C recommendation</title>
		<imprint>
			<date type="published" when="2004-02-10">10 February 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Haystack: A platform for authoring end user semantic web applications</title>
		<author>
			<persName><forename type="first">D</forename><surname>Quan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huynh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Karger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ISWC</title>
		<imprint>
			<biblScope unit="page" from="738" to="753" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The gnowsis semantic desktop for information integration</title>
		<author>
			<persName><forename type="first">L</forename><surname>Sauermann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of WM</title>
				<meeting>WM</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">L</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halevy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Madhavan</surname></persName>
		</author>
		<title level="m">Personal information management with semex</title>
				<meeting><address><addrLine>Baltimore, Maryland USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><surname>Osaf</surname></persName>
		</author>
		<ptr target="http://www.osafoundation.org/ChandlerCompellinVision.htm" />
		<title level="m">Chandler</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Calendar agents on the semantic web</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">R</forename><surname>Payne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sycara</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE INTELLIGENT SYSTEMS</title>
		<imprint>
			<biblScope unit="page" from="84" to="86" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Dynomite: A dynamically organized ink and audio notebook</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Wilcox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">N</forename><surname>Schilit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Sawhney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGCHI</title>
		<imprint>
			<biblScope unit="page" from="186" to="193" />
			<date type="published" when="1997">1997</date>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Scribbler: A tool for searching digital ink</title>
		<author>
			<persName><forename type="first">A</forename><surname>Poon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Cass</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CHI &apos;95</title>
				<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="252" to="253" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Marquee: a tool for real-time video logging</title>
		<author>
			<persName><forename type="first">K</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGCHI&apos; 94</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="58" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Address extraction using hidden Markov models</title>
		<author>
			<persName><forename type="first">K</forename><surname>Taghva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Coombs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pereda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nartker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IS&amp;T/SPIE</title>
		<imprint>
			<date type="published" when="2005-01">January 2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Jena: A semantic web framework for java</title>
		<ptr target="http://jena.sourceforge.net" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Information extraction with HMM structures learned by stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><surname>Freitag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>McCallum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">17th National Conference AI</title>
				<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="584" to="589" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Nymble: a high-performance learning name-finder</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bikel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weischedel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ANLP</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page" from="194" to="201" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><surname>Agilix</surname></persName>
		</author>
		<ptr target="http://www.agilix.com/www/notecontrol.aspx?pid=14" />
		<title level="m">Infinotes</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Individuality of handwritten characters</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sargur</forename><forename type="middle">N</forename><surname>Srihari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDAR 2003</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
