Context Correlation Using Probabilistic Semantics

Setareh Rafatirad, George Mason University, Email: srafatir@gmu.edu
Kathryn Laskey, George Mason University, Email: klaskey@gmu.edu
Paulo Costa, George Mason University, Email: pcosta@gmu.edu

STIDS 2013 Proceedings Page 2

Abstract: We present an approach for recognizing high-level geo-temporal phenomena, referred to as events or occurrences, from in-depth discovery of information, using geo-tagged photos, formal event models, and various context cues like weather, space, time, and people. Given the relative availability of the information, our approach automatically obtains a probabilistic measure of occurrence likelihood for the recognized geo-temporal phenomena. This measure is not only used to find the best event among the merely possible candidates witnessed by the data (including photos); it can also provide informative cues to human operators in environments where the existing knowledge is uncertain.

I. INTRODUCTION

Sensors have become one of the biggest contributors of big-data datasets. Numerous datasets have already been generated in real time with rich content about various information. Mobile wireless devices with internet connectivity and multiple sensors, like camera and GPS, can continuously capture photos and record camera parameters, GPS location, and time. Web services like MapMyRide¹ and Wunderground² provide semantics like rides and geo-temporal weather status logs, using the captured sensory data. Given that context data exists in massive volumes, an information management paradigm is needed to correlate the information and infer higher-level semantics. We propose a technique that automatically correlates various information and creates a context-aware event graph by combining event models with contextual information related to photos, sensor logs, heterogeneous data sources, and web services. Our technique automatically computes the occurrence likelihood for the event nodes in the output graph. This likelihood, referred to as the plausibility measure, provides informative cues that help human operators in uncertain environments make better decisions. Note that this work provides a holistic view of the high-level events witnessed by a dataset; further cause-effect decision-making using the output of this stage is out of the scope of this paper.

¹ http://www.mapmyride.com/
² http://www.wunderground.com/

Events, in general, are structured, and their subevents have relatively more expressive power [13]. In this work, an event model (or event ontology) provides a multi-granular conceptual description, i.e., it provides a conceptual hierarchy at multiple levels using containment event-event relationships, e.g., subevent-of and subClassOf. In addition, event types can have multiple instances; instance events are contextual, and they should be augmented with context cues (like place, time, weather). This makes instance events more expressive than event types. Augmenting an instance event with context cues adapts a concept to multiple contextual descriptions (e.g., the event type visit-landmark may have two instances, one associated with the World War II Memorial and the other with the Washington Monument). Consider the following example: a person takes a photograph at an airport less than 1 hour after his flight arrives. To explain this photograph, we first need background knowledge about the events that generally occur in the domain of a trip. These semantics can only come from an event ontology that provides the vocabulary for events, entities, and event relationships related to a domain. An event ontology allows explicit specification of models that can be modified using context information to provide very flexible models for high-level semantics of events. We refer to this modification as Event Ontology Extension. It constructs a more robust and refined version of an event ontology, either fully or semi-automatically. Secondly, given the uncertain nature of sensory data (like GPS, which is not always accurate), the event type witnessed by the available context data is not decisive; in the above example, the event might be either rent a car or baggage claim, which are two possible conclusions. Sometimes no single obvious explanation is available; rather, several competing explanations exist and we must select the best one. In this work, reasoning from a set of incomplete information (observations) to the most related conclusion out of all possible ones (explanations) is performed through a ranking algorithm that incorporates the plausibility measure; this ranking process is used in Event Ontology Extension.

Problem Formulation: Every input photo has context information (timestamp, location, and camera parameters) and a user. Each photo belongs to a photo stream $P$ of an event with a domain event model $O(V, E)$, handcrafted by a group of domain experts, whose nodes $V$ are event/entity classes and whose edges $E$ represent the relationships between the nodes. There is a bucket $B$ of external data sources represented with a schema. The sources can be queried using the metadata of the input photographs and other available information, including the information about the associated user. Given $P$, $B$, $O$, and information associated with the user, how does one find the finest possible event tag that can be assigned to a photo or a group of similar photos in $P$?

Solution: We propose an Event Ontology Extension technique described as follows. Select a relevant domain event model through the information related to both $P$ and the user. Using $P$, $B$, $O$, and the user information, infer $S$, which consists of the subevent categories most relevant to $P$, where $S \subseteq V$. A member of $S$ is the most plausible event category for a group of contextually similar photos. For a group of similar photos $c_j$, a function $f$ calculates the plausibility measure $mp_{ij}$ for every competing event candidate $s_i$: $f(s_i, c_j) = mp_{ij}$; this measure indicates how relevant $s_i$ is to $c_j$, where $c_j \subset P$. Using the information from $B$, extend $S$ with one or more augmented instances of $S$, and obtain expressive event tags $T$. An event tag $t_{e_i} \in T$ is a subevent of an event that either exists in $O$ or can be derived from $O$, such that $t_{e_i}$ is the finest subevent tag that can be assigned to a group of similar photos. If $t_{e_i}$ is an assignable tag for any photo and $t_{e_i} \notin O$, we extend $O$ by adding $t_{e_i}$ to $O$ such that the constraints governing $O$ are preserved. The output is an extension to $O$ that is referred to as $O_r$ (see fig 1). We argue that attribute values related to an inferred event need to be obtained, refined, and validated as much as possible to create very expressive and reliable metadata. Fig 3 depicts the processing components of our proposed approach. We used semantics such as spatiotemporal attributes/constraints of events, subevent structure, and spatiotemporal proximity. In contrast to machine learning approaches, which are limited to the training data set and require an extensive amount of annotation, we propose a technique in which existing knowledge sources are modified and expanded with context information from external data sources, including public data sources (like public event/weather directories and local business databases) and digital media archives (like photographs). With this knowledge expansion, new infrastructures are constructed to serve relevant data to communities. Event tags are propagated with event title, place information (like city, category, place name), time, weather, etc. Our proposed technique provides two unique key benefits: 1) a sufficiently flexible structure to express context attributes for events, such that the attributes are not hardwired to events but rather are discovered on the fly. This feature does not limit our approach to a single data set; 2) leveraging context data across multiple sources could facilitate building a consistent, unambiguous knowledge base.

Fig. 1. An example of an event model being extended with contextually propagated instances.

Some of the main challenges of this work are: a) collecting and correlating information from various sources; we need a general mechanism that automatically queries sources and represents the output; b) a validation mechanism to ensure the coherency of the obtained data; c) currently, publicly available benchmark data sets such as those offered by TRECVid do not suit the purpose of this research (they deal with low-level events, i.e., activities). However, higher-level events have relatively more contextual characteristics; d) according to the useful properties of photos, relevant event categories in the model must be discovered. This paper is organized as follows: in section II, we review the prior art that uses context and event models for annotating photographs; in sections III and IV, we explain our solution strategy; this is followed by section V, which demonstrates our experiments, and section VI, which is the conclusion.

II. STATE OF ART

The important role of context is emphasized in [9]. Context information and ontological event models are used in conjunction by [16], [6]. Cao et al. present an approach for event recognition in image collections using image timestamp, location, and a compact ontology of events and scenes [4]; this work does not support subevent structure. Liu et al. report a framework that converts each event description from existing event directories (like Last.fm) into an event ontology that is a minimal core model for any general event [11]. This approach is not flexible enough to describe domain events (like trip) and their subevent structure. Paniagua et al. propose an approach that builds an event hierarchy using the contextual information of a photo, based on moving away from routine locations and on string analysis of English album titles (annotated by people) for public web albums in Picasaweb [12]. The limitations of this approach are: 1) human-induced tags are noisy, and 2) the subevent relationship is more than just spatiotemporal containment. For instance, although a car accident may occur in the spatiotemporal extent of a trip, it is not part of the subevent structure of the trip. According to [3], events form a hierarchical narrative structure that is connected by causal, temporal, spatial, and subevent relations. If these aspects are carefully modeled, they can be used to create a descriptive knowledge base for interpreting multimedia data. In [14], a mechanism is proposed that exploits context sources in conjunction with the subevent structure of an event; this structure is modeled in a domain event ontology. The limitation of this approach is that an event category is used in photo annotation no matter how relevant it is to a group of photos in a photo stream; as a result, the quality of annotation degrades.

III. EVENT ONTOLOGY EXTENSION

A photo's incomplete information can be improved if it is combined with the information related to a group of similar photos. In this work, two images are similar if they belong to the same event type. Partitioning a photo stream of an event based on the context of its digital photographs can create separate subevent boundaries for its photos [5]. An event is a spatiotemporal entity [7]. In addition, optical camera parameters (CP) in photos provide useful information related to the environment (like outdoor) in which an event occurs [15]. We used a clustering technique that partitions photos hierarchically based on their timestamp, location, and CP, with single-linkage clustering and Euclidean distance. However, one can use other approaches and refine the results. We represent the observations (i.e., photos/clusters) with a set of descriptors; a cluster consists of a group of contextually similar photos. In this section, we show that it is feasible to go from a set of descriptors $D$ to the best subevent category when the following conditions are satisfied: (a) the descriptors in $D$ are consistent among themselves, (b) the descriptors in $D$ satisfy subevent categories, (c) the axioms of a subevent category are consistently formulated in an event ontology, and (d) the inferred subevent categories are sound and complete.

A. EVENT MODEL

We use a basic derivation of the E* model [8] as our core event model, to specify the general relationships between events and entities. Specifically, we utilized the relationship subeventOf, which specifies the event structure and event containment. The expression $e_1$ subeventOf $e_2$ indicates that $e_1$ occurs within the spatiotemporal bounds of $e_2$, and that $e_1$ is part of the regular structure of $e_2$. Additionally, we used spatiotemporal relationships like occurs-during and occurs-at to specify the space and time properties of an event. The time and space model that we used in this work is mostly derived from the E* model. The relationship participant is used to describe the presence of a person in an event. We use the relationships co-occurring-with, co-located-with, spatially-near, temporal-overlap, before, and after to describe the spatiotemporal neighborhood of an event. The relationship same-as between two events makes them equivalent entities. Also, we used several other relationships to describe additional constraints about events (e.g., $e_1$ has-ambient-constraint $A$, and $A$ has-value indoor). Moreover, to express a certain group of temporal constraints, we utilized some Linear Temporal Logic, Metric Temporal Logic, and Real-Time Temporal Logic formulas [10], [2]. These formulas are a combination of the classical operators $\wedge$ (conjunction), $\vee$ (disjunction), and $\rightarrow$ (implication), Allen's calculus [1], the $\Box$ operator, the $\Diamond$ operator, linear constraints, and distance functions; they are used to model complex relative temporal properties. For instance, the constraint $\Box_{[t_1,t_2]}\big(e_1 \rightarrow \Diamond_{[t_2,\,t_2+1800]}\, e_2 \wedge \tilde{D}(e_2) \le 1800\big)$ states that $e_2$ eventually happens within 1800 seconds after $e_1$ and that $e_2$ lasts at most 1800 seconds. We developed a language $L$, with a syntax and grammar, as an extension to OWL to embrace complex temporal formulas. Further, we extended the language to support a combination of classical propositional operators, linear spatial constraints, and spatial distance functions, which cannot be expressed in OWL; the expression $f_{eucDist}(e_1, e_2, @ \le 100)$ shows a relative spatial constraint in $L$, which states that the event $e_1$ occurs at most 100 meters away from the place at which event $e_2$ occurs.

Domain Event Model: A domain event ontology provides a specialized taxonomy for a certain domain like trip (see fig 2). The Miscellaneous subevent category in this model is used to annotate the photos that are not matched with any other category. The general vocabulary of the core event model is reused in a domain event ontology. For instance, Parking in fig 2 is a subClassOf the Occurrent (or event) concept in the core event ontology. Also, relationships like subeventOf are reused from the core event ontology. We assume that domain event ontologies are handcrafted by a group of domain experts.

B. DESCRIPTOR REPRESENTATION MODEL

We represent a descriptor using the schema $\{\mathit{type}_d : \mathit{value}_d,\ \mathit{confidence}_d : \mathit{val}\}$, in which $\mathit{type}_d$, $\mathit{value}_d$, and $\mathit{val}$ indicate the type, value, and certainty (between 0 and 1) of the descriptor, respectively. For instance, the descriptor {Flash : 'off', confidence : 1.0} for a photo states that the flash was off when the photo was captured, with 100% certainty. Photo and cluster descriptors follow the same representation model; however, the rules for computing the value of $\mathit{confidence}_d$ are different. We describe these rules in the following paragraphs. The descriptor model of a cluster includes two fields in addition to those of a photo: plausibility-weight $\ge 0$ and implausibility-weight $< 0$. We explain the usage of these fields later. All descriptors are either direct or derived. For photo descriptors, by convention, we assume that a direct descriptor is extracted directly from the EXIF metadata of a photo and that its confidence is 1, as in the above example. The direct descriptors that we used in this paper are related to the time, location, and optical parameters of photos, like GPSLatitude, GPSLongitude, Orientation, Timestamp, and ExposureTime. For a derived descriptor like {sceneType : 'indoor', confidence : 0.6}, the descriptor value 'indoor' is computed using direct descriptors like Flash, through a sequence of computations that extract information from a bucket of data sources. Some of these descriptors are PlaceCategory³, Distance⁴, and HoursOfOperation⁵. The confidence score is obtained from the processing unit used to compute the descriptor value; we developed several information retrieval algorithms for this purpose, in addition to the existing tools in our lab [15]. If a descriptor value is directly extracted from an external data source, $\mathit{confidence}_d$ is equal to 1. Direct descriptors of a cluster must represent all photos contained in it; some of these descriptors represent the boundingbox, time-interval, and size of the cluster. The confidence value for direct descriptors is equal to 1; for instance, in the descriptor {size : 5, confidence : 1.0}, which indicates the number of photos in a cluster, $\mathit{confidence}_d$ is equal to 1.

³ The category of the nearest local business to the coordinates of a photo.
⁴ The distance of a local business to the coordinates of a photo.
⁵ The hours during which a local business is open.

Given a photo $p_i$ in a photo stream $P$, and the cluster $c$ that groups $p_i$ with the most similar photos in $P$, a processing unit produces the descriptors of $c$ using the descriptors of the photos in $c$; more importantly, this process is guided by the descriptors of $p_i$. Every photo in $c$ must support every derived descriptor of $p_i$; such a cluster is referred to as a sound cluster for $p_i$, and the derived descriptors for $c$ are represented by the distinct union of the derived descriptors of the photos in $c$. For a derived cluster descriptor $d$, the value of $\mathit{confidence}_d$ is calculated using the formula in equation 1, in which $|c|$ is the size of the cluster, $p_j$ is every photo in $c$ that is represented by $d$, and $g(p_j, d)$ gives the confidence value of $d$ in $p_j$. To find a sound cluster for a photo, the hierarchical structure produced by the clustering unit is traversed using depth-first search; if no sound cluster is found, the traversal halts when the current cluster is a leaf node.

$$\mathit{confidence}_d = \frac{1}{|c|} \times \sum_{p_j} g(p_j, d) \qquad (1)$$

Descriptor Consistency: Consistency among a set of descriptors is a mandatory condition for inferring the best possible conclusion from it. In this work, consistency must exist among the descriptors of a photo as well as the descriptors of a cluster, using the entailment rules described below. (a) $v_i \rightarrow v_k$: if $v_i$ implies $v_k$, then the rules for $v_k$ must also be applied to $v_i$. This is referred to as the transitive entailment rule.
For instance, suppose a photo/cluster has the following description, 0 outdoorSeating : true0 ; 0 sceneT ype : outdoor0 ; 0 weatherCondition : storm0 , which implies that the nearest local business (e.g. restaurant) to the photo/cluster, offers outdoorSeating, and the weather was stormy when the photo(s) were captured. Given the sequence of rules below, outdoorSeating ^ outdoor ! f ineW eather, f ineW eather ! ¬storm rule 2 is entailed that indicates an inconsistency among the descriptors of a photo/cluster. outdoorSeating ^ outdoor ! ¬storm (2) (b) vi ! f uncremove (vk ): vi implies removing the descriptor vk . This is referred as a deterministic entailment rule. (c) vi ^ vk ! truth value: rules of this type are referred as non-deterministic entailment rules in which the inconsis- Fig. 2. An event ontology for the domain professional trip. tency is expressed by a false truth value e.g. closeShot ^ landscape ! f alse. In that case, further decisions on keep- ing,modifying, or discarding either of the descriptors vi or D. EVENT INFERENCE vk will be based on the confidence value assigned to each descriptor — this operation is referred as update, which is From a set of consistent cluster descriptors (observations), executed when an inconsistency occurs between two candidate we developed a context discovery algorithm to infer the most descriptors. The following rules are used by this process: (a) plausible subevent category described in a domain event on- for two descriptors with the same type, the descriptor with tology. This algorithm, uses the domain event model, which is lower confidence score is discarded, (b) for two descriptors a graph; we represent this graph with the notation O(V, E) with different types, the one with lower confidence score gets in which V includes event classes, and E includes event modified until the descriptors are consistent. The modification relationships. 
Traversing the event graph O starts with the is defined as either negation or expansion within the search root of hierarchical subevent structure. The algorithm visits space. In case of negation, e.g. ¬outdoor ! indoor, the con- event candidates in E through some of the relationships in E fidence value for indoor descriptor is calculated by subtracting like subeventOf, co-occurring-with, co-located-with, spatially- the confidence value of outdoor descriptor from 1. An example near, temporal-overlap, before, and after — these relation- of expansion is increasing a window size to discover more ships help to reach other event candidates that are in the local businesses near a location. To avoid falling inside an spatiotemporal neighborhood of an event. An expandable list, infinite loop, we limit the count of negation, and the size of referred as Lv , is constructed from E, to maintain the visited search space during expansion, by a threshold. We assign null event/subevent nodes during an iteration i — if an event is to the descriptor that has already reached a threshold and is still added to Lv , it cannot be processed again during the extent inconsistent. null is universally consistent with any descriptor. of i. At the end of each iteration, Lv is cleared. In every The vocabulary that is used to model the descriptors for a iteration, the best subevent category is inferred through a photo/cluster is taken from the vocabulary that is specified in ranking process, from a set of consistent observations. the core event model. To find the most plausible subevent category, we introduce Measure of Plausibility (mpij ) to rank event candidates. This C. DATA SOURCES measure is computed using two parameters (1) granularity We represent each data source with a declarative schema, score (wg ), and (2) plausibility score (wAX ). wg is equivalent by using the vocabulary of the core event model. 
This schema to the level of the event in the subevent hierarchy in the domain indicates the type of source output. In addition, it specifies event ontology. To compute wAX , we used ’plausibility- what type of the input attributes a source needs, to deliver the weight’ (w+ ) and ’implausibility-weight’ (w ) which are two output. Data sources are queried using the SPARQL language6 . fields of a cluster descriptor. The value of w+ is equal to the A query is constructed automatically using the schema of data confidence value assigned to a descriptor, and the value of sources, and the available information. Simply put, a source is w is equal to w+ . If a descriptor could not be mapped to selected if its input attributes match the available information any event constraint, wAX remains unchanged. If a descriptor I. At every iteration, I is incrementally updated with new data with w+ = ↵ satisfies an event constraint, then w+ is added to that is delivered by a source. The next source is selected if its wAX , otherwise, w is added to wAX (i.e., wAX = wAX ↵). input attributes are included in I. This process continues until The only exception is for the cluster descriptors time-interval no more source with matching attributes is left in the bucket and boundingbox; if either one of these descriptors satisfies an B. explanation, then w+ = 1; in the opposite case, w  100 — when a cluster has no overlap with the spatiotemporal extent 6 http://www.w3.org/TR/rdf-sparql-query/ of an event si , w  100 makes si the least plausible STIDS 2013 Proceedings Page 5 candidate in the ranking. According to the formula in III-D, user information and the attributes of a seed-event (I) that wAX also depends on the fraction of satisfied event constraints; is represented with the same schema that is described in the N is the total number of constraints for an event candidate. event ontology. 
Given a sequence of input attributes, if a data source returns an output-array of size K, then our algorithm 1 X j wAX = wAX , 1  j  N (3) creates K new instances of events with the same type as in N the seed-event, and augments them with the information in the Finally, we use the following instructions to compare two output-array. The augmented seed-events are added to I for event candidates e1 and e2 : when e1 is subsumed by e2 , the next iteration; I is constantly updated until all the event mpij for each event candidate is normalized using the formula categories in E 0 are augmented, and/or there is no more data in equation 4, in which ei ⌘ e1 and ej ⌘ e2 , otherwise, source (in the bucket B) to query. To avoid falling into an ei .mpij = ei .wAX . The candidate with the highest mpij is the infinite loop of querying data sources, we set the following most plausible subevent category. condition: a data source cannot be queried more than once for each seed-event. We defined some queries manually that ei .wAX ei .wg are expressed through the relative spatiotemporal relationships ei .mpij = + (4) in the event ontology, and the augmented seed-events; these max(ei .wAX , ej .wAX ) max(ei .wg , ej .wg ) queries are used to augment the seed-events with relative When a subevent category is inferred from a set of observa- spatiotemporal properties. When a seed-event gets augmented tions, it will not be considered again as a candidate for the next with information, our technique validates the event tag by set of observations. Event inference halts if no more subevent using the event constraints, augmented event attributes, and category is left to be inferred from the domain event ontology. a sequence of entailment rules that specify the cancel status for an event. 
For instance, if the weather attribute for an EXTENSION: The inferred subevent categories E 0 are event is heavy rain, and the weather constraint fine weather is refined with the context data extracted from data sources in the defined for an event, then the status of the event tag becomes bucket B, through the refinement process. First, let us elaborate canceled. After the validation, event tags are added to the this process by introducing the notion of seed event, which is domain event ontology by extending event classes through an instance of an inferred category in E 0 , which is not yet typeOf relationship. This step produces an augmented event augmented with information. An augmented seed-event is an ontology that is the extended version of the prior model (see expressive event tag. The seed-event is continuously refined fig 1). with information from multiple sources. IV. FILTERING Filtering is a two-step process; (1) redundant and irrelevant clusters are pruned from the hierarchical cluster structure produced by the clustering component, see fig 4-step-1. (2) filter redundant photos from the matched cluster, see fig 4- step-2. This is accomplished by applying the context and visual constraints of the expressive tag that is matched to the cluster. We used a concept verification tool7 to verify the visual constraints of events using image features. This tool uses pyramids of color histogram and GIST features. Filtering operation is deeply guided by the expressive tags. During this operation, subevent relations are used for navigating the augmented event model. V. EXPERIMENTAL EVALUATIONS Fig. 3. The Big Picture. Photos and their metadata are stored in photo-base and metadata-base respectively. Using user info, including events’ type, time, We focused on 3 domain scenarios vacation, professional and space in a user’s calendar, a photo stream is queried, and its metadata is passed to Clustering. 
In Validation, a set of consistent descriptors is obtained from the cluster that best represents an individual photo; the event inference component uses these descriptors in addition to a domain model that is selected according to user info. Event Ontology Extension propagates the most relevant subevent categories (to the input photo stream) with the information discovered from Data Sources, then extends the event structure (ontology) with the applicable propagated event instances (i.e., tags). The tags are validated (using data sources) and added to the event ontology; the extended event ontology is used in filtering, which queries the visual concept verification tool. In this stage, given an event, irrelevant cluster branches are pruned. Next, for each matched cluster, the photos less relevant to a subevent tag are filtered out. The output is a set of photos labeled with some tags; these tags are then stored as new metadata for the photos. The remaining photos are tagged as miscellaneous.

Our extension algorithm uses a strategy similar to the one used in subsection III-C. The difference is that the attributes of a data source at each iteration are supplemented by the

7 http://socrates.ics.uci.edu/Pictorria/public/demo

STIDS 2013 Proceedings Page 6

trip, and wedding. We crawled Flickr, Picasaweb, and our lab data sets. We observed that many people store their personal photos according to events; accordingly, we collected the data sets based on time, space, and event types (like travel, conference, meeting, workshop, vacation, and wedding). We developed crawlers to download about 700 albums of the day's featured photos; we crawled photo albums created since 2010 because most of the older collections did not contain geo-tagged photos. After 4 months, we had collected 570 albums (about 60K photos) with the required EXIF information: location, timestamp, and optical camera parameters. We ignored albums that a) contained fewer than 20 photos or b) had non-English annotations. The average number of photos per album was 105. We used the albums from the most active users based on the amount of user annotations, ending up with a collection of 20 users with heterogeneous photo albums in terms of time period and geographical sparseness. The geographic sparseness of albums ranged from spanning continents to spanning cities of the same country/state (see fig 5). We noticed that data sources do not equally support all the geographic regions; e.g., only a small number of data sources supported the data sets captured inside India. The photos for the vacation/professional-trip domains have higher temporal and geographical sparseness compared to photos related to the wedding domain. The number of albums for the vacation domain exceeds the other two.

Fig. 4. Filtering Operation.

Fig. 5. Data set geographical distribution. The black bars show the number of albums in each geographic region, and the gray bars show the number of data sources that supported the corresponding geographic region.

Experimental Set-Up

We picked the 4 most active users (based on the amount of user annotation) from our non-lab, downloaded data set, and the 2 most active users from our lab data set (based on the number of collections they own). As ground truth for the lab data set, we asked the owners to annotate the photos using their personal experiences and an event model that best describes the data set, while providing them with three domain event models. For the non-lab data set, the ground truth is a manual and subjective event labeling done by the owner of the data set, who was unaware of the experiments. Because of the subjective nature of the non-lab data set, the event types that were not contained in the event domain ontology are replaced with event type miscellaneous, which is an event type in every domain event ontology in this work. For each experiment, we compute standard information retrieval measures (precision, recall, and F1-measure) for the event types used in tags. In addition, we introduce a measure of correctness for event tags. The score is obtained based on multiple context cues. For instance, the label meeting with Tom Johnson at RA Sushi Japanese Restaurant in Broadway, San Diego, during time interval "blah" in a sunny day, in an outdoor environment specifies the type of the event, its granularity in the subevent hierarchy, place, time, and environment condition. We developed an algorithm that evaluates each cue with a number in the range of 0 to 1 as follows:

1) event type: wrong = 0, correct = 1, somehow correct = $L_p / L_{TP}$, where $L_p$ is the subevent-granularity level of the predicted tag and $L_{TP}$ is the subevent-granularity level of the true-positive tag (the predicted tag is the direct or indirect superevent of the true-positive tag, i.e., $L_p / L_{TP} \leq 1$);
2) place: includes place name, category, and geographical region. If the place name is correct, score 1 is assigned and the other attributes are not checked. Otherwise, 0 is assigned; for the category and/or geographical region, score 1 is assigned if correct, and 0 otherwise. The average of these values represents the score for place;
3) weather, optical, and visual constraints: wrong = 0, correct = 1, unsure = 0.5;
4) time interval: if the predicted event tag occurs anytime during the true-positive event tag, the score is 1, otherwise 0.

The average of the above scores represents the correctness measure for a predicted event tag. We introduce the average correctness of annotation, calculated using equation 5, where $w_j$ is the score for the $j$th predicted tag and $L$ is the total number of expressive event tags detected by our approach.

$$\text{correctness} = \frac{\sum_{j=1}^{L} w_j}{L}; \qquad \text{context} = 1 - Err \tag{5}$$

The metric context in equation 5 measures the average context provided by data sources for annotating a photo stream; parameter $Err$ is the average error related to the information provided by the data sources used for annotating a photo stream ($0 \leq Err \leq 1$). The following guidelines are applied automatically to measure this value:

(a) if the information in a data source is related to the domain of a photo stream, but it is irrelevant to the context of the photo stream, assign error-score 1. For instance, data source TripAdvisor returns zero results related to Things-To-Do for the country in which a photo stream is created. Also, if a photo stream for a vacation trip does not include any picture taken at a landmark location, TripAdvisor does not provide any coverage;
(b) assign error-score 0 if the type of a source is relevant as well as its data (i.e., non-empty results);
(c) if the data from a relevant source is insufficient for a photo stream, assign error-score 0.5. For instance, only a subset of business venues in a region are listed in data source Yelp; as a result, the data source returns information for less than 30% of the photo stream;
(d) for a data source, multiply the error-score by a fraction in which the numerator is the number of photos tagged using this data source, and the denominator is the size of the photo stream. Do this for all the sources and obtain the weighted average of the error-scores. The result is $Err$.

Fig. 6. Role of context in improving the correctness of event tags.

Fig. 7. CPU-Time for experimental data sets of the 5 most active users. Each data set is represented by its owner, domain type, source, and size. The domain wed implies wedding domain.

The implication of our result in fig 6 is as follows: while the correctness of event tags (for a photo stream of an event) peaks with the increase in context, a relatively smaller percentage of photos are tagged using non-miscellaneous events, and a larger percentage of photos are tagged using the miscellaneous event.
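The correctness and context measures of equation 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the argument layout, and the example values are hypothetical. It also assumes one reading of the text, namely that the place score averages a 0 for the wrong name with the category and region scores, and that the "weighted average" of guideline (d) divides by the sum of the per-source fractions so that Err stays in [0, 1].

```python
from statistics import mean


def event_type_score(match, predicted_level=0, true_level=1):
    """Cue 1. `match` is "correct", "superevent", or anything else (wrong)."""
    if match == "correct":
        return 1.0
    if match == "superevent":
        # "Somehow correct": ratio L_p / L_TP of the granularity levels.
        return predicted_level / true_level
    return 0.0


def place_score(name_ok, category_ok, region_ok):
    """Cue 2. A correct place name short-circuits the other attributes."""
    if name_ok:
        return 1.0
    # Assumed reading: average the 0 for the name with category and region.
    return mean([0.0, float(category_ok), float(region_ok)])


def tag_correctness(cue_scores):
    """Correctness of one predicted tag: average of its cue scores."""
    return mean(cue_scores)


def average_correctness(tag_scores):
    """Equation 5, left part: sum of w_j over the L predicted tags."""
    return sum(tag_scores) / len(tag_scores)


def context_measure(sources, stream_size):
    """Equation 5, right part: context = 1 - Err.

    `sources` is a list of (error_score, photos_tagged) pairs, with the
    error_score in {0, 0.5, 1} chosen by guidelines (a) to (c).
    """
    weights = [tagged / stream_size for _, tagged in sources]
    total = sum(weights)
    if total == 0:
        # Assumption: if no source tagged any photo, no context was provided.
        return 0.0
    err = sum(e * w for (e, _), w in zip(sources, weights)) / total
    return 1.0 - err


if __name__ == "__main__":
    # Hypothetical tag: superevent at level 2 of a level-4 true tag, wrong
    # place name but right category and region, constraints and time correct.
    cues = [event_type_score("superevent", 2, 4),
            place_score(False, True, True), 1.0, 1.0]
    print(tag_correctness(cues))
    print(context_measure([(0.0, 60), (0.5, 30), (1.0, 10)], 100))
```

Note that under guideline (d) a source that tags no photos receives weight zero, so its error-score does not influence Err in this reading; a different normalization would change that behavior.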
This means that if a suitable event type for a group of photos does not exist in an event ontology, the photos are not tagged with an irrelevant non-miscellaneous event; instead, they are tagged with the miscellaneous event, which means other. The right side of the figure indicates that even though the number of miscellaneous and non-miscellaneous event tags does not change, the correctness is still increasing; this means that the tags get more expressive since more context cues are attached to them. The quality of annotations increases when more context information is available. This shows that an event ontology by itself is not as effective as an augmented event ontology. We demonstrate three classes of experiments in table I. This table shows the average values (between 0 and 1) for the metrics discussed earlier (precision, recall, F1, correctness). We use the work proposed in (Paniagua, 2012) as a baseline. It uses space and time to detect event boundaries, in conjunction with English album descriptions. This baseline approach, with an F1-measure of about 0.6 and correctness of almost 0.56, illustrates that time and space are important parameters for detecting event boundaries. On the other hand, the baseline approach is limited to using only spatiotemporal containment for detecting the subevent hierarchy; it does not support other types of relationships among events (like co-occurring events and relative temporal relationships) or other semantic knowledge about the structure of events. Also, it requires human-induced tags, which are noisy. For the second set of experiments, we use an event domain ontology without augmenting it with context information. This approach gives worse results since the context information is disregarded while detecting event boundaries. It provides an F1-measure of almost 0.32 and correctness of 0.13. Our last experiment leverages our proposed approach and achieves an F1-measure of about 0.85 and correctness of 0.82. Compared to our baseline approach, we obtain about a 26% improvement in the quality of tags (correctness rises from about 0.56 to 0.82), which is a very promising result.

Approach        Measure  U1    U2    U3    U4    U5
baseline        prec     0.65  0.58  0.39  0.53  0.74
baseline        recall   0.89  0.4   0.61  0.64  0.8
baseline        f1       0.75  0.47  0.48  0.6   0.77
baseline        corr     0.63  0.62  0.52  0.62  0.28
event ontology  prec     0.41  0.17  0.3   0.48  0.12
event ontology  recall   0.4   0.2   0.5   0.43  0.24
event ontology  f1       0.4   0.18  0.37  0.45  0.16
event ontology  corr     0.2   0.08  0.12  0.2   0.03
proposed        prec     0.74  0.83  0.95  0.92  0.88
proposed        recall   0.91  0.93  0.88  0.7   0.97
proposed        f1       0.81  0.88  0.91  0.79  0.92
proposed        corr     0.8   0.75  0.85  0.79  0.9

TABLE I. RESULTS FOR AUTOMATIC PHOTO ANNOTATION FOR THE DATA SETS OWNED BY THE 5 MOST ACTIVE USERS.

CPU-Performance

The running time for our proposed approach and for visual concept verification is shown in fig 7, which illustrates the results for data sets from two sources, i.e., lab and non-lab (including Flickr and Picasaweb), and three event domains.

Cross-Domain Comparison: In general, we found a smaller number of context sources for wedding data sets compared to the other two domains; as a result, the extension process exits relatively faster, and the running time for the concept verification process increases. We observed that the correctness of event tags degrades when the Event Ontology Extension process exits quickly. This observation confirms the findings of fig 6.

Cross-Source Comparison: Within each domain, we compared the CPU performance among lab and non-lab data sets; Event Ontology Extension exits relatively faster for non-lab data sets. The justification for this observation is that we could obtain user-related context like Facebook events/check-ins from our lab users (U3, U4), but such information was missing in the case of non-lab data sets. This absence of information impacts wedding data sets the most, since the context information in the wedding scenario largely includes personal information, such as the guest list and wedding schedule, that is not publicly available on photo sharing websites. In the professionalTrip scenario, this impact is smaller than for wedding and larger than for vacation; the missing data is due to the lack of context information related to personal meetings and conference schedules. In the vacation scenario, data sources are mostly public; only a small portion of context information comes from user-related context such as flight information and Facebook check-ins; therefore, we did not find a significant change in the CPU time between lab and non-lab data sets.

VI. CONCLUSIONS

Our proposed technique addresses a broad range of research challenges to achieve a powerful event-based system that can adapt to different scenarios and applications like those in the intelligence community, multimedia applications, and emergency response. This is the starting step for combining complex models with BIG DATA.

REFERENCES

[1] J. F. Allen and G. Ferguson. Actions and events in interval temporal logic. Journal of Logic and Computation, 1994.
[2] R. Alur and T. A. Henzinger. Logics and models of real time: A survey. In J. W. de Bakker, Cornelis Huizing, Willem P. de Roever, and Grzegorz Rozenberg, editors, REX Workshop. Springer, 1991.
[3] N. Brown. On the prevalence of event clusters in autobiographical memory. Social Cognition, 2005.
[4] L. Cao, J. Luo, H. Kautz, and T. Huang. Annotating collections of photos using hierarchical event and scene models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008). IEEE, 2008.
[5] M. Cooper, J. Foote, A. Girgensohn, and L. Wilcox. Temporal event clustering for digital photo collections. ACM Transactions on Multimedia Computing, Communications, and Applications, 2005.
[6] A. Fialho, R. Troncy, L. Hardman, C. Saathoff, and A. Scherp. What's on this evening? Designing user support for event-based annotation and exploration of media. In 1st International Workshop on EVENTS-Recognising and tracking events on the Web and in real life, 2010.
[7] B. Gong, U. Westermann, S. Agaram, and R. Jain. Event discovery in multimedia reconnaissance data using spatio-temporal clustering. In Proc. of the AAAI Workshop on Event Extraction and Synthesis, 2006.
[8] A. Gupta and R. Jain. Managing event information: Modeling, retrieval, and applications. Synthesis Lectures on Data Management, 2011.
[9] R. Jain and P. Sinha. Content without context is meaningless. In Proceedings of the international conference on Multimedia. ACM, 2010.
[10] R. Koymans. Specifying real-time properties with metric temporal logic. Real-Time Syst., 2(4), 1990.
[11] X. Liu, R. Troncy, and B. Huet. Finding media illustrating events. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval. ACM, 2011.
[12] J. Paniagua, I. Tankoyeu, J. Stöttinger, and F. Giunchiglia. Indexing media by personal events. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval. ACM, 2012.
[13] S. Rafatirad, A. Gupta, and R. Jain. Event composition operators: Eco. In Proceedings of the 1st ACM international workshop on Events in multimedia. ACM, 2009.
[14] S. Rafatirad and R. Jain. Contextual augmentation of ontology for recognizing sub-events. In Semantic Computing (ICSC), 2011 Fifth IEEE International Conference. IEEE, 2011.
[15] P. Sinha and R. Jain. Classification and annotation of digital photos using optical context data. In CIVR, 2008.
[16] W. Viana, J. Bringel Filho, J. Gensel, M. Villanova-Oliver, and H. Martin. Photomap: from location and time to context-aware photo annotations. Journal of Location Based Services, 2008.