Context Correlation Using Probabilistic Semantics

                  Setareh Rafatirad                       Kathryn Laskey                            Paulo Costa
              George Mason University                George Mason University                 George Mason University
              Email: srafatir@gmu.edu                Email: klaskey@gmu.edu                  Email: pcosta@gmu.edu


Abstract: We present an approach for recognizing high-level geo-temporal phenomena, referred to as events or occurrences, through in-depth discovery of information using geo-tagged photos, formal event models, and context cues such as weather, space, time, and people. Because the available information is often incomplete, our approach automatically obtains a probabilistic measure of occurrence likelihood for each recognized geo-temporal phenomenon. This measure is used not only to find the best event among the possible candidates witnessed by the data (including photos); it can also provide informative cues to human operators in environments where the existing knowledge is uncertain.

I. INTRODUCTION

Sensors have become one of the biggest contributors of big data. Numerous datasets with rich content have already been generated in real time. Mobile wireless devices equipped with internet connectivity and multiple sensors, such as a camera and GPS, can continuously capture photos and record camera parameters, GPS location, and time. Web services like MapMyRide^1 and Wunderground^2 derive semantics, such as rides and geo-temporal weather status logs, from the captured sensory data. Given that context data exists in massive volumes, an information management paradigm is needed to correlate the information and infer higher-level semantics. We propose a technique that automatically correlates various information and creates a context-aware event graph by combining event models with contextual information related to photos, sensor logs, heterogeneous data sources, and web services. Our technique automatically computes the occurrence likelihood of the event nodes in the output graph; this likelihood, referred to as the plausibility measure, provides informative cues that help human operators in uncertain environments make better decisions. Note that this work provides a holistic view of the high-level events witnessed by a dataset; further cause-effect decision-making using the output of this stage is out of the scope of this paper.

1 http://www.mapmyride.com/
2 http://www.wunderground.com/

Events, in general, are structured, and their subevents have relatively more expressive power [13]. In this work, an event model (or event ontology) provides a multi-granular conceptual description; that is, it provides a conceptual hierarchy at multiple levels using containment event-event relationships such as subevent-of and subClassOf. In addition, event types can have multiple instances; instance events are contextual, and they should be augmented with context cues (like place, time, and weather). This makes instance events more expressive than event types. Augmenting an instance event with context cues adapts a concept to multiple contextual descriptions (e.g., the event type visit-landmark may have two instances, one associated with the World War II Memorial and the other with the Washington Monument). Consider the following example: a person takes a photograph at an airport less than 1 hour after his flight arrives. To explain this photograph, we first need background knowledge about the events that generally occur in the domain of a trip. These semantics can only come from an event ontology that provides the vocabulary for the events, entities, and event relationships of a domain. An event ontology allows explicit specification of models that can be modified using context information to provide very flexible models for the high-level semantics of events. We refer to this modification as Event Ontology Extension. It constructs a more robust and refined version of an event ontology either fully or semi-automatically. Second, given the uncertain nature of sensory data (such as GPS, which is not always accurate), the event type witnessed by the available context data is not decisive; in the above example, the event might be either rent a car or baggage claim, which are two possible conclusions. Sometimes no single obvious explanation is available; rather, several competing explanations exist and we must select the best one. In this work, reasoning from a set of incomplete information (observations) to the most relevant conclusion among all possible ones (explanations) is performed through a ranking algorithm that incorporates the plausibility measure; this ranking process is used in Event Ontology Extension.

Problem Formulation: Every input photo has context information (timestamp, location, and camera parameters) and a user. Each photo belongs to a photo stream $P$ of an event with a domain event model $O(V, E)$, handcrafted by a group of domain experts, whose nodes $V$ are event/entity classes and whose edges $E$ represent the relationships between the nodes. There is a bucket $B$ of external data sources, each represented with a schema. The sources can be queried using the metadata of the input photographs and other available information, including the information about the associated user. Given $P$, $B$, $O$, and the information associated with the user, how does one find the finest possible event tag that can be assigned to a photo or a group of similar photos in $P$?

Solution: We propose an Event Ontology Extension technique, described as follows. Select a relevant domain event model using the information related to both $P$ and the user. Using $P$, $B$, $O$, and the user information, infer $S$, the set of subevent categories most relevant to $P$, where $S \subseteq V$. A member of $S$ is the most plausible event category for a group of contextually similar photos. For a group of similar photos $c_j$, a function $f$ calculates the plausibility measure $mp_{ij}$ for every competing event candidate $s_i$: $f(s_i, c_j) = mp_{ij}$; this measure indicates how relevant $s_i$ is to $c_j$, such that


                                                 STIDS 2013 Proceedings Page 2
$c_j \subset P$. Using the information from $B$, extend $S$ with one or more augmented instances of $S$, and obtain expressive event tags $T$. An event tag $te_i \in T$ is a subevent of an event that either exists in $O$ or can be derived from $O$, such that $te_i$ is the finest subevent tag that can be assigned to a group of similar photos. If $te_i$ is assignable to any photo and $te_i \notin O$, we extend $O$ by adding $te_i$ to $O$ such that the constraints governing $O$ are preserved. The output is an extension of $O$ that is referred to as $O_r$ (see fig 1). We argue that attribute values related to an inferred event need to be obtained, refined, and validated as much as possible to create very expressive and reliable metadata. Fig 3 depicts the processing components of our proposed approach. We used semantics such as spatiotemporal attributes/constraints of events, subevent structure, and spatiotemporal proximity. In contrast to machine learning approaches, which are limited to the training data set and require an extensive amount of annotation, we propose a technique in which existing knowledge sources are modified and expanded with context information from external data sources, including public data sources (like public event/weather directories and local business databases) and digital media archives (like photographs). With this knowledge expansion, new infrastructures are constructed to serve relevant data to communities. Event tags are propagated with event title, place information (like city, category, and place name), time, weather, etc. Our proposed technique provides two unique key benefits: 1) a sufficiently flexible structure to express context attributes for events, such that the attributes are not hardwired to events but are discovered on the fly, which means our approach is not limited to a single data set; 2) leveraging context data across multiple sources could facilitate building a consistent, unambiguous knowledge base.

Fig. 1. An example of an event model being extended with contextually propagated instances.

Some of the main challenges of this work are: a) collecting and correlating information from various sources, which requires a general mechanism that automatically queries sources and represents the output; b) a validation mechanism to ensure the coherency of the obtained data; c) currently, publicly available benchmark data sets such as those offered by TRECVid do not suit the purpose of this research (they deal with low-level events, i.e., activities), whereas higher-level events have relatively more contextual characteristics; d) relevant event categories in the model must be discovered according to the useful properties of photos. This paper is organized as follows: in section II, we review the prior art that uses context and event models for annotating photographs; in sections III and IV, we explain our solution strategy; section V presents our experiments, and section VI concludes.

II. STATE OF ART

The important role of context is emphasized in [9]. Context information and ontological event models are used in conjunction by [16], [6]. Cao et al. present an approach for event recognition in image collections using image timestamp, location, and a compact ontology of events and scenes [4]; this work does not support subevent structure. Liu et al. report a framework that converts each event description from existing event directories (like Last.fm) into an event ontology that is a minimal core model for any general event [11]. This approach is not flexible enough to describe domain events (like trip) and their subevent structure. Paniagua et al. propose an approach that builds an event hierarchy using the contextual information of a photo, based on moving away from routine locations, and string analysis of English album titles (annotated by people) for public web albums in Picasaweb [12]. The limitations of this approach are: 1) human-induced tags are noisy, and 2) the subevent relationship is more than just spatiotemporal containment. For instance, although a car accident may occur within the spatiotemporal extent of a trip, it is not part of the subevent structure of the trip. According to [3], events form a hierarchical narrative structure that is connected by causal, temporal, spatial, and subevent relations. If these aspects are carefully modeled, they can be used to create a descriptive knowledge base for interpreting multimedia data. In [14], a mechanism is proposed that exploits context sources in conjunction with the subevent structure of an event; this structure is modeled in a domain event ontology. The limitation of this approach is that an event category is used in photo annotation no matter how relevant it is to a group of photos in a photo stream; as a result, the quality of annotation degrades.

III. EVENT ONTOLOGY EXTENSION

The incomplete information of a photo can be improved by combining it with the information related to a group of similar photos. In this work, two images are similar if they belong to the same event type. Partitioning a photo stream of an event based on the context of its digital photographs can create separate subevent boundaries for its photos [5]. An event is a spatiotemporal entity [7]. In addition, optical camera parameters (CP) in photos provide useful information related to the environment (like outdoor) in which an event occurs [15]. We used a clustering technique that partitions photos hierarchically based on their timestamp, location, and CP, with single-linkage clustering and Euclidean distance. However, one can use other approaches and refine the results. We represent the observations (i.e., photos/clusters) with a set of descriptors; a cluster consists of a group of contextually similar photos. In this section, we show that it is feasible to go from a set of


descriptors $D$ to the best subevent category when the following conditions are satisfied: (a) the descriptors in $D$ are consistent among themselves, (b) the descriptors in $D$ satisfy subevent categories, (c) the axioms of a subevent category are consistently formulated in an event ontology, and (d) the inferred subevent categories are sound and complete.

A. EVENT MODEL

We use a basic derivation of the E* model [8] as our core event model to specify the general relationships between events and entities. Specifically, we use the relationship subeventOf, which specifies the event structure and event containment. The expression $e_1$ subeventOf $e_2$ indicates that $e_1$ occurs within the spatiotemporal bounds of $e_2$, and that $e_1$ is part of the regular structure of $e_2$. Additionally, we use spatiotemporal relationships like occurs-during and occurs-at to specify the space and time properties of an event. The time and space model that we use in this work is mostly derived from the E* model. The relationship participant is used to describe the presence of a person in an event. We use the relationships co-occurring-with, co-located-with, spatially-near, temporal-overlap, before, and after to describe the spatiotemporal neighborhood of an event. The relationship same-as between two events makes them equivalent entities. We also use several other relationships to describe additional constraints on events (e.g., $e_1$ has-ambient-constraint $A$, and $A$ has-value indoor). Moreover, to express a certain group of temporal constraints, we use some Linear Temporal Logic, Metric Temporal Logic, and Real-Time Temporal Logic formulas [10], [2]. These formulas combine the classical operators $\wedge$ (conjunction), $\vee$ (disjunction), and $\rightarrow$ (implication) with Allen's calculus [1], the $\Box$ operator, the $\Diamond$ operator, linear constraints, and distance functions; they are used to model complex relative temporal properties. For instance, the constraint $\Box_{[t_1,t_2]}(e_1 \rightarrow \Diamond_{[t_2,t_2+1800]} e_2 \wedge \tilde{D}(e_2) \leq 1800)$ states that $e_2$ eventually happens within 1800 seconds after $e_1$ and that $e_2$ lasts less than or equal to 1800 seconds. We developed a language $L$, with a syntax and grammar, as an extension to OWL to embrace complex temporal formulas. Further, we extended the language to support a combination of classical propositional operators, linear spatial constraints, and spatial distance functions, which cannot be expressed in OWL; the expression $f_{eucDist}(e_1, e_2, @ \leq 100)$ shows a relative spatial constraint in $L$, which states that the event $e_1$ occurs at most 100 meters away from the place at which event $e_2$ occurs.

Domain Event Model: A domain event ontology provides a specialized taxonomy for a certain domain like trip; see fig 2. The Miscellaneous subevent category in this model is used to annotate the photos that are not matched with any other category. The general vocabulary of a core event model is reused in a domain event ontology. For instance, Parking in fig 2 is a subClassOf the Occurrent (or event) concept in the core event ontology. Relationships like subeventOf are also reused from the core event ontology. We assume that domain event ontologies are handcrafted by a group of domain experts.

B. DESCRIPTOR REPRESENTATION MODEL

We represent a descriptor using the schema {$type_d$ : $value_d$, $confidence_d$ : val}, in which $type_d$, $value_d$, and val indicate the type, value, and certainty (between 0 and 1) of the descriptor, respectively. For instance, the descriptor {Flash : 'off', confidence : 1.0} for a photo states with 100% certainty that the flash was off when the photo was captured. Photo and cluster descriptors follow the same representation model; however, the rules for computing the value of $confidence_d$ are different. We describe these rules in the following paragraphs. The descriptor model of a cluster includes two fields in addition to those of a photo: plausibility-weight $\geq 0$ and implausibility-weight $< 0$. We explain the usage of these fields later. All descriptors are either direct or derived. For photo descriptors, by convention, we assume that a direct descriptor is extracted directly from the EXIF metadata of a photo and that its confidence is 1, as in the above example. The direct descriptors that we use in this paper are related to the time, location, and optical parameters of photos, like GPSLatitude, GPSLongitude, Orientation, Timestamp, and ExposureTime. For a derived descriptor like {sceneType : 'indoor', confidence : 0.6}, the descriptor value 'indoor' is computed from direct descriptors like Flash, through a sequence of computations that extract information from a bucket of data sources. Some of these descriptors are PlaceCategory^3, Distance^4, and HoursOfOperation^5. The confidence score is obtained from the processing unit used to compute the descriptor value; we developed several information retrieval algorithms for this purpose, in addition to the existing tools in our lab [15]. If a descriptor value is directly extracted from an external data source, $confidence_d$ is equal to 1. Direct descriptors of a cluster must represent all photos contained in it; some of these descriptors represent the boundingbox, time-interval, and size of the cluster. The confidence value for direct descriptors is equal to 1; for instance, in the descriptor {size : 5, $confidence_d$ : 1.0}, which indicates the number of photos in a cluster, $confidence_d$ is equal to 1.

3 The category of the nearest local business to the coordinates of a photo.
4 The distance of a local business to the coordinates of a photo.
5 The hours during which a local business is open.

Given a photo $p_i$ in a photo stream $P$, and the cluster $c$ that groups $p_i$ with the most similar photos in $P$, a processing unit produces the descriptors of $c$ using the descriptors of the photos in $c$; more importantly, this process is guided by the descriptors of $p_i$. Every photo in $c$ must support every derived descriptor of $p_i$; such a cluster is referred to as a sound cluster for $p_i$, and the derived descriptors for $c$ are represented by the distinct union of the derived descriptors of the photos in $c$. For a derived cluster descriptor $d$, the value of $confidence_d$ is calculated using equation 1, in which $|c|$ is the size of the cluster, $p_j$ ranges over every photo in $c$ that is represented by $d$, and $g(p_j, d)$ gives the confidence value of $d$ in $p_j$. To find a sound cluster for a photo, the hierarchical structure produced by the clustering unit is traversed using depth-first search; if no sound cluster is found, the traversal halts when the current cluster is a leaf node.

    $confidence_d = \frac{1}{|c|} \times \sum_{p_j} g(p_j, d)$    (1)

Descriptor Consistency: Consistency among a set of descriptors is a mandatory condition for inferring the best possible conclusion from it. In this work, consistency must exist among the descriptors of a photo as well as among the descriptors of a cluster, and it is checked using the entailment rules described below.

(a) $v_i \rightarrow v_k$: if $v_i$ implies $v_k$, then the rules for $v_k$ must also be applied to $v_i$. This


is referred to as the transitive entailment rule. For instance, suppose a photo/cluster has the description 'outdoorSeating : true'; 'sceneType : outdoor'; 'weatherCondition : storm', which states that the nearest local business (e.g., a restaurant) to the photo/cluster offers outdoor seating, that the scene is outdoor, and that the weather was stormy when the photo(s) were captured. Given the sequence of rules below,

    $outdoorSeating \wedge outdoor \rightarrow fineWeather$,
    $fineWeather \rightarrow \neg storm$

rule 2 is entailed, which indicates an inconsistency among the descriptors of the photo/cluster.

    $outdoorSeating \wedge outdoor \rightarrow \neg storm$    (2)
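
The transitive entailment behind rule 2 can be illustrated with a small forward-chaining sketch. This is a minimal illustration and not the authors' implementation: the encoding of negation with a leading `~`, the helper names `closure` and `conflicts`, and the flattening of descriptors into literals such as `outdoor` and `storm` are all assumptions made for the example.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (premises, conclusion); "~x" is this sketch's notation for "not x".
Rule = Tuple[FrozenSet[str], str]


def negate(literal: str) -> str:
    """Return the complementary literal."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def closure(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply the rules repeatedly until no new literal can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def conflicts(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Literals that are asserted or derived together with their negation."""
    derived = closure(facts, rules)
    return {lit for lit in derived
            if not lit.startswith("~") and negate(lit) in derived}


# The two rules from the text; chaining them yields rule 2.
RULES: List[Rule] = [
    (frozenset({"outdoorSeating", "outdoor"}), "fineWeather"),
    (frozenset({"fineWeather"}), "~storm"),
]
# Hypothetical literal form of the example description.
FACTS = {"outdoorSeating", "outdoor", "storm"}

print(conflicts(FACTS, RULES))  # {'storm'}
```

The reported conflict on `storm` is the inconsistency that rule 2 expresses: the description asserts a storm while the chained rules derive its negation.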

(b) $v_i \rightarrow func_{remove}(v_k)$: $v_i$ implies removing the descriptor $v_k$. This is referred to as a deterministic entailment rule.

(c) $v_i \wedge v_k \rightarrow$ truth value: rules of this type are referred to as non-deterministic entailment rules, in which the inconsistency is expressed by a false truth value, e.g., $closeShot \wedge landscape \rightarrow false$. In that case, further decisions on keeping, modifying, or discarding either of the descriptors $v_i$ or $v_k$ are based on the confidence value assigned to each descriptor. This operation is referred to as update, and it is executed when an inconsistency occurs between two candidate descriptors. The following rules are used by this process: (a) for two descriptors with the same type, the descriptor with the lower confidence score is discarded; (b) for two descriptors with different types, the one with the lower confidence score is modified until the descriptors are consistent. The modification is defined as either negation or expansion within the search space. In the case of negation, e.g., $\neg outdoor \rightarrow indoor$, the confidence value for the indoor descriptor is calculated by subtracting the confidence value of the outdoor descriptor from 1. An example of expansion is increasing a window size to discover more local businesses near a location. To avoid falling into an infinite loop, we limit the number of negations, and the size of the search space during expansion, by a threshold. We assign null to a descriptor that has reached a threshold and is still inconsistent; null is universally consistent with any descriptor. The vocabulary used to model the descriptors of a photo/cluster is taken from the vocabulary specified in the core event model.

C. DATA SOURCES

We represent each data source with a declarative schema that uses the vocabulary of the core event model. This schema indicates the type of the source output. In addition, it specifies what type of input attributes a source needs to deliver the output. Data sources are queried using the SPARQL language^6. A query is constructed automatically using the schema of the data sources and the available information. Simply put, a source is selected if its input attributes match the available information $I$. At every iteration, $I$ is incrementally updated with the new data delivered by a source. The next source is selected if its input attributes are included in $I$. This process continues until no source with matching attributes is left in the bucket $B$.

6 http://www.w3.org/TR/rdf-sparql-query/

Fig. 2. An event ontology for the domain professional trip.

D. EVENT INFERENCE

We developed a context discovery algorithm that infers, from a set of consistent cluster descriptors (observations), the most plausible subevent category described in a domain event ontology. This algorithm uses the domain event model, which is a graph; we represent this graph with the notation $O(V, E)$, in which $V$ includes event classes and $E$ includes event relationships. Traversing the event graph $O$ starts at the root of the hierarchical subevent structure. The algorithm visits event candidates in $V$ through some of the relationships in $E$, like subeventOf, co-occurring-with, co-located-with, spatially-near, temporal-overlap, before, and after; these relationships help to reach other event candidates that are in the spatiotemporal neighborhood of an event. An expandable list, referred to as $L_v$, is constructed from $V$ to maintain the event/subevent nodes visited during an iteration $i$; if an event is added to $L_v$, it cannot be processed again during $i$. At the end of each iteration, $L_v$ is cleared. In every iteration, the best subevent category is inferred from a set of consistent observations through a ranking process.

To find the most plausible subevent category, we introduce the Measure of Plausibility ($mp_{ij}$) to rank event candidates. This measure is computed using two parameters: (1) the granularity score ($w_g$), and (2) the plausibility score ($w_{AX}$). $w_g$ is equal to the level of the event in the subevent hierarchy of the domain event ontology. To compute $w_{AX}$, we use the 'plausibility-weight' ($w^+$) and 'implausibility-weight' ($w^-$), which are two fields of a cluster descriptor. The value of $w^+$ is equal to the confidence value assigned to a descriptor, and the value of $w^-$ is equal to $-w^+$. If a descriptor cannot be mapped to any event constraint, $w_{AX}$ remains unchanged. If a descriptor with $w^+ = \alpha$ satisfies an event constraint, then $w^+$ is added to $w_{AX}$; otherwise, $w^-$ is added to $w_{AX}$ (i.e., $w_{AX} = w_{AX} - \alpha$). The only exception is for the cluster descriptors time-interval and boundingbox: if either one of these descriptors satisfies an explanation, then $w^+ = 1$; in the opposite case, $w^- \leq -100$. When a cluster has no overlap with the spatiotemporal extent of an event $s_i$, $w^- \leq -100$ makes $s_i$ the least plausible


candidate in the ranking. According to the formula in III-D,                        user information and the attributes of a seed-event (I) that
wAX also depends on the fraction of satisfied event constraints;                    is represented with the same schema that is described in the
N is the total number of constraints for an event candidate.                        event ontology. Given a sequence of input attributes, if a data
                                                                                    source returns an output-array of size K, then our algorithm
                               1 X j
                      wAX =       wAX , 1  j  N                             (3)   creates K new instances of events with the same type as in
                               N
                                                                                    the seed-event, and augments them with the information in the
  Finally, we use the following instructions to compare two                         output-array. The augmented seed-events are added to I for
event candidates e1 and e2 : when e1 is subsumed by e2 ,                            the next iteration; I is constantly updated until all the event
mpij for each event candidate is normalized using the formula                       categories in E 0 are augmented, and/or there is no more data
in equation 4, in which ei ⌘ e1 and ej ⌘ e2 , otherwise,                            source (in the bucket B) to query. To avoid falling into an
ei .mpij = ei .wAX . The candidate with the highest mpij is the                     infinite loop of querying data sources, we set the following
most plausible subevent category.                                                   condition: a data source cannot be queried more than once
                                                                                    for each seed-event. We defined some queries manually that
                      ei .wAX               ei .wg                                  are expressed through the relative spatiotemporal relationships
  ei .mpij =                          +                      (4)                    in the event ontology, and the augmented seed-events; these
               max(ei .wAX , ej .wAX ) max(ei .wg , ej .wg )                        queries are used to augment the seed-events with relative
When a subevent category is inferred from a set of observa-                         spatiotemporal properties. When a seed-event gets augmented
tions, it will not be considered again as a candidate for the next                  with information, our technique validates the event tag by
set of observations. Event inference halts if no more subevent                      using the event constraints, augmented event attributes, and
category is left to be inferred from the domain event ontology.                     a sequence of entailment rules that specify the cancel status
                                                                                    for an event. For instance, if the weather attribute for an
    EXTENSION: The inferred subevent categories E′ are                              event is heavy rain, and the weather constraint fine weather is
refined with the context data extracted from data sources in the                    defined for an event, then the status of the event tag becomes
bucket B, through the refinement process. We first elaborate on                     canceled. After the validation, event tags are added to the
this process by introducing the notion of a seed-event: an                          domain event ontology by extending event classes through
instance of an inferred category in E′ that is not yet                              the typeOf relationship. This step produces an augmented event
augmented with information. An augmented seed-event is an                           ontology that is the extended version of the prior model (see
expressive event tag. The seed-event is continuously refined                        fig 1).
with information from multiple sources.
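
The candidate comparison in equations 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and function names are hypothetical, w_g is taken as a given input, all weights are assumed non-negative, every candidate is assumed to have at least one constraint (N ≥ 1), and the handling of zero denominators and of ties is our own choice.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """An event candidate; field names are illustrative, not from the paper."""
    name: str
    constraint_scores: Sequence[float]  # w_AX^j for j = 1..N
    w_g: float                          # weight w_g, taken as a given input

    @property
    def w_ax(self) -> float:
        # Equation 3: mean of the N per-constraint scores.
        return sum(self.constraint_scores) / len(self.constraint_scores)


def _ratio(value: float, other: float) -> float:
    # value / max(value, other); returns 0 when both are 0 (assumption).
    top = max(value, other)
    return value / top if top > 0 else 0.0


def plausibility(ei: Candidate, ej: Candidate, subsumption: bool) -> float:
    """mp_ij for candidate ei when compared against candidate ej."""
    if not subsumption:
        return ei.w_ax  # no subsumption between the two: mp_ij = w_AX
    # Equation 4: each term is normalized by the pairwise maximum.
    return _ratio(ei.w_ax, ej.w_ax) + _ratio(ei.w_g, ej.w_g)


def most_plausible(e1: Candidate, e2: Candidate,
                   e1_subsumed_by_e2: bool) -> Candidate:
    mp1 = plausibility(e1, e2, e1_subsumed_by_e2)
    mp2 = plausibility(e2, e1, e1_subsumed_by_e2)
    return e1 if mp1 >= mp2 else e2  # ties go to e1 (assumption)
```

With made-up scores, a candidate whose w_AX and w_g both equal the pairwise maxima receives mp_ij = 2, the largest value equation 4 can produce.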
                                                                                                                   IV.      FILTERING
                                                                                         Filtering is a two-step process: (1) redundant and irrelevant
                                                                                    clusters are pruned from the hierarchical cluster structure
                                                                                    produced by the clustering component (see fig 4, step 1); (2)
                                                                                    redundant photos are filtered from the matched cluster (see
                                                                                    fig 4, step 2). This is accomplished by applying the context and
                                                                                    visual constraints of the expressive tag that is matched to
                                                                                    the cluster. We used a concept verification tool (see footnote 7)
                                                                                    to verify the visual constraints of events using image features.
                                                                                    This tool uses pyramids of color histograms and GIST features.
                                                                                    The filtering operation is guided throughout by the expressive
                                                                                    tags. During this operation, subevent relations are used for
                                                                                    navigating the augmented event model.

                                                                                                 V.     EXPERIMENTAL EVALUATIONS
Fig. 3. The Big Picture. Photos and their metadata are stored in photo-base
and metadata-base respectively. Using user info, including events’ type, time,          We focused on 3 domain scenarios vacation, professional
and space in a user’s calendar, a photo stream is queried, and its metadata is
passed to Clustering. In Validation, a set of consistent descriptors is obtained
                                                                                    trip, and wedding. We crawled Flickr, Picasaweb, and our lab
from the cluster that best represents an individual photo — the component           data sets. We observed that many people store their personal
event inference uses these descriptors in addition to a domain model that is        photos according to events; accordingly, we collected the
selected according to user info. Event Ontology Extension propagates the most       data sets based on time, space, and event types (like travel,
relevant subevent categories (to the input photo stream) with the information       conference, meeting, workshop, vacation, and wedding). We
discovered from Data Sources, then extends the event structure (ontology) with
the applicable propagated event instances (i.e., tags). The tags are validated      developed some crawlers to download about 700 albums of
(using data sources), and added to the event ontology – the extended event          the day’s featured photos; we crawled photo albums created
ontology is used in filtering that queries the visual concept verification tool. In since the year 2010 because most of the older collections did
this stage given an event, irrelevant cluster branches are pruned. Next, for each   not contain geo-tagged photos. After 4 months, we collected
matched cluster, less relevant photos to a subevent tag are filtered. The output
is a set of photos labeled with some tags; these tags are then stored as new
                                                                                    570 albums (about 60K photos) which had the required EXIF
metadata for the photos. The remaining photos are tagged as miscellaneous.          information containing location, timestamp, and optical camera
                                                                                    parameters. We ignored the albums a) smaller than 20 photos,
                                                                                    b) with non-English annotations. The average number of
   Our extension algorithm uses a strategy similar to the one                       photos per album was 105. We used the albums from the most
we used in subsection III-C. The difference is that the attributes
of a data source at each iteration are supplemented by the                            7 http://socrates.ics.uci.edu/Pictorria/public/demo
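
The extension and validation steps (seed-events, the bucket B, the once-per-seed-event query rule, and the cancel status) can be sketched as follows. This is a simplified illustration under stated assumptions: a data source is modeled as a hypothetical Python callable that takes an event and returns its output-array as a list of attribute dictionaries, events are plain dictionaries, and the entailment rules are reduced to a single equality check between an augmented attribute and its constraint. None of these names come from the paper.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
DataSource = Callable[[Event], List[Event]]  # returns the output-array


def extend(seed_events: List[Event], bucket: Dict[str, DataSource]) -> List[Event]:
    """Augment seed-events until no unqueried source remains for any of them."""
    work = [(dict(e), frozenset()) for e in seed_events]  # plays the role of I
    finished: List[Event] = []
    while work:
        event, used = work.pop()
        remaining = [name for name in bucket if name not in used]
        if not remaining:
            finished.append(event)  # every source was tried for this event
            continue
        name = remaining[0]
        rows = bucket[name](event)
        used = used | {name}  # never query this source again for this event
        if not rows:
            work.append((event, used))  # nothing returned: keep the event as is
        for row in rows:
            # K rows yield K new instances of the same event type.
            work.append(({**event, **row}, used))
    return finished


def validate(event: Event, constraints: Event) -> str:
    """Return 'canceled' if an augmented attribute contradicts a constraint."""
    for attribute, required in constraints.items():
        if attribute in event and event[attribute] != required:
            return "canceled"
    return "valid"
```

Because the set of used sources grows on every step, the loop terminates even when a source returns several rows. For example, an event augmented with weather "heavy rain" and validated against the constraint {"weather": "fine weather"} is reported as canceled.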


                                                             STIDS 2013 Proceedings Page 6
                                                                             Fig. 6.   Role of context in improving the correctness of event tags.


                                                                             set, while providing them with three domain event models.
                                                                             For the non-lab data set, the ground truth provides a manual
                                                                             and subjective event labeling done by the very owner of the
                                                                             data set being unaware of the experiments. Because of the
                                                                             subjective nature of the non-lab data set, the event types that
Fig. 4.   Filtering Operation.                                               were not contained in the event domain ontology are replaced
                                                                             with event type miscellaneous that is an event type in every
                                                                             domain event ontology in this work. For each experiment, we
                                                                             compute standard information retrieval measures (precision,
                                                                             recall, and F1-measure), for the event types used in tags.
                                                                             In addition, we introduce a measure of correctness
                                                                             for event tags. The score is computed from multiple
                                                                             context cues. For instance, the label meeting with Tom Johnson
                                                                             at RA Sushi Japanese Restaurant in Broadway, San Diego,
                                                                             during time interval "blah" on a sunny day, in an outdoor
                                                                             environment specifies the type of the event, its granularity in the
                                                                             subevent hierarchy, place, time, and environmental condition.
                                                                             We developed an algorithm that evaluates each cue with a
                                                                             number in the range of 0 to 1 as follows: 1) event type: wrong
                                                                             = 0, correct = 1, partially correct = L_p / L_TP, where L_p is
                                                                             the subevent-granularity level for a predicted tag and L_TP
                                                                             is the subevent-granularity level for the true-positive tag (the
                                                                             predicted tag is the direct or indirect superevent of the
                                                                             true-positive tag, i.e., L_p / L_TP ≤ 1); 2) place: includes place name,
Fig. 5. Data set geographical distribution. The black bars show the number
of albums in each geographic region, and the gray bars show the number of    category and geographical region. If the place name is correct,
data sources that supported the corresponding geographic region.             score 1 is assigned and the other attributes will not be checked.
                                                                             Otherwise, 0 is assigned; for the category and/or geographical
                                                                             region if correct, score 1 is assigned, and 0 otherwise. The
active users based on the amount of user annotations, ending                 average of these values represents the score for place; 3) for
up with a collection of 20 users with heterogeneous photo                    weather, optical, and visual constraint: wrong=0, correct =1,
albums in terms of time period and geographical sparseness.                  unsure = 0.5; 4) time interval: if the predicted event tag occurs
The geographic sparseness of albums ranged from being across                 anytime during the true-positive event tag, 1 is the score,
continents, to cities of the same country/state (see fig 5).                 otherwise 0. The average of the above scores represents the
We noticed that data sources do not equally support all the                  correctness measure for a predicted event tag. We introduce
geographic regions; e.g., only a small number of data sources                average correctness of annotation that is calculated using the
supported the data sets captured inside India. The photos                    formula in equation 5, where wj is the score for the j th
for vacation/professional-trip domains have higher temporal                  predicted tag, and L is the total number of expressive event
and geographical sparseness compared to photos related to                    tags detected by our approach.
wedding domain. The number of albums for vacation domain
exceeds the other two.
                                                                             correctness = \frac{\sum_{j=1}^{L} w_j}{L}; \quad context = 1 - Err     (5)
Experimental Set-Up

    We picked the 4 most active users (based on the amount of                    The metric context in equation 5 is used to measure the
user annotation) from our non-lab, downloaded data set, and 2                average context provided by data sources for annotating a
most active users from our lab data set (based on the number of              photo stream; parameter Err is the average error related to
collections they own). As ground-truth for the lab data set, we              the information provided by data sources used for annotating
asked the owners to annotate the photos using their personal                 a photo stream (0 ≤ Err ≤ 1); the following guidelines
experiences, and an event model that best describes the data                 are applied automatically, to measure this value: (a) if the


information in a data source is related to the domain of a photo
stream, but it is irrelevant to the context of the photo stream,
assign error-score 1. For instance, data source TripAdvisor
returns zero results related to Things-To-Do for the country
in which a photo stream is created. Also, if a photo stream
for a vacation trip does not include any picture taken at a
landmark location, TripAdvisor does not provide any coverage;
(b) assign error-score 0 if the type of a source is relevant as
well as its data (i.e. non-empty results); (c) if the data from a
relevant source is insufficient for a photo stream, assign error-
score 0.5. For instance, only a subset of business venues in
a region are listed in data source Yelp; as a result, the data
source returns information for less than 30% of the photo
stream; (d) for a data source, multiply the error-score by
a fraction in which the numerator is the number of photos
tagged using this data source, and the denominator is the size
of the photo stream. Do this for all the sources and obtain
the weighted average of the error-scores. The result is Err.
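
The scoring in equation 5 and guidelines (a) to (d) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code used in the experiments: the function names and the relation labels are hypothetical, granularity levels are assumed to be positive integers that grow with depth in the subevent hierarchy, and "weighted average" is read as a sum of error-scores weighted by the fraction of photos each source tagged, normalized by the sum of those fractions so that Err stays in [0, 1].

```python
from typing import List, Sequence, Tuple


def event_type_score(predicted_level: int, true_level: int, relation: str) -> float:
    """Cue 1. relation is 'correct', 'superevent', or 'wrong' (illustrative labels)."""
    if relation == "correct":
        return 1.0
    if relation == "superevent":
        return predicted_level / true_level  # L_p / L_TP, at most 1
    return 0.0


def tag_correctness(cue_scores: Sequence[float]) -> float:
    """Correctness of one predicted tag: mean of its cue scores, each in [0, 1]."""
    return sum(cue_scores) / len(cue_scores)


def average_correctness(tag_scores: Sequence[float]) -> float:
    """Equation 5, first part: mean of w_j over the L detected tags."""
    return sum(tag_scores) / len(tag_scores)


def err(sources: List[Tuple[float, int]], stream_size: int) -> float:
    """Guideline (d). sources holds (error_score, photos_tagged) per data source."""
    weights = [tagged / stream_size for _, tagged in sources]
    total = sum(weights)
    if total == 0:
        raise ValueError("no photo was tagged by any source; Err is undefined here")
    weighted = sum(score * w for (score, _), w in zip(sources, weights))
    return weighted / total  # normalization is our reading of 'weighted average'


def context(err_value: float) -> float:
    """Equation 5, second part: context = 1 - Err."""
    return 1.0 - err_value
```

As a usage example with made-up numbers: a tag predicted one level above the true tag (L_p = 1, L_TP = 2) with cue scores 0.5, 1, 0.5, and 1 has correctness 0.75; two sources with error-scores 0 and 0.5 that tagged 60 and 20 photos of a 100-photo stream give Err = 0.125 and context = 0.875.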
The implication of our result in fig 6 is as follows: while the
correctness of event tags (for a photo stream of an event) peaks    Fig. 7. CPU-Time for experimental data sets of the 5 most active users.
                                                                    Each data set is represented by its owner, domain type, source, and size. The
with the increase in context, relatively, smaller percentage        domain wed implies wedding domain.
of photos are tagged using non-miscellaneous events, and
larger percentage of photos are tagged using miscellaneous
event. This means if the suitable event type for a group of         CPU-Performance
photos does not exist in an event ontology, the photos are
not tagged with an irrelevant non-miscellaneous event; instead,         The running time for our proposed approach, and visual
they are tagged with miscellaneous event which means other.         concept verification is shown in fig 7, which illustrates the
The right side of the figure indicates that even though the         results for data sets of two sources i.e., lab, and non-lab
number of miscellaneous and non-miscellaneous event tags            (including Flickr, and Picasaweb), and three event domains.
does not change, the correctness is still increasing; this means
                                                                        Cross-Domain Comparison: In general, we found a smaller
that the tags get more expressive since more context cues
                                                                    number of context sources for wedding data sets compared
are attached to them. The quality of annotations is increased
                                                                    to the other two domains; as a result, the extension process
when more context information is available. This shows that
                                                                    exits relatively faster, and the running time for the concept
event ontology by itself is not as effective as augmented
                                                                    verification process increases. We observed the correctness of
event ontology. We demonstrate three classes of experiments
                                                                    event tags degrades when Event Ontology Extension process
in table I. This table shows the average values (between 0 and
                                                                    exits quickly. This observation confirms the findings of fig 6.
1) for the measure metrics discussed earlier (precision, recall,
F1, correctness). We use the work proposed in (Paniagua,                Cross-Source Comparison: Within each domain, we com-
2012) as a baseline. It is based on space and time to detect        pared the cpu-performance among lab and non-lab data sets;
event boundaries in conjunction with using English album            Event Ontology Extension exits relatively faster for non-lab
descriptions. This baseline approach, with F1-measure about         data sets. The justification for this observation is that we could
0.6 and correctness of almost 0.56, illustrates that time and       obtain user-related context like Facebook events/check-ins from
space are important parameters to detect event boundaries. On       our lab users (U3, U4), but such information was missing in the
the other hand, the baseline approach is limited to using only      case of non-lab data sets. This absence of information impacts
spatiotemporal containment for detecting subevent hierarchy;        wedding data sets the most, since the context information in the
it does not support other types of relationships among events       wedding scenario largely includes personal information such as
(like co-occurring events, relative temporal relationships) and     guest list, and wedding schedule that are not publicly available
other semantic knowledge about the structure of events. Also,       on photo sharing websites. In professionalTrip scenario, this
it requires human-induced tags which are noisy. For the second      impact is smaller than wedding, and larger than vacation; the
set of experiments, we use an event domain ontology without         missing data is due to the lack of context information related
augmenting it with context information. This approach gives         to personal meetings, and conference schedules. In vacation
worse results since the context information is disregarded          scenario, data sources are mostly public; only a small portion
during detecting event boundaries. It provides the F1-measure       of context information comes from the user-related context
of almost 0.32 and correctness of 0.13. Our last experiment         such as flight information, and Facebook check-ins; therefore,
leverages our proposed approach, and achieves F1-measure of         we did not find a significant change in the cpu-time between
about 0.85, and correctness of 0.82. Compared to our baseline       lab and non-lab data sets.
approach, correctness improves by about 0.26 (from 0.56 to
0.82), which is a very promising result.                                                   VI.    CONCLUSIONS
                                                                        Our proposed technique addresses a broad range of re-
                                                                    search challenges to achieve a powerful event-based system
                                                                    that can adapt to different scenarios and applications like


those in intelligence community, multimedia applications, and
emergency response. This is the starting step for combining
complex models with BIG DATA.

                  Users      U1     U2     U3     U4     U5
  baseline        prec       0.65   0.58   0.39   0.53   0.74
                  recall     0.89   0.4    0.61   0.64   0.8
                  f1         0.75   0.47   0.48   0.6    0.77
                  corr       0.63   0.62   0.52   0.62   0.28
  event ontology  prec       0.41   0.17   0.3    0.48   0.12
                  recall     0.4    0.2    0.5    0.43   0.24
                  f1         0.4    0.18   0.37   0.45   0.16
                  corr       0.2    0.08   0.12   0.2    0.03
  proposed        prec       0.74   0.83   0.95   0.92   0.88
                  recall     0.91   0.93   0.88   0.7    0.97
                  f1         0.81   0.88   0.91   0.79   0.92
                  corr       0.8    0.75   0.85   0.79   0.9

 TABLE I.      RESULTS FOR AUTOMATIC PHOTO ANNOTATION FOR THE
            DATA SETS OWNED BY THE 5 MOST ACTIVE USERS.


                              REFERENCES
 [1]   J. F. Allen and G. Ferguson. Actions and events in interval temporal
       logic. Journal of Logic and Computation, 1994.
 [2]   R. Alur and T. A. Henzinger. Logics and models of real time: A
       survey. In J. W. de Bakker, Cornelis Huizing, Willem P. de Roever,
       and Grzegorz Rozenberg, editors, REX Workshop. Springer, 1991.
 [3]   N. Brown. On the prevalence of event clusters in autobiographical
       memory. Social Cognition, 2005.
 [4]   L. Cao, J. Luo, H. Kautz, and T. Huang. Annotating collections of
       photos using hierarchical event and scene models. In IEEE Conference
       on Computer Vision and Pattern Recognition (CVPR). IEEE, 2008.
 [5]   M. Cooper, J. Foote, A. Girgensohn, and L. Wilcox. Temporal
       event clustering for digital photo collections. ACM Transactions on
       Multimedia Computing, Communications, and Applications, 2005.
 [6]   A. Fialho, R. Troncy, L. Hardman, C. Saathoff, and A. Scherp.
       What’s on this evening? Designing user support for event-based
       annotation and exploration of media. In 1st International Workshop on
       EVENTS-Recognising and tracking events on the Web and in real life,
       2010.
 [7]   B. Gong, U. Westermann, S. Agaram, and R. Jain. Event discovery
       in multimedia reconnaissance data using spatio-temporal clustering. In
       Proc. of the AAAI Workshop on Event Extraction and Synthesis, 2006.
 [8]   A. Gupta and R. Jain. Managing event information: Modeling, retrieval,
       and applications. Synthesis Lectures on Data Management, 2011.
 [9]   R. Jain and P. Sinha. Content without context is meaningless. In
       Proceedings of the international conference on Multimedia. ACM, 2010.
[10]   R. Koymans. Specifying real-time properties with metric temporal logic.
       Real-Time Systems, 2(4), 1990.
[11]   X. Liu, R. Troncy, and B. Huet. Finding media illustrating events. In
       Proceedings of the 1st ACM International Conference on Multimedia
       Retrieval. ACM, 2011.
[12]   J. Paniagua, I. Tankoyeu, J. Stöttinger, and F. Giunchiglia. Indexing
       media by personal events. In Proceedings of the 2nd ACM International
       Conference on Multimedia Retrieval. ACM, 2012.
[13]   S. Rafatirad, A. Gupta, and R. Jain. Event composition operators: Eco.
       In Proceedings of the 1st ACM international workshop on Events in
       multimedia. ACM, 2009.
[14]   S. Rafatirad and R. Jain. Contextual augmentation of ontology for
       recognizing sub-events. In Fifth IEEE International Conference on
       Semantic Computing (ICSC). IEEE, 2011.
[15]   P. Sinha and R. Jain. Classification and annotation of digital photos
       using optical context data. In CIVR, 2008.
[16]   W. Viana, J. Bringel Filho, J. Gensel, M. Villanova-Oliver, and H. Martin.
       Photomap: from location and time to context-aware photo annotations.
       Journal of Location Based Services, 2008.

