<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enabling Advanced Business Intelligence in Divino</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Danilo Croce</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Garzoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Montesi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego De Cao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Basili</string-name>
          <email>basilig@info.uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Enterprise Engineering University of Roma</institution>
          ,
          <addr-line>Tor Vergata 00133 Roma</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the system targeted in the Divino project, funded under the Industria 2015 framework of the Italian Ministry of Industry. The resulting platform embodies an innovative portal technology where Social Web functionalities, User Profiling and Aspect-based Opinion Mining are integrated through Liferay, a well-known Enterprise Portal Technology. The proposed approach allows analysts to bootstrap an opinion-mining system by interacting with data-driven functions based on effective Online Machine Learning paradigms. The evaluation of the proposed methods is carried out in the targeted domain, i.e. the marketing of national wine products, one of the major focus areas of the Made in Italy track of Industria 2015.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In Business Intelligence, analysts now have access to a variety of public
forums where opinions and sentiments about companies, products and strategies
are expressed in unstructured form. Opinion Mining (OM) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] tackles different
problems that arise in this scenario, such as determining whether a segment of text
(sentence, paragraph or section) is opinionated, identifying the opinion holder
(the person or organization who expresses the opinion) or determining the
polarity (i.e. how positive or negative each opinion is). For business intelligence,
it is also useful to classify each opinion according to the aspect of the analyzed
product, such as the flavor or taste of a wine.
      </p>
      <p>This paper discusses the system targeted in the Divino project, funded under
the Industria 2015 framework of the Italian Ministry of Industry. The resulting
platform embodies an innovative portal technology where Social Web
functionalities, User Profiling and Aspect-based Opinion Mining (OM) are integrated.
On the one hand, users can visit a portal that hosts a community interested in
the eno-gastronomic domain of wine. When logged in, the so-called Divino User
has a deeper interaction with the portal: leaving messages in the forum,
designing a personalized blog or buying items in a specialized e-shop. Every registered
user becomes part of a Social Network, establishing friendship-based links with
other users. On the other hand, an Opinion Mining workflow has been
implemented to capture people's opinions and preferences expressed within the portal.
These are enriched by crawling and processing specialized sites and blogs from
the Web. Opinions are stored in a semi-structured form and meaningfully
summarized to be consumed by Market Analysts. Based on the Enterprise Portal
Technology known as Liferay, the system results in a Web Portal where different
users can interact while providing valuable information for Business
Intelligence processes.</p>
      <p>
        The proposed OM workflow is quite general and can be used to bootstrap
and adapt an OM system to a target domain. This is achieved by applying
Online Learning Algorithms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], training classifiers that recognize topics, aspects
and opinions in texts, comments and blogs. The online learning paradigm is
appealing as it allows an interaction between the system and a Market Analyst,
who can incrementally refine the domain model by validating the classifiers' predictions. The
applicability of the proposed approach is then evaluated in the targeted domain
of the national and international marketing of wine products, one of the major
focus areas of the Made in Italy track of Industria 2015. In the rest of the paper,
Section 2 discusses the OM process in Divino, Section 3 describes
the resulting portal, Section 4 provides the experimental evaluation and Section
5 draws the conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Modeling Opinion Mining in Divino</title>
      <p>If we are interested in detecting opinions about wines, all textual units containing
information related to the target products must be carefully retrieved. Let us
consider the following excerpt related to the wine domain:</p>
      <p>La gamma aziendale prevede un vino rosso basato su uve ciliegiolo in
purezza, il Ciliegiolo Golfo del Tigullio doc, vinificato in acciaio, che
dona al vino netti ma delicati sentori di ciliegia, violetta e una sottile
vena speziata (pepe) senza mancare di una buona acidità e tannicità.<sup>1</sup></p>
      <p>It contains information about a wine, the "Ciliegiolo Golfo del Tigullio doc", i.e.
the entity to which the author refers. As we are interested in opinions related
to specific aspects of wine, such as flavor and taste, textual units containing
objective expressions can be neglected. Expressions like "sentori netti ma delicati" and
"buona acidità e tannicità" here give a positive connotation to the Aroma and
Taste aspects, respectively. Moreover, even if not made explicit, the underlying
domain must be properly addressed, as this allows texts related to other
products, e.g. cars or mobile phones, to be rejected.</p>
      <p><sup>1</sup> Translation: The product range contains a red wine derived from Ciliegiolo grapes,
that is the Ciliegiolo Golfo del Tigullio doc, vinified in stainless steel, which gives
strong but delicate hints of cherry, violet and a slightly spicy note (pepper) without
lacking good acidity and tannin levels.</p>
      <p>
        Many approaches have been defined to determine and recognize opinions in
texts, as discussed in [
        <xref ref-type="bibr" rid="ref11 ref14 ref8">8, 11, 14</xref>
        ], covering different text genres, from newswire [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to social media, such as Twitter [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These studies led to the development
of several corpora with detailed opinion and sentiment annotations, e.g., the
MPQA corpus [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] of newswire text. These corpora have proved very valuable as
resources for learning about the language of sentiment in general. As discussed
in the following section, in Divino we applied empirical methods in order to
automatically train classifiers able to associate sentences with specific classes useful
to characterize the writer's opinion. More formally, our ultimate aim is
to extract structured information in the form of the n-tuple
<inline-formula><tex-math>\langle u, t, h, r, a, b \rangle</tex-math></inline-formula>, where:
      </p>
      <list list-type="bullet">
        <list-item><p>u is the Textual Unit, e.g. a sentence or paragraph expressing an opinion;</p></list-item>
        <list-item><p>t is the Topic related to u, e.g. the WineryProduct, that represents the opinion domain;</p></list-item>
        <list-item><p>h is the Opinion Holder, the person or organization expressing the opinion (here the blog author);</p></list-item>
        <list-item><p>r is the Opinion Target, that is the entity subjectively valued (e.g. Ciliegiolo Golfo del Tigullio doc);</p></list-item>
        <list-item><p>a is the Aspect for r in the domain t (e.g. flavor or taste);</p></list-item>
        <list-item><p>b is the Polarity, associated with a target r and its specific aspect a, e.g. Positive, Negative or Neutral.</p></list-item>
      </list>
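      <p>As a minimal sketch, the n-tuple described above can be pictured as a small record type. The class name OpinionTuple, its field names and the example values are illustrative assumptions and are not taken from the Divino implementation; the example reuses the excerpt discussed earlier in this section.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionTuple:
    """One structured opinion (u, t, h, r, a, b); field names are illustrative."""
    unit: str      # u: the textual unit (sentence or paragraph)
    topic: str     # t: the topic, i.e. the opinion domain
    holder: str    # h: the opinion holder, e.g. the blog author
    target: str    # r: the opinion target, e.g. a specific wine
    aspect: str    # a: the aspect of the target within the topic
    polarity: str  # b: "Positive", "Negative" or "Neutral"


# Hypothetical instance built from the excerpt on the Ciliegiolo wine.
example = OpinionTuple(
    unit="netti ma delicati sentori di ciliegia, violetta",
    topic="WineryProduct",
    holder="blog author",
    target="Ciliegiolo Golfo del Tigullio doc",
    aspect="Aroma",
    polarity="Positive",
)
print(example.target, example.aspect, example.polarity)
```
      </preformat>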
      <p>In the next section, data-driven learning algorithms to associate each u to the
proper n-tuple will be discussed.</p>
      <sec id="sec-2-1">
        <title>The Opinion Mining Workflow</title>
        <p>Behind the Divino portal, an OM workflow has been developed to structure
opinions, as discussed above. We defined a specific ontology providing a
meta-model from which domain-specific OM workflows are derived (not shown here
for space reasons). In the Divino project, the workflow shown in Figure 1 has
been implemented.</p>
        <p>
          In the Data Gathering phase, a dedicated Web Crawler downloads
documents from specialized wine sites, blogs and forums. Chaos [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the Natural
Language Processing (NLP) processor made available at the University of Tor
Vergata, analyzes these documents to extract the morpho-syntactic and semantic
information required by the workflow.
        </p>
        <p>In the Information Extraction phase, the Target Extractor
identifies sentences mentioning one or more target products. In the domain addressed
by Divino, examples of targets are wines, such as Barolo or Taurasi, or
varietals, such as Syrah or Merlot. This module is based on the Named Entity
Recognizer and Classifier (NERC) made available by Chaos. The Target Propagator
finds sentences referring to targets, even if they are not explicitly mentioned.</p>
        <p>The core Sentiment Analysis functionalities determine opinions and are
realized as a sequence of classification steps. Among the existing Machine
Learning paradigms, we investigated the class of Online Learning Algorithms. The
goal, as in the traditional setting, is to predict classes for instances. In addition,
soon after a prediction is made, its outcome can be used to refine the prediction
hypothesis used by the algorithm. In a traditional setting, the training phase would
have started ex-novo, re-considering all training examples. Such online schemas
allow implementing mechanisms for relevance feedback: they incrementally refine
the domain classifiers and adapt the resulting analysis to the target domain.</p>
        <p>[Figure 1: the OM workflow. Data Gathering: Web Crawler, NLP Processor. Information Extraction: Target Extractor, Target Propagator. Sentiment Analysis: Topic Annotator, Aspect Annotator, Polarity Annotator.]</p>
        <p>
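          In particular, the Passive Aggressive (PA) learning algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is one of
the most popular online approaches and is generally regarded as a state-of-the-art
online method. Its core idea is quite simple: when an example is misclassified,
the algorithm updates the model with the hypothesis that is most similar to
the current one. Formally, let <inline-formula><tex-math>(x_t, y_t)</tex-math></inline-formula> be the t-th example, where
<inline-formula><tex-math>x_t \in \mathbb{R}^d</tex-math></inline-formula> is a
feature vector that represents a document or sentence in a d-dimensional space,
while <inline-formula><tex-math>y_t \in \{+1, -1\}</tex-math></inline-formula> is the corresponding label, e.g. a sentence does/does not
belong to a topic or polarity class. Let <inline-formula><tex-math>w_t \in \mathbb{R}^d</tex-math></inline-formula> be the current classification
hypothesis. The PA classification function is <inline-formula><tex-math>f(x) = w^T x</tex-math></inline-formula>. After receiving
<inline-formula><tex-math>x_t</tex-math></inline-formula>, the
new classification function <inline-formula><tex-math>w_{t+1}</tex-math></inline-formula> becomes the one that minimizes the objective
function
<disp-formula><tex-math>Q(w) = \frac{1}{2} \| w - w_t \|^2 + C \cdot l(w; (x_t, y_t)).</tex-math></disp-formula>
The first term <inline-formula><tex-math>\| w - w_t \|</tex-math></inline-formula> is
a measure of how much the new hypothesis differs from the old one, while the
second term <inline-formula><tex-math>l(w; (x_t, y_t))</tex-math></inline-formula> is a proper loss function assigning a penalty cost to
an incorrect classification. C is the aggressiveness parameter that balances the
two competing terms<sup>2</sup>. Minimizing Q(w) corresponds to solving a constrained
optimization problem, whose solution leads to updating the classifier according to
the following schema:
<disp-formula><tex-math>w_{t+1} = w_t + \alpha_t x_t, \qquad \alpha_t = y_t \cdot \min \left\{ C, \frac{H(w_t; (x_t, y_t))}{\| x_t \|^2} \right\}.</tex-math></disp-formula>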
        </p>
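        <p>
          If <inline-formula><tex-math>x_t</tex-math></inline-formula> is correctly classified with a sufficient margin (i.e. the loss is zero), the model does not change, while, after a wrong
prediction, the new classification function <inline-formula><tex-math>w_{t+1}</tex-math></inline-formula> becomes a linear combination
of the old one <inline-formula><tex-math>w_t</tex-math></inline-formula> and the feature vector <inline-formula><tex-math>x_t</tex-math></inline-formula>. A kernelized version of the
PA algorithm is easy to obtain and makes it possible to exploit rich data
representations, as discussed in [
          <xref ref-type="bibr" rid="ref15 ref3">3, 15</xref>
          ].
        </p>
        <p><sup>2</sup> In this work we consider the hinge loss
<inline-formula><tex-math>H(w; (x_t, y_t)) = \max(0, 1 - y_t w^T x_t)</tex-math></inline-formula>.</p>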
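        <p>The update rule can be illustrated with a short, hedged sketch in plain Python. It assumes the linear (non-kernelized) setting with the hinge loss; the name alpha for the update coefficient, the helper names and the toy data stream are assumptions made for illustration and do not come from the Divino code base.</p>
        <preformat>
```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, x, y):
    """Hinge loss H(w; (x, y)) = max(0, 1 - y * w.x)."""
    return max(0.0, 1.0 - y * dot(w, x))


def pa_update(w, x, y, C=1.0):
    """One PA step: w + alpha * x, with alpha = y * min(C, H / ||x||^2)."""
    loss = hinge_loss(w, x, y)
    if loss == 0.0:
        return list(w)  # passive: zero loss, the hypothesis is kept
    alpha = y * min(C, loss / dot(x, x))
    return [wi + alpha * xi for wi, xi in zip(w, x)]  # aggressive step


# Toy stream of (feature vector, label) pairs; the data are invented.
stream = [
    ([1.0, 0.0], +1),
    ([0.0, 1.0], -1),
    ([1.0, 1.0], +1),
]
w = [0.0, 0.0]
for x_t, y_t in stream:
    w = pa_update(w, x_t, y_t, C=1.0)
print(w)
```
        </preformat>
        <p>Because each step touches only the current example, a validated annotation can be fed to pa_update as soon as it is available, without re-training on the whole data set.</p>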
        <p>In the resulting workflow, given a new document, the Topic Annotator
retrieves the paragraphs related to all topics t that are compatible with the domain,
e.g. WineryProducts or Varietals. Each paragraph is associated by a PA
classifier with each target topic t. In order to model an open-world scenario, where not
all topics are already known, the OtherTopic class is introduced: each paragraph
classified as OtherTopic is not considered in the remaining processing chain by
the other annotators. The Aspect Annotator classifies all sentences from the
remaining paragraphs with respect to the active aspects a of a given topic t. The
open-world assumption also holds at this level, so the OtherAspect class is
introduced. Finally, for each sentence associated with a valid aspect, the
corresponding polarity is provided by another PA-based classifier with respect to the
Positive, Negative or NoPolarity classes<sup>3</sup>. More details about the
modeling of single textual units u are provided in Section 4. At the time of writing,
the Opinion Holder h is assumed to be the content creator, e.g. the author of a
blog page or of a comment in a forum.</p>
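        <p>The annotator chain described above can be sketched as a simple cascade. This is only an illustration: the three classifiers are passed in as hypothetical callables, the keyword-based toy classifiers stand in for the trained PA models, and the naive sentence splitting is an assumption made to keep the example short.</p>
        <preformat>
```python
def annotate(paragraphs, topic_clf, aspect_clf, polarity_clf):
    """Cascade Topic, Aspect and Polarity annotators over raw paragraphs."""
    results = []
    for paragraph in paragraphs:
        topic = topic_clf(paragraph)
        if topic == "OtherTopic":
            continue  # open-world filter: paragraph is outside the domain
        for sentence in paragraph.split(". "):
            aspect = aspect_clf(sentence, topic)
            if aspect == "OtherAspect":
                continue  # sentence does not address an active aspect
            results.append((sentence, topic, aspect, polarity_clf(sentence)))
    return results


# Toy stand-ins for the trained classifiers (keyword rules, invented).
def toy_topic(paragraph):
    return "WineryProducts" if "wine" in paragraph.lower() else "OtherTopic"


def toy_aspect(sentence, topic):
    text = sentence.lower()
    if "taste" in text:
        return "Taste"
    if "price" in text:
        return "Price"
    return "OtherAspect"


def toy_polarity(sentence):
    return "Positive" if "good" in sentence.lower() else "NoPolarity"


docs = [
    "This wine has a good taste. It was bottled in 2010",
    "The new phone has a good price",
]
opinions = annotate(docs, toy_topic, toy_aspect, toy_polarity)
print(opinions)
```
        </preformat>
        <p>The second document is dropped at the topic level even though it mentions a price, which is the effect the OtherTopic class is meant to achieve.</p>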
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The Divino portal</title>
      <p>The Divino portal is designed as a set of interacting services whose overall logic
is integrated within the Liferay portal. Liferay<sup>4</sup> is a free and open source
enterprise portal written in Java and distributed under the GNU Lesser General
Public License and proprietary licenses. It makes it possible to efficiently create a portal
for Internet or Intranet use and is built from functional
units called portlets, which represent portal functionalities and produce fragments
of markup code that are aggregated into a portal.</p>
      <p><sup>3</sup> When a sentence is classified as Positive and Negative at the same time, it is
considered as Neutral.</p>
      <p><sup>4</sup> http://www.liferay.com/</p>
      <p>Liferay enables the creation of different users and different roles, so that every
role groups the users sharing the same permissions. Permissions are linked
to Portal, Portlet and other Liferay entities. In addition to the role of
Administrator, the Divino Portal handles four roles, i.e. Guest, Divino User, Annotator
and Market Analyst, each with access to the functionalities described below.</p>
      <sec id="sec-3-1">
        <title>Enjoying the Divino Portal as a Registered User</title>
        <p>
          A user can visit the
Divino Portal without being registered. As a Guest, the user can view a limited set
of pages providing non-personalized information, as well as the Divino Forum and
the e-commerce services, i.e. the Divino Shop, as shown in Fig. 2. A log-in step is
required in order to post messages or buy items. Moreover, a Divino Search
portlet retrieves the web pages downloaded during the Data Gathering
phase, described in Section 2. Once logged in to the Divino Portal, the user
assumes the role of Divino User and can participate in the social activities
made available in the portal through the forum and e-commerce portlets. As shown
in the background of Figure 3, each user is associated with a personal MyDivino
page where a blog can be easily populated with comments. In line with popular
Social Networks, a friendship schema is applied so that only a restricted number of
friends can read the personal blog. Each user can search for other users and ask for their
friendship. Every Divino User owns a profile that keeps all the information about
search queries, preferences and purchased items. Such interactions with the
system, as well as other information provided through a questionnaire proposed
in the registration phase, are crucial for many portal functionalities. They enable
the design of different User Recommending and Information Filtering schemas, as
discussed in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. At the time of writing, a first recommending schema is used
to suggest friends. The information gathered during registration provides a set of
preferences <inline-formula><tex-math>P_{d_i}</tex-math></inline-formula> describing each Divino User
<inline-formula><tex-math>d_i</tex-math></inline-formula>. For example, a user may prefer red
wines to white wines, or wines from specific regions. A first recommending
function has been implemented by estimating the similarity between user pairs
<inline-formula><tex-math>d_i</tex-math></inline-formula> and <inline-formula><tex-math>d_j</tex-math></inline-formula> in terms of the Jaccard Similarity score between the sets of related
preferences:
<disp-formula><tex-math>J(d_i, d_j) = \frac{| P_{d_i} \cap P_{d_j} |}{| P_{d_i} \cup P_{d_j} |}.</tex-math></disp-formula>
The score is 1 for user pairs with exactly the
same interests, while it drops to 0 for "different" users, i.e. users with no preference in common. Figure 3 shows the User
Suggestion, i.e. two users nominated to be friends.
        </p>
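        <p>A minimal sketch of such a friend suggestion function follows. The user names, the preference sets, the top_n parameter and the rule of discarding pairs with a score of 0 are assumptions introduced for illustration; only the Jaccard score itself is taken from the text.</p>
        <preformat>
```python
def jaccard(prefs_a, prefs_b):
    """Jaccard similarity between two preference sets."""
    a, b = set(prefs_a), set(prefs_b)
    union = a.union(b)
    if not union:
        return 0.0  # two empty profiles share nothing measurable
    return len(a.intersection(b)) / len(union)


def suggest_friends(user, preferences, top_n=2):
    """Rank the other users by Jaccard score; drop those with score 0."""
    scored = [
        (jaccard(preferences[user], prefs), other)
        for other, prefs in preferences.items()
        if other != user
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:top_n] if score > 0]


# Invented profiles, as they might result from the registration form.
preferences = {
    "anna": {"red", "Tuscany", "Chianti"},
    "bruno": {"red", "Tuscany", "Chianti"},
    "carla": {"white", "Veneto"},
    "dario": {"red", "Piedmont"},
}
print(suggest_friends("anna", preferences))
```
        </preformat>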
        <p>Providing labeled material as Divino Annotator. The machine learning
methods proposed in Section 2.1 require labeled data in order to acquire a proper
model of the target phenomena. The role of Divino Annotator allows users to access
the annotation functionalities. When logged in, users can retrieve, add, remove
and modify documents downloaded during the Data Gathering phase. Given a
document, the user annotates all paragraphs with the corresponding information,
such as Topic, Aspects and Polarity. In Figure 4 the interface shows a short part
of a document related to a specific wine, the Chianti Classico: in particular, two
sentences expressing positive comments about the taste aspect are shown. The
contribution of the Online Learning schema is most evident in the annotation
phase. The annotator can ask the system to automatically annotate
the examples and then validate the proposed information. When these annotations are validated
and submitted, the model is corrected and improved through the new
annotations, thus conforming to the Annotator's notion of the target domain. In a
real scenario, the system is expected to produce wrong annotations during its
first life-cycle and to improve the annotation quality after a reasonable number
of interactions with the annotators.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Advanced Business Intelligence in Divino</title>
        <p>
          The automatic detection of
users' preferences and opinions from the portal, and their translation
into a semi-structured form, represents a valuable source of information for Market
Analysts to feed Business Intelligence processes. Some of this information is
automatically captured from user interactions, while the rest comes from external
sources, retrieved in the Data Gathering phase. For example, the Market Analyst
can browse statistics about purchased items or the query logs from the Search
Portlet. Advanced Business Intelligence techniques can also be applied in order
to capitalize on the knowledge extracted within the Opinion Mining process, as
discussed in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. As an example, the Opinion Browsing portlet is shown in
Figure 5: a multi-level pie chart, the so-called Donut, provides a synthetic view
of the opinions expressed by people within the forum or the targeted web pages.
It reports the percentage of textual units expressing opinions about
different aspects within a specified domain, such as WineryProduct. A
fine-grained analysis can be enabled by focusing on a specific target, e.g. a Brunello
di Montalcino. For example, in Figure 5 the percentage of textual units giving
positive comments about the taste of the product is 29.69%, while the percentage
of negative comments about the price is 1.64%. The analyst can take a closer
look at these statistics by clicking on each percentage, thus visualizing the list of
textual units and, if needed, browsing the source document. It is also possible
to access other reports and charts, enabling complex activities such as the
monitoring of temporal trends, by visualizing the opinions according to specific
time-based selections.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <p>
        In this section, the Opinion Mining process is evaluated, as it represents the
core functionality enabling Advanced Business Intelligence processes within the
entire Divino Portal. In particular, the quality of the classifiers powering the different
annotators described in Section 2.1 is considered. The classification task is
tackled through a Multiple Kernel approach, as discussed in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Kernel methods
are beneficial because a combination of kernel functions is still a kernel and can therefore be integrated
into state-of-the-art classifiers, such as Support Vector Machines [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or the Passive
Aggressive algorithm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>Textual Unit representation</title>
        <p>A multiple kernel approach combines the contribution of complex kernel
functions to implicitly integrate different linguistic and semantic information
about the annotated examples. In this work, two kernels have been employed in our
modeling. The Bag-of-Words Kernel (BOWK) reflects the lexical overlap between
textual units u, each represented as a vector whose dimensions correspond to different
words. Each dimension is a boolean indicator of the presence or absence of
a word in the text. The kernel function is the cosine similarity between vectors.</p>
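        <p>As a rough illustration, with boolean vectors the cosine similarity reduces to the number of shared words divided by the geometric mean of the two vocabulary sizes. The sketch below assumes whitespace tokenization and lowercasing, which are simplifications and not the preprocessing performed by the NLP processor.</p>
        <preformat>
```python
import math


def bow_kernel(text_a, text_b):
    """Cosine similarity between boolean bag-of-words vectors."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a or not words_b:
        return 0.0  # an empty text has no direction in the space
    shared = len(words_a.intersection(words_b))
    return shared / math.sqrt(len(words_a) * len(words_b))


print(bow_kernel("good acidity and tannins", "good acidity"))
print(bow_kernel("good acidity", "fair price"))
```
        </preformat>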
        <p>
          A second kernel is added because the lexical information of BOWK is strongly affected
by data sparseness: words found in test cases are often rare or
unseen in the training set. Our aim is to make the resulting system more robust
by extending lexical information through Distributional Analysis. The core idea
is that the meaning of a word can be described by the set of textual contexts in
which it appears (Distributional Hypothesis, as described in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]). Words can be
geometrically represented as vectors whose components reflect the corresponding
contexts: two words close in the space (i.e. they have similar contexts) are likely
to be related by some type of generic semantic relation, either paradigmatic (e.g.
synonymy, hyperonymy, antonymy) or syntagmatic (e.g. meronymy, conceptual
and phrasal association), as observed in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. A word-by-context matrix M is
obtained through a large-scale corpus analysis. Then the Latent Semantic
Analysis [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] technique is applied to capture the statistical information of M in a
lower k-dimensional space. Given two words <inline-formula><tex-math>w_1</tex-math></inline-formula> and
<inline-formula><tex-math>w_2</tex-math></inline-formula>, their similarity
is estimated as the cosine similarity between the corresponding projections
<inline-formula><tex-math>\vec{w}_1, \vec{w}_2</tex-math></inline-formula> in the space, i.e.
<disp-formula><tex-math>\sigma(w_1, w_2) = \frac{\vec{w}_1 \cdot \vec{w}_2}{\| \vec{w}_1 \| \, \| \vec{w}_2 \|}.</tex-math></disp-formula>
As a result, every word
can be projected into the reduced Word Space. The representation of a textual unit u consists of
a linear combination of the vectors representing its words. Finally, the resulting Lexical
Semantic Kernel (LSK) function is the cosine similarity between vector pairs, in
line with [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], which makes it possible to generalize the lexical information. The Word Space is
acquired through the distributional analysis of a corpus of about 2.5
million tokens; it is composed of web pages downloaded during the Data Gathering
phase and pages from Wikipedia related to the Wine category, in order to obtain
a space tied to the target domain. All words occurring more than 30 times are
represented through vectors. The original space dimensions are generated from
the set of the 20,000 most frequent words in the corpus. Each dimension holds
the Pointwise Mutual Information score between a target word and one feature, as the latter occurs in a
left or right window of 5 tokens around the target. Left contexts of targets are
treated differently from the right ones, in order to capture asymmetric syntactic
behaviors (e.g., useful for verbs): 40,000-dimensional vectors are thus derived for
each target, later reduced to k = 250.
        </p>
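        <p>The following sketch illustrates only the last step, i.e. how a kernel of this kind can compare two textual units once a Word Space is available. The three-dimensional word vectors are invented for the example (the real space is obtained through LSA with k = 250), and the unweighted sum of word vectors is one assumed instance of the linear combination mentioned in the text.</p>
        <preformat>
```python
import math

# Hypothetical Word Space: invented 3-dimensional vectors.
WORD_SPACE = {
    "cherry": [0.9, 0.1, 0.0],
    "violet": [0.8, 0.2, 0.1],
    "tannin": [0.1, 0.9, 0.0],
    "price": [0.0, 0.1, 0.9],
}


def unit_vector(text):
    """Sum the vectors of the words of a textual unit found in the space."""
    total = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        for i, value in enumerate(WORD_SPACE.get(word, [])):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity; 0.0 when one of the vectors is null."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm


def ls_kernel(text_a, text_b):
    """Similarity between two textual units in the Word Space."""
    return cosine(unit_vector(text_a), unit_vector(text_b))


related = ls_kernel("hints of cherry", "notes of violet")
unrelated = ls_kernel("hints of cherry", "the price")
print(round(related, 3), round(unrelated, 3))
```
        </preformat>
        <p>The first pair shares no content word, so a purely lexical kernel would see almost no overlap, while the Word Space still places the two units close to each other.</p>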
        <p>
          As a side effect of the LSK, sentences are projected into the same representation
space as words, as in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Given a textual unit u referring to an aspect a with a
polarity p, the set of the k words most semantically related to u can be obtained,
namely <inline-formula><tex-math>W_u^k</tex-math></inline-formula>. By collecting all the sets
<inline-formula><tex-math>W_{a,p}^k</tex-math></inline-formula> from sentences referring to a specific aspect
a with a polarity p, a Tag Cloud can be obtained, as discussed in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Figure 6
shows the tag clouds related to the taste and price aspects. They are generated by
selecting the k = 20 words most similar to the examples used in this experimental
evaluation. Notice that the size of a word depends on the number of times it is
suggested by a single u.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Opinion Mining Results</title>
        <p>In our approach, the kernel combination BOWK + LSK estimates the
similarity between textual units, linearly combining the lexical properties captured by
BOWK and the lexical generalization of the LSK<sup>5</sup>. A set of 60 web pages has
been annotated according to the schema proposed in Section 2.1. Annotations
are derived from 7 specialized sites and blogs<sup>6</sup> from the enogastronomic domain
targeted in the Divino Project. The Topic Annotator is powered by a
classifier assigning paragraphs to one of 4 classes, i.e. WineryProducts,
Varietals, WineryBrands and OtherTopics. The analysis has then been
specialized for WineryProducts: each sentence within this topic has
been classified with respect to different aspects, i.e. Taste, Aroma, Color,
Price and OtherAspects. Each sentence related to a valid aspect is then
classified with respect to the Positive, Negative and NoPolarity classes.
Table 1 shows the number of paragraphs annotated with Topic classes and the
number of sentences annotated with Aspect and Polarity classes.</p>
        <p><sup>5</sup> Here, two weights balance the contribution of the kernels in the combination. In our
experiments, both are set to 1.</p>
        <p><sup>6</sup> We annotated pages from www.intravino.com, www.enofaber.com, percorsidivino.blogspot.it,
ilvinoeoltre.blogspot.it, grappolidivini.blogspot.it, simodivino.blogspot.it and
grappolorosso.blogspot.it.</p>
        <p>
          In order to evaluate the robustness of the employed Passive Aggressive (PA)
classifiers, we compared their performance with that of a Support Vector Machine based
classifier, which represents the state of the art of kernel-based (non-online)
machines. In particular, the SVM-multiclass schema described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is applied<sup>7</sup>. A
One-VS-All schema is used for the PA to realize the multi-class classification: a binary
classifier is trained for each class and the class whose classifier provides the highest value of the classification
function is selected. As the PA model depends on the order of the examples provided
in the training phase, a 10-fold cross validation schema is applied. In
contrast, SVM-multiclass adopts the implicit multi-class formulation described in
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Results are measured in terms of accuracy, i.e. the percentage of examples
receiving the correct label. Table 2 shows the mean results of both classifiers
over the 10 folds. As expected, the SVM generally achieves slightly higher and
more stable scores. This is not surprising, as the SVM, being a batch learning algorithm,
finds the optimal solution of the classification problem, while the PA does not,
due to its online nature [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. However, the high results achieved by the different
PA classifiers, i.e. about 80% accuracy, confirm the applicability of the online
schema in the OM workflow within the Divino Portal. The slightly lower
accuracy of the polarity classifiers emphasizes the complexity of capturing opinions
in the domain of wine.
        </p>
        <p><sup>7</sup> http://svmlight.joachims.org/svm_multiclass.html</p>
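        <p>The One-VS-All selection and the accuracy measure can be sketched as follows. The sketch assumes linear binary models stored as plain weight vectors; the weights, the feature vectors and the gold labels are invented and do not reproduce the experimental setting or the reported figures.</p>
        <preformat>
```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def one_vs_all_predict(models, x):
    """Return the class whose binary model gives the highest score."""
    return max(models, key=lambda label: dot(models[label], x))


def accuracy(models, examples):
    """Fraction of examples that receive the correct label."""
    correct = sum(
        1 for x, label in examples if one_vs_all_predict(models, x) == label
    )
    return correct / len(examples)


# One hypothetical weight vector per polarity class.
models = {
    "Positive": [1.0, -1.0, 0.0],
    "Negative": [-1.0, 1.0, 0.0],
    "NoPolarity": [0.0, 0.0, 0.5],
}
# Invented test examples: (feature vector, gold label).
examples = [
    ([1.0, 0.0, 1.0], "Positive"),
    ([0.0, 1.0, 0.0], "Negative"),
    ([0.0, 0.0, 1.0], "NoPolarity"),
    ([0.0, 1.0, 1.0], "NoPolarity"),
]
print(accuracy(models, examples))
```
        </preformat>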
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper presents a comprehensive web portal where Social Web functionalities,
User Profiling and automatic Aspect-based Opinion Mining are integrated. The
resulting portal allows people to express their preferences while enabling Market
Analysts to bootstrap an opinion-mining system from scratch. The effectiveness
of the proposed Online Machine Learning schema has been evaluated in a real
use case in the national marketing of wine products. Future work will focus
on improving the bootstrapping capability of the system with less annotated data,
as well as on a deeper study of how to combine modern Business Intelligence with the
semi-structured information extracted through Opinion Mining techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Basili</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanzotto</surname>
            ,
            <given-names>F.M.:</given-names>
          </string-name>
          <article-title>Parsing engineering and empirical robustness</article-title>
          .
          <source>Nat. Lang. Eng</source>
          .
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>97</fpage>
          –<lpage>120</lpage> (Jun
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Crammer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>On the algorithmic implementation of multi-class svms</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>2</volume>
          ,
          <fpage>265</fpage>
          –
          <lpage>292</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Crammer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dekel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keshet</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shalev-Shwartz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Online passive-aggressive algorithms</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>7</volume>
          ,
          <fpage>551</fpage>
          –
          <lpage>585</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cristianini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shawe-Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lodhi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Latent semantic kernels</article-title>
          .
          <source>J. Intell. Inf. Syst</source>
          .
          <volume>18</volume>
          (
          <issue>2-3</issue>
          ),
          <fpage>127</fpage>
          –
          <lpage>152</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Halvey</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keane</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>An assessment of tag presentation techniques</article-title>
          .
          In:
          <source>Proceedings of WWW 2007</source>
          . pp.
          <fpage>1313</fpage>
          –
          <lpage>1314</lpage>
          .
          <publisher-name>ACM</publisher-name>
          ,
          <publisher-loc>New York, NY, USA</publisher-loc>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Distributional structure</article-title>
          . In:
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fodor</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          (eds.)
          <source>The Philosophy of Linguistics</source>
          . Oxford University Press (
          <year>1964</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finley</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.N.</given-names>
          </string-name>
          :
          <article-title>Cutting-plane training of structural SVMs</article-title>
          .
          <source>Machine Learning</source>
          <volume>77</volume>
          (
          <issue>1</issue>
          ),
          <fpage>27</fpage>
          –
          <lpage>59</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Determining the sentiment of opinions</article-title>
          .
          In:
          <source>Proceedings of COLING</source>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Landauer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge</article-title>
          .
          <source>Psychological Review</source>
          <volume>104</volume>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozareva</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>SemEval-2013 Task 2: Sentiment analysis in Twitter</article-title>
          . In:
          <source>SemEval 2013</source>
          . pp.
          <fpage>312</fpage>
          –
          <lpage>320</lpage>
          .
          <publisher-loc>Atlanta, Georgia, USA</publisher-loc>
          (June
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>2</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>135</lpage>
          (Jan
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rajaraman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <source>Recommendation Systems</source>
          , chap. 9
          . Cambridge University Press (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The Word-Space Model</article-title>
          .
          <source>Ph.D. thesis</source>
          , Stockholm University (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Seerat</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azam</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Opinion mining: Issues and challenges (a survey)</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>49</volume>
          (
          <issue>9</issue>
          ),
          <fpage>42</fpage>
          –
          <lpage>51</lpage>
          (July
          <year>2012</year>
          ), published by Foundation of Computer Science, New York, USA
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Shawe-Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cristianini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Kernel Methods for Pattern Analysis</article-title>
          . Cambridge University Press, New York, NY, USA (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Annotating expressions of opinions and emotions in language</article-title>
          .
          <source>Language Resources and Evaluation</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ),
          <volume>0</volume>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoffmann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Recognizing contextual polarity in phrase-level sentiment analysis</article-title>
          . In:
          <source>Proceedings of EMNLP</source>
          .
          <publisher-loc>Stroudsburg, PA, USA</publisher-loc>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>