Personalized Voice Search for Internet TV

Joaquin A. Delgado, PhD (joaquin.a.delgado@verizon.com)
Ravi Kalluri (ravi.kalluri@verizon.com)
Krishnaja Gutta (krishnaja.gutta@verizon.com)
Arun B. Krishna (arun.krishna@verizon.com)
Devon Turner (devon.g.turner@verizon.com)
Verizon Corporation, San Jose, CA, USA

ABSTRACT
In this paper, we discuss various strategies that have helped address the unique set of challenges we have faced in the attempt to provide highly relevant and personalized voice search results to users of our Internet TV (a.k.a. IPTV) system. While movie recommender systems have been heavily studied in academia [1] as well as in industry [2], full TV recommender systems are less prevalent and require a deeper understanding of complex real-world scenarios, such as using voice search as a mechanism for providing an easy-to-use interface for content search and discovery on IPTV platforms. It also requires the generation of fresh, domain-specific, relevant and highly contextual search results and recommendations within the constraints of what is playable and what is not: whether the suggested programs come from airings currently available on live/linear channels, time-shifted (a.k.a. catch-up) TV, digital video recordings (DVR) or video-on-demand (VOD), or from future airings that may not yet be available but may still be of interest to users, who may want to follow and/or record them.

KEYWORDS
Internet TV, Recommender Systems, Voice Search, Constraint-based recommendation, Query-driven recommendation

ComplexRec 2017, Como, Italy. 2017. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. Published on CEUR-WS, Volume 1892.
ComplexRec 2017, August 31, 2017, Como, Italy. J. Delgado et al.

1 INTRODUCTION

Internet TV or IPTV systems are platforms that deliver high-quality and reliable video streaming of live/linear channels, time-shifted and recorded TV, as well as streaming of video-on-demand (VOD), over Internet protocol (IP). Examples of these platforms include over-the-top (OTT) providers such as Sony's PlayStation Vue, Sling TV, Hulu TV, and the recently announced YouTube TV, as well as advanced video IP network providers such as Verizon's FiOS IPTV and Google Fiber. This is in contrast to pure IP-based video-on-demand (VOD) streaming services (e.g. Netflix, Amazon Prime Video, iTunes, Google Play, etc.) and to TV systems that use quadrature amplitude modulation (QAM) for video delivery, a standard used by most traditional digital cable television providers such as Cox, Cablevision, Time Warner Cable and Comcast (technically, Comcast's Xfinity X1 is a hybrid QAM+IP based system).

Verizon FiOS is a bundled Internet access, telephone, and television service that operates over a fiber-optic communications network, with over 6 million customers in nine U.S. states. FiOS is in the process of upgrading its customers to the new FiOS IPTV platform.

Perhaps the key benefit of having all information come through IP is that it allows providers to deliver more content to a wider variety of devices, all with an improved user experience (through better analytics and more relevant content), usually accompanied by an improved customer interface and hardware. Navigating channels and programs now feels more like surfing the web, and system upgrades are easily performed, ensuring the experience can be regularly updated with new features as easily as apps are updated on mobile devices.

Even with the advances of video over IP, one aspect of the user interface that is seemingly difficult to overcome is the cumbersome typing that is often necessary to perform a search, typically done by selecting letters on a screen using the remote or other pointing devices. Instead, companies have developed advanced voice and natural language understanding user interfaces such as Apple's Siri, Amazon's Alexa or Google's Assistant. These kinds of interfaces are tremendously useful when performing search and discovery on TV due to their simplicity compared with an on-screen keyboard. Along the same lines, Verizon has developed and deployed a system for voice command and control, as well as for search and discovery, for its FiOS TV platform, currently available only in its Mobile App.

Figure 1: Recent Voice Searches on FiOS TV Mobile App

Users are always interested in receiving contextually relevant and personalized search results, which may include recommendations based on usage. These results can help improve users' satisfaction and can increase the likelihood that a user finds something enjoyable to watch.

The remainder of this paper describes in more detail the types of queries and the strategies we have developed to cope with the unique challenges regarding relevancy and personalization that our voice search users now expect.

2 VOICE QUERIES AND USER INTENT

2.1 Constraint-based Query Fabrication

The simplified diagram in Fig. 2 illustrates the query fabrication sequence of a typical voice search platform, starting with the original speech utterance by the user, followed by an automatic speech recognition (ASR) module, natural language understanding (NLU) processing, and the formulation of a final search query that ultimately runs against the metadata DB to produce results.

Figure 2: A Typical Voice Search System Architecture (Speech Utterance → ASR Module → NLU Engine → Search Query → Metadata DB → Search Results)

The query generated by the NLU engine is, in general, a constraint-based search query that can be represented using a SQL-like syntax. For example, an utterance converted into text that says "Show me Brad Pitt movies from the 90's" will result in a constraint-based query of the following form:

    SELECT items FROM MetadataDB
    WHERE Person = "Brad Pitt"
    AND Decade = "90"
    AND ProgramType = "Movie"

Figure 3: Constraint-based Query

2.2 Program Types

The user, through voice, might be trying to issue a TV control query, like "Tune into channel X" or "Lower the volume". But when it comes to content discovery, the user's intent goes hand-in-hand with the type of programs the user is interested in retrieving. Examples of such program types are:
• Episodic TV Series (e.g. "The Americans")
• Single Programming Event (e.g. "The Grammys")
• Movie (e.g. "Rogue One")
• Music Video (e.g. Maroon 5, "Sugar" – 2015)
• Sports/Game (e.g. "NBA Finals")
  o Event (an actual game/match, e.g. "Golden State Warriors vs. Cavaliers @ Oracle Arena")
  o Non-event (commentaries, pre-game shows, etc., e.g. "NFL Pre-game Show")

2.3 Constraints and Query Specificity

Figure 4: Movies with Brad Pitt from the 90's

In Fig. 4, the user did not specify a particular movie title he/she wants to watch, but rather inquired about all the movies in which a particular person (e.g. "Brad Pitt") was cast during the decade of the 1990's. Prior to sorting, this query yields many results without any particular order. Depending on the entities used to build the constraints, the number of results and the sorting order, a query can be classified as specific or generic, with a full spectrum in between.

Some low- to medium-cardinality entities, when specified as constraints, lead to more generic queries; for example: People (cast & crew, singer), Genre, Sports League, etc. Other entities of very high cardinality, when specified as constraints, tend to narrow results down to a smaller number, thus generating more specific queries. Such entities are: Title (movie, music video or TV program title), Sports Team, or Sports Tournament.
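The query fabrication and specificity classification just described can be sketched as follows. This is a minimal illustration, not the production system: the entity names and the split into cardinality classes are assumptions drawn from the examples above.

```python
# Sketch of constraint-based query fabrication (Section 2.1) and
# specificity classification (Section 2.3). Entity names and the
# cardinality classes below are illustrative assumptions.

# High-cardinality entities narrow results sharply -> specific queries.
HIGH_CARDINALITY = {"Title", "SportsTeam", "SportsTournament"}
# Low/medium-cardinality entities admit many results -> generic queries.
LOW_CARDINALITY = {"Person", "Genre", "SportsLeague", "Decade", "ProgramType"}

def fabricate_query(entities):
    """Build a SQL-like constraint query from NLU-extracted entities."""
    where = " AND ".join(f'{name} = "{value}"' for name, value in entities.items())
    return f"SELECT items FROM MetadataDB WHERE {where}"

def classify_specificity(entities):
    """A query is 'specific' if any high-cardinality entity constrains it."""
    return "specific" if HIGH_CARDINALITY & set(entities) else "generic"

# "Show me Brad Pitt movies from the 90's" (cf. Fig. 3)
utterance_entities = {"Person": "Brad Pitt", "Decade": "90", "ProgramType": "Movie"}
print(fabricate_query(utterance_entities))
print(classify_specificity(utterance_entities))  # generic: needs relevance ranking
```

A generic outcome here signals that the result set will be large and unordered, which is exactly the case where the ranking strategies of the next section apply.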
On the other hand, qualifiers, such as time and quality, augment the specificity of the query by narrowing down the number of results and/or predetermining a sorting order. Examples of such qualifiers are:
• Specific period (year or decade): e.g. "from the 90's".
• Relative time: e.g. "Latest", "Oldest".
• Qualitative sorting: e.g. "Top Rated", "Best".

So far, this model assumes that constraints in the search query only determine membership in the result set. There is no reference to sorting parameters and/or relevance ranking, which we discuss in greater detail below.

3 SEARCH STRATEGIES & PERSONALIZATION

3.1 Search Dimensions

Our metadata content contains several attributes representing the various dimensions with which our search applications must work in order to build query-based constraints like the one shown in Fig. 3. When searching TV programs, these attributes include:
• Prose text (including overviews, synopses, and user reviews).
• Shorter text (such as director and actor names, and titles).
• Text labels (such as moods, keywords, sports league, sports team).
• Numerical attributes (user ratings, movie revenue, number of awards, Rotten Tomatoes scores [4], IMDb ratings [5]).
• Programming schedule (airing date), release dates and other attributes important in search.

In theory, any of these dimensions can be used to construct hard constraints (filters) or soft constraints (ranking) as part of the search query and sorting strategy. Some of these dimensions can also be used to derive newly computed values that can likewise feed the ranking function, such as:
• Popularity and Trending: shows that are popular or trending based on viewership and recording events.
• New, Live and On Now: shows that are airing for the first time, air "live" and/or are currently airing.

3.2 User and Item Taste Vectors

Besides these content dimensions, we have modeled users and items in the same latent space using taste vectors. Item taste vectors are the result of matrix factorization [3] on the user-item DVR recording matrix from the FiOS TV legacy system, which decomposes users and movies into a set of latent factors (which we can think of as categories like "Fantasy" or "Violence"). For users who are relatively new to the system, we infer taste vectors from the top items present in the user's current viewing history.

For any item in the result set for which we have an item taste vector, we can compute a score as the dot product of the user taste vector and the item taste vector:

    Score(U, I) = U · I    (1)

This score can then be used to personalize any search result set by including it in the function score that determines the final ranking.

We also store and use personalized lists of entities inferred from the user's viewing history, such as:
• Most Watched Channels (MWC).
• Most Watched Teams (MWT).
to improve our intent-based relevance functions.

3.3 Strategy Selection and Relevance Functions

Based on the Program Type inferred from the user intent discussed in Section 2.2, we can decide to apply different strategies to further refine the query output by the NLU Engine. A strategy is actually a query plus a relevance function that may be used to sort the search results.

In general, any ranking happens after all constraints specified in the query are applied. What we refer to as recommendation (personalization) ranking is actually a function of:
1. The text query (TF×IDF) [6] relevance score and the popularity/trending score, plus Score(U, I) shown in equation 1.
2. In the case of TV series, shows aired on channels in the MWC list are weighed higher; for sports, items associated with the user's favorite teams/sports in the MWT list are weighed higher.

Here is a list of the strategies we implemented.

3.3.1 TV Series Strategy

This includes episodic content airing at various times or available as VOD.
• In general, airing time is not relevant (e.g. we should not give preference to a specific airing window); however, it is important to surface playable assets first (one could argue that surfacing "free" content is equally important).
• If the user searches for a series with an exact match, or matches, only return those specific results (e.g. "homeland" should return one result: the TV series "Homeland").
• If the user's title search matches multiple titles, sort titles by text query relevance; e.g. "Family" should yield "Modern Family", a currently airing show, before "Family Ties" and "All in the Family".
• If the user performs a more generic search (e.g. "Dramas on HBO"), apply the filter and then rank by recommendation, including a possible bias towards shows from channels/providers in the MWC list.
• If the user mentions certain qualifiers, the defined data point should be used:
  o "New episodes": return series with a "new" episode aired in the last week, sorted by personalization.
  o "Latest TV airings": return airings sorted by original airing date and then by personalization.
  o "Top rated series": return series sorted by IMDb/Rotten Tomatoes rating. If multiple series have the same rating, sort by personalization.

3.3.2 Single Title Strategy

This includes movies, single programming events and music titles.
• If the user searches for a title with an exact match, or matches, only return those specific results (e.g. "James Bond movies" or "Star Wars").
• If the user's title search matches multiple titles, sort titles by text query relevance.
• If the user performs a more generic search based on genre (e.g. "comedy movies" or "action thrillers"), rank by recommendations after applying all constraints.
• For a generic search with a single filter (e.g. "movies with Brad Pitt"), rank by recommendations after applying all required constraints.
• For a generic search with multiple filters (e.g. "movies with Brad Pitt & Angelina Jolie"), proceed the same as in the single-filter case.
• If no taste vector exists for the user (new profile, no activity), a generic search result should be sorted by original airing date or release date, ascending.
• If the user performs a generic search with one of the following sort qualifiers, as interpreted by the NLU engine, the defined data point should be used to sort:
  o "Latest comedy movies": sort by theatrical release date.
  o "Top rated comedy movies": sort by critics' rating and then by recommendation.

3.3.3 Game (Sports) Strategy

This applies both to sports events (e.g. the "Golden State Warriors vs. Cleveland Cavaliers" NBA match) and to sports non-events (e.g. "NBA Pre-game Show", "Inside the NBA", etc.). It is perhaps the trickiest strategy to implement.

We classify sports searches into the following five categories, ranging from more specific to more generic:
• Team Search: the user searches for a specific team (e.g. "Golden State Warriors").
• Tournament Search: the user searches for a specific tournament or event (e.g. "Kentucky Derby", "The Masters", "Indianapolis 500", "Super Bowl").
• League Search: the user searches for a league (e.g. "NBA").
• Sport Genre Search: the user searches for a generic genre of sports (e.g. "Basketball").
• Sports On Now: the user searches for "Sports on now" to find out what is currently being shown on TV based on the guide/schedule.

For all these use cases we factor the following elements into the ranking:
• Program Type: sports events always weigh higher than sports non-events.
• New or live programs always weigh higher than repeated programs or "re-runs".
• Airing start time: while events that are airing now are weighed the highest, past and upcoming events have their score decreased by a Gaussian decay [7] based on airing start time.
• League Bias: we bias results towards popular professional leagues in the U.S. (e.g. NFL, NBA, NHL, NCAA, etc.) when a generic search is performed.
• Personal Team Bias: an extra boost is given to teams in the user's MWT list in generic searches as well.

We found that for the sports use case, users are more interested in live sports results and the upcoming schedule of their favorite teams than in popular sports shows.

4 CONCLUSIONS

In voice search for Internet TV, queries with high specificity tend to be very precise and have little need for additional sorting and/or relevance ranking to satisfy the user's request. On the other hand, the more generic a query is, the more results the user has to sift through, thus requiring some type of relevance ranking to bubble up the results expected to be most relevant to the user.

But what is relevance in this context? Is relevancy universal, or does it depend on the user who is asking?

Relevancy tuning is a hard problem: it is usually misunderstood, and it is often not immediately obvious when something is wrong. It usually requires seeing many bad examples to identify problematic patterns, and it is often challenging to know what better results would look like without actually seeing them show up. Unfortunately, it is often not until well after a search system is deployed into production that organizations begin to realize the gap between out-of-the-box relevancy defaults and true domain-driven, personalized matching.

This paper describes some promising strategies that have been used to implement personalized voice search for Internet TV and mitigate the relevancy problem. We will report the evaluation of this approach in subsequent work.

DISCLAIMER

This paper does not describe any specific product feature, nor does it promise the delivery of one. It bears no influence on the development roadmap of FiOS IPTV or any other Verizon product. It is a research paper, exploratory in nature, that represents discussions and ideas solely attributed to the authors and does not represent any company plan and/or position.

REFERENCES

[1] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1):5–53, January 2004.
[2] The Netflix Prize [http://www.netflixprize.com/]
[3] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009.
[4] Rotten Tomatoes [https://www.rottentomatoes.com/]
[5] Internet Movie Database (IMDb) [http://www.imdb.com/]
[6] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology Behind Search. Addison-Wesley, 2011.
[7] The Closer, The Better [https://www.elastic.co/guide/en/elasticsearch/guide/current/decay-functions.html]
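As a closing illustration, the ranking ingredients described in Sections 3.2 and 3.3.3 (the taste-vector dot product of equation 1, and the Gaussian decay on airing start time [7]) can be sketched as follows. This is a minimal sketch under assumed weights and parameter values, not the production scoring function.

```python
import math

def taste_score(user_vec, item_vec):
    """Equation 1: Score(U, I) = U . I, the dot product in the latent space."""
    return sum(u * i for u, i in zip(user_vec, item_vec))

def airing_decay(hours_from_now, scale_hours=6.0):
    """Gaussian decay on airing start time: events airing now score 1.0;
    past and upcoming events fall off smoothly (cf. the Elasticsearch decay
    functions cited in [7]). scale_hours is an assumed parameter."""
    return math.exp(-(hours_from_now ** 2) / (2 * scale_hours ** 2))

def sports_rank_score(text_relevance, user_vec, item_vec, hours_from_now,
                      is_event=True, in_mwt=False):
    """Combine the ranking factors of Section 3.3.3; the boost weights
    (1.5 for events, 2.0 for MWT teams) are illustrative assumptions."""
    score = text_relevance + taste_score(user_vec, item_vec)
    score *= airing_decay(hours_from_now)
    if is_event:   # sports events weigh higher than non-events
        score *= 1.5
    if in_mwt:     # personal team bias for Most Watched Teams
        score *= 2.0
    return score

# An event airing now for a favorite team outranks yesterday's re-run.
now_game = sports_rank_score(1.0, [0.5, 0.2], [0.4, 0.1], 0.0, in_mwt=True)
rerun = sports_rank_score(1.0, [0.5, 0.2], [0.4, 0.1], -24.0)
```

Multiplying (rather than adding) the decay and bias terms keeps the ordering of the constraint-filtered results driven primarily by recency and personal affinity, which matches the observation above that sports users care most about live games and their favorite teams.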