Personalized Voice Search for Internet TV

Joaquin A. Delgado, PhD (joaquin.a.delgado@verizon.com)
Ravi Kalluri (ravi.kalluri@verizon.com)
Krishnaja Gutta (krishnaja.gutta@verizon.com)
Arun B. Krishna (arun.krishna@verizon.com)
Devon Turner (devon.g.turner@verizon.com)
Verizon Corporation, San Jose, CA, USA

ABSTRACT
In this paper, we discuss various strategies that have helped address the unique set of challenges we have faced in the attempt to provide highly relevant and personalized voice search results to users of our Internet TV (a.k.a. IPTV) system. While movie recommender systems have been heavily studied in academia [1] as well as in industry [2], full TV recommender systems are less prevalent and require a deeper understanding of complex real-world scenarios, such as using voice search as a mechanism for providing an easy-to-use interface for content search and discovery on IPTV platforms. It also requires the generation of fresh, domain-specific, relevant and highly contextual search results and recommendations within the constraints of what is playable and what is not: whether the suggested programs come from airings currently available on live/linear channels, time-shifted (a.k.a. catch-up) TV, digital video recordings (DVR) or video-on-demand (VOD), or from future airings that may not yet be available but may still be of interest to users, who may want to follow and/or record them.

KEYWORDS
Internet TV, Recommender Systems, Voice Search, Constraint-based recommendation, Query-driven recommendation

ComplexRec 2017, Como, Italy. 2017. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. Published on CEUR-WS, Volume 1892.
ComplexRec 2017, August 31, 2017, Como, Italy. J. Delgado et al.

1 INTRODUCTION

Internet TV or IPTV systems are platforms that deliver high-quality and reliable video streaming of live/linear channels, time-shifted and recorded TV, as well as streaming of video-on-demand (VOD), over Internet protocol (IP). Examples of these platforms include over-the-top (OTT) providers such as Sony's PlayStation Vue, Sling TV, Hulu TV, and the recently announced YouTube TV, as well as advanced video IP network providers such as Verizon's FiOS IPTV and Google Fiber. This is in contrast to pure IP-based video-on-demand (VOD) streaming services (e.g. Netflix, Amazon Prime Video, iTunes, Google Play, etc.) and to TV systems that use quadrature amplitude modulation (QAM) for video delivery, a standard used by most traditional digital cable television providers such as Cox, Cablevision, Time Warner Cable and Comcast (technically, Comcast's Xfinity X1 is a hybrid QAM+IP based system).

Verizon FiOS is a bundled Internet access, telephone, and television service that operates over a fiber-optic communications network, with over 6 million customers in nine U.S. states. FiOS is in the process of upgrading its customers to the new FiOS IPTV platform.

Perhaps the key benefit of having all information come through IP is that it allows providers to deliver more content to a wider variety of devices, all with an improved user experience (through better analytics and more relevant content), usually accompanied by an improved customer interface and hardware. Navigating channels and programs now feels more like surfing the web, and system upgrades are easily performed, ensuring the experience can be regularly updated with new features as easily as apps are updated on mobile devices.

Even with the advances of video over IP, one aspect of the user interface that is seemingly difficult to overcome is the cumbersome typing that is often necessary to perform a search, typically done by selecting letters on a screen using the remote or other pointing devices. Instead, companies have developed advanced voice and natural language understanding user interfaces such as Apple's Siri, Amazon's Alexa or Google's Assistant. These kinds of interfaces are tremendously useful when performing search and discovery on TV due to their simplicity compared with an on-screen keyboard. Along the same lines, Verizon has developed and deployed a system for voice command and control, as well as for search and discovery, for its FiOS TV platform, currently available only in its Mobile App.

Figure 1: Recent Voice Searches on FiOS TV Mobile App

Users are always interested in receiving contextually relevant and personalized search results, which may include recommendations based on usage. These results can help improve users' satisfaction and can increase the likelihood that a user finds something enjoyable to watch.

The remainder of this paper describes in more detail the types of queries and the strategies we have developed to cope with the unique challenges regarding relevancy and personalization that our voice search users now expect.

2 VOICE QUERIES AND USER INTENT

2.1 Constraint-based Query Fabrication

The simplified diagram in Fig. 2 illustrates the query fabrication sequence of a typical voice search platform, starting with the original speech utterance by the user, followed by an automatic speech recognition (ASR) module, natural language understanding (NLU) processing, and the formulation of a final search query that ultimately runs against the metadata DB to produce results.

Figure 2: A Typical Voice Search System Architecture (Speech Utterance → ASR Module → NLU Engine → Search Query → Metadata DB → Search Results)

The query generated by the NLU engine is, in general, a constraint-based search query that can be represented using a SQL-like syntax. For example, an utterance converted into text that says "Show me Brad Pitt movies from the 90's" will result in a constraint-based query of the following form:

    SELECT items FROM MetadataDB
    WHERE Person = "Brad Pitt"
    AND Decade = "90"
    AND ProgramType = "Movie"

Figure 3: Constraint-based Query

2.2 Program Types

The user, through voice, might be trying to issue a TV control query, like "Tune into channel X" or "Lower the volume". But when it comes to content discovery, the user's intent goes hand-in-hand with the type of programs the user is interested in retrieving. Examples of such program types are:
• Episodic TV Series (e.g. "The Americans")
• Single Programming Event (e.g. "The Grammys")
• Movie (e.g. "Rogue One")
• Music Video (e.g. Maroon 5, "Sugar" – 2015)
• Sports/Game (e.g. "NBA Finals")
  o Event (an actual game/match, e.g. "Golden State Warriors vs. Cavaliers @ Oracle Arena")
  o Non-event (commentaries, pre-game shows, etc., e.g. "NFL Pre-game Show")

2.3 Constraints and Query Specificity

Figure 4: Movies with Brad Pitt from the 90's

In Fig. 4, the user did not specify a particular movie title he/she wants to watch, but rather inquired about all the movies in which a particular person (e.g. "Brad Pitt") was cast during the decade of the 1990's. Prior to sorting, this query yields many results without any particular order. Depending on the entities used to build the constraints, the number of results and the sorting order, a query can be classified as specific or generic, with a full spectrum in between.

Some low- to medium-cardinality entities, when specified as constraints, lead to more generic queries; for example: People (cast & crew, singer), Genre, Sports League, etc. Other entities of very high cardinality, when specified as constraints, tend to narrow results down to a smaller number, thus generating more specific queries. Such entities are: Title (movie, music video or TV program title), Sports Team, or Sports Tournament.
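The query fabrication and specificity classification just described can be sketched as follows. This is a minimal illustration, not the production system: the entity names and the split into cardinality classes are assumptions drawn from the examples above.

```python
# Sketch of constraint-based query fabrication (Section 2.1) and
# specificity classification (Section 2.3). Entity names and the
# cardinality classes below are illustrative assumptions.

# High-cardinality entities narrow results sharply -> specific queries.
HIGH_CARDINALITY = {"Title", "SportsTeam", "SportsTournament"}
# Low/medium-cardinality entities admit many results -> generic queries.
LOW_CARDINALITY = {"Person", "Genre", "SportsLeague", "Decade", "ProgramType"}

def fabricate_query(entities):
    """Build a SQL-like constraint query from NLU-extracted entities."""
    where = " AND ".join(f'{name} = "{value}"' for name, value in entities.items())
    return f"SELECT items FROM MetadataDB WHERE {where}"

def classify_specificity(entities):
    """A query is 'specific' if any high-cardinality entity constrains it."""
    return "specific" if HIGH_CARDINALITY & set(entities) else "generic"

# "Show me Brad Pitt movies from the 90's" (cf. Fig. 3)
utterance_entities = {"Person": "Brad Pitt", "Decade": "90", "ProgramType": "Movie"}
print(fabricate_query(utterance_entities))
print(classify_specificity(utterance_entities))  # generic: needs relevance ranking
```

A generic outcome here signals that the result set will be large and unordered, which is exactly the case where the ranking strategies of the next section apply.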
On the other hand, qualifiers, such as time and quality, augment the specificity of the query by narrowing down the number of results and/or predetermining a sorting order. Examples of such qualifiers are:
• Specific period (year or decade): e.g. "from the 90's".
• Relative time: e.g. "Latest", "Oldest".
• Qualitative sorting: e.g. "Top Rated", "Best".

So far, this model assumes that constraints in the search query only determine membership in the result set. There is no reference to sorting parameters and/or relevance ranking, which we discuss in greater detail below.

3 SEARCH STRATEGIES & PERSONALIZATION

3.1 Search Dimensions

Our metadata content contains several attributes representing the various dimensions with which our search applications must work in order to build query-based constraints like the one shown in Fig. 3. When searching TV programs, these attributes include:
• Prose text (including overviews, synopses, and user reviews).
• Shorter text (such as director and actor names, and titles).
• Text labels (such as moods, keywords, sports league, sports team).
• Numerical attributes (user ratings, movie revenue, number of awards, Rotten Tomatoes scores [4], IMDb ratings [5]).
• Programming schedule (airing date), release dates and other attributes important in search.

In theory, any of these dimensions can be used to construct hard constraints (filters) or soft constraints (ranking) as part of the search query and sorting strategy. Some of these dimensions can also be used to derive newly computed values that can likewise feed the ranking function, such as:
• Popularity and Trending: shows that are popular or trending based on viewership and recording events.
• New, Live and On Now: shows that are airing for the first time, air "live" and/or are currently airing.

3.2 User and Item Taste Vectors

Besides these content dimensions, we have modeled users and items in the same latent space using taste vectors. Item taste vectors are the result of matrix factorization [3] on the user-item DVR recording matrix from the FiOS TV legacy system, which decomposes users and movies into a set of latent factors (which we can think of as categories like "Fantasy" or "Violence"). For users who are relatively new to the system, we infer taste vectors from the top items present in the user's current viewing history.

For any item in the result set for which we have an item taste vector, we can compute a score as the dot product of the user taste vector and the item taste vector:

    Score(U, I) = U · I    (1)

This score can then be used to personalize any search result set by including it in the function score that determines the final ranking.

We also store and use personalized lists of entities inferred from the user's viewing history, such as:
• Most Watched Channels (MWC).
• Most Watched Teams (MWT).
to improve our intent-based relevance functions.

3.3 Strategy Selection and Relevance Functions

Based on the Program Type inferred from the user intent discussed in Section 2.2, we can decide to apply different strategies to further refine the query output by the NLU Engine. A strategy is actually a query plus a relevance function that may be used to sort the search results.

In general, any ranking happens after all constraints specified in the query are applied. What we refer to as recommendation (personalization) ranking is actually a function of:
1. The text query (TF×IDF) [6] relevance score and the popularity/trending score, plus Score(U, I) shown in equation 1.
2. In the case of TV series, shows aired on channels in the MWC list are weighed higher; for sports, items associated with the user's favorite teams/sports in the MWT list are weighed higher.

Here is a list of the strategies we implemented.

3.3.1 TV Series Strategy

This includes episodic content airing at various times or available as VOD.
• In general, airing time is not relevant (e.g. we should not give preference to a specific airing window); however, it is important to surface playable assets first (one could argue that surfacing "free" content is equally important).
• If the user searches for a series with an exact match, or matches, only return those specific results (e.g. "homeland" should return one result: the TV series "Homeland").
• If the user's title search matches multiple titles, sort titles by text query relevance; e.g. "Family" should yield "Modern Family", a currently airing show, before "Family Ties" and "All in the Family".
• If the user performs a more generic search (e.g. "Dramas on HBO"), apply the filter and then rank by recommendation, including a possible bias towards shows from channels/providers in the MWC list.
• If the user mentions certain qualifiers, the defined data point should be used:
  o "New episodes": return series with a "new" episode aired in the last week, sorted by personalization.
  o "Latest TV airings": return airings sorted by original airing date and then by personalization.
  o "Top rated series": return series sorted by IMDb/Rotten Tomatoes rating. If multiple series have the same rating, sort by personalization.

3.3.2 Single Title Strategy

This includes movies, single programming events and music titles.
• If the user searches for a title with an exact match, or matches, only return those specific results (e.g. "James Bond movies" or "Star Wars").
• If the user's title search matches multiple titles, sort titles by text query relevance.
• If the user performs a more generic search based on genre (e.g. "comedy movies" or "action thrillers"), rank by recommendations after applying all constraints.
• For a generic search with a single filter (e.g. "movies with Brad Pitt"), rank by recommendations after applying all required constraints.
• For a generic search with multiple filters (e.g. "movies with Brad Pitt & Angelina Jolie"), proceed the same as in the single-filter case.
• If no taste vector exists for the user (new profile, no activity), a generic search result should be sorted by original airing date or release date, ascending.
• If the user performs a generic search with one of the following sort qualifiers, as interpreted by the NLU engine, the defined data point should be used to sort:
  o "Latest comedy movies": sort by theatrical release date.
  o "Top rated comedy movies": sort by critics' rating and then by recommendation.

3.3.3 Game (Sports) Strategy

This applies both to sports events (e.g. the "Golden State Warriors vs. Cleveland Cavaliers" NBA match) and to sports non-events (e.g. "NBA Pre-game Show", "Inside the NBA", etc.). It is perhaps the trickiest strategy to implement.

We classify sports searches into the following five categories, ranging from more specific to more generic:
• Team Search: the user searches for a specific team (e.g. "Golden State Warriors").
• Tournament Search: the user searches for a specific tournament or event (e.g. "Kentucky Derby", "The Masters", "Indianapolis 500", "Super Bowl").
• League Search: the user searches for a league (e.g. "NBA").
• Sport Genre Search: the user searches for a generic genre of sports (e.g. "Basketball").
• Sports On Now: the user searches for "Sports on now" to find out what is currently being shown on TV based on the guide/schedule.

For all these use cases we factor the following elements into the ranking:
• Program Type: sports events always weigh higher than sports non-events.
• New or live programs always weigh higher than repeated programs or "re-runs".
• Airing start time: while events that are airing now are weighed the highest, past and upcoming events have their score decreased by a Gaussian decay [7] based on airing start time.
• League Bias: we bias results towards popular professional leagues in the U.S. (e.g. NFL, NBA, NHL, NCAA, etc.) when a generic search is performed.
• Personal Team Bias: an extra boost is given to teams in the user's MWT list in generic searches as well.

We found that for the sports use case, users are more interested in live sports results and the upcoming schedule of their favorite teams than in popular sports shows.

4 CONCLUSIONS

In voice search for Internet TV, queries with high specificity tend to be very precise and have little need for additional sorting and/or relevance ranking to satisfy the user's request. On the other hand, the more generic a query is, the more results the user has to sift through, thus requiring some type of relevance ranking to bubble up the results expected to be most relevant to the user.

But what is relevance in this context? Is relevancy universal, or does it depend on the user who is asking?

Relevancy tuning is a hard problem: it is usually misunderstood, and it is often not immediately obvious when something is wrong. It usually requires seeing many bad examples to identify problematic patterns, and it is often challenging to know what better results would look like without actually seeing them show up. Unfortunately, it is often not until well after a search system is deployed into production that organizations begin to realize the gap between out-of-the-box relevancy defaults and true domain-driven, personalized matching.

This paper describes some promising strategies that have been used to implement personalized voice search for Internet TV and mitigate the relevancy problem. We will report the evaluation of this approach in subsequent work.

DISCLAIMER

This paper does not describe any specific product feature, nor does it promise the delivery of one. It bears no influence on the development roadmap of FiOS IPTV or any other Verizon product. It is a research paper, exploratory in nature, that represents discussions and ideas solely attributed to the authors and does not represent any company plan and/or position.

REFERENCES

[1] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1):5–53, January 2004.
[2] The Netflix Prize [http://www.netflixprize.com/]
[3] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009.
[4] Rotten Tomatoes [https://www.rottentomatoes.com/]
[5] Internet Movie Database (IMDb) [http://www.imdb.com/]
[6] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology Behind Search. Addison-Wesley, 2011.
[7] The Closer, The Better [https://www.elastic.co/guide/en/elasticsearch/guide/current/decay-functions.html]
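As a closing illustration, the ranking ingredients described in Sections 3.2 and 3.3.3 (the taste-vector dot product of equation 1, and the Gaussian decay on airing start time [7]) can be sketched as follows. This is a minimal sketch under assumed weights and parameter values, not the production scoring function.

```python
import math

def taste_score(user_vec, item_vec):
    """Equation 1: Score(U, I) = U . I, the dot product in the latent space."""
    return sum(u * i for u, i in zip(user_vec, item_vec))

def airing_decay(hours_from_now, scale_hours=6.0):
    """Gaussian decay on airing start time: events airing now score 1.0;
    past and upcoming events fall off smoothly (cf. the Elasticsearch decay
    functions cited in [7]). scale_hours is an assumed parameter."""
    return math.exp(-(hours_from_now ** 2) / (2 * scale_hours ** 2))

def sports_rank_score(text_relevance, user_vec, item_vec, hours_from_now,
                      is_event=True, in_mwt=False):
    """Combine the ranking factors of Section 3.3.3; the boost weights
    (1.5 for events, 2.0 for MWT teams) are illustrative assumptions."""
    score = text_relevance + taste_score(user_vec, item_vec)
    score *= airing_decay(hours_from_now)
    if is_event:   # sports events weigh higher than non-events
        score *= 1.5
    if in_mwt:     # personal team bias for Most Watched Teams
        score *= 2.0
    return score

# An event airing now for a favorite team outranks yesterday's re-run.
now_game = sports_rank_score(1.0, [0.5, 0.2], [0.4, 0.1], 0.0, in_mwt=True)
rerun = sports_rank_score(1.0, [0.5, 0.2], [0.4, 0.1], -24.0)
```

Multiplying (rather than adding) the decay and bias terms keeps the ordering of the constraint-filtered results driven primarily by recency and personal affinity, which matches the observation above that sports users care most about live games and their favorite teams.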