My Health Dictionary:
       Study on Web Service using Program
    Information Data-hub as Linked Open Data

    Masaru Miyazaki, Makoto Urakawa, Ichiro Yamada, Kikuka Miura,
         Taro Miyazaki, Hiroshi Fujisawa, and Toshio Nakagawa

               Japan Broadcasting Corporation (NHK), Tokyo, Japan,
                           miyazaki.m-fk@nhk.or.jp


      Abstract. With the evolution of the global Internet, it has become
      increasingly common for companies to automatically exchange various
      types of data among themselves. In addition, content providers, such as
      broadcasting stations, are being required to change their content-serving
      strategy so that the content can be delivered to viewers via various
      external services. To address this strategy change, this paper proposes
      publishing program-related information as machine-readable web data that can be
      used in internal and external services. We report on the construction of a
      program information data-hub using the Linked Open Data (LOD) stan-
      dard format recommended by the World Wide Web Consortium. Results
      obtained by prototyping the data-hub and associated web services show
      that services employing a variety of program information can be realized
      by representing knowledge about the content as LOD data.

      Keywords: linked open data, program information, health


1   Introduction

The circulation of large amounts of content on the Internet, the rapid spread of
mobile devices such as smartphones, and the appearance of new viewing styles
such as time-shift playback have resulted in a significant change in the behavior
of viewers of TV programs. In line with this change, content providers such as
broadcasting stations are being required to change their content-serving strategy,
so that the content can be delivered to viewers via various external services
rather than simply waiting for them to access content via broadcast or
video-on-demand services. We believe that an effective strategy to achieve this objective
is to utilize semantic web and Linked Open Data (LOD) technology to build a
“Web of Data.” To this end, we are currently studying how to build a hub
comprising various types of data associated with TV programs by representing
program-related information and external data as LOD. In this demonstration, we
give an outline of the program information data-hub and introduce the prototype
health information services that use the data-hub.


2     Related Work
The BBC recognized the possibility of improving the utilization and presence
of content via LOD technologies at an early stage, and they have been working
to build a content data space using LOD. They have consequently developed
LOD for content such as program episodes and music artist information, and
these can be referenced from a variety of external sources. Moreover, on the
Wildlife web site1, information on animal species and behavior is systematized as
RDF, so a user can immediately reach relevant information and programs.
Further, their efforts to connect various pieces of program information as LOD
through crowdsourced tagging are highly regarded in the academic field[1].
    In this study, our aim is to actualize a more advanced data space that can be
utilized as an internal or external service of broadcasting stations. As a result, we
have constructed a program information data-hub that has accumulated not only
existing program information but also knowledge about the program contents,
and are currently exploring its possibilities.


3     Development of program information data-hub
NHK, Japan’s public broadcaster, has accumulated a variety of program-related
information in in-house databases for the purpose of providing broadcasting
information on its web site. However, much of this information was stored in
relational database (RDB) tables and, as a result, was underutilized for
collaboration with new services. For example, a vegetable that is introduced as a
disease prevention measure in a health program could also be introduced as an
ingredient for a recipe in a cooking program. In this way, programs often provide
information that shares common concepts with other programs. By connecting
such programs to each other via the common concepts, it becomes possible to
create new services that cut across programs. With the aim of realizing
such services, we gathered the program-related information (such as the locations of
images, videos, and web sites) residing in in-house databases, transformed it
to RDF, and automatically constructed an RDF store called the program information
data-hub. We used the BBC’s Programmes Ontology as a reference in
describing the schema of our data-hub, and extended it so that it can describe
NHK’s own program information. Next, to enable cooperation with
various external services, we automatically extracted performer names
and important words from the program information and linked them
to the vocabulary of DBpedia Japanese2. Further, as external
knowledge, we automatically added links to a “knowledge map,” which is a
program-related knowledge dataset that is currently being built.
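
The transformation from tabular records to RDF described above can be illustrated with a minimal sketch. It assumes a hypothetical row layout and a placeholder base URI (http://example.org/program/), and it links an extracted keyword to a DBpedia Japanese resource; the paper does not specify the actual schema or URI scheme, so only the Programmes Ontology, Dublin Core, and DBpedia Japanese namespaces are taken as given.

```python
# Sketch: turn one database row into N-Triples lines.
# BASE and the row layout are assumptions made for illustration only.
PO = "http://purl.org/ontology/po/"            # BBC Programmes Ontology
DC = "http://purl.org/dc/elements/1.1/"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBPEDIA_JA = "http://ja.dbpedia.org/resource/"
BASE = "http://example.org/program/"           # placeholder, not a real NHK URI


def literal(text):
    """Quote a string as an N-Triples literal (backslash and quote escaped)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row):
    """Map a row to (subject, predicate, object) terms in N-Triples syntax."""
    subject = f"<{BASE}{row['id']}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{PO}Episode>"),
        (subject, f"<{DC}title>", literal(row["title"])),
    ]
    # Each extracted keyword becomes a link to a DBpedia Japanese resource.
    for word in row["keywords"]:
        triples.append((subject, f"<{DCT_SUBJECT}>", f"<{DBPEDIA_JA}{word}>"))
    return triples


def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


row = {
    "id": "nhkdwc0112",
    "title": "Vinegar-marinated Aji Salad",
    "keywords": ["酢"],  # "vinegar"; the target resource is illustrative
}
ntriples = to_ntriples(row_to_triples(row))
print(ntriples)
```

In practice, an RDF library and a proper mapping language would likely replace the hand-written serialization; the sketch only shows the shape of the resulting triples.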
    The knowledge map comprises two types of data. One is the “concept map,”
which consists of data obtained by analyzing a large text corpus on the web.
This map shows the semantic relations between words, such as causal relations
and hyponymy relations[2][3]. The second is the “content map,” which is generated
by analyzing the summary text of each program. The analysis is composed of two
processes, topic extraction and relation estimation, which are based on TF-IDF
statistics and the semantic relations in the concept map. The content map shows
the relations between words and the associated programs.

1 http://www.bbc.co.uk/nature/wildlife
2 http://ja.dbpedia.org
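
As a rough illustration of the topic extraction step, the following sketch scores the words of each program summary with a plain TF-IDF weighting and keeps only the top-scoring words that also appear in the concept map vocabulary. The weighting formula, the whitespace tokenization, the cutoff `top_k`, and the program IDs other than nhkdwc0112 are assumptions; the paper does not give these details, and Japanese text would need a morphological analyzer instead of `split()`.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: {program_id: [token, ...]} -> {program_id: {word: score}}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for pid, tokens in docs.items():
        term_freq = Counter(tokens)
        scores[pid] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
    return scores


def extract_topics(docs, concept_words, top_k=2):
    """Keep the top_k highest-scoring words that exist in the concept map."""
    topics = {}
    for pid, word_scores in tf_idf(docs).items():
        candidates = [
            (score, word)
            for word, score in word_scores.items()
            if word in concept_words and score > 0
        ]
        # Highest score first; ties are broken alphabetically.
        candidates.sort(key=lambda item: (-item[0], item[1]))
        topics[pid] = [word for _, word in candidates[:top_k]]
    return topics


summaries = {
    "nhkdwc0112": "vinegar marinated aji salad with vinegar dressing".split(),
    "health001": "doctors explain hypertension and daily salt intake".split(),
    "travel001": "a walk along the coast with local fishermen".split(),
}
concept_words = {"vinegar", "hypertension", "salt"}
topics = extract_topics(summaries, concept_words)
print(topics)
```

Each extracted word would then become a “topic” link between a concept word and a program in the content map.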
    A structural example of a knowledge map associated with the program
information is shown in Fig. 1. The concept map comprises 28 types of relations,
with approximately 1,012 million word relations in total.
    Finally, we accumulated all schemas, instances, and knowledge map data in
an RDF store, and constructed an environment that is accessible from a variety
of services through SPARQL endpoints and a Web API. Currently, the program
information data-hub contains 1.89 million RDF triples created from
the experimental accumulation of 6,700 pieces of program data over a period of
two months.
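
To illustrate how a client service might query such a SPARQL endpoint, the sketch below builds a query for programs that have a given word as a topic and encodes it as a GET request following the SPARQL 1.1 Protocol. The endpoint URL, the `km:` namespace, and the `km:topic` predicate are hypothetical; the paper does not publish the endpoint address or the knowledge map vocabulary. The request is only constructed, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def build_topic_query(word, limit=10):
    """Return a SPARQL query listing programs whose topic is `word`."""
    # Escape characters that would break out of the string literal.
    safe = word.replace("\\", "\\\\").replace('"', '\\"')
    # km: and km:topic are invented names used only for this illustration.
    return f"""
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX km: <http://example.org/knowledge-map/>
SELECT ?program ?title WHERE {{
  ?program km:topic "{safe}" ;
           dc:title ?title .
}} LIMIT {limit}
""".strip()


def build_request_url(query):
    """Encode the query as the `query` parameter of a GET request."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = build_topic_query("Vinegar")
url = build_request_url(query)
print(url)
```

The resulting URL could be fetched with any HTTP client, typically with an `Accept` header requesting SPARQL JSON results.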
[Figure 1 content. Concept map: ConceptWord “Vinegar”, relation “Prevention”, ConceptWord “Hypertension”. Content map: ConceptWord “Vinegar”, relation “topic”, TV Program URI:nhkdwc0112, “Dining with the chef” No.112 “Vinegar-marinated Aji Salad”.]

       Fig. 1. Structure example of program information and knowledge map



4   My Health Dictionary service using program
    information data-hub
We have developed “My Health Dictionary” (Fig. 2) as a service that utilizes
the program information data-hub. My Health Dictionary is implemented as a
Google Chrome extension. When the user selects a keyword of interest
while browsing the Web, the programs associated with the word are
displayed in a popup. Fig. 2 shows an example in which the keyword “hypertension”
is selected in the text of a web page. When the user right-clicks on
the selected keyword, a knob labeled with the keyword is displayed, and
related words linked to the keyword are displayed around the knob. If the user
then selects a related word such as “prevention,” a list of programs related to
“prevention of hypertension” is displayed. By clicking a program, the
user can then watch the video of the program or browse the program web site.
Using the information from the concept map, the system is also able to list the
cooking program that introduced a recipe using “vinegar,” which is said to help
in the prevention of hypertension. Because the data-hub stores program
information about various genres, cross-sectoral services that connect
programs can be realized by utilizing program-related knowledge from
the concept map or the content map.

[Figure 2 labels: selected word; knob of “hypertension”; relation word “prevention”; list of programs, e.g., the program about the “mineral” that helps in the “prevention” of “hypertension.”]

                   Fig. 2. Screen shot of “My Health Dictionary”
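
The lookup behind this interaction can be sketched as a join between the two maps: the concept map yields words related to the selected keyword, and the content map yields the programs whose topics are those words. The in-memory data structures, the “cause” relation entry, and the placeholder program title for “mineral” are assumptions for illustration; the actual service presumably retrieves this information from the data-hub.

```python
# Concept map: (word, relation, word) entries; content map: word -> programs.
CONCEPT_MAP = [
    ("vinegar", "prevention", "hypertension"),
    ("mineral", "prevention", "hypertension"),
    ("salt", "cause", "hypertension"),  # hypothetical entry
]
CONTENT_MAP = {
    "vinegar": ["Dining with the chef No.112: Vinegar-marinated Aji Salad"],
    "mineral": ["(hypothetical health program about minerals)"],
}


def related_relations(keyword):
    """Relation words to show around the knob for the selected keyword."""
    return sorted({rel for _, rel, obj in CONCEPT_MAP if obj == keyword})


def programs_for(keyword, relation):
    """Programs whose topic word stands in `relation` to `keyword`."""
    programs = []
    for subj, rel, obj in CONCEPT_MAP:
        if rel == relation and obj == keyword:
            for title in CONTENT_MAP.get(subj, []):
                programs.append((subj, title))
    return programs


relations = related_relations("hypertension")
results = programs_for("hypertension", "prevention")
for word, title in results:
    print(f"{word}: {title}")
```

Because the join goes through shared concept words rather than genre labels, a cooking program and a health program can appear in the same result list.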


5   Conclusion and future work
In this paper, we reported on a prototype LOD-based program information
data-hub constructed by linking external knowledge with existing program
information. Further, we demonstrated an example service that uses the data-hub.
In future work, we plan to study data-hub construction and utilization in
areas such as news, culture, and education, and to establish an improved and more
sophisticated data-hub for actual service operation.


References
 1. Yves Raimond, Tristan Ferne, Michael Smethurst, Gareth Adams: The BBC World
    Service Archive prototype, Web Semantics: Science, Services and Agents on the
    World Wide Web, Vol.27–28, pp.2–9 (2014)
 2. Stijn De Saeger, Kentaro Torisawa, Jun’ichi Kazama, Kow Kuroda, Masaki
    Murata: Large Scale Relation Acquisition using Class Dependent Patterns, In
    Proceedings of the IEEE International Conference on Data Mining (ICDM’09),
    pp.764–769 (2009)
 3. Stijn De Saeger, Kentaro Torisawa, Masaaki Tsuchida, Jun’ichi Kazama, Chikara
    Hashimoto, Ichiro Yamada, Jong-Hoon Oh, Istvan Varga, Yulan Yan: Relation
    Acquisition using Word Classes and Partial Patterns, In Proceedings of the
    Conference on Empirical Methods in Natural Language Processing (EMNLP 2011),
    pp.825–835 (2011)