=Paper= {{Paper |id=Vol-1481/paper11 |storemode=property |title=Creating and Using Sports Linked Data: Analytics and Applications |pdfUrl=https://ceur-ws.org/Vol-1481/paper11.pdf |volume=Vol-1481 |dblpUrl=https://dblp.org/rec/conf/i-semantics/PhilippidesBCTV15 }} ==Creating and Using Sports Linked Data: Analytics and Applications== https://ceur-ws.org/Vol-1481/paper11.pdf
Creating and Using Sports Linked Data: Applications and Analytics

Panagiotis-Marios Philippides
OKF Greece, Thessaloniki, Greece
filippidis.okfgr@gmail.com

Charalampos Bratsas
Mathematics Department, Aristotle University of Thessaloniki; OKF Greece, Thessaloniki, Greece
charalampos.bratsas@okfn.com

Andreas Veglis
School of Journalism & Mass Communications, Aristotle University of Thessaloniki
veglis@jour.auth.gr

Evangelos Chondrokostas
Mathematics Department, Aristotle University of Thessaloniki
echondrok@gmail.com

Dimitra Tsigari
Mathematics Department, Aristotle University of Thessaloniki
dimitra.tsi@gmail.com

Ioannis Antoniou
Mathematics Department, Aristotle University of Thessaloniki
iantonio@math.auth.gr

ABSTRACT
Linked data have made significant progress over the last few years, and many kinds of datasets are being transformed into this format at a rapidly increasing rate, contributing to the openness, connectivity and re-use of web data. However, this progress has not reached a popular sport like basketball, at least as far as raw statistics are concerned. This kind of data contains valuable information that can be used by fans, teams and coaches, statisticians and other scientists. In this work, statistical data from Euroleague are transformed into linked data, thereby filling the relevant gap in the LOD Cloud. Ways of exploiting them are also presented, from applications for fans, like the Euroleague Timeline, to cases of complex processing, analysis and visualization of data through software like R.

Categories and Subject Descriptors
[World Wide Web]: Web data description languages – Resource Description Framework (RDF)
[Information Retrieval]: Document representation – Document structure, Ontologies
[Probability and Statistics]: Statistical paradigms – Statistical Graphics, Exploratory Data Analysis

General Terms
Measurement, Design, Experimentation.

Keywords
Sports Open Data, Linked Data, Analytics, Data Visualizations

1. INTRODUCTION
Sports data can be valuable not only to anyone involved in sports, but also to the scientific community, because the statistical information they include can describe in detail what has happened in a game. Processing, modeling and visualization of these data can benefit areas such as sports analysis, either from the perspective of the players and their performance or from the perspective of coaches and their tactics. They can also support inference about identifying the best players and teams, detecting the sport's important elements, or predicting game situations, performance and results, using methods that can potentially then be used in other scientific fields.

A typical example is basketball, a sport full of statistics that can largely be represented through the boxscore, a table containing the performance in every statistical category for every player and team of a game. This amount of statistical data is large enough to support very precise and detailed analytics about the sport. However, basketball data are not usually available in large quantities, at least in their raw form, so researchers in the field devote a very large part of their time to searching for these data, which are then not easy to share or to link with similar data. This generates the need to transform such data into linked data form, which is the primary subject of this work, using statistics from Euroleague, the top European basketball competition for clubs.

The benefits of the semantic enrichment of these data are clear, since the openness of large volumes of structured data is valuable not only to coaches, statisticians and other scientists of basketball, who could have easy and direct access to data relevant to their job, but also to a large audience such as basketball fans, who could make in-depth analyses of their favourite sport on their own. The statistical nature of these data increases the value of their openness, since they can be processed in many ways, by anyone interested, to lead to further inference about the sport. The linked data technologies themselves include means by which such information can be easily used and processed.

Besides the statistics of boxscores, additional data relating to the games of the competition, such as the court and the date they took place, have been transformed into linked data form too. This kind of information is essential and can link basketball data to other LOD datasets in many ways, so that further information about games, teams or players can be reached and retrieved. This connection complements the statistical information and enriches the provided knowledge, while a common way of modeling such data facilitates the comparison of similar data and increases their processing, analysis and visualization capabilities in favor of every stakeholder of the sport. Additionally, it provides a complete informational source for creating engaging applications for basketball fans and in turn encourages initiatives to create and make use of linked data, thus benefiting the LOD cloud itself with further data enrichment and linkage. One such




                                                       Figure 1. System Architecture

application is the Euroleague Timeline, presented below; some examples of data analytics and visualizations of basketball data are also introduced, as the second subject of this work.

The entire system architecture is shown in Figure 1. The first stage covers data retrieval and processing, while the second stage involves the creation of linked data. These data can then be used in many ways, such as the Euroleague Timeline or analytics and visualizations through software like R, while they are also linked with the LOD Cloud, and especially DBpedia.

2. EUROLEAGUE LINKED DATA
The whole procedure of creating Euroleague linked data from raw statistics is presented in this section.

2.1 Creating the RDF Graphs
The data were first obtained through a handcrafted extraction from the official Euroleague website. Basic data of the games were stored directly in databases, while boxscore statistics were initially saved in text files to undergo the necessary processing. The cleaning and filtering of the extracted boxscores involved tasks such as inserting delimiter characters between the statistics, replacing the "team" value with the corresponding team name, filling empty values with zeros, and splitting each shooting column (e.g. 10/11 2FG means 10 two-pointers made out of 11 attempted). After that, the statistics from the text files were stored in the databases too. A separate database has been created for each season of the competition. The main tables of a database are the boxscores table, containing the statistical information, so that each row is a statline of a player or a team in a Euroleague game, and the schedule table, containing the basic data of every game, such as time and date. Additional key tables have been created for the teams, the players, the phases of the competition, the groups of teams and the courts. The structure and data of all databases were then stored in a Virtuoso Server.

The first step in transforming Euroleague data into linked data was the creation of an ontology, under which the mapping of relational data to RDF would take place. The basic classes of the ontology are conceptually related to the main database tables, like the Statline class, whose properties are similar to the columns of the boxscores table, namely statistics such as the points of the player or team in a game. The Statline class has additional properties, such as the game and the week in which the statline was recorded. The same holds for the Game class, containing properties such as the teams of a game, the final score, the date and time, the court and the week it took place. Other basic classes are the Phase class, containing the name of the phase and its starting and ending week, the Group class, containing teams and the phase of the competition, the Team class, with the names of teams, their players and their courts, the Player class, which contains the names of players and the teams they are part of, and the Court class, including the name and the geo-coordinates of the court. The whole ontology schema is illustrated in Figure 2.

Based mainly on this ontology, but also using additional ontologies such as foaf¹, skos², event³ and timeline⁴, the mapping of relational data to RDF was made, leveraging the quad map patterns of Virtuoso. A separate RDF graph has been created for each season of the competition.

Updating the data with new games and statistics requires repeating almost the whole procedure, only for the new data: the data retrieval and processing tasks, the insertion into the database (although different PHP files have to be executed in order to update the database) and, finally, the recreation of the year's RDF graph.

2.2 Datacube Integration
The next step is to integrate the Datacube vocabulary⁵, the most appropriate ontology for statistical data. This process is a work in progress; the structure of the cube, however, has largely been defined. Most of the dimensions of the cube concern the games data (date, time, teams, court, etc.), but there is an extra dimension, the player dimension, referring to the subject of the statistics recorded in the game that is identified by the other dimensions. Since the statistics of the boxscore (points, rebounds, turnovers, etc.) have been defined as the measures of the cube, each observation is a statline of a player or a team, defined by the game in which it was recorded. There are additional attributes in the cube that provide supplementary information, such as the player's team in a game, his jersey number and the unit of measure, which is defined separately for every statistical measure.

¹ http://xmlns.com/foaf/0.1/
² http://www.w3.org/2004/02/skos/core#
³ http://purl.org/NET/c4dm/event.owl#
⁴ http://purl.org/NET/c4dm/timeline.owl#
⁵ http://purl.org/linked-data/cube#
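To make the steps of Sections 2.1 and 2.2 more concrete, the following is a minimal Python sketch, not the authors' implementation (the paper mentions PHP scripts and Virtuoso quad map patterns). It cleans one raw boxscore row (team-name replacement, zero filling, splitting "made/attempted" shooting cells) and renders the result as a Data Cube observation in Turtle. The column names, the `el:` namespace, the `example.org` URIs and the property names are all assumptions introduced for illustration; only the `qb:` namespace comes from the paper.

```python
import re

# Hypothetical raw boxscore cells for one row; column names and values are
# illustrative and not taken from the Euroleague website.
RAW_ROW = {"player": "team", "points": "78", "2FG": "10/11", "3FG": "", "rebounds": ""}

# Assumed mapping from shooting columns to measure name prefixes.
SHOT_COLUMNS = {"2FG": "fg2", "3FG": "fg3", "FT": "ft"}
SHOT_PATTERN = re.compile(r"^(\d+)/(\d+)$")


def clean_statline(raw, team_name):
    """Apply the cleaning steps described in Section 2.1 to one row."""
    clean = {}
    for column, value in raw.items():
        value = value.strip()
        if column == "player":
            # Team totals are labelled "team"; replace with the team's name.
            clean["player"] = team_name if value.lower() == "team" else value
        elif column in SHOT_COLUMNS:
            # "10/11" -> 10 made, 11 attempted; an empty cell becomes 0 and 0.
            match = SHOT_PATTERN.match(value)
            made, attempted = match.groups() if match else ("0", "0")
            key = SHOT_COLUMNS[column]
            clean[key + "_made"] = int(made)
            clean[key + "_attempted"] = int(attempted)
        else:
            # Any other empty statistic is filled with zero.
            clean[column] = int(value) if value else 0
    return clean


def to_observation(statline, game_id, base="http://example.org/euroleague/"):
    """Render a cleaned statline as a qb:Observation in Turtle (assumed URIs)."""
    subject = f"<{base}obs/{game_id}/{statline['player'].replace(' ', '_')}>"
    lines = [
        "@prefix qb: <http://purl.org/linked-data/cube#> .",
        f"@prefix el: <{base}ontology#> .",
        "",
        f"{subject} a qb:Observation ;",
        f"    qb:dataSet <{base}dataset/boxscores> ;",
        f"    el:game <{base}game/{game_id}> ;",       # game dimension
        f'    el:player "{statline["player"]}" ;',     # player dimension
    ]
    measures = [(k, v) for k, v in statline.items() if k != "player"]
    for i, (name, value) in enumerate(measures):
        terminator = " ." if i == len(measures) - 1 else " ;"
        lines.append(f"    el:{name} {value}{terminator}")
    return "\n".join(lines)


cleaned = clean_statline(RAW_ROW, "Team A")
turtle = to_observation(cleaned, "g001")
print(turtle)
```

In a fuller model the player would be a resource rather than a literal, and the dimensions, measures and attributes would be declared in a data structure definition; the sketch only illustrates how one statline corresponds to one observation.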

                                                               Figure 2. Ontology Schema
The dimensions of the cube may have as range the classes of the ontology that has already been created and thus carry, in this way, the properties of those classes beyond what is defined in the cube itself. Some code lists have been created for specific dimensions such as the season, the week, the phase, the group and the time dimension. Meanwhile, some slices that are likely to be used refer to players, teams, weeks and games, with each slice containing the relevant information. Finally, all the above concepts have been defined as SKOS concepts, according to the Datacube specifications.

3. APPLICATIONS AND ANALYTICS
The Euroleague linked data that have been created can be exploited in many ways, such as applications and analytics, as presented below.

3.1 Euroleague Timeline
An application that makes the most of this work and of the advantages of linked data is the Euroleague Timeline, a timeline of the results of Euroleague games, containing information both on the basic data of the games and on their boxscores. The Euroleague Timeline involves two types of timelines: the team timeline, including all the games of a team in the competition, and the season timeline, containing all the games of a season. In either case, games are displayed in chronological order, and after selecting a season or a team the user can navigate through the games either consecutively or by selection, via the time bar featured by the Timeline, which contains all the games of the season or the team.

The Euroleague Timeline is based on TimelineJS, which loads JSON files to get and display the information, so that was the file format needed to extract data from Virtuoso. A separate JSON file has been created for every team and every season, and the files can be updated with new games data. The information stored in each JSON file and appearing in the timeline is retrieved through a series of queries, both on the SPARQL endpoint of Virtuoso containing the Euroleague data and on the DBpedia endpoint, to extract further information on players and teams. The final result contains all the relevant information (basic data, statistics, additional data from DBpedia) and is shown in Figure 3. The application is online at wiki.el.dbpedia.org/apps/Euroleague.

3.2 Data Visualizations
The plethora of statistical data that has been transformed into linked data is suitable for data processing and analytics, and this task was carried out through R Studio, which can make SPARQL queries to any endpoint through its packages. The retrieved data may then undergo any mathematical processing and visualization. Extra visualization capabilities are enabled through the R Shiny package, via its widgets, while the R Shiny Dashboard allows handling many interacting visualizations simultaneously, thus serving as a complete information visualization framework that can also utilize other applications, such as Google Vis, to display information.

Using this technology, some visualizations exploiting Euroleague linked data have been generated, providing useful information and insights for basketball fans or even coaches, such as:

• Table of rosters of teams for each season with the average statistics of the players
• Points and shots distributions of teams along with their shooting percentages
• Relations between turnovers and points and between fouls and points for the teams of a game, for all games of a season
• Graph of the results of teams in the competition
• Players comparison via diagrams, based on their average statistics
• Map containing the teams of each season and each phase of the competition, with additional information from DBpedia
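The retrieval-then-aggregation pattern behind the first visualization in Section 3.2 (average statistics per player) can be sketched as follows. The paper performs this step in R; Python is used here purely for illustration. The endpoint URL, the `el:` namespace and the property names are hypothetical, and the response is a small in-memory sample in the standard SPARQL JSON results layout rather than real Euroleague data, so no network access is needed.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary; the paper does not list them.
ENDPOINT = "http://example.org/sparql"
QUERY = """
PREFIX el: <http://example.org/euroleague/ontology#>
SELECT ?player ?points WHERE {
  ?statline a el:Statline ;
            el:player ?player ;
            el:points ?points .
}
"""


def request_url(endpoint, query):
    """Build a GET URL with the 'query' parameter of the SPARQL protocol.

    A client would normally also send an Accept header asking for
    application/sparql-results+json.
    """
    return endpoint + "?" + urlencode({"query": query})


# Made-up sample response in the SPARQL JSON results format.
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["player", "points"]},
    "results": {"bindings": [
        {"player": {"type": "literal", "value": "Player A"},
         "points": {"type": "literal", "datatype": XSD_INT, "value": "12"}},
        {"player": {"type": "literal", "value": "Player A"},
         "points": {"type": "literal", "datatype": XSD_INT, "value": "20"}},
        {"player": {"type": "literal", "value": "Player B"},
         "points": {"type": "literal", "datatype": XSD_INT, "value": "7"}},
    ]},
})


def average_points(response_text):
    """Average the ?points binding per ?player over all result rows."""
    totals = defaultdict(lambda: [0.0, 0])  # player -> [sum, count]
    for binding in json.loads(response_text)["results"]["bindings"]:
        entry = totals[binding["player"]["value"]]
        entry[0] += float(binding["points"]["value"])
        entry[1] += 1
    return {player: total / count for player, (total, count) in totals.items()}


url = request_url(ENDPOINT, QUERY)
averages = average_points(SAMPLE_RESPONSE)
print(averages)
```

The same aggregation could also be pushed into the query itself with `AVG` and `GROUP BY`; computing it client-side keeps the raw statlines available for other plots.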




3.3 Further Analytics
Besides these visualizations, Euroleague statistics are used for further mathematical processing and analysis in order to examine measures, ratings and relations that could yield useful results. Some examples already carried out on these data in this work are:

• Research on the teamwork of teams and its relation to their success, relying on categories like assists, points and turnovers
• Research on individual defensive actions and on relations between steals, blocks and fouls, along with predicting the number of steals and blocks of a player given his fouls and whether he plays at home or away
• Evaluating the best players per position on the basis of normalized equations of their statistics
• Creating and analyzing a network of players who have played in Euroleague
• Research on the correlation of the shooting percentages of the two teams of a game with the final result and their points difference
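As an illustration of the last item, one simple way to examine the relation between shooting percentages and the points difference is a Pearson correlation over games. The sketch below is an assumption about how such an analysis might look, not the method used in this work, and the game rows are toy values rather than Euroleague results.

```python
from math import sqrt


def shooting_pct(made, attempted):
    """Field goal percentage as a fraction; 0.0 when nothing was attempted."""
    return made / attempted if attempted else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy games, NOT real data:
# (home made, home attempted, away made, away attempted, home pts, away pts)
GAMES = [
    (30, 60, 25, 62, 82, 70),
    (24, 58, 28, 59, 68, 77),
    (27, 61, 27, 60, 75, 74),
    (33, 63, 22, 57, 90, 65),
]

# Difference in shooting percentage and in points, home minus away, per game.
pct_diff = [shooting_pct(hm, ha) - shooting_pct(am, aa)
            for hm, ha, am, aa, _, _ in GAMES]
pts_diff = [hp - ap for _, _, _, _, hp, ap in GAMES]

r = pearson(pct_diff, pts_diff)
print(f"correlation between shooting % difference and points difference: {r:.3f}")
```

With real data, the per-game rows would come from the statlines of the two teams of each game, retrieved from the season's RDF graph.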

Figure 3. A game in the Euroleague Timeline

4. FUTURE WORK
A large volume of basketball data has been transformed into linked data; however, it could be further enriched, especially with the play-by-plays of games, which contain all the statistically recorded actions of the players in chronological order. This would significantly increase the information processing, analysis and visualization capabilities. The examples that have been made so far in this work are only the beginning, and there are still countless topics on these data that can be explored, as well as many other ways of analysis. Their combination is the step forward and can lead to applications and results that will reveal and provide additional knowledge on basketball, which would be readily accessible to every fan, through tools that leverage linked data.

5. CONCLUSION
Basketball linked data offer a variety of possibilities in the sports, statistical and scientific fields, because of their large volume of statistics. This work transforms Euroleague basketball data into linked data to enrich the LOD Cloud with valuable sports statistics and to utilize these data in various ways, such as creating applications like the Euroleague Timeline, or processing and analyzing these statistics through R to draw useful inference and display the corresponding diagrams, thus demonstrating the enormous range of capabilities offered by linked data in a sport that is full of statistical information.



