=Paper=
{{Paper
|id=Vol-1481/paper11
|storemode=property
|title=Creating and Using Sports Linked Data: Analytics and Applications
|pdfUrl=https://ceur-ws.org/Vol-1481/paper11.pdf
|volume=Vol-1481
|dblpUrl=https://dblp.org/rec/conf/i-semantics/PhilippidesBCTV15
}}
==Creating and Using Sports Linked Data: Analytics and Applications==
Creating and Using Sports Linked Data: Applications and Analytics

Panagiotis-Marios Philippides, OKF Greece, Thessaloniki, Greece, filippidis.okfgr@gmail.com

Charalampos Bratsas, Mathematics Department, Aristotle University of Thessaloniki, OKF Greece, charalampos.bratsas@okfn.com

Andreas Veglis, School of Journalism & Mass Communications, Aristotle University of Thessaloniki, Thessaloniki, Greece, veglis@jour.auth.gr

Evangelos Chondrokostas, Mathematics Department, Aristotle University of Thessaloniki, echondrok@gmail.com

Dimitra Tsigari, Mathematics Department, Aristotle University of Thessaloniki, dimitra.tsi@gmail.com

Ioannis Antoniou, Mathematics Department, Aristotle University of Thessaloniki, iantonio@math.auth.gr

ABSTRACT

Linked data have made significant progress over the last few years, and many kinds of datasets are being transformed into this format at a rapidly increasing rate, contributing to the openness, connectivity and re-use of web data. However, this progress has not reached a popular sport like basketball, at least as far as raw statistics are concerned. This kind of data contains valuable information that can be used by fans, teams and coaches, statisticians and other scientists. In this work, statistical data from Euroleague are transformed into linked data, thereby filling the relevant gap in the LOD Cloud. Ways of exploiting them are also presented, from fascinating applications for fans, like the Euroleague Timeline, to cases of complex processing, analysis and visualization of data through software like R.

Categories and Subject Descriptors

[World Wide Web]: Web data description languages – Resource Description Framework (RDF)

[Information Retrieval]: Document representation – Document structure, Ontologies

[Probability and Statistics]: Statistical paradigms – Statistical Graphics, Exploratory Data Analysis

General Terms

Measurement, Design, Experimentation.

Keywords

Sports Open Data, Linked Data, Analytics, Data Visualizations

1. INTRODUCTION

Sports data can be valuable not only to anyone involved in sports, but also to the scientific community, because the statistical information they include can describe in detail what has happened in a sports game. Processing, modeling and visualization of these data can benefit areas such as sports analysis, either from the perspective of the players and their performance or from the perspective of coaches and their tactics. They can also support inference about the best players and teams, the detection of the sport's important elements, or the prediction of game situations, performance and results, in ways and with methods that can potentially be used in other scientific fields.

A typical example is basketball, a sport full of statistics that can largely be represented through the boxscore, a table containing the performance in every statistical category for every player and team of a game. This amount of statistical data is large enough to support very precise and detailed analytics about the sport. However, basketball data are not usually available in large quantities, at least in their raw form, and this leads researchers in the field to devote a very large part of their time to searching for these data, which are then not easy to share or to link with similar data. This creates the need to transform such data into linked data form, and this is the primary subject of this work, using statistics from Euroleague, the top European basketball competition for clubs.

The benefits of the semantic enrichment of these data are clear, since the openness of large volumes of structured data is valuable not only to coaches, statisticians and other scientists of basketball, who could have easy and direct access to data relevant to their job, but also to a large audience such as basketball fans, who could make in-depth analyses of their favourite sport on their own. The statistical nature of these data increases the value of their openness, since they can be processed in many ways by anyone interested, leading to further inference about the sport. The linked data technologies themselves include means by which such information can be easily used and processed.

Besides the boxscore statistics, additional data relating to the games of the competition, such as the court and the date on which they took place, have been transformed into linked data form too. This kind of information is essential and can link basketball data to other LOD datasets in many ways, so that further information about games, teams or players can be reached and retrieved. This connection complements the statistical information and enriches the provided knowledge, while a common way of modeling such data benefits the comparison of similar data and increases their processing, analysis and visualization capabilities in favor of every stakeholder of the sport. Additionally, it provides a complete informational source for creating fascinating applications for basketball fans and in turn encourages initiatives to create and make use of linked data, thus benefiting the LOD Cloud itself with further data enrichment and linkage. One such
application is the Euroleague Timeline, presented below, while some examples of data analytics and visualizations of basketball data are introduced as the second subject of this work.

The entire system architecture is shown in Figure 1. The first stage is about data retrieval and processing, while the second stage involves the creation of linked data. These data can then be used in many ways, like the Euroleague Timeline or analytics and visualizations through software like R, while they are also linked with the LOD Cloud and especially DBpedia.

Figure 1. System Architecture

2. EUROLEAGUE LINKED DATA

The whole procedure of creating Euroleague linked data from raw statistics is presented in this section.

2.1 Creating the RDF Graphs

Initially, data were extracted from the official website of Euroleague through a handcrafted process. Basic data of the games were stored directly in databases, while boxscore statistics were initially saved in text files to undergo the necessary processing. The cleaning and filtering of the extracted boxscores involved tasks such as inserting delimiter characters between the statistics, replacing the "team" value with the corresponding team name, filling empty values with zeros and splitting each shooting column into made and attempted shots (e.g. 10/11 2FG means 10 two-pointers made and 11 two-pointers attempted). After that, the statistics in the text files were stored in databases too. For each season of the competition a separate database has been created. The main tables of a database are the boxscores table, containing the statistical information, so that each row of the table is a statline of a player or a team in a Euroleague game, and the schedule table, containing the basic data of every game, such as time and date. Additional key tables have been created for the teams, the players, the phases of the competition, the groups of teams and the courts. The structure and data of all databases were then stored in a Virtuoso Server.

The first step in transforming Euroleague data into linked data was the creation of an ontology, under which the mapping of relational data to RDF would take place. The basic classes of the ontology are conceptually related to the main database tables, like the Statline class, whose properties are similar to the columns of the boxscores table, namely statistics such as the points of the player or team in a game. The Statline class has additional properties, such as the game and the week in which the statline was recorded. The same holds for the Game class, containing properties such as the teams of a game, the final score, the date and time, the court and the week in which it took place. Other basic classes are the Phase class, containing the name of the phase and its starting and ending week, the Group class, containing teams and the phase of the competition, the Team class, with the names of teams, their players and their courts, the Player class, which contains the names of players and the teams they are part of, and the Court class, including the name and the geo-coordinates of the court. The whole ontology schema is illustrated in Figure 2.

Figure 2. Ontology Schema

Based mainly on this ontology, but also using additional ontologies such as foaf (http://xmlns.com/foaf/0.1/), skos (http://www.w3.org/2004/02/skos/core#), event (http://purl.org/NET/c4dm/event.owl#) and timeline (http://purl.org/NET/c4dm/timeline.owl#), the mapping of relational data to RDF was made, leveraging the quad map patterns of Virtuoso. A separate RDF graph has been created for each season of the competition.

Updating the data with new games and statistics requires repeating almost the whole procedure, only for the new data: the data retrieval and processing tasks, the insertion into the database (although different PHP files have to be executed in order to update the database) and finally the recreation of the year's RDF graph.

2.2 Datacube Integration

The next step is to integrate the Datacube vocabulary (http://purl.org/linked-data/cube#), the most appropriate ontology for statistical data. This process is a work in progress; the structure of the cube, however, has largely been defined. Most of the dimensions of the cube concern the game data (date, time, teams, court, etc.), but there is an extra dimension, the player dimension, referring to the subject of the statistics recorded in the game that is identified by the other dimensions. Since the statistics of the boxscore (points, rebounds, turnovers, etc.) have been defined as the measures of the cube, each observation is a statline of a player or a team, defined by the game in which it was recorded. There are additional attributes in the cube that provide supplementary information, such as the player's team in a game, his jersey number and the unit of measure, which is defined separately for every statistical measure.
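
The path from a raw boxscore row to a cube observation, as described in Sections 2.1 and 2.2, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation (the paper relies on Virtuoso quad map patterns for the RDF mapping). The column labels, the `example.org` namespace, the property names and the team name "Example BC" are all assumptions made for the example; only the `qb:` namespace comes from the text.

```python
import json

QB = "http://purl.org/linked-data/cube#"   # Datacube namespace (from the text)
EX = "http://example.org/euroleague/"      # hypothetical namespace

SHOOTING_COLUMNS = ("2FG", "3FG", "FT")    # assumed "made/attempted" columns


def clean_row(raw, team_name):
    """Apply the Section 2.1 cleaning steps to one row of raw strings."""
    row = {}
    for column, value in raw.items():
        value = value.strip()
        if column == "player":
            # Team totals are labelled "team"; use the real team name instead.
            row[column] = team_name if value.lower() == "team" else value
        elif column in SHOOTING_COLUMNS:
            # "10/11" -> made=10, attempted=11; an empty cell becomes 0/0.
            made, _, attempted = (value or "0/0").partition("/")
            row[column + "_made"] = int(made or 0)
            row[column + "_attempted"] = int(attempted or 0)
        else:
            # Empty numeric cells are filled with zero.
            row[column] = int(value) if value else 0
    return row


def to_observation(row, game_id, obs_id):
    """Serialise a cleaned row as a Turtle snippet for one qb:Observation."""
    pairs = [
        ("a", f"<{QB}Observation>"),
        (f"<{EX}game>", f"<{EX}game/{game_id}>"),       # game dimension
        (f"<{EX}player>", json.dumps(row["player"])),   # player dimension
    ]
    # Every remaining column is treated as a measure of the cube.
    pairs += [
        (f"<{EX}{name}>", str(value))
        for name, value in row.items()
        if name != "player"
    ]
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"<{EX}obs/{obs_id}>\n    {body} ."


# Hypothetical raw row: a team total with one empty shooting cell.
raw = {"player": "team", "PTS": "78", "2FG": "10/11", "3FG": "", "REB": ""}
row = clean_row(raw, team_name="Example BC")
print(row)
print(to_observation(row, game_id="g1", obs_id="1"))
```

In a fuller model the player and game would be resources typed with the ontology classes rather than a plain literal and an ad hoc IRI, and the measures would be declared in a data structure definition; the sketch only shows how one statline lines up with one observation.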

The dimensions of the cube may have as range the classes of the ontology that has already been created and thus carry, in this way, its properties in addition to those of the cube, while code lists have been created for specific dimensions such as the season, the week, the phase, the group and the time dimension. Meanwhile, some slices that are likely to be used refer to players, teams, weeks and games, with each slice containing the relevant information. Finally, all the above concepts have been defined as skos concepts, according to the Datacube specifications.

3. APPLICATIONS AND ANALYTICS

The Euroleague linked data that have been created can be exploited in many ways, such as applications and analytics, as presented below.

3.1 Euroleague Timeline

An application that makes the most of this work and of the advantages of linked data is the Euroleague Timeline, a timeline of the results of Euroleague games, containing information both on the basic data of the games and on their boxscores. The Euroleague Timeline involves two types of timelines: the team timeline, including all the games of a team in the competition, and the season timeline, containing all the games of a season. In both cases, games are displayed as a time series and, after selecting a season or a team, the user can navigate through the games either consecutively or by selection, via the special time bar featured by the Timeline, which contains all the games of the season or the team.

The Euroleague Timeline is based on TimelineJS, which loads JSON files to get and display the information, so JSON was the file format needed when extracting data from Virtuoso. A separate JSON file has been created for every team and every season, and these files can be updated with new game data. The information stored in each JSON file and appearing in the timeline is retrieved through a series of queries, both on the SPARQL endpoint of the Virtuoso server containing the Euroleague data and on the DBpedia endpoint, to extract further information on players and teams. The final result contains all the relevant information (basic data, statistics, additional data from DBpedia) and is shown in Figure 3. The application is online at wiki.el.dbpedia.org/apps/Euroleague.

3.2 Data Visualizations

The plethora of statistical data that has been transformed into linked data is suitable for data processing and analytics. This task was carried out through R Studio, which can make SPARQL queries to any endpoint through its packages. The retrieved data may then undergo any mathematical processing and visualization. Extra visualization capabilities are enabled through the R Shiny package, via its widgets, while the R Shiny Dashboard allows handling many interacting visualizations simultaneously, thus serving as a complete information visualization framework that can also utilize other applications, such as Google Vis, to display information.

Using this technology, some visualizations exploiting Euroleague linked data have been generated, providing useful information and insights for basketball fans or even coaches, such as:

* Table of rosters of teams for each season with the average statistics of the players
* Points and shots distributions of teams along with their shooting percentages
* Relations between turnovers and points and between fouls and points for the teams of a game, for all games of a season
* Graph of the results of teams in the competition
* Players comparison via diagrams, based on their average statistics
* Map containing the teams of each season and each phase of the competition, with additional information from DBpedia
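
The query-then-aggregate flow behind a table such as the per-player averages can be sketched as below. The paper performs this step in R; Python is used here only to illustrate the data flow. The endpoint URL, the `ex:` namespace, the property names and the sample response are assumptions made for the example. The response layout follows the standard SPARQL JSON results format, and the sketch assumes every result row binds every variable.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# Hypothetical query: one row per statline, with the player and two statistics.
QUERY = """
PREFIX ex: <http://example.org/euroleague/>
SELECT ?player ?points ?rebounds WHERE {
  ?statline a ex:Statline ;
            ex:player ?player ;
            ex:points ?points ;
            ex:rebounds ?rebounds .
}
"""


def request_url(endpoint, query):
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def average_per_player(results):
    """Average every non-player variable per player from SPARQL JSON results."""
    totals = defaultdict(lambda: defaultdict(float))
    games = defaultdict(int)
    for binding in results["results"]["bindings"]:
        player = binding["player"]["value"]
        games[player] += 1
        for var, cell in binding.items():
            if var != "player":
                totals[player][var] += float(cell["value"])
    return {
        player: {var: total / games[player] for var, total in stats.items()}
        for player, stats in totals.items()
    }


# Invented sample response, used instead of a live request.
SAMPLE = json.loads("""
{"head": {"vars": ["player", "points", "rebounds"]},
 "results": {"bindings": [
   {"player": {"type": "literal", "value": "Player A"},
    "points": {"type": "literal", "value": "10"},
    "rebounds": {"type": "literal", "value": "4"}},
   {"player": {"type": "literal", "value": "Player A"},
    "points": {"type": "literal", "value": "20"},
    "rebounds": {"type": "literal", "value": "6"}},
   {"player": {"type": "literal", "value": "Player B"},
    "points": {"type": "literal", "value": "7"},
    "rebounds": {"type": "literal", "value": "3"}}
 ]}}
""")

print(request_url(ENDPOINT, QUERY))
print(average_per_player(SAMPLE))
```

The resulting dictionary maps each player to the mean of each statistic, which is the shape a roster table or a player comparison diagram would consume.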

3.3 Further Analytics

Besides these visualizations, Euroleague statistics are used for further mathematical processing and analysis in order to examine measures, ratings and relations that could yield useful results. Some examples already carried out on these data in this work are:

* Research on the teamwork of teams and its relation to their success, relying on categories like assists, points and turnovers
* Research on individual defensive actions and on relations between steals, blocks and fouls, along with predicting the number of steals and blocks of a player given his fouls and the court he plays at (home/away)
* Evaluating the best players per position on the basis of normalized equations of their statistics
* Creating and analyzing a network of players who have played in Euroleague
* Research on the correlation of the shooting percentages of the two teams of a game with the final result and their points difference
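
As an illustration of the last item, the relation between shooting percentages and the points difference could be examined with a Pearson correlation, as in the sketch below. The paper does not state which measure was used, so the choice of Pearson correlation is an assumption, and the game rows are invented for the example rather than taken from Euroleague results.

```python
from math import sqrt


def shooting_pct(made, attempted):
    """Shooting percentage in [0, 1]; 0.0 when no shot was attempted."""
    return made / attempted if attempted else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical games: (home made, home attempted, away made, away attempted,
#                      home points, away points)
GAMES = [
    (30, 60, 25, 62, 82, 70),
    (24, 58, 28, 59, 68, 77),
    (33, 64, 27, 61, 90, 74),
    (26, 60, 26, 57, 71, 73),
]

# Gap in shooting percentage and gap in points, both as home minus away.
pct_gap = [
    shooting_pct(hm, ha) - shooting_pct(am, aa)
    for hm, ha, am, aa, _, _ in GAMES
]
points_gap = [hp - ap for *_, hp, ap in GAMES]

r = pearson(pct_gap, points_gap)
print(f"r = {r:.3f}")
```

With real data the two lists would be filled from the statlines of both teams in every game of a season, and a value of r close to 1 would indicate that the better-shooting team tends to win by a larger margin.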

4. FUTURE WORK

A large volume of basketball data has been transformed into linked data; however, it could be further enriched, especially with the play-by-plays of games, which contain all the statistically recorded actions of the players in chronological order. This would significantly increase the information processing, analysis and visualization capabilities. The examples made so far in this work are only the beginning, and there are still countless topics on these data that can be explored, as well as many other ways of analysis. Their combination is the step forward and can lead to applications and results that will reveal and provide additional knowledge on basketball, which would be readily accessible to every fan through tools that leverage linked data.

Figure 3. A game in the Euroleague Timeline

5. CONCLUSION

Basketball linked data offer a variety of possibilities in the sports, statistical and scientific fields, because of their large volume of statistics. This work transforms Euroleague basketball data into linked data to enrich the LOD Cloud with valuable sports statistics and to utilize these data in various ways, such as creating fascinating applications, like the Euroleague Timeline, or processing and analyzing these statistics through R to draw useful inference and display the corresponding diagrams, thus demonstrating the enormous range of capabilities offered by linked data in a sport that is full of statistical information.