<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">SKYSHARK: A Benchmark with Real-world Data for Line-rate Stream Processing with FPGAs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Maximilian</forename><surname>Langohr</surname></persName>
							<email>maximilian.langohr@fau.de</email>
							<affiliation key="aff0">
								<orgName type="department">Chair of Computer Science</orgName>
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)</orgName>
								<address>
									<addrLine>Martensstraße 3</addrLine>
									<postCode>91058</postCode>
									<settlement>Erlangen</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tim</forename><surname>Vogler</surname></persName>
							<email>tim.vogler@fau.de</email>
							<affiliation key="aff0">
								<orgName type="department">Chair of Computer Science</orgName>
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)</orgName>
								<address>
									<addrLine>Martensstraße 3</addrLine>
									<postCode>91058</postCode>
									<settlement>Erlangen</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Klaus</forename><surname>Meyer-Wegener</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Chair of Computer Science</orgName>
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)</orgName>
								<address>
									<addrLine>Martensstraße 3</addrLine>
									<postCode>91058</postCode>
									<settlement>Erlangen</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">SKYSHARK: A Benchmark with Real-world Data for Line-rate Stream Processing with FPGAs</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">937CEB796A6C8166ECC26653CB90A1D1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>benchmark</term>
					<term>stream-processing-system</term>
					<term>FPGA</term>
					<term>hardware acceleration</term>
					<term>real-world data</term>
					<term>air-traffic control</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>To test and evaluate a heterogeneous stream-processing system consisting of an FPGA-based system-on-chip and a host, we develop a benchmark called SKYSHARK. It uses real-world data from air-traffic control that is publicly available. These data are enhanced for the purpose of the benchmark without changing their characteristics. They are further enriched with aircraft and airport data. We define 14 queries with respect to the particular requirements of our system. They should be useful for other hardware-accelerated platforms as well. A first evaluation has been done using Apache Flink. We envision a great potential because of the flexibility of the approach.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The ReProVide project <ref type="bibr" target="#b0">[1]</ref>, which is part of the DFG Priority Program SPP 2037 "Scalable Data Management for Future Hardware", engages an FPGA-based system-on-chip (SoC) to accelerate database queries that analyze large data sets. In order to test the system, queries from well-known database benchmarks such as TPC-DS have been used <ref type="bibr" target="#b1">[2]</ref>. In the second phase of the project, however, the scope has been extended to include stream-processing queries as well. Here, the field of benchmarks is more diverse. We selected queries from the Yahoo Streaming Benchmark <ref type="bibr" target="#b2">[3]</ref> and from RIoTBench <ref type="bibr" target="#b3">[4]</ref> in some evaluations, but they did not match the specifics of our accelerator well enough. While the processing can be done at line-rate, some of the stream-query operators (e. g., sort, join) cannot be implemented easily on an FPGA-based system and thus must be executed on the host system that runs a full stream-processing system (SPS). On the other hand, some operators can be integrated into a single accelerator on the FPGA, e. g., parsing, projection, and filter. The need to transfer data between the SoC and the host calls for early filtering. So we need a number of queries on data streams that are suitable for such a combined, heterogeneous SPS. Modifying one of the existing benchmarks turned out to be more effort than defining a new, perfectly matching one from scratch. This gave rise to the idea of using public flight data together with the queries that are already in use to analyze and visualize them. Hence, we designed a benchmark called SKYSHARK.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Real-Time Aircraft Tracking Data</head><p>Benchmarks face a significant challenge in obtaining sufficient and realistic data for their queries. Existing benchmarks often rely on synthetic data generated through specialized tools, which may not accurately represent real-world conditions. Access to authentic production data is limited due to concerns over data privacy and trade secrets. SKYSHARK offers a promising solution by leveraging publicly available real-world data from the aviation domain. This domain provides a wealth of data as nearly every aircraft continuously broadcasts its position, speed, altitude, and more. It allows for thorough data collection and evaluation. The data is not encrypted and can be received using software-defined radios or purpose-built receivers. Commercial providers collect this type of data from official sources like the Federal Aviation Administration (FAA), Eurocontrol and their own network of ADS-B receivers. Additionally, open-source projects like OpenSky Network (OSN) <ref type="bibr" target="#b4">[5]</ref> rely on hobbyists to provide receivers, enabling the collection of tracking information. OSN has been developed by researchers for researchers, providing access to both historic and real-time tracking data. These data are primarily intended for research purposes in the aviation domain but are also publicly accessible for anyone to use.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Contribution</head><p>Recognizing the opportunity presented by this data-rich environment, we created a new benchmark called SKYSHARK to leverage the real-world data for SPSs. It aims to overcome the limitations of synthetic data and to provide a more accurate representation of real-world conditions. While the first implementation has been done in the context of a Master's Thesis <ref type="bibr" target="#b5">[6]</ref>, it has been improved substantially since then.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Many benchmarks have been proposed and implemented for stream processing. A very good overview is given in <ref type="bibr" target="#b6">[7]</ref>. The purpose of such a benchmark can differ. The most common purpose is the comparison of different systems, but some benchmarks are also used to test a given system (which has been our original motivation) or to evaluate particular configuration settings. The early ones like Linear Road <ref type="bibr" target="#b7">[8]</ref> and NexMark <ref type="bibr" target="#b8">[9]</ref> are no longer used because they are too simple for today's SPSs. More recently, the Yahoo Streaming Benchmark (YSB) <ref type="bibr" target="#b9">[10]</ref> has been used by many projects. The stream data consist of reactions to ads on the Web. There is only one stream query, composed of filter, projection, static join, and a time-based window. The windows together with counts and a timestamp are stored in a database. The original paper reports results on Flink, Storm, and Spark. A follow-up paper <ref type="bibr" target="#b2">[3]</ref> shows different results for Flink. The benchmark has just one query, which is precisely defined only by its code. The stream data are synthetic. Some of the specifics of our system are not addressed by this query, e. g., the option of using arithmetic. Karimov et al. <ref type="bibr" target="#b10">[11]</ref> propose a benchmarking framework called Rovio. It is similar to the YSB and tries to address the criticism of it.</p><p>The benchmark RIoTBench is much larger than the YSB <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b11">12]</ref>. Queries (called applications) are defined as data-flow graphs, with tasks (operations) as nodes and edges for transporting messages between them. Tasks are given as the standard operations on streams. Unfortunately, the semantics are not precisely defined; one has to look at the implementation on GitHub. 
Streams are given with an input rate (messages per second), a distribution, which can be uniform, in bursts, sawtooth, normal, and bi-modal (e. g., day and night), and a message size. The benchmark first offers some micro-benchmarks, which are single tasks. It further defines IoT applications, which are a combination of tasks. All IoT input streams have elements with sensorId, timestamp, and n sensor values. The following four are provided with the benchmark: Sense Your City (CITY), NYC Taxi cab (TAXI), Energy dataset (GRID), and Mobile Health dataset (FIT). Real data are used here. However, the benchmark is very large. It does not provide queries, but rather whole applications. These cannot be implemented by SPSs alone.</p><p>StreamBench <ref type="bibr" target="#b12">[13]</ref> provides a much broader set of evaluations in four workload suites: performance, multi-recipient, fault tolerance, and durability. It uses real data as seed and generates the workload by repetition. The 7 benchmarks (queries, or applications) are: identity, sample, projection, grep, wordcount, distinctcount, and statistics. Extensive measurements have been made. It has been planned to put the benchmark on GitHub, but the project there with the same name has different authors and goals. Only the simple queries are suitable for our environment, and they do not address the specific challenges of our system.</p><p>The approach that is closest to our ideas is DSPBench <ref type="bibr" target="#b6">[7]</ref>. A low-level API is provided for unified development, so the workload can run anywhere (assuming an appropriate adapter that maps to the SPS-specific code). So-called probers inject timestamps in the tuples to track the latency, to count the number of received and sent tuples of an operator, to measure the time required for processing one tuple in an operator, and to calculate its throughput. 
Workload characterization is done by performance measurement instrumentation and source code analysis. 15 applications are defined as graphs. The benchmark uses real-world datasets. They are repeated if necessary, but with modifications. Some geographic calculations are also included. The benchmarked system runs on homogeneous nodes, while we have special operators running only on the FPGA. The semantics of the API is not given, but the code is available on GitHub. The timestamps are added to the tuples, which we tried to avoid. It is not clear how much overhead is introduced by the measurements.</p><p>So while very good solutions are available already, we still felt a need to create another benchmark with the following characteristics: We insisted on using real data. Some extrapolations had to be done due to legal issues, but they do not change the characteristics. We have a larger set of queries, and they have been selected from existing applications w. r. t. line-rate processing on dedicated hardware. Queries are given in standard SQL (if possible) to define precise semantics. The benchmark is extensible: Other data can be downloaded, and new queries can be defined. (This may not be suitable for a benchmark, but definitely for testing specific systems.) Since the data are stored, a DBMS (Flight Schedule, Airports, Aircraft) can also be evaluated. While all benchmarks so far are designed for homogeneous SPSs, either on a single node or distributed, our system is heterogeneous, consisting of one or more SoCs and a host.</p><p>It is our hope that other projects on FPGA-based stream processing can also benefit from our benchmark. There are a number of them; we just cite <ref type="bibr" target="#b13">[14]</ref> and <ref type="bibr" target="#b14">[15]</ref> as examples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">SKYSHARK Reference Data Set</head><p>As mentioned earlier, one of our goals is to use data as realistic as possible for our benchmark. Unfortunately, organizations are generally reluctant to share precise data due to concerns over potential data-privacy breaches or loss of trade secrets. In this regard, SKYSHARK emerges as a viable solution by leveraging publicly available real-world data, which can be collected by anyone. This section provides an overview of key concepts and terminology related to Air Traffic Control (ATC). ATC is vital for ensuring safe and efficient air travel.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Technical Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Radar</head><p>Primary Surveillance Radar (PSR) uses radio signals to detect objects by analyzing the delay and direction of the reflected signals <ref type="bibr" target="#b15">[16]</ref>. However, it cannot identify individual aircraft and only displays them as blips on the radar screen. To address this limitation, Secondary Surveillance Radar (SSR) has been developed. It transmits an interrogation signal to aircraft, which then respond with key information such as their unique identification code (ICAO<ref type="foot" target="#foot_0">2</ref>), altitude, and squawk code (see Section 3.1.3). The SSR ground station combines this information with PSR data to provide a comprehensive radar display. Automatic Dependent Surveillance-Broadcast (ADS-B), a modern form of SSR, autonomously broadcasts aircraft information without being interrogated. It gathers data like position and velocity from internal systems such as GPS. ADS-B communication is not encrypted or authenticated, allowing even small general aviation aircraft and others to receive these signals. This open communication protocol enhances safety and situational awareness for all aircraft.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Aircraft Tracking Networks</head><p>Currently, there are several commercial flight-tracking services available, with Flightradar24<ref type="foot" target="#foot_0">2</ref> being the most popular of them. It offers real-time tracking, access to flight schedules, historical data, and detailed aircraft information. Other commercial networks like FlightAware<ref type="foot" target="#foot_1">3</ref>, RadarBox<ref type="foot" target="#foot_2">4</ref>, and ADS-B Exchange<ref type="foot" target="#foot_3">5</ref> provide similar functionalities. Alongside the commercial networks, the OpenSky Network (OSN) <ref type="bibr" target="#b4">[5]</ref> is a nonprofit research network. It plays a significant role in the benchmark. OSN has a network of over 1,800 receivers and has collected millions of MODE-S messages<ref type="foot" target="#foot_4">6</ref> from hundreds of thousands of aircraft. While commercial networks often filter data, OSN provides unfiltered raw access to MODE-S/ADS-B messages received by its receivers. Initially covering the South of Germany and Switzerland, OSN has expanded to include the United States and Europe, but significant gaps still exist in other parts of the world.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">Aircraft Tracking Data</head><p>To facilitate easier access to ADS-B data for researchers, an abstraction called state has been introduced. A state represents an ADS-B message from an aircraft. The following attributes are particularly important for the benchmark:</p><p>• icao24: Unique 24-bit ICAO identifier. Identifies the aircraft.</p><p>• time_position: Unix timestamp (seconds) of the last position update.</p><p>• latitude and longitude: Geographic coordinates of the aircraft's position.</p><p>• baro_altitude: Barometric altitude in meters. Can be null.</p><p>• velocity: Speed of the aircraft, measured in meters per second.</p><p>• squawk: Special transponder code, used as ID for ATC or to indicate emergencies.</p></div>
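<div xmlns="http://www.tei-c.org/ns/1.0"><p>The attributes listed above can be pictured as a small record type. The following is a minimal sketch, assuming each state arrives as a dictionary of strings keyed by exactly these attribute names (for example, one row of a CSV file). The class name State and the helper parse_state are hypothetical and are not part of OSN or of the SKYSHARK tooling.</p>
<p>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """One aircraft state with the attributes used by the benchmark."""
    icao24: str                     # 24-bit ICAO address as a hex string
    time_position: Optional[int]    # Unix timestamp in seconds
    latitude: Optional[float]       # degrees
    longitude: Optional[float]      # degrees
    baro_altitude: Optional[float]  # meters, can be missing
    velocity: Optional[float]       # meters per second
    squawk: Optional[str]           # transponder code


def _opt(value, cast):
    """Cast a text field, mapping empty or missing values to None."""
    return cast(value) if value not in ("", None) else None


def parse_state(row: dict) -> State:
    """Build a State from a dict of strings (hypothetical input layout)."""
    return State(
        icao24=row["icao24"].strip().lower(),
        time_position=_opt(row.get("time_position"), int),
        latitude=_opt(row.get("latitude"), float),
        longitude=_opt(row.get("longitude"), float),
        baro_altitude=_opt(row.get("baro_altitude"), float),
        velocity=_opt(row.get("velocity"), float),
        squawk=_opt(row.get("squawk"), str),
    )
```
</p>
<p>Mapping empty fields to None mirrors the remark that baro_altitude can be null, so that later filters can skip such states explicitly.</p></div>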
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.4.">Additional Relational Meta-Data</head><p>To facilitate the integration of real-time tracking data with relational (batch) data, we focus on two additional sources of data: a database containing information about all aircraft, and a database containing information about all airports. It is important to note that the data we utilize for this purpose are publicly available and sourced from open-source communities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Aircraft Database:</head><p>The OSN maintains a comprehensive list of approx. 580,000 aircraft belonging to 1,800 airlines, as of March 2023. It is important to note that due to the coverage limitations discussed, this list may not be complete. However, since our aircraft-tracking information is sourced from the same database, most states should have a corresponding aircraft entry in the database. The list is periodically exported as a CSV file, which is made available for download on the OSN dataset website.</p><p>Airport Database: Another valuable resource is OurAirports<ref type="foot" target="#foot_5">7</ref>, a community-maintained open-data website that manages a database of over 5,200 airports as of March 2023. This extensive collection encompasses airports of various sizes and types, ranging from major international airports to smaller seaplane or helicopter bases. The airport database can be downloaded in CSV format. It is worth noting that community-maintained databases may not always be up-to-date. However, this is acceptable for our purposes as we do not need absolute accuracy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Obtaining Real-Time Aircraft Tracking Data</head><p>The reference dataset that we have created by collecting and processing data from the OSN extends over a two-week period. OSN provides various data-retrieval options, including a REST API, Java and Python packages, and an Impala Shell. Access to the REST API is available both anonymously and as a registered user, with different credit-based mechanisms in place to prevent abuse. Anonymous users have a daily budget of 400 credits, registered users have a budget of 4,000 credits, and users with their own ADS-B receiver enjoy a larger budget of 8,000 credits. Within these limitations, we have been able to collect a comprehensive dataset using the SKYSHARK downloader. It allows users to gather a specified amount of data. Depending on the available credits, the downloader calculates the optimal interval between two API requests to achieve evenly spaced states. If users have an account and a daily budget of 4,000 credits, they can make 1,000 API requests per day, i. e., each request costs 4 credits. Distributing these 1,000 requests evenly over 24 hours (86,400 seconds) results in one request to OSN every 86.4 seconds, or about every 87 seconds when rounded up. The states are stored as CSV files. The downloader can then convert the CSV file to a desired file format, e. g., JSON.</p></div>
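<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interval calculation described above can be sketched in a few lines. This is an illustration only and not the code of the SKYSHARK downloader: the function name request_interval_seconds is hypothetical, and the default cost of 4 credits per request is an assumption derived from the example in the text (4,000 credits allow 1,000 requests).</p>
<p>
```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def request_interval_seconds(daily_credits: int, credits_per_request: int = 4) -> int:
    """Seconds to wait between two API requests so the daily budget lasts 24 hours.

    credits_per_request=4 is an assumption taken from the example in the text.
    """
    requests_per_day = daily_credits // credits_per_request
    if requests_per_day == 0:
        raise ValueError("daily budget does not cover a single request")
    # Round up so that the budget is never exceeded within one day.
    return math.ceil(SECONDS_PER_DAY / requests_per_day)
```
</p>
<p>With a budget of 4,000 credits this yields 87 seconds, matching the figure given above. Rounding up rather than to the nearest second trades a slightly sparser sampling for the guarantee that the last request of the day is still covered by the budget.</p></div>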
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Generating Flight Schedules</head><p>Obtaining airline flight plans without significant financial investment is not feasible due to the high cost associated with commercial services. To overcome this issue, we create our own flight schedules by querying individual flights using the REST API provided by OSN. We first retrieve the callsigns, which consist of an airline abbreviation (e. g., DLH for Lufthansa) and a flight number (e. g., 101). Private or military aircraft may not have callsigns, or the meaning of their callsigns may be unknown. We then use the OSN REST API and the callsigns to retrieve flight information such as origin and destination airports. Not all callsigns with flight numbers have flights recorded in OSN. We determine flight takeoff and landing by monitoring the on_ground field in each state. A change from true to false indicates takeoff, and a change from false to true indicates landing. By storing and rounding the timestamps of these transitions, we obtain a rough flight schedule that aligns with the recorded flights. It is important to note that this flight schedule is incomplete and not entirely accurate but serves the purpose of the benchmark.</p></div>
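<div xmlns="http://www.tei-c.org/ns/1.0"><p>The detection of takeoff and landing from the on_ground field can be sketched as follows. This is a simplified illustration under stated assumptions: states are given as (icao24, time_position, on_ground) tuples in time order, and timestamps are rounded to full minutes. The function name flight_events and the rounding granularity are hypothetical, since the text does not specify them.</p>
<p>
```python
def flight_events(states, round_to=60):
    """Yield (icao24, event, rounded_timestamp) for every on_ground transition.

    `states` is an iterable of (icao24, time_position, on_ground) tuples
    ordered by time. `round_to` is an assumed rounding granularity in seconds.
    """
    last_on_ground = {}  # icao24 -> on_ground value of the previous state
    for icao24, timestamp, on_ground in states:
        previous = last_on_ground.get(icao24)
        if previous is not None and previous != on_ground:
            # True -> False means the aircraft left the ground.
            event = "takeoff" if previous else "landing"
            yield icao24, event, round(timestamp / round_to) * round_to
        last_on_ground[icao24] = on_ground
```
</p>
<p>Keeping only the previous on_ground value per aircraft means the sketch needs constant memory per icao24, which fits the streaming character of the data.</p></div>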
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Benchmark Design and Implementation</head><p>In this chapter, we provide an overview of the general design of the SKYSHARK benchmark. To accomplish this, we will first examine the benchmarking tool. Following that, we will present the benchmark metrics and the 14 SKYSHARK queries. As examples, we will discuss our considerations and constraints in designing two of the queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Benchmarking Tool</head><p>The centerpiece of our benchmark is our custom-developed benchmarking tool. This tool sends the collected states to the SPS under test and receives the resulting data. Additionally, it gathers various metrics that serve the purpose of comparing different systems and implementations with each other. Fig. <ref type="figure" target="#fig_0">1</ref> shows a schematic representation of our benchmarking tool and how it can be integrated with an SPS. Our tool provides a set of well-defined interfaces for connecting with the SPS. These interfaces include TCP, UDP, as well as adapters for Apache Kafka<ref type="foot" target="#foot_6">3</ref> and ZeroMQ<ref type="foot" target="#foot_7">4</ref>. When designing the benchmark, we decided not to measure the individual operators of the SPS but rather focus on the system as a whole. This decision was motivated by our intention to create a benchmark for systems that utilize modern hardware, where such operator-level measurements may not be feasible. Considering the expected data streams in the area of 10 Gbps, measuring the individual operators would impose an additional burden on the system. Generating measurement data at the operator level, which would then need to be processed by our system, would introduce additional CPU, RAM, and potentially network overhead. This additional load would impact the performance of the SPS. The benchmarking tool loads the data from the drive and generates a data stream. Before sending to the SPS, a hash is computed using the icao24 and the time_position of the state. This key, along with a timestamp, is stored internally. Once the tuple reaches the benchmarking tool again, another hash is computed and stored with a timestamp. The result tuple is then saved to disk. 
In a process running concurrently with sending and receiving, the latencies of each tuple are calculated and stored for further analysis. Since some queries involve filtering tuples based on certain conditions, we cannot calculate the latency for all tuples. The calculation of latencies is only possible for queries that do not involve blocking operations, as such operations disrupt the relationship between input and output tuples. For queries with blocking operations, only the throughput can be measured.</p></div>
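<div xmlns="http://www.tei-c.org/ns/1.0"><p>The key-based latency measurement described above can be sketched as follows. This is a minimal single-threaded illustration, not the implementation of the benchmarking tool: the class name LatencyTracker, the choice of SHA-1 as hash function, and the computation of the latency directly on receipt (instead of in a concurrent process) are all assumptions made for brevity.</p>
<p>
```python
import hashlib
import time


class LatencyTracker:
    """Match outgoing and incoming tuples by a hash of icao24 and time_position."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sent = {}      # key -> timestamp taken when the tuple was sent
        self.latencies = []  # seconds, one entry per matched tuple

    @staticmethod
    def key(icao24, time_position):
        # SHA-1 is an assumption; the text does not name the hash function.
        raw = f"{icao24}|{time_position}".encode("utf-8")
        return hashlib.sha1(raw).hexdigest()

    def on_send(self, icao24, time_position):
        self._sent[self.key(icao24, time_position)] = self._clock()

    def on_receive(self, icao24, time_position):
        sent_at = self._sent.pop(self.key(icao24, time_position), None)
        if sent_at is not None:
            self.latencies.append(self._clock() - sent_at)

    def unmatched(self):
        """Number of tuples that were sent but have not come back (yet)."""
        return len(self._sent)
```
</p>
<p>Tuples that the query filters out never return, so their keys remain in the internal map. This is why a latency can only be reported for tuples that pass through the SPS, and why the same map can be used to count tuples that did not come back.</p></div>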
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">SKYSHARK Metrics</head><p>As mentioned in Chapter 2, there is a variety of metrics that can be used for measurement purposes. For SKYSHARK, we specifically focus on two metrics.</p><p>End-to-End Latency: End-to-end latency plays a crucial role in the deployment of SPSs. Depending on the application, the latency can have an immense impact on the usability of the result. Therefore, we have identified this metric as an important measurement for SKYSHARK.</p><p>Throughput: Another significant metric in the field of stream processing is throughput. With our benchmarking tool, we can measure it with high precision, as we have full control over the incoming and outgoing tuples. Furthermore, using the aforementioned map where keys and timestamps are stored, we can measure the number of tuples that have been discarded during the process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">SKYSHARK Queries</head><p>In addition to the data and the benchmarking tool, queries play a particularly important role in SKYSHARK. We have closely aligned the design of the queries with real-world aviation and airspace surveillance challenges. A total of 14 queries has been identified. These queries progressively increase in complexity and in demand from the system being measured. We have chosen to code the queries in standard SQL or, if that was impossible, in the extension of SQL developed by Apache Calcite. This extension allows for the definition of stream queries, where no standard is available yet. The goal was to provide a precise and formal description to eliminate misinterpretation. The actual code of the queries can be different in a particular SPS. The table in Fig. <ref type="figure">2</ref> presents a list of all queries, their intended focus, and an indication of whether latencies can be measured for these queries. A detailed listing of all queries, including their code and intention, can be found on our website<ref type="foot" target="#foot_8">5</ref>. To further illustrate our design decisions, we present two exemplary queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1.">Using Basic Arithmetic to Solve Complex Problems</head><p>Query 9 of the SKYSHARK benchmark (refer to Fig. <ref type="figure">3</ref>) exemplifies the trade-offs we made in the query design process. One specific requirement from the ReProVide project is that only basic arithmetic operations can be computed on the FPGA. The purpose of this query is to filter for aircraft within a certain radius around the airport. Ideally, the Euclidean distance would be calculated, which requires the use of trigonometric functions. However, due to the FPGA's limitations in performing these calculations, a decision was made to approximate the distance. The approximation method used assumes that the circumference of the Earth is the same everywhere, allowing the cosine calculation to be replaced with a linear function. This approximation introduces a decrease in accuracy at the poles and near the equator. Nonetheless, for our specific purposes, this is acceptable as long as the selected airport is not located in these boundary areas. We believe that these types of calculations are better suited for offloading to FPGAs or other modern hardware, which are capable of handling more complex computations efficiently.</p><p>Figure <ref type="figure">3</ref>: Query 9 - Airport Proximity</p></div>
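<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of replacing trigonometric functions by basic arithmetic can be sketched as follows. This is an illustration and not the text of Query 9: it assumes that both latitude and longitude differences are scaled by the same constant of 111.139 kilometers per degree (the factor that appears in the Query 13 listing), and the function names are hypothetical. Comparing squared values avoids the square root as well, so only subtraction, multiplication, addition, and one comparison remain.</p>
<p>
```python
KM_PER_DEGREE = 111.139  # constant scale factor, as in the Query 13 listing


def squared_distance_km(lat_a, lon_a, lat_b, lon_b):
    """Approximate squared distance using only subtract, multiply, and add."""
    d_lat = (lat_a - lat_b) * KM_PER_DEGREE
    # Assumption: the same factor is used for longitude (no cosine correction).
    d_lon = (lon_a - lon_b) * KM_PER_DEGREE
    return d_lat * d_lat + d_lon * d_lon


def within_radius(lat_a, lon_a, lat_b, lon_b, radius_km):
    """Compare squared values so that no square root is needed."""
    return radius_km * radius_km >= squared_distance_km(lat_a, lon_a, lat_b, lon_b)
```
</p>
<p>Because the longitude term ignores the convergence of the meridians, the sketch overestimates east-west distances away from the equator; the filter therefore tends to be stricter than an exact distance check at higher latitudes.</p></div>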
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2.">Joining Stream Data and Relational Data</head><p>The second query we take a closer look at is Query 13 (see Fig. <ref type="figure">4</ref>, "Diversion Airports"). In the event of an emergency, the pilot needs to know which airports are closest to the aircraft at all times. These airports can be used for an emergency landing if necessary. This query not only involves the complexity of the distance calculation, but also joins each incoming tuple with the airport relation. Additionally, the join condition is a particularly complex aspect. The topic of joining data on FPGAs is gaining popularity <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18]</ref>, making such a query a great candidate for exploration in this area.</p></div>
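<div xmlns="http://www.tei-c.org/ns/1.0"><p>Conceptually, the join of one incoming state with the airport relation can be sketched as a nearest-neighbor lookup. The following is a simplified illustration and not the semantics of Query 13, whose join condition is not reproduced here: it assumes a small in-memory airport relation of (ident, latitude, longitude) rows, a constant scale factor of 111.139 kilometers per degree, and a hypothetical function name nearest_airports.</p>
<p>
```python
import heapq

KM_PER_DEGREE = 111.139  # assumed constant scale for latitude and longitude


def nearest_airports(state_lat, state_lon, airports, k=3):
    """Return the idents of the k airports closest to the aircraft position.

    `airports` is an iterable of (ident, latitude, longitude) rows.
    """
    def squared_km(row):
        _, lat, lon = row
        d_lat = (lat - state_lat) * KM_PER_DEGREE
        d_lon = (lon - state_lon) * KM_PER_DEGREE
        return d_lat * d_lat + d_lon * d_lon

    # nsmallest scans the relation once and keeps only k candidates.
    return [row[0] for row in heapq.nsmallest(k, airports, key=squared_km)]
```
</p>
<p>The sketch scans the whole airport relation for every state, which makes the cost of the join visible: each incoming tuple is compared against every airport unless the relation is pre-filtered or indexed.</p></div>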
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation</head><p>To test the functionality of our benchmarking tool, we have implemented several queries from the benchmark in Apache Flink. The measurements have been performed on a PC with an AMD Ryzen 9 5900X 12-core processor and 32 GB DDR4 RAM. Throughout the testing, all queries were able to run at a throughput of 10 Gbps. For Query 2, the average latency measured was 100 ms, with occasional spikes reaching 180 ms or more. Fig. <ref type="figure">5</ref> provides a visual representation of the latencies caused by Apache Flink. It is important to emphasize that these measured numbers are preliminary and require further refinement. A thorough evaluation of the feasibility of our benchmark is ongoing work. The hardware with the FPGA should be available within a few weeks.</p><p>SELECT STREAM icao24, callsign, time_position, ident as airport,
((airports_lat - (state_lat)) * (airports_lat - (state_lat)))
+ ((airports_long - (states_long)) * (airports_long - (states_long)))
as distance_to_airport_squared
FROM (SELECT states.icao24, states.callsign, states.time_position,
(states.latitude * 111.139) AS state_latitude, </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future Work</head><p>With SKYSHARK, we have taken the first step in using real-time aircraft-tracking data as a source for benchmarks. Focusing on SPSs, we were mainly inspired by our project ReProVide and the FPGA used in it. Our next step is to extensively measure the prototype we developed as part of ReProVide using our new benchmark. Additionally, we want to complete and measure our reference implementation in Apache Flink as well, to compare our system with a homogeneous SPS. Additional queries could be identified, e. g., using match-recognize as proposed in <ref type="bibr" target="#b18">[19]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Architecture of the SKYSHARK Benchmarking Tool</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Query 13 - Diversion Airports</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://www.icao.int/; https://www.flightradar24.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://de.flightaware.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://www.radarbox.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://globe.adsbexchange.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">MODE-S is a transponder standard, where each aircraft is broadcasting its 24-bit ICAO address.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://ourairports.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_6">https://kafka.apache.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_7">https://zeromq.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_8">A description of the benchmark is available at https://skyshark.org.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been supported by the German Science Foundation (Deutsche Forschungsgemeinschaft, DFG) as part of SPP 2037 under grant no. ME 943/9-2. We thank the reviewers for their valuable hints!</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Online Resources</head></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">ReProVide: Towards utilizing heterogeneous partially reconfigurable architectures for near-memory data processing</title>
		<author>
			<persName><forename type="first">A</forename><surname>Becher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herrmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wildermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Teich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. BTW - Workshopband</title>
				<meeting>BTW - Workshopband<address><addrLine>Bonn</addrLine></address></meeting>
		<imprint>
			<publisher>Gesellschaft für Informatik</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="51" to="70" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">SQL query processing using an integrated FPGA-based near-data accelerator in ReProVide (demo paper)</title>
		<author>
			<persName><forename type="first">L</forename><surname>Beena Gopalakrishnan Nair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Becher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Meyer-Wegener</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wildermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Teich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. EDBT</title>
				<meeting>EDBT</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="639" to="642" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Grier</surname></persName>
		</author>
		<ptr target="https://www.ververica.com/blog/extending-the-yahoo-streaming-benchmark" />
		<title level="m">Extending the Yahoo! Streaming Benchmark</title>
				<imprint>
			<date type="published" when="2016">2016 (accessed June 14, 2023)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Shukla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chaturvedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Simmhan</surname></persName>
		</author>
		<idno>CoRR abs/1701.08530</idno>
		<title level="m">RIoTBench: A real-time IoT benchmark for distributed stream processing platforms</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Bringing up OpenSky: A large-scale ADS-B sensor network for research</title>
		<author>
			<persName><forename type="first">M</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Strohmeier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lenders</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Martinovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wilhelm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 13th Int. Symp. on Information Processing in Sensor Networks (IPSN)</title>
				<meeting>13th Int. Symp. on Information Processing in Sensor Networks (IPSN)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="83" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Vogler</surname></persName>
		</author>
		<title level="m">Development and Implementation of a Database-Benchmark Using Real-Time Flight Data (ADS-B) and Flight Schedules</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>FAU</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master&apos;s thesis</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">DSPBench: A suite of benchmark applications for distributed data stream processing systems</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">V</forename><surname>Bordin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Griebler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Mencagli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F R</forename><surname>Geyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">G L</forename><surname>Fernandes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="222900" to="222917" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Linear road: A stream data management benchmark</title>
		<author>
			<persName><forename type="first">A</forename><surname>Arasu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cherniack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">F</forename><surname>Galvez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Maier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maskey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ryvkina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stonebraker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tibbetts</surname></persName>
		</author>
		<ptr target="http://www.vldb.org/conf/2004/RS12P1.PDF" />
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB</title>
				<meeting>VLDB</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="480" to="491" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><surname>Nexmark</surname></persName>
		</author>
		<ptr target="http://datalab.cs.pdx.edu/niagara/NEXMark/" />
		<title level="m">Nexmark benchmark</title>
				<imprint>
			<date type="published" when="2002">2002 (accessed June 15, 2023)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Chintapalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dagit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Farivar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Holderbaugh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nusbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Patil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Poulosky</surname></persName>
		</author>
		<ptr target="https://developer.yahoo.com/blogs/135370591481/" />
		<title level="m">Benchmarking streaming computation engines at Yahoo!</title>
				<imprint>
			<publisher>Yahoo! Developer</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Benchmarking distributed stream data processing systems</title>
		<author>
			<persName><forename type="first">J</forename><surname>Karimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rabl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Katsifodimos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Samarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Heiskanen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Markl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ICDE</title>
				<meeting>ICDE</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1507" to="1518" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">RIoTBench: An IoT benchmark for distributed stream processing systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shukla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chaturvedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Simmhan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Concurrency and Computation: Practice and Experience</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page">e4257</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">StreamBench: Towards benchmarking modern distributed stream computing frameworks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 7th IEEE/ACM Int. Conf. on Utility and Cloud Computing (UCC)</title>
				<meeting>7th IEEE/ACM Int. Conf. on Utility and Cloud Computing (UCC)<address><addrLine>London, United Kingdom</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2014-12">Dec. 8-11, 2014</date>
			<biblScope unit="page" from="69" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">When FPGA-Accelerator meets stream data processing in the Edge</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ibrahim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ICDCS</title>
				<meeting>ICDCS</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1818" to="1829" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Hardware Accelerated Stream Processing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Najafi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
		<respStmt>
			<orgName>TU München, Fakultät für Informatik</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Stevens</surname></persName>
		</author>
		<ptr target="https://books.google.de/books?id=qH9TAAAAMAAJ" />
		<title level="m">Secondary Surveillance Radar, Artech House radar library</title>
				<imprint>
			<publisher>Artech House</publisher>
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">FPGA-accelerated hash join operation for relational databases</title>
		<author>
			<persName><forename type="first">M.-T</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q.-J</forename><surname>Xing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-G</forename><surname>Ma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Circuits and Systems II: Express Briefs</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="page" from="1919" to="1923" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Accelerating hash-based query processing operations on FPGAs by a hash table caching technique</title>
		<author>
			<persName><forename type="first">B</forename><surname>Salami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Arcas-Abella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Sonmez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Unsal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Kestelman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Communications in Computer and Information Science</title>
				<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="131" to="145" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Pretty fly for a VAT GUI: visualizing event patterns for flight data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Beilschmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Drönner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Glombiewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Heigele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Holznigenkemper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Isenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Körber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mattig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Morgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Seeger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. DEBS</title>
				<meeting>DEBS</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="224" to="227" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
