<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Evaluation of Data Transfer Methods for Block-based Realtime Audio Processing with CUDA</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Christoph</forename><surname>Kuhr</surname></persName>
							<email>christoph.kuhr@hs-anhalt.de</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Sciences and Languages</orgName>
								<orgName type="institution">Anhalt University of Applied Sciences Köthen</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexander</forename><surname>Carôt</surname></persName>
							<email>alexander.carot@hs-anhalt.de</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Sciences and Languages</orgName>
								<orgName type="institution">Anhalt University of Applied Sciences Köthen</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Evaluation of Data Transfer Methods for Block-based Realtime Audio Processing with CUDA</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">052DF078B53E7A2700BB9C28038D98F6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Realtime audio production environments generally do not use GPUs unless they are involved in 3D rendering or video production processes. Thus, the GPU is idle most of the time and can be utilized as an audio co-processor. The block-based streaming nature and floating point representation of computer audio hardware are very well suited for GPGPU programming techniques. In this paper we outline the data transfers as the most expensive part of processing realtime audio data and evaluate different data transfer methods with respect to future audio DSP applications.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Modern computer systems are equipped with a CPU and a GPU. The CPU controls the peripheral hardware and performs calculations unrelated to 3D graphics or video decoding. A GPU, in contrast, renders 3D graphics or uses dedicated hardware codecs to decode modern video formats such as H.264 <ref type="bibr" target="#b0">[1]</ref>. If a computer system is used for any kind of audio production that excludes 3D rendering and video decoding, the GPU is mostly idle. Additionally, GPUs are designed to perform many floating point operations simultaneously in a threaded fashion. These considerations motivate the use of the GPU as an audio co-processor for signal processing purposes. Computation-intensive audio signal processing of realtime data has been demonstrated before: <ref type="bibr">Wefers and</ref> Berg used a GPU to process FIR and IIR filters <ref type="bibr" target="#b1">[2]</ref>, and Jedrzejewski and Marasek used the GPU for impulse response computations in virtual room acoustics <ref type="bibr" target="#b2">[3]</ref>.</p><p>In this paper we investigate the lower limit for the usage of a GPU for such signal processing tasks in a realtime audio production environment. The limit is given as the combination of channel count and sample buffer size in use. The bottlenecks in the communication between CPU and GPU are evaluated and discussed. Further, possible workarounds to improve the performance aspects under investigation are proposed and evaluated. CUDA (Compute Unified Device Architecture) is a parallel programming platform designed for high-performance computing <ref type="bibr" target="#b3">[4]</ref>. The idea is to make use of thousands of threads running in parallel, which is not possible on a CPU. When a kernel is executed on the GPU, it launches a grid of several blocks; the maximum number of blocks depends on the features of the GPU. 
Inside each block of the grid, multiple threads execute the actual computations at runtime. The same computation runs on every thread, but on different data. Threads can be handled in a synchronous or an asynchronous way. The latter requires the concept of streams for a distinct mapping of the data shared between the threads of one block. The structure of CUDA computing grids is shown in fig. <ref type="figure" target="#fig_0">1</ref>.</p><p>The concept of CUDA streams <ref type="bibr" target="#b5">[6]</ref> is very convenient for the problem at hand. Different audio streams can be treated asynchronously, which represents their orthogonal nature better than a matrix with an appropriate number of rows and columns. A matrix could also represent this orthogonality appropriately, but access to it would be centralized and prone to race conditions. Moreover, using a grid dimension (x, y or z) to represent the different audio channels reduces the dimensionality that remains usable for calculations at runtime. This paper is part of the research project fast-music <ref type="bibr" target="#b6">[7]</ref>. The project aims to enable symphonic orchestras to rehearse over the public internet using the realtime communication software Soundjack <ref type="bibr" target="#b7">[8]</ref>  <ref type="bibr" target="#b8">[9]</ref>. Research in the field of packet loss concealment will use GPUs for complex signal processing based on machine learning algorithms.</p></div>
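The grid/block/thread hierarchy and the per-channel use of streams described above can be sketched as follows. This is an illustrative fragment, not code from the paper; the names (gainKernel, processChannels, numChannels, bufSize) are our own assumptions:

```cuda
// Sketch: one thread per sample, one CUDA stream per audio channel.
__global__ void gainKernel(const float *in, float *out, float gain) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // thread index = sample index
    out[i] = in[i] * gain;  // same computation on every thread, different data
}

// Launch one single-block grid per channel on its own stream, so the
// channels remain orthogonal and need no centralized matrix access.
void processChannels(float **d_in, float **d_out, cudaStream_t *streams,
                     int numChannels, int bufSize) {
    for (int c = 0; c < numChannels; ++c)
        gainKernel<<<1, bufSize, 0, streams[c]>>>(d_in[c], d_out[c], 0.5f);
}
```

Keeping each channel on its own stream also lets copies and kernels of different channels overlap, which is the asynchronous behaviour evaluated later.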
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. ARCHITECTURE</head><p>The work of Wefers and Berg <ref type="bibr" target="#b1">[2]</ref> has also shown that realtime processing of audio data with a GPU is possible. The communication between CPU and GPU is realized via driver calls and shared memory, either DMA, GPU or CPU RAM. The CPU is also referred to as host and the GPU as device. Nowadays, system architectures where CPU and GPU share the same cache are used increasingly, albeit mainly in embedded systems. This architecture completely eliminates memory copies, since the memory is coherently accessible by both the CPU and the GPU. In conventional systems, which communicate via the PCIe bus, data has to be copied from CPU RAM to GPU RAM and back. Since the API calls copying data between CPU and GPU carry considerable overhead, it is more efficient to copy large amounts of data. This makes the use case of small amounts of data, as generated and processed in the audio domain, all the more interesting to investigate.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2: Legend Data Transfer Method Measurements</head><p>Realtime audio data is represented as a two dimensional vector field. At any sample point in time, an analog-to-digital converter process generates a sample with a typical bit depth of 16, 24 or 32 bits, encoded either as integer or floating point <ref type="bibr" target="#b9">[10]</ref>. Computer audio hardware manages data using buffers that consist of a predefined number of samples. The audio driver repeatedly accesses the memory of the audio hardware and copies the sample buffers to the CPU RAM for further usage. The responsiveness of such an audio system depends on the size of the sample buffers: the response time grows with increasing sample buffer size. Typical sample buffer sizes are 64, 128, 256, 512 and 1024 samples <ref type="bibr" target="#b10">[11]</ref>.</p><formula xml:id="formula_0">AudioDataBlock = SampleDepth • SampleBufferSize • ChannelCount AudioDataBlock = 32 bit • {64, 128, 256, 512, 1024} Samples • {2, 8, 16, 32, 64} Channels</formula><p>Due to this block-based streaming nature, the blockwise data transfer and processing of audio data between CPU and GPU might reduce the impact of the data copying overhead, particularly if multiple audio channels are used. The audio data that we transfer and process with the GPU is provided by a professional audio driver and server combination called Jack Audio Connection Kit <ref type="bibr" target="#b11">[12]</ref>. On top of a Linux ALSA <ref type="bibr" target="#b12">[13]</ref> driver, Jack provides the means to interconnect Jack-aware audio software with the audio interface at 32 bit floating point precision. This floating point format requires the development of a prototype, because the Soundjack clients use an integer format instead and would need an additional conversion. We developed a minimal Jack client for testing purposes with varying channel counts and sample buffer sizes. 
The Jack client is linked against a shared library that provides the CUDA kernel <ref type="bibr" target="#b3">[4]</ref>. This way, CUDA computations can be integrated into arbitrary C programs. The Jack server configures the audio interface by utilizing the ALSA driver infrastructure. The most important configuration parameters for our investigations are the channel count and the sample buffer size, called frame or period in the Jack domain. At runtime, the Jack server requests our Jack client to process a frame via a callback function. If the callback function does not finish its computations in time, the Jack server reports a buffer underrun, also called xrun in the Jack domain. In the system under test, a Nvidia Geforce GT940mx GPU with 2 GB of DDR3 RAM is connected to an Intel i7-6870 4-core CPU with 16 GB DDR3 RAM via a PCIe x16 2.0 bus <ref type="bibr" target="#b13">[14]</ref>. Thus, the transfer rate between CPU and GPU is limited to the bus bandwidth of 8 GBps per direction. The Nvidia Geforce GT940mx has a compute capability of 5.0 (≥ 2.0), which allows it to use managed memory. </p></div>
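The interplay of the Jack process callback and the CUDA shared library can be pictured with the following sketch. cudaProcessBlock() is a hypothetical entry point of the shared library, not an API from Jack or CUDA; only the Jack calls themselves are real API:

```cuda
#include <jack/jack.h>

// Hypothetical entry point of the CUDA shared library (our assumption).
extern "C" int cudaProcessBlock(float *in, float *out, jack_nframes_t nframes);

jack_port_t *in_port, *out_port;  // registered during client setup

// Jack invokes this callback once per period (frame). It must return
// before the next period starts, otherwise the server reports an xrun.
int process(jack_nframes_t nframes, void *arg) {
    float *in  = (float *)jack_port_get_buffer(in_port, nframes);
    float *out = (float *)jack_port_get_buffer(out_port, nframes);
    return cudaProcessBlock(in, out, nframes);
}
```

The callback would be registered with jack_set_process_callback() during client setup.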
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. CUDA MEMORY ORGANIZATION AND MANAGEMENT</head><p>The data structure and data transfer between CPU and GPU are the bottlenecks for the entire signal processing. Three different data transfer methods can be used:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Synchronous data transfer</head><p>A synchronous data transfer returns as soon as the memory operation on the GPU memory is done, with a success or failure result. For the GPU integration of synchronous data transfers, it is irrelevant whether the memory is pageable or pinned; either type can be accessed. Pageable memory is ordinary memory from the virtual address space managed by the operating system.</p></div>
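A minimal sketch of this method, assuming a helper named syncUpload (our own name); the buffer can come from plain malloc(), i.e. pageable memory:

```cuda
#include <cuda_runtime.h>

// Sketch: blocking host-to-device copy of one sample buffer.
// cudaMemcpy returns only after the operation has completed,
// carrying a success or failure result.
cudaError_t syncUpload(float *d_buf, const float *h_buf, size_t nSamples) {
    return cudaMemcpy(d_buf, h_buf, nSamples * sizeof(float),
                      cudaMemcpyHostToDevice);  // blocks until done
}
```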
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) Asynchronous data transfer</head><p>An asynchronous data transfer returns immediately after invoking the data transfer, regardless of the result. The result of the operation has to be checked separately. It requires the additional concept of streams for the integration on the GPU. Further, the host memory has to be pinned. Pinned memory addresses are allocated in the DMA address space of the host system.</p></div>
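A sketch of the asynchronous method (asyncUpload is our own illustrative name). Note the two requirements stated above: pinned host memory and a stream:

```cuda
#include <cuda_runtime.h>

// Sketch: asynchronous host-to-device copy. The host buffer must be
// pinned (cudaMallocHost allocates it in the DMA address space) and
// the copy is bound to a stream; the call returns immediately.
void asyncUpload(float *d_buf, size_t nSamples, cudaStream_t stream) {
    float *h_buf;
    cudaMallocHost((void **)&h_buf, nSamples * sizeof(float));  // pinned
    // ... fill h_buf with one period of audio samples ...
    cudaMemcpyAsync(d_buf, h_buf, nSamples * sizeof(float),
                    cudaMemcpyHostToDevice, stream);  // returns at once
    cudaStreamSynchronize(stream);  // result must be checked separately
    cudaFreeHost(h_buf);
}
```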
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3) Managed memory with coherent caches on CPU and GPU</head><p>With managed memory, the requirement of explicit memory copy operations is eliminated. The GPU driver allocates memory on the CPU and the GPU respectively, manages all accesses to these memory segments implicitly, and thus keeps the data in both memory locations coherent through small caching operations. The HostToDevice mode utilizes the Direct Memory Access (DMA) memory of the host system. This enables the CPU to offload the data transfer operations to the GPU without waiting for completion or a result. Invoking a CUDA memcpy between two GPUs uses memory copy operations between the RAM of both GPUs. If a D2D memory copy operation is issued on a single device, however, the GPU's internal cache is used for the data transfer. Although the DeviceToHost mode does not utilize the DMA memory, it may also operate asynchronously, but more slowly, since data is copied from GPU to CPU RAM.</p></div>
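The managed-memory method can be sketched as follows (scale and managedExample are our own illustrative names); a single allocation is visible to host and device, so no explicit cudaMemcpy appears:

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *buf, float gain) { buf[threadIdx.x] *= gain; }

// Sketch: the driver keeps the host and device views of the managed
// allocation coherent, eliminating explicit copy operations.
void managedExample(int bufSize) {
    float *buf;
    cudaMallocManaged((void **)&buf, bufSize * sizeof(float));
    for (int i = 0; i < bufSize; ++i) buf[i] = 1.0f;  // CPU writes directly
    scale<<<1, bufSize>>>(buf, 0.5f);  // GPU uses the very same pointer
    cudaDeviceSynchronize();  // ensure coherence before the CPU reads again
    cudaFree(buf);
}
```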
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. EXPERIMENTS</head><p>We investigated the influence that the sample buffer size and channel count have on the data transfer rates. The audio channel count was varied between 2, 8, 16 and 32 channels, and each channel count was tested with each common sample buffer size of 64, 128, 256, 512 and 1024 samples per buffer. The samples were formatted as 32 bit floating point. A simple CUDA kernel is provided for an exemplary computation. Each thread in a block handles exactly one sample and copies it from the input to the output buffer. This way, 64 up to 1024 threads run in parallel in a single block. The worst-case deadline for the data transfer times is given by the Jack server's sample buffer size and sample rate, in this case 48 kHz (sample duration = 1/48000 s ≈ 20.8 µs); a buffer of 64 samples must therefore be processed within 64/48000 s ≈ 1.33 ms. The profiling overhead of the NVidia Visual Profiler (NVVP) for 32 channels with 64 samples per buffer pushed the host machine to its limits. Thus, tests with 64 audio channels were omitted. A comparison of fig. <ref type="figure" target="#fig_3">5</ref> and fig. <ref type="figure" target="#fig_6">8</ref> shows that the memory mapped H2D mode takes less time, at minimum, average and maximum, than the asynchronous copy mode. The kernel execution times for the two other transfer methods, shown in fig. <ref type="figure" target="#fig_2">4</ref> and fig. <ref type="figure" target="#fig_5">7</ref>, exhibit no significant difference. In fig. <ref type="figure" target="#fig_1">3</ref> only the device to device copy operation is shown, which does not involve any kernel launch. These findings suggest that the synchronous memory transfer method would also be suitable for the H2D copy mode. Since a kernel has to wait until all data is present in GPU memory, it is of no consequence at this point whether the data is transferred synchronously or asynchronously. This is in contrast to the D2H mode, where a non-blocking data transfer allows the processing chain to finish sooner. 
The magnitude of these savings is much lower than that of the overhead introduced by the CUDA API and driver calls. This is observable in the rows below the CUDA Context in fig. <ref type="figure" target="#fig_8">9</ref>: the three smaller gaps (≈ 7ms) on the right side and a larger gap (≈ 28ms) on the left side correspond to the small chunks in the rows of the respective streams. These chunks are the hardware-based memory operations mentioned above and take only a few microseconds on average. All three memory organization modes exhibit a common problem of cyclic nature. At a given interval (≈ 11s for pageable memory, ≈ 5s for pinned memory and ≈ 2.5s for managed memory), memory operations last approximately four times longer, resulting in the larger gap on the left side in fig. <ref type="figure" target="#fig_8">9</ref>. These API and driver calls introduce jitter into the tested audio signal.</p><p>The turning point from where the CUDA API overhead is negligible can be quantified: </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure1: CUDA Computing Grids<ref type="bibr" target="#b4">[5]</ref> </figDesc><graphic coords="1,319.21,239.88,224.49,286.21" type="bitmap" /></figure>
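The exemplary kernel used in the measurements can be sketched as follows (copyKernel is our own name for it). Each thread copies exactly one sample, so 64 to 1024 threads run in parallel in a single block:

```cuda
// Each thread handles exactly one sample of the buffer.
__global__ void copyKernel(const float *in, float *out) {
    out[threadIdx.x] = in[threadIdx.x];
}
// Launched with one block of sampleBufferSize threads per channel stream:
//   copyKernel<<<1, sampleBufferSize, 0, streams[c]>>>(d_in[c], d_out[c]);
```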
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Device to Device Copy Duration Synchronous Data Transfer Method</figDesc><graphic coords="2,314.44,423.49,234.04,199.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Kernel Execution Duration Managed Memory Data Transfer Method</figDesc><graphic coords="3,62.25,160.42,234.04,200.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Host to Device Transfer Duration Data Transfer Method</figDesc><graphic coords="3,62.25,500.25,234.04,200.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Device to Host Transfer Duration Data Transfer Method</figDesc><graphic coords="3,314.44,500.71,234.05,199.99" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Kernel Execution Duration Asynchronous Data Transfer Method 2) DeviceToDevice (DtoD or D2D)</figDesc><graphic coords="4,67.02,181.65,224.49,191.23" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Host to Device Transfer Duration Asynchronous Data Transfer Method</figDesc><graphic coords="4,62.25,500.87,234.04,199.83" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: Device to Host Transfer Duration Asynchronous Data Transfer Method</figDesc><graphic coords="4,314.44,500.32,234.04,200.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: NVVP Screenshot showing CUDA API Overhead</figDesc><graphic coords="5,61.77,67.93,487.19,228.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table I :</head><label>I</label><figDesc>Tolerable Worst Case Latencies for Realtime Audio</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table II :</head><label>II</label><figDesc>Channel Count and Sample Buffer Size Limit for Realtime Audio Processing VI. CONCLUSIONS All three memory transfer methods are able to operate on realtime audio data. Managed memory, however, is the most convenient, because host and device pointers do not require any special handling and integrate smoothly into C code as well as CUDA code. For the usage with Jack, however, two memory copy operations are still required, because Jack provides preallocated pointers to its buffer interface. Low sample buffer sizes increase jitter, but no buffer underruns were detected, although the durations of the CUDA API and driver calls suggest that underruns should occur with sample buffer sizes below 512 samples.</figDesc><table><row><cell>Channel Count</cell><cell>Sample Buffer Size</cell></row><row><cell>2</cell><cell>512</cell></row><row><cell>4</cell><cell>512</cell></row><row><cell>8</cell><cell>1024</cell></row><row><cell>16</cell><cell>1024</cell></row><row><cell>32</cell><cell>1024</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">VII. FUTURE WORK</head><p>The evaluation of the data transfer methods has been a feasibility study for further goals. In the future, machine learning algorithms will be investigated in this environment as well as common signal processing algorithms, with respect to error concealment techniques and the generation of audio effects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VIII. ACKNOWLEDGEMENTS</head><p>fast-music is part of the fast-project cluster (fast actuators sensors &amp; transceivers), which is funded by the BMBF (Bundesministerium für Bildung und Forschung).</p></div>
			</div>



			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Advanced video coding for generic audiovisual services</title>
		<author>
			<persName><forename type="first">H</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ITU-T Std. H</title>
		<imprint>
			<biblScope unit="volume">264</biblScope>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">High-performance real-time fir-filtering using fast convolution on graphics hardware</title>
		<author>
			<persName><forename type="first">F</forename><surname>Wefers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Berg</surname></persName>
		</author>
		<idno>DAFX-1 -DAFX-8</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10)</title>
				<meeting>of the 13th Int. Conference on Digital Audio Effects (DAFx-10)<address><addrLine>Graz, Austria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010-06-10">Sep. 6-10, 2010</date>
		</imprint>
		<respStmt>
			<orgName>Institute of Technical Acoustics, RWTH Aachen University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Computation of room acoustics using programmable video hardware</title>
		<author>
			<persName><forename type="first">M</forename><surname>Jedrzejewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Marasek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Vision and Graphics</title>
				<imprint>
			<publisher>PJWSTK</publisher>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m">Getting Started with CUDA</title>
				<imprint>
			<publisher>NVidia Corporation</publisher>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m">CUDA C Programming Guide</title>
		<imprint>
			<publisher>NVidia Corporation</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m">CUDA Streams, Best Practices and Common Pitfalls, NVidia Corporation</title>
				<imprint/>
	</monogr>
	<note>Year unknown</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="https://de.fast-zwanzig20.de/" />
		<title level="m">fast actuators, sensors and transceivers</title>
				<imprint>
			<date type="published" when="2017-06">2017. Jun.</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="http://www.soundjack.eu" />
		<title level="m">Soundjack -a realtime communication solution</title>
				<imprint>
			<date type="published" when="2017-06">2017. Jun.</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Carôt</surname></persName>
		</author>
		<title level="m">Musical telepresence -a comprehensive analysis towards new cognitive and technical approaches</title>
				<meeting><address><addrLine>Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009-05">May 2009</date>
		</imprint>
		<respStmt>
			<orgName>University of Lübeck</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. dissertation</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Discrete-time signal processing</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Oppenheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">W</forename><surname>Schaefer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>Prentice Hall, Inc</publisher>
			<pubPlace>Englewood Cliffs, NJ</pubPlace>
		</imprint>
	</monogr>
	<note>2nd ed</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Principles of Digital Audio</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Pohlmann</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005">2005</date>
			<publisher>The Mcgraw-Hill Companies</publisher>
		</imprint>
	</monogr>
	<note>5th ed</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<ptr target="https://jackaudio.org" />
		<title level="m">Jack audio connection kit</title>
				<imprint>
			<date type="published" when="2017-06">2017. Jun</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<ptr target="https://alsa-project.org/main/index.php/MainPage/" />
		<title level="m">Advanced linux sound architecture</title>
				<imprint>
			<date type="published" when="2017-06">2017. Jun</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<ptr target="https://members.pcisig.com/wg/PCI-SIG/document/download/8246" />
		<title level="m">Pci express base specification revision 2.0</title>
				<imprint>
			<date type="published" when="2006-12">2006. Dec.</date>
		</imprint>
	</monogr>
	<note>PCI-SIG</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
