<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">On the Efficient Parallel Computing of Long Term Reliable Trajectories for the Lorenz System</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ivan</forename><surname>Hristov</surname></persName>
							<email>ivanh@fmi.uni-sofia.bg</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University</orgName>
								<address>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Radoslava</forename><surname>Hristova</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University</orgName>
								<address>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stefka</forename><surname>Dimova</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University</orgName>
								<address>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Peter</forename><surname>Armyanov</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University</orgName>
								<address>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nikolay</forename><surname>Shegunov</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University</orgName>
								<address>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Igor</forename><surname>Puzynin</surname></persName>
							<affiliation key="aff1">
								<orgName type="laboratory">JINR</orgName>
								<orgName type="institution">Laboratory of Information Technologies</orgName>
								<address>
									<settlement>Dubna</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Taisia</forename><surname>Puzynina</surname></persName>
							<affiliation key="aff1">
								<orgName type="laboratory">JINR</orgName>
								<orgName type="institution">Laboratory of Information Technologies</orgName>
								<address>
									<settlement>Dubna</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Zarif</forename><surname>Sharipov</surname></persName>
							<affiliation key="aff1">
								<orgName type="laboratory">JINR</orgName>
								<orgName type="institution">Laboratory of Information Technologies</orgName>
								<address>
									<settlement>Dubna</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Zafar</forename><surname>Tukhliev</surname></persName>
							<affiliation key="aff1">
								<orgName type="laboratory">JINR</orgName>
								<orgName type="institution">Laboratory of Information Technologies</orgName>
								<address>
									<settlement>Dubna</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">On the Efficient Parallel Computing of Long Term Reliable Trajectories for the Lorenz System</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">FC8B8FDB6ABE492E191A2AF1CCFF2C4E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:15+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Parallel Computing</term>
					<term>Multiple Precision</term>
					<term>Variable Stepsize Taylor Series Method</term>
					<term>Lorenz System</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this work, we propose an efficient parallelization of the multiple-precision Taylor series method with variable stepsize and fixed order. For a given level of accuracy, the optimal variable stepsize leads to a higher order of the method than the optimal fixed stepsize does. Although the order of the method, and hence the computational work per step, is therefore greater than in the fixed stepsize case, the reduced number of steps gives less overall work. In addition, the greater order of the method is beneficial in that it increases the parallel efficiency. As a model problem, we use the paradigmatic Lorenz system. With 256 CPU cores of the Nestum cluster, Sofia, Bulgaria, we succeeded in obtaining a correct reference solution on the rather long time interval [0, 11000]. To get this solution we performed two large computations: one with 4566 decimal digits of precision and a 5240-th order method, and a second, for verification, with 4778 decimal digits of precision and a 5490-th order method.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The multiple-precision Taylor series method is an affordable and very efficient numerical method for integrating some classes of low-dimensional dynamical systems when high precision is demanded <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b1">[2]</ref>. The method gives a powerful new tool for the theoretical investigation of such systems.</p><p>A numerical procedure for computing reliable trajectories of chaotic systems, called Clean Numerical Simulation (CNS), was proposed by Shijun Liao in <ref type="bibr" target="#b2">[3]</ref> and applied to different systems <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b5">[6]</ref>. The procedure is based on the multiple-precision Taylor series method. The central concept of CNS is the critical predictable time T_c, which is a kind of practical Lyapunov time: T_c is defined as the time for decoupling of two trajectories computed by two different numerical schemes. CNS works as follows. An optimal fixed stepsize is chosen. Then estimates of the required order of the method N and of the required precision (the number of exact decimal digits K of the floating-point numbers) are obtained. The optimal order N is estimated by computing the T_c-N dependence from numerical solutions with a fixed, large enough K. The estimate of K is obtained by computing the T_c-K dependence from numerical solutions with a fixed, large enough N. This estimate of K is in fact an estimate of the Lyapunov exponent <ref type="bibr" target="#b6">[7]</ref>. For a given T_c, the solution is then computed with the estimated N and K, and after that one more computation with higher N and K is performed for verification. 
The choice of N and K ensures that the round-off error and the truncation error are of the same order.</p><p>When very high precision and a very long integration interval are needed, the computational problem can become large. In this case, the parallelization of the Taylor series method is an important task and needs to be carefully developed. The first parallelization of CNS was reported in <ref type="bibr" target="#b7">[8]</ref> and later improved in <ref type="bibr" target="#b8">[9]</ref>. A rather long reference solution for the paradigmatic Lorenz system, namely on the time interval [0, 10000], obtained in about 9 days and 5 hours using 1200 CPU cores, is given in <ref type="bibr" target="#b9">[10]</ref>. However, no details of the parallelization process are given in <ref type="bibr" target="#b7">[8]</ref>, <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b9">[10]</ref>. In our recent work <ref type="bibr" target="#b11">[11]</ref> we reported in detail a simple and efficient hybrid MPI + OpenMP parallelization of CNS for the Lorenz system and tested it with the same parameters as in <ref type="bibr" target="#b9">[10]</ref>. The results show very good efficiency and very good parallel performance scalability of our program.</p><p>This work can be regarded as a continuation of our previous work <ref type="bibr" target="#b11">[11]</ref>, where a fixed stepsize is used. Here we modify CNS to use a variable stepsize and fixed order, following the simple approach given in <ref type="bibr" target="#b12">[12]</ref>. Although the order of the method is then greater than in the fixed stepsize case, and hence the computational work per step is greater, the reduced number of steps gives less overall work. In addition, the greater order of the method is beneficial in that it increases the parallel efficiency. 
With 256 CPU cores of the Nestum cluster, Sofia, Bulgaria, we succeeded in obtaining a correct reference solution on [0, 11000], and in this way we improve the results from <ref type="bibr" target="#b9">[10]</ref>. To obtain this solution we performed two large computations: one with 4566 decimal digits of precision and a 5240-th order method, and a second, for verification, with 4778 decimal digits of precision and a 5490-th order method. The computations lasted ≈ 9 days and 18 hours and ≈ 11 days and 7 hours, respectively. Let us note that the improvement of the numerical algorithm does not change the parallelization strategy from our previous work <ref type="bibr" target="#b11">[11]</ref>, where the parallelization process is explained in more detail. The only difference from the previous parallel program is one additional OpenMP single section, with negligible computational work, which computes the optimal stepsize.</p><p>It is important to mention that although our test model is the classical Lorenz system, the proposed parallelization strategy is rather general: it could be applied as well to a large class of chaotic dynamical systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Taylor series method and CNS for the Lorenz system</head><p>We consider as a model problem the classical Lorenz system <ref type="bibr" target="#b13">[13]</ref>: (1) where R = 28, σ = 10, b = 8/3 are the standard Saltzman parameters. For these parameters, the system is chaotic. Let us denote by x_i, y_i, z_i, i = 0, ..., N the normalized derivatives (the derivatives divided by i!) of the approximate solution at the current time t. Then the N-th order Taylor series method for (1) with stepsize τ is:</p><p>(2)</p><p>The i-th Taylor coefficients (the normalized derivatives) are computed as follows. From system (1) we have expressions for the first derivatives. By applying the Leibniz rule for the derivatives of a product of two functions, we obtain the following recursive procedure for computing x_{i+1}, y_{i+1}, z_{i+1} for i = 0, ..., N-1:</p><formula xml:id="formula_0">(3)</formula><p>To compute the (i+1)-st coefficient of the Taylor series we need all previous coefficients from 0 to i. This algorithm for computing the coefficients of the Taylor series is known as automatic differentiation, or also algorithmic differentiation <ref type="bibr" target="#b14">[14]</ref>. Obviously, we need O(N^2) floating-point operations to compute all coefficients. The subsequent evaluation of the Taylor series with Horner's rule needs only O(N) operations.</p><p>Let us now explain how we choose the stepsize τ. We use a variable stepsize strategy, which makes the method much more robust than in the fixed stepsize case. We use a simple strategy taken from <ref type="bibr" target="#b12">[12]</ref>, which ensures both the convergence of the Taylor series and the minimization of the computational work per unit time. 
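The recursion (3) and the subsequent Horner evaluation can be sketched as follows in plain double precision (the paper works in GMP multiple precision, and the explicit right-hand sides below are reconstructed from the Lorenz equations, since formulas (2) and (3) did not survive extraction):

```c
#include <assert.h>
#include <math.h>

#define ORDER 20  /* order N of the method; the paper uses thousands */

static const double R_ = 28.0, SIGMA_ = 10.0, B_ = 8.0 / 3.0;

/* One step of the N-th order Taylor series method for the Lorenz system.
   x[i], y[i], z[i] hold the normalized derivatives (Taylor coefficients)
   at the current time; on return x[0], y[0], z[0] hold the solution
   advanced by tau. */
static void taylor_step(double x[], double y[], double z[], double tau) {
    for (int i = 0; i < ORDER; i++) {
        double sxz = 0.0, sxy = 0.0;     /* Leibniz convolution sums of (3) */
        for (int j = 0; j <= i; j++) {
            sxz += x[j] * z[i - j];
            sxy += x[j] * y[i - j];
        }
        x[i + 1] = SIGMA_ * (y[i] - x[i]) / (i + 1);
        y[i + 1] = (R_ * x[i] - y[i] - sxz) / (i + 1);
        z[i + 1] = (sxy - B_ * z[i]) / (i + 1);
    }
    /* Evaluate the three Taylor polynomials by Horner's rule: O(N) work,
       versus O(N^2) for the coefficients above. */
    double xn = x[ORDER], yn = y[ORDER], zn = z[ORDER];
    for (int i = ORDER - 1; i >= 0; i--) {
        xn = x[i] + tau * xn;
        yn = y[i] + tau * yn;
        zn = z[i] + tau * zn;
    }
    x[0] = xn; y[0] = yn; z[0] = zn;
}
```

The two convolution sums are exactly the quantities that are later reduced in parallel.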
If we denote the vector of the normalized derivatives of the solution by X_i = (x_i, y_i, z_i) and take a safety factor of 0.993, then the stepsize τ is determined by the last two terms of the Taylor expansions <ref type="bibr" target="#b12">[12]</ref>: In <ref type="bibr" target="#b12">[12]</ref> the order of the method is determined by the local error tolerance. However, we do not work explicitly with a local error tolerance, and we do not use any explicit dependence between the local and the global error. Instead, as in <ref type="bibr" target="#b2">[3]</ref>, we compute an a priori estimate of the order of the method needed for a reliable solution. As said before, the critical predictable time T_c is defined as the time for decoupling of two trajectories computed by two different numerical schemes (in this case, with different N). The solutions are computed with large enough precision to ensure that the truncation error is the leading one. As a criterion for the decoupling time, we take the time at which only 30 correct digits remain. The obtained T_c-N dependencies for fixed stepsize τ = 0.01 and for variable stepsize are shown in Figure <ref type="figure" target="#fig_0">1</ref>. As seen from this figure, the computational work per step in the case of variable stepsize is ≈ 80% greater than in the case of fixed stepsize: (2.98/2.22)^2 ≈ 1.80. However, the reduced number of steps gives less overall work. In addition, the greater order of the method is beneficial in that it increases the parallel efficiency. The reason is that with increasing order N of the method, the parallelizable part of the work becomes relatively larger than the serial part and the parallel overhead.</p><p>Similarly, we compute an a priori estimate of the needed precision by means of the T_c-K dependence. In this case, we compare the solutions for different K and large enough N. 
We obtain the dependence T_c = 2.55K - 81, which, as expected, is the same for fixed and for variable stepsize.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Parallelization of the algorithm</head><p>The improvement of the numerical algorithm does not change the parallelization strategy from our previous work <ref type="bibr" target="#b11">[11]</ref>, where the parallelization process is explained in more detail. However, as we will see, the variable stepsize not only decreases the computational work for a given accuracy, but also gives a higher parallel efficiency.</p><p>Let us store the Taylor coefficients in the arrays x, y, z of length N+1. The values of x_i are stored in x[i], those of y_i in y[i], and those of z_i in z[i]. As explained in <ref type="bibr" target="#b7">[8]</ref>, <ref type="bibr" target="#b8">[9]</ref>, the crucial decision for parallelization is to perform a parallel reduction for the two sums in (3). However, in order to reduce the remaining serial part of the code, and hence to improve the parallel speedup according to Amdahl's law, we should exploit some limited but important additional parallelism. We compute x[i+1], y[i+1], z[i+1] in parallel. Moreover, we compute x[i+1] in advance, before computing the sums in (3), while during the reduction process some of the computational resources are free. In the same way we compute in advance Rx[i] - y[i] from the formula for y[i + 1] and bz[i] from the formula for z[i + 1]. These computations can be taken in advance because multiplication is much more expensive than the other operations used; for example, division by an integer is comparatively cheap. The three evaluations by Horner's rule for the new x[0], y[0], z[0] are also done in parallel.</p><p>In this work we consider a hybrid MPI + OpenMP strategy <ref type="bibr" target="#b15">[15]</ref>, <ref type="bibr" target="#b16">[16]</ref>, i.e., every MPI process creates a team of OpenMP threads. For multiple-precision floating-point arithmetic, we use the GMP library (GNU Multiple Precision library) <ref type="bibr" target="#b17">[17]</ref>. 
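The manual reduction with padded, shared per-thread partial sums described in this section can be sketched like this (plain doubles stand in for the GMP type, the final combine is serial rather than a tree, and the sketch degrades gracefully to one thread when compiled without OpenMP):

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch also builds and runs without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAXTHREADS 64
#define PAD 8   /* pad each slot so threads write to different cache lines */

/* Manual reduction with padded, shared per-thread partial-sum containers,
   standing in for the user-defined multiple-precision reduction of the
   paper (OpenMP's built-in reduction clause cannot be used with GMP
   types and user-defined operations). */
static double reduce_sum(const double *a, int n) {
    static double sum[MAXTHREADS][PAD];  /* shared, padded containers */
    int nthreads = 1;
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        #pragma omp single
        nthreads = omp_get_num_threads();
        sum[tid][0] = 0.0;
        #pragma omp for
        for (int j = 0; j < n; j++)
            sum[tid][0] += a[j];         /* each thread sums its own chunk */
    }
    double total = 0.0;                  /* combine (a tree in the paper) */
    for (int t = 0; t < nthreads; t++)
        total += sum[t][0];
    return total;
}
```

In the full program two such reductions run per index i, one for each convolution sum in (3).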
The main reason to consider a hybrid strategy, rather than a pure MPI one, is that OpenMP performs slightly better than MPI within one computational node. For packing and unpacking the GMP multiple-precision types for the MPI messages, we rely on the tiny MPIGMP library of Tomonori Kouya <ref type="bibr" target="#b18">[18]</ref>, <ref type="bibr" target="#b19">[19]</ref>, <ref type="bibr" target="#b20">[20]</ref>, <ref type="bibr" target="#b21">[21]</ref>.</p><p>It is important to note that for our problem the pure OpenMP parallelization has its own merits. First, programming with OpenMP is easier, because it avoids the use of libraries like MPIGMP. Second, since the algorithm does not allow domain decomposition, the memory needed on one computational node is multiplied by the number of MPI processes on that node, while OpenMP needs only one copy of the computational domain, and thus some memory is saved.</p><p>A sketch of our parallel program is given in Figure <ref type="figure">2</ref>. Every thread gets its id and stores it in tid, and then the loop with index i is performed. Every MPI process takes its portion: the first and the last index controlled by the process. After that, the directive #pragma omp for shares the work of the loop between the threads.</p><p>Although OpenMP has a built-in reduction clause, we cannot use it, because we use user-defined types for multiple-precision numbers and user-defined operations. Instead, a manual, standard tree-based parallel reduction is done. We use containers for the partial sums of every thread, and these containers are shared. The containers are stored in the array sum. In addition, we have an array of temporary variables tempv for storing the intermediate results of the multiplications. To avoid false sharing, a padding strategy is applied <ref type="bibr" target="#b16">[16]</ref>. 
At the point where each process has computed its partial sums, we perform MPI_ALLREDUCE between the master threads <ref type="bibr" target="#b15">[15]</ref>. It is useful to regard MPI_ALLREDUCE as a continuation of the tree-based reduction process that starts with the OpenMP reduction. Communications between master threads are overlapped with those computations for x[i+1], y[i+1], z[i+1] that can be done before the computation of the sums in (3) is finished. When MPI_ALLREDUCE is finished, we compute in parallel the remaining operations for</p><formula xml:id="formula_1">x[i+1], y[i+1], z[i+1].</formula><p>Between the block that computes the Taylor coefficients and the block that computes the new values of x[0], y[0], z[0] in parallel, we compute the new optimal stepsize within an omp single section. While the block for computing the Taylor coefficients is O(N^2) and the block for evaluating the polynomials is O(N), this block is only O(1), and hence its work is negligible. Let us note that the GMP library does not offer a power function for the computations in formula (4). Fortunately, we do not need to compute the stepsize with multiple precision: double precision is enough. Therefore, we use the C standard library function pow in double precision. We normalize the large GMP floating-point numbers in order to work in the range of standard double-precision numbers. The C code, in terms of the GMP library, of our hybrid MPI + OpenMP program can be downloaded from <ref type="bibr" target="#b22">[22]</ref>.</p><p>Let us mention that if one half of the OpenMP threads computes one of the sums in (3) and the other half computes the other sum, one could also expect some small performance benefit, because for small indexes i there will be fewer unused threads and the deviation from perfect load balance between threads will be smaller. 
However, this approach is not general, because it strongly depends on the number of sums to be reduced (two in the particular case of the Lorenz system) and on the number of available threads.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Computational resources. Performance and numerical results</head><p>The preparation of the parallel program and the many tests were performed on the Nestum cluster, Sofia, Bulgaria <ref type="bibr" target="#b23">[23]</ref> and on the HybriLIT Heterogeneous Platform at the Laboratory of Information Technologies of JINR, Dubna, Russia <ref type="bibr" target="#b24">[24]</ref>. The large computations for the reference solution in the time interval [0, 11000] and the presented performance results are from the Nestum cluster. Nestum is a homogeneous HPC cluster based on two-socket nodes. Each node consists of 2 x Intel(R) Xeon(R) E5-2698 v3 (Haswell) processors, with 32 cores per node at 2.3 GHz. We used the Intel C++ compiler version 17.0, GMP library version 6.2.0, OpenMPI version 3.1.2, and the compiler optimization options -O3 -xhost.</p><p>We use the same initial conditions as in <ref type="bibr" target="#b9">[10]</ref>, namely x(0) = -15.8, y(0) = -17.48, z(0) = 35.64, in order to compare with the benchmark table in <ref type="bibr" target="#b9">[10]</ref>. We computed a reference solution in the rather long time interval [0, 11000] and reproduced the benchmark table up to time 10000. Computing this table with two different stepsize strategies is a good demonstration that Clean Numerical Simulation (CNS) is a correct and valuable approach for computing reliable trajectories of chaotic systems.</p><p>We performed two large computations with 256 CPU cores (8 nodes of Nestum). The first computation is with 4566 decimal digits of precision and a 5240-th order method (5% reserve over the a priori estimates). The second computation, for verification, is with 4778 decimal digits of precision and a 5490-th order method (10% reserve over the a priori estimates). The first computation lasted ≈ 9 days and 18 hours and the second ≈ 11 days and 7 hours. 
The overall speedup with 256 cores is 162.8 for the first computation and 164.6 for the second.</p><p>By estimating the time needed for the same accuracy with fixed stepsize 0.01, we conclude that applying the variable stepsize strategy gives a 2.1x speedup. There are two reasons for this speedup: less overall work and increased parallel efficiency. Although the work per step in the case of variable stepsize increases by ≈ 80%, the average stepsize is ≈ 0.034, and thus the overall work is ≈ 53% of the work in the case of fixed stepsize 0.01. In addition, the parallel efficiency increases from 55.5% to 63.6% for the first computation and from 56.2% to 64.3% for the second. This is because by increasing the order N of the method, we increase the amount of parallel work, which mitigates the impact of the serial work and the parallel overhead.</p><p>As we computed the reference solution with some reserve over the estimated N and K, we actually obtain the solution with some more correct digits. The reference solution with 60 correct digits at every 100 time units can be seen in <ref type="bibr" target="#b22">[22]</ref>. The reference solution at t = 11000 is: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>The parallelized version of the multiple-precision Taylor series method, and in particular Clean Numerical Simulation, should be used with a variable stepsize strategy as a better alternative to the fixed stepsize one. An important observation is that the variable stepsize not only decreases the computational work for a given accuracy, but also gives a higher parallel efficiency.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. T_c-N dependencies for fixed and variable stepsize.</figDesc><graphic coords="4,97.66,302.99,261.07,253.15" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Acknowledgement</head><p>We are grateful for the opportunity to use the computational resources of the Nestum cluster, Sofia, Bulgaria. We give our special thanks to Dr. Stoyan Pisov for his great help with the Nestum cluster, and to Prof. Emanouil Atanassov from IICT, BAS, for valuable discussions and important remarks on the parallelization process. We also thank the Laboratory of Information Technologies of JINR, Dubna, Russia for the opportunity to use the computational resources of the HybriLIT Heterogeneous Platform. The work is supported by a grant of the Plenipotentiary Representative of the Republic of Bulgaria at JINR, Dubna, Russia.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Performance of the Taylor series method for ODEs/DAEs</title>
		<author>
			<persName><forename type="first">R</forename><surname>Barrio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Mathematics and Computation</title>
		<imprint>
			<biblScope unit="volume">163</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="525" to="545" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Breaking the limits: the Taylor series method</title>
		<author>
			<persName><forename type="first">R</forename><surname>Barrio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied mathematics and computation</title>
		<imprint>
			<biblScope unit="volume">217</biblScope>
			<biblScope unit="page" from="7940" to="7954" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">On the reliability of computed chaotic solutions of non-linear differential equations</title>
		<author>
			<persName><forename type="first">S</forename><surname>Liao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Tellus A: Dynamic Meteorology and Oceanography</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="550" to="564" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">On the numerical simulation of propagation of micro-level inherent uncertainty for chaotic dynamic systems</title>
		<author>
			<persName><forename type="first">S</forename><surname>Liao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Chaos, Solitons &amp; Fractals</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page" from="1" to="12" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On the clean numerical simulation (CNS) of chaotic dynamic systems</title>
		<author>
			<persName><forename type="first">S</forename><surname>Liao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Hydrodynamics, Ser. B</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="729" to="747" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Over a thousand new periodic orbits of a planar three-body system with unequal masses</title>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Publications of the Astronomical Society of Japan</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">64</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">On the relation between reliable computation time, float-point precision and the Lyapunov exponent in chaotic systems</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1410.4919</idno>
	</analytic>
	<monogr>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Computational uncertainty and the application of a high-performance multiple precision scheme to obtaining the correct reference solution of Lorenz equations</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Numerical Algorithms</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="147" to="159" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Clean numerical simulation for some chaotic systems using the parallel multiple-precision Taylor scheme</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Chinese science bulletin</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="4465" to="4472" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">On the mathematically reliable long-term simulation of chaotic solutions of Lorenz equation in the interval [0, 10000]</title>
		<author>
			<persName><forename type="first">S</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science China Physics, Mechanics and Astronomy</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="330" to="335" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Parallelizing multiple precision Taylor series method for integrating the Lorenz system</title>
		<author>
			<persName><forename type="first">I</forename><surname>Hristov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.14993</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A software package for the numerical integration of ODEs by means of high-order Taylor methods</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jorba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Experimental Mathematics</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="99" to="117" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Deterministic nonperiodic flow</title>
		<author>
			<persName><forename type="first">E</forename><surname>Lorenz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Atmospheric Sciences</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="130" to="141" />
			<date type="published" when="1963">1963</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Methods and Applications of Interval Analysis</title>
		<author>
			<persName><forename type="first">R</forename><surname>Moore</surname></persName>
		</author>
		<imprint>
			<publisher>Society for Industrial and Applied Mathematics</publisher>
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Gropp</surname></persName>
		</author>
		<title level="m">Using MPI: portable parallel programming with the message-passing interface</title>
				<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Using OpenMP: portable shared memory parallel programming</title>
		<author>
			<persName><forename type="first">B</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Jost</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van Der Pas</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>MIT Press</publisher>
			<biblScope unit="volume">10</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<ptr target="https://gmplib.org/" />
		<title level="m">GNU GMP library</title>
				<imprint>
			<date type="published" when="2021-06-24">2021/06/24</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Kouya</surname></persName>
		</author>
		<ptr target="http://na-inet.jp/na/bnc/" />
		<title level="m">BNCpack</title>
				<imprint>
			<date type="published" when="2021-06-24">2021/06/24</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">A Brief Introduction to MPIGMP &amp; MPIBNCpack</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kouya</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">MPIBNCpack library</title>
		<author>
			<persName><forename type="first">E</forename><surname>Nikolaevskaya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Studies in Computational Intelligence</title>
		<imprint>
			<biblScope unit="volume">397</biblScope>
			<biblScope unit="page" from="123" to="134" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Performance Evaluation of Multiple Precision Numerical Computation using x86_64 Dual-core CPUs</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kouya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">FCS2005 Poster Session</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<ptr target="https://github.com/rgoranova/hpcvss" />
		<title level="m">Article source code</title>
				<imprint>
			<date type="published" when="2021-06-24">2021/06/24</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<ptr target="http://hpc-lab.sofiatech.bg/" />
		<title level="m">Nestum Home Page</title>
				<imprint>
			<date type="published" when="2021-06-24">2021/06/24</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<ptr target="http://hlit.jinr.ru/" />
		<title level="m">HybriLIT Home Page</title>
				<imprint>
			<date type="published" when="2021-06-24">2021/06/24</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
