=Paper=
{{Paper
|id=Vol-2081/paper01
|storemode=property
|title=Method of Mixed Traffic Model Formation
|pdfUrl=https://ceur-ws.org/Vol-2081/paper01.pdf
|volume=Vol-2081
|authors=Alexey Begaev,Mikhail Chesnakov,Yuriy Starodubtsev
}}
==Method of Mixed Traffic Model Formation==
Alexey Begaev, North West Echelon, JSC, St-Petersburg, Russia, a.begaev@nwechelon.ru

Mikhail Chesnakov, Yuriy Starodubtsev, 32nd Department, Budyonny Military Academy of Communications, St-Petersburg, Russia, chesnakof@gmail.com; ys@e-nw.ru

Abstract: This paper proposes a method of mixed traffic model formation that makes it possible to create statistical models of mixed traffic for each network element and to obtain stable statistical data characterizing mixed traffic on each network element. It also shows the diversity of network traffic in its basic characteristics. The limits of applicability of the existing complex of models and methods for describing mixed network traffic are discussed. We offer a variant of decomposing a mixed stream into uniform streams using methods of the Theory of Pattern Recognition, and a variant of representing uniform streams as random numerical sequences of packet arrival times. We substantiate the choice of criteria for testing the agreement between experimental and theoretical distributions for the uniform network traffic streams typical of existing and prospective information telecommunication systems.

Keywords: traffic; stream; model; network; information telecommunication systems; Theory of Pattern Recognition; statistical analysis; random values distribution law

===I. Introduction===

The rationale for developing a method of mixed traffic model formation is determined by a number of current circumstances and by the practical importance of data characterizing traffic.

These data are necessary for solving important practical tasks: calculating the probabilistic and time characteristics of specified subnetwork elements, determining the required performance <math>\mu</math> at a specified traffic intensity <math>\lambda</math> and an assigned service procedure at the switching nodes, and establishing the facts and causes of abnormal changes in traffic parameters [1].

For statistical and uniform traffic, a complex of models and methods has been developed [2, 3, 4, 5] that solves practical tasks with adequate accuracy. However, current multiservice communication systems have a number of distinguishing features that rule out the traditional methodological approach.

Current information telecommunication systems are built and operated by a significant number of operators using hardware and software from various manufacturers. The situation is characterized by continuous development of the technical specifications and standards used in information telecommunication systems, while manufacturers implement different versions of them [6].

Various routing options, as well as destructive actions by individual intruders (hackers) and their organized groups, have a great impact on traffic parameters [12].

Consequently, traffic in current information telecommunication systems is mixed and highly dynamic. Moreover, it may differ dramatically at various network points [7].

The method makes it possible to obtain stable statistical data characterizing mixed traffic for each element of a specified subnetwork of the communication network. Standard approaches based on mathematical statistics methods cannot be applied because the event stream is not uniform. Network packets differ from each other in a variety of characteristics: type, size, address, priority, etc. Processing a request for a communication service involves a stream of connection requests as well as a stream of transmitted user information.

Further on, it is assumed that in a properly functioning network the share of time spent on processing connection requests is substantially smaller than the share of time spent on data exchange, and that all switching nodes have the same service procedure.

A clear idea of the amount of information processed by switching nodes in current information telecommunication systems can be obtained by examining statistics of the overall traffic transferred through Internet Exchange Points. The overall traffic transferred through the MSK-IX node (https://www.msk-ix.ru/traffic/) is shown in Figure 1 [8].

Fig. 1. Members' overall traffic transferred through node MSK-IX.

===II. Method of Mixed Traffic Model Formation===

Significant traffic volumes make it possible to obtain a very large sample, which differs substantially from the situation where the number of experiments is relatively small.

The method of mixed traffic model formation is implemented in five relatively independent phases. The model formation process is shown in Figure 2.

Fig. 2. Mixed traffic modeling process. (Block diagram: mixed traffic fixation; mixed traffic decomposition to uniform streams, supported by an algorithm for recognition of mixed traffic and a database of streams reference description; measurement and creation of input data (ID) for subsequent statistical processing over the timer interval from t0 to Δt, with units creating ID for the 1st, i-th and k-th streams; statistical processing by sequential hypothesis testing against a database of random values distribution functions, giving a distribution law for each stream; model collection for the subnetwork element of the communication network.)

The first phase is the fixation (capture) of the traffic processed by the <math>i</math>-th element of the selected communication subnetwork. Dedicated tools, network protocol analyzers, are used to capture mixed traffic and to determine the header field values of the captured IP packets. The typical functions of a network protocol analyzer are packet capture, decoding, packet analysis and display. Common examples are Wireshark, York, SoftPerfect Network Protocol Analyzer, Accurate Network Monitor and others. All of them provide the date and time of packet capture, the source and destination IP addresses, the protocol type (network, transport or application layer) and other information about the captured data.

During the second phase, the mixed traffic stream is decomposed into uniform streams based on a set of specified characteristics. The model is suitable for processing the traffic of various network protocols, although packet header formats may differ in structure. Header formats have a considerable number of fields, each of which can take a considerable but limited number of values. A decomposition that puts into one class only packets with exactly identical values in all fields would unnecessarily increase the number of uniform streams, which would make the proposed model harder to implement. Depending on the task at hand, it is acceptable to ignore the values of some fields. On the other hand, it is possible to define packet classification conditions over the values of all header fields. Moreover, specialized traffic analyzers can be used to classify packets by body content. This confirms the flexibility of the model.

We represent the mixed traffic stream as a data aggregate. In the proposed model, decomposition is based only on header characteristics; the characteristics of the payload carried in a packet are ignored when a stream is assigned to a class.

Recognition performs two basic operations. The first is the calculation of the similarity factor between a realization and every reference. The second is the assignment of the realization to the reference with the highest similarity. Recognition is thus a decomposition of a set into a certain number of non-empty disjoint subsets according to selected criteria.

The primary criterion is the assignment of the mixed traffic stream to one of the existing network protocols (IP, X.25, etc.); at a later stage, classification is performed based on criteria arising from differences in packet header field values. The structure of a network packet and the ranges of permissible field values are always known and finite, which is a necessary condition for combining various networks into a single one. At present, the basic network protocols are IPv4, described in the RFC 791 specification, and its successor, IPv6. IPv4 is used as an example in the description below, but the developed method works with any primary data.

The IP packet header size may vary from 20 bytes to 60 bytes, and the header contains at least 12 fields (Version, IHL, Type of Service, Total Length, Identification, Flags, Fragment Offset, Time to Live, Protocol, Header Checksum, Source Address, Destination Address); therefore, assigning packets with identical headers to separate classes would create a huge number of classes. The format of the IP packet header is shown in Figure 3.

Fig. 3. Format of IP packet header.

IP packet header fields description:

* Version: 4 bits. The Version field indicates the format of the internet header.
* IHL: 4 bits. Internet Header Length is the length of the internet header in 32-bit words, and thus points to the beginning of the data.
* Type of Service: 8 bits. The Type of Service provides an indication of the abstract parameters of the quality of service desired.
* Total Length: 16 bits. Total Length is the length of the datagram, measured in octets, including internet header and data. This field allows the length of a datagram to be up to 65,535 octets.
* Identification: 16 bits. An identifying value assigned by the sender to aid in assembling the fragments of a datagram.
* Flags: 3 bits. Various Control Flags.
* Time to Live: 8 bits. This field indicates the maximum time the datagram is allowed to remain in the internet system. If this field contains the value zero, then the datagram must be destroyed. This field is modified in internet header processing.
* Protocol: 8 bits. This field indicates the next level protocol used in the data portion of the internet datagram.
* Options: variable. The options may or may not appear in datagrams.

A full description of the IP packet header fields can be found in RFC 791 (https://tools.ietf.org/html/rfc791).

In the context of the current task, only uniform streams with a large share in the overall stream are of interest. It is reasonable to group all relatively uncommon streams into a separate class.

The packet reference description database may be presented as a logic table. Let <math>I</math> denote the set of selected classes of packets that are homogeneous in the sense of equal values (or disjoint value ranges) of the selected header fields, and let <math>J</math> denote the set of all possible header field values or disjoint value ranges. If the <math>j</math>-th header field value corresponds to the <math>i</math>-th class of packets, then the table element <math>a_{IJ}(i, j) = 1</math>; otherwise <math>a_{IJ}(i, j) = 0</math>.

A table of this kind containing all possible variants of values would have a huge dimension, which is unnecessary because in practice only packet classes with certain header values are of interest.

To assign any mixed traffic packet to the closest uniform class we use methods of the Theory of Pattern Recognition, which are based on pair-wise comparison of the object to be recognized with the reference set. The following similarity measures are available for binary data (www.ibm.com/support/knowledgecenter/ru/SSLVMB_24.0.0/spss/base/cmd_proximities_sim_measure_binary.html):

* Russell-Rao. This is a binary version of the inner (dot) product. Equal weight is given to matches and nonmatches. This is the default for binary similarity data.
* Simple matching. This is the ratio of matches to the total number of values. Equal weight is given to matches and nonmatches.
* Jaccard. This is an index in which joint absences are excluded from consideration. Equal weight is given to matches and nonmatches. Also known as the similarity ratio.
* Dice. This is an index in which joint absences are excluded from consideration, and matches are weighted double. Also known as the Czekanowski or Sorensen measure.
* Rogers and Tanimoto. This is an index in which double weight is given to nonmatches.

The indices listed above can be used as the function <math>J(\xi_1, \xi_2, \dots, \xi_p)</math>, which determines the "distance" between classes in the attribute space with the coordinates <math>\xi_1, \xi_2, \dots, \xi_p</math>.

The task of pattern recognition using the methods of statistical recognition theory is carried out in two stages: the stage of learning and constructing the reference descriptions of classes, and the stage of recognition.

The source of information about the images to be recognized is the set of results of independent observations (sample values) that make up the learning sample <math>(x_i)_1^{m} = (x_1, x_2, \dots, x_m)</math> and the control (exam) sample <math>(x_i)_1^{n} = (x_1, x_2, \dots, x_n)</math>. Depending on the nature of the recognition problem (one-dimensional or multidimensional), <math>x_i</math> can be either a one-dimensional or a <math>p</math>-dimensional random variable.

Training is aimed at forming reference class descriptions. The decision rule is based on forming the likelihood ratio and comparing it with a threshold <math>c</math>, the value of which is determined by the selected quality criterion:

:<math>\hat{L} = \frac{\hat{w}_n(x_1, x_2, \dots, x_n \mid s_2)}{\hat{w}_n(x_1, x_2, \dots, x_n \mid s_1)} \ge c</math> (1)

where <math>\hat{w}_n(x_1, x_2, \dots, x_n \mid s_i)</math> is the estimate of the conditional joint <math>n</math>-dimensional probability density of <math>x_1, x_2, \dots, x_n</math> provided they belong to the class <math>s_i</math>.

At the stage of training and constructing reference class descriptions, the following actions are performed:

# Form a set of characteristics <math>\xi_1, \xi_2, \dots, \xi_p</math> from the characteristics of the object that are available for measurement.
# Specify the function <math>J(\xi_1, \xi_2, \dots, \xi_p)</math> that defines the "distance" between classes in the characteristic space with the coordinates <math>\xi_1, \xi_2, \dots, \xi_p</math>.
# Define the probability distributions of the characteristics for the classes.
# Calculate and select <math>q</math> new characteristics <math>\eta_1, \eta_2, \dots, \eta_q</math>, <math>q < p</math>, which correspond to the minimal eigenvalues <math>\lambda_k</math> in the sum <math>J = \sum_{k=1}^{p} \lambda_k</math>.

This sequence of actions reduces the number of features, which in turn reduces the cost of measurements and calculations.

The recognition problem can be reduced to the problem of recognizing multidimensional normal populations. Approaches to solving this problem are set out clearly in [9].

At the third phase, measurement and creation of primary data for further statistical processing, random numerical sequences of the arrival times of packets belonging to a uniform stream are obtained:

:<math>S^{i}(t_0;\, t_0 + \Delta t) = \{\tau_1^{i}, \dots, \tau_l^{i}, \dots, \tau_n^{i}\}</math>

where <math>S^{i}</math> is the numerical sequence of arrival times of packets belonging to the uniform stream; <math>t_0;\, t_0 + \Delta t</math> is the current time range; <math>\tau_l^{i}</math> is the arrival time of the <math>l</math>-th packet of the <math>i</math>-th stream.

The set of distribution functions was selected on the basis of the physical meaning of the random variable that describes the time intervals between arrivals of uniform traffic packets. The random values lie only on the positive semiaxis, and traffic of a uniform nature, for which the IP header field values are equal, may be the overall traffic of a large number of users or applications using one type of communication service.

The database of distribution functions may be built from the following distribution laws: gamma distribution, Erlang distribution, Rayleigh distribution, Pareto distribution and others that do not contradict the physical meaning of the random variable describing the time intervals between arrivals of uniform traffic packets.

These distribution laws are presented in Table 1.

{| class="wikitable"
|+ Table 1. Density distribution functions
! Distribution name
! Density function
! Parameters
|-
| Gamma distribution
| <math>f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x},\ x > 0</math>
| <math>\lambda</math>: scale parameter (<math>\lambda > 0</math>); <math>\alpha</math>: shape parameter (<math>\alpha > 0</math>)
|-
| Erlang distribution of <math>m</math>-th order
| <math>f(x) = \frac{\lambda^{m}}{(m - 1)!} x^{m - 1} e^{-\lambda x},\ x \ge 0</math>
| <math>\lambda</math>: scale parameter (<math>\lambda > 0</math>); <math>m</math>: shape parameter, distribution order, positive integer (<math>m \ge 1</math>)
|-
| Rayleigh distribution
| <math>f(x) = \frac{x}{a^{2}} e^{-x^{2}/(2a^{2})},\ x > 0</math>
| <math>a</math>: scale parameter, mode (<math>a > 0</math>)
|-
| Pareto distribution
| <math>f(x) = \frac{\alpha}{x_0} \left(\frac{x_0}{x}\right)^{\alpha + 1},\ x > x_0</math>
| <math>x_0</math>: location parameter, left border of the range of possible values (<math>x_0 > 0</math>); <math>\alpha</math>: shape parameter (<math>\alpha > 0</math>)
|}

During the fourth phase, statistical processing of the uniform network traffic streams is performed in order to establish the continuous distribution law that best describes the sample of random values obtained in experimental observations. A hypothesis about the agreement between the experimental and theoretical distributions is put forward, which may be checked using various goodness-of-fit criteria [9].

The criteria most frequently applied in practice are: 1) criteria of <math>\chi^2</math> type; 2) various non-parametric criteria: the Kolmogorov criterion, the Smirnov criterion, the Mises criterion. They differ in their conditions of applicability when testing the goodness-of-fit hypothesis for various distribution laws (see GOST R 50.1).

A distinction is made between simple and complex hypotheses. A simple tested hypothesis has the form <math>H_0: f(x) = f(x, \theta_0)</math>, where <math>f(x)</math> is the density function and <math>\theta_0</math> is the known scalar or vector parameter of the theoretical distribution that is used in the goodness-of-fit test. A complex hypothesis has the form <math>H_0: f(x) \in \{f(x, \theta),\ \theta \in \Theta\}</math>, where <math>\Theta</math> is the space of parameters and the estimator <math>\hat{\theta}</math> of the scalar or vector parameter is calculated from the same sample that is used to test the goodness-of-fit hypothesis [11, 12].

Based on the sequence of actions proposed in the method, and taking into account the characteristics of the obtained uniform traffic streams and the applicability of the various criteria, we suggest using a criterion of <math>\chi^2</math> type to test the agreement between the experimental and theoretical distributions. The application of <math>\chi^2</math>-type criteria is described in GOST R 50.1.

When testing a simple hypothesis about the agreement between the experimental and theoretical distributions of a random value, the following sequence of actions is carried out:

a) Form the tested hypothesis by choosing the theoretical distribution <math>F(x, \theta)</math> whose agreement with the data is to be checked.

b) Draw a random sample of size <math>n</math> from the population.

c) According to the sample size <math>n</math>, select the number of intervals <math>k</math>.

d) Select the boundary points of the grouping intervals. The sample may be grouped into intervals of equal length, intervals of equal probability, or according to the asymptotically optimal grouping for the selected distribution law; because the distribution laws for various <math>\Delta t</math> may be different, we suggest grouping into intervals of equal length. In this case it is necessary to count the numbers <math>n_i</math> of observations in each interval and to determine the probabilities <math>P_i(\theta)</math> of falling into each interval.

e) After calculating <math>n_i</math> and <math>P_i(\theta)</math>, calculate the value <math>S^{*}</math> of the test statistic for the selected criterion according to formula (2) or (3):

:<math>X_n^2 = n \sum_{i=1}^{k} \frac{\left(n_i / n - P_i(\theta)\right)^2}{P_i(\theta)}</math> (2)

:<math>S_{LR} = -2 \ln l = -2 \sum_{i=1}^{k} n_i \ln\left(\frac{P_i(\theta)}{n_i / n}\right)</math> (3)

f) According to the <math>\chi^2_{k-1}</math> distribution, calculate the value <math>P\{S > S^{*}\}</math> using formula (4). If <math>P\{S > S^{*}\} > \alpha</math>, where <math>\alpha</math> is the specified significance level, there is no reason to reject the tested hypothesis. Otherwise, the tested hypothesis is rejected.

:<math>P\{X_n^2 > X_n^{*2}\} = \frac{1}{2^{r/2}\,\Gamma(r/2)} \int_{X_n^{*2}}^{\infty} s^{r/2 - 1} e^{-s/2}\, ds > \alpha</math> (4)

The calculated value <math>S^{*}</math> of the test statistic is compared with the critical value <math>\chi^2_{r,\alpha}</math>, where <math>r = k - 1</math> is the number of degrees of freedom, defined by the equation:

:<math>\frac{1}{2^{r/2}\,\Gamma(r/2)} \int_{\chi^2_{r,\alpha}}^{\infty} s^{r/2 - 1} e^{-s/2}\, ds = \alpha</math> (5)

Values of <math>\chi^2_{r,\alpha}</math> are given in various handbooks. The goodness-of-fit hypothesis is rejected if the value of the test statistic falls into the critical range, i.e. if <math>S^{*} > \chi^2_{r,\alpha}</math>.

When a complex hypothesis is tested and the parameter estimates are calculated from grouped data by minimizing the statistics defined by formulas (2) and (3), the checking sequence is similar to the case of a simple hypothesis, with the number of degrees of freedom set to <math>r = k - m - 1</math>, where <math>m</math> is the number of parameters estimated from the sample. The recommendations regarding the grouping method remain valid.

At the fifth phase, a well-founded set of distribution functions is obtained; each function describes a particular stream of network traffic packets, and their aggregate describes the traffic source.

===III. Conclusions===

The developed method makes it possible to:

* Create statistical models of mixed traffic for each network element.
* Obtain statistical models of mixed traffic which can be used for the analysis of real communication networks and the design of prospective communication networks.
* Update the models when prospective protocols are implemented.

With additional development of methods for comparing the models obtained with the proposed method for various network elements, it becomes possible to detect abnormal traffic changes and identify their causes.

===References===

[1] Starodubtsev Yu.I., Begaev A.N., Davlyatova M.A. Quality Management of Information Services. SPb: SPbSTU, 2017, 454 p. (In Russ.).

[2] Anisimov V.V., Begaev A.N., Starodubtsev Yu.I. Functional model of communication network with unknown level of confidence and assess its capabilities to provide VPN service with specified quality. Voprosy kiberbezopasnosti [Cybersecurity issues]. 2017. N 1 (19), pp. 6-15. DOI: 10.21681/2311-3456-2017-1-6-15.

[3] Gross D., Shortle J.F., Thompson J.M., Harris C.M. Fundamentals of Queueing Theory. 4th Ed. Wiley-Interscience, 2008, 528 p.

[4] Krylov V.V., Samohvalov S.S. Teletraffic and its application theory. SPb.: BHV-Peterburg, 2005. 288 p. (In Russ.).

[5] Starodubtsev Yu.I., Begaev A.N., Kozachok A.V. The method of controlling access to information resources of multi-service networks of various levels of confidentiality. Voprosy kiberbezopasnosti [Cybersecurity issues]. 2016. N 3 (16), pp. 13-17.

[6] Markov A., Luchin D., Rautkin Y., Tsirlov V. Evolution of a Radio Telecommunication Hardware-Software Certification Paradigm in Accordance with Information Security Requirements. In Proceedings of the 11th International Siberian Conference on Control and Communications (Omsk, Russia, May 21-23, 2015). SIBCON-2015. IEEE, 2015, pp. 1-4. DOI: 10.1109/SIBCON.2015.7147139.

[7] Vencel E.S. The theory of probability: Textbook for university students. 9th ster. ed. M.: Publishing House "Academia", 2003. 576 p. (In Russ.).

[8] Buranova M.A., Kartashevsky V.G., Samoilov M.S. Analysis of statistical characteristics of multimedia traffic aggregation node in a multiservice network. Radio-technical and telecommunication systems. Systems, networks and devices of telecommunications. Murom, 2014. No 4 (16). P. 63-69. (In Russ.).

[9] Fomin Y.A., Tarlovskii G.R. Statistical Theory of Recognition of Images. M.: Radio and Communication, 1986. 264 p. (In Russ.).

[10] Anisimov V.V., Begaev A.N., Starodubtsev Yu.I., Sukhorukova E.V., Fedorov V.G., Chukarikov A.G. The way of purposeful transformation of the model parameters of the real fragment of the communication network. Printed: May 23, 2016, Bul. N 15, 2620200. (In Russ.).

[11] Begaev A.N., Starodubtsev Yu.I., Fedorov V.G. A method for estimating the manageability of a fragment of a public communication network, taking into account the influence of a plurality of control centers and destructive program influences. Voprosy kiberbezopasnosti [Cybersecurity issues]. 2017. N 4 (22), pp. 32-39. DOI: 10.21681/2311-3456-2017-4-32-39.

[12] Starodubtsev Yu.I., Grechishnikov E.V., Komolov D.V. Use of neural networks to ensure stability of communication networks in conditions of external impacts. Telecommunications and Radio Engineering. 2011. V. 70. N 14. P. 1263-1275.
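As an illustration of the second phase, the following minimal sketch computes the binary similarity measures named in the paper (Russell-Rao, simple matching, Jaccard, Dice, Rogers and Tanimoto) and assigns a packet to the reference class with the highest similarity. It is not the authors' implementation. The column names of the logic table, the two reference classes `web` and `dns`, and the helper names `match_counts` and `classify` are hypothetical and chosen only for the example; the formulas are the usual textbook definitions of these indices.

```python
# Hypothetical columns of the logic table: one binary flag per header field value.
COLUMNS = ["proto=TCP", "proto=UDP", "dport=80", "dport=53", "len<=576", "len>576"]

def match_counts(u, v):
    """Return (a, b, c, d): joint presences, the two kinds of nonmatch, joint absences."""
    a = sum(1 for x, y in zip(u, v) if x and y)
    b = sum(1 for x, y in zip(u, v) if x and not y)
    c = sum(1 for x, y in zip(u, v) if not x and y)
    d = sum(1 for x, y in zip(u, v) if not x and not y)
    return a, b, c, d

def russell_rao(u, v):
    a, b, c, d = match_counts(u, v)
    return a / (a + b + c + d)

def simple_matching(u, v):
    a, b, c, d = match_counts(u, v)
    return (a + d) / (a + b + c + d)

def jaccard(u, v):
    a, b, c, _ = match_counts(u, v)
    # Returning 0.0 when both vectors are all zeros is a convention assumed here.
    return a / (a + b + c) if (a + b + c) else 0.0

def dice(u, v):
    a, b, c, _ = match_counts(u, v)
    return 2 * a / (2 * a + b + c) if (a + b + c) else 0.0

def rogers_tanimoto(u, v):
    a, b, c, d = match_counts(u, v)
    return (a + d) / (a + d + 2 * (b + c))

def classify(packet, references, measure=jaccard):
    """Assign the packet to the reference class with the highest similarity."""
    return max(references, key=lambda name: measure(packet, references[name]))

# Hypothetical reference descriptions (rows of the logic table).
references = {
    "web": [1, 0, 1, 0, 0, 1],
    "dns": [0, 1, 0, 1, 1, 0],
}
packet = [1, 0, 1, 0, 1, 0]  # a short TCP packet to port 80

print(classify(packet, references))
```

The choice of measure matters mainly through its treatment of joint absences: with many header-value columns most flags are zero in both vectors, so indices that ignore joint absences (Jaccard, Dice) are less dominated by them than simple matching.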
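The χ²-type check of the fourth phase can be sketched in the same spirit. The example below is an assumption-laden illustration rather than the procedure of GOST R 50.1: it takes the Rayleigh law from Table 1 as the tested simple hypothesis, groups the inter-arrival times into intervals of equal length with an open-ended last interval (an assumption made so that the interval probabilities sum to one), computes statistic (2), and compares it with a handbook critical value. The synthetic arrival times are built from Rayleigh quantiles purely for demonstration, and all function names are hypothetical.

```python
import math

def rayleigh_cdf(x, a):
    """CDF of the Rayleigh law: F(x) = 1 - exp(-x^2 / (2 a^2)) for x > 0."""
    return 1.0 - math.exp(-x * x / (2.0 * a * a)) if x > 0 else 0.0

def inter_arrival_times(arrivals):
    """Time intervals between consecutive packet arrivals of one uniform stream."""
    return [t2 - t1 for t1, t2 in zip(arrivals, arrivals[1:])]

def pearson_statistic(sample, cdf, k, width):
    """Statistic (2): n * sum((n_i / n - P_i)^2 / P_i) over k grouping intervals.

    The first k - 1 intervals have equal length `width`; the last one is open-ended.
    """
    n = len(sample)
    edges = [i * width for i in range(k)] + [math.inf]
    counts = [0] * k
    for x in sample:
        counts[min(int(x // width), k - 1)] += 1
    total = 0.0
    for i in range(k):
        upper = 1.0 if math.isinf(edges[i + 1]) else cdf(edges[i + 1])
        p_i = upper - cdf(edges[i])
        total += (counts[i] / n - p_i) ** 2 / p_i
    return n * total

# Synthetic stream: gaps are Rayleigh(a = 1) quantiles, so the fit is nearly ideal.
n = 1000
gaps = [math.sqrt(-2.0 * math.log(1.0 - (j + 0.5) / n)) for j in range(n)]
arrivals = [0.0]
for g in gaps:
    arrivals.append(arrivals[-1] + g)
sample = inter_arrival_times(arrivals)

K, WIDTH = 8, 0.5
CRITICAL = 14.067  # handbook chi-square critical value for r = K - 1 = 7, alpha = 0.05

stat_fit = pearson_statistic(sample, lambda x: rayleigh_cdf(x, 1.0), K, WIDTH)
stat_misfit = pearson_statistic(sample, lambda x: rayleigh_cdf(x, 2.0), K, WIDTH)
print("a = 1 retained:", stat_fit <= CRITICAL)
print("a = 2 retained:", stat_misfit <= CRITICAL)
```

For this synthetic sample the hypothesis with a = 1 is retained and the hypothesis with a = 2 is rejected. For a complex hypothesis the same statistic would be compared against a critical value with fewer degrees of freedom, as described in the text.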