292 Building Formal Model of the Internet Routing for Risk Evaluation of Cyberattacks on Global Routing © Vitalii Zubok Pukhov Institute for Modelling in Energy Engineering National Academy of Sciences of Ukraine, Kyiv, Ukraine vitaly.zubok@gmail.com Abstract. One of the most significant problems deriving from the Border Gateway Protocol (BGP) weaknesses is route leaks and route hijacks. Estimat- ing the risks of route interception requires quantitative measurement of the im- pact of an attack on the routing distortion, and therefore, the loss of information security breach. This offers a way of exploring the topology of connections be- tween autonomous Internet systems to further formulate the risk management task as a topology problem. One of the most important steps in modeling the impact of routing attacks is to build a formal model of global Internet routing. In this paper we offer to provide formal description for objects, relations and processes of the Internet routing. Firstly we postulate the Internet as a set of IP addresses, grouped to address prefixes in routing tables. Prefixes are announced by autonomous systems with BGP protocol. Prefixes are aggregated and incapsulated one to another using subnet mask and this affects the accessibility of IP addresses in prefix. The set of routes to each prefix can be represented as directional graph, and combina- tion of all that graphs built as routes to all announced prefixes can be used as representation of the Internet at global routing level. We provide a formal de- scription of global routing objects and their relationships, as well as the process of route selection. All formulations are equally applicable to both IPv4 and IPv6 types of prefixes. Therefore, a formal description of the two components of the routing process is provided: the calculation of the IP prefix and path selection. These steps are an important in formulating the route hijack risk management problem as a task for researching effective link topology. Keywords: Global Internet Routing, Route Hijacking, Routing Model, Cyber- security. 1 Introduction Autonomous Systems (AS) use the Border Gateway Protocol (BGP) to exchange routes to their IP prefixes and set up cross-domain routes on the Internet. BGP is a distributed protocol that lacks route validation and authentication. As a result, AS may promote illegitimate routes for non-IP prefixes. These illegal route announce- ments spread and "pollute" the Internet, affecting service availability, integrity and Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 293 privacy of communications. This phenomenon, called BGP hijack hijacking, can be caused by incorrect router configuration or malicious attacks. Because any route can be originated and announced by any random network, independent of its rights to announce that route, there needs to be an out-of-band method to help BGP manage which network can announce which route. There are proposed proactive mechanisms such as Resource Public Key Infrastructure (RPKI) [1]. It's part of the Internet Rout- ing Registry system. This service provides a collective method to allow one network to filter another networks routes. Method begins with cryptographic signing the route origin. A Route Origin Authorisation (ROA) is a cryptographically signed object that states which AS is authorised to originate a certain prefix. A ROA contains three in- formational elements: the AS Number that is authorised, the prefix that may be origi- nated from the AS, and the maximum length of the prefix. However such techniques are fully effective only in global deployment, and opera- tors are reluctant to deploy them because of the associated technical and financial costs. For example, Telia, one of the Tier-I Internet backbone operators, announced that it's using RPKI for security in its internet routing infrastructure since only Sep- tember, 2019. Anti-hijack protection consists of two steps: detection and mitigation. An analysis of the mechanisms of the attack, depending on its objectives and options for its im- plementation is described in detail in [2]. Detection is mainly provided by third-party services such as BGPMon. They notify the network administrator of suspicious events related to their prefixes based on routing information. They track worldwide routes by tracing and keep track of route announcements in BGP. In the event of an incident, the affected networks begin to mitigate the consequences of the event, for example by announcing more specific prefixes to their networks or by requesting other ASs to filter out false announcements. However, due to the combination of technological and practical deployment issues, existing reactive approaches are largely inadequate. In particular, the most advanced technologies have the following major problems: − the variety of types of routing attacks and combinations of methods lead to the lack of a reliable method for detecting route interception; − operators should be informed in advance of legitimate changes to their rout- ing policy (new interactions between AS, announcement of a new prefix, etc.) so that such changes are not considered suspicious events for condition- al third party detection systems. Otherwise, adopting a less rigorous policy to compensate for the lack of updated information and reducing the number of false positives carries the risk of neglecting real events and not detecting false negatives; − only few minutes of unauthorized traffic diversion can result in heavy finan- cial losses due to unavailability of service or security breaches. At the same time, the response time to incidents is slow in any case, as current practice requires the need to manually check alerts coming from monitoring systems and third-party services. The duration of widely known incidents ranged from several hours to months [3]. In the risk assessment process, specific requirements for the quality of information are raised at the risk identification stage. The highest possible level of completeness, 294 accuracy and conformity at the time of its receipt is required. Quality requirements are also raised to the quality of information sources [3,4]. The outcome of a risk iden- tification should be structured and encompass four elements: the sources of the risk, the immediate events of the realization of the threats, the causes of these events and the expected consequences. Building a formal model of Internet routing will help solve the problem of predicting the consequences of events. 2 Graph Approach to the Internet Modelling The telecommunication network is an integral part of the information and telecom- munication systems and is characterized by different forms of communication and different types of interaction. Often a graph can serve as a mathematical model for such networks. A graph can be thought of as a set of points called vertices or nodes, joined by lines called arcs or bonds. Each link and graph node can be matched by a number of parameters that characterize natural constraints. For example, a telecom- munications network may be represented as a graph in which the edges correspond to the communication channels and the vertices to the switching nodes. Important pa- rameters can be included in the model in the form of numbers or weights assigned to the arcs and vertices of the graph. These scales can be fixed and random. Thus, for a network, typical vertex representing a switching node may have the following weights: maximum bandwidth, storage capacity etc. A typical edge is a communica- tion line and may have the following weights: maximum bandwidth, average trans- mission delay over the communication channel, reliability of the communication channel, and so on. The feasibility of constructing and using a network model as a graph depends on the physical nature of the network we study. The most obvious advisability of using the graph model in solving problems related to connectivity. In usual cases we may be interested in the task of delivering information from anywhere to anywhere. This is a structural task in which at least one path from any vertex to any other path must be established. Another important task is to find the shortest path between two, several, or all vertices of the graph, as well as find the shortest path with different constraints. Using graphs, we can also solve the problem of synthesis of network topology, that is, building an optimal system. One of the possible optimality criteria may be the surviv- ability or reliability of the telecommunication system. Since the telecommunication networks tend to damage and failure (resulting in a breach of communication), a pos- sible task may be to build a system in which the consequences of malfunctions are minimal under the specified operating conditions. Mathematical apparatus of graph theory, and later - complex network theory, in application to the global telecommunication network topology allows to analyze a single nodes degree, the degree distribution, the shortest path between a pair of nodes, the average shortest path in the network, the clustering, betweenness and many other factors and characteristics. Currently, in many studies, the graph is used to build a model of the Internet at the level of autonomous systems [5,6]. In those studies the G := (V , E ) Internet is represented as a graph where V is a set of autonomous sys- 295 tems, and E is their connections formed by the BGP-4 routing protocol. In this case, the bandwidth of the connections between the nodes of the Internet is not taken into account and does not affect the weight of the edges, thus the graph is unweighted. Node relationships between offline systems on the Internet were explored. Several types of relationships were identified, including “customer-provider”, “peer-peer”, and others. In the case of the interaction of two equal status operators, they announce to each other their own preferences and network prefixes of their customers. In the case of provider-customer interaction, the provider announces to the client all the prefixes available to him, and the client announces the prefixes of his own networks. Thus, the connections between the autonomous systems appear to be bidirectional, so the edges of the graph are undirected. Using the mathematical apparatus of graph theory, we have proposed a metric dis- tance function on the Internet for such a graph and proved its correspondence to the metric axioms of minimality, symmetry and the triangle inequality [7]. 3 Formal Description of Global Internet Routing Objects and Their Relations We come again with thought that in order to detect route hijcks and route leaks, investigate the extent of the impact on the topology, and to further assess the risks, it is necessary to have a global routing model of the Internet, that is, a BGP-based mod- el. The first difference from the approaches presented in the previous section is that the routing process itself is inextricably linked to the choice of destination. Therefore, when investigating routing interference (when tampering results in unauthorized change of direction), we will not be able to use an undirected graph as a model. To determine the required qualities of the new model, it is proposed to formalize the concept of routing. Let’s formulate the initial data. There is an address space of Internet A - a set of unique IP addresses a, which are grouped into IP prefixes p (hereinafter simply "pre- fixes"): A = {a1 , a2 , a3 ,...a A : ai ≠ a j ,{i, j} ≤ A } ; a∈ p ⊂ A . Prefixes, in turn, are grouped (often the term "aggregated" is used) from more spe- cific to less specific, as defined in [8] and shown in Fig. 1. For a complete list of pre- fixes, see the CIDR documentation. From the illustration it is clear that any IP address belongs to 32 prefixes that are "encapsulated", i.e. include each other by changing the network mask. So the entire address space A can be described by one prefix with the length of the network mask 0, or by combining two prefixes with the netmask length 1, or four with the netmask length 2, and so on: p3 ⊂ p2 ⊂ p1 ⊂ p0 ; p0 = 2 p1 = 4 p2 = 8 p3 . 296 Prefix nota- Network Number of IP Number of such tion mask bit addresses prefixes in the addrss length space x.x.x.x/32 32 1 4294967296 x.x.x.x/31 31 2 2147483648 x.x.x.x/30 30 4 1073741824 x.x.x.x/29 29 8 536870912 ….. x.x.x.0/24 24 256 16777216 x.x.x.0/23 23 512 8388608 x.x.x.0/22 22 1024 4194304 ….. x.x.0.0/17 17 32768 131072 x.x.0.0/16 16 65536 65536 x.x.0.0/15 15 131072 32768 ….. x.0.0.0/9 9 33554432 512 x.0.0.0/8 8 16777216 256 x.0.0.0/7 7 8388608 128 ….. x.0.0.0/3 3 536870912 8 x.0.0.0/2 2 1073741824 4 x.0.0.0/1 1 2147483648 2 0.0.0.0/0 0 4294967296 1 Fig.1. Aggregation of the IPv4 prefixes according to RFC 4632. In the general case, we can thus express the relation between the subsets of IP ad- dresses defined by the prefix prefixes in the set of all IP addresses:  pi = 2 j −i p j ;  i ≤ j; 0 ≤ {i, j} ≤ log A  2 (1) Prefixes are announced by autonomous systems (the term "originating" is used in the literature), so each prefix has its "origin" - an autonomous system that announces it. An example of AS interaction is shown in Fig. 2. In the normal state, each prefix is announced by only one autonomous system and there is at least one route for each prefix ( m( p) := v p , e p ), (2) 297 ep which consists of directional edges that correspond to the announcement of the vp perfix between vertices (autonomous systems). There are no routes to the unan- nounced prefix, and it cannot be considered a member of the Internet. m A set of routes p to a specific prefix p can be represented as a ring-bound, ori- ented graph without loops: M p := (V p , E p ) , (3) Vp where - connections between autonomous systems that receive the announcement E of the prefix p, and p - autonomous systems that have routes to the prefix p. Fig.2. An example of exporting announcements of IP-prefixes in the process of inter- action of autonomous systems. The vertices of the graph on the inbound edges receive prefix announcements, and on the outbound edges they relay them to other vertices (whether or not relayed, de- pending on the routing policy or channel status). One of the vertices has only the out- bound edges, and it's origin AS. Other vertices must have inbound vertices (because they accept announcements) and may have output ribbons (if they relay announce- ments). The combination (ensemble) of all graphs is the set of all routes to all prefixes. It forms a graph: G := (V , E ) : V = V p ; E =  E p p p . This graph that can be interpreted as an Internet network represented at the global routing level. 298 p p Let's delve into the routing process. If the prefix j is a subset of the prefix i , p j ⊂ pi that is , it does not mean that they necessarily have the same origin, in general the prefix has origin regardless of the nesting. Example: a portion of a large ISP’s prefix may be assigned to one of the ISP's clients and authorized to be announced by a p client to other ASs. On the other hand, if there is no announcement for a prefix j in a certain autonomous system, it does not mean that it is unreachable through it: the p presence at some node of the announcement i also means the possibility of routing pj to (this is a unilateral statement): p j ⊂ pi  m( p j ) ⊂ m( pi ) . (4) A prime example of this is an edge network device connected to the network with its single network interface. Its routing table can only contain a route to the general prefix 0.0.0.0/0 (see Figure 1), which is also called the default route. It should be noted that the prefix 0.0.0.0/0 is only announced in BGP for participants who use equipment that cannot handle the full view - a complete routing table (usually end users - small ASs that are not transitors) . 4 A Formal Description of the Route Selection Process The task of finding the best route is complicated. To solve this, the network topol- ogy, communication bandwidths, average message length should be known. This is a non-linear programming problem for which no algorithms exist, even with polynomi- al complexity. Only heuristic algorithms are known, which allow us to obtain only an approximate solution to the optimization problem. Therefore, the TCP/IP stack has adopted the so-called one-step approach to optimizing the packet route (next-hop routing) - each router and destination node only have to choose one step forward of packet transmission. A one-step approach means a distributed computation of the task of route selection, and this is an advantage that underlies the virtually endless scala- bility of the Internet. The first step in choosing a delivery destination is choosing a prefix. The most specific prefix must be selected in the routing table, taking into account (1): p(a) = {min( p j ) : a ∈ p ⊂ A, 0 < j ≤ A } j . (5) As explained in the previous paragraph, the subject of route selection in global routing is the graph node, that is, the autonomous system. The prerequisite for starting the selection process is that there is more than one route to the same prefix p. One of the available routes can be defined as the path between two nodes of graph (2), where the initial node is the autonomous decision system and the final node is the autono- mous system, which is the origin for the prefix p. Omitting specific local route attrib- utes and administratively set routing rules, the only one factor that is taken into ac- 299 count is the network topology image “captured” at the time of the routing decision. The common criterion for choosing a path is path length (best path) - the number of transit nodes between the start and end nodes. Given (2) and (3), the route to the pre- π ( p) : fix will be this route v π v ( p ) = {min ( mv ( p )) : π ∈ M p , v ∈ V p } v , (6) where v stands for outbound node in which desicion is made. Therefore, there are two components of the routing process - prefix calculation (5) and path selection (6). A formal description is provided for them. 5 Relationships Between Global Routing Objects and Different Types of IP Addresses To the date, there are two types of IP address that are different in bit rate on the Inter- net. The traditional IP address consisted of 32 bits. This limited the number of possi- ble addresses to 232 = 4294967296. The growth of the Internet, despite the introduc- tion of economical allocation of address space, led to a lack of addresses. A modern IP address, commonly referred to as an IPv6 address (unlike the traditional IPv4 ad- dress), is 128 bits long. This difference affects the operation of the link layer and net- work layer according to the OSI model. However, the principles of CIDR and routing have generally not been changed by IPv6 implementation. From a CIDR perspective, the difference IPv6 from IPv4 is this: − the IPv6 address space consists of 2128 addresses; − the maximum length of the network mask is 128 bits instead of 32; − every network address in IPv6 address space is included in 128 IPv6 prefix- es, incapsulated in each other. Multiprotocol BGP is the supported exterior gateway protocol (EGP) for IPv6. Multiprotocol BGP extensions for IPv6 supports many of the same features and func- tionality as IPv4 BGP. IPv6 enhancements to multiprotocol BGP include support for an IPv6 address family and network layer reachability information and next hop (the next router in the path to the destination) attributes that use IPv6 addresses [9]. So the BGP-4 protocol for IPv4 and IPv6 is no different - BGP speakers use the same protocol messages, path attributes, route selection criteria for both the IPv4 address space and IPv6 address space. Therefore, the description of the global routing model proposed in the previous section, including expressions (1), (4) - (6), is the same regardless of the type of IP prefixes. 6 Conclusion An important step towards assessing the risk posed by attacks on global routing is to predict the impact of the attack, namely to assess the scale of the attack (distribu- 300 tion routes, impact area, number of "damaged" routes). The routing process is inextri- cably linked to the choice of direction, so it is obvious that the routing attack affects direction change. When investigating routing interventions that result in an unauthor- ized change of direction, we must have adequate routing model. In this paper we offer to provide formal description for objects, relations and processes of the Internet rout- ing. Firstly we postulate the Internet as a set of IP addresses, grouped to address pre- fixes in routing tables. Prefixes are announced by autonomous systems with BGP protocol. Prefixes are aggregated and incapsulated one to another using subnet mask (1) and this affects the accessibility of IP addresses in prefix. The set of routes to each prefix can be represented as directional graph (3) and combination of all that graphs built as routes to all announced prefixes can be used as representation of the Internet at global routing level. The TCP/IP stack uses the so-called one-step approach to optimizing next-hop routing - each router and terminal node only have to choose one step (next step) for packet transmission. The first step in choosing a delivery destination is choosing a prefix. The most specific prefix must be selected in the routing table (5). The actor in the process of choosing a route in global routing is a graph node, that is, an autono- mous system. The prerequisite for starting the selection process is that there is more than one route to the same p prefix. One of the available routes can be defined as the path between two graph nodes. The general criterion for choosing a path is its length, that is, the number of transit nodes between the start and end nodes (6). All formula- tions are equally applicable to both IPv4 and IPv6 types of prefixes. Therefore, formal descriptions of global routing objects - autonomous systems, IP prefixes, BGP announcements, and relationships between them - are offered. A for- mal description of the two components of the routing process is identified and provid- ed: the calculation of the IP prefix, and path selection. These steps are an important step towards modeling cyberattacks on global Internet routing such as route hijacks in order to formulate the risk management task for a particular node as a task for search- ing its most effective communications topology. References 1. G. Huston, G. Michaelson et al.: Resource Public Key Infrastructure (RPKI) Validation Reconsidered. https://tools.ietf.org/html/rfc8360, last accessed on 209/10/29. 2. Zubok, V.: Metric Approach to Risk Evaluation of Cyberattacks on Global Routing: Se- lected Papers of the XVIII International Scientific and Practical Conference "Information Technologies and Security" (ITS 2018). Vol-2318 urn:nbn:de:0074-2318-4. 3. Zubok V., Mokhor V.: Exploring the relations between topology and security risk of cybernetic attacks on global Internet routing]. // Modelyuvannya ta informaciyni technologii, vol. 85, pages 23-26. 4. Risk Management – Vocabulary (ISO Guide 73:2009, IDТ) : DSTU ISO Guide 73:2013. – [Valid since 2014–07–01] . – Kyiv : Minekonomrozvytku Ukrainy, 2014. 301 5. IPv4 and IPv6 AS Core: Visualizing IPv4 and IPv6 Internet Topology at a Macroscopic Scale : http://www.caida.org/research/topology/as_core_network/, last accessed on 2019/10/29. 6. Pavlos Sermpezis, Vasileios Kotronis et al.: ARTEMIS: Neutralizing BGP Hijacking within a Minute : https://arxiv.org/abs/1801.01085 , last accessed on 2019/06/25. 7. Mokhor, V., Zubok, V.: Interconnection of the Internet nodes using methods of the complex networks theory. Кyiv : «Prometheus», 2017. – 175 pages. 8. Fuller, V., Li, T.: Classless Inter-domain Routing (CIDR): The Internet Address Assign- ment and Aggregation Plan : https://tools.ietf.org/html/rfc4632, last accessed on 2019/10/01. 9. Marques, P., Dupont, F.: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing : https://tools.ietf.org/html/rfc2545 , last accessed on 2019/10/12.