Optimization of Processing the Large Data Stream in Web-interface

Nataliya V. Papulovskaya (1), Artem A. Rapoport (2)

(1) Ural Federal University named after the first President of Russia B.N.Yeltsin, Yekaterinburg, Russia; pani28@yandex.ru
(2) Uberall GmbH, Berlin, Germany

Abstract. The paper describes the problems that arise when a Web-interface frequently receives a large amount of data from a Web-server, which leads to insufficient performance of the interface. A comparative analysis of methods for updating the data in the Web-interface was made, and the optimal method for updating the data in a real-time Web-application was chosen. The paper also provides an example of optimizing data processing by means of data buffering, together with an implementation of this example in modern JavaScript.

Keywords: data processing, data buffering, big data, Web-interface, Web-programming, WebSocket

1 Introduction

Modern Web-application development faces multiple performance issues. These include a slow first page load caused by large amounts of data, low responsiveness or even freezing of the interface when a lot of data is updated, and others. Web-applications can store more than 900 GB of data on the server side (or in a database), for example, the data of students from all around the world [1]. Therefore, one of the most significant problems in developing the Web-frontend part of a Web-application is the low speed of handling the large data flow sent from the server to the Web-client (most often a Web-browser).
Visual data in a Web-client can be updated in different ways: updating (and re-rendering) the data only after a specific user action; polling, in which the Web-client sends requests to the server at a fixed time interval; using a network messaging protocol other than HTTP (HyperText Transfer Protocol) that allows the Web-server and the Web-browser to communicate in real time (e.g. WebSocket); and others [2]. This research shows the advantages of implementing data updates through WebSocket compared to other implementations of real-time data Web-applications, and the efficiency of additionally optimizing data handling by means of buffering.

2 Comparison of data updating implementations

Visual data in a Web-client can be updated by ordinary HTTP-requests triggered by user actions. For instance, this method is used in the Google search engine: by pressing the search button, the user sends an HTTP-request to the server, and after a short period of time the Web-client receives the response and updates the information on the page. Listing 1 contains an example of a simple HTTP-request written in JavaScript.

Listing 1. HTTP-request example, written in JavaScript

```javascript
import axios from 'axios'; // HTTP-requests library
import ClientStore from './ClientStore'; // store of Web-client

/**
 * Function that requests the data from server and receives an answer
 * @function getInfoFromServer
 * @return {void}
 */
function getInfoFromServer() {
  axios.get('/api/info')
    // Got 'positive' response with the data from the server
    .then((response) => {
      // Rewrite the data in the Web-client store
      ClientStore.rewriteContent(response.data);
    })
    // Got 'negative' response from the server (an error)
    .catch((error) => {
      // Inform the user about the error; error.response is absent
      // when the request never reached the server
      console.error(
        'Error while receiving response = ',
        error.response ? error.response.data : error.message
      );
    });
}
```

Visual data in a Web-client can also be updated with the help of HTTP-polling. In this case, the same simple HTTP data request is used, but it is repeated at a time interval, so that the Web-client can show real-time information (Listing 2 contains an example written in JavaScript). This method is easy to implement and can be used when developing a simple Web-interface of a network device to observe relatively small amounts of data, for example, to monitor a single table of parameters of a network device in real time.

Listing 2. HTTP-polling example, written in JavaScript

```javascript
import axios from 'axios'; // HTTP-requests library
import ClientStore from './ClientStore'; // store of Web-client

let refreshId = null; // id of the polling interval

/**
 * @function setInfoPolling
 * @return {void}
 */
function setInfoPolling() {
  refreshId = setInterval(() => {
    axios.get('/api/info')
      // Got 'positive' response with the data from the server
      .then((response) => {
        // Rewrite the data in the Web-client store
        ClientStore.rewriteContent(response.data);
      })
      // Got 'negative' response from the server
      .catch((error) => {
        // Inform the user about the error
        console.error(
          'Error while receiving response = ',
          error.response ? error.response.data : error.message
        );
      });
  }, 1000);
}

/**
 * @function stopInfoPolling
 * @return {void}
 */
function stopInfoPolling() {
  if (refreshId) {
    clearInterval(refreshId);
    refreshId = null;
  }
}
```

Finally, visual data can be updated through WebSocket. In this case, the Web-client establishes a connection with the server and subscribes to the necessary topics (themes) of data, and the server sends the data only when it is necessary, for example, when the data has changed in the database (Listing 3 contains an example WebSocket implementation written in JavaScript). Any Web-application with real-time data is a good candidate for a WebSocket implementation (as in the short example), because WebSocket is the most efficient transport in terms of both performance and server load [3], [4]. Yandex.Mail uses WebSocket to load messages (e-mails) from the server in real time (Fig. 1).

Fig. 1. WebSocket-connection on the Yandex.Mail page

Listing 3. Example of the WebSocket session handling, written in JavaScript

```javascript
const DATA_URL = 'data';
const GET_SNAPSHOT_RPC = 'getSnapshot';

/**
 * Handler that abstracts the AutobahnJS (WAMP) methods
 * @class SessionHandler
 */
class SessionHandler {
  /**
   * Private field that stores session object
   * @private
   */
  session = null;

  /**
   * @constructor
   * @param {?Session} [session=null]
   *   Client-server session. Null by default.
   */
  constructor(session = null) {
    this.setSession(session);
  }

  /**
   * Method that stores the session in the SessionHandler's instance
   * @method setSession
   * @param {?Session} [session=null]
   *   Client-server session. Null by default.
   * @return {void}
   */
  setSession = (session = null) => {
    this.session = session;
  };

  /**
   * Callback function that handles subscription messages
   * @see http://autobahn.ws/js/reference.html#subscribe Autobahn documentation
   * @callback subscriptionCallbackFn
   * @param {Array} args array with event payload
   * @param {Object} kwargs object with event payload
   * @param {Object} details object with event metadata
   * @return {void}
   */

  /**
   * Method that subscribes to data updates messages
   * @method subscribeToData
   * @param {subscriptionCallbackFn} callbackFunction
   *   function that should be executed when a message is received
   * @return {?Promise} null when no session was specified
   */
  subscribeToData = (callbackFunction) => {
    if (!this.session) {
      console.error('Tried to subscribe to device data but no session was specified in SessionHandler');
      return null;
    }
    return this.session.subscribe(DATA_URL, callbackFunction);
  };

  /**
   * Method that requests snapshot (for the cold start)
   * @method getSnapshot
   * @return {?Promise} null when no session was specified
   */
  getSnapshot = () => {
    if (!this.session) {
      console.error('Tried get snapshot but no session was specified in SessionHandler');
      return null;
    }
    return this.session.call(GET_SNAPSHOT_RPC);
  };
}
```

Comparing the options described above by several criteria (Table 1) leads to the conclusion that WebSocket is the most efficient option for handling big data, because the flow of messages between the Web-client and the server is the smallest and contains only justified messages.
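To make the difference in message counts concrete, the following is a minimal sketch under simplifying assumptions: polling sends one request per interval regardless of changes, while a push connection costs one handshake request plus one server message per data change. The function names and the traffic model are illustrative assumptions, not measurements from this work; real WebSocket traffic also includes subscription and keep-alive frames.

```javascript
// Simplified traffic model (an assumption for illustration only).

/** Number of HTTP requests sent by polling during the observation window. */
function countPollingRequests(durationMs, pollIntervalMs) {
  return Math.floor(durationMs / pollIntervalMs);
}

/** Exchanges over a push connection: one handshake plus one message per change. */
function countPushMessages(changeCount) {
  const handshakeRequests = 1;
  return handshakeRequests + changeCount;
}

const oneHourMs = 60 * 60 * 1000;
// 2-second polling interval, the example value used in Table 1
const pollingRequests = countPollingRequests(oneHourMs, 2000);
// Hypothetical case: the data changed only 3 times during that hour
const pushMessages = countPushMessages(3);

console.log({ pollingRequests, pushMessages });
```

Under this model, an hour of rarely changing data costs 1800 polling requests but only 4 exchanges over a push connection. The gap narrows as changes become more frequent, which is where the buffering discussed in Section 3 becomes relevant.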
Updating data only on user action is not suitable for an application with real-time data, and polling becomes unsuitable as the data volume grows. Moreover, with polling the Web-client keeps sending requests even when the data in the database has not changed. On the other hand, an apparent drawback of WebSocket for a developer is the multitude of non-standardized protocol "wrappers" built on top of it: the developer has to choose a protocol that is supported on both the Web-client and the server side. Although there are few out-of-the-box implementations of data transfer over such protocols, WAMP (the Web Application Messaging Protocol) [6] might be a suitable open-standard protocol that allows messaging through WebSocket to be implemented relatively simply [7].

3 Solution overview

To solve the issue of an excessive flow of messages from the server to the Web-client, for example when monitoring a huge network, the handling of each received message should involve a minimum amount of calculation. This decreases the overall message handling time and gives the single-threaded Web-client more time to execute other tasks (for example, such heavy tasks as DOM (Document Object Model) rendering). This can be done efficiently with data buffering, which works similarly to buffering in a CPU [8]. More specifically, the Web-client does not handle messages immediately after receiving them from the server, but stores them in a buffer. The buffer is not an array of messages but a single merged message object, so applying this object to the store data is simple and has to be done only once per time interval.

Assume that the Web-application being developed may hold either a large or a small amount of data in the database, as in a network monitoring application with real-time data.
Table 1. Comparison of the different ways of data updating in the Web-client

| Comparison criteria | Updates only by user actions | Polling | WebSocket |
|---|---|---|---|
| Possibility to create an application with real-time data updates | no | yes | yes |
| Large number of boilerplates (ready-made implementations) and, therefore, simplicity of development | yes | yes | no |
| Number of requests from the Web-client to the server | Depends on the frequency of the user actions | Requests are sent at an interval, e.g. every 2 seconds | There is only one request: to establish the connection to the server |
| Necessity of implementing big data flow optimization methods in the Web-client | The data flow can be large only when there is a high number of direct user requests. It should be handled on the server, so that other users do not experience lags while working with the interface | Polling is used in real-time data applications; therefore, the data should be updated quite often. On the other hand, if the server contains too much data, polling, even with data clustering implemented, might become an inappropriate decision | The server sends the data to the Web-client on its own initiative; therefore, if the server observes frequent changes of the data in the database, all the changes will be sent to the Web-client as a large flow of messages |

If the number of network nodes is small (say, from 1 to 50), buffering will not change much in terms of performance, because modern computers have enough computing resources to handle the telemetry of that many nodes in a short time (provided, of course, that not every message leads to recalculation and re-rendering of the network map graph). It follows that the influence of buffering on the user's visual experience of the Web-interface should be minimized when there are few network nodes or, more specifically, few messages from the server (when messages arrive less frequently than the buffering interval).
On the other hand, the impact of buffering on the frequency of data updates should increase when messages start to arrive from the server more frequently, in order to avoid a backlog of calculation operations and, thus, freezing of the interface. To implement this, the update timeout is restarted every time a new message arrives. In the shortest case, the update is applied as soon as one buffering interval passes without a new message; in the longest case, a forced update is performed when the total waiting time (including all timeout restarts) reaches a critical value.

For instance, assume the minimum update interval is 200 ms. If the Web-client receives only one message during these 200 ms, the data from this message is applied to the data in the Web-client's store immediately after this time interval. Otherwise, if the Web-client receives more than one message within these 200 ms, the timeout is restarted and waits for the next message, then it is restarted again, and so on. This goes on until the total waiting time becomes critical, e.g. equal to 1 second. At that point, all the data merged from the messages received during this 1 second is applied to the Web-client's data store. Thus, when the flow of messages is large, the update occurs only once per second, and when the flow is smaller, the update period varies from 200 ms to 1 second.

Listing 4 contains the implementation of the algorithm above written in JavaScript, ECMAScript 2015 standard [9]. There, the UpdateHandler class accumulates the data and applies the accumulated data to the Web-client's data store. In this implementation, the class handles messages about events that occur in a massive network; therefore, buffering had to be introduced to keep the delay of operations minimal.
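The timing rule above can be checked without real timers. The sketch below is a simplified model, not the implementation from Listing 4: it assumes that a batch starts at its first message, that a forced update fires exactly when the maximum wait expires, and that a message arriving exactly at a deadline opens the next batch. The function name `computeFlushTimes` and its parameters are hypothetical.

```javascript
/**
 * Computes the moments (in ms) at which accumulated data would be applied.
 * @param {number[]} arrivals message arrival times in ms, sorted ascending
 * @param {number} minWaitMs quiet period after the last message (200 ms in the text)
 * @param {number} maxWaitMs critical total waiting time (1 second in the text)
 * @return {number[]} times of the store updates
 */
function computeFlushTimes(arrivals, minWaitMs = 200, maxWaitMs = 1000) {
  const flushes = [];
  let batchStart = null; // arrival time of the first message in the open batch
  let deadline = null;   // moment when the open batch will be applied

  for (const t of arrivals) {
    // The open batch was applied before this message arrived
    if (batchStart !== null && t >= deadline) {
      flushes.push(deadline);
      batchStart = null;
    }
    if (batchStart === null) {
      batchStart = t;
    }
    // Restart the short timeout, but never wait past the forced update
    deadline = Math.min(t + minWaitMs, batchStart + maxWaitMs);
  }
  if (batchStart !== null) {
    flushes.push(deadline);
  }
  return flushes;
}

// A single message is applied 200 ms after it arrives
console.log(computeFlushTimes([0])); // [ 200 ]

// A steady stream (one message every 100 ms for 2 seconds) is applied once per second
const steadyStream = Array.from({ length: 21 }, (_, i) => i * 100);
console.log(computeFlushTimes(steadyStream)); // [ 1000, 2000, 2200 ]
```

The second call shows the adaptive behaviour: while messages keep arriving faster than the 200 ms quiet period, updates are forced once per second, and the last batch is applied 200 ms after the stream stops.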
The external WebSocket message handlers pass messages with data updates about events that occurred in the network to the "bufferedUpdate" method. Assuming that the constant named Constants.DEFAULT_UPDATE_TIMEOUT is equal to 200 ms, the implementation of the algorithm described above is presented in the listing.

Listing 4. The example of updates handling with buffering usage, written in JavaScript

```javascript
import * as Constants from '../Constants';
import * as eventsActions from 'actions/eventsActions';

class UpdateHandler {
  /* id of the timeout with minimum interval */
  interval = null;

  /* id of the timeout with maximum interval
   * (when the force update occurs) */
  maxInterval = null;

  /* @type {Boolean}
   * Boolean flag of the force update */
  forceApply = false;

  /* Main data store instance */
  store = null;

  /* @type {String}
   * Updating mode (can be either set
   * to updating or accumulating) */
  mode = null;

  /* @type {Object}
   * Object with accumulated updates of type
   * {events: Map} */
  accumulatedUpdates;

  constructor(store) {
    /* initializing the value for the accumulated updates */
    this.accumulatedUpdates = { events: new Map() };
    // saving the instance of the main store in the field
    this.store = store;
    // stopping updates
    clearTimeout(this.interval);
    /* initiating external timeout with the interval
       of the force update */
    this.startBatchedWaitingTime();
  }

  /* @private
   * @method applyAccumulatedUpdatesIfNeeded
   * Method that applies the data that was accumulated
   * during the interval of updating.
   * If there was no message from the server,
   * data is not needed to be applied */
  applyAccumulatedUpdatesIfNeeded = () => {
    if (this.accumulatedUpdates.events.size > 0) {
      /* stopping `external` timeout with the interval
         of the force update */
      clearTimeout(this.maxInterval);
      // stopping `internal` timeout
      clearTimeout(this.interval);
      /* initiating `external` timeout with the interval
         of the force update */
      this.startBatchedWaitingTime();
      // applying accumulated data updates to the store
      this.applyAccumulatedUpdates();
    }
  };

  startBatchedWaitingTime = () => {
    // setting force update flag to false
    this.forceApply = false;
    // setting the force update flag to true after 1 second
    this.maxInterval = setTimeout(
      this.clearBatchedWaitingTime,
      Constants.DEFAULT_UPDATE_TIMEOUT * 5
    );
  };

  clearBatchedWaitingTime = () => {
    this.forceApply = true;
  };

  bufferedUpdate(updates) {
    // stopping `internal` timeout
    clearTimeout(this.interval);
    // adding updates (accumulating)
    this.accumulateUpdates(updates);
    if (this.forceApply === true) {
      /* if the force update flag is set to true,
         updating the data */
      this.applyAccumulatedUpdatesIfNeeded();
      // exiting the current method
      return;
    }
    /* creating `internal` timeout, after which
       the data update is called */
    this.interval = setTimeout(() => {
      this.applyAccumulatedUpdatesIfNeeded();
    }, Constants.DEFAULT_UPDATE_TIMEOUT);
  }

  accumulateUpdates(updates) {
    if (!updates) {
      return;
    }
    updates.forEach((update) => {
      /* for each update, add the data
         to the accumulated updates object */
      this.accumulatedUpdates.events = UpdateHandler.appendEventMessage(
        this.accumulatedUpdates.events,
        update
      );
    });
  }

  static appendEventMessage(initialValue, newMessage) {
    /* Here, depending on the message structure,
       new data about the network events is added */
    return initialValue.set(newMessage.id, newMessage.val);
  }

  applyAccumulatedUpdates() {
    const { events } = this.accumulatedUpdates;
    // applying data to the store
    this.store.dispatch(eventsActions.updateEvents(events));
    /* initializing the value for the accumulated updates */
    this.accumulatedUpdates.events = new Map();
  }
}
```

This algorithm was successfully implemented and used in the task of creating Web-interfaces of a network map and a geographical map that may contain up to 10 thousand network devices and around 50 thousand wireless broadband links between them. In a configuration with such a big network, the Web-server can send messages faster than 1 message per 1 ms. Without buffering, the single-threaded Web-client would therefore not be able to handle and apply the data without a significant visual delay. Applying adaptive methods of data handling, on the other hand, makes it possible to increase performance while decreasing the amount of calculation for any width of the data flow.
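The benefit of accumulating updates in a single keyed object rather than in an array can be illustrated with a small sketch. It assumes, as `appendEventMessage` in Listing 4 does, that each message carries an `id` and a `val`; the function name `mergeBurst`, the message generator and the counts are hypothetical.

```javascript
/**
 * Merges a burst of messages into one Map.
 * A later message for the same id overwrites the earlier one.
 */
function mergeBurst(messages) {
  const result = new Map();
  for (const { id, val } of messages) {
    result.set(id, val);
  }
  return result;
}

// Hypothetical burst: 1000 messages within one second about only 10 distinct events
const burst = Array.from({ length: 1000 }, (_, i) => ({ id: i % 10, val: i }));
const merged = mergeBurst(burst);

console.log(burst.length);  // 1000 store updates if every message were applied at once
console.log(merged.size);   // 10 entries applied in a single store update
console.log(merged.get(0)); // 990, the latest value received for id 0
```

Because repeated updates of the same event collapse into one entry, the work done at each store update is bounded by the number of distinct events rather than by the number of messages received.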
4 Conclusion

Modern Web-programming technologies allow developers to implement Web-applications that avoid excessive calculations and optimization operations, which is especially important when the amount of data is small, and that also avoid significant calculation delays when the data flow is large. Different development tasks require different techniques, methods, and approaches to handle information and optimize the data flow. Nevertheless, using WebSocket and buffering in the Web-client is an efficient way to organize and handle data updating in a Web-application and to optimize the data flow.

References

1. AWS Case Study: Kaplan. https://aws.amazon.com/solutions/case-studies/kaplan/
2. Liping, G., Dongfang, G., Naixue, X., Changhoon, L.: CoWebDraw: a real-time collaborative graphical editing system supporting multi-clients based on HTML5. Multimedia Tools and Applications. Vol. 77, 4, 5067–5082 (2018)
3. Chto takoe Long-Polling, WebSockets, SSE i Comet. https://myrusakov.ru/long-polling-websockets-sse-and-comet.html
4. Postojannoe soedinenie mezhdu brauzerom i serverom. https://www.insight-it.ru/interactive/2012/postoyannoe-soedinenie-mezhdubrauzerom-i-serverom/
5. Kotov, A., Krasil'nikov, N.: Klasterizacija dannyh. http://yury.name/internet/02ia-seminar-note.pdf
6. WAMP - The Web Application Messaging Protocol. http://wampproto.org/
7. GitHub - WAMP in JavaScript for Browsers and NodeJS. https://github.com/crossbario/autobahn-js
8. Muller, H., Flynn, M. J.: Processor Architecture and Data Buffering. IEEE Transactions on Computers. Vol. 41, 10, 1211-1222 (1992)
9. ECMAScript 2015 Language Specification – ECMA-262 6th Edition. http://www.ecma-international.org/ecma-262/6.0/