Data processing method for Gini coefficient application in
                         assessing the centralization within the BTC lightning
                         network*
                         Laura Atmanavičiūtė1,∗,†, Tomas Vanagas1,†, Justinas Grigaras1,† and Saulius Masteika1,∗,†
                         1
                             Vilnius University, Kaunas Faculty, Kaunas, Lithuania

                                             Abstract
                                             The Bitcoin Lightning Network (BLN) is a second layer blockchain solution, which emerged
                                             to address scalability issues. However, potential centralization concerns have surfaced as
                                             current distribution might indicate a trend toward centralization. The Gini coefficient, a
                                             measure of inequality, can be applied to BLN to assess its centralization by analyzing the
                                             distribution of channel capacity among nodes. This research proposes a data processing
                                             method specifically designed to utilize the Gini coefficient for evaluating centralization
                                             within the BLN. The main challenge in applying the Gini coefficient to assess BLN centralization
                                             lies in the limitations of existing research: the lack of description of how data is processed makes it
                                             difficult to replicate these studies and verify the conclusions made by other researchers. The
                                             proposed data processing method addresses the challenges associated with collecting data
                                             from both Bitcoin blockchain and Lightning Network, including data linking, storage, and
                                             variable selection. Results of the experimental research of the proposed method show that
                                             Gini coefficient increased from 0.829 to 0.930. The results are confirmed by existing research
                                             and can be used for future research to explore the BLN centralization.

                                             Keywords
                                             Bitcoin, lightning network, blockchain, data processing, Gini coefficient

                                1. Introduction
                            Since its inception, Bitcoin (BTC) has undergone a remarkable evolution: growing demand for
                         faster transactions has given rise to the Lightning Network (LN), a second layer blockchain solution [1].
                         LN acts as a separate layer (Layer 2) built on top of the BTC blockchain (Layer 1). It functions like a
                         network of channels designed for micro-payments. Instead of adding every individual payment to the
                         blockchain, two counterparts open a secure channel with each other on the BTC blockchain. This
                         channel, established through a multi-signed transaction, allows them to send and receive a
                         predetermined amount of BTC back and forth quickly and efficiently [2].
                            Originally LN was designed to address scalability issues in BTC by enabling faster and cheaper
                         transactions while maintaining decentralization. But as Bitcoin Lightning Network (BLN) grows, it
                         appears to be shifting towards a more centralized architecture [3]. While BTC was designed to be
                         decentralized, the LN has witnessed a trend towards centralization, particularly evident in the
                         concentration of power among specific nodes. These nodes, often referred to as "hubs," possess a
                         disproportionately large share of the network's total channel capacity [2]. The hubs with the largest
                         capacity in the network earn super linearly more than nodes with lower capacity. This occurs when
                         the routing algorithm prioritizes routes based on capacity rather than minimizing fees [4]. This
                         concentration of resources and influence raises questions about the integrity of the LN decentralized
                         architecture.


                         *
                           IVUS2024: Information Society and University Studies 2024, May 17, Kaunas, Lithuania
                         1,∗
                            Corresponding author
                         †
                            These authors contributed equally.
                             laura.atmanaviciute@knf.vu.lt (L. Atmanavičiūtė); tomas.vanagas@knf.vu.lt (T. Vanagas); justinas.grigaras@knf.stud.vu.lt (J.
                         Grigaras); saulius.masteika@knf.vu.lt (S. Masteika)
                             0000-0002-1770-670X (S. Masteika)
                                     ©️ 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
    One of the most common methods for determining the level of centralization in LN is the Gini
coefficient. It is used to measure inequalities in the distribution of specific resources or characteristics
within a certain population [5]. The Gini coefficient is an important measure used in many different fields,
including healthcare economics, sociology, tourism management, social determinants of disease and
BLN centralization studies. It can be computed over various parameters such as income, population
demographics, channel capacity or resource allocation, offering a quantitative method to evaluate
inequalities within a system [6, 7, 8, 9]. In healthcare, the Gini coefficient was used to analyze inequalities
in tuberculosis incidence, demonstrating its association with various social
determinants of health, including income inequality, education, and demographics [6]. Similarly, the
Gini index was utilized in tourism management research to evaluate the seasonal concentration of
tourism demand, gaining insights into the distribution of visitors across different periods [8].
Furthermore, the Gini index is used to assess the distribution of healthcare resources, which revealed
disparities in the availability of physicians, paramedics, and hospital beds [9]. Moreover, in BLN
centralization studies, the Gini coefficient is used to measure the unequal distribution between nodes,
associating it with node capacity and with channels being controlled by a few nodes [10, 11]. Overall,
these applications show the versatility of the Gini coefficient in quantifying inequality and allow
comparisons across different fields.
    Existing studies that assessed the centralization of the BLN by applying the Gini coefficient indicate
possible centralization. Research [10] presented a high average coefficient of 0.95 for node
capacity and an average coefficient of 0.76 for the number of channels per node. Another study [3]
reports an average Gini coefficient of 0.88 for channel count per node. Furthermore, the authors of [11]
presented a Gini value of 0.77 for the BLN, while research [12] highlighted the gradual growth of the Gini
coefficient from 0.82 to 0.92 between April 2019 and January 2021.
    When utilizing the Gini coefficient in assessing the centralization of the LN, it can be measured
using the following formula:
          G = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} |x_i - x_j|}{2 N^2 \bar{x}},                    (1)
where N should be used to represent a total number of nodes, x_i and x_j to represent the capacity of
nodes i and j, and \bar{x} as an average capacity across all nodes. The Gini coefficient ranges between 0 (perfect
equality) and 1 (maximum inequality), where 0 signifies everyone having an equal share of the
resource and 1 represents a scenario where one individual has everything [3, 11].
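
A minimal Python sketch of formula (1) is given below for illustration. The function name `gini` and the handling of an all-zero input are assumptions of this sketch, not part of the paper's implementation. It evaluates the double sum directly, which costs O(N^2) operations; for large node sets a sort-based formulation would be preferable.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient from the mean absolute difference, following Eq. (1)."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    mean = sum(values) / n
    if mean == 0:
        # Assumption of this sketch: all-zero capacities count as perfectly equal.
        return 0.0
    # Double sum of |x_i - x_j| over all ordered pairs (i, j).
    total_abs_diff = sum(abs(xi - xj) for xi in values for xj in values)
    return total_abs_diff / (2 * n * n * mean)


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))   # equal shares
    print(gini([0, 0, 0, 10]))  # one node holds everything
```

Note that with a finite number of nodes this formula cannot reach exactly 1: when a single node holds all capacity, the value is (N - 1) / N, which approaches 1 as N grows.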
   The problem in applying the Gini coefficient when assessing the centralization within the BLN is
data collection across Layer 1 and Layer 2, including data linking, storing, and choosing variables for
calculations. Existing research often overlooks how data is extracted from blockchain and how data
from both layers can be linked ensuring consistency. The aim of this study is to establish a data
processing method for applying the Gini coefficient to assess the centralization of the BLN. To achieve
this aim, the research will focus on the following objectives:
   1. Extract data from both Layer 1 (L1) and Layer 2 (L2) to gather relevant datasets on node
      activity and capacity.
   2. Categorize data and link datasets from different sources, ensuring consistent integration of L1
      and L2 data without distortions.
   3. Propose a data processing method and scheme for applying the Gini coefficient to assess the
      centralization within the BLN.
   4. Implement and conduct experimental calculations using the Gini coefficient and provide visual
      representation of the results.
   This research focuses on developing a data processing method specifically designed for applying
the Gini coefficient to assess centralization within the BLN. First, data will be collected from
both L1 (BTC blockchain) and L2 (LN). This includes aspects like data linking, storage and selecting
appropriate variables. Subsequently, a data processing method will be proposed. Finally, the paper
will apply the method in calculating the Gini coefficient for BLN. The results will be presented
visually.
    2. Data processing method
    Within the framework of the LN, the Gini coefficient serves as a metric for assessing centralization.
This method involves aggregating nodes under common or similar aliases to evaluate the distribution
of channel capacities and the consolidation of authority within the network. While other studies
measure Gini coefficient in BLN by considering node capacity, this paper proposes a method that goes
beyond analyzing individual nodes by grouping them into entities based on aliases. Through the
systematic analysis of the Gini coefficient across these entities, our objective is to discern the degree of
centralization inherent within the BLN ecosystem. This attempt facilitates a comprehensive
exploration of the network's structural composition and its implications for decentralization.
    To research the centralization tendencies within the BLN, data will be gathered from two primary
sources: LN Research [13] and the BTC blockchain.
    LN Research meticulously investigates the LN, a second-layer solution built on the BTC blockchain
to tackle scalability and fees [13].
    To access the BTC blockchain, two components were utilized: Bitcoin Core, the authoritative BTC
protocol implementation, which validates transactions and confirms blocks, and an Electrum node, which
indexes the BTC blockchain for fast information retrieval.
    LN gossip messages do not contain information about when channels were opened or closed. To
address this limitation, the ‘MyNodeBTC’ environment, an operating system designed to
manage different BTC node types, was utilized to synchronize a BTC full node and an Electrum node for
transaction indexing [10]. The BTC blockchain transactions dataset includes all transactions that have
occurred on the BTC network. This includes information on the date and time when BTC was locked
in a transaction, the specific amount, and the status of the channel (whether it was closed or still
open). An example of the BTC blockchain blocks database table can be found in Figure 4, and an example
of the BTC transactions database table in Figure 5.
    BLN works by exchanging messages to enable finding payment routes within the LN. These
messages have been broadcast to all network participants and have been collected by LN Research
team. This information provides the foundation for the research. The data has been imported from
both the BTC blockchain and the LN Research repository as shown in Figure 1.


Figure 1. Initial database tables
   This information exchange is specified in the gossip protocol, where nodes broadcast 3 types of
messages to the network – ‘Channel Announcement’, ‘Channel Update’ and ‘Node announcement’.
For the purposes of this study, 2 specific message types will be leveraged – channel announcements
and node announcements [13]:
   The ‘Channel Announcement’ message announces the creation of a new payment channel on the LN,
including the unique identifier (ID) of the channel and the public keys of the two nodes participating
in the channel. An example of the channel announcement database table is presented in Figure 2. The
'ShortChannelID' provides concise information about the channel. 'NodeID1' and 'NodeID2' represent
the IDs of the LN nodes that initiated the channel specified in the preceding 'ShortChannelID' field.


Figure 2. Example of channel announcement database table

    ‘Node Announcement’ message informs other nodes about a new node joining the LN, typically
containing the unique identifier (ID) of the node and optional information such as the node's operator
or its public key (depending on the implementation). As shown in Figure 3, in ‘NodeAnnouncement’
messages, there are two fields ‘NodeID’ and ‘Alias’. ‘NodeID’ field contains a unique identifier
assigned to the node within the LN, enabling identification and communication. The ‘Alias’ field
provides a user-friendly name or identifier associated with the node, allowing for easy recognition
and interaction without the need to refer to the ‘NodeID’.


Figure 3. Example of node announcement database table

    The LN does not contain information about when a channel was opened or closed. To address this
limitation, the ‘MyNodeBTC’ operating system, which is designed to
manage different BTC node types, was utilized to synchronize a Bitcoin full node and an Electrum node for
transaction indexing [10]. The BTC blockchain contains all transactions that have occurred on the BTC
network. This includes the date and time when BTC was locked in a payment channel,
the specific amount, and, if the channel was closed, the date and time when it was closed. An example of a data
fragment is shown in Figure 4. The 'BlockIndex' column denotes the height of each block in the
blockchain, starting from 0 for the Genesis Block. The 'BlockHash' serves as a unique identifier for
each block in the blockchain. The 'Timestamp' represents the UNIX timestamp of each block. The
'Time' and 'Date' fields are derived from the 'Timestamp' field and are utilized for easier data selection
in subsequent calculations.
Figure 4. Example of blockchain blocks database table
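
The derivation of the 'Date' and 'Time' fields from 'Timestamp' can be sketched as follows. The function name `split_timestamp` and the choice of UTC are assumptions of this sketch; the paper does not state which time zone was used.

```python
from datetime import datetime, timezone
from typing import Tuple


def split_timestamp(ts: int) -> Tuple[str, str]:
    """Convert a UNIX block timestamp into ('YYYY-MM-DD', 'HH:MM:SS') strings.

    Assumption: the conversion is done in UTC.
    """
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d"), dt.strftime("%H:%M:%S")


if __name__ == "__main__":
    date, time_of_day = split_timestamp(1519862400)
    print(date, time_of_day)
```

The time zone matters for blocks mined close to midnight, since a local-time conversion can place them on a different calendar day than a UTC conversion.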

   Transactions identified as spent were investigated further: each was assigned the block
height at which the spending transaction occurred.


Figure 5. Example of transaction database table

    The database table ‘Blockchain_Transactions’ comprises a relevant BTC transaction list for our
research, with all imported transactions involving the opening and closing of the LN channels. Among
the fields present, 'ShortChannelID' encapsulates crucial channel details, including the block height,
transaction index within the block, and the output index within the transaction, while
'FundingBlockIndex', 'FundingTxIndex', and 'FundingOutputIndex' are derived from the
'ShortChannelID' field, signifying the block height, transaction index, and output index associated
with channel funding. Additionally, 'FundingTxID' serves as the hash of the transaction that funded
and initiated the channel, while 'Value' represents the amount of BTC locked within the lightning
channel. Furthermore, the 'SpendingBlockIndex' column denotes the block height of the transaction
that closes the channel; channels that were still open during the research are marked with an arbitrarily
large number, '9999999999', in the 'SpendingBlockIndex' field. Lastly, 'SpendingTxID' indicates the hash of
the transaction responsible for closing the channel and is empty if the channel remains open.
    The data collected by LN Research was linked to the relevant blockchain transactions that opened
the channels. This link was facilitated by the ‘ShortChannelID’, which consists of the block height, the
transaction index within the block, and the transaction output index, and therefore points directly
to the funding output of each channel on the BTC blockchain.
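
The decomposition of a ‘ShortChannelID’ into its three components can be sketched as below. This sketch assumes the identifier is available either in the human-readable `BLOCKxTXxOUT` form or as the 8-byte integer used in gossip messages (3 bytes block height, 3 bytes transaction index, 2 bytes output index); which representation the database table stores is not stated in the paper. The names `ShortChannelId`, `parse_short_channel_id` and `decode_short_channel_id` and the sample identifier are hypothetical.

```python
from typing import NamedTuple


class ShortChannelId(NamedTuple):
    block_height: int   # corresponds to 'FundingBlockIndex'
    tx_index: int       # corresponds to 'FundingTxIndex'
    output_index: int   # corresponds to 'FundingOutputIndex'


def parse_short_channel_id(text: str) -> ShortChannelId:
    """Split a 'BLOCKxTXxOUT' string into its three integer components."""
    parts = text.split("x")
    if len(parts) != 3:
        raise ValueError(f"unexpected ShortChannelID format: {text!r}")
    block, tx, out = (int(p) for p in parts)
    return ShortChannelId(block, tx, out)


def decode_short_channel_id(value: int) -> ShortChannelId:
    """Decode the 8-byte integer form: 3 bytes height, 3 bytes tx index, 2 bytes output."""
    return ShortChannelId(
        block_height=(value >> 40) & 0xFFFFFF,
        tx_index=(value >> 16) & 0xFFFFFF,
        output_index=value & 0xFFFF,
    )


if __name__ == "__main__":
    # Hypothetical identifier, used only to illustrate the format.
    print(parse_short_channel_id("700000x1234x1"))
```

With the three components extracted, the funding transaction can be looked up on the blockchain by block height and position, which is what makes the L1 and L2 datasets joinable.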
    A method utilizing the Gini coefficient and Lorenz curve was developed to assess centralization in
the BLN. Data was imported from the BTC blockchain and LN Research, grouped by node IDs
and node aliases, and filtered at six timestamps to capture snapshots of the distribution of channel
capacity in the network. The Gini coefficient condenses centralization into a single number for a specific
moment in time, while the Lorenz curve depicts how channel capacity is distributed across the nodes in
the network at that moment. This approach enabled comprehensive analysis and trend
identification.
    The data collection and processing workflow is presented in Figure 6. This scheme illustrates the data
retrieval, storage, and calculation workflow used to calculate the Gini coefficient of weighted degree centrality
over time, grouped by ‘NodeID’. First, a BTC full node needs to be
synchronized; it is used to retrieve information from the blockchain such as the timestamps of
the transactions, their values and when those transactions were spent. Not all transactions
need to be imported from the BTC blockchain: the BLN gossip data collected by LN Research is utilized to
identify which transactions need to be imported, using the ‘ShortChannelID’ in the
‘ChannelAnnouncement’ message. At this stage the necessary data is imported into the initial database
tables.
    The research then proceeds with the analysis of the imported data. In this example the Gini coefficient is
calculated on weighted degree centrality, grouped by nodes. To achieve this, the open channels are
filtered at specific moments in time, and their capacities are summed per node and stored in
the next database table, ‘_CACHED_WeightedDegreeCentralityByNode’, for further calculations. The date
variable is iterated with a granularity of 1 month, for example 2018-03-01, 2018-04-01, etc. The
newly created database table ‘_CACHED_WeightedDegreeCentralityByNode’ stores the amount of BTC
locked in BLN channels for each public node in the network at each moment of the iteration,
which in this study is every month since the BLN inception.
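
The snapshot step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: channels are held in memory rather than in database tables, the snapshot is expressed as a block height (a date would first be mapped to a height through the blocks table), the full channel value is credited to both endpoints, and a channel closed in block h is treated as no longer open at height h. The names `Channel`, `OPEN_SENTINEL` and `weighted_degree_by_node` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple

# Value used in 'SpendingBlockIndex' for channels that are still open.
OPEN_SENTINEL = 9_999_999_999


class Channel(NamedTuple):
    node_id1: str
    node_id2: str
    value_btc: float      # 'Value' of the funding output
    funding_block: int    # 'FundingBlockIndex'
    spending_block: int   # 'SpendingBlockIndex' (OPEN_SENTINEL if open)


def weighted_degree_by_node(channels: Iterable[Channel], height: int) -> Dict[str, float]:
    """Sum the capacity of all channels open at `height`, per node."""
    totals: Dict[str, float] = defaultdict(float)
    for ch in channels:
        # A channel is open if it was funded at or before the snapshot
        # and not yet spent (assumption: closing block itself is excluded).
        if ch.funding_block <= height < ch.spending_block:
            totals[ch.node_id1] += ch.value_btc
            totals[ch.node_id2] += ch.value_btc
    return dict(totals)


if __name__ == "__main__":
    sample = [
        Channel("A", "B", 0.5, 100, OPEN_SENTINEL),
        Channel("B", "C", 0.25, 150, 200),
    ]
    print(weighted_degree_by_node(sample, 160))
```

Because open channels carry a very large sentinel in 'SpendingBlockIndex', a single range comparison covers both open and closed channels without a special case.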
    The research then takes the grouped data from the previous step and applies the Gini coefficient
formula to the data at each moment in time. The results are stored in a new database table
named ‘_CACHED_GiniByWeightedDegreeCentrality’. This table summarizes
the whole network in the form of the Gini coefficient, so that the values can be queried
whenever they are needed by the frontend or a chart creation tool to visualize the data.


Figure 6. Gini coefficient calculation workflow

   After processing data according to Figure 6, data is ready to be utilized to apply Gini coefficient
calculations to assess the level of centralization of the BLN.

    3. Experimental research of proposed method
   The research employs a static analysis approach, examining snapshots of the asset distribution
at specific points in time (timestamps). Six timestamps were utilized, starting in March 2018, when
Lightning Labs’ lnd was released as the first LN implementation, and ending in March 2023, the
most recent available data. Each timestamp and the corresponding number of nodes is presented in Table 1.
Comparing static snapshots makes it possible to track changes, analyse trends, and understand the dynamics
of a phenomenon over time, since the research is interested not only in the static distribution at any given
timestamp, but also in its dynamic flow across different time periods.

Table 1
Lightning Network nodes at specific timestamps
         Abbr.                Timestamp                       Date                Number of nodes
           T1                 1519855474                     Mar. 2018                  467
           T2                 1551391683                     Mar. 2019                 4347
           T3                 1583014153                     Mar. 2020                 4978
           T4                 1614550557                     Mar. 2021                 6893
           T5                 1646088233                     Mar. 2022                15933
           T6                 1677621623                     Mar. 2023                11889
   The Gini coefficient is a widely employed metric for evaluating inequality and plays a crucial role
in understanding the distribution of transaction activity within the LN. The Gini coefficient aids in
gauging the concentration of transactions among nodes.
   ‘Weighted degree centrality’ for a node in the BLN is calculated by summing the capacities of all its
channels. This helps to understand how important or central a node is within the network based on
the capacity of its channels. Unlike ‘Degree centrality’, which counts the number of channels a node
has, ‘weighted degree centrality’ considers the capacity of these connections [14].
   The experimental results for the Gini coefficient of BLN nodes are presented in Table 2. They
show that the Gini coefficient of the BLN has been increasing over time: it is 0.832 at timestamp T1,
grows with each subsequent timestamp, and reaches 0.954 at timestamp T6,
which indicates greater inequality. The calculations give an average coefficient of 0.918 and indicate
that a few nodes have a much higher weighted degree centrality than the others.

Table 2
Gini coefficient of Bitcoin Lightning Network nodes on weighted degree centrality aspect
          Abbr.                  Timestamp                  Date              Gini Coefficient
            T1                    1519855474             Mar. 2018                   0.832
            T2                    1551391683             Mar. 2019                   0.892
            T3                    1583014153             Mar. 2020                   0.930
            T4                    1614550557             Mar. 2021                   0.950
            T5                    1646088233             Mar. 2022                   0.951
            T6                    1677621623             Mar. 2023                   0.954
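
As a check, the reported average coefficient can be reproduced from the six values in Table 2:

```latex
\bar{G} = \frac{0.832 + 0.892 + 0.930 + 0.950 + 0.951 + 0.954}{6}
        = \frac{5.509}{6}
        \approx 0.918
```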

   The experimental research results of the proposed method are shown in Table 3. These results
present the Gini coefficient of BLN entities instead of nodes. The coefficient values are lower compared
to Table 2, but they nevertheless show an apparent centralization of BLN entities. The coefficient was
0.829 in March 2018 and grew to 0.930 in March 2023, with a slight dip from 0.921 to 0.912 in March 2022.

Table 3
Gini coefficient of Bitcoin Lightning Network entities on weighted degree centrality aspect
          Abbr.                  Timestamp                   Date              Gini Coefficient
            T1                    1519855474               Mar. 2018                  0.829
            T2                    1551391683               Mar. 2019                  0.855
            T3                    1583014153               Mar. 2020                  0.899
            T4                    1614550557               Mar. 2021                  0.921
            T5                    1646088233               Mar. 2022                  0.912
            T6                    1677621623               Mar. 2023                  0.930

   The results in Table 3 are visually represented by utilizing Lorenz curves. Figure 7 presents Lorenz
curves for the BLN entities on the weighted degree centrality aspect, captured at six specific timestamps.
The Gini coefficient is the area below the line of perfect equality (45 degrees) minus the area beneath
the Lorenz curve, with this difference divided by the total area under the line of perfect equality
[12]. Figure 7 shows how the weighted degree centrality of BLN entities moves further away from
perfect equality, so that the area corresponding to the Gini coefficient grows. This graph was created by retrieving
data from the intermediate database table ‘_CACHED_WeightedDegreeCentralityByNode’ at specific
moments in time, joining the data with the ‘Lightning_Entities’ and ‘Lightning_NodeAliases’ tables to
retrieve the entity name, and then grouping by entity and summing up the BTC amounts. In the last step,
all entities are sorted in ascending order by amount, and cumulative percentages of the
whole network are calculated at 1% granularity to obtain the Lorenz curve.
Figure 7. Lorenz curves of weighted degree centrality of BLN entities
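
The grouping and Lorenz curve steps described above can be sketched as follows. The sketch makes several assumptions that the paper does not specify: the node-to-entity mapping is passed in as a plain dictionary instead of being read from the ‘Lightning_Entities’ and ‘Lightning_NodeAliases’ tables, nodes without an entity are treated as their own entity, and the 1% grid is obtained by linear interpolation between entities. The function names `group_by_entity` and `lorenz_curve` are hypothetical.

```python
from typing import Dict, List, Tuple


def group_by_entity(node_totals: Dict[str, float],
                    node_to_entity: Dict[str, str]) -> Dict[str, float]:
    """Sum per-node BTC amounts into per-entity amounts."""
    totals: Dict[str, float] = {}
    for node_id, amount in node_totals.items():
        # Assumption: a node with no known entity forms its own entity.
        entity = node_to_entity.get(node_id, node_id)
        totals[entity] = totals.get(entity, 0.0) + amount
    return totals


def lorenz_curve(amounts: Dict[str, float], steps: int = 100) -> List[Tuple[float, float]]:
    """Return (population %, cumulative capacity %) points on an even grid."""
    values = sorted(amounts.values())  # ascending order by amount
    total = sum(values)
    if not values or total <= 0:
        raise ValueError("need at least one entity with positive capacity")
    cumulative = [0.0]
    for v in values:
        cumulative.append(cumulative[-1] + v)
    n = len(values)
    points = []
    for k in range(steps + 1):
        pos = k * n / steps           # fractional number of entities covered
        lower = int(pos)
        cum = cumulative[lower]
        if lower < n:
            # Linear interpolation inside the next entity's share.
            cum += (pos - lower) * values[lower]
        points.append((100.0 * k / steps, 100.0 * cum / total))
    return points


if __name__ == "__main__":
    entities = group_by_entity({"n1": 1.0, "n2": 2.0, "n3": 1.0},
                               {"n1": "e1", "n2": "e1"})
    for x, y in lorenz_curve(entities, steps=4):
        print(x, y)
```

The default of 100 steps corresponds to the 1% granularity mentioned in the text; a perfectly equal distribution yields points on the 45-degree line.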

   The experimental results of the proposed method agree with the results of existing research:
the Gini coefficient values indicate high inequality in the BLN. As previously analyzed, other research also measured
high values of the Gini coefficient, ranging between 0.76 and 0.95 depending on the specific
timestamps and the method used for calculations. This agreement supports the reliability of the data
processing method proposed in this paper and suggests that it can be used in future studies that assess
centralization within the BLN using the Gini coefficient.

    4. Results and conclusions
   In this paper, to assess centralization within the BLN, data was successfully extracted from
separate sources for L1 and L2. L1 data on transactions was taken from Bitcoin Core blocks and
Electrum Nodes facilitated transaction indexing. L2 data was obtained from LN research, where
specific gossip messages were used to gather relevant data.
   This study ensured consistent integration of data from both layers by focusing on specific details
within each data source. Channel-related data messages provided information on node IDs and
channel capacities from L2 which was then linked to blockchain transactions in L1 using a unique
identifier ‘ShortChannelID’. This linking process connected channel information directly to the actual
locked BTC within the channel and ensured consistent data without distortions.
   The data processing method for applying the Gini coefficient to assess the centralization within the
BLN was proposed and explained in detail. This paper contributes to research on applying the Gini
coefficient in the BLN by grouping nodes into entities based on aliases, thereby providing a broader
understanding of network distribution. The approach utilizes data from both the BTC blockchain and
LN Research, so that the capacities used for calculating the Gini coefficient reflect the BTC actually locked on-chain.
   To evaluate the quality of the proposed data processing method, experimental calculations using
the Gini coefficient were implemented as a static analysis at six specific timestamps.
The data processing method proved reliable, as the results obtained agree with already existing
research. In this paper, the Gini coefficient for entities reached 0.930 in March 2023 and, like the results
of other authors, demonstrated a clear trend of increasing inequality in the BLN over time.

    5. Discussions
   This research proposes a data processing method for applying the Gini coefficient to assess
centralization within the BLN. While Gini coefficient is a valuable measure, the proposed method
opens doors for future research to explore the BLN centralization.
   This method could be potentially adapted to incorporate alternative network centrality measures,
such as degree, betweenness, eigenvector or closeness centrality, providing a more comprehensive
picture. This study might lay the path for extending the proposed method to analyze dynamic
centralization trends – future studies could incorporate real-time data collection, tracking trends and
identifying potential transition towards centralization within the BLN.
   This research also contributes to the standardized approach to centralization assessment. The
proposed method could serve as a foundation for future work towards standardizing data collection
and processing methodologies.

    6. References
[1] Divakaruni, A., Zimmerman, P. (2022). The Lightning Network: Turning Bitcoin into money.
     Finance Research Letters 52.
[2] Martinazzi, S., Flori, A. (2020). The evolving topology of the Lightning Network: Centralization,
     efficiency, robustness, synchronization, and anonymity. PloS One, 15(1), e0225966–e0225966.
     doi:10.1371/journal.pone.0225966
[3] Lin, J.-H., Primicerio, K., Squartini, T., Decker, C., & Tessone, C. J. (2020). Lightning network: a
     second path towards centralisation of the Bitcoin economy. New Journal of Physics, 22(8), 83022.
     doi:10.1088/1367-2630/aba062
[4] Carotti, A., Sguanci, C., & Sidiropoulos, A. (2023). Rational Economic Behaviours in the Bitcoin
     Lightning Network. doi:10.48550/arxiv.2312.16496
[5] Crucitti, P., Latora, V., & Porta, S. (2006). Centrality Measures in Spatial Networks of Urban
     Streets. doi:10.48550/arxiv.physics/0504163
[6] De Castro, D. B., de Seixas Maciel, E. M. G., Sadahiro, M., Pinto, R. C., de Albuquerque, B. C., &
     Braga, J. U. (2018). Tuberculosis incidence inequalities and its social determinants in Manaus
     from 2007 to 2016. International Journal for Equity in Health, 17(1), 187–187. doi:10.1186/s12939-
     018-0900-3
[7] Wong, S. K. (2010). Crime clearance rates in Canadian municipalities: A test of Donald Black's
     theory of law. International Journal of Law, Crime and Justice, 38(1), 17–36.
     https://doi.org/10.1016/j.ijlcj.2009.11.002
[8] Fernández-Morales, A., Cisneros-Martínez, J. D., & McCabe, S. (2016). Seasonal concentration of
     tourism demand: Decomposition analysis and marketing implications. Tourism Management
     (1982), 56, 172–190. https://doi.org/10.1016/j.tourman.2016.04.004
[9] Darzi Ramandi, S., Niakan, L., Aboutorabi, M., Javan Noghabi, J., Khammarnia, M., & Sadeghi, A.
     (2016). Trend of Inequality in the Distribution of Health Care Resources in Iran. Galen, 5(3).
     https://doi.org/10.31661/gmj.v5i3.618
[10] Masteika, S., Rebždys, E., Driaunys, K., Šapkauskienė, A., Mačerinskienė, A., & Krampas, E.
     (2023). Bitcoin double-spending risk and countermeasures at physical retail locations.
     International Journal of Information Management, 102727. doi:10.1016/j.ijinfomgt.2023.102727
[11] Mahdizadeh, M. S., Bahrak, B., & Sayad Haghighi, M. (2023). Decentralizing the lightning
     network: a score-based recommendation strategy for the autopilot system. Applied Network
     Science, 8(1), 73–33. doi:10.1007/s41109-023-00602-2
[12] Zabka, P., Foerster, K.-T., Decker, C., & Schmid, S. (2022). Short Paper: A Centrality Analysis of
     the Lightning Network. In Financial Cryptography and Data Security (pp. 374–385). Cham:
     Springer International Publishing. doi:10.1007/978-3-031-18283-9_18
[13] Decker, C. (2023). Lightning Network Gossip. https://github.com/lnresearch/topology
[14] Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks:
     Generalizing degree and shortest paths. Social Networks, 32(3), 245–251.
     doi:10.1016/j.socnet.2010.03.006