<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Technological Learning, Innovation and Development</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1057/s41599-022-01151-2</article-id>
      <title-group>
        <article-title>Shareholder Structure of Major Technology Companies: A Graph Analytics Study during COVID-19 and Beyond</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julio C. Esquivel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ixent Galpin</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar M. Granados</string-name>
        </contrib>
        <aff>Universidad de Bogota Jorge Tadeo Lozano, Bogota, Colombia</aff>
      </contrib-group>
      <pub-date>
        <year>2008</year>
      </pub-date>
      <volume>12</volume>
      <issue>2020</issue>
      <fpage>240</fpage>
      <lpage>258</lpage>
      <abstract>
        <p>During financial crises or other unexpected events, investors often seek to include lower-risk assets in their portfolios. Some assets are more sensitive than others to such phenomena. In the equity markets, adjustments tend to be made to the shareholdings of companies that are associated with a higher level of uncertainty. In this work, we explore the evolution of the shareholder structure of various well-known companies in the technology sector during the COVID-19 pandemic and beyond. We model, as graphs, shareholder ownership data for twenty US-listed companies between 2020 and 2022. We use freely available tools to explore the bipartite interactions and generate a wide range of topologies that help identify how shareholding structures evolved during the pandemic. In addition, we study the role that some nodes play in the network topology and the process of change that is observed. Our findings include that (1) most investors reduced the amount invested in technology stocks during the pandemic, and these investments tended to bounce back in the post-pandemic era; (2) The Vanguard Group, Inc., is the most influential investor in the network; (3) Apple has the highest market capitalization of all technology stocks for all quarters in this study, whereas Microsoft Corp has a significantly lower market capitalization but a significantly higher number of investors; and (4) while investors in Apple and Microsoft tend to be from London and New York, companies such as Oracle have investors from a variety of locations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Figure: share price panels, including (d) Google (GOOGL) share price and (e) Microsoft (MSFT) share price; an earlier panel caption ends "price (including the latest split)".]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials</title>
      <sec id="sec-2-1">
        <title>2.1. Dataset</title>
        <p>June 2022. In addition to the bipartite relationships between investors and companies, the data
set integrates each investor's geographic location (city and country of origin) and the investor type
and subtype assigned by Refinitiv. Furthermore, the data set records the category of each technology
company, such as e-commerce, semiconductors, cloud, software, and IT services, among
others.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Processing</title>
        <p>Data pre-processing is carried out using Python with data manipulation libraries such as Pandas
and NumPy. The pre-processing step consists of identifying and correcting or replacing faulty
records in the data set. In this way, 77,542 refined records are obtained corresponding to the
participation in the twenty companies by the 8,730 distinct investors in the data set (see Table
2). Subsequently, the data in tabular format for all twenty companies are imported into Neo4j
using the Cypher language, which provides functionality to convert the data to the graph data
model described in Section 3.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Graph Databases</title>
        <p>from biomedical data to financial fraud detection [15]. The graph data model is deemed
well-suited for domains that require the representation of complex relationships between entities, as
relationships are first-class citizens in a graph Database Management System (DBMS). This is
in contrast to the well-established relational model, in which relationships between entities are
implemented using foreign keys, and often computationally expensive joins are required to link
data from different entities [16].</p>
        <p>Neo4j is a well-known example of a DBMS that implements the graph data model. It
is implemented in Java and developed by Neo Technology. It purportedly supports ACID
transactions, does not enforce a schema, and offers high availability [16]. Neo4j stores
information as a directed graph of vertices and the connections between
them [17]. An undirected graph can also be supported by ignoring the direction of an edge.
Each edge has exactly one type, and nodes may have zero, one, or more labels. The declarative
Cypher language, which is inspired by SQL and SPARQL [18], is used to query or manipulate
data in Neo4j. Cypher uses pattern matching to select
data from the graph. That is, it allows users to perform queries by specifying sub-graphs to
search for within the overall graph [17]. The graph's topological
features can be expressed in Cypher solely using ASCII symbols.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Graph Data Model</title>
      <sec id="sec-3-1">
        <title>3.1. Methods</title>
        <p>[Figure residue: graph data model diagram; recoverable labels: Category, IN_CATEGORY, Company.]</p>
        <p>with the projection result. Additionally, if we project the original graph onto the network of
all the investors, the meaningful information may be overwhelmed by the high link density
[22]. Another issue is that the weighting of edges is a critical problem in constructing a
bipartite network projection. To address it, scholars have proposed several methods that
use weighted edges [23, 24], a relevant factor in our dataset. However, the goal of this paper
is to study the shareholders' dynamics in each company. Thus, we included other data to build
the communities, e.g., geographical data such as city and country.</p>
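        <p>The link-density concern can be illustrated with a minimal sketch of a one-mode projection, assuming a simple co-investment rule in which two investors are linked when they hold at least one company in common and the edge weight counts the shared companies. The holdings below are hypothetical and the helper name project_onto_investors is an assumption made for illustration; this is not the weighting scheme of [23, 24].</p>
        <preformat>
```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy holdings (investor, company); not taken from the data set.
holdings = [
    ("inv_a", "AAPL"), ("inv_b", "AAPL"), ("inv_c", "AAPL"),
    ("inv_a", "MSFT"), ("inv_b", "MSFT"),
    ("inv_d", "ORCL"),
]


def project_onto_investors(pairs):
    """Return {frozenset({i, j}): number of companies both investors hold}."""
    by_company = defaultdict(set)
    for investor, company in pairs:
        by_company[company].add(investor)
    shared = defaultdict(int)
    for investors in by_company.values():
        # Every pair of co-investors in a company gains one unit of weight.
        for i, j in combinations(sorted(investors), 2):
            shared[frozenset((i, j))] += 1
    return dict(shared)


projection = project_onto_investors(holdings)
n = len({investor for investor, _ in holdings})
# Density: realized investor pairs divided by all possible investor pairs.
density = len(projection) / (n * (n - 1) / 2)
print(projection[frozenset(("inv_a", "inv_b"))], density)
```
        </preformat>
        <p>Even in this toy case, one company with three investors already produces three projected edges; a company with thousands of investors would link every pair of them, which is why the projected investor network becomes dense.</p>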
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Model</title>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Graph Exploration</title>
        <p>As mentioned previously, Cypher, a declarative query language, is used to query Neo4j databases.
The Neo4j Python driver may also be used to pose queries and integrate result sets from within
Python programs, for example to calculate the number of nodes for each label in the data set.
The following Cypher query counts the investors of each company and orders the companies by that count:</p>
        <preformat>MATCH (inves:Investor)-[:INVESTS_IN]-&gt;(comp:Company)
RETURN comp.name AS Company, count(inves) AS Count
ORDER BY Count DESC</preformat>
        <p>The results show that the five largest technology companies (GAFAM or Big Five) have the
largest number of shareholders (Table 1). Microsoft (MSFT) has the most investors, with
7,065, followed by Apple (AAPL) with 6,710, Amazon (AMZN) with 6,618, Alphabet
(GOOGL) with 6,119, and Meta (META, formerly Facebook) with 6,016. The others have between</p>
        <sec id="sec-3-3-1">
          <title>IN_CATEGORY</title>
        </sec>
        <sec id="sec-3-3-2">
          <title>DOMICILED_INCITY_IN</title>
          <p>[Figure residue: panel (b) "Number of relationships per type"; recoverable axis label: LOCATED_IN.]</p>
          <p>community detection algorithms are executed on a projection of the graph data model, which
can be understood as a materialization of a view (or sub-graph) of the overall graph.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>4.1. Graph projection</title>
        <preformat>CALL gds.graph.project(
  'InvestorAndCompany',
  ['Investor', 'Company'],
  {
    INVESTS_IN: {
      properties: 'Shareholding_avg',
      orientation: 'NATURAL'
    }
  }
)
YIELD
  graphName AS graph,
  relationshipProjection AS INVESTS_INProjection,
  nodeCount AS nodes,
  relationshipCount AS rels</preformat>
        <p>The projection results for the different adjusted periods, and the average participation for
all periods of this study, are presented in Table 3.</p>
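        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Projection of Investor to Company, including the shareholding over time.</p>
          </caption>
          <table>
            <thead>
              <tr><th rowspan="2">Investor</th><th rowspan="2">Company</th><th colspan="5">Shareholding (USD Million)</th></tr>
              <tr><th>Q1 2020</th><th>Q2 2021</th><th>Q4 2021</th><th>Q2 2022</th><th>Avg (all quarters)</th></tr>
            </thead>
            <tbody>
              <tr><td>The Vanguard Group, Inc.</td><td>Apple Inc</td><td>1,346.91</td><td>1,264.94</td><td>1,261.26</td><td>1,270.00</td><td>1,277.06</td></tr>
              <tr><td>Ellison (Lawrence Joseph)</td><td>Oracle Corp</td><td>1,138.73</td><td>1,138.73</td><td>1,138.73</td><td>1,145.73</td><td>1,139.43</td></tr>
              <tr><td>Berkshire Hathaway Inc.</td><td>Apple Inc</td><td>980.62</td><td>887.14</td><td>887.14</td><td>890.92</td><td>912.31</td></tr>
              <tr><td>BlackRock Institutional Trust Company, N.A.</td><td>Apple Inc</td><td>749.42</td><td>671.03</td><td>675.69</td><td>676.87</td><td>701.29</td></tr>
              <tr><td>State Street Global Advisors (US)</td><td>Apple Inc</td><td>722.24</td><td>622.58</td><td>633.12</td><td>613.85</td><td>647.05</td></tr>
              <tr><td>The Vanguard Group, Inc.</td><td>Microsoft Corp</td><td>640.17</td><td>610.97</td><td>615.95</td><td>621.60</td><td>620.37</td></tr>
              <tr><td>Fidelity Management &amp; Research Company LLC</td><td>Apple Inc</td><td>359.06</td><td>338.32</td><td>338.59</td><td>338.49</td><td>343.72</td></tr>
              <tr><td>BlackRock Institutional Trust Company, N.A.</td><td>Microsoft Corp</td><td>347.00</td><td>325.59</td><td>333.86</td><td>335.63</td><td>338.50</td></tr>
              <tr><td>State Street Global Advisors (US)</td><td>Microsoft Corp</td><td>314.77</td><td>294.82</td><td>302.54</td><td>300.10</td><td>302.20</td></tr>
              <tr><td>Geode Capital Management, L.L.C.</td><td>Apple Inc</td><td>256.71</td><td>254.16</td><td>264.35</td><td>272.08</td><td>257.66</td></tr>
              <tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
            </tbody>
          </table>
        </table-wrap>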
        <p>In Neo4j, graphs are always directed, but the connections of a graph can be traversed in
either direction. Traversal in the direction stipulated in the data model is called natural orientation
(in this case, from the Investor nodes to the Company nodes), whereas traversal in the opposite
direction is called reverse orientation (in this case,
from the Company nodes to the Investor nodes).</p>
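        <p>The effect of the two orientations can be sketched in a few lines of plain Python. The edge list and the helper out_degree are hypothetical and only mimic what the orientation setting does to the direction of each stored relationship; they are not part of the GDS library.</p>
        <preformat>
```python
from collections import Counter

# Hypothetical edges in the stored direction, from Investor to Company.
edges = [("vanguard", "AAPL"), ("vanguard", "MSFT"), ("berkshire", "AAPL")]


def out_degree(directed_edges, orientation="NATURAL"):
    """Count outgoing edges per node under the chosen orientation."""
    if orientation == "REVERSE":
        # Reverse orientation swaps source and target of every edge.
        directed_edges = [(target, source) for source, target in directed_edges]
    return Counter(source for source, _ in directed_edges)


natural = out_degree(edges, "NATURAL")  # counts companies per investor
reverse = out_degree(edges, "REVERSE")  # counts investors per company
print(natural["vanguard"], reverse["AAPL"])
```
        </preformat>
        <p>Under natural orientation the outgoing count describes investors; under reverse orientation the same count describes companies.</p>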
      </sec>
      <sec id="sec-3-5">
        <title>4.2. Degree Centrality</title>
        <p>Degree centrality is a metric that reflects the importance of nodes. It measures the
number of incoming and outgoing relationships of a node, taking into account the direction of
the relationships in the graph projection. The higher the degree of a node, the higher its degree
centrality, which implies that the entity represented by the node has greater importance [25].
In the following subsections, the degree centrality algorithm is used to identify the most
important nodes within the networks.</p>
        <p>4.2.1. Investor Degree Centrality</p>
        <p>We compute the unweighted and weighted degree centrality for nodes labeled Investor in
the graph projection InvestorAndCompany. The unweighted degree centrality counts the
number of node connections without taking the weights into account. The query is shown
below:</p>
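        <p>Independently of the Cypher query, the difference between the two variants can be illustrated in plain Python. The sketch below is not the GDS call used in the study; it applies the two definitions to three rows of Table 3 (average shareholding), and the function name degree_centrality is an assumption made for illustration.</p>
        <preformat>
```python
from collections import defaultdict

# Three rows from Table 3: (investor, company, average shareholding).
rows = [
    ("The Vanguard Group, Inc.", "Apple Inc", 1277.06),
    ("The Vanguard Group, Inc.", "Microsoft Corp", 620.37),
    ("Berkshire Hathaway Inc.", "Apple Inc", 912.31),
]


def degree_centrality(edges, weighted=False):
    """Per investor: number of edges (unweighted) or sum of edge weights."""
    scores = defaultdict(float)
    for investor, _company, weight in edges:
        scores[investor] += weight if weighted else 1.0
    return dict(scores)


unweighted = degree_centrality(rows)
weighted = degree_centrality(rows, weighted=True)
print(unweighted["The Vanguard Group, Inc."])
print(round(weighted["The Vanguard Group, Inc."], 2))
```
        </preformat>
        <p>On this three-row excerpt, the unweighted score counts the companies an investor holds, while the weighted score adds up the shareholdings on those same edges.</p>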
        <p>The following query shows the distribution of the degree centrality values obtained:</p>
        <preformat>MATCH (i:Investor)
RETURN count(i.degree_centrality) AS Count,
  avg(i.degree_centrality) AS Ave,
  percentileDisc(i.degree_centrality, 0.25) AS `25%`,
  percentileDisc(i.degree_centrality, 0.50) AS `50%`,
  percentileDisc(i.degree_centrality, 0.75) AS `75%`,
  percentileDisc(i.degree_centrality, 0.90) AS `90%`,
  percentileDisc(i.degree_centrality, 0.95) AS `95%`,
  percentileDisc(i.degree_centrality, 0.99) AS `99%`,
  percentileDisc(i.degree_centrality, 0.999) AS `99.9%`,
  percentileDisc(i.degree_centrality, 1) AS `100%`</preformat>
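        <p>A discrete percentile returns a value that actually occurs in the data rather than an interpolated one. The following minimal sketch uses the nearest-rank definition, which is one common way to compute such a percentile; the function name percentile_disc and the degree list are hypothetical, and Neo4j's exact rounding rule may differ from the one assumed here.</p>
        <preformat>
```python
import math


def percentile_disc(values, p):
    """Nearest-rank discrete percentile; the result is always an observed value."""
    if not values:
        raise ValueError("values must be non-empty")
    if p > 1.0 or 0.0 > p:
        raise ValueError("p must lie between 0 and 1")
    ordered = sorted(values)
    # Smallest rank such that at least a fraction p of the data is at or below it.
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


# Hypothetical degree values for eight investors.
degrees = [1, 1, 2, 3, 5, 8, 13, 20]
summary = {p: percentile_disc(degrees, p) for p in (0.25, 0.50, 0.75, 1.0)}
print(summary)
```
        </preformat>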
        <p>[Table residue: query result; recoverable value: Count = 8,730.]</p>
        <p>Thus, 50% of the investors have connections with at most eight different companies.
Subsequently, to facilitate comparison, the degree centrality is normalized. A weighted
projection is then used, in which the shareholding properties of the relationships are considered. In this
case, the algorithms of the GDS library sum the weights of a node's relationships to
determine its degree centrality. The centrality indicator is calculated for each
period mentioned in Section 4.1, using the different adjusted projections. The query
that determines the centrality metric from the average participation of each investor across
the different periods is shown below:</p>
        <p>4.2.2. Company Degree Centrality</p>
        <p>[Table residue: weighted degree centrality, Q2 2021.]</p>
      </sec>
      <sec id="sec-3-6">
        <title>4.3. Community Analysis</title>
        <p>There are many large-scale complex networks in the real world whose structure is not fully
understood [26]. By including the city and the country as additional nodes in
the shareholders' network, we identify patterns that the bipartite network alone could not reveal,
because much of the information in shareholders' networks is hidden and not easy to detect by
simple observation. Although several methods exist to identify communities in a bipartite network,
we chose to exploit the richness of the data set because it allows us to detect overlapping communities
and isolated communities.</p>
        <p>Communities are groups that are densely connected among their members and sparsely
connected with the rest of the network [27, 28]. Typically, community detection groups the nodes
of a graph into clusters in such a way that nodes in the same cluster are
more closely related to each other than to nodes in other clusters [29]. We employ the Louvain method, which
allows community detection in large networks. Louvain's modularity algorithm detects clusters
by evaluating the density of node connections within a cluster, compared to how connected
they would be in an average or random sample [30]. This measure of community allocation
is known as modularity. Modularity takes values on a scale between -0.5 and 1, where -0.5
indicates a non-modular grouping, and 1 is a totally modular grouping. The algorithm first optimizes
modularity locally, assigning nodes to small communities, and subsequently aggregates these
into larger communities.</p>
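        <p>For reference, one common formulation of modularity sums, over all communities, the fraction of edges that fall inside the community minus the squared fraction of edge endpoints attached to it. The sketch below evaluates that formula for a given partition of a hypothetical toy graph (two triangles joined by one edge); it only scores a partition and does not implement the Louvain optimization itself. The function name modularity and the toy data are assumptions made for illustration.</p>
        <preformat>
```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
merged = {node: "A" for node in split}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(round(q_split, 3), q_merged)
```
        </preformat>
        <p>Separating the two triangles scores higher than placing all nodes in a single community, which scores zero; this is the kind of gain that the local optimization step searches for.</p>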
        <p>As with the degree centrality computation described in Section 4.2, we use the
InvestorAndCompany graph projection as the starting point for the Louvain community detection.
Unweighted communities are determined using the number of node connections and ignoring edge
weights, using the Cypher query presented in Figure 5a. The result defines two communities
with 4,450 and 4,300 members, respectively, and a modularity of 0.104 for the entire graph, with
the modularity of the communities in the range [-0.032, 0.104]. In the weighted case, where edges
carry the weights of the average shareholding in the graph projection, the communities lie in the range
[-0.0044, 0.130], and 96.83% of members are grouped into five communities with populations of 5,793, 1,088,
536, 530, and 526, respectively.</p>
      </sec>
      <sec id="sec-3-7">
        <title>4.4. Network Topology</title>
        <p>[Figure residue: panels (a) Unweighted and (b) Weighted. A further caption fragment, whose symbols were lost in extraction, refers to nodes in Country, the number of edges, the weighted degree centrality, and
the number of weighted Louvain communities. In both cases, we used the investors
whose weighted degree centrality indicates an investment greater than 1.66 million dollars.]</p>
        <p>The Apple network we analyzed contains, for Q1 2020, 734 investors, 207 cities in 25 countries,
and 941 edges. By Q2 2021, it contained 731 investors, 216 cities in 32 countries, and 948 edges. By Q4 2021,
it contained 715 investors, 217 cities in 30 countries, and 933 edges. By Q2 2022, there were 812 investors, 245
cities in 33 countries, and 1,058 edges. The results show that the topology and the weighted Louvain
communities evolved similarly to the stock prices. By November 2021, the financial markets
had recovered from the impact of the first COVID-19 wave, when financial asset prices fell
almost 25% in March 2020. After that, fear of global inflation started to affect asset prices,
and by December 2021, the NASDAQ index had fallen from its highest level of 16,057 points on
November 15 to 15,644 on December 27. Thereafter, technology stock prices continued to decline,
and by June 13, 2022, the NASDAQ index was at 10,798 points, a fall of roughly 30%. However, June
showed one of the best valuation performances of 2022, which would last until August 8, when
prices began to fall again. As shown in Figure 6, we detected more communities when prices
increased, i.e., the investment appetite changes the community levels. However, some long-term
investors did not change their position considerably.</p>
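        <p>The size of the index declines quoted above can be checked with simple arithmetic. The sketch below uses only the three NASDAQ levels stated in the text; the helper name pct_change is an assumption made for illustration.</p>
        <preformat>
```python
def pct_change(start, end):
    """Percentage change from start to end (negative means a decline)."""
    return (end - start) / start * 100.0


peak_nov_15 = 16057.0   # level quoted for November 15, 2021
level_dec_27 = 15644.0  # level quoted for December 27, 2021
level_jun_13 = 10798.0  # level quoted for June 13, 2022

peak_to_dec = pct_change(peak_nov_15, level_dec_27)
dec_to_jun = pct_change(level_dec_27, level_jun_13)
peak_to_jun = pct_change(peak_nov_15, level_jun_13)
print(round(peak_to_dec, 2), round(dec_to_jun, 2), round(peak_to_jun, 2))
```
        </preformat>
        <p>Measured from the December 27 level the decline to June 13, 2022 is close to 31%, and measured from the November 15 peak it is close to 33%, so the approximate 30% figure depends on the chosen starting point.</p>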
        <p>Across each quarter, two crucial groups of investors are apparent: the first community
comprises those originating in New York City, and the second, those originating in London.
The topology evolution for Microsoft (MSFT) is very similar to Apple’s case. There are two
communities of investors, one originating in New York and the other in London. The number
of nodes and their relationships is lower in the fourth quarter of 2021 compared to the first
quarter of 2020.</p>
        <p>Figure 7 presents the changes in the topology of the Oracle (ORCL) network for each period
in the study. Unlike Apple (Figure 6) and Microsoft, the number of nodes and relationships for
this company increased in Q2 2021 compared to Q1 2020, then decreased, and in the final
quarter increased again. Although New York City continues to dominate as a place of origin
market.</p>
        <p>In this paper, we examine the evolution of the shareholding data
of the largest companies in the technology sector in the United States during the
COVID-19 pandemic and beyond, employing graph analytics techniques. Graphical
representations of the bipartite interaction between shareholders and companies are presented,
which allow us to determine the evolution of the shareholder structure in different quarters
between 2020 and 2022, reflecting the influence of the changes produced by the COVID-19
pandemic and by subsequent events, given that investment decreases each quarter, as shown by
the degree centrality figures obtained.</p>
        <p>The unweighted investor degree centrality gives the number of companies in which each
investor invests. We observe that, overall, most investors reduced the amount invested in technology
stocks during the pandemic (Q1 2020 to Q4 2021) and that these investments bounced back in
the post-pandemic era (Q2 2022). The Vanguard Group, Inc., is the most
influential investor in the network, as it invests in all 20 companies and also has the highest
shareholding across all companies. On the other hand, the company degree centrality figures
show that, while Apple has the highest market capitalization of all technology stocks for all
quarters in this study, Microsoft Corp has a significantly lower market capitalization but a
substantially higher number of investors.</p>
        <p>The Louvain method yields two communities with 4,450 and 4,300 members, respectively,
and a modularity of 0.104 for the unweighted graph. For the weighted graph, we find that
96.83% of members are grouped into five communities. The network topology analyses for
both Apple and Oracle reflect the reduction in technology stock prices during the pandemic and the
post-pandemic rebound. We also observe two noteworthy communities of investors in London
and New York for Apple. In contrast, for Oracle, we find that the location of investors is more
diverse.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>