<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Asset Subsets Identification via Investment Universe Complex Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mert Arda Asar</string-name>
          <email>maasar@gsu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Günce Keziban Orman</string-name>
          <email>korman@gsu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Complex networks</institution>
          ,
          <addr-line>Financial Networks, Dynamic Network Modeling, Portfolio Allocation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Galatasaray University</institution>
          ,
          <addr-line>Ortaköy, Çırağan St, 34349 Beşiktaş, İstanbul</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study seeks to identify an optimal asset subset by ensuring diversity through company description and price behavior over time. We propose the Investment Universe Complex Network (InUCoN) framework, which models the investment universe as a complex network. InUCoN consists of three components: dynamic network generation, snapshot aggregation, and universe filtering. Experiments on S&amp;P stocks demonstrate that InUCoN reduced risk by selecting a more independent stock set, with our proposed portfolio yielding a 32% higher return compared to the unfiltered universe.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the complex and ever-evolving landscape of financial markets, investors face the significant challenge
of selecting an optimal subset of assets from a vast investment universe. Filtering this universe into
stocks from different sectors and with different behaviors is essential for distributing risk and diversifying the portfolio.
The primary difficulty lies in filtering assets not only by recent performance and correlations
but also in a way that preserves their inherent, often subtle, relationships. These relationships can be intricate
and dynamic, reflecting the deeply interconnected nature of financial markets. Overlooking these
complexities can lead to suboptimal portfolio construction, exposing investors to unforeseen risks.
Traditional methods often depend heavily on time-series correlations or simply cluster stocks by the
sectors in which they operate. These approaches neglect the dynamic, emergent relationships among
assets because they fail to consider the investment universe as a complex, interdependent system. While
there have been studies modeling financial markets with complex networks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ] and efforts to
identify the most appropriate metrics for constructing financial market networks [4], none have yet
applied this modeling approach specifically to the challenge of stock allocation.
      </p>
      <p>This study aims to identify an optimal subset of assets by ensuring diversity through company
description and price behavior over time. While recent price behaviors reflect short-term market trends
and sentiment, enduring characteristics provide a foundation for long-term decisions. We introduce the
Investment Universe Complex Network (InUCoN) framework, integrating these aspects to address previous
limitations. Our major contributions are: (i) developing a framework using complex networks for
dynamic and static modeling to diversify assets; (ii) generating a hybrid similarity metric combining both
aspects; (iii) optimizing this metric using the network’s structural consistency; and (iv) experimentally
validating the effectiveness of the filtered asset subsets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <p>The main strategy of InUCoN is to treat the investment universe as a complex system, model it
with a complex network, and analyze it using this model. The framework has three components: (i)
dynamic network generation; (ii) snapshot aggregation; and (iii) investment universe filtering. In
dynamic network generation, assets are modeled through a series of snapshots, each representing
the system’s state at a specific time interval. This approach captures the evolving nature of asset
relationships over time. While the assets are represented as fixed nodes, their relationships (links)
change dynamically, reflecting the fluid nature of financial markets where correlations can shift rapidly
due to economic factors. Constructing these relationships carefully is essential, as they directly influence
the accuracy and effectiveness of the analysis. We define a hybrid asset similarity
<inline-formula><tex-math>\mathrm{sim}_h(a_i, a_j) = w_1 \cdot \mathrm{sim}_d(a_i, a_j) + w_2 \cdot \mathrm{sim}_t(a_i, a_j)</tex-math></inline-formula>
that combines two terms. The stock description similarity <inline-formula><tex-math>\mathrm{sim}_d</tex-math></inline-formula> is the
cosine similarity of the embedding vectors that the DistilBERT [5] model generates from the asset
descriptions. The time-series similarity <inline-formula><tex-math>\mathrm{sim}_t</tex-math></inline-formula> is computed with one of four candidate metrics: dynamic time warping,
distance correlation, Euclidean distance, or Pearson correlation. During the time-series similarity
analysis, we apply an overlapping window approach. This ensures that the relationship between two
consecutive snapshots is not entirely lost, providing a smoother transition between time intervals
and capturing ongoing trends. The weights <inline-formula><tex-math>w_1</tex-math></inline-formula> and <inline-formula><tex-math>w_2</tex-math></inline-formula> are experimentally optimized to achieve the
highest average network consistency over all snapshots, based on a “structural consistency” index derived from
first-order matrix perturbation [6]. Pairs of nodes whose <inline-formula><tex-math>\mathrm{sim}_h</tex-math></inline-formula> value exceeds a given threshold are linked
in the snapshots. The threshold is determined experimentally.</p>
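      <p>The snapshot construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function names, the toy data, the equal weights, the window length and step, and the link threshold of 0.3 are all hypothetical, and Pearson correlation stands in for the time-series term (it is only one of the four metrics considered). Description embeddings are assumed to be precomputed, for example with DistilBERT.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def overlapping_windows(n_obs, window, step):
    """Return (start, end) index pairs; step smaller than window gives overlap."""
    return [(s, s + window) for s in range(0, n_obs - window + 1, step)]


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between the rows of an embedding matrix."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def hybrid_snapshot(returns_window, emb, w1, w2, threshold):
    """Adjacency matrix of one snapshot built from the hybrid similarity.

    returns_window: (T, N) array of returns for N assets inside one window.
    emb:            (N, D) array of precomputed description embeddings.
    """
    sim_d = cosine_similarity_matrix(emb)               # description term
    sim_t = np.corrcoef(returns_window, rowvar=False)   # Pearson time-series term
    sim_h = w1 * sim_d + w2 * sim_t                      # hybrid similarity
    adj = (sim_h > threshold).astype(int)               # link pairs above threshold
    np.fill_diagonal(adj, 0)                            # drop self-loops
    return adj


# Toy data standing in for real returns and embeddings (illustrative only).
rng = np.random.default_rng(0)
returns = rng.normal(size=(120, 5))
embeddings = rng.normal(size=(5, 8))

# Assumed window length 40 and step 20, so consecutive windows share half their data.
windows = overlapping_windows(n_obs=120, window=40, step=20)
snapshots = [
    hybrid_snapshot(returns[s:e], embeddings, w1=0.5, w2=0.5, threshold=0.3)
    for s, e in windows
]
```
]]></preformat>
      <p>Each entry of <monospace>snapshots</monospace> is one network snapshot over the same fixed node set. In the framework itself, the weights would be chosen by maximizing the average structural consistency of these snapshots rather than fixed in advance.</p>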
      <p>In the snapshot aggregation phase, a single market network was formed to represent all
snapshots. Its nodes are the same as the nodes of the snapshots. Links were established between
nodes that shared a community in at least one snapshot, with weights indicating the number of snapshots
in which they did so. Communities in each snapshot were identified with the Infomap algorithm. This community-based
aggregation approach allows us to capture both stable and evolving relationships between assets over
time, providing a more comprehensive view of market dynamics. In the investment universe filtering
phase, we aimed to select the most diversified stock subset using network centrality measures. We
employed a two-step process based on closeness centrality and PageRank scores, designed to identify
stocks that are both independent and influential within the network. First, we selected nodes with
lower-than-average closeness centrality, identifying stocks relatively disconnected from overall market
trends. From this subset, we then chose stocks with PageRank scores above a calculated threshold. A
node with a high PageRank score and low closeness centrality suggests that the stock is influential
within the network but not necessarily closely connected to other nodes. To determine the PageRank
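threshold, we used the following equation:
<disp-formula id="eq1"><label>(1)</label><tex-math>\mathit{threshold}_{PR} = \mu_{PR} + 0.5 \cdot \sigma_{PR}</tex-math></disp-formula>
where <inline-formula><tex-math>\mu_{PR}</tex-math></inline-formula> is the mean PageRank score of the selected nodes (those with below-average closeness
centrality), and <inline-formula><tex-math>\sigma_{PR}</tex-math></inline-formula> is the standard deviation of these scores. This approach aims to construct a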
portfolio that maximizes diversification while ensuring each selected asset contributes significantly to
overall network dynamics.</p>
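      <p>The aggregation and filtering steps can be illustrated with a small, dependency-free sketch. Several choices here are assumptions rather than details taken from the paper: the community partitions are supplied directly (standing in for Infomap output), closeness is computed on unweighted shortest paths, PageRank uses the edge weights with a damping factor of 0.85, and the population standard deviation is used in the threshold. All function names and the toy partitions are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, deque
from itertools import combinations
from statistics import mean, pstdev


def aggregate_snapshots(partitions):
    """Edge weight = number of snapshots in which two nodes share a community."""
    weights = Counter()
    for communities in partitions:
        for community in communities:
            for u, v in combinations(sorted(community), 2):
                weights[(u, v)] += 1
    return weights


def adjacency(nodes, weights):
    """Symmetric weighted adjacency dict for the aggregated market network."""
    adj = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    return adj


def closeness(adj, source):
    """Unweighted closeness via BFS, scaled by the reachable fraction of nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reached, total = len(dist) - 1, sum(dist.values())
    if total == 0:
        return 0.0
    return (reached / total) * (reached / (len(adj) - 1))


def pagerank(adj, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; isolated nodes spread rank evenly."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    strength = {u: sum(adj[u].values()) for u in adj}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in adj if strength[u] == 0)
        rank = {
            u: (1 - damping) / n + damping * (
                sum(rank[v] * w / strength[v] for v, w in adj[u].items())
                + dangling / n
            )
            for u in adj
        }
    return rank


def filter_universe(adj):
    """Step 1: below-average closeness. Step 2: PageRank above mean + 0.5 * std."""
    close = {u: closeness(adj, u) for u in adj}
    avg = mean(close.values())
    candidates = [u for u in adj if close[u] < avg]
    if not candidates:
        return []
    pr = pagerank(adj)
    scores = [pr[u] for u in candidates]
    threshold = mean(scores) + 0.5 * pstdev(scores)
    return [u for u in candidates if pr[u] > threshold]


# Toy partitions for three snapshots (illustrative only).
nodes = ["A", "B", "C", "D", "E", "F"]
partitions = [
    [{"A", "B", "C"}, {"D", "E"}, {"F"}],
    [{"A", "B"}, {"C", "D", "E"}, {"F"}],
    [{"A", "B", "C", "D"}, {"E", "F"}],
]
weights = aggregate_snapshots(partitions)
adj = adjacency(nodes, weights)
selected = filter_universe(adj)
```
]]></preformat>
      <p>Whether closeness should treat edge weights as distances is a design choice the text leaves open; the unweighted variant is used here only to keep the sketch short.</p>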
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <p>For our experiments, we selected the 200 S&amp;P 500 stocks with the highest average trading volume,
covering 01/06/2021 to 01/06/2024. To evaluate InUCoN’s effectiveness, we applied the Markowitz
mean-variance optimization algorithm [7] for portfolio allocation and calculated the average portfolio
return based on the allocated weights. We constructed a dynamic network using a six-month period,
with the remaining time used to measure performance, following a buy-and-hold strategy with no
rebalancing. We compared InUCoN with the baseline (the unfiltered universe) and with portfolios
using single similarity metrics instead of our proposed hybrid approach. As shown in Figure 1, the
historical cumulative returns indicate that InUCoN effectively reduced risk and improved portfolio
performance by selecting a more independent set of stocks.</p>
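      <p>The evaluation protocol (allocate once, then hold without rebalancing) can be sketched as below. This is an illustration under assumptions: the closed-form global minimum-variance portfolio is used as a simple stand-in for a full Markowitz mean-variance optimization, it allows short positions, and the function names and toy numbers are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def min_variance_weights(returns):
    """Closed-form global minimum-variance weights (unconstrained, may short).

    returns: (T, N) array of asset returns from the estimation period.
    """
    cov = np.cov(returns, rowvar=False)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)   # proportional to inverse-covariance times ones
    return raw / raw.sum()             # normalize so the weights sum to 1


def buy_and_hold_cumulative_return(prices, weights):
    """Cumulative return path of a portfolio bought at t = 0 and never rebalanced.

    prices: (T, N) array of prices over the evaluation period.
    """
    growth = prices / prices[0]        # value of one unit of capital per asset
    return growth @ weights - 1.0


# Toy evaluation-period prices for two assets (illustrative only).
prices = np.array([[100.0, 50.0], [110.0, 50.0], [120.0, 45.0]])
path = buy_and_hold_cumulative_return(prices, np.array([0.5, 0.5]))

# Toy estimation-period returns for three assets (illustrative only).
rng = np.random.default_rng(1)
w = min_variance_weights(rng.normal(scale=0.01, size=(60, 3)))
```
]]></preformat>
      <p>In a setting like the one described, the weights would be estimated on the filtered stock subset and the cumulative return path compared against the same procedure applied to the unfiltered universe.</p>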
      <p>Notably, the text-only model showed strong performance, outperforming several time-series-based
approaches. This suggests that company descriptions contain valuable information for portfolio
diversification that may not be fully captured by price movements alone. By incorporating static textual
information, we may have been able to avoid some of the noise inherent in price data during the filtering
process. By leveraging both quantitative price data and qualitative company information, InUCoN
appears to capture a more holistic view of each asset’s role in the investment universe. This indicates
that while both types of data provide valuable insights, integrating them allows for a more
comprehensive understanding of a stock’s behavior and its relationships within the market network. This
comprehensive approach leads to more effective portfolio diversification and improved risk-adjusted
returns, as evidenced by the superior performance of the hybrid method in our results.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this study, we introduced the Investment Universe Complex Network (InUCoN) framework as a novel
method to refine asset selection in dynamic financial markets. By leveraging textual and time series
similarities within a complex network model, we aimed to enhance portfolio allocation algorithms.
We demonstrated the utility of the structural consistency metric in optimizing network generation.
InUCoN improved portfolio outcomes compared to both the unfiltered stock universe and
portfolios built with the single similarity metrics. When the time-series similarity was based on Pearson correlation,
the InUCoN portfolio yielded a 32% higher return than the unfiltered investment universe over the evaluation period.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
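      <p>This work is supported by the Galatasaray University Research Fund (BAP) within the scope of project
number FBA-2024-1240, titled “Bağlantı Tahmini Yöntemleri ve Sınıf Dengesizliği Sorununa Çözüm
Yaklaşımları” (Link Prediction Methods and Solution Approaches to the Class Imbalance Problem).</p>
      <ref-list>
        <ref id="ref4">
          <mixed-citation>[4] <string-name><given-names>S. C.</given-names> <surname>Ugwu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Miasnikof</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Lawryshyn</surname></string-name>, <article-title>Distance correlation market graph: The case of S&amp;P500 stocks</article-title>, <source>Mathematics</source> <volume>11</volume> (<year>2023</year>) <fpage>3832</fpage>. doi:10.3390/math11183832.</mixed-citation>
        </ref>
        <ref id="ref5">
          <mixed-citation>[5] <string-name><given-names>V.</given-names> <surname>Sanh</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Debut</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Chaumond</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Wolf</surname></string-name>, <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>, <source>arXiv.org</source> (<year>2019</year>).</mixed-citation>
        </ref>
        <ref id="ref6">
          <mixed-citation>[6] <string-name><given-names>L.</given-names> <surname>Lü</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Pan</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>Y.-C.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>H. E.</given-names> <surname>Stanley</surname></string-name>, <article-title>Toward link predictability of complex networks</article-title>, <source>Proceedings of the National Academy of Sciences of the United States of America</source> <volume>112</volume> (<year>2015</year>) <fpage>2325</fpage>-<lpage>2330</lpage>. doi:10.1073/pnas.1424644112.</mixed-citation>
        </ref>
        <ref id="ref7">
          <mixed-citation>[7] <string-name><given-names>H.</given-names> <surname>Markowitz</surname></string-name>, <article-title>Portfolio selection</article-title>, <source>The Journal of Finance</source> <volume>7</volume> (<year>1952</year>) <fpage>77</fpage>-<lpage>91</lpage>.</mixed-citation>
        </ref>
      </ref-list>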
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chevallier</surname>
          </string-name>
          ,
          <article-title>Complex network analysis of global stock market co-movement during the covid-19 pandemic based on intraday open-high-low-close data</article-title>
          ,
          <source>Financial Innovation</source>
          <volume>10</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi:10.1186/s40854-023-00548-5.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jothimani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kavaklioğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Başar</surname>
          </string-name>
          ,
          <article-title>Financial networks: A study of the toronto stock exchange</article-title>
          (
          <year>2018</year>
          )
          <fpage>4684</fpage>
          -
          <lpage>4691</lpage>
          . doi:10.1109/BigData.2018.8621969.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Durrani</surname>
          </string-name>
          ,
          <article-title>A network analysis of the pakistan stock exchange</article-title>
          ,
          <source>Webology</source>
          <volume>18</volume>
          (
          <year>2021</year>
          )
          <fpage>5744</fpage>
          -
          <lpage>5763</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>