<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MG-GNN: Enhancing GNNs for Anomaly Detection via Minority Class Sample Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ronghui Guo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minghui Zou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sai Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Intelligence and Computing, Tianjin University</institution>
          ,
          <addr-line>Tianjin, 300350</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Anomaly detection distinguishes anomalies from normal instances. In an anomaly graph, both anomalies and normal instances are represented as nodes, with their relationships denoted by edges. However, in graph anomaly detection, anomalous nodes are typically far fewer than normal nodes. Because of this class imbalance, existing Graph Neural Networks (GNNs) tend to overlook anomalous (minority-class) node samples, resulting in suboptimal performance. To address this, we propose MG-GNN, a method that generates minority-class samples for the GNN in the hidden space, thereby improving classification performance for anomalous nodes. Experiments demonstrate the effectiveness of our method in solving this problem.</p>
      </abstract>
      <kwd-group>
        <kwd>Graph anomaly detection</kwd>
        <kwd>Class imbalance</kwd>
        <kwd>Graph neural networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Typically, the graph anomaly detection (GAD) task is treated as a semi-supervised binary node
classification problem (normal vs. anomalous). However, in an anomaly graph, anomalies are
far fewer than normal nodes. Generally, efforts to adapt GNNs to
class-imbalanced graphs can be broadly categorized into two types [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: data-level and algorithm-level
methods. Data-level methods typically attempt to balance class distribution by pre-processing
the training samples using oversampling or undersampling techniques [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Algorithm-level
methods consider misclassification costs to focus more on minority classes or to ignore majority
classes, thereby mitigating the impact of class imbalance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>However, recent GAD methods struggle to adapt to this extreme class imbalance, leading to
poor classification performance for anomalous nodes (minority class). Table 1 summarizes the
distribution of the two classes of nodes in the YelpChi [4] and Amazon [5] datasets, as well as
the test accuracy of a recent GNN for these two classes. It is evident that anomalies constitute
only a small portion of the total nodes, and the prediction accuracy for anomalies is significantly
lower than that for normals.</p>
      <p>In this poster, we propose MG-GNN, a method that addresses this problem by generating minority-class
samples for the GNN in the hidden space, mitigating the negative impact of class imbalance.
Specifically, we first use a GNN to map the node features and structural information into a hidden
space. Then, based on the hidden representations of the minority class, we generate a large
number of minority-class nodes to achieve a relatively balanced class distribution before performing
classification. Experiments show that our method effectively handles the class-imbalance issue.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Problem Definition</title>
        <p>Given an anomaly graph G containing both normal and anomalous nodes, the objective is to
learn a classifier f(·) based on the graph G and a set of partially labeled nodes V_train. The
classifier aims to predict the labels Ŷ_test of the unlabeled nodes V_test, where 1 represents anomalies
and 0 represents normal nodes. The task can be formalized as:</p>
        <p>f(G, V_train) → Ŷ_test (1)</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model Overview</title>
        <p>Our model, MG-GNN, consists of three main components. First, a GNN encoder transforms the
node features and structural information into a hidden space. Next, based on the representations
of the minority class in the hidden space, a large number of minority-class nodes are generated,
so that the normal and anomalous classes are relatively balanced. Finally, a classifier performs
classification under these balanced conditions.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. GNN Encoder</title>
        <p>To encode the anomaly graph, we utilize BWGNN [6] as the backbone network due to its
low-pass and band-pass filtering characteristics. It is noteworthy that we do not use GCN [7] as the encoder
here, because GCN is based on the homophily assumption and cannot adequately handle the
heterophily of anomaly graphs. The encoder is defined as</p>
        <p>H = BWGNN(X, A) (2)</p>
        <p>where X ∈ R^{N×d} represents the raw node features, A is the adjacency matrix of the graph,
and H is the matrix of node representations in the hidden space.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Synthetic Node Generator</title>
        <p>After obtaining the node representations H, we use SMOTE [8] to generate synthetic anomalies.
The basic idea is to interpolate between samples of the target minority class and their nearest
neighbors in the hidden space. Specifically, let h_v ∈ H represent the representation of an
anomalous node v. First, the nearest node h_u ∈ H of node v is found based on Euclidean
distance:</p>
        <p>u = argmin_u ‖h_v − h_u‖, h_u ∈ H (3)</p>
        <p>where, unlike [9, 10], we do not require node u to belong to the anomaly class, as recent
research [11] has shown that this strategy can better expand the decision space of anomalies.
Then, we generate a new minority-class node h_v′ through linear interpolation:</p>
        <p>h_v′ = λh_v + (1 − λ)h_u (4)</p>
        <p>where λ ∈ [0, 1] is sampled from a Beta distribution. The synthesized node v′ is labeled as an
anomalous node. In this way, we can obtain a large number of synthesized anomalous nodes.
Additionally, we can select more nearest neighbors to generate more synthetic anomalous nodes.</p>
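<p>The generation step above can be sketched as follows: a minimal pure-Python illustration of SMOTE-style interpolation in the hidden space, with function and variable names of our own choosing.</p>

```python
import math
import random

def synthesize_anomaly(H, v, rng, alpha=2.0, beta=2.0):
    """SMOTE-style interpolation in hidden space (sketch; names are ours).

    H: list of hidden representations (lists of floats); v: index of an
    anomalous node. Following the text, the nearest neighbor may belong
    to either class. Returns the synthetic representation and the index
    of the neighbor used.
    """
    h_v = H[v]
    # nearest neighbor of v over all nodes (any class), excluding v itself
    u = min((i for i in range(len(H)) if i != v),
            key=lambda i: math.dist(h_v, H[i]))
    lam = rng.betavariate(alpha, beta)  # lambda sampled from a Beta distribution
    # linear interpolation: h_new = lam * h_v + (1 - lam) * h_u
    h_new = [lam * a + (1.0 - lam) * b for a, b in zip(h_v, H[u])]
    return h_new, u  # the synthetic node is labeled anomalous
```

<p>Calling this repeatedly, or with the k nearest neighbors instead of only the first, yields as many synthetic anomalous nodes as needed to balance the classes.</p>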
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Classifier</title>
        <p>After synthesizing a large number of nodes, we stack the representations of the original nodes
with those of the synthesized anomalous nodes to obtain a more class-balanced set, denoted as H′. Finally,
we use an MLP as the classifier for the final prediction.</p>
        <p>Ŷ = softmax(MLP(H′)) (5)</p>
        <p>The loss is calculated using cross-entropy. It is important to note that the loss is computed
not only for the original nodes but also for the synthesized anomalous nodes.</p>
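<p>A minimal sketch of this balanced training objective: the original and synthesized representations are stacked into H′, and the cross-entropy is averaged over all rows, including the synthesized anomalous nodes. The tiny linear-plus-softmax classifier stands in for the MLP; all names here are ours.</p>

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def cross_entropy(H_prime, labels, W, b):
    """Mean cross-entropy over ALL rows of H_prime, i.e. the original
    nodes stacked with the synthesized anomalous nodes.
    W: K x d weight rows (one per class), b: K biases."""
    total = 0.0
    for h, y in zip(H_prime, labels):
        logits = [sum(w_i * x for w_i, x in zip(w_row, h)) + b_k
                  for w_row, b_k in zip(W, b)]
        p = softmax(logits)
        total += -math.log(max(p[y], 1e-12))
    return total / len(H_prime)
```

<p>In use, H_prime = H_original + H_synthetic and labels = y_original + [1] * len(H_synthetic), so the synthesized nodes contribute to the loss exactly as described above.</p>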
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>We employ three evaluation metrics that are widely used under class imbalance, namely
F1-macro, AUC, and G-Mean, for fair comparison. The experimental results in Table 2 show that, after generating a
large number of anomaly-class nodes with our method, the class imbalance is better handled
and the overall performance improves. Additionally, from Table 3 and Table 1 we observe
a significant improvement in the accuracy on anomalies, while the accuracy on normal nodes
remains largely unaffected. This demonstrates that our method effectively mitigates the impact
of class imbalance.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this poster, we propose MG-GNN, a method that generates minority-class samples for
the GNN in the hidden space. Experimental results demonstrate that our method enhances the
classification performance on anomalous nodes while having minimal impact on normal nodes.
In future work, we are interested in addressing class imbalance by leveraging the
original distribution of the graph.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Project of Science and Technology Research and Development Plan of China Railway Corporation (N2023J044).</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[4] S. Rayana, L. Akoglu, Collective opinion spam detection: Bridging review networks and metadata, in: L. Cao, C. Zhang, T. Joachims, G. I. Webb, D. D. Margineantu, G. Williams (Eds.), Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, ACM, 2015, pp. 985-994. URL: https://doi.org/10.1145/2783258.2783370. doi:10.1145/2783258.2783370.</p>
      <p>[5] J. J. McAuley, J. Leskovec, From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews, in: D. Schwabe, V. A. F. Almeida, H. Glaser, R. Baeza-Yates, S. B. Moon (Eds.), 22nd International World Wide Web Conference, WWW '13, Rio de Janeiro, Brazil, May 13-17, 2013, International World Wide Web Conferences Steering Committee / ACM, 2013, pp. 897-908. URL: https://doi.org/10.1145/2488388.2488466. doi:10.1145/2488388.2488466.</p>
      <p>[6] J. Tang, J. Li, Z. Gao, J. Li, Rethinking graph neural networks for anomaly detection, in: K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvári, G. Niu, S. Sabato (Eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 21076-21089. URL: https://proceedings.mlr.press/v162/tang22b.html.</p>
      <p>[7] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017. URL: https://openreview.net/forum?id=SJU4ayYgl.</p>
      <p>[8] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (2002) 321-357. URL: https://doi.org/10.1613/jair.953. doi:10.1613/jair.953.</p>
      <p>[9] S. Shi, K. Qiao, C. Chen, J. Yang, J. Chen, B. Yan, Over-sampling strategy in feature space for graphs based class-imbalanced bot detection, in: T. Chua, C. Ngo, R. K. Lee, R. Kumar, H. W. Lauw (Eds.), Companion Proceedings of the ACM on Web Conference 2024, WWW 2024, Singapore, May 13-17, 2024, ACM, 2024, pp. 738-741. URL: https://doi.org/10.1145/3589335.3651544. doi:10.1145/3589335.3651544.</p>
      <p>[10] T. Zhao, X. Zhang, S. Wang, GraphSMOTE: Imbalanced node classification on graphs with graph neural networks, in: L. Lewin-Eytan, D. Carmel, E. Yom-Tov, E. Agichtein, E. Gabrilovich (Eds.), WSDM '21, The Fourteenth ACM International Conference on Web Search and Data Mining, Virtual Event, Israel, March 8-12, 2021, ACM, 2021, pp. 833-841. URL: https://doi.org/10.1145/3437963.3441720. doi:10.1145/3437963.3441720.</p>
      <p>[11] W. Li, C. Wang, H. Xiong, J. Lai, GraphSHA: Synthesizing harder samples for class-imbalanced node classification, in: A. K. Singh, Y. Sun, L. Akoglu, D. Gunopulos, X. Yan, R. Kumar, F. Ozcan, J. Ye (Eds.), Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023, ACM, 2023, pp. 1328-1340. URL: https://doi.org/10.1145/3580305.3599374. doi:10.1145/3580305.3599374.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <article-title>Graphsr: A data augmentation algorithm for imbalanced node classification</article-title>
          , in: B.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , J. Neville (Eds.),
          <source>Thirty-Seventh AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI</source>
          <year>2023</year>
          , Washington, DC, USA, February 7-14,
          <year>2023</year>
          , AAAI Press,
          <year>2023</year>
          , pp.
          <fpage>4954</fpage>
          -
          <lpage>4962</lpage>
          . URL: https://doi.org/10.1609/aaai.v37i4.25622. doi:10.1609/aaai.v37i4.25622.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Pick and choose: A gnn-based imbalanced learning approach for fraud detection</article-title>
          , in: J.
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Grobelnik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Najork</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Tang</surname>
          </string-name>
          , L. Zia (Eds.),
          <source>WWW '21: The Web Conference</source>
          <year>2021</year>
          , Virtual Event / Ljubljana, Slovenia,
          <source>April 19-23</source>
          ,
          <year>2021</year>
          , ACM / IW3C2,
          <year>2021</year>
          , pp.
          <fpage>3168</fpage>
          -
          <lpage>3177</lpage>
          . URL: https://doi.org/10.1145/3442381.3449989. doi:10.1145/3442381.3449989.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          , J. Wu,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beheshti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. Z.</given-names>
            <surname>Sheng</surname>
          </string-name>
          , C. C. Aggarwal,
          <article-title>DAGAD: data augmentation for graph anomaly detection</article-title>
          , in: X.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Ranka</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          <string-name>
            <surname>Thai</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Washio</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          Wu (Eds.),
          <source>IEEE International Conference on Data Mining, ICDM 2022, Orlando, FL, USA, November 28 - December 1, 2022</source>
          , IEEE, 2022, pp. 259-268. URL: https://doi.org/10.1109/ICDM54844.2022.00036. doi:10.1109/ICDM54844.2022.00036.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>