
Open Access

ARTICLE

A Hybrid Self-Supervised Learning Framework for Advanced Persistent Threat Detection

Marwan Ali Albahar*
Department of Computing, College of Engineering and Computing in Al-Lith, Umm Al-Qura University, Makkah, Saudi Arabia
* Corresponding Author: Marwan Ali Albahar
(This article belongs to the Special Issue: Cyber Attack Detection in Cyber-Physical Systems)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.079941

Received 31 January 2026; Accepted 13 April 2026; Published online 27 April 2026

Abstract

Advanced Persistent Threats (APTs) are stealthy cyberattacks that can evade detection in system-level audit logs. Provenance graphs encode these logs as interacting entities and events, exposing causal and dependency structure that is often obscured in linear log representations. Prior provenance-based detectors typically apply anomaly detection over such graphs, yet they frequently incur high false-positive rates and produce coarse-grained alerts; moreover, approaches that depend heavily on node-specific identifiers (e.g., file paths) can learn spurious correlations, reducing robustness and limiting reliability across heterogeneous workloads. In this paper, we present Self-Training Adaptive Graph Encoder (STAGE), a lightweight, self-supervised anomaly detection framework for provenance graphs that (i) trains without attack labels and (ii) enforces leakage-free model selection and thresholding with explicit control over false-alarm rates. STAGE uses learnable degree and node-type embeddings, processed by a compact two-layer Graph Convolutional Network (GCN) with residual connections and dual pooling. A memory-augmented attention module captures global benign prototypes, improving resilience to rare-but-legitimate behaviors and suppressing false alarms. Training combines contrastive learning over augmented graph views with a one-class Support Vector Data Description (SVDD) objective that learns a compact benign hypersphere in the embedding space. At inference, STAGE fuses neural embeddings with fixed-dimensional structural graph statistics and scores them using an ensemble of classical one-class detectors. STAGE attains strong ranking quality and practical operating points on two benchmarks, the StreamSpot and Wget datasets. On StreamSpot, STAGE achieves an AUC of 0.998 while operating at 95% recall with a 0% false positive rate.
On the Wget dataset, it attains an AUC of 0.998 and an average precision (AP) of 0.998, achieving 100% recall and 96% precision at a 4% false positive rate. Overall, STAGE demonstrates strong empirical separability for benign-only provenance-based detection and provides an explicit mechanism to trade off recall against false positive rate through predefined thresholding policies.
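
The abstract does not give STAGE's exact formulation. The NumPy sketch below, which assumes embeddings have already been produced by a trained encoder, illustrates two ingredients the abstract names: Deep SVDD-style scoring against a benign hypersphere center, and a leakage-free threshold chosen only from benign validation scores so that the validation false-positive rate stays at or below a target. The function names `svdd_center`, `svdd_scores`, `svdd_loss`, and `fpr_threshold`, the center-nudging constant `eps`, and the quantile rule are illustrative assumptions, not the paper's code.

```python
import numpy as np


def svdd_center(train_emb: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Hypersphere center c: mean of benign training embeddings.

    Assumption (common Deep SVDD practice): coordinates within eps of zero
    are pushed to +/-eps to discourage a collapsed, trivial solution.
    """
    c = train_emb.mean(axis=0)
    c[(np.abs(c) < eps) & (c < 0)] = -eps
    c[(np.abs(c) < eps) & (c >= 0)] = eps
    return c


def svdd_scores(emb: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Anomaly score: squared Euclidean distance of each embedding to c."""
    return np.sum((emb - c) ** 2, axis=1)


def svdd_loss(emb: np.ndarray, c: np.ndarray) -> float:
    """One-Class Deep SVDD objective (no weight decay), evaluated on fixed
    embeddings: mean squared distance to the center."""
    return float(svdd_scores(emb, c).mean())


def fpr_threshold(benign_val_scores: np.ndarray, target_fpr: float) -> float:
    """Threshold tau such that at most target_fpr of benign validation scores
    satisfy score > tau. Only benign validation data is used, so no attack
    labels leak into threshold selection."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    s = np.sort(benign_val_scores)
    k = int(np.floor(target_fpr * len(s)))  # number of tolerated false alarms
    return float(s[len(s) - k - 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(0.0, 1.0, size=(500, 16))  # benign training embeddings
    val = rng.normal(0.0, 1.0, size=(200, 16))    # held-out benign embeddings
    attack = rng.normal(3.0, 1.0, size=(20, 16))  # synthetic "attack" embeddings
    c = svdd_center(train)
    tau = fpr_threshold(svdd_scores(val, c), target_fpr=0.05)
    print("train SVDD loss:", round(svdd_loss(train, c), 2))
    print("benign validation FPR:", np.mean(svdd_scores(val, c) > tau))
    print("attack recall:", np.mean(svdd_scores(attack, c) > tau))
```

A stricter policy such as `target_fpr=0.0` places the threshold at the largest benign validation score, giving up some recall for fewer false alarms; this is one simple way to realize the recall/false-positive trade-off described above.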

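The abstract describes fusing neural embeddings with fixed-dimensional structural statistics at inference and scoring them with an ensemble of classical one-class detectors, but it does not name the statistics or the detectors. The NumPy sketch below is a hypothetical stand-in: `structural_stats` computes an illustrative six-value summary of a directed graph, and two simple detectors (squared Mahalanobis distance and k-nearest-neighbour distance) substitute for whatever detectors STAGE actually uses. Each detector's score is standardized on benign validation data before averaging, so detectors on different scales contribute comparably. All class and function names here are assumptions.

```python
import numpy as np


def structural_stats(num_nodes: int, edges) -> np.ndarray:
    """Hypothetical fixed-length summary of one directed provenance graph.

    edges: (src, dst) integer pairs; num_nodes must be >= 1.
    Returns [log(1+|V|), log(1+|E|), density, mean out-degree,
             max out-degree, max in-degree].
    """
    e = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    out_deg = np.bincount(e[:, 0], minlength=num_nodes)
    in_deg = np.bincount(e[:, 1], minlength=num_nodes)
    density = len(e) / max(num_nodes * (num_nodes - 1), 1)
    return np.array([np.log1p(num_nodes), np.log1p(len(e)), density,
                     out_deg.mean(), out_deg.max(), in_deg.max()], dtype=float)


class MahalanobisDetector:
    """Gaussian one-class detector: squared Mahalanobis distance to the mean."""

    def fit(self, X):
        self.mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # ridge term
        self.prec = np.linalg.inv(cov)
        return self

    def score(self, X):
        d = X - self.mu
        return np.einsum("ij,jk,ik->i", d, self.prec, d)


class KNNDetector:
    """Distance to the k-th nearest benign training point."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X):
        self.ref = X
        return self

    def score(self, X):
        dist = np.linalg.norm(X[:, None, :] - self.ref[None, :, :], axis=2)
        return np.sort(dist, axis=1)[:, self.k - 1]


class OneClassEnsemble:
    """Averages detector scores after standardizing each on benign validation data."""

    def __init__(self, detectors):
        self.detectors = detectors

    def _scale(self, X):
        return (X - self.f_mu) / self.f_sd

    def fit(self, X_train, X_val):
        # Feature scaling uses benign training data only.
        self.f_mu = X_train.mean(axis=0)
        self.f_sd = X_train.std(axis=0) + 1e-8  # guards constant features
        Z_train, Z_val = self._scale(X_train), self._scale(X_val)
        for det in self.detectors:
            det.fit(Z_train)
        # Per-detector score calibration uses benign validation data only.
        S = np.stack([det.score(Z_val) for det in self.detectors])
        self.s_mu, self.s_sd = S.mean(axis=1), S.std(axis=1) + 1e-8
        return self

    def score(self, X):
        S = np.stack([det.score(self._scale(X)) for det in self.detectors])
        return ((S - self.s_mu[:, None]) / self.s_sd[:, None]).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    emb = rng.normal(size=(400, 8))  # stand-in neural embeddings
    stats = np.stack([structural_stats(5, rng.integers(0, 5, size=(6, 2)))
                      for _ in range(400)])  # stand-in graph statistics
    fused = np.hstack([emb, stats])  # fused feature vectors
    ens = OneClassEnsemble([MahalanobisDetector(), KNNDetector(k=5)])
    ens.fit(fused[:300], fused[300:])
    print("mean benign validation score:", ens.score(fused[300:]).mean())
```

Averaging standardized scores is only one fusion rule; rank averaging or taking the per-sample maximum are common alternatives, and the abstract does not state which rule STAGE uses.
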
Keywords

Provenance graphs; advanced persistent threats; benign-only anomaly detection; self-supervised learning; false positive control