SAGA: Synthetic Audit Log GenerAtion for APT Campaigns

SAGA is a configurable framework that generates synthetic audit logs over arbitrary durations, embedding stealthy APT activities with fine-grained annotations aligned with the MITRE ATT&CK framework. It enables reproducible research for intrusion detection, TTP hunting, and APT campaign analysis.

Abstract

Advanced Persistent Threats (APTs) continue to evolve in sophistication, highlighting the need for reproducible, labeled audit log datasets that facilitate trustworthy cybersecurity research. SAGA (Synthetic Audit Log GenerAtion for APT Campaigns) is a configurable framework that produces synthetic audit logs combining benign and malicious activities across arbitrary durations. Malicious traces are annotated in detail and aligned with the MITRE ATT&CK framework. These datasets can be used to train and evaluate intrusion detection, technique-hunting, and campaign attribution models, promoting reproducibility and transparency in AI-driven threat research.

Key Contributions

  • Realism: Simulated audit traces that mirror real-world enterprise activities and attack patterns.
  • Scenario Coverage: 38 campaigns (8 known, 20 random, 10 composite) encompassing 80 ATT&CK techniques and 169 attack patterns.
  • Detailed Labeling: Event-level annotations with corresponding tactics, techniques, timestamps, and metadata.
  • Higher-Level Representation: Logs can be abstracted into graph or sequential representations suitable for ML and DL model training.
  • Flexibility & Diversity: Supports arbitrary time spans, attack density, and hybrid combinations, enabling generalization and variety.
  • Applicability: Useful for host-based intrusion detection, TTP hunting, and APT campaign attribution research.
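
To make the "Higher-Level Representation" point concrete, the following is a minimal sketch of how labeled events might be abstracted into a graph and into per-host sequences. The event fields (`ts`, `host`, `pid`, `action`, `obj`) and the sample records are assumptions made for illustration; they are not SAGA's actual log schema.

```python
from collections import defaultdict

# Hypothetical event records; field names and values are illustrative only.
events = [
    {"ts": 1.0, "host": "host-a", "pid": 4120, "action": "file_write", "obj": "C:/tmp/a.txt"},
    {"ts": 2.0, "host": "host-a", "pid": 4120, "action": "process_create", "obj": "proc:4388"},
    {"ts": 3.0, "host": "host-a", "pid": 4388, "action": "net_connect", "obj": "10.0.0.5:443"},
    {"ts": 2.5, "host": "host-b", "pid": 900, "action": "file_read", "obj": "C:/tmp/b.txt"},
]


def to_graph(events):
    """Map each subject process to its outgoing (action, target, ts) edges."""
    graph = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        subject = f'{e["host"]}/proc:{e["pid"]}'
        target = f'{e["host"]}/{e["obj"]}'
        graph[subject].append((e["action"], target, e["ts"]))
    return dict(graph)


def to_sequences(events):
    """Map each host to its time-ordered list of action tokens."""
    sequences = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        sequences[e["host"]].append(e["action"])
    return dict(sequences)


graph = to_graph(events)
sequences = to_sequences(events)
```

In this sketch a `process_create` event names the child as `proc:<pid>`, so the child's node key matches the edge target and the graph stays connected across process boundaries. A graph-based model could consume `graph`, while a sequence model could consume `sequences`.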

Method Overview

The SAGA framework generates synthetic audit logs that emulate both benign enterprise operations and advanced persistent threat (APT) activities. Its design emphasizes configurability, reproducibility, and realism, allowing users to simulate complex multi-host attack scenarios over arbitrary time spans.

  1. Scenario Configuration: Users can define attack settings including participating hosts, campaign type, time duration, and event density to establish the simulation environment.
  2. Event Simulation: Both benign and malicious events are simulated based on process, file, and network interactions that align with real-world host behavior.
  3. Audit Log Synthesis: Events are synthesized and merged across hosts, forming a coherent sequence of system activities that maintains temporal and causal consistency.
  4. Labeling and Validation: Each event is annotated with corresponding MITRE ATT&CK tactics and techniques, along with contextual metadata such as timestamps and process identifiers.
  5. Export and Visualization: The generated audit logs can be exported for analysis, visualization, or integration into benchmark datasets for security research.
Figure 1: Workflow of the SAGA synthetic audit log generation process
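
The five steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SAGA implementation: `ScenarioConfig`, `Event`, `generate`, `export_jsonl`, all field names, and the sample object names are hypothetical. Only the two technique IDs come from the case study on this page.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ScenarioConfig:
    # Step 1: hypothetical configuration fields, not SAGA's real schema.
    hosts: List[str]
    campaign: str
    duration_s: int       # simulated time span in seconds
    benign_events: int    # benign events generated per host
    seed: int = 0         # fixed seed so a scenario can be regenerated


@dataclass
class Event:
    ts: float
    host: str
    pid: int
    action: str
    obj: str
    technique: Optional[str] = None   # ATT&CK technique ID; None for benign events
    tactic: Optional[str] = None


def simulate_benign(cfg, rng):
    # Step 2 (benign side): random process, file, and network actions.
    actions = ["process_create", "file_read", "file_write", "net_connect"]
    return [
        Event(rng.uniform(0, cfg.duration_s), host, rng.randint(1000, 9999),
              rng.choice(actions), f"object-{rng.randint(0, 99)}")
        for host in cfg.hosts
        for _ in range(cfg.benign_events)
    ]


def simulate_malicious(cfg, rng):
    # Step 2 (malicious side) and step 4: labeled attack steps on the first host.
    # Object names are made up; the technique IDs are those of the case study.
    steps = [
        ("file_write", "attachment.rar", "T1566.001", "Initial Access"),
        ("process_create", "payload.cmd", "T1204.002", "Execution"),
    ]
    # Sorting the sampled times keeps the attack steps in causal order.
    times = sorted(rng.uniform(0, cfg.duration_s) for _ in steps)
    pid = rng.randint(1000, 9999)
    return [
        Event(t, cfg.hosts[0], pid, action, obj, technique, tactic)
        for t, (action, obj, technique, tactic) in zip(times, steps)
    ]


def generate(cfg):
    # Step 3: merge events from all hosts into one time-ordered log.
    rng = random.Random(cfg.seed)
    events = simulate_benign(cfg, rng) + simulate_malicious(cfg, rng)
    return sorted(events, key=lambda e: e.ts)


def export_jsonl(events):
    # Step 5: one JSON object per line.
    return "\n".join(json.dumps(asdict(e)) for e in events)


cfg = ScenarioConfig(hosts=["host-a", "host-b"], campaign="demo",
                     duration_s=3600, benign_events=5, seed=7)
log = generate(cfg)
```

Seeding a private `random.Random` instance is one simple way to make a scenario reproducible: calling `generate(cfg)` again with the same configuration yields an identical log. Causal consistency is reduced here to ordering the attack steps in time; a realistic generator would also have to track process ancestry and file dependencies.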

Case Study: APT28

Using APT28 as an example, SAGA simulates a spear-phishing campaign (T1566.001) delivering a malicious attachment that exploits CVE-2023-38831 in WinRAR versions below 6.23, leading to user execution (T1204.002), persistence establishment, and lateral movement. The attack progression aligns with the Mandiant adversary lifecycle, covering the Initial Compromise and Establish Foothold stages.

Figure 2: APT28 scenario aligned with ATT&CK
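
As a rough illustration of how the steps in this case study might be recorded as annotations, the sketch below lists them in order with their ATT&CK tactic and, where this page states one, the technique ID. The `Step` type and helper functions are hypothetical. Technique IDs for persistence and lateral movement are not given above, so they are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    order: int
    description: str
    tactic: str                # ATT&CK tactic name
    technique: Optional[str]   # None where no technique ID is stated on this page


APT28_STEPS = [
    Step(1, "Spear-phishing email with a malicious attachment", "Initial Access", "T1566.001"),
    Step(2, "User opens the malicious file", "Execution", "T1204.002"),
    Step(3, "Persistence establishment", "Persistence", None),
    Step(4, "Lateral movement", "Lateral Movement", None),
]


def techniques_in_order(steps: List[Step]) -> List[str]:
    """Return the known technique IDs in attack order."""
    return [s.technique for s in sorted(steps, key=lambda s: s.order) if s.technique]


def tactics_in_order(steps: List[Step]) -> List[str]:
    """Return the tactic of each step in attack order."""
    return [s.tactic for s in sorted(steps, key=lambda s: s.order)]


known_techniques = techniques_in_order(APT28_STEPS)
tactic_chain = tactics_in_order(APT28_STEPS)
```

A record of this kind could serve as ground truth when checking whether a detector recovers the attack steps in the right order.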

Citation

BibTeX
@ARTICLE{11281529,
  author={Yi-Ting Huang and Ying-Ren Guo and Yu-Sheng Yang and Guo-Wei Wong and Yu-Zih Jheng and
  Yeali Sun and Jessemyn Modini and Timothy Lynar and Meng Chang Chen},
  journal={IEEE Transactions on Dependable and Secure Computing},
  title={{SAGA: Synthetic Audit Log Generation for APT Campaigns}},
  year={2025},
  number={01},
  ISSN={1941-0018},
  pages={1-16},
  doi={10.1109/TDSC.2025.3640696},
  url={https://doi.ieeecomputersociety.org/10.1109/TDSC.2025.3640696},
  publisher={IEEE Computer Society},
  address={Los Alamitos, CA, USA},
  month=dec
}