SAGA Datasets

Synthetic audit logs embedding configurable APT campaigns following Mandiant’s adversary lifecycle. Includes ATT&CK-aligned labels, event-level metadata, and reproducible configurations.

Overview & Downloads

Simple Evaluation Set
Baseline dataset for benchmarking
  • Version: v1
  • 5 campaigns: curated APT playbooks
  • Full kill chain: multi-stage lifecycle
  • <1% malicious: realistic class imbalance
  • Each campaign is modeled after known threat activity (e.g., FIN7, APT28, Patchwork).
  • Per-event MITRE ATT&CK labeling and phase annotations (Initial Compromise → Foothold → …).
Campaign Preview
(full list in metadata.json)
Higaisa — 607,416 events, malicious 0.005%
{
  "campaign": "Higaisa",
  "stages": [
    "Step 1. Initial Compromise",
    "Step 2. Establishing Foothold",
    "Step 3. Maintaining Presence",
    "Step 4. Internal Reconnaissance",
    "Step 5. Internal Reconnaissance",
    "Step 6. Maintaining Presence",
    "Step 7. Maintaining Presence"
  ],
  "techniques": [
    "phishing Attachment",
    "Malicious File Execution",
    "Registry Run Keys",
    "System Information Discovery",
    "System Network Configuration Discovery",
    "Masquerade Task or Service",
    "Scheduled Task"
  ],
  "event": 607416,
  "mal_event": "0.005%"
}

Full campaign inventory (all actors, stages, per-campaign event counts) is bundled as metadata.json inside the archive.
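
The preview record above can be consumed directly with the standard library. The following is a minimal sketch, not part of the SAGA tooling: it assumes that the `stages` and `techniques` lists are index-aligned (both have seven entries here, but the page does not state this) and that `mal_event` is a percentage string. `summarize_campaign` is a hypothetical helper name.

```python
import json

# Preview record copied from above (the full inventory lives in metadata.json).
PREVIEW = """
{
  "campaign": "Higaisa",
  "stages": [
    "Step 1. Initial Compromise",
    "Step 2. Establishing Foothold",
    "Step 3. Maintaining Presence",
    "Step 4. Internal Reconnaissance",
    "Step 5. Internal Reconnaissance",
    "Step 6. Maintaining Presence",
    "Step 7. Maintaining Presence"
  ],
  "techniques": [
    "phishing Attachment",
    "Malicious File Execution",
    "Registry Run Keys",
    "System Information Discovery",
    "System Network Configuration Discovery",
    "Masquerade Task or Service",
    "Scheduled Task"
  ],
  "event": 607416,
  "mal_event": "0.005%"
}
"""


def summarize_campaign(record):
    """Pair each stage with a technique and estimate the malicious event count."""
    # Assumption: stages[i] corresponds to techniques[i].
    steps = list(zip(record["stages"], record["techniques"]))
    # Assumption: mal_event is a percentage string such as "0.005%".
    fraction = float(record["mal_event"].rstrip("%")) / 100.0
    approx_malicious = round(record["event"] * fraction)
    return steps, approx_malicious


record = json.loads(PREVIEW)
steps, approx_malicious = summarize_campaign(record)
for stage, technique in steps:
    print(f"{stage}: {technique}")
print(f"approx. malicious events: {approx_malicious}")
```

Because the published percentage is rounded, the derived count is only an estimate; per-event labels in the logs themselves are the authoritative source.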

checksum: 7a68449413c9a38b5132859b237e070c62587f95489e7cc9d7670b97548ffb43
download size: 1.71 GB
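
After downloading, the archive can be checked against the published checksum. The sketch below assumes the checksum is a SHA-256 digest (its 64 hexadecimal digits are consistent with that, but the page does not name the algorithm). The function names are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call would look like `matches_checksum("path/to/archive", "7a68449413c9a38b5132859b237e070c62587f95489e7cc9d7670b97548ffb43")`, where the path is a placeholder for wherever the download was saved.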
Complete Evaluation Set
Baseline for benchmarking
  • Version: v2
  • 38 campaigns total
  • 10 mixed: blended from 3 base campaigns
  • ~1.2M+ events: temporal & multi-host
  • 28 unique synthetic base campaigns + 10 synthetic “mixed” campaigns (3-campaign fusion).
  • Time-ordered, multi-host activity capturing complex APT campaigns.
  • This is the reference evaluation set for SAGA.
checksum: c15e8b985b4cab1d1ebf106e01ce5c14d9b730da5af786167b58778c422992b1
download size: 14.5 GB
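
To make the idea of a time-ordered "mixed" campaign concrete, the toy sketch below interleaves three already-sorted event streams by timestamp. It is only an illustration of what a 3-campaign fusion means for event ordering, not SAGA's generator. The `timestamp` and `campaign` field names and the `blend_campaigns` helper are assumptions made for this example.

```python
import heapq
from operator import itemgetter


def blend_campaigns(*streams, key="timestamp"):
    """Lazily merge time-sorted event streams into one time-ordered stream."""
    # heapq.merge assumes every input stream is already sorted by the key.
    return heapq.merge(*streams, key=itemgetter(key))


# Hypothetical toy events; real SAGA events carry a richer schema.
campaign_a = [{"timestamp": 1, "campaign": "A"}, {"timestamp": 4, "campaign": "A"}]
campaign_b = [{"timestamp": 2, "campaign": "B"}, {"timestamp": 5, "campaign": "B"}]
campaign_c = [{"timestamp": 3, "campaign": "C"}]

mixed = list(blend_campaigns(campaign_a, campaign_b, campaign_c))
print([event["campaign"] for event in mixed])
```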
Training Corpus
Large-scale synthetic audit logs for model training
  • Access: public
  • 1000 single-campaign logs
  • 100 mixed-campaign logs
  • 1h / 1d per-campaign time slices
  • Each “single” log = one full APT kill chain.
  • Each “mixed” log blends 3 distinct campaigns to emulate complex intrusions.
  • Both 1-hour burst windows and 1-day dwell-time windows are included, so you can train both alerting models (short windows) and hunting models (long windows).
checksum: For integrity verification, please refer to the Checksums.ini file.
download size: 5.1 TB

Quick Start

Load with Python
import json

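# Each line of the file holds one JSON-encoded event.
file_path = "path/to/json file"
with open(file_path, encoding="utf-8") as fp:
    # Skip blank lines so a trailing newline does not raise a decode error.
    events = [json.loads(line) for line in fp if line.strip()]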
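
For the larger archives, holding every event in a list may not be practical. A streaming variant is sketched below, assuming (as the loader above implies) that each file stores one JSON object per line. `iter_events` is a hypothetical helper name.

```python
import json


def iter_events(file_path):
    """Yield one parsed event per non-empty line without loading the whole file."""
    with open(file_path, encoding="utf-8") as fp:
        for line_number, line in enumerate(fp, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so corrupt downloads are easy to locate.
                raise ValueError(f"{file_path}:{line_number}: invalid JSON") from exc
```

Because it is a generator, `sum(1 for _ in iter_events(path))` counts events in constant memory, and any per-event filtering can be applied before results are collected.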

Event Schema


* Each event captures a directed interaction between srcNode and dstNode.
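
Since every event is a directed srcNode → dstNode interaction, a log can be viewed as a directed multigraph. The sketch below counts how often each directed pair occurs. It assumes `srcNode` and `dstNode` are top-level keys holding hashable identifiers, which the page does not confirm (they may be nested objects in the real schema). The sample events are invented for illustration.

```python
from collections import Counter


def edge_counts(events):
    """Count how often each directed (srcNode, dstNode) pair occurs."""
    # Assumption: srcNode and dstNode are top-level keys with hashable values.
    return Counter((event["srcNode"], event["dstNode"]) for event in events)


# Hypothetical toy events, not taken from the dataset.
events = [
    {"srcNode": "proc:1", "dstNode": "file:a"},
    {"srcNode": "proc:1", "dstNode": "file:a"},
    {"srcNode": "proc:2", "dstNode": "proc:1"},
]

counts = edge_counts(events)
for (src, dst), n in counts.most_common():
    print(f"{src} -> {dst}: {n}")
```

Direction matters here: the pair `("proc:1", "file:a")` and its reverse are counted separately.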

Mandiant Lifecycle / MITRE ATT&CK Coverage

Explore phases → techniques → abilities.

License & Citation

Released for research & education. Please cite the SAGA paper.

BibTeX
@ARTICLE{11281529,
  author={Yi-Ting Huang and Ying-Ren Guo and Yu-Sheng Yang and Guo-Wei Wong and Yu-Zih Jheng and
  Yeali Sun and Jessemyn Modini and Timothy Lynar and Meng Chang Chen},
  journal={IEEE Transactions on Dependable and Secure Computing},
  title={{SAGA: Synthetic Audit Log Generation for APT Campaigns}},
  year={2025},
  number={01},
  ISSN={1941-0018},
  pages={1-16},
  doi={10.1109/TDSC.2025.3640696},
  url={https://doi.ieeecomputersociety.org/10.1109/TDSC.2025.3640696},
  publisher={IEEE Computer Society},
  address={Los Alamitos, CA, USA},
  month=dec
}