SAGA Datasets
Synthetic audit logs embedding configurable APT campaigns following Mandiant’s adversary lifecycle. Includes ATT&CK-aligned labels, event-level metadata, and reproducible configurations.
Overview & Downloads
Simple Evaluation Set
Baseline dataset for benchmarking
Version: v1
Campaigns: 5 (curated APT playbooks)
Kill chain: full, multi-stage lifecycle
Malicious events: <1% (realistic class imbalance)
- Each campaign is modeled after known threat activity (e.g., FIN7, APT28, Patchwork).
- Per-event MITRE ATT&CK labeling and phase annotations (Initial Compromise → Foothold → …).
Campaign Preview
(full list in metadata.json)
Higaisa — 607,416 events, malicious 0.005%
{
  "campaign": "Higaisa",
  "stages": [
    "Step 1. Initial Compromise",
    "Step 2. Establishing Foothold",
    "Step 3. Maintaining Presence",
    "Step 4. Internal Reconnaissance",
    "Step 5. Internal Reconnaissance",
    "Step 6. Maintaining Presence",
    "Step 7. Maintaining Presence"
  ],
  "techniques": [
    "phishing Attachment",
    "Malicious File Execution",
    "Registry Run Keys",
    "System Information Discovery",
    "System Network Configuration Discovery",
    "Masquerade Task or Service",
    "Scheduled Task"
  ],
  "event": 607416,
  "mal_event": "0.005%"
}
Full campaign inventory (all actors, stages, per-campaign event counts) is bundled as metadata.json inside the archive.
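The preview entry can also be inspected programmatically. The sketch below parses a copy of the Higaisa entry, splits each stage string into a step number and a lifecycle phase, and estimates the absolute number of malicious events from the `event` and `mal_event` fields. It assumes `mal_event` is always a percentage string like the one shown; the helper name `split_stage` is illustrative, and the overall layout of metadata.json (for example, whether it is a list of such entries) is not documented on this page, so the snippet works on a single entry only.

```python
import json
from collections import Counter

# Copy of the Higaisa preview entry ("techniques" omitted for brevity).
entry = json.loads("""
{
  "campaign": "Higaisa",
  "stages": [
    "Step 1. Initial Compromise",
    "Step 2. Establishing Foothold",
    "Step 3. Maintaining Presence",
    "Step 4. Internal Reconnaissance",
    "Step 5. Internal Reconnaissance",
    "Step 6. Maintaining Presence",
    "Step 7. Maintaining Presence"
  ],
  "event": 607416,
  "mal_event": "0.005%"
}
""")


def split_stage(stage):
    """Split 'Step 4. Internal Reconnaissance' into (4, 'Internal Reconnaissance')."""
    prefix, phase = stage.split(". ", 1)
    return int(prefix.split()[1]), phase


# How many steps of the campaign fall into each lifecycle phase.
phase_counts = Counter(phase for _, phase in map(split_stage, entry["stages"]))

# Assumption: "mal_event" is a percentage string such as "0.005%".
mal_fraction = float(entry["mal_event"].rstrip("%")) / 100

# The published percentage is rounded, so this is only an estimate.
approx_malicious = round(entry["event"] * mal_fraction)

print(dict(phase_counts))
print(f"about {approx_malicious} malicious events out of {entry['event']}")
```

On this entry the estimate is roughly 30 malicious events among more than 600,000, which illustrates how sparse the positive class is.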
checksum: 7a68449413c9a38b5132859b237e070c62587f95489e7cc9d7670b97548ffb43
Download size: 1.71 GB
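To check a download against the published checksum, something like the following may help. The checksum is 64 hexadecimal characters, which matches the length of a SHA-256 digest, but the page does not name the algorithm, so SHA-256 is an assumption here. The helper name `sha256_of` and the archive file name in the usage comment are hypothetical.

```python
import hashlib

# Checksum published for the Simple Evaluation Set.
EXPECTED = "7a68449413c9a38b5132859b237e070c62587f95489e7cc9d7670b97548ffb43"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (the file name is a placeholder, not the real archive name):
# if sha256_of("simple_evaluation_set.zip") != EXPECTED:
#     raise ValueError("checksum mismatch: the download may be corrupted")
```

Reading in chunks keeps memory use flat, which matters more for the larger archives than for this 1.71 GB one.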
Complete Evaluation Set
Baseline for benchmarking
Version: v2
Campaigns: 38 total
Mixed campaigns: 10, each blended from 3 base campaigns
Events: ~1.2M+ (temporal & multi-host)
- 28 unique synthetic base campaigns plus 10 synthetic “mixed” campaigns, each a fusion of 3 base campaigns.
- Time-ordered activity capturing complex APT campaigns.
- This is the reference evaluation set for SAGA.
checksum: c15e8b985b4cab1d1ebf106e01ce5c14d9b730da5af786167b58778c422992b1
Download size: 14.5 GB
Training Corpus
Large-scale synthetic audit logs for model training
Access: public
Single-campaign logs: 1000
Mixed-campaign logs: 100
Per-campaign time slices: 1h / 1d
- Each “single” log = one full APT kill chain.
- Each “mixed” log blends 3 distinct campaigns to emulate complex intrusions.
- Both 1-hour burst windows and 1-day dwell-time windows are included, so you can train alerting models and hunting models.
checksum: see the Checksums.ini file for integrity verification.
Download size: 5.1 TB
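At this scale, reading a whole log into memory may not be practical. A minimal sketch of a streaming reader follows, assuming each log file stores one JSON-encoded event per line (the format implied by the Quick Start loader). The function name `iter_events` and the path in the usage comment are illustrative, not part of the dataset tooling.

```python
import json


def iter_events(path):
    """Yield one parsed event per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fp:
        for line in fp:  # iterates lazily, one line at a time
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


# Usage (the path is a placeholder):
# total = sum(1 for _ in iter_events("path/to/json file"))
```

Because the generator holds only one event at a time, it can feed a counting pass or a batched training loop without changing the file format.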
Quick Start
Load with Python
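import json
# Path to one log file; each line holds one JSON-encoded event.
file_path = "path/to/json file"
with open(file_path) as fp:
    # Parse line by line, skipping blank lines.
    events = [json.loads(line) for line in fp if line.strip()]
Event Schema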
* Each event captures a directed interaction between srcNode and dstNode.
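Because every event is a directed srcNode → dstNode interaction, a list of events can be folded into a simple directed graph. The sketch below counts how often each edge occurs and records the successors of each source node. It assumes `srcNode` and `dstNode` hold hashable identifiers such as strings; if they are nested objects in the real schema, an ID field would need to be selected first. The sample values and the function name `build_graph` are made up for illustration.

```python
from collections import Counter, defaultdict


def build_graph(events):
    """Return (edge_counts, successors) for a list of event dicts."""
    edge_counts = Counter()        # (src, dst) -> number of events on that edge
    successors = defaultdict(set)  # src -> set of dst nodes it interacted with
    for ev in events:
        # Assumption: both fields are hashable identifiers (e.g. strings).
        src, dst = ev["srcNode"], ev["dstNode"]
        edge_counts[(src, dst)] += 1
        successors[src].add(dst)
    return edge_counts, successors


# Toy events with invented node names; real events carry more fields.
sample_events = [
    {"srcNode": "proc:winword.exe", "dstNode": "file:invoice.lnk"},
    {"srcNode": "proc:winword.exe", "dstNode": "proc:cmd.exe"},
    {"srcNode": "proc:winword.exe", "dstNode": "proc:cmd.exe"},
    {"srcNode": "proc:cmd.exe", "dstNode": "reg:RunKey"},
]

edge_counts, successors = build_graph(sample_events)
print(edge_counts.most_common(1))
print(sorted(successors["proc:winword.exe"]))
```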
Mandiant Lifecycle / MITRE ATT&CK Coverage
Explore phases → techniques → abilities.
License & Citation
Released for research & education. Please cite the SAGA paper.
BibTeX
@ARTICLE{11281529,
author={Yi-Ting Huang and Ying-Ren Guo and Yu-Sheng Yang and Guo-Wei Wong and Yu-Zih Jheng and
Yeali Sun and Jessemyn Modini and Timothy Lynar and Meng Chang Chen},
journal={IEEE Transactions on Dependable and Secure Computing},
title={{SAGA: Synthetic Audit Log Generation for APT Campaigns}},
year={2025},
number={01},
ISSN={1941-0018},
pages={1-16},
doi={10.1109/TDSC.2025.3640696},
url={https://doi.ieeecomputersociety.org/10.1109/TDSC.2025.3640696},
publisher={IEEE Computer Society},
address={Los Alamitos, CA, USA},
month=dec
}