Agent Training Pipeline

Status: Draft (tracks #8)
Phase: 0 (design)
Related: aegis-training-plan.md, intent-mapping.md, soul-hash.md

Purpose

Turn raw L1/L2 history into behavioral profiles and keep them fresh post-launch. This is Sprints 1–3 of aegis-training-plan.md, made concrete enough for a contributor to pick up.

Goals

Ingest 12 months of Ethereum L1 + major L2 history for the top 1M active addresses.
Emit profiles in the schema defined by intent-mapping.md.
Support incremental updates (per-tx streaming, no full recompute).
Produce Tier 1 (heuristic) and Tier 2 (statistical) screening artifacts with measurable precision/recall on historical exploits.

Non-goals

Tier 3 LLM escalation infra beyond a thin prompt scaffold — separate issue.
Slashing / consensus integration.

1. Data ingestion

Sources

Archive RPC — Alchemy or Infura primary; self-hosted Erigon as cost fallback.
Chains (v0): Ethereum L1, Arbitrum, Base, Optimism.

Indexer language — recommend Rust using alloy + reth-primitives. Go is acceptable if the contributor is stronger there.

Targeting (v0)

Top 1M addresses by tx count over rolling 12 months.
All Etherscan-verified contracts with ≥1000 interactions.
All bridge contracts (explicit allowlist).
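
A minimal sketch of how the three targeting rules above could combine into one target set. The function name `select_targets`, its argument shapes, and the choice to count transactions by sender are assumptions for illustration. A real run over 12 months of history would compute the counts in the indexer or a database, not in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def select_targets(
    tx_senders: Iterable[str],
    contract_interactions: Dict[str, int],
    verified_contracts: Set[str],
    bridge_allowlist: Set[str],
    top_n: int = 1_000_000,
    min_interactions: int = 1000,
) -> Set[str]:
    """Union of the three v0 targeting rules (hypothetical helper)."""
    # Rule 1: top-N addresses by tx count (assumption: counted by sender).
    counts = Counter(tx_senders)
    top = {addr for addr, _ in counts.most_common(top_n)}
    # Rule 2: verified contracts with at least `min_interactions` interactions.
    busy = {
        c for c in verified_contracts
        if contract_interactions.get(c, 0) >= min_interactions
    }
    # Rule 3: every allowlisted bridge, regardless of activity.
    return top | busy | bridge_allowlist
```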

Output — rows in tx_feature_log (intent-mapping.md §Schema).

2. Feature engineering

Per-tx (from receipts + traces)

from, to, value_wei, gas_used, gas_price_wei, nonce
function selector (first 4 bytes of calldata)
decoded arg summary, length-bounded
token transfer events (ERC-20 / ERC-721 / ERC-1155)
block timestamp → UTC hour, day-of-week
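
Two of the per-tx features above are simple enough to sketch directly: the function selector and the time-of-day features. The helper names are hypothetical, and returning `None` for calldata shorter than 4 bytes (plain transfers, for example) is an assumption, not something the schema specifies.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def selector(calldata: bytes) -> Optional[str]:
    """First 4 bytes of calldata as 0x-prefixed hex, or None if too short."""
    if len(calldata) < 4:
        return None
    return "0x" + calldata[:4].hex()


def time_features(block_timestamp: int) -> Tuple[int, int]:
    """Return (UTC hour 0-23, day of week with Monday=0)."""
    dt = datetime.fromtimestamp(block_timestamp, tz=timezone.utc)
    return dt.hour, dt.weekday()
```

For example, calldata for an ERC-20 `transfer(address,uint256)` call yields the selector `0xa9059cbb`.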

Aggregated, windowed (7d / 30d)

rolling mean/std for value and frequency
gas-price percentile
counterparty set + Jaccard drift vs. prior window
protocol interaction entropy over selectors
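
The windowed aggregates above map onto short, standard formulas. The sketch below assumes drift is defined as 1 minus the Jaccard similarity, that entropy is Shannon entropy in bits, and that the standard deviation is the population form; none of these choices is fixed by this document, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import List, Set, Tuple


def jaccard_drift(prev: Set[str], curr: Set[str]) -> float:
    """1 - |prev & curr| / |prev | curr|; 0.0 = same set, 1.0 = disjoint."""
    union = prev | curr
    if not union:
        return 0.0
    return 1.0 - len(prev & curr) / len(union)


def selector_entropy(selectors: List[str]) -> float:
    """Shannon entropy (bits) of the selector distribution in a window."""
    total = len(selectors)
    if total == 0:
        return 0.0
    counts = Counter(selectors)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def value_stats(values_wei: List[int]) -> Tuple[float, float]:
    """(mean, population std) of tx values in a window; zeros when empty."""
    if not values_wei:
        return 0.0, 0.0
    return mean(values_wei), pstdev(values_wei)
```

An address that suddenly interacts with an entirely new counterparty set scores a drift of 1.0, while one that always calls the same selector has entropy 0.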

3. Model format — "BYO model" constraint

Validators may run different models. The pipeline ships:

Canonical profiles — shared, part of soul hash.
Reference screening models — Tier 1 rules + Tier 2 stat model. Validators may use or replace.
Stable I/O interface — any third-party model pluggable.

Reference Tier 2 model (v0): scikit-learn IsolationForest plus a per-address z-score on windowed features. Serialize to ONNX where possible; otherwise use a versioned pickle with a pinned schema.
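
A minimal sketch of the per-address z-score half of the reference model, using only the standard library. The IsolationForest half is not shown. The `min_samples` cutoff and the handling of a constant history are assumptions, and `zscore` is a hypothetical name.

```python
import math
from statistics import mean, pstdev
from typing import List


def zscore(value: float, history: List[float], min_samples: int = 5) -> float:
    """Z-score of `value` against one address's own windowed history."""
    # Too little history to say anything: treat as not anomalous (assumption).
    if len(history) < min_samples:
        return 0.0
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Constant history: any deviation is infinitely many std devs away.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma
```

A caller might treat an absolute z-score above some threshold (3, say) as one input to the Tier 2 score; the threshold itself would come out of the backtest, not this sketch.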

4. Backtesting — known exploit replay

Replay these exploits against the pipeline:

Exploit	Expected tier
Ronin Bridge	Tier 1 (value anomaly)
Wormhole	Tier 2 (param anomaly)
Poly Network	Tier 1 (cross-chain)
Mango Markets	Tier 2 + 3
Curve re-entrancy	Tier 1 (known pattern)
Harmony Bridge	Tier 1 (value + counterparty)

Target: ≥90% detection (recall on the replayed exploits) and a ≤1% false-positive rate on a held-out sample of normal transactions.
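
The target can be checked mechanically once each replayed exploit and each held-out normal transaction has a boolean "flagged" outcome. The sketch below assumes that input shape; the function names and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def backtest_metrics(exploit_flags: List[bool], normal_flags: List[bool]) -> Dict[str, float]:
    """Detection rate, false-positive rate, and precision from flag outcomes."""
    tp = sum(exploit_flags)   # exploits that were flagged
    fp = sum(normal_flags)    # normal txs that were wrongly flagged
    detection = tp / len(exploit_flags) if exploit_flags else 0.0
    fpr = fp / len(normal_flags) if normal_flags else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"detection": detection, "fpr": fpr, "precision": precision}


def meets_target(m: Dict[str, float], min_detection: float = 0.90, max_fpr: float = 0.01) -> bool:
    """True when both thresholds from the target line are satisfied."""
    return m["detection"] >= min_detection and m["fpr"] <= max_fpr
```

With only the six exploits in the table, missing a single one gives 5/6 ≈ 83% detection, so the 90% target effectively requires all six to be caught unless the replay set grows.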

5. Real-time path

After backfill:

Streaming consumer (Kafka or NATS) reads new blocks, updates address_profile / contract_profile incrementally.
Nightly reconciliation job recomputes profiles from tx_feature_log and diffs them against the live profiles; any drift is logged.
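
One way to update a mean and standard deviation per transaction without a full recompute is Welford's online algorithm, sketched below together with a reconciliation check against a batch recompute. This is an illustration, not the specified design: `RunningStats` and `reconcile` are hypothetical names, and the sketch is append-only, so it does not handle evicting transactions that age out of a 7d/30d window.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class RunningStats:
    """Welford's online mean/variance: O(1) state, one update per tx."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        """Population standard deviation of everything seen so far."""
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


def reconcile(live: RunningStats, logged: List[float], rel_tol: float = 1e-9) -> List[str]:
    """Recompute from the log and return the names of fields that drifted."""
    drift = []
    if live.n != len(logged):
        drift.append("n")
    if logged:
        if not math.isclose(live.mean, statistics.mean(logged), rel_tol=rel_tol):
            drift.append("mean")
        if not math.isclose(live.std, statistics.pstdev(logged), rel_tol=rel_tol):
            drift.append("std")
    return drift
```

An empty list from `reconcile` means the streaming state matches the log within tolerance; a non-empty one is what the nightly job would log.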

6. Interface for third-party models

interface Screener {
  screen(tx: CanonicalTx, profile: AddressProfile, contract: ContractProfile | null)
    -> { flag: 🟢 | 🟡 | 🟠 | 🔴, score: u16 /* basis points */, tier: 1 | 2 | 3, reasons: string[] }
}

Language bindings: Rust trait + gRPC so non-Rust implementations are first-class.
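
To illustrate what plugging in a non-reference model could look like, here is a Python mirror of the interface with one toy Tier 1 rule. Everything beyond the `screen` shape is an assumption: plain dicts stand in for `CanonicalTx`, `AddressProfile`, and `ContractProfile` (defined in intent-mapping.md), the field names `value_wei` and `value_mean_30d` are hypothetical, and the 0 to 10,000 score range assumes basis points of 100%.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Protocol


class Flag(Enum):
    GREEN = "🟢"
    YELLOW = "🟡"
    ORANGE = "🟠"
    RED = "🔴"


@dataclass
class ScreenResult:
    flag: Flag
    score_bp: int  # basis points; assumed range 0..10_000
    tier: int      # 1, 2, or 3
    reasons: List[str] = field(default_factory=list)


class Screener(Protocol):
    """Structural type: any object with a matching `screen` method qualifies."""
    def screen(self, tx: dict, profile: dict, contract: Optional[dict]) -> ScreenResult: ...


class ValueSpikeScreener:
    """Toy Tier 1 rule: flag values far above the sender's rolling mean."""

    def __init__(self, multiple: float = 100.0) -> None:
        self.multiple = multiple

    def screen(self, tx: dict, profile: dict, contract: Optional[dict] = None) -> ScreenResult:
        baseline = profile.get("value_mean_30d", 0)
        if baseline > 0 and tx["value_wei"] > self.multiple * baseline:
            reason = f"value exceeds {self.multiple:g}x 30d mean"
            return ScreenResult(Flag.RED, 9_000, 1, [reason])
        return ScreenResult(Flag.GREEN, 0, 1)
```

Because `Screener` is a `Protocol`, a third-party model does not need to inherit from anything; matching the method signature is enough, which parallels the Rust trait + gRPC split described above.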

Acceptance criteria

Indexer ingests ≥12 months of Ethereum L1 for the 1M-address target set into tx_feature_log
Feature extraction emits canonical per-address and per-contract profiles
Tier 1 rule engine (~50 rules seeded from known exploit patterns)
Tier 2 reference model trained and serialized
screen(tx, profile, contract) -> {flag, score, tier, reasons} API, <50 ms p99 for Tier 1+2
Exploit-replay backtest report with precision/recall
Streaming consumer + reconciliation job
Docs on plugging in a non-reference model
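
The p99 latency criterion above needs an agreed way to measure it. A minimal sketch, assuming the nearest-rank percentile definition and wall-clock timing of a single-threaded call loop; `percentile` and `measure_p99_ms` are hypothetical helpers, and a real benchmark would also fix hardware, warm-up, and input mix.

```python
import math
import time
from typing import Callable, Iterable, List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def measure_p99_ms(fn: Callable[[object], object], inputs: Iterable[object]) -> float:
    """Time fn(item) for each input and return the p99 latency in ms."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return percentile(latencies, 99)
```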

Open questions

Can we batch eth_getLogs + traces cheaply enough on hosted RPC, or is a self-hosted archive node required early?
Ship Tier 3 prompt template here or in a separate issue?
Schema versioning without invalidating historical backfills — migration policy?