AscensionAI Experiments

011 Tempo-Targeted Smithing — mainline_v3

Diagnostics ruled out deeper search (88% of Act-2 deaths are "lost on arrival") and raw deck power (losing decks are not damage-deficient), leaving upgrade efficiency as the remaining lever. The smith was upgrading a basic Strike 97% of the time; targeting the highest-tempo card instead yields 9 wins/150 (6.0%) at average floor 26.1, the biggest single-lever floor gain so far, at no HP cost.
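
The selection rule described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Card` type, the `tempo` scores, and the `pick_smith_target` helper are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Card:
    """Hypothetical card record; `tempo` is an assumed per-card score."""
    name: str
    tempo: float
    upgraded: bool = False


def pick_smith_target(deck: Sequence[Card]) -> Optional[Card]:
    """Return the not-yet-upgraded card with the highest tempo score.

    Returns None when every card is already upgraded. Ties go to the
    card that appears first in the deck.
    """
    candidates = [card for card in deck if not card.upgraded]
    if not candidates:
        return None
    return max(candidates, key=lambda card: card.tempo)


# Illustrative deck with made-up tempo scores (assumptions, not measured values).
deck = [
    Card("Strike", tempo=1.0),
    Card("Strike", tempo=1.0),
    Card("Defend", tempo=0.8),
    Card("Bash", tempo=2.5),
]
target = pick_smith_target(deck)
print(target.name if target else "nothing to upgrade")
```

Under these assumed scores the smith picks Bash rather than a Strike; how tempo is actually scored in mainline_v3 is not specified here.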

010 First Winning Agent — mainline_v2

Search-combat → first wins. A diagnose-first stack of macro levers (deck hygiene, varN Act-2/3 combat retrain, AoE/draw role scorer, conservative smith) yields mainline_v2: 7 wins/150 (4.7%) at average floor 24.9, making it the project's first winning agent.

009 Simulator Era — Variants A–M & First Wins

Headless-sim variant series (Jun 10–14). Combat was the wall: 1-ply search nearly doubled run depth (floor 14.7→26.4, Act 3 reached), and deck hygiene on top produced the project's first 3 wins.
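
For readers unfamiliar with the term, 1-ply search at a combat decision can be sketched as below: try each legal action once in the simulator, score the resulting state, and keep the best. The callables `legal_actions`, `step`, and `evaluate`, and the toy combat model, are assumptions made for this example and are not the simulator's real interface.

```python
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[int, int, int]  # (player_hp, enemy_hp, incoming_damage), toy model only


def one_ply_search(
    state: State,
    legal_actions: Callable[[State], Iterable[str]],
    step: Callable[[State, str], State],
    evaluate: Callable[[State], float],
) -> Optional[str]:
    """Pick the action whose immediate successor state scores highest.

    Looks exactly one action ahead (no recursion). Returns None when
    there are no legal actions.
    """
    best_action, best_value = None, float("-inf")
    for action in legal_actions(state):
        value = evaluate(step(state, action))
        if value > best_value:
            best_action, best_value = action, value
    return best_action


def toy_actions(state: State) -> Iterable[str]:
    return ["attack", "block"]


def toy_step(state: State, action: str) -> State:
    """Hypothetical transition: attack deals 6, block absorbs 5."""
    hp, enemy_hp, incoming = state
    if action == "attack":
        enemy_hp = max(0, enemy_hp - 6)
        taken = 0 if enemy_hp == 0 else incoming  # a dead enemy deals no damage
    else:
        taken = max(0, incoming - 5)
    return (hp - taken, enemy_hp, incoming)


def toy_evaluate(state: State) -> float:
    """Hypothetical heuristic: prefer high player HP and low enemy HP."""
    hp, enemy_hp, _ = state
    return hp - enemy_hp


choice = one_ply_search((30, 6, 4), toy_actions, toy_step, toy_evaluate)
print(choice)
```

In the toy state the enemy has 6 HP left, so attacking ends the fight and scores higher than blocking.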

008 PPO 19k + HP-Urgency Heal

19,400+ games, network upgrade to (512,256,256) GELU, HP-urgency heal reward. 200-game eval: 38.1% boss WR, 20% Act 2 reach.

007 12k PPO + Boss Shaping

12,088 PPO games, 1,488 updates, boss reward shaping for all bosses, accelerated BC anchor decay, and the May 20 200-game evaluation.

006 5,146 PPO Eval

5,146 PPO games, 641 updates, and the May 18 200-game evaluation (peak pre-shaping).

005 4,136 PPO Eval

4,136 PPO games, 515 trainer updates, and the May 16 150-game evaluation.

004 Long PPO Eval

~2,500 PPO games, 311 trainer updates, and the May 14 fixed-seed comparison.

003 Fixed-Seed Eval

Heuristic, BC, and PPO checkpoint comparison on deterministic seeds.

002 Parallel PPO

Parallel rollout collection, offline trainer updates, and stale-rollout behavior.

001 BC Baseline

Behavior cloning collection and supervised warm-start training metrics.

Snapshot Summary

Run	Games	Avg Floor	Avg Reward	Boss WR	Act 2+	Notes
Heuristic 150-game eval	150	15.78	8.44	39.0%	26.0%	Reference policy baseline.
BC checkpoint 150-game eval	150	12.81	-0.55	30.6%	12.0%	Playable warm start below heuristic.
PPO 5k (200-game eval)	200	15.44	4.03	31.1%	20.0%	Peak PPO pre-shaping. Within 0.34 floors of heuristic.
PPO 12k (200-game eval)	200	14.83	-0.95	21.7%	13.5%	Exploration valley post-BC-decay. 49.5% deaths on floor 16.
PPO 19k (200-game eval)	200	14.66	9.60	38.1%	20.0%	Live-era peak. Network upgrade + HP-urgency heal; boss WR nearly matches heuristic.
Sim plateau (varB, headless)	150k	~14.7	—	—	~26%	Headless-sim era. Plateau confirmed not capacity/entropy (and pre-fix deck-blind).
Sim + search combat (48-seed full run)	48	26.35	—	—	83.3%	1-ply search at combat decisions. Act 3 reached (14.6%), max floor 50 — combat was the wall.
Sim + search + deck override (48-seed)	48	23.3	—	—	~73%	First wins in project history: 3/48. Deck hygiene layered on search-combat.
mainline_v2 (150-seed)	150	24.9	—	—	—	First winning agent: 7 wins (4.7%). varN combat + deck hygiene + AoE/draw scorer + conservative smith.
mainline_v3 (150-seed)	150	26.1	—	—	—	9 wins (6.0%); biggest single-lever floor gain. + tempo-targeted smithing (Bash+/attacks vs Strikes), no HP cost.
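
The win percentages in the last two rows follow directly from wins divided by seeds. The sketch below reproduces that arithmetic and, as an addition not present in the original results, attaches a Wilson score interval to each rate to show how wide the uncertainty is at 150 seeds. The `wilson_interval` helper and the choice of z = 1.96 (roughly 95%) are assumptions of this example.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion wins/games."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


# Win counts and seed counts taken from the table rows above.
runs = {"mainline_v2": (7, 150), "mainline_v3": (9, 150)}

for name, (wins, games) in runs.items():
    low, high = wilson_interval(wins, games)
    print(
        f"{name}: {100 * wins / games:.1f}% wins, "
        f"approx. 95% interval {100 * low:.1f}% to {100 * high:.1f}%"
    )
```

At this sample size the two intervals overlap, so the win counts alone are a coarse signal; the table's average-floor column gives a second view of the same runs.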

Open machine-readable experiment registry