AscensionAI

Distributed reinforcement learning for Slay the Spire

AscensionAI wraps a live desktop game in a training system with behavior cloning, PPO fine-tuning, action masking, parallel rollout workers, checkpoint-aware offline updates, deterministic evaluation, and public experiment reporting.

Project Snapshot

The current public snapshot focuses on making the existing ML-systems work legible: training infrastructure, deterministic evaluation, dashboarding, architecture documentation, and honest early results.

Action space: 134
Observation vector: 530
BC validation accuracy: 84.948%
PPO rollout games: 4,136

What To Review First

Results Dashboard

Static viewer for fixed-seed comparisons, PPO metrics, BC validation curves, and local CSV uploads.

Open dashboard

Architecture

Trainer/worker topology, checkpoint sync, stale rollout rejection, crash handling, and scaling points.

Open docs

Experiments

Reproducible summaries for BC baseline, parallel PPO, and fixed-seed evaluation.

Open reports

Scripts

Plain-English reference for the training, evaluation, environment, plotting, and logging scripts.

Read script guide

Technical Writeup

Deep implementation notes on observation encoding, action masking, reward shaping, PPO, and limitations.

Read writeup
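
As a rough illustration of the action masking mentioned above, the sketch below zeroes out illegal actions before sampling from a 134-way policy head. It is a minimal standard-library example, not the project's implementation: the names `masked_softmax` and `sample_action`, and the representation of the mask as a list of booleans, are assumptions made for illustration.

```python
import math
import random

NUM_ACTIONS = 134  # action-space size reported in the project snapshot


def masked_softmax(logits, legal_mask):
    """Return action probabilities with illegal actions forced to zero."""
    if len(logits) != len(legal_mask):
        raise ValueError("logits and mask must have the same length")
    if not any(legal_mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    peak = max(l for l, ok in zip(logits, legal_mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0
            for l, ok in zip(logits, legal_mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, legal_mask, rng):
    """Sample one action index, drawing only from the legal actions."""
    probs = masked_softmax(logits, legal_mask)
    legal = [i for i, ok in enumerate(legal_mask) if ok]
    weights = [probs[i] for i in legal]
    return rng.choices(legal, weights=weights, k=1)[0]
```

Masking before normalization means the policy can never emit an action the game would reject, and the probabilities of the remaining legal actions still sum to one.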

System In Motion

Animated diagram showing rollout workers producing files, an offline trainer updating a checkpoint, and workers reloading the model.

AscensionAI runs as a local distributed training loop: live game workers write checkpoint-tagged rollout files, the offline trainer consumes fresh batches, saves an updated checkpoint, and workers reload the policy for the next collection cycle.
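
The loop described above could be sketched roughly as follows. This is a hedged illustration only: the JSON file layout, the `checkpoint_version` field, the `rollout_w{worker}_e{episode}.json` naming, and the `max_lag` parameter are all hypothetical and are not taken from the project's code.

```python
import json
from pathlib import Path


def write_rollout(directory, worker_id, episode, checkpoint_version, transitions):
    """Worker side: write one rollout tagged with the checkpoint that produced it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"rollout_w{worker_id}_e{episode}.json"
    tmp_path = final_path.with_suffix(".tmp")
    payload = {"checkpoint_version": checkpoint_version,
               "transitions": transitions}
    tmp_path.write_text(json.dumps(payload))
    # Rename only after the file is fully written, so the trainer
    # never picks up a partially written rollout.
    tmp_path.replace(final_path)
    return final_path


def collect_fresh(directory, current_version, max_lag=0):
    """Trainer side: split rollout files into fresh and stale by checkpoint tag."""
    fresh, stale = [], []
    for path in sorted(Path(directory).glob("rollout_*.json")):
        payload = json.loads(path.read_text())
        lag = current_version - payload["checkpoint_version"]
        if 0 <= lag <= max_lag:
            fresh.append(path)
        else:
            stale.append(path)
    return fresh, stale
```

With `max_lag=0` the trainer accepts only rollouts collected by the current checkpoint. A larger value trades some off-policy data for higher throughput.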

Current Results

The latest PPO snapshot still improves over the BC checkpoint on the 150-game evaluation, but this wider sample shows it remains behind the heuristic baseline (14.70 vs. 15.78 average floor). The value of this snapshot lies in the complete training system and evaluation pipeline, which provide clear metrics for measuring future improvement.

Heuristic baseline

150 eval games: 15.78 average floor, 8.44 average shaped reward, 26.0% Act 2 reach rate.

View evaluation
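
The three headline metrics quoted for each policy (average floor, average shaped reward, Act 2 reach rate) can be computed from per-game records along these lines. The `EpisodeResult` record and its field names are assumptions for illustration; the project's actual evaluation log format is not shown here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EpisodeResult:
    """One evaluation game (hypothetical record layout)."""
    seed: int
    floor: int
    shaped_reward: float
    reached_act2: bool


def summarize(results):
    """Aggregate per-game results into the headline evaluation metrics."""
    if not results:
        raise ValueError("no evaluation results to summarize")
    return {
        "games": len(results),
        "avg_floor": mean(r.floor for r in results),
        "avg_shaped_reward": mean(r.shaped_reward for r in results),
        "act2_reach_rate": sum(r.reached_act2 for r in results) / len(results),
    }


def paired_floor_delta(baseline, candidate):
    """Mean floor difference (candidate minus baseline) over shared seeds."""
    base = {r.seed: r.floor for r in baseline}
    deltas = [r.floor - base[r.seed] for r in candidate if r.seed in base]
    if not deltas:
        raise ValueError("no shared seeds between the two runs")
    return mean(deltas)
```

Pairing games by seed, as in `paired_floor_delta`, is one way to use a fixed-seed evaluation: both policies face the same seeds, so the difference is less affected by which seeds happened to be drawn.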

BC checkpoint

86,297 supervised samples and 84.948% final validation accuracy from heuristic demonstrations.

View BC report

Parallel PPO

4,136 rollout games, 515 PPO update batches, 6 stale rollouts in the latest trainer batch, and a 14.70 average floor in the latest 150-game PPO eval.

View PPO report
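
For readers unfamiliar with what a PPO update batch optimizes, here is a minimal sketch of the standard clipped surrogate loss in plain Python. It shows the general technique only: the function name is hypothetical, and `clip_eps=0.2` is a commonly used default rather than a value confirmed for this project.

```python
import math


def ppo_clipped_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Negative mean of the PPO clipped surrogate objective over a batch."""
    if not advantages:
        raise ValueError("batch is empty")
    if not (len(new_logps) == len(old_logps) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the updated policy and the one
        # that collected the rollout.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The clip limits how far a single update can move the policy away from the one that generated the data. This is also why rollouts from much older checkpoints are less useful: the ratio no longer reflects a small policy change.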