An AlphaZero-inspired chess engine trained to International Master level for just $5 — several hundred times cheaper than Google DeepMind's original, by leveraging years of advances in machine learning.
Origin
PieBot began as a small, existing Python project — a modest attempt at replicating AlphaZero, the landmark chess engine published by Google DeepMind in 2018 that achieved superhuman play purely through self-play reinforcement learning, without any human chess knowledge.
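For context, the heart of AlphaZero's self-play search is the PUCT rule, which balances a move's current value estimate against an exploration bonus scaled by the policy network's prior probability. A minimal sketch (variable names and the `c_puct` constant are illustrative, not taken from PieBot's codebase):

```python
import math

def puct_score(child_prior, child_value, child_visits, parent_visits, c_puct=1.5):
    """PUCT: value estimate plus an exploration bonus weighted by the
    network's prior. AlphaZero-style MCTS descends to the child with the
    highest score at each step of the tree search."""
    exploration = c_puct * child_prior * math.sqrt(parent_visits) / (1 + child_visits)
    return child_value + exploration

# Toy example: two candidate moves from a node visited 100 times.
# Move A: high prior, already well explored; Move B: low prior, unexplored.
a = puct_score(child_prior=0.6, child_value=0.10, child_visits=40, parent_visits=100)
b = puct_score(child_prior=0.1, child_value=0.00, child_visits=0, parent_visits=100)
```

Note how the unvisited move wins the comparison despite its lower prior: the `1 + child_visits` denominator steers the search toward under-explored lines until their value estimates catch up.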
I took that foundation and set myself a challenge: how much of the original AlphaZero's capability could I replicate, and how cheaply could I do it? The original system required Google-scale compute, putting replication out of reach for individual developers. I wanted to close that gap.
The Work
Over several months, I significantly optimized the model architecture, training pipeline, and self-play infrastructure by incorporating advances in machine learning that have emerged since the original 2018 AlphaZero paper. The field has moved substantially since then: better training techniques, more efficient architectures, and a clearer understanding of what actually matters for chess-playing strength.
I used Gemini Deep Research, ChatGPT Pro, and Claude Max to systematically research what optimizations would yield the largest gains, rapidly evaluate ideas, and help reason through the trade-offs in model design. The result was a training process that delivered dramatically more capability per dollar spent.
Results
The original AlphaZero required Google-scale TPU infrastructure to train. By leveraging the seven years of ML research that followed, PieBot reaches its strength on a single consumer GPU, at several hundred times lower cost.
| Metric | AlphaZero (2018) | PieBot (2025) |
|---|---|---|
| Training hardware | 5,000 TPUs (self-play) + 64 TPUs (training) | 1× RTX 4080 (consumer GPU) |
| Training duration | 9 hours (on the TPU fleet) | 12 hours |
| Estimated training cost | $3,000+ on modern hardware | $5 |
| Cost reduction | — | ~600× cheaper |
| Playing strength | Superhuman (beats Stockfish) | International Master level |
| Knowledge source | Self-play only (zero human knowledge) | Self-play + modern ML techniques |
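The table's cost figures are internally consistent; the implied GPU rental rate below is my back-of-the-envelope inference, not a number from the project:

```python
# Figures taken from the table above.
alphazero_cost_usd = 3000   # estimated cost on modern hardware
piebot_cost_usd = 5
piebot_hours = 12

# Implied rental rate for the RTX 4080 (an inference, not a quoted price).
implied_rate = piebot_cost_usd / piebot_hours          # ≈ $0.42 per GPU-hour

# Matches the "~600× cheaper" row.
cost_reduction = alphazero_cost_usd / piebot_cost_usd  # 600.0
```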
What's Next
The current model reaches International Master level for $5 in training compute. With some small modifications to the training process, I believe the same architecture can be pushed to Super-Grandmaster level for approximately $15 in compute.
The key levers are longer training runs and marginal improvements to the self-play data pipeline. The architecture is already capable of the higher strength — it's a matter of giving the model more high-quality self-play experience to learn from.
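Extrapolating those numbers: a $15 run at the same implied hourly rate corresponds to roughly three times the training compute (my arithmetic; the project does not state a target duration):

```python
# Current run, from the section above.
current_cost_usd, current_hours = 5, 12
target_cost_usd = 15  # projected Super-Grandmaster budget

implied_rate = current_cost_usd / current_hours          # ≈ $0.42 per GPU-hour
compute_multiplier = target_cost_usd / current_cost_usd  # 3× the training compute
projected_hours = target_cost_usd / implied_rate         # ≈ 36 GPU-hours
```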
Development
I used a combination of state-of-the-art AI research assistants to navigate the ML literature and converge on the model design. These tools let me move at a pace that would otherwise have required a research team working for months.