♟️ Case Study

PieBot Chess Engine

An AlphaZero-inspired chess engine trained to International Master level for just $5 — several hundred times cheaper than Google DeepMind's original, by leveraging years of advances in machine learning.

Duration: Several months · 2025
Role: Solo Engineer
Stack: Python · PyTorch
Training cost: $5 (IM level)

Starting from a Small Python Project

PieBot began as a small, existing Python project — a modest attempt at replicating AlphaZero, the landmark chess engine published by Google DeepMind in 2018 that achieved superhuman play purely through self-play reinforcement learning, without any human chess knowledge.

I took that foundation and set myself a challenge: how much of the original AlphaZero's capability could I replicate, and how cheaply? The original AlphaZero required compute resources far beyond the reach of individual developers. I wanted to close that gap.

Optimizing for Cost & Performance

Over several months, I significantly optimized the model architecture, training pipeline, and self-play infrastructure by incorporating advances in machine learning that have emerged since the original 2018 AlphaZero paper. The field has moved substantially — better training techniques, more efficient architectures, and improved understanding of what actually matters for chess-playing strength.

I used Gemini Deep Research, ChatGPT Pro, and Claude Max to systematically research what optimizations would yield the largest gains, rapidly evaluate ideas, and help reason through the trade-offs in model design. The result was a training process that delivered dramatically more capability per dollar spent.

600×: cheaper than the original AlphaZero on modern hardware
$5: cost to train to International Master level
12 hrs: training time on an RTX 4080

How I Got There

1. Start with the Python AlphaZero baseline
Took an existing open-source Python project replicating the core AlphaZero self-play loop — Monte Carlo Tree Search combined with a neural network policy and value head — and got it running end-to-end.
2. Research modern ML advances with AI tools
Used Gemini Deep Research, ChatGPT Pro, and Claude Max to identify the highest-leverage improvements since AlphaZero's 2018 publication — covering architectural changes, training tricks, and efficiency improvements from subsequent research.
3. Implement and iterate on optimizations
Applied a targeted set of improvements to the architecture and training pipeline. Benchmarked each change against playing strength and training cost, cutting what didn't help and doubling down on what did.
4. Train final model on consumer hardware
Trained the optimized model on an RTX 4080 for 12 hours at a total compute cost of $5, achieving International Master-level play strength as measured against established benchmarks.
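The self-play loop in step 1 pairs Monte Carlo Tree Search with a single network that outputs both a move-probability distribution (policy) and a position evaluation (value). Below is a minimal PyTorch sketch of such a two-headed network; the layer sizes, the 18-plane board encoding, and the 4,672-entry move space are illustrative assumptions, not PieBot's actual architecture.

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """AlphaZero-style network: shared conv trunk feeding a policy head
    (logits over all encoded moves) and a value head (scalar in [-1, 1]).
    Sizes here are illustrative, not PieBot's actual architecture."""

    def __init__(self, planes: int = 18, channels: int = 64, policy_size: int = 4672):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * 8 * 8, policy_size),  # one logit per encoded move
        )
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),  # bounded position evaluation
        )

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.policy_head(h), self.value_head(h)

net = PolicyValueNet()
board = torch.zeros(1, 18, 8, 8)  # one encoded position (planes are hypothetical)
policy_logits, value = net(board)
```

During search, the policy logits prior-weight which moves MCTS explores, and the value output replaces random rollouts when evaluating leaf positions.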

AlphaZero vs. PieBot

The original AlphaZero required Google-scale TPU infrastructure to train. By leveraging the seven years of ML research that followed, PieBot achieves comparable or better cost-efficiency on consumer hardware.

Metric | AlphaZero (2018) | PieBot (2025)
Training hardware | 5,000 TPUs + 44 TPUs (ML eval) | 1× RTX 4080 (consumer GPU)
Training duration | 9 hours (TPU-scale) | 12 hours
Estimated training cost | $3,000+ on modern hardware | $5
Cost reduction | 1× (baseline) | ~600× cheaper
Playing strength | Superhuman (beats Stockfish) | International Master level
Knowledge source | Self-play only (zero human knowledge) | Self-play + modern ML techniques

Path to Super-Grandmaster

The current model reaches International Master level for $5 in training compute. With some small modifications to the training process, I believe the same architecture can be pushed to Super-Grandmaster level for approximately $15 in compute.

The key levers are longer training runs and marginal improvements to the self-play data pipeline. The architecture is already capable of the higher strength — it's a matter of giving the model more high-quality self-play experience to learn from.
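Strength tiers like "International Master" or "Super-Grandmaster" are typically verified by match play against opponents of known rating, with the Elo gap implied by the match score following the standard logistic model. A small sketch of that conversion (the formula is standard; the match numbers are made up for illustration):

```python
import math

def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score, under the standard
    logistic model: diff = 400 * log10(score / (1 - score))."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games        # fraction of points scored
    score = min(max(score, 1e-9), 1 - 1e-9)     # guard against 100%/0% scores
    return 400 * math.log10(score / (1 - score))

# Illustrative only: 60 wins, 30 losses, 10 draws over 100 games
# is a 65% score, i.e. roughly +108 Elo over the opponent.
print(round(elo_diff(60, 30, 10)))
```

Tracking this number against a fixed reference opponent after each training run is a cheap way to confirm that extra self-play compute is actually buying playing strength.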

Current ($5): International Master level
Target ($15): Super-Grandmaster level

Tools & Stack

I used a combination of state-of-the-art AI research assistants to navigate the ML literature and converge on the model design. These tools let me move at a pace that an independent research team would have needed months to match.

Python PyTorch AlphaZero (MCTS) Gemini Deep Research ChatGPT Pro Claude Max RTX 4080 Self-play RL