An AlphaZero-inspired chess engine trained to International Master level for just $5 — several hundred times cheaper than Google DeepMind's original, by leveraging years of advances in machine learning.
Origin
PieBot began as a small, existing Python project — a modest attempt at replicating AlphaZero, the landmark chess engine published by Google DeepMind in 2018 that achieved superhuman play purely through self-play reinforcement learning, without any human chess knowledge.
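For context, the heart of AlphaZero's self-play search is the PUCT rule, which balances a move's current value estimate against an exploration bonus scaled by the policy network's prior probability. A minimal sketch (variable names and the `c_puct` constant are illustrative, not taken from PieBot's codebase):

```python
import math

def puct_score(child_prior, child_value, child_visits, parent_visits, c_puct=1.5):
    """PUCT: value estimate plus an exploration bonus weighted by the
    network's prior. AlphaZero-style MCTS descends to the child with the
    highest score at each step of the tree search."""
    exploration = c_puct * child_prior * math.sqrt(parent_visits) / (1 + child_visits)
    return child_value + exploration

# Toy example: two candidate moves from a node visited 100 times.
# Move A: high prior, already well explored; Move B: low prior, unexplored.
a = puct_score(child_prior=0.6, child_value=0.10, child_visits=40, parent_visits=100)
b = puct_score(child_prior=0.1, child_value=0.00, child_visits=0, parent_visits=100)
```

Note how the unvisited move wins the comparison despite its lower prior: the `1 + child_visits` denominator steers the search toward under-explored lines until their value estimates catch up.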
I took that foundation and set myself a challenge: how much of the original AlphaZero's capability could I replicate, and how cheaply could I do it? The original system required Google-scale compute, putting replication out of reach for individual developers. I wanted to close that gap.
The Work
Over several months, I significantly optimized the model architecture, training pipeline, and self-play infrastructure by incorporating advances in machine learning that have emerged since the original 2018 AlphaZero paper. The field has moved substantially since then: better training techniques, more efficient architectures, and a clearer understanding of what actually matters for chess-playing strength.
I used Gemini Deep Research, ChatGPT Pro, and Claude Max to systematically research what optimizations would yield the largest gains, rapidly evaluate ideas, and help reason through the trade-offs in model design. The result was a training process that delivered dramatically more capability per dollar spent.
Results
The original AlphaZero required Google-scale TPU infrastructure to train. By leveraging the seven years of ML research that followed, PieBot reaches its strength on a single consumer GPU, at several hundred times lower cost.
| Metric | AlphaZero (2018) | PieBot (2025) |
|---|---|---|
| Training hardware | 5,000 TPUs (self-play) + 64 TPUs (training) | 1× RTX 4080 (consumer GPU) |
| Training duration | 9 hours (on the TPU fleet) | 12 hours |
| Estimated training cost | $3,000+ on modern hardware | $5 |
| Cost reduction | — | ~600× cheaper |
| Playing strength | Superhuman (beats Stockfish) | International Master level |
| Knowledge source | Self-play only (zero human knowledge) | Self-play + modern ML techniques |
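The table's cost figures are internally consistent; the implied GPU rental rate below is my back-of-the-envelope inference, not a number from the project:

```python
# Figures taken from the table above.
alphazero_cost_usd = 3000   # estimated cost on modern hardware
piebot_cost_usd = 5
piebot_hours = 12

# Implied rental rate for the RTX 4080 (an inference, not a quoted price).
implied_rate = piebot_cost_usd / piebot_hours          # ≈ $0.42 per GPU-hour

# Matches the "~600× cheaper" row.
cost_reduction = alphazero_cost_usd / piebot_cost_usd  # 600.0
```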
What's Next
The current model reaches International Master level for $5 in training compute. With some small modifications to the training process, I believe the same architecture can be pushed to Super-Grandmaster level for approximately $15 in compute.
The key levers are longer training runs and marginal improvements to the self-play data pipeline. The architecture is already capable of the higher strength — it's a matter of giving the model more high-quality self-play experience to learn from.
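Extrapolating those numbers: a $15 run at the same implied hourly rate corresponds to roughly three times the training compute (my arithmetic; the project does not state a target duration):

```python
# Current run, from the section above.
current_cost_usd, current_hours = 5, 12
target_cost_usd = 15  # projected Super-Grandmaster budget

implied_rate = current_cost_usd / current_hours          # ≈ $0.42 per GPU-hour
compute_multiplier = target_cost_usd / current_cost_usd  # 3× the training compute
projected_hours = target_cost_usd / implied_rate         # ≈ 36 GPU-hours
```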
Development
I used a combination of state-of-the-art AI research assistants to navigate the ML literature and converge on the model design. These tools let me move at a pace that would otherwise have required a research team working for months.