What Is Exploitability in Poker Bots – And How to Reduce It?

It starts, as it so often does, not with a bang or a blunder but with a small, dull sense of unease: a few things are not as they should be. The bot plays well. It bluffs in the right places, value-bets mercilessly, folds when folding is painful but warranted. And yet, over tens of thousands of hands, a strange pattern appears: a good player wins, not by outplaying the computer outright, but by finding holes in the stitching, small, persistent leaks in the armor of the strategy. This is not variance. This is exploitability.

The Invisible Yardstick

For those of us in AI poker’s gray corridors of development, exploitability isn’t a metric so much as a spectre. As one of the authors of the new paper, Jacob Abernethy, puts it, it formalizes mathematically the average loss a strategy would suffer against a perfectly informed opponent: a best-response adversary who knows your strategy in full and attacks exactly where it is weakest. For those attempting to play GTO, exploitability is the gap between aspiration and implementation.
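In a toy game the definition is compact enough to compute directly. A minimal sketch (an illustration of the concept, not anything from the paper): in rock-paper-scissors, a strategy’s exploitability is the loss it concedes to an opponent who can see the strategy and play the best pure counter.

```python
# Toy illustration of exploitability: in rock-paper-scissors, a
# best-response adversary sees our mixed strategy and plays the pure
# counter that hurts us most.
# Row = our action, column = opponent's action; entries are our payoff.
PAYOFF = [
    [0, -1, 1],    # rock     vs rock, paper, scissors
    [1, 0, -1],    # paper
    [-1, 1, 0],    # scissors
]

def exploitability(strategy):
    """Average loss per game against a best-response adversary."""
    # Our expected payoff against each pure opponent reply.
    evs = [sum(p * PAYOFF[a][o] for a, p in enumerate(strategy))
           for o in range(3)]
    # The adversary picks the reply that minimizes our payoff; the
    # game's equilibrium value is 0, so our loss is simply -min.
    return -min(evs)

print(exploitability([1/3, 1/3, 1/3]))   # equilibrium: loses nothing
print(exploitability([0.5, 0.3, 0.2]))   # rock-heavy: paper punishes it
```

The uniform strategy scores zero; the rock-heavy one leaks 0.3 units per game to an adversary who simply plays paper every time.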

Consider a strategy that overfolds just a bit in a given river spot. Not disastrously. Just a smidgen more than equilibrium implies. A human might miss it. A weaker bot probably would too. But a high-end AI, tuned for adversarial edge, will attack. That fold frequency is a point of entry: a scratch that, with enough prodding, becomes a gash.

Measuring the Leak

You might think that such deficiencies would be easy to spot. But unlike chess or Go, poker is a game of shadows. The best-response opponent is a ghost: theoretical, omniscient, patient. In practice, computing exploitability almost always means sampling that ghost, whether through local best response (LBR) rollouts or deep Monte Carlo approximations. Researchers speak in three and four decimal places, in units like milli-big-blinds per game (mbb/g), and yes, they believe every tenth counts. A bot with 1 mbb/g of exploitability is state of the art. One with 300 is a leaky faucet.
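The unit itself is just a rescaling. A small worked example with invented numbers (not from any published evaluation) makes the faucet metaphor concrete:

```python
def mbb_per_game(big_blinds_lost, hands):
    # Milli-big-blinds per game: thousandths of a big blind lost per hand.
    return 1000 * big_blinds_lost / hands

# At 1 mbb/g, a perfect adversary wins one big blind per 1,000 hands.
# At 300 mbb/g, it drains 30 big blinds every 100 hands.
print(mbb_per_game(100, 100_000))   # → 1.0
print(mbb_per_game(300, 1_000))     # → 300.0
```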

Even in 2025, no publicly known bot plays anything close to truly unexploitable poker at scale in six-max No-Limit games. Heads-up? We’re close. But the number of decision points, the combinatorial explosion, is staggering. So programmers model, generalize, solve, re-solve, and always, always watch.

Where Leaks Begin

Exploitability creeps in quietly. Often it’s the price of a shortcut: hand bucketing that pools subtly different holdings, or betting abstractions that round a continuum of sizes into a handful of convenient options. Sometimes it’s a function-approximation bias: a neural network trained on millions of simulated examples to predict EV breaks down on an edge case it has never seen. And sometimes it’s an engineering decision made under duress: a random number generator too predictable, a timing pattern too consistent, a subgame solved with assumptions that no longer hold.
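To see how bucketing loses information, consider a deliberately crude sketch (the equity values and bucket count are invented for illustration): two holdings with different blocker properties collapse into the same bucket, so any strategy solved over buckets is forced to play them identically.

```python
def equity_bucket(equity, n_buckets=10):
    """Map a hand's equity in [0, 1] to one of n_buckets coarse bins."""
    return min(int(equity * n_buckets), n_buckets - 1)

# Hypothetical holdings: similar raw equity, different blocker effects.
print(equity_bucket(0.62))  # → 6
print(equity_bucket(0.68))  # → 6  (pooled, despite the difference)
print(equity_bucket(0.72))  # → 7
```

Everything inside a bin is treated as interchangeable; an adversary who can tell the pooled hands apart exploits exactly that blind spot.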

“The fascinating thing about these problems,” he told me, “isn’t just that they happen but that they compound. One predictable river raise size won’t hurt much. But what if it’s predictable and recurs on every standard board? The bot becomes readable. Exploitable.”

The Countermeasures

What, then, is the antidote? There isn’t one. Not exactly. But there is a mosaic of techniques, each chipping away at the risk.

  • CFR and its kin: Counterfactual Regret Minimization and its variants (CFR+, Discounted CFR, Deep CFR) are the workhorses. They learn by iteration, playing against themselves until regrets shrink toward zero. But even they need millions, sometimes billions, of iterations to approach minimal exploitability.

  • Safe subgame solving: This is where bots like Libratus and DeepStack shone. They didn’t trust their blueprints blindly. At each subgame, they recalculated, refined, and bounded their risk. “Never re-solve to a strategy more exploitable than your base”: a mantra of safe AI poker.

  • Randomization discipline: Even this is non-trivial. If your PRNG isn’t cryptographically sound, or if action timings are too rhythmic, an observant opponent can reverse-engineer your logic. The best bots jitter, both in strategy and in tempo.

  • Testing under fire: Continuous LBR testing, adversarial self-play, injection of off-tree bets—these are part of a rigorous fitness regimen. Bots improve not in isolation, but through stress.
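The core update inside CFR is regret matching. Below is a minimal self-play sketch for a single decision point, using rock-paper-scissors rather than poker; all names are illustrative, and real CFR additionally recurses over the game tree with counterfactual weights.

```python
import random

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # RPS, row player's payoff

def strategy_from_regrets(regrets):
    # Regret matching: play each action in proportion to positive regret.
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    return [p / total for p in positives] if total > 0 else [1/3] * 3

def train(iterations=20_000, seed=0):
    rng = random.Random(seed)
    # Tiny random perturbation so self-play doesn't start frozen at the
    # symmetric fixed point.
    regrets = [[rng.random() * 1e-3 for _ in range(3)] for _ in range(2)]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for p in range(2):
            for a in range(3):
                strategy_sums[p][a] += strategies[p][a]
        for p in range(2):
            opponent = strategies[1 - p]
            # EV of each pure action against the opponent's current mix
            # (RPS is symmetric, so both players can share PAYOFF).
            action_evs = [sum(op * PAYOFF[a][o]
                              for o, op in enumerate(opponent))
                          for a in range(3)]
            node_ev = sum(s * ev
                          for s, ev in zip(strategies[p], action_evs))
            # Regret: how much better each pure action would have done.
            for a in range(3):
                regrets[p][a] += action_evs[a] - node_ev
    # The *average* strategy, not the last one, converges to equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]

avg = train()
print(avg[0])  # ≈ [1/3, 1/3, 1/3]
```

The current strategies may cycle indefinitely; it is the time-averaged strategy whose exploitability shrinks, which is why CFR implementations track and report the average.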

A Tension, Never Resolved

GTO is the dream; exploitation is the seduction. Pure GTO play is immune but indifferent: against weak opponents, it leaves money on the table. Exploitative play eats fish but bleeds against sharks. Most sophisticated bots combine the two: a low-exploitability core with opportunistic overlays, kept under constant watch for the leaks those overlays introduce.
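The blend itself can be as simple as a convex combination, and because exploitability is a maximum of linear functions of the strategy (hence convex), the mixture’s worst-case loss is bounded by the mixing weight. A sketch in toy rock-paper-scissors (illustrative, not any real bot’s architecture; the matrix holds our payoff per pure matchup):

```python
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # RPS, our payoff

def exploitability(strategy):
    # Loss against an adversary playing the best pure counter.
    evs = [sum(p * PAYOFF[a][o] for a, p in enumerate(strategy))
           for o in range(3)]
    return -min(evs)

def blend(core, overlay, alpha):
    """Mix a safe core with an exploitative overlay; alpha in [0, 1]."""
    return [(1 - alpha) * c + alpha * o for c, o in zip(core, overlay)]

core = [1/3, 1/3, 1/3]     # unexploitable baseline
overlay = [1.0, 0.0, 0.0]  # "always rock": crushes a scissors-happy fish
for alpha in (0.0, 0.2, 0.5):
    print(alpha, exploitability(blend(core, overlay, alpha)))
```

In this toy case the mixture’s exploitability equals alpha exactly: every point of weight shifted onto the overlay is a point a shark can take back.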

And therein lies the tension. Every exploitative deviation entails a risk. Every abstraction is a simplification of a universe. And poker, unlike games of perfect information, never affords perfect feedback, only noisy, delayed signals.

So we wonder, again and again: How exploitable is this strategy? What are the chances this leak is discovered, and by whom? Can we afford the deviation? Should this hand class be grouped with that one? And behind these questions, always, a deeper one: How close are we, really, to solving the game?

Not close enough, perhaps. But closer than yesterday.