Personal Research Notes on Mathematics | Pre-flop Player Model – continued

Table of Contents

Overview
The Reality of Mixed Strategies
Position and Pre-Flop Strategy
Stacks and Pre-flop All-in Strategies
Rake and Ante
Summary

Overview

Continuing from the previous article, we keep working on the preflop behavioral model. This article extends the earlier model with the following components:

Mixed strategy (changing betting strategy with a certain probability even with the same hand)
Adjusting VPIP and bet size for each position
Changing strategy based on stack
Bluffs and All-ins
Ante and Rake

The Reality of Mixed Strategies

Your brain is not capable of randomization.

Andrew Brokos, “Play Optimal Poker”

In games of incomplete information such as Texas Hold’em, the optimal strategy sometimes requires choosing between different actions with fixed probabilities even in an identical situation (a mixed strategy). Rock-paper-scissors is a good example. If you choose the same action every time, your opponent will work out what you are going to throw and you will lose more often. The optimal strategy is to choose rock, paper, or scissors at random, each with probability 1/3. But if you were told to do exactly that, could you? In practice you would probably favor one throw over the others, or your sequence of throws would show serial correlation. Typically, people become more likely to pick a throw that differs from the previous one. In an attempt to bring the frequencies closer to 1/3, players often seem to fall into a cyclical pattern such as rock, paper, scissors, rock, paper, scissors, and so on.
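The serial correlation described here can be measured directly. The following is only a minimal sketch: the function names are my own, and the "cyclic player" is a caricature of the human habit described above. A truly uniform player repeats the previous throw about one time in three, while a player who cycles never does.

```python
import random

MOVES = ("rock", "paper", "scissors")


def repeat_rate(sequence):
    """Fraction of throws that are identical to the throw just before them."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def uniform_player(n, rng):
    """n independent throws, each chosen with probability 1/3."""
    return [rng.choice(MOVES) for _ in range(n)]


def cyclic_player(n):
    """n throws in the fixed order rock, paper, scissors, rock, ..."""
    return [MOVES[i % 3] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print("uniform:", repeat_rate(uniform_player(10_000, rng)))
    print("cyclic: ", repeat_rate(cyclic_player(10_000)))
```

A repeat rate well below 1/3 is exactly the kind of pattern an observant opponent could exploit, even though the long-run frequencies of the three throws look balanced.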

It is clear that generating uniform random numbers and choosing actions at random, as a computer does, is a difficult way for humans to make decisions. Because humans value efficiency in their work, they tend to stylize their actions and fall into fixed patterns. In online poker, some players apparently use computer-generated random numbers (a randomizer) to carry out a mixed strategy. In addition, humans are animals whose bodies inevitably give them away when they lie. In poker this is called a tell: behavior often seems to differ between bluffing or slow playing to trap an opponent and simply making a value bet. Given all this, implementing a mixed strategy effectively is probably quite difficult and requires training.

The purpose of a mixed strategy is to prevent your opponent from reading your strategy, but in a real poker game there are not many situations where you play while remembering your opponent’s strategy. Unless you are playing against celebrities or regulars who visit the same casino every day, it is hard to form a picture of an opponent’s strategy. After only a few dozen hands, it is difficult to tell whether an opponent’s actions reflect a strategy or whether they simply happened to be dealt hands that led to that behavior. Players who play online, or at casinos while traveling abroad, generally do not know their opponents’ strategies and will most likely have left the table by the time those strategies become clear. Given this, it is unclear how much value a mixed strategy really has.

Conversely, most human brains do not seem to be equipped to repeat the same behavior in the same situation either. People engaged in simple labor endure repeating the same tasks probably because they are forced to, and are managed in a situation where they cannot make a living otherwise. Such workplaces are presumably often environments where power harassment and violent oppression occur to some extent, and today it is common for workers to be monitored constantly by surveillance cameras. Furthermore, the income earned there is almost always set at the bare minimum required for survival. Unless forced into such a situation, humans seem to be animals that find it difficult to behave the same way in the same situation. It is therefore not a reasonable assumption that people will follow such rules in poker or gambling, games that are played somewhat apart from everyday life.

As described above, in reality almost no one can deliberately employ a mixed strategy, and conversely humans cannot employ a 100% pure strategy unless some extremely powerful force compels them to. In many situations a mixed strategy has no chance to be effective in the first place, and even without thinking about it you end up playing a mixed strategy in the sense that you would not necessarily take the same action if dealt the same hand again.

Neither a GTO-style mixed strategy nor a 100% pure strategy resembles human decision making, so it may be better to tune the model so that hands on the border between 2-betting and 3-betting are played one way or the other at random with some probability. How blurry that boundary is reflects each player’s individuality. Typically, players with tight ranges tend to have well-defined boundaries, while loose players tend to be more arbitrary with the same hand, but a player can also be tight but ad hoc, or loose but controlled.

Algorithmic: trained to follow rules; always acts according to the strategy prescribed for the cards in hand.
Serious: generally follows the strategy, but takes a different action from the prescribed one with about 10% of hands.
Standard: deviates from the prescribed action with about 20% of hands.
Sloppy: deviates from the prescribed action with about 30% of hands.
Ad hoc: deviates from the prescribed action with about 50% of hands.

Specifically, the model can be set up to deviate in the following ways:

With a 3bet/3bet hand, they call an open raise instead of 3-betting with the specified probability.
With a 2bet/3bet hand, they 3-bet with half the specified probability. (“Half” is a simplification; the idea is to scale by the ratio of hand combinations between 2bet/3bet and 3bet/3bet.)
With a 2bet/3bet or 3bet/3bet hand, when facing a 3-bet, they fold with the specified probability.
With a 2bet/2bet hand, when facing a 3-bet, they call with the specified probability. (Likewise, scaled by the ratio of hand combinations between 2bet/2bet and 2bet/3bet plus 3bet/3bet.)
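The five player types and four rules above could be combined roughly as follows. This is only a sketch under several assumptions of mine: the default action for each hand class is inferred from the rules rather than stated in them, "continue" is a placeholder for whatever the player normally does when facing a 3-bet, and COMBO_RATIO stands in for the hand-combination ratio, which would have to be computed from the actual ranges.

```python
import random

# Share of hands played off-strategy, taken from the five player types above.
DEVIATION_RATE = {
    "algorithmic": 0.0,
    "serious": 0.1,
    "standard": 0.2,
    "sloppy": 0.3,
    "ad_hoc": 0.5,
}

# Placeholder for the hand-combination ratio in rules 2 and 4 (assumed value).
COMBO_RATIO = 0.5

# (hand class, situation) -> (default action, deviating action, scale on the rate).
# The default actions and the "continue" label are assumptions, not from the text.
RULES = {
    ("3bet/3bet", "facing_open"): ("3bet", "call", 1.0),
    ("2bet/3bet", "facing_open"): ("call", "3bet", COMBO_RATIO),
    ("3bet/3bet", "facing_3bet"): ("continue", "fold", 1.0),
    ("2bet/3bet", "facing_3bet"): ("continue", "fold", 1.0),
    ("2bet/2bet", "facing_3bet"): ("fold", "call", COMBO_RATIO),
}


def choose_action(hand_class, situation, style, rng):
    """Return the default action, or the deviation with the style's probability."""
    default, deviation, scale = RULES[(hand_class, situation)]
    probability = DEVIATION_RATE[style] * scale
    return deviation if rng.random() < probability else default


if __name__ == "__main__":
    rng = random.Random(42)
    for style in DEVIATION_RATE:
        print(style, choose_action("3bet/3bet", "facing_open", style, rng))
```

Passing the random generator in as an argument keeps the simulation reproducible, which should make it easier to compare player types over the same sequence of deals.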

Any of these could be called a misplay, but specifying their frequencies is what represents the mixed strategy. Note that algorithmic players are not necessarily strong, and ad hoc players are not necessarily weak. Frequent 3-betting with mediocre hands often serves as a bluff. In fact, it is difficult to distinguish a mixed strategy from a misplay, and a mixed strategy could even be described as an intentional misplay. Bluffing and slow playing are misplays in the sense that they misrepresent the strength of your hand, but in a game of imperfect information, being right too often is not correct and being wrong too often is not correct either, so you have to choose an appropriate frequency. The above algorithm may not be a GTO-like mixed strategy, but it may be a human-like one.

By the way, when I asked ChatGPT whether there is really any difference between a mixed strategy and a misplay, it replied, “No, that’s not true. There are the following differences.”

	Mixed Strategy	Misplay
Definition	Intentionally dividing actions probabilistically	Unable to execute the original strategy, deviating due to mistakes and emotions
Purpose	Optimization techniques to prevent your opponent from reading your hand	Unintentional errors and emotional decisions
Control	Controlled	Not being in control (anxiety, uncertainty, inertia, etc.)
Strategic Coherence	Mathematically and theoretically designed	Inconsistent (random but exploitable)

From the outside, it is sometimes hard to tell whether someone is using a mixed strategy or tilting… You assume they are tilting because they have been losing recently and making a lot of aggressive bets, and then it turns out they simply had strong cards. So in the end it is hard to tell the difference, isn’t it? That is my honest opinion. Well, it may depend on your ability to read people…

Position and Pre-Flop Strategy

It is commonly said in poker books that you should change your preflop entry range depending on your position. Because decisions are made sequentially, the player who acts later generally has the advantage. What is confusing is that the preflop order is UTG, MP, CO, BTN, SB, BB, whereas after the flop it is SB, BB, UTG, MP, CO, BTN. Preflop, the SB and BB act last and are in an advantageous position, but after the flop they act first and are at a disadvantage. Overall, the UTG and SB are in disadvantageous positions, and the CO and BTN are in advantageous ones.

With six players, an appropriate VPIP is said to fall roughly in the ranges below. (These assume an average VPIP of around 25%; a player whose overall VPIP is 20% would use about 80% of each value in the table.)

Position	VPIP (Open Raise)	VPIP (3Bet)
UTG	15-18%	—
MP	18-22%	2-3% (AK, QQ+)
CO	22-26%	3% (AK, AQs, JJ+)
BTN	35-45%	5% (AQ+, TT+)
SB	30-40%	7-8% (AQ+, JJ+, Axs, KJs+)
BB	20-25%	2-3% (AK, QQ+)

Since the SB is inevitably out of position after the flop, it appears to 3-bet quite widely: being at a disadvantage postflop, the SB has a stronger incentive than other positions to end the hand preflop. For the same reason, when 3-betting from the SB or BB the multiplier is set to 4x (3x in other positions), and the blinds more often choose actions aimed at making other players fold. The BB pays no additional chips when the BTN or SB limps in, so those hands do not count towards VPIP; although the BB’s VPIP may look low, in reality the BB sees the flop in nearly half of all hands. The VPIP shown here is the percentage of hands in which the BB calls a raise or raises itself, which is 20-25%.

So far we have used a single VPIP value to specify the hand range, but in reality the VPIP, and with it the hand range, has to be specified per position. This can be expressed by multiplying the average VPIP by a factor for each position. Parameters can be defined for player types that care about position and for types that do not. For example, the following multipliers can be assigned to each player type. In each cell, the left value is the multiplier for a 2-bet and the right value is the multiplier for a 3-bet.

Position	Ignore	Under Estimation	Standard	Emphasis	Over Estimation
UTG	1, 1	0.825, —	0.65, —	0.475, —	0.3, —
MP	1, 1	0.900, 0.75	0.80, 0.50	0.700, 0.25	0.6, 0
CO	1, 1	0.975, 0.875	0.95, 0.75	0.925, 0.625	0.9, 0.5
BTN	1, 1	1.250, 1.125	1.50, 1.25	1.750, 1.375	2, 1.5
SB	1, 1	1.125, 1.375	1.25, 1.75	1.375, 2.125	1.5, 2.5
BB	1, 1	0.925, 0.875	0.85, 0.75	0.775, 0.625	0.7, 0.5
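Each row of this table moves in equal steps from 1 (Ignore) to the Over Estimation value, so the whole table can be regenerated from its last column and a single weight per player type. A minimal sketch, with names of my own choosing; the only data taken from the table is the Over Estimation column, and the UTG 3-bet factor is returned as None for every type because the table gives no value for it outside the Ignore column.

```python
# Over Estimation column of the table: position -> (2-bet factor, 3-bet factor).
OVER_ESTIMATION = {
    "UTG": (0.3, None),
    "MP": (0.6, 0.0),
    "CO": (0.9, 0.5),
    "BTN": (2.0, 1.5),
    "SB": (1.5, 2.5),
    "BB": (0.7, 0.5),
}

# How strongly each player type weights position (0 = ignores it entirely).
AWARENESS = {
    "ignore": 0.0,
    "under": 0.25,
    "standard": 0.5,
    "emphasis": 0.75,
    "over": 1.0,
}


def _blend(weight, extreme):
    """Linear interpolation between 1.0 (no adjustment) and the extreme factor."""
    return 1.0 + weight * (extreme - 1.0)


def position_factors(position, awareness):
    """Return (2-bet factor, 3-bet factor) for a position and player type."""
    weight = AWARENESS[awareness]
    two_bet, three_bet = OVER_ESTIMATION[position]
    return (
        _blend(weight, two_bet),
        None if three_bet is None else _blend(weight, three_bet),
    )


def positional_vpip(average_vpip, position, awareness):
    """Scale a player's average VPIP (in percent) by the 2-bet position factor."""
    two_bet, _ = position_factors(position, awareness)
    return average_vpip * two_bet


if __name__ == "__main__":
    for pos in OVER_ESTIMATION:
        print(pos, positional_vpip(25.0, pos, "standard"))
```

With an average VPIP of 25% and the Standard column, every position lands inside the open-raise range given in the earlier VPIP table (for example 16.25% for UTG and 37.5% for BTN), which suggests the two tables are at least consistent with each other.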

In addition, ideally the range should vary not only when you raise yourself but also when you decide whether to call or fold against a raise. The ratio of active betting to passive calling could also be varied; there may be players who bet in a standard, position-aware manner but ignore position when calling. As for other sets of multipliers, it may be worth preparing ones that reflect a player’s preferences, such as a player who likes UTG and has a high VPIP there, or one who likes the SB and enters often from it. I will work this out when I actually implement it.

Stacks and Pre-flop All-in Strategies

In the article on how to win at ultra-deep-stack Texas Hold’em, we noted that the deeper the stacks, the weaker your preflop action should be. Conversely, the shallower the stacks, the stronger it should be. Generally, once your stack falls below 10bb, you are left with two choices: fold or go all-in. With a 20bb stack, a 3-bet commits about half your chips, so a raise over a 2-bet will often simply be an all-in. To reflect this in the model, a raise could be converted to an all-in whenever it would exceed a certain percentage of the stack (for example, 30-40%).
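The stack rule in this paragraph is simple enough to sketch directly. The function name and the default threshold of 35% (the middle of the 30-40% range) are my own choices, and all amounts are in big blinds.

```python
def resolve_preflop_raise(stack_bb, raise_to_bb, jam_threshold=0.35, push_fold_bb=10.0):
    """Return the amount actually committed when the player decides to raise.

    Below push_fold_bb the only raise is all-in; otherwise a raise that would
    exceed jam_threshold of the stack is converted to an all-in.
    """
    if stack_bb < push_fold_bb:
        return stack_bb  # short stack: fold or all-in only
    if raise_to_bb > stack_bb * jam_threshold:
        return stack_bb  # the raise commits too much of the stack, so jam
    return raise_to_bb


if __name__ == "__main__":
    for stack in (8.0, 20.0, 100.0):
        print(stack, resolve_preflop_raise(stack, raise_to_bb=9.0))
```

With a 20bb stack, a 3-bet to 9bb is 45% of the stack and becomes an all-in, while with 100bb the same raise is left unchanged.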

With a stack of 30bb or more, it may not make sense to go all-in preflop with anything other than AA or KK, but in the real world it still happens. These cases can be divided into the following patterns:

1. You want a call from a player with a strong hand (you hold AA, KK, QQ, AK, etc.)
2. You want a call from only one player with a strong hand (you hold a pocket pair below JJ)
3. You want all your opponents to fold
4. You are gambling because you recently lost a lot of chips (or won a lot of chips)

Pattern 1 can be implemented by going all-in with a certain probability when holding a 3bet/3bet hand. For a solid player this probability can be set low, and for a gambler-like player it can be set high.

Pattern 2 is implemented so that players with a strong preference for pocket pairs go all-in with a certain probability. Low pocket pairs are weak when called by multiple players, so the probability should be adjusted to be higher when few players remain who have not folded.

Pattern 3 could be set to go all-in with a certain probability when the player is in the BTN, SB, or BB and three or more players have entered for a 2-bet or less, or when the pot has reached a certain size.

As for pattern 4, each player could have something like a tilt parameter, which accumulates when they lose chips or win a large amount; once it exceeds a certain value, they start gambling and go all-in repeatedly for several hands (maybe three?). It may be more realistic to reset the tilt parameter after they have gone all-in a few times.
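The tilt parameter suggested for pattern 4 might look something like this. It is only a sketch: the class name, the threshold of 50bb, and the rule that a win counts only above big_win_bb are assumptions of mine, and the "three hands" default comes from the tentative figure in the text.

```python
class TiltTracker:
    """Accumulates tilt from losses and large wins; triggers a short gambling spree."""

    def __init__(self, threshold=50.0, gamble_hands=3, big_win_bb=40.0):
        self.threshold = threshold        # tilt level that triggers gambling (assumed)
        self.gamble_hands = gamble_hands  # number of all-ins before the reset
        self.big_win_bb = big_win_bb      # wins below this do not add tilt (assumed)
        self.level = 0.0
        self.remaining = 0

    def record_result(self, net_bb):
        """Update tilt with the net result of a hand, in big blinds."""
        if net_bb < 0 or net_bb >= self.big_win_bb:
            self.level += abs(net_bb)
        if self.level > self.threshold and self.remaining == 0:
            self.remaining = self.gamble_hands

    def wants_to_gamble(self):
        """True while the player is in all-in-regardless-of-hand mode."""
        return self.remaining > 0

    def record_all_in(self):
        """Count one tilt all-in; reset the tilt level after the last one."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.level = 0.0


if __name__ == "__main__":
    tilt = TiltTracker()
    for result in (-30.0, -30.0):
        tilt.record_result(result)
    print(tilt.wants_to_gamble())
```

How quickly tilt should decay when nothing dramatic happens is left open here; a per-hand decay factor would be a natural extension.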

In summary, the probability of going all-in can be calculated by adding up the following factors, and if the random draw comes up all-in, the player goes all-in regardless of their normal strategy.

Probability of going all-in based on remaining stack (proportion of whether to call or go all-in)
All-in probability per hand (or betting strategy)
All-in probability adjustment based on number of players not folding (may or may not affect your hand)
All-in probability adjustment term (steal parameter) depending on the amount of chips already bet
All-in probability adjustment by tilt parameter (parameter to go all-in regardless of hand)

The size of each adjustment factor should be set according to the player’s characteristics (whether they like to go all-in with strong hands, steal frequently, or tend to tilt).
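Adding up the five factors could be sketched as below. The field names are mine, the numbers in the usage example are purely illustrative, and clamping the sum to the range 0 to 1 is an assumption I added so that the result is always a valid probability.

```python
import random
from dataclasses import dataclass


@dataclass
class AllInFactors:
    """The five additive terms listed above; adjustment terms may be negative."""
    stack: float = 0.0      # from the remaining stack
    hand: float = 0.0       # from the hand (or betting strategy)
    opponents: float = 0.0  # from the number of players who have not folded
    steal: float = 0.0      # from the chips already in the pot
    tilt: float = 0.0       # from the tilt parameter

    def probability(self):
        """Sum of all terms, clamped to a valid probability (clamping is assumed)."""
        total = self.stack + self.hand + self.opponents + self.steal + self.tilt
        return min(1.0, max(0.0, total))


def goes_all_in(factors, rng):
    """Draw once; True means all-in regardless of the normal strategy."""
    return rng.random() < factors.probability()


if __name__ == "__main__":
    factors = AllInFactors(stack=0.05, hand=0.10, opponents=-0.03, steal=0.02)
    print(factors.probability(), goes_all_in(factors, random.Random(1)))
```

Keeping each term as a separate field should make it straightforward to give every player their own set of sizes, as described above.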

Rake and Ante

In a previous post, I mentioned that when there is an ante, VPIP should be raised by about 4% per 0.1bb of ante. Rake calls for a similar adjustment in the opposite direction: the heavier the rake, the more VPIP should be reduced.

Rake	VPIP
Rake 5%, no cap	Reduce by 3-5%
Rake 2.5% or capped (1bb, 2bb, etc.)	Reduce by 1-2%
No rake	No change

The ante and rake rules are unlikely to change within the same game, so you can either store the already adjusted VPIP in the player’s data from the beginning, or store the player’s data without ante or rake and adjust the parameters afterwards.
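The second option, storing the unadjusted VPIP and correcting it afterwards, could be sketched like this. Two things are assumptions of mine: the percentages are treated as percentage points added to or subtracted from VPIP, and each rake reduction uses the midpoint of the range in the table (4 for 3-5%, 1.5 for 1-2%).

```python
# Rake category -> reduction in VPIP percentage points (midpoints, assumed).
RAKE_REDUCTION = {
    "none": 0.0,
    "low_or_capped": 1.5,  # rake 2.5% or capped: reduce by 1-2%
    "uncapped_5pct": 4.0,  # rake 5%, no cap: reduce by 3-5%
}

ANTE_BONUS_PER_TENTH_BB = 4.0  # about +4% VPIP per 0.1bb of ante


def adjust_vpip(base_vpip, ante_bb=0.0, rake="none"):
    """Return VPIP (in percent) adjusted for ante size and rake structure."""
    ante_bonus = ANTE_BONUS_PER_TENTH_BB * (ante_bb / 0.1)
    adjusted = base_vpip + ante_bonus - RAKE_REDUCTION[rake]
    return min(100.0, max(0.0, adjusted))


if __name__ == "__main__":
    print(adjust_vpip(25.0, ante_bb=0.1))
    print(adjust_vpip(25.0, rake="uncapped_5pct"))
```

Because the adjustment is applied once per game rather than per hand, it could equally be run when the table is created and the result cached in the player data.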

Summary

For now, I think we have identified the points that need to be taken into account in a preflop player model. All that remains is to actually build it and try it out, which I will do as soon as possible. That said, I feel I have finally understood Counterfactual Regret Minimization, so I will write about that starting with the next article.