Acing Blackjack
- A Monte Carlo approach
Blackjack 101
• Blackjack (also known as 21) is the most
widely played casino game in the world.
• It is a comparing card game played between
player(s) and the dealer.
– Players compete against the dealer but not against
each other.
• The objective of the game is to outscore the
dealer (via sum of dealt cards) without
exceeding 21.
Blackjack gameplay
• The player is dealt a two-card hand
– Face cards (Kings, Queens, and Jacks) are counted as 10 points.
– Aces may be counted as 1 or 11 points.
– All other cards are counted as the numeric value shown on the card.
• The dealer then deals two cards from the deck for himself and shows only
the first card (face-up) to the player. The second card is kept face-down.
• After looking at the dealer’s face-up card, the player may decide to "hit",
i.e., receive additional cards one at a time until
– the card sum exceeds 21 (the player is “busted”), or
– the player decides to end his turn (the player “sticks”).
• Once the player sticks, it becomes the dealer’s turn.
– The dealer plays with a fixed strategy without choice: he continues to
hit until his cards total 17 or more points.
• Finally, the player’s and dealer’s card sums are computed.
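The card-value rules above translate directly into a small hand-evaluation routine. Below is a minimal Python sketch (the function name `hand_value` and the representation of a hand as a list of card values are illustrative assumptions, not part of the original slides):

```python
def hand_value(cards):
    """Return (total, usable_ace) for a hand.

    `cards` is a list of card values, with face cards already mapped
    to 10 and aces given as 1.
    """
    total = sum(cards)
    # An ace is "usable" if counting one ace as 11 keeps the total at 21 or below.
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


# Example: Ace + 7 is a soft 18; Ace + 7 + 9 is a hard 17.
print(hand_value([1, 7]))     # -> (18, True)
print(hand_value([1, 7, 9]))  # -> (17, False)
```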
Blackjack: ways to win
• Three ways to beat the dealer:
– Get 21 points on the first two cards (called a
“natural”), without the dealer also having a natural.
The game ends immediately in this case.
– Reach a final score higher than the dealer’s without
exceeding 21 (exceeding 21 means the player is busted).
– Let the dealer draw additional cards until his hand
exceeds 21 (dealer busted).
• If the dealer and player hand sums are the same, it’s
a tie!
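As a rough sketch, the end-of-game comparison implied by these rules can be written as follows (the function name and the +1/0/−1 encoding of the outcome are assumptions for illustration; naturals and payouts are ignored here):

```python
def outcome(player_sum, dealer_sum):
    """Return +1 if the player wins, -1 if the player loses, 0 for a tie.

    Assumes the player has already finished drawing; a player bust loses
    immediately, regardless of what the dealer does afterwards.
    """
    if player_sum > 21:
        return -1   # player busted
    if dealer_sum > 21:
        return +1   # dealer busted
    if player_sum > dealer_sum:
        return +1   # higher final score without exceeding 21
    if player_sum < dealer_sum:
        return -1
    return 0        # equal sums: a tie
```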
Blackjack 201
• If the player holds an ace that can be counted as 11 without going
bust, the ace is said to be usable and the hand is called “soft”.
Otherwise, the hand is called hard. The term soft means the hand has
two possible totals.
– A usable ace is always counted as 11, because counting it as 1
would make the sum 11 or less, and in that case there is no
decision to be made: the player should obviously always hit.
• We only consider “Hit” and “Stick” here. Other options include
– “Double Down” (double wager, take a single card and finish).
– “Split” (if the two cards have the same value, separate them to
make two hands).
– “Surrender” (give up half the bet and retire from the game).
• Most blackjack tables have a payout of 3:2.
Blackjack sample gameplay
[Sample game: the dealer’s hand goes from 12 (no usable ace) to 13 (no usable ace) and finally to a sum of 22; the player sticks at a sum of 20 with a usable ace.]
Dealer busted => Player wins!
The Problem
• What is the optimal winning strategy (policy)?
i.e., When do you hit? When do you stick?
• A Markov decision process (MDP) provides a mathematical
framework for this environment:
– The player and dealer cards constitute the current state.
– The possible actions from any state are to hit or stick.
– Transition probabilities represent the probabilities of
moving to the next state.
– At the end of the game, the player receives a reward (win/loss).
• Finding the optimal state-value or action-value function
gives the solution to this problem.
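A minimal sketch of how this MDP could be encoded in Python follows; the `State` tuple, the action constants, and the field names are illustrative assumptions rather than anything prescribed by the slides:

```python
from typing import NamedTuple

class State(NamedTuple):
    player_sum: int    # current sum of the player's cards
    dealer_card: int   # dealer's face-up card (1 = ace, ..., 10)
    usable_ace: bool   # does the player hold an ace counted as 11?

HIT, STICK = 0, 1
ACTIONS = (HIT, STICK)

# Example state: the player holds a soft 18 and the dealer shows a 7.
s = State(player_sum=18, dealer_card=7, usable_ace=True)
```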
Dynamic Programming (DP)
• DP requires a complete model of the environment as a Markov
decision process (states, possible actions, transition
probabilities, and expected immediate rewards).
• The optimal state-value function satisfies the Bellman optimality equation
$v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_*(s') \,\bigr]$
• DP methods require the distribution of next events, which is not
easy to determine in the case of blackjack.
– For example, suppose the player's sum is 18 and he chooses to
stick. What are his expected reward and transition probabilities?
These computations are often complex and error-prone.
Monte Carlo (MC)
• Unlike DP methods, MC methods do not assume
complete knowledge of the environment.
• MC methods require only experience: sample sequences of states,
actions, and rewards from online or simulated interaction with an
environment.
– By trading off exploration (of unknown territory) against
exploitation (of known territory), MC methods can achieve
optimal performance.
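For blackjack, one such sample sequence is simply the list of states visited, actions taken, and the single terminal reward. A hedged illustration (the tuple layout is an assumption):

```python
# One hypothetical episode as (state, action, reward) triples, where a
# state is (player_sum, dealer_showing, usable_ace).  All rewards are
# zero until the game ends.
episode = [
    ((13, 10, False), "hit",   0),   # player hits on 13 against a dealer 10
    ((19, 10, False), "stick", +1),  # sticks on 19 and happens to win
]
```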
DP versus MC
• DP: the value of a state depends on the average reward, the transition
probabilities, and the values of other states.
• MC: the value of a state depends only on the final return.
Advantages of MC over DP
• MC methods can be used to learn optimal behavior directly from
interaction with the environment, with no model of the
environment's dynamics.
• Monte Carlo methods are particularly attractive when one
requires the values of only a subset of states.
– One can generate many sample episodes starting from these states alone,
averaging returns from these states and ignoring all others.
• MC methods do not “bootstrap”: The estimate for one state
does not build upon the estimate of any other state, as is the
case in DP (or temporal difference learning methods).
• Not very sensitive to initial values; easy to understand and use.
• MC methods work in non-Markovian environments as well.
Solving Blackjack
• Playing Blackjack is naturally formulated as an episodic finite MDP. Each game of blackjack is
an “episode”.
– We assume experience is divided into episodes, and that all episodes eventually
terminate no matter what actions are selected.
STATES & ACTIONS
• The states depend on the player's cards and the dealer's showing card.
• Basically, the player makes decisions on the basis of three variables: his current sum (12-21),
the dealer's showing card (ace-10), and whether or not he holds a usable ace. This makes
for a total of 10 × 10 × 2 = 200 states.
• Possible actions from any state are hit and stick.
RETURNS
• Returns of +$1.5, −$1, and $0 per dollar bet are given for winning, losing, and drawing,
respectively, and only at the end of the episode. Note: these are returns, not the immediate expected rewards.
• All rewards within a game are zero, and we do not discount (γ = 1); therefore these terminal
rewards are also the returns.
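A minimal simulator for one such episode might look like the sketch below. It assumes an infinite deck (cards drawn with replacement), auto-hits on sums below 12 (where no decision is needed), and uses the +1.5/−1/0 terminal returns described above; all function and variable names are illustrative.

```python
import random

def draw_card():
    """Draw from an infinite deck: 1 = ace, 2-9 pip cards, 10 covers 10/J/Q/K."""
    return min(random.randint(1, 13), 10)

def hand_value(cards):
    """Return (best total, usable_ace), counting one ace as 11 when it fits."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_episode(policy):
    """Play one game of blackjack.

    `policy` maps a state (player_sum, dealer_showing, usable_ace) to
    'hit' or 'stick'.  Returns the visited (state, action) pairs and the
    terminal reward (+1.5 win, -1 loss, 0 draw), which, with no
    discounting, is also the return for every visited pair.
    """
    player = [draw_card(), draw_card()]
    dealer = [draw_card(), draw_card()]
    dealer_showing = dealer[0]

    trajectory = []
    while True:
        player_sum, usable = hand_value(player)
        if player_sum < 12:                 # no decision below 12: always hit
            player.append(draw_card())
            continue
        state = (player_sum, dealer_showing, usable)
        action = policy(state)
        trajectory.append((state, action))
        if action == 'stick':
            break
        player.append(draw_card())
        if hand_value(player)[0] > 21:
            return trajectory, -1.0         # player busted: immediate loss

    while hand_value(dealer)[0] < 17:       # dealer's fixed strategy
        dealer.append(draw_card())

    player_sum, _ = hand_value(player)
    dealer_sum, _ = hand_value(dealer)
    if dealer_sum > 21 or player_sum > dealer_sum:
        return trajectory, 1.5              # win pays 3:2, as on this slide
    if player_sum < dealer_sum:
        return trajectory, -1.0
    return trajectory, 0.0                  # tie

# Example: a naive policy that sticks only on 20 or 21.
# trajectory, reward = play_episode(lambda s: 'stick' if s[0] >= 20 else 'hit')
```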
Monte Carlo control
• Finding the optimal policy (strategy) classically
proceeds via a generalized form of policy iteration.
– Start off with an arbitrary strategy π and evaluate its
state/action values (Q).
– Improve the strategy in a ‘greedy’ manner.
– Repeat the policy evaluation and improvement
steps until the policy and action value functions
attain optimality.
Monte Carlo with Exploring Starts
[Figure: the Monte Carlo ES (Exploring Starts) algorithm box, showing its components (action values Q, policy, stored returns) and its steps (exploration phase, policy evaluation, greedy policy improvement).]
Recall: Q(s,a) represents the “action-value” function, and is the average reward
obtained upon starting at state s, taking action a and following policy π thereafter.
Exploring starts assumes that every episode starts from a randomly selected state-action pair.
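A sketch of the Monte Carlo ES loop, assuming an episode simulator with the interface of the `play_episode` sketch shown after the "Solving Blackjack" slide; the data structures and names here are illustrative, not taken from the slides:

```python
import random
from collections import defaultdict

ACTIONS = ('hit', 'stick')

def mc_es(play_episode, n_episodes=500_000):
    """Monte Carlo control with exploring starts (sketch).

    `play_episode(policy)` is assumed to return the visited (state, action)
    pairs and the terminal reward, which (undiscounted) is the return for
    every visited pair.
    """
    q = defaultdict(float)   # Q(s, a): running average of observed returns
    n = defaultdict(int)     # visit counts for each (s, a)
    greedy = {}              # current greedy policy: state -> action

    def make_policy():
        first = [True]
        def policy(state):
            # Exploring start: the first action of every episode is random.
            # Together with the random initial deal, this gives every
            # state-action pair a nonzero chance of starting an episode.
            if first[0]:
                first[0] = False
                return random.choice(ACTIONS)
            return greedy.get(state, 'hit')
        return policy

    for _ in range(n_episodes):
        trajectory, ret = play_episode(make_policy())
        for state, action in trajectory:
            # Policy evaluation: update the running average of returns.
            n[(state, action)] += 1
            q[(state, action)] += (ret - q[(state, action)]) / n[(state, action)]
            # Greedy policy improvement at the visited state.
            greedy[state] = max(ACTIONS, key=lambda a: q[(state, a)])
    return q, greedy
```

Calling `q, pi = mc_es(play_episode)` runs the evaluate-and-improve cycle of the previous slide on simulated games.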
MC policy evaluation
[Figure: left, an arbitrary policy* π (black: ‘HIT’, white: ‘STICK’); right, the state values** under the policy π.]
Poor state values suggest that the policy is sub-optimal and could be improved.
*A policy is a mapping from a state to an action.
**State values represent the average reward (winnings) from any state.
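The evaluation step on its own can be sketched as averaging episode returns for a fixed policy (again assuming the `play_episode` interface sketched earlier; names are illustrative):

```python
from collections import defaultdict

def mc_evaluate(play_episode, policy, n_episodes=100_000):
    """Estimate V(s) for a fixed policy by averaging the returns of the
    episodes in which each state is visited."""
    value = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(n_episodes):
        trajectory, ret = play_episode(policy)
        for state, _action in trajectory:
            visits[state] += 1
            value[state] += (ret - value[state]) / visits[state]
    return value
```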
MC policy improvements
[Figure: the starting policy Π(t=0), Π after 5e5 episodes, Π after 1e6 episodes, and the optimal policy Π* (plotted after 5e6 episodes); each panel shows the STICK and HIT regions.]
The optimal value function
Average winnings (for the optimal strategy)
• −0.425 per dollar bet with the 3:2 payout.
– If you bet $100, you get back only about $57.
• Casinos typically use more than one deck to push
these odds further in the house's favor:
– With 2 decks: -0.43
– With 4 decks: -0.435
– With 8 decks: -0.44
Avoiding exploring starts
• Exploring starts can sometimes be arranged when
episodes are simulated, but they are unlikely to be
available when learning from real experience. Instead,
one of two general approaches can be used.
• In on-policy methods, the agent commits to
always exploring and tries to find the best policy
that still explores.
• In off-policy methods, the agent also explores,
but learns a deterministic optimal policy that may
be unrelated to the policy followed.
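A common on-policy choice is an ε-soft (ε-greedy) policy, which keeps exploring indefinitely; a minimal sketch (the ε value and names are assumptions):

```python
import random

def epsilon_greedy(q, state, actions=('hit', 'stick'), epsilon=0.1):
    """On-policy exploration: act greedily with respect to Q most of the
    time, but take a uniformly random action with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```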
Shortcomings of MC methods
• MC methods must wait until the end of an
episode before the return is known.
– Hence, they only work for episodic (terminating)
environments.
• High variance in the return estimates.
• Problematic when the number of states is too
large.
• Compared to temporal difference learning,
convergence will be slow.
What gives dealer the advantage?
• The house has an advantage in blackjack simply
because the player has to draw first: if the player
busts, he automatically loses, regardless of whether
the dealer would also have busted.
• Player’s advantages over the dealer:
– 3:2 winnings.
– Double down, split, surrender options.
– Dealer plays fixed strategy (stick at 17 or more).
• With the right strategy, the house advantage can be
reduced to −0.005 for the player, i.e., you lose 50 cents
for every $100 bet (still negative!).
• Positive advantage if you can count cards ;)
Advanced Blackjack Strategy