A/B testing
BAYESIAN DATA ANALYSIS IN PYTHON
Michal Oleszak
Machine Learning Engineer
A/B testing
Randomized experiment: divide users into two groups (A and B)
Expose each group to a different version of something (e.g. website layout)
Compare which group scores better on some metric (e.g. click-through rate)
Picture: adapted from https://commons.wikimedia.org/wiki/File:A-B_testing_simple_example.png
A/B testing: frequentist way
Based on hypothesis testing
Check whether A and B perform the same or not
Does not say how much better A is than B
A/B testing: Bayesian approach
Calculate posterior click-through rates for website layouts A and B and compare them
Directly calculate the probability that A is better than B
Quantify how much better it is
Estimate expected loss in case we make a wrong decision
A/B testing: Bayesian approach
When a user lands on the website, there are two scenarios:
Click (success)
No click (failure)
Use binomial distribution! (probability of success = click rate)
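As a minimal sketch of this binomial view, clicks can be simulated as independent success/failure trials. The click rate of 0.1 and the 1000 visitors below are made-up values for illustration, not figures from the course data.

```python
import numpy as np

rng = np.random.default_rng(42)

true_click_rate = 0.1   # assumed value, for illustration only
num_visitors = 1000     # assumed value, for illustration only

# Each visitor is one trial: 1 = click (success), 0 = no click (failure)
clicks = rng.binomial(n=1, p=true_click_rate, size=num_visitors)

# The total number of clicks follows Binomial(num_visitors, true_click_rate)
num_clicks = clicks.sum()
observed_click_rate = clicks.mean()
print(num_clicks, observed_click_rate)
```

The array of 0s and 1s has the same shape as the click lists used later in this section.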
Simulate beta posterior
We know that if the prior is Beta(a, b), then the posterior is Beta(x, y), with:
x = NumberOfSuccesses + a
y = NumberOfObservations − NumberOfSuccesses + b
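import numpy as np

def simulate_beta_posterior(trials, beta_prior_a, beta_prior_b):
    # Number of successes (1s) among the observed trials
    num_successes = np.sum(trials)
    # Draw 10,000 samples from the Beta posterior
    posterior_draws = np.random.beta(
        num_successes + beta_prior_a,
        len(trials) - num_successes + beta_prior_b,
        10000
    )
    return posterior_draws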
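One way to sanity-check the update rule is to compare the simulated draws with the analytic posterior mean. The sketch below repeats the function so that it runs on its own; the ten trials are made-up data.

```python
import numpy as np

def simulate_beta_posterior(trials, beta_prior_a, beta_prior_b):
    num_successes = np.sum(trials)
    return np.random.beta(
        num_successes + beta_prior_a,
        len(trials) - num_successes + beta_prior_b,
        10000,
    )

np.random.seed(0)

# Made-up data: 3 clicks in 10 visits
trials = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 1])
draws = simulate_beta_posterior(trials, 1, 1)

# With a Beta(1, 1) prior the posterior is Beta(3 + 1, 10 - 3 + 1) = Beta(4, 8),
# and the mean of Beta(x, y) is x / (x + y)
analytic_mean = 4 / (4 + 8)
print(draws.mean(), analytic_mean)
```

The two printed numbers should agree closely, since the sample mean of 10,000 draws approximates the analytic mean.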
Comparing posteriors
Lists of 1s (clicks) and 0s (no clicks):
print(A_clicks)
print(B_clicks)
[0 1 1 0 0 0 0 0 0 0 1 ... ]
[0 0 0 1 0 0 0 1 1 0 1 ... ]
Simulate posterior draws for each layout:
A_posterior = simulate_beta_posterior(A_clicks, 1, 1)
B_posterior = simulate_beta_posterior(B_clicks, 1, 1)
Plot posteriors:
import matplotlib.pyplot as plt
import seaborn as sns
sns.kdeplot(A_posterior, fill=True, label="A")
sns.kdeplot(B_posterior, fill=True, label="B")
plt.legend()
plt.show()
Comparing posteriors
Posterior difference between B and A:
diff = B_posterior - A_posterior
sns.kdeplot(diff, fill=True, label="difference: B-A")
plt.legend()
plt.show()
Probability of B being better:
(diff > 0).mean()
0.9639
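To quantify how much better one layout is, the same draws can be summarized with a credible interval for the difference. This is a sketch under assumed inputs: the click counts (50 of 1000 for A, 70 of 1000 for B) are hypothetical and only stand in for the real posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posteriors: Beta(1, 1) prior combined with made-up click counts
A_posterior = rng.beta(50 + 1, 950 + 1, 10000)
B_posterior = rng.beta(70 + 1, 930 + 1, 10000)

diff = B_posterior - A_posterior

# Probability that B has the higher click rate
prob_B_better = (diff > 0).mean()

# Central 90% credible interval for the difference in click rates
lower, upper = np.percentile(diff, [5, 95])
print(prob_B_better, lower, upper)
```

If the whole interval lies above zero, the draws support B with at least that level of posterior probability.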
Expected loss
If we deploy the worse website version, how much click rate do we lose?
# Difference (B-A) when A is better
loss = diff[diff < 0]
# Expected (average) loss
expected_loss = loss.mean()
print(expected_loss)
-0.0077850237030215215
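The number above is the average loss in the scenarios where A really is better. A related quantity, sketched here as an addition that is not part of the slides, also weights that loss by how likely those scenarios are. The posteriors are hypothetical stand-ins built from made-up click counts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posteriors (made-up counts: 50/1000 clicks for A, 70/1000 for B)
A_posterior = rng.beta(50 + 1, 950 + 1, 10000)
B_posterior = rng.beta(70 + 1, 930 + 1, 10000)
diff = B_posterior - A_posterior

# Average difference (B-A) in the draws where A is better
conditional_loss = diff[diff < 0].mean()

# Probability that A is better
prob_A_better = (diff < 0).mean()

# Loss averaged over all draws, counting zero loss when B is better:
# E[min(B - A, 0)] = P(A better) * E[B - A | A better]
overall_loss = np.minimum(diff, 0).mean()
print(conditional_loss, prob_A_better, overall_loss)
```

The overall figure is always closer to zero than the conditional one, because it accounts for the chance that deploying B is the right call.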
Ads data
print(ads)
                               user_id   product site_version                 time  banner_clicked
0     f500b9f27ac611426935de6f7a52b71f   clothes      desktop  2019-01-28 16:47:08               0
1     cb4347c030a063c63a555a354984562f  sneakers       mobile  2019-03-31 17:34:59               0
2     89cec38a654319548af585f4c1c76b51   clothes       mobile  2019-02-06 09:22:50               0
3     1d4ea406d45686bdbb49476576a1a985  sneakers       mobile  2019-05-23 08:07:07               0
4     d14b9468a1f9a405fa801a64920367fe   clothes       mobile  2019-01-28 08:16:37               0
...                                ...       ...          ...                  ...             ...
9995  7ca28ccde263a675d7ab7060e9ed0eca   clothes       mobile  2019-02-02 08:19:39               0
9996  7e2ec2631332c6c4527a1b78c7ede789   clothes       mobile  2019-04-04 03:27:05               0
9997  3b828da744e5785f1e67b5df3fda5571   clothes       mobile  2019-04-15 15:59:06               0
9998  6cce0527245bcc8519d698af2224c04a   clothes       mobile  2019-05-21 20:43:21               0
9999  8cf87a02f96327a1a8a93814f34d0d0c  sneakers       mobile  2019-03-02 21:27:57               0
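A table like this can be split into one array of 0s and 1s per variant, ready for simulate_beta_posterior. The sketch assumes that the two variants are the clothes and sneakers banners, which the slides do not state, and uses a tiny made-up DataFrame with the same column names.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-in for the `ads` DataFrame (same column names, fake rows)
ads = pd.DataFrame({
    "product": ["clothes", "sneakers", "clothes", "sneakers", "clothes", "sneakers"],
    "site_version": ["desktop", "mobile", "mobile", "mobile", "mobile", "desktop"],
    "banner_clicked": [0, 1, 0, 0, 1, 0],
})

# Assumption: the variants being compared are the two product banners
clothes_clicked = ads.loc[ads["product"] == "clothes", "banner_clicked"].to_numpy()
sneakers_clicked = ads.loc[ads["product"] == "sneakers", "banner_clicked"].to_numpy()

# Observed click rate per banner
click_rates = ads.groupby("product")["banner_clicked"].mean()
print(click_rates)
```

The same filtering works for any other split, for example by site_version.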
Let's A/B test!
Decision analysis
Decision analysis
Decision-makers care about maximizing profit, reducing costs, saving lives, etc.
Decision analysis → translating parameters to relevant metrics to inform decision-making
From posteriors to decisions
To make strategic decisions, one should know the probabilities of different scenarios.
Bayesian methods allow us to translate parameters into relevant metrics easily.
Posterior revenue
# Different revenue per click
num_impressions = 1000
rev_per_click_A = 3.6
rev_per_click_B = 3
# Compute number of clicks
num_clicks_A = A_posterior * num_impressions
num_clicks_B = B_posterior * num_impressions
# Compute posterior revenue
rev_A = num_clicks_A * rev_per_click_A
rev_B = num_clicks_B * rev_per_click_B
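Once revenue is expressed as posterior draws, decision-relevant summaries follow directly. The sketch below uses hypothetical click-rate posteriors (made-up counts of 50/1000 clicks for A and 70/1000 for B) together with the revenue-per-click values from the slide.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical click-rate posteriors, standing in for A_posterior and B_posterior
A_posterior = rng.beta(50 + 1, 950 + 1, 10000)
B_posterior = rng.beta(70 + 1, 930 + 1, 10000)

num_impressions = 1000
rev_per_click_A = 3.6
rev_per_click_B = 3

# Posterior revenue, computed draw by draw
rev_A = A_posterior * num_impressions * rev_per_click_A
rev_B = B_posterior * num_impressions * rev_per_click_B

# Expected revenue per layout and probability that B earns more than A
expected_rev_A = rev_A.mean()
expected_rev_B = rev_B.mean()
prob_B_more_revenue = (rev_B > rev_A).mean()
print(expected_rev_A, expected_rev_B, prob_B_more_revenue)
```

A higher click rate does not automatically mean more revenue, because the revenue per click differs between the two layouts.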
Forest plot
import pymc3 as pm
# Collect posterior draws in a dictionary
revenue = {"A": rev_A, "B": rev_B}
# Draw the forest plot with the default credible interval
pm.forestplot(revenue)
# Draw it again with a 99% credible interval
pm.forestplot(revenue, hdi_prob=0.99)
Let's analyze decisions!
Regression and forecasting
Linear regression
y = β₀ + β₁x₁ + β₂x₂ + ...
sales = β₀ + β₁ · marketingSpending
Frequentist inference:
sales = β₀ + β₁ · marketingSpending + ε
ε ∼ N(0, σ)
Bayesian inference:
sales ∼ N(β₀ + β₁ · marketingSpending, σ)
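The two formulations describe the same data-generating process, which a short simulation can illustrate. The parameter values and the range of spending below are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed "true" parameters, for illustration only
beta_0, beta_1, sigma = 5.0, 2.0, 1.0
marketing_spending = rng.uniform(0, 10, size=500)

# Frequentist form: a deterministic line plus normal noise
epsilon = rng.normal(0, sigma, size=500)
sales_line_plus_noise = beta_0 + beta_1 * marketing_spending + epsilon

# Bayesian form: sales drawn from a normal distribution centered on the line
sales_normal = rng.normal(beta_0 + beta_1 * marketing_spending, sigma)

# A least-squares fit recovers similar coefficients from either sample
slope_1, intercept_1 = np.polyfit(marketing_spending, sales_line_plus_noise, 1)
slope_2, intercept_2 = np.polyfit(marketing_spending, sales_normal, 1)
print(slope_1, intercept_1, slope_2, intercept_2)
```

The difference between the approaches lies in how the parameters are treated (fixed unknowns versus random variables with priors), not in the likelihood.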
Normal distribution
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Draw 10,000 samples for each (mean, standard deviation) pair
normal_0_1 = np.random.normal(0, 1, size=10000)
normal_3_1 = np.random.normal(3, 1, size=10000)
normal_0_3 = np.random.normal(0, 3, size=10000)
sns.kdeplot(normal_0_1, fill=True, label="N(0,1)")
sns.kdeplot(normal_3_1, fill=True, label="N(3,1)")
sns.kdeplot(normal_0_3, fill=True, label="N(0,3)")
plt.legend()
plt.show()
Bayesian regression model definition
sales ∼ N(β₀ + β₁ · marketingSpending, σ)
β₀ ∼ N(5, 2)
β₁ ∼ N(2, 10)
σ ∼ Unif(0, 3)
We expect $5000 in sales without any marketing (β₀ is in thousands of dollars).
We expect a $2000 increase in sales from each $1000 increase in spending.
Uniform prior for the standard deviation, as we don't know what it could be.
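One way to see what these priors imply is to draw from them and simulate sales before looking at any data. This is a sketch; the choice of one unit of spending (that is, $1000) is only an example input.

```python
import numpy as np

rng = np.random.default_rng(4)
num_draws = 10000

# Draws from the three priors in the model definition
beta_0_prior = rng.normal(5, 2, num_draws)
beta_1_prior = rng.normal(2, 10, num_draws)
sigma_prior = rng.uniform(0, 3, num_draws)

# Prior predictive sales (in thousands of dollars) for one unit of spending
spending = 1
prior_sales = rng.normal(beta_0_prior + beta_1_prior * spending, sigma_prior)
print(prior_sales.mean(), prior_sales.std())
```

The wide spread of the simulated sales reflects the large standard deviation of 10 on the slope prior.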
Estimating regression parameters
Grid approximation → impractical for many parameters
Choose conjugate priors and simulate from a known posterior → unintuitive priors
Third way: simulate from the posterior even with non-conjugate priors!
For now, assume the parameter draws are given
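The third option is usually some form of Markov chain Monte Carlo. As a rough sketch of the idea, the block below runs a random-walk Metropolis sampler for the regression model with the priors defined earlier. The slides do not say which sampler produced their draws; the data here are synthetic, and the starting values, step sizes, and iteration counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data generated from known parameters (illustration only)
x = rng.uniform(0, 5, size=100)
y = rng.normal(3.0 + 6.0 * x, 1.5)

def log_posterior(beta_0, beta_1, sigma):
    # Uniform(0, 3) prior on sigma: zero density outside the interval
    if not 0 < sigma < 3:
        return -np.inf
    # Normal priors N(5, 2) and N(2, 10), up to additive constants
    log_prior = -0.5 * ((beta_0 - 5) / 2) ** 2 - 0.5 * ((beta_1 - 2) / 10) ** 2
    # Normal likelihood, up to an additive constant
    resid = y - (beta_0 + beta_1 * x)
    log_lik = -len(y) * np.log(sigma) - 0.5 * np.sum((resid / sigma) ** 2)
    return log_prior + log_lik

current = np.array([5.0, 2.0, 1.0])   # arbitrary starting values
current_lp = log_posterior(*current)
draws = []
for _ in range(20000):
    # Propose a small random step for each parameter
    proposal = current + rng.normal(0, [0.2, 0.08, 0.1])
    proposal_lp = log_posterior(*proposal)
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
    draws.append(current)

draws = np.array(draws[5000:])   # discard the first 5000 draws as burn-in
intercept_draws, marketing_spending_draws, sd_draws = draws.T
print(draws.mean(axis=0))
```

The resulting arrays play the role of the parameter draws that the following slides take as given.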
Plot posterior
sales = β₀ + β₁ · marketingSpending
print(marketing_spending_draws)
array([9.6153, 8.9922, ..., 4.59565])
import pymc3 as pm
# Plot the posterior of β₁ with a 95% credible interval
pm.plot_posterior(
    marketing_spending_draws,
    hdi_prob=0.95
)
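The interval marked on such a plot is a highest density interval (HDI): the narrowest range that holds the requested share of the draws. The helper below is a hand-written sketch of that idea for a single-peaked posterior, not the library's own implementation, and the draws are a made-up stand-in.

```python
import numpy as np

def hdi(draws, prob=0.95):
    """Narrowest interval containing `prob` of the draws (assumes one mode)."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    window = int(np.ceil(prob * n))   # number of draws inside the interval
    # Width of every candidate interval spanning `window` consecutive draws
    widths = sorted_draws[window - 1:] - sorted_draws[:n - window + 1]
    start = np.argmin(widths)
    return sorted_draws[start], sorted_draws[start + window - 1]

rng = np.random.default_rng(6)

# Stand-in for marketing_spending_draws; the N(6, 2) shape is assumed
marketing_spending_draws = rng.normal(6, 2, size=10000)
lower, upper = hdi(marketing_spending_draws, prob=0.95)
print(lower, upper)
```

For a symmetric posterior the HDI nearly coincides with the central percentile interval; the two differ when the posterior is skewed.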
Posterior draws analysis
import pandas as pd

posterior_draws_df = pd.DataFrame({
    "intercept_draws": intercept_draws,
    "marketing_spending_draws": marketing_spending_draws,
    "sd_draws": sd_draws
})
# Summary statistics of the posterior draws
print(posterior_draws_df.describe())
       intercept_draws  marketing_spending_draws      sd_draws
count     10000.000000              10000.000000  10000.000000
mean          2.972130                  5.999146      1.337621
std           3.008565                  2.020708      0.471723
min          -8.562093                 -2.842438      0.029643
25%           0.972832                  4.621807      1.003229
50%           3.002940                  5.975067      1.427617
75%           5.020615                  7.362572      1.736310
max          15.228549                 13.258955      1.999834
Predictive distribution
How much in sales can we expect if we spend $1000 on marketing?
sales ∼ N(β₀ + β₁ · marketingSpending, σ)
# Get point estimates of parameters
intercept_mean = intercept_draws.mean()
marketing_spending_mean = marketing_spending_draws.mean()
sd_mean = sd_draws.mean()
# Calculate mean of predictive distribution
# (the priors put spending in thousands of dollars, so $1000 corresponds to 1)
predictive_mean = intercept_mean + marketing_spending_mean * 1
# Simulate from predictive distribution
prediction_draws = np.random.normal(predictive_mean, sd_mean, size=10000)
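Plugging in point estimates ignores the uncertainty in the parameters themselves. An alternative, sketched below, simulates one prediction per posterior draw. The three arrays are independent made-up stand-ins for the real draws, which would normally be correlated.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-ins for the posterior draws; their shapes are assumed for illustration
intercept_draws = rng.normal(3, 3, size=10000)
marketing_spending_draws = rng.normal(6, 2, size=10000)
sd_draws = rng.uniform(0.5, 2, size=10000)

spending = 1  # one unit of marketing spending

# Plug-in version: point estimates only
plug_in_draws = rng.normal(
    intercept_draws.mean() + marketing_spending_draws.mean() * spending,
    sd_draws.mean(),
    size=10000,
)

# Full version: one prediction per posterior draw, so parameter
# uncertainty is carried into the forecast
full_draws = rng.normal(
    intercept_draws + marketing_spending_draws * spending,
    sd_draws,
)
print(plug_in_draws.std(), full_draws.std())
```

Both versions are centered on a similar value, but the full version is wider because it reflects how uncertain the intercept and slope are.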
Let's regress and forecast!