We have a platform for selling cars. Sellers reach more potential buyers by purchasing paid services (PS) that promote their ads: after a seller buys a paid service, the ad appears at the top of the listings and buyers see it more often.
The monetization team frequently changes both the mechanics of these services and the interface that controls them, and each change goes through an A/B test. The metric the platform tracks most often is ARPU = total revenue / total number of users.
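As a quick illustration of the formula (with made-up numbers, not from our dataset):
# hypothetical example: 1,000 users generate 50,000 in total revenue
total_revenue = 50_000
total_users = 1_000
arpu = total_revenue / total_users
print(arpu)  # 50.0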
We have the results of three unrelated A/B tests. The dataset contains the following columns:

- experiment_num: experiment number (1, 2, 3)
- experiment_group: the group the user is in (test, control)
- user_id: user id
- revenue: revenue generated by the user by purchasing a paid promotion service

The task is to calculate ARPU and the p-value for each experiment and identify which mechanics should be scaled.
We are going to import the necessary Python libraries, load the data, and explore it.
# import the necessary libraries
import pandas as pd
from scipy.stats import ttest_ind
# upload the data
df = pd.read_csv('abtest.csv')
# first, let's check the data
df.head(5)
| | experiment_num | experiment_group | user_id | revenue |
|---|---|---|---|---|
| 0 | 1 | test | 38456 | 520 |
| 1 | 1 | control | 13125924 | 806 |
| 2 | 1 | control | 9761984 | 0 |
| 3 | 1 | test | 11387012 | 208 |
| 4 | 1 | test | 18319648 | 104 |
# check the data types
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2835 entries, 0 to 2834
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype
 0   experiment_num    2835 non-null   int64
 1   experiment_group  2835 non-null   object
 2   user_id           2835 non-null   int64
 3   revenue           2835 non-null   int64
dtypes: int64(3), object(1)
memory usage: 88.7+ KB
df.describe()
| | experiment_num | user_id | revenue |
|---|---|---|---|
| count | 2835.000000 | 2.835000e+03 | 2835.000000 |
| mean | 2.000000 | 7.314436e+06 | 681.030335 |
| std | 0.816641 | 7.553155e+06 | 2266.459889 |
| min | 1.000000 | 6.548000e+03 | 0.000000 |
| 25% | 1.000000 | 4.673860e+05 | 0.000000 |
| 50% | 2.000000 | 3.874702e+06 | 74.000000 |
| 75% | 3.000000 | 1.267934e+07 | 312.000000 |
| max | 3.000000 | 2.434127e+07 | 32835.000000 |
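Before comparing groups, it is worth a quick sanity check that there are no missing values and that no user appears in both groups of the same experiment. A minimal sketch, using only the columns described above:
# check for missing values in each column
print(df.isna().sum())
# count users assigned to more than one group within the same experiment (should be 0)
groups_per_user = df.groupby(['experiment_num', 'user_id'])['experiment_group'].nunique()
print((groups_per_user > 1).sum())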
Now we will calculate the ARPU and the p-value for each experiment.
# Split the data by experiment_num (1, 2, and 3)
exp01 = df[df['experiment_num'] == 1]
exp02 = df[df['experiment_num'] == 2]
exp03 = df[df['experiment_num'] == 3]
# Calculate the ARPU for experiment number 1
# Calculate total revenue
exp01total_revenue = exp01['revenue'].sum()
# Calculate total number of users
exp01total_users = exp01['user_id'].nunique()
# Calculate ARPU
exp01arpu = exp01total_revenue / exp01total_users
print(exp01arpu)
693.6497354497354
# Calculate the ARPU for experiment number 2
# Calculate total revenue
exp02total_revenue = exp02['revenue'].sum()
# Calculate total number of users
exp02total_users = exp02['user_id'].nunique()
# Calculate ARPU
exp02arpu = exp02total_revenue / exp02total_users
print(exp02arpu)
515.8412698412699
# Calculate the ARPU for experiment number 3
# Calculate total revenue
exp03total_revenue = exp03['revenue'].sum()
# Calculate total number of users
exp03total_users = exp03['user_id'].nunique()
# Calculate ARPU
exp03arpu = exp03total_revenue / exp03total_users
print(exp03arpu)
833.6
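The same step-by-step calculation can be written more compactly with a groupby. The sketch below should reproduce the three per-experiment ARPU values above, and it also breaks ARPU down by test/control group, which is what the t-tests that follow compare:
# ARPU per experiment (same values as above)
arpu_per_experiment = df.groupby('experiment_num')['revenue'].sum() / df.groupby('experiment_num')['user_id'].nunique()
print(arpu_per_experiment)
# ARPU per experiment and group (test vs control)
arpu_per_group = df.groupby(['experiment_num', 'experiment_group']).apply(
    lambda g: g['revenue'].sum() / g['user_id'].nunique()
)
print(arpu_per_group)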
# Perform independent samples t-test experiment_num 1
group_test = exp01[exp01['experiment_group'] == 'test']['revenue']
group_control = exp01[exp01['experiment_group'] == 'control']['revenue']
t_statistic, p_value = ttest_ind(group_test, group_control)
print("T-statistic:", t_statistic)
print("P-value:", p_value)
alpha = 0.05 # significance level
if p_value < alpha:
print("Reject null hypothesis: There is a significant difference between the groups.")
else:
print("Fail to reject null hypothesis: There is no significant difference between the groups.")
T-statistic: -0.4006287660577395
P-value: 0.688784211779017
Fail to reject null hypothesis: There is no significant difference between the groups.
# Perform independent samples t-test experiment_num 2
group_test2 = exp02[exp02['experiment_group'] == 'test']['revenue']
group_control2 = exp02[exp02['experiment_group'] == 'control']['revenue']
t_statistic, p_value = ttest_ind(group_test2, group_control2)
print("T-statistic:", t_statistic)
print("P-value:", p_value)
alpha = 0.05 # significance level
if p_value < alpha:
print("Reject null hypothesis: There is a significant difference between the groups.")
else:
print("Fail to reject null hypothesis: There is no significant difference between the groups.")
T-statistic: -3.303268516112734
P-value: 0.0009915972237576193
Reject null hypothesis: There is a significant difference between the groups.
# Perform independent samples t-test experiment_num 3
group_test3 = exp03[exp03['experiment_group'] == 'test']['revenue']
group_control3 = exp03[exp03['experiment_group'] == 'control']['revenue']
t_statistic, p_value = ttest_ind(group_test3, group_control3)
print("T-statistic:", t_statistic)
print("P-value:", p_value)
alpha = 0.05 # significance level
if p_value < alpha:
print("Reject null hypothesis: There is a significant difference between the groups.")
else:
print("Fail to reject null hypothesis: There is no significant difference between the groups.")
T-statistic: 1.8703638838987617
P-value: 0.06174290011218773
Fail to reject null hypothesis: There is no significant difference between the groups.
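Since the three tests differ only in the experiment number, the repetition above can be folded into a single loop; a minimal sketch that should reproduce the per-experiment outputs above:
alpha = 0.05
for num, exp in df.groupby('experiment_num'):
    test = exp[exp['experiment_group'] == 'test']['revenue']
    control = exp[exp['experiment_group'] == 'control']['revenue']
    t_statistic, p_value = ttest_ind(test, control)
    verdict = 'significant' if p_value < alpha else 'not significant'
    print(f"Experiment {num}: t = {t_statistic:.3f}, p = {p_value:.4f} ({verdict} at alpha = {alpha})")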
The table below summarizes the ARPU and p-value for each experiment. As stated at the beginning, the most important metric for the company is ARPU. Experiment 3 has the highest ARPU, with a p-value of 0.06, which is just above the 0.05 significance level. On the other hand, experiment 2 has a very low p-value but also the lowest ARPU.
From this test we can say that the mechanics from experiment 3 should be scaled, in line with the company's current strategy.
| | exp1 | exp2 | exp3 |
|---|---|---|---|
| ARPU | 693.6 | 515.8 | 833.6 |
| p-value | 0.688 | 0.0009 | 0.061 |
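One caveat: revenue here is heavily right-skewed with many zeros (median 74 vs. max 32835 in df.describe() above), so the normality and equal-variance assumptions of the plain t-test are stretched. As a robustness check, the comparison could be repeated with Welch's t-test (equal_var=False) and a non-parametric Mann-Whitney U test; a minimal sketch:
from scipy.stats import mannwhitneyu
for num, exp in df.groupby('experiment_num'):
    test = exp[exp['experiment_group'] == 'test']['revenue']
    control = exp[exp['experiment_group'] == 'control']['revenue']
    # Welch's t-test: does not assume equal variances in the two groups
    t_stat, p_welch = ttest_ind(test, control, equal_var=False)
    # Mann-Whitney U test: compares distributions without assuming normality
    u_stat, p_mwu = mannwhitneyu(test, control, alternative='two-sided')
    print(f"Experiment {num}: Welch p = {p_welch:.4f}, Mann-Whitney p = {p_mwu:.4f}")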
import matplotlib.pyplot as plt
# Data
experiments = ['exp1', 'exp2', 'exp3']
metrics = ['arpu', 'p-value']
values = {
'arpu': [693.6, 515.8, 833.6],
'p-value': [0.688, 0.0009, 0.061]
}
# Create subplots
fig, axs = plt.subplots(len(metrics), 1, figsize=(8, 6))
# Plot bar charts for each metric
for i, metric in enumerate(metrics):
bars = axs[i].bar(experiments, values[metric], color=['blue', 'orange', 'green'])
axs[i].set_ylabel(metric.capitalize())
axs[i].set_title(f'{metric.capitalize()}')
# Attach values to bars
for bar in bars:
height = bar.get_height()
axs[i].annotate('{}'.format(height),
xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3), # 3 points vertical offset
textcoords="offset points",
ha='center', va='bottom')
# Adjust layout
plt.tight_layout()
# Show plot
plt.show()
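Because the p-values span several orders of magnitude, the bar for exp2 (0.0009) is barely visible next to exp1 (0.688). An optional tweak is to mark the 0.05 significance threshold and switch the p-value panel to a log scale; these lines could be added just before plt.tight_layout() in the cell above:
# mark the significance level and use a log scale on the p-value panel
axs[1].axhline(0.05, color='red', linestyle='--', label='alpha = 0.05')
axs[1].set_yscale('log')
axs[1].legend()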