A/B Testing Using Python, pandas, and SciPy

Introduction

We have a platform for selling cars. Sellers reach more potential buyers by purchasing paid services (PS) that promote their ads. Promoted ads appear at the top of the listings, so buyers see them more often.

The monetization team often changes the mechanics of the paid services and the control interface. Each such change goes through an A/B test. The metric the platform tracks most often is ARPU (average revenue per user) = total revenue / total number of users.
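As a minimal sketch of the ARPU formula, the helper below divides total revenue by the number of distinct users. The function name `arpu` and the `toy` DataFrame are hypothetical and used only for illustration; the column names `user_id` and `revenue` follow the dataset described in this notebook.

```python
import pandas as pd


def arpu(frame: pd.DataFrame) -> float:
    """Total revenue divided by the number of distinct users."""
    return frame["revenue"].sum() / frame["user_id"].nunique()


# Synthetic toy data, not taken from the real dataset
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "revenue": [0, 100, 0, 300],
})

toy_arpu = arpu(toy)
print(toy_arpu)  # 100.0  (400 total revenue / 4 users)
```

Users with zero revenue still count in the denominator, which is why ARPU is usually much lower than the average purchase amount among paying users.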

We have the results of three unrelated A/B tests. The dataset contains the following columns:

experiment_num - experiment number (1, 2, 3)

experiment_group - the group the user is in (test, control)

user_id - user id

revenue - revenue generated by the user by purchasing a paid promotion service

Objective

Calculate ARPU and p-value for each experiment and identify which mechanics should be scaled.

Data Exploration

We import the necessary Python libraries, then load and explore the data.

In [ ]:
# import the necessary libraries
import pandas as pd
from scipy.stats import ttest_ind
In [ ]:
# load the data
df = pd.read_csv('abtest.csv')
In [ ]:
# first, let's check the data
df.head(5)
Out[ ]:
   experiment_num experiment_group   user_id  revenue
0               1             test     38456      520
1               1          control  13125924      806
2               1          control   9761984        0
3               1             test  11387012      208
4               1             test  18319648      104
In [ ]:
# check the data types
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2835 entries, 0 to 2834
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   experiment_num    2835 non-null   int64 
 1   experiment_group  2835 non-null   object
 2   user_id           2835 non-null   int64 
 3   revenue           2835 non-null   int64 
dtypes: int64(3), object(1)
memory usage: 88.7+ KB
In [ ]:
df.describe()
Out[ ]:
       experiment_num       user_id       revenue
count     2835.000000  2.835000e+03   2835.000000
mean         2.000000  7.314436e+06    681.030335
std          0.816641  7.553155e+06   2266.459889
min          1.000000  6.548000e+03      0.000000
25%          1.000000  4.673860e+05      0.000000
50%          2.000000  3.874702e+06     74.000000
75%          3.000000  1.267934e+07    312.000000
max          3.000000  2.434127e+07  32835.000000
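The summary above shows a median revenue of 74 against a mean of about 681 and a maximum of 32,835, so revenue is strongly right-skewed. Before testing, it can help to look at group sizes and the share of zero-revenue users per experiment and group. The sketch below does this on a small synthetic DataFrame; `toy`, `is_zero`, and `summary` are hypothetical names, and the numbers are not from the real data.

```python
import pandas as pd

# Synthetic toy data with the same columns as the real dataset
toy = pd.DataFrame({
    "experiment_num":   [1, 1, 1, 1, 2, 2, 2, 2],
    "experiment_group": ["test", "control", "test", "control",
                         "test", "control", "test", "control"],
    "user_id":          [11, 12, 13, 14, 21, 22, 23, 24],
    "revenue":          [0, 50, 200, 0, 0, 0, 75, 125],
})

# Flag users who generated no revenue
toy["is_zero"] = toy["revenue"].eq(0)

# One row per (experiment, group): size, central tendency, share of zeros
summary = toy.groupby(["experiment_num", "experiment_group"]).agg(
    users=("user_id", "nunique"),
    mean_revenue=("revenue", "mean"),
    median_revenue=("revenue", "median"),
    zero_share=("is_zero", "mean"),
)
print(summary)
```

A large gap between mean and median within a group, or very unequal group sizes, would be a signal to treat the t-test results with some caution.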

Testing

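We now compute the ARPU and the p-value for each experiment. ARPU is computed over all users in an experiment (test and control combined); the p-value comes from an independent two-sample t-test comparing revenue between the test and control groups.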

In [ ]:
# Split the data into one DataFrame per experiment (1, 2, and 3)
exp01 = df[df['experiment_num'] == 1]
exp02 = df[df['experiment_num'] == 2]
exp03 = df[df['experiment_num'] == 3]
In [ ]:
# Calculate the ARPU for experiment number 1
# Calculate total revenue
exp01total_revenue = exp01['revenue'].sum()

# Calculate total number of users
exp01total_users = exp01['user_id'].nunique()

# Calculate ARPU
exp01arpu = exp01total_revenue / exp01total_users

print(exp01arpu)
693.6497354497354
In [ ]:
# Calculate the ARPU for experiment number 2
# Calculate total revenue
exp02total_revenue = exp02['revenue'].sum()

# Calculate total number of users
exp02total_users = exp02['user_id'].nunique()

# Calculate ARPU
exp02arpu = exp02total_revenue / exp02total_users

print(exp02arpu)
515.8412698412699
In [ ]:
# Calculate the ARPU for experiment number 3
# Calculate total revenue
exp03total_revenue = exp03['revenue'].sum()

# Calculate total number of users
exp03total_users = exp03['user_id'].nunique()

# Calculate ARPU
exp03arpu = exp03total_revenue / exp03total_users

print(exp03arpu)
833.6
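The three cells above repeat the same calculation. As a sketch of an alternative, a `groupby` can produce the ARPU for every experiment in one pass, and adding `experiment_group` to the grouping keys splits it into test and control, which shows the direction of each effect. The data below is synthetic and the names `toy`, `arpu_by_experiment`, and `arpu_by_group` are hypothetical.

```python
import pandas as pd

# Synthetic toy data with the same columns as the real dataset
toy = pd.DataFrame({
    "experiment_num":   [1, 1, 1, 1, 2, 2, 2, 2],
    "experiment_group": ["test", "control", "test", "control",
                         "test", "control", "test", "control"],
    "user_id":          [11, 12, 13, 14, 21, 22, 23, 24],
    "revenue":          [0, 50, 200, 0, 0, 0, 75, 125],
})

# ARPU per experiment: total revenue / number of distinct users
by_exp = toy.groupby("experiment_num")
arpu_by_experiment = by_exp["revenue"].sum() / by_exp["user_id"].nunique()

# ARPU per experiment and group, with groups as columns
by_exp_group = toy.groupby(["experiment_num", "experiment_group"])
arpu_by_group = (
    by_exp_group["revenue"].sum() / by_exp_group["user_id"].nunique()
).unstack("experiment_group")

print(arpu_by_experiment)
print(arpu_by_group)
```

On the real data, the same two expressions applied to `df` would give the per-experiment values printed above plus a test/control breakdown for each.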
In [ ]:
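# Perform an independent-samples t-test for experiment 1
# (test is passed first, so a negative t-statistic means the test group's mean revenue is lower)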
group_test = exp01[exp01['experiment_group'] == 'test']['revenue']
group_control = exp01[exp01['experiment_group'] == 'control']['revenue']

t_statistic, p_value = ttest_ind(group_test, group_control)

print("T-statistic:", t_statistic)
print("P-value:", p_value)

alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject null hypothesis: There is a significant difference between the groups.")
else:
    print("Fail to reject null hypothesis: There is no significant difference between the groups.")
T-statistic: -0.4006287660577395
P-value: 0.688784211779017
Fail to reject null hypothesis: There is no significant difference between the groups.
In [ ]:
# Perform an independent-samples t-test for experiment 2
group_test2 = exp02[exp02['experiment_group'] == 'test']['revenue']
group_control2 = exp02[exp02['experiment_group'] == 'control']['revenue']

t_statistic, p_value = ttest_ind(group_test2, group_control2)

print("T-statistic:", t_statistic)
print("P-value:", p_value)

alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject null hypothesis: There is a significant difference between the groups.")
else:
    print("Fail to reject null hypothesis: There is no significant difference between the groups.")
T-statistic: -3.303268516112734
P-value: 0.0009915972237576193
Reject null hypothesis: There is a significant difference between the groups.
In [ ]:
# Perform an independent-samples t-test for experiment 3
group_test3 = exp03[exp03['experiment_group'] == 'test']['revenue']
group_control3 = exp03[exp03['experiment_group'] == 'control']['revenue']

t_statistic, p_value = ttest_ind(group_test3, group_control3)

print("T-statistic:", t_statistic)
print("P-value:", p_value)

alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject null hypothesis: There is a significant difference between the groups.")
else:
    print("Fail to reject null hypothesis: There is no significant difference between the groups.")
T-statistic: 1.8703638838987617
P-value: 0.06174290011218773
Fail to reject null hypothesis: There is no significant difference between the groups.
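The three test cells differ only in the experiment number, so the comparison can be wrapped in a helper and applied per experiment. The sketch below also passes `equal_var=False`, which switches `ttest_ind` to Welch's t-test; that is an alternative to the default equal-variance test used above, not the method this notebook applied, and it may be worth considering when group variances differ. The data is synthetic (random draws with a fixed seed), and `compare_groups` and `results` are hypothetical names.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Synthetic, right-skewed revenue for two experiments (not the real data)
rng = np.random.default_rng(0)
n = 200  # users per group
toy = pd.DataFrame({
    "experiment_num": np.repeat([1, 2], 2 * n),
    "experiment_group": np.tile(["test", "control"], 2 * n),
    "revenue": rng.exponential(scale=500, size=4 * n).round(),
})


def compare_groups(frame, equal_var=True):
    """Run a two-sample t-test of test vs. control revenue."""
    test = frame.loc[frame["experiment_group"] == "test", "revenue"]
    control = frame.loc[frame["experiment_group"] == "control", "revenue"]
    t_stat, p_value = ttest_ind(test, control, equal_var=equal_var)
    return {
        "test_mean": test.mean(),
        "control_mean": control.mean(),
        "t_statistic": t_stat,
        "p_value": p_value,
    }


# One row per experiment; equal_var=False selects Welch's t-test
results = pd.DataFrame({
    num: compare_groups(frame, equal_var=False)
    for num, frame in toy.groupby("experiment_num")
}).T
print(results)
```

Reporting the two group means next to the t-statistic makes the direction of the difference explicit, which the p-value alone does not convey.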

Conclusion

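The table below summarizes the ARPU and p-value for each experiment. As noted in the introduction, ARPU is the metric the company tracks most often. Experiment 3 has the highest ARPU and a positive t-statistic (the test group earned more than the control group), with a p-value of 0.06. That is slightly above the 0.05 significance level, so the difference is not statistically significant at that level. Experiment 2, on the other hand, has a very low p-value but also the lowest ARPU, and its negative t-statistic means the test group earned less than the control group. Experiment 1 shows no significant difference between the groups.

Based on these results, experiment 3 is the strongest candidate for scaling under the company's current strategy, although its effect falls just short of significance at the 5% level and would benefit from confirmation on more data.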

          exp1    exp2    exp3
arpu      693.6   515.8   833.6
p-value   0.689   0.001   0.062
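One possible way to make the decision explicit is to require both a significant p-value and a positive t-statistic (test group higher than control). This rule is an assumption for illustration and is stricter than the reasoning above. The t-statistics and p-values below are copied, rounded, from the test outputs in this notebook; the column names `significant`, `test_higher`, and `scale` are hypothetical.

```python
import pandas as pd

# t-statistics and p-values copied (rounded) from the t-test outputs above
results = pd.DataFrame(
    {
        "t_statistic": [-0.4006, -3.3033, 1.8704],
        "p_value": [0.6888, 0.00099, 0.0617],
    },
    index=["exp1", "exp2", "exp3"],
)

alpha = 0.05  # significance level used throughout the notebook

# Assumed rule: scale only if the difference is significant AND positive
results["significant"] = results["p_value"] < alpha
results["test_higher"] = results["t_statistic"] > 0
results["scale"] = results["significant"] & results["test_higher"]

print(results)
```

Under this rule no experiment qualifies outright: experiment 2 is significant but negative, and experiment 3 is positive but just above the threshold, which is why it is described as a candidate rather than a confirmed win.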
In [ ]:
import matplotlib.pyplot as plt

# Data
experiments = ['exp1', 'exp2', 'exp3']
metrics = ['arpu', 'p-value']
values = {
    'arpu': [693.6, 515.8, 833.6],       # ARPU per experiment, rounded to one decimal
    'p-value': [0.689, 0.001, 0.062]     # p-values rounded to three decimals
}

# Create subplots
fig, axs = plt.subplots(len(metrics), 1, figsize=(8, 6))

# Plot bar charts for each metric
for i, metric in enumerate(metrics):
    bars = axs[i].bar(experiments, values[metric], color=['blue', 'orange', 'green'])
    axs[i].set_ylabel(metric.capitalize())
    axs[i].set_title(f'{metric.capitalize()}')

    # Attach values to bars
    for bar in bars:
        height = bar.get_height()
        axs[i].annotate('{}'.format(height),
                        xy=(bar.get_x() + bar.get_width() / 2, height),
                        xytext=(0, 3),  # 3 points vertical offset
                        textcoords="offset points",
                        ha='center', va='bottom')

# Adjust layout
plt.tight_layout()

# Show plot
plt.show()