A/B Testing – Free Trial Screener

Experiment Overview

In this experiment, Udacity tested a change in which, if a student clicked “Start free trial” on the course overview page, they were asked how much time they could commit to the course each week. If the student indicated fewer than 5 hours per week, a message would suggest that they access the course materials for free instead. The student could then choose to proceed with the enrollment or be directed to the free course materials. The purpose of this experiment was to improve the overall student experience and improve coaches’ capacity to support students who are likely to complete the course.

The hypothesis was that this might set clearer expectations for students upfront, reducing the number of frustrated students who left the free trial because they didn't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

Experiment Design

Metric Choice

Invariant Metrics

Number of Cookies

This is the number of unique cookies to view the course overview page. Because cookies are diverted before the screener is ever shown, we expect this count to be split roughly evenly between the experiment and control groups, which makes it an invariant metric.

Number of Clicks

This is the number of unique cookies that click the “Start free trial” button, which happens before the free trial screener is shown to the user. Since the click occurs before the screener, we expect the count to be similar between the experiment and control groups, making this an invariant metric.

Click-Through-Probability

This metric is the ratio of the number of unique cookies that click “Start free trial” to the number of unique cookies that view the course overview page. Both events occur before the free trial screener, so we expect this metric to be similar between the experiment and control groups.

Evaluation Metrics

Gross Conversion

This metric is the ratio of the number of user-ids that complete checkout and enroll in the free trial to the number of unique cookies that click the “Start free trial” button. It is considered an evaluation metric because we expect this conversion rate to be tied to the overall student experience and to coaches’ capacity. When deciding whether to launch, we are looking for a minimum practical difference of 0.01 between the experiment and control groups.

Retention

This metric is the ratio of the number of user-ids that remain enrolled past the 14-day boundary to the number of user-ids that complete checkout. We would like this metric to be larger in the experiment group than in the control group, as that would indicate an improved overall student experience as well as improved coaches’ capacity to support students. We would like to see a minimum difference of 0.01 between the experiment and control groups.

Net Conversion

This metric is the ratio of the number of user-ids that remain enrolled past the 14-day boundary to the number of unique cookies that click the “Start free trial” button. Again, we would like to see this metric increased in the experiment group compared to the control group, with a minimum practical difference of 0.0075.

In conclusion, in order for us to launch the experiment, we would like to see an increase in retention and net conversion and a decrease in gross conversion. We would like to see all 3 of these metrics move in the above-mentioned direction in order to launch.
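
To make these definitions concrete, below is a minimal sketch of the three evaluation metrics written as Python functions. The function names are purely illustrative and do not appear in the analysis code later in this report; the ratios simply follow the definitions above.

# Illustrative definitions of the evaluation metrics (names are for exposition only)
def gross_conversion(enrollments, clicks):
    # User-ids that enroll in the free trial per cookie that clicks "Start free trial"
    return enrollments / clicks

def retention(payments, enrollments):
    # User-ids that remain enrolled past the 14-day boundary per enrolled user-id
    return payments / enrollments

def net_conversion(payments, clicks):
    # User-ids that remain enrolled past the 14-day boundary per cookie that clicks "Start free trial"
    return payments / clicks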

Measuring Standard Deviation
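
The analytic standard deviations below are computed for a sample of 5,000 cookies visiting the course overview page, using SD = sqrt(p * (1 - p) / n), where p is the baseline probability for the metric and n is the expected number of units in the metric's denominator within that sample. For example, for gross conversion n = 5000 * 3200 / 40000 = 400, so SD = sqrt(0.20625 * 0.79375 / 400) ≈ 0.0202, which matches the value printed by the code below.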

# Load libraries
import pandas as pd
import math

# Read and print baseline values table
baseline_values = pd.read_csv('BaselineValuesData.csv')
print('Baseline Values Table:')
print(baseline_values)

# Extract values from the baseline table
def baseline(metric_name):
    # Return the baseline value for the given metric name
    return float(baseline_values.loc[baseline_values.Metric == metric_name, 'Value'].iloc[0])

UniqueCookies = baseline('Unique cookies to view page per day')
UniqueCookiesToStartFreeTrial = baseline('Unique cookies to click Start free trial per day')
Enrollments = baseline('Enrollments per day')
ClickThroughProb = baseline('Click-through-probability on Start free trial')
EnrollingProb = baseline('Probability of enrolling, given click')
PaymentProbGivenEnroll = baseline('Probability of payment, given enroll')
PaymentProbGivenClick = baseline('Probability of payment, given click')
TotalCookies = baseline('Number of cookies visiting course overview page')
Alpha = baseline('alpha')
Beta = baseline('beta')

# Gross Conversion Standard Deviation
GrossConversionStd = math.sqrt(EnrollingProb*(1 - EnrollingProb)/
                               (TotalCookies*UniqueCookiesToStartFreeTrial/UniqueCookies))
print('Gross Conversion Standard Deviation:')
print('%.4f' % GrossConversionStd)

# Retention Standard Deviation
RetentionStd = math.sqrt(PaymentProbGivenEnroll*(1 - PaymentProbGivenEnroll)/
                               (TotalCookies*Enrollments/UniqueCookies))
print('Retention Standard Deviation:')
print('%.4f' % RetentionStd)

# Net Conversion Standard Deviation
NetConversionStd = math.sqrt(PaymentProbGivenClick*(1 - PaymentProbGivenClick)/
                               (TotalCookies*UniqueCookiesToStartFreeTrial/UniqueCookies))
print('Net Conversion Standard Deviation:')
print('%.4f' % NetConversionStd)
Baseline Values Table:
                                             Metric         Value
0               Unique cookies to view page per day  40000.000000
1  Unique cookies to click Start free trial per day   3200.000000
2                               Enrollments per day    660.000000
3     Click-through-probability on Start free trial      0.080000
4             Probability of enrolling, given click      0.206250
5              Probability of payment, given enroll      0.530000
6               Probability of payment, given click      0.109313
7   Number of cookies visiting course overview page   5000.000000
8                                             alpha      0.050000
9                                              beta      0.200000
Gross Conversion Standard Deviation:
0.0202
Retention Standard Deviation:
0.0549
Net Conversion Standard Deviation:
0.0156

I would expect the analytic estimates of the standard deviation to be accurate for gross conversion and net conversion, because for both of these metrics the unit of analysis (a cookie that clicks “Start free trial”) matches the unit of diversion (a cookie). When the unit of analysis and the unit of diversion match, analytic estimates of variability tend to be accurate. For retention, however, the unit of analysis is a user-id that enrolls, which does not match the unit of diversion, so we would want to collect an empirical estimate of the variability if we had the time.
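
If we did want an empirical estimate, one simple way to get a feel for the variability of retention would be to bootstrap over the daily counts. The sketch below is illustrative only and is not part of the original analysis; it assumes daily Enrollments and Payments counts of the form used later in this report (ControlGroupData.csv) and reports the spread of retention across day-level resamples, which reflects variability at the observed sample size rather than at the 5,000-pageview scale used for the analytic estimates above.

# Illustrative bootstrap estimate of retention variability (not part of the original analysis)
import numpy as np
import pandas as pd

daily = pd.read_csv('ControlGroupData.csv').dropna(subset=['Enrollments', 'Payments'])

rng = np.random.default_rng(0)
resampled_retention = []
for _ in range(10000):
    # Resample days with replacement and recompute retention on the resample
    rows = rng.integers(0, len(daily), len(daily))
    sample = daily.iloc[rows]
    resampled_retention.append(sample['Payments'].sum() / sample['Enrollments'].sum())

print('Bootstrap estimate of retention standard deviation:')
print('%.4f' % np.std(resampled_retention))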

Sizing

With an alpha of 0.05 and a beta of 0.2, below are the sample sizes for each evaluation metric that we would need to collect to adequately power the experiment. These values were determined using an online sample-size calculator; a rough analytic cross-check in code follows the list below.
Gross Conversion: 25,835
Retention: 39,115
Net Conversion: 27,413
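
As that cross-check, the standard normal-approximation formula for comparing two proportions can be computed directly, as sketched below (assuming scipy is available). The helper name is illustrative; for gross conversion the result comes out close to, but not exactly equal to, the 25,835 above because the online calculator uses a slightly different approximation.

# Rough cross-check of the required sample size via the normal-approximation
# formula for two proportions (not the calculator used for the numbers above)
from scipy.stats import norm

def approx_sample_size(p_baseline, d_min, alpha, beta):
    # Per-group sample size to detect an absolute change of d_min in a proportion
    p_alt = p_baseline + d_min
    p_bar = (p_baseline + p_alt) / 2
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(1 - beta)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_baseline * (1 - p_baseline) + p_alt * (1 - p_alt))) ** 2
    return math.ceil(numerator / d_min ** 2)

print('Approximate gross conversion sample size per group:')
print(approx_sample_size(EnrollingProb, 0.01, Alpha, Beta))
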
We then use these values to calculate the total number of page views required to adequately power the experiment.
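
For example, for gross conversion the required page views are 2 × 25,835 / (3,200/40,000) = 645,875 across both groups, since only 8% of the cookies that view the page go on to click “Start free trial”.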

# Sample sizes required
GrossConversion_SS = 25835  
Retention_SS = 39115  
NetConversion_SS = 27413 

# Total page views required
GrossConversion_PV = 2*GrossConversion_SS/(UniqueCookiesToStartFreeTrial/UniqueCookies)
Retention_PV = 2*Retention_SS/(Enrollments/UniqueCookies)
NetConversion_PV = 2*NetConversion_SS/(UniqueCookiesToStartFreeTrial/UniqueCookies)
Minimum_PV = max(GrossConversion_PV, Retention_PV, NetConversion_PV)

print('Gross conversion page views required:')
print(round(GrossConversion_PV))
print('Retention page views required:')
print(round(Retention_PV))
print('Net conversion page views required:')
print(round(NetConversion_PV))
print('Minimum page views required:')
print(round(Minimum_PV))
Gross conversion page views required:
645875
Retention page views required:
4741212
Net conversion page views required:
685325
Minimum page views required:
4741212

Duration vs. Exposure

If we were to divert 100% of the traffic to the experiment, the output below shows the required duration (in days) for each evaluation metric.
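
For example, gross conversion requires 645,875 page views, and at roughly 40,000 page views per day that is 645,875 / 40,000 ≈ 16.1 days, which rounds up to the 17 days shown below.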

# Total days required
GrossConversion_Days = GrossConversion_PV/UniqueCookies
Retention_Days = Retention_PV/UniqueCookies
NetConversion_Days = NetConversion_PV/UniqueCookies
Minimum_Days = max(GrossConversion_Days, Retention_Days, NetConversion_Days)

print('Gross conversion days required:')
print(math.ceil(GrossConversion_Days))
print('Retention days required:')
print(math.ceil(Retention_Days))
print('Net conversion days required:')
print(math.ceil(NetConversion_Days))
print('Minimum days required:')
print(math.ceil(Minimum_Days))
Gross conversion days required:
17
Retention days required:
119
Net conversion days required:
18
Minimum days required:
119

For retention, we would have to run the experiment for 119 days, which is too long. Dropping retention leaves gross conversion and net conversion as our evaluation metrics. Since the risk level of this experiment is quite low, it would be reasonable to divert 100% of the traffic, in which case we would need to run the experiment for 18 days.

Experiment Analysis

Sanity Checks

We perform sanity checks on our invariant metrics below.

# Read in experimental data and print first 5 rows

# Control data
control_data = pd.read_csv('ControlGroupData.csv')
print('Control data:')
print(control_data.head(5))

# Experiment data
experiment_data = pd.read_csv('ExperimentGroupData.csv')
print('Experiment data:')
print(experiment_data.head(5))

# Expected values for the invariant metrics
NumberOfCookies_EV = 0.5
NumberOfClicks_EV = 0.5
ClickThroughProb_EV = control_data.sum(axis=0)['Clicks'] / control_data.sum(axis=0)['Pageviews']

# Observed values for the invariant metrics
NumberOfCookies_OV = control_data.sum(axis=0)['Pageviews']/(control_data.sum(axis=0)['Pageviews']\
                                                            + experiment_data.sum(axis=0)['Pageviews'])
NumberOfClicks_OV = control_data.sum(axis=0)['Clicks']/(control_data.sum(axis=0)['Clicks'] \
                                                        + experiment_data.sum(axis=0)['Clicks'])
ClickThroughProb_OV = experiment_data.sum(axis=0)['Clicks'] \
/ experiment_data.sum(axis=0)['Pageviews']

# 95% Confidence Intervals
# Number of cookies
print('Number of Cookies Expected Value:')
print(NumberOfCookies_EV)
print('Number of Cookies Observed Value:')
print(round(NumberOfCookies_OV,4))
NumberOfCookies_LB = NumberOfCookies_EV - 1.96*math.sqrt(NumberOfCookies_EV*(1-NumberOfCookies_EV)\
                                                         /(control_data.sum(axis=0)['Pageviews']\
                                                           +experiment_data.sum(axis=0)['Pageviews']))
print('Number of Cookies Lower Bound:')
print(round(NumberOfCookies_LB,4))
NumberOfCookies_UB = NumberOfCookies_EV + 1.96*math.sqrt(NumberOfCookies_EV*(1-NumberOfCookies_EV)\
                                                         /(control_data.sum(axis=0)['Pageviews']\
                                                           +experiment_data.sum(axis=0)['Pageviews']))
print('Number of Cookies Upper Bound:')
print(round(NumberOfCookies_UB,4))
print('Passes Test:')
print(NumberOfCookies_OV > NumberOfCookies_LB and NumberOfCookies_OV < NumberOfCookies_UB)

# Number of clicks on "Start Free Trial"
print('Number of Clicks on "Start Free Trial" Expected Value:')
print(NumberOfClicks_EV)
print('Number of Clicks on "Start Free Trial" Observed Value:')
print(round(NumberOfClicks_OV,4))
NumberOfClicks_LB = NumberOfClicks_EV - 1.96*math.sqrt(NumberOfClicks_EV*(1-NumberOfClicks_EV)\
                                                       /(control_data.sum(axis=0)['Clicks']\
                                                         +experiment_data.sum(axis=0)['Clicks']))
print('Number of Clicks on "Start Free Trial" Lower Bound:')
print(round(NumberOfClicks_LB,4))
NumberOfClicks_UB = NumberOfClicks_EV + 1.96*math.sqrt(NumberOfClicks_EV*(1-NumberOfClicks_EV)\
                                                       /(control_data.sum(axis=0)['Clicks']\
                                                         +experiment_data.sum(axis=0)['Clicks']))
print('Number of Clicks on "Start Free Trial" Upper Bound:')
print(round(NumberOfClicks_UB,4))
print('Passes Test:')
print(NumberOfClicks_OV > NumberOfClicks_LB and NumberOfClicks_OV < NumberOfClicks_UB)

# Click Through Probability on "Start Free Trial"
print('Click-through-probability on "Start Free Trial" Expected Value:')
print(round(ClickThroughProb_EV,4))
print('Click-through-probability on "Start Free Trial" Observed Value:')
print(round(ClickThroughProb_OV,4))
ClickThroughProb_LB = ClickThroughProb_EV - 1.96*math.sqrt(ClickThroughProb_EV*(1-ClickThroughProb_EV)\
                                                           /control_data.sum(axis=0)['Pageviews'])
print('Click-through-probability on "Start Free Trial" Lower Bound:')
print(round(ClickThroughProb_LB,4))
ClickThroughProb_UB = ClickThroughProb_EV + 1.96*math.sqrt(ClickThroughProb_EV*(1-ClickThroughProb_EV)\
                                                           /control_data.sum(axis=0)['Pageviews'])
print('Click-through-probability on "Start Free Trial" Upper Bound:')
print(round(ClickThroughProb_UB,4))
print('Passes Test:')
print(ClickThroughProb_OV > ClickThroughProb_LB and ClickThroughProb_OV < ClickThroughProb_UB)
Control data:
          Date  Pageviews  Clicks  Enrollments  Payments
0  Sat, Oct 11       7723     687        134.0      70.0
1  Sun, Oct 12       9102     779        147.0      70.0
2  Mon, Oct 13      10511     909        167.0      95.0
3  Tue, Oct 14       9871     836        156.0     105.0
4  Wed, Oct 15      10014     837        163.0      64.0
Experiment data:
          Date  Pageviews  Clicks  Enrollments  Payments
0  Sat, Oct 11       7716     686        105.0      34.0
1  Sun, Oct 12       9288     785        116.0      91.0
2  Mon, Oct 13      10480     884        145.0      79.0
3  Tue, Oct 14       9867     827        138.0      92.0
4  Wed, Oct 15       9793     832        140.0      94.0
Number of Cookies Expected Value:
0.5
Number of Cookies Observed Value:
0.5006
Number of Cookies Lower Bound:
0.4988
Number of Cookies Upper Bound:
0.5012
Passes Test:
True
Number of Clicks on "Start Free Trial" Expected Value:
0.5
Number of Clicks on "Start Free Trial" Observed Value:
0.5005
Number of Clicks on "Start Free Trial" Lower Bound:
0.4959
Number of Clicks on "Start Free Trial" Upper Bound:
0.5041
Passes Test:
True
Click-through-probability on "Start Free Trial" Expected Value:
0.0821
Click-through-probability on "Start Free Trial" Observed Value:
0.0822
Click-through-probability on "Start Free Trial" Lower Bound:
0.0812
Click-through-probability on "Start Free Trial" Upper Bound:
0.083
Passes Test:
True

All of the invariant metrics pass the sanity check.

Result Analysis

# Practically significant differences (dmin)
GrossConversion_dmin = 0.01
NetConversion_dmin = 0.0075

# Number of days for which enrollment/payment data were collected
EvalLength = experiment_data.count()['Enrollments']

# Clicks restricted to the days with enrollment/payment data
ControlClicks = control_data[0:EvalLength].sum(axis=0)['Clicks']
ExperimentClicks = experiment_data[0:EvalLength].sum(axis=0)['Clicks']

# Enrollment and payment totals (NaN rows are ignored by sum)
ControlEnrollments = control_data.sum(axis=0)['Enrollments']
ExperimentEnrollments = experiment_data.sum(axis=0)['Enrollments']
ControlPayments = control_data.sum(axis=0)['Payments']
ExperimentPayments = experiment_data.sum(axis=0)['Payments']

# Gross conversion observed difference (experiment - control)
GrossConversion_OV = ExperimentEnrollments/ExperimentClicks - ControlEnrollments/ControlClicks

# Net conversion observed difference (experiment - control)
NetConversion_OV = ExperimentPayments/ExperimentClicks - ControlPayments/ControlClicks

# 95% Confidence Interval for Gross Conversion,
# using the pooled probability to estimate the standard error
print('Gross Conversion Observed Difference:')
print(round(GrossConversion_OV,4))
GrossConversion_Pooled = (ExperimentEnrollments + ControlEnrollments)/(ExperimentClicks + ControlClicks)
GrossConversion_SE = math.sqrt(GrossConversion_Pooled*(1 - GrossConversion_Pooled)
                               *(1/ControlClicks + 1/ExperimentClicks))
GrossConversion_LB = GrossConversion_OV - 1.96*GrossConversion_SE
print('Gross Conversion Lower Bound:')
print(round(GrossConversion_LB,4))
GrossConversion_UB = GrossConversion_OV + 1.96*GrossConversion_SE
print('Gross Conversion Upper Bound:')
print(round(GrossConversion_UB,4))
print('Statistical Difference:')
print(GrossConversion_UB < 0 or GrossConversion_LB > 0)  # significant if the CI excludes zero
print('Practical Difference:')
print(abs(GrossConversion_OV) > GrossConversion_dmin)

# 95% Confidence Interval for Net Conversion
print('Net Conversion Observed Difference:')
print(round(NetConversion_OV,4))
NetConversion_Pooled = (ExperimentPayments + ControlPayments)/(ExperimentClicks + ControlClicks)
NetConversion_SE = math.sqrt(NetConversion_Pooled*(1 - NetConversion_Pooled)
                             *(1/ControlClicks + 1/ExperimentClicks))
NetConversion_LB = NetConversion_OV - 1.96*NetConversion_SE
print('Net Conversion Lower Bound:')
print(round(NetConversion_LB,4))
NetConversion_UB = NetConversion_OV + 1.96*NetConversion_SE
print('Net Conversion Upper Bound:')
print(round(NetConversion_UB,4))
print('Statistical Difference:')
print(NetConversion_UB < 0 or NetConversion_LB > 0)  # significant if the CI excludes zero
print('Practical Difference:')
print(abs(NetConversion_OV) > NetConversion_dmin)
Gross Conversion Observed Difference:
-0.0206
Gross Conversion Lower Bound:
-0.0291
Gross Conversion Upper Bound:
-0.012
Statistical Difference:
True
Practical Difference:
True
Net Conversion Observed Difference:
-0.0049
Net Conversion Lower Bound:
-0.0116
Net Conversion Upper Bound:
0.0019
Statistical Difference:
False
Practical Difference:
False

Summary

This experiment consisted of diverting cookies that visited the course overview page into either the experiment or the control group. For sanity-checking purposes, the invariant metrics selected were the Number of Cookies, the Number of Clicks on “Start free trial”, and the Click-Through-Probability. The evaluation metrics were Gross Conversion and Net Conversion, where Gross Conversion is the rate of enrollment per cookie that clicked “Start free trial” and Net Conversion is the rate of payment per cookie that clicked “Start free trial”. The null hypothesis is that each evaluation metric is equal in the experiment and control groups. To reject this null hypothesis and accept the alternative, the difference in each evaluation metric between the two groups must be statistically significant and must exceed the pre-specified practical-significance threshold. The Bonferroni correction was not used in our methodology, since we require all of our evaluation metrics to be significant rather than any one of them. The results show that the invariant metrics pass our sanity checks: none of them shows a significant difference between the experiment and control groups at the 95% confidence level. Gross Conversion decreased in the experiment group, and the change was both statistically and practically significant. Net Conversion, however, was neither statistically nor practically significant: its 95% confidence interval includes zero and also includes the negative practical-significance boundary of -0.0075, so a practically meaningful drop in payments cannot be ruled out.

Recommendation

The purpose of this experiment was to analyze whether adding the suggestion pop-up would improve the overall student experience and make better use of coaching resources. Although Gross Conversion was found to be both statistically and practically significant in the experiment group, this was not the case for Net Conversion, whose confidence interval does not rule out a practically significant decrease in payments. Since we required all of our evaluation metrics to be both statistically and practically significant, my recommendation would be to not launch and instead pursue further experiments.

Follow-Up Experiment

My suggestion for a follow-up experiment falls under the same theme of identifying students who are more likely to cancel their enrollment and potentially waste resources. This filtering process, however, would be slightly more involved than asking one simple question and offering a suggestion. One major difference in this experiment is that the filtering would take place after the student has enrolled in a course. Right after enrollment, the student would be diverted to either the experiment or control group based on their user-id. The experiment group would fill out a short quiz designed to gauge the probability that the student will complete the course and make efficient use of resources such as the coaches’ time. The quiz would consist of questions such as “How many hours per week do you work?” and “Do you do any volunteering that could impact your time for studies?”. After completing the quiz, the student would be told whether it is appropriate for them to continue and given advice on how to successfully complete the course. The null hypothesis of this experiment is that the quiz does not increase retention. The unit of diversion is the user-id, since students are split into the experiment and control groups after enrollment. The invariant metric would be the number of user-ids, and the evaluation metric would be retention. A statistically and practically significant increase in retention in the experiment group relative to the control group would indicate a launch.
