Learning Goals

By the end of this tutorial you will be able to:

  1. Justify the choice of outcome variables and baseline variables in an A/B test
  2. Determine the necessary sample size and duration of an A/B test
  3. Analyze whether an A/B test was correctly randomized
  4. Analyze the effects of a treatment on outcome variables of interest
  5. Prescribe recommendations to managers based on the results of an A/B test
  6. Design a follow-up experiment based on results of an A/B test

Instructions to Students

These lab assignments are not graded, but we encourage you to invest time and effort into working through them from start to finish. Add your solutions to the lab_ab_test_answers.Rmd file as you work through the exercises so that you have a record of the work you have done.

Obtain a copy of both the question and answer files using Git. To clone a copy of this repository to your own PC, use the following command:

git clone https://github.com/tisem-digital-marketing/smwa-lab-ab_test.git

Once you have your copy, open the answer document in RStudio as an RStudio project and work through the questions.

The goal of the tutorials is to explore how to “do” the technical side of social media analytics. Use this as an opportunity to push your limits and develop new skills. When you are uncertain or do not know what to do next - ask questions of your peers and the instructors on the class Slack workspace.

You will need to load the following R libraries to complete the exercises:

library(readr)
library(dplyr)
library(broom)
library(ggplot2)
library(tidyr)
library(tibble)
library(vtable)

You may need to install some of these if they are not already on your machine.

A/B Testing Case Study: “Free Trial Screening”

Udacity is an online learning platform that offers courses and nano-degree programs in various fields of technology, business, and data science. They aim to provide accessible and high-quality education to individuals seeking to advance their careers or learn new skills in the rapidly evolving fields of technology and business.

Udacity courses currently have two options on the home page: “start free trial”, and “access course materials”. If the student clicks “start free trial”, they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course. After 14 days, they will automatically be charged unless they cancel first. If the student clicks “access course materials”, they will be able to view the videos and take the quizzes for free, but they will not receive coaching support or a verified certificate, and they will not submit their final project for feedback.

Udacity have struggled with setting clear expectations for students who enrol in their programs, particularly with regard to the workload in terms of expected study time. As a result, they face the problem of frustrated students who leave the course before completing the free trial. These frustrated students who leave before the trial ends, ultimately do not stay enrolled long enough to be charged for the paid version of the course. Udacity wants to improve the overall student- and educator experience on the platform.

Udacity have designed an experiment, to test a change where if the student clicked “start free trial”, they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. The screenshot below shows what the experiment looks like for students allocated to the treatment group:

The unit of analysis cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

The data that you need to analyze is located in data/udacity.csv. It contains the raw information needed to compute any of the metrics we discuss in the questions below. The data is broken down by day.

The definition of each variable is:

Outcome Variable and Baseline Variable Choices

  1. Which of the following metrics (some of which you may need to compute from existing data) would you choose to measure for this experiment and why? For each metric you choose, indicate whether you would use it as an baseline metric or an outcome metric. The minimum detectable effect for each metric is included in parentheses.
    1. Number of cookies: the number of unique cookies to view the course overview page. (Minimum Detectable Effect = 3000)
    2. Number of user-ids: the number of users who enroll in the free trial. (Minimum Detectable Effect = 50)
    3. Number of clicks: That is, number of unique cookies to click the “Start free trial” button (which happens before the free trial screener is trigger). (Minimum Detectable Effect = 240)
    4. Click-through-probability: That is, number of unique cookies to click the “Start free trial” button divided by number of unique cookies to view the course overview page. (Minimum Detectable Effect = 0.01)
    5. Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the “Start free trial” button. (Minimum Detectable Effect = 0.01)
    6. Retention: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (Minimum Detectable Effect = 0.01)
    7. Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the “Start free trial” button. (Minimum Detectable Effect = 0.0075)

Outcomes variables that we want to track include:

  • Gross conversion
  • Net conversion
  • Retention

Baselines (dont change due to experiment):

  • clicks,
  • pageviews
  • click thru (clicks/ cookies)

Determining Sample Size, Duration and Exposure

The following estimates are baseline values for some key numbers from Udacity in the time period before the experiment begins. Note: that these numbers are fictitious.

# Number of unique cookies per day
n_cookies_per_day <- 40000
# Number of unique cookies that click "start free trial" per day
n_cookies_start_trial <- 3200
# Number of enrollments per day
n_enroll <- 660
# Click through probability for start free trial
n_ctr_free <- 0.08
# Probability of enrolling, given click
n_prob_enrol_click <- 0.20625
# Probability of payment, given enrol
n_prob_payment_enroll <- 0.53
# Probability of Payment, given click
n_prob_payment_click <- 0.109313
  1. How many page views will you need to collect to have adequate statistical power in your experiment? Ensure there is enough power for each of your metrics of choice. Use the following values in your analysis, \(alpha = 0.05\), \(\beta = 0.2\)
## Sample size for gross conversion
# start with clicks
n_clicks_test_size <-
    2 *
    round(
    power.prop.test(
    p1 = n_prob_enrol_click,
    p2 = n_prob_enrol_click + 0.01,
    power = 0.8,
    sig.level = 0.05
    )$n
)
# from clicks to pageviews
pageviews_for_sample_gross_conversion <- n_clicks_test_size / n_ctr_free

## Sample size for retention
n_retention_test_size <-
    2 *
    round(
        power.prop.test(
            p1 = n_prob_payment_enroll,
            p2 = n_prob_payment_enroll + 0.01,
            power = 0.8,
            sig.level = 0.05
        )$n
    )
# enroll per pageview
pageviews_for_sample_retention <- round(n_retention_test_size / (n_enroll / n_cookies_per_day))

## Sample size for net conversion
n_conversion_test_size <-
    2 *
    round(
        power.prop.test(
            p1 = n_prob_payment_click,
            p2 = n_prob_payment_click + 0.0075,
            power = 0.8,
            sig.level = 0.05
        )$n
    )
## clicks to pageviews
n_conversion_test_size / n_ctr_free
## [1] 699600
  1. What percentage of Udacity’s traffic would you divert to this experiment? Explain.

  2. Given the percentage you chose, how long would the experiment take to run? If the answer is longer than four weeks, then this is unreasonably long, and you should go back and update your earlier decisions.

## what ratio to divert
# if we rely on conversion and divert 100% traffic:
ceiling(pageviews_for_sample_retention / n_cookies_per_day)
## [1] 119
# too many days -- have to drop this metric

# so rely on gross conversion
# 100 percent traffic, is that realistic no
ceiling(pageviews_for_sample_gross_conversion / n_cookies_per_day)
## [1] 17
# if we do 60% -- 28 days -- ok thats approx a month, seems reasonable
ceiling(pageviews_for_sample_gross_conversion / (0.6*n_cookies_per_day))
## [1] 28

Balance Checks

  1. Load the data into R.
udacity <- read_csv("data/udacity.csv")
## Rows: 74 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): date, treatment
## dbl (4): pageviews, clicks, enrollments, payments
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(udacity)
## Rows: 74
## Columns: 6
## $ date        <chr> "Sat, Oct 11", "Sun, Oct 12", "Mon, Oct 13", "Tue, Oct 14"…
## $ pageviews   <dbl> 7723, 9102, 10511, 9871, 10014, 9670, 9008, 7434, 8459, 10…
## $ clicks      <dbl> 687, 779, 909, 836, 837, 823, 748, 632, 691, 861, 867, 838…
## $ enrollments <dbl> 134, 147, 167, 156, 163, 138, 146, 110, 131, 165, 196, 162…
## $ payments    <dbl> 70, 70, 95, 105, 64, 82, 76, 70, 60, 97, 105, 92, 56, 122,…
## $ treatment   <chr> "control", "control", "control", "control", "control", "co…
  1. Are there any missing values in your data? Make a decision to drop or keep them, and justify your decision.
udacity <- 
    udacity %>%
    na.omit()
  1. Verify the randomization into treatment and control group was successful. If you find evidence that the randomization failed look at the day by day data and see if you can offer any insight into what is causing the problem.
udacity %>%
    select(pageviews, clicks, treatment) %>%
    mutate(ctr = clicks/pageviews) %>%
    st(group = 'treatment', group.test = TRUE)
Summary Statistics
treatment
control
treatment
Variable N Mean SD N Mean SD Test
pageviews 23 9224.478 850.296 23 9189.652 802.03 F=0.02
clicks 23 751.87 77.008 23 750.435 70.323 F=0.004
ctr 23 0.082 0.004 23 0.082 0.003 F=0.025
Statistical significance markers: * p<0.1; ** p<0.05; *** p<0.01

Analysis

  1. Compute the value of the outcome variables you chose above aggregating across all days in the experiment.
agg_df <- udacity %>%
    group_by(treatment) %>%
    summarise(
        enrollment = sum(enrollments),
        click = sum(clicks),
        payment = sum(payments),
        gross_conversion = enrollment/click,
        net_conversion = payment / click
    )

agg_df
## # A tibble: 2 × 6
##   treatment enrollment click payment gross_conversion net_conversion
##   <chr>          <dbl> <dbl>   <dbl>            <dbl>          <dbl>
## 1 control         3785 17293    2033            0.219          0.118
## 2 treatment       3423 17260    1945            0.198          0.113
  1. Produce plots of your data that show how the outcome variables of interest differ between the treatment and control groups.
# for gross conversion
# note i compute for net too, you'd need to change the plot
udacity %>% 
    group_by(treatment) %>%
    summarise(
        enrollment = sum(enrollments),
        click = sum(clicks),
        payment = sum(payments),
        gross_conversion = enrollment/click,
        net_conversion = payment / click,
        std_err_gc =  sqrt(((gross_conversion) * (1 - gross_conversion) / n())),
        std_err_nc =  sqrt(((net_conversion) * (1 - net_conversion) / n()))
    ) %>%
    ggplot() +
    geom_bar(aes(x = treatment, 
                 y = gross_conversion), 
             stat="identity", 
             fill="skyblue", 
             alpha=0.7
             ) +
    geom_errorbar(aes(x = treatment, 
                      ymin = gross_conversion - std_err_gc, 
                      ymax = gross_conversion + std_err_gc), 
                  width = 0.4, 
                  colour = "orange", 
                  alpha=0.9, 
                  size=1.5
                  ) +
    theme_bw() + 
    ggtitle("Gross Conversion") +
    theme(text = element_text(size = 14),
          plot.title = element_text(hjust = 0.5))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
  1. Conduct the appropriate statistical tests to examine the effect of the treatment on your outcome variables using data aggregated across the duration of the experiment.

HINT: Use the prop.test() or t.test() functions.

# gross conversion
prop.test(agg_df$enrollment, agg_df$click)
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  agg_df$enrollment out of agg_df$click
## X-squared = 21.983, df = 1, p-value = 2.751e-06
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.01193170 0.02917804
## sample estimates:
##    prop 1    prop 2 
## 0.2188747 0.1983198
# net conversion
prop.test(agg_df$payment, agg_df$click)
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  agg_df$payment out of agg_df$click
## X-squared = 1.9666, df = 1, p-value = 0.1608
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.001914623  0.011662068
## sample estimates:
##    prop 1    prop 2 
## 0.1175620 0.1126883
  1. Now consider each day as an independent unit of observation. Use a linear regression to test whether the treatment impacts your outcome variables of choice. Interpret the results.
# i do this in logs to gove the percentage impact interpretation
tidy(lm(log(enrollments/clicks) ~ treatment, data = udacity))
## # A tibble: 2 × 5
##   term               estimate std.error statistic  p.value
##   <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          -1.53     0.0433    -35.4  6.00e-34
## 2 treatmenttreatment   -0.107    0.0612     -1.75 8.70e- 2
tidy(lm(log(payments/clicks) ~ treatment, data = udacity))
## # A tibble: 2 × 5
##   term               estimate std.error statistic  p.value
##   <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)         -2.16      0.0582   -37.2   7.16e-35
## 2 treatmenttreatment  -0.0589    0.0823    -0.716 4.78e- 1
  1. Which of two sets of estimates do you prefer? Explain.

This boils down to whether we are interested in total conversions or average conversions.

The regression is computing an conversion metric each day, and then the regression is estimating the average conversion effect, where the average is taken over days of the trial.

The proportions test is using aggregated data and asking what is the total effect on conversions.

The total conversions measure is more interesting than the average in this case.

Recommendations

  1. Based on your analysis, would you launch this feature? Justify your answer.

The screener will help reduce the enrollment, but not enough evidence to show that there will be more students who make the payments.

I would not recommend launching this screener.

Follow Up Experiment

  1. If you wanted to reduce the number of frustrated students who cancel early in the course, what experiment would you try?
    1. Give a brief description of the change you would make,
    2. What your hypothesis would be about the effect of the change,
    3. What metrics you would want to measure, and what unit of analysis you would use.

Include an explanation of each of your choices.

No solution provided.

Many possibilities to discuss in class!