Learning Goals

By the end of this tutorial you will be able to:

  1. Formulate a clear and testable A/B test research question.
  2. Compare and contrast different experimental designs (switchback vs. geographic variation).
  3. Justify the choice of an experimental design based on tradeoffs.
  4. Communicate A/B test results clearly to non-technical stakeholders.
  5. Reflect on how quasi-experimental designs might be used when randomization is constrained.

Instructions to Students

This is a conceptual lab — no coding is required. Please think carefully about each question and write your responses in clear, complete sentences. You may discuss with your peers or instructor as needed.

A/B Testing Case Study: “Weekend Vrij on TikTok”

Nederlandse Spoorwegen (NS) wants to boost sign-ups for their “Weekend Vrij” train discount pass among Dutch young adults (18–30). They plan to test two TikTok ads:

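They are considering two experimental designs: a switchback design, which alternates the two ads over time (daily or hourly), and a geographic variation design, which shows different ads in different regions.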

Clarify the Research Question

  1. Write a version of the research question focused on causality.
     What is the causal effect of showing Ad A versus Ad B on click-through rates among Dutch TikTok users aged 18–30?
  2. Write a version focused on the managerial objective.
     Which TikTok ad (A or B) should NS run to generate more engagement and Weekend Vrij sign-ups among Dutch young adults (18–30)?

Compare the Two Experimental Designs

Fill in the table comparing the Switchback and Geographic Variation designs:

| Dimension | Switchback (daily or hourly) | Geographic Variation |
|---|---|---|
| Internal validity | | |
| External validity | | |
| Spillovers / interference | | |
| Platform constraints | | |
| Ease of implementation | | |
| Data needed | | |
| Dimension | Switchback (daily or hourly) | Geographic Variation |
|---|---|---|
| Internal validity | High: the same audience sees both ads, and time effects can be controlled by balancing switches across days | Depends: risk of pre-existing regional differences |
| External validity | Lower: behavior may differ on NS's main account | Higher: more real-world variation |
| Spillovers / interference | Possible if the same users see both ads across switches (carryover) | Possible if people move across regions |
| Platform constraints | May conflict with TikTok's ad optimization algorithm | Requires consistent geo-targeting control |
| Ease of implementation | Operationally simpler, but frequent switches may disrupt the ad optimization algorithm | Needs precise targeting by region |
| Data needed | Time-stamped views and clicks per ad version | Region-level breakdown of views and clicks |

Which timing would you use in the switchback: daily or hourly?

A daily switchback is preferred. Hourly switching may introduce noise, because TikTok's ad delivery algorithm needs time to stabilize and deliver each ad evenly after a switch.
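Although no coding is required for this lab, a daily switchback schedule can be sketched in a few lines. The example below is a minimal illustration, not NS's or TikTok's actual procedure: the function name `daily_switchback_schedule`, the start date, and the four-week duration are all assumptions made for the example. It balances the two ads within each weekday so that day-of-week effects do not favor either ad.

```python
import random
from datetime import date, timedelta


def daily_switchback_schedule(start, n_weeks, seed=0):
    """Assign Ad A or Ad B to each day, balanced within each weekday.

    Every weekday slot (first day of the week, second day, ...) gets
    "A" in half of the weeks and "B" in the other half, in random order.
    n_weeks must be even so the balance is exact.
    """
    if n_weeks % 2 != 0:
        raise ValueError("n_weeks must be even to balance ads within weekdays")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = {}
    for weekday_offset in range(7):
        arms = ["A", "B"] * (n_weeks // 2)
        rng.shuffle(arms)
        for week, arm in enumerate(arms):
            day = start + timedelta(days=7 * week + weekday_offset)
            schedule[day] = arm
    return dict(sorted(schedule.items()))


# Hypothetical start date and duration, chosen only for illustration.
schedule = daily_switchback_schedule(date(2025, 1, 6), n_weeks=4)
for day, arm in schedule.items():
    print(day.isoformat(), day.strftime("%a"), arm)
```

Randomizing within weekdays, rather than strictly alternating A and B, is one design choice among several; strict alternation is simpler but ties each ad to fixed weekdays whenever the number of days per cycle is even.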

Recommend a Design

Choose and justify a design (and switch timing if relevant).

We recommend the daily switchback design.

It has strong internal validity, avoids regional confounds, and can be implemented on a single account. It lets us make clean causal comparisons while keeping ad delivery relatively stable.

If regional strategy is more important, we might recommend geographic variation instead, especially with stratified randomization across similar provinces.
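One simple form of stratified randomization is matched-pair assignment: regions are ranked on a pre-campaign metric, adjacent regions are paired, and a coin flip decides which member of each pair gets which ad. The sketch below illustrates that idea under stated assumptions. The function name `matched_pair_assignment`, the region labels, and the baseline rates are all hypothetical placeholders, not real NS or TikTok data.

```python
import random


def matched_pair_assignment(baseline, seed=0):
    """Pair regions with similar baseline rates, then randomize within pairs.

    baseline: dict mapping region name -> pre-campaign metric
              (for example, a historical sign-up rate).
    Returns a dict mapping region name -> "A" or "B".
    """
    if len(baseline) % 2 != 0:
        raise ValueError("need an even number of regions to form pairs")
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ranked = sorted(baseline, key=baseline.get)  # lowest to highest baseline
    assignment = {}
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # coin flip within the pair
        assignment[pair[0]] = "A"
        assignment[pair[1]] = "B"
    return assignment


# Made-up baseline rates for illustration only.
baseline = {
    "region_1": 0.011, "region_2": 0.019, "region_3": 0.012,
    "region_4": 0.020, "region_5": 0.015, "region_6": 0.016,
}
assignment = matched_pair_assignment(baseline)
for region in sorted(assignment):
    print(region, assignment[region])
```

Because each pair contributes one region to each ad, the two groups start out with similar baseline rates, which reduces the risk of pre-existing regional differences driving the result.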

Pitch Your Result to Stakeholders

Explain what you found and what NS should do in 2–3 simple sentences.

We tested two TikTok ad versions across alternating days and found that Ad B (sustainability-focused) drove XX% more clicks than Ad A.

We recommend using Ad B as the primary creative for the campaign targeting Dutch young adults, since its sustainability message appears to resonate with this audience.
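The "XX% more clicks" figure in a pitch like this is a relative lift. The sketch below shows one way it could be computed from daily switchback data, treating each day as the unit of analysis. It is a simplified illustration: the function name `relative_lift` and the view and click counts are placeholders invented for the example, and a real analysis would also report uncertainty around the estimate.

```python
from statistics import mean


def relative_lift(daily_records):
    """Compute the relative lift of Ad B over Ad A.

    daily_records: list of (arm, views, clicks) tuples, one per day.
    Returns (ctr_a, ctr_b, lift), where each CTR is the mean of the daily
    click-through rates for that ad and lift = (ctr_b - ctr_a) / ctr_a.
    """
    ctr = {}
    for arm in ("A", "B"):
        rates = [clicks / views for a, views, clicks in daily_records if a == arm]
        ctr[arm] = mean(rates)
    lift = (ctr["B"] - ctr["A"]) / ctr["A"]
    return ctr["A"], ctr["B"], lift


# Placeholder counts for illustration only; these are not real results.
records = [
    ("A", 1000, 20), ("B", 1000, 25),
    ("A", 2000, 40), ("B", 2000, 50),
]
ctr_a, ctr_b, lift = relative_lift(records)
print(f"Ad A CTR: {ctr_a:.2%}, Ad B CTR: {ctr_b:.2%}, relative lift: {lift:.1%}")
```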

When You Can’t Randomize the Randstad

Suppose TikTok doesn’t allow NS to randomize in the Randstad. How could you still run a valid test?

If randomization is not possible in the Randstad, we could use a difference-in-differences design.

We would show the ad in the Randstad (treatment) but not in the non-Randstad regions (control), and compare the change in engagement rates from before to after the launch across the two groups, assuming their trends would have been parallel without the ad.

This allows us to estimate the causal impact of the ad even without randomization, though the parallel-trends assumption should be checked carefully against pre-launch data.
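The difference-in-differences logic reduces to simple arithmetic on four group averages: the change in the treated group minus the change in the control group. The sketch below illustrates this, along with a basic look at pre-launch gaps between the groups. The function names `diff_in_diff` and `pre_period_gaps` and all numbers are hypothetical placeholders, not real engagement data.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four group averages.

    Subtracting the control group's change removes any shift that
    affected both groups, leaving the change attributed to the ad
    (valid only if the groups would otherwise have moved in parallel).
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


def pre_period_gaps(treated_series, control_series):
    """Treated-minus-control gap for each pre-launch period.

    Roughly constant gaps are consistent with parallel pre-trends;
    gaps that widen or narrow are a warning sign.
    """
    return [t - c for t, c in zip(treated_series, control_series)]


# Placeholder engagement rates for illustration only.
effect = diff_in_diff(
    treated_pre=0.020, treated_post=0.030,   # Randstad (treatment)
    control_pre=0.018, control_post=0.021,   # non-Randstad (control)
)
gaps = pre_period_gaps([0.019, 0.020, 0.020], [0.017, 0.018, 0.018])
print(f"DiD estimate: {effect:.4f}")
print("Pre-period gaps:", [round(g, 4) for g in gaps])
```

Inspecting pre-period gaps is only an informal check. It cannot prove that the trends would have stayed parallel after the launch, which is why the assumption still needs careful justification.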