Title

Learning Goals

By the end of this tutorial you will be able to:

Formulate a clear and testable A/B test research question.
Compare and contrast different experimental designs (switchback vs. geographic variation).
Justify the choice of an experimental design based on tradeoffs.
Communicate A/B test results clearly to non-technical stakeholders.
Reflect on how quasi-experimental designs might be used when randomization is constrained.

Instructions to Students

This is a conceptual lab — no coding is required. Please think carefully about each question and write your responses in clear, complete sentences. You may discuss with your peers or instructor as needed.

A/B Testing Case Study: “Weekend Vrij on TikTok”

Nederlandse Spoorwegen (NS) wants to boost sign-ups for its “Weekend Vrij” train discount pass among Dutch young adults (18–30). It plans to test two TikTok ads:

Ad A: The current ad, which focuses on freedom and spontaneity.
Ad B: The new ad, which focuses on sustainability and savings.

They are considering two experimental designs:

Switchback Design: Alternate the two ads on the same TikTok account, switching either every day or every hour.
Geographic Variation Design: Show Ad A in Randstad provinces and Ad B in non-Randstad provinces using TikTok’s geo-targeting.

Clarify the Research Question

Write a version of the research question focused on causality.

solution

What is the causal effect of showing Ad A versus Ad B on click-through rates among Dutch TikTok users aged 18–30?

Write a version focused on the managerial objective.

solution

Which TikTok ad (A or B) generates higher engagement and conversions among Dutch young adults?

Compare the Two Experimental Designs

Fill in the table comparing the Switchback and Geographic Variation designs:

Dimension	Switchback (daily or hourly)	Geographic Variation
Internal validity
External validity
Spillovers / interference
Platform constraints
Ease of implementation
Data needed

solution

Dimension	Switchback (daily or hourly)	Geographic Variation
Internal validity	High: the same audience sees both ads, so regional differences cancel out, but day-of-week effects and carryover must be managed	Depends: risk of pre-existing regional differences
External validity	Lower: behavior may differ on NS’s main account	Higher: more real-world variation
Spillovers / interference	Possible if users see both ads across switches (carryover)	Possible if people travel or move between regions
Platform constraints	May conflict with TikTok’s ad optimization algorithm	Requires consistent geo-targeting control
Ease of implementation	Operationally simpler, but frequent switches may disrupt the delivery algorithm	Needs precise targeting by region
Data needed	Time-stamped views and clicks per ad version	Region-level breakdown of views and clicks

Which timing would you use in the switchback: daily or hourly?

solution

A daily switchback is preferred. Hourly switching may introduce noise because TikTok’s delivery algorithm needs time to stabilize and deliver each ad evenly.
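
Although no coding is required for this lab, a daily switchback schedule is easy to sketch. The example below is only an illustration: the function name, the start date, and the 14-day run length are assumptions, not part of the NS case. It alternates the two ads each day and then counts how often each ad lands on each weekday.

```python
from collections import Counter
from datetime import date, timedelta


def daily_switchback_schedule(start, n_days, first_ad="A"):
    """Return a list of (date, ad) pairs that alternates the two ads daily."""
    ads = ("A", "B") if first_ad == "A" else ("B", "A")
    return [(start + timedelta(days=i), ads[i % 2]) for i in range(n_days)]


# Hypothetical start date (a Monday) and a hypothetical 14-day run.
schedule = daily_switchback_schedule(date(2025, 5, 5), 14)

# Count how often each ad is shown on each weekday (0 = Monday).
weekday_counts = Counter((day.weekday(), ad) for day, ad in schedule)

for day, ad in schedule:
    print(day.isoformat(), day.strftime("%a"), ad)
```

Because a week has an odd number of days, strict daily alternation over 14 days shows each ad exactly once on every weekday, which helps keep day-of-week effects from being confused with the ad effect.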

Recommend a Design

Choose and justify a design (and switch timing if relevant).

solution

We recommend the daily switchback design.

It has strong internal validity, avoids regional confounds, and can be implemented on a single account. It lets us make clean causal comparisons while keeping ad delivery relatively stable.

If regional strategy is more important, we might recommend geographic variation instead, especially with stratified randomization across similar provinces.
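
One possible way to implement stratified randomization across similar provinces is to pair provinces on a background characteristic and randomize within each pair. The sketch below assumes a single matching variable; the function name, the choice of provinces, and the covariate values are made up for illustration and are not real data.

```python
import random


def stratified_assignment(provinces, covariate, seed=0):
    """Pair provinces with similar covariate values, then randomize within pairs.

    With an odd number of provinces, the last one in the ranking stays unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(provinces, key=lambda p: covariate[p])
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random order decides which province gets which ad
        assignment[pair[0]] = "A"
        assignment[pair[1]] = "B"
    return assignment


# Hypothetical share of 18-30 year olds per province (illustrative values only).
young_adult_share = {
    "Groningen": 0.21,
    "Friesland": 0.15,
    "Drenthe": 0.13,
    "Overijssel": 0.17,
    "Limburg": 0.16,
    "Zeeland": 0.14,
}

assignment = stratified_assignment(list(young_adult_share), young_adult_share, seed=42)
print(assignment)
```

Within each pair, one province receives Ad A and the other Ad B, so the two groups are balanced on the matching variable by construction.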

Pitch Your Result to Stakeholders

Explain what you found and what NS should do in 2–3 simple sentences.

solution

We tested two TikTok ad versions across alternating days and found that Ad B (sustainability-focused) drove XX% more clicks than Ad A.

We recommend using Ad B as the primary creative for the campaign targeting Dutch young adults, especially as sustainability resonates with this audience.
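
The "XX% more clicks" figure in a pitch like this is a relative lift in click-through rate. As a minimal sketch, the code below computes that lift and a simple two-proportion z-test. The function names and the counts are hypothetical. The test also treats every view as independent, which is an assumption: in a switchback, views within the same day are related, so a day-level analysis would be more cautious.

```python
import math


def relative_lift(clicks_a, views_a, clicks_b, views_b):
    """Relative difference in click-through rate of Ad B compared with Ad A."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    return (ctr_b - ctr_a) / ctr_a


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rates (pooled SE)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# Hypothetical counts, for illustration only.
lift = relative_lift(200, 10_000, 250, 10_000)
z, p_value = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"Relative lift: {lift:.0%}, z = {z:.2f}, p = {p_value:.3f}")
```

For a non-technical audience, the lift is usually the number to lead with; the test result mainly tells the analyst whether the difference could plausibly be noise.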

When You Can’t Randomize the Randstad

Suppose TikTok doesn’t allow NS to randomize in the Randstad. How could you still run a valid test?

solution

If randomization is not possible in the Randstad, we could use a difference-in-differences design.

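We would compare engagement rates before and after showing the new ad in both the Randstad (treatment) and non-Randstad (control) regions. The key assumption is parallel trends: without the new ad, engagement in the two regions would have moved in parallel.

This allows us to estimate the causal impact of the ad even without randomization, though the parallel-trends assumption must be checked carefully, for example by comparing the two regions' trends in the period before the ad launched.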
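
The difference-in-differences estimate itself is simple arithmetic: the change in the treated region minus the change in the control region. The sketch below assumes daily engagement rates (in percent) are available for both regions before and after the launch; the function name and all numbers are hypothetical.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences: treated change minus control change."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical daily engagement rates in percent (illustrative values only).
randstad_pre = [2.0, 2.1, 1.9]
randstad_post = [2.6, 2.7, 2.5]
non_randstad_pre = [1.8, 1.9, 1.7]
non_randstad_post = [2.0, 2.1, 1.9]

effect = did_estimate(randstad_pre, randstad_post, non_randstad_pre, non_randstad_post)
print(f"Estimated effect: {effect:.2f} percentage points")
```

Subtracting the control region's change removes any shift that affected both regions at the same time, such as a holiday or a platform-wide change, which is why the parallel-trends assumption matters.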

Title

John Doe

06 May, 2025
