By the end of this tutorial you will be able to:
This is a conceptual lab — no coding is required. Please think carefully about each question and write your responses in clear, complete sentences. You may discuss with your peers or instructor as needed.
Nederlandse Spoorwegen (NS) wants to boost sign-ups for their “Weekend Vrij” train discount pass among Dutch young adults (18–30). They plan to test two TikTok ads:
They are considering two experimental designs:
- Write a version of the research question focused on causality.
- Write a version focused on the managerial objective.
Fill in the table comparing the Switchback and Geographic Variation designs:
| Dimension | Switchback (daily or hourly) | Geographic Variation |
|---|---|---|
| Internal validity | | |
| External validity | | |
| Spillovers / interference | | |
| Platform constraints | | |
| Ease of implementation | | |
| Data needed | | |
| Dimension | Switchback (daily or hourly) | Geographic Variation |
|---|---|---|
| Internal validity | High: strong control over time effects | Depends: risk of pre-existing regional differences |
| External validity | Lower: behavior may differ on NS’s main account | Higher: more real-world variation |
| Spillovers / interference | Possible if users see both ads across switches | Possible if people move across regions |
| Platform constraints | May conflict with TikTok’s ad optimization algorithm | Requires consistent geo-targeting control |
| Ease of implementation | Operationally simpler, but may confuse the algorithm | Needs precise targeting by region |
| Data needed | Time-stamped views and clicks per ad version | Region-level breakdown of views and clicks |
Which timing would you use in the switchback: daily or hourly?
Choose and justify a design (and switch timing if relevant).
We recommend the daily switchback design.
It has strong internal validity, avoids regional confounds, and can be implemented on a single account. It lets us make clean causal comparisons while keeping ad delivery relatively stable.
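Although the lab itself requires no coding, the daily switchback can be made concrete with a short optional sketch. It assumes the ad is randomized within consecutive pairs of days and that the outcome is the daily click-through rate; the function names and all numbers are hypothetical and used only for illustration.

```python
import random
from statistics import mean


def assign_days(n_days, seed=0):
    """Assign ad 'A' or 'B' to each day, randomizing the order within
    each consecutive pair of days so both ads get equal exposure.
    If n_days is odd, the final unpaired day is left out."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_days // 2):
        pair = ["A", "B"]
        rng.shuffle(pair)  # random order within the pair
        schedule.extend(pair)
    return schedule


def ctr_difference(schedule, clicks, views):
    """Difference in mean daily click-through rate, Ad B minus Ad A."""
    ctr = {"A": [], "B": []}
    for ad, c, v in zip(schedule, clicks, views):
        ctr[ad].append(c / v)
    return mean(ctr["B"]) - mean(ctr["A"])


# Hypothetical two-week schedule (illustrative only).
schedule = assign_days(14, seed=42)

# Made-up data for four days, NOT real NS or TikTok results.
example_schedule = ["A", "B", "B", "A"]
example_clicks = [10, 30, 20, 20]
example_views = [1000, 1000, 1000, 1000]
diff = ctr_difference(example_schedule, example_clicks, example_views)
```

Randomizing within pairs of days keeps the two ads balanced over time, which is one way to guard against a trend or a weekday effect favoring one ad.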
If regional strategy is more important, we might recommend geographic variation instead, especially with stratified randomization across similar provinces.
Explain what you found and what NS should do in 2–3 simple sentences.
We tested two TikTok ad versions across alternating days and found that Ad B (sustainability-focused) drove XX% more clicks than Ad A.
We recommend using Ad B as the primary creative for the campaign targeting Dutch young adults, especially as sustainability resonates with this audience.
Suppose TikTok doesn’t allow NS to randomize in the Randstad. How could you still run a valid test?
If randomization is not possible in the Randstad, we could use a difference-in-differences design.
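We would compare the change in engagement rates from before to after the ad launch in the Randstad (treatment) with the change over the same period in non-Randstad regions (control), assuming the two would have followed parallel trends without the ad.
This allows us to estimate the causal impact of the ad even without randomization, though the parallel-trends assumption must be checked carefully, for example by comparing the regions' trends before the launch.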
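The difference-in-differences logic can be illustrated with an optional sketch. It assumes we have average engagement rates for each group before and after the launch; the function names and all numbers are hypothetical, not results from an actual campaign.

```python
def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences: the change in the treatment group
    minus the change in the control group over the same period."""
    return (treat_post - treat_pre) - (control_post - control_pre)


def pre_period_gaps(treat_series, control_series):
    """Gap between the groups in each pre-launch period. Roughly
    constant gaps are consistent with parallel pre-trends."""
    return [t - c for t, c in zip(treat_series, control_series)]


# Hypothetical average engagement rates (illustrative only).
effect = did_estimate(
    treat_pre=0.020, treat_post=0.030,      # Randstad
    control_pre=0.018, control_post=0.022,  # non-Randstad
)

# Hypothetical weekly pre-launch rates for an informal pre-trend check.
gaps = pre_period_gaps([0.019, 0.020, 0.021], [0.017, 0.018, 0.019])
```

In this made-up example the Randstad rate rises by 0.010 and the control rate by 0.004, so the estimated effect is the remaining 0.006. A regression with region and period indicators would give the same point estimate and also provide standard errors.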