Online Reputation Management

This exercise studies online reputation management through firms’ use of public comments in response to online reviews. The content is based on the article “Online reputation management: Estimating the impact of management responses on consumer reviews” by Proserpio and Zervas, published in Marketing Science in 2017 and available in our course readings. You should read through the article before answering these questions. The paper investigates the relationship between a hotel’s use of management responses and its online reputation (measured by the star rating, stars), and aims to establish a causal relationship from the use of management responses to online reputation. Your goal in this exercise is to explain key arguments and replicate selected results from this paper. The data for this exercise is located in data/responses.dta.1

For this exercise you might need the following packages:

library(haven)
library(dplyr)
library(tidyr)
library(fixest)
library(purrr)
library(broom)
library(modelsummary)
  1. Explain why firms might use public responses to reviews to manage their online reputation. Should they respond relatively more to positive or negative reviews? Explain why.

Key points:

  • 0.5 pts: Reviews are becoming a more important component in a consumer’s decision-making process
  • 0.5 pts: Students will cite numbers from the Week 3 lecture slides
  • 1 pt: Reviews are important because they provide a way to trust that hotel statements align with reality
  • 1 pt: Public responses offer a legal way to manage reputation, by either dealing with negative complaints or thanking customers for positive reviews, and the responses are available to future consumers to use in their decision making
  • 1 pt: Cite the paper we discuss in lecture (Chen et al.), which shows that a short response to positive reviews and a longer response to negative reviews seems to be the right balance, but also remark that these findings apply to future review valence and volume, not to consumer star ratings.
  1. Load the data for this exercise and name it hotels_orm. For this exercise you will only need the rows where xplatform_dd_obs = 1. Keep only the columns hotel_id, year, stars, after, ta_dummy, first_response, cum_avg_stars_lag, log_count_reviews_lag, t, ash_interval, traveler_type.
hotels_orm <- 
    # 0.5 pt
    read_stata("data/responses.dta") %>% 
    # 1 pt
    filter(xplatform_dd_obs == 1) %>%
    # 0.5pt
    select(hotel_id, year, stars, after,
           first_response, cum_avg_stars_lag, 
           log_count_reviews_lag, ta_dummy, 
           t, ash_interval, traveler_type
        )
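A quick, optional sanity check that the filter and column selection behaved as expected (glimpse() is re-exported by dplyr):

# not graded: confirm 11 columns remain and count the hotels in the sample
glimpse(hotels_orm)
n_distinct(hotels_orm$hotel_id)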
  1. Proserpio & Zervas’ empirical exercise uses what they call a ‘cross-platform’ difference in differences. Using your own words, explain their idea conceptually and justify why it is valid. You can use equations or figures, but do so sparingly. (max 7 sentences).

Key points:

  • [1pt] DiD needs treatment and control groups in before and after phase.
  • [1pt] Since a manager who responds to a review on TripAdvisor does so site-wide, there is no clear control group internal to the site.
  • [1pt] The authors use Expedia as a control group because Expedia did not allow (or hotels did not utilize) management responses throughout the time period; this requires assuming everything else on Expedia stays constant (or does not impact ratings).
  • [1pt] The authors then construct a “matched control” for a hotel’s ratings on TripAdvisor using that same hotel’s ratings on Expedia.
  • [1pt] Then, if we think of a simple DiD setup that generates four means:
    • Treatment, Before => star rating on TripAdvisor before responses begin
    • Treatment, After => star rating on TripAdvisor after responses begin
    • Control, Before => star rating on Expedia before management responses begin on TA
    • Control, After => star rating on Expedia after management responses begin on TA

we get a cross-platform DiD.
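Putting these four means together, the cross-platform DiD estimate is

\(\hat{\delta} = \left(\overline{Stars}^{TA}_{After} - \overline{Stars}^{TA}_{Before}\right) - \left(\overline{Stars}^{Exp}_{After} - \overline{Stars}^{Exp}_{Before}\right)\)

where \(\overline{Stars}\) denotes the average star rating for the indicated platform and period.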
  1. Explain what the ‘parallel trends’ assumption is. Why is it important in this application? Which figure (if any) provides support for the parallel trends assumption?

Key Points:

  • [1pt] Parallel trends: in the absence of treatment, the difference between the ‘treatment’ and ‘control’ groups is constant over time.
  • [1pt] In context, this means that if no management responses took place, we’d expect the ratings difference between TA and Expedia to stay the same over time.
  • [1pt] If this is not true, any pre-existing divergence gets washed into the DiD estimate.
  • [1pt] Particularly salient here because there cannot be an evolving difference in ratings between platforms. This may occur if, for example, users of the two sites have different tastes.
  • [1pt] Fig. 6 provides some evidence that, over the year before a firm’s management responses start on TA, the difference remains constant; a rough way to eyeball this in our data is sketched below.
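One rough visual check of pre-trends using these data (a sketch only: it assumes ggplot2 is installed, and it plots calendar time t rather than the event time used in the paper’s Fig. 6):

library(ggplot2)

hotels_orm %>%
    # keep only reviews from before management responses begin
    filter(first_response == 0) %>%
    group_by(t, ta_dummy) %>%
    summarize(avg_stars = mean(stars), .groups = "drop") %>%
    ggplot(aes(x = t, y = avg_stars,
               color = factor(ta_dummy, labels = c("Expedia", "TripAdvisor")))) +
    geom_line() +
    labs(x = "Calendar month (t)", y = "Average stars", color = "Platform")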
  1. What is the ‘Ashenfelter dip’? Why do the authors believe they see a pattern akin to an Ashenfelter dip in their application?

Key Points:

  • [1pt] Ashenfelter dip: hotel ratings on TA are systematically different from Expedia in the period immediately prior to management responses starting.
    • Good students might further talk about transient endogeneity (and define it), but this is not needed.
  • [1pt] If one ignores the dip, one will overstate the treatment effect due to mean reversion.
  • [1pt] Fig. 6 shows that in the 30 days prior to MR starting there is a dip in TA ratings relative to Expedia, which looks like this kind of dip.
  1. First, let’s compute the ‘simple’ Difference in Differences estimate using differences in means.
  (a) Create a data frame with two rows and two columns, where the rows take the values of first_response = 0 or 1 and the columns take the values of ta_dummy = 0 or 1. The values in the data frame should be the respective group means of stars.
  (b) Compute the difference between first_response = 1 and first_response = 0 for each of ta_dummy = 0 and ta_dummy = 1.
  (c) Compute the difference between the two values in (b) to get your simple DiD estimate. What is the estimate that you get?
# answers (a) - 1 pt
pvt_tbl <- hotels_orm %>%
            group_by(ta_dummy, first_response) %>%
            summarize(stars = mean(stars), .groups = "drop") %>%
            pivot_wider(names_from = ta_dummy, values_from = stars)
pvt_tbl %>%
    # answers (b) - 1 pt
    mutate(across(everything(), ~ .x - lag(.x))) %>%
    # answers (c) - 1pt
    mutate(did_simple = `1` - `0`) %>%
    # prints the answer
    na.omit() %>%
    select(did_simple)
## # A tibble: 1 × 1
##   did_simple
##        <dbl>
## 1      0.302

The authors use the following regression equation to estimate the difference-in-differences effect of online responses on online reputation:

\(Stars_{ijt} = \beta_1 After_{ijt} + \beta_2 TripAdvisor_{ij} + \delta After_{ijt} \times TripAdvisor_{ij} + X_{ijt}\gamma + \alpha_j + \tau_t + \varepsilon_{ijt}\)

where \(Stars_{ijt}\) is the star rating of review \(i\) for hotel \(j\) in calendar month \(t\), \(After_{ijt}\) is an indicator for reviews (on either platform) submitted after hotel \(j\) started responding, \(TripAdvisor_{ij}\) is an indicator for TripAdvisor ratings, \(X_{ijt}\) are control variables, \(\alpha_j\) are hotel fixed effects, \(\tau_t\) are calendar-month fixed effects, and \(\varepsilon_{ijt}\) is the error term.

The relationship between the variables in the equation above and the variables in the dataset is:2

  • \(Stars_{ijt}\) => stars
  • \(After_{ijt}\) => first_response
  • \(TripAdvisor_{ij}\) => ta_dummy
  • \(After_{ijt} \times TripAdvisor_{ij}\) => after (despite its name, this is the interaction term, so its coefficient is the DiD estimate \(\delta\))
  • \(\alpha_j\) => hotel_id fixed effects; \(\tau_t\) => t (calendar-month) fixed effects

  1. Explain why \(\delta\), i.e. the coefficient for the interaction term \(After_{ijt} \times TripAdvisor_{ij}\) is the difference in difference estimate from the regression (i.e. the coefficient you want to estimate).

Key points:

[1pt]
The easiest way to do this is to compute the four means implied by the regression model (it is OK to ignore \(X\) and simplify the time fixed effects), build the DiD table, and show that \(\delta\) is the difference in differences.

Without loss of generality, set \(X_{ijt}\), \(\alpha_j\) and \(\tau_t\) to zero for all hotels and all time periods.

[2 pts] Then

  • E(Stars | TripAdvisor = 0, After = 0) = 0
  • E(Stars | TripAdvisor = 1, After = 0) = \(\beta_2\)
  • E(Stars | TripAdvisor = 0, After = 1) = \(\beta_1\)
  • E(Stars | TripAdvisor = 1, After = 1) = \(\beta_1 + \beta_2 + \delta\)

so the before-after change within each group is

  • E(Stars | TripAdvisor = 0, After = 1) − E(Stars | TripAdvisor = 0, After = 0) = \(\beta_1\)
  • E(Stars | TripAdvisor = 1, After = 1) − E(Stars | TripAdvisor = 1, After = 0) = \(\beta_1 + \delta\)

Now if you subtract the former from the latter you get \(\delta\).

NOTE: Wordy versions of this would be OK if they completely articulate the argument. If not, deduct points for each missed part of the argument.
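A quick numerical check of this argument, assuming the variable mapping given above (so that first_response:ta_dummy reproduces the after indicator): in the fully saturated 2×2 regression, the interaction coefficient equals the simple DiD computed from the table of means earlier.

# sketch: the interaction coefficient should reproduce the simple DiD (0.302)
feols(stars ~ first_response * ta_dummy,
      data = hotels_orm, cluster = ~hotel_id) %>%
    tidy() %>%
    filter(term == "first_response:ta_dummy")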
  1. Now, let’s replicate Table 4 of Proserpio and Zervas. Estimate the model outlined above using the fixest package. In particular, produce four regression models:
  1. model_1 should be the regression equivalent of the simple DiD computed above, using the whole dataset. This estimate is not in Table 4 of Proserpio and Zervas.
  2. model_2 should extend model_1 by adding the fixed effects, still using the whole dataset.
  3. model_3 should use the same estimating equation as model_2 but correct for the Ashenfelter dip.
  4. model_4 should augment model_2 by adding the variables cum_avg_stars_lag and log_count_reviews_lag to \(X_{ijt}\), and correct for the Ashenfelter dip.

For each model, standard errors should be clustered by hotel_id.

Use the following starter code for estimating each regression model:

model_x <- feols(YOUR_CODE ~ YOUR_CODE + 
                    # t:ta_dummy is the platform-specific linear time 
                    # trend they mention in the notes of Table 4;
                    # you need this to get their estimates in models 2 thru 4
                    t:ta_dummy
                    |
                    # put any additional fixed effects here (if you need them)
                    YOUR_CODE,
                    data = YOUR_CODE,
                    cluster = ~ YOUR_CODE # what variable denotes the clusters 
                                          # for the standard errors
                    )
# 1 pt per correct model

ash_dip <- hotels_orm %>% 
           filter(ash_interval == 0)

model_1 <- feols(stars ~ after + first_response + ta_dummy,
                data = hotels_orm, 
                cluster = ~hotel_id
                )

model_2 <- feols(stars ~ after + first_response + ta_dummy +  t:ta_dummy
                 | 
                t + hotel_id,
                data = hotels_orm, 
                cluster = ~hotel_id
                )

model_3 <- feols(stars ~ after + first_response + ta_dummy +  t:ta_dummy
                 | 
                t + hotel_id,
                data = ash_dip,
                cluster = ~hotel_id
                )

model_4 <- feols(stars ~ after + first_response + ta_dummy +
                cum_avg_stars_lag + log_count_reviews_lag  + t:ta_dummy
                |
                t + hotel_id,
                data = ash_dip,
                cluster = ~hotel_id
                )
## NOTE: 3,368 observations removed because of NA values (RHS: 3,368).
  1. Report your estimates.
# 0.5 pt per correct model
tidy(model_1, conf.int = TRUE)
## # A tibble: 4 × 7
##   term           estimate std.error statistic  p.value conf.low conf.high
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)      4.10      0.0207    198.   0          4.06      4.14  
## 2 after            0.302     0.0258     11.7  1.26e-30   0.252     0.353 
## 3 first_response  -0.0346    0.0228     -1.52 1.29e- 1  -0.0793    0.0100
## 4 ta_dummy        -0.322     0.0242    -13.3  1.57e-38  -0.369    -0.275
tidy(model_2, conf.int = TRUE)
## # A tibble: 4 × 7
##   term           estimate std.error statistic  p.value conf.low conf.high
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 after           0.149    0.0207       7.21  8.37e-13  0.108     0.189  
## 2 first_response -0.00532  0.0118      -0.450 6.53e- 1 -0.0285    0.0179 
## 3 ta_dummy       -1.01     0.0493     -20.4   4.77e-83 -1.10     -0.909  
## 4 ta_dummy:t      0.00522  0.000434    12.0   3.73e-32  0.00437   0.00608
tidy(model_3, conf.int = TRUE)
## # A tibble: 4 × 7
##   term           estimate std.error statistic  p.value conf.low conf.high
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 after           0.123    0.0224       5.49  4.59e- 8  0.0790    0.167  
## 2 first_response -0.0116   0.0128      -0.913 3.61e- 1 -0.0367    0.0134 
## 3 ta_dummy       -1.03     0.0508     -20.2   8.01e-82 -1.13     -0.927  
## 4 ta_dummy:t      0.00556  0.000454    12.2   3.73e-33  0.00467   0.00645
tidy(model_4, conf.int = TRUE)
## # A tibble: 6 × 7
##   term                 estimate std.error statistic   p.value conf.low conf.high
##   <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
## 1 after                 0.0970   0.0186       5.20  2.18e-  7  0.0604    0.134  
## 2 first_response       -0.00270  0.0111      -0.243 8.08e-  1 -0.0245    0.0191 
## 3 ta_dummy             -0.803    0.0439     -18.3   1.11e- 68 -0.889    -0.717  
## 4 cum_avg_stars_lag     0.288    0.0109      26.5   1.04e-130  0.267     0.309  
## 5 log_count_reviews_l… -0.00339  0.00491     -0.691 4.89e-  1 -0.0130    0.00623
## 6 ta_dummy:t            0.00473  0.000393    12.1   3.39e- 32  0.00396   0.00550
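With modelsummary (loaded above), the four models can also be presented side by side in the style of Table 4; the column labels below are our own:

modelsummary(
    list("(1)" = model_1, "(2)" = model_2,
         "(3)" = model_3, "(4)" = model_4),
    coef_omit = "Intercept",        # only model_1 estimates an intercept
    stars     = TRUE,
    gof_map   = c("nobs", "r.squared")
)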
  1. Interpret the value of the coefficients you deem most important. Are these effects significant from a marketing perspective (i.e., should they shape marketing practice)? (max. 7 sentences)

Key points:

  • [1pt] The only important coefficient here is \(\delta\).
  • [1pt] Model 4 is the preferred model since it has the right set of controls.
  • [0pt] Given the dataset’s variable names, \(\delta\) is the coefficient on after, which is 0.097 in model 4.
  • [1pt] Interpretation: after adopting management responses, star ratings on TA increase by 0.097 on average.
  • [1pt] From Table 2 we see the average rating is 3.6, so this is roughly a 0.1/3.6 ≈ 3 percent increase in ratings. This feels small, and the authors don’t do a great job contextualizing it (where does that move a hotel in the distribution? What fraction of the standard deviation in ratings is this?).
  1. Explain why management responses to reviews can lead to improved hotel ratings. Can you support this argument using any of the results in Proserpio and Zervas’ work? Be specific as to which results you can use and how they support your arguments. (max. 10 sentences).

Key points:

Authors have a whole section on this: “Why do management responses affect hotel ratings?”

[1pt per factor] Three factors might explain the increase in ratings after management started responding to reviews:

(i.) the increased cost of leaving a negative review, 
(ii.) the increased benefit of leaving a positive review, and 
(iii.) the possibly higher ratings of returning customers.
- (i) Table 15, Cols 2-4, shows that longer reviews accompany bad star ratings, as reviewers need to defend their views.
- (ii) Table 14, Col 1, shows an increase in reviews after MR starts, suggesting this might be the case.
- (iii) No obvious evidence; the authors argue the data is not rich enough.
  1. Apart from the “cross-platform design”, the Difference in Differences strategy adopted by Proserpio and Zervas differs along one other important dimension from the traditional design. What dimension is this? What effects might it have on the results and their interpretability?

Key points:

  • [3pts] The traditional DiD design assumes the treatment start date is the same for all treated units, which would imply that all hotels start responding on the same date. This doesn’t happen here; instead, adoption is staggered, which muddies the interpretation.
  • [2pts] Also notice that starting to respond to reviews is a decision made by the hotels themselves, so there is some selection on unobservables going on, meaning treatment and the error term might not satisfy conditional mean independence.

  1. Note that the data is stored as a .dta format (i.e. a Stata dataset). We use the package haven with its read_stata() command to load a Stata dataset.↩︎

  2. This mapping is not immediately obvious, and is one of the (small) perils of using a dataset that one hasn’t constructed oneself from scratch. We hope this clarifies which variables need to be included in the regression.↩︎