Motivation

Linear regression is a workhorse model in a Marketing Analyst’s toolkit. It lets the analyst describe patterns in data, predict the value of marketing metrics, and potentially make causal claims about the relationships between multiple variables.

In this tutorial you will apply linear regression to get first-hand experience with these tools. We will focus both on how to run linear regressions in R and on how to correctly interpret the results. You will use linear regression to evaluate the association between product characteristics and product price in an internet-mediated market.

Learning Goals

By the end of this tutorial you will be able to:

  1. Estimate Single and Multiple Regression models with R.
  2. Interpret regression coefficients.
  3. Discuss likely biases in regression coefficients due to omitted variable bias.
  4. Present regression coefficients in a table and in a plot.

Instructions to Students

These lab assignments are not graded, but we encourage you to invest time and effort into working through them from start to finish. Add your solutions to the lab-regression-answers.Rmd file as you work through the exercises so that you have a record of the work you have done.

Obtain a copy of both the question and answer files using Git. To clone a copy of this repository to your own PC, use the following command:

git clone https://github.com/tisem-digital-marketing/smwa-lab-regression.git

Once you have your copy, open the answer document in RStudio as an RStudio project and work through the questions.

The goal of the tutorials is to explore how to “do” the technical side of social media analytics. Use this as an opportunity to push your limits and develop new skills. When you are uncertain or do not know what to do next - ask questions of your peers and the instructors on the class Slack workspace.

Regression Analysis in the Wild

The advent of the internet and the rise of user-generated content have had a large effect on sex markets. In 2008 and 2009, Scott Cunningham and Todd Kendall surveyed approximately 700 US internet-mediated sex workers. The survey asked about their illicit and legal labor market experiences and their demographics. Part of the survey asked respondents to share information about each of their previous four sessions with clients.

To gain access to the data, run the following code to download it and save it in the file data/sasp_panel.dta:

url <- "https://github.com/scunning1975/mixtape/raw/master/sasp_panel.dta"
# where to save data
out_file <- "data/sasp_panel.dta"
# create the data folder if it does not already exist (harmless if it does)
dir.create("data", showWarnings = FALSE)
# download it!
download.file(url, 
              destfile = out_file, 
              mode = "wb"
              )

The data include the log hourly price, the log of the session length (in hours), characteristics of the client (such as whether he was a regular), whether a condom was used, and some characteristics of the provider (such as their race, marital status and education level). The goal of this exercise is to estimate the price premium of unsafe sex and think through any bias in the coefficients within the regression models we estimate.

You might need to use the following R libraries throughout this exercise:1

library(haven)        # to read stata datasets
library(dplyr)        # data manipulation
library(tidyr)        # drop_na() and other tidying helpers
library(broom)        # tidy() summaries of model output
library(ggplot2)      # plotting
library(modelsummary) # regression tables and coefficient plots
  1. Load the data. The data is stored as a Stata dataset, so it can be loaded with the read_dta() function from the haven package.
sasp <- read_dta('data/sasp_panel.dta')
  2. Some rows of the data have missing values. Let’s drop these.2 Write a short command to drop any rows which have missing values from the data.
sasp <- 
    sasp %>%
    drop_na()
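
If you want to see how many rows this removes before committing to it (see the footnote above on dropping data), a quick count can help. This is an optional sketch; it re-reads the raw file into a hypothetical sasp_raw object so the comparison does not depend on the order in which you run the chunks.

# optional check (a sketch): how many rows contain at least one missing value?
sasp_raw <- read_dta('data/sasp_panel.dta')
sum(!complete.cases(sasp_raw))
# should match the number of rows removed by drop_na()
nrow(sasp_raw) - nrow(sasp)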

As mentioned above, the focus for the rest of this exercise is the price premium for unprotected sex. In the sasp data, there is a variable lnw which is the log of the hourly wage and a variable unsafe which takes the value 1 if there was unsafe sex during the client’s appointment and 0 otherwise.

  3. Produce a diagram that plots a histogram of log hourly wage, lnw, for sessions featuring either unsafe or safe sex. Your plot should therefore have two histograms, potentially overlaying each other. Does there appear to be a difference in price between safe and unsafe sex?
sasp %>%
    ggplot(aes(x = lnw, fill = factor(unsafe))) +
    # I plot proportions, you don't need to
    geom_histogram(aes(y = after_stat(count / sum(count))),
                   bins = 30,
                   alpha = 0.6) +
    scale_fill_manual(values = c("#69b3a2", "#404080")) +
    ylab("Fraction of Bookings") +
    xlab("Log Hourly Wage") +
    theme_bw()
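If the overlapping histograms are hard to read, an overlaid density plot is a common alternative. The sketch below uses only ggplot2 functions already loaded above; the colour choices simply mirror the histogram.

sasp %>%
    ggplot(aes(x = lnw, fill = factor(unsafe))) +
    geom_density(alpha = 0.5) +
    scale_fill_manual(values = c("#69b3a2", "#404080")) +
    ylab("Density") +
    xlab("Log Hourly Wage") +
    theme_bw()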
  4. Let’s formalize this idea with a regression. Run a single variable regression of log hourly wage, lnw, on the variable unsafe. Report the results.
simple_reg <- lm(lnw ~ unsafe, data = sasp)

tidy(simple_reg, conf.int = TRUE)
## # A tibble: 2 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)   5.63      0.0199    283.     0       5.60      5.67  
## 2 unsafe       -0.0351    0.0273     -1.29   0.198  -0.0886    0.0184
  5. Interpret the coefficient on unsafe. Is it statistically significant?

Interpretation (1): On average, unsafe sex is associated with a decrease in log hourly wage of 0.035.

Interpretation (2): On average, unsafe sex is associated with a decrease in the hourly wage of approximately 3.5 percent.

Interpretation 2 uses the log-level interpretation of the regression. Technically, the size of the effect is \((\exp(\hat{\beta}_1) - 1) \times 100\) percent; for small values of \(\hat{\beta}_1\), \(\exp(\hat{\beta}_1) - 1 \approx \hat{\beta}_1\).
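
As a quick check on the approximation, you can compute the exact percentage effect implied by the point estimate (about −3.4 percent here, versus the −3.5 percent approximation). A minimal sketch, assuming simple_reg from above is still in memory:

# exact log-level effect implied by the estimated coefficient
b1 <- coef(simple_reg)[["unsafe"]]
(exp(b1) - 1) * 100
# compare with the approximation 100 * b1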

Statistical Significance: the p-value is 0.198 \(>\) 0.05, so the effect is not statistically significant at the 5 percent level of significance.
  6. A single variable regression most likely suffers from omitted variable bias. Explain what omitted variable bias is, and why it might impact your regression estimates.

Omitted Variable Bias: the bias in the regression coefficients of the “misspecified” regression that arises from leaving out one or more relevant variables.

For omitted variable bias to occur we need:

  1. The included \(X\) variable(s) to be correlated with the omitted variable
  2. The omitted variable to be a relevant determinant of \(y\)

(1) and (2) together lead to a violation of the exogeneity assumption \(E(u_i | x_i) = 0\). When we don’t have exogeneity,

\[ E[\hat{\beta}] = \beta + \text{bias} \]

which means that our estimated coefficient cannot accurately estimate the true population parameter, and thus can’t be interpreted causally.
  7. Add the log of the length of the session, llength, as a second variable to your regression. Report the results. Did the coefficient on unsafe change?
twovar_reg <- lm(lnw ~ unsafe + llength, data = sasp)

tidy(twovar_reg, conf.int = TRUE)
## # A tibble: 3 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)  6.74       0.0730    92.3   0          6.59      6.88  
## 2 unsafe       0.00296    0.0254     0.117 9.07e- 1  -0.0469    0.0528
## 3 llength     -0.265      0.0169   -15.6   5.13e-51  -0.298    -0.231
  8. Explain why ignoring llength in your regression led the coefficient on unsafe to differ in sign between the single variable regression and the two variable regression.

The formula for omitted variable bias (assuming the omitted variable \(x_2\) has coefficient \(\beta_2\)) is:

\[ E[\hat{\beta}_1] = \beta_1 + \beta_2 \frac{\text{Cov}(x_1, x_2)}{\text{Var}(x_1)} \]

One would reason that:

  • \(\beta_2 < 0\) … longer sessions lead to quantity discounts
  • \(\text{Cov}(x_1, x_2) > 0\) … longer sessions are more likely to feature unsafe sex

\(\implies\) bias is negative, so that

\[ \begin{aligned} E[\hat{\beta_1}] &= \beta_1 + \text{something negative} \\ &< \beta_1 \end{aligned} \]

Remark: In case the notation above was not clear:

  • \(\text{Cov}(x_1, x_2)\) is the covariance between \(x_1\) and \(x_2\)
  • \(\text{Var}(x_1)\) is the variance of \(x_1\)
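
You can also verify this decomposition numerically with the objects estimated above: the short-regression coefficient equals the long-regression coefficient plus the llength coefficient times the slope from an auxiliary regression of llength on unsafe. A minimal sketch (it assumes simple_reg and twovar_reg are still in memory and introduces an auxiliary object aux_reg):

# auxiliary regression of the omitted variable on the included one
aux_reg <- lm(llength ~ unsafe, data = sasp)
delta <- coef(aux_reg)[["unsafe"]]
# short-regression coefficient ...
coef(simple_reg)[["unsafe"]]
# ... equals long-regression coefficient plus beta_llength * delta
coef(twovar_reg)[["unsafe"]] + coef(twovar_reg)[["llength"]] * delta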
  9. Add a third variable to the regression, whether the client is a regular or not (reg in the data). Report your results and comment on any change in the regression estimate of unsafe.
threevar_reg <- lm(lnw ~ unsafe + reg + llength, data = sasp)

tidy(threevar_reg, conf.int = TRUE)
## # A tibble: 4 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)  6.75       0.0731    92.3   0          6.61      6.89  
## 2 unsafe       0.00418    0.0254     0.165 8.69e- 1  -0.0456    0.0539
## 3 reg         -0.0608     0.0256    -2.38  1.75e- 2  -0.111    -0.0107
## 4 llength     -0.260      0.0170   -15.3   4.62e-49  -0.293    -0.227
# I'll leave the comments out...

Marketers are generally interested in whether effects they find are heterogeneous, i.e. whether the reported coefficients vary across different observable characteristics.

  10. Estimate a regression model that allows the price effect of unsafe sex to differ between customers who are regulars and those who aren’t. Do this by modifying your regression command from (9). Report your results and discuss your findings.
het_reg <- lm(lnw ~ unsafe + unsafe:reg + reg + llength, data = sasp)

tidy(het_reg, conf.int = TRUE)
## # A tibble: 5 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)   6.74      0.0747    90.2   0          6.59      6.88  
## 2 unsafe        0.0320    0.0382     0.837 4.03e- 1  -0.0430    0.107 
## 3 reg          -0.0346    0.0372    -0.930 3.53e- 1  -0.107     0.0383
## 4 llength      -0.260     0.0170   -15.3   5.05e-49  -0.293    -0.227 
## 5 unsafe:reg   -0.0495    0.0509    -0.972 3.31e- 1  -0.149     0.0504
# I'll leave the comments out...
  11. Interpret the results you found in (10).

First, notice that neither unsafe nor unsafe:reg is statistically significant, so we do not find strong evidence that the price effect differs between regulars and non-regulars.

If we want to take the point estimates seriously (ignoring the standard errors, purely for the sake of interpretation practice), there seems to be evidence of price discrimination. Providers do not charge a price premium for unsafe sex with clients who are regulars. A potential reason could be that regulars are perceived to be less risky in terms of carrying STIs. I wouldn’t want to push this argument too hard.
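
To see where that reading comes from, you can combine the point estimates directly (roughly 0.03 for non-regulars and −0.02 for regulars with the estimates above). A small sketch, ignoring the standard errors and assuming het_reg from above is in memory:

b <- coef(het_reg)
# implied "unsafe" effect for non-regular clients
b[["unsafe"]]
# implied "unsafe" effect for regular clients
b[["unsafe"]] + b[["unsafe:reg"]]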

  12. Are the effects you documented causal, descriptive or predictive? Explain your answer.

Descriptive at best.

Why not causal:

  • Omitted variable bias
  • Lack of randomization

There’s actually a very important omitted variable in the regressions we have run, which is a “fixed effect” for the provider. This boils down to dummy variables for each provider. We’ll discuss these in detail in the coming weeks.

Why does this matter:

  • Think about omitted variable bias.
  • Providers have different willingness to engage in unsafe sex.
  • These providers also likely price differently.
  • This leads to a negative bias in our estimate (the formula for OVB with fixed effects is messier, so it is omitted here).
fe_reg <- lm(lnw ~ unsafe*reg + llength + as.factor(id), data = sasp)

tidy(fe_reg, conf.int = TRUE)
## # A tibble: 464 × 7
##    term            estimate std.error statistic   p.value conf.low conf.high
##    <chr>              <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
##  1 (Intercept)       7.56      0.159     47.6   5.73e-263   7.25      7.88  
##  2 unsafe            0.0218    0.0345     0.633 5.27e-  1  -0.0458    0.0894
##  3 reg              -0.0680    0.0258    -2.63  8.57e-  3  -0.119    -0.0173
##  4 llength          -0.428     0.0138   -31.0   1.39e-149  -0.455    -0.401 
##  5 as.factor(id)6   -0.532     0.191     -2.78  5.50e-  3  -0.906    -0.157 
##  6 as.factor(id)7    0.0248    0.227      0.109 9.13e-  1  -0.421     0.471 
##  7 as.factor(id)8   -0.294     0.190     -1.55  1.21e-  1  -0.667     0.0782
##  8 as.factor(id)9    0.709     0.289      2.46  1.41e-  2   0.143     1.28  
##  9 as.factor(id)10   0.354     0.190      1.86  6.26e-  2  -0.0187    0.727 
## 10 as.factor(id)11  -0.471     0.190     -2.48  1.33e-  2  -0.843    -0.0985
## # ℹ 454 more rows
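The coefficients of interest are easy to lose among the several hundred provider dummies (the unsafe:reg interaction is not even visible in the rows printed above). A quick way to pull them out, sketched with the dplyr and broom functions already loaded:

tidy(fe_reg, conf.int = TRUE) %>%
    filter(term %in% c("unsafe", "reg", "llength", "unsafe:reg"))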
If we want to take the point estimates seriously (ignoring the standard errors, purely for the sake of interpretation practice), there again seems to be evidence of price discrimination, but in the other direction. Providers charge a higher price for unsafe sex with clients who are regulars than with those who aren’t. A potential reason could be that regulars are less likely to switch to a different provider, so they’re taken advantage of and charged a higher premium. I wouldn’t want to push this argument too hard.

Now that you have run a series of regressions, you want to present the results in a way that you could use in a report or a presentation.

Hint: You will want to use functions from the modelsummary package. We recommend reading the vignette to get started with using the functions.

  13. Take your regression estimates and produce a regression table to summarize your results in one place. You can choose any of the estimates you like to produce the table, but we encourage you to think about how each column adds something to a story you could tell to explain your findings. The final result should look similar to a regression table you see in academic publications.
# a simple table - minimum customization
mods <- list(
  simple_reg,
  twovar_reg,
  threevar_reg,
  het_reg
)

msummary(mods,
         coef_omit = "Interc",
         gof_omit = "AIC|BIC|Log|Pseudo|F"
         )
              (1)       (2)       (3)       (4)
unsafe        −0.035    0.003     0.004     0.032
              (0.027)   (0.025)   (0.025)   (0.038)
llength                 −0.265    −0.260    −0.260
                        (0.017)   (0.017)   (0.017)
reg                               −0.061    −0.035
                                  (0.026)   (0.037)
unsafe × reg                                −0.049
                                            (0.051)
Num.Obs.      1499      1499      1499      1499
R2            0.001     0.141     0.144     0.145
R2 Adj.       0.000     0.140     0.143     0.143
RMSE          0.53      0.49      0.49      0.49
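
For a report you would usually also rename the coefficients to something readable and perhaps add significance stars. A sketch of one way to do this, using arguments documented in the modelsummary vignette (check your installed version if an argument is not recognised):

msummary(mods,
         coef_rename = c("unsafe" = "Unsafe sex",
                         "llength" = "Log session length",
                         "reg" = "Regular client",
                         "unsafe:reg" = "Unsafe sex x Regular client"),
         coef_omit = "Interc",
         gof_omit = "AIC|BIC|Log|Pseudo|F",
         stars = TRUE)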
  14. Take your regression estimates and produce a coefficient plot to summarize them in one place. You can choose any of the estimates you like to produce the plot, but we encourage you to think about how the plot you produce can be used as part of a story you could tell to explain your findings.
modelplot(mods,
          coef_omit = "Interc|^reg|ll") +
  geom_vline(xintercept = 0, 
             alpha = 0.5, 
             linetype = "dashed") +
  xlab("Coefficient Estimate + 95% CI") +
  coord_flip() +
  theme_bw()
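
If you want more control than modelplot() gives you, the same kind of plot can be built by hand from tidy() output using the dplyr, broom and ggplot2 functions already loaded. A sketch, restricted to the unsafe coefficient from the first and last models estimated above:

bind_rows(
    tidy(simple_reg, conf.int = TRUE) %>% mutate(model = "(1) Single variable"),
    tidy(het_reg, conf.int = TRUE) %>% mutate(model = "(4) With interaction")
) %>%
    filter(term == "unsafe") %>%
    ggplot(aes(x = model, y = estimate, ymin = conf.low, ymax = conf.high)) +
    geom_pointrange() +
    geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
    ylab("Coefficient on unsafe + 95% CI") +
    xlab("") +
    theme_bw()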

  1. If you haven’t installed one or more of these packages, do so by entering install.packages("PKG_NAME") into the R console and pressing ENTER.↩︎

  2. Generally, we need to be quite careful when we make decisions about dropping rows of data, and think through what the consequences of it might be. We’ve not done this here because our goal was to illustrate how to estimate and interpret regression estimates, but we would encourage you to be careful when you do this in your own work. At a minimum, you should mention why you’ve dropped rows, and whether there is likely to be selection bias in your subsequent results.↩︎