As discussed in my previous blog posts, a lot of research is being done in ad attribution and media mix modeling. Today I’ll introduce another paper that provides some interesting analysis. Fair warning, you should have a basic idea of Bayesian regression before reading this. You can find a great introduction here.
Carryover and Shape Effects
The authors’ most exciting contribution is incorporating carryover and shape effects into their media mix model. Carryover effects model the impact of media spend on future periods. Because advertising influences consumers’ decisions to buy a product or service, its impact doesn’t end when the advertisement airs but persists over a longer period. To account for this, the authors transform the time series of media spend with a decay function. They use the adstock function as described below:
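For readers who prefer code, here is a minimal NumPy sketch of an adstock transform. It assumes the normalized weighted-average form adstock_t = sum over l of w(l) * x_(t-l), divided by the sum of w(l), for lags l = 0 to L-1. The function name, the zero-padding before the start of the series, and the example weights are my own assumptions, not taken from the paper.

```python
import numpy as np

def adstock(spend, weights):
    """Weighted average of current and lagged spend for one channel.

    spend:   1-D array of spend, ordered oldest to newest.
    weights: 1-D array of length L; weights[l] applies to spend l periods ago.
    """
    spend = np.asarray(spend, dtype=float)
    weights = np.asarray(weights, dtype=float)
    L = len(weights)
    # Assumption: no spend before the series starts, so pad with L-1 zeros.
    padded = np.concatenate([np.zeros(L - 1), spend])
    out = np.empty_like(spend)
    for t in range(len(spend)):
        # Reverse the window so window[0] is the current period and window[l] is lag l.
        window = padded[t:t + L][::-1]
        out[t] = np.dot(weights, window) / weights.sum()
    return out

# A single burst of spend in the first week, nothing afterwards.
spend = np.array([100.0, 0.0, 0.0, 0.0, 0.0])
weights = np.array([1.0, 0.5, 0.25])  # hypothetical weights for lags 0, 1, 2
print(adstock(spend, weights))
```

The burst of spend keeps contributing for L-1 more weeks and then drops out, which is the carryover behaviour described above. Here `weights` plays the role of the weight function evaluated at each lag.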
w_m is a non-negative weight function, and the adstocked media spend is the weighted average of media spend in the current period and the previous L-1 periods. The authors introduce two types of weight functions: geometric decay (where the effect of media spend peaks in the period the advertisement airs) and delayed adstock (where the effect peaks some time after the ad airs). A visualization of the two weight functions, taken from the paper, can be seen below.
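To make the difference between the two weight functions concrete, here is a small sketch that assumes the usual parameterization: geometric weights w(l) = alpha^l, and delayed weights w(l) = alpha^((l - theta)^2), where theta is the lag at which the effect peaks. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def geometric_weights(L, alpha):
    """w(l) = alpha**l for l = 0..L-1, with 0 < alpha < 1 (largest at lag 0)."""
    lags = np.arange(L)
    return alpha ** lags

def delayed_weights(L, alpha, theta):
    """w(l) = alpha**((l - theta)**2); the weight is largest at lag theta."""
    lags = np.arange(L)
    return alpha ** ((lags - theta) ** 2)

L = 8
geo = geometric_weights(L, alpha=0.6)
delayed = delayed_weights(L, alpha=0.6, theta=2)

print("geometric weights:", geo.round(3))
print("delayed weights:  ", delayed.round(3))
print("geometric peak lag:", int(np.argmax(geo)))      # 0
print("delayed peak lag:  ", int(np.argmax(delayed)))  # 2
```

With geometric decay the largest weight always sits on the current period, while the delayed version moves the peak to lag theta and decays on both sides of it.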
Next, the authors discuss shape effects. Shape effects aim to capture diminishing returns on media spend. For example, it is reasonable to assume that, for a specific medium, sales rise sharply as spend goes from $0 to $50 but far less as spend goes from $100 to $150. The authors use a Hill function to model shape effects. A full discussion of Hill functions is beyond the scope of this blog, but the Hill function can be multiplied by a regression coefficient to get the following form:
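As a rough illustration, the sketch below assumes the standard Hill form Hill(x; K, S) = 1 / (1 + (x / K)^(-S)), where K is the half-saturation point and S controls the slope, and scales it by a coefficient beta. The parameter values are made up to mirror the $0 to $150 example above.

```python
import numpy as np

def hill(x, K, S):
    """Hill(x; K, S) = 1 / (1 + (x / K)**(-S)) for x >= 0, K > 0, S > 0."""
    x = np.asarray(x, dtype=float)
    # Algebraically equivalent form that avoids dividing by zero at x = 0.
    return x ** S / (x ** S + K ** S)

def media_effect(x, beta, K, S):
    """beta * Hill(x; K, S): approaches beta as spend grows without bound."""
    return beta * hill(x, K, S)

spend = np.array([0.0, 50.0, 100.0, 150.0])
effect = media_effect(spend, beta=10.0, K=50.0, S=1.0)  # hypothetical parameters

print("effect at each spend level:", effect.round(3))
print("gain per extra $50:        ", np.diff(effect).round(3))
```

Each additional $50 buys a smaller gain than the previous one, and the effect at x = K is exactly half of beta.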
The Hill function for media spend is a point transformation: it is applied to each period's spend on its own, unlike the carryover transformation discussed earlier, which combines several periods. The following graph, taken from the paper, shows diminishing returns for different parameter values of the Hill function:
Both transformations can be applied to media spend. Depending on the use case, one must decide which to apply first. The authors apply the adstock transformation first and then the shape transformation. Sales at time t, denoted y_t, can then be modeled using the following equation:
To simplify, this equation models sales as the sum of baseline sales τ, the transformed media spend effects, the effects of control variables, and random noise.
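Putting the pieces together, here is a rough sketch of that structure: y_t = τ + sum over channels of beta_m * Hill(adstock(x_m)) + sum over controls of gamma_c * z_c + noise. It assumes geometric adstock with a window of L = 4 and uses simulated data; all names and parameter values are hypothetical.

```python
import numpy as np

def geometric_adstock(spend, alpha, L):
    """Normalized geometric adstock for one channel (zero spend assumed before t = 0)."""
    spend = np.asarray(spend, dtype=float)
    weights = alpha ** np.arange(L)
    padded = np.concatenate([np.zeros(L - 1), spend])
    # np.convolve flips the kernel, so output t is sum_l weights[l] * spend[t - l].
    return np.convolve(padded, weights, mode="valid") / weights.sum()

def hill(x, K, S):
    x = np.asarray(x, dtype=float)
    return x ** S / (x ** S + K ** S)

def expected_sales(spend, controls, tau, beta, alpha, K, S, gamma, L=4):
    """Mean of y_t: baseline + transformed media effects + control effects."""
    spend = np.asarray(spend, dtype=float)        # shape (T, M)
    controls = np.asarray(controls, dtype=float)  # shape (T, C)
    mean = np.full(spend.shape[0], tau, dtype=float)
    for m in range(spend.shape[1]):
        carried = geometric_adstock(spend[:, m], alpha[m], L)  # carryover first
        mean += beta[m] * hill(carried, K[m], S[m])            # then shape
    return mean + controls @ np.asarray(gamma, dtype=float)

rng = np.random.default_rng(0)
T = 6
spend = rng.uniform(0, 100, size=(T, 2))  # two hypothetical channels
controls = rng.normal(size=(T, 1))        # one hypothetical control variable
mu = expected_sales(spend, controls, tau=20.0, beta=[5.0, 3.0], alpha=[0.5, 0.7],
                    K=[40.0, 60.0], S=[1.0, 2.0], gamma=[0.8])
y = mu + rng.normal(scale=1.0, size=T)    # add the random noise term
print(mu.round(2))
```

In a real analysis the parameters would be estimated from data rather than fixed by hand; the sketch only shows how the terms combine.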
Why Bayesian Regression
A common question could be: why estimate these parameters using Bayesian regression? The answer is that Bayesian regression lets us quantify the uncertainty in our estimates and predictions and, more importantly, lets us set priors on our parameters. For example, it is reasonable to assume that media spend never has a negative effect on sales, which lets us place informative priors on the media spend coefficients (constraining them to be non-negative).
The authors then apply the model to real-world datasets. They draw posterior samples with MCMC, discussing both Gibbs sampling and Stan (whose built-in sampler is Hamiltonian Monte Carlo rather than Gibbs). However, there are multiple techniques for sampling from the posterior distribution, and their model can be replicated easily using PyMC3. Please take a look at the fundamentals of Bayesian regression if this isn’t making much sense.
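To show what sampling with a non-negative prior looks like in practice, here is a toy sketch in plain NumPy. It is not the authors' sampler and not Stan or PyMC3: I swapped in a simple random-walk Metropolis algorithm, a linear model y = tau + beta * x + noise with known noise scale, and a half-normal prior on beta. All of these choices, and the simulated data, are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: x stands in for an already-transformed media variable.
n = 200
x = rng.uniform(0, 1, size=n)
true_tau, true_beta, sigma = 2.0, 1.5, 0.5
y = true_tau + true_beta * x + rng.normal(scale=sigma, size=n)

def log_posterior(tau, beta):
    """Unnormalized log posterior; sigma is treated as known for simplicity."""
    if beta < 0:
        return -np.inf  # half-normal prior: no probability mass below zero
    log_prior = -0.5 * (tau / 10.0) ** 2 - 0.5 * (beta / 5.0) ** 2
    resid = y - (tau + beta * x)
    log_lik = -0.5 * np.sum((resid / sigma) ** 2)
    return log_prior + log_lik

def metropolis(n_draws=6000, step=0.1, burn_in=1000):
    current = np.array([0.0, 1.0])  # starting values for (tau, beta)
    current_lp = log_posterior(*current)
    draws = np.empty((n_draws, 2))
    for i in range(n_draws):
        proposal = current + rng.normal(scale=step, size=2)
        proposal_lp = log_posterior(*proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current
    return draws[burn_in:]  # discard the burn-in draws

draws = metropolis()
tau_draws, beta_draws = draws[:, 0], draws[:, 1]
print("posterior mean of beta:", round(float(beta_draws.mean()), 2))
print("94% interval for beta: ", np.percentile(beta_draws, [3, 97]).round(2))
```

Every draw of beta is non-negative by construction, and the spread of the draws is the uncertainty estimate that a single point estimate would not give us. For a real model, a library such as Stan or PyMC3 would do this far more efficiently.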
The parameter estimates obtained from the model can then be plugged into an optimization algorithm that, subject to a fixed media spend budget, finds the best media mix across a set of channels. The optimization algorithm introduced by the authors is beyond the scope of this post, but I might discuss it in my next one. Stay tuned!
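Until then, here is a deliberately simple sketch of the idea, which is not the authors' algorithm: a brute-force grid search over how to split a fixed budget between two channels, using point estimates of the Hill parameters and ignoring carryover. The function names and parameter values are hypothetical.

```python
import numpy as np

def hill(x, K, S):
    x = np.asarray(x, dtype=float)
    return x ** S / (x ** S + K ** S)

def best_split(budget, beta, K, S, n_grid=1001):
    """Grid-search the split of a fixed budget between two channels."""
    spend_a = np.linspace(0.0, budget, n_grid)
    spend_b = budget - spend_a  # the budget constraint holds by construction
    response = (beta[0] * hill(spend_a, K[0], S[0])
                + beta[1] * hill(spend_b, K[1], S[1]))
    i = int(np.argmax(response))
    return spend_a[i], spend_b[i], response[i]

# Two identical channels: the best split should be even.
a, b, total = best_split(budget=100.0, beta=[5.0, 5.0], K=[50.0, 50.0], S=[1.0, 1.0])
print("channel A:", round(float(a), 1), "channel B:", round(float(b), 1))

# A channel with a larger coefficient should attract more of the budget.
a2, b2, total2 = best_split(budget=100.0, beta=[8.0, 3.0], K=[50.0, 50.0], S=[1.0, 1.0])
print("channel A:", round(float(a2), 1), "channel B:", round(float(b2), 1))
```

A grid search only scales to a handful of channels, but it makes the objective explicit: maximize the total modeled response while the spend across channels adds up to the budget.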