Marketing today relies on a variety of metrics to gain insight into its efficacy. Given the variety of online and offline channels available to marketers, understanding the impact and interaction of individual channels has become an onerous task, to say the least.
Marketers rely heavily on two methods to obtain data-driven insights into the marketing process: Media Mix Modeling (MMM) and Data-Driven Attribution. MMM provides a “top-down” view of the marketing process in order to generate high-level insights into the efficacy of different marketing channels. For example, by looking at data over months or years, MMM can give marketers insight into consumers’ interaction with different marketing media. Attribution models, on the other hand, take a more “bottom-up” approach. These models look at an individual user’s interaction with different media. Since each user is exposed to a combination of marketing channels, the problem lies in ascertaining how much credit each channel deserves for influencing the user’s purchasing decision. Historically, marketers have used simple attribution models such as last-touch and first-touch attribution. Last-touch attribution assigns all credit to the last channel a user was exposed to prior to conversion; first-touch attribution assigns all credit to the first. The flaw in both lies in the fact that some channels are systematically undervalued: last-touch undervalues channels further from the conversion, and first-touch undervalues channels closer to it.
Algorithm-based methodologies, which aim to allocate credit more fairly, have gained significant traction in the past decade. In a series of three blog posts, I will introduce three papers that discuss algorithm-based models for media mix modeling and attribution modeling.
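To make the single-touch rules concrete, here is a minimal sketch. The journeys and channel names are made up for illustration, and the helper `single_touch_credit` is a hypothetical name, not part of any attribution library.

```python
from collections import Counter

def single_touch_credit(journeys, rule="last"):
    """Give the full credit for each converting journey to a single channel.

    journeys: list of channel lists, ordered from first to last exposure.
    rule: "last" credits the final touchpoint, "first" credits the initial one.
    """
    credit = Counter()
    for path in journeys:
        if not path:
            continue  # nothing to credit for an empty journey
        channel = path[-1] if rule == "last" else path[0]
        credit[channel] += 1
    return dict(credit)

# Illustrative (made-up) converting journeys.
journeys = [
    ["display", "email", "search"],
    ["social", "search"],
    ["display", "search"],
    ["email"],
]

last_touch = single_touch_credit(journeys, "last")
first_touch = single_touch_credit(journeys, "first")
print("last touch: ", last_touch)
print("first touch:", first_touch)
```

In this toy data, last touch gives "display" no credit at all even though it opened two of the four journeys, which is the undervaluation problem described above.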
The dominance analysis approach for comparing predictors in multiple regression (Budescu, 1993)
Regression models have become a common way to explore the interaction between revenue and advertising efforts. Budescu introduces a general framework known as dominance analysis that aims to decompose the coefficient of determination ($R^2$). For the sake of simplicity, we will only deal with linear models in this post. Budescu’s work can be extended to any area of research that tries to deal with variable importance.
Review of Legacy Methods
Various methods have been developed over time to measure the importance of variables. These methods mostly rely on using the coefficients of independent variables from standard linear models to explain variable importance. Let’s look at a standard linear model defined as the following:
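$$y = \beta_1 x_1 + \cdots + \beta_i x_i + \cdots + \beta_p x_p + \epsilon$$
Let’s denote the coefficient of determination of this model as $R^2_{y,X}$. The vector $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ holds the regression coefficients: each $\beta_j$ represents the change in the dependent variable $y$ associated with a unit change in $x_j$, given that the other independent variables are left unchanged. If all variables are standardized and the independent variables are uncorrelated with one another, each coefficient equals the correlation $\rho_{y,x_j}$ between $y$ and $x_j$, and the squared coefficients perfectly partition the coefficient of determination, as described in the equation below:
$$R^2_{y,X} = \sum_{j=1}^{p} \rho^2_{y,x_j} = \sum_{j=1}^{p} \beta_j^2$$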
While this method of using variable coefficients as importance measures is intuitive and appropriate when the independent variables are not intercorrelated, in most real-world applications the independent variables (advertising channels in this case) have some level of correlation, making this method inappropriate.
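A quick numerical check illustrates why correlation breaks this partition. The sketch below uses synthetic data (not real marketing data); the helper names `r_squared`, `standardized_betas` and `partition_gap` are my own, and the coefficients 2 and 1 and the correlation of 0.8 are arbitrary choices.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def standardized_betas(X, y):
    """OLS coefficients after z-scoring every variable."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return coef

def partition_gap(X, y):
    """R^2 minus the sum of squared standardized coefficients."""
    return r_squared(X, y) - np.sum(standardized_betas(X, y) ** 2)

rng = np.random.default_rng(0)
n = 500

# Two predictors that are exactly uncorrelated in this sample:
# center random columns, orthogonalize them, rescale to unit variance.
Z = rng.normal(size=(n, 2))
Z -= Z.mean(axis=0)
Q, _ = np.linalg.qr(Z)
X_unc = Q * np.sqrt(n)

# Two predictors with a sample correlation of 0.8.
X_cor = np.column_stack([X_unc[:, 0], 0.8 * X_unc[:, 0] + 0.6 * X_unc[:, 1]])

noise = rng.normal(size=n)
y_unc = 2.0 * X_unc[:, 0] + 1.0 * X_unc[:, 1] + noise
y_cor = 2.0 * X_cor[:, 0] + 1.0 * X_cor[:, 1] + noise

gap_uncorrelated = partition_gap(X_unc, y_unc)
gap_correlated = partition_gap(X_cor, y_cor)
print(f"gap with uncorrelated predictors: {gap_uncorrelated:.2e}")
print(f"gap with correlated predictors:   {gap_correlated:.3f}")
```

With uncorrelated predictors the gap is zero up to floating-point error; with correlated predictors a sizeable share of $R^2$ is left unassigned by the squared coefficients.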
Dominance Analysis
Dominance Analysis compares coefficients of determination of all nested submodels composed of subsets of independent variables with that of the full model. Too much jargon? Let’s take a look at an example.
Let’s say we have a total of $p$ independent variables in our linear model. We will build $2^p - 1$ models, since this is the total number of non-empty subset models that can be created. We will then compute the incremental $R^2$ contribution of each independent variable to every subset model of the other independent variables. Let’s take a scenario where we have 4 independent variables $X_1$, $X_2$, $X_3$ and $X_4$. We will build $2^4 - 1$ models, i.e. 15 models: 4 models with only one independent variable, 6 models with 2 independent variables each, 4 models with 3 independent variables each, and finally 1 model with all the independent variables. Thus, the incremental $R^2$ contribution for variable $X_1$, for example, is the increase in $R^2$ when $X_1$ is added to each subset of the remaining independent variables (i.e., the null subset { . }, { X2 }, { X3 }, { X4 }, { X2, X3 }, { X2, X4 }, { X3, X4 } and { X2, X3, X4 }). Similarly, the incremental $R^2$ contribution for variable $X_2$ is the increase in $R^2$ when $X_2$ is added to each subset of the remaining independent variables (i.e., the null subset { . }, { X1 }, { X3 }, { X4 }, { X1, X3 }, { X1, X4 }, { X3, X4 } and { X1, X3, X4 }).
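The counting in the four-variable example can be reproduced in a few lines. This is only an illustrative enumeration; `subsets_excluding` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

predictors = ["X1", "X2", "X3", "X4"]

# Every non-empty subset of predictors corresponds to one model to fit.
all_models = [
    subset
    for k in range(1, len(predictors) + 1)
    for subset in combinations(predictors, k)
]
models_per_size = Counter(len(subset) for subset in all_models)

def subsets_excluding(target, names):
    """All subsets (including the empty one) of the predictors other than target."""
    others = [name for name in names if name != target]
    return [s for k in range(len(others) + 1) for s in combinations(others, k)]

# The subsets to which X1 is added when measuring its incremental R^2.
x1_subsets = subsets_excluding("X1", predictors)

print(len(all_models), dict(models_per_size))
for subset in x1_subsets:
    print(subset)
```

Each variable is therefore evaluated against 8 baseline subsets, starting with the empty (null) model.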
The beauty behind dominance analysis lies in the fact that the sum of the overall average incremental $R^2$ of all independent variables (with each variable’s increments averaged first within each subset size and then across sizes) is equal to the $R^2$ of the model with all independent variables (the complete model). This allows easy partitioning of the total coefficient of determination amongst independent variables.
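The additivity property can be verified numerically. What follows is a minimal sketch on synthetic data, assuming ordinary least squares with an intercept and averaging the increments within each subset size before averaging across sizes. The function names and the simulated "channels" are my own illustrations, not code from Budescu’s paper.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit (with intercept) on the given columns; 0 for the null model."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def general_dominance(X, y):
    """Average incremental R^2 of each predictor, plus the full-model R^2."""
    p = X.shape[1]
    # Fit every subset model once and cache its R^2 (2^p entries incl. the null model).
    r2 = {s: r_squared(X, y, s) for k in range(p + 1) for s in combinations(range(p), k)}
    weights = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        size_means = []
        for k in range(p):  # size of the baseline subset: 0 .. p-1
            gains = [r2[tuple(sorted(s + (j,)))] - r2[s] for s in combinations(others, k)]
            size_means.append(np.mean(gains))
        weights[j] = np.mean(size_means)
    return weights, r2[tuple(range(p))]

# Synthetic, intercorrelated "channels" (illustrative data only).
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 4))
X[:, 1] += 0.7 * X[:, 0]
X[:, 3] += 0.5 * X[:, 2]
y = X @ np.array([1.5, 1.0, 0.5, 0.0]) + rng.normal(size=n)

weights, full_r2 = general_dominance(X, y)
print("dominance weights:", np.round(weights, 4))
print("sum of weights:   ", round(float(weights.sum()), 6))
print("full-model R^2:   ", round(float(full_r2), 6))
```

The two printed totals agree, so each weight can be read directly as that variable’s share of the complete model’s $R^2$.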
An inherent problem with dominance analysis is the lack of computational efficiency. The need to train $2^p - 1$ models means that the number of models that would have to be trained increases exponentially as the number of independent variables increases.
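To get a feel for how quickly this grows, the snippet below simply evaluates the model count for a few values of p. The chosen values are arbitrary and `model_count` is a hypothetical helper.

```python
def model_count(p):
    """Number of non-empty subset models for p independent variables."""
    return 2 ** p - 1

for p in (4, 10, 20, 30):
    print(f"p = {p:2d}: {model_count(p):,} models")
```

Each additional independent variable roughly doubles the number of models to fit.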
Relative Weights Analysis
Another paper, which can be found here, builds on the concept of relative weights analysis as an alternative to dominance analysis. However, relative weights analysis is a fundamentally flawed method of determining attribution and has been debunked, most famously in this paper. The only reason I bring it up is to forewarn the reader that the theoretical underpinnings of relative weights analysis are dubious, and to recommend dominance analysis as the superior $R^2$ decomposition method.