Simple Linear Regression – What the heck are we estimating?

It is likely that you’ve heard a coefficient from a linear regression described in several ways: “marginal effect”, “average marginal effect”, or “marginal effect on the average”, to name a few. Well, which is it? Or are they all valid interpretations? The answer, of course, is that it depends.

Consider the simple linear regression model y_{i} = \beta_{0} + x_{i}\beta_{1} + u_{i}, where \beta_{0}, \beta_{1} are the population regression coefficients. That is, \beta = [E(X_{i}X_{i}')]^{-1} E(X_{i}y_{i}), where X_{i}'=[1 \   x_{i}]. Then, u_{i} is defined as y_{i} - \beta_{0} - x_{i}\beta_{1}. Note that u_{i} is by construction uncorrelated with x_{i}; this is a property of regression, and is easily verified from the first-order conditions of the least-squares minimization problem, which are E[u_{i}] = 0 and E[x_{i}u_{i}] = 0.
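To see the "by construction" part in action, here is a minimal sketch using NumPy on simulated data. The data-generating process (an exponential relationship plus noise) is purely an assumption for illustration, chosen to be nonlinear on purpose; the helper name ols_simple is hypothetical. The point is only that the fitted residual is orthogonal to x regardless of what generated y.

```python
import numpy as np

def ols_simple(x, y):
    """Least squares of y on a constant and x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    # Sample analogue of beta = [E(X X')]^{-1} E(X y)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return b[0], b[1]

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
# Assumed, deliberately nonlinear relationship between y and x
y = np.exp(x) + rng.normal(size=n)

b0, b1 = ols_simple(x, y)
u = y - b0 - b1 * x          # regression residual

mean_u = u.mean()            # sample analogue of E[u]
mean_xu = np.mean(x * u)     # sample analogue of E[x u]
print(f"mean(u) = {mean_u:.2e}, mean(x*u) = {mean_xu:.2e}")
```

Both printed numbers should be zero up to floating-point error, even though nothing about the simulated relationship is linear.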

This may set off alarm bells. Isn’t the main concern when performing regression analysis that there may be a correlation between x_{i} and u_{i}? Yes, this is a concern. However, you are thinking of the x_{i} and error term from a structural equation, that is, one that represents a causal relationship. Suppose the structural equation is y_{i} = \gamma_{0} + \gamma_{1}x_{i} + \epsilon_{i}, where \gamma_{1} is the causal effect of x on y. Because this is not a regression (i.e. \gamma is not a regression coefficient), there is no restriction on the relationship between x_{i} and \epsilon_{i}. In most econometrics texts, regression is introduced in the context of a causal model, and thus E[\epsilon_{i}x_{i}]=0 is stated as an assumption. Then, when this assumption does not hold, we say that the regression estimates are “not consistent” estimators of the true parameters. Really, regression estimates are always consistent: they are consistent for the population regression coefficients. When the error term of the causal model is uncorrelated with x, the structural equation and the population regression are identical (i.e. \beta=\gamma). When the assumption fails, it is not that regression gives you inconsistent estimates; it’s simply that the population regression you are estimating is not the same as the causal relationship you are interested in.
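A small simulation can make this distinction concrete. The sketch below is an illustration under assumed values: the structural parameters GAMMA0 and GAMMA1, the confounder w, and the function name ols_slope are all hypothetical. With the confounder switched on, Cov(x, \epsilon) = 1 and Var(x) = 2, so the population regression slope is \gamma_{1} + Cov(x, \epsilon)/Var(x) = 2.5 rather than the causal 2.0.

```python
import numpy as np

GAMMA0, GAMMA1 = 1.0, 2.0    # assumed structural (causal) parameters

def ols_slope(confounded, n=200_000, seed=1):
    """OLS slope of y on x under the assumed structural equation."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                  # hypothetical confounder
    x = w + rng.normal(size=n)              # Var(x) = 2
    # Structural error: contains w only in the confounded case
    eps = (w if confounded else 0.0) + rng.normal(size=n)
    y = GAMMA0 + GAMMA1 * x + eps
    # Sample analogue of Cov(x, y) / Var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b1_clean = ols_slope(confounded=False)      # E[x eps] = 0, so beta = gamma
b1_confounded = ols_slope(confounded=True)  # E[x eps] = 1, so beta != gamma

print(f"no confounding: {b1_clean:.3f} (causal effect is {GAMMA1})")
print(f"confounding:    {b1_confounded:.3f} (population slope is 2.5)")
```

In the confounded case the estimate still settles down as n grows; it just settles on the population regression coefficient rather than on \gamma_{1}.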

Okay, that detour gives a lot of insight into how to interpret a regression coefficient. It’s worth repeating that regression is purely a statistical relationship; it does not (necessarily) represent any structural or causal relationship between y and x. Regression is a mechanical exercise and can be applied to any set of variables. So how should we interpret \beta_{1}? First, let’s not assume anything about what the structural model actually is. If E[y|x] is in fact a linear function, then X_{i}'\beta will be this conditional expectation function, and thus \beta_{1} is the marginal effect of x on E[y|x]. Or, loosely speaking, the “marginal effect on the average”. This is how you should interpret regression coefficients when you make no assumption concerning the underlying causal model. (The only assumption I made was that E[y|x] is linear, which does not concern the causal model. If E[y|x] is in fact nonlinear, then X_{i}'\beta provides the best linear approximation to E[y|x], in the minimum mean-squared-error sense.)
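The best-linear-approximation property can also be checked numerically. The following is a rough sketch under an assumed setup: x is uniform on [0, 1] and E[y|x] = x^2, both chosen only because the population coefficients then work out by hand to \beta_{1} = Cov(x, x^2)/Var(x) = 1 and \beta_{0} = 1/3 - 1/2 = -1/6. Regressing y on x and regressing the conditional expectation function itself on x should give approximately the same line.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)
cef = x**2                                   # assumed nonlinear E[y|x]
y = cef + rng.normal(scale=0.5, size=n)      # y = CEF + mean-zero noise

X = np.column_stack([np.ones(n), x])
# Least squares of y on (1, x)
b_y, *_ = np.linalg.lstsq(X, y, rcond=None)
# Least squares of the CEF itself on (1, x)
b_cef, *_ = np.linalg.lstsq(X, cef, rcond=None)

print("y on x:   intercept, slope =", np.round(b_y, 3))
print("CEF on x: intercept, slope =", np.round(b_cef, 3))
print("by hand:  intercept, slope =", np.round([-1 / 6, 1.0], 3))
```

The two fitted lines agree up to sampling noise, which is the sense in which the regression of y on x is approximating E[y|x] rather than any causal relationship.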

Now, suppose the true structural equation is the one I’ve specified earlier (and that E[x_{i}\epsilon_{i}]=0, so that \beta=\gamma). Then, our OLS estimate of \beta_{1} can be interpreted as an estimate of the causal effect of x on y (“marginal effect”). So, there are times when a regression estimate can be thought of as a marginal effect of x on y; namely, when the structural relationship is linear, the causal effect is constant (the same for all individuals), and x is uncorrelated with all other variables that affect y.

I think introducing regression in a causal framework does a great disservice to students. It gives students the impression that causality and regression are intrinsically linked to each other, and clouds the fact that regression is a purely statistical exercise.

I guess I never made it to when regression estimates can be thought of as “average marginal effects”. That requires thinking about a structural model where the causal effect differs across individuals, so it can be saved for a later post!

Source: Mostly Harmless Econometrics (Angrist and Pischke 2009)


The blog

This blog combines thoughts from me (Matt) and also my brother Will. My posts will center around applied econometrics and causal inference. Will is more data-oriented and his posts will reflect that. Anyways, the motivation for both of us is just to get some thoughts formalized in an organized fashion. Enjoy the blog.