Andrew Gelman: China Air Pollution Regression Discontinuity Update: "Avery writes: 'There is a follow up paper for the paper “Evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River policy” [by Yuyu Chen, Avraham Ebenstein, Michael Greenstone, and Hongbin Li].... “New evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River Policy”'.... The cleanest summary of my problems with that earlier paper is this article, 'Evidence on the deleterious impact of sustained use of polynomial regression on causal inference', written with Adam Zelizer...

...Here’s the key graph, which we copied from the earlier Chen et al. paper:

[Figure: the key graph from Chen et al., as reproduced in Andrew Gelman's post]

The most obvious problem revealed by this graph is that the estimated effect at the discontinuity is entirely the result of the weird curving polynomial regression, which in turn is being driven by points on the edge of the dataset. Looking carefully at the numbers, we see another problem which is that life expectancy is supposed to be 91 in one of these places (check out that green circle on the upper right of the plot)—and, according to the fitted model, the life expectancy there would be 5 years higher, that is, 96 years!, if only they hadn’t been exposed to all that pollution. As Zelizer and I discuss in our paper, and I’ve discussed elsewhere, this is a real problem, not at all resolved by (a) regression discontinuity being an identification strategy, (b) high-degree polynomials being recommended in some of the econometrics literature, and (c) the result being statistically significant at the 5% level. Indeed, items (a), (b), (c) above represent a problem, in that they gave the authors of that original paper, and the journal reviewers and editors, a false sense of security which allowed them to ignore the evident problems in their data and fitted model.

We’ve talked a bit recently about “scientism,” defined as “excessive belief in the power of scientific knowledge and techniques.” In this case, certain conventional statistical techniques for causal inference and estimation of uncertainty have led people to turn off their critical thinking. That said, I’m not saying, nor have I ever said, that the substantive claims of Chen et al. are wrong. It could be that this policy really did reduce life expectancy by 5 years. All I’m saying is that their data don’t really support that claim. (Just look at the above scatterplot and ignore the curvy line that goes through it.)

OK, what about this new paper?... Anyway, I still don’t buy... their statistical claim that their data strongly support their scientific claim.... I feel like kind of a grinch saying this. After all, air pollution is a big problem, and these researchers have clearly done a lot of work with robustness studies etc. to back up their claims. All I can say is: (1) Yes, air pollution is a big problem so we want to get these things right, and (2) Even without the near-certainty implied by these 95% intervals excluding zero, decisions can and will be made. Scientists and policymakers can use their best judgment, and I think they should do this without overrating the strength of any particular piece of evidence...

Andrew Gelman and Adam Zelizer: Evidence on the Deleterious Impact of Sustained Use of Polynomial Regression on Causal Inference: "It is common in regression discontinuity analysis to control for third- or fifth-degree polynomials of the assignment variable...

...Such models can overfit, leading to causal inferences that are substantively implausible and that arbitrarily attribute variation to the high-degree polynomial or the discontinuity.... This study is indicative of a category of policy analyses where strong claims are based on weak data and methodologies which permit the researcher wide latitude in presenting estimated treatment effects. We then replicate a procedure from Green et al... to illustrate one practical problem with the regression discontinuity estimate... high-degree polynomials yield noisy estimates of treatment effects that do not accurately convey uncertainty. We recommend that (a) researchers consider the problems which may result from controlling for higher-order polynomials; and (b) that journals recognize that quantitative analyses of policy issues are often inconclusive and relax the implicit rule under which statistical significance is a condition for publication...
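To make the "noisy estimates" point concrete, here is a minimal simulation sketch. It is not the Green et al. procedure and not Chen et al.'s specification; it is a toy setup under stated assumptions: the assignment variable is uniform on [-1, 1], the cutoff is at 0, the outcome follows a smooth linear trend with no true discontinuity, and the model is a single global polynomial plus a jump indicator. The function names and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def rd_jump_estimate(x, y, degree):
    """OLS estimate of the jump at x = 0, controlling for a global polynomial in x."""
    treated = (x >= 0).astype(float)
    # Design matrix columns: intercept, jump indicator, x, x^2, ..., x^degree
    columns = [np.ones_like(x), treated] + [x ** p for p in range(1, degree + 1)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the jump indicator


def simulate_jump_sd(degree, n_sims=500, n_obs=100, noise_sd=1.0, seed=0):
    """Standard deviation of the estimated jump across simulated datasets.

    Assumption: the true outcome is a smooth linear trend with NO discontinuity,
    so every nonzero estimated jump is pure estimation noise.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.uniform(-1.0, 1.0, size=n_obs)
        y = 0.5 * x + rng.normal(0.0, noise_sd, size=n_obs)
        estimates[i] = rd_jump_estimate(x, y, degree)
    return estimates.std(ddof=1)


# Same seed for every degree, so each specification sees the same simulated data.
spread_by_degree = {d: simulate_jump_sd(d) for d in (1, 3, 5)}
for degree, spread in spread_by_degree.items():
    print(f"degree {degree}: sd of estimated jump = {spread:.3f}")
```

In this toy setup the spread of the estimated jump grows with the polynomial degree, even though the true jump is zero in every simulated dataset. The odd powers of the assignment variable are correlated with the jump indicator, so each added term leaves less independent variation with which to estimate the discontinuity. The sketch shows only the extra noise; it does not address whether the reported standard errors convey that noise accurately.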


#shouldread
