Cosma Shalizi Takes Me to Probability School. Or Is It Philosophy School?
After I accuse Cosma Shalizi of waterboarding the Rev. Dr. Thomas Bayes, he responds:
Cosma Shalizi: Cosma Shalizi Waterboards the Rev. Dr. Thomas Bayes: Hoisted from Comments: I am relieved to learn that the true model of the world is always already known to every competent statistical inquirer, since otherwise it could not be given positive prior weight. I would ask, however, when our model set became complete? And further, when did people stop using models which they knew were at best convenient but tractable approximations?
Less snarkily, these two examples are out-takes from what I like to think is a fairly serious paper on Bayesian non-parametrics with mis-specified models and dependent data:
described less technically here:
The examples were simple sanity-checks on my theorems, and I posted them because they amused me.
Thus Cosma Shalizi takes me to probability school. Or perhaps he takes me to philosophy school. It is not clear.
Let me give an example simpler than one of the ones Cosma Shalizi gave. Rosencrantz is flipping a coin. Guildenstern is watching and is calling out "heads" or "tails." It is a fair coin--half the time it comes up heads, and half the time it comes up tails. Before Rosencrantz starts flipping, Guildenstern's beliefs about what the next flip of the coin will bring are accurate: he thinks that there is a 50% chance that the next flip of the coin will be heads and a 50% chance that the next flip of the coin will be tails.
Because Guildenstern starts with correct beliefs about what the odds are for the next flip of the coin, you might think that there is nothing for Guildenstern to "learn"--that as Rosencrantz flips, Guildenstern will retain his initial belief that the odds are 50-50 that the next flip of the coin will be heads or tails. But there is a problem: Guildenstern is not a human being but rather is a Bayesian AI, and Guildenstern is certain that the coin is biased: it thinks that there is a 50% chance it is dealing with a coin that lands heads 3/4 of the time, and a 50% chance it is dealing with a coin that lands tails 3/4 of the time, and its initial prediction that the next flip is equally likely to be heads or tails depends on that initial 50-50 split.
What happens as Rosencrantz starts flipping? The likelihood ratio for an H-biased as opposed to a T-biased coin is 3^z, where z = h - t, h is the number of heads, and t is the number of tails flipped, which means that the posterior probabilities assigned by Guildenstern after h heads and t tails are:
P(H | z) = 3^z/(3^z + 1)
P(T | z) = 1/(3^z + 1)
And the estimate that the next flip will be heads is:
(3/4)P(H | z) + (1/4)P(T | z) = (3^(z+1) + 1)/(4(3^z + 1))
If the numbers of heads and tails are equal, then Guildenstern (correctly) forecasts that the odds on the next flip are 50-50. If the n flips Rosencrantz has performed have seen two more heads than tails--no matter how big n is--then Guildenstern is 90% certain that it is dealing with an H-biased coin and thinks that the chance the next flip will be heads is 70%. If the n flips Rosencrantz has performed have seen ten more heads than tails--again, no matter how many flips n there have been--Guildenstern is 99.9983% sure that it is dealing with an H-biased coin and will forecast the odds of a head on the next flip at 74.9992%.
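For readers who want to check the arithmetic, here is a small Python sketch--just a sanity check of my own, not anything from Shalizi's paper--that evaluates the formulas above for a few values of z:

    # Guildenstern's posterior and next-flip forecast as a function of z = h - t,
    # under the prior: 50% chance the coin is H-biased (lands heads 3/4 of the time),
    # 50% chance it is T-biased (lands heads 1/4 of the time).

    def posterior_H(z):
        """Posterior probability that the coin is H-biased, given z = h - t."""
        return 3**z / (3**z + 1)

    def forecast_heads(z):
        """Predictive probability that the next flip comes up heads."""
        p_H = posterior_H(z)
        return 0.75 * p_H + 0.25 * (1 - p_H)

    for z in (0, 2, 10):
        print(f"z = {z:2d}:  P(H-biased) = {posterior_H(z):.6f},  "
              f"P(next flip heads) = {forecast_heads(z):.6f}")

    # z =  0:  P(H-biased) = 0.500000,  P(next flip heads) = 0.500000
    # z =  2:  P(H-biased) = 0.900000,  P(next flip heads) = 0.700000
    # z = 10:  P(H-biased) = 0.999983,  P(next flip heads) = 0.749992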
How will Guildenstern's beliefs behave over time? Well, this passage from Shalizi's more complex example applies:
Three-Toed Sloth: The sufficient statistic z [for P(H)]... follows an unbiased random walk, meaning that as n grows it tends to get further and further away from [zero], with a typical size growing roughly like n^(1/2). It does keep returning to the origin, at intervals dictated by the arc sine law, but it spends more and more of its time very far away from it. The posterior estimate of the [probability of an H-biased coin] thus wanders from being close to [1] to being close to [0] and back erratically, hardly ever spending time near [1/2], even though (from the law of large numbers) the sample mean [fraction of heads] converges to [1/2]...
So Guildenstern spends all of its time being nearly dead certain that it is dealing with an H-biased coin or nearly dead certain that it is dealing with a T-biased coin--but it switches its belief occasionally--even though there is almost surely never any statistically significant evidence for H-bias against the null hypothesis that the coin is fair and 50-50. There is an allowable set of beliefs for Guildenstern that would lead it to make the right 50-50 forecast of the odds on the next flip: Guildenstern need only continue to believe that there is no evidence either way for H-bias or T-bias. But Guildenstern's beliefs are not those beliefs and do not converge to those beliefs: look far enough out into the future and you see that Guildenstern is almost sure either that the coin is H-biased or that the coin is T-biased, and has virtually no chance of being unsure in which direction the bias lies.
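A rough simulation sketch makes the same point--again my own illustrative check, not Shalizi's code: over one long run of a genuinely fair coin, the running fraction of heads settles down near 1/2, while Guildenstern's posterior probability of an H-biased coin spends nearly all of its time pinned near 0 or near 1:

    import random

    random.seed(0)  # arbitrary seed, just so the run is reproducible

    n_flips = 100_000
    z = 0             # running value of h - t
    heads = 0
    near_certain = 0  # flips at which the posterior is beyond 0.95 in either direction

    for n in range(1, n_flips + 1):
        if random.random() < 0.5:   # a genuinely fair coin
            heads += 1
            z += 1
        else:
            z -= 1
        p_H = 3**z / (3**z + 1)     # Guildenstern's posterior probability of an H-biased coin
        if p_H > 0.95 or p_H < 0.05:
            near_certain += 1

    print(f"fraction of heads:        {heads / n_flips:.4f}")   # settles near 0.5
    print(f"final P(H-biased):        {p_H:.6f}")               # almost always near 0 or 1
    print(f"share of flips 'certain': {near_certain / n_flips:.3f}")  # most of the run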
Thus Guildenstern's processing of the data is not sensible, is not smart, is not rational, is not human--but it is Bayesian. For positive values of z, Guildenstern thinks: "there are far fewer heads than I would expect from an H-biased coin, but this is even less likely to have come about with a T-biased coin; the coin must be H-biased: I am sure of it." A sensible agent, a smart agent, a rational agent, a human agent would think: "Hmmm. Right now I am sure that the coin is T-biased, but 100 flips ago I was sure that the coin was H-biased. I know that as I get more evidence my beliefs should be converging to the truth, but they don't seem to be converging at all. Something is wrong." But there is nothing in the Bayesian agent's little brain to allow it to reason from the failure of its beliefs to converge to the conclusion that there is something badly wrong here.
But it seems, intuitively, that Guildenstern should be able to make good forecasts. The prior that Guildenstern started with does admit of beliefs that would lead to accurate forecasts of the next coin flip: all Guildenstern has to do is to doubt that it has enough information to decide about the bias of the coin. Indeed, Guildenstern's initial beliefs generate the right forecast probabilities for the next flip. So, given that Guildenstern starts out with a set of beliefs that supports and generates the "right" forecast probabilities, given that there really isn't enough information to decide about the bias of the coin--there can't be, for the coin is not biased--and given that Bayesian learning is a kind of learning, why doesn't Guildenstern simply keep its original beliefs and keep making good forecasts? Shalizi has identified a case in which it seems that a Bayesian agent should be able to learn enough to make good predictions, but cannot in fact do so.
Shalizi has gotten his Bayesian AI Guildenstern to confess. But has he done so by legitimate means? Or by waterboarding? The probability theory question, or perhaps the philosophy question, is: Is this a problem for the Bayesian way of looking at the world? Or only a demonstration that torture elicits confessions?