Having traveled from Italy to England on Saturday, spent Saturday night at my parents’ house, traveled from England to Syracuse on Sunday, and then taken a one-day trip to Michigan State University to deliver a seminar, I am finally back home for an extended period.
As I mentioned in passing in my last post, the talk I enjoyed most at the meeting on Ischia was a joint talk given by Anthony Lasenby and Mike Hobson, from Cambridge University. I thought I’d give a brief description of the topic, because the talk made the subject much clearer to me than it had been.
The topic was Bayesian Evidence, which is a method through which one may compare how well two competing theories explain a given set, or sets, of data. I should say at the outset that this method, and related ones, have been discussed in the cosmology community for some time now (my postdoc has even written a paper on it), but I haven’t paid anything like the attention to it that I should have. I’ve been vaguely keeping an eye on what people say when discussing it at meetings, but that’s about it. I should also point out, as will probably become evident rather quickly, that I am not an expert in statistical techniques for data analysis. Having said this, the fact that the talk interested me nonetheless is one more measure of what a nice performance it was.
When one reads papers describing new results in cosmology, one pays attention to the error bars on a given result. To take a topical example, consider the evidence for a scalar spectral index less than one, as inferred from the three-year WMAP data. The measurement is quoted as n = 0.951 (+0.015, −0.019). It is usual to interpret these “one-sigma” error bars as Gaussian; that is to say that, since one needs to add more than three multiples of the upper error, 0.015, to the central value 0.951 to reach unity, one says that this constitutes a measurement of n < 1 at greater than 3-sigma, and that consequently, if the true value were n = 1, there would be less than a 1% chance of getting this result randomly.
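The arithmetic behind the 3-sigma statement can be checked in a few lines. This is a minimal sketch that assumes, as the text does, a Gaussian error bar, and uses only the upper error (0.015), since that is the side on which unity lies; the function name is illustrative, and the one-sided tail probability comes from the complementary error function in Python’s standard library.

```python
import math

def gaussian_significance(central, upper_error, threshold=1.0):
    """Distance from the central value to the threshold in units of the
    (assumed Gaussian) upper error bar, plus the one-sided tail probability."""
    z = (threshold - central) / upper_error
    # One-sided Gaussian tail: P(X > z) = erfc(z / sqrt(2)) / 2
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_one_sided

z, p = gaussian_significance(0.951, 0.015)
print(f"n = 1 lies {z:.2f} sigma above the central value")
print(f"one-sided tail probability: {p:.2e}")
```

The probability comes out well below 1%, as the text says; whether the error bar really is Gaussian, and whether n should be a free parameter at all, are separate questions.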
Now, there is nothing necessarily wrong with this, but there is something missing from such a description. When one allows n to vary in a fit to the data, one has one more free parameter than when one holds n fixed and fits only the other parameters. The resulting better fit might be important, of course, but it shouldn’t surprise anyone that one can get a better fit by using more parameters.
So there are two competing things going on here. The first is the fact that new physics can provide a new parameter, which, when allowed to vary, can provide a better fit to the data. One could think of this as a net plus. The second is that it is just easier to get a better fit with more parameters and so there is some kind of net minus associated with every new parameter one adds.
The latter effect is what we mean by Occam’s razor: if two sets of parameters (read two theories) fit the data similarly well, then the simpler one (the one with fewer free parameters) should be preferred.
Bayesian evidence, the topic of Anthony and Mike’s talk, is a statistical method to provide a quantitative measure of the Occam’s razor part of this competition. It allows one to compute the odds of one theory versus a second theory being the right explanation for a set of data, in a way that quantitatively rewards a theory for reproducing the data more faithfully than its competitor, and penalizes it for having more parameters.
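In standard notation (a textbook sketch, not something taken from the talk’s slides), Bayes’ theorem gives the posterior odds for model M₁ over model M₂, given data D, as the ratio of their evidences times whatever prior odds one assigns to the two models. The evidence of each model is its likelihood averaged over that model’s prior on its parameters θ.

```latex
\frac{P(M_1 \mid D)}{P(M_2 \mid D)}
  = \frac{E_1}{E_2}\,\frac{P(M_1)}{P(M_2)},
\qquad
E_i \equiv P(D \mid M_i) = \int P(D \mid \theta, M_i)\, P(\theta \mid M_i)\, d\theta
```

With equal prior odds for the two models, the odds reduce to E₁/E₂, and ln(E₁/E₂) is the ∆ ln E that appears in the Jeffreys scale quoted below.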
To do this one defines the Bayesian evidence, E, as the average likelihood of a model over its prior parameter space. Thus, if one introduces a new parameter that doesn’t improve the likelihood significantly over much of its range, then the evidence suffers for it.
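To make the definition concrete, here is a toy calculation under loudly stated assumptions, not the analysis of any paper mentioned here. The WMAP result above is treated as a single Gaussian measurement of n with a symmetrized error of 0.017 (the average of +0.015 and −0.019); the simpler model fixes n = 1; the extended model lets n vary with a flat prior whose range is a hypothetical choice. The extended model’s evidence is the likelihood averaged over that prior, computed by the trapezoidal rule.

```python
import math

N_HAT = 0.951   # measured spectral index (central value from the text)
SIGMA = 0.017   # assumption: symmetrized error, (0.015 + 0.019) / 2

def likelihood(n):
    """Gaussian likelihood of the measurement given a true index n."""
    r = (N_HAT - n) / SIGMA
    return math.exp(-0.5 * r * r) / (math.sqrt(2.0 * math.pi) * SIGMA)

def evidence_fixed(n_fixed=1.0):
    """Model with no free parameter: the evidence is just the likelihood."""
    return likelihood(n_fixed)

def evidence_free(n_min, n_max, steps=20000):
    """Model with n free and a flat prior on [n_min, n_max]:
    E = (1 / (n_max - n_min)) * integral of L(n) dn, via the trapezoidal rule."""
    h = (n_max - n_min) / steps
    total = 0.5 * (likelihood(n_min) + likelihood(n_max))
    for i in range(1, steps):
        total += likelihood(n_min + i * h)
    return total * h / (n_max - n_min)

e0 = evidence_fixed()
for lo, hi in [(0.8, 1.2), (0.5, 1.5)]:  # hypothetical prior ranges for n
    e1 = evidence_free(lo, hi)
    print(f"prior [{lo}, {hi}]: Delta ln E = {math.log(e1 / e0):+.2f}")
```

Under these assumptions ∆ ln E is about 1.9 for the narrower prior and about 1.0 for the wider one. The best fit is identical in both cases; widening the prior simply spreads the extended model’s evidence over more parameter space that the data disfavour. That is the Occam penalty at work, and it also shows how much the answer depends on the choice of prior.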
Now, this all sounds eminently reasonable, but there is a catch, of course, namely that there is no unique way to do it: the result depends on the priors one assigns to the parameters, and on how one computes or approximates the evidence. Nevertheless, there are a number of different proposals, and the message I am taking from reading a few papers, and from discussions with colleagues who are much more expert at this than I am, is that one has to try a number of different methods; if they all seem to give a significant result, then one can be confident that the result is a real one.
It is important to realize that, even when one has a method such as this to calculate the evidence, one has to make a somewhat arbitrary decision as to what value constitutes a significant result. In this paper, by Beltran, García-Bellido, Lesgourgues, Liddle, and Slosar, there is an amusing discussion of this, in which the authors note that models are typically ranked by the difference in the logarithm of the Bayesian evidence, ∆ ln E (the logarithm of the ratio of the evidences for the competing models), and quote the famous mathematician and astronomer Sir Harold Jeffreys:
… a useful guide is given by Jeffreys [12] who rates ∆ ln E < 1 as ‘not worth more than a bare mention’, 1 < ∆ ln E < 2.5 as ‘substantial’, 2.5 < ∆ ln E < 5 ‘strong’ to ‘very strong’ and 5 < ∆ ln E as ‘decisive’, in each case the decision being against the model with the smaller evidence. Note that a difference ∆ ln E of 2.5 corresponds to odds of 1 in about 13, and ∆ ln E of 5 to odds of 1 in 150.
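The Jeffreys scale and the odds in the quotation are easy to encode. This sketch assumes equal prior odds for the two models, so that the probability of the disfavoured model is 1/(1 + e^∆ ln E); that is how the figures of 1 in about 13 and 1 in about 150 arise. The category boundaries come from the quotation, and the function names are purely illustrative.

```python
import math

def jeffreys_verdict(delta_ln_e):
    """Label a difference in log-evidence using the scale quoted above."""
    d = abs(delta_ln_e)
    if d < 1.0:
        return "not worth more than a bare mention"
    if d < 2.5:
        return "substantial"
    if d < 5.0:
        return "strong to very strong"
    return "decisive"

def odds_against_weaker(delta_ln_e):
    """With equal prior odds, the weaker model has probability
    1 / (1 + exp(|delta ln E|)); return N in the phrase '1 in N'."""
    return 1.0 + math.exp(abs(delta_ln_e))

for d in (0.5, 2.5, 5.0):
    print(f"{d:>4}: {jeffreys_verdict(d):<35} odds 1 in {odds_against_weaker(d):.0f}")
```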
If you’d like to read more about this, I found that the introductory sections of this paper helped me a lot, although there are others that might suit you better. At some point Lasenby and Hobson’s slides will be online on the conference website.
Having had such a nice talk pull this topic out of the back of my mind, I’m thinking of asking one of the undergraduates working with me this summer to study this technique and teach me the details of it. If I get a better understanding I’ll report on it again.


April 26th, 2006 at 7:05 pm
The issue of defining how good a fit is (or how uncertain, based on the standard deviation) while penalizing for a larger number of degrees of freedom is well defined without having to discuss anything Bayesian. If you are using chi-squared to define a confidence interval, then the chi-squared value you use to define the 1σ level or whatnot depends on the number of degrees of freedom in your fit. However, Bayesianism really enters when you want to start treating these degrees of freedom differently and come up with priors, which then makes the whole issue of confidence intervals trickier, as you alluded to.
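One common version of this point can be illustrated with a short sketch: the ∆χ² that encloses the usual one-sigma probability (about 68.3%) grows with the number of parameters considered jointly. To stay dependency-free, the chi-squared CDF is computed here from the series for the regularized lower incomplete gamma function and inverted by bisection; in practice one would normally reach for a library routine such as scipy.stats.chi2.ppf instead.

```python
import math

def chi2_cdf(x, k):
    """Chi-squared CDF with k degrees of freedom, via the series for the
    regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, y = 0.5 * k, 0.5 * x
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= y / (a + n)
        total += term
        n += 1
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))

def delta_chi2(prob, k, lo=0.0, hi=100.0):
    """Invert the CDF by bisection: the Delta chi^2 enclosing probability prob."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ONE_SIGMA = math.erf(1.0 / math.sqrt(2.0))  # about 0.6827
for k in (1, 2, 3):
    print(f"{k} parameter(s): Delta chi^2 = {delta_chi2(ONE_SIGMA, k):.2f}")
```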
April 26th, 2006 at 7:16 pm
Thanks, not a statistician. Indeed, the devil is in the priors for different degrees of freedom, as you say.
April 26th, 2006 at 10:19 pm
Just musing about Bayesian probability…
Physicists tend to think in terms of theories being completely overthrown by some observation, but there is a sense in which no single observation is ever really sufficient to reject a theory. It’s always possible that any single observation, or even any finite collection of observations is a fluke. So there really is a subjective judgement going on in the background. How improbable must a fluke be before you are convinced that the theory was wrong, and the contrary evidence wasn’t just a fluke?
It seems to me that Bayesian probabilities allow you in principle to put off forever making the decision as to whether a theory has been falsified or not. Instead, you can imagine starting with all possible physical theories (there are only countably many, if they are to be expressed in finitely many symbols), with some subjective a priori notion of likelihood. Then every experiment performed can be used to adjust the posterior probabilities.
With this approach, as time goes on, Newtonian physics would start off with a low probability, but then would gradually rise with the successful predictions about projectiles and planetary motion, but then would drop sharply with the discoveries of quantum and relativistic phenomena. But it would never go to zero.
If you are trying to make a prediction, you would weight the predictions made by every conceivable theory according to the posterior probability of each theory.
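The scheme described in this comment can be sketched in a few lines. Everything here is hypothetical: three made-up “theories”, each predicting a single quantity with a Gaussian spread, made-up prior probabilities, and made-up measurements. Each measurement multiplies every theory’s probability by the likelihood of that measurement under the theory; the probabilities are then renormalized, and the final prediction is the probability-weighted average of the theories’ predictions.

```python
import math

# Hypothetical theories: name -> (predicted value, spread of the prediction)
THEORIES = {"A": (1.00, 0.05), "B": (0.95, 0.05), "C": (0.80, 0.05)}
PRIOR = {"A": 0.5, "B": 0.3, "C": 0.2}   # hypothetical subjective priors

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def update(probs, measurement, meas_error):
    """One Bayesian update: posterior is proportional to prior times likelihood."""
    post = {}
    for name, p in probs.items():
        mu, spread = THEORIES[name]
        total_sigma = math.hypot(spread, meas_error)  # combine spreads in quadrature
        post[name] = p * gaussian(measurement, mu, total_sigma)
    norm = sum(post.values())
    return {name: p / norm for name, p in post.items()}

def averaged_prediction(probs):
    """Model-averaged prediction: each theory's value weighted by its probability."""
    return sum(p * THEORIES[name][0] for name, p in probs.items())

probs = dict(PRIOR)
for data in (0.96, 0.94, 0.95):   # hypothetical measurements, each with error 0.02
    probs = update(probs, data, 0.02)
print({k: round(v, 3) for k, v in probs.items()}, round(averaged_prediction(probs), 3))
```

Theory C ends up with a tiny probability that is never exactly zero, which is the commenter’s point that, in this picture, no theory is ever strictly falsified.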
April 26th, 2006 at 10:20 pm
Isn’t it great that you can tell a student to go learn something and then teach it to you? Let this be a lesson to all those students out there working hard to become professors some day — stick with it, the ultimate rewards make it all worthwhile.
April 27th, 2006 at 3:26 am
It is interesting that Bayesian analysis, whose basics were actually worked out in the 18th century, became popular and widely used in economics and science (high energy physics and cosmology, as far as I know) only in the 20th century, and seems set to become a standard method in the near future. This pairing of old mathematics with new needs in natural science happens commonly in algebra, calculus and geometry, but much less often in statistics.
It seems to me that the growing attention to methods such as Bayesian analysis in statistics and data analysis, and Monte Carlo methods in computation (and to using these two techniques together), whose basic ideas are relatively simple but whose applied techniques leave a great deal of freedom to adapt and invent, is the sign of a new way of using math in physics in the future.
April 27th, 2006 at 8:10 am
Bayesian analysis can, in principle, be unambiguous in a multiverse setting, if you had all the information available about the multiverse. You would then know what the prior probabilities are and then you could use observations to update those probabilities unambiguously.
April 27th, 2006 at 10:31 am
Apparently Occam’s razor gives several advantages. Here is a similar paper: http://arxiv.org/PS_cache/astro-ph/pdf/0604/0604410.pdf It also uses two other quantitative measures besides the Bayesian Information Criterion to look at the WMAP data, to make the result more robust. I have also seen a philosophical paper showing that the razor leads to fewer reversals of hypotheses. (And of course we observe that the simpler theory often works.)
Daryl,
I have two questions you could help me with.
First and foremost, a creationist philosopher (Paul Nelson) uses much the same construction as you do to argue that naturalism isn’t sufficient for science. I have previously seen Gödel’s first incompleteness theorem used to argue that since even some simple formal systems cannot be given a computably enumerable axiom list, physics will be forever extendable. Which is it? I would like to see whether Nelson is wrong here.
Second, you use some sort of Bayesian meta-analysis to put off falsifiability. Even though a frequentist may say Bayesian probabilities are about beliefs, apparently these beliefs enable decisions for setting parameters (to 5 sigma, say). Isn’t it reasonable to do that for each theory individually (and therefore get falsifiability)? Anyway, shouldn’t the razor support the use of one specific theory at a time?
April 27th, 2006 at 9:58 pm
Torbjörn,
I have no idea how Bayesian analysis of theories tells us anything about the truth or falsity of naturalism. I’ll have to think about the implications of Gödel’s theorem.
Second, you use some sort of Bayesian meta-analysis to put off falsifiability. Even though a frequentist may say Bayesian probabilities are about beliefs, apparently these beliefs enable decisions for setting parameters (to 5 sigma, say). Isn’t it reasonable to do that for each theory individually (and therefore get falsifiability)? Anyway, shouldn’t the razor support the use of one specific theory at a time?
The point I was making is that (1) you can never completely falsify any theory, and (2) there is no need to falsify theories. If you need to know the magnetic moment of the electron then you can just perform the weighted average of the predictions made by all possible theories (weighted by the likelihood that each theory is true). If one theory (for instance, QED) is much better supported than any of the competing theories, then its posterior probability will be almost 1, and so its predicted value will dominate. You don’t ever need to consider which theory is best supported by the evidence, because the Bayesian probabilities automatically take all evidence into account.
Now, of course in practice we can’t keep track of all possible theories and all the evidence for and against each, so we simplify matters by throwing out all except for one or two likely candidates. But if the complete Bayesian analysis were possible to do, it’s hard to see how it would ever give us a worse prediction than our current approach (except by luck).
April 28th, 2006 at 9:30 am
What We Talk About When We Talk About Probability
In his most recent post, Cosmic Variance’s Mark Trodden talks about one of the presentations we both saw at last week’s meeting in Ischia, where he explains one of the hot new techniques for analyzing cosmological data, the (so-called) Bayesian Evide…
April 29th, 2006 at 1:04 am
Daryl,
Thank you for your response!
I was unclear. I was referring to “all possible physical theories (there are only countably many, if they are to be expressed in finitely many symbols)”, which seems to be the same object Nelson uses. He takes that finiteness to imply that scientific naturalism can’t explain a possibly infinite amount of data. One can analyse this in several ways; my first reaction was that I remembered someone using Gödel to argue that physics is forever extendable if necessary.
I see now what you mean by not needing falsifiability in your analysis. I still feel that Occam tells us to junk completely the theories that have low probability. I’ll have to think about that.