Saturday, February 3, 2007

The Reverend Bayes and tennis probability

The Reverend Thomas Bayes was pretty much your archetypal dour, 18th-century English Nonconformist minister. Except that he was a probability whizz, and gave us a law which has annoyed amateur statisticians for the better part of three centuries. It goes something like this: the probability of one thing happening, given that another happens, is the probability of both happening divided by the probability of the other thing alone: P(A|B) = P(A & B)/P(B).

A typical example would be: "Given that my girlfriend has two cats, at least one of which is female, what is the probability of her having two girl cats?" The probability of two cats both being female is (in our idealised world) one in four, or 25%. The probability of at least one female cat in two is 3/4, or 75% (FF, FM, MF all include a girl; MM doesn't). So the probability of two girls given at least one girl is (25%)/(75%) = 1/3 (33.33%) - higher than the 1/4 given no information.
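If you'd rather not take the arithmetic on trust, a few lines of Python can check it by brute force - a throwaway sketch that just enumerates the four equally likely combinations:

    # Enumerate the four equally likely sex combinations for two cats
    # and apply Bayes's Law directly: P(A|B) = P(A & B) / P(B).
    from itertools import product

    combos = list(product("FM", repeat=2))            # FF, FM, MF, MM
    at_least_one_girl = [c for c in combos if "F" in c]
    both_girls = [c for c in combos if c == ("F", "F")]

    print(len(both_girls) / len(at_least_one_girl))   # 0.333... = 1/3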

It is natural that Bayes's Law should remind me of tennis, since I have spent a lot of time looking at both, twisting my head from one side to the other. How can we use previous results to determine unknown probabilities? And how well do we know them?

We will need to broach the difficult subject of probability distributions. The easiest way (for me at least) to visualise a probability distribution is as a graph. The graph outlines a region of unit area* - the x-axis is an outcome, often a continuum from 0 to 1 but not necessarily; the y-axis is a mystical variable called probability density. You drop a pin onto the graph so that the point is equally likely to land anywhere in the region, and read off the value on the x-axis. You can see that the peaks in the graph correspond to more likely outcomes.
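The pin-drop picture translates almost word for word into code - statisticians call it rejection sampling. The triangular density below is my own toy example, not one of the distributions discussed here:

    import random

    def density(x):
        # A toy triangular PDF on [0, 1], peaking at x = 0.5;
        # the triangle has base 1 and height 2, so unit area.
        return 4 * x if x < 0.5 else 4 * (1 - x)

    def drop_pin(max_density=2.0):
        # Scatter pins uniformly over the bounding box and keep the
        # x-value of the first one that lands inside the region.
        while True:
            x = random.random()
            y = random.random() * max_density
            if y <= density(x):
                return x

    samples = [drop_pin() for _ in range(10_000)]
    # Peaks correspond to likely outcomes: about 75% of these samples
    # land between 0.25 and 0.75, clustered around the peak at 0.5.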

Let's look first at the PDF of the sum of three rolled dice. It peaks in the middle, around 10 and 11 - meaning that you're much more likely to roll 11 than 3. This makes sense - you have many more ways to roll 11 (27, I think) than to roll 3 (just one).
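That count is easy to confirm by enumerating all 216 outcomes - another quick sketch:

    from itertools import product
    from collections import Counter

    # Tally how many of the 6^3 = 216 outcomes give each total.
    ways = Counter(sum(dice) for dice in product(range(1, 7), repeat=3))

    print(ways[3], ways[11])    # 1 and 27: rolling 11 is 27x likelier
    print(ways[11] / 6 ** 3)    # P(total = 11) = 27/216 = 0.125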

Some important distributions include the uniform distribution, which is a level straight line (all outcomes equally likely), and the normal distribution, which is a bell curve - outcomes near the mean are much likelier than ones far away.
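In code, those two densities look like this - the standard textbook formulas, nothing exotic:

    from math import exp, pi, sqrt

    def uniform_pdf(x, a=0.0, b=1.0):
        # The level straight line: constant on [a, b], zero elsewhere.
        return 1 / (b - a) if a <= x <= b else 0.0

    def normal_pdf(x, mu=0.0, sigma=1.0):
        # The bell curve: highest at the mean mu, falling off fast.
        return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

    print(normal_pdf(0.0))   # ~0.399 at the mean
    print(normal_pdf(2.0))   # ~0.054 two standard deviations out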

What we're going to try in the next article is to create a very simple tennis model and see how our knowledge about one game affects our knowledge of the next. Our strategy is going to be as follows.

We start with a uniform prior distribution for p, a variable that describes how likely one of the players is to win a single game. We then take each possible value of p (from 0 to 1) and see what the probability of the known game's result would be, given that value of p. Out of that we get a likelihood graph showing which values of p are more likely than others - and it can be converted into a PDF if we multiply by a constant. Given the PDF, we can determine confidence limits for our value of p - we want to be, say, 75% sure it's no lower than a given value, so we can evaluate the odds on offer for the next game.
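To make the strategy concrete, here's a rough sketch of the whole pipeline in Python. I'm assuming, purely for illustration, that the known result amounts to one player winning k of n games, each independently with probability p - the actual tennis model has to wait for the next article:

    def posterior_on_grid(k, n, steps=1000):
        # Uniform prior: every value of p from 0 to 1 starts equally likely.
        grid = [i / steps for i in range(steps + 1)]
        # Likelihood of seeing k wins in n games at each candidate p.
        likelihood = [p ** k * (1 - p) ** (n - k) for p in grid]
        # Multiply by a constant so everything sums to 1: now it's a PDF.
        total = sum(likelihood)
        return grid, [v / total for v in likelihood]

    def lower_confidence_bound(grid, pdf, confidence=0.75):
        # The value we're `confidence` sure p is no lower than: walk up
        # from 0 until the bottom (1 - confidence) of the probability
        # has been used up.
        cumulative = 0.0
        for p, density in zip(grid, pdf):
            cumulative += density
            if cumulative >= 1 - confidence:
                return p
        return grid[-1]

    grid, pdf = posterior_on_grid(k=7, n=10)
    print(lower_confidence_bound(grid, pdf))   # roughly 0.58 here

(Notice there's no binomial coefficient in the likelihood: it's the same for every value of p, so the multiply-by-a-constant step swallows it.)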

* at least for continuous PDFs. For discrete ones, the sum of the probabilities is one, which amounts to the same thing in the limit.

[This article and the next were originally one article. But I realised afterwards that I'd lost the thread somewhere and needed to explain things a bit better.]
