Friday, January 26, 2007

ELO, Élö, it's good to be back

So far, I haven't said anything at all about sports or rating systems. The sports thing, well, that's only going to change tangentially. This piece describes one of the best-known rating systems, the Élö system. I'm going to get tired of the accents, which may not show up on all systems, so I'll revert to calling it Elo like everyone else.

Elo is primarily used for chess. The basic premise is to compare how well a player did over a given time frame - say the number of games he* won in a month - with how well he ought to have done, given his rating and his opponents' ratings. At the end of the time period, his rating is adjusted to take those games into account.

That description leaves at least two questions: how do we know how well he ought to have done, and how do we adjust the rating afterwards?

How well the player ought to have done depends on who he's played - if he's played ten games against Deep Blue, he might be expected to win one, if he's really good. Against my girlfriend's cat Darwin? Probably seven, if Darwin's on form. In fact, the number of games he's expected to win is the sum of the probabilities of each individual game**.

All well and good. But how do we figure out the probabilities of each game, given the ratings? This, unavoidably, is going to require some maths. If you have a rating of, say, 1700, and Darwin (who's only just started playing chess) is rated 1500, you have a rating difference (D) of +200. From Darwin's point of view, it's -200. The probability of you beating him is 1/(1 + 10^(-0.0025 D))***. In this case, that's 1/(1 + 10^(-0.5)), or 1/(1 + 0.32), about 0.76. For Darwin, if you'd like to do the sums, it's 1/(1 + 10^(0.5)) = 1/(1 + 3.16), or 0.24. Note that the probabilities add up to one, as they ought to****. Over ten games, you'd expect to win 7.6, while Darwin makes do with 2.4. (Incidentally, he plays worse if you throw a ball for him to chase.)
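For anyone who'd rather let a computer do the sums, here's a minimal Python sketch of that formula. The function name and the `scale` parameter are my own labels, and it assumes the coefficient 0.0025 (that is, 1/400), which is the value that reproduces the 0.76 and 0.24 worked out above.

```python
def win_probability(d, scale=0.0025):
    """Chance of winning one game when your rating beats the opponent's by d.

    scale is the coefficient on the rating difference; 0.0025 is assumed
    here because it matches the worked figures (10^(-0.5) for D = +200).
    """
    return 1.0 / (1.0 + 10.0 ** (-scale * d))


you = win_probability(200)      # 1700 against Darwin's 1500
darwin = win_probability(-200)  # the same game from Darwin's side

print(round(you, 2), round(darwin, 2), round(you + darwin, 2))
# 0.76 0.24 1.0
```

Swapping the sign of D always gives the complementary probability, which is why the two numbers add up to one.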

So, let's say you also played ten games against Deep Blue, which we'll say has an Elo rating of 2800 (D = -1100 for you). Your probability of winning each game is very small: 0.18%, or about one in 560. In ten games, you'd expect to win about 0.02. Let's say you did well against Darwin and picked him off eight times, and Deep Blue creamed you, as is its way. Over the twenty games you won eight (W=8), and your expected number of wins was E = 7.60 + 0.02 = 7.62 - you did better than expected by 0.38 wins. Very good.
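The expected total is just the single-game probabilities added up, one per game played. A rough sketch, again with names of my own choosing and the assumed 0.0025 coefficient:

```python
def win_probability(d, scale=0.0025):
    """Single-game win probability for a rating difference d (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** (-scale * d))


def expected_wins(rating, opponent_ratings):
    """Sum the win probabilities, with one list entry per game played."""
    return sum(win_probability(rating - opp) for opp in opponent_ratings)


# Ten games against Darwin (1500) and ten against Deep Blue (taken as 2800).
games = [1500] * 10 + [2800] * 10
E = expected_wins(1700, games)
W = 8

print(round(E, 2), round(W - E, 2))
# 7.62 0.38
```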

The last step of the process is to adjust your rating. A certain weighting (K) is given to your recent results - in chess, it's usually about K=12, but in later articles I'll hopefully determine what the best values for tennis are. In this case, your rating goes up by K(W-E), which is 12 x 0.385 before rounding, or 4.62 points. Your new rating would be 1705 (rounded off).
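Putting the whole thing together, the update can be sketched in a few lines. This is only an illustration of the steps described here, not anybody's official implementation: the function names are made up, K=12 is the value used in this post, and the 0.0025 coefficient is assumed as before.

```python
def win_probability(d, scale=0.0025):
    """Single-game win probability for a rating difference d (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** (-scale * d))


def updated_rating(rating, opponent_ratings, wins, k=12):
    """Return the rating after adding k * (actual wins - expected wins)."""
    expected = sum(win_probability(rating - opp) for opp in opponent_ratings)
    return rating + k * (wins - expected)


# Twenty games: ten against Darwin (1500), ten against Deep Blue (2800).
new_rating = updated_rating(1700, [1500] * 10 + [2800] * 10, wins=8)

print(round(new_rating, 2), round(new_rating))
# 1704.62 1705
```

Note that the adjustment is worked out from the unrounded E, which is why it comes to 4.62 rather than 12 x 0.38 = 4.56.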



* or she. Take that as read pretty much everywhere in this blog.

** For the mathematicians: The expected number of wins is sum(p_i) +/- sqrt(sum(p_i . (1 - p_i))), where the +/- term is the standard deviation, assuming the games are independent.

*** The 0.0025 (that is, 1/400) is arbitrary, but the only difference it makes (so far as I can see) is to the spacing of ratings.

**** ... so long as we exclude the draw as a possible result. Which we do, for simplicity's sake.
