The Martingale

Tuesday, March 27, 2007

Baseball and a toss of the coin

It seems the new baseball season is upon us. I wouldn't know this, but for a bunch of people hanging around watching spring training games in the arcade where my girlfriend and I regularly play air-hockey. But never one to let a new sport slip by me, especially one as stats-friendly as baseball, I got my head down and did some work.

While considering the best way to implement a fuzzy Elo system (on which, possibly, more later), I wondered to myself: how much evidence is there that skill plays any part in MLB? I mean, how different are the results from what you'd get by tossing coins?

This is actually a pretty simple investigation. Looking just at last year's win-loss records for all 30 MLB teams, sticking the wins into a spreadsheet, expecting that each team would win 81 of 162 coin tosses* and asking my free Excel-alike to do a chi-square test: there's a 16.3% probability that results as extreme or more so would occur just by tossing coins.

However - something's not quite right here. In this account, every game is counted twice, once for the home team and once for the away team. How about we just look at the home records? That way, we only count each game once, as nature intended.

Things get even more incriminating for people who prefer baseball to watching slot-machines - tossing a coin for each game would return more extreme results more than two thirds of the time. And that's without taking into account the observed home-field advantage.

Let's do that now - assuming a 55% win rate for home teams**, we find the probability of more extreme results to be 99.32%. Extending that back to 2001***, the probability is lower - about 46% - but still way short of statistical significance. In short, given that data, there's no reason to suspect that skill differences between MLB teams have any influence on season-by-season home win-loss records.
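For anyone who'd rather not trust a spreadsheet, here's a minimal sketch of a chi-square test of this kind in Python, standard library only. The function names and the (wins, games) input format are my own, hypothetical choices. Each team contributes (W - np)²/(np(1-p)), which is the Pearson statistic for its wins and losses combined, and the degrees of freedom are taken as the number of teams. That is reasonable when each game is counted once (the home-records version) but not quite right for full W-L records, where every game appears twice. I don't know exactly how the Excel-alike set its test up, so this isn't guaranteed to reproduce the percentages above to the last digit.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df (closed form, no SciPy)."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: Q(df/2, h) = exp(-h) * sum_{i < df/2} h^i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: start from Q(1/2, h) = erfc(sqrt(h)), then add
    # h^a exp(-h) / Gamma(a + 1) for a = 1/2, 3/2, ... up to df/2 - 1.
    total = math.erfc(math.sqrt(h))
    a = 0.5
    term = math.exp(-h) * h ** a / math.gamma(a + 1)
    for _ in range((df - 1) // 2):
        total += term
        a += 1
        term *= h / a
    return total

def coin_toss_test(records, p_win=0.5):
    """records: list of (wins, games), one per team (hypothetical format).
    Null hypothesis: every team wins each game with probability p_win.
    Returns (chi-square statistic, probability of a result at least as extreme)."""
    stat = sum((w - g * p_win) ** 2 / (g * p_win * (1 - p_win)) for w, g in records)
    return stat, chi2_sf(stat, len(records))
```

Passing `p_win=0.55` gives the home-advantage version of the test.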

---

Of course, that's all a bit mischievous. While the stats are sound enough, there is a possible problem with the data: 80-odd matches per team simply isn't long enough to establish a statistically significant result. Indeed, when you group the records by team over the last six seasons, agglomerating as you go, you do get a statistically significant result (a probability around 10^-8, since you ask).

It's possible to play around with this. If you only look at the middle 20 teams, the probability is well over 20%. If you decide that Yankee Stadium, Boston, Oakland, San Francisco and Minneapolis are intimidating places to go and crank up their win ratios to 60%, while saying that Tampa Bay, Cincinnati, Baltimore, Detroit and Kansas City are less intimidating and worth only 50% - all numbers plucked from guesswork - then it's back to insignificance at the 10% level. Caveat being, of course, that the stadia to alter were picked after looking at the data.

* Some teams only played 161 games. I took this into account.

** That's a guess inspired by the data. Over the last six seasons, the actual rate was 53.91%.

*** i.e. taking home W-L records for each team in each season, making 30 x 6 = 180 records.

Tuesday, February 6, 2007

Kelly for the cowardly

Kelly staking is - as we've seen before - the mathematically optimal way to grow your bankroll. It has one glaring problem, though: it's horrifically volatile. Let's imagine we make 100 bets which we know are 50-50 shots but the bookies insist on pricing at 2.10. Our Kelly stake is (po-1)/(o-1) = 4.55%.

Now, when we win, we tend to win big - nearly a third of the time, we'd get 53 or more correct. That would net us at least a 49% profit. The flip-side of that is, a quarter of the time we get 46 or fewer and lose a quarter of our bankroll. There's a one in twenty chance that we'll lose half of our bankroll (although one in ten that we'll double it). With bigger edges or shorter odds, the fluctuations can be terrifying.
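Figures like these are easy to check by enumerating the binomial distribution exactly rather than simulating. Here's a minimal sketch in Python; the names are my own, and it assumes the same 100 bets at 2.10 on 50-50 shots with the full Kelly stake.

```python
from math import comb

P, ODDS, N = 0.5, 2.10, 100
K = (P * ODDS - 1) / (ODDS - 1)   # Kelly stake, about 4.55% (exactly 1/22)

def final_multiple(wins, n=N, k=K, o=ODDS):
    """Bankroll multiple after n bets at stake fraction k, `wins` of them winners."""
    return (1 + k * (o - 1)) ** wins * (1 - k) ** (n - wins)

def prob(pred, n=N, p=P):
    """P(pred(W)) where W, the number of winners, is Binomial(n, p)."""
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(n + 1) if pred(w))

p_53_plus = prob(lambda w: w >= 53)                       # 53 or more winners
p_halved = prob(lambda w: final_multiple(w) <= 0.5)       # lose at least half
p_doubled = prob(lambda w: final_multiple(w) >= 2.0)      # at least double up
```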

Is there a way to reduce them? Well, obviously, if you don't bet so much, your bankroll is steadier. But let's say you're still pretty greedy, and want to maximise your worst plausible outcome.

How do you even define that? Well, given that we're looking at a binomial distribution, we can use stats to help us. If we look at N identical bets with probability p, we know that 97.7% of the time* we'll win at least W_min = Np - 2 sqrt(Np(1-p)) of them. Bumping up the 2 to 3 gives us 99.87% confidence.

Whatever value we choose - I'm happy enough with two - outlines our worst plausible set of results over N trials**. We can then calculate our worst plausible outcome, which is B₀ (1 + k(o-1))^W_min (1 - k)^(N - W_min).

The trick now is to maximise this with respect to k. It turns out, if we define p* as W_min/N, that our optimal Kelly stake in this sense is (p*o - 1)/(o - 1). And if it's less than zero, we don't bet.

This is quite restrictive. In the case above, with N = 100 we simply wouldn't bet: p* is 40%, far below the 1/2.10 ≈ 47.6% we need before there's any edge at all. N = 1000 isn't that much better - p* = 46.8%, where we need 47.6%. N = 2500 is just about enough: p* = 48%, giving a stake of (0.48 × 2.10 - 1)/1.10 ≈ 0.73%.
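A minimal sketch of the modified stake in Python, assuming N identical bets and the two-standard-deviation cut-off used above (the `z` parameter and function names are my own). The second function evaluates the log of the worst plausible bankroll directly, so a brute-force search can confirm that (p*o - 1)/(o - 1) really is its maximum.

```python
import math

def w_min(p, n, z=2.0):
    """Worst plausible number of winners: n p - z sqrt(n p (1 - p))."""
    return n * p - z * math.sqrt(n * p * (1 - p))

def modified_kelly(p, o, n, z=2.0):
    """Kelly formula with p replaced by p* = W_min / n; zero if there's no edge at p*."""
    p_star = w_min(p, n, z) / n
    return max(0.0, (p_star * o - 1) / (o - 1))

def worst_log_bankroll(k, p, o, n, z=2.0):
    """ln of (1 + k(o-1))^W_min (1 - k)^(n - W_min), i.e. the worst plausible outcome for B0 = 1."""
    wm = w_min(p, n, z)
    return wm * math.log(1 + k * (o - 1)) + (n - wm) * math.log(1 - k)
```

For example, `modified_kelly(0.5, 2.10, 2500)` comes out at about 0.0073, while N = 100 or 1000 returns zero.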

Here are the results of running 2500 bets 1000 times over (using the two staking patterns on the same events):


           Pure Kelly     Modified
Stake        4.55%           0.73%
AROI***     26.07%           3.81%
SD          64.52%          23.58%
Worst      -79.85%         -11.44%

So, on average, Kelly outperforms the modified version by some way - but at the cost of much higher risk. The modified stakes 'guarantee' that the lowest plausible value is as large as possible.

It is possible to make up the discrepancy to a fair degree by increasing N, because the larger N is, the closer p* is to p (the square root term ends up getting very small).

Modified Kelly staking is worthwhile for bets with sufficiently large edges, or over sufficiently long runs. If you plan to make only 100 bets, you would need odds of at least 2.5 on a 50-50 shot before the modified stakes allowed you to bet.

I just typed bed, which is probably a Freudian slip. It's getting late.

* Look it up in a normal distribution table.

** We needn't assume the bets are identical. In general, we can replace Np with sum(p) and the bit inside the square root would be sum( p(1-p) ). But that complicates things a bit more than we need for the proof of concept.

*** Average Return on Investment

Sunday, February 4, 2007

Optimal staking subject to constraints

This came from a post in Punter's Paradise by The Dark Arts.

It's Sunday night, you've got 5 value calls on that night's NFL games. Let's say you can get 10/11 (1.909) but you make them 55% chances. However, they are each simultaneous kick off times.

What's your stake?

(BTW,I don't know the answer).

tda.

The maths for this is a mess, using partial derivatives and Lagrangian multipliers, but the stake sizes that maximise your bankroll long-term can be calculated.

Here's the situation: you make n simultaneous bets of k_i of your bankroll B₀ at o_i, each of which has probability p_i of occurring (for i = 1..n). The expected return E_i for each bet is

E_i = B₀ (1 + k_i (o_i - 1))^(p_i) (1 - k_i)^(1 - p_i). (1)

Your expected bankroll B after the results come in is

B(k) = sum(i=1..n) E_i. (2)

However, you're subject to the constraint that you can't bet more than your entire bankroll:

g(k) := sum(i=1..n) k_i <= 1. (3)

This is a problem for Lagrangian multipliers. We want to maximise B (Eq 2) subject to the constraint (Eq 3), so we need to solve

∂B/∂k_i - λ ∂g/∂k_i = 0 for all i, (4)
g(k) <= 1. (5)

The derivative is a mess:
∂B/∂k_i = B₀ (1 + k_i (o_i - 1))^(p_i - 1) (1 - k_i)^(-p_i) (p_i o_i - 1 - k_i (o_i - 1)). (6)

I doubt there's a closed-form solution for this, but trial and error works. Let's try a complicated case first, with two great-looking bets:

Bet    p     o      k
1      0.9   1.50   0.70
2      0.8   2.50   0.67

Our Kelly stakes add to 1.37 bankrolls and we only have one! So what's the best solution? Obviously, if we can't get as much on as we want, we should bet as much as we can, so we have a strict constraint - the <= is an equals sign. The derivatives of g are all one, so we're left with λ = ∂B/∂k_i for all i - meaning that all the derivatives of B are the same. In this case, the only solution is for λ ~ 0.225, giving stakes of 41.8% and 58.2%.

The upshot of all this is that the optimal staking strategy is when the partial derivatives ∂B/∂k_i are equal and the k_is sum to at most one. There are two scenarios: first, if the Kelly stakes generated the usual way sum to less than 1, they're optimal. This is the case in TDA's question, which I'll get back to in a minute. If not, you're going to need to get your Excel solver working hard to satisfy the constraints. Or write some code.

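In TDA's question, we had p = 0.55, o = 1.909 and n = 5. The Kelly stake on each is (0.55 × 1.909 - 1)/0.909 ≈ 0.055, making a total stake of 0.275. That's comfortably under one bankroll, so those are the optimal stakes.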
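The "or write some code" option might look something like this. It's a minimal sketch that follows this post's own objective, B(k) = sum E_i with B₀ = 1, rather than a simulation of joint outcomes, and the function names are hypothetical. It relies on each E_i being concave in k_i, so ∂B/∂k_i falls steadily from p_i o_i - 1 at k_i = 0 to zero at the ordinary Kelly stake. For a given λ, each stake can then be found by bisection, and an outer bisection on λ makes the stakes sum to one.

```python
def dB_dk(k, p, o):
    """Eq (6) with B0 = 1: derivative of (1 + k(o-1))^p (1-k)^(1-p) with respect to k."""
    b = o - 1
    return (1 + k * b) ** (p - 1) * (1 - k) ** (-p) * (p * o - 1 - k * b)

def kelly(p, o):
    """Ordinary Kelly stake, floored at zero (no bet without an edge)."""
    return max(0.0, (p * o - 1) / (o - 1))

def stake_for_slope(lam, p, o, iters=100):
    """Find k in [0, Kelly] with dB_dk(k) == lam; zero if even k = 0 can't reach lam."""
    hi = kelly(p, o)
    if hi == 0.0 or dB_dk(0.0, p, o) <= lam:
        return 0.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if dB_dk(mid, p, o) > lam:   # derivative still above lam: root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def constrained_stakes(bets, iters=100):
    """bets: list of (p, o). Kelly stakes if they fit in one bankroll; otherwise
    equal-derivative stakes that sum to exactly one."""
    full = [kelly(p, o) for p, o in bets]
    if sum(full) <= 1:
        return full
    lo, hi = 0.0, max(dB_dk(0.0, p, o) for p, o in bets)
    for _ in range(iters):
        lam = (lo + hi) / 2
        if sum(stake_for_slope(lam, p, o) for p, o in bets) > 1:
            lo = lam   # staking too much: demand a steeper slope
        else:
            hi = lam
    lam = (lo + hi) / 2
    return [stake_for_slope(lam, p, o) for p, o in bets]
```

For example, `constrained_stakes([(0.9, 1.5), (0.8, 2.5)])` should land close to the 41.8% / 58.2% split above, and five copies of `(0.55, 21/11)` simply return the unconstrained 0.055 each.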

The Reverend Bayes and tennis probability, Part II

For our Big Bayesian Experiment, let's take an example from 2005, which is the tennis-data spreadsheet I currently have open. Roger Federer* took on Andre Agassi in the semi-final of the Dubai Duty Free men's tournament on Feb. 26th. A little over a month earlier, they played in the quarter-finals of the Australian Open, which Federer won handily, 6-3, 6-4, 6-4.

I'm going to take a very simple model of tennis, one which could almost certainly be improved by taking into account strength of serve and the number of service breaks in each set - unfortunately, I have neither the data, the programming skill nor the patience to put that kind of model into effect for a short blog article. Anyway, contrary to all good sense, I'll assume that Federer wins every game with the same probability p_F, whoever is serving, and that this parameter carries forward to the next match.

And to begin with, let's also assume that we know nothing about Federer and Agassi - the chance of Federer being a Scottish no-hoper and having no chance of getting near Agassi (p_F=0, p_A=1) is the same as that of the roles being reversed (p_F=1, p_A=0) - all values of p_F are equally likely. This is our prior distribution. We're looking for a posterior distribution of p_F - how likely each value of the variable is.

There's some number-crunching to be done here. What we're going to do is examine every (well, almost) possible value of p_F, find the probability for each of Federer winning the match with that score, and examine the distribution that comes out. Some examples:

pF     P(6-3)   P(6-4)   P(6-3, 6-4, 6-4)
0.40   0.0493   0.0666   0.00022
0.50   0.1091   0.1228   0.00165
0.60   0.1670   0.1505   0.00378
0.70   0.1780   0.1204   0.00258

The highest of these is 0.60 (in fact, the most likely value for p_F is the number of games he won, 18, over the number played, 29, or about 0.62). The number-crunching (combined with NeoOffice) gives a nice graph of the likelihoods, which I reproduce here:

You can see that the peak of the PDF (I won't go into the terminology here, it's getting long) is indeed around 0.62 (0.6207, to be precise). But we're interested in a confidence range, for which we read off the CDF the p_F values for P at various levels. For instance, we're 95% certain that p_F lies between the 0.025 level and the 0.975 level - i.e. between 0.4385 and 0.7734. We're 50% sure that it lies between the 0.25 and 0.75 levels - so between 0.5550 and 0.6734.
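The number-crunching can be sketched in a few lines of Python. This is a minimal grid version under the post's assumptions: a uniform prior, so the posterior is just the normalised likelihood, and a fixed per-game probability p. The `set_score_prob` helper only handles 6-0 to 6-4 sets, which covers 6-3, 6-4, 6-4; the names and the grid resolution are my own choices. Since each set score contributes p^6 (1-p)^k up to a constant, the posterior here is in fact a Beta(19, 12) distribution, but the grid approach doesn't need to know that.

```python
from math import comb

def set_score_prob(p, games_won, games_lost):
    """P(a player winning each game with probability p takes a set 6-k, k <= 4).
    The winner takes the last game; the other games can fall in any order."""
    assert games_won == 6 and games_lost <= 4
    return comb(5 + games_lost, games_lost) * p**6 * (1 - p)**games_lost

def posterior_grid(sets, steps=10000):
    """Uniform prior on p over a grid; returns (grid, normalised posterior weights)."""
    grid = [i / steps for i in range(steps + 1)]
    like = []
    for p in grid:
        l = 1.0
        for won, lost in sets:
            l *= set_score_prob(p, won, lost)
        like.append(l)
    total = sum(like)
    return grid, [x / total for x in like]

def quantile(grid, weights, level):
    """First grid value at which the cumulative posterior reaches `level`."""
    c = 0.0
    for p, w in zip(grid, weights):
        c += w
        if c >= level:
            return p
    return grid[-1]

grid, post = posterior_grid([(6, 3), (6, 4), (6, 4)])
mode = grid[max(range(len(post)), key=post.__getitem__)]
lo95, hi95 = quantile(grid, post, 0.025), quantile(grid, post, 0.975)
```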

How does this help us? Well, it narrows down p_F substantially - remember, we didn't have a clue what it was before. Now we can say with some certainty where it isn't - it's unlikely to be more than 0.7734 or less than 0.4385, eliminating almost two-thirds of the possible values at a stroke. It's as likely as not to be in the range [0.5550, 0.6734]. Can we translate this into a probability for the next match? Of course. Again, it's a number-crunching exercise, but we get the following for our key values of p_F:

pF       P(Set)   P(2-0)   P(2-1)   P(Win)
0.4385   0.3307   0.1093   0.1464   0.2557
0.5550   0.6522   0.4253   0.2959   0.7213
0.6207   0.8075   0.6520   0.2510   0.9031
0.6734   0.8977   0.8058   0.1649   0.9707
0.7734   0.9825   0.9653   0.0338   0.9991

This tells us: the most likely probability of Federer beating Agassi in the match (best of three sets) is 90.3% (fair odds 1.11), that it's as likely as not to be between 72.13% (1.39) and 97.07% (1.03), and that we can be 95% certain it lies between 25.57% (3.91) and 99.91% (1.001). In the event, the best available odds were 1.35 with Expekt - and depending on how much confidence-in-value we wanted, we would take those and reap the benefit of Federer's 6-3, 6-1 demolition job.
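Here's a minimal sketch of the set and match calculation in Python. The post doesn't say how tiebreaks were handled; I've assumed the tiebreak is won with the same probability p as a game, and with that assumption the sketch reproduces the P(Set) values I checked (0.6522 at 0.5550 and 0.8075 at 0.6207). The function names are my own.

```python
from math import comb

def set_win_prob(p):
    """P(winning a set) when every game, and the tiebreak, is won with probability p."""
    q = 1 - p
    # 6-0 to 6-4: the set winner takes the last game, the rest can come in any order
    straight = sum(comb(5 + k, k) * p**6 * q**k for k in range(5))
    at_5_5 = comb(10, 5) * p**5 * q**5
    seven_five = at_5_5 * p * p          # win the next two games
    tiebreak = at_5_5 * 2 * p * q * p    # reach 6-6, then win the tiebreak (assumed prob p)
    return straight + seven_five + tiebreak

def match_win_prob(p):
    """Best of three sets: returns (P(2-0), P(2-1), P(win))."""
    s = set_win_prob(p)
    two_nil = s * s
    two_one = 2 * s * s * (1 - s)   # lose one of the first two sets, then win the decider
    return two_nil, two_one, two_nil + two_one
```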

This system is, I have to stress, very basic and currently only works if the two players met in the recent past. In the next instalment, I might have a shot at a link function so we can rate players who haven't met recently - but have played a common opponent.

* easy to spell; difficult to stop spelling

Saturday, February 3, 2007

The Reverend Bayes and tennis probability

The Reverend Thomas Bayes was pretty much your archetypal dour, 18th-century English Nonconformist minister. Except that he was a probability whizz, and gave us a law which has annoyed amateur statisticians for the better part of three centuries. It goes something like this: the probability of one thing happening, given that another happens, is the probability of both happening divided by the probability of the other thing: P(A|B) = P(A & B)/P(B)

A typical example would be "Given that my girlfriend has two cats, at least one of which is female, what is the probability of her having two girl cats?" The probability of two cats both being female is (in our idealised world) one in four, or 25%. The probability of at least one female cat in two is 3/4 or 75% (FF, FM, MF all include a girl, MM doesn't). So the probability of two girls, given at least one girl, is (25%)/(75%) = 1/3 (33.33%) - higher than the 1/4 we'd have with no information.
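The same answer drops out of brute-force enumeration, which is a handy sanity check on conditional probabilities. A minimal sketch in Python, assuming (as above) that each cat is independently equally likely to be female or male:

```python
from fractions import Fraction
from itertools import product

families = list(product("FM", repeat=2))            # FF, FM, MF, MM: equally likely
at_least_one_f = [f for f in families if "F" in f]  # the condition B
both_f = [f for f in at_least_one_f if f == ("F", "F")]
p_both_given_one = Fraction(len(both_f), len(at_least_one_f))  # P(A & B) / P(B)
```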

It is natural that Bayes's Law should remind me of tennis, since I have spent a lot of time looking at both, twisting my head from one side to the other. How can we use previous results to determine unknown probabilities? And how well do we know them?

We will need to broach the difficult subject of probability distributions. The easiest way (for me at least) to visualise a probability distribution is as a graph. The graph outlines a region of unit area* - the x axis is an outcome, often a continuum from 0 to 1 but not necessarily; the y axis is a mystical variable called probability density. You drop a pin onto the graph so that the point is equally likely to land anywhere in the region, and read off the value on the x-axis. You can see that the peaks in the graph correspond to more likely outcomes.

Let's look first at the PDF of the sum of three rolled dice. It peaks in the middle, around 10 and 11 - meaning that you're much more likely to roll 11 than 3. This makes sense - you have many more ways to roll 11 (27 of the 216 possible rolls) than to roll 3 (just one).
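Counting the ways directly is a two-liner in Python; a minimal sketch that enumerates all 216 equally likely rolls of three fair dice:

```python
from collections import Counter
from itertools import product

counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
pmf = {total: n / 216 for total, n in sorted(counts.items())}  # probability of each total
```

Plotting `pmf` as a bar chart gives the hump-shaped distribution described above.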

Some important distributions include the uniform distribution, which is a level straight line (all outcomes equally likely) and the normal distribution, which is a bell curve - outcomes near the mean are much likelier than ones far away.

What we're going to try in the next article is to create a very simple tennis model and find how our knowledge about one game affects our knowledge of the next. Our strategy is going to be as follows.

We start with a uniform prior distribution of p, a variable that describes how likely one of the players is to win a single game. We then take each possible value of p (from 0 to 1) and see what the probability of the result of a known game would be, given that value of p. Out of that we get a likelihood graph showing what values of p are more likely than others - which can be converted into a PDF if we multiply by a constant. Given the PDF, we can determine confidence limits for our value of p - we want to be, say, 75% sure it's no lower than a given value so we can evaluate the odds on offer for the next game.

* at least for continuous PDFs. For discrete ones, the sum of the probabilities is one, which amounts to the same thing in the limit.

[This article and the next were originally one article. But I realised afterwards that I'd lost the thread somewhere and needed to explain things a bit better.]

Friday, February 2, 2007

Thursday, February 1, 2007

From the postbag: Doubles

Of course, Splittter is the only one writing to me at the moment, which makes me feel a bit like Willie Thorne in the Fantasy Football League sketches. Anyway, here is his wisdom:

Your post on doubles has been bugging me since I read it basically because the accepted gambling wisdom is simply "don't do doubles", full stop, no exceptions... yet your maths looked correct.

I had a sneaking suspicion that it had to do with your bet size relative to your bankroll, and that hidden in the double is the fact that you're essentially sticking an amount larger than your actual stake on the 'second' outcome.

So, to test that theory I imagined the following:

There are two bets for which you'll get 3.00: event 1 you reckon will come in 37%, event 2 35%, both clear value bets.

He goes on to analyse the situation in excruciating detail. As I refuse to be out-mathsed by anyone, let alone Splittter, I'll do the same but more clearly - and reach a slightly different conclusion. His experiment calls for Kelly staking.

With Kelly staking, you would place a fraction k = p - (1-p)/(o-1) of your bankroll on each bet. Your expected return is p(kB(o-1)) - (1-p)(kB) = Bk(po-1)

Betting singles, your Kelly stake on the first game is 5.5% of bankroll; on the second, 2.5%. The outcomes are as follows:

Win-win: (12.95%) +16.55%
Win-lose: (24.05%) + 8.23%
Lose-win: (22.05%) - 0.78%
Lose-lose: (40.95%) - 7.86%

The weighted average of these - trust me - is 0.73%.

By contrast, if you bet the double, your Kelly stake is 2.07% of bankroll, and your outcomes are:

Win-win: (12.95%) +16.5%
Any other: (87.05%) - 2.1%

So, on average, you come out 0.34% ahead. So far, so good for the singles. However, let's examine the bets in terms of risk vs. reward:

Expected risk for two singles: 6.99%
Expected return: 0.73%
Value for singles: 10.44%

Risk for double: 2.07%
Expected return: 0.34%
Value for double: 16.42%
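A minimal sketch of the singles-versus-double calculation in Python, for anyone who'd rather not "trust me". It assumes the second single is staked as a Kelly fraction of the bankroll after the first result, which is my reading of how the figures above were produced; the function names are my own, and the "risk" measure isn't coded here.

```python
def kelly(p, o):
    """Kelly fraction k = p - (1 - p)/(o - 1)."""
    return p - (1 - p) / (o - 1)

def singles(p1, o1, p2, o2):
    """Kelly singles, second stake sized on the bankroll after the first result.
    Returns {(won1, won2): (probability, fractional change in bankroll)}."""
    k1, k2 = kelly(p1, o1), kelly(p2, o2)
    out = {}
    for w1 in (True, False):
        for w2 in (True, False):
            prob = (p1 if w1 else 1 - p1) * (p2 if w2 else 1 - p2)
            m1 = 1 + k1 * (o1 - 1) if w1 else 1 - k1
            m2 = 1 + k2 * (o2 - 1) if w2 else 1 - k2
            out[(w1, w2)] = (prob, m1 * m2 - 1)
    return out

def double(p1, o1, p2, o2):
    """Kelly stake on the double (p = p1 p2, o = o1 o2) and its two outcomes."""
    p, o = p1 * p2, o1 * o2
    k = kelly(p, o)
    return k, {"win": (p, k * (o - 1)), "lose": (1 - p, -k)}

def expected(outcomes):
    """Probability-weighted average change in bankroll."""
    return sum(prob * change for prob, change in outcomes.values())

s = singles(0.37, 3.0, 0.35, 3.0)
k_double, d = double(0.37, 3.0, 0.35, 3.0)
```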

You might argue that we're not comparing apples for apples - that if we're betting singles, we're forced to make the second bet even if the first fails. However, if we don't make the second bet, we do even worse - as you'd expect, failing to make a value bet lowers your expected return (in this case, to 0.66%). The risk in that case is fixed at 5.5%, making the value 12.00% even.

How about the order of the bets? In fact, it doesn't make a difference to the expected return. It does make a difference to your expected risk, though, which drops to 6.26%. That makes the value 11.66% - still lower than the double. Without the second bet if the first loses, the expected return falls to 0.34%, with a risk of 2.5%, making the value 13.59%.

My correspondent challenges me to prove things in general. I scoff, mainly because I ought to do some work. I may leave that for a later post.

All of which seems to show that a double on two value bets gives better value than two singles. The singles give a higher expected value, but at the cost of an increase in risk which reduces the value below the double's.

Tuesday, January 30, 2007

From the postbag: Arbitrage

Splittter writes to say:

Worth saying that, although the arbitrage is seductive as it guarantees profit on each market, to the mathematically inclined gambler it's still only worth doing if all bets involved are value in their own right. Basically because if they aren't, although you'll win every time, over time you'll win less than if you just backed the value ones. I'll leave you to 'do the math' though :)

Alarmingly, my poker-playing friend is quite right. Allow me to thank him for making the first post. Let's say there are two arbitrage situations, one where both bets are value, and one where only one is.

Situation one:
Heads 2.10
Tails 2.10
Total implied probability: 95.24%
The arbitrageur backs both at the same stake, and pulls down 10% of it whatever happens. The value bettor does the same thing, with the same result. Bully for both.

Now let's look at a situation where only one is value:
Heads: 2.25
Tails: 1.95
Total implied probability: 95.73%
The arbitrageur puts GBP1.95 on heads and GBP2.25 on tails, to win GBP4.3875 less GBP4.20 = GBP0.1875 whoever wins. Over 100 bets, he wins about GBP18.75. The value bettor places GBP4.20 on heads every time. Over 100 bets, he expects to win GBP9.45 50 times (GBP472.50), minus GBP420 staked - a profit of GBP52.50, almost treble the arbitrageur's return.

The cost of this expected extra profit is volatility. About 3 per cent of the time, 40 or fewer heads will show up in 100 throws. In the case of 40 heads, he would win only GBP378, less GBP420, a loss of GBP42. How terrible. On the other hand, about 3% of the time there would be 60 or more heads, leaving him with a much increased bankroll: GBP567 - GBP420 = GBP147 profit, nearly eight times the arbitrageur's GBP18.75.
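A minimal sketch of both bettors' sums in Python, assuming a fair coin and the odds above; the function names are my own. The arbitrage stakes are proportional to the *other* side's odds, which makes the return identical whichever way the coin lands.

```python
from math import comb

def arb_profit(o_heads, o_tails):
    """Back heads with o_tails units and tails with o_heads units:
    either way the return is o_heads * o_tails, for a total outlay of o_heads + o_tails."""
    return o_heads * o_tails - (o_heads + o_tails)

def value_bettor_profit(heads, stake=4.20, odds=2.25, n=100):
    """Profit from backing heads at `odds` on all n tosses, `heads` of which come up."""
    return heads * stake * odds - n * stake

def p_at_most(k, n=100, p=0.5):
    """P(at most k heads in n tosses)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
```

By symmetry, `p_at_most(40)` is also the chance of 60 or more heads.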

Over a much longer time frame, the value bettor would be statistically certain* to make more money.

* that is, over a long enough time frame, you can make it as unlikely as you like that the arbitrageur would win more.

Sunday, January 28, 2007

Kelly staking

Mathematician John Kelly came up with a system for staking which maximises the long-term growth rate of your bankroll. This is going to be a load of maths, so look away now if you're not interested.

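Assuming you bet a proportion k of your bankroll each time at odds o, after you win W and lose L bets, you have B' = B [(1 + k(o-1))^W (1 - k)^L]. We want to find the maximum of this. It's easiest to work with the logarithm, ln B' = ln B + W ln(1 + k(o-1)) + L ln(1 - k), which peaks at the same k. Taking its derivative, setting it to 0 and multiplying through by (1 + k(o-1))(1 - k) gives:
W(o-1)(1-k) - L(1 + k(o-1)) = 0.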

Or, W(o-1)(1-k) = L(1 + k(o-1)). Since over the long term, W/L -> p/(1-p) (see earlier post on the Law of Large Numbers), we can substitute in to get:
p(o-1)(1-k) = (1-p)(1 + k(o-1)). A little algebra then gives us:
k = p - (1-p)/(o-1), the Kelly Staking formula.

That means, if you assess the probability of the outcome to be 50% and the odds are 2.10, you should stake 0.5 - 0.5 / (1.1) ~ 0.5 - 0.45 = 0.05: a twentieth of your balance.
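If you'd like to see the formula agree with a brute-force search, here's a minimal sketch in Python (function names my own). In the long run, W/L → p/(1-p), so maximising the bankroll amounts to maximising the expected log growth per bet, p ln(1 + k(o-1)) + (1-p) ln(1-k), and a fine grid over k should land on the same answer as the formula.

```python
import math

def kelly_fraction(p, o):
    """k = p - (1 - p)/(o - 1); a negative value means don't bet."""
    return p - (1 - p) / (o - 1)

def log_growth(k, p, o):
    """Expected log growth per bet at stake fraction k."""
    return p * math.log(1 + k * (o - 1)) + (1 - p) * math.log(1 - k)

p, o = 0.5, 2.10
grid = [i / 100000 for i in range(20000)]   # stakes from 0 to just under 20%
best_k = max(grid, key=lambda k: log_growth(k, p, o))
```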

That's a big gamble. After losing a few consecutive bets, your bankroll of GBP1000 would have dwindled like this:
1. Bankroll: 1000.00 Bet: 50.00
2. Bankroll: 950.00 Bet: 47.50
3. Bankroll: 902.50 Bet: 45.13
4. Bankroll: 857.37 Bet: 42.87
5. Bankroll: 814.51

In four bets, you've lost nearly a fifth of your bankroll! On the other hand, if you'd won, you'd be laughing:
1. Bankroll: 1000.00 Bet: 50.00
2. Bankroll: 1055.00 Bet: 52.75
3. Bankroll: 1113.03 Bet: 55.65
4. Bankroll: 1174.24 Bet: 58.71
5. Bankroll: 1238.82

And you're up almost 24%. Kelly staking is a wild ride. As long as your value calculations are right, you'll end up way ahead in the long run*. Occasionally you'll lag at the wrong end of the binomial distribution and look like you're way behind.
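A few lines of Python make it easy to replay these sequences, or any other run of results. A minimal sketch (the function name is my own); like the tables, it uses the rounded 5% stake rather than the exact Kelly fraction of 1/22.

```python
def run(bankroll, fraction, odds, results):
    """Stake `fraction` of the current bankroll on each bet in turn.
    results: list of True (won) / False (lost). Returns (history, final bankroll),
    where history holds (bankroll, bet) before each bet, rounded to pennies."""
    history = []
    for won in results:
        bet = bankroll * fraction
        history.append((round(bankroll, 2), round(bet, 2)))
        bankroll += bet * (odds - 1) if won else -bet
    return history, bankroll

_, after_losses = run(1000.0, 0.05, 2.10, [False] * 4)
history, after_wins = run(1000.0, 0.05, 2.10, [True] * 4)
```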

Some gamblers choose to use a slightly less volatile system called fractional Kelly, in which they split their bankroll into (say) five separate bankrolls and use only one for Kelly calculations. That dampens the volatility a bit, but does make for smaller gains when you're winning.

So long as your value estimation is correct and the law of large numbers takes hold quickly enough - and you can stand the wild fluctuations in your bankroll - Kelly staking is the most profitable system known to mathematics. Use it wisely.

* In the above situation (staking the exact Kelly fraction of 1/22), you'd need about 4,800 bets to be 95% sure of breaking even or better.

Exotic bets

For some people, singles - betting on one selection in one event - simply aren't exciting enough, and they need to bet on several selections in several events.

The simplest multiple is the double. The proceeds of winning one bet (a stake of s at odds of d1) are reinvested as your stake in the other (at d2), so you win (s.d1.d2 - s) if both win. A treble is the same with three selections; an accumulator is the same with more.

Another common multiple is the forecast in racing. You predict the first and second place horses/dogs/etc (in order for a straight forecast; in either order for a reverse forecast, which costs twice as much for the same payout*). The winnings are calculated by a Secret Formula known only to the bookies. Exacta bets are essentially the same thing, and trifectas extend the idea to the first three horses.

Now comes the fun bit - exotic bets. The first of these is the patent: three singles, three doubles and a treble on the same three selections, all with the same stake. It thus costs 7 stakes. If all of the selections are at 1.83, you would need to win two to break even (3.66 from the singles, 3.35 from the doubles, minus 7 staked). You need odds of about 7.00 on each to break even with a single win, and three wins will make a profit at any odds at all.

A trixie is similar, but without the singles - so it's four bets: three doubles and a treble. This kind of bet requires at least two of your selections to win to get any kind of payout. At 2.00 each, if two of your selections win, you break even. Three winners, as before, bring you a profit at any odds.

A yankee consists of four selections and every combination of doubles (6), trebles (4) and a 4-way accumulator, making 11 bets altogether. Odds of 3.33 on each selection are enough to break even with two wins, whereas 1.56 would do the trick for three wins.

A lucky 15 adds four single bets to the yankee. At odds of 15.00 each selection, a single winner will break even. Odds of 3.00 will break even with two winners, while 1.52 will guarantee a profit if three of them win.

There are many other types, including the Super Yankee (or Canadian), involving five selections (all combinations except singles - 26 bets). A Heinz is the same thing with six selections (57 bets - hence the name), a Super Heinz the same with seven (120 bets) and a Goliath does the same with eight (247 bets). Lucky 31 and Lucky 63 are the same as Lucky 15 but with five and six selections respectively.
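All of these bets are "full cover" multiples: every combination of the selections within some range of sizes, each at the same stake. Here's a minimal sketch in Python for checking bet counts and break-even claims; the helper names are my own, and it ignores each-way terms, rule 4 deductions and the like.

```python
from itertools import combinations
from math import prod

def full_cover(n, min_fold=1, max_fold=None):
    """Every combination of min_fold..max_fold of n selections (selections numbered 0..n-1)."""
    max_fold = max_fold or n
    return [c for size in range(min_fold, max_fold + 1) for c in combinations(range(n), size)]

def net_return(bets, odds, winners, stake=1.0):
    """Profit in stakes: each bet pays stake * product of its odds if every leg wins."""
    paid = sum(stake * prod(odds[i] for i in bet)
               for bet in bets if all(i in winners for i in bet))
    return paid - stake * len(bets)

BETS = {
    "patent": full_cover(3),
    "trixie": full_cover(3, min_fold=2),
    "yankee": full_cover(4, min_fold=2),
    "lucky 15": full_cover(4),
    "super yankee": full_cover(5, min_fold=2),
    "heinz": full_cover(6, min_fold=2),
    "super heinz": full_cover(7, min_fold=2),
    "goliath": full_cover(8, min_fold=2),
    "lucky 31": full_cover(5),
    "lucky 63": full_cover(6),
}
```

For example, `net_return(BETS["patent"], [1.83] * 3, {0, 1})` gives the two-winner patent result, just under a hundredth of a stake in profit.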

There are two main reasons that exotic bets are interesting. Firstly, for the average punter, there's the chance of a huge payout. If your four singles came in at 2.00, you'd win 8 stakes minus the four you placed. If you'd had a Lucky 15, you'd have won 80 stakes, minus the 15 you put in.

Secondly, and this is critical for us, they multiply value. This is best explained with a double, or else it gets really complicated. Let's say you have two selections you believe are worth 1.80, but the bookmaker has them available to back at 2.00. On their own, each would give a value of 11% - a nice markup. If you backed them in a double, the true odds of both winning would be 1.80 * 1.80, or 3.24. According to the bookie, the odds of the double are 4.00, giving you a value of 23%, more than double.

Unfortunately, the best odds are usually to be found at Betfair, whose structure doesn't lend itself to simultaneous multiples. However, if you're prepared to do a bit of maths, you can figure out equivalent stakes for consecutive events - with the double example above, after the first selection won, you would place the entire return (stake plus winnings) on the second selection. I'll see if I can knock up some code to determine optimal staking plans... at some point in the future.

* because it's really two straight forecast bets.