Brief summary of our 2017 in golf betting

Using our predictive model, we placed 268 bets over 30 weeks on Bet365. Other than a few exceptions early on, we bet exclusively on Top 20s. A detailed summary of our results can be found here.

First, a bit on the model’s performance, and then a few thoughts.

Here is a graph from the summary document that reflects quite favorably on the model:

Simply put, we see that our realized profit converges to the expected profit, as determined by the model, as the number of bets gets large. I think this is some form of a law of large numbers (it’s not the simple LLN because the bets are not i.i.d.). This is suggestive evidence that the model is doing something right.
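To make that comparison concrete, here is a minimal sketch (with hypothetical bets, not our actual stakes and odds) of how the realized and expected profit curves in a graph like this can be computed, assuming flat unit stakes and decimal odds:

```python
import numpy as np

# Hypothetical example inputs: model win probabilities, the decimal odds we got,
# and the outcomes (1 = bet won, 0 = bet lost)
model_probs = np.array([0.30, 0.22, 0.45, 0.18, 0.35])
decimal_odds = np.array([4.5, 6.0, 2.8, 7.5, 3.6])
outcomes = np.array([0, 1, 0, 0, 1])
stake = 1.0  # flat unit stake on every bet

# Per-bet realized profit: (odds - 1) * stake if the bet wins, -stake otherwise
realized = np.where(outcomes == 1, (decimal_odds - 1) * stake, -stake)

# Per-bet expected profit under the model's probabilities
expected = model_probs * (decimal_odds - 1) * stake - (1 - model_probs) * stake

# Cumulative profit as a % of the total amount staked so far, as in the graph
n = np.arange(1, len(outcomes) + 1)
print(np.round(100 * np.cumsum(realized) / (n * stake), 1))  # realized profit curve
print(np.round(100 * np.cumsum(expected) / (n * stake), 1))  # expected profit curve
```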

Next, I want to show two graphs that I put up previously when discussing the model’s performance through 17 weeks:

The first graph simulates a bunch of 30-week profit paths assuming that the bookie’s odds reflect the true state of the world. You can see the mean is around -40%, which is due to the bookie’s cut. Our actual profit path is also shown (in red), and we see that we beat nearly all the simulated profit paths. This tells us that it is very unlikely our profit path would have arisen purely by chance.

The second graph again shows some simulations, this time assuming that the model’s odds reflect the true state of the world. We see that the realized profit path is pretty average, conditional on the model being true.
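For anyone curious how these profit-path simulations work, here is a minimal sketch. The inputs are hypothetical; the idea is simply to pass in whichever win probabilities you are treating as the truth (the bookie’s implied probabilities for the first graph, the model’s probabilities for the second) along with the decimal odds actually offered.

```python
import numpy as np

def simulate_profit_paths(true_probs, decimal_odds, n_sims=1000, stake=1.0, seed=0):
    """Simulate cumulative profit paths (as % of amount staked) for a sequence of bets,
    assuming `true_probs` are the real win probabilities and `decimal_odds` are the
    odds actually offered."""
    rng = np.random.default_rng(seed)
    n_bets = len(true_probs)
    # Draw a win/loss outcome for every bet in every simulation
    wins = rng.random((n_sims, n_bets)) < true_probs
    profit = np.where(wins, (decimal_odds - 1) * stake, -stake)
    cumulative = np.cumsum(profit, axis=1)
    staked = stake * np.arange(1, n_bets + 1)
    return 100 * cumulative / staked  # one row per simulated profit path

# Hypothetical example: 268 bets, each with a 22% "true" probability at decimal odds of 4.0
# (our actual bets had varying odds and probabilities)
probs = np.full(268, 0.22)
odds = np.full(268, 4.0)
paths = simulate_profit_paths(probs, odds)
print(paths[:, -1].mean())  # average final profit %, negative whenever the odds carry a cut
```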

A final angle that you can look at to gauge our model’s performance is provided here. This basically answers questions of the following nature: the model said this set of players would make the cut “x” % of the time, so, how often did they actually make the cut?
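A calibration check like that one takes only a few lines of code. Here is a minimal sketch with made-up data: group the predicted probabilities into bins and compare the average prediction in each bin to the observed frequency of the event.

```python
import numpy as np

def calibration_table(pred_probs, outcomes, n_bins=10):
    """Group predictions into probability bins and compare the average predicted
    probability in each bin to the observed frequency of the event (e.g. making the cut)."""
    pred_probs = np.asarray(pred_probs)
    outcomes = np.asarray(outcomes)
    bins = np.linspace(0, 1, n_bins + 1)
    which_bin = np.clip(np.digitize(pred_probs, bins) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = which_bin == b
        if mask.any():
            rows.append((bins[b], bins[b + 1], mask.sum(),
                         pred_probs[mask].mean(), outcomes[mask].mean()))
    return rows  # (bin low, bin high, count, mean predicted prob, observed frequency)

# Hypothetical example: predicted cut probabilities vs. whether the cut was actually made
preds = np.random.default_rng(1).uniform(0.2, 0.9, 500)
made_cut = np.random.default_rng(2).random(500) < preds  # well-calibrated by construction
for low, high, n, p_hat, freq in calibration_table(preds, made_cut):
    print(f"{low:.1f}-{high:.1f}: n={n:3d}  predicted={p_hat:.2f}  observed={freq:.2f}")
```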

Overall, I think all of these methods for evaluation show that the model was pretty successful.

So, what did we learn? We had never bet before, so perhaps some of these *insights* are already well-known.

First of all, I think developing this model made us appreciate just how *random* golf really is. Even though our model seems to be “well-calibrated”, in the sense that if it says an event will happen x % of the time, it usually does happen about x % of the time, it does not have much predictive power. In statistical parlance, the model explains only about 6-8% of the daily variation in scores on the PGA TOUR; the rest of the variation is unaccounted for.
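For those interested, that “variation explained” figure is just an R²-style calculation. Here is a minimal sketch with simulated (not real) scores, constructed so that the predictable component is small relative to day-to-day noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily scores: a small predictable component plus a lot of noise
n_rounds = 5000
skill = rng.normal(0.0, 0.8, n_rounds)   # predictable part (what a model can capture)
noise = rng.normal(0.0, 2.8, n_rounds)   # day-to-day randomness
actual = 71.0 - skill + noise            # scores actually shot
predicted = 71.0 - skill                 # a model that captures the skill component perfectly

# Fraction of daily score variance explained by the predictions (R^2)
r_squared = 1 - np.var(actual - predicted) / np.var(actual)
print(f"variation explained: {100 * r_squared:.1f}%")  # ~ 0.8^2 / (0.8^2 + 2.8^2) ≈ 7.5%
```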

Second, and this is definitely related to the point above, our model generally likes the lower-ranked golfers more, and the higher-ranked golfers less, than the betting market does. For example, of our 268 bets, only 15 were made on golfers ranked in the top 10 of the field that week (with rank determined by our model). More generally, the average rank in a given week of the players we bet on was 48th; here is a full histogram:

So why did our model view the low-ranked players more favorably than the betting sites? Well, it could just be that the majority of casual bettors like to bet on favorites (because they want to pick “winners”, as opposed to good value bets), and betting sites therefore have an incentive to adjust their odds to reflect this. However, it could also be that our model acknowledges, to a greater degree than the oddsmakers do, that a large part of golf scores cannot easily be predicted. As a consequence, our model doesn’t predict that large a gap between the top-tier players and the bottom-tier players in any given week. For reference, here is a graph outlining some of the players we bet on this year:

Third, our model valued long-term (2-year) performance much more than the market did. As a consequence, we would find ourselves betting on the same players many weeks in a row if a player got into a rut. For example, Robert Streb was rated pretty decently in our model at the start of 2017 due to his good performance in 2015/2016. But, as 2017 progressed, Streb failed to put up any good performances. The market adjusted pretty rapidly, downgrading Streb’s odds after just a few weeks of bad play, while the model’s predictions for Streb didn’t move much because it weights longer-term performance heavily. As a consequence, we bet (and lost!) on Streb for many consecutive weeks, until he finally finished 2nd at the Greenbrier; at which point the market rebounded rapidly on Streb’s stock, so much so that we didn’t bet on him much for the rest of the year. It’s important to note that we don’t arbitrarily *choose* to weight 2-year scoring average heavily. The weights are determined by the historical data used to fit the model; whatever predicts best gets weighted the most. Long-term scoring averages are by far the most predictive of future performance, and the model weights reflect this. In fact, for every 1 stroke better (per round) a player performed in his most recent event, the model only adjusts his predicted score for the next week by 0.03-0.04 strokes!
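To illustrate why fitted weights come out this way, here is a minimal sketch with simulated data (this is an illustration, not our actual model or estimates): when long-term averages are a much less noisy measure of skill than a single event, a least-squares fit puts very little weight on the most recent event.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Simulated history: a player's underlying skill drives both his 2-year average
# and his most recent event, but a single event is far noisier
true_skill = rng.normal(0.0, 0.8, n)                  # strokes vs. field, per round
two_year_avg = true_skill + rng.normal(0.0, 0.3, n)   # long-term average: low noise
last_event = true_skill + rng.normal(0.0, 2.5, n)     # single event: very noisy
next_week = true_skill + rng.normal(0.0, 2.5, n)      # what we want to predict

# Least-squares weights for predicting next week's performance
X = np.column_stack([np.ones(n), two_year_avg, last_event])
beta, *_ = np.linalg.lstsq(X, next_week, rcond=None)
print(np.round(beta, 3))  # the last-event weight comes out tiny, in the spirit of 0.03-0.04
```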

Fourth, our model did not use any player-course specific characteristics. This stands in opposition to the general betting market, which seems to fluctuate wildly due to supposed course fit. A great example of this was Rory McIlroy at the PGA Championship at Quail Hollow this year. Rory went from being on nobody’s list of favorites in the tournaments during the preceding weeks to the top of nearly everyone’s at the PGA. In contrast, we made no adjustment, and as a consequence went from being more bullish on Rory than the markets at the Open to less bullish at the PGA. It’s not necessarily that we don’t think these effects exist (e.g. Luke Donald does seem to play well at Harbour Town); it’s simply that we don’t think there is enough data to precisely identify them. For example, even if a player plays the same course for 8 consecutive years, that is still only 32 rounds, at most, which is not a lot of data to learn much of value from. And, in most cases, you have far fewer than 32 rounds to infer a “course-player fit”. When a list of scoring averages, or some other statistic, is presented based off of only 10, or even 20, rounds, it should be looked upon skeptically: with a small sample size, these numbers are likely mostly just noise. Regarding the Luke Donald/Harbour Town fit: even if there are no such things as course*player effects, we would still expect some patterns to emerge in the data that look like course*player effects just due to chance! This becomes more likely as the sample of players and courses grows. Essentially this is a problem of testing many different hypotheses for the existence of a course*player effect: eventually you will find one, even if, in truth, there are none.
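To see the multiple-testing issue in action, here is a minimal sketch with simulated data in which, by construction, there are no course*player effects at all; naive tests still flag plenty of “fits”:

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_courses, rounds_each = 200, 30, 16

# Simulate strokes-gained with NO true course-player effect: just player skill plus noise
skill = rng.normal(0.0, 0.8, n_players)
false_positives = 0
for p in range(n_players):
    for c in range(n_courses):
        sg = skill[p] + rng.normal(0.0, 2.8, rounds_each)   # this player's rounds at this course
        # Naive test: is his course average "significantly" above his underlying skill level?
        t = (sg.mean() - skill[p]) / (sg.std(ddof=1) / np.sqrt(rounds_each))
        if t > 2.0:                                          # roughly a 5%-level one-sided test
            false_positives += 1

print(false_positives, "player-course 'fits' found out of", n_players * n_courses)
# With ~6,000 player-course pairs, well over a hundred spurious "fits" appear by chance alone.
```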

Fifth, and finally, I think it is incredibly important to have a fully specified model of golf scores because it allows you to simulate the scores of the entire field. Unless you have a ton of betting experience, it seems very difficult to know how a 1 stroke/round advantage over the field translates into differences in, say, the probability of finishing in the top 20. By simulating the entire field’s scores, you get a simple way of aggregating your predictions about scoring averages into probabilities for specific types of finishes.
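As an illustration of that last point, here is a minimal sketch of a field simulation (with a hypothetical round-level noise of about 2.8 strokes, and ignoring the cut for simplicity) that converts predicted scoring averages into top-20 probabilities:

```python
import numpy as np

def top20_probability(mean_scores, round_sd=2.8, n_rounds=4, n_sims=20000, seed=0):
    """Simulate tournaments and estimate each player's probability of a top-20 finish,
    given predicted per-round scoring averages (relative to the field) for everyone."""
    rng = np.random.default_rng(seed)
    mean_scores = np.asarray(mean_scores, dtype=float)
    n_players = len(mean_scores)
    # Simulate total scores: predicted average plus independent round-to-round noise
    totals = (n_rounds * mean_scores
              + rng.normal(0.0, round_sd * np.sqrt(n_rounds), (n_sims, n_players)))
    # Rank players within each simulated tournament (1 = lowest total score)
    ranks = totals.argsort(axis=1).argsort(axis=1) + 1
    return (ranks <= 20).mean(axis=0)

# Hypothetical 144-player field: one player projected 1 stroke/round better than the rest
field = np.zeros(144)
field[0] = -1.0
probs = top20_probability(field)
print(f"P(top 20) for the 1-stroke-better player: {probs[0]:.2f}")
print(f"P(top 20) for an average player: {probs[1:].mean():.2f}")
```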

Which is more important: putting, or ball-striking?

It is a saying that has stood the test of time: drive for show, putt for dough. But how much truth is there to this?

Mark Broadie’s work has shown convincingly that it is the long game that differentiates players over longer periods of time (i.e. a season, or more). Using data from 2003-2010, he shows that the variance in average total strokes-gained can be decomposed into 72% due to the long game, 11% due to the short game, and 17% due to putting (see here, pg. 23). Again, to emphasize the point, Broadie is using these numbers to show what distinguishes golfers over the long term. Let’s dig a bit further into this.

Despite these definitive results, there is still very much an ongoing debate about whether putting or the long game matters more. I think part of the confusion stems from the fact that there are a couple of different ways we could define the “importance” of any given part of the game, and they may lead us towards very different answers. Namely, we can talk about which parts of the game are most important over the course of an entire season (as Broadie’s analysis did), or we can talk about which parts of the game are most important in any given week.

First, I’ll repeat Broadie’s analysis with the 2017 data. He did what is known as a variance decomposition. In simple terms, what we are doing is the following: we observe a spread in scoring averages over the course of a season (e.g. DJ gained 2 strokes per round on the field, Jason Day only gained 1 stroke per round, Steven Bowditch lost 1 stroke per round, etc.). We want to figure out which parts of the game contributed most to these differences in scoring averages between players. We do this by looking at the spread of players’ strokes-gained averages for the year in each category. Take a look at these histograms:

The x-axis scales are all identical, so we can compare across graphs easily. The intuition is simple (ignoring some statistical details): the more spread out a strokes-gained category’s histogram is, the more it matters for determining who the best players are over the course of a season. So you can see that around-the-green play has a very small spread, putting has a slightly larger spread, and then off-the-tee play and approach play are wider still. Here are the actual numbers of the variance decomposition of total strokes-gained averages in 2017 (they are similar to what Broadie’s analysis found):

  • SG off-the-tee: 37%
  • SG approach: 33%
  • SG around-the-green: 10%
  • SG putting: 20%
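For reference, here is a minimal sketch of how a variance decomposition like this can be computed from one strokes-gained average per player per category. I attribute each category’s share via its covariance with the total, which sums to 100%; I won’t claim this is Broadie’s exact procedure, but it captures the idea. The inputs below are made up.

```python
import numpy as np

def sg_variance_decomposition(sg_ott, sg_app, sg_arg, sg_putt):
    """Decompose the variance of total strokes-gained averages across players into the
    share attributable to each category (each input is one average per player)."""
    categories = np.column_stack([sg_ott, sg_app, sg_arg, sg_putt])
    total = categories.sum(axis=1)
    # Each category's share: its covariance with the total, divided by the total variance.
    # (The shares sum to 1 because var(total) = sum of cov(category, total).)
    shares = np.array([np.cov(categories[:, j], total)[0, 1] for j in range(4)])
    return shares / np.var(total, ddof=1)

# Hypothetical season-long averages for a field of players
rng = np.random.default_rng(0)
n = 190
ott, app = rng.normal(0, 0.55, n), rng.normal(0, 0.52, n)
arg, putt = rng.normal(0, 0.28, n), rng.normal(0, 0.40, n)
print(np.round(100 * sg_variance_decomposition(ott, app, arg, putt), 1))
```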

However, when we examine a specific week of play (I use data from the 2017 Open Championship, but all weeks look pretty similar) we get a somewhat different story about which strokes-gained categories matter most. Consider this set of histograms, which plots all players’ strokes-gained per round for the week:

Again, we see that around-the-green SG is not separating players much. However, we see that the SG putting histogram is now a little shorter and wider than before when we were using year-long averages. The actual numbers for the variance decomposition at the Open Championship were:

  • SG off-the-tee: 15%
  • SG approach: 33%
  • SG around-the-green: 14%
  • SG putting: 38%

So, we now see that putting is the SG category that separates players most in a given week (although it is still less important than the long game, which is off-the-tee + approach). I think there is a fairly simple way of understanding these two sets of results. On any given day, putting is pretty random; even the best putters can have pretty bad putting days, or weeks. On the flip side, the best ball-strikers are typically the best at striking the ball week in, week out. If you don’t believe me, which of the following bets would you rather take: 1) betting that Lee Westwood will have a higher strokes-gained putting than Jordan Spieth in the first round of the Open Championship, or 2) betting that Brian Gay will have more strokes-gained off-the-tee than Rory McIlroy in the first round of the Open Championship? In my opinion, ball-striking prowess is a much more reliable thing than putting prowess. As a result, we see that while putting doesn’t separate players much over an entire year (players have good weeks and bad weeks that mostly balance out to near-zero), it does matter a fair bit in any given week. An extreme example of this idea would be the following: each week, every player either gets hot and has +5 SG putting, or gets cold and has -5 SG putting. In the long run, everybody’s SG putting average will be close to zero (because whether you get hot or cold is just random), and so putting will have no impact on what separates the best players over long time horizons, but it will have a huge impact week-to-week. This is what we observe in the data, but to a lesser extent.
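That extreme example is easy to simulate, and the sketch below shows the effect clearly: a purely random weekly putting component produces a large spread across players in any single week, but season-long putting averages bunch up near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_weeks = 200, 40

# Extreme thought experiment: every week, each player randomly putts +5 or -5 SG per round
weekly_putting = rng.choice([-5.0, 5.0], size=(n_players, n_weeks))

weekly_spread = weekly_putting[:, 0].std()          # spread across players in a single week
season_spread = weekly_putting.mean(axis=1).std()   # spread of season-long putting averages

print(f"spread across players in one week: {weekly_spread:.2f} strokes")
print(f"spread of season-long averages:    {season_spread:.2f} strokes")
# The weekly spread is ~5 strokes, while season averages cluster near zero (~5/sqrt(40) ≈ 0.8):
# pure week-to-week randomness matters a lot in any given week but washes out over a season.
```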

Hopefully this provides some insight into the putting vs. ball-striking debate, which I’m sure will continue well into the future.