Does a player’s “course history” predict performance?

A much-debated topic among golf fans is the relevance of so-called “course history” to a player’s performance in a given week. That is, do specific players tend to play well on specific courses?

Of course, there are intuitive reasons why this should be true. First, the characteristics of certain courses (e.g. length, fairway width, rough length, etc.) should favor players with certain characteristics (e.g. power, accuracy, etc.). Second, golf has a mental component; if a player develops a certain level of comfort (or discomfort) with a given course layout, it makes sense that this higher (or lower) comfort level will affect their performance at that course.

But, talk is cheap. I can also give you intuitive reasons as to why some players play better on Bermuda greens, or why some players play better when wearing white belts than when wearing black belts. These are theories, and for theories to gain credibility, you need to provide some empirical evidence that corroborates their predictions.

The mere existence of certain players with a string of good performances at the same course is not necessarily strong evidence for the existence of course-player effects. It is true that Luke Donald has played unusually well (compared to his typical performance level) at Harbour Town. This is simply a fact, and can’t be disputed. However, did you know that Henrik Stenson had a great course history at Bay Hill, but played awfully there in 2017? It is easy to focus on the former point and overlook the latter. The reason why Luke Donald playing well at Harbour Town doesn’t provide indisputable evidence for the course history hypothesis is that it is not based on a large enough sample of rounds (and yes, 25 rounds is still a small sample, especially in golf). Suppose there really are no course-player performance effects; unless everyone plays a very large number of rounds at each course, it would be astonishing if we didn’t find evidence of some golfers playing better, or worse, than usual at specific courses. The logic here is the same as if we had 300 people flip a coin 10 times: some people will get 8-10 heads, or 8-10 tails, simply due to the statistical variation inherent to finite samples.
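To see how easily such streaks arise by chance alone, here is a minimal simulation sketch of the coin-flipping analogy (Python; the 300-flipper, 10-flip setup is just the hypothetical from above):

```python
import numpy as np

rng = np.random.default_rng(1)
n_players, n_flips = 300, 10                    # the hypothetical numbers from the analogy

flips = rng.integers(0, 2, size=(n_players, n_flips))   # 0 = tails, 1 = heads
heads = flips.sum(axis=1)

# "Streaky" flippers: at least 8 heads or at least 8 tails out of 10
streaky = np.sum((heads >= 8) | (heads <= 2))
print(f"{streaky} of {n_players} flippers look streaky purely by chance")
```

With fair coins, roughly 11% of flippers (about 30 of the 300) will land in the 8-or-more heads or 8-or-more tails range, even though nobody has any real "skill" at flipping.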

More generally, finding differences among golfers with respect to some statistic can be thought of as a necessary first step toward finding a meaningful metric for predicting golf scores. It’s true that if course history, or performance on Bermuda greens, is going to be a successful predictor of player performance, we need to first confirm that there exists substantial variation in the statistic (i.e. if we don’t find any variation in a player’s course-specific scoring averages, then clearly it can’t have any predictive power). But the next, critical, step is to show that this statistic actually predicts scores to some degree. People analyzing sports data love to do the first step (because it’s easy), but the second step isn’t done very often. So next time you see a list of players ranked by a statistic, the first question should be: “Is there any evidence that this helps to predict scores?”

In this article we are going to examine how well a player’s course history predicts their performance. Along the way, we’ll explore how to best predict golf scores, in general. The hope is that the evidence here can be taken as free of any personal bias from us (full disclosure: as anybody who follows us on Twitter knows, we have been on the “course-history is irrelevant” side of this debate).

Let’s get started. First, we want readers who haven’t analyzed golf data before to appreciate how much *random* variation exists in golf scores on the PGA Tour. Below, we’ve plotted the adjusted strokes-gained of two players on Tour from 2012-present. These scores are adjusted for course difficulty, so any remaining differences reflect only differences in golfer performance; that is, an adjusted score from the U.S. Open can be directly compared to an adjusted score from the Sony Open. (Also, from here on, when I use the phrase “raw scores”, or “strokes-gained”, I am referring to this adjusted measure of scores as just defined. See footnote 1 for a primer on how this adjustment works.)

The players in the plot are Dustin Johnson and another player who we’ll keep unnamed for a moment; take a guess at the (average) world rank of this other player during this period.

Notes: Plotted here are event-level averages; round-level data would show even greater variation. Data is from 2012-present. Positive values indicate better performances.

The unnamed player’s scores plotted here belong to Kevin Na; he has been solid in this period, with an average world rank of around 50-60. However, when you think of Dustin Johnson and Kevin Na, you likely imagine a wide gap between them with respect to their ability levels. But, with only a quick glance at the graph, it’s not immediately obvious who the better player even is! This highlights just how much the scores of any individual golfer vary.

Next, we add in our best estimates of Dustin Johnson’s and Kevin Na’s “ability” before each tournament (i.e. the score we expect them to shoot at each point in time – estimated from our model) throughout the time period:

Notes: Data points represent event-level average score. “Ability” is defined here, loosely speaking, as a weighted average of various historical scoring averages (2-year, 2-month, last event). Data is from 2012-present.

When you see the plots of their respective predicted abilities, it does become clear that Dustin Johnson has been the better player. Near the end of the sample period, DJ’s ability is estimated to be about 1 stroke per round better than Na’s; this is actually quite a big difference relative to the typical ability gaps between PGA Tour players. However, when you see it plotted alongside their raw scores, this difference looks like small peanuts compared to the weekly (*random*) variation in an individual player’s scores. This is probably a good time to mention that we are only able to explain (or, successfully predict) about 15% of the variation in golf scores; the rest is unaccounted for! (If instead we were trying to predict round-level scores, this number drops to about 7-8%.)

Moving on: let’s do one more quick exercise before we get to the analysis of course history. In the graph below we plot a few different scoring averages calculated over different historical time horizons. The goal here is to evaluate different ways of predicting a player’s scores. Graphically, we’ll just focus on Dustin Johnson’s data so things aren’t too crowded:

Notes: “2Y prediction” is plotting DJ’s strokes-gained average over the previous 2 years (from the date of each event), “2M prediction” is plotting his strokes-gained average over the previous 2 months, “Last event prediction” is his strokes-gained average in his most recent event, and finally, “Weighted prediction” is a weighted average of 2-year S.G., 2-month S.G., and last event S.G.; the *weights* are just the coefficients from a linear regression (using all the data, not just Johnson’s).
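For readers who want to see the mechanics, here is a rough sketch of how a weighted prediction like this could be built; it assumes a hypothetical data frame with one row per player-event and column names of our own invention (sg_event for the event-level adjusted strokes-gained average; sg_2y, sg_2m, and sg_last for the historical averages):

```python
import pandas as pd
import statsmodels.api as sm

def fit_weighted_prediction(df: pd.DataFrame) -> pd.Series:
    """df: one row per player-event, with hypothetical columns sg_event (the
    event-level adjusted strokes-gained average we are predicting) and sg_2y,
    sg_2m, sg_last (historical strokes-gained averages at the time of the event)."""
    X = sm.add_constant(df[["sg_2y", "sg_2m", "sg_last"]])
    fit = sm.OLS(df["sg_event"], X, missing="drop").fit()
    # The fitted coefficients are the "weights"; the weighted prediction is
    # simply the fitted value for each player-event.
    return fit.predict(X)
```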

So, what’s predicting best? Let’s calculate the average absolute deviation of our various predictions from the realized scores. To do this, we take the absolute value of the difference between each score and its corresponding prediction, and then average these differences. Here’s how the predictions did (this is for the entire sample, not just Johnson’s data):

What method predicts best? Average prediction errors:

  • 2Y prediction: 1.41 strokes
  • 2M prediction: 1.52 strokes
  • Last Event prediction: 1.86 strokes
  • Weighted prediction: 1.39 strokes

(Again, recall that this is all done with event-level averages.) The two main takeaways here are: 1) All the predictions do pretty poorly; the best we can do is miss a player’s average score at an event by 1.4 strokes (that is, this is our average prediction error); and 2) The 2-year strokes-gained prediction does almost as well as the optimal (i.e. “Weighted”) prediction method!
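For completeness, the comparison above is just a mean absolute error calculation; here is a quick sketch, reusing the hypothetical data frame from the previous snippet:

```python
import pandas as pd

def mean_abs_error(df: pd.DataFrame, prediction_cols: list[str]) -> pd.Series:
    """Average absolute miss of each prediction method, in strokes per event."""
    return pd.Series({col: (df["sg_event"] - df[col]).abs().mean()
                      for col in prediction_cols})

# e.g. mean_abs_error(df, ["sg_2y", "sg_2m", "sg_last", "weighted_pred"])
```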

Now, finally to the discussion of the relevance of course history. Up to this point, we have been predicting scores without using any course-player specific variables. The goal is to see whether adding in a player’s course history helps to predict their performance in a given week. So, what should we use as our measure of course history? Evidently, a course history variable defined as the average of a player’s raw scores at a course would be problematic, as this will be correlated with the general ability of the player. That is, at Augusta National, Dustin Johnson will likely have a better historical scoring average than Kevin Na, but that may be simply due to the fact that Johnson is typically better than Na at all courses, and not due to unusually good performances on the part of DJ at Augusta. Therefore, we first need to adjust scores for the ability level of the player at each point in time; we’ll call this the residual score. The residual score is how much better, or worse, a player played in each round compared to their ability level at the time. (See footnote 1 to see how we estimate each player’s ability; if you don’t want to read it, you can basically think of the “Weighted prediction” above as the player’s ability at any point in time. Then, the residual score is equal to the raw score minus this prediction.) Our course history variable is going to be defined as a player’s historical average residual score at the relevant course.
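To make these definitions concrete, here is a rough sketch of how the residual score and the course history variable could be constructed, again using hypothetical round-level column names of our own (sg_adj for the adjusted score, ability for the estimated ability at the time of the round):

```python
import pandas as pd

def build_course_history(rounds: pd.DataFrame, predict_year: int) -> pd.DataFrame:
    """rounds: round-level data with hypothetical columns player_id, course_id,
    year, sg_adj (adjusted score) and ability (estimated ability at the time)."""
    rounds = rounds.copy()
    # Residual score: how much better/worse the player played than their ability
    rounds["residual"] = rounds["sg_adj"] - rounds["ability"]

    # Course history: average residual score at each course, using only rounds
    # played before the year we are trying to predict
    past = rounds[rounds["year"] < predict_year]
    return (past.groupby(["player_id", "course_id"])["residual"]
                .agg(course_history="mean", ch_rounds="count")
                .reset_index())
```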

In words, we are asking: “Does the fact that Luke Donald has typically played better than expected at Harbour Town from 2010-2015 mean he will play better than expected at Harbour Town in 2016?” 

This is quite a nice approach, because even though Donald’s ability level has dropped off in recent years, we are only asking whether he plays better than our estimate of his current form. So, Donald may, in terms of raw scores, play worse at Harbour Town in 2016 than he has there in the past, but if he still plays above his current ability level then that would be evidence in favor of the course history hypothesis. (*Only for those who are interested* – for a discussion of why this approach is slightly different than controlling for current ability in a multi-variable regression, see Footnote 2.)

Some final details: the estimating data is PGA Tour rounds from 2010-2017. We include all players who played at least 70 rounds in this time period (otherwise we are just bringing in a lot of unnecessary noise with players who’ve only played a few rounds). We predict event-level (or, event*course-level at events with multiple courses) performances using the years 2015-2017. The reason for that is we need to have enough historical years to construct meaningful course-specific scoring averages. To be clear, we predict 2015 scores using 2010-2014 course-specific averages, 2016 scores using 2010-2015 course-specific averages, etc.

The following simple regression is run:

Residual.score_{i} = \beta_{0} + \beta_{1} \cdot Historical.avg.residual.score_{i} + u_{i}

where the regressor is the player’s historical average residual score at the relevant course, and the dependent variable is the player’s average residual score in the current week (or his average at each course in the current week if it’s a multi-course event).
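A minimal sketch of this regression, continuing with the hypothetical variable names from the earlier snippets (residual_week for the current week's average residual score; course_history and ch_rounds from the construction step above):

```python
import pandas as pd
import statsmodels.formula.api as smf

def course_history_regression(events: pd.DataFrame, min_rounds: int = 15):
    """events: one row per player-event(-course). The min_rounds cutoff mirrors
    the 15-round restriction used for the main result below."""
    sample = events[events["ch_rounds"] >= min_rounds]
    fit = smf.ols("residual_week ~ course_history", data=sample).fit()
    return fit.params["course_history"], fit.bse["course_history"], fit.rsquared
```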

Here is the main result:

Notes: Historical course-specific averages are calculated from 2010 up to year of interest. Dependent variable is current week’s average score. All scores have been adjusted for a player’s current form (i.e. they reflect how much better or worse a player performed than expected). Regression is using data from 2015-2017; sample is restricted to those with at least 15 rounds in their course history.

The slope of the regression line is 0.12 – this means that for every 1 stroke increase in a player’s course history (i.e. his course-specific historical average of residual scores) his expected score increases by 0.12 strokes. Importantly, this graph is constructed only using players with course histories comprised of at least 15 rounds (this leaves ~ 2000 observations). As can be seen from the plot, course history is providing a very noisy signal; there are plenty of players who had good course histories (i.e. further right on the x-axis) but play very poorly in the current week, and vice versa. Of course, on the whole, having a better course history correlates slightly with better performance that week (as evidenced by the upward sloping regression line). For those interested, the estimated slope has a standard error of about 0.05 – so, pretty noisy.

In the full sample (i.e. no restriction on the minimum number of rounds played at the course, other than it being greater than zero), course history has basically no impact on expected score: a 1 stroke increase in course history increases the predicted score by about 0.02 strokes. However, with the full sample, there are many observations in which a player only has 2-4 rounds to construct a course history; this adds a lot of statistical noise. Perhaps unsurprisingly, the estimate of the course history effect gets larger as the round cutoff is made more strict, culminating with the result shown in the plot above (a 1 stroke increase in the course history average is associated with a 0.12 stroke increase in expected performance). We could keep making the minimum round cutoff stricter, but eventually the sample becomes too small for reliable inference. For a reference point, the coefficient on short-term form (say, the previous 2-3 months) from a similar regression would be about 0.15, and the coefficient on long-term form (2 years) would be about 0.75 – 0.80.

In terms of predictive power (i.e. the “R-squared” of a regression), course history has very little. Recall that before, we were able to predict about 15% of the variation in scores at the event level (i.e. R-squared equals 0.15). The R-squared values of the course history regressions range from 0.02% (!!) (full sample) to 0.2% (restricting to course histories with at least 15 rounds). The R-squared is only a function of two things: 1) the size of the coefficient, and 2) the variance of the course history variable. There is a decent amount of variation in course histories across players, so the reason the R-squared is so small is mainly the small size of the coefficient. (See footnote 3 for a short discussion of this.)

To conclude, in this article we’ve shown that long-term form is king when it comes to predicting golf scores. However, short-term form does provide a slight improvement in predictive power. Course history, defined here as how much better than expected a player has historically played at a course, is found to impact performance to some degree: we estimate that increasing the course history measure by 1 stroke increases our predicted score by at least 0.02 strokes, and by at most 0.12 strokes (the former using all course history data, the latter obtained only using course histories calculated from at least 15 rounds). But, despite the somewhat meaningful impact course history has on predictions (0.12 strokes is meaningful, in our opinion), it adds virtually no predictive power (as evidenced by an extremely low R-squared). Moving forward, we will keep course history in mind when modelling golf scores, but it trails far behind long-term form, and to a lesser degree short-term form, in its relevance to predicting golfer performance.

Footnotes:

1. We use a slightly different (and better) method to properly adjust for course difficulty and to estimate player ability than we have in previous work. We roughly follow the method used in Connolly and Rendleman (2008). The naive way to adjust for the course difficulty of any given round is to subtract the mean score of the field that day. This can lead to erroneous conclusions about course difficulty, however, because not all fields are the same in terms of average skill level. Subtracting off the mean will tend to overvalue rounds played against weaker fields, and undervalue rounds played against stronger fields. To account for field strength, we have in the past estimated a fixed effects regression of the following form:

Score_{ij} = \mu_{i} + \delta_{j} + \epsilon_{ij}

where \mu_{i} represents a fixed player skill level for player i, and \delta_{j} represents the course difficulty for a given round j. We augment this specification by allowing \mu_{i} to vary over “golf time” (this is the chronological sequence of rounds the golfer plays). Consider the following:

Score_{ij} = \mu_{i}(t) + \delta_{j} + \epsilon_{ij}

where \mu_{i}(t) is now a time-varying measure of player ability (where time is specific to each player, and represents their sequence of rounds). We estimate this using an iterative process; the basic idea is outlined in the Connolly and Rendleman article linked above. The bottom line is that we allow each player’s ability to vary over time (whereas before, it was forced to be fixed over time). This is especially important because our estimating sample spans 9 years (with just a year or two of data, the fixed-ability assumption is probably not unreasonable). Recall that in other parts of this article, player ability was defined as the weighted average of 2-year, 2-month, and last-event scoring averages. The ability measure here is preferable because it uses data both before and after each point in time to estimate player ability (whereas the other method uses only historical data – which is obviously all you have when you are doing a prediction exercise!).

From this, our adjusted score variable is defined as Score_{ij} - \delta_{j} , and the residual score variable is defined as \epsilon_{ij} .
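For the curious, here is a very rough sketch of the kind of back-fitting loop described above; all column names are hypothetical, and a centred rolling mean stands in for whatever smoother is actually used to estimate \mu_{i}(t):

```python
import pandas as pd

def estimate_abilities(rounds: pd.DataFrame, n_iter: int = 20) -> pd.DataFrame:
    """rounds: hypothetical columns player_id, round_id (a course-day), date, score."""
    rounds = rounds.sort_values(["player_id", "date"]).copy()
    rounds["ability"] = 0.0                         # start with flat abilities

    for _ in range(n_iter):                         # iterate until estimates settle
        # Step 1: round difficulty = field average of (score - current ability)
        net = rounds["score"] - rounds["ability"]
        rounds["delta"] = net.groupby(rounds["round_id"]).transform("mean")
        # Step 2: re-estimate each player's time-varying ability by smoothing
        # difficulty-adjusted scores over their own sequence of rounds
        # (window width is an arbitrary choice for this sketch)
        adj = rounds["score"] - rounds["delta"]
        rounds["ability"] = adj.groupby(rounds["player_id"]).transform(
            lambda s: s.rolling(40, center=True, min_periods=10).mean())

    rounds["adj_score"] = rounds["score"] - rounds["delta"]       # Score - delta
    rounds["residual"] = rounds["adj_score"] - rounds["ability"]  # epsilon
    return rounds
```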

2. An obvious way to approach this problem would have been to run a regression where you control for a player’s current form using various historical averages (e.g. 2-year S.G., 2-month S.G., etc.) and then include the raw course history average in the regression as well:

adj.score_{i} = \beta_{0} + \beta_{1} \cdot adj.score.2Y_{i} + \beta_{2} \cdot adj.score.2M_{i} + \beta_{3} \cdot adj.score.ch_{i} + \epsilon_{i}

The dependent variable is the adjusted score, and the regressor of interest is adj.score.ch_{i} . This is not quite the same as what we are doing in the body of this article. The difference is very subtle; the interpretation of \beta_{3} is the effect of a player’s historical course-specific scoring average on this week’s performance after controlling for the player’s current form (as defined here by 2-year S.G. and 2-month S.G.). Conversely, in the body of the article, the method we are using can be thought of as controlling for the form of a player at the time they played the course. To clarify with an example: the former method asks: “Does the fact that Luke Donald played better at Harbour Town in the past than what his current form indicates mean he will play better this week?”, while the latter method asks: “Does the fact that Luke Donald has typically played better than his form at the time at Harbour Town in the past mean he will play better than expected at Harbour Town this week?” If my intuition is right (and it may not be, I’m still grappling with this a bit) these two methods would seem to be the same if a player’s form hasn’t changed much in the time period we are considering. Anyways, for what it’s worth, doing it with the regression controlling for current form gives almost identical results to those reported in the body of the article.

3. Intuitively, the R-squared of a regression is the proportion of the variance in the dependent variable that is *accounted for* by the included regressors. In the simple case of a single independent variable (e.g. X), the R-squared is equal to:

R^{2} =  \beta_{1}^{2} \cdot Var(X) / Var(Y)

where in our context, X is the course history variable, Y is the current week’s average score, and \beta_{1} is the regression slope coefficient. Evidently, this measure can only be small if the coefficient is very small, or the variance of X is small (relative to the variance of Y). In the full data, the variance of X is 1.48, while the variance of Y is 3.03; therefore, it’s the small size of the coefficient that is driving our very small R-squared (~0.0002, or 0.02% in the full data) result.
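Plugging the full-sample numbers into this formula, with the full-sample slope of roughly 0.02:

R^{2} \approx (0.02)^{2} \cdot 1.48 / 3.03 \approx 0.0002

which is the 0.02% figure quoted above.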

Live prediction model: the week that was

[Last updated: November 6th, 2017]

This is going to be a running blog post where, from time to time, we review the past week’s tournament through the lens of our live predictive model of scores. If you are unfamiliar with our predictive model, a good starting point is reading about how we predict tournaments before the event starts. To go from pre-tournament predictions to live predictions throughout the event, we only make a few adjustments, which we’ll touch on at various points in this blog. Roughly, what we do account for once the event starts is how hard different holes are playing, and the persistence in performance from one round to the next (i.e. playing well in round 1 does affect how we predict rounds 2-4); what we do not account for is within-round persistence in performance (i.e. playing well on hole 1 does not affect how we predict holes 2-18) and hole*player-specific effects (i.e. each hole is assumed to be harder or easier than average for all players alike).

With this blog post we hope to achieve a few things: first, to provide readers with a unique way of looking at how a golf tournament played out; second, to allow readers to better understand how our model works and how to think about probability in golf; and third, to provide a read that is, at times, lively and humorous. Enjoy!

 

Nov 6th, 2017: Alex Cejka comes back from the dead at the Shriners

For almost the entire final round on Sunday at the Shriners, the model was giving Alex Cejka less than a 0.5% chance of winning (and at various points a 0.0% chance!). In the end, he fired an 8 under 63 to get himself into a 3-man playoff, which he ultimately lost. How worried should we be about our live model if it’s giving a player a 0% chance of winning at some point in the event, but then that player ends up in a playoff?!

Further, recall that last week our predictive model gave Justin Rose basically no chance of winning (<0.5%) for much of the final round, and he went on to win! So, what’s up? Is something wrong with our model?

First, consider this question: what is the probability that the eventual winner of a tournament had a live win probability lower than 1% at some point throughout the event?

As it turns out, this is actually pretty likely to happen. At the Shriners, the start-of-tournament win probabilities showed that there was a 45% chance of the winner being a player who had a pre-tournament win probability less than 1% (to get this, I just add up the win probabilities for all players with win prob. < 1%). Further, this 45% number will be a lower bound. As the tournament progresses, if any of the players who started with an above 1% win probability has their live win probability dip below 1%, this will increase that 45% number slightly. You could definitely figure out the exact answer to the question posited at the start of the paragraph, but for now let’s just leave it at greater than 45% (for the specific example of this past week’s event).
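That 45% figure is just a sum of pre-tournament win probabilities; a one-function sketch, assuming a hypothetical table pre with one row per player and a win_prob column:

```python
import pandas as pd

def longshot_win_share(pre: pd.DataFrame, cutoff: float = 0.01) -> float:
    """Probability the eventual winner comes from the sub-cutoff group."""
    return float(pre.loc[pre["win_prob"] < cutoff, "win_prob"].sum())
```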

Another way to understand when we should worry about the model getting something (seemingly drastically) wrong is to recall how we evaluate our predictive model. In a nutshell, to evaluate the probabilistic forecasts, we look at, for example, how many times the model said event “X” (e.g. winning) would happen with a probability of 1%; then, we check the final outcomes for these predictions and see how many times event “X” (winning) actually happened. If it turns out that the players who were predicted to win with probability 1% did in fact win about 1% of the time, then we say the model is doing its job correctly. Therefore, the reason we should not (necessarily) be worried when Justin Rose, or Alex Cejka, goes on to win (or almost win) despite having a very low win probability at some point during the event, is that there were many other players who had these very low win probabilities, too. So while Justin Rose did happen to win last week while having just a 0.2% (or so) win probability (according to the model) at the start of the round, there were many other players who had a 0.2%-ish win probability at the start of the round who did not win. If the model gives 2000 players (over the course of several weeks) a 0.2% chance of winning at some point, we expect 0.002*2000 = 4 of these players to go on to win. It just so happens that Rose (and almost Cejka) was one of those 4. (All of this is not to say that the model isn’t getting things wrong – it might be, but we can’t really say whether this is true just using the last couple weeks’ data.)
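This sort of exercise is usually called a calibration check. A rough sketch of how it could be done, assuming a hypothetical table preds with the model's stated win probability (p) and the realized 0/1 outcome (won) for each player-prediction:

```python
import pandas as pd

def calibration_table(preds: pd.DataFrame) -> pd.DataFrame:
    """Group predictions by stated probability and compare to realized frequency."""
    bins = [0, 0.005, 0.01, 0.02, 0.05, 0.10, 0.25, 1.0]
    bucket = pd.cut(preds["p"], bins=bins)
    # Well calibrated if 'stated' and 'realized' track each other in every bucket
    return preds.groupby(bucket, observed=True).agg(stated=("p", "mean"),
                                                    realized=("won", "mean"),
                                                    n=("won", "size"))
```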

It’s a bit like thinking about who ends up winning the lottery: before the winner’s name (e.g. William Nilly) is pulled, William Nilly has just a 1 in 10 million shot at winning. So when he wins, we say, “wow! That event (Mr. Nilly winning the lottery) had only a 1 in 10 million chance of happening, and it did!!” But of course, somebody has to win the lottery – indeed, we can safely predict with 100% certainty that the winner will be someone who, before the winning ticket was pulled, had just a 1 in 10 million shot at winning it (assuming everybody buys just a single ticket); we just don’t know who it will be. To relate this to winning a golf tournament, it is easy to be shocked by a longshot winner when it happens, but keep in mind that somebody had to win the tournament, and if half the field is composed of so-called “longshots”, it’s really not that unlikely that one of them goes on to win.

Okay, moving on. Now I want to briefly analyze one of the scenarios yesterday where Cejka was not getting any love from the model. This should help readers understand what the model does well and what its limitations are.

At 2pm PST, Cejka was in the house at -9, and the model was giving him a 0% chance of winning (this is based on 5000 simulations, so it’s likely not truly 0% – if we simulated 10,000 times and didn’t round the number, it would probably not be exactly 0).

The notables at the top of the leaderboard at this time were Spaun at -10 (thru 11, 41% win prob.), Cantlay at -9 (thru 12, 26% win prob.), Hadley at -9 (thru 12, 19% win prob), Hossler at -8 (thru 11, 6% win prob.), Bryson at -8 (thru 14, 2% win prob.) and then Kim at -7 (thru 11, 2% win prob.). And, as I mentioned, Cejka was in the house at -9.

The main reason the model deemed it so unlikely that Cejka would win, despite being just 1 shot back of a single player, was that the finishing holes were playing very easy: holes 13-16 were playing roughly 2 shots under par, and 17-18 were playing around even par at the time. Therefore, the effective leaderboard really had Spaun, Cantlay, Hadley, and Hossler 2 shots better than their posted scores, and Bryson 1 shot better. The question (or, one of the questions) to be answered at this point is: how likely is it for Spaun to shoot 1 over par on a 7-hole stretch where the average player is shooting 2 under? Spaun is a bit better than the average player in this field, and he will be 1 over or worse on this stretch roughly 5% of the time. We can then make similar calculations for Cantlay (he will be E or worse 11% of the time, roughly), Hadley (he will be E or worse roughly 13% of the time), Hossler (he will be 1 under or worse 27% of the time), and finally Bryson and Kim (they will be 1 under and 2 under or worse, respectively, about 50% of the time).

Well… if I haven’t lost you in that incredibly long sentence, we can now understand why the model wasn’t optimistic about Cejka’s chances. To get into a playoff, or win outright, all of the events I described above had to happen. Assuming the events are independent (i.e. Spaun playing terribly down the stretch is unrelated to Hadley playing terribly down the stretch), the probability of all the events happening is:

5% * 11% * 13% * 27% * 50% * 50% = 0.005%!

This is only a rough calculation, as there are other players that should be involved – but this should be an upper bound for Cejka’s win probability at 2pm PST!

Now, there could be some problems with how I’ve just calculated this. First, the model is always using the performance of players earlier in the day to determine how difficult holes are playing for the players still on the course. Therefore, if the wind picks up (as it did), the model does not account for this. So, while in the model the last 7 holes were playing 2 under par for an average player, in reality, because of the changing conditions, they were likely playing a bit harder. That is probably the most important point. Second, it’s also possible that my assumption that how Spaun plays the final few holes and how Hadley plays the final few holes are independent events is false. The reason? Golf is a strategic game: at the end of a tournament especially, a player will play differently depending on whether they are leading or chasing. Therefore, players’ scores may be correlated with each other, and this would mean the calculation above understates the true probability. Third, there is no accounting for *choking* in the model, which looked like it played a role in how the last few holes played out at the Shriners.

To conclude, in this case I think using hole-specific scoring averages from earlier in the day likely caused some issues for the model’s predictions later on (because the wind picked up and the holes played much harder than earlier). In reality, Alex Cejka’s win probability was likely closer to 1% than to 0% when the leaders had 6-8 holes to go. However, for reasons pointed out earlier, we should never be too surprised when the winner of a golf tournament has a very low live win probability at some point during the event.

 

October 29th, 2017: WGC-HSBC Championship

As one of 15 viewers who took in the entire final round telecast in China last night, I was treated to a wild back 9 that ultimately saw Justin Rose overcome an 8-stroke deficit at the start of the day to win by 2. Here is the tale of the tape in the final round, according to our model:

11:00am (DJ: -17, Koepka: -11, Stenson: -10, Rose:-9)

To start the day, our model was giving Rose (starting 8 back) just a 0.7% chance of winning, while Dustin Johnson, who led by 6 over Brooks Koepka (4.6% chance to start the day), had a 91% chance of closing it out. The other relevant player was Stenson, who began 7 back with a 2.4% chance of winning.

To start thinking about golf probabilistically, consider this: suppose we have an 18-hole match between two equal players, each with a typical “standard deviation” (i.e. a measure of consistency from round to round), and player 1 has a 6-shot advantage over player 2 at the start of the day. Simulations show that player 1 will win about 92% of the time when he starts with a 6-shot advantage. Here, DJ is actually a slightly better player than any of his 3 pursuers; so why is he only getting a 91% chance of winning? The reason, of course, is that he has to beat not one player (as in my example), but three.
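A quick simulation sketch of that two-player example; the only real assumption (ours, purely for illustration) is that 18-hole scores are roughly normal with a standard deviation of about 3 strokes around a player's ability:

```python
import numpy as np

def lead_hold_prob(lead: float = 6.0, sd: float = 3.0, n_sims: int = 200_000) -> float:
    """Two players of equal ability, one starting `lead` strokes ahead."""
    rng = np.random.default_rng(0)
    leader = rng.normal(0.0, sd, n_sims)    # leader's final-round score
    chaser = rng.normal(0.0, sd, n_sims)    # chaser's final-round score
    # The leader holds on if he loses fewer than `lead` strokes to the chaser
    return float(np.mean(leader - chaser < lead))

# lead_hold_prob() comes out around 0.92 under these assumptions
```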

11:35am (DJ (thru 2): -15, Koepka (thru 2): -11, Rose (thru 3): -11, Stenson (thru 2): -10)

Rose quickly made up 4 shots in about half an hour. DJ’s win probability was still 76%, and Rose’s was now 7%. Despite a rough start, DJ was still 4 up, and with fewer than 18 holes remaining that is a lot of ground to make up.

1:30pm (DJ (thru 8): -15, Stenson (thru 8): -12, Koepka (thru 8): -10, Rose (thru 9): -9)

Unbelievably, thru 9 holes Rose now has just a 0.4% chance of winning. He’s 6 back and things are not looking good. DJ has just a 3-shot advantage over Stenson, but still has an 82% chance of closing this out.

3:20pm (DJ (thru 15): -13, Rose (thru 16): -13, Stenson (thru 15): -12, Koepka (thru 15): -11)

Now, things are getting tight. DJ’s win prob is 49%, while Rose’s is 32%. Why the big difference, given that they are tied? Hole 16 is a short par 4, playing about 0.2 strokes under par; DJ still had this to play, while Rose did not. This illustrates the importance of accounting for the difficulty of the holes a player has remaining. It also illustrates how small the margins can be in golf; a hole playing 0.2 strokes easier matters, especially with just 3 holes remaining.

3:35pm (Rose (thru 17): -14, Stenson (thru 16): -13, DJ (thru 16): -12, Koepka (thru 16): -11)

Rose now has a 1-stroke lead, and a 79% chance of closing this out; Stenson sits at 17% and DJ now has just a 4% chance of coming back to win.

4:00pm (Rose (F): -14, Stenson (thru 17): -12, DJ (thru 17): -12, Koepka (thru 17): -11)

Rose is in the house with a 2-shot lead over Stenson and DJ, and a 3-shot lead over Koepka. He’s got a 98.5% chance of closing it out; DJ has a 1% chance; Stenson has 0.5%, and Koepka 0.0%. The 18th hole was playing at about even par, so the model was giving DJ and Stenson only a 1% and 0.5% chance of making eagle respectively.

In the end, neither player could make an eagle, and Rose pulled off a shocking win.