Revisiting the question of pressure in golf

Here we take another crack at understanding whether pressure plays a role in a golfer’s performance on the PGA Tour.

First, we make the usual adjustments to scores; course difficulty and field strength are taken into account to create a strokes-gained measure for each round played. This measure tells us how much better a player’s score was than the average score shot on a neutral course that year on Tour. Second, we calculate each player’s “pressure-free” strokes-gained average; this is done by averaging all their adjusted scores in first and second rounds throughout a given year.

We want to look at how players perform in the 4th round relative to their “pressure-free” averages. We refer to the difference between a player’s 4th round score and their average as “personal strokes-gained”.
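For readers who like to see the calculation spelled out, here is a minimal sketch. The record layout (player, year, round number, adjusted strokes-gained) and all numbers are made up for illustration, and I am assuming strokes-gained is signed so that positive means better than average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (player, year, round_number, adjusted strokes-gained).
rounds = [
    ("A", 2016, 1, 1.0), ("A", 2016, 2, 2.0), ("A", 2016, 4, 0.5),
    ("B", 2016, 1, -1.0), ("B", 2016, 2, 0.0), ("B", 2016, 4, 0.5),
]

def pressure_free_averages(rounds):
    """Average adjusted strokes-gained in rounds 1 and 2, per (player, year)."""
    early = defaultdict(list)
    for player, year, rnd, sg in rounds:
        if rnd in (1, 2):
            early[(player, year)].append(sg)
    return {key: mean(vals) for key, vals in early.items()}

def personal_strokes_gained(rounds):
    """Round-4 strokes-gained minus the player's pressure-free average."""
    baseline = pressure_free_averages(rounds)
    return [
        (player, year, sg - baseline[(player, year)])
        for player, year, rnd, sg in rounds
        if rnd == 4 and (player, year) in baseline
    ]

personal_sg = personal_strokes_gained(rounds)
```

In this toy data, both players shoot the same 4th round, but player A comes out negative and player B positive because their pressure-free averages differ.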

Before digging into the data, let’s briefly consider what we should expect to find. First, note that, by construction, the average personal strokes-gained over all players will be zero in the 4th round; this is because strokes-gained is a relative measure, and as such if every player truly did play worse (or better) than normal in the final round, we could not (from looking at the data) distinguish that from say, courses just playing more difficult (or easier) in the final round.

Anyways, don’t dwell on that point if it doesn’t quite make sense. What we want to look at is whether players perform worse when they are closer to the lead heading into the final round, as this is where we think players are feeling pressure most. However, there are 2 basic mechanisms that could result in scoring averages being different when near the lead: 1) In general, players near the lead in a final round of a tournament are playing well – and because form carries over slightly from round-to-round, we could perhaps expect them to continue to play well in the final round, and 2) As previously mentioned, there is pressure on players to perform well when they are in contention and have a lot of money and points on the line. The former would cause a player’s scores to be better than normal when near the lead, while the latter would cause them to be worse.

Ok, we still haven’t got to any data yet, sorry. Below we plot the fitted values (i.e. the conditional mean) from a regression of personal strokes-gained on a player’s position heading into the final round. (In lay terms, we are just drawing a line, or a polynomial, that best fits the data). We fit a quadratic function here. We are using 2011-2016 PGA Tour data, and include all players who had at minimum 20 1st & 2nd rounds in a given year.
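To make "fitting a quadratic" concrete, here is a rough sketch of a least-squares quadratic fit solved through the normal equations. The function name, the positions, and the coefficients used to generate the toy data are all assumptions for illustration; they are not the data or code behind the plots.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    # Build the 3x3 system (X'X) coef = X'y for the columns 1, x, x^2.
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

# Made-up example: position heading into the final round vs. personal strokes-gained.
positions = [1, 10, 20, 35, 50, 70]
personal_sg = [-0.4 + 0.03 * x - 0.0004 * x * x for x in positions]

a_hat, b_hat, c_hat = fit_quadratic(positions, personal_sg)
fitted = [a_hat + b_hat * x + c_hat * x * x for x in positions]
```

The fitted values are the conditional means that get plotted against starting position. With real (noisy) data the curve will not pass through the points the way it does in this noiseless toy example.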

This is interesting. It says that, on average, a player plays worse than his typical performance level when he is near the lead, or when he is near the back of the pack, heading into the final round. We can think of a story to fit this nicely; players in contention feel some pressure and this causes their performances to suffer, while those at the bottom of the leaderboard heading into Sunday are simply not playing well.

Now, anyone familiar with golf data knows that most of the variation in golf scores is not predictable (i.e. most is day-to-day random variation). Therefore, it is reasonable to think that we are trying to tell a nice story about pure noise here. For that reason, I fit the same quadratic function for each year from 2011-2016. Here are the plots:

While the pattern varies a bit from year-to-year, one thing that we always see is a drop in performance when a player is inside the top 20 or so at the start of the final round. This is reassuring, as it is evidence that the observed relationship is not just an artifact of a specific sample. Another issue one could bring up is why I am fitting a quadratic function, as opposed to some higher-order polynomial; here is the cubic for 2011-2016:

Again, the pattern varies a bit, but we still see this general tendency of players to play worse than usual near the lead! I am reasonably convinced by these patterns, mostly because they are fairly robust across years. Of course, as mentioned earlier, there is lots of variation in the scores shot from any starting position heading into the final round; some players shoot great scores when near the lead, while others have shot great rounds in last place, and vice versa. But these graphs do show that, on average, there appears to be a relationship between a player’s starting position in the final round and his subsequent performance.

Okay, moving on for those still with me. I next do the same analysis, but for strokes-gained putting, and strokes-gained tee-to-green. That is, we are comparing how players are performing relative to their “pressure-free” averages from round 1&2 in putting and tee-to-green play. Here is the graph using quadratic fits for all data from 2011-2016:

Again, this is interesting. But, again, these results should be interpreted cautiously. What this fitted plot shows is that putting appears to contribute more than tee-to-green play to the deterioration in a player’s performance when starting the final round near the lead. This is perhaps surprising due to the fact that players hit more shots tee-to-green in a typical round than they putt, which would (all else equal) tend to result in tee-to-green play contributing more to total score. These plots are fairly robust when fitted to each year of data, but I would still hesitate to give them a stamp of approval, as the data we are fitting is very noisy.

Next, I do a comparison of elite (defined as a player with annual SG average > 2) versus non-elite players:

Not too much to say here, this is as you would expect; elite players appear to have less of a drop (or none at all) in performance (relative to their round 1&2 average) when they are near the lead as compared to non-elite players.

Finally, let’s look at the fitted plots of some specific players (again using final round data from 2011-2016):

Note that an individual player can have a non-zero average personal strokes-gained in his final-round play (look at Stricker, for example). In that case, the player is on average playing worse (or better) relative to his round 1&2 standard in final rounds, irrespective of his starting position. The sample size for these plots ranges from 40 (Tiger) to 126 (Kuchar); so not huge but not tiny either. It is interesting to look at Garcia’s plot; he has gotten a lot of grief for not finishing tournaments off, and this would seem to indicate that the criticism is justified. Rory’s plot also makes a good deal of sense; he seems to have a lot of back door top 10 finishes, which requires putting up good rounds from a long way back on Sundays. Finally, Hoffman and Woodland are both players who I’ve always thought were shaky in final rounds, and their plots seem to bear this out.

To wrap things up, the main takeaway here is that players do seem to play worse when they are near the lead heading into Sundays on the PGA Tour. Averaging across all players, we find that players play approximately 0.3-0.4 strokes worse than their typical level of play when near the lead. While we can’t say whether or not this is due only to pressure per se, the argument could be made that this is a lower bound on the effects of pressure on performance. As mentioned earlier, players who are near the lead heading into the final round are playing well that week (with the exception of “old” Tiger, maybe), and so the fact that we observe a drop in performance suggests that the detrimental effects of pressure more than offset any carryover in good form from previous rounds in the tournament. By this logic, 0.3-0.4 strokes could be a conservative estimate for the effects of pressure on performance.

 

Why do scores vary on the PGA Tour?

This data visualization explores the variation in scores on the PGA Tour in 2016. Scores vary because of differences between players, courses, and the day-to-day variation inherent to golf. There is a lot of information here, so take your time to understand it. Click here to get started.

The Statistical Details

This visualization starts by asking you to choose whether you want to analyze things by player or course. I’ll go through the player decomposition first, as it is more straightforward.

Player Decomposition

The starting point is the Law of Total Variance:

Var(Y) = Var(E[Y|X]) + E[Var(Y|X)]

where Y and X are random variables. The first term on the right hand side is called the “explained variance” (that is, explained by X) and the second term is the “unexplained variance”. In the viz, we refer to these terms as the “Between” and “Within” components, respectively.

In our context, Y is adjusted strokes-gained in a given round, and X will be a vector of player indicator variables. So, the “Between” component is the variance in players’ adjusted scoring averages (E[Y|X]) and the “Within” component is the average variance of individual players’ scores.

Estimating these 2 components is simple; linear regression can do it for us. To see this, consider the regression:

Y = X\beta + U

where Y and X are defined as above, and U is the regression residual (i.e. uncorrelated with X by construction). Because X is a vector of indicator variables, we know that X\beta = E[Y|X] ; this is a property of regression that we can invoke because we know the conditional expectation is linear when X is a set of dummy variables. So, we have:

Y = E[Y|X] + U

\iff  Var(Y) = Var(E[Y|X]) + Var(U)

\iff Var(Y) = Var(E[Y|X]) + Var(E[U|X]) + E[Var(U|X)]

\iff Var(Y) = Var(E[Y|X]) + E[Var(Y|X)]

where the second line follows because U is uncorrelated with X\beta = E[Y|X], the third line applies the Law of Total Variance to U, and the last line follows because E[U|X] is zero (the conditional expectation is linear here, so U = Y - E[Y|X]) and Var(Y|X) = Var(U|X) by the definition of U. Long story short, to get the aforementioned “Between” and “Within” components, we simply regress adjusted scores on a set of player dummies, and then the variance of the fitted values is our “Between” component, and the variance of the residuals is our “Within” component. When we express them as percentages, the “Between” component percentage is just the R-squared.
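Here is a small sketch of that recipe. It relies on the fact that regressing on a full set of player dummies gives each round its player's average as the fitted value, so no regression library is needed. The player labels and scores are invented, and I use population variances (dividing by N) so the identity holds exactly.

```python
from statistics import mean, pvariance

# Hypothetical adjusted strokes-gained by player (made-up numbers).
scores = {
    "A": [2.0, 0.0, 1.0, 3.0],
    "B": [-1.0, 1.0, 0.0, -2.0],
    "C": [0.5, -0.5, 1.5, -1.5],
}

def between_within(scores):
    """Split Var(Y) into Var(E[Y|X]) and E[Var(Y|X)] using player means."""
    all_scores = [y for ys in scores.values() for y in ys]
    # Fitted value for each round = that player's average score.
    fitted = [mean(ys) for ys in scores.values() for _ in ys]
    residuals = [y - f for y, f in zip(all_scores, fitted)]
    total = pvariance(all_scores)
    between = pvariance(fitted)    # variance of the fitted values
    within = pvariance(residuals)  # variance of the residuals
    return total, between, within

total, between, within = between_within(scores)
r_squared = between / total  # the "Between" share of total variance
```

The “Between” and “Within” pieces add up to the total variance, and `r_squared` is the “Between” percentage described above.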

Okay, moving on. Next, for each player, we break down their “Within” variance. For each round we know that:

SGTOT = SGOTT + SGPUTT + SGAPP + SGARG

where all of these terms are adjusted for course difficulty. So, we want to break down the total variance in a given player’s adjusted scoring into the components contributed by each part of the game. Consider this:

Var(SGTOT) = Cov(SGTOT, SGOTT) + Cov(SGTOT, SGPUTT) + Cov(SGTOT, SGAPP) + Cov(SGTOT, SGARG)

Then, we say the fraction of variance in SGTOT due to variation in SGOTT is equal to:

\frac{Cov(SGTOT, SGOTT)}{Var(SGTOT)}

This has been coined an “ensemble” decomposition. Notice that:

Cov(SGTOT, SGOTT) = Var(SGOTT) + Cov(SGAPP, SGOTT) + Cov(SGARG, SGOTT) + Cov(SGPUTT, SGOTT)

So we attribute one of each of the covariance terms to SGOTT (recall that if you write out the Var(SGTOT) you would have 2 of each of the covariance terms in the above expression). If covariance terms are small, then the contribution of SGOTT would be simply:

\frac{Var(SGOTT)}{Var(SGTOT)}

which is very intuitive. This decomposition is done for every player who has at least 30 rounds played in 2016. In practice, the covariance terms do matter a bit for the within player decompositions.
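A quick sketch of the covariance-share calculation for a single player, using invented round-level numbers. The helper `cov` and the toy data are assumptions for illustration; the point is only that the four shares sum to one by construction.

```python
from statistics import mean

def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical adjusted strokes-gained by category over five rounds for one player.
components = {
    "SGOTT": [0.5, -0.2, 0.1, 0.4, -0.3],
    "SGAPP": [1.0, -0.5, 0.3, -0.8, 0.6],
    "SGARG": [0.2, 0.1, -0.4, 0.0, 0.3],
    "SGPUTT": [-0.6, 0.9, 0.2, -1.1, 0.5],
}

# SGTOT is the round-by-round sum of the four categories.
sgtot = [sum(vals) for vals in zip(*components.values())]
var_tot = cov(sgtot, sgtot)

# Share of Var(SGTOT) attributed to each category: Cov(SGTOT, category) / Var(SGTOT).
shares = {name: cov(sgtot, vals) / var_tot for name, vals in components.items()}
```

Note that an individual share can be negative if a category covaries negatively with the total; the shares still sum to one.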

To finish this section off, we need to briefly discuss the “Between” decomposition. It proceeds exactly as above, except each data point is a player’s year-long average. Here, the covariance terms are negligible. The % contribution of each SG category tells us how much of the variance in year-long SG averages is due to each SG category.

Course Decomposition

For the course decomposition we are using raw scores.

To break things into “Between” and “Within” course variation, we regress raw scores on a set of course dummy variables.

The decompositions proceed in the same manner as described for the Player section. However, I do want to explain how we calculate the course averages for each category and how to interpret them. With players, this is very simple – just the year-long averages in each SG category, and total SG (all adjusted for course difficulty). For courses, obtaining interpretable averages is a little more involved.

First, I start out with the baseline strokes-gained numbers (both total and for each category). Baseline SG is how much better you are playing than a “baseline function” which uses historical PGA Tour data to estimate the average number of strokes it takes to hole out from each distance and location (fairway, rough, sand, etc.). At easier courses, baseline SG for the field will have a positive average; this means that all players are on average gaining strokes relative to the baseline function (so, for example, a 400 yard hole at this course is easier than the typical 400 yard hole). An important point to account for is the fact that fields are not the same quality at all courses; therefore we correct for this by estimating fixed effects regressions for each SG category. This proceeds in an analogous manner to that discussed here in our predictive model setup. Take the SG:OTT category; a given course fixed effect from this regression will be the average strokes-gained off-the-tee relative to the baseline function for a typical field, at that course.

So now, for each course, we have the raw scoring average adjusted for field strength, and also the baseline strokes-gained for each SG category, also adjusted for field strength. But, the sum of the adjusted baseline SG category averages does not equal the adjusted raw scoring average. Why is that? The discrepancy is due to the differing distances of courses. Remember that the baseline function takes only distance and type of shot as inputs. When considering the total baseline strokes-gained at a course, all that is relevant is distance (as every hole starts from the same spot, a tee box). Therefore, if 2 courses both have the same baseline total strokes-gained averages, then any difference in the raw scoring averages has to be due to distance. So, the contribution of distance to the adjusted raw scoring average is defined as a residual; the number of strokes not accounted for by total baseline SG average.

Let’s go through an example to make this a little more concrete.

The raw scoring average at TPC Sawgrass in 2016 was 72.05. This is 0.96 strokes higher than the annual raw scoring average on Tour in 2016 (71.09). So what makes up this 0.96 stroke difference in raw scoring average?

First, the adjustment for field strength shows that the PLAYERS field in 2016 was 0.51 strokes better than a typical field (that is, the average player in the field would be expected to gain 0.51 strokes over a typical Tour player on the same course). Therefore, we should expect, all things equal, that this field would score 0.51 strokes better than the Tour average. But, we in fact observed a raw scoring average that was 0.96 strokes higher than the Tour average. Therefore, we now have a discrepancy of 0.96 + 0.51 = 1.47 strokes to account for.

Next, the fixed effects regressions show that, at TPC Sawgrass in 2016, the baseline SG:OTT average is 0.21 strokes harder, the SG:APP average is 0.51 strokes harder, the SG:ARG average is 0.39 strokes harder, and the SG:PUTT average is 0.35 strokes harder than at the typical course in 2016. Putting these numbers together, we have a total strokes-gained relative to baseline that is 1.46 strokes harder than average. Therefore, we have only a 0.01 stroke discrepancy between the adjusted raw scoring average and the adjusted total baseline SG; as explained previously, this difference must be due to the distance of the course. Because TPC Sawgrass is almost exactly an average length course on Tour, its distance contributes basically nothing (0.01 strokes) to its differential raw scoring average.

And there you have it – we have broken the 0.96 strokes higher raw scoring average at TPC Sawgrass into 6 components: 1) field strength (0.51 strokes lower), 2) Baseline SG:OTT (0.21 strokes higher), 3) Baseline SG:APP (0.51 strokes higher), 4) Baseline SG:ARG (0.39 strokes higher), 5) Baseline SG:PUTT (0.35 strokes higher), and 6) Distance of the course (0.01 strokes higher).
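As a sanity check, the arithmetic of the TPC Sawgrass example can be reproduced in a few lines. The numbers are the ones quoted in the text; the variable names are just labels I chose, and results are rounded to two decimals because the inputs are only reported to that precision.

```python
tour_avg = 71.09      # 2016 Tour raw scoring average
sawgrass_avg = 72.05  # 2016 TPC Sawgrass raw scoring average
field_strength = 0.51 # strokes the PLAYERS field is better than a typical field

# Baseline SG category averages, expressed as strokes harder than a typical course.
categories = {"SG:OTT": 0.21, "SG:APP": 0.51, "SG:ARG": 0.39, "SG:PUTT": 0.35}

raw_gap = round(sawgrass_avg - tour_avg, 2)             # observed difference
adjusted_gap = round(raw_gap + field_strength, 2)       # difference for a typical field
category_total = round(sum(categories.values()), 2)     # explained by baseline SG
distance_residual = round(adjusted_gap - category_total, 2)  # left over: distance

# All six components (field strength enters with a negative sign) rebuild the raw gap.
rebuilt = round(-field_strength + category_total + distance_residual, 2)
```

Distance is not measured directly here; it is whatever is left after field strength and the four baseline SG categories are accounted for.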

A final note is that we exclude field strength differences in the reported ‘Between Course’ decomposition, as well as the ‘2016 Averages’, in the data viz. We think this is just more informative; we want to know how difficult each part of the course is, and field strength only serves to cloud this information.