Penalty Strokes on the PGA Tour

There are few statistics available concerning the occurrence of penalty strokes in professional golf. This article provides the raw numbers as well as some adjusted figures to tell the story of penalty strokes on the PGA Tour.

The analysis uses shot-level data from 2007-2016. Due to data limitations, the majors, WGC events, and events held opposite to majors are excluded. To start, the figure below show the distribution of players’ average penalty strokes per round, along with lists of players with the 10 worst and 10 best penalty stroke rates.

It can be seen that the majority of players average between 0.25-0.50 penalty strokes per round.

The next figure details the average penalty strokes per round by course on the PGA Tour.

The course name is found by hovering over each data point. Courses in the top right corner of the figure indicate difficult courses that yield high numbers of penalty strokes, while those in the bottom left are courses that yield few penalty strokes and have low scoring averages. In general, there is only a weak relationship between penalty strokes and scoring averages by course.

To get a better sense of which players take unusually high numbers of penalty strokes, two adjustments are made.

First, we saw previously that some courses have higher numbers of penalty strokes incurred than others. Given that players have different schedules, differences in courses played can affect a players’ average penalty strokes per round. To correct for this, for each round played we subtract the course-specific penalty stroke average from the number of penalty strokes incurred that day for each player. Then, the average of this adjusted penalty stroke measure over all rounds played, for each player, will take care of any differences in playing schedules between players.

Second, we would like to adjust for the ability of players. Evidently, golfers who are playing poorly (as indicated by their relative scoring average) are expected to have more penalty strokes per round. For example, if it were the case that Phil Mickelson and Heath Slocum had the same average penalty strokes per round, we would say that Mickelson has an unusually high average (as compared to Slocum) due to the fact that Mickelson’s relative scoring average is better than Slocum’s.

This latter adjustment is made using a statistical tool called linear regression. Without getting into specifics, we can use regression to predict a player’s expected penalty strokes per round given their relative scoring average. Then, it is the difference between a player’s actual penalty strokes per round and their predicted penalty strokes per round that is the new object of interest (called the “residual” in the regression context). The prediction for penalty stroke rates as a function of relative scoring average is shown below.

You can hover your mouse over each data point to view the player name. Also, note that the penalty stroke rates are already adjusted for course effects (that is why Furyk no longer has the lowest penalty rate).

It is now the distance between a data point and the prediction line that is the measure of interest. Players below the line take fewer penalty strokes than what is predicted by their ability level, while those above the line take more penalty strokes than what is predicted by their ability level. For example, although Jim Furyk and J.J. Henry have almost identical raw penalty stroke rates, once we adjust for ability it can be seen that J.J Henry has a lower adjusted rate (i.e. Henry’s data point is further below the prediction line than Furyk’s is). These adjustments make an important difference in determining who has the lowest adjusted penalty rates, however it makes little difference for those with the highest penalty rates (i.e. the raw and adjusted penalty rate rankings are quite similar).

Finally, we take a closer look at (unadjusted) penalty strokes incurred as the result of tee shots. We classify tee shot penalty strokes into two categories; out-of-bounds (OB) tee shots, and tee shots that find hazards or are unplayable (Note: par 3s are included in this analysis). The following figure provides the number of tee shots that find OB or hazards per 1000 tee shots hit, for each player.

Again, explore the data by hovering over the data points. It is striking how few penalty strokes some players take on tee shots. Remarkably, in our sample Fred Funk hit 3204 tee shots, none of which found the OB, and only 13 of which found a hazard.

 

The Hot-Hand Fallacy; Application to Golf

The existence of, and belief in,  a “hot-hand” in sports has long been a question of substantial interest to psychologists, economists, statisticians, and, of course, sports fans. The idea behind a “hot-hand” is that an individual is more likely to perform well following a good performance than they are following a poor performance. In basketball, which has been the context for most of the papers written on this subject, the question is whether a player is more likely to make a shot attempt given he made his previous shot attempt. In the ’80s and ’90s, a series of influential papers documented the absence of any hot-hand effect – in these studies players were no more likely (at least not to a statistically significant degree) to make a shot following a made attempt than they were following a missed attempt. These results got a lot of attention, and were largely rejected by the sports community who continued to believe in the hot-hand. This fueled more research documenting the irrationality of this belief and others like it; the common theme being that humans tend to give more meaning than they should to patterns in small samples of data.

However, there has been big news recently in this literature. The hot-hand fallacy result has been overturned in a paper by Joshua Miller and Adam Sanjurjo, which begins,

“Jack takes a coin from his pocket and decides to flip it, say, one hundred times. As he is curious about what outcome typically follows a heads, whenever he flips a heads he commits to writing the outcome of the next flip on the scrap of paper next to him. Upon completing the one hundred flips, Jack of course expects the proportion of heads written on the scrap of paper to be one-half. Shockingly, Jack is wrong. For a fair coin, the expected proportion of heads is smaller than one-half.”

As shocking as this statement is, it is true! You can check it by doing a simple simulation with some statistical software; I simulate a sequence of 50 coin flips, and calculate the proportion of heads that followed a head in the previous flip. I do this simulation 100,000 times and take the average of the proportions I get from each sequence, obtaining an average equal to 0.49! Crazy! As is explained at length in their paper, this is the result of a subtle selection bias. Honestly, it is hard to wrap your head around, and I won’t be able to provide an adequate explanation. Here is the link to their paper if you want to grapple with the concept for a bit. So, why is this relevant to understanding the hot-hand? Suppose a basketball player has a make percentage of 50%. Given what was described above, we know that in the absence of a hot-hand (i.e. the player shoots 50% regardless of whether he made or missed his previous shot(s)) the proportion of made shots following a made shot that we calculate from a sequence of data is expected to be strictly less than 0.5. Therefore, if we find in our data that the player made 50% of his shots following a made shot, this would actually be evidence in favor of this player having a hot-hand. This is due to the fact that our calculation of the proportion of made shots following a made shot is biased downwards.

In applying the concept of a hot-hand to golf, a relevant statistical summary could be the average score for a player conditional on the score they made on the previous hole (or previous two holes). You may wonder if the bias described above still persists when we are talking about non-binary (i.e. not 0 or 1, heads or tails, etc) outcomes. I answer this question with a brute force approach; I simulate 100 rolls of a die (i.e. I generate 100 numbers, each of which take on the values 1,2,3,4,5,6 with probability equal to 1/6), and then calculate the average value that followed a value of 1. I repeat this process 100,000 times and I find that the expected value of the die following a roll of 1 is about 3.53 (as opposed to 3.5, which is what it should be). By the same method, I find that the expected value of the die  following a roll of 2 is about 3.52, and so on, up to the expected value following a roll of 6 which is about 3.47. So it can be seen that the bias persists. Honestly, I still can’t believe that this is a thing; it truly is remarkable! I want to emphasize that, evidently, the expected value of a die after a 6 has been rolled is in fact 3.5 . But what these simulations show is that when we calculate this measure from data, in what would seem to be the most intuitive way to do so, our measure will be biased.

Back to golf; define outcomes in terms of relative score to par, so that a player’s relative score takes on integer values from -2 to 2 (ignoring any scores more extreme than this). A player who gets a hot-hand should have a lower expected score following a birdie than he would following a bogey. Formally, I define my measure of a player’s hot hand as the difference between their average score following a birdie, and their average score following a bogey. Thus, if this difference is negative that player performs better on holes that followed a birdie. Conversely, if the difference is positive that player performs better on holes that followed a bogey.

Given the discussion so far, if I take a sequence of hole scores, and calculate the average score on holes that follow a bogey, this average will be biased downwards. Conversely, the average score on holes following a birdie will be biased upwards. Therefore, just as in the case of basketball shooting data, my data will be biased to finding no evidence of a hot-hand.

Importantly, this bias get smaller as the length of the sequence gets longer, but gets larger as you consider longer previous streaks (i.e. there is more bias when calculating the probability of a heads following 3 straight previous heads).

OK! Hopefully I still have some readers on board at this point, because I am heading to the data to figure out which golfers have a hot-hand.

Using 2016 PGA Tour ShotLink data, I calculate my hot-hand measure for each player as follows. I calculate the average score on holes that followed a birdie or better on the previous hole, and I calculate the average score on holes that followed a bogey or worse on the previous hole. I adjust the hole scores for hole difficulty by subtracting the average score on that hole that day. (This adjustment is made due to the concern that on easier courses players make more birdies, so, on average, the holes that follow their birdie holes are easier). The difference between these two averages will be my measure of that player’s hot-hand. However, I need to correct for the bias that will exist in each player’s measure. I don’t have an analytical expression for this bias, so I have to go the brute force route again; I simulate a sequence of scores (based on a typical score probability distribution; P(Eagle or Better)=1%; P(Birdie)=19%; P(Par)=63%; P(Bogey)=16%; P(Double or Worse)=2%). Then, I calculate the average of all values that follow a birdie or better in the sequence. Similarly, I calculate the average of all values that follow a bogey or worse in the sequence. I do this many times and calculate a grand average for each. Then, the difference between these two averages will be the bias present in my measure. This difference is equal to the bias because in my simulation there is no hot hand – it should be the case that the average score is independent of the previous hole’s score. Thus, if there was no bias in my simulations, the difference would be zero.

So, why would this bias be different for different players? Two things affect the bias; the probability distribution used to generate the data, and the length of the sequence. Because I am feeling lazy, I use the same probability distribution to generate simulated scores for every player (they are similar enough that it does not really matter in practice). Thus, the only difference between bias calculations for each player is the length of the sequence (i.e. the total number of holes that the player played in my sample).

Finally, I do all of which I just described again, except I consider how players perform following 2 consecutive birdies compared to following 2 consecutive bogies. As I mentioned earlier, this measure will carry a bigger bias with it. I refer to this measure as “Hot-Hand Measure 2”, while the previous measure is “Hot-Hand Measure 1”.

The following table shows the size of the bias in my two measures for sequences of various lengths that are relevant to the analysis.

The bias is much smaller for the measure that considers performance only on the previous hole. Additionally, both bias measures decrease as the sample length increases.

Next, some actual statistics.  The average hot-hand measures of all players on Tour in 2016 are shown below. This gives a sense of whether, on average, players tend to play better following good play on their previous hole(s).

 

We see that, on average, PGA Tour players do experience the hot-hand phenomenon. Keep in mind that the first measure is much more precisely estimated than the second measure, due to the fact that there are more observations on players’ performance following a single birdie (bogey) then two consecutive birdies (bogies).

To get a sense of how meaningful the magnitudes of these measures are, note that -0.0113*18=0.20 and -0.0204*18=0.37. That is, if we imagine the average player playing an entire round in his “hot-hand state” (birdied previous hole) to his “cold-hand state” (bogied previous hole), this would result in a difference of 0.20 strokes (or 0.37 strokes in the case of our second measure). While this is certainly not a large difference, it is not nothing either. Additionally, as we will see next, there is a lot of individual heterogeneity in these hot-hand measures. Consequently, while this overall average is near zero, there are player-specific effects that are more meaningful in size.

The following interactive table provides the bias-corrected hot-hand measures for each player on the PGA Tour in 2016. The players at the top of these rankings could be considered the streakiest players, while the players at the bottom are anti-streaky; they play better, on average, following bogies than they do birdies. The overall ranking is determined by the sum of the rankings for Measure 1 and Measure 2.  You can sort based off either measure’s ranking by clicking on the column header.

For those interested, for hot-hand measure 1, there were typically 200-250 instances where a player made a birdie (bogey) in our sample. Therefore, our estimate for his average performance on the holes following a birdie (bogey) is based on 200-250 observations. This is a comfortably large sample size. For hot-hand measure 2, the average performance is based off a sample size of typically 40-50 observations. Consequently, you will notice there is much more variation in the hot-hand measure 2; this reflects the smaller sample size upon which the measure is constructed.

P.S. It has been pointed out to us that there have been papers written that are quite similar to this subject (or at least to the actual analysis we do). Here are the links Link 1 Link 2.