Frequently Asked Questions

General

Over time we've received many of the same questions via email or social
media. Hopefully, if you've come to this page, someone has asked your question
before! If not, send us an email and we'll try to help you out.

FAQ:

General:

Q: When does the site typically update with new data each week?

A: Most pages will update by Monday evening of each week. The
Scratch Matchups page sometimes will not be updated until Tuesday morning
if there are no odds posted on Monday. For major championship weeks, the site
will be updated with the week's relevant data by Sunday night.

Q: What are your data sources?

A: As described below, all of the data incorporated into our model is at the round level (i.e. round
scores and round-level strokes-gained in each category, e.g. OTT, APP, etc.). This data is
publicly available from a variety of websites that display results from
professional golf tournaments.

Q: Is there a way to access your raw data? Do you have an API?

A: Currently we do not offer a means to access our raw data. This
may be something on offer for Scratch subscribers in the near future.

Predictive Model:

Q: What is the difference between the two models listed on the finish probability pages?

A: The 'baseline' model is described in more detail below, and does not take into account any course-player
specific characteristics (e.g. course history, course fit). The baseline skill estimates, which are used to
generate the finish probabilities, are obtained by equally weighting golfers' historical performance
across all courses (but the weighting is not equal over time – recent results are weighted more).
The 'baseline plus CH' model takes account of players' course history at the relevant course.
A list of the adjustments we make (and the resulting adjusted skill level for each player) can be
found here. We list both
models for a couple of reasons. First, it is not clear which model performs better; using our historical data to conduct backtesting, we find that
both models perform similarly in how well they predict future strokes-gained. One way you could use
these models is to put more trust in a specific prediction when both models agree (e.g. when both models show positive expected
value on the same bet). Second, the inclusion of the course
history model gives a sense of how the course history adjustments map to changes in finish probabilities. This should help
you build intuition about how changes in skill estimates (strokes-gained per round) impact the outcomes
we care about (i.e. finish probabilities). Moving forward, this is the first
step in a bigger plan we have to allow users to customize our model output and betting tools with their
own inputs and insights.

Q: Is the model that includes course history used anywhere else on your site?

A: Unless otherwise noted, it is the baseline model (i.e. no course history effects) that is
used on the site.

Q: In simple terms, what does your predictive model take into account?

A: If you would like a detailed description of the model methodology, visit this
blog post.

The model currently uses historical data from 6 professional tours: the PGA Tour, European Tour, Web.com Tour, Mackenzie Tour (Canada), Latinoamerica Tour, and the European Challenge Tour. Our database goes back as far as possible on each tour.

Using this historical database, the model produces estimates of each golfer's expected *strokes-gained relative to an average PGA Tour professional*. To obtain
these estimates there are basically just two steps: 1) properly adjusting
scores across tournaments and tours (e.g. accounting for the fact that
beating fields by 2 strokes on the PGA Tour
is better than doing so on the European Tour), and 2) producing a weighted
average of these adjusted scores
to project future performance (more recent rounds receive more weight). With
these predicted strokes-gained estimates we can then derive any outcome of a golf
tournament we would like: e.g. a Top 20 finish probability, or a head-to-head matchup win probability.

This last point is important: once we have our skill estimates for each player (in units of strokes-gained relative to an average PGA Tour professional), we can translate skill differences into probabilities (of various sorts). This depends critically on how much random variance in performance there is in golf. To dig more into this, see the methodology blog post.

The inputs to our model only include round-level information (i.e. no hole-level or shot-level data is used). We do incorporate round-level *strokes-gained category* performance (e.g. Off-the-tee,
Approach, etc.) where it is possible. This latter adjustment makes use of the
fact that long game performance is more predictive than short game performance.

Importantly, our model does not account for course-specific characteristics. (Update: This is true only in the baseline model – we now provide estimates from a model that includes course history on some parts of the site.) You can think of the model as producing very good estimates of the quality of each golfer's historical performance. For reference, a golfer's last 150 rounds (roughly) contribute to the estimate of their current ability level.
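
To make the second step above concrete, here is a minimal sketch of a recency-weighted average of adjusted strokes-gained. It is not the site's actual implementation: the exponential decay scheme, the `decay` value, and the function name `weighted_skill_estimate` are all assumptions chosen for illustration. Only the idea that recent rounds receive more weight, and that roughly the last 150 rounds contribute, comes from the text above.

```python
import numpy as np

def weighted_skill_estimate(adjusted_sg, decay=0.98, max_rounds=150):
    """Recency-weighted mean of adjusted strokes-gained, ordered oldest to newest.

    The exponential decay and its rate are illustrative assumptions.
    """
    recent = np.asarray(adjusted_sg, dtype=float)[-max_rounds:]
    if recent.size == 0:
        raise ValueError("need at least one round")
    # Age 0 is the most recent round (weight 1); older rounds get smaller weights.
    ages = np.arange(recent.size)[::-1]
    weights = decay ** ages
    return float(np.sum(weights * recent) / np.sum(weights))

# Hypothetical player: three average rounds followed by a strong recent one.
rounds = [0.0, 0.0, 0.0, 2.0]
estimate = weighted_skill_estimate(rounds, decay=0.5)
print(f"recency-weighted: {estimate:.3f}, simple mean: {np.mean(rounds):.3f}")
```

Because the strong round is the most recent one, the weighted estimate sits above the simple mean of the same four rounds.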

Q: How should I make use of your model's output?

A: To make use of our model, you first need to understand what it is
good at. Our model provides a set of baseline estimates from which big
deviations are likely not warranted. We are confident in saying that our model's output gets you most of
the way to accurate predictions. The majority of the value-added of our model
likely lies in two areas: first,
we are missing very little relevant data on golfers' recent performance (we are
missing data from Amateur events, and some of the more obscure international tours).
There are several models out there that are only using PGA Tour data; this immediately
puts those models at a large disadvantage. Second, we are properly
adjusting scores across tours; being able to directly compare performance
across professional tours that differ drastically in quality is very important.
Doing these two things well gets you most of the way to obtaining good estimates
of golfer ability.

Our estimates are not perfect, however. As said above, currently we do not account for any course-and-player specific effects. This would include, for example, certain players performing better on certain types of course layouts. In our past work, we have found course-and-player-specific characteristics to be difficult to incorporate into the model in a systematic manner. We are always working to improve the model, so course history and course fit may be incorporated soon; this page will be updated when it is. (Update: This is true only in the baseline model – we now provide estimates from a model that includes course history on some parts of the site.)

Apart from just using our model's output directly, there are a couple of ways you could incorporate your own information with our model's output. First, it could be useful to take our estimates as a baseline and make manual tweaks when there are particularly strong indications of player-course fit (e.g. Luke Donald at Harbour Town, Phil Mickelson at Augusta National). These adjustments should never be too large in our opinion (work we have done shows that course fit does not have much predictive power). Second, if you have your own predictive model, combining (e.g. taking a simple average, or a weighted average) our estimates with yours is one possible strategy to produce an even more accurate model than either model alone.
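
As an illustration of the second suggestion, the sketch below blends two sets of skill estimates (in strokes-gained per round) with a weighted average. The function name `blend_estimates`, the player labels, and the numbers are hypothetical; how much weight to put on each model is a choice left to the user.

```python
def blend_estimates(ours, yours, weight_on_ours=0.5):
    """Weighted average of two skill estimates per player (strokes-gained per round).

    Only players present in both inputs are returned.
    """
    if not 0.0 <= weight_on_ours <= 1.0:
        raise ValueError("weight_on_ours must be between 0 and 1")
    shared = ours.keys() & yours.keys()
    return {
        player: weight_on_ours * ours[player] + (1.0 - weight_on_ours) * yours[player]
        for player in shared
    }

# Hypothetical estimates for two players.
site_estimates = {"Player A": 1.0, "Player B": -0.5}
own_estimates = {"Player A": 2.0, "Player B": 0.5}
blended = blend_estimates(site_estimates, own_estimates, weight_on_ours=0.75)
print(blended)
```

Setting `weight_on_ours=0.5` gives the simple average mentioned above.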

In the near future, we will be providing Scratch subscribers with the ability to download our model's estimates of player skill (i.e. expected strokes-gained per round) which will make it easy to incorporate our model's output into models of your own. We also plan to work on other ways that allow subscribers to customize our model's predictions (e.g. allowing users to tweak skill estimates in terms of strokes-gained per round, and then translating those tweaks into relevant probabilities for weeklong finish position and head-to-head matchups). Look for these features to be live within the next couple months.

Data Golf Rankings:

Q: What is different between the skill estimates listed in the Data Golf Rankings and
the skill estimates used to produce the weekly predictions and betting tools?

A: Currently the only difference between the skill estimates in the rankings and the
skill estimates used to generate finish and matchup probabilities is that the latter
takes into account a player's past performance by strokes-gained category
while the former does not. We do this
because we believe rankings should solely reflect the quality of a golfer's
historical performance, which in golf is defined
by total strokes-gained. We incorporate the strokes-gained categories into our modelling
process for making weekly predictions because we know that some categories are more
predictive than others (e.g. Off-the-tee play is more predictive than putting). In general
these two will be closely aligned, but there will be occasional meaningful discrepancies.

Betting Tools:

Custom Simulator:

Q: How frequently is this tool updated and what is changing on update?

A: The custom simulator is updated with new data every evening, as indicated by
the time stamp at the top of the page. The updated data includes our most
recent estimates of player skill. Our skill estimates are updated after
every round during a given week - they don't change much given that 1 round
of golf does not contain much information, but extreme performances (good
or bad) can result in meaningful differences (i.e. 0.1-0.2 stroke differences
in our predicted strokes-gained estimates).

Q: How do you incorporate the cut into your 4-round matchup simulation?

A: Each week when we simulate the weeklong finish probabilities (e.g. win, top5, etc.)
we also keep track of the average strokes-gained performance required to make the cut.
We then use this cutline estimate in our 4-round matchup simulations;
if either golfer's 2-round total (or 3-round total, for a select few tournaments) is below
our estimated cutline, they "miss the cut" in that simulation, and the result of the match
is recorded accordingly. If you select 2 players that are not competing in the same
event that week, or aren't competing at all, you will receive a notice that we are
using a default cut rule (which is strokes-gained of 0).
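
The cut logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the simulator's actual code: round scores are drawn as independent normal strokes-gained values with a standard deviation of 2.75 (an assumed figure), the cutline is expressed as average strokes-gained per round over the first 2 rounds, and the way a match is settled when one or both players miss the cut (the player who makes the cut wins; if both miss, the better 2-round total wins) is an assumption, since settlement rules vary.

```python
import numpy as np

def simulate_matchup_with_cut(skill_a, skill_b, cutline=0.0, round_sd=2.75,
                              n_sims=40_000, seed=0):
    """Estimate 4-round matchup win probabilities with a 2-round cut."""
    rng = np.random.default_rng(seed)
    # Simulated strokes-gained for each of 4 rounds, one row per simulation.
    a = rng.normal(skill_a, round_sd, size=(n_sims, 4))
    b = rng.normal(skill_b, round_sd, size=(n_sims, 4))
    a_36, b_36 = a[:, :2].sum(axis=1), b[:, :2].sum(axis=1)
    # A player misses the cut when the 2-round total falls below the cutline.
    a_made, b_made = a_36 >= 2 * cutline, b_36 >= 2 * cutline
    a_72, b_72 = a.sum(axis=1), b.sum(axis=1)
    both, neither = a_made & b_made, ~a_made & ~b_made
    # Assumed settlement: made cut beats missed cut; otherwise compare totals.
    a_wins = (a_made & ~b_made) | (both & (a_72 > b_72)) | (neither & (a_36 > b_36))
    b_wins = (b_made & ~a_made) | (both & (b_72 > a_72)) | (neither & (b_36 > a_36))
    return float(a_wins.mean()), float(b_wins.mean())

# Default cut rule from the text: strokes-gained of 0.
p_a, p_b = simulate_matchup_with_cut(skill_a=1.0, skill_b=0.0)
print(f"P(A wins) = {p_a:.3f}, P(B wins) = {p_b:.3f}")
```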

Q: Why do the 4-round matchup and 3-ball win probability estimates differ slightly each time the page
is refreshed?

A: The win probabilities for 4-round matchups and 3-balls are obtained through simulation, whereas
the win probabilities for the 1-round matchups can be obtained via an analytical solution (i.e.
through math!). Even though we do 40K simulations (yes, they happen very quickly as the page is loaded),
there will still be small differences in our probability estimates on each run for 3-balls and 4-round
matchups.
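
The difference between an analytical solution and a simulated one can be seen in a small sketch. It assumes each player's 1-round strokes-gained is an independent normal draw with a standard deviation of 2.75 and ignores ties; both are simplifying assumptions, and the function names are hypothetical. Under those assumptions the 1-round win probability has a closed form, while the simulated version moves slightly from run to run even with 40K simulations.

```python
import math
import numpy as np

def one_round_win_prob(skill_a, skill_b, round_sd=2.75):
    """Closed-form P(A beats B over one round) for independent normal scores."""
    # A - B is normal with mean (skill_a - skill_b) and SD round_sd * sqrt(2).
    z = (skill_a - skill_b) / (round_sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulated_win_prob(skill_a, skill_b, round_sd=2.75, n_sims=40_000, seed=None):
    """Monte Carlo estimate of the same probability."""
    rng = np.random.default_rng(seed)
    a = rng.normal(skill_a, round_sd, n_sims)
    b = rng.normal(skill_b, round_sd, n_sims)
    return float(np.mean(a > b))

exact = one_round_win_prob(0.5, 0.0)
runs = [simulated_win_prob(0.5, 0.0, seed=s) for s in range(3)]
print(f"analytical: {exact:.4f}")
print("simulated: ", [f"{r:.4f}" for r in runs])
```

With 40K simulations the standard error of a probability near 50% is about 0.0025, which is the size of the refresh-to-refresh differences one should expect.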

Finish Tool:

Q: How are ties treated in the finish probability estimates?

A: Our pre-tournament finish probabilities are derived through simulations
in which ties are not possible. That is, the sum of the field's Top 20 probabilities
will equal 20, for example. This is the appropriate way to do things if you are
placing bets that use dead-heat rules (which nearly all books use).
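
The "sums to 20" property can be checked with a toy simulation. This sketch assumes continuous, normally distributed tournament totals (so ties have probability zero) and ignores the cut; the field of 60 hypothetical skill levels, the standard deviation, and the function name are illustrative assumptions only.

```python
import numpy as np

def finish_probs_no_ties(skills, top_n, round_sd=2.75, n_rounds=4,
                         n_sims=5_000, seed=0):
    """Top-N finish probabilities from simulations with a strict finishing order."""
    rng = np.random.default_rng(seed)
    skills = np.asarray(skills, dtype=float)
    # Tournament strokes-gained totals: one row per simulation, one column per player.
    totals = rng.normal(skills * n_rounds, round_sd * np.sqrt(n_rounds),
                        size=(n_sims, skills.size))
    # Double argsort turns totals into ranks 0..P-1 (0 = best) with no shared ranks.
    ranks = np.argsort(np.argsort(-totals, axis=1), axis=1)
    return (ranks < top_n).mean(axis=0)

skills = np.linspace(-1.0, 1.0, 60)  # hypothetical field, worst to best
probs = finish_probs_no_ties(skills, top_n=20)
print(round(float(probs.sum()), 6))  # equals 20 up to floating-point rounding
```

Every simulation places exactly 20 players in the Top 20, so the probabilities must sum to 20 regardless of the skill levels used.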

Matchup Tool:

Q: How are you calculating expected value for matchups where ties are void?

A: We answer this question in the linked PDF
on the matchups tool page. Essentially the difference
is that we include the possibility of a tie, and hence voided bet, in the expected value
calculation. This seems the logical way to do things, as voided bets are still included in our running
totals of bets made, ROI, etc.
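
A minimal sketch of an expected value calculation that includes the tie outcome is shown below. The linked PDF is the authoritative description; this version simply assumes that a tie returns the stake (profit of zero), and the function name and example numbers are hypothetical.

```python
def matchup_expected_value(p_win, p_tie, decimal_odds, stake=1.0):
    """Expected profit of a matchup bet where a tie voids the bet."""
    p_loss = 1.0 - p_win - p_tie
    if p_win < 0 or p_tie < 0 or p_loss < -1e-12:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    p_loss = max(p_loss, 0.0)  # guard against tiny negative rounding error
    # Win: profit of (odds - 1) per unit; tie: profit of 0; loss: lose the stake.
    return stake * (p_win * (decimal_odds - 1.0) + p_tie * 0.0 - p_loss)

# Hypothetical bet: 50% win, 10% tie, 40% loss at decimal odds of 2.0.
ev = matchup_expected_value(p_win=0.5, p_tie=0.1, decimal_odds=2.0)
print(f"expected profit per unit staked: {ev:.3f}")
```

Because the voided outcome stays in the calculation, the expected value is per bet placed, which is consistent with counting voided bets in running totals.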

True Strokes-Gained:

Q: What is "true" strokes-gained?

A: True strokes-gained is simply raw strokes-gained (i.e. the number of strokes you beat
the field by in a given tournament-round) adjusted for the strength of that field. As with regular strokes-gained, true
strokes-gained requires a benchmark. For this we use the average player in a PGA Tour field in a given
season. Therefore, you would interpret a true strokes-gained number from a round in the 2018 season
as the number of strokes better than what we would expect
from the average player in 2018 PGA Tour fields. This interpretation
holds for performances from all the tours in our data. For example, the average true strokes-gained
performance on the 2018 Mackenzie Tour was about -2.5 strokes per round.

Because the benchmark is unique to each season, we are not taking a stand on how the skill level of the average PGA Tour player is changing over time. This "true" adjustment is also applied to each of the strokes-gained categories, and the interpretation is the same (i.e. performance in that category relative to the average player in a PGA Tour field in the relevant season).
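
As a small worked example of the adjustment, assume it is additive: a round's true strokes-gained is its raw strokes-gained plus the field's average true strokes-gained. The additive form and the function name are assumptions made for illustration; the -2.5 figure is the approximate 2018 Mackenzie Tour average quoted above.

```python
def true_strokes_gained(raw_sg, field_avg_true_sg):
    """Re-express raw strokes-gained relative to the PGA Tour benchmark.

    field_avg_true_sg is the field's average true strokes-gained
    (0 for an average PGA Tour field in that season).
    """
    return raw_sg + field_avg_true_sg

# Beating a field that averages -2.5 by 1 stroke is still 1.5 strokes
# worse than the average PGA Tour player under this additive assumption.
print(true_strokes_gained(1.0, -2.5))  # -1.5
```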

Q: How can you estimate a player's performance relative to the typical PGA Tour player for tournaments
other than those on the PGA Tour?

A: It is possible to make comparisons of performances on, for example, the Web.com Tour to those
on the PGA Tour because there is overlap between these fields. That is, each week in the Web.com event
there will be players who were in the PGA Tour event in the weeks preceding or following it. It is this
overlap that makes direct comparisons across tournaments and tours possible. For example, if a player
beats a PGA Tour field by 1 stroke per round one week, and then beats a Web.com field by 2 strokes
per round the next, we could conclude that this PGA Tour field is 1 stroke better per round than
this Web.com field (if we assume the player's ability was constant across the 2 weeks).
Of course this example doesn't seem very realistic because we are ignoring the role that statistical
noise plays: what if the player played "poorly" one week? This would lead us to draw incorrect
conclusions about the relative field strengths. This is mitigated in practice by the
fact that we don't have just one player "connecting" fields, but many.

But what about tours like the Mackenzie Tour or Latinoamerica Tour? Surely there is very little overlap between these tours and the PGA Tour in a given season. That's true, but to make comparisons of the Mackenzie Tour to the PGA Tour, we don't actually need direct overlap. It is sufficient that there are players from the Mackenzie Tour events who also play in Web.com events, and then there are some (different) players in the Web.com events that also play in the PGA Tour events. It is in this sense that we require Mackenzie Tour events to be "connected" to PGA Tour events. The accuracy of this method is limited by the amount of overlap across tours and fields; in general, we find there is a lot more overlap than you would expect.

Once we run this statistical exercise, we are left with a set of strokes-gained numbers that can be compared relative to one another. But we would also like a useful benchmark to easily understand the quality of any one performance in isolation. Therefore, as said above, for each season we make the average true strokes-gained performance equal to 0 on the PGA Tour. This gives us the nice interpretation for all true strokes-gained numbers as the number of strokes-gained relative to this baseline.
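
One standard way to carry out an exercise like this is a least-squares fit with a skill term for each player and a strength term for each event; whether this matches the exact estimator used on the site is not stated, so treat it as a sketch. The data below are synthetic and noiseless, with a single PGA Tour event standing in for the benchmark (the text above uses the season-long PGA Tour average). Player C connects the PGA Tour and Web.com fields, and player E connects the Web.com and Mackenzie Tour fields, so no player needs to appear in both the Mackenzie and PGA Tour events.

```python
import numpy as np

# (player, event, raw strokes-gained versus that event's field) -- synthetic data
observations = [
    ("A", "PGA", 1.0), ("B", "PGA", 0.0), ("C", "PGA", -1.0),
    ("C", "WEB", 0.0), ("D", "WEB", 0.5), ("E", "WEB", -0.5),
    ("E", "MAC", 1.0), ("F", "MAC", 0.0), ("G", "MAC", -1.0),
]

def estimate_field_strengths(obs, benchmark_event="PGA"):
    """Fit raw_sg = skill[player] - strength[event], with the benchmark fixed at 0."""
    players = sorted({p for p, _, _ in obs})
    events = sorted({e for _, e, _ in obs} - {benchmark_event})
    player_col = {p: i for i, p in enumerate(players)}
    event_col = {e: len(players) + i for i, e in enumerate(events)}
    X = np.zeros((len(obs), len(players) + len(events)))
    y = np.zeros(len(obs))
    for row, (player, event, raw_sg) in enumerate(obs):
        X[row, player_col[player]] = 1.0
        if event != benchmark_event:
            X[row, event_col[event]] = -1.0
        y[row] = raw_sg
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    skills = {p: float(coef[c]) for p, c in player_col.items()}
    strengths = {benchmark_event: 0.0}
    strengths.update({e: float(coef[c]) for e, c in event_col.items()})
    return skills, strengths

skills, strengths = estimate_field_strengths(observations)
print({event: round(value, 2) for event, value in strengths.items()})
```

With real data each observation is noisy, so the estimates improve as more players connect the fields, which is the point made above about overlap.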

Q: On the true strokes-gained page, why don't the strokes-gained
categories add up to strokes-gained total in the yearly summary tables?

A: Only events that have the ShotLink system set up provide data on player performance
in the strokes-gained categories. Therefore, the true strokes-gained numbers in each
category are derived from this subset of events, while the true strokes-gained total
numbers are derived from all events in our data (PGA Tour, European Tour, Web.com, etc.).
If every tournament a golfer played in a given season had the ShotLink system in place,
then the sum of the true SG categories would equal true SG total.

Expected Wins:

Q: What are expected wins?

A: Expected wins measure the likelihood of a given strokes-gained performance
resulting in a win. For example, averaging 3 strokes-gained per round (over the golfers who played all rounds in the tournament) at
a full-field PGA Tour event will result in a win about 55% of the time.
Why would this be good enough to win some events, but not others?
Sometimes another player may also happen to have a great week and gain more
than 3 strokes per round, while other weeks this does not happen. To get a better sense of the relationship
between strokes-gained and winning on the PGA Tour,
plotted below is the winning raw strokes-gained
average at every full-field PGA Tour event since 1983 (note: only players who play all rounds in a tournament
are included in the strokes-gained calculation).
The intuition behind the expected wins calculation is simple. For example, to estimate
expected wins for a raw strokes-gained performance of +3 strokes per round,
you could just calculate the fraction of +3 strokes-gained performances that historically have resulted
in wins. (In practice, it's not quite this simple as the number of strokes-gained performances
exactly equal to 3 will be small. Therefore some smoothing must be performed — see graph below.)
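
The smoothing idea can be sketched with a kernel-weighted win fraction: performances close to the target strokes-gained value count more than those far from it. The Gaussian kernel, the bandwidth, and the function name are assumptions, and the data are synthetic stand-ins generated inside the example, not real tournament results.

```python
import numpy as np

def smoothed_win_rate(sg, won, target, bandwidth=0.25):
    """Kernel-weighted fraction of wins among performances near `target`."""
    sg = np.asarray(sg, dtype=float)
    won = np.asarray(won, dtype=float)
    # Gaussian weights: performances near the target dominate the average.
    weights = np.exp(-0.5 * ((sg - target) / bandwidth) ** 2)
    return float(np.sum(weights * won) / np.sum(weights))

# Synthetic stand-in data (NOT real results): win chance rises with strokes-gained.
rng = np.random.default_rng(0)
sg = rng.uniform(0.0, 5.0, size=20_000)
won = rng.random(20_000) < 1.0 / (1.0 + np.exp(-4.0 * (sg - 3.0)))

rates = {t: smoothed_win_rate(sg, won, t) for t in (2.0, 3.0, 4.0)}
for target, rate in rates.items():
    print(f"+{target:.0f} strokes per round -> smoothed win rate {rate:.2f}")
```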

When actually estimating expected wins, we also consider a few characteristics of the event. This includes the size of the field, the tour it was played on (i.e. PGA, Web, or European), the year it was played, and also whether the event was a Major or had no cut. Winners of majors typically beat fields by more strokes than at regular tour events, and winners of tournaments with larger fields typically beat the field by a larger margin, all else equal. Because professional golf has become deeper over time, the winners of golf tournaments today on average beat fields by less than in the past. Shown below is the actual function that maps from raw strokes-gained (again, this is raw strokes-gained relative to the players who made the cut and played all rounds) to expected wins for full-field regular PGA Tour events in the year 2000 (the function would look slightly different for events with smaller fields, or for majors, or for a different season etc.):
We also calculate *true* expected wins. This measures the likelihood of a given
strokes-gained performance resulting in a win at an *average full-field
PGA Tour event*. This is calculated by first adjusting the raw strokes-gained
performance for field strength, and then plugging it into the function shown
in the graph above. For example, suppose a golfer beat a European Tour field in the year 2000
by 4 strokes per round. This would be worth roughly 0.95 **raw** expected wins (that is,
we would expect this performance to win 95% of European Tour events).
After taking into account strength of field, suppose we find this performance
is equal to 3 strokes-gained per round over an average PGA Tour field. Then,
we would say this performance is worth roughly
0.55 **true** expected wins (using function shown above). Evidently, at events with an
average PGA Tour field, raw expected wins will equal true expected wins.
For reference, the Travelers Championship was an average quality full-field PGA Tour event
in 2018.

Expected wins provide a means of quantifying the number of high-quality performances a golfer has had, while avoiding the noise that is built in to using number of wins for this purpose. "Expected" statistics are used in many sports (e.g. expected goals in soccer), and they are all based on a similar premise. In golf, we were first introduced to the concept of expected wins from an article written by Jake Nichols of 15th Club.

Betting Results:

Q: What are the criteria you use to select the bets shown on the betting results page?

A: All bets are placed through Bet365, so the first criterion is that the bet is offered there.
For each bet type (matchups, 3-balls, Top 20s, etc.) there is an expected value threshold that must be
met to place the bet. For example, at least a 4.5% edge is required to take a matchup bet. We also do
not place 3-ball or matchup bets if we have very little data on any of the players involved
(cutoff is around 50 rounds). We do this because our predictions for low-data players have much more
uncertainty around them.
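
The selection rules above can be written as a simple filter. Only the 4.5% matchup threshold and the roughly 50-round cutoff come from the text; the class and function names are hypothetical, and thresholds for other bet types are deliberately left out because they are not stated here.

```python
from dataclasses import dataclass

@dataclass
class CandidateBet:
    bet_type: str            # e.g. "matchup", "3-ball", "top 20"
    expected_value: float    # edge as a fraction, e.g. 0.05 for 5%
    min_rounds_on_record: int  # fewest rounds of data among the players involved

# Only the matchup threshold is given in the text; others would be added here.
EV_THRESHOLDS = {"matchup": 0.045}
MIN_ROUNDS = 50  # approximate cutoff for matchup and 3-ball bets

def should_place(bet, thresholds=EV_THRESHOLDS, min_rounds=MIN_ROUNDS):
    """Return True when a bet clears the expected value and data requirements."""
    threshold = thresholds.get(bet.bet_type)
    if threshold is None:
        return False  # no known threshold for this bet type
    if bet.bet_type in {"matchup", "3-ball"} and bet.min_rounds_on_record < min_rounds:
        return False  # too little data on at least one player
    return bet.expected_value >= threshold

print(should_place(CandidateBet("matchup", 0.05, 120)))  # True
print(should_place(CandidateBet("matchup", 0.10, 30)))   # False
```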

Q: When are the bets displayed on the results page?

A: Bets are typically displayed on the page as soon as play begins on a given day (sometimes
a half-hour to an hour after play begins). For Scratch members
bets can be viewed as soon as we make them ourselves (typically well before play begins).

Q: How do you decide how many units to wager?

A: We use a scaled-down version of the
Kelly Criterion. The Kelly staking strategy tells you how much of your bankroll to wager, and is an increasing function
of your perceived edge (i.e. how much greater your estimated win probability is than the probability implied by the odds) and a decreasing
function of the odds (i.e. longer odds translate to smaller bet sizes, all else equal).
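
For reference, the standard Kelly fraction for a bet at decimal odds d with win probability p is (p(d - 1) - (1 - p)) / (d - 1). The sketch below applies a scaling factor to it; the `scale` value of 0.25 and the function name are assumptions, since the exact scaling used for the bets on the results page is not stated.

```python
def scaled_kelly_fraction(p_win, decimal_odds, scale=0.25):
    """Fraction of bankroll to wager under a scaled-down Kelly rule."""
    net_odds = decimal_odds - 1.0  # profit per unit staked on a win
    if net_odds <= 0:
        raise ValueError("decimal_odds must be greater than 1")
    full_kelly = (p_win * net_odds - (1.0 - p_win)) / net_odds
    # Never wager when the estimated edge is zero or negative.
    return max(0.0, scale * full_kelly)

# Same 5-point probability edge at short and long odds (hypothetical numbers).
short = scaled_kelly_fraction(p_win=0.55, decimal_odds=2.0)
long_ = scaled_kelly_fraction(p_win=0.25, decimal_odds=5.0)
print(f"odds 2.0: {short:.4f} of bankroll, odds 5.0: {long_:.4f} of bankroll")
```

Both bets have an estimated win probability 5 points above the implied probability, yet the longer-odds bet gets the smaller stake, matching the description above.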

Live Predictive Model:

Q: Why do the Top 5 and Top 20 probabilities add up to more than they "should" (i.e. 500% and 2000%, respectively)?

A: This is the case because the live model is simulated with *ties allowed*. One of the live model's main
purposes is to accurately predict cut probabilities; evidently, this requires allowing for ties. As a consequence,
the Top 5 and Top 20 probabilities provided are not suitable for making in-play bets where ties are resolved by
dead-heat rules. They will indicate more value than they should. Win probabilities in the live model will always add up
to 100%, as any ties for first are resolved in each simulation.
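
A toy simulation shows why allowing ties pushes the sum above 500%. The sketch assumes normally distributed tournament totals rounded to whole strokes (so ties occur), counts every player tied at a position as holding that position, and ignores the cut; the field, standard deviation, and function name are illustrative assumptions rather than the live model's actual setup.

```python
import numpy as np

def top_n_probs_with_ties(skills, top_n, round_sd=2.75, n_rounds=4,
                          n_sims=2_000, seed=0):
    """Top-N probabilities when tied players all share the better position."""
    rng = np.random.default_rng(seed)
    skills = np.asarray(skills, dtype=float)
    totals = rng.normal(skills * n_rounds, round_sd * np.sqrt(n_rounds),
                        size=(n_sims, skills.size))
    scores = np.rint(totals)  # whole-stroke totals, so ties are possible
    # better[s, i] = number of players with a strictly better score than player i.
    better = (scores[:, None, :] > scores[:, :, None]).sum(axis=2)
    position = 1 + better  # tied players share the same position
    return (position <= top_n).mean(axis=0)

skills = np.linspace(-1.0, 1.0, 60)  # hypothetical field, worst to best
probs = top_n_probs_with_ties(skills, top_n=5)
print(f"sum of Top 5 probabilities: {probs.sum():.3f}")
```

Whenever several players tie for fifth, all of them count as a Top 5 finish in that simulation, so the probabilities sum to more than 5.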