Roland Minton
This website is a companion to the Johns Hopkins University Press book Golf By the Numbers. Please send me corrections or additions to the book. Here are details of my implementation of the Strokes Gained concept.
The concept of Strokes Gained is simple. If you know the average score on the PGA Tour from every location on the golf course, you can compare each golfer's expected score before a shot with the expected score after the shot. Strokes Gained for that shot equals the difference: how much that shot changed the golfer's expected score on the hole, for better or worse. There are different ways to compute those average scores, so there will be differences between my Strokes Gained values and the PGA Tour's. Here is an example. PGA Tour golfers average 1.53 putts from 8 feet. Making an 8-foot putt then saves 0.53 strokes (1.53 - 1), while 2-putting costs you 0.47 strokes (2 - 1.53). Add up all of the golfer's Strokes Gained for the year, divide by the number of putts, and you have Strokes Gained Putting.
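The arithmetic in the 8-foot example can be written out as a short sketch. The 1.53 figure comes from the text above; the function and constant names are made up for illustration and are not part of any ShotLink tooling.

```python
# Tour average number of putts from 8 feet, as quoted in the text.
TOUR_AVG_PUTTS_8_FT = 1.53


def putt_strokes_gained(tour_average, putts_taken):
    """Strokes gained on one hole's putting: Tour average minus putts taken.

    Positive means the golfer beat the Tour average, negative means worse.
    """
    return tour_average - putts_taken


one_putt = putt_strokes_gained(TOUR_AVG_PUTTS_8_FT, 1)
two_putt = putt_strokes_gained(TOUR_AVG_PUTTS_8_FT, 2)
print(round(one_putt, 2), round(two_putt, 2))  # 0.53 -0.47
```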
Here are some issues with the above description. My stated average of 1.53 strokes is actually for all putts of length greater than 8 feet and less than or equal to 9 feet. You could find the average for every half-foot or inch and the calculations would change. The average of 1.53 applies only when the first putt is of length 8-9 feet. Putting percentages are different for first and second putts, so I wanted to start everybody off at the first putt. Also, I wanted the player's average to be measured in strokes per hole (times 17 for everybody) rather than strokes per putt (which can differ widely from golfer to golfer). I want the fairest comparison possible. One more step down the rabbit hole: a putt is defined to be a stroke taken from on the green. The club used doesn't matter (the ShotLink data I get doesn't list club, anyway). Many, perhaps most, shots taken from the fringe (an official ShotLink designation) are with a putter, but they do not count as putts. My choice is to treat a stroke from the fringe as starting the putting cycle. That is, I compare the number of strokes needed to hole out from the fringe to the Tour average from the fringe, and then I do not account separately for the putts taken. The Tour average becomes 17 holes on which I measure "putts" and 1 hole on which "fringe" strokes are measured.
The most important issue is how much detail to use. The more detail, the more meaningful a comparison to the average is, but the harder the average is to compute. For example, I might want to compare a 122-yard approach shot from a sidehill lie to a different average than a 121-yard approach shot from a level lie. However, if there had only been one other 122-yard sidehill approach, using that one shot as an average would be silly. ShotLink records distances to the inch, but to increase sample sizes for approach shots I round to the nearest yard. I don't use information about sidehill and level lies, which is not always recorded in the data set. The categories of lies I use are listed below.
Even rounding to the nearest yard, there are numerous cases where the average from, say, 121 yards is actually higher than the average from 123 yards. Choosing not to believe that there is any magic in 121 yards versus 123 yards, I smooth the data so that the average score increases with distance from the hole. The basic ShotLink data does not indicate whether or not a tree is in the way, and I do not use any augmentation to tell me whether a hole is a dogleg or not. The data does give information about elevation, so it is possible to determine whether a shot is uphill or downhill, but I have not used that.
To calculate the average for approach shots from the fairway from 121 yards, for example, start with the average number of putts from each distance. Then compute the average score from greenside bunkers and from fairway and rough at 0-50 yards. (In these cases, I make the simplifying assumption that these shots will end up on the green, so that I can use the putting averages.) Using five years' worth of data, see where the 121-yard shots end up (green, bunker, fairway, rough) and record the expected number of shots from there. Then find the average of those values.
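The smoothing and averaging steps described above can be sketched as follows. The text does not say which smoothing method is used, so the pool-adjacent-violators approach here is an assumption, chosen only because it forces averages to be non-decreasing in distance. Adding 1 for the shot itself in `baseline_from_outcomes` is also my reading of the procedure, and all numbers in the usage lines are invented for illustration.

```python
def smooth_non_decreasing(averages, counts):
    """Force averages (ordered by distance) to be non-decreasing.

    Assumed method: pool adjacent violators. Whenever a shorter distance
    has a higher average than the next one, the two are merged into a
    single count-weighted average.
    """
    blocks = []  # each block is [weighted mean, total count, distances covered]
    for avg, n in zip(averages, counts):
        blocks.append([avg, n, 1])
        # Merge backwards while the previous block is higher than the last one.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2, k2 = blocks.pop()
            m1, n1, k1 = blocks.pop()
            total = n1 + n2
            blocks.append([(m1 * n1 + m2 * n2) / total, total, k1 + k2])
    smoothed = []
    for mean, _, k in blocks:
        smoothed.extend([mean] * k)
    return smoothed


def baseline_from_outcomes(expected_after):
    """Average score from a location: 1 for the shot itself (assumption)
    plus the mean expected strokes from where those shots finished."""
    return 1 + sum(expected_after) / len(expected_after)


# Invented averages for 121..124 yards; 122 is higher than 123 before smoothing.
raw = [2.80, 2.85, 2.83, 2.86]
smoothed = smooth_non_decreasing(raw, [100, 100, 100, 100])

# Invented expected strokes remaining after four shots from one distance.
baseline = baseline_from_outcomes([1.8, 2.0, 2.5, 1.9])
```

A weighted merge keeps distances with many observed shots from being dragged around by distances with few.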
Note that this is not at all the same as computing the average distance to the hole after the shot ("proximity") and converting that to a score. The latter method does not take into account shots that miss the green. A given shot is then treated the same way: find its finishing location, compute the average number of strokes from there, and compare it to the average before the shot.
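As a sketch of that per-shot comparison: the shot itself costs one stroke, so the gain is the drop in expected score minus 1. This matches the putting example (1.53 - 0 - 1 = 0.53 for a holed 8-footer). The function name and the approach-shot averages below are hypothetical.

```python
def shot_strokes_gained(expected_before, expected_after):
    """Strokes gained on one shot.

    expected_before: average strokes to hole out from the starting location.
    expected_after:  average strokes to hole out from the finishing location
                     (0 if the shot was holed).
    """
    return expected_before - expected_after - 1


# Holed 8-foot putt, using the 1.53 average quoted in the text.
holed_putt = shot_strokes_gained(1.53, 0)

# Invented averages: an approach from a 2.9-stroke location that misses the
# green and finishes in a spot averaging 2.4 strokes loses half a stroke.
missed_green = shot_strokes_gained(2.9, 2.4)
```

Because the finishing location can be a bunker or rough as well as the green, a missed green is charged its full cost, which a proximity-only measure would not do.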
When I first ran the numbers, I was surprised by the regularity of scores on par 4s. I computed average scores on par 4s of length 420 yards (actually, 411-420), and then 430 yards, and so on. The average scores form a nearly perfect line: the longer the hole, the higher the average score. That average score is the starting place for computing the average score for a tee shot on a par 4. I then look at the location of the tee shot (rough or fairway) and distance to the hole and use the average score from that location to determine Strokes Gained on the tee shot. There's always a little more detail to consider. A drive into the "fairway" actually means fairway or intermediate rough (from where the average scores increase very slightly) and "rough" means everything else (primary rough, fairway bunker, "native area" and so on). "Rough" also includes "water" and I penalize the golfer for that in the Penalty category. Drives into the trees that force the golfer to chip out sideways are penalized in the Recovery category. So a full consideration of driving should consider the Penalty and Recovery categories. I separated tee shots on par 3s, 4s, and 5s. My logic on separating 4s and 5s is that many par 4s demand an iron off the tee or at least a decision on what club to use. This is where the average score becomes problematic unless the average score on each individual hole on each course can be computed. Length off the tee is so important that mad bombers look good even if using driver on a particular hole is dumb. With data going back to 2004, perhaps there is enough data to start to compute average scores from locations on specific courses and specific holes.
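The 10-yard grouping of par 4s (411-420 yards counted as "420") could be sketched like this. The helper names and the sample holes are invented; only the binning rule is taken from the text.

```python
import math
from collections import defaultdict


def length_bin(hole_yards):
    """Label a hole by the top of its 10-yard bin, so 411-420 maps to 420."""
    return math.ceil(hole_yards / 10) * 10


def average_score_by_bin(holes):
    """holes: iterable of (hole length in yards, score on the hole)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for yards, score in holes:
        label = length_bin(yards)
        totals[label] += score
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in sorted(totals)}


# Invented scores on three par 4s.
averages = average_score_by_bin([(415, 4), (418, 5), (425, 4)])
```

The binned average for a hole's length is then the "before" value for the tee shot, and the average from the drive's finishing location is the "after" value.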
The above ideas get us to raw Strokes Gained calculations. If a particular golfer only plays easy courses and is lucky enough to play on nice weather days, this can produce an unfair comparison to a golfer who plays in tough conditions. Course corrections are necessary. The simplest idea is to compute the average Strokes Gained in a particular category for a specific round, and then subtract that from the value for each golfer who played that day. For example, if outrageous pin placements caused Strokes Gained Putting to average 0.3 strokes worse than average for a round, adjust each golfer's Strokes Gained Putting for that round by 0.3 strokes. This simple correction ignores the possibility that the greens aren't that hard, but that this is a minor tournament populated with mediocre and bad putters. So I use an iterative process. Find the course averages and adjust the players' averages accordingly. Then use the player averages to adjust the course averages (in the above example, if the players playing that round had averages of 0.1 strokes worse than average, then adjust the course effect from 0.3 to 0.2). Then use the adjusted course averages to adjust the player averages, and so on.
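One way to sketch the iterative correction is to alternate between estimating a per-round effect and a per-player average, each time using the latest estimate of the other. This is a minimal reading of the description above, not necessarily the exact procedure used for the book; the record layout `(player, round_id, raw_strokes_gained)` and the fixed iteration count are assumptions.

```python
from collections import defaultdict


def iterate_course_correction(records, n_iter=60):
    """Alternate between round effects and player averages.

    records: list of (player, round_id, raw_strokes_gained) tuples.
    Returns (player_avg, round_effect) dictionaries.
    """
    player_avg = defaultdict(float)  # every player starts at Tour average (0)
    round_effect = {}
    for _ in range(n_iter):
        # Round effect: mean raw value after removing each player's average.
        sums, counts = defaultdict(float), defaultdict(int)
        for player, rnd, raw in records:
            sums[rnd] += raw - player_avg[player]
            counts[rnd] += 1
        round_effect = {r: sums[r] / counts[r] for r in sums}

        # Player average: mean raw value after removing the round effect.
        sums, counts = defaultdict(float), defaultdict(int)
        for player, rnd, raw in records:
            sums[player] += raw - round_effect[rnd]
            counts[player] += 1
        player_avg = {p: sums[p] / counts[p] for p in sums}
    return player_avg, round_effect


# Invented data: round R1 plays 0.3 strokes hard, R2 plays 0.1 strokes easy.
records = [
    ("A", "R1", 0.4 - 0.3), ("A", "R2", 0.4 + 0.1),
    ("B", "R1", -0.2 - 0.3),
    ("C", "R2", -0.2 + 0.1),
]
player_avg, round_effect = iterate_course_correction(records)
```

Only differences are pinned down by this scheme: adding a constant to every player and subtracting it from every round fits the data equally well, so comparisons between players (or between rounds) are what carry meaning.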
Bunker: greenside bunker shots. Fairway bunker shots come under the category of Miscellaneous.
Fairway: shots from the fairway, in 50-yard increments.
Intermediate Rough: shots that are labelled IR in the ShotLink data set. There are not many shots with this label, and some courses have no shots with this label. Categories are short (0-50 yards), full iron (50-200), and long (200-250).
Rough: shots from the Primary Rough. I did not include shots from beyond 200 yards, thinking that these would most likely be layup shots.
Miscellaneous: includes many ShotLink labels such as Fairway Bunker, Native Area, Dirt Outline, and so on. There are very few shots from 0-50 yards with these labels.
Penalty: the number of penalty strokes incurred, compared to the average.
Layup: a shot from beyond 250 yards in the fairway or beyond 200 yards in the rough.
Recovery: a shot from 50-200 yards in the rough that finishes more than 50 yards from the green. My assumption is that there was no reasonable way to get closer to the green due to tree trouble or a long carry over water.
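The category rules above can be sketched as a small classifier. Several details are assumptions: the lie strings are hypothetical stand-ins for ShotLink labels, each yardage boundary is treated as belonging to the shorter range (so exactly 200 yards in the rough is still Rough), and cases the list does not cover return "Unclassified". Penalty is left out because it counts penalty strokes rather than classifying a shot.

```python
import math


def classify_shot(lie, start_yards, finish_yards_from_green=None):
    """Assign a shot to one of the categories listed above.

    lie: hypothetical label such as "fairway", "primary_rough",
         "intermediate_rough", or "greenside_bunker"; anything else
         falls under Miscellaneous.
    """
    if lie == "greenside_bunker":
        return "Bunker"
    if lie == "fairway":
        if start_yards > 250:
            return "Layup"
        upper = max(50, math.ceil(start_yards / 50) * 50)
        return f"Fairway {upper - 50}-{upper}"
    if lie == "intermediate_rough":
        if start_yards <= 50:
            return "Intermediate Rough short"
        if start_yards <= 200:
            return "Intermediate Rough full iron"
        if start_yards <= 250:
            return "Intermediate Rough long"
        return "Unclassified"  # the list above does not cover this case
    if lie == "primary_rough":
        if start_yards > 200:
            return "Layup"
        far_from_green = (finish_yards_from_green is not None
                          and finish_yards_from_green > 50)
        if start_yards > 50 and far_from_green:
            return "Recovery"
        return "Rough"
    return "Miscellaneous"  # Fairway Bunker, Native Area, Dirt Outline, ...
```

For example, `classify_shot("primary_rough", 150, 80)` falls under Recovery, while the same shot finishing 10 yards from the green stays in Rough.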