John Loveall, Christina Mark, Faith Okenchi, Michael Weidner
UGA PIC Math
April 29, 2022
UGA PIC Math Golf Team 2 Final Report
Through the University of Georgia’s (UGA) PIC Math class this semester, we had the
opportunity to collaborate with the UGA women’s golf team. In partnership with the team’s
head coach Josh Brewer and student athlete/team member Caterina Don, we worked to address a
problem within the women’s golf industry: the lack of useful data analytics. One of the most
useful performance metrics in golf is strokes gained. Strokes gained is a formula which
calculates the number of shots a player gains or loses relative to the rest of the field of competition.
Strokes gained has had considerable influence in golf, specifically for the Professional Golfers’
Association (PGA), the most prestigious men’s professional golf league. The issue with strokes
gained is that the formula relies on baseline values derived from PGA player averages. Thus, for
female golfers, the baseline values are not accurate portrayals of their field of competition. When
female golfers use strokes gained to analyze their performance, they are comparing themselves to
professional male players, which is neither practical nor accurate due to the performance gap
between male and female golfers. The Ladies Professional Golf Association (LPGA)
lags well behind the PGA in its collection of statistics and implementation of data
analytics, so calculating strokes gained using LPGA baselines is not possible: no such
baselines exist. The UGA women’s golf team currently uses a program called Birdie Fire to
analyze basic golf statistics. However, this application and others like it cost thousands of dollars
and are not an affordable option for many teams. Furthermore, golfers lacking strong statistical
intuition and analytical skills have trouble understanding the implications of the application’s
data. Consequently, UGA’s female golfers are not receiving adequate statistical feedback to train
and improve their game.
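For reference, the per-shot definition of strokes gained commonly attributed to Mark Broadie can be sketched as follows. The notation is introduced here for illustration and does not come from our data: E(position) denotes the baseline average number of strokes needed to hole out from a given position.

```latex
\mathrm{SG}_{\text{shot}} = E(\text{start}) - E(\text{end}) - 1
```

The baseline function E is where the PGA player averages enter, which is why a different field of competition would require its own baseline values.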
Our team was one of two assigned to the UGA Women’s Golf industry partner, and
together, the two teams divided the elements of the analysis between them.
Our team’s primary interest was the relationship between two statistics which the UGA women’s
golf team collects: approach shot distance and first putt length. In golf, an approach shot is a
stroke made by a golfer with the intention of landing the ball on the green. The approach shot
distance, then, is the distance from the ball to the hole when the golfer makes their approach
shot. The first putt length is the distance from the ball to the hole when the ball lands on the
green for the first time. Ideally for players, each approach shot would land the ball on the green,
directly resulting in a first putt. While this is not always the case, the data we observed showed
the players’ approach shots landed on the green the vast majority of the time, so accounting for
the exceptions was not a major priority of the project, although it is something to consider.
The other team focused predominantly on the strength of correlation between strokes and
various other metrics. For our project, the correlation between specific approach shot distances
and specific first putt lengths is not as important as the correlation between first putt length
ranges and approach shot distance ranges. One way to more easily visualize this is to think of the
green as a dartboard and the approach shot distance as the distance of the person to the dartboard.
Although it might be intuitive to assume a golfer will get the ball closer to the hole when their
approach shot distance is shorter, golfers are not always more accurate the closer they are to the
green, as will be shown in the results below. Thus, our strategy was to split the data into four
ranges of first putt lengths and ten ranges of approach shot distances. We then sought the
conditional probability of each first putt length range given each approach shot distance range.
To accomplish this, we wrote a script in the programming language R which runs this
calculation. The sample output below comes from data collected on six rounds played by one
player. Once we had a working program, we created a website which takes approach shot
distances and first putt lengths from eighteen holes, runs the calculations, and outputs a graph
and table displaying the results.
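In symbols, the conditional probability described above can be sketched as follows. The notation is introduced here for illustration and is not taken from the script: A_i is the i-th approach shot distance range, F_j is the j-th first putt length range, and n_{ij} is the number of recorded holes whose approach shot fell in A_i and whose first putt fell in F_j.

```latex
P(F_j \mid A_i) = \frac{n_{ij}}{\sum_{k=1}^{4} n_{ik}},
\qquad i = 1, \dots, 10, \quad j = 1, \dots, 4
```

The ratio is only defined for approach ranges with at least one recorded shot, that is, when the denominator is positive.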
The result of our work over the course of the semester is a website which gives the
golfers feedback on their approach shot accuracy. They have two options for viewing the
statistical analysis: a stacked bar graph and a two-way table. Below is an example of the stacked
bar graph the website will generate after users input their data on the website.
The x-axis is the distance on the approach shot, which includes ten ranges from zero to over 165
meters. The color key indicates the distance range of the player’s first putt resulting from each
respective approach shot. The y-axis represents the relative frequency, or percentage of the time, that the player
hits within each first putt distance range from that approach shot distance range. In other words,
the graph answers a conditional probability question of the following form: if my approach
shot was from distance x, what is the probability that my resulting putt was distance y from the hole?
Taking the first bar on the left-hand side, for example, when this player’s approach shot was
under 60 meters, their ball landed within 10 feet of the hole roughly 67% of the time and
between 10 and 20 feet from the hole about 33% of the time. Below is the corresponding
example of the two-way table this user would see.
Each row represents a range of distances on approach, and each column represents a
range of first putt lengths. Each table entry is the conditional probability of its column’s
first putt range given its row’s approach shot range. The entries are color-coded to
make it visually clear which areas are strong and which need improvement. A player wants
high percentages in the green “less than 10 feet” column because that suggests they are hitting
accurate approach shots a majority of the time. The yellow-green second column indicates fairly
accurate approach shots, and the orange third column indicates fairly inaccurate approach shots.
The red last column indicates inaccurate approach shots, so a high percentage in this column
signals a need for improvement from that row’s approach distance range. Each row should sum to
100% because the row is the “given” part of the conditional probability.
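The row-wise normalization behind such a table can be illustrated with a minimal sketch. It is written in Python for illustration and is not the R code behind the website; the function name `row_percentages` and the example counts are hypothetical. Rows with no recorded approach shots are left empty rather than divided by zero, which is an assumption about how such rows should be handled.

```python
def row_percentages(counts):
    """Convert each row of counts into percentages that sum to 100.

    counts: list of rows (one per approach range), each a list of
    counts (one per first-putt range). Hypothetical helper.
    """
    table = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # No approach shots recorded from this range: leave the row empty.
            table.append([None] * len(row))
        else:
            table.append([100.0 * n / total for n in row])
    return table


# Hypothetical counts: three approach ranges by four first-putt ranges.
example_counts = [
    [2, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 3, 2, 2],
]
example_table = row_percentages(example_counts)
```

Dividing by the row total, rather than by the grand total, is what makes the approach range the "given" event in each entry.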
Both visuals display the same information and feedback, but we offer both options so the
players can choose which one makes more sense to them. With the assistance of the provided
diagram which explains how to read their results, players will see which distances, and therefore
which clubs, they need to focus on.
The code for the program can be organized into two sections: the user interface and the
output. The user interface contains all the code necessary to organize the visual layout of the
webpage. The webpage provides the platform where the user can input their numbers and view
the resulting visuals. We chose the “minty” theme contained in the bslib package to provide the
website with the font and color schemes that are displayed on the webpage. The output portion
contains the calculation for generating the two visuals. It begins by assigning the approach shot
distances and putt lengths to an 18 x 2 matrix, which is then split into four separate matrices,
each containing putts in only one range. Next, a series of conditional statements is looped over
each of the four matrices, generating a frequency matrix which contains essentially the same
information as the table output shown above but in absolute frequency values rather than in
percentages. This matrix is finally reshaped into the forms required for the graph and table
using the ggplot2, dplyr, data.table, and formattable packages.
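The counting steps described above can be sketched as follows. This is a Python illustration under stated assumptions, not the R implementation: the function names are hypothetical, and the range boundaries are assumed except for the 10 ft and 20 ft putt boundaries and the 60 m and 165 m approach boundaries mentioned in this report. A value exactly on a boundary is assigned to the higher range, which is also an assumption.

```python
from bisect import bisect_right

# Assumed range boundaries (only 10, 20, 60, and 165 appear in the report).
PUTT_EDGES_FT = [10, 20, 30]                                    # 4 putt ranges
APPROACH_EDGES_M = [60, 75, 90, 100, 110, 120, 135, 150, 165]   # 10 ranges


def range_index(value, edges):
    """Index of the range containing value (boundary values go to the higher range)."""
    return bisect_right(edges, value)


def frequency_matrix(holes):
    """Build a 10 x 4 matrix of counts from (approach_m, first_putt_ft) pairs."""
    # Step 1: split the holes into four groups, one per first-putt range.
    groups = [[] for _ in range(len(PUTT_EDGES_FT) + 1)]
    for approach_m, putt_ft in holes:
        groups[range_index(putt_ft, PUTT_EDGES_FT)].append(approach_m)

    # Step 2: within each group, count the approach shots per approach range.
    counts = [[0] * len(groups) for _ in range(len(APPROACH_EDGES_M) + 1)]
    for putt_idx, approaches in enumerate(groups):
        for approach_m in approaches:
            counts[range_index(approach_m, APPROACH_EDGES_M)][putt_idx] += 1
    return counts


# Hypothetical data for five holes (a full round would supply eighteen pairs).
sample_holes = [(55, 8), (58, 12), (50, 4), (170, 35), (120, 25)]
sample_counts = frequency_matrix(sample_holes)
```

Keeping the matrix in absolute counts at this stage, as the text describes, leaves the conversion to percentages as a separate final step.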
Using the shiny, tidyverse, tidyr, and rsconnect packages, we created a
user-friendly website that players can access on their personal devices. The website features 36
boxes in which a player can enter a number. The first two boxes are for the distance on approach
and resulting first putt length from the first hole, respectively. The next two boxes are assigned to
the distances from the second hole, the next two boxes for the distances from the third hole, and
so on for all 18 holes. Each box is clearly labeled so the player will understand which entry
belongs in each box. As players enter their numbers on the left-hand side of the page, the
right-hand side will display the stacked bar graph and two-way table. The stacked bar graph is located
under the tab labeled “Graph,” and the two-way table is located under the tab labeled “Table.”
The player can click on the tab to display whichever visual they personally find to be the most
helpful. We hope that through the use of our website, any golfer who is seeking insight on their
approach shot accuracy can quickly and conveniently receive feedback for a round of golf. By
looking at the graph and/or table, the player will more easily understand how well they are
performing from each distance range. They can use this information to concentrate on specific
clubs and improve their overall game.
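The input layout described above, two boxes per hole for 18 holes, amounts to grouping 36 values into 18 pairs. A minimal Python sketch of that grouping follows; the function name `pair_inputs` is hypothetical, and the sketch does not reflect how the Shiny interface stores its inputs.

```python
def pair_inputs(values):
    """Group a flat list of 36 entries into 18 (approach, first_putt) pairs.

    Entries are assumed to be ordered hole by hole: approach distance
    first, then first putt length.
    """
    if len(values) != 36:
        raise ValueError("expected 36 values: two per hole for 18 holes")
    return [(values[i], values[i + 1]) for i in range(0, 36, 2)]


# Placeholder values 0..35 stand in for a player's entries.
pairs = pair_inputs(list(range(36)))
```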
Moving forward, we would like to save all the values that each player inputs when they
use the website. Each player would have their own account in which their past approach shot
distances and resulting putt lengths from every hole they have recorded on the website would be
stored. As the player inputs new distances, the website would provide them with the same stacked
bar graph and two-way table we have now, but generated from both the new data and
all the data from past records. This would give each player more informative
feedback since they would be looking at their personal trends over a much larger data set,
which would more accurately reflect their performance. Furthermore, we would like to include
features that consider other metrics related to approach shot distances and first putt lengths. For
example, an up and down occurs when a golfer misses the green on their approach shot and then
recovers, typically by chipping onto the green and making the first putt. Greens in regulation
(GIR) measures how often a golfer gets the ball onto the green in at least two strokes fewer than par. Both of
these relate to a golfer’s approach shot and speak to the issue previously mentioned in our
analysis: is the approach shot actually resulting in the first putt, or is there a stroke in between,
like in the case of an up and down? We would like our project to account for the possible
discrepancies. Once values can be stored in the website to provide a larger data set, approach
shots missing the green should not skew the analysis by much, but as it stands now, the more
approach shots miss the green, the less accurate our results become.
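One way the proposed pooling of past and new records could work is to store the absolute frequency matrix for each player and add new counts to it before converting to percentages. The Python sketch below illustrates that idea only; it is not part of the current website, and the function name `pool_counts` and the example matrices are hypothetical.

```python
def pool_counts(stored_counts, new_counts):
    """Element-wise sum of two count matrices with identical shape."""
    if len(stored_counts) != len(new_counts):
        raise ValueError("count matrices must have the same number of rows")
    pooled = []
    for old_row, new_row in zip(stored_counts, new_counts):
        if len(old_row) != len(new_row):
            raise ValueError("rows must have the same number of columns")
        pooled.append([a + b for a, b in zip(old_row, new_row)])
    return pooled


# Hypothetical counts for two approach ranges by four first-putt ranges.
past_rounds = [[4, 2, 1, 0], [1, 3, 2, 1]]
latest_round = [[1, 0, 0, 0], [0, 1, 1, 0]]
all_rounds = pool_counts(past_rounds, latest_round)
```

Pooling counts rather than averaging percentages keeps each recorded shot equally weighted, so a round with only a few shots from a given range does not distort the long-run picture.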
Additionally, once these various points of improvement are addressed, the goal is to
integrate this website with the website created by the other UGA PIC Math group working with
the UGA women’s golf team. The collaboration would provide a data analytics tool which could
be of practical use to the women’s golf program at UGA, replacing the current costly
subscription to Birdie Fire while providing new ways to look at the data.
PIC Math is a program of the Mathematical Association of America (MAA) and the Society for
Industrial and Applied Mathematics (SIAM). Support is provided by the National Science
Foundation (NSF grant DMS-1722275).