Journal of the Chinese

Statistical Association
Vol. 56, (2018) 25–36

SINGLE GAME CAREER HIGH VS.
SEASON AVERAGE CAREER HIGH OF NBA PLAYERS

Tao Chen† and Yixuan Li


Department of Economics, University of Waterloo, Ontario, Canada
Big Data Research Center, Jianghan University, Wuhan, P. R. China

ABSTRACT

Despite the fact that quantitative analysis of professional basketball games is a very
popular research topic in both academia and industry, to the best of our knowledge,
there has been no study of the events in which prominent players have career nights in
scoring. In this paper, we develop a linear relationship between NBA scorers’ Single Game
Career High (SGCH) and Season Average Career High (SACH), through which we
predict the upper limit of SGCH and its timing. This linear relationship is tested
against a nonparametric alternative, which is free of misspecification. Although the
analytical methods are quite fundamental here, the main goal of this paper is to draw
more attention to this line of study.

Key words and phrases: NBA, Leading scorers, Single game career high, Simple linear
regression, Nonparametric alternative.
JEL classification: C12, C14.


Correspondence to: Tao Chen
E-mail: t66chen@uwaterloo.ca

1. Introduction
Quantitative analysis of basketball games is one of the major fields in the sports data
analytics literature. Take the Journal of Quantitative Analysis in Sports (JQAS), the
leading mainstream academic journal in this area, as an example: as of the second issue
of 2016, JQAS had published a total of 349 articles, of which 69 address issues related
to basketball and 50 focus only on basketball. Popular topics in basketball include, but
are not limited to, measuring player efficiency rating (Tversky and Gilovich (1989),
Hollinger (2002)), player contribution to winning (Rosenbaum (1989), Deshpande and
Jensen (2016)), modeling score difference processes (Stern (1994), Gabel and Redner
(2012), Chen and Fan (2016)), measuring plus/minus rating (Rosenbaum (2004)), shot
zones (Beech (2008b)), clutch performance (Ilardi (2007), Schumaker, Solieman, and
Chen (2010)), and studying possessions (Kubatko, Oliver, Pelton, and Rosenbaum
(2007), Štrumbelj and Vračar (2012), Cervone, D’Amour, Bornn, and Goldsberry
(2014)).
This note looks at each NBA player’s Single Game Career High (SGCH). To the best
of our knowledge, this statistic has never been discussed in the literature, even though it
can be very eye-catching when a prominent player in the league has a career night.
Kobe Bryant scored 81 points on January 22, 2006, the second most points scored by a
player in a single game in NBA history, after Wilt Chamberlain’s 100 points
on March 2, 1962. Media competed to report Kobe’s performance that night, and even
ten years later, Markazi (2016) still calls it “One of the greatest individual achievements
in sports history.” Chamberlain and Bryant have held their records for more than 50
and 10 years, respectively, and no NBA player has come close to them since. Stephen
Curry, who holds the record for most 3-point field goals in a season (402 in the
2015/2016 season), said he had no chance of topping Bryant’s 81 points when he was
interviewed by Medina (2016). By definition, SGCH is a once-in-a-lifetime experience
for every player, so it is not immediately clear how it could improve the understanding
of any aspect of the basketball game; this is probably why no one has done any research
on it. On the other hand, because so much attention is attached to this almost
unreachable record, we try to characterize conditions, through the Season Average
Career High (SACH), that predict the upper limit of a player’s SGCH and its timing.
The rest of this paper is structured as follows. In Section 2, we define the variables
and describe the details of our data set. In Section 3, we develop the linear relationship
between SGCH and SACH, and verify it against the nonparametric alternative. Section
4 concludes the paper.

2. Data Description
Our sample contains the top 250 players on the NBA all-time scoring list. We
do not extend the analysis to more players for two reasons: (1) in all the data
sources we can find, reporting the top 250 players seems to be the norm; and (2) a
player who is not high on the all-time scoring list either has not played enough games
or is less talented in scoring (or both), which does not contribute positively to the
current study. We extract the names of the top 250 players from basketball-reference.com
and acquire the information on each player from NBA.com using data scraping techniques.
The player stats log contains complete game statistics for each player, among which we
are mostly interested in SGCH and SACH and the seasons in which those two values were
recorded.
It is important to point out that both the SGCH and the SACH used in this paper are
from the regular season only. Among the 250 players, only 14 have their SGCHs
in playoff games, and the difference between their regular-season and playoff SGCHs
is very small in any case. As for SACH, players play far more games in the regular
season than in the playoffs, and scoring in the playoffs is much more influenced by
who the opponents are and by specific matchups. Therefore, we focus only on regular
seasons.
We define the variable Order of Events (OE) as the number of seasons by which a
player’s SACH precedes his SGCH. In other words, if the SGCH occurs after the
SACH, OE is positive; if the SGCH occurs in the same season as the SACH, OE equals
zero; otherwise, OE is negative. When a player reaches his SGCH in more than one
season, we choose the first such season, as this is more consistent with the definition
of a career high; for all the players we study, the SACH comes from one unique season.
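As an illustration of the OE definition, the following Python sketch computes SGCH, SACH and OE from a player's season-by-season regular-season records. The `PlayerSeason` record, its field names and the sample numbers are hypothetical and are not taken from the paper's data; seasons are assumed to be labelled by consecutive integers, so that a difference of labels counts seasons.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    season: int             # hypothetical label: starting year of the season
    points_per_game: float  # regular-season scoring average
    best_game: int          # highest single-game point total in that season


def order_of_events(seasons):
    """Return (SGCH, SACH, OE) for one player's regular seasons."""
    sgch = max(s.best_game for s in seasons)
    sach = max(s.points_per_game for s in seasons)
    # Tie rule from the text: use the first season in which the SGCH was reached.
    sgch_season = min(s.season for s in seasons if s.best_game == sgch)
    sach_season = min(s.season for s in seasons if s.points_per_game == sach)
    # OE > 0: SGCH after SACH; OE == 0: same season; OE < 0: SGCH before SACH.
    return sgch, sach, sgch_season - sach_season


demo = [
    PlayerSeason(2003, 24.0, 45),
    PlayerSeason(2004, 27.6, 51),
    PlayerSeason(2005, 26.1, 51),  # ties the career high; 2004 is used
    PlayerSeason(2006, 25.0, 48),
]
print(order_of_events(demo))  # (51, 27.6, 0)
```

In this made-up career both highs fall in the 2004 season, so OE equals zero.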
As we expect, the performance of NBA players varies with their ranking on the
all-time scoring list; that is, the relationship between SGCH and SACH differs across
classes of players. Therefore, we separate our data into six categories
for analysis: Top 10, Top 20, Top 30, Top 100, Top 200 and Top 250 players.
Descriptive statistics for each variable are summarized in Table 1.
Taking the first category as an example, the mean, the standard deviation, the
minimum value and the maximum value of the top 10 players’ SGCHs are 63.9, 15.6,
52.0 and 100.0, respectively. We can already make two quick observations. First,
the average SGCH is roughly twice the average SACH in every category. Second,
the average OE varies only narrowly, between -0.5 and 0.5.

Table 1: Summary Statistics

Category   Variable    Mean     Std     Min      Max
Top 10     SGCH        63.9    15.6    52.0    100.0
           SACH        33.3     6.9    26.6     50.4
           OE          -0.5     2.0    -5.0      3.0
Top 20     SGCH        59.0    12.6    43.0    100.0
           SACH        30.9     5.7    24.2     50.4
           OE          -0.3     3.1    -7.0      8.0
Top 30     SGCH        57.4    11.5    40.0    100.0
           SACH        30.2     5.4    19.9     50.4
           OE           0.1     3.0    -7.0      8.0
Top 100    SGCH        51.8     9.4    34.0    100.0
           SACH        26.7     5.2    17.0     50.4
           OE           0.5     2.9    -8.0      9.0
Top 200    SGCH        48.0     8.8    31.0    100.0
           SACH        24.5     4.7    15.1     50.4
           OE           0.4     2.6    -8.0      9.0
Top 250    SGCH        47.0     8.6    31.0    100.0
           SACH        23.7     4.8    14.6     50.4
           OE           0.3     2.6    -8.0      9.0
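The two observations about the category means can be checked directly from the values reported in Table 1. The short Python sketch below does so; the dictionary layout and variable names are only illustrative, while the numbers are the means from the table.

```python
# Means from Table 1: category -> (mean SGCH, mean SACH, mean OE).
table1_means = {
    "Top 10": (63.9, 33.3, -0.5),
    "Top 20": (59.0, 30.9, -0.3),
    "Top 30": (57.4, 30.2, 0.1),
    "Top 100": (51.8, 26.7, 0.5),
    "Top 200": (48.0, 24.5, 0.4),
    "Top 250": (47.0, 23.7, 0.3),
}

# Ratio of mean SGCH to mean SACH in each category.
ratios = {cat: sgch / sach for cat, (sgch, sach, _) in table1_means.items()}
# Mean OE in each category.
oe_means = [oe for (_, _, oe) in table1_means.values()]

for cat, ratio in ratios.items():
    print(f"{cat:8s} mean SGCH / mean SACH = {ratio:.2f}")
print("mean OE range:", min(oe_means), "to", max(oe_means))
```

The ratios all fall between about 1.90 and 1.98, and the mean OE stays between -0.5 and 0.5.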

3. Statistical Analysis and Main Findings


In this section, we first establish the linear relationship between SGCH and SACH,
and then explore the time difference between those two variables, i.e., study the OE
variable. Our main findings are summarized in the last subsection.

3.1 Regression Models


We first run the following simple linear regression for each of the six categories of
players:

SGCH = β0 + β1 SACH + U, (3.1)

where the error term U is assumed to have mean zero and finite variance. Ordinary least
squares (OLS) estimates of (3.1) are presented in Table 2. Before interpreting
the numbers in Table 2, which is done in Section 3.3, we should first justify
the validity of (3.1), in which a linear functional form is imposed. To relax the
linearity assumption, we consider:

SGCH = g(SACH) + V, (3.2)



Table 2: Estimating Slope Coefficient of (3.1) Using Linear Model

Category   β̂1
Top 10     2.04 (0.33)
Top 20     1.92 (0.25)
Top 30     1.87 (0.19)
Top 100    1.47 (0.11)
Top 200    1.51 (0.08)
Top 250    1.48 (0.06)
Standard errors are presented in parentheses.

Table 3: Specification Test by Härdle and Mammen (1993)

Category   Bandwidth   Test Statistic   Critical Values            Fail to reject the null?
                                        90%      95%      99%
Top 10     2.02            30            1697     2137     3116    Yes
Top 20     1.26           932            1053     1277     1826    Yes
Top 30     1.32           995            3435     4257     6092    Yes
Top 100    1.64          1883            8034    10070    16272    Yes
Top 200    1.21          8035           20113    25616    38472    Yes
Top 250    1.14          9301           24846    31068    46643    Yes

where g(·) is an arbitrary function and V has mean zero and finite variance. Note that
(3.2) reduces to (3.1) if g is indeed a linear function of SACH. To estimate g in (3.2),
we use the standard kernel nonparametric method initially proposed by Nadaraya (1964)
and Watson (1964).[1] A Gaussian kernel is employed, and the bandwidth is chosen by
Silverman’s rule of thumb (Silverman (1986)). Numerical values of the bandwidth
are shown in the second column of Table 3. Both the OLS and the kernel estimates are
presented in Figure 1, overlaid on scatter plots of the data.
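To make the two estimators concrete, here is a minimal Python sketch of an OLS fit of (3.1) and a Nadaraya-Watson estimate of g in (3.2) with a Gaussian kernel. It is not the authors' code. The bandwidth uses one common form of Silverman's rule of thumb, 0.9 · min(σ, IQR/1.34) · n^(-1/5); which variant the paper used is an assumption here. The data are synthetic and exactly linear, chosen only so the OLS output is easy to verify.

```python
import math
import statistics


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth (assumed variant): 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    spread = min(statistics.stdev(x), (q3 - q1) / 1.34)
    return 0.9 * spread * len(x) ** (-0.2)


def nadaraya_watson(x, y, h, x0):
    """Kernel-weighted average of y around x0, using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Synthetic, exactly linear data (hypothetical, not from the paper).
sach_demo = [15.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 33.0, 36.0]
sgch_demo = [5.0 + 1.5 * s for s in sach_demo]

b0, b1 = ols_fit(sach_demo, sgch_demo)
h = silverman_bandwidth(sach_demo)
g_25 = nadaraya_watson(sach_demo, sgch_demo, h, 25.0)
print(f"OLS: intercept={b0:.2f}, slope={b1:.2f}; bandwidth={h:.2f}; g(25)={g_25:.2f}")
```

Because the kernel estimate is a weighted average of the observed responses, it always lies between the smallest and largest SGCH in the sample, which is one source of the boundary behaviour that kernel regression is known for.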
To verify that the OLS and kernel nonparametric estimates are statistically
indistinguishable, we adopt a formal test developed by Härdle and Mammen (1993). As
shown in Table 3, the test statistics for all six categories are smaller than the 90%,
95% and 99% critical values. These test results, along with Figure 1, indicate that we
do not reject the null hypothesis, i.e., that there is no difference between the two
estimates, in any of the six categories at any conventional significance level (10%, 5%
or 1%).[2]
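The idea behind this kind of specification test can be sketched in Python as follows. This is a simplified illustration in the spirit of Härdle and Mammen (1993), not the authors' implementation. The statistic compares the kernel fit with a kernel-smoothed version of the OLS fit, the integral is replaced by an average over the sample points, and critical values come from a wild bootstrap with Mammen's two-point weights. The function names, the fixed bandwidth, the number of bootstrap draws and the synthetic data are all assumptions.

```python
import math
import random


def smooth(x, values, h, x0):
    """Gaussian-kernel weighted average of `values` around x0."""
    w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)


def ols_fitted(x, y):
    """Fitted values of the least-squares line of y on x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sum((xi - xb) ** 2 for xi in x)
    b0 = yb - b1 * xb
    return [b0 + b1 * xi for xi in x]


def hm_statistic(x, y, h):
    """n * sqrt(h) * average squared gap between kernel fit and smoothed OLS fit."""
    fitted = ols_fitted(x, y)
    gaps = [smooth(x, y, h, xi) - smooth(x, fitted, h, xi) for xi in x]
    return len(x) * math.sqrt(h) * sum(g * g for g in gaps) / len(x)


def hm_test(x, y, h, n_boot=200, seed=0):
    """Return (statistic, bootstrap p-value) under the null of a linear model."""
    rng = random.Random(seed)
    fitted = ols_fitted(x, y)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    observed = hm_statistic(x, y, h)
    # Mammen's two-point distribution: mean 0, variance 1.
    low, high = (1 - math.sqrt(5)) / 2, (1 + math.sqrt(5)) / 2
    p_low = (5 + math.sqrt(5)) / 10
    exceed = 0
    for _ in range(n_boot):
        y_star = [fi + ei * (low if rng.random() < p_low else high)
                  for fi, ei in zip(fitted, resid)]
        if hm_statistic(x, y_star, h) >= observed:
            exceed += 1
    return observed, exceed / n_boot


# Synthetic linear data with noise (hypothetical, not from the paper).
data_rng = random.Random(1)
x_demo = [15.0 + 0.7 * i for i in range(30)]
y_demo = [5.0 + 1.5 * xi + data_rng.gauss(0.0, 4.0) for xi in x_demo]
stat, p_value = hm_test(x_demo, y_demo, h=2.0)
print(f"statistic={stat:.2f}, bootstrap p-value={p_value:.2f}")
```

A large bootstrap p-value means the observed gap between the two fits is no larger than what the linear model itself would typically produce, which is the sense in which the null is not rejected.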

[1] Because kernel regression is known to have boundary issues, our estimate of g does
not cover the whole support but leaves out several players at both ends.
[2] Details of this test are available upon request.

Figure 1: OLS and Kernel Estimate


Notes: The red line indicates the OLS fit and the green line indicates the kernel estimate.

Figure 2: Histogram of OE

3.2 Do SGCH and SACH overlap?


To test whether the SACH happens τ seasons ahead of the SGCH, we can apply
the following Wald-type test:

$$ t = \sqrt{n}\,\frac{\overline{OE} - \tau}{\sigma_{OE}}, \qquad (3.3) $$

where $\overline{OE}$ and $\sigma_{OE}$ are the sample average and standard deviation
of OE, and n is the sample size.
To have a general idea of the distribution of OE, we include the histograms of OE
for the six categories of players in Figure 2. We will elaborate on the results in Table
4 and the observations on Figure 2 in the next subsection.
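As a worked example of (3.3), the Python sketch below plugs in the rounded Top 10 summary values from Table 1 (mean OE of -0.5, standard deviation of 2.0, n = 10) with τ = 0. The function names are illustrative. The lower-tail standard normal probability is an assumption, since the text does not state which reference distribution or tail is used for the p-values.

```python
import math


def oe_t_statistic(mean_oe, std_oe, n, tau=0.0):
    """Statistic (3.3): sqrt(n) * (mean OE - tau) / std OE."""
    return math.sqrt(n) * (mean_oe - tau) / std_oe


def normal_cdf(z):
    """Standard normal CDF, used here as an assumed reference distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Rounded Top 10 values from Table 1, testing tau = 0 (same season).
t_top10 = oe_t_statistic(mean_oe=-0.5, std_oe=2.0, n=10, tau=0.0)
p_lower = normal_cdf(t_top10)

print(round(t_top10, 2))  # -0.79
print(f"lower-tail normal probability: {p_lower:.3f}")
```

The statistic matches the Top 10 entry of Table 4. For the larger categories, recomputing from the rounded means in Table 1 gives only approximate agreement, because the statistic is sensitive to rounding of the mean OE.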

Table 4: Test Result of (3.3)

Category    τ       t       P-value
Top 10      0      -0.79    0.22
Top 20      0      -0.36    0.36
Top 30      0       0.24    0.60
Top 100     0       1.56    0.94
Top 200     0.5    -0.99    0.25
Top 250     0.5    -0.68    0.16

3.3 Main Findings


We can see from Table 2 that, for the top 250 players, a player’s SGCH is on average
about 50 percent higher than his SACH, and this percentage gradually increases with
the player’s career point total. In fact, for the top 20 players, SGCH is
approximately twice SACH. Such a result is intuitive: relative to his own peak season
average, a more elite scorer is more explosive in his best game than most other
players. It is also quite clear that Chamberlain’s record is very unlikely to be
broken unless we observe a player with the potential to achieve a SACH of around 50.
Section 3.2 examines the timing of the SGCH relative to the SACH, and Table 4
collects the test results. Taking the Top 10 category as an example, under the null
hypothesis (i.e., a player’s SGCH and SACH occur in the same season, τ = 0), the test
statistic is −0.79 and the p-value is 0.22. Since the p-value is greater than 0.05, we
do not reject the null hypothesis at the 5% significance level, and thus we conclude
that a top 10 NBA scorer is likely to have his SGCH and SACH in the same season. The
results for the other categories are interpreted in the same manner. Together with
Figure 2, the results show that a player’s SGCH on average happens either in the same
season as, or one season after, his SACH. Indeed, the better players tend to reach
their SGCH and SACH in the same season, while the lower-ranked players (within the top
250) tend to achieve their SGCH one season after their SACH. Thus, if we observe a
downward trend in a player’s regular-season average, we can assume that he achieved
his SGCH during the current season or the previous season and is unlikely to improve
on it.

4. Conclusion
It is important to note that this type of analysis is subject to data censoring,
because some players in the sample are still active; however, we believe that censoring
has a minimal effect on our results, for two reasons. First, when the data set was
constructed (2016), 212 of the top 250 players were retired. Clifford
Robinson and Chauncey Billups achieved their SGCHs at the age of 34, and no one else
did so at a later stage of his professional career. Meanwhile, of the remaining 38
active players, 22 are 34 or older; therefore, there is no empirical evidence
to suggest that their SGCHs will improve further. Also, close to 92% of the retired
players achieved their SGCHs before the age of 30, and only two active players (Kevin
Durant and Russell Westbrook) were still in their 20s. Putting those two facts
together, we expect only a very limited number of active players to surpass their own
records. Second, even if they do, it is highly likely that they will also advance to
a higher-ranked category of career total points, and this prediction is consistent
with the results in Table 2.
This is the first study of SGCH of NBA players in the literature. We establish a
linear relationship between SGCH and SACH, and use SACH to predict the size and
timing of SGCH while a scorer is still in his prime. Conceivably, similar studies can be
extended to other important statistics, e.g., assists and rebounds.

References
Beech, R. (2008b). NBA Player Shot Zones. http://www.82games.com/shotzones.htm.

Cervone, D., D’Amour, A., Bornn, L., and Goldsberry, K. (2014). A multiresolution
stochastic process model for predicting basketball possession outcomes. arXiv
preprint arXiv:1408.0777.

Chen, T. and Fan, Q. (2016). A Functional Data Approach to Model Score Difference
Process in Professional Basketball Games. Working paper.

Deshpande, S. K. and Jensen, S. T. (2016). Estimating an NBA player’s impact on
his team’s chances of winning. Journal of Quantitative Analysis in Sports, 0.

Gabel, A. and Redner, S. (2012). Random Walk Picture of Basketball Scoring.
Journal of Quantitative Analysis in Sports, 8.

Härdle, W. and Mammen, E. (1993). Comparing Nonparametric Versus Parametric
Regression Fits. The Annals of Statistics, 21, 1926-1947.

Hollinger, J. (2002). Pro Basketball Prospectus: 2002 Edition. Brassey’s Inc.,
Dulles, VA.

Ilardi, S. (2007). Adjusted Plus-Minus: An Idea Whose Time Has Come.
http://www.82games.com/ilardi1.htm.

Kubatko, J., Oliver, D., Pelton, K. and Rosenbaum, D. T. (2007). A Starting Point for
Analyzing Basketball Statistics. Journal of Quantitative Analysis in Sports, 3.

Markazi, A. (2016). Oral History: Kobe’s Masterpiece.
http://espn.go.com/espn/feature/story/_/id/14609380/how-los-angeles-lakers-kobe-bryant-made-history-81-point-game.

Medina, M. (2016). Steph Curry skeptical he could ever top Kobe Bryant’s 81-point
game.
http://www.dailynews.com/sports/20160305/steph-curry-skeptical-he-could-ever-top-kobe-bryants-81-point-game.

Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its
Applications, 9, 141-142.

Rosenbaum, D. (1989). The Cold Facts About the ’Hot Hand’ in Basketball. Chance,
2, 16-21.

Rosenbaum, D. (2004). Measuring How NBA Players Help Their Teams.
http://www.82games.com/comm30.htm.

Schumaker, R. P., Solieman, O. K. and Chen, H. (2010). Sports Data Mining.
Integrated Series in Information Systems, v. 26. New York: Springer.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis.
London: Chapman and Hall.

Štrumbelj, E. and Vračar, P. (2012). Simulating a basketball match with a
homogeneous Markov model and forecasting the outcome. International Journal of
Forecasting, 28, 532-542.

Tversky, A. and Gilovich, T. (1989). The Cold Facts About the ’Hot Hand’ in
Basketball. Chance, 2, 16-21.

Watson, G. S. (1964). Smooth Regression Analysis. Sankhyā: The Indian Journal of
Statistics, 359-372.

[Received November 2017; accepted March 2018.]



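NBA Players’ Single Game Career High vs. Season Average Career High

Tao Chen and Yixuan Li
Department of Economics, University of Waterloo

Abstract
Although quantitative analysis of professional basketball games has become a very
popular research topic in both academia and industry, to the authors’ knowledge there
has so far been no study of statistics on leading players’ career-high scoring. This
paper finds a linear relationship between NBA players’ single game career high and
their season average career high, and uses the result to predict the upper limit of a
player’s single-game high and the time at which it is reached. The linear relationship
is verified with a nonparametric estimation method, which is free of misspecification.
Although the analytical method is fairly basic, this paper aims to draw more attention
to this line of research.

Keywords: NBA, leading scorers, single game career high, simple linear regression,
nonparametric test.
JEL classification: C12, C14.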
