Predicting The NBA Champion With Machine Learning

JakeAllenData
5 min read · Dec 30, 2023


Using machine learning to predict the NBA champion right before the playoffs begin each year!

Article Originally Published: 12/30/2023

Article Updated: 10/25/2024

Article Inspiration: TheJK, please check this guy out… He’s the Goat!

Photo by Author and Michael Meek on the Dribble

Introduction

The NBA playoffs are the postseason tournament held at the end of the NBA regular season. The tournament takes the top 8 teams from each conference (East and West), 16 teams in total, into an elimination bracket of best-of-7 series. The playoffs confer glory on the best regular-season teams each year. They are where players make names for themselves and show their worth to a team, whether it is star players proving the narratives wrong or unexpected role players making a surprising impact. The unpredictable events that come out of high-pressure situations are why the playoffs are so entertaining.

Every year, many fans make predictions about which team or teams they believe have the best chance to win it all.

In this article, I give an overview of how I created a machine learning model that has managed to predict the last 5 NBA champions correctly!

New Machine Learning Predictions as of 8/1/2024

Data

All the data used for this project comes from Basketball Reference. I used player, coach, and team data going back to the 1950 season. For more information on the statistics, click here for the glossary.

All of the detailed steps of this project, including scraping, creating, and analyzing the data, are on my GitHub.

Python Libraries: numpy, pandas, BeautifulSoup, selenium, StringIO, Comment, webdriver

Model Prediction Value

The value the model predicts is champion_share, a custom statistic calculated from a team's final postseason standing: playoff wins divided by the 16 wins it takes to win the title. For example, the 2023 runner-up Miami Heat earned 13 of the 16 possible playoff wins (13 / 16 = 0.8125).

Table Calculation of Champion Share
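The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the project: the function name and the `wins_for_title` parameter are my own assumptions, and the article does not say how seasons with a different playoff format are handled.

```python
WINS_FOR_TITLE = 16  # four best-of-7 rounds, 4 wins each (modern format)


def champion_share(playoff_wins: int, wins_for_title: int = WINS_FOR_TITLE) -> float:
    """Fraction of the wins needed for a title that a team actually earned.

    `wins_for_title` is a hypothetical parameter for seasons with a
    different format; the article only describes the 16-win case.
    """
    if not 0 <= playoff_wins <= wins_for_title:
        raise ValueError("playoff_wins must be between 0 and wins_for_title")
    return playoff_wins / wins_for_title


print(champion_share(13))  # 2023 Miami Heat (runner-up): 0.8125
print(champion_share(16))  # a champion: 1.0
```

A team that misses the playoffs or is swept in the first round would get 0.0 under this definition, and only the champion reaches 1.0.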

Machine Learning Model Results

Machine Learning Model Used: XG-Boost Regression (Complex)

For full details feel free to investigate the code: My GitHub

This section covers 3 charts that capture what it takes to become a modern NBA champion.

  1. Season Playoff Comparison Chart (2024 Season)
  2. Feature Importance Chart (Most Important Features)
  3. Feature Importance Heat Table (2024 Season)

Python Libraries: numpy, pandas, sklearn, matplotlib, xgboost, statsmodels.api, shap, lime

The Season Playoff Comparison Chart shows the differences between the predicted and actual results of the 2024 playoffs.

Season Playoff Comparison Chart (2024 Season)

One way to assess which features are most important for predicting the NBA playoffs is by examining the feature importance scores of the model. In this project, a complex tree-based model (XG-Boost Regression) was used, which scores each feature by how much it contributes to reducing error in the predictions.

However, in this model's case, the Feature Importance Chart doesn't indicate whether a feature has a positive or negative effect; it simply shows how relevant each feature is to the model's overall performance. This is a common trait of tree-based models, which split the data on different features without assigning direct positive or negative weights to them.
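To see why an error-reduction score carries no sign, here is a small pure-Python sketch. It is a toy illustration under my own assumptions (made-up data, a single split, and hypothetical helper names `sse` and `best_split_gain`), not the importance calculation used by the project's model. One column raises the target, the other lowers it by the same amount, and both end up with the same score.

```python
from itertools import product


def sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(rows, targets, feature_index):
    """Largest error reduction achievable by one threshold split on a feature."""
    parent_error = sse(targets)
    best = 0.0
    thresholds = sorted({row[feature_index] for row in rows})
    for t in thresholds[:-1]:  # skip the max so the right side is never empty
        left = [y for row, y in zip(rows, targets) if row[feature_index] <= t]
        right = [y for row, y in zip(rows, targets) if row[feature_index] > t]
        best = max(best, parent_error - sse(left) - sse(right))
    return best


# Toy data: column 0 helps the target, column 1 hurts it by the same amount.
rows = list(product(range(5), repeat=2))
targets = [2 * a - 2 * b for a, b in rows]

gain_helpful = best_split_gain(rows, targets, 0)
gain_harmful = best_split_gain(rows, targets, 1)
print(gain_helpful, gain_harmful)  # both positive and (up to rounding) equal
```

Both scores are positive even though the two columns push the target in opposite directions, which is why a separate tool is needed to read the direction of an effect.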

XG-Boost builds an initial decision tree, then sequentially adds more decision trees that focus on correcting the errors of the trees before them. Each new tree tries to capture what the previous trees missed, which reduces the overall error. By iteratively building and adjusting these trees, XG-Boost steadily reduces prediction error and can produce a highly accurate final prediction.
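The idea of each tree correcting the previous ones can be sketched with plain Python. This is a simplified gradient-boosting loop with one-split trees (stumps) and squared error, written for illustration only. It is not the xgboost library's implementation, which adds regularization and other refinements, and the data, function names, and learning rate here are all assumptions of mine.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the current residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Return the training mean squared error after each boosting round."""
    base = sum(ys) / len(ys)          # start from the average target
    preds = [base] * len(ys)
    history = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # what is still missed
        stump = fit_stump(xs, residuals)                # new tree fits the misses
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return history


# Made-up one-feature data, purely for illustration.
xs = [i / 10 for i in range(20)]
ys = [x ** 2 for x in xs]

mse_history = boost(xs, ys, n_rounds=20)
print(f"MSE after round 1:  {mse_history[0]:.4f}")
print(f"MSE after round 20: {mse_history[-1]:.4f}")
```

The training error shrinks round after round because every new stump is fit to whatever the current ensemble still gets wrong.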

SHAP Feature Importance (Most Important Features)

For more information on how tree-based models work, click here.

Now, for a more detailed view, here is the Feature Importance Heat Table, which displays the top 10 most important features in a tabular format. This table not only highlights how these key features contribute to the prediction but also reveals how they relate to one another.

Feature Importance Heat Table (2024 Season)

What Makes An NBA Champion?

With all of the information presented, this article can briefly be summarized in these 5 points (in ascending order of importance).

  1. To become an NBA champion, a team must dominate throughout the season, consistently defeating opponents by large margins and securing a top 3 seed heading into the playoffs (top_3_conference, top_6_SRS).
  2. The team must also be highly regarded by oddsmakers, as every NBA champion has ranked within the top 9 in preseason odds (pso).
  3. Star power is essential, particularly MVP-caliber talent. The roster should feature one or more players with strong performances in past MVP races (sum_mvp_shares). Having players who have excelled in Defensive Player of the Year (DPOY) contests (sum_dpoy_shares) also provides an advantage. Moreover, the presence of players who made an impact early in their careers, particularly in Rookie of the Year (ROY) races (sum_roy_shares), underscores the team’s overall talent.
  4. Experience plays a critical role in achieving postseason success. Championship teams typically include players with substantial playoff experience over the past three seasons (sum_franchise_L3S_cs). The team’s average age is also crucial: players in their prime (Age) are generally best equipped to compete for a title.
  5. Lastly, a successful team must be strong offensively and on the boards. The last 10 NBA champions have ranked in the top 5 in offensive effective field goal percentage (top_5_offensive_eFG%). Additionally, being one of the league’s top rebounding teams is crucial for withstanding the physical demands of the postseason (TRB).
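To make a few of the feature names above more concrete, here is a rough sketch of how flags and sums like these could be derived from a team's season record. The feature names come from the list above, but the `TeamSeason` fields, the exact definitions, and the example numbers are assumptions for illustration and are not taken from the project code or from real data.

```python
from dataclasses import dataclass, field


@dataclass
class TeamSeason:
    """Hypothetical container for one team's regular-season summary."""
    name: str
    conference_seed: int          # 1 = best record in the conference
    srs_rank: int                 # league rank by SRS, 1 = best
    player_mvp_shares: list = field(default_factory=list)  # past MVP shares per player


def build_features(team: TeamSeason) -> dict:
    """Derive a few model inputs; the definitions are assumed, not confirmed."""
    return {
        "top_3_conference": int(team.conference_seed <= 3),
        "top_6_SRS": int(team.srs_rank <= 6),
        "sum_mvp_shares": sum(team.player_mvp_shares),
    }


# Placeholder team with made-up numbers.
example = TeamSeason("Example Team", conference_seed=2, srs_rank=7,
                     player_mvp_shares=[0.5, 0.25])
features = build_features(example)
print(features)
```

Encoding thresholds such as "top 3 seed" as 0/1 flags gives a tree-based model a clean split to work with, while sums like sum_mvp_shares keep the full range of roster talent.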

Conclusion

Overall, I was really impressed with my results. By creating an expansive and creative dataset capturing all aspects of the players, coaches, and teams, I was able to capture the complexity of the NBA playoffs using a complex tree-based regressor.

Feedback

Any questions or feedback are always welcome, so feel free to post them. For full details of all steps done for this project check out my GitHub.


Written by JakeAllenData

All articles up-to-date as of 10/25/2024. 📝 Website: allenjakewebsite.com
