Who Will Be the Next U.S. President? A Data-Driven Approach
Using data science to analyze how the country chooses its next president and to forecast the election outcome.
Article Updated: 10/26/2024
Introduction
Data has become an essential tool in understanding trends and patterns, even in politics. By analyzing factors like polling data, economic indicators, and historical voting behavior, we can start to make educated predictions about who might become the next president of the United States. With the rise of machine learning and data science, the methods used to explore these trends are more advanced than ever. In this article, we’ll explore how these technologies are being applied to predict the outcome of future elections.
Data
The data for this project was assembled with ChatGPT, which drew on several sources covering elections since 1912: 538, the U.S. Bureau of Labor Statistics, Investopedia, and the MIT Election Data and Science Lab.
That said, there is no single website where all of this data can be found. Keep in mind that much of it consists of pre-election-night estimates that ChatGPT produced by combining many different sources.
For full details of the data and steps within this project, feel free to investigate MyGitHub.
“Keep in mind that the data used for this project are rough estimates drawn from multiple sources and are not perfect. The exceptions are the electoral vote counts and the winner of each state on election night.”
Election Night | State Electoral Vote Analysis
Each percentage here is the share of elections in which the state voted for the eventual presidential winner. Nevada (89.3%), New Mexico (85.7%), and Colorado (82.1%) have consistently sided with the winner, making them highly predictive states. On the other hand, states like Texas (57.1%) and Mississippi (57.1%) are less predictive but still show moderate alignment with the eventual winner.
Swing states such as Florida (71.4%) and Michigan (71.4%) align with the winning candidate at moderately high rates, and their large electoral vote counts make them critical battlegrounds. States like Kansas (53.6%) and Alaska (50.0%) are less reliable in predicting the winner, indicating greater variability in outcomes.
The data highlights key battlegrounds like Wisconsin (71.4%) and Arizona (71.4%), which, while competitive, frequently align with the overall winner. Overall, swing states such as Florida, Texas, and North Carolina play a pivotal role in determining the presidential winner due to their competitiveness and significant electoral votes.
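As a rough illustration of how a percentage like these can be computed, the sketch below counts how often a state's result matched the national winner. The two states and their results are made up, and the names `winner_alignment`, `state_results`, and `national_winners` are hypothetical rather than taken from the project's code.

```python
from collections import defaultdict

# Party that won the presidency in four recent elections.
national_winners = {2008: "D", 2012: "D", 2016: "R", 2020: "D"}

# Hypothetical toy data: (state, year, party that carried the state).
state_results = [
    ("State A", 2008, "D"), ("State A", 2012, "D"),
    ("State A", 2016, "R"), ("State A", 2020, "D"),
    ("State B", 2008, "R"), ("State B", 2012, "R"),
    ("State B", 2016, "R"), ("State B", 2020, "R"),
]

def winner_alignment(state_results, national_winners):
    """Return {state: percent of elections where the state backed the national winner}."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for state, year, party in state_results:
        totals[state] += 1
        if party == national_winners[year]:
            matches[state] += 1
    return {s: round(100 * matches[s] / totals[s], 1) for s in totals}

alignment = winner_alignment(state_results, national_winners)
# Print states from most to least predictive.
for state, pct in sorted(alignment.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {pct}%")
```

If the data covers the 28 elections from 1912 through 2020 (an assumption on my part), a state that matched the winner 25 times would score 25/28, or 89.3%.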
The next results show the share of elections in which each state voted for the Republican candidate. States like Alaska (92.9%), Kansas (89.3%), and North Dakota (85.7%) are Republican strongholds that consistently support Republican candidates. Indiana (78.6%) and Ohio (75.0%) also lean Republican but remain competitive in some elections.
Key swing states like Florida (57.1%) and Pennsylvania (60.7%) show moderate Republican alignment, making them critical battlegrounds due to their large electoral vote counts. Conversely, Democratic-leaning states like California (46.4%) and New York (46.4%) have shifted away from Republican candidates, reinforcing their importance for Democrats.
States like Minnesota (32.1%) and Hawaii (28.6%) show stronger Democratic preferences, while Washington, D.C. (7.1%) remains overwhelmingly Democratic. Overall, swing states with moderate Republican percentages, such as Florida, Pennsylvania, and North Carolina, play a pivotal role in determining election outcomes.
Machine Learning Section
Model Prediction Value
The model's target, the value it predicts, is each candidate's electoral vote count.
And in case you don’t know: An electoral vote is a vote cast by a member of the U.S. Electoral College to elect the President and Vice President. Each state has a set number of electors, based on its representation in Congress. In a presidential election, voters choose electors, who then cast their votes. A candidate needs at least 270 out of 538 electoral votes to win.
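The arithmetic behind the 270 figure can be sketched in a few lines. The constant and function names here are my own, chosen only for illustration.

```python
# Fixed facts about the Electoral College.
HOUSE_SEATS = 435
SENATE_SEATS = 100
DC_ELECTORS = 3  # granted by the 23rd Amendment

TOTAL_ELECTORS = HOUSE_SEATS + SENATE_SEATS + DC_ELECTORS

def state_electors(house_seats: int) -> int:
    """A state's electors: its House seats plus its two senators."""
    return house_seats + 2

def majority_threshold(total: int = TOTAL_ELECTORS) -> int:
    """Smallest vote count that is strictly more than half of the total."""
    return total // 2 + 1

print(TOTAL_ELECTORS)        # 538
print(majority_threshold())  # 270
print(state_electors(1))     # 3, the minimum for any state
```

Because 538 is even, a 269 to 269 tie is possible, which is why the winning threshold is 270 rather than 269.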
Machine Learning Model Results | Ridge Regression
Machine Learning Model Used: Ridge Regression (Simple)
For full details, feel free to investigate the code: MyGitHub
This section covers three charts that explain:
- The most important pre-election factors for a president to win the election. 🤔
- The historical results of the Machine Learning Model (how well it has predicted past elections). 📝
- The predictions that the model is making this year. 🤫
The Ridge Regression model highlights several key factors in predicting presidential election outcomes:
- Approval Ratings: The most influential feature. Higher approval significantly boosts election chances.
- Campaign Efforts: More speeches or appearances strongly correlate with success.
- State Polls: Polls from key states (e.g., Virginia, Maine, Nevada, Ohio, Iowa) are critical indicators, likely because these states reflect broader voting trends.
- Debate Performance: Better debate scores improve chances of winning.
- Scandals/Controversies: More scandals negatively impact a candidate’s odds.
- Economic Indicators: Unemployment rate and interest rates play a role, though less important than approval and campaign efforts.
- Experience and Media Presence: Years in politics and media visibility contribute but are less decisive.
- Party: Party affiliation has some effect but is less significant than candidate-specific factors.
In summary, public approval, campaign visibility, and key state polls are the strongest predictors of electoral success, with economic conditions playing a smaller role.
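One common way to read feature importance off a ridge model is to standardize the inputs and compare coefficient magnitudes. The sketch below does that with the closed-form ridge solution in NumPy. It is not the project's code: the helper `fit_ridge`, the feature names, and the data are all synthetic placeholders, and the regularization strength `alpha=1.0` is an arbitrary choice.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on standardized features.

    Returns (weights, intercept), where weights apply to standardized X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize so coefficient sizes are comparable across features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    y_mean = y.mean()
    # Solve (Xs^T Xs + alpha * I) w = Xs^T (y - mean(y)).
    A = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    w = np.linalg.solve(A, Xs.T @ (y - y_mean))
    return w, y_mean

# Hypothetical feature names and synthetic data, for illustration only.
feature_names = ["approval_rating", "campaign_events", "unemployment_rate"]
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
# Synthetic target: the first feature is constructed to matter most.
y = 270 + 60 * X[:, 0] + 25 * X[:, 1] - 5 * X[:, 2] + rng.normal(scale=5, size=40)

weights, intercept = fit_ridge(X, y, alpha=1.0)

# Rank features by absolute coefficient size, largest first.
ranking = sorted(zip(feature_names, weights), key=lambda fw: -abs(fw[1]))
for name, w in ranking:
    print(f"{name}: {w:+.1f}")
```

A ranking like this shows which inputs the fitted model leans on most; it describes association in the training data rather than cause and effect.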
Machine Learning Model | Historical Predictions
Keep in mind that the data used for this project are rough estimates drawn from multiple sources; the data is not perfect…
Model Performance:
- Accuracy: The model correctly predicted the winning candidate in all six elections from 2000 through 2020.
- Mean Absolute Error (MAE): On average, the model was off by just 5.39 electoral votes per candidate.
- Close Predictions: In several cases, such as Barack Obama (2012) and George W. Bush (2004), the predicted electoral votes were within 2–4 votes of the actual results.
- Consistent Success: The model has shown the ability to correctly forecast both Democratic and Republican victories, reinforcing its versatility across different election cycles.
This strong track record, combined with a low MAE, gives me confidence in this model for future use.
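For readers unfamiliar with the two metrics above, here is a minimal sketch of how they can be computed. The actual electoral votes are the historical results for 2004 and 2012, but the predicted values are hypothetical placeholders that I made up for the example. They are not the model's real outputs, and the function names are my own.

```python
# Actual electoral votes are historical results; the "predicted" values
# are hypothetical placeholders, NOT the model's real outputs.
records = [
    # (year, candidate, predicted, actual)
    (2004, "George W. Bush", 283, 286),
    (2004, "John Kerry", 255, 251),
    (2012, "Barack Obama", 330, 332),
    (2012, "Mitt Romney", 208, 206),
]

def mean_absolute_error(records):
    """Average of |predicted - actual| across all candidates."""
    errors = [abs(pred - actual) for _, _, pred, actual in records]
    return sum(errors) / len(errors)

def winner_accuracy(records):
    """Share of elections where the predicted leader was the actual winner."""
    by_year = {}
    for year, name, pred, actual in records:
        by_year.setdefault(year, []).append((name, pred, actual))
    correct = 0
    for rows in by_year.values():
        predicted_winner = max(rows, key=lambda r: r[1])[0]
        actual_winner = max(rows, key=lambda r: r[2])[0]
        correct += predicted_winner == actual_winner
    return correct / len(by_year)

mae = mean_absolute_error(records)
accuracy = winner_accuracy(records)
print(f"MAE: {mae:.2f} electoral votes per candidate")
print(f"Winner accuracy: {accuracy:.0%}")
```

Note that MAE is averaged per candidate while accuracy is counted per election, so the two numbers answer different questions: how far off the vote counts were, and whether the right candidate came out ahead.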
Who will be the 2024 President of the United States?
Based on my model’s predictions for the 2024 U.S. presidential election, Donald Trump is projected to win with 285 electoral votes, narrowly defeating Kamala Harris, who is predicted to receive 253 electoral votes.
“This forecast reflects model runs on 9/22/24 and 10/25/24. It might change by election night on 11/5/24, so check back then.”
This forecast points to a tight race, with Trump expected to secure the presidency by a margin of just 32 electoral votes. Both candidates appear to have strong support, but according to the model, Trump is likely to win key swing states or regions that put him over the 270-vote threshold needed for victory. While Harris runs a close race, she is predicted to fall just short, setting the stage for what could be a highly contested and closely watched election.
Conclusion & Feedback
Overall, I thought this project was very interesting. Most importantly, I am curious to see how well this forecast holds up come election time. Any questions or feedback are always welcome, so feel free to post them below. For full details of all steps performed in this project, check out MyGitHub.