Spotify Popularity Prediction

Predicting Song Popularity: Data Science Final Presentation

Watch our group’s full presentation on the drivers behind musical success. We discuss our objectives, the data cleaning process for over 114,000 tracks, and why genre ultimately emerged as the most powerful predictor of popularity. The presentation also walks through our model selection process, in particular why Random Forest outperformed linear and gradient boosting models, and concludes with actionable insights for artists and producers looking to improve their chances of success on streaming platforms.

Model Development & Statistical Exploration Code

Access the complete Python codebase used for this project, including data preprocessing, exploratory data analysis (EDA), and machine learning pipelines. This notebook documents the transition from raw data to a tuned Random Forest Regressor, featuring 5-fold cross-validation and a threshold sweep for classification. Key technical highlights include handling high-cardinality categorical data such as track genres, standardizing numerical features, and generating feature-importance visualizations to interpret the model’s decision-making process.

Project Methodology & Findings

View the slide deck from our EAS 5740 final project. These slides provide a visual walkthrough of the project lifecycle, from a distribution analysis of song popularity, which revealed a significantly right-skewed distribution, to the evaluation results for our five trained models. The deck includes detailed charts on feature importance, a breakdown of the top 10 most influential genres, and our final recommendations for the music industry.
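The deck does not state how the skew was quantified. One common measure is the Fisher-Pearson coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments; a positive value means a longer tail toward high values, i.e. right skew. The pure-Python sketch below computes it; the function name `sample_skewness` is hypothetical, and this uncorrected (population-moment) form is an assumption rather than the project's documented method.

```python
from statistics import fmean


def sample_skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using population moments (no bias correction)."""
    xs = [float(x) for x in values]
    mu = fmean(xs)
    m2 = fmean([(x - mu) ** 2 for x in xs])  # second central moment (variance)
    m3 = fmean([(x - mu) ** 3 for x in xs])  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric data
    print(sample_skewness([0, 0, 0, 1, 10]))  # long right tail
```

Applied to a popularity column, a clearly positive result would be consistent with the right skew the slides describe.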