What makes a song popular? Can we predict a song’s popularity based only on its acoustic properties (e.g., how fast it is, how loud it is)? To answer these questions, we’ll analyze a sample of 10,000 songs streamed on Spotify:
These data are a mere sample of the more than 100,000 Spotify songs posted on Kaggle.
Along with the popularity of each song, the music data set contains lots of acoustic variables. To learn more about the acoustic variables, check out the Spotify API page.
library(ggplot2) # for plots
library(GGally) # for pairs plots
library(ggridges) # for joy plots
library(dplyr) # for wrangling
library(caret) # for machine learning algorithms
music <- read.csv("https://www.macalester.edu/~ajohns24/data/spotify_18.csv")
Let’s look at the 6 most and the 6 least popular songs in the data set.
music %>%
arrange(desc(popularity)) %>%
head(6)
## artist_name track_id
## 1 Ariana Grande 2rPE9A1vEgShuZxxzR2tZH
## 2 Halsey 5p7ujcrUXASCNwRaWNHR1C
## 3 XXXTENTACION 3ee8Jmje8o58CHK66QrVC2
## 4 6ix9ine 2E124GmJRnBJuXbTb4cPUB
## 5 Little Mix 6rrTr2HEAzlpC4KWZxF3S1
## 6 Rita Ora 6xtcFXSo8H9BZN637BMVKS
## track_name acousticness danceability
## 1 thank u, next 0.280 0.724
## 2 Without Me 0.297 0.752
## 3 SAD! 0.258 0.740
## 4 FEFE (feat. Nicki Minaj & Murda Beatz) 0.088 0.931
## 5 Woman Like Me (feat. Nicki Minaj) 0.173 0.757
## 6 Let You Love Me 0.288 0.531
## duration_ms energy instrumentalness liveness loudness mode speechiness
## 1 207333 0.647 0.00e+00 0.1020 -5.642 1 0.0658
## 2 201661 0.488 9.11e-06 0.0936 -7.050 1 0.0705
## 3 166606 0.613 3.72e-03 0.1230 -4.880 1 0.1450
## 4 179405 0.387 0.00e+00 0.1360 -9.127 1 0.4120
## 5 228207 0.849 0.00e+00 0.0878 -3.424 0 0.0536
## 6 190000 0.854 0.00e+00 0.0773 -3.149 1 0.1980
## tempo valence popularity time_signature_4
## 1 106.960 0.435 100 TRUE
## 2 136.041 0.533 95 TRUE
## 3 75.023 0.473 92 TRUE
## 4 125.978 0.376 90 TRUE
## 5 150.036 0.826 90 TRUE
## 6 94.419 0.456 90 TRUE
music %>%
arrange(popularity) %>%
head(6)
## artist_name track_id track_name acousticness
## 1 7j2music 4JZkJVBZH2PE0typUqzN9F Welcome to Loud City 0.26200
## 2 7j2music 59JeX6N0kfnzKi016LC9LW So Easy 0.11600
## 3 7liwa 40kCuwvYw5SCX7njHnKfhe Nari 0.30600
## 4 Ace Amin 71k1FxiK1PmzCsRd8bV7Nk Punctuation 0.02950
## 5 Adam Jones 4HfJTAR2lu5k15W6DOfe7t Forever 0.00137
## 6 Alisa Turner 5j6CvdHg6ddueHPYqQwMEa Only My Jesus 0.49900
## danceability duration_ms energy instrumentalness liveness loudness mode
## 1 0.825 230191 0.510 0.000651 0.0575 -5.838 1
## 2 0.843 191060 0.217 0.010900 0.0854 -10.062 0
## 3 0.798 202710 0.565 0.000000 0.0641 -9.814 0
## 4 0.708 311458 0.678 0.000000 0.0974 -7.459 0
## 5 0.726 183432 0.307 0.476000 0.1190 -12.301 1
## 6 0.450 241347 0.465 0.000000 0.0788 -9.419 1
## speechiness tempo valence popularity time_signature_4
## 1 0.0557 117.074 0.3180 1 TRUE
## 2 0.2000 93.031 0.0396 1 TRUE
## 3 0.3590 148.953 0.6120 1 TRUE
## 4 0.2110 119.835 0.1860 1 TRUE
## 5 0.1620 130.964 0.3230 1 TRUE
## 6 0.0359 152.059 0.3020 1 FALSE
Joy Division is one of the only bands, if not the only band, to have a data visualization technique named after it. The “joy plot” is inspired by the band’s iconic album cover.
In honor of their album cover, I construct a joy plot for the popularity of a handful of artists who are among those with the most songs in the data set.
# Filter to the four artists of interest
classical <- music %>%
filter(artist_name %in% c("Johann Sebastian Bach", "Wolfgang Amadeus Mozart", "Waka Flocka Flame", "Eagles"))
ggplot(classical, aes(x = popularity, y = artist_name)) +
geom_density_ridges() +
theme_ridges()
## Picking joint bandwidth of 2.63
Here we create a new data set music_sub with the following features: it re-defines time_signature_4 and mode as factor variables, and it removes 3 variables (artist_name, track_id, and track_name) which should not be used in our predictive model of popularity.
music$time_signature_4 <- as.factor(music$time_signature_4)
music$mode <- as.factor(music$mode)
music_sub <- music %>%
  select(acousticness, danceability, duration_ms, energy, instrumentalness,
         liveness, loudness, mode, speechiness, tempo, valence, popularity,
         time_signature_4) %>%
  droplevels()
Our goal in this section is to build a predictive model of song popularity using the available set of predictors in music_sub. To this end, we’ll perform least squares, backward stepwise selection, and LASSO.
Here we construct a least squares regression model of popularity by all predictors in music_sub.
# Set the seed
set.seed(253)
# Run the least squares regression model
ls_model <- train(
popularity ~ .,
data = music_sub,
method = "lm",
trControl = trainControl(method = "cv", number = 10),
na.action = na.omit
)
# Summarize the model
summary(ls_model)
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.711 -11.872 -1.863 9.923 67.537
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.286e+01 1.615e+00 20.343 < 2e-16 ***
## acousticness 1.269e+00 7.036e-01 1.804 0.071312 .
## danceability 1.092e+01 1.133e+00 9.637 < 2e-16 ***
## duration_ms -3.649e-06 1.299e-06 -2.809 0.004984 **
## energy -4.214e+00 1.219e+00 -3.458 0.000547 ***
## instrumentalness -3.532e+00 5.491e-01 -6.433 1.31e-10 ***
## liveness -2.406e+00 9.951e-01 -2.418 0.015621 *
## loudness 6.017e-01 4.592e-02 13.105 < 2e-16 ***
## mode1 -5.080e-01 3.274e-01 -1.551 0.120816
## speechiness -5.813e+00 1.453e+00 -4.002 6.32e-05 ***
## tempo 5.604e-03 5.497e-03 1.019 0.308042
## valence -5.054e+00 7.394e-01 -6.835 8.69e-12 ***
## time_signature_4TRUE 9.606e-01 5.091e-01 1.887 0.059199 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.87 on 9987 degrees of freedom
## Multiple R-squared: 0.0797, Adjusted R-squared: 0.07859
## F-statistic: 72.07 on 12 and 9987 DF, p-value: < 2.2e-16
# Calculate the CV MAE (2 approaches)
ls_model$resample %>%
summarize(mean(MAE))
## mean(MAE)
## 1 12.7675
The 10-fold CV MAE is about 12.77, meaning that, on average, our model’s predictions of popularity (measured on a 0–100 scale) are off by roughly 12.77 points.
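The code comment above mentions two approaches to calculating the CV MAE; a second way to pull the same cross-validated metric (a sketch, assuming caret’s standard results table) is:

```r
# Approach 2: caret stores fold-averaged metrics in the results table
ls_model$results %>%
  select(MAE, MAESD)
```

This reports the same CV MAE along with its standard deviation across folds.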
set.seed(253)
backstep_model <- train(
popularity ~ .,
data = music_sub,
method = "leapBackward",
tuneGrid = data.frame(nvmax = 1:13),
trControl = trainControl(method = "cv", number = 10, selectionFunction = "best"),
metric = "MAE",
na.action = na.omit
)
summary(backstep_model)
## Subset selection object
## 12 Variables (and intercept)
## Forced in Forced out
## acousticness FALSE FALSE
## danceability FALSE FALSE
## duration_ms FALSE FALSE
## energy FALSE FALSE
## instrumentalness FALSE FALSE
## liveness FALSE FALSE
## loudness FALSE FALSE
## mode1 FALSE FALSE
## speechiness FALSE FALSE
## tempo FALSE FALSE
## valence FALSE FALSE
## time_signature_4TRUE FALSE FALSE
## 1 subsets of each size up to 12
## Selection Algorithm: backward
## acousticness danceability duration_ms energy instrumentalness
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " "*" " " " " " "
## 3 ( 1 ) " " "*" " " "*" " "
## 4 ( 1 ) " " "*" " " "*" " "
## 5 ( 1 ) " " "*" " " "*" "*"
## 6 ( 1 ) " " "*" " " "*" "*"
## 7 ( 1 ) " " "*" "*" "*" "*"
## 8 ( 1 ) " " "*" "*" "*" "*"
## 9 ( 1 ) " " "*" "*" "*" "*"
## 10 ( 1 ) "*" "*" "*" "*" "*"
## 11 ( 1 ) "*" "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" "*" "*" "*"
## liveness loudness mode1 speechiness tempo valence
## 1 ( 1 ) " " "*" " " " " " " " "
## 2 ( 1 ) " " "*" " " " " " " " "
## 3 ( 1 ) " " "*" " " " " " " " "
## 4 ( 1 ) " " "*" " " " " " " "*"
## 5 ( 1 ) " " "*" " " " " " " "*"
## 6 ( 1 ) " " "*" " " "*" " " "*"
## 7 ( 1 ) " " "*" " " "*" " " "*"
## 8 ( 1 ) "*" "*" " " "*" " " "*"
## 9 ( 1 ) "*" "*" " " "*" " " "*"
## 10 ( 1 ) "*" "*" " " "*" " " "*"
## 11 ( 1 ) "*" "*" "*" "*" " " "*"
## 12 ( 1 ) "*" "*" "*" "*" "*" "*"
## time_signature_4TRUE
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) "*"
## 10 ( 1 ) "*"
## 11 ( 1 ) "*"
## 12 ( 1 ) "*"
The predictor in the 1-predictor model is loudness. Looking at the summary of our model, we see that loudness is included in every subset, indicating that it is the single strongest predictor of popularity.
plot(backstep_model)
Looking at the plot, I see that the MAE decreases significantly with each additional predictor up until the model has 5 predictors. After that, the MAE is roughly flat.
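If we preferred the simpler 5-predictor model suggested by the plot, we could inspect its coefficients directly (a sketch using the `id` argument of `coef()` on the leaps object, just as we do for the full model below):

```r
# Coefficients of the 5-predictor model from the backward stepwise search
coef(backstep_model$finalModel, id = 5)
```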
backstep_model$bestTune$nvmax
## [1] 12
coef(backstep_model$finalModel, id = backstep_model$bestTune$nvmax)
## (Intercept) acousticness danceability
## 3.285533e+01 1.269143e+00 1.091794e+01
## duration_ms energy instrumentalness
## -3.649070e-06 -4.214388e+00 -3.532106e+00
## liveness loudness mode1
## -2.406279e+00 6.017404e-01 -5.079746e-01
## speechiness tempo valence
## -5.813490e+00 5.603849e-03 -5.053561e+00
## time_signature_4TRUE
## 9.606084e-01
backstep_model$results
## nvmax RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 1 16.06074 0.05606267 12.91315 0.3204538 0.01010604 0.2098705
## 2 2 16.00292 0.06313439 12.87448 0.3432967 0.01242319 0.2341260
## 3 3 15.95057 0.06914477 12.82616 0.3542820 0.01289069 0.2469815
## 4 4 15.93250 0.07153704 12.81453 0.3642571 0.01542394 0.2575261
## 5 5 15.89390 0.07599532 12.77347 0.3620911 0.01606566 0.2490064
## 6 6 15.88231 0.07734374 12.76765 0.3705848 0.01668448 0.2598045
## 7 7 15.88142 0.07740947 12.77148 0.3724037 0.01690682 0.2653889
## 8 8 15.88000 0.07749397 12.77376 0.3710002 0.01595113 0.2639313
## 9 9 15.88028 0.07749334 12.77298 0.3699003 0.01638701 0.2633252
## 10 10 15.87793 0.07776160 12.77353 0.3713698 0.01644213 0.2685246
## 11 11 15.87263 0.07838279 12.76940 0.3716256 0.01668481 0.2693489
## 12 12 15.87059 0.07861446 12.76750 0.3740550 0.01669409 0.2705485
## 13 13 15.87059 0.07861446 12.76750 0.3740550 0.01669409 0.2705485
The 10-fold CV MAE for the “best” model (nvmax = 12) is 12.7675.
lambda_grid <- 10^seq(-3, 0.5, length = 100)
lasso_model <- train(
popularity ~ .,
data = music_sub,
method = "glmnet",
tuneGrid = data.frame(alpha = 1, lambda = lambda_grid),
trControl = trainControl(method = "cv", number = 10, selectionFunction = "oneSE"),
metric = "MAE",
na.action = na.omit
)
plot(lasso_model$finalModel, xvar = "lambda", label = TRUE, col = rainbow(15))
coef(lasso_model$finalModel, 1)
## 13 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 28.6239105
## acousticness .
## danceability 4.8684752
## duration_ms .
## energy .
## instrumentalness -1.4110102
## liveness .
## loudness 0.3544421
## mode1 .
## speechiness .
## tempo .
## valence .
## time_signature_4TRUE .
coef(lasso_model$finalModel, 0.2)
## 13 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 3.124019e+01
## acousticness 2.087322e-01
## danceability 9.455856e+00
## duration_ms -1.805861e-06
## energy -2.175641e+00
## instrumentalness -3.091832e+00
## liveness -2.060220e+00
## loudness 4.965490e-01
## mode1 -4.025813e-02
## speechiness -3.695737e+00
## tempo .
## valence -3.560391e+00
## time_signature_4TRUE 4.624382e-01
If lambda equals 1, then 3 variables remain in the model: danceability, instrumentalness, and loudness.
One of the most persistent variables (among the last to be shrunk to zero) is loudness.
One of the least persistent variables (among the first to be shrunk to zero) is tempo.
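To see how the penalty shrinks the model, we can count the nonzero coefficients at a few lambda values (a sketch; the exact counts depend on the fitted glmnet path):

```r
# Number of nonzero coefficients (excluding the intercept) at several lambdas
sapply(c(0.2, 0.5, 1), function(l) {
  beta <- as.matrix(coef(lasso_model$finalModel, s = l))
  sum(beta[-1, 1] != 0)
})
```

Larger lambdas zero out more coefficients, matching the coefficient printouts above.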
plot(lasso_model)
plot(lasso_model, xlim = c(0,0.6), ylim = c(12.75,12.8))
lasso_model$bestTune
## alpha lambda
## 78 1 0.5274997
# Coefficients near the best tuned lambda
coef(lasso_model$finalModel, 0.5094)
## 13 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 28.6745152
## acousticness .
## danceability 7.2176889
## duration_ms .
## energy .
## instrumentalness -2.3463701
## liveness -1.0493408
## loudness 0.3956689
## mode1 .
## speechiness -0.3998064
## tempo .
## valence -1.3125838
## time_signature_4TRUE .
The remaining variables and their coefficients are:
danceability: 7.2177; instrumentalness: -2.3464; liveness: -1.0493; loudness: 0.3957; speechiness: -0.3998; valence: -1.3126
lasso_model$resample %>%
summarize(mean(MAE))
## mean(MAE)
## 1 12.82685
LASSO model MAE: 12.83. Backward stepwise model MAE: 12.77. Least squares model MAE: 12.77.
Based on the average MAEs, I’d say that the backward stepwise model is the best: its CV MAE is (marginally) the lowest, and the MAE plot above suggests that a model with as few as 5 predictors performs nearly as well as the full 12-predictor model.
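The model comparison can be assembled into one table (a sketch, pulling each model’s CV MAE from the objects fit earlier; the backstep entry indexes the results row matching the selected nvmax):

```r
# Side-by-side CV MAE for the three models
data.frame(
  model = c("least squares", "backward stepwise", "LASSO"),
  cv_mae = c(
    mean(ls_model$resample$MAE),
    backstep_model$results$MAE[backstep_model$bestTune$nvmax],
    mean(lasso_model$resample$MAE)
  )
)
```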
lasso_model$results %>%
filter(lambda == lasso_model$bestTune$lambda)
## alpha lambda RMSE Rsquared MAE RMSESD RsquaredSD
## 1 1 0.5274997 15.95739 0.07070021 12.82685 0.3003048 0.01467597
## MAESD
## 1 0.1904266
result_df <- data.frame(resid = resid(lasso_model), fitted = fitted(lasso_model))
ggplot(result_df, aes(x = fitted, y = resid)) +
geom_point() +
geom_hline(yintercept = 0)
Is it right? Based on the residual plot, the LASSO model doesn’t seem to be right: the residuals aren’t random and show clear patterns.
Is the model strong? The R-squared is weak at roughly 0.07, so I would say the model is weak.
Does the model produce accurate predictions? The CV MAE is 12.83, which is relatively large compared to the popularity levels we are predicting, so the model does not produce accurate predictions.
Although our model isn’t great, I would recommend that music executives focus on increasing danceability: it is statistically significant and has the largest coefficient, suggesting a sizeable association with popularity.
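To illustrate, we could predict the popularity of a hypothetical high-danceability song (every feature value below is a made-up assumption, not a row from the data; factor levels must match those in music_sub):

```r
# A hypothetical song with high danceability
new_song <- data.frame(
  acousticness = 0.2, danceability = 0.95, duration_ms = 200000,
  energy = 0.6, instrumentalness = 0, liveness = 0.1, loudness = -5,
  mode = factor(1, levels = c(0, 1)), speechiness = 0.07, tempo = 120,
  valence = 0.5, time_signature_4 = factor(TRUE, levels = c(FALSE, TRUE))
)
predict(lasso_model, newdata = new_song)
```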