I’ve seen 677 movies. The only reason I know that number is because I have a dataset with all of those movies. This dataset gives me power that no one else has. It’s time again to call upon that power, this time to optimize my movie selection process so I stop wasting my time watching shitty movies like Jack and Jill.
For every movie in this master list I’ve indicated yes or no whether it was worth watching. So, that’s what I’m going to try to predict.
Now. To assess this predictor I’ll need a couple of things:
1 – A baseline
2 – Metrics to understand the results
3 – A threshold at which pushing these keys becomes worth my time
The baseline is pretty simple because apparently about 75% of all movies I’ve seen were worth my time. So, just assuming I’d enjoy every movie, we’d be right about 75% of the time. Now this isn’t quite right because I don’t pick movies at random to watch; I watch the ones I think I’ll enjoy… but we’re just gonna… ignore that…
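The majority-class baseline is just counting. A minimal sketch, assuming a yes/no label per movie (the counts below are made up to mirror my roughly 75/25 split, not my exact dataset):

```python
# Majority-class baseline: always predict "worth watching."
# Illustrative counts that mirror a ~75/25 split.
labels = ["yes"] * 75 + ["no"] * 25

# Accuracy of predicting "yes" for every single movie.
baseline_accuracy = labels.count("yes") / len(labels)
print(baseline_accuracy)  # 0.75
```

Any model that can’t beat this number is doing worse than never thinking at all.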
The metrics I’ll use for evaluation are accuracy and precision. I’m choosing precision over recall because, I mean, this is all about not wasting my time watching shitty movies. Which is to say that I want to minimize the number of false positives, i.e. movies the model predicts I’ll enjoy but I won’t.
My threshold is pretty much anything better than the baseline. Even just a couple of points. I mean… the baseline is already pretty high and I feel like as I see more movies I’ll kinda be building this model directly into my brain and just get better at rejecting movies I won’t like… At least when we’re talking about the data I currently have on these films. And I’ll talk about that more later.
So. I built a couple of these predictors or models. I started with logistic regression (because my assignment called for at least one linear model) and it ended up about on par with or maybe even slightly worse than the baseline with a validation accuracy score of 74%. That’s no good.
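For flavor, here’s roughly what that logistic regression setup looks like. This is a sketch assuming sklearn; the feature matrix here is pure random noise just so the snippet runs, whereas my real features were things like IMDb rating, year, runtime, genres, and directors:

```python
# Sketch of a logistic regression baseline model, assuming sklearn.
# X and y are synthetic stand-ins, not my actual movie data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(677, 10))   # stand-in feature matrix
y = rng.random(677) < 0.75       # ~75% "worth watching," like my data

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# On pure noise this should land near the 0.75 majority baseline,
# which is more or less what happened with my real data too.
print(model.score(X_val, y_val))
```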
My random forest classifier model did better, though, with 76% accuracy on my validation set and 81% on my final test set. But I think the precision will be even more telling. Here’s a confusion matrix!
This is a confusion matrix! And it’s telling me that my precision is pretty bad. When my model is wrong, it’s usually giving false positives, which is very, very bad news. This means that while my model rarely misses movies I would enjoy (i.e. it has high recall, which I don’t care about), it’s also more likely to predict I’ll enjoy a movie that I won’t. Damn it.
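Reading precision and recall off a confusion matrix is just arithmetic on the four counts. A quick sketch with made-up numbers that show the same high-recall, mediocre-precision pattern:

```python
# Precision and recall straight from confusion-matrix counts.
# These counts are illustrative; my real matrix isn't reproduced here.
tp, fp = 60, 20   # predicted "worth it": right vs wrong
fn, tn = 5, 15    # predicted "skip it": wrong vs right

precision = tp / (tp + fp)  # of the movies it told me to watch, how many were good
recall = tp / (tp + fn)     # of the good movies, how many it caught

print(precision)  # 0.75
print(recall)     # ~0.923
```

High recall, so-so precision: it almost never tells me to skip a good movie, but it keeps green-lighting bad ones.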
Hmm. Okay. Well. What did we learn here?
Oh, I did learn that just because I think something is important doesn’t mean it is. For example, I spent a huge amount of time ensuring all the writers and all the directors of every movie were part of the model’s training process. But a model trained with no writers or directors performed almost as well… it was like 71%. Ultimately I chose to keep the directors and toss the writers. The writers were a no-go because there were too many of them in the first place and a bunch appeared only once.
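The writers problem is easy to see with a frequency count: if a category shows up once, its one-hot column can’t generalize to anything. A sketch with made-up names (this is the idea, not my actual filtering code):

```python
# Why the writers had to go: most appear exactly once, so their
# one-hot columns are useless. Names here are made up.
from collections import Counter

writers_per_movie = [
    ["Nora Ephron"], ["Charlie Kaufman"], ["Nora Ephron"],
    ["Somebody Obscure"], ["Somebody Else Obscure"],
]
counts = Counter(w for movie in writers_per_movie for w in movie)

# Keep only writers seen at least twice; everyone else gets dropped.
keep = {w for w, n in counts.items() if n >= 2}
print(keep)  # {'Nora Ephron'}
```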
Now before you go and think you could have built a better model, slow down and let me add some perspective. Most of the information I’ve been training with isn’t very useful. 🙃
The most important features are IMDb rating, release year, and runtime- which are all super funky in their own stupid ways. Then the rest of the nearly 500 other features (quite imbalanced, I know) are just genres and directors… I tried various combinations of these features but it never really got me anywhere. Perhaps with features like novelty, timing, theme, originality, aesthetic, or fricking special effects quality, I might have built a better model. Oh well.
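For the curious: ranking features like this is one line if you’re using a random forest, via sklearn’s `feature_importances_`. The data below is random noise so the printed ranking is meaningless; on my real data, IMDb rating, year, and runtime came out on top:

```python
# Sketch of checking feature importances, assuming sklearn's
# RandomForestClassifier. Synthetic data; illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["imdb_rating", "year", "runtime", "genre_comedy", "dir_spielberg"]
X = rng.normal(size=(200, len(feature_names)))
y = rng.random(200) < 0.75

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; higher means the forest leaned on it more.
for name, imp in sorted(zip(feature_names, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```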
Ultimately this was a massive waste of time because the model isn’t useful and the baseline was high enough. Thank you and goodbye. Code here