I’m part of a small group of people who keep a spreadsheet of every movie they’ve seen, with a score out of 100 for each. Actually, it’s entirely possible I’m the only person with such a spreadsheet… Armed with it and a few skills I’ve unlocked as a data science student, I’m going to find the review source that most closely matches my own personal take on movies. Specifically, I’ll be looking at IMDb, Metacritic, Metacritic User Reviews, and Rotten Tomatoes.
To put a number on how closely each review source tracks my own scores, I’ll need to calculate the correlation coefficient.
The correlation coefficient is a number between -1 and 1.
-1 indicates a perfect negative correlation.
0 indicates no linear relationship whatsoever.
1 indicates perfect positive correlation.
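To make those three landmarks concrete, here is a minimal sketch of the calculation in plain Python. It assumes the coefficient in question is the Pearson correlation coefficient (the usual default), and the function name `pearson_r` and the toy scores are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each score from its own mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_deviation = sum(a * b for a, b in zip(dx, dy))
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return co_deviation / sqrt(sum_sq_x * sum_sq_y)


# Made-up toy scores, not real ratings
my_scores = [10, 20, 30, 40]
agrees_with_me = [15, 25, 35, 45]     # rises exactly when my scores rise
disagrees_with_me = [90, 80, 70, 60]  # falls exactly when my scores rise

perfect_pos = pearson_r(my_scores, agrees_with_me)
perfect_neg = pearson_r(my_scores, disagrees_with_me)
```

With these toy lists, `perfect_pos` comes out at 1 and `perfect_neg` at -1. Real ratings land somewhere in between.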
Here’s an image I saw during my lectures and later found again on Wikipedia that explains correlation visually:
To get my data to look like this, all I have to do is locate, import, clean, trim, align, and standardize all my datasets. Then I just plot each review source against my personal ratings.
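The align and standardize steps could look roughly like the sketch below. It assumes each dataset has already been loaded into a dictionary of title to score, that matching on a normalized title is good enough, and that "standardize" means putting everything on the same 0 to 100 scale. The helper names and the placeholder titles and scores are hypothetical, not my actual data or pipeline.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so small formatting differences still match."""
    return " ".join(title.lower().split())


def align_scores(mine, source, source_max=100):
    """Pair up scores for titles found in both dicts, rescaling the source to 0-100.

    mine:       {title: my score out of 100}
    source:     {title: score on the source's own scale, or None if missing}
    source_max: top of the source's scale (for example 10 for a 0-10 source)
    """
    source_by_key = {
        normalize_title(title): score
        for title, score in source.items()
        if score is not None  # drop movies the source has no score for
    }
    my_scores, source_scores = [], []
    for title, score in mine.items():
        key = normalize_title(title)
        if key in source_by_key:
            my_scores.append(score)
            source_scores.append(source_by_key[key] * 100 / source_max)
    return my_scores, source_scores


# Placeholder titles and scores, made up for illustration
mine = {"Movie A": 80, "Movie B": 55, "Movie C": 90}
ten_point_source = {"movie a ": 7.5, "MOVIE B": 6.0, "Movie D": 8.1}

my_scores, source_scores = align_scores(mine, ten_point_source, source_max=10)
```

Here only "Movie A" and "Movie B" survive the alignment, because they are the only titles in both datasets. Rescaling does not change the correlation coefficient itself, but it does put every source on the same axis for plotting.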
I found the plots a little hard to interpret by eye. So, here are the actual correlation coefficients for each review source:
Here’s what I’m learning:
IMDb has the highest correlation.
But it doesn’t lead by much, and the correlation itself isn’t very high either…
Another question popped up while I was working on this: what do the distributions of these reviews look like?
So here they are:
One thing we can learn from this is that Rotten Tomatoes, taken as “% Fresh”, has a far greater spread than the other sources that score out of 100. We’re also seeing my tendency to land on multiples of 10.
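Both observations can be put into numbers as well. The sketch below assumes spread is measured by the standard deviation and uses made-up score lists rather than my real data. The `describe` helper is hypothetical.

```python
from statistics import mean, pstdev


def describe(scores):
    """Summarize a list of 0-100 scores: centre, spread, and how often they hit a multiple of 10."""
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),  # population standard deviation
        "share_multiples_of_10": sum(1 for s in scores if s % 10 == 0) / len(scores),
    }


# Made-up scores for illustration only
narrow_source = [62, 70, 74, 68, 71, 66]  # bunched around the middle
wide_source = [5, 40, 95, 100, 20, 88]    # uses nearly the whole scale
personal = [70, 80, 85, 60, 90, 70]       # mostly round numbers

narrow_summary = describe(narrow_source)
wide_summary = describe(wide_source)
personal_summary = describe(personal)
```

A source that uses the whole scale gets a larger `spread`, and a rater who likes round numbers gets a `share_multiples_of_10` close to 1.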
This was a lot of fun to explore and I hope to use this dataset for even more interesting topics in the future. Thanks for reading.
Oh, and according to my personal scores, the worst movie I’ve ever seen is Grown Ups 2 and the best is Toy Story 2 😊