I’m part of a small group of people who keep a spreadsheet of every movie they’ve seen, with a score out of 100 for each. And actually it’s totally possible I’m the only person with such a spreadsheet… Armed with this, as well as a few skills I’ve unlocked as a data science student, I’m going to find the review source that is *most closely* related to my own personal take on movies. Specifically, I’ll be looking at IMDb, Metacritic, Metacritic User Reviews, and Rotten Tomatoes.

To put a number on how closely each review source tracks my own ratings, I’ll need to calculate the *correlation coefficient*.

The *correlation coefficient* is a number between -1 and 1.

-1 indicates a perfect negative correlation.

0 indicates no correlation whatsoever.

1 indicates perfect positive correlation.
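For anyone who wants to see the number being computed, here’s a minimal sketch of the Pearson correlation coefficient in plain Python. I’m assuming Pearson is the variant in play here (it’s the usual default), and the function name `pearson_r` and the sample scores are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up scores, for illustration only (not my real ratings)
my_scores = [90, 70, 40, 85, 60]
site_scores = [80, 75, 50, 90, 55]
r = pearson_r(my_scores, site_scores)
print(round(r, 3))
```

Two lists that rise and fall together in a straight line give 1, and two that move in exactly opposite directions give -1.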

Here’s an image I saw during my lectures and later found again on Wikipedia that explains correlation visually:

To get my data to look like this, all I have to do is locate, import, clean, trim, align, and standardize all my datasets. Then I just plot each review source against my personal ratings.
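The “align and standardize” part is the one that’s easiest to show in a few lines. This is only a sketch under assumptions: the helper names (`normalize_title`, `to_hundred`, `align`) are hypothetical, the titles and scores are placeholders, and I’m assuming each dataset has already been loaded into a dictionary of title to score. It keeps only the movies that appear in both datasets and rescales the other source to a 0–100 scale (IMDb ratings, for example, are out of 10).

```python
def normalize_title(title):
    """Make titles comparable across sources (case and surrounding spaces)."""
    return title.strip().casefold()

def to_hundred(score, scale_max):
    """Rescale a score from a 0..scale_max scale to 0..100."""
    return score * 100 / scale_max

def align(mine, theirs, scale_max):
    """Pair up scores for titles present in both datasets.

    mine:   {title: score out of 100}
    theirs: {title: score out of scale_max}
    Returns two equal-length lists, ordered by normalized title.
    """
    mine_n = {normalize_title(t): s for t, s in mine.items()}
    theirs_n = {normalize_title(t): s for t, s in theirs.items()}
    shared = sorted(mine_n.keys() & theirs_n.keys())  # titles in both
    return ([mine_n[t] for t in shared],
            [to_hundred(theirs_n[t], scale_max) for t in shared])

# Placeholder data, for illustration only
my_ratings = {"Movie A": 90, "Movie B": 40, "Movie C": 70}
imdb_like = {"movie a ": 8.1, "Movie C": 6.5, "Movie D": 7.0}
mine, theirs = align(my_ratings, imdb_like, scale_max=10)
```

Worth noting: rescaling a source by a constant factor doesn’t change its Pearson correlation with my scores, so standardizing matters for the plots and the distributions rather than for the coefficient itself.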

I felt this data was a little difficult to interpret visually like this. So, here are the actual correlation coefficients for each review source:

Here’s what I’m learning:

IMDb has the highest correlation.

But not only is it not by much, it’s not very high either…

I had another curiosity pop up while I was working on this. I wondered what the distributions of these reviews looked like.

So here they are:

Something we can learn from this is that Rotten Tomatoes taken as “% Fresh” has a far greater spread than the other sources with scores out of 100. We’re also seeing my tendency to hop on multiples of 10.
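Both of those observations can be put into numbers too. Here’s a rough sketch, assuming each source is just a list of scores out of 100; the function names `spread` and `share_on_tens` and the sample lists are made up for illustration, and standard deviation is only one of several reasonable ways to measure spread.

```python
import statistics

def spread(scores):
    """Population standard deviation of a list of scores."""
    return statistics.pstdev(scores)

def share_on_tens(scores):
    """Fraction of scores that land exactly on a multiple of 10."""
    return sum(1 for s in scores if s % 10 == 0) / len(scores)

# Made-up scores, for illustration only
tightly_packed = [62, 68, 70, 74, 77]
widely_spread = [12, 45, 70, 93, 98]

print(spread(tightly_packed) < spread(widely_spread))
print(share_on_tens([10, 20, 35, 40]))
```

A source with a larger `spread` value has a wider histogram, and a high `share_on_tens` value would confirm a habit of rounding scores to the nearest 10.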

This was a lot of fun to explore and I hope to use this dataset for even more interesting topics in the future. Thanks for reading.

Oh and according to my personal scores, the worst movie I’ve ever seen is *Grown Ups 2* and the best is *Toy Story 2* 😊