Create Your Own Movie Recommendation System Using Python

Do you wonder how Netflix suggests movies that align so closely with your interests? Or maybe you want to build a system that can make such suggestions to its own users?

If your answer was yes, then you’ve come to the right place as this article will teach you how to build a movie recommendation system by using Python. 

However, before we start discussing the ‘How’ we must be familiar with the ‘What.’

Recommendation System: What is It?

Recommendation systems have become a very integral part of our daily lives. From online retailers like Amazon and Flipkart to social media platforms like YouTube and Facebook, every major digital company uses recommendation systems to provide a personalized user experience to their clients.

Some examples of recommendation systems in your everyday life include:

  • The suggestions you get from Amazon when you buy products are a result of a recommender system.
  • YouTube uses a recommender system to suggest videos suited for your taste.
  • Netflix has a famous recommendation system for suggesting shows and movies according to your interests. 

A recommender system uses data to suggest products to users. This data could be the interests a user has entered, their history, and so on. If you’re studying machine learning and AI, recommender systems are well worth studying, as they are becoming increasingly popular and advanced.

Types of Recommendation Systems

There are two types of recommendation systems:

1. Collaborative Recommendation Systems

A collaborative recommendation system suggests an item based on how much similar users liked it. It groups users with similar interests and tastes and recommends to each user the items that others in the group liked.

For example, suppose you and one other user liked Sholay. Now, after watching Sholay and liking it, the other user liked Golmaal. Because you and the other user have similar interests, the recommender system would suggest you watch Golmaal based on this data. This is collaborative filtering. 
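The Sholay and Golmaal example can be sketched in a few lines of plain Python. This is only a minimal illustration of user-based collaborative filtering, not how a production system works: the user names, the extra third user, and the choice of Jaccard overlap as the similarity measure are all assumptions made for the sketch.

```python
# Toy "likes" data; the users and their likes are made up for illustration.
likes = {
    "you": {"Sholay"},
    "other_user": {"Sholay", "Golmaal"},
    "third_user": {"Titanic"},
}


def jaccard(a, b):
    """Overlap between two sets of liked movies (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes):
    """Suggest movies liked by the most similar user that the target has not liked yet."""
    others = [user for user in likes if user != target]
    nearest = max(others, key=lambda user: jaccard(likes[target], likes[user]))
    return sorted(likes[nearest] - likes[target])


print(recommend("you", likes))
```

Because "other_user" shares Sholay with "you" and "third_user" shares nothing, the nearest neighbour is "other_user", and Golmaal is the only movie they liked that "you" has not.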

2. Content-Based Recommendation Systems

A content-based recommender system suggests items based on the data it receives from a user. This could be explicit data (‘Likes’, ‘Shares’, etc.) or implicit data (watch history). The system uses this data to build a user-specific profile and suggests items whose attributes match that profile.
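To make the idea of a user-specific profile concrete, here is a small sketch in plain Python. The genre tags assigned to each movie and the scoring rule (add up how often each of a movie’s genres appears in the user’s likes) are assumptions chosen for illustration; real systems use much richer item features.

```python
# Hypothetical genre tags per movie (illustrative, not taken from the dataset).
movie_genres = {
    "Sholay": {"Action", "Drama"},
    "Golmaal": {"Comedy"},
    "Deewaar": {"Action", "Drama", "Crime"},
    "Titanic": {"Romance", "Drama"},
}


def build_profile(liked, movie_genres):
    """Count how often each genre appears among the movies the user liked."""
    profile = {}
    for title in liked:
        for genre in movie_genres[title]:
            profile[genre] = profile.get(genre, 0) + 1
    return profile


def score(title, profile, movie_genres):
    """Sum the profile weights of the genres this movie carries."""
    return sum(profile.get(genre, 0) for genre in movie_genres[title])


def recommend(liked, movie_genres, k=2):
    """Return the k unseen movies whose genres best match the user's profile."""
    profile = build_profile(liked, movie_genres)
    candidates = [t for t in movie_genres if t not in liked]
    # Sort by score (highest first), then by title so ties are deterministic.
    return sorted(candidates, key=lambda t: (-score(t, profile, movie_genres), t))[:k]


print(recommend({"Sholay"}, movie_genres))
```

Note that only the one user’s own likes are used here; no other users are involved, which is the key difference from collaborative filtering.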

Building a Basic Movie Recommendation System

Now that we have covered the basics of recommender systems, let’s get started on building a movie recommendation system. 

We can start building a Python-based movie recommendation system by using the full MovieLens dataset. This dataset contains more than 26 million ratings and 750,000 tag applications applied to over 45,000 movies. It also includes tag genome data with more than 12 million relevance scores.

We are using the full dataset to create a basic movie recommendation system. However, you’re free to use a smaller dataset for this project. First, we’ll have to import all the required libraries:

import pandas as pd
import numpy as np
from ast import literal_eval

A basic Python-based movie recommendation system would suggest movies according to their popularity and genre. It works on the notion that popular, critically acclaimed movies have a high probability of being liked by the general audience. Keep in mind that such a system doesn’t give personalized suggestions.

To implement it, we will sort the movies according to their popularity and rating and pass in a genre argument to get a genre’s top movies:


# Load the movie metadata
md = pd.read_csv('../input/movies_metadata.csv')



The first rows of md look like this (selected columns, long values truncated):

   adult  belongs_to_collection                          budget    genres                              video  id     imdb_id    original_title     overview                                  revenue    title
0  False  {'id': 10194, 'name': 'Toy Story Collection'}  30000000  [{'id': 16, 'name': 'Animation'}…   False  862    tt0114709  Toy Story          Led by Woody, Andy's toys live happily…   373554033  Toy Story
1  False  NaN                                            65000000  [{'id': 12, 'name': 'Adventure'}…   False  8844   tt0113497  Jumanji            When siblings Judy and Peter…             262797249  Jumanji
2  False  {'id': 119050, 'name': 'Grumpy Old Men…        0         [{'id': 10749, 'name': 'Romance'}…  False  15602  tt0113228  Grumpy Old Men     A family wedding reignites the ancient…   0          Grumpier Old Men
3  False  NaN                                            16000000  [{'id': 35, 'name': 'Comedy'}…      False  31357  tt0114885  Waiting to Exhale  Cheated on, mistreated and stepped…       81452156   Waiting to Exhale


# Convert the stringified list of genre dicts into a plain list of genre names
md['genres'] = md['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
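The genre-cleaning line above packs several steps into one expression. The sketch below unpacks it on a single value using only the standard library. The sample string is a shortened, illustrative version of what the genres column holds, and parse_genres is a hypothetical helper name, not something defined in the original code.

```python
from ast import literal_eval

# A shortened example of how the genres column is stored: a string, not a list.
raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"


def parse_genres(value):
    """Turn the stored string into a plain list of genre names."""
    # literal_eval safely evaluates the string as a Python literal (here, a list of dicts).
    parsed = literal_eval(value) if isinstance(value, str) else []
    return [g["name"] for g in parsed] if isinstance(parsed, list) else []


print(parse_genres(raw))
print(parse_genres("[]"))
```

The second call shows why missing values are first filled with the string '[]': it parses to an empty list, so movies without genre information simply end up with no genres.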

The Formula for Our Chart

To create our chart of top movies, we use the TMDB ratings together with IMDb’s weighted rating formula, which is as follows:

Weighted Rating (WR) = (v / (v + m)) × R + (m / (v + m)) × C

Here, v stands for the number of votes a movie got, m is the minimum number of votes a movie should have to get on the chart, R stands for the average rating of the movie, and C is the mean vote across the entire dataset.
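As a rough numeric illustration of this weighting, the sketch below evaluates the same expression that the weighted_rating function uses later in this article. The values m = 434 and C = 5.24 are the rounded figures this article reports for its dataset, and the first example uses Inception’s vote count and rating from the chart. The four-argument signature is an assumption made so the sketch can run on its own.

```python
def weighted_rating(v, R, m, C):
    """Blend a movie's own average rating R with the dataset-wide mean C.

    The more votes v a movie has relative to the cutoff m, the closer
    the result stays to R; with few votes it is pulled toward C.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


m, C = 434, 5.24  # cutoff and mean rating as quoted in this article (rounded)

many_votes = weighted_rating(v=14075, R=8, m=m, C=C)  # stays close to R
at_cutoff = weighted_rating(v=434, R=8, m=m, C=C)     # v == m: halfway between R and C

print(round(many_votes, 2), round(at_cutoff, 2))
```

Both movies have the same average rating of 8, yet the one that barely meets the vote cutoff scores far lower. That is the point of the formula: a high average backed by few votes is treated as less trustworthy.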

Building the Charts

Now that we have the dataset and the formula in place, we can start building the chart. We’ll only add a movie to our chart if its vote count is at or above the 95th percentile, that is, if it has more votes than at least 95% of the movies in the dataset. We’ll begin by creating a top 250 chart.


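# Vote counts and (integer-truncated) average ratings for the rows where they exist
vote_counts = md[md['vote_count'].notnull()]['vote_count'].astype('int')
vote_averages = md[md['vote_average'].notnull()]['vote_average'].astype('int')

# C: mean rating across the whole dataset
C = vote_averages.mean()

# m: minimum number of votes needed to qualify (95th percentile of vote counts)
m = vote_counts.quantile(0.95)

# Extract the release year; missing or unparseable dates become NaN
md['year'] = pd.to_datetime(md['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if pd.notnull(x) else np.nan)

# Keep only movies with at least m votes and a known average rating
qualified = md[(md['vote_count'] >= m) & (md['vote_count'].notnull()) & (md['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']]
qualified['vote_count'] = qualified['vote_count'].astype('int')
qualified['vote_average'] = qualified['vote_average'].astype('int')
qualified.shape

(2274, 6)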

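Here, m works out to 434, so a movie must have at least 434 votes to get a place on our chart, and 2,274 movies qualify. The mean rating across all movies, C, is 5.24 on a scale of 10. Note that C is not an entry threshold; it is the value the weighted rating pulls sparsely rated movies toward.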
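To see how a percentile cutoff like this behaves, here is a minimal sketch on toy data. The vote counts 1 to 100 are made up and merely stand in for the real vote_count column. The sketch assumes pandas is installed.

```python
import pandas as pd

# Toy vote counts 1..100 standing in for md['vote_count'] (illustrative data only).
vote_counts = pd.Series(range(1, 101))

# 95th percentile; pandas interpolates linearly between neighbouring values by default.
m = vote_counts.quantile(0.95)

# Only entries at or above the cutoff qualify for the chart.
qualified = vote_counts[vote_counts >= m]

print(m, len(qualified))
```

On these 100 toy values the cutoff falls just above 95, so only the five largest counts qualify, roughly the top 5% as intended.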


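def weighted_rating(x):
    # Blend the movie's own rating R with the dataset mean C, weighted by its vote count v
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

# Score every qualifying movie and keep the 250 with the highest weighted rating
qualified['wr'] = qualified.apply(weighted_rating, axis=1)
qualified = qualified.sort_values('wr', ascending=False).head(250)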

With all of this in place, let’s look at the top 15 movies on the chart:

qualified.head(15)

Top Movies Overall




       title                                              year  vote_count  vote_average  popularity  genres                                           wr
15480  Inception                                          2010  14075       8             29.1081     [Action, Thriller, Science Fiction, Mystery, A…  7.917588
12481  The Dark Knight                                    2008  12269       8             123.167     [Drama, Action, Crime, Thriller]                 7.905871
22879  Interstellar                                       2014  11187       8             32.2135     [Adventure, Drama, Science Fiction]              7.897107
2843   Fight Club                                         1999  9678        8             63.8696     [Drama]                                          7.881753
4863   The Lord of the Rings: The Fellowship of the Ring  2001  8892        8             32.0707     [Adventure, Fantasy, Action]                     7.871787
292    Pulp Fiction                                       1994  8670        8             140.95      [Thriller, Crime]                                7.868660
314    The Shawshank Redemption                           1994  8358        8             51.6454     [Drama, Crime]                                   7.864000
7000   The Lord of the Rings: The Return of the King      2003  8226        8             29.3244     [Adventure, Fantasy, Action]                     7.861927
351    Forrest Gump                                       1994  8147        8             48.3072     [Comedy, Drama, Romance]                         7.860656
5814   The Lord of the Rings: The Two Towers              2002  7641        8             29.4235     [Adventure, Fantasy, Action]                     7.851924
256    Star Wars                                          1977  6778        8             42.1497     [Adventure, Action, Science Fiction]             7.834205
1225   Back to the Future                                 1985  6239        8             25.7785     [Adventure, Comedy, Science Fiction, Family]     7.820813
834    The Godfather                                      1972  6024        8             41.1093     [Drama, Crime]                                   7.814847
1154   The Empire Strikes Back                            1980  5998        8             19.471      [Adventure, Action, Science Fiction]             7.814099
46     Se7en                                              1995  5915        8             18.4574     [Crime, Mystery, Thriller]

Voila, you have created a basic Python-based movie recommendation system!

We will now narrow down our recommender system’s suggestions by genre so they can be more precise. After all, not everyone will like The Godfather equally.

Narrowing Down the Genre

So, now we’ll modify our recommender system to be more genre-specific:


# Give each movie one row per genre so that we can filter on a single genre
s = md.apply(lambda x: pd.Series(x['genres']), axis=1).stack().reset_index(level=1, drop=True)
s.name = 'genre'
gen_md = md.drop('genres', axis=1).join(s)

def build_chart(genre, percentile=0.85):
    # Restrict to one genre, then build the same weighted-rating chart as before
    df = gen_md[gen_md['genre'] == genre]
    vote_counts = df[df['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = df[df['vote_average'].notnull()]['vote_average'].astype('int')
    C = vote_averages.mean()
    m = vote_counts.quantile(percentile)

    qualified = df[(df['vote_count'] >= m) & (df['vote_count'].notnull()) & (df['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity']]
    qualified['vote_count'] = qualified['vote_count'].astype('int')
    qualified['vote_average'] = qualified['vote_average'].astype('int')

    qualified['wr'] = qualified.apply(lambda x: (x['vote_count']/(x['vote_count']+m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1)
    qualified = qualified.sort_values('wr', ascending=False).head(250)

    return qualified

We have now created a recommender system that ranks the movies within a genre and recommends the top ones. Calling build_chart('Romance').head(15) gives the top romance movies. We chose the romance genre because it didn’t show up much in our previous chart.
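The step that gives each movie one row per genre is the least obvious part of the code above, so here it is on a two-movie toy frame. This sketch uses DataFrame.explode, a shorter alternative to the apply/stack approach in the article, so treat it as a swapped-in technique rather than the original method. The two movies and their genres are taken from the first chart; the variable names are assumptions.

```python
import pandas as pd

# Two movies with list-valued genres, as in the cleaned md['genres'] column.
toy = pd.DataFrame({
    "title": ["Forrest Gump", "Fight Club"],
    "genres": [["Comedy", "Drama", "Romance"], ["Drama"]],
})

# One row per (movie, genre) pair; the other columns are repeated as needed.
gen_toy = toy.explode("genres").rename(columns={"genres": "genre"})

# Filtering on a single genre now works with a plain equality test.
romance = gen_toy[gen_toy["genre"] == "Romance"]

print(gen_toy)
print(romance["title"].tolist())
```

Forrest Gump appears three times in gen_toy, once per genre, which is why it can show up in both the Drama and the Romance charts.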

Top Movies in Romance




       title                        year  vote_count  vote_average  popularity  wr
10309  Dilwale Dulhania Le Jayenge  1995  661         9             34.457      8.565285
351    Forrest Gump                 1994  8147        8             48.3072     7.971357
876    Vertigo                      1958  1162        8             18.2082     7.811667
40251  Your Name.                   2016  1030        8             34.461252   7.789489
883    Some Like It Hot             1959  835         8             11.8451     7.745154
1132   Cinema Paradiso              1988  834         8             14.177      7.744878
19901  Paperman                     2012  734         8             7.19863     7.713951
37863  Sing Street                  2016  669         8             10.672862   7.689483
882    The Apartment                1960  498         8             11.9943     7.599317
38718  The Handmaiden               2016  453         8             16.727405   7.566166
3189   City Lights                  1931  444         8             10.8915     7.558867
24886  The Way He Looks             2014  262         8             5.71127     7.331363
45437  In a Heartbeat               2017  146         8             20.82178    7.003959
1639   Titanic                      1997  7770        7             26.8891     6.981546
19731  Silver Linings Playbook      2012  4840        7             14.4881     6.970581

Now you have a movie recommender system that suggests top movies for a chosen genre. We recommend testing it with other genres too, such as Action, Drama, or Thriller. Share the top three movies it suggests in your favourite genre in the comment section below.

Learn More About a Movie Recommendation System 

As you must have noticed by now, building a Python-based movie recommendation system is quite simple. All you need is a little knowledge of data science and a little effort to create a working recommender system.

However, what if you want to build more advanced recommender systems? What if you want to create a recommender system that a large corporation might consider using?

If you’re interested in learning more about recommender systems and data science, then we recommend taking a data science course. With a course, you’ll learn all the fundamental and advanced concepts of data science and machine learning. Moreover, you’ll study from industry experts who will guide you throughout the course to help you avoid doubts and confusion.

Final Thoughts

You now know how to build a movie recommendation system. After you have created the system, be sure to share it with others and show them your progress. Recommender systems have a diverse range of applications so learning about them will surely give you an upper hand in the industry.

What is collaborative filtering and what are its types?

Collaborative filtering is a recommendation approach that builds a model from users’ preferences. The users’ history acts as its dataset. It comes in two types:

1. User-based collaborative filtering: We take a user, say “A”, find other users with similar preferences, and then recommend to “A” the items those users liked that “A” has not encountered yet.
2. Item-based collaborative filtering: Here, instead of finding users with similar preferences, we find movies similar to the ones “A” likes and recommend those that “A” has not watched yet.
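The item-based variant can be sketched by comparing movies through the ratings they received. The ratings below are invented for illustration, and two simplifications are assumed: cosine similarity as the measure, and a rating of 0 for any movie a user has not rated.

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: rating}); values are invented for illustration.
ratings = {
    "A": {"Sholay": 5, "Golmaal": 4},
    "B": {"Sholay": 5, "Golmaal": 5, "Titanic": 1},
    "C": {"Sholay": 1, "Titanic": 5},
}


def item_vector(movie):
    """Ratings for one movie across all users (0 where a user has not rated it)."""
    return [ratings[user].get(movie, 0) for user in sorted(ratings)]


def cosine(x, y):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def most_similar(movie):
    """Find the other movie whose rating pattern is closest to this one."""
    movies = {title for user_ratings in ratings.values() for title in user_ratings}
    others = sorted(movies - {movie})
    return max(others, key=lambda other: cosine(item_vector(movie), item_vector(other)))


print(most_similar("Sholay"))
```

Here Golmaal comes out as most similar to Sholay because the same users rated both highly, so a user who liked Sholay but has not seen Golmaal would be recommended it.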

What are the advantages and disadvantages of content-based filtering?

Content-based filtering collects data from the user and suggests items accordingly. Some of its advantages and disadvantages are mentioned below:
Advantages:
1. Unlike collaborative filtering, the model does not need data about other users with similar preferences, as it works from the primary user’s own data.
2. The model can recommend niche movies that match your preferences even if only a few others have watched them.
Disadvantages:
1. This technique requires a lot of domain knowledge, because the item features are hand-engineered to some extent, so the model can only be as good as those features.
2. Its ability to recommend movies is limited since it only works from the users’ existing interests.

Which popular applications use collaborative filtering algorithms?

The collaborative filtering algorithm is becoming the primary driving algorithm for many popular applications. More and more businesses are focusing on delivering rich personalized content. For example, you have probably seen the message “Customers who bought this also bought” on many e-commerce websites.
The following are some of the applications with a large user base worldwide:
1. YouTube uses this algorithm along with some other powerful algorithms to provide video recommendations on the home page.
2. E-commerce websites such as Amazon, Flipkart, and Myntra also use this algorithm to provide product recommendations.
3. Video streaming platforms are the biggest example here; they use user ratings, average ratings, and related content to provide personalized suggestions.
