Building a Movie Recommendation System With SVD

Ever found yourself lost in the endless sea of movies, struggling to decide what to watch next? Our movie recommendation system is here to solve that problem! By applying Singular Value Decomposition (SVD) to a TF-IDF representation of each movie's text, we built an efficient content-based recommender that suggests movies similar to the ones you already like.

Rahul Saini

Published on November 16, 2024

The Challenge: Making Sense of Massive Movie Data

Streaming platforms and movie databases hold thousands (even millions) of titles, making it overwhelming to manually find something appealing. The key challenge? Handling large datasets efficiently while ensuring recommendations are accurate and relevant.

How We Solved It with SVD

SVD factors a large matrix into a small number of latent components, ordered by how much of the data each one explains. Keeping only the strongest components filters out noise and surfaces hidden patterns in movie-related text (genres, keywords, and overviews), which makes similarity-based recommendations more effective.
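To make the idea concrete, here is a minimal NumPy sketch of truncated SVD on a made-up 4 x 4 movie-by-term matrix (the numbers are illustrative, not taken from the TMDB data). Keeping only the `k` largest singular values gives each movie a short vector, and the discarded singular values tell you exactly how much of the matrix was thrown away.

```python
import numpy as np

# Toy movie-by-term matrix: rows are movies, columns are terms (made-up values).
X = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 4.0, 3.0],
    [0.0, 1.0, 3.0, 4.0],
])

# Full SVD: X = U @ diag(s) @ Vt, with singular values sorted in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                  # keep only the k strongest components
X_reduced = U[:, :k] * s[:k]           # each movie as a k-dimensional vector
X_approx = X_reduced @ Vt[:k, :]       # best rank-k approximation of X

# The Frobenius-norm error equals the energy in the discarded singular values.
error = np.linalg.norm(X - X_approx)
print(X_reduced.shape)                                  # (4, 2)
print(np.isclose(error, np.sqrt(np.sum(s[k:] ** 2))))   # True
```

scikit-learn's `TruncatedSVD.fit_transform` returns the same `U_k * s_k` representation (up to the sign of each component) without computing the full decomposition, which matters once the matrix has hundreds of thousands of columns.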

How Our System Works

  • Processing the Top 10,000 Movies – We narrow the dataset down to 10,000 movies, since a full pairwise similarity matrix over every movie would be far too large to compute and store.
  • TF-IDF Vectorization – Converts text-based data (genres, keywords, and overviews) into a numerical format.
  • Dimensionality Reduction with Truncated SVD – Compresses each movie's TF-IDF vector into 70 latent components while preserving most of the important information.
  • Normalization – Scales each reduced vector to unit length, so cosine similarity becomes a simple dot product.
  • Cosine Similarity – Measures how closely movies are related based on their reduced feature vectors.
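The vectorization, reduction, normalization, and similarity steps map naturally onto scikit-learn. The sketch below is an assumption about how such a pipeline could look, not the project's actual code: the toy corpus stands in for the combined genres, keywords, and overview text, and `n_components=3` replaces the post's 70 because the toy vocabulary is tiny.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical toy corpus: genres + keywords + overview joined into one string per movie.
docs = [
    "action adventure superhero spider web new york city hero",
    "action adventure superhero bat gotham city vigilante hero",
    "romance drama love letter summer wedding",
    "romance comedy love wedding best friend",
    "animation family toys friendship adventure",
]

# 1. TF-IDF: sparse matrix of shape (n_movies, n_terms).
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# 2. Truncated SVD: the post uses 70 components; this toy corpus needs far fewer.
svd = TruncatedSVD(n_components=3, random_state=42)
X_reduced = svd.fit_transform(X)          # dense array of shape (n_movies, 3)

# 3. L2-normalize each row so that a dot product equals cosine similarity.
X_unit = normalize(X_reduced)

# 4. Cosine similarity matrix: similarity[i, j] compares movie i with movie j.
similarity = X_unit @ X_unit.T
print(similarity.round(2))
```

Because every row of `X_unit` has unit length, `X_unit @ X_unit.T` matches `sklearn.metrics.pairwise.cosine_similarity(X_reduced)`. At 10,000 movies the dense similarity matrix holds 100 million floats (about 800 MB in float64), which is one reason to cap the catalogue size.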

The Methodology: From Raw Data to Smart Recommendations

Data Preparation

We started with the TMDB Movie Dataset, which initially had over 1.1 million records. After keeping only released, English-language movies with valid titles, we were left with 596,384 movies.
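A minimal pandas sketch of this filtering step follows. The column names `title`, `original_language`, and `status` are assumptions modelled on common TMDB exports, and the five rows are made up; the post does not show its exact filter.

```python
import pandas as pd

# Made-up rows; the column names are assumed, so adjust them to the real file.
movies = pd.DataFrame({
    "title": ["Spider-Man", None, "Intouchables", "Up", "Future Film"],
    "original_language": ["en", "en", "fr", "en", "en"],
    "status": ["Released", "Released", "Released", "Released", "Planned"],
})

# Keep released, English-language movies that have a non-empty title.
mask = (
    (movies["original_language"] == "en")
    & (movies["status"] == "Released")
    & movies["title"].notna()
    & (movies["title"].str.strip() != "")
)
movies = movies[mask].reset_index(drop=True)
print(movies["title"].tolist())  # ['Spider-Man', 'Up']
```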

Feature Engineering

The magic happens with TF-IDF Vectorization, which weights each word in a movie's text by how often it appears in that movie relative to how common it is across all movies. Distinctive terms therefore count for more than ubiquitous ones, which helps distinguish movies by their key attributes. Using Truncated SVD, we then reduce the feature matrix to 70 dimensions, balancing efficiency and accuracy.
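The weighting can be computed by hand. This plain-Python sketch multiplies raw term counts by the smoothed IDF, ln((1 + n) / (1 + df)) + 1, which is the default formula in scikit-learn's `TfidfVectorizer`; whether the project used exactly these settings is an assumption, and the token lists are invented.

```python
import math
from collections import Counter

# Invented, already-tokenized descriptions (genres + keywords + overview).
docs = [
    ["action", "hero", "spider", "city"],
    ["action", "hero", "bat", "city"],
    ["romance", "love", "city"],
]

n_docs = len(docs)
# Document frequency: in how many movies each term appears at least once.
doc_freq = Counter(term for doc in docs for term in set(doc))


def tfidf(doc):
    """Raw term count times smoothed IDF.

    scikit-learn would additionally L2-normalize the resulting vector by default.
    """
    counts = Counter(doc)
    return {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in counts.items()
    }


weights = tfidf(docs[0])
# "city" appears in every movie, so it gets the lowest weight;
# "spider" appears only in this movie, so it gets the highest.
print(min(weights, key=weights.get), max(weights, key=weights.get))  # city spider
```

A term shared by every movie gets the minimum IDF of 1, while a term unique to one movie gets the largest boost, which is what lets distinctive keywords drive similarity.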

Building the Recommendation Engine

With a cosine similarity matrix, we compute how close two movies are in our reduced feature space. A simple function allows users to search for a movie and receive the top 10 most similar recommendations.
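A lookup over a precomputed similarity matrix might look like the sketch below. `recommend` is a hypothetical name, not the project's `get_recommendations` method, and the 2-D vectors are toy stand-ins for the 70-dimensional SVD features.

```python
import numpy as np


def recommend(title, titles, similarity, top_n=10):
    """Return up to top_n titles most similar to `title`, excluding the movie itself.

    `titles` is a list aligned with the rows of the square `similarity` matrix.
    """
    idx = titles.index(title)                  # raises ValueError for an unknown title
    order = np.argsort(-similarity[idx])       # highest similarity first
    return [titles[i] for i in order if i != idx][:top_n]


# Toy unit-length vectors standing in for the reduced SVD features.
titles = ["Spider-Man", "Spider-Man 2", "Batman Begins", "Notting Hill"]
vectors = np.array([[1.0, 0.0], [0.98, 0.2], [0.8, 0.6], [0.0, 1.0]])
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = vectors @ vectors.T

print(recommend("Spider-Man", titles, similarity, top_n=2))  # ['Spider-Man 2', 'Batman Begins']
```

Sorting a full row costs O(n log n) per query; on a large catalogue `np.argpartition` can pick the top 10 faster, though the plain sort is easier to read.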

Putting It to the Test: Experiments & Results

We evaluated our system using precision, recall, and F1-score:

  • Precision: 90% (90% of recommended movies were actually relevant!)
  • Recall: 82% (Our system retrieved 82% of all relevant movies.)
  • F1-Score: 86% (the harmonic mean of precision and recall)
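For reference, here is how these metrics are computed for a single query, assuming relevance is judged against a set of "correct" movies per query (the post does not describe how its relevant sets were defined). The last line checks that the reported F1-score follows from the reported precision and recall.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one query."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical query: 10 recommendations, 9 of which are in an 11-movie relevant set.
recommended = [f"movie_{i}" for i in range(10)]
relevant = [f"movie_{i}" for i in range(1, 12)]
p, r, f = precision_recall_f1(recommended, relevant)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.818 0.857

# Harmonic mean of the reported 90% precision and 82% recall.
print(round(2 * 0.90 * 0.82 / (0.90 + 0.82), 3))  # 0.858
```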

Key Observations:

  • Using fewer than 50 SVD components reduced accuracy significantly.
  • Increasing components to 100 provided slight improvements but required more computational power.
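One way to reason about this trade-off is to look at how much of a matrix's energy (the sum of squared singular values) the first `k` components retain. The sketch uses a synthetic matrix with a built-in rank-20 structure plus noise, so it illustrates the diminishing-returns pattern rather than reproducing the accuracy measurements above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a (movies x terms) matrix: rank-20 signal plus noise.
signal = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 300))
X = signal + 0.5 * rng.normal(size=(500, 300))

s = np.linalg.svd(X, compute_uv=False)        # singular values, descending
energy = np.cumsum(s ** 2) / np.sum(s ** 2)    # fraction of squared Frobenius norm kept

for k in (10, 20, 50, 100):
    print(k, round(float(energy[k - 1]), 3))
```

Retained energy climbs steeply up to the built-in rank and then flattens; beyond that point extra components mostly capture noise while adding cost, loosely mirroring the small gain observed when moving to 100 components.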

What Does This Look Like in Action?

Here’s what our system delivers:

  • Current Top 10 Movies: Fetches the 10 top-ranked movies at the moment.
```python
# Top 10 movies right now
current_top_movies = recommender.get_current_top_movies()
recommender.show_results(current_top_movies)
```
  • All-Time Top 10 Movies: Returns the highest-ranked movies of all time.
```python
# Top 10 movies of all time
top_movies = recommender.get_top_movies(10)
recommender.show_results(top_movies)
```
  • Trending Movies: Finds what's hot right now.
```python
# Currently trending movies
trending_movies = recommender.get_trending_movies()
recommender.show_results(trending_movies)
```
  • Movie-Based Recommendations: Users can enter a movie title (e.g., Spider-Man), and the system returns the 10 most similar movies!
```python
# 10 movies most similar to the given title
recommendations = recommender.get_recommendations('Spider-Man')
recommender.show_results(recommendations)
```

Final Thoughts

This project shows how machine learning can simplify decision-making in entertainment. By combining TF-IDF and SVD, we built a fast and accurate content-based recommender that could be extended to personalized recommendations based on user preferences and watch history.

Get in Touch with Me

Whether you have questions, inquiries, or just want to say hello, I'd love to hear from you. Reach out using the details below.