Popular Movies Dataset

TMDb (The Movie Database) is a widely used, community-driven source of structured movie and television metadata. It provides detailed information on movie titles, genres, release years, cast and crew, and user ratings on a 1–10 scale. TMDb covers a broad catalog of films spanning multiple decades and genres, and its open API has made it a popular resource for developers and researchers working on media analytics, recommendation systems, and content-based filtering.

In this class, we use a curated subset of TMDb focused on films released between 2000 and 2024, matched with box office data from the Kaggle “Movies Box Office Dataset.” This integrated dataset combines user ratings and genre metadata from TMDb with three financial performance metrics: domestic, international, and worldwide gross. It enables students to explore how audience reception aligns with commercial outcomes through hands-on exercises in data wrangling, visualization, and statistical modeling.
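
The description above does not say how the TMDb records were matched to the box office records. As a rough illustration only, the sketch below joins two small in-memory lists on a normalized title plus release year and keeps films from 2000 to 2024. The record layouts, field names, helper names (match_key, merge_records), and all values are hypothetical and are not taken from the actual dataset.

```python
# Hypothetical records; field names and values are illustrative only.
tmdb_records = [
    {"title": "Example Film A", "year": 2005, "genres": ["Action"], "rating": 7.5},
    {"title": "Example Film B", "year": 2012, "genres": ["Comedy"], "rating": 6.0},
    {"title": "Older Film", "year": 1995, "genres": ["Drama"], "rating": 8.1},
]
box_office_records = [
    {"title": "example film a ", "year": 2005, "domestic": 100.0, "international": 150.0},
    {"title": "Example Film B", "year": 2012, "domestic": 40.0, "international": 10.0},
]


def match_key(title, year):
    """Build a join key: lowercased title with collapsed whitespace, plus year."""
    return (" ".join(title.lower().split()), year)


def merge_records(tmdb, box_office, first_year=2000, last_year=2024):
    """Inner-join TMDb metadata with box office rows on (title, year)."""
    gross_by_key = {match_key(b["title"], b["year"]): b for b in box_office}
    merged = []
    for movie in tmdb:
        # Keep only films inside the release window used in this class.
        if not first_year <= movie["year"] <= last_year:
            continue
        gross = gross_by_key.get(match_key(movie["title"], movie["year"]))
        if gross is None:
            continue  # no box office match, so the film is dropped
        merged.append({
            **movie,
            "domestic": gross["domestic"],
            "international": gross["international"],
            # Worldwide gross is defined as domestic + international.
            "worldwide": gross["domestic"] + gross["international"],
        })
    return merged


merged = merge_records(tmdb_records, box_office_records)
for row in merged:
    print(row["title"], row["rating"], row["worldwide"])
```

Joining on title alone would confuse remakes and films that share a name, which is why the sketch includes the release year in the key. Real data may need fuzzier matching than this.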


Measurements Within Data
Feature               Description
Title                 Title of the movie.
Domestic Gross        Total box office earnings in the domestic market (usually U.S.).
International Gross   Total box office earnings in international markets.
Worldwide Gross       Total global box office earnings (domestic + international).
Release Year          The release year of the movie.
Genres                The genres associated with the movie (e.g., Action, Drama, Comedy).
Ratings               Average audience rating from TMDb (e.g., 7.5/10).
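
As a starting point for the data wrangling exercises, the sketch below reads rows shaped like the feature table, checks that Worldwide Gross equals Domestic Gross plus International Gross, and averages Ratings per genre. It assumes the CSV column headers match the feature names exactly, that gross values are plain numbers, and that Genres is a comma-separated string. The real file may differ on all three points, and the sample rows are made up.

```python
import csv
import io

# Hypothetical sample in the shape of the feature table; values are made up.
SAMPLE_CSV = """Title,Domestic Gross,International Gross,Worldwide Gross,Release Year,Genres,Ratings
Example Film A,100,150,250,2005,"Action, Drama",7.5
Example Film B,40,10,50,2012,Comedy,6.0
Example Film C,200,300,480,2020,"Action, Comedy",8.0
"""


def load_rows(text):
    """Parse CSV text into dicts with typed fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "title": raw["Title"],
            "domestic": float(raw["Domestic Gross"]),
            "international": float(raw["International Gross"]),
            "worldwide": float(raw["Worldwide Gross"]),
            "year": int(raw["Release Year"]),
            "genres": [g.strip() for g in raw["Genres"].split(",")],
            "rating": float(raw["Ratings"]),
        })
    return rows


def gross_mismatches(rows, tolerance=1.0):
    """Return titles where worldwide != domestic + international (beyond tolerance)."""
    return [
        r["title"] for r in rows
        if abs(r["domestic"] + r["international"] - r["worldwide"]) > tolerance
    ]


def mean_rating_by_genre(rows):
    """Average rating per genre; a multi-genre film counts once in each genre."""
    totals = {}
    for r in rows:
        for genre in r["genres"]:
            rating_sum, count = totals.get(genre, (0.0, 0))
            totals[genre] = (rating_sum + r["rating"], count + 1)
    return {genre: s / n for genre, (s, n) in totals.items()}


rows = load_rows(SAMPLE_CSV)
print(gross_mismatches(rows))
print(mean_rating_by_genre(rows))
```

In this made-up sample, only Example Film C fails the consistency check, because its stated worldwide gross (480) is lower than the sum of its domestic and international gross (500).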


Box office data source (Kaggle): https://www.kaggle.com/datasets/aditya126/movies-box-office-dataset-2000-2024