Datasets

Part 1 of the course revisits basic statistical concepts and fundamental data-analytics procedures using a sandbox of Excel datasets on topics for which most students already have some intuitive understanding:

  • All dogs dataset: a matrix of roughly 200 of the most popular dog breeds. The matrix consists of 194 dog breeds x 22 characteristics (e.g. trainability, friendliness, intelligence) compiled by the American Kennel Club (AKC).
  • All music dataset: a matrix of roughly 20,000 of the most popular songs on Spotify. The matrix consists of 19,947 songs x 15 music characteristics (e.g. loudness, vocals, tempo) that Spotify uses to personalize music recommendations for users.
  • All food dataset: a matrix of roughly 9,000 of the most popular foods, compiled by the USDA. The matrix consists of 8,889 foods x 70 nutritional categories (e.g. carbs, vitamins, minerals) that are used to inform nutrition labels on commercial food products.
  • All beer dataset: a matrix of roughly 3,000 of the most popular beers in the United States. The matrix consists of 3,197 beers x 20 taste measurements (e.g. bitterness, sweetness, ABV).
  • All movies dataset: a matrix of roughly 5,000 of the most popular movies from 1920 to 2020. The matrix consists of 4,822 movies x 27 measurements (e.g. box office, average rating, genres); the data were obtained from IMDB and are used by streaming services and various websites to recommend movies.
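Each dataset above is an items x measurements matrix. The course works with these in Excel; purely as an illustrative sketch, assuming Python with pandas (and an Excel engine such as openpyxl) is available, the following records the stated dimensions and checks a loaded sheet against them, then produces basic descriptive statistics for the numeric columns. The dictionary keys, the `id_cols` parameter, and any file names are hypothetical. Whether a workbook also contains an identifier column (e.g. breed or song name) is not stated, so `id_cols` lets you discount such columns from the count.

```python
import pandas as pd

# Items x measurements as stated in the dataset descriptions.
# The keys are hypothetical labels, not names used by the course.
EXPECTED_SHAPES = {
    "dogs": (194, 22),
    "music": (19_947, 15),
    "food": (8_889, 70),
    "beer": (3_197, 20),
    "movies": (4_822, 27),
}


def check_shape(df: pd.DataFrame, name: str, id_cols: int = 0) -> bool:
    """Return True if df matches the stated dimensions for `name`.

    `id_cols` (hypothetical) is the number of non-measurement columns,
    such as a breed or title column, to exclude from the column count.
    """
    rows, cols = df.shape
    return (rows, cols - id_cols) == EXPECTED_SHAPES[name]


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles, and max for each numeric column."""
    return df.select_dtypes(include="number").describe()


def load_and_check(path: str, name: str, id_cols: int = 0) -> pd.DataFrame:
    """Read the first sheet of an Excel workbook and report whether its shape matches."""
    df = pd.read_excel(path)
    print(f"{name}: shape {df.shape}, matches stated size: {check_shape(df, name, id_cols)}")
    return df
```

For example, `load_and_check("all_dogs.xlsx", "dogs", id_cols=1)` (hypothetical file name) would read the dogs workbook and report whether it has 194 rows and 22 measurement columns besides one identifier column; `summarize` then gives a quick per-measurement overview comparable to Excel's descriptive statistics.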