Baseline Performance of Pre-trained Models on Movie Genre Classification from Spectrograms

Abstract

This study investigates the use of deep learning to classify movie genres from audio spectrograms. We construct a dataset of movie trailers, transform their audio into spectrograms, and label each by genre. We then use MATLAB's pre-trained convolutional neural networks (CNNs) for classification, comparing nine architectures: MobileNet-v2, ResNet-18, DenseNet-201, Places365-GoogLeNet, VGG-16, VGG-19, Inception-ResNet-v2, Inception-v3, and NASNet-Mobile. We evaluate all models on their ability to classify movie trailers into five genres: action, romance, drama, comedy, and thriller. Our results, based on accuracy and F1-score across genres, indicate that VGG-16 achieves the highest overall performance, with an accuracy of 86.27%, an F1-score of 86.69%, a recall of 86.87%, and a precision of 87.28%. This research demonstrates the potential of pre-trained CNNs, particularly VGG-16, for efficient and effective audio-based genre classification of movie trailers.
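The abstract reports accuracy, precision, recall, and F1-score across the five genres but does not say how the per-genre scores are combined into one number. The following is a minimal sketch, assuming macro averaging (an unweighted mean over genres). The function names and the toy labels are hypothetical illustrations, not the authors' MATLAB evaluation code or their data.

```python
from typing import Dict, List, Sequence

# The five genre labels named in the abstract.
GENRES = ["action", "romance", "drama", "comedy", "thriller"]


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> List[List[int]]:
    """Rows index the true genre, columns the predicted genre."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, float]:
    """Accuracy plus macro-averaged (assumed) precision, recall, and F1."""
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: clips predicted as genre k
        actual_k = sum(cm[k])  # row sum: clips whose true genre is k
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels for illustration only (not the study's data).
toy_true = ["action", "action", "romance", "drama", "comedy", "thriller"]
toy_pred = ["action", "thriller", "romance", "drama", "comedy", "thriller"]
metrics = macro_metrics(toy_true, toy_pred, GENRES)
print(metrics)
```

On these toy labels, macro precision and macro recall are both 0.9 while macro F1 is about 0.867. This illustrates that a macro-averaged F1 is the mean of per-genre F1 scores and need not equal the harmonic mean of macro precision and macro recall.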
