Minimizing Model Size of CNN-Based Vehicle Make Recognition for Frontal Vehicle Images

Publisher

IEEE Access

Abstract

Vehicle Make and Model Recognition (VMMR) is widely used in Intelligent Transport Systems (ITS), free-flow image-based toll systems, and enforcement systems. These systems need to analyze and process images of the front of vehicles as evidence of use. Convolutional Neural Networks (CNNs) are currently well-established techniques in image classification research and are applied to problems in the VMMR domain. Increasing classification accuracy over a large number of classes requires more complex model structures and more internal parameters, which results in larger models and potentially longer processing times. This work aims to study and develop a smaller CNN model suitable for devices with limited resources, such as embedded computers and embedded-computer cameras, for recognizing vehicle makes from frontal images. The experimental datasets were collected from actual free-flow toll systems, and a CNN model was developed that achieved 99% accuracy in recognizing vehicle makes. The developed model is smaller than the state-of-the-art CNN models tested (VGG16, InceptionV3, YOLO11m-cls, and ResNet50), all of which achieve over 90% accuracy. The proposed CTv1 model achieves an F1 score approximately 2.06% higher than the best baseline, InceptionV3, while reducing the number of parameters by 69.95%. The model was tested on a Raspberry Pi 3 Model B, where it processed images at an average speed of 1 second per image with a power consumption of 25 milliwatt-hours (mWh). Our study also reduces the CNN model size using depth-wise separable convolution and 1x1 convolutional dimension reduction (bottleneck) methods, as well as adjusting the padding and stride of the convolutional layers, and evaluates the resulting accuracy, training time, processing time, and model size for vehicle make recognition.
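The parameter savings from depth-wise separable convolution and the 1x1 bottleneck mentioned above can be illustrated with simple parameter-count arithmetic. This is a minimal sketch only: the actual layer shapes of the CTv1 model are not stated in the abstract, so the kernel size and channel counts below are illustrative assumptions, and biases are omitted.

```python
# Parameter counts for a k x k convolution layer (biases omitted for clarity).

def standard_conv_params(k, c_in, c_out):
    # Every output channel has its own k x k x c_in kernel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depth-wise step: one k x k filter per input channel,
    # followed by a 1x1 point-wise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

def bottleneck_params(k, c_in, c_mid, c_out):
    # A 1x1 convolution first reduces the channels to c_mid,
    # then the k x k convolution operates on the reduced tensor.
    return c_in * c_mid + k * k * c_mid * c_out

# Illustrative layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)        # 73,728 parameters
dws = depthwise_separable_params(3, 64, 128)  # 8,768 parameters (~8.4x fewer)
btl = bottleneck_params(3, 64, 16, 128)       # 19,456 parameters (~3.8x fewer)
print(std, dws, btl)
```

Applied across many layers, reductions of this magnitude are what make the overall parameter savings reported for CTv1 plausible on resource-limited hardware such as a Raspberry Pi.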
