Multi-Scale Convolutional Fusion Network for Image Retrieval
Image retrieval from large-scale databases remains a major challenge in computer vision: traditional text-based and content-based methods cannot fully capture visual features, leaving a "semantic gap" between low-level image representations and high-level semantics. This paper introduces the Multi-Scale Convolutional Fusion Network (MSCFNet), a novel method that improves both the accuracy and the efficiency of image retrieval through multi-scale convolutional layers. MSCFNet applies filters of different sizes in parallel to extract fine-, medium-, and large-scale features simultaneously, yielding a richer image representation that captures diverse patterns and visual details and thereby improves image matching and retrieval performance. To keep the model compact, MSCFNet fuses the parallel feature maps by element-wise addition rather than concatenation, preserving computational efficiency because the fused output retains the dimensionality of a single branch. MSCFNet was implemented in two versions, with 2 and 4 layers, and evaluated on the CIFAR-10, CIFAR-100, and Fashion-MNIST datasets. The results show that MSCFNet consistently outperforms more complex models such as ResNet18 and ResNet50, reaching accuracies of 74.43% on CIFAR-10, 38.87% on CIFAR-100, and 92.47% on Fashion-MNIST. MSCFNet also substantially reduces parameter count and training time: the 2-layer version trains in just 113.1 seconds on CIFAR-10 while maintaining high accuracy, and the 4-layer version further improves accuracy and F-score across all datasets. This balance of accuracy, efficiency, and low complexity makes MSCFNet well suited to resource-constrained environments.
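As a concrete illustration of the fusion mechanism described above, the sketch below shows a minimal multi-scale convolutional block in PyTorch. Only the parallel multi-scale convolutions and the addition-based fusion follow the abstract; the specific kernel sizes (3x3, 5x5, 7x7), channel widths, and the BatchNorm/ReLU placement are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class MultiScaleFusionBlock(nn.Module):
    """Minimal sketch of one multi-scale convolutional fusion block.

    Parallel branches with small, medium, and large kernels extract
    fine-, medium-, and large-scale features; their outputs are fused
    by element-wise addition, so the channel count does not grow the
    way concatenation-based fusion would.

    Assumption: kernel sizes 3/5/7 and the BatchNorm/ReLU placement
    are illustrative guesses, not specified in the paper.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # "Same" padding keeps spatial dimensions identical across
        # branches, which element-wise addition requires.
        self.fine = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.medium = nn.Conv2d(in_channels, out_channels, kernel_size=5, padding=2)
        self.large = nn.Conv2d(in_channels, out_channels, kernel_size=7, padding=3)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Addition fusion: the output has the same shape as a single
        # branch, unlike concatenation, which would triple the channels.
        fused = self.fine(x) + self.medium(x) + self.large(x)
        return self.act(self.bn(fused))


if __name__ == "__main__":
    block = MultiScaleFusionBlock(in_channels=3, out_channels=32)
    x = torch.randn(1, 3, 32, 32)  # e.g., one CIFAR-10-sized image
    print(block(x).shape)  # torch.Size([1, 32, 32, 32])
```

Note how the addition keeps the fused feature map at out_channels regardless of how many branches are summed; with concatenation, three branches of 32 channels each would produce 96 channels and a correspondingly larger next layer, which is the complexity the abstract says MSCFNet avoids.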