Abstract:
Attention mechanisms and Batch Normalization (BN) are two cornerstone components of modern Deep Learning (DL) architectures, yet their individual contributions to model performance often remain unclear because their effects are intertwined within complex networks. This study systematically investigates and quantifies the impact of both components through an ablation analysis of two lightweight baseline models, EfficientNetB0 and MobileNetV3 Small, on the CIFAR-10 dataset. Three configurations were evaluated: (1) the default architecture, (2) models with the Squeeze-and-Excitation (SE) attention modules removed, and (3) models with the BN layers removed. Experimental results indicate that BN plays the more critical role in model accuracy and training stability: the configuration that retains BN but omits SE attention outperformed both the default and the attention-only (BN-removed) configurations. These findings highlight BN as a dominant factor in shaping learning dynamics, suggesting that normalization contributes more to convergence and generalization than attention mechanisms alone.
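As a rough sketch of how such an ablation can be set up (not the code used in this study), the PyTorch/torchvision example below builds the three configurations for MobileNetV3 Small by swapping SE or BN modules for nn.Identity. The choice of torchvision, the 224x224 input resizing, and the helper name replace_with_identity are illustrative assumptions; a recent torchvision release, in which the model's SE blocks are instances of torchvision.ops.SqueezeExcitation, is also assumed.

import torch
import torch.nn as nn
from torchvision import models
from torchvision.ops import SqueezeExcitation

def replace_with_identity(module: nn.Module, target: type) -> nn.Module:
    # Recursively swap every submodule of class `target` for nn.Identity().
    # Both SE blocks and BN layers preserve tensor shape, so removal is shape-safe.
    for name, child in module.named_children():
        if isinstance(child, target):
            setattr(module, name, nn.Identity())
        else:
            replace_with_identity(child, target)
    return module

# (1) default baseline (randomly initialized, 10-class head for CIFAR-10)
default_model = models.mobilenet_v3_small(num_classes=10)
# (2) SE attention modules removed
no_attention = replace_with_identity(models.mobilenet_v3_small(num_classes=10), SqueezeExcitation)
# (3) BN layers removed
no_bn = replace_with_identity(models.mobilenet_v3_small(num_classes=10), nn.BatchNorm2d)

# Sanity check: each variant still maps a batch of (resized) images to 10 logits
x = torch.randn(2, 3, 224, 224)
for m in (default_model, no_attention, no_bn):
    assert m(x).shape == (2, 10)

The same helper can be applied to torchvision's efficientnet_b0, whose SE blocks are likewise built from torchvision.ops.SqueezeExcitation.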
Keywords:
Deep Learning (DL), Convolutional Neural Networks (CNN), EfficientNetB0, MobileNetV3 Small, Attention Mechanism, Batch Normalization (BN), Natural Language Processing (NLP), Computer Vision (CV)