
Abstract

As large-scale infrared video images grow more complex, traditional approaches to detecting salient objects in them increasingly fail. This work presents an improved deep-learning method for salient object detection in infrared images. The model consists of four main parts: a spatial feature extractor, a temporal feature extractor, residual connection blocks, and a pixel-level classifier. The spatial feature extraction module was applied first to collect the relevant spatial information. The temporal feature extraction module was then applied to capture spatially and temporally consistent information. The salient object detection result was produced by jointly feeding the spatiotemporal feature information and the low-level spatial feature information from the spatial module into the pixel-level classifier via the residual connection blocks. During network training, BCE loss and Dice loss were used in tandem to increase training stability. For evaluation, we used the OTCBVS infrared video dataset as well as other infrared video sequences with intricate backgrounds. Model performance was measured with Mean Absolute Error (MAE), MaxF, MaxE, and S-measure. For inconspicuous targets against complex backgrounds, the proposed model achieved a detection accuracy of 82.87%, in comparison with models designed for visible-light images such as CPD, MGA, Ds-Net, and RcrNet. The results demonstrate the model's ability to reliably identify salient objects, as well as its generalizability and robustness.
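The joint use of BCE loss and Dice loss mentioned above can be illustrated with a minimal sketch. It assumes flattened per-pixel saliency probabilities and binary ground-truth labels. The function names, the equal weighting of the two terms, and the smoothing constant are illustrative assumptions, not details taken from the paper.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(pred) + sum(target) + smooth)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def combined_loss(pred, target, bce_weight=1.0, dice_weight=1.0):
    """Weighted sum of BCE and Dice; the weights are hypothetical, not from the paper."""
    return bce_weight * bce_loss(pred, target) + dice_weight * dice_loss(pred, target)
```

BCE penalizes each pixel independently, while Dice measures region overlap and is less sensitive to the imbalance between small targets and large backgrounds, which is one common motivation for combining the two.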
