Korean J. Remote Sens. 2024; 40(6): 1141-1148
Published online: December 31, 2024
https://doi.org/10.7780/kjrs.2024.40.6.1.21
© Korean Society of Remote Sensing
Yunjee Kim1*, Jinsoo Kim2, Ki-mook Kang3
1Researcher, Maritime Digital Transformation Research Center, Korea Research Institute of Ships & Ocean Engineering, Daejeon, Republic of Korea
2CTO, DEEP.I Inc., Asan, Republic of Korea
3Senior Researcher, Water Resource Satellite Center, K-water Research Institute, Daejeon, Republic of Korea

Correspondence to: Yunjee Kim
E-mail: yunjee0531@kriso.re.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
With the recent development of satellites and microsatellite constellations, satellite imagery with higher temporal resolution has become available from a growing number of satellites. As the availability of satellite data increases, it is necessary to develop ship detection models that account for specific data characteristics, such as polarization, spatial resolution, and frequency band. To leverage the distinct features of polarization in particular, we employed a late fusion approach to merge ship detection results from Sentinel-1 dual-polarization data. To evaluate the effectiveness of the fusion model, we built four single training models using two polarizations (VV, VH) and two colormaps (gray, parula), as well as six multimodal models with late fusion. Comparing the accuracy of the single and fusion models, we found that the fusion models consisting of 1) the VH gray colormap and VV parula colormap and 2) the VH parula colormap and VV parula colormap were more accurate than the single models (at intersection over union (IoU) thresholds of 0.4 and 0.5). Each fusion model achieved a relative accuracy improvement of at least 1.5% and up to 6.5% over the more accurate of its two constituent single models. The significance of this study is that late fusion was applied using both polarization and colormap information simultaneously. These results suggest that the fusion model can detect ships more accurately and that colormaps, which have been underexplored in SAR research, can be a factor in improving accuracy.
Keywords Ship detection, SAR, Sentinel-1, Late fusion, YOLO, Dual-polarization
Ship monitoring in coastal and maritime areas is important for ensuring marine safety and maritime security, responding promptly to shipwrecks and accidents, and preventing marine pollution (Nie et al., 2020; Qin et al., 2021; Wang et al., 2019; Zhang et al., 2021a). Coastal ship monitoring is primarily conducted through vessel traffic services (VTSs) and patrol vessels; however, this type of monitoring is limited by the broad expanse of the ocean. As of late 2019, there were more than 90,000 merchant ships and 5,000 warships worldwide, complicating the rapid detection and accurate identification of the wide variety of ships at sea (Zhang et al., 2021c). Not all ships are obligated to carry transponders, and transponders may be switched off for various reasons. In areas with high ship traffic, it is difficult to monitor numerous ships with limited equipment (Tian et al., 2022); therefore, satellite data are essential for ship monitoring over wide areas. Many studies have investigated ship detection using satellite images, particularly from synthetic aperture radar (SAR) satellites, which can acquire data regardless of weather or time of day (Li et al., 2022; Miao et al., 2022; Pelich et al., 2019; Song et al., 2020).
Satellites are gradually being miniaturized, and SAR microsatellite constellations, such as those operated by ICEYE and Capella, are becoming more common. This trend has significantly improved the spatial and temporal resolution of satellite data, producing an abundance of high-quality imagery. The increasing availability of diverse data from multiple satellites is creating new opportunities to utilize multi-satellite data, and research on exploiting it is essential. Effectively leveraging multi-satellite data requires sophisticated algorithms capable of handling diverse data types and extracting meaningful insights. Therefore, in this study, we propose a detection algorithm that takes dual-polarization data with different characteristics as input.
Ship detection research using SAR images is now mainly conducted with deep learning methods, including the You Only Look Once (YOLO) algorithm, which offers high speed and detection accuracy (Chang et al., 2019; Guo et al., 2022; Sun et al., 2021; Tang et al., 2021; Wang et al., 2021; Wang et al., 2022). The YOLOv2 algorithm outperformed the Faster region-based convolutional neural network (Faster R-CNN) algorithm, detecting ships 5.8-fold faster, and an extension of this algorithm was a further 2.5-fold faster than the original (Chang et al., 2019). A subsequent ship detection approach integrated noise level classification and potential target area extraction based on the YOLO algorithm, and showed superior detection of objects directly from images; however, its data loss during edge extraction requires further refinement (Tang et al., 2021).
An analysis by Im et al. (2023) showed the potential of utilizing multiple polarization data in ship detection by comparing the accuracy of YOLO-based detection models built with each polarization separately against models built by integrating all polarizations. That study also demonstrated that VV polarization typically shows higher backscatter coefficients than VH polarization (Im et al., 2023). Noting that each polarization channel has its own characteristics, we compared the accuracy of single models built with VV and VH images from Sentinel-1 against a fusion model built with a late fusion technique. While constructing the fusion model, we used the gray and parula colormaps to demonstrate that the colormap of an image can also influence accuracy.
In this study, training and test datasets were generated from 31 Sentinel-1A/B images captured between July 3, 2019, and March 18, 2021, over Busan, South Korea. Each original image was cropped into sub-images according to the conditions and prerequisites in Table 1.
Table 1. Description of sub-image data in this study
| Sub-image | Condition | Prerequisite | No. of sub-images |
|---|---|---|---|
| 400s × 400s | Randomly crop the sub-image from the original image so that both sides have lengths between 400 and 499 | Contains at least three ships | 322 |
| 500s × 500s | Randomly crop the sub-image from the original image so that both sides have lengths between 500 and 599 | | |
| 600s × 600s | Randomly crop the sub-image from the original image so that both sides have lengths between 600 and 699 | | |
| 700s × 700s | Randomly crop the sub-image from the original image so that both sides have lengths between 700 and 799 | | |
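The paper does not include the cropping code; the following is a minimal sketch of the sampling rule in Table 1, assuming ship annotations are axis-aligned boxes stored as dictionaries with hypothetical x0/y0/x1/y1 pixel keys:

```python
import random

def random_crop(image_h, image_w, ships, size_min, size_max, max_tries=100):
    """Sample a sub-image whose side lengths fall in [size_min, size_max]
    (e.g., 400-499 for the 400s set) and that contains at least three
    fully enclosed ship boxes, per Table 1."""
    for _ in range(max_tries):
        h = random.randint(size_min, size_max)
        w = random.randint(size_min, size_max)
        # assumes the source image is larger than the crop
        y0 = random.randint(0, image_h - h)
        x0 = random.randint(0, image_w - w)
        # keep only ships whose boxes fall entirely inside the crop
        inside = [s for s in ships
                  if s["x0"] >= x0 and s["y0"] >= y0
                  and s["x1"] <= x0 + w and s["y1"] <= y0 + h]
        if len(inside) >= 3:
            return (y0, x0, h, w), inside
    return None  # no crop satisfying the prerequisite was found
```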
The dataset was partitioned into training (60%), validation (20%), and testing (20%) subsets to balance training data availability against adequate samples for evaluation and hyperparameter tuning. In particular, allocation was randomized to minimize selection bias and ensure representative samples across all subsets. This partitioning enabled a robust evaluation of model performance across diverse conditions, laying a foundation for reliable ship detection in SAR images.
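As a sketch of this allocation, assuming sub-images are handled as a flat list and a fixed seed (our choice; the paper does not state one):

```python
import random

def split_dataset(items, seed=0):
    """Shuffle once, then cut into 60% train / 20% validation / 20% test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # randomized allocation
    n_train = int(0.6 * len(items))
    n_val = int(0.2 * len(items))
    return (items[:n_train],                  # training
            items[n_train:n_train + n_val],   # validation
            items[n_train + n_val:])          # testing
```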
Previous studies have shown that image segmentation performance varies with the color representation of the image (Cheng et al., 2001; Lucchese and Mitra, 2001). Based on these findings, we considered that the colormap of an image might affect object detection accuracy when detecting ships. According to previous research, the autumn, viridis, and parula colormaps demonstrated superior segmentation performance in microwave tomography (Zhang et al., 2021b). Among these, the parula colormap has mean and standard deviation (SD) values similar to those of the gray colormap, and it was therefore compared with the gray colormap in this study. Fig. 1 plots the RGB values of the two colormaps.
If an image is represented using the gray colormap, its colors range between black and white because its equal R, G, and B values are distributed between 0 and 1. By contrast, the parula colormap has larger blue components at low values and increasing red and green components at high values (Fig. 1). Although SAR images rendered with different colormaps are difficult to distinguish intuitively, we rendered all training data with both colormaps because they may exhibit different characteristics during deep learning (Fig. 2). Both colormaps used in this study were scaled to values in the range of 0 to 255.
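Colormap rendering can be sketched with OpenCV, which ships a parula approximation; the percentile stretch below is our assumption, since the paper does not state how backscatter values were scaled to 8 bits:

```python
import cv2
import numpy as np

def render(backscatter, mode="parula"):
    """Scale a SAR backscatter array to 0-255 and render it with the
    gray or parula colormap, matching the 8-bit range used here."""
    lo, hi = np.percentile(backscatter, (1, 99))   # clip speckle outliers
    img = np.clip((backscatter - lo) / (hi - lo + 1e-9), 0.0, 1.0)
    img8 = (img * 255).astype(np.uint8)
    if mode == "gray":
        return cv2.cvtColor(img8, cv2.COLOR_GRAY2BGR)  # R = G = B
    return cv2.applyColorMap(img8, cv2.COLORMAP_PARULA)
```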
YOLO is a single-stage, deep learning-based object detection algorithm inspired by the human glance, which instantly grasps the types and relationships of objects in an image (Liu et al., 2016; Redmon, 2016). It uses a pipeline that infers object information from the entire image, thereby learning the background region in addition to the specific region where the object is located; this results in smaller background errors than two-stage object detection algorithms (Qing et al., 2021; Zhou et al., 2022). Zhu et al. (2021) compared the detection speed and accuracy of three region-based convolutional neural network (R-CNN)-based two-stage detectors and four single-stage detectors, confirming that the single-stage detector YOLOv5s achieved the fastest detection speed at 63.3 frames per second (FPS).
The YOLO algorithm has evolved through several versions. According to Kim (2023), YOLOv1 and YOLOv2 are suitable for tasks where accuracy is critical, while YOLOv3 and YOLOv4 are primarily used for tasks that prioritize speed; YOLOv5 and YOLOv6 are effective in environments requiring a balance between accuracy and speed, whereas YOLOv7 and YOLOv8 are applicable in scenarios with limited memory and computational resources. In this study, YOLOv5 was used for its balance of accuracy and speed, considering the real-time nature of ship detection.
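For reference, YOLOv5 exposes a simple inference API through torch.hub; the sketch below uses the public pretrained yolov5s checkpoint as a stand-in, since the paper's fine-tuned weights and training configuration are not published, and the input file name is hypothetical:

```python
import torch

# Pretrained YOLOv5s stands in for the fine-tuned ship detector
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.conf = 0.25  # confidence threshold for reported detections

results = model("vh_parula_subimage.png")  # hypothetical sub-image chip
boxes = results.xyxy[0]  # tensor rows: [x1, y1, x2, y2, conf, class]
```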
Late fusion applies object detection models to data from different sensors and combines the detection results at the decision-making stage. This method maintains high detection accuracy and robustness even when the performance of individual sensors deteriorates (Kim and Cho, 2021; Kim et al., 2019; Zhao et al., 2021). In this study, late fusion was used to exploit the differing characteristics of VV and VH images, and the detection results of each single model were fused using a soft non-maximum suppression (Soft-NMS)-based late fusion method (Fig. 3).
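A minimal sketch of such a Soft-NMS fusion step follows, assuming each model's detections are [x1, y1, x2, y2, score] lists and using the Gaussian decay form of Soft-NMS; the sigma value and the 0.4 score cutoff (loosely mirroring the multimodal threshold in Table 3) are illustrative, not the paper's exact settings:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def soft_nms_fusion(dets_vv, dets_vh, sigma=0.5, score_thr=0.4):
    """Pool detections from the two single models, then decay the scores
    of overlapping boxes instead of discarding them outright."""
    dets = sorted((list(d) for d in dets_vv + dets_vh),
                  key=lambda d: -d[4])
    keep = []
    while dets:
        best = dets.pop(0)            # highest remaining score
        keep.append(best)
        for d in dets:                # Gaussian penalty on overlaps
            d[4] *= np.exp(-(iou(best, d) ** 2) / sigma)
        dets = sorted((d for d in dets if d[4] >= score_thr),
                      key=lambda d: -d[4])
    return keep
```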
A study by Farahnakian and Heikkonen (2020) explored ship detection using late fusion. They compared the accuracy of different fusion methods by integrating ship detection results from RGB and infrared (IR) images, employing early and middle fusion in addition to late fusion. Although various studies have explored fusing detection results across sensors, this research is the first to improve ship detection accuracy by applying late fusion to dual-polarization data from SAR images.
Before comparing the accuracy of single and fused models, we first evaluated the accuracy of the single models by polarization and colormap to determine whether the polarization and colormap of SAR images affect accuracy. Model performance is assessed with mean average precision (mAP) at IoU thresholds between 0.3 and 0.5, reflecting varying levels of localization precision.
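Counting true and false positives at a given IoU threshold can be sketched as the greedy matching below (reusing iou() from the fusion sketch above); note that full mAP additionally averages precision over recall levels, which is omitted here for brevity:

```python
def match_detections(preds, gts, thr=0.5):
    """Greedily match score-sorted predictions to unmatched ground-truth
    boxes; a prediction is a true positive if its best IoU reaches thr
    (0.3, 0.4, or 0.5 in this study)."""
    matched, tp = set(), 0
    for p in sorted(preds, key=lambda d: -d[4]):
        best_j, best_iou = None, thr
        for j, g in enumerate(gts):
            if j not in matched and iou(p, g) >= best_iou:
                best_j, best_iou = j, iou(p, g)
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    return tp, len(preds) - tp  # (TP, FP)
```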
Comparing accuracy by polarization, the VH image was more accurate in more than 79% of all cases. In particular, VH showed substantially higher detection accuracy than VV for the gray colormap, and the difference grew at a threshold of 0.5, where VH accuracy exceeded VV accuracy by as much as 26 percentage points (74.61% vs. 48.57% for 700s × 700s inputs). For the parula colormap, however, detection accuracy was generally similar between polarizations (Table 2). Next, we compared the gray and parula colormaps. The parula colormap was more accurate than the gray colormap in more than 80% of cases. The difference between the two colormaps was greater for VV images than for VH images and widened as the IoU threshold increased (Table 2).
Table 2. Comparison of single model accuracy (mAP, %) based on polarization and colormap
| Input size | Polarization | Colormap | IoU 0.3 | IoU 0.4 | IoU 0.5 |
|---|---|---|---|---|---|
| 400s × 400s | VV | Gray | 73.65 | 47.86 | 21.65 |
| 400s × 400s | VH | Gray | 89.70 | 72.57 | 46.54 |
| 400s × 400s | VV | Parula | 91.39 | 73.87 | 44.37 |
| 400s × 400s | VH | Parula | 93.26 | 77.46 | 46.73 |
| 500s × 500s | VV | Gray | 89.50 | 67.26 | 36.49 |
| 500s × 500s | VH | Gray | 94.39 | 84.30 | 61.38 |
| 500s × 500s | VV | Parula | 94.33 | 84.78 | 64.76 |
| 500s × 500s | VH | Parula | 96.12 | 87.34 | 68.55 |
| 600s × 600s | VV | Gray | 92.27 | 74.80 | 50.88 |
| 600s × 600s | VH | Gray | 97.19 | 86.30 | 67.04 |
| 600s × 600s | VV | Parula | 95.64 | 88.66 | 72.36 |
| 600s × 600s | VH | Parula | 96.83 | 85.48 | 71.42 |
| 700s × 700s | VV | Gray | 94.27 | 79.59 | 48.57 |
| 700s × 700s | VH | Gray | 96.24 | 87.60 | 74.61 |
| 700s × 700s | VV | Parula | 97.31 | 89.52 | 75.19 |
| 700s × 700s | VH | Parula | 96.14 | 88.45 | 74.09 |
This study proposed a ship detection method that fuses the detection results of dual-polarization data rendered with different colormaps, improving the accuracy of deep learning models by exploiting training data with different characteristics. To improve detection accuracy, we compared the accuracy of all fusion models that can be formed by pairing the four single models (VV gray, VV parula, VH gray, VH parula). Specifically, we compared the detection accuracy of fusion models built from single models with an input size of 700s × 700s and a threshold of 0.4 for late fusion (Table 3).
Table 3. Ship detection accuracy (mAP, %) by multimodal input and IoU threshold (input size: 700s × 700s; multimodal threshold: 0.4)
| Multimodal input | IoU 0.3 | IoU 0.4 | IoU 0.5 |
|---|---|---|---|
| VV gray + VH gray | 95.88 | 86.73 | 69.79 |
| VH gray + VV parula | 96.74 | 91.59 | 78.81 |
| VH gray + VH parula | 97.83 | 88.91 | 75.97 |
| VV gray + VH parula | 95.79 | 88.70 | 68.97 |
| VV gray + VV parula | 94.82 | 86.80 | 64.88 |
| VH parula + VV parula | 97.92 | 90.86 | 78.87 |
As a result, the fused models with 1) the VH gray and VV parula colormaps and 2) the VH parula and VV parula colormaps as multimodal inputs were more accurate than the single models (at IoU thresholds of 0.4 and 0.5). At an IoU threshold of 0.4, the VH gray single model reached 87.60% and the VV parula single model 89.52%, both below 90%; fusing these models raised accuracy above 90%. Similarly, the single VH parula and VV parula models reached 88.45% and 89.52%, respectively, whereas their fusion exceeded 90%. Both multimodal models achieved more true positives than the single models despite only small differences in the number of detections ('Det' in Table 4).
Table 4. Comparison of results produced by the two most accurate multimodal models and their corresponding single models
| Predict result | VH gray | VH parula | VV parula | VH gray + VV parula | VH parula + VV parula |
|---|---|---|---|---|---|
| Ground truth (GT) | 731 | 731 | 731 | 731 | 731 |
| Detected (Det) | 801 | 776 | 860 | 840 | 823 |
| True positive (TP) | 657 | 662 | 671 | 678 | 684 |
| False positive (FP) | 144 | 114 | 189 | 162 | 139 |
Improving ship detection accuracy involves two components: increasing the rate of true positives and eliminating false positives. Factors such as ship wakes and offshore structures can cause ships to be mistaken for noise, whereas the multimodal model is more likely to detect ships among these confounding factors (top of Fig. 4a; blue circle in Fig. 4d). Conversely, a bright area in a SAR image may be falsely detected as a ship owing to noise from small islands or strong backscattering signals, whereas the multimodal model correctly recognizes that these signals are not ships (left of Figs. 4b, c; yellow circles in Figs. 4e, f).
In this study, we compared the accuracy of four single models using dual polarization and the gray and parula colormaps, as well as six fused models using late fusion to integrate data with diverse characteristics. The comparison of 24 single-model cases revealed that the VH image was more accurate than the VV image in 79% of cases, with the difference more pronounced for the gray colormap; this indicates that VH gray yields high accuracy. Conversely, comparing by colormap, the parula colormap was more accurate in 83% of cases, with the difference more pronounced for VV images; this suggests that VV parula yields high accuracy. In summary, VH parula demonstrates the highest accuracy, a finding that aligns with the accuracy comparisons observed for the fused models.
Comparing the six fusion models shows that the best configurations for the YOLOv5-based late fusion algorithm proposed in this study fused two pairs of detection results: 1) the VH gray and VV parula colormaps, and 2) the VH parula and VV parula colormaps, using a multimodal threshold of 0.4 and an IoU threshold of 0.4 or 0.5. All single models fell below 90% accuracy at an IoU threshold of 0.4, whereas the optimal fusion models exceeded 90%, indicating effective compensation for both false positives and missed detections. The YOLOv5-based late fusion method reduced the false detection rate by adopting Soft-NMS, which addresses the duplicate detections that limit the standard NMS method. The results of this study can serve as foundational data for developing a multisensor-based ship detection algorithm. Future research will explore ways to use various satellite data to improve ship detection accuracy.
This research was funded by the Korea Institute of Marine Science & Technology Promotion (KIMST) and the Ministry of Oceans and Fisheries (RS-2024-00415504).
No potential conflict of interest relevant to this article was reported.