Korean J. Remote Sens. 2024; 40(6): 943-955
Published online: December 31, 2024
https://doi.org/10.7780/kjrs.2024.40.6.1.6
© Korean Society of Remote Sensing
Dong Ho Lee1, Kyoungah Choi2*
1Postdoctoral Researcher, Satellite Application Division, Korea Aerospace Research Institute, Daejeon, Republic of Korea
2Associate Research Fellow, National Infrastructure & Geospatial Information Research Division, Korea Research Institute for Human Settlements, Sejong, Republic of Korea
Correspondence to: Kyoungah Choi
E-mail: shale@krihs.re.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Maritime accidents cause human and property losses, making timely detection of small ships crucial for improving the efficiency of rescue operations. To address this need, this study constructed a high-resolution training dataset for small ship detection using satellite imagery and evaluated the performance of various deep learning models. Among the detection transformer (DETR) models, the DETR with improved denoising anchor boxes for end-to-end object detection (DINO) model achieved an average precision (AP) of 0.934, outperforming convolutional neural network (CNN)-based models. Notably, it succeeded in detecting small ships under 10 meters in length. Furthermore, detection experiments using the BlackSky microsatellite constellation evaluated the efficiency of maritime monitoring and search and rescue operations, with positional errors between global positioning system (GPS) and detected ship locations of 64.23 m and 54.89 m in the first and second experiments, respectively. These results confirm the practicality of the proposed Transformer-based methodology for detecting small ships in high-resolution satellite imagery and suggest potential improvements in detection performance under various maritime conditions.
Keywords Maritime search and rescue, High-resolution satellite imagery, Small ship detection, Microsatellite constellation, Detection transformer
Maritime accidents occur frequently worldwide, with incidents involving small ships, such as fishing boats, accounting for a significant proportion. According to recent statistics, approximately 80% of annual maritime accidents are related to small ships, leading to severe human and property losses (Lee and Choi, 2024). Typical examples include the capsizing of fishing boats or drifting due to mechanical failure, which, if not promptly addressed, may result in substantial casualties. Therefore, rapid situational awareness and rescue operations are essential in incidents involving small ships.
Conventional maritime search and rescue operations generally rely on ships and personnel managed by the Maritime Rescue and Salvage Association, along with aircraft and patrol ships operated by the Korea Coast Guard. These operations depend on passive methods such as radar, visual observation, and ship-to-ship radio communication, which present several limitations (Yun, 2020). Personnel-dependent rescue missions are inherently risky, and factors such as crew fatigue and harsh marine environments can further reduce operational efficiency. In particular, when small teams conduct searches across vast maritime areas, detection speed and accuracy are significantly compromised. These challenges highlight the urgent need for new technologies to enhance the efficiency of maritime search and rescue operations.
Satellite imagery has emerged as a crucial tool for overcoming the limitations of traditional search and rescue methods, expanding search areas, and enhancing the efficiency of maritime surveillance and rescue operations. With advancements in deep learning, research on ship detection using satellite imagery in the field of remote sensing has been actively conducted. To support these efforts, several large-scale satellite and aerial image datasets have been made publicly available, including the high-resolution ship collection 2016 (HRSC2016) (Liu et al., 2017), the dataset for object detection in aerial images (DOTA) (Xia et al., 2018), the Airbus ship detection dataset (Al-Saad et al., 2021), and the dataset for object detection in optical remote sensing images (DIOR) (Li et al., 2020). These datasets have facilitated the development of various deep-learning models aimed at improving ship detection performance.
Prominent ship detection models include the faster region-based convolutional neural network (Faster R-CNN), which utilizes a region proposal network to detect objects quickly and accurately (Ren et al., 2015). The You Only Look Once (YOLO) model, on the other hand, processes the entire image in a single pass using a unified neural network, enabling real-time detection (Redmon et al., 2016). More recently, end-to-end object detection with transformers (DETR) has been introduced, leveraging the Transformer architecture to simplify the traditional object detection pipeline while delivering high performance (Carion et al., 2020). Research in ship detection has thus primarily focused on improving object detection performance by utilizing large-scale benchmark datasets.
However, most benchmark datasets consist of medium- to low-resolution satellite imagery and lack the high-resolution images and instances needed for detecting small ships. Even though the developed models demonstrate high detection performance, their practicality in real rescue operations remains limited. Traditional satellite imagery has the advantage of covering vast areas but falls short in urgent rescue situations due to limitations in temporal resolution and real-time data processing, making immediate response difficult.
These limitations can be addressed through the use of microsatellite constellations. Operating in low Earth orbit, microsatellite constellations provide high spatiotemporal resolution with short revisit times (Kim and Kang, 2021). This capability allows for rapid response to dynamic maritime situations, enabling more frequent observations and timely actions in search and rescue operations, effectively overcoming the constraints of traditional satellites. Despite these advantages, practical applications of microsatellite constellations in rescue operations remain scarce.
Therefore, this study aims to build a training dataset for small ship detection and assess the detection efficiency and practical applicability of microsatellite constellations in real-world operations. The research methodology comprises three main stages. First, a high-resolution training dataset suitable for small ship detection was constructed. Second, various deep-learning detection models were trained and evaluated using the developed dataset. Finally, detection experiments were conducted to assess the practical applicability of microsatellite constellations in real maritime environments for small ship detection.
In this study, high-resolution satellite and aerial imagery from domestic and international open-source datasets, along with new data from domestic satellites, were utilized for small ship detection. The datasets were selected based on the criterion of having spatial resolutions of 1 meter or finer to enable the extraction of ship dimension information. Accordingly, the open-source datasets chosen include xView, the very high-resolution ships dataset (VHRShip), and the “Satellite Imagery Object Detection 1.0 ver” dataset provided by AI-Hub (hereafter referred to as the AI-Hub dataset).
The xView dataset consists of high-resolution images collected by the WorldView-3 satellite, with a spatial resolution of 0.3 meters. Each image covers an area of approximately 1 km² and includes a total of 60 object classes. Key classes include ‘fixed wing aircraft,’ ‘passenger vehicle,’ ‘truck,’ ‘railway vehicle,’ ‘engineering vehicle,’ ‘maritime vessel,’ and ‘building.’ The dataset comprises a total of 1,413 images (Lam et al., 2018).
The VHRShip dataset contains 6,312 high-resolution images collected from Google Earth, comprising 5,312 ship images and 1,000 non-ship images. All images have a spatial resolution of 0.43 meters and are composed of 720 × 1,280 pixel RGB images. The dataset includes a total of 11,179 ship instances, categorized into 24 superclasses and 11 subclasses (Kızılkaya et al., 2022).
The AI-Hub dataset was constructed using KOMPSAT satellite images provided by the Korea Aerospace Research Institute. It consists of five types of data: ‘object of interest detection data,’ ‘building outline extraction data,’ ‘road outline extraction data,’ ‘cloud extraction data,’ and ‘water body extraction data.’ This study utilized the ‘object of interest detection data,’ which is composed of optical images captured by KOMPSAT-3 and KOMPSAT-3A, with spatial resolutions of 0.7 meters and 0.55 meters, respectively. The AI-Hub dataset includes 15 object classes, such as vehicles, ships, airplanes, and trains, with more than 500,000 instances. Each image is an RGB image with a resolution of 1,024 × 1,024 pixels (AI-Hub, 2020). As for the new satellite dataset, three KOMPSAT-3 images and five KOMPSAT-3A images captured over the seas near Korea were acquired. These images were divided into 1,024 × 1,024 pixel tiles, consistent with the AI-Hub dataset.
To train a single model on multiple datasets, it is essential to standardize the formats across datasets. This process involved instance extraction, image editing, and annotation format editing and conversion. Annotations were prepared in common objects in context (COCO) format with horizontal bounding boxes (HBB) and DOTA format with oriented bounding boxes (OBB) to ensure compatibility with different image types, including aerial and satellite imagery.
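For reference, the two target formats differ mainly in how a box is encoded: a COCO record stores an axis-aligned box as [x, y, width, height] in pixels, while a DOTA record stores the four corner points of an oriented box together with the class name and a difficulty flag. The sketch below shows one ship instance in each format; the coordinate values are illustrative and are not taken from the datasets.

```python
# Illustrative only: one "ship" instance encoded in each target annotation format.

# COCO entry (horizontal bounding box, HBB): bbox = [x_min, y_min, width, height]
coco_annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 1,               # "ship"
    "bbox": [512.0, 300.0, 38.0, 12.0],
    "area": 38.0 * 12.0,
    "iscrowd": 0,
}

# DOTA line (oriented bounding box, OBB): "x1 y1 x2 y2 x3 y3 x4 y4 category difficulty"
dota_line = "512.0 300.0 550.0 305.0 548.0 317.0 510.0 312.0 ship 0"
```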
For the xView dataset, which contains a variety of object classes including both ship and non-ship data, only instances classified as ‘maritime vessel’ were extracted for this study. Additionally, only images containing these instances were used. Since individual image tiles in the xView dataset have a high pixel count, they require significant computational resources for training. To address this, new tiles were created by cropping each image to a size of 900–1,100 pixels centered on ship instances, ensuring more efficient processing.
The original annotations in the xView dataset were provided in COCO format, including HBB and segmentation mask coordinates for ship objects. To convert the COCO format to the DOTA format, OBB coordinates were extracted from the segmentation mask using the minAreaRect function from OpenCV. As a result, a total of 196 image tiles and 4,395 ship instances were generated.
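A minimal sketch of this conversion is shown below, assuming the COCO segmentation entry is a flat polygon coordinate list; cv2.minAreaRect returns the minimum-area rotated rectangle and cv2.boxPoints yields its four corner points, which can then be written out as DOTA-style vertices.

```python
import numpy as np
import cv2

def segmentation_to_obb(segmentation):
    """Convert a COCO segmentation polygon [x1, y1, x2, y2, ...] into
    eight OBB corner coordinates (x1 y1 ... x4 y4) for DOTA-style labels."""
    pts = np.array(segmentation, dtype=np.float32).reshape(-1, 2)
    rect = cv2.minAreaRect(pts)          # ((cx, cy), (w, h), angle)
    corners = cv2.boxPoints(rect)        # 4 x 2 array of corner points
    return corners.reshape(-1).tolist()  # [x1, y1, x2, y2, x3, y3, x4, y4]

# Example with a rough ship-shaped polygon (illustrative coordinates)
obb = segmentation_to_obb([10, 5, 60, 8, 58, 20, 8, 17])
```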
The VHRShip dataset consists entirely of ship instances, and since the image pixel dimensions were suitable for model training, only annotation conversion was required. The dataset was originally annotated in Pascal Visual Object Classes (VOC) format, containing HBB coordinates. Conversion to the COCO format was performed using the HBB information, while conversion to the DOTA format involved direct OBB labeling for all images using the open-source labeling tool Label Studio. During this process, large ship instances, such as container ships, passenger ships, warships, and tankers, were manually excluded to focus on small ships. As a result, the final dataset comprised 2,785 images and 4,949 ship instances.

For the AI-Hub dataset, only instances classified as small ships or large ships, along with the images containing these instances, were extracted from the original 15 object classes. Since the annotations included OBB information, conversion to the DOTA format was straightforward. HBB coordinates were then derived from the OBB data by calculating the maximum and minimum x and y values, enabling conversion to the COCO format. This process yielded a dataset containing 390 images and 20,482 ship instances.
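The OBB-to-HBB derivation described above for the AI-Hub data is a simple bounding operation; a minimal sketch, assuming the OBB is given as four (x, y) corner pairs, follows.

```python
def obb_to_hbb(obb):
    """Derive a COCO-style HBB [x_min, y_min, width, height] from
    DOTA-style OBB corners [x1, y1, x2, y2, x3, y3, x4, y4]."""
    xs, ys = obb[0::2], obb[1::2]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return [x_min, y_min, x_max - x_min, y_max - y_min]

# Example with the OBB from the previous snippet
hbb = obb_to_hbb([10, 5, 60, 8, 58, 20, 8, 17])  # -> [8, 5, 52, 15]
```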
The new satellite data were processed by merging the Red, Green, and Blue band images from KOMPSAT-3 and KOMPSAT-3A to create RGB images, as shown in Fig. 1. The digital number values were linearly stretched to 8-bit, clipping the lowest and highest 2% of values. The resulting 8-bit images were then divided into 1,024 × 1,024 pixel tiles. OBB labeling was performed using Label Studio, and the annotations were exported in both COCO and DOTA formats.
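The following sketch illustrates these stretching and tiling steps, assuming the three KOMPSAT bands have already been read into NumPy arrays (for example with rasterio or GDAL); the 2% clipping and the 1,024-pixel tile size follow the description above.

```python
import numpy as np

def stretch_to_8bit(band, lower_pct=2, upper_pct=98):
    """Linearly stretch a digital-number band to 8-bit, clipping the
    lowest and highest 2% of values."""
    lo, hi = np.percentile(band, [lower_pct, upper_pct])
    scaled = np.clip((band.astype(np.float32) - lo) / (hi - lo), 0, 1)
    return (scaled * 255).astype(np.uint8)

def tile_image(rgb, tile_size=1024):
    """Split an H x W x 3 image into non-overlapping square tiles."""
    tiles = []
    h, w = rgb.shape[:2]
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(rgb[y:y + tile_size, x:x + tile_size])
    return tiles

# red, green, blue: 2-D digital-number arrays read from the KOMPSAT band files
# rgb = np.dstack([stretch_to_8bit(b) for b in (red, green, blue)])
# tiles = tile_image(rgb)
```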
As a result, a dataset comprising 838 image tiles and 7,394 ship instances was constructed from a total of 8 satellite images. Since the ship instances in the source datasets were labeled with different classes, all instances were uniformly assigned the class label “ship.” Consequently, a final dataset consisting of 4,209 high-resolution image tiles with spatial resolutions of 0.7 meters or better, containing a total of 37,550 ship instances, was constructed (Table 1).
Table 1 Summary of constructed datasets

Dataset | AI-Hub | xView | VHRShip | Ours
---|---|---|---|---
No. of images | 390 | 196 | 2,785 | 838
Spatial resolution (m) | 0.55, 0.7 | 0.3 | 0.43 | 0.55, 0.7
No. of instances | 20,482 | 4,395 | 4,949 | 7,394
Image format | PNG, TIF | | |
Annotation format | COCO, DOTA | | |
The constructed dataset was divided into train and validation datasets. Typically, train and validation datasets are split in a 7:3 or 8:2 ratio. However, in this study, as shown in Fig. 2, the train dataset was designed to include a variety of scenarios to allow the model to learn ships in diverse environments. These scenarios included numerous ships docked at ports, ships in complex backgrounds, and ships in open water without surrounding background interference. In contrast, the validation dataset was specifically composed of images simulating real search and rescue scenarios, focusing on isolated ships in remote areas far from land. As a result, the train dataset consisted of 2,773 image tiles out of a total of 4,209, containing 33,455 ship instances out of 37,550. The validation dataset comprised 1,436 tiles and 4,095 ship instances.
The deep learning models used for training and validating the constructed dataset included CNN-based object detection models and DETR models. The model selection process considered various factors such as detection performance, processing speed, and the ability to detect rotated objects, ensuring a diverse range of models with complementary characteristics.
CNN-based models are categorized into 1- and 2-stage models. In this study, four CNN-based models were selected: Faster R-CNN, RetinaNet (Lin et al., 2017), Oriented RepPoints (Li et al., 2022), and Oriented R-CNN (Xie et al., 2021). RetinaNet, a 1-stage model, was chosen for its high speed and reasonable performance, making it suitable for real-time detection scenarios. Faster R-CNN, a 2-stage model, was selected for its high detection accuracy, as it proposes regions of interest before performing precise detection. Oriented RepPoints and Oriented R-CNN were included for their specialization in detecting rotated objects, which is expected to improve the detection of small ship orientations. These models support the OBB format and were trained using the DOTA-format annotations.
The DETR model family was selected for its use of the Transformer architecture, which directly predicts objects and their positions in images without the need for complex anchor box configurations or non-maximum suppression, giving it a notably streamlined, fully end-to-end structure. Four DETR variants were chosen: DETR, Deformable DETR (Zhu et al., 2020), Dynamic DETR (Dai et al., 2021), and DETR with improved denoising anchor boxes for end-to-end object detection (DINO) (Zhang et al., 2022). DETR offers simplicity and strong performance, while Deformable DETR improves detection through a modified attention mechanism. Dynamic DETR enhances detection with dynamic attention, and DINO maximizes performance through improved denoising anchor boxes. These models support the HBB format and were trained using COCO-format annotations. All selected models were trained using open-source libraries provided by OpenMMLab. For CNN-based models, code from MMRotate was utilized (Zhou et al., 2022), while DETR models were trained using code from MMDetection (Chen et al., 2019).
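For illustration, a minimal MMDetection-style configuration for fine-tuning a DETR-family model such as DINO on the single “ship” class might look like the sketch below; the base config file name, data paths, and exact field layout are assumptions that vary with the MMDetection version and local directory structure, and are not taken from the study.

```python
# Hypothetical MMDetection-style config for training DINO on the ship dataset.
# Base config name and data paths are illustrative and version-dependent.
_base_ = ['dino-4scale_r50_8xb2-12e_coco.py']

metainfo = dict(classes=('ship',))          # single unified "ship" class

train_dataloader = dict(
    dataset=dict(
        metainfo=metainfo,
        data_root='data/ships/',
        ann_file='annotations/train_coco.json',
        data_prefix=dict(img='train_images/')))

val_dataloader = dict(
    dataset=dict(
        metainfo=metainfo,
        data_root='data/ships/',
        ann_file='annotations/val_coco.json',
        data_prefix=dict(img='val_images/')))

val_evaluator = dict(ann_file='data/ships/annotations/val_coco.json')

model = dict(bbox_head=dict(num_classes=1))  # one foreground class: ship
```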
The trained deep-learning models were validated using the constructed validation dataset. Overall model performance was evaluated using the average precision (AP) metric, which compares the models based on the Precision-Recall curve at various intersection over union (IoU) thresholds for the ship class.
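For clarity, the IoU used here measures the overlap between a predicted and a ground-truth box; a minimal sketch for horizontal boxes in [x_min, y_min, width, height] form is shown below.

```python
def iou_hbb(box_a, box_b):
    """IoU of two horizontal boxes given as [x_min, y_min, width, height]."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a match at a given threshold only if IoU >= threshold
print(iou_hbb([0, 0, 10, 10], [5, 5, 10, 10]))  # ~0.143
```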
To enable a more detailed comparison, the IoU threshold was fixed at 0.5, and the true positive (TP), false positive (FP), and false negative (FN) counts for each model were analyzed. TP represents correctly detected ships, FP refers to non-ship objects incorrectly identified as ships, and FN denotes ships that were not detected. This analysis was conducted separately for small ships and large ships. Small ships were defined based on the automatic identification system (AIS) installation standard for 300-ton ships, with reference to the Korea Maritime and Fisheries statistical yearbook (Ministry of Oceans and Fisheries, 2022). Specifically, ships with a length of less than 50 m and a width of less than 20 m were classified as small ships, while larger ships were classified as large ships. This classification was determined by multiplying the OBB dimensions by the image resolution and comparing them to the defined thresholds.
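A minimal sketch of this size-based classification is given below, assuming the OBB corners are given in pixels and the ground sample distance (GSD) of the image is known; the 50 m and 20 m thresholds follow the definition above, and the example coordinates are illustrative.

```python
import math

def classify_ship(obb, gsd_m):
    """Classify a ship as 'small' or 'large' from its OBB corners
    [x1, y1, ..., x4, y4] (pixels) and the image GSD (m/pixel)."""
    xs, ys = obb[0::2], obb[1::2]
    side1 = math.hypot(xs[1] - xs[0], ys[1] - ys[0]) * gsd_m
    side2 = math.hypot(xs[2] - xs[1], ys[2] - ys[1]) * gsd_m
    length, width = max(side1, side2), min(side1, side2)
    return 'small' if (length < 50 and width < 20) else 'large'

# Example: an OBB of roughly 20 x 6 pixels in a 0.5 m image, i.e. about 10 m x 3 m
print(classify_ship([100, 100, 120, 100, 120, 106, 100, 106], 0.5))  # 'small'
```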
By analyzing the models’ detection accuracy and performance variations according to ship size, a comprehensive evaluation of each model was conducted. Finally, the overall performance results were summarized. Using the validation dataset, the performances of Faster R-CNN, RetinaNet, Oriented RepPoints, Oriented R-CNN, DETR, Deformable DETR, Dynamic DETR, and DINO models were compared. As shown in Table 2, among the CNN-based models, Oriented R-CNN achieved the highest performance with an AP of 0.896. The 1-stage detectors, RetinaNet and Oriented RepPoints, yielded AP values of 0.858 and 0.862, respectively, offering faster detection speeds but relatively lower detection accuracy. In contrast, the 2-stage detectors, Faster R-CNN and Oriented R-CNN, recorded improved accuracy with AP values of 0.874 and 0.896, respectively, surpassing their 1-stage counterparts.
Table 2 Detection performance (AP) of the evaluated models on the validation dataset

Model | Type | AP
---|---|---
RetinaNet | CNN, 1-stage | 0.858
Oriented RepPoints | CNN, 1-stage | 0.862
Faster R-CNN | CNN, 2-stage | 0.874
Oriented R-CNN | CNN, 2-stage | 0.896
DETR | Transformer | 0.902
Deformable DETR | Transformer | 0.915
Dynamic DETR | Transformer | 0.909
DINO | Transformer | 0.934
The DETR-based models consistently outperformed their CNN counterparts. DETR achieved an AP of 0.902, while Deformable DETR and Dynamic DETR recorded AP values of 0.915 and 0.909, respectively. The DINO model demonstrated the highest performance, achieving an AP of 0.934, indicating that the use of the Transformer architecture significantly enhances object detection performance. To further compare model performance, the TP, FP, and FN values were analyzed for the Oriented R-CNN model, which achieved the highest performance among the CNN-based models, and the DINO model, the top-performing DETR-based model, at an IoU threshold of 0.5. The analysis revealed that the DINO model significantly reduced FP, leading to an overall improvement in performance (Fig. 3).
In small ship detection, the DINO model recorded 2,725 TPs, outperforming Oriented R-CNN, which had 2,561 TPs. Additionally, the DINO model showed significantly fewer FPs, with 270 compared to Oriented R-CNN’s 420, indicating a notably lower false detection rate. This highlights DINO’s higher accuracy in small ship detection and its substantial reduction in false detections. The number of FNs was also lower for DINO, with 418 compared to Oriented R-CNN’s 582, demonstrating fewer missed detections. For large ship detection, the performance gap between the two models was relatively smaller. The DINO model recorded 21 FPs and 8 FNs, slightly outperforming Oriented R-CNN in both metrics. In terms of TPs, DINO achieved 939, while Oriented R-CNN recorded 944, showing comparable performance levels.
The qualitative evaluation was conducted through visual inspection of the detection results. Oriented R-CNN exhibited decreased performance in scenarios involving ship wakes caused by ship movement (Fig. 4a). In contrast, the DINO model maintained stable performance under these conditions. Additionally, in complex backgrounds containing islands or coastal areas (Fig. 4b), Oriented R-CNN showed a higher rate of FP, whereas the DINO model effectively reduced false detections. Notably, in cases where ships appeared as small objects with few pixels in the image (Fig. 4c), the DINO model demonstrated high detection accuracy, highlighting its strength in small object detection. These results indicate that the DINO model provides greater reliability across various maritime environments.
This study selected a demonstration site located 1 km west of Bieung Port in Gunsan, Jeollabuk-do (P1, 35°56′10.79″N, 126°30′51.66″E). Experiments were conducted twice, on May 10 and September 10, 2024. The experiment simulated a scenario where a small fishing boat loses power due to mechanical failure and drifts. During the tests, the boat was set adrift at the designated location, and its position was periodically recorded using global positioning system (GPS) equipment (Fig. 5).
The satellite imagery used for the detection experiments was captured by BlackSky satellites, which provide high-resolution images with a spatial resolution of approximately 1 meter (Table 3). The BlackSky constellation features a rapid revisit capability, allowing up to 15 observations of the same area per day. In this study, to enhance the precision of small ship detection, super-resolution processing was applied via the SpectraAI platform operated by BlackSky, resulting in images with a final spatial resolution of approximately 0.5 meters.

Table 3 Specifications of the BlackSky satellites

Item | Specification
---|---
Spatial resolution | 1 m (Super resolution: 0.5 m)
Observation swath | 4 km × 6 km
Spectral resolution | Red, Green, Blue
Radiometric resolution | 12-bit
The target object for detection was a 2-ton fishing boat with a length of 8.19 meters and a width of 2.1 meters, equipped with GPS devices for experimental purposes.
The experimental scenario assumed P1 as the incident site, simulating a situation where a ship loses power and drifts in the sea. To track the drifting ship’s location in real-time, new image acquisitions were requested from the BlackSky satellite. The captured images were processed into 8-bit image tiles following the steps shown in Figs. 1(a, b). The generated image tiles were then input into the trained deep-learning models to detect the ship. The detected ship locations were compared with the actual positions recorded by GPS to evaluate detection accuracy and positional precision.
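The positional error reported in the following experiments is the ground distance between the GPS-logged position and the geolocated detection; a minimal sketch using the haversine formula is shown below, assuming the detected ship's pixel position has already been converted to geographic coordinates through the image georeferencing (the coordinates in the example are hypothetical).

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical coordinates only: GPS-logged position vs. detected ship position
error_m = haversine_m(35.9363, 126.5144, 35.9368, 126.5149)
```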
The first and second experiments were conducted at the P1 site on May 10 and September 10, 2024, respectively. BlackSky satellite images were acquired for the P1 site, with imagery captured on May 10, 2024, at 08:46:40 and on September 10, 2024, at 10:06.
In the first experiment, the satellite imagery was analyzed with a focus on the P1 site. As shown in Fig. 6, a small drifting object was identified. Additionally, three civilian ships in operation were also captured in the imagery.
Applying the DINO model, three out of the four ships present in the imagery were successfully detected, including the target ship used in the experiment. As shown in Fig. 7, the detected target ship was located 81.23 meters northeast of the P1 site. This position exhibited an error of approximately 64.23 meters compared to the GPS location recorded at 08:43 on the same day.
The second experiment captured a total of six ships at sea, excluding those docked at the harbor. Analysis of the imagery focused on the P1 site revealed that the target ship was located near the P1 site (Fig. 8). Applying the DINO model, four out of the six ships were successfully detected, including the target ship. For the detected target ship and another nearby civilian ship, ground-based imagery captured at Bieung Port at 10:07:04, approximately one minute after the satellite image, confirmed that these were the same objects (Fig. 9a). The detected target ship was located 286 meters south of the P1 site, with a positional error of approximately 54.89 meters compared to the GPS location recorded at 10:05 (Fig. 9b).
This study compared the performance of deep learning models for small ship detection and evaluated their applicability in real search and rescue scenarios using microsatellite constellations. The DETR-based models demonstrated superior performance overall, with the DINO model achieving the highest AP of 0.934. This result indicates that the Transformer-based DINO model effectively reduces both FP and FN, enabling fast and accurate detection of small ships in real search and rescue situations.
The ship detection experiments using microsatellite constellations highlighted the potential for near-real-time maritime surveillance via BlackSky satellites. In the first experiment, the time gap between the assumed incident and the satellite capture was approximately 16 minutes, while in the second experiment, it was about 66 minutes. This rapid revisit capability demonstrates that microsatellite constellations can provide timely responses in urgent search and rescue scenarios.
The positional errors between the satellite-detected ship locations and GPS-recorded positions were approximately 64.23 meters in the first experiment and 54.89 meters in the second. These discrepancies are attributed to factors such as the time difference between GPS logging and satellite imaging, as well as the georeferencing accuracy of satellite images. However, in emergencies, spending excessive time on precise georeferencing may not be practical. Given the maritime context, such positional errors are acceptable, as rapid results are more critical. By considering these location errors in search planning, more efficient and accurate rescue operations can be conducted.
This study constructed a high-resolution training dataset for small ship detection and validated the detection performance of various deep learning models. Among the DETR-based models, the DINO model demonstrated the highest accuracy in small ship detection, confirming its practical applicability. Detection experiments using microsatellite constellations highlighted their potential to improve the efficiency of maritime surveillance and search and rescue operations through rapid revisit cycles and high spatial resolution. In doing so, the research explored the feasibility of automated ship detection technology using microsatellite constellations and presented its potential application in real-world operations. Additionally, the relationship between the quality of training data and model performance was analyzed, underscoring the importance of high-quality data.
While ships were defined as a single class for model training, future research will focus on subdividing ship classes based on size, shape, and purpose to further improve the generalization ability of the model. In addition, efforts will be made to expand the dataset to include diverse maritime conditions and periods and to refine positional error correction to enhance the stability and accuracy of the detection system. These advancements are expected to significantly improve the speed and precision of maritime incident response and contribute to more efficient search and rescue operations.
This research was supported by the Korea Institute of Marine Science & Technology Promotion (KIMST), funded by the Ministry of Oceans and Fisheries, Korea (RS-2022-KS221629).
No potential conflict of interest relevant to this article was reported.