Research Article


Korean J. Remote Sens. 2024; 40(6): 1219-1227

Published online: December 31, 2024

https://doi.org/10.7780/kjrs.2024.40.6.1.27

© Korean Society of Remote Sensing

Enhanced Vehicle Detection and Segmentation Using the SAMRS Model: Applications in High-Resolution Satellite Imagery

Jihyun Lee1, Taeyeon Won2, Kwangseob Kim3, Jinwoo Kim4, Seungchul Lee5*

1Researcher, Satellite Application Team, Stellarvision Inc., Seoul, Republic of Korea
2Senior Researcher, Satellite Application Team, Stellarvision Inc., Seoul, Republic of Korea
3Assistant Professor, Department of Computer Software, Kyungmin University, Uijeongbu, Republic of Korea
4Chief Researcher, Satellite System Research Center, LIG Nex1, Yongin, Republic of Korea
5CEO, Stellarvision Inc., Seoul, Republic of Korea

Correspondence to : Seungchul Lee
E-mail: leesc@stellarvision.kr

Received: November 22, 2024; Revised: December 10, 2024; Accepted: December 19, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Deep learning technologies have revolutionized image processing and analysis, introducing groundbreaking innovations that significantly improve the accuracy and efficiency of object segmentation, especially in satellite imagery. The increasing availability of high-resolution satellite images has created a demand for advanced models capable of handling the complexities of object detection in diverse environments. This study investigates the potential of the Segment Anything Model for Remote Sensing (SAMRS), a deep learning framework specifically designed for remote sensing applications, to accurately identify and segment a wide range of objects within satellite imagery. The model was trained using prominent datasets such as Dataset for Object Detection in Aerial Images (DOTA), Dataset for Object Detection in Optical Remote Sensing Images (DIOR), Fine-grained Object Detection in Aerial Images for Remote Sensing Version 2.0 (FAIR1M-2.0), and Instance Segmentation in Aerial Images Dataset (iSAID), enabling it to learn diverse object features and complexities. The evaluation of SAMRS was conducted on Northwestern Polytechnical University Very High Resolution 10-Class Dataset (NWPU VHR-10) and Beijing-3B datasets, where it demonstrated impressive results. In vehicle detection tasks, SAMRS achieved an Intersection over Union (IoU) of 0.9175, an F1-score of 0.9570, and an accuracy of 0.9385. These metrics highlight SAMRS’s capability to automate object detection in complex satellite images, overcoming challenges posed by intricate backgrounds and diverse object sizes. Furthermore, SAMRS is optimized to analyze both large and small-scale objects, ensuring robust performance across varying conditions. The findings emphasize the model’s utility not only for current remote sensing applications but also for future extensions involving drone imagery and domestic satellite datasets. By automating object detection and segmentation, SAMRS has the potential to transform practical fields such as urban planning, disaster management, traffic monitoring, and environmental analysis, making it a vital tool in advancing satellite imagery analysis.

Keywords: Remote sensing, Deep learning, SAMRS, SAM, Segmentation

1. Introduction

Advancements in Earth observation technology have dramatically increased the volume of satellite imagery, which is essential for agriculture, urban planning, and environmental protection. However, much of this vast data remains unlabeled, particularly at the pixel level, which is crucial for object segmentation and analysis. Manual annotation is labor-intensive, demanding highly specialized knowledge and significant time, as highlighted by previous studies (Pritt and Chern, 2017; Tehsin et al., 2023). Additionally, remote sensing images encompass diverse resolutions and spectral bands, requiring advanced segmentation techniques for accurate analysis. These challenges highlight the limitations of relying solely on human interpretation, leaving much satellite imagery unlabeled and restricting its potential applications in fields such as agriculture, urban planning, and environmental monitoring (Swapna et al., 2023).

To address these issues, automated labeling and object segmentation technologies based on artificial intelligence, particularly deep learning, have recently gained traction. Kang et al. (2022) demonstrated the effectiveness of deep learning-based image-matching techniques for multi-sensor high-resolution satellite imagery, enabling automated labeling and object recognition across various resolutions and spectral bands. Deep learning models excel at learning patterns from vast satellite imagery datasets, enabling automated labeling, object recognition, and segmentation with exceptional performance (Uzma et al., 2024). Object segmentation, in particular, facilitates precise pixel-level boundary detection and analysis across various resolutions and spectral bands, allowing for detailed interpretation even in complex satellite images (Yang and Tang, 2021).

Several studies have highlighted the potential of deep learning-based segmentation techniques. For example, Kim and Lee (2024) demonstrated that combining segmentation and upscaling significantly improves object detection in satellite imagery. Song et al. (2019) applied transfer learning and change detection networks to overcome the challenges posed by insufficient training data, enabling effective change detection in high-resolution satellite imagery. Lee and Lee (2020) optimized deep learning methods for land cover classification, emphasizing the evolution of classification techniques from supervised and unsupervised methods to advanced machine learning and deep learning approaches. Shin et al. (2021) proposed a Residual U-Net model for semantic segmentation, which reduces feature loss and efficiently extracts contextual information. More recently, Yun and Kwak (2023) improved the segmentation performance for small objects in satellite imagery by integrating ESRGAN with Semantic Soft Segmentation, demonstrating the effectiveness of super-resolution techniques in object segmentation tasks.

Building upon these advancements, this study explores the potential of the Segment Anything Model for Remote Sensing (SAMRS), a deep learning model specialized for remote sensing, to automate object segmentation. SAMRS is an extended framework of the Segment Anything Model (SAM) tailored to remote sensing datasets, aiming for accurate and efficient analysis of satellite imagery. By leveraging SAM’s inherent advantages, SAMRS is optimized to handle the diverse spectral properties of remote sensing data effectively. SAMRS also retains the zero-shot segmentation capability of SAM, which allows the model to generalize segmentation tasks without requiring retraining on specific datasets. This enables SAMRS to perform robust object segmentation across different regions and scenes, significantly reducing the need for labor-intensive manual annotations. By incorporating this capability, SAMRS not only streamlines segmentation tasks but also enhances scalability, making it suitable for a wide range of applications.

In particular, this study introduces significant advancements in SAMRS through fine-tuning for pixel-level segmentation, focusing on vehicle detection as a case study. By fine-tuning SAMRS, we successfully segmented small and large vehicles, addressing critical challenges in object differentiation. Moreover, we employed the state-of-the-art Swin Transformer architecture to enhance segmentation performance, particularly in identifying and separating complex object categories. These contributions represent a clear differentiation from prior studies, which primarily focused on broader object recognition without evaluating such granular segmentation tasks.

To contextualize our approach, we build upon the insights from previous studies, integrating their findings into the design of SAMRS while addressing the limitations they identified. This research not only demonstrates the advanced segmentation capabilities of SAMRS but also highlights its application in diverse fields such as agriculture, environmental protection, urban planning, and disaster management. Automating detailed object segmentation tasks through SAMRS offers experts opportunities to focus on higher-level analyses, enhancing productivity and insights.

Additionally, this study provides a comprehensive evaluation of SAMRS across both training and evaluation datasets. Prominent datasets such as Dataset for Object Detection in Aerial Images (DOTA), Dataset for Object Detection in Optical Remote Sensing Images (DIOR), Fine-grained Object Detection in Aerial Images for Remote Sensing Version 2.0 (FAIR1M-2.0), and Instance Segmentation in Aerial Images Dataset (iSAID) were used for training to ensure robust performance across diverse scenarios. Independent evaluation datasets, including Northwestern Polytechnical University Very High Resolution 10-Class Dataset (NWPU VHR-10) and Beijing-3B, were utilized to validate the model’s segmentation and classification capabilities in real-world environments. By addressing gaps in manual segmentation, SAMRS offers a scalable and innovative solution for complex satellite data analysis.

2. Materials and Methods

2.1. Methodology

The United States, China, Russia, and other countries with a significant number of satellites collectively observe the Earth, generating an enormous amount of satellite imagery. As the volume of satellite data grows, meaningful segmentation of these images has become essential for effective use in various fields. High-resolution satellite imagery, in particular, requires sophisticated analysis for practical applications such as agriculture, urban planning, environmental monitoring, and defense.

In South Korea, the expansion of data centers reflects the continuous growth of infrastructure for storing, managing, and processing satellite data. As shown in Fig. 1, the number of data centers in South Korea increased from 53 in 2000 to 156 in 2020, according to the Korea Data Center Council (KDCC), and is projected to reach 180 by 2025. This expansion underscores the increasing need for technologies that enable the meaningful segmentation and analysis of vast satellite data.

Fig. 1. Trends in the growth of domestic data centers.

This study aims to develop effective methods for segmenting large volumes of satellite imagery for practical applications across various domains. Deep learning-based object segmentation technologies enable high-precision segmentation of diverse objects within satellite images, facilitating detailed analysis and maximizing the value of satellite data. Fig. 2 illustrates the processing workflow of the SAMRS model utilized in this study.

Fig. 2. Flowchart of the SAMRS framework: Integration of multi-dataset segmentation heads (DOTA, DIOR, and FAIR1M-2.0) with backbone network and loss aggregation mechanism.

SAMRS, developed by Wang et al. (2023), is an object segmentation model tailored for remote sensing. The model extends the capabilities of the original SAM framework to accommodate the unique characteristics of remote sensing datasets. By doing so, SAMRS enables the generation of large-scale segmentation datasets, improving the efficiency and accuracy of remote sensing tasks. A key strength of SAMRS is its ability to identify and segment the boundaries and shapes of various objects in complex satellite imagery, overcoming the limitations of manual labeling. Moreover, SAMRS incorporates the zero-shot segmentation capabilities of the original SAM, allowing it to segment previously unseen object classes without requiring additional training. This functionality significantly enhances the model’s adaptability to diverse datasets and reduces the reliance on manually annotated data.
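
To make this promptable, zero-shot workflow concrete, the sketch below uses the publicly released segment-anything package to obtain a mask from a single point prompt. The checkpoint file, image path, and prompt coordinates are illustrative placeholders, not the configuration used in this study.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM backbone (ViT-H variant); the checkpoint path is a placeholder.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Embed the scene once; point or box prompts can then be issued cheaply.
image = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point prompt (label 1) placed on an object of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with quality scores
)
best_mask = masks[scores.argmax()]  # keep the highest-scoring candidate
```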

The SAMRS framework consists of three primary components: the backbone network, the segmentation decoder, and the loss aggregation mechanism. The backbone network extracts robust features from input satellite imagery, which are then processed by the segmentation decoder to generate pixel-level segmentation outputs. In this study, the Swin Transformer was employed as the backbone network due to its hierarchical structure and efficient window-based attention mechanism. These features enable the Swin Transformer to process high-resolution imagery efficiently while minimizing memory usage. The segmentation decoder utilizes multiple independent segmentation heads, each tailored to a specific dataset such as DOTA, DIOR, or FAIR1M-2.0, ensuring precise segmentation across diverse datasets. Finally, the loss aggregation mechanism combines the dataset-specific segmentation losses into a single training objective. This multi-head pre-training strategy ensures that SAMRS can effectively learn from diverse datasets while maintaining high segmentation accuracy.
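
A minimal PyTorch sketch of this three-part design follows: a shared backbone feeds per-dataset segmentation heads, and the dataset-specific losses are summed before the backward pass. The module names, head shapes, and plain cross-entropy loss are our assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MultiHeadSegmenter(nn.Module):
    """Shared backbone with one segmentation head per pre-training dataset,
    mirroring the SAMRS multi-head strategy (shapes are illustrative)."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: dict):
        super().__init__()
        self.backbone = backbone  # e.g., a Swin Transformer feature extractor
        self.heads = nn.ModuleDict({
            name: nn.Conv2d(feat_dim, n, kernel_size=1)  # one 1x1 head per dataset
            for name, n in num_classes.items()
        })

    def forward(self, x: torch.Tensor, dataset: str) -> torch.Tensor:
        feats = self.backbone(x)           # dense features, (B, feat_dim, H', W')
        return self.heads[dataset](feats)  # logits for that dataset's classes

def aggregated_loss(model: MultiHeadSegmenter, batches: dict) -> torch.Tensor:
    """Route each batch to its dataset's head and sum the per-dataset
    segmentation losses (plain cross-entropy here, as an assumption)."""
    ce = nn.CrossEntropyLoss(ignore_index=255)
    total = torch.zeros(())
    for name, (images, masks) in batches.items():  # e.g., "SOTA", "SIOR", "FAST"
        logits = model(images, dataset=name)
        total = total + ce(logits, masks)
    return total
```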

To further optimize SAMRS performance, the model was fine-tuned on large-scale remote sensing datasets such as iSAID, which includes pixel-level annotations for a wide range of object classes. Fine-tuning allows SAMRS to adapt its segmentation capabilities to the specific requirements of remote sensing tasks, such as the segmentation of small, complex, or irregularly shaped objects. Data augmentation techniques such as random scaling, horizontal and vertical flipping, and color jittering were employed during training to enhance the model’s robustness and generalization capabilities.
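
One way to realize the augmentations named above is sketched below with torchvision. The probability and parameter ranges are illustrative; note that the geometric transforms (scaling and flips) must be applied identically to the image and its label mask, while color jitter perturbs the image only.

```python
import random
import torch
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def augment(image: torch.Tensor, mask: torch.Tensor):
    """Joint augmentation for a (C, H, W) image tensor and an (H, W) label mask:
    random scaling, horizontal/vertical flips, and color jitter.
    Ranges are illustrative, not the authors' exact settings."""
    # Random scaling, applied identically to image and mask.
    scale = random.uniform(0.75, 1.25)
    h, w = image.shape[-2:]
    size = [int(h * scale), int(w * scale)]
    image = TF.resize(image, size)
    mask = TF.resize(mask.unsqueeze(0), size,
                     interpolation=InterpolationMode.NEAREST).squeeze(0)

    # Horizontal and vertical flips, each with 50% probability.
    if random.random() < 0.5:
        image, mask = TF.hflip(image), TF.hflip(mask)
    if random.random() < 0.5:
        image, mask = TF.vflip(image), TF.vflip(mask)

    # Color jitter touches the image only; labels are unchanged.
    image = TF.adjust_brightness(image, random.uniform(0.8, 1.2))
    image = TF.adjust_contrast(image, random.uniform(0.8, 1.2))
    return image, mask
```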

The integration of the Swin Transformer into the SAMRS framework played a critical role in achieving superior segmentation performance. The Swin Transformer’s window-based attention mechanism efficiently processes local patches of high-resolution images, while its hierarchical structure facilitates multi-scale feature learning. These attributes enable SAMRS to capture both fine-grained details and global context, making it highly effective for analyzing complex satellite imagery.
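
The core of the window-based mechanism is the window-partition step, shown below in simplified form after the public Swin Transformer reference code: self-attention is computed within each fixed-size window, so the cost grows linearly with image area rather than quadratically, which is what makes high-resolution satellite scenes tractable.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping windows of shape
    (num_windows * B, window_size, window_size, C). Attention is then computed
    independently inside each window."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)

# A small map keeps the shapes easy to inspect; Swin's default is 7x7 windows.
feat = torch.randn(1, 8, 8, 96)                 # (B, H, W, C)
windows = window_partition(feat, window_size=4)
print(windows.shape)                            # torch.Size([4, 4, 4, 96])
```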

By leveraging the capabilities of SAMRS, this study achieved significant improvements in object segmentation performance across diverse remote sensing datasets. The model demonstrated the ability to accurately segment various object classes, including small cars and large cars, in high-resolution imagery. These advancements highlight the potential of SAMRS to enhance the accuracy and efficiency of satellite image analysis, offering automated object detection and segmentation capabilities for applications in agriculture, urban planning, environmental monitoring, and defense. By addressing the challenges of manual labeling and segmentation for large-scale datasets, SAMRS represents a significant step forward in the field of remote sensing.

2.2. Study Data

In this study, the performance of the SAMRS model was evaluated using multiple remote sensing datasets, summarized in Table 1. Among these, iSAID was specifically utilized for fine-tuning the SAMRS model, reflecting its critical role in refining the model’s segmentation capabilities. The primary datasets include DOTA (Satellite Object deTection in Aerial images [SOTA]), DIOR (Satellite Imagery Object Recognition [SIOR]), FAIR1M-2.0 (Fine-grAined Segmentation for high-resolution remoTe sensing imagery [FAST]), and iSAID, each tailored to specific object types, resolutions, and use cases. These datasets were chosen to ensure diverse training scenarios and robust performance validation for SAMRS.

Table 1. Datasets used for training the SAMRS model

Dataset | Images | Category | Channels | Resolution | Image size
SOTA | 17,480 | 18 | RGB | 0.3–1.0 m | 1,024×1,024
SIOR | 23,463 | 20 | RGB | 0.5–30.0 m | 800×800
FAST | 64,147 | 37 | RGB | 0.3–1.0 m | 600×600
iSAID | 2,806 | 15 | RGB | 0.3–1.0 m | 800×800–13,000×4,000


DOTA (SOTA) contains 18 object classes (see Table 1) and focuses on detecting and segmenting various objects such as buildings, aircraft, ships, and vehicles in complex environments. Specifically, DOTA includes objects at various angles and scales, making it well suited for evaluating model versatility and robustness. This dataset plays a critical role in identifying and analyzing diverse objects in remote sensing images.

DIOR (SIOR) comprises 20 object classes and is a large-scale remote sensing dataset constructed to reflect varying weather conditions, seasons, and optical variations. It includes key objects such as buildings, ships, aircraft, and bridges, and focuses on evaluating high-resolution object detection and segmentation performance. DIOR is valuable for validating model robustness to environmental changes.

FAIR1M-2.0 (FAST) is designed for military applications and emphasizes the precise detection of objects of various sizes. This dataset plays a vital role in assessing the accuracy and adaptability of SAMRS for military and industrial applications. FAST is particularly useful for object detection and analysis in military scenarios.

iSAID is a large-scale remote sensing dataset designed for pixel-level object segmentation, consisting of 2,806 images and 655,451 object instances. iSAID supports complex segmentation tasks, including vehicle detection and size classification into large and small vehicles. It excels in accurately distinguishing vehicles of various sizes and shapes, making it an essential resource for applications such as traffic monitoring, traffic analysis, urban planning, and logistics management. Differentiating vehicle size and shape in remote sensing images is crucial for detecting small objects in complex backgrounds, and the iSAID dataset serves as an important benchmark for evaluating these capabilities.

Together, these datasets enable a comprehensive evaluation of the SAMRS model’s performance across diverse characteristics. Their scale also sets them apart from other datasets: while UAVid (Unmanned Aerial Vehicle ID) contains 420 images and DeepGlobe Land Cover includes 1,146 images, SAMRS provides a total of 105,090 images, enabling large-scale training and contributing to improved generalization performance of the model.

Additionally, two datasets, NWPU VHR-10, and Beijing-3B, were utilized as independent evaluation datasets to validate the model’s segmentation and classification performance in diverse scenarios. Table 2 summarizes the key characteristics of the NWPU VHR-10 and Beijing-3B datasets, providing an overview of their resolution, categories, and notable features used in the evaluation.

Table 2. Datasets used for performance evaluation

Dataset | Resolution | Category/Class | Feature
NWPU VHR-10 | 0.08 m, 0.5–2.0 m | 10 classes | Multi-class dataset including vehicles, buildings, ships, aircraft, and other key ground objects
Beijing-3B | 0.3 m | Small vehicles / Large vehicles | Real-world imagery including complex urban environments and diverse objects


In this study, the segmentation and object detection performance of the proposed model was evaluated using the NWPU VHR-10 dataset, which was not utilized during the training process. The NWPU VHR-10 dataset, with 800 high-resolution images, effectively validates the model’s performance across diverse objects and complex environments.

The dataset contains two main types of images. The 715 optical images, obtained from Google Earth, have spatial resolutions ranging from 0.5 m to 2.0 m and encompass diverse environments, making them well suited for assessing the model’s generalization ability and robustness. The remaining 85 pan-sharpened color infrared images provide an extremely fine spatial resolution of 0.08 m, offering conditions highly favorable for detecting detailed object boundaries and shapes.

For further quantitative evaluation, the model’s pixel-level vehicle detection performance was assessed. The dataset utilized for this evaluation was collected by the Beijing-3B satellite, which has a spatial resolution of 0.3 m. Table 3 outlines the specifications of the Beijing-3B satellite. In addition to its spatial resolution, the satellite’s advanced orbital design and rapid revisit cycle allow it to collect detailed data over large areas with exceptional precision. This capability is crucial for applications such as real-time traffic monitoring, logistics network optimization, and urban traffic flow analysis. For instance, the proportion of small vehicles in a given traffic scenario or the movement patterns of large vehicles can be analyzed to alleviate traffic congestion or inform urban planning decisions.

Table 3. Specifications of the Beijing-3B satellite

Specification | Details
Spatial resolution | 0.3 m
Spectral bands | Panchromatic, Multispectral
Revisit time | 1–2 days

3. Results and Analysis

3.1. Segmentation Results on NWPU VHR-10

Fig. 3 illustrates the object detection and segmentation results using the NWPU VHR-10 dataset. The model successfully detected and sharply delineated the boundaries of various objects such as airplanes, ships, storage tanks, bridges, and small vehicles, even within complex backgrounds. As shown in Fig. 3, the results include (a) airplane detection, (b) storage tanks and ships, (c) bridge identification, and (d) vehicle delineation. Notably, the proposed model demonstrated consistently high segmentation performance for both large and small objects. This includes accurately separating small objects, such as compact vehicles, that are often interspersed within cluttered backgrounds. These results validate the model’s robustness for object detection and segmentation in high-resolution remote sensing images, showcasing its ability to maintain strong performance under varying conditions.

Fig. 3. Segmentation results on NWPU VHR-10 using the SAMRS model: (a) airplane detection, (b) storage tanks and ships, (c) bridge identification, and (d) vehicle delineation.

It should be noted that the NWPU VHR-10 dataset was not used for training but served as an independent evaluation dataset to validate the SAMRS model’s segmentation capabilities on unseen data. The objective of this section is to provide qualitative evidence of the model’s robustness, as shown in Fig. 3. Therefore, quantitative evaluations were not conducted here and are instead presented in Section 3.2 using the Beijing-3B dataset.

The NWPU VHR-10 dataset encompasses diverse background conditions, including mountainous regions, urban areas, and marine environments. The model successfully detected and segmented objects in these complex settings, demonstrating its versatility and potential applicability in various remote sensing scenarios. Particularly, the model’s ability to accurately distinguish object shapes and boundaries in high-resolution imagery highlights its potential utility in applications such as traffic monitoring, logistics management, urban planning, and defense.

The experimental results using the NWPU VHR-10 dataset confirm that the proposed model can accurately detect small objects even in complex backgrounds of remote sensing imagery. The demonstrated robustness across various resolutions and conditions suggests that this methodology can be effectively applied to real-world remote sensing tasks, offering significant potential for practical applications in a wide range of fields.

3.2. Segmentation Results on Beijing-3B

This study utilized the Beijing-3B satellite to perform vehicle classification based on remote sensing imagery. Fig. 4 presents the object detection results using the SAMRS model on Beijing-3B satellite imagery. Beijing-3B, a state-of-the-art remote sensing satellite, provides an ultra-high spatial resolution of 0.3 m, enabling precise detection and classification of vehicle sizes and shapes even in complex environments such as crowded intersections, multi-lane roads, and dense urban areas.

Fig. 4. Segmentation results on Beijing-3B satellite imagery using SAMRS Model: (a) original satellite image, (b) segmentation mask, and (c) overlay of segmentation mask on the original image.

The high-resolution imagery from Beijing-3B enables not only the identification of vehicle presence but also the accurate differentiation of vehicle size and shape. This allows for clear classification of large vehicles (e.g., trucks and buses) and small vehicles (e.g., passenger cars and compact SUVs) while sharply detecting vehicle boundaries and fine structural details. Notably, the resolution of the Beijing-3B satellite offers significant advantages over conventional satellites, particularly in distinguishing individual vehicles in congested traffic areas or complex intersections with closely packed vehicles.

The vehicle classification task based on Beijing-3B satellite data recorded exceptionally high performance, even in complex environments, in distinguishing between large and small vehicles. The ultra-high resolution of 0.3 m enables the model to discern not only the vehicle’s outer contours but also detailed structural features, opening new possibilities for traffic monitoring using remote sensing technology.
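
As a worked example of what 0.3 m resolution implies for size classification: each pixel covers 0.3 m × 0.3 m = 0.09 m², so a vehicle instance’s ground footprint follows directly from its mask area. The sketch below labels connected components in a binary vehicle mask and thresholds their footprint; the 18 m² cutoff (roughly separating passenger cars from trucks and buses) is an illustrative choice, not a value reported in this paper.

```python
import numpy as np
from scipy import ndimage

def classify_vehicle_sizes(vehicle_mask: np.ndarray, gsd_m: float = 0.3,
                           large_thresh_m2: float = 18.0):
    """Label connected vehicle regions in a binary mask and classify each as
    'small' or 'large' by ground footprint (threshold is illustrative)."""
    labeled, n = ndimage.label(vehicle_mask)   # one integer label per instance
    pixel_area_m2 = gsd_m ** 2                 # 0.09 m^2 per pixel at 0.3 m GSD
    results = []
    for i in range(1, n + 1):
        area_m2 = float((labeled == i).sum()) * pixel_area_m2
        results.append(("large" if area_m2 >= large_thresh_m2 else "small", area_m2))
    return results
```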

3.3. Vehicle Identification Results

To quantitatively evaluate the performance of the SAMRS model, the F1-score, Accuracy, and IoU were analyzed using the Beijing-3B dataset. The F1-score, calculated from Precision and Recall, was determined using Eqs. (1)–(3). Precision represents the proportion of predicted positives that are actually positive, while Recall indicates the proportion of actual positives that are correctly identified. The F1-score is the harmonic mean of Precision and Recall, providing a balanced measure of the model’s performance.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

Additionally, Accuracy and IoU metrics were calculated to further evaluate the performance of the SAMRS model. Accuracy represents the proportion of correctly predicted instances out of all predictions, and it was computed using Eq. (4). IoU, measuring the overlap between the predicted region and the actual region, was calculated using Eq. (5).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\text{IoU} = \frac{TP}{TP + FP + FN} \tag{5}$$
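
Eqs. (1)–(5) translate directly into a pixel-level evaluation routine. A minimal sketch over binary prediction and ground-truth masks follows; it assumes both masks are nonempty so the denominators are nonzero.

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Compute Eqs. (1)-(5) from binary prediction/ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # true positives
    fp = np.sum(pred & ~gt)   # false positives
    fn = np.sum(~pred & gt)   # false negatives
    tn = np.sum(~pred & ~gt)  # true negatives

    precision = tp / (tp + fp)                          # Eq. (1)
    recall = tp / (tp + fn)                             # Eq. (2)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (4)
    iou = tp / (tp + fp + fn)                           # Eq. (5)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "iou": iou}
```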

Table 4 summarizes the performance of the SAMRS model in the vehicle detection experiment. The model achieved high metrics for F1-score (0.9570), Accuracy (0.9385), and IoU (0.9175), demonstrating its exceptional performance in vehicle detection tasks. These results validate the model’s efficiency in remote sensing object detection, establishing its effectiveness in handling such complex tasks.

Table 4. Performance metrics for vehicle detection using the SAMRS model

Performance metric | Value
F1-score | 0.9570
Accuracy | 0.9385
IoU | 0.9175


This study highlights the strong potential of the SAMRS model to deliver high efficiency and accuracy across diverse remote sensing datasets. Furthermore, it underscores the model’s ability to perform automated object detection in complex satellite imagery, paving the way for broader applications in remote sensing analysis.

4. Discussion

The performance of the SAMRS model can be further contextualized by comparing it to existing studies on object detection in satellite imagery. Groener et al. (2019) demonstrated that single-stage models like RetinaNet excel at detecting large objects with faster inference, while two-stage models such as Faster Region-based Convolutional Neural Network (R-CNN) and Cascade R-CNN achieve higher accuracy for small objects at the cost of slower prediction speeds.

Similarly, Pflugfelder et al. (2022) addressed the challenges of detecting small vehicles (4–10 pixels) in satellite video, proposing a spatiotemporal model that achieved an F1-score of 0.87 by leveraging temporal consistency. However, their approach requires video data, which may not always be available in static satellite imagery.

In comparison, the SAMRS model overcomes these limitations by achieving accurate detection for both large and small objects using single satellite images. By balancing the speed of single-stage models with the accuracy of two-stage models, SAMRS demonstrates superior performance and versatility across varying object sizes and complex environments.

5. Conclusions

In this study, the performance of object detection and vehicle segmentation was evaluated using the NWPU VHR-10 dataset and Beijing-3B optical imagery, which were not utilized during the training process. Experimental results with the NWPU VHR-10 dataset demonstrated that various objects, such as airplanes, ships, and small vehicles, were accurately segmented in high-resolution remote sensing images, even within complex backgrounds. Additionally, vehicle segmentation results using Beijing-3B’s 0.3 m ultra-high-resolution optical imagery confirmed that both large vehicles (e.g., trucks and buses) and small vehicles (e.g., passenger cars and compact SUVs) could be accurately classified even in complex intersections and road environments.

These findings validate the SAMRS model as a highly accurate and efficient solution for object detection and segmentation tasks in remote sensing. Beyond precise object segmentation, the model’s application potential extends significantly to real-world fields such as traffic monitoring, urban planning, and logistics management. The consistent performance of the SAMRS model across diverse and complex conditions demonstrates its feasibility as an efficient solution for remote sensing-based analysis.

Future research will focus on further validating and enhancing the performance of the SAMRS model by utilizing a broader range of remote sensing datasets. The goal is to test the model under various data conditions, ensuring robust performance across diverse object detection scenarios. Additionally, efforts will be directed toward improving real-time analysis capabilities to optimize the model’s performance in critical applications, including emergency response, disaster monitoring, and traffic flow analysis. These advancements are expected to expand the applicability of the SAMRS model in industries such as traffic management, logistics route optimization, and environmental monitoring.

Acknowledgements

This research was supported by the 2024 Technology Innovation Development Program funded by the Ministry of SMEs and Startups, Republic of Korea (No. RS-2024-00467984).

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

References

1. Groener, A., Chern, G., and Pritt, M., 2019. A comparison of deep learning object detection models for satellite imagery. In Proceedings of the 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, Oct. 15-16, pp. 1-10. https://doi.org/10.1109/AIPR47015.2019.9174593
2. Kang, W. B., Jeong, M. Y., and Kim, Y. I., 2022. A study on training dataset configuration for deep learning based image matching of multi-sensor VHR satellite images. Korean Journal of Remote Sensing, 38(6-1), 1505-1514. https://doi.org/10.7780/kjrs.2022.38.6.1.38
3. Kim, J. S., and Lee, S. J., 2024. Improvement of object detection performance in satellite images using image segmentation and up-scaling. Korean Journal of Information Technology, 22(6), 21-29. https://doi.org/10.14801/jkiit.2024.22.6.21
4. Lee, S. H., and Lee, M. J., 2020. A study on deep learning optimization by land cover classification item using satellite imagery. Korean Journal of Remote Sensing, 36(6-2), 1591-1604. https://doi.org/10.7780/kjrs.2020.36.6.2.9
5. Pflugfelder, R., Weissenfeld, A., and Wagner, J., 2022. Deep vehicle detection in satellite video. arXiv preprint arXiv:2204.06828. https://doi.org/10.48550/arXiv.2204.06828
6. Pritt, M., and Chern, G., 2017. Satellite image classification with deep learning. In Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, Oct. 10-12, pp. 1-7. https://doi.org/10.1109/AIPR.2017.8457969
7. Shin, S. Y., Lee, S. H., and Han, H. H., 2021. A study on residual U-Net for semantic segmentation based on deep learning. Journal of Digital Convergence, 19(6), 251-258. https://doi.org/10.14400/JDC.2021.19.6.251
8. Song, A. R., Choi, J. W., and Kim, Y. I., 2019. Change detection for high-resolution satellite images using transfer learning and deep learning network. Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, 37(3), 199-208. https://doi.org/10.7848/ksgpc.2019.37.3.199
9. Swapna, B., Venkatessan, R., Taskeen, F., IndraPriya, K., Manjula, D., and Muthukumar, D. S., 2023. Scalable deep learning for categorization of satellite images. In Proceedings of the 2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Kirtipur, Nepal, Oct. 11-13, pp. 773-778. https://doi.org/10.1109/I-SMAC58438.2023.10290437
10. Tehsin, S., Kausar, S., Jameel, A., Humayun, M., and Almofarreh, D. K., 2023. Satellite image categorization using scalable deep learning. Applied Sciences, 13(8), 5108. https://doi.org/10.3390/app13085108
11. Uzma, S., Sabir, M., Malik, A., and Ahmed, M. A. U., 2024. Automating earth observation: Scalable deep learning for satellite image categorization. ZKG International, 9(1).
12. Wang, D., Zhang, J., Du, B., Xu, M., Liu, L., and Tao, D., et al., 2023. SAMRS: Scaling-up remote sensing segmentation dataset with segment anything model. arXiv preprint arXiv:2305.02034. https://doi.org/10.48550/arXiv.2305.02034
13. Yang, N., and Tang, H., 2021. Semantic segmentation of satellite images: A deep learning approach integrated with geospatial hash codes. Remote Sensing, 13(14), 2723. https://doi.org/10.3390/rs13142723
14. Yun, D. S., and Kwak, N. Y., 2023. Object segmentation using ESRGAN and semantic soft segmentation. Journal of Internet of Things and Convergence, 9(1), 97-104. https://doi.org/10.20465/KIOTS.2023.9.1.097

Research Article

Korean J. Remote Sens. 2024; 40(6): 1219-1227

Published online December 31, 2024 https://doi.org/10.7780/kjrs.2024.40.6.1.27

Copyright © Korean Society of Remote Sensing.

Enhanced Vehicle Detection and Segmentation Using the SAMRS Model: Applications in High-Resolution Satellite Imagery

Jihyun Lee1, Taeyeon Won2, Kwangseob Kim3, Jinwoo Kim4, Seungchul Lee5*

1Researcher, Satellite Application Team, Stellarvision Inc., Seoul, Republic of Korea
2Senior Researcher, Satellite Application Team, Stellarvision Inc., Seoul, Republic of Korea
3Assistant Professor, Department of Computer Software, Kyungmin University, Uijeongbu, Republic of Korea
4Chief Researcher, Satellite System Research Center, LIG Nex1, Yongin, Republic of Korea
5CEO, Stellarvision Inc., Seoul, Republic of Korea

Correspondence to:Seungchul Lee
E-mail: leesc@stellarvision.kr

Received: November 22, 2024; Revised: December 10, 2024; Accepted: December 19, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Deep learning technologies have revolutionized image processing and analysis, introducing groundbreaking innovations that significantly improve the accuracy and efficiency of object segmentation, especially in satellite imagery. The increasing availability of high-resolution satellite images has created a demand for advanced models capable of handling the complexities of object detection in diverse environments. This study investigates the potential of the Segment Anything Model for Remote Sensing (SAMRS), a deep learning framework specifically designed for remote sensing applications, to accurately identify and segment a wide range of objects within satellite imagery. The model was trained using prominent datasets such as Dataset for Object Detection in Aerial Images (DOTA), Dataset for Object Detection in Optical Remote Sensing Images (DIOR), Fine-grained Object Detection in Aerial Images for Remote Sensing Version 2.0 (FAIR1M-2.0), and Instance Segmentation in Aerial Images Dataset (iSAID), enabling it to learn diverse object features and complexities. The evaluation of SAMRS was conducted on Northwestern Polytechnical University Very High Resolution 10-Class Dataset (NWPU VHR-10) and Beijing-3B datasets, where it demonstrated impressive results. In vehicle detection tasks, SAMRS achieved an Intersection over Union (IoU) of 0.9175, an F1-score of 0.9570, and an accuracy of 0.9385. These metrics highlight SAMRS’s capability to automate object detection in complex satellite images, overcoming challenges posed by intricate backgrounds and diverse object sizes. Furthermore, SAMRS is optimized to analyze both large and small-scale objects, ensuring robust performance across varying conditions. The findings emphasize the model’s utility not only for current remote sensing applications but also for future extensions involving drone imagery and domestic satellite datasets. By automating object detection and segmentation, SAMRS has the potential to transform practical fields such as urban planning, disaster management, traffic monitoring, and environmental analysis, making it a vital tool in advancing satellite imagery analysis.

Keywords: Remote sensing, Deep learning, SAMRS, SAM, Segmentation

1. Introduction

Advancements in Earth observation technology have significantly increased satellite imagery, essential for agriculture, urban planning, and environmental protection. However, much of this vast data remains unlabeled, particularly at the pixel level, which is crucial for object segmentation and analysis. Manual annotation is labor-intensive, demanding highly specialized knowledge and significant time, as highlighted by previous studies (Pritt and Chern, 2017; Tehsin et al., 2023). Additionally, remote sensing images encompass diverse resolutions and spectral bands, requiring advanced segmentation techniques for accurate analysis. These challenges highlight the limitations of relying solely on human interpretation, leaving much satellite imagery unlabeled and restricting its potential applications in fields such as agriculture, urban planning, and environmental monitoring (Swapna et al., 2023).

To address these issues, automated labeling and object segmentation technologies based on artificial intelligence, particularly deep learning, have recently gained traction. Kang et al. (2022) demonstrated the effectiveness of deep learning-based image-matching techniques for multi-sensor high-resolution satellite imagery, enabling automated labeling and object recognition across various resolutions and spectral bands. Deep learning models excel at learning patterns from vast satellite imagery datasets, enabling automated labeling, object recognition, and segmentation with exceptional performance (Uzma et al., 2024). Object segmentation, in particular, facilitates precise pixel-level boundary detection and analysis across various resolutions and spectral bands, allowing for detailed interpretation even in complex satellite images (Yang and Tang, 2021).

Several studies have highlighted the potential of deep learning-based segmentation techniques. For example, Kim and Lee (2024) demonstrated that combining segmentation and upscaling significantly improves object detection in satellite imagery. Song et al. (2019) applied transfer learning and change detection networks to overcome the challenges posed by insufficient training data, enabling effective change detection in highresolution satellite imagery. Lee and Lee (2020) optimized deep learning methods for land cover classification, emphasizing the evolution of classification techniques from supervised and unsupervised methods to advanced machine learning and deep learning approaches. Shin et al. (2021) proposed a Residual U-Net model for semantic segmentation, which reduces feature loss and efficiently extracts contextual information. More recently, Yun et al. (2023) improved the segmentation performance for small objects in satellite imagery by integrating ESRGAN with Semantic Soft Segmentation, demonstrating the effectiveness of super-resolution techniques in object segmentation tasks.

Building upon these advancements, this study explores the potential of the Segment Anything Model for Remote Sensing (SAMRS), a deep learning model specialized for remote sensing, to automate object segmentation. SAMRS is an extended framework of the Segment Anything Model (SAM) tailored to remote sensing datasets, aiming for accurate and efficient analysis of satellite imagery. By leveraging SAM’s inherent advantages, SAMRS is optimized to handle the diverse spectral properties of remote sensing data effectively. SAMRS also retains the zero-shot segmentation capability of SAM, which allows the model to generalize segmentation tasks without requiring retraining on specific datasets. This enables SAMRS to perform robust object segmentation across different regions and scenes, significantly reducing the need for labor-intensive manual annotations. By incorporating this capability, SAMRS not only streamlines segmentation tasks but also enhances scalability, making it suitable for a wide range of applications.

In particular, this study introduces significant advancements in SAMRS through fine-tuning for pixel-level segmentation, focusing on vehicle detection as a case study. By fine-tuning SAMRS, we successfully segmented small and large vehicles, addressing critical challenges in object differentiation. Moreover, we employed the state-of-the-art Swin Transformer architecture to enhance segmentation performance, particularly in identifying and separating complex object categories. These contributions represent a clear differentiation from prior studies, which primarily focused on broader object recognition without evaluating such granular segmentation tasks.

To contextualize our approach, we build upon the insights from previous studies, integrating their findings into the design of SAMRS while addressing the limitations they identified. This research not only demonstrates the advanced segmentation capabilities of SAMRS but also highlights its application in diverse fields such as agriculture, environmental protection, urban planning, and disaster management. Automating detailed object segmentation tasks through SAMRS offers experts opportunities to focus on higher-level analyses, enhancing productivity and insights.

Additionally, this study provides a comprehensive evaluation of SAMRS across both training and evaluation datasets. Prominent datasets such as Dataset for Object Detection in Aerial Images (DOTA), Dataset for Object Detection in Optical Remote Sensing Images (DIOR), Fine-grained Object Detection in Aerial Images for Remote Sensing Version 2.0 (FAIR1M-2.0), and Instance Segmentation in Aerial Images Dataset (iSAID) were used for training to ensure robust performance across diverse scenarios. Independent evaluation datasets, including Northwestern Polytechnical University Very High Resolution 10-Class Dataset (NWPU VHR-10) and Beijing-3B, were utilized to validate the model’s segmentation and classification capabilities in real-world environments. By addressing gaps in manual segmentation, SAMRS offers a scalable and innovative solution for complex satellite data analysis.

2. Materials and Methods

2.1. Methodology

The United States, China, Russia, and other countries with a significant number of satellites collectively observe the Earth, generating an enormous amount of satellite imagery. As the volume of satellite data grows, meaningful segmentation of these images has become essential for effective use in various fields. High-resolution satellite imagery, in particular, requires sophisticated analysis for practical applications such as agriculture, urban planning, environmental monitoring, and defense.

In South Korea, the expansion of data centers reflects the continuous growth of infrastructure for storing, managing, and processing satellite data. As shown in Fig. 1, the number of data centers in South Korea increased from 53 in 2000 to 156 in 2020, according to the Korea Data Center Council (KDCC), and is projected to reach 180 by 2025. This expansion underscores the increasing need for technologies that enable the meaningful segmentation and analysis of vast satellite data.

Figure 1. Trends in the growth of domestic data centers.

This study aims to develop effective methods for segmenting large volumes of satellite imagery for practical applications across various domains. Deep learning-based object segmentation technologies enable high-precision segmentation of diverse objects within satellite images, facilitating detailed analysis and maximizing the value of satellite data. Fig. 2 illustrates the processing workflow of the SAMRS model utilized in this study.

Figure 2. Flowchart of the SAMRS framework: Integration of multi-dataset segmentation heads (DOTA, DIOR, and FAIR1M-2.0) with backbone network and loss aggregation mechanism.

SAMRS, developed by Wang et al. (2023), is an object segmentation model tailored for remote sensing. The model extends the capabilities of the original SAM framework to accommodate the unique characteristics of remote sensing datasets. By doing so, SAMRS enables the generation of large-scale segmentation datasets, improving the efficiency and accuracy of remote sensing tasks. A key strength of SAMRS is its ability to identify and segment the boundaries and shapes of various objects in complex satellite imagery, overcoming the limitations of manual labeling. Moreover, SAMRS incorporates the zero-shot segmentation capabilities of the original SAM, allowing it to segment previously unseen object classes without requiring additional training. This functionality significantly enhances the model’s adaptability to diverse datasets and reduces the reliance on manually annotated data.

The SAMRS framework consists of three primary components: the backbone network, the segmentation decoder, and the loss aggregation mechanism. The backbone network extracts robust features from input satellite imagery, which are then processed by the segmentation decoder to generate pixel-level segmentation outputs. In this study, the Swin Transformer was employed as the backbone network due to its hierarchical structure and efficient window-based attention mechanism. These features enable the Swin Transformer to process high-resolution imagery efficiently while minimizing memory usage. The segmentation decoder utilizes multiple independent segmentation heads, each tailored to specific datasets such as DOTA, DIOR, and FAIR1M-2.0, ensuring precise segmentation across diverse datasets. Finally, the loss aggregation mechanism combines the dataset-specific losses, where segmentation losses for the respective datasets are represented explicitly. This multi-head pre-training strategy ensures that SAMRS can effectively learn from diverse datasets while maintaining high segmentation accuracy.

To further optimize SAMRS performance, the model was finetuned on large-scale remote sensing datasets such as iSAID, which includes pixel-level annotations for a wide range of object classes. Fine-tuning allows SAMRS to adapt its segmentation capabilities to the specific requirements of remote sensing tasks, such as the segmentation of small, complex, or irregularly shaped objects. Data augmentation techniques such as random scaling, horizontal and vertical flipping, and color jittering were employed during training to enhance the model’s robustness and generalization capabilities.

The integration of the Swin Transformer into the SAMRS framework played a critical role in achieving superior segmentation performance. The Swin Transformer’s window-based attention mechanism efficiently processes local patches of high-resolution images, while its hierarchical structure facilitates multi-scale feature learning. These attributes enable SAMRS to capture both fine-grained details and global context, making it highly effective for analyzing complex satellite imagery.

By leveraging the capabilities of SAMRS, this study achieved significant improvements in object segmentation performance across diverse remote sensing datasets. The model demonstrated the ability to accurately segment various object classes, including small cars and large cars, in high-resolution imagery. These advancements highlight the potential of SAMRS to enhance the accuracy and efficiency of satellite image analysis, offering automated object detection and segmentation capabilities for applications in agriculture, urban planning, environmental monitoring, and defense. By addressing the challenges of manual labeling and segmentation for large-scale datasets, SAMRS represents a significant step forward in the field of remote sensing.

2.2. Study Data

In this study, the performance of the SAMRS model was evaluated using multiple remote sensing datasets, summarized in Table 1. Among these, iSAID is marked with a bold border in Table 1, as it was specifically utilized for fine-tuning the SAMRS model, highlighting its critical role in refining the model’s segmentation capabilities. The primary datasets include DOTA (Satellite Object deTection in Aerial images [SOTA]), DIOR (Satellite Imagery Object Recognition [SIOR]), FAIR1M-2.0 (Fine-grAined Segmentation for high-resolution remoTe sensing imagery [FAST]), and iSAID, each tailored for specific object types, resolutions, and use cases. These datasets were chosen to ensure diverse training scenarios and robust performance validation for SAMRS.

Table 1 . Datasets used for training the SAMRS model.

DatasetImagesCategoryChannelsResolutionImage size
SOTA17,48018RGB0.3–1.0 m1,024×1,024
SIOR23,46320RGB0.5–30.0 m800×800
FAST64,14737RGB0.3–1.0 m600×600
iSAID2,80615RGB0.3–1.0 m800×800–13,000×4,000


DOTA (SOTA) contains 16 object classes and focuses on detecting and segmenting various objects such as buildings, aircraft, ships, and vehicles in complex environments. Specifically, DOTA includes objects at various angles and scales, making it ideal for evaluating model versatility and robustness. This dataset plays a critical role in identifying and analyzing diverse objects in remote sensing images.

DIOR (SIOR) comprises 20 object classes and is a large-scale remote sensing dataset constructed to reflect varying weather conditions, seasons, and optical variations. It includes key objects such as buildings, ships, aircraft, and bridges, and focuses on evaluating high-resolution object detection and segmentation performance. DIOR is valuable for validating model robustness to environmental changes.

FAIR1M-2.0 (FAST) is designed for military applications and emphasizes the precise detection of objects of various sizes. This dataset plays a vital role in assessing the accuracy and adaptability of SAMRS for military and industrial applications. FAST is particularly useful for object detection and analysis in military scenarios.

iSAID is a large-scale remote sensing dataset designed for pixel-level object segmentation, consisting of 2,806 images and 655,451 object instances. iSAID supports complex segmentation tasks, including vehicle detection and size classification into large and small vehicles. It excels in accurately distinguishing vehicles of various sizes and shapes, making it an essential resource for applications such as traffic monitoring, traffic analysis, urban planning, and logistics management. Differentiating vehicle size and shape in remote sensing images is crucial for detecting small objects in complex backgrounds, and the iSAID dataset serves as an important benchmark for evaluating these capabilities.

Together, these datasets comprehensively evaluate the SAMRS model’s performance across diverse characteristics. Compared to other datasets, their significantly larger scale sets them apart. For instance, while Unmanned Aerial Vehicle (UAV) ID contains 420 images and DeepGlobe Land Cover includes 1,146 images, SAMRS provides a total of 105,090 images, enabling large-scale training and contributing to improved generalization performance of the model.

Additionally, two datasets, NWPU VHR-10, and Beijing-3B, were utilized as independent evaluation datasets to validate the model’s segmentation and classification performance in diverse scenarios. Table 2 summarizes the key characteristics of the NWPU VHR-10 and Beijing-3B datasets, providing an overview of their resolution, categories, and notable features used in the evaluation.

Table 2 . Datasets used for performance evaluation.

DatasetResolutionCategory/ClassFeature
NWPU VHR-100.08 m, 0.5–2.0 m10 classesMulti-class dataset including vehicles, buildings, ships, aircraft, and other key ground objects
Beijing-3B0.3 mSmall vehicles / Large vehiclesReal-world imagery including complex urban environments and diverse objects


In this study, the segmentation and object detection performance of the proposed model was evaluated using the NWPU VHR-10 dataset, which was not utilized during the training process. The NWPU VHR-10 dataset, with 800 high-resolution images, effectively validates the model’s performance across diverse objects and complex environments.

The dataset contains two main types of images. 715 optical images, obtained from Google Earth, have spatial resolutions ranging from 0.5 m to 2.0 m, encompassing diverse environments. These images are ideal for assessing the model’s generalization ability and robustness. Additionally, 85 pan-sharpened color infrared images provide an extremely fine spatial resolution of 0.08 m, offering conditions highly favorable for detecting the detailed boundaries and shapes of objects.

For further quantitative evaluation, the model’s pixel-level vehicle detection performance was assessed. The dataset utilized for this evaluation was collected by the Beijing-3B satellite, which has a spatial resolution of 0.3 m. Table 3 outlines the specifications of the Beijing-3B satellite. In addition to its spatial resolution, the satellite’s advanced orbital design and rapid revisit cycle allow it to collect detailed data over large areas with exceptional precision. This capability is crucial for applications such as real-time traffic monitoring, logistics network optimization, and urban traffic flow analysis. For instance, the proportion of small vehicles in a given traffic scenario or the movement patterns of large vehicles can be analyzed to alleviate traffic congestion or inform urban planning decisions.

Table 3 . Specifications of the Beijing-3B Satellite.

SpecificationDetails
Spatial resolution0.3 m
Spectral bandsPanchromatic, Multispectral
Revisit time1–2 days

3. Results and Analysis

3.1. Segmentation Results on NWPU VHR-10

Fig. 3 illustrates the object detection and segmentation results using the NWPU VHR-10 dataset. The model successfully detected and sharply delineated the boundaries of various objects such as airplanes, ships, storage tanks, bridges, and small vehicles, even within complex backgrounds. As shown in Fig. 3, the results include (a) airplane detection, (b) storage tanks and ships, (c) bridge identification, and (d) vehicle delineation. Notably, the proposed model demonstrated consistently high segmentation performance for both large and small objects. This includes accurately separating small objects, such as compact vehicles, that are often interspersed within cluttered backgrounds. These results validate the model’s robustness for object detection and segmentation in high-resolution remote sensing images, showcasing its ability to maintain strong performance under varying conditions.

Figure 3. Segmentation results on NWPU VHR-10 using the SAMRS model: (a) airplane detection, (b) storage tanks and ships, (c) bridge identification, and (d) vehicle delineation.

It should be noted that the NWPU VHR-10 dataset was not used for training but served as an independent evaluation dataset to validate the SAMRS model’s segmentation capabilities on unseen data. The objective of this section is to provide qualitative evidence of the model’s robustness, as shown in Fig. 3. Therefore, quantitative evaluations were not conducted here and are instead presented in Section 3.2 using the Beijing-3B dataset.

The NWPU VHR-10 dataset encompasses diverse background conditions, including mountainous regions, urban areas, and marine environments. The model successfully detected and segmented objects in these complex settings, demonstrating its versatility and potential applicability in various remote sensing scenarios. Particularly, the model’s ability to accurately distinguish object shapes and boundaries in high-resolution imagery highlights its potential utility in applications such as traffic monitoring, logistics management, urban planning, and defense.

The experimental results using the NWPU VHR-10 dataset confirm that the proposed model can accurately detect small objects even in complex backgrounds of remote sensing imagery. The demonstrated robustness across various resolutions and conditions suggests that this methodology can be effectively applied to real-world remote sensing tasks, offering significant potential for practical applications in a wide range of fields.
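For reference, segmentation outputs of this kind can be reproduced with a class-agnostic automatic mask generator. The sketch below uses the publicly available segment-anything package, on which SAMRS builds; the checkpoint file name and input image are hypothetical placeholders, and any SAMRS-specific fine-tuned weights or generator settings are assumptions rather than details reported in this study.

```python
# A minimal sketch, not the authors' pipeline: automatic mask generation
# with the segment-anything package (facebookresearch/segment-anything).
# "samrs_vit_b.pth" and "nwpu_vhr10_scene.jpg" are hypothetical placeholders.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="samrs_vit_b.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects an HxWx3 uint8 RGB array.
image = cv2.cvtColor(cv2.imread("nwpu_vhr10_scene.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

# Each entry holds a boolean 'segmentation' mask plus 'area', 'bbox',
# and a 'predicted_iou' quality estimate; keep only the confident masks.
for m in sorted(masks, key=lambda m: m["area"], reverse=True):
    if m["predicted_iou"] > 0.90:
        print(m["bbox"], m["area"])
```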

3.2. Segmentation Results on Beijing-3B

This study utilized the Beijing-3B satellite to perform vehicle classification based on remote sensing imagery. Fig. 4 presents the object detection results using the SAMRS model on Beijing-3B satellite imagery. Beijing-3B, a state-of-the-art remote sensing satellite, provides an ultra-high spatial resolution of 0.3 m, enabling precise detection and classification of vehicle sizes and shapes even in complex environments such as crowded intersections, multi-lane roads, and dense urban areas.

Figure 4. Segmentation results on Beijing-3B satellite imagery using the SAMRS model: (a) original satellite image, (b) segmentation mask, and (c) overlay of segmentation mask on the original image.

The high-resolution imagery from Beijing-3B enables not only the identification of vehicle presence but also the accurate differentiation of vehicle size and shape. This allows for clear classification of large vehicles (e.g., trucks and buses) and small vehicles (e.g., passenger cars and compact SUVs) while sharply detecting vehicle boundaries and fine structural details. Notably, the resolution of the Beijing-3B satellite offers significant advantages over conventional satellites, particularly in distinguishing individual vehicles in congested traffic areas or complex intersections with closely packed vehicles.

The vehicle classification task based on Beijing-3B satellite data achieved exceptionally high performance in distinguishing between large and small vehicles, even in complex environments. The ultra-high resolution of 0.3 m enables the model to discern not only a vehicle's outer contours but also detailed structural features, opening new possibilities for traffic monitoring using remote sensing technology.
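Because vehicle size classes follow directly from segmentation masks at a known ground sample distance, a simple footprint threshold suffices to separate small and large vehicles. The sketch below is a minimal illustration; the 18 m² cutoff is an assumed value (roughly twice a passenger car's footprint), not a threshold reported in this study.

```python
import numpy as np

GSD_M = 0.3                  # Beijing-3B ground sample distance (m per pixel)
PIXEL_AREA_M2 = GSD_M ** 2   # each pixel covers 0.09 m^2 on the ground
AREA_CUTOFF_M2 = 18.0        # assumed small/large boundary, not from this study

def classify_vehicle(mask: np.ndarray) -> str:
    """Label a binary vehicle mask 'small' or 'large' by its ground footprint."""
    footprint_m2 = int(mask.sum()) * PIXEL_AREA_M2
    return "large" if footprint_m2 >= AREA_CUTOFF_M2 else "small"

# A 10 x 30 pixel mask spans 3 m x 9 m (27 m^2), i.e., a truck-sized object.
truck_like = np.ones((10, 30), dtype=bool)
print(classify_vehicle(truck_like))  # -> "large"
```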

3.3. Vehicle Identification Results

To quantitatively evaluate the SAMRS model, its performance on the Beijing-3B dataset was analyzed using the F1-score, Accuracy, and IoU. The F1-score is derived from Precision and Recall, as defined in Eqs. (1)–(3). Precision is the proportion of predicted positives that are actually positive, while Recall is the proportion of actual positives that are correctly identified. The F1-score, the harmonic mean of Precision and Recall, provides a balanced measure of the model's performance.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

Additionally, Accuracy and IoU were calculated to further evaluate the performance of the SAMRS model. Accuracy represents the proportion of correctly predicted instances out of all predictions, computed using Eq. (4). IoU measures the overlap between the predicted region and the ground-truth region, calculated using Eq. (5).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{5}$$
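As a minimal sketch of how Eqs. (1)–(5) translate into a pixel-level evaluation, the function below scores a binary predicted mask against a binary ground-truth mask of the same shape; degenerate cases such as an empty prediction are not handled here.

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Score a binary predicted mask against a binary ground-truth mask
    using Eqs. (1)-(5)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # vehicle pixels correctly detected
    fp = np.sum(pred & ~gt)   # background pixels flagged as vehicle
    fn = np.sum(~pred & gt)   # vehicle pixels missed
    tn = np.sum(~pred & ~gt)  # background pixels correctly rejected
    precision = tp / (tp + fp)                          # Eq. (1)
    recall = tp / (tp + fn)                             # Eq. (2)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (4)
    iou = tp / (tp + fp + fn)                           # Eq. (5)
    return {"Precision": precision, "Recall": recall, "F1": f1,
            "Accuracy": accuracy, "IoU": iou}
```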

Table 4 summarizes the performance of the SAMRS model in the vehicle detection experiment. The model achieved high metrics for F1-score (0.9570), Accuracy (0.9385), and IoU (0.9175), demonstrating its exceptional performance in vehicle detection tasks. These results validate the model’s efficiency in remote sensing object detection, establishing its effectiveness in handling such complex tasks.

Table 4. Performance metrics for vehicle detection using the SAMRS model.

Performance metric   Value
F1-score             0.9570
Accuracy             0.9385
IoU                  0.9175
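As an internal consistency check, note that the pixel-level F1-score (the Dice coefficient) and IoU (the Jaccard index) are computed from the same TP, FP, and FN counts and are therefore linked by the identity

$$\mathrm{F1} = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}} = \frac{2 \times 0.9175}{1 + 0.9175} \approx 0.9570,$$

which agrees with the values reported in Table 4.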


This study highlights the strong potential of the SAMRS model to deliver high efficiency and accuracy across diverse remote sensing datasets. Furthermore, it underscores the model’s ability to perform automated object detection in complex satellite imagery, paving the way for broader applications in remote sensing analysis.

The performance of the SAMRS model can be further contextualized by comparing it to existing studies on object detection in satellite imagery. Groener et al. (2019) demonstrated that single-stage models like RetinaNet excel in detecting large objects with faster inference speed, while two-stage models such as Faster Region-based Convolutional Neural Network (R-CNN) and Cascade R-CNN achieve higher accuracy for small objects but at the cost of slower prediction speeds.

Similarly, Pflugfelder et al. (2022) addressed the challenges of detecting small vehicles (4–10 pixels) in satellite video, proposing a spatiotemporal model that achieved an F1-score of 0.87 by leveraging temporal consistency. However, their approach requires video data, which may not always be available in static satellite imagery.

In comparison, the SAMRS model overcomes these limitations by achieving accurate detection for both large and small objects using single satellite images. By balancing the speed of single-stage models with the accuracy of two-stage models, SAMRS demonstrates superior performance and versatility across varying object sizes and complex environments.

4. Conclusions

In this study, the performance of object detection and vehicle segmentation was evaluated using the NWPU VHR-10 dataset and Beijing-3B optical imagery, neither of which was utilized during the training process. Experimental results on the NWPU VHR-10 dataset demonstrated that various objects, such as airplanes, ships, and small vehicles, were accurately segmented in high-resolution remote sensing images, even within complex backgrounds. Additionally, vehicle segmentation results using Beijing-3B's 0.3 m ultra-high-resolution optical imagery confirmed that both large vehicles (e.g., trucks and buses) and small vehicles (e.g., passenger cars and compact SUVs) could be accurately classified even in complex intersections and road environments.

These findings validate the SAMRS model as a highly accurate and efficient solution for object detection and segmentation tasks in remote sensing. Beyond precise object segmentation, the model’s application potential extends significantly to real-world fields such as traffic monitoring, urban planning, and logistics management. The consistent performance of the SAMRS model across diverse and complex conditions demonstrates its feasibility as an efficient solution for remote sensing-based analysis.

Future research will focus on further validating and enhancing the performance of the SAMRS model by utilizing a broader range of remote sensing datasets. The goal is to test the model under various data conditions, ensuring robust performance across diverse object detection scenarios. Additionally, efforts will be directed toward improving real-time analysis capabilities to optimize the model’s performance in critical applications, including emergency response, disaster monitoring, and traffic flow analysis. These advancements are expected to expand the applicability of the SAMRS model in industries such as traffic management, logistics route optimization, and environmental monitoring.

Acknowledgments

This research was supported by the 2024 Technology Innovation Development Program funded by the Ministry of SMEs and Startups, Republic of Korea (No. RS-2024-00467984).

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

References

  1. Groener, A., Chern, G., and Pritt, M., 2019. A comparison of deep learning object detection models for satellite imagery. In Proceedings of the 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, Oct. 15-16, pp. 1-10. https://doi.org/10.1109/AIPR47015.2019.9174593
  2. Kang, W. B., Jeong, M. Y., and Kim, Y. I., 2022. A study on training dataset configuration for deep learning based image matching of multi-sensor VHR satellite images. Korean Journal of Remote Sensing, 38(6-1), 1505-1514. https://doi.org/10.7780/kjrs.2022.38.6.1.38
  3. Kim, J. S., and Lee, S. J., 2024. Improvement of object detection performance in satellite images using image segmentation and up-scaling. Korean Journal of Information Technology, 22(6), 21-29. https://doi.org/10.14801/jkiit.2024.22.6.21
  4. Lee, S. H., and Lee, M. J., 2020. A study on deep learning optimization by land cover classification item using satellite imagery. Korean Journal of Remote Sensing, 36(6-2), 1591-1604. https://doi.org/10.7780/kjrs.2020.36.6.2.9
  5. Pflugfelder, R., Weissenfeld, A., and Wagner, J., 2022. Deep vehicle detection in satellite video. arXiv preprint arXiv:2204.06828. https://doi.org/10.48550/arXiv.2204.06828
  6. Pritt, M., and Chern, G., 2017. Satellite image classification with deep learning. In Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, Oct. 10-12, pp. 1-7. https://doi.org/10.1109/AIPR.2017.8457969
  7. Shin, S. Y., Lee, S. H., and Han, H. H., 2021. A study on residual U-Net for semantic segmentation based on deep learning. Journal of Digital Convergence, 19(6), 251-258. https://doi.org/10.14400/JDC.2021.19.6.251
  8. Song, A. R., Choi, J. W., and Kim, Y. I., 2019. Change detection for high-resolution satellite images using transfer learning and deep learning network. Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, 37(3), 199-208. https://doi.org/10.7848/ksgpc.2019.37.3.199
  9. Swapna, B., Venkatessan, R., Taskeen, F., IndraPriya, K., Manjula, D., and Muthukumar, D. S., 2023. Scalable deep learning for categorization of satellite images. In Proceedings of the 2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Kirtipur, Nepal, Oct. 11-13, pp. 773-778. https://doi.org/10.1109/I-SMAC58438.2023.10290437
  10. Tehsin, S., Kausar, S., Jameel, A., Humayun, M., and Almofarreh, D. K., 2023. Satellite image categorization using scalable deep learning. Applied Sciences, 13(8), 5108. https://doi.org/10.3390/app13085108
  11. Uzma, S., Sabir, M., Malik, A., and Ahmed, M. A. U., 2024. Automating earth observation: Scalable deep learning for satellite image categorization. ZKG International, 9(1).
  12. Wang, D., Zhang, J., Du, B., Xu, M., Liu, L., and Tao, D., 2023. SAMRS: Scaling-up remote sensing segmentation dataset with segment anything model. arXiv preprint arXiv:2305.02034. https://doi.org/10.48550/arXiv.2305.02034
  13. Yang, N., and Tang, H., 2021. Semantic segmentation of satellite images: A deep learning approach integrated with geospatial hash codes. Remote Sensing, 13(14), 2723. https://doi.org/10.3390/rs13142723
  14. Yun, D. S., and Kwak, N. Y., 2023. Object segmentation using ESRGAN and semantic soft segmentation. Journal of Internet of Things and Convergence, 9(1), 97-104. https://doi.org/10.20465/KIOTS.2023.9.1.097