Korean J. Remote Sens. 2025; 41(1): 73-86
Published online: February 28, 2025
https://doi.org/10.7780/kjrs.2025.41.1.7
© Korean Society of Remote Sensing
Gyeong-Su Jeong1, Jong-Hwa Park2*
1Master Student, Department of Agricultural and Rural Engineering, Chungbuk National University, Cheongju, Republic of Korea
2Professor, Department of Agricultural and Rural Engineering, Chungbuk National University, Cheongju, Republic of Korea
Correspondence to: Jong-Hwa Park
E-mail: jhpak7@cbnu.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Early and accurate monitoring of crop growth is crucial for precision agriculture. This study developed and evaluated a novel framework for precision monitoring of early-stage cabbage (Brassica oleracea var. capitata) using Unmanned Aerial Vehicle (UAV) multispectral imagery and a modified Faster Region-based Convolutional Neural Network (Faster R-CNN). A DJI Matrice 300 RTK UAV equipped with RGB and RedEdge-MX multispectral sensors acquired high-resolution imagery of a cabbage testbed in Goesan-gun, South Korea. A Faster R-CNN model, incorporating a ResNet-50 backbone and Feature Pyramid Network (FPN), was trained to detect individual cabbage plants. A two-stage data augmentation approach was employed: initial training with bounding box annotations, followed by refinement using 15cm buffer zones around predicted plant centroids. The model achieved a mean Average Precision (mAP) of 0.900 on an independent test set, outperforming YOLOv5s and SSD models. Two object delineation methods were compared: the 15cm buffer zones and an Excess Green (ExG)-based dissolve operation. The ExG-based dissolve method demonstrated superior performance in delineating healthy cabbage vegetation, yielding a significantly higher mean Normalized Difference Vegetation Index (NDVI) (0.470) compared to the buffer method (0.300) and a lower proportion of low NDVI values (12.54% vs. 49.38%). These results highlight the potential of integrating UAV-based multispectral imaging with a modified Faster R-CNN and an ExG-based dissolve approach for accurate and efficient early-stage cabbage monitoring, facilitating data-driven decision-making in precision agriculture.
Keywords Precision agriculture, Unmanned aerial vehicle, Early growth stage monitoring, Faster R-CNN, Deep learning
The growing global demand for food, coupled with increasing environmental concerns and resource limitations, necessitates a transition toward more sustainable and efficient agricultural practices (Godfray et al., 2010; Velten et al., 2015). Precision agriculture, which leverages advanced technologies to optimize crop management, offers a promising pathway to achieving this goal (Zhang et al., 2002). A core component of this paradigm shift is the accurate and timely monitoring of crop growth, spatial distribution, and physiological status, which has traditionally relied on labor-intensive and time-consuming manual methods (Araus and Cairns, 2014). This study focuses on advancing precision agriculture techniques for early-stage cabbage (Brassica oleracea var. capitata), a vegetable crop of vital importance to Korea’s national food security and agricultural economy (MAFRA, 2023). Specifically, early detection of stress in cabbage seedlings is crucial for maximizing yield, as young plants are particularly vulnerable to environmental pressures. However, Korean cabbage cultivation faces mounting challenges, including increased climate variability, particularly in temperature and precipitation patterns, limitations in water resources, and increasing occurrences of pests and diseases (Lee et al., 2016; Ryu et al., 2022). These factors underscore the urgency of adopting data-driven, scientifically informed cultivation practices to ensure the economic viability and environmental sustainability of cabbage production (Na et al., 2021; Ryu et al., 2024).
While Unmanned Aerial Vehicles (UAVs) equipped with RGB sensors have shown promise for general crop monitoring, their ability to capture the full spectrum of plant physiological information remains limited, particularly for subtle early-stage stress detection (Tsouros et al., 2019). In this research, we employ UAV-mounted multispectral sensors, capturing data beyond the visible spectrum, to gain deeper insights into cabbage health, including nutrient status and stress levels. Furthermore, the originality of this work lies in the synergistic integration of high-resolution spatial data from UAVs, detailed multispectral information, and a customized deep learning model for comprehensive cabbage monitoring. Specifically, we utilize the Faster Region-based Convolutional Neural Network (Faster R-CNN) framework, renowned for its object detection accuracy (Ren et al., 2016), leveraging a ResNet-50 backbone for robust feature extraction and a Feature Pyramid Network (FPN) for enhanced multi-scale object detection, crucial for identifying individual cabbage plants across varying growth stages and sizes (Lin et al., 2017). Faster R-CNN was chosen as a starting point due to its proven performance in object detection tasks and its ability to handle objects of varying scales, making it well-suited for early-stage cabbage detection, where plant size can vary considerably. ResNet-50 was selected as the backbone to provide a good balance between accuracy and computational efficiency, and FPN was incorporated to enhance the detection of both small and large cabbage seedlings within the imagery.
The novelty of this research is threefold: (1) we present a pioneering integration of UAV-based multispectral imaging with a fine-tuned Faster R-CNN (ResNet-50 + FPN) model, specifically adapted for the unique challenges of early-stage cabbage production in Korea, such as its dense planting patterns and specific stress indicators like nutrient deficiencies and early disease symptoms; (2) we introduce a novel pre-processing workflow, including a unique image tiling approach with a 20% overlap and a 15cm buffering technique around detected plants, specifically designed to optimize the performance of the deep learning model on multispectral UAV data and address edge effects; and (3) we demonstrate the potential to extract both spatial (e.g., plant density, distribution) and physiological information (e.g., stress, nutrient content through vegetation indices like Normalized Difference Vegetation Index [NDVI] and Excess Green [ExG]) from a single data source. Moreover, we uniquely compare the efficacy of the buffering method with an ExG-based binary dissolve approach for refining object delineation, providing valuable methodological insights for future research.
This study hypothesizes that the proposed integrated framework, encompassing UAV-based multispectral data acquisition, tailored pre-processing, and a customized Faster R-CNN model, will enable accurate and efficient detection, classification, and health assessment of cabbage plants across different growth stages, specifically during the critical early growth stage. By providing real-time, spatially explicit insights into cabbage growth dynamics, this research aims to facilitate a transition from traditional, reactive management to a more proactive, data-driven approach. The findings will contribute to the advancement of high-throughput phenotyping, support the broader adoption of precision agriculture practices, and ultimately bolster the economic viability and environmental sustainability of cabbage production in Korea and potentially other regions with similar agricultural contexts. Therefore, the primary objective of this study was to develop and rigorously evaluate a UAV-based multispectral imaging and deep learning framework for accurate early-stage cabbage detection, delineation, and health assessment using a modified Faster R-CNN model. This framework aims to provide timely and actionable information to support informed decision-making in cabbage cultivation.
This study employed a multi-stage workflow (Fig. 1) to detect and analyze early-stage cabbage plants using UAV-acquired multispectral imagery and deep learning. The process involved data acquisition, preprocessing, dataset creation, model training and evaluation, and a comparative analysis of object delineation methods. Each step of the workflow is described in detail below.
The research was carried out at the K-Smart Organic Farm Innovation Demonstration Complex in Galeup-ri, Goesan-gun, Chungcheongbuk-do, South Korea (36°47′14″N, 127°51′5″E, Fig. 2). This complex, established in July 2023 by the Ministry of Agriculture, Food, and Rural Affairs, serves as a testbed for advanced agricultural technologies, including wireless automated irrigation and UAV-based crop monitoring. The study site was a designated testbed within a 2.13 ha area dedicated to organic cabbage (Brassica oleracea var. capitata, cultivar: ‘Chun Gwang’) cultivation. The specific testbed, located at 599 Galeum-ri, was planted with cabbage seedlings on August 23, 2023. The planting density was approximately 6.25 plants per square meter, with a row spacing of 60 cm and plant spacing within rows of 40 cm. The site was selected for its representation of cutting-edge, technology-integrated agricultural practices currently being promoted for sustainable vegetable production in South Korea. The testbed is equipped with a drip irrigation system, representative of the smart farming infrastructure.
Multispectral and RGB imagery of the cabbage testbed was acquired on September 10, 2023, approximately two weeks post-planting, a critical period for assessing early growth status and identifying potential stress (Jeong et al., 2024; Lee et al., 2022). Data acquisition was performed using a DJI Matrice 300 RTK UAV platform equipped with a Zenmuse H20T RGB camera and a RedEdge-MX multispectral sensor (Table 1). Flights were conducted under clear sky conditions between 11:00 AM and 1:00 PM local time to minimize shadow effects. The flight plan was designed using DJI GS Pro software, with automated flight paths generated to ensure consistent image overlap. The RGB data were captured at a 30 m flight altitude, yielding a spatial resolution of 1.1 cm/pixel. A total of 150 RGB images were acquired. The multispectral data were acquired at a 40 m altitude, resulting in a 2.87 cm/pixel resolution. A total of 180 multispectral images were acquired. The multispectral sensor captured data in five spectral bands: blue (475 ± 32 nm), green (560 ± 27 nm), red (668 ± 14 nm), red edge (717 ± 12 nm), and near-infrared (NIR) (842 ± 57 nm). The forward overlap for both RGB and multispectral flights was 80%, and the side overlap was 70%.
Table 1 Specifications of the UAV platform, RGB camera, and multispectral sensor used for data acquisition
Equipment/Sensor | Model | Manufacturer | Flight Altitude (m) | Resolution (cm/pixel) | Spectral Bands (nm) |
---|---|---|---|---|---|
UAV Platform | DJI Matrice 300 RTK | DJI, Shenzhen, China | - | - | - |
RGB Sensor | Zenmuse H20T | DJI, Shenzhen, China | 30 | 1.1 | - |
Multispectral Sensor | RedEdge-MX | MicaSense, Seattle, WA, USA | 40 | 2.87 | Blue (475 ± 32), Green (560 ± 27), Red (668 ± 14), Red Edge (717 ± 12), NIR (842 ± 57) |
For precise georeferencing, ground control points (GCPs) were established across the field before the UAV flights and measured using a Trimble R10 GNSS receiver (Lee and Park, 2024; Go and Park, 2024). A total of nine GCPs were used, distributed across the field perimeter. Due to logistical constraints and to minimize disturbance to the young cabbage plants, no GCPs were placed in the central region of the testbed. Post-processing analysis confirmed that geometric accuracy was within acceptable limits (RMSE < 3 cm in X and Y), demonstrating that the perimeter GCP distribution provided sufficient accuracy for this study. Radiometric calibration was performed using a calibrated reflectance panel (Spectralon 99% reflectance panel, Labsphere, Inc., North Sutton, NH, USA; Serial Number: RP04-1949205-OB). Images of the reflectance panel (Fig. 3) were acquired before and after each UAV flight, with the panel placed horizontally on a level surface near the takeoff location. These images were used to convert raw digital numbers (DNs) to reflectance values using the empirical line method, following the manufacturer’s recommendations. This process corrects for variations in illumination conditions and sensor response, ensuring accurate reflectance measurements.
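Although the DN-to-reflectance conversion was carried out by the processing software, the underlying single-panel empirical line calculation can be sketched as below. This is a minimal illustration only: the array names and synthetic DN values are assumptions, and the 0.99 panel reflectance corresponds to the Spectralon panel described above.

```python
import numpy as np

def panel_calibration_factor(panel_dn: np.ndarray, panel_reflectance: float = 0.99) -> float:
    """Band-wise DN-to-reflectance scale factor from an image of the calibration
    panel (single-point empirical line forced through the origin)."""
    mean_dn = float(panel_dn.mean())      # average DN over the panel area
    return panel_reflectance / mean_dn    # reflectance per digital number

def dn_to_reflectance(band_dn: np.ndarray, factor: float) -> np.ndarray:
    """Convert a raw band to reflectance and clip to the physical 0-1 range."""
    return np.clip(band_dn.astype(np.float32) * factor, 0.0, 1.0)

# Illustrative usage with synthetic data standing in for one multispectral band
panel_pixels = np.random.normal(32000, 300, size=(50, 50))        # DNs over the panel
factor = panel_calibration_factor(panel_pixels, panel_reflectance=0.99)
red_band = dn_to_reflectance(np.random.randint(0, 40000, (100, 100)), factor)
```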
The acquired imagery underwent a rigorous preprocessing pipeline using Pix4Dmapper software (Pix4D, Lausanne, Switzerland). Raw images were first radiometrically calibrated using the reflectance panel data (Section 2.2.2). A 3D point cloud was generated using structure-from-motion (SfM) techniques (Go et al., 2022). Geometric correction was then performed using the GCPs, followed by orthomosaic generation (Lee et al., 2022). The resulting RGB orthomosaic had a spatial resolution of 1.1 cm/pixel and dimensions of 24,588 × 31,881 pixels. The multispectral orthomosaic had a spatial resolution of 2.87 cm/pixel.
The high-resolution RGB orthomosaic was tiled into smaller 512 × 512-pixel image patches with a 20% overlap to optimize model training and prevent object truncation at tile boundaries. This tiling strategy produced a total of 188 RGB tiles. The 512 × 512 pixel size was chosen as a balance between providing sufficient contextual information for the Faster R-CNN model and maintaining computational efficiency during training.
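A minimal tiling routine of the kind described above (512 × 512-pixel windows with 20% overlap) might look like the following sketch. The use of rasterio, the file paths, and the handling of edge remainders are assumptions for illustration, not the exact implementation used in this study.

```python
import rasterio
from rasterio.windows import Window

TILE = 512
STRIDE = int(TILE * 0.8)   # 20% overlap between neighbouring tiles

def tile_orthomosaic(src_path: str, out_dir: str) -> None:
    """Cut an orthomosaic into overlapping 512 x 512 patches (edge remainders omitted)."""
    with rasterio.open(src_path) as src:
        for row in range(0, src.height - TILE + 1, STRIDE):
            for col in range(0, src.width - TILE + 1, STRIDE):
                window = Window(col, row, TILE, TILE)
                patch = src.read(window=window)
                profile = src.profile.copy()
                profile.update(height=TILE, width=TILE,
                               transform=src.window_transform(window))
                with rasterio.open(f"{out_dir}/tile_{row}_{col}.tif", "w", **profile) as dst:
                    dst.write(patch)

# tile_orthomosaic("rgb_orthomosaic.tif", "tiles")  # placeholder paths
```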
Manual annotation using Label Studio (HumanSignal, San Francisco, CA, USA) involved creating bounding boxes around each cabbage plant, with annotations saved in COCO format. A single annotator performed all labeling to ensure consistency, followed by a quality control check by a second expert to minimize bias and ensure accuracy. Fig. 4 provides an example of the bounding box annotations, demonstrating the precision with which individual cabbage plants were delineated. The dataset was split into training (91 images, 4,270 annotations), validation (31 images, 1,368 annotations), and test (31 images, 1,591 annotations) sets, using a 60:20:20 ratio. This split ratio is commonly used in machine learning and provides a sufficient amount of data for training while reserving adequate, independent sets for validation and testing. Bounding boxes were drawn to encompass the entire visible extent of each cabbage plant, minimizing the inclusion of background soil (Fig. 5).
To further augment the training data and improve model robustness, 15 cm radius buffer zones were generated around the predicted centroids of each cabbage plant. This was a twostage process: (a) Initial Model Training: An initial Faster R-CNN model (identical in architecture to the final model described in Section 2.6) was trained on the original bounding box annotations. (b) Buffer Zone Creation: This initial model was then used to predict cabbage locations (centroids) on the training images. 15 cm radius buffer zones were created around these predicted centroids (Fig. 5). This novel application of buffer zones, guided by initial model predictions, demonstrates a differentiated approach to refining object detection and was used to create instance segmentation masks in this study. The 15 cm buffer size was chosen based on the average diameter of cabbage plants at the two-week post-planting stage, as determined by field measurements. This augmentation strategy aims to provide the model with more contextual information about each plant, improving its ability to distinguish cabbage plants from the background. The final training dataset included both the original bounding box annotations and the augmented data with buffer zones.
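Assuming the predicted centroids are available as coordinates in a metric, projected CRS, the 15 cm buffering step can be sketched with GeoPandas as follows. The CRS code and example coordinates are illustrative assumptions.

```python
import geopandas as gpd
from shapely.geometry import Point

def buffer_centroids(centroids_xy, crs="EPSG:5186", radius_m=0.15):
    """Create fixed-radius buffer polygons around predicted plant centroids.

    centroids_xy: iterable of (x, y) coordinates in a metre-based CRS.
    radius_m: buffer radius in metres (0.15 m = 15 cm).
    """
    pts = gpd.GeoDataFrame(geometry=[Point(xy) for xy in centroids_xy], crs=crs)
    pts["geometry"] = pts.geometry.buffer(radius_m)   # circular buffer zones
    return pts

# Example: three hypothetical centroids in a projected (metre-based) CRS
buffers = buffer_centroids([(210000.0, 420000.0), (210000.6, 420000.0), (210000.0, 420000.4)])
```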
The Faster R-CNN object detection framework (Ren et al., 2016), with a ResNet-50 backbone and FPN (Lin et al., 2017), was employed for cabbage detection. Faster R-CNN is a two-stage detector (Fig. 6). The first stage, the Region Proposal Network (RPN), proposes regions likely to contain objects. The second stage classifies these regions and refines their bounding boxes. The ResNet-50 backbone, a deep convolutional neural network, mitigates vanishing gradients via residual connections, enabling it to learn complex features (He et al., 2016). It serves as the feature extractor, providing rich image representations. The FPN addresses multi-scale object detection by creating a top-down pathway with lateral connections. This combines low-resolution, semantically strong features with high-resolution, semantically weak features, using 1 × 1 convolutions to ensure consistent channel dimensions before element-wise addition. The RPN slides a small network over the FPN’s feature maps, predicting multiple region proposals (with varying sizes and aspect ratios – anchors) at each location. Region of Interest (RoI) pooling then extracts fixed-size feature maps from each proposal. These features are fed into fully connected layers (classification and regression heads). The classification head predicts the probability of each region containing a cabbage (or background), while the regression head refines the bounding box coordinates. The model uses a multi-task loss function, combining classification loss (cross-entropy) and bounding box regression loss (Smooth L1 loss): L = L_cls + λ × L_reg, where L_cls is the classification loss, L_reg is the bounding box regression loss, and λ is a balancing parameter (set to 1 in this study). Faster R-CNN’s proven performance and ability to handle varying object scales, combined with ResNet-50’s balance of accuracy and efficiency, and FPN’s multi-scale detection capabilities, made it suitable for this task.
The model was implemented using PyTorch 2.3.1 (Meta AI, Menlo Park, CA, USA) and initialized with weights pre-trained on the COCO dataset to leverage transfer learning (Yosinski et al., 2014). We customized the RoI Heads, setting the number of classes to two (“cabbage” and “background”), the maximum detections per image to 300, and the confidence threshold to 0.05. This low threshold aimed to maximize the detection of young, less visually distinct cabbage plants. Model training was performed on a system with a single NVIDIA GeForce GTX 1080 Ti GPU and CUDA version 12.1. The SGD optimizer was used with a learning rate of 0.001, momentum of 0.9, and weight decay of 0.0005. The model was trained for 50 epochs, with each epoch taking approximately 10.4 seconds. The loss decreased from 37.15 to 12.90, indicating successful convergence.
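A minimal sketch of how such a configuration can be assembled with the torchvision detection API is given below. It reproduces the hyperparameters reported above (two classes, 300 detections per image, 0.05 score threshold, SGD with lr 0.001, momentum 0.9, weight decay 0.0005), but it is an illustrative reconstruction rather than the authors' training script; the `weights="DEFAULT"` argument assumes torchvision ≥ 0.13.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# COCO-pretrained Faster R-CNN with a ResNet-50 + FPN backbone
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights="DEFAULT",            # COCO weights (torchvision >= 0.13 API)
    box_detections_per_img=300,   # allow up to 300 detections per tile
    box_score_thresh=0.05,        # low threshold to keep faint seedlings
)

# Replace the COCO head with a two-class head ("background", "cabbage")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)

# One illustrative training step: torchvision detectors take lists of images and targets
model.train()
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[100., 100., 160., 160.]]),
            "labels": torch.tensor([1])}]
optimizer.zero_grad()
loss_dict = model(images, targets)    # classification + box regression losses
loss = sum(loss_dict.values())
loss.backward()
optimizer.step()
```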
Model performance was rigorously evaluated using both validation and test datasets. Key metrics included Precision, Recall, F1-score, Average Precision (AP), and mean Average Precision (mAP), calculated as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-Score = 2 × Precision × Recall / (Precision + Recall)
Intersection over Union (IoU) = Area of Overlap / Area of Union
Average Precision (AP) = area under the precision–recall curve, ∫₀¹ P(R) dR
mean Average Precision (mAP) = (1/N) × Σᵢ APᵢ
where TP, FP, FN, and N represent true positives, false positives, false negatives, and the number of classes, respectively. The IoU threshold for determining true positives was set to 0.5.
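The box-level IoU test and the count-based metrics above can be computed as in the short sketch below (pure Python; the example boxes and counts are illustrative only and do not correspond to the reported results).

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_scores(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A prediction overlapping its ground-truth box by less than the 0.5 IoU threshold
print(box_iou((0, 0, 10, 10), (2, 2, 12, 12)))   # ~0.47 -> counted as a false positive
print(detection_scores(tp=92, fp=9, fn=8))        # illustrative counts only
```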
To assess cabbage health and growth, the NDVI and ExG index were calculated as:
NDVI = (NIR - Red) / (NIR + Red)
ExG = 2g - r - b
where r = Red / (Red + Green + Blue), g = Green / (Red + Green + Blue), b = Blue / (Red + Green + Blue), and NIR, Red, Green, and Blue represent the reflectance values in the respective multispectral bands. These indices were computed pixel-wise using the multispectral orthomosaic to generate spatial maps of vegetation vigor.
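A pixel-wise implementation of these two indices could look like the following sketch; the small reflectance arrays are synthetic values standing in for vegetated and soil pixels.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Pixel-wise NDVI from NIR and red reflectance bands."""
    return (nir - red) / (nir + red + 1e-9)

def exg(red: np.ndarray, green: np.ndarray, blue: np.ndarray) -> np.ndarray:
    """Pixel-wise Excess Green index from chromatic coordinates (ExG = 2g - r - b)."""
    total = red + green + blue + 1e-9
    r, g, b = red / total, green / total, blue / total
    return 2.0 * g - r - b

# Illustrative 2x2 reflectance values (top row vegetation-like, bottom row soil-like)
nir = np.array([[0.45, 0.45], [0.20, 0.20]])
red = np.array([[0.08, 0.08], [0.15, 0.15]])
green = np.array([[0.12, 0.12], [0.14, 0.14]])
blue = np.array([[0.05, 0.05], [0.10, 0.10]])
print(ndvi(nir, red))           # high values for the vegetated pixels
print(exg(red, green, blue))    # higher ExG for the vegetated pixels
```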
A comparative analysis was conducted between the 15 cm buffer zone method and an ExG-based binary dissolve approach for object delineation. The binary dissolve operation was performed on the ExG maps using a threshold of 0.05 to create a binary mask representing areas of vegetation (Fig. 7). The ExG threshold of 0.05 was determined empirically through visual inspection of the ExG maps and corresponding RGB imagery, selecting a value that effectively segmented cabbage vegetation from the background soil and other non-cabbage objects. NDVI values were extracted and compared for regions defined by both methods using zonal statistics. This methodological comparison provides insights into the effectiveness of different approaches for refining object boundaries. Kernel Density Estimate (KDE) plots and scatter plots of the NDVI distributions were generated to visually compare the two methods.
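A simplified version of the thresholding and zonal comparison is sketched below with NumPy masks; the synthetic rasters and the mask shapes are assumptions used only to show how mean NDVI, variance, and the low-NDVI share can be compared between two delineations.

```python
import numpy as np

def exg_mask(exg_map: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Binary vegetation mask from an ExG map (True where ExG exceeds the threshold)."""
    return exg_map > threshold

def zonal_ndvi_stats(ndvi_map: np.ndarray, mask: np.ndarray) -> dict:
    """Mean, variance and low-NDVI proportion for the pixels inside a mask."""
    values = ndvi_map[mask]
    return {
        "mean": float(values.mean()),
        "variance": float(values.var()),
        "share_le_0.3": float((values <= 0.3).mean()),
    }

# Illustrative comparison of a spectral (ExG) mask and a fixed-geometry mask
rng = np.random.default_rng(0)
ndvi_map = rng.uniform(-0.1, 0.8, size=(200, 200))
exg_map = rng.uniform(-0.1, 0.4, size=(200, 200))
buffer_mask = np.zeros_like(ndvi_map, dtype=bool)
buffer_mask[50:150, 50:150] = True
print(zonal_ndvi_stats(ndvi_map, exg_mask(exg_map)))   # dissolve-style (spectral) zones
print(zonal_ndvi_stats(ndvi_map, buffer_mask))          # fixed-geometry zones
```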
To further justify the selection of the Faster R-CNN model, its performance was compared to that of two other widely used object detection models: YOLOv5s and Single Shot MultiBox Detector (SSD). The same training, validation, and test datasets (described in Section 2.4) were used for all three models.
The YOLOv5s (small) model was implemented using PyTorch. The model was trained using the SGD optimizer with a learning rate of 0.01, momentum of 0.937, and weight decay of 0.0005. The model was trained for 50 epochs with a batch size of 16.
The SSD model with a VGG-16 backbone was implemented using PyTorch. The model was trained using the SGD optimizer with a learning rate of 0.001, momentum of 0.9, and weight decay of 0.0005. The model was trained for 50 epochs with a batch size of 32.
The models were compared using the same metrics described in Section 2.6.3 (Precision, Recall, F1-score, AP, and mAP).
This study evaluated the performance of a customized Faster R-CNN model for detecting early-stage cabbage plants in UAV-acquired multispectral imagery and compared two object delineation methods: a 15 cm buffer zone and an ExG-based dissolve operation.
The Faster R-CNN model, utilizing a ResNet-50 backbone with FPN, demonstrated robust performance. As shown in Table 2, the model achieved a mAP of 0.890 on the validation set and 0.900 on the independent test set, indicating high accuracy and good generalization. High Precision (0.910) and Recall (0.920) values on the test set indicate a low rate of both false positives and false negatives. The F1-score was 0.910. Consistent performance between validation and test sets suggests the model is not overfitted. Fig. 7 visually compares ground truth annotations and model predictions.
Table 2 Performance metrics of the Faster R-CNN model on the validation and test datasets (IoU threshold = 0.5)
Dataset | mAP | Precision | Recall | F1-Score | AP |
---|---|---|---|---|---|
Validation | 0.890 | 0.900 | 0.910 | 0.900 | 0.890 |
Test | 0.900 | 0.910 | 0.920 | 0.910 | 0.900 |
To validate the choice of Faster R-CNN, its performance was compared with YOLOv5s and SSD models (Table 3). Faster R-CNN achieved the highest mAP (0.900) on the test set, outperforming both YOLOv5s (mAP = 0.852) and SSD (mAP = 0.825). While YOLOv5s showed slightly higher precision (0.921), Faster R-CNN exhibited superior recall (0.920). The F1-scores confirm Faster R-CNN’s superior performance (0.910) compared to YOLOv5s (0.885) and SSD (0.836). Faster R-CNN had an average inference time of 0.12 seconds per image, while YOLOv5s was faster at 0.05 seconds per image, and SSD was slower at 0.18 seconds per image.
Table 3 Performance comparison of Faster R-CNN, YOLOv5s, and SSD models on the test dataset
Model | mAP | Precision | Recall | F1-Score | Inference Time (s/image) |
---|---|---|---|---|---|
Faster R-CNN | 0.900 | 0.910 | 0.920 | 0.910 | 0.12 |
YOLOv5s | 0.852 | 0.921 | 0.853 | 0.885 | 0.05 |
SSD | 0.825 | 0.847 | 0.826 | 0.836 | 0.18 |
Metrics calculated using an IoU threshold of 0.5.
Following object detection, a spatial dissolve operation merged overlapping bounding boxes, refining initial predictions. Fig. 8 shows the raw Faster R-CNN detections, while Fig. 9 shows the result after the spatial dissolution. The dissolve operation consolidated multiple detections of the same plant, reducing the number of detected objects by 15%.
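One way to express such a dissolve is to union the box geometries and keep each connected component as a single object, for example with Shapely as sketched below; this is an illustration of the operation, not the exact GIS workflow used in the study.

```python
from shapely.geometry import box
from shapely.ops import unary_union

def dissolve_boxes(boxes):
    """Merge overlapping detection boxes into single objects.

    boxes: iterable of (xmin, ymin, xmax, ymax) tuples.
    Returns one polygon per connected group of overlapping boxes.
    """
    merged = unary_union([box(*b) for b in boxes])
    return list(merged.geoms) if merged.geom_type == "MultiPolygon" else [merged]

# Two overlapping detections of the same plant plus one isolated detection
detections = [(0, 0, 10, 10), (8, 0, 18, 10), (40, 40, 50, 50)]
objects = dissolve_boxes(detections)
print(len(detections), "->", len(objects))   # 3 raw boxes -> 2 dissolved objects
```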
Fig. 9 qualitatively compares predicted bounding boxes (after dissolution) and regions of high ExG values. The visual correspondence between bounding boxes and high ExG areas indicates accurate detection of healthy cabbage plants (Woebbecke et al., 1995). Fig. 10 visually demonstrates the general agreement between the predicted bounding boxes (green) and regions of high ExG values (warmer colors in the ExG map), suggesting that the model accurately detects areas corresponding to healthy cabbage plants.
An ExG-based dissolve operation (ExG threshold of 0.05) generated polygons representing dense vegetation. Fig. 11 compares object delineation using the ExG-based dissolve and the 15 cm buffer zone. This method leverages spectral information to define object boundaries, offering a differentiated and potentially more accurate approach to object delineation compared to solely relying on bounding boxes.
The buffer zone and ExG-based dissolve methods were quantitatively compared using NDVI statistics (Table 4). The dissolved method yielded a significantly higher mean NDVI (0.470) compared to the buffer method (0.300) and a lower proportion of low NDVI values (≤ 0.3): 12.54% compared to 49.38%. Variances were similar.
Table 4 Comparison of NDVI statistics for cabbage regions delineated by the buffer and dissolve methods
Analysis Item | Buffer Method | Dissolved Method | Difference (Dissolved – Buffer) |
---|---|---|---|
NDVI Mean Value | 0.300 | 0.470 | +0.17 |
NDVI Variance | 0.016 | 0.016 | 0.0 |
Count of NDVI Values ≤ 0.3 | 2,496 | 634 | –1,862 |
Proportion of NDVI Values ≤ 0.3 (%) | 49.38 | 12.54 | –36.84 |
NDVI values are unitless.
Fig. 12 supports these findings. The KDE plot clearly shows that the dissolved method’s NDVI distribution (blue line) is shifted towards higher values, with a peak in the 0.4–0.6 range, while the buffer method’s distribution (red line) peaks in the lower 0.1–0.3 range.
A scatter plot analysis (Fig. 13) provided a point-by-point comparison of NDVI values derived from the two methods. Fig. 13’s scatter plot confirms that the dissolving method generally yields higher NDVI values, with most points falling below the 1:1 line.
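Distribution comparisons of this kind can be reproduced with standard plotting tools; the sketch below assumes hypothetical paired per-object NDVI samples for the two methods and is not derived from the study's data.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Hypothetical paired NDVI samples for the same objects under the two methods
rng = np.random.default_rng(1)
ndvi_buffer = np.clip(rng.normal(0.30, 0.12, 600), -0.1, 0.9)
ndvi_dissolve = np.clip(ndvi_buffer + rng.normal(0.17, 0.08, 600), -0.1, 0.9)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Kernel density estimates of the two NDVI distributions
xs = np.linspace(-0.1, 0.9, 200)
ax1.plot(xs, gaussian_kde(ndvi_buffer)(xs), label="Buffer")
ax1.plot(xs, gaussian_kde(ndvi_dissolve)(xs), label="Dissolve")
ax1.set_xlabel("NDVI"); ax1.set_ylabel("Density"); ax1.legend()

# Point-by-point scatter with a 1:1 reference line
ax2.scatter(ndvi_buffer, ndvi_dissolve, s=5, alpha=0.4)
ax2.plot([0, 1], [0, 1], "k--", linewidth=1)
ax2.set_xlabel("Buffer NDVI"); ax2.set_ylabel("Dissolve NDVI")

plt.tight_layout()
plt.show()
```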
This study demonstrates the substantial potential of combining UAV-based multispectral imaging with a tailored Faster R-CNN model for accurate, efficient, and automated early-stage cabbage detection and health assessment. The high-performance metrics achieved by the Faster R-CNN model (Table 2), surpassing both YOLOv5s and SSD in mAP and recall (Table 3), confirm its suitability for this task. Superior recall is particularly important in this early-growth stage context, as failing to detect stressed or diseased seedlings (false negatives) can have significant downstream consequences for yield. The speed advantage of YOLOv5s, while noteworthy, is outweighed by the improved accuracy of Faster R-CNN for this application.
A key contribution of this work is the comparative evaluation of two object delineation strategies: fixed-radius buffering and ExG-based dissolving. The qualitative (Figs. 10 and 11) and quantitative (Table 4, Figs. 12 and 13) results unequivocally demonstrate the superiority of the ExG-based approach. By leveraging the spectral characteristics of healthy vegetation, the dissolving method produced more accurate plant delineations, avoiding the inclusion of bare soil and other non-cabbage areas that were often included within the fixed-radius buffer zones. This is directly reflected in the significantly higher mean NDVI values and the lower proportion of low-NDVI pixels within the dissolved regions. These findings are consistent with previous research demonstrating the utility of ExG for vegetation analysis (Woebbecke et al., 1995).
These technological advancements have direct and significant implications for precision agriculture practices. The ability to accurately identify individual plants and assess their health status (via NDVI derived from the delineated objects) opens up several opportunities: (a) targeted interventions – the framework facilitates precise, variable-rate application of resources (water, nutrients, pesticides). Instead of treating an entire field uniformly, interventions can be targeted only to those areas or individual plants that require them, reducing costs and environmental impact; (b) early stress detection – by monitoring subtle changes in NDVI, the system can provide early warnings of stress (nutrient deficiencies, water stress, disease onset) before visual symptoms become apparent, allowing for timely corrective action; and (c) improved yield prediction – accurate plant counts and health assessments, early in the growing season, provide valuable inputs for yield prediction models, enabling better planning and resource management.
While the results are promising, several limitations and avenues for future research should be acknowledged: (a) over-merging – the ExG-based dissolve, while effective, occasionally merged closely spaced plants. Future work will investigate more sophisticated delineation techniques, such as watershed segmentation and marker-controlled watershed segmentation, potentially incorporating plant size and shape information to mitigate this issue (Neubeck and Van Gool, 2006); (b) spectral resolution – this study utilized multispectral data. Future research could explore the use of hyperspectral imagery, which offers finer spectral resolution and the potential to detect even more subtle variations in plant physiological status; (c) environmental variability – data acquisition was performed under relatively ideal conditions (clear skies, midday). Future work should address the impact of varying illumination, atmospheric conditions, and shadow effects, including exploring robust atmospheric correction methods and potentially incorporating data acquired at multiple times of day; (d) generalizability – further validation is needed across different cabbage varieties, soil types, environmental conditions, and other crops with similar morphology (e.g., broccoli, cauliflower, lettuce); and (e) real-time system – a key goal is to develop a real-time implementation of this framework, integrating image acquisition, processing, and analysis into a single, streamlined system for immediate on-farm decision support. Addressing these limitations will enhance the robustness, accuracy, and practical applicability of the framework, paving the way for its broader adoption in precision agriculture.
This study successfully developed and validated a novel framework for the precision monitoring of early-stage cabbage using UAV-based multispectral imaging and a customized Faster R-CNN deep learning model. The framework integrates data acquisition, preprocessing, object detection, and object delineation, providing a complete pipeline for automated cabbage analysis. The Faster R-CNN model, with a ResNet-50 backbone and FPN, achieved high accuracy in detecting individual cabbage plants (mAP = 0.900 on the test dataset), outperforming YOLOv5s and SSD in this specific application. This superior performance, particularly in terms of recall, is crucial for early-stage monitoring where minimizing missed detections is paramount. A key contribution of this work is the comparative analysis of object delineation methods. The ExG-based dissolution operation demonstrated significantly improved performance compared to the 15cm buffer zone approach, resulting in higher mean NDVI values (0.470 vs. 0.300) and a lower proportion of pixels with low NDVI values (12.54% vs. 49.38%). This indicates that the ExG-based method more accurately isolates healthy, photosynthetically active cabbage vegetation, providing a more reliable basis for assessing plant health.
The proposed framework offers a valuable tool for precision agriculture, enabling data-driven decision-making in cabbage cultivation. The accurate plant detection and health assessment capabilities can facilitate optimized resource allocation (e.g., variable-rate irrigation and fertilization) and early stress detection, ultimately leading to improved crop productivity and sustainability. While this study focused on early-stage cabbage, the framework’s core principles are adaptable to other crops with similar growth patterns. Future research will focus on addressing the identified limitations, including refining the object delineation algorithms to handle dense planting and overlapping plants (e.g., exploring watershed segmentation), investigating the use of hyperspectral imagery for enhanced stress detection, incorporating atmospheric correction techniques, and evaluating the framework’s performance across diverse environmental conditions and cabbage varieties. Ultimately, the goal is to develop a robust, real-time system for precision crop monitoring that can be widely adopted by farmers to improve agricultural practices.
None.
No potential conflict of interest relevant to this article was reported.
Korean J. Remote Sens. 2025; 41(1): 73-86
Published online February 28, 2025 https://doi.org/10.7780/kjrs.2025.41.1.7
Copyright © Korean Society of Remote Sensing.
Gyeong-Su Jeong1 , Jong-Hwa Park2*
1Master Student, Department of Agricultural and Rural Engineering, Chungbuk National University, Cheongju, Republic of Korea
2Professor, Department of Agricultural and Rural Engineering, Chungbuk National University, Cheongju, Republic of Korea
Correspondence to:Jong-Hwa Park
E-mail: jhpak7@cbnu.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Early and accurate monitoring of crop growth is crucial for precision agriculture. This study developed and evaluated a novel framework for precision monitoring of early-stage cabbage (Brassica oleracea var. capitata) using Unmanned Aerial Vehicle (UAV) multispectral imagery and a modified Faster Region-based Convolutional Neural Network (Faster R-CNN). A DJI Matrice 300 RTK UAV equipped with RGB and RedEdge-MX multispectral sensors acquired high-resolution imagery of a cabbage testbed in Goesan-gun, South Korea. A Faster R-CNN model, incorporating a ResNet-50 backbone and Feature Pyramid Network (FPN), was trained to detect individual cabbage plants. A two-stage data augmentation approach was employed: initial training with bounding box annotations, followed by refinement using 15cm buffer zones around predicted plant centroids. The model achieved a mean Average Precision (mAP) of 0.900 on an independent test set, outperforming YOLOv5s and SSD models. Two object delineation methods were compared: the 15cm buffer zones and an Excess Green (ExG)-based dissolve operation. The ExG-based dissolve method demonstrated superior performance in delineating healthy cabbage vegetation, yielding a significantly higher mean Normalized Difference Vegetation Index (NDVI) (0.470) compared to the buffer method (0.300) and a lower proportion of low NDVI values (12.54% vs. 49.38%). These results highlight the potential of integrating UAV-based multispectral imaging with a modified Faster R-CNN and an ExG-based dissolve approach for accurate and efficient early-stage cabbage monitoring, facilitating data-driven decision-making in precision agriculture.
Keywords: Precision agriculture, Unmanned aerial vehicle, Early growth stage monitoring, Faster RCNN, Deep learning
The growing global demand for food, coupled with increasing environmental concerns and resource limitations, necessitates a transition toward more sustainable and efficient agricultural practices (Godfray et al., 2010; Velten et al., 2015). Precision agriculture, which leverages advanced technologies to optimize crop management, offers a promising pathway to achieving this goal (Zhang et al., 2002). A core component of this paradigm shift is the accurate and timely monitoring of crop growth, spatial distribution, and physiological status, which have traditionally relied on labor-intensive and time-consuming manual methods (Araus and Cairns, 2014). This study focuses on advancing precision agriculture techniques for early-stage cabbage (Brassica oleracea var. capitata), a vegetable crop of vital importance to Korea’s national food security and agricultural economy (MAFRA, 2023). Specifically, early detection of stress in cabbage seedlings is crucial for maximizing yield, as young plants are particularly vulnerable to environmental pressures. However, Korean cabbage cultivation faces mounting challenges, including increased climate variability, particularly in temperature and precipitation patterns, limitations in water resources, and increasing occurrences of pests and diseases (Lee et al., 2016; Ryu et al., 2022). These factors underscore the urgency for adopting datadriven, scientifically informed cultivation practices to ensure the economic viability and environmental sustainability of cabbage production (Na et al., 2021; Ryu et al., 2024).
While Unmanned Aerial Vehicles (UAVs) equipped with RGB sensors have shown promise for general crop monitoring, their ability to capture the full spectrum of plant physiological information remains limited, particularly for subtle early-stage stress detection (Tsouros et al., 2019). In this research, we employ UAV-mounted multispectral sensors, capturing data beyond the visible spectrum, to gain deeper insights into cabbage health, including nutrient status and stress levels. Furthermore, the originality of this work lies in the synergistic integration of high-resolution spatial data from UAVs, detailed multispectral information, and a customized deep learning model for comprehensive cabbage monitoring. Specifically, we utilize the Faster Convolutional Neural Network (Faster R-CNN) framework, renowned for its object detection accuracy (Ren et al., 2016), leveraging a ResNet-50 backbone for robust feature extraction and a Feature Pyramid Network (FPN) for enhanced multi-scale object detection, crucial for identifying individual cabbage plants across varying growth stages and sizes (Lin et al., 2017). Faster R-CNN was chosen as a starting point due to its proven performance in object detection tasks and its ability to handle objects of varying scales, making it well-suited for early-stage cabbage detection, where plant size can vary considerably. ResNet-50 was selected as the backbone to provide a good balance between accuracy and computational efficiency, and FPN was incorporated to enhance the detection of both small and larger cabbage seedlings within the imagery.
The novelty of this research is threefold: (1) we present a pioneering integration of UAV-based multispectral imaging with a fine-tuned Faster R-CNN (ResNet-50 + FPN) model, specifically adapted for the unique challenges of early-stage cabbage production in Korea, such as its dense planting patterns and specific stress indicators like nutrient deficiencies and early disease symptoms; (2) we introduce a novel pre-processing workflow, including a unique image tiling approach with a 20% overlap and a 15cm buffering technique around detected plants, specifically designed to optimize the performance of the deep learning model on multispectral UAV data and address edge effects; and (3) we demonstrate the potential to extract both spatial (e.g., plant density, distribution) and physiological information (e.g., stress, nutrient content through vegetation indices like Normalized Difference Vegetation Index [NDVI] and Excess Green [ExG]) from a single data source. Moreover, we uniquely compare the efficacy of the buffering method with an ExG-based binary dissolve approach for refining object delineation, providing valuable methodological insights for future research.
This study hypothesizes that the proposed integrated framework, encompassing UAV-based multispectral data acquisition, tailored pre-processing, and a customized Faster R-CNN model, will enable accurate and efficient detection, classification, and health assessment of cabbage plants across different growth stages, specifically during the critical early growth stage. By providing real-time, spatially explicit insights into cabbage growth dynamics, this research aims to facilitate a transition from traditional, reactive management to a more proactive, data-driven approach. The findings will contribute to the advancement of high-throughput phenotyping, support the broader adoption of precision agriculture practices, and ultimately bolster the economic viability and environmental sustainability of cabbage production in Korea and potentially other regions with similar agricultural contexts. Therefore, the primary objective of this study was to develop and rigorously evaluate a UAV-based multispectral imaging and deep learning framework for accurate early-stage cabbage detection, delineation, and health assessment using a modified Faster R-CNN model. This framework aims to provide timely and actionable information to support informed decision-making in cabbage cultivation.
This study employed a multi-stage workflow (Fig. 1) to detect and analyze early-stage cabbage plants using UAV-acquired multispectral imagery and deep learning. The process involved data acquisition, preprocessing, dataset creation, model training and evaluation, and a comparative analysis of object delineation methods. Each step of the workflow is described in detail below.
The research was carried out at the K-Smart Organic Farm Innovation Demonstration Complex in Galeup-ri, Goesan-gun, Chungcheongbuk-do, South Korea (36°47′14″N, 127°51′5″E, Fig. 2). This complex, established in July 2023 by the Ministry of Agriculture, Food, and Rural Affairs, serves as a testbed for advanced agricultural technologies, including wireless automated irrigation and UAV-based crop monitoring. The study site was a designated testbed within a 2.13 ha area dedicated to organic cabbage (Brassica oleracea var. capitata, cultivar: ‘Chun Gwang’) cultivation. The specific testbed, located at 599 Galeum-ri, was planted with cabbage seedlings on August 23, 2023. The planting density was approximately 6.25 plants per square meter, with a row spacing of 60 cm and plant spacing within rows of 40 cm. The site was selected for its representation of cutting-edge, technology-integrated agricultural practices currently being promoted for sustainable vegetable production in South Korea. The testbed is equipped with a drip irrigation system, representative of the smart farming infrastructure.
Multispectral and RGB imagery of the cabbage testbed was acquired on September 10, 2023, approximately two weeks postplanting, a critical period for assessing early growth status and identifying potential stress (Jeong et al., 2024; Lee et al., 2022). Data acquisition was performed using a DJI Matrice 300 RTK UAV platform equipped with a Zenmuse H20T RGB camera and a RedEdge-MX multispectral sensor (Table 1). Flights were conducted under clear sky conditions between 11:00 AM and 1:00 PM local time to minimize shadow effects. The flight plan was designed using DJI GS Pro software, with automated flight paths generated to ensure consistent image overlap. The RGB data were captured at a 30 m flight altitude, yielding a spatial resolution of 1.1 cm/pixel. A total of 150 RGB images were acquired. The multispectral data were acquired at a 40 m altitude, resulting in a 2.87 cm/pixel resolution. A total of 180 multispectral images were acquired. The multispectral sensor captured data in five spectral bands: blue (475 ± 32 nm), green (560 ± 27 nm), red (668 ± 14 nm), red edge (717 ± 12 nm), and near-infrared (NIR) (842 ± 57 nm). The forward overlap for both RGB and multispectral flights was 80%, and the side overlap was 70%.
Table 1 . Specifications of the UAV platform, RGB camera, and multispectral sensor used for data acquisition.
Equipment/Sensor | Model | Manufacturer | Flight Altitude (m) | Resolution (cm/pixel) | Spectral Bands (nm) |
---|---|---|---|---|---|
UAV Platform | DJI Matrice 300 RTK | DJI, Shenzhen, China | - | - | - |
RGB Sensor | Zenmuse H20T | DJI, Shenzhen, China | 30 | 1.1 | - |
Multispectral Sensor | RedEdge-MX | MicaSense, Seattle, WA, USA | 40 | 2.87 | Blue (475 ± 32), Green (560 ± 27), Red (668 ± 14), Red Edge (717 ± 12), NIR (842 ± 57) |
For precise georeferencing, ground control points (GCPs) were established across the field before the UAV flights and measured using a Trimble R10 GNSS receiver (Lee and Park, 2024; Go and Park, 2024). A total of nine GCPs were used, distributed across the field perimeter. Due to logistical constraints and to minimize disturbance to the young cabbage plants, no GCPs were placed in the central region of the testbed. Post-processing analysis confirmed that geometric accuracy was within acceptable limits (RMSE < 3 cm in X and Y), demonstrating that the perimeter GCP distribution provided sufficient accuracy for this study. Radiometric calibration was performed using a calibrated reflectance panel (Spectralon 99% reflectance panel, Labsphere, Inc., North Sutton, NH, USA; Serial Number: RP04-1949205- OB). Images of the reflectance panel (Fig. 3) were acquired before and after each UAV flight, with the panel placed horizontally on a level surface near the takeoff location. These images were used to convert raw digital numbers (DNs) to reflectance values using the empirical line method, following the manufacturer’s recommendations. This process corrects for variations in illumination conditions and sensor response, ensuring accurate reflectance measurements.
The acquired imagery underwent a rigorous preprocessing pipeline using Pix4Dmapper software (Pix4D, Lausanne, Switzerland). Raw images were first radiometrically calibrated using the reflectance panel data (Section 2.2.2). A 3D point cloud was generated using structure-from-motion (SfM) techniques (Go et al., 2022). Geometric correction was then performed using the GCPs, followed by orthomosaic generation (Lee et al., 2022). The resulting RGB orthomosaic had a spatial resolution of 1.1 cm/pixel and dimensions of 24,588 × 31,881 pixels. The multispectral orthomosaic had a spatial resolution of 2.87 cm/pixel.
The high-resolution RGB orthomosaic was tiled into smaller 512 × 512-pixel image patches with a 20% overlap to optimize model training and prevent object truncation at tile boundaries. This tiling strategy produced a total of 188 RGB tiles. The 512 × 512 pixel size was chosen as a balance between providing sufficient contextual information for the Faster R-CNN model and maintaining computational efficiency during training.
Manual annotation using Label Studio (HumanSignal, San Francisco, CA, USA) involved creating bounding boxes around each cabbage plant, with annotations saved in COCO format. A single annotator performed all labeling to ensure consistency, followed by a quality control check by a second expert to minimize bias and ensure accuracy. Fig. 4 provides an example of the bounding box annotations, demonstrating the precision with which individual cabbage plants were delineated. The dataset was split into training (91 images, 4,270 annotations), validation (31 images, 1,368 annotations), and test (31 images, 1,591 annotations) sets, using a 60:20:20 ratio. This split ratio is commonly used in machine learning and provides a sufficient amount of data for training while reserving adequate, independent sets for validation and testing. Bounding boxes were drawn to encompass the entire visible extent of each cabbage plant, minimizing the inclusion of background soil (Fig. 5).
To further augment the training data and improve model robustness, 15 cm radius buffer zones were generated around the predicted centroids of each cabbage plant. This was a twostage process: (a) Initial Model Training: An initial Faster R-CNN model (identical in architecture to the final model described in Section 2.6) was trained on the original bounding box annotations. (b) Buffer Zone Creation: This initial model was then used to predict cabbage locations (centroids) on the training images. 15 cm radius buffer zones were created around these predicted centroids (Fig. 5). This novel application of buffer zones, guided by initial model predictions, demonstrates a differentiated approach to refining object detection and was used to create instance segmentation masks in this study. The 15 cm buffer size was chosen based on the average diameter of cabbage plants at the two-week post-planting stage, as determined by field measurements. This augmentation strategy aims to provide the model with more contextual information about each plant, improving its ability to distinguish cabbage plants from the background. The final training dataset included both the original bounding box annotations and the augmented data with buffer zones.
The Faster R-CNN object detection framework (Ren et al., 2016), with a ResNet-50 backbone and FPN (Lin et al., 2017), was employed for cabbage detection. Faster R-CNN is a two-stage detector (Fig. 6). The first stage, the Region Proposal Network (RPN), proposes regions likely to contain objects. The second stage classifies these regions and refines their bounding boxes. The ResNet-50 backbone, a deep convolutional neural network, mitigates vanishing gradients via residual connections, enabling it to learn complex features (He et al., 2016). It serves as the feature extractor, providing rich image representations. The FPN addresses multi-scale object detection by creating a top-down pathway with lateral connections. This combines low-resolution, semantically strong features with high-resolution, semantically weak features, using 1 × 1 convolutions to ensure consistent channel dimensions before element-wise addition. The RPN slides a small network over the FPN’s feature maps, predicting multiple region proposals (with varying sizes and aspect ratios – anchors) at each location. Region of Interest (RoI) pooling then extracts fixed-size feature maps from each proposal. These features are fed into fully connected layers (classification and regression heads). The classification head predicts the probability of each region containing a cabbage (or background), while the regression head refines the bounding box coordinates. The model uses a multi-task loss function, combining classification loss (cross-entropy) and bounding box regression loss (Smooth L1 loss): L = Lcls + λ × Lreg where Lcls is the classification loss, Lreg is the bounding box regression loss, and λ is a balancing parameter (set to 1 in this study). Faster R-CNN’s proven performance and ability to handle varying object scales, combined with ResNet- 50’s balance of accuracy and efficiency, and FPN’s multi-scale detection capabilities, made it suitable for this task.
The model was implemented using PyTorch 2.3.1 (Meta AI, Menlo Park, CA, USA) and initialized with weights pre-trained on the COCO dataset to leverage transfer learning (Yosinski et al., 2014). We customized the RoI Heads, setting the number of classes to two (“cabbage” and “background”), the maximum detections per image to 300, and the confidence threshold to 0.05. This low threshold aimed to maximize the detection of young, less visually distinct cabbage plants. Model training was performed on a system with a single NVIDIA GeForce GTX 1080 Ti GPU and CUDA version 12.1. The SGD optimizer was used with a learning rate of 0.001, momentum of 0.9, and weight decay of 0.0005. The model was trained for 50 epochs, with each epoch taking approximately 10.4 seconds. The loss decreased from 37.15 to 12.90, indicating successful convergence.
Model performance was rigorously evaluated using both validation and test datasets. Key metrics included Precision, Recall, F1-score, Average Precision (AP), and mean Average Precision (mAP), calculated as follows:
Precision:
Recall:
F1-Score:
Intersection over Union (IoU):
Average Precision (AP):
mean Average Precision (mAP):
where TP, FP, FN, and N represent true positives, false positives, false negatives, and the number of classes, respectively. The IoU threshold for determining true positives was set to 0.5.
To assess cabbage health and growth, the NDVI and ExG Index were calculated.
where r = Red / (Red + Green + Blue), g = Green / (Red + Green + Blue), b = Blue / (Red + Green + Blue) and NIR, Red, Green, and Blue represent the reflectance values in the respective multispectral bands. These indices were computed pixel-wise using the multispectral orthomosaic to generate spatial maps of vegetation vigor.
A comparative analysis was conducted between the 15 cm buffer zone method and an ExG-based binary dissolve approach for object delineation. The binary dissolve operation was performed on the ExG maps using a threshold of 0.05 to create a binary mask representing areas of vegetation (Fig. 7). The ExG threshold of 0.05 was determined empirically through visual inspection of the ExG maps and corresponding RGB imagery, selecting a value that effectively segmented cabbage vegetation from the background soil and other non-cabbage objects. NDVI values were extracted and compared for regions defined by both methods using zonal statistics. This methodological comparison provides insights into the effectiveness of different approaches for refining object boundaries. Kernel Density Estimate (KDE) plots and scatter plots of the NDVI distributions were generated to visually compare the two methods.
To further justify the selection of the Faster R-CNN model, its performance was compared to that of two other widely used object detection models: YOLOv5s and Single Shot MultiBox Detector (SSD). The same training, validation, and test datasets (described in Section 2.4) were used for all three models.
The YOLOv5s (small) model was implemented using PyTorch. The model was trained using the SGD optimizer with a learning rate of 0.01, momentum of 0.937, and weight decay of 0.0005. The model was trained for 50 epochs with a batch size of 16.
The SSD model with a VGG-16 backbone was implemented using PyTorch. The model was trained using the SGD optimizer with a learning rate of 0.001, momentum of 0.9, and weight decay of 0.0005. The model was trained for 50 epochs with a batch size of 32.
The models were compared using the same metrics described in Section 2.6.3 (Precision, Recall, F1-score, AP, and mAP).
This study evaluated the performance of a customized Faster R-CNN model for detecting early-stage cabbage plants in UAVacquired multispectral imagery and compared two object delineation methods: a 15cm buffer zone and an ExG-based dissolve operation.
The Faster R-CNN model, utilizing a ResNet-50 backbone with FPN, demonstrated robust performance. As shown in Table 2, the model achieved a mAP of 0.890 on the validation set and 0.900 on the independent test set, indicating high accuracy and good generalization. High Precision (0.910) and Recall (0.920) values on the test set indicate a low rate of both false positives and false negatives. The F1-score was 0.910. Consistent performance between validation and test sets suggests the model is not overfitted. Fig. 7 visually compares ground truth annotations and model predictions.
Table 2 . Performance metrics of the Faster R-CNN model on the validation and test datasets (IoU threshold = 0.5).
Dataset | mAP | Precision | Recall | F1-Score | AP |
---|---|---|---|---|---|
Validation | 0.890 | 0.900 | 0.910 | 0.900 | 0.890 |
Test | 0.900 | 0.910 | 0.920 | 0.910 | 0.900 |
To validate the choice of Faster R-CNN, its performance was compared with YOLOv5s and SSD models (Table 3). Faster R-CNN achieved the highest mAP (0.900) on the test set, outperforming both YOLOv5s (mAP = 0.852) and SSD (mAP = 0.825). While YOLOv5s showed slightly higher precision (0.921), Faster R-CNN exhibited superior recall (0.920). The F1-scores confirm Faster R-CNN’s superior performance (0.910) compared to YOLOv5s (0.885) and SSD (0.836). Faster R-CNN had an average inference time of 0.12 seconds per image, while YOLOv5s was faster at 0.05 seconds per image, and SSD was slower at 0.18 seconds per image.
Table 3 . Performance comparison of Faster R-CNN, YOLOv5s, and SSD models on the test dataset.
Model | mAP | Precision | Recall | F1-Score | Inference Time (s/image) |
---|---|---|---|---|---|
Faster R-CNN | 0.900 | 0.910 | 0.920 | 0.910 | 0.12 |
YOLOv5s | 0.852 | 0.921 | 0.853 | 0.885 | 0.05 |
SSD | 0.825 | 0.847 | 0.826 | 0.836 | 0.18 |
Metrics calculated using an IoU threshold of 0.5..
Following object detection, a spatial dissolve operation merged overlapping bounding boxes, refining initial predictions. Fig. 8 shows the raw Faster R-CNN detections, while Fig. 9 shows the result after the spatial dissolution. The dissolve operation consolidated multiple detections of the same plant, reducing the number of detected objects by 15%.
Fig. 9 qualitatively compares predicted bounding boxes (after dissolution) and regions of high ExG values. The visual corres - pondence between bounding boxes and high ExG areas indicates accurate detection of healthy cabbage plants (Woebbecke et al., 1995). Fig. 10 visually demonstrates the general agreement between the predicted bounding boxes (green) and regions of high ExG values (warmer colors in the ExG map), suggesting that the model accurately detects areas corresponding to healthy cabbage plants.
An ExG-based dissolve operation (ExG threshold of 0.05) generated polygons representing dense vegetation. Fig. 11 compares object delineation using the ExG-based dissolve and the 15cm buffer zone. This method leverages spectral information to define object boundaries, offering a complementary and potentially more accurate approach to object delineation than relying solely on bounding boxes.
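A minimal sketch of the ExG computation and thresholding step is shown below, assuming the chromaticity-normalized formulation of Woebbecke et al. (1995); the subsequent polygonization and GIS dissolve of the resulting mask, as well as the exact band normalization used in this study, are not reproduced.

```python
import numpy as np


def excess_green(rgb):
    """Excess Green index from an (H, W, 3) RGB array.

    Uses the chromaticity-normalized form ExG = 2g - r - b, where
    r, g, b are the per-pixel band fractions.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2) + 1e-9
    r = rgb[..., 0] / total
    g = rgb[..., 1] / total
    b = rgb[..., 2] / total
    return 2.0 * g - r - b


def vegetation_mask(rgb, threshold=0.05):
    """Binary mask of dense vegetation at the ExG threshold used in this study."""
    return excess_green(rgb) > threshold
```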
The buffer zone and ExG-based dissolve methods were quantitatively compared using NDVI statistics (Table 4). The dissolve method yielded a significantly higher mean NDVI (0.470) than the buffer method (0.300) and a lower proportion of low NDVI values (≤ 0.3): 12.54% compared with 49.38%. The variances of the two distributions were identical (0.016).
Table 4. Comparison of NDVI statistics for cabbage regions delineated by the buffer and dissolve methods.
Analysis Item | Buffer Method | Dissolve Method | Difference (Dissolve – Buffer) |
---|---|---|---|
NDVI Mean Value | 0.300 | 0.470 | +0.170 |
NDVI Variance | 0.016 | 0.016 | 0.000 |
Count of NDVI Values ≤ 0.3 | 2,496 | 634 | –1,862 |
Proportion of NDVI Values ≤ 0.3 (%) | 49.38 | 12.54 | –36.84 |
NDVI values are unitless.
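For reference, the NDVI statistics summarized in Table 4 can be derived from the delineated regions roughly as sketched below; the `mask` argument is assumed to come from rasterizing the buffer or dissolve polygons, and the function is illustrative rather than the exact zonal-statistics workflow used in this study.

```python
import numpy as np


def ndvi_statistics(nir, red, mask, low_thr=0.3):
    """Summarize NDVI inside a delineated cabbage region.

    nir, red : reflectance arrays of identical shape
    mask     : boolean array selecting pixels inside the delineated polygons
    """
    ndvi = (nir - red) / (nir + red + 1e-9)
    values = ndvi[mask]
    low = values <= low_thr
    return {
        "mean": float(values.mean()),
        "variance": float(values.var()),
        "count_low": int(low.sum()),
        "proportion_low_pct": float(100.0 * low.mean()),
    }
```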
Fig. 12 supports these findings. The KDE plot clearly shows that the NDVI distribution of the dissolve method (blue line) is shifted towards higher values, peaking in the 0.4–0.6 range, whereas the buffer method's distribution (red line) peaks in the lower 0.1–0.3 range.
A scatter plot analysis (Fig. 13) provided a point-by-point comparison of the NDVI values derived from the two methods, confirming that the dissolve method generally yields higher NDVI values, with most points falling below the 1:1 line.
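The comparisons in Figs. 12 and 13 can be reproduced in outline as follows. The NDVI arrays below are synthetic placeholders drawn from the Table 4 summary statistics purely for illustration, and the axis assignment (dissolve method on the x-axis, buffer method on the y-axis) is an assumption consistent with most points falling below the 1:1 line.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic placeholder NDVI samples (means 0.300 / 0.470, variance 0.016);
# real values would come from the zonal statistics of the two delineations.
rng = np.random.default_rng(0)
ndvi_buffer = np.clip(rng.normal(0.300, np.sqrt(0.016), 1000), 0.0, 1.0)
ndvi_dissolve = np.clip(rng.normal(0.470, np.sqrt(0.016), 1000), 0.0, 1.0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Kernel density estimates of the two NDVI distributions (cf. Fig. 12).
sns.kdeplot(ndvi_buffer, ax=ax1, color="red", label="Buffer")
sns.kdeplot(ndvi_dissolve, ax=ax1, color="blue", label="Dissolve")
ax1.set_xlabel("NDVI")
ax1.legend()

# Point-by-point comparison with a 1:1 reference line (cf. Fig. 13).
ax2.scatter(ndvi_dissolve, ndvi_buffer, s=5, alpha=0.4)
ax2.plot([0, 1], [0, 1], "k--", label="1:1 line")
ax2.set_xlabel("NDVI (dissolve method)")
ax2.set_ylabel("NDVI (buffer method)")
ax2.legend()

plt.tight_layout()
plt.show()
```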
This study demonstrates the substantial potential of combining UAV-based multispectral imaging with a tailored Faster R-CNN model for accurate, efficient, and automated early-stage cabbage detection and health assessment. The strong performance of the Faster R-CNN model (Table 2), which surpassed both YOLOv5s and SSD in mAP and recall (Table 3), confirms its suitability for this task. Superior recall is particularly important in this early-growth-stage context, as failing to detect stressed or diseased seedlings (false negatives) can have significant downstream consequences for yield. The speed advantage of YOLOv5s, while noteworthy, is outweighed by the improved accuracy of Faster R-CNN for this application.
A key contribution of this work is the comparative evaluation of two object delineation strategies: fixed-radius buffering and ExG-based dissolving. The qualitative (Figs. 10 and 11) and quantitative (Table 4, Figs. 12 and 13) results unequivocally demonstrate the superiority of the ExG-based approach. By leveraging the spectral characteristics of healthy vegetation, the dissolve method produced more accurate plant delineations, avoiding the bare soil and other non-cabbage areas that were often included within the fixed-radius buffer zones. This is directly reflected in the significantly higher mean NDVI values and the lower proportion of low-NDVI pixels within the dissolved regions. These findings are consistent with previous research demonstrating the utility of ExG for vegetation analysis (Woebbecke et al., 1995).
These technological advancements have direct and significant implications for precision agriculture practices. The ability to accurately identify individual plants and assess their health status (via NDVI derived from the delineated objects) opens up several opportunities: (a) targeted interventions – the framework facilitates precise, variable-rate application of resources (water, nutrients, pesticides). Instead of treating an entire field uniformly, interventions can be targeted only to those areas or individual plants that require them, reducing costs and environmental impact; (b) early stress detection – by monitoring subtle changes in NDVI, the system can provide early warnings of stress (nutrient deficiencies, water stress, disease onset) before visual symptoms become apparent, allowing for timely corrective action; and (c) improved yield prediction – accurate plant counts and health assessments, early in the growing season, provide valuable inputs for yield prediction models, enabling better planning and resource management.
While the results are promising, several limitations and avenues for future research should be acknowledged: (a) over-merging – the ExG-based dissolve, while effective, occasionally merged closely spaced plants. Future work will investigate more sophisticated delineation techniques, such as watershed segmentation and marker-controlled watershed segmentation, potentially incorporating plant size and shape information to mitigate this issue (Neubeck and Van Gool, 2006); (b) spectral resolution – this study utilized multispectral data. Future research could explore the use of hyperspectral imagery, which offers finer spectral resolution and the potential to detect even more subtle variations in plant physiological status; (c) environmental variability – data acquisition was performed under relatively ideal conditions (clear skies, midday). Future work should address the impact of varying illumination, atmospheric conditions, and shadow effects, including exploring robust atmospheric correction methods and potentially incorporating data acquired at multiple times of day; (d) generalizability – further validation is needed across different cabbage varieties, soil types, environmental conditions, and other crops with similar morphology (e.g., broccoli, cauliflower, lettuce); and (e) real-time operation – a key goal is to develop a real-time implementation of this framework, integrating image acquisition, processing, and analysis into a single, streamlined system for immediate on-farm decision support. Addressing these limitations will enhance the robustness, accuracy, and practical applicability of the framework, paving the way for its broader adoption in precision agriculture.
This study successfully developed and validated a novel framework for the precision monitoring of early-stage cabbage using UAV-based multispectral imaging and a customized Faster R-CNN deep learning model. The framework integrates data acquisition, preprocessing, object detection, and object delineation, providing a complete pipeline for automated cabbage analysis. The Faster R-CNN model, with a ResNet-50 backbone and FPN, achieved high accuracy in detecting individual cabbage plants (mAP = 0.900 on the test dataset), outperforming YOLOv5s and SSD in this specific application. This superior performance, particularly in terms of recall, is crucial for early-stage monitoring where minimizing missed detections is paramount. A key contribution of this work is the comparative analysis of object delineation methods. The ExG-based dissolve operation demonstrated significantly improved performance compared to the 15cm buffer zone approach, resulting in higher mean NDVI values (0.470 vs. 0.300) and a lower proportion of pixels with low NDVI values (12.54% vs. 49.38%). This indicates that the ExG-based method more accurately isolates healthy, photosynthetically active cabbage vegetation, providing a more reliable basis for assessing plant health.
The proposed framework offers a valuable tool for precision agriculture, enabling data-driven decision-making in cabbage cultivation. The accurate plant detection and health assessment capabilities can facilitate optimized resource allocation (e.g., variable-rate irrigation and fertilization) and early stress detection, ultimately leading to improved crop productivity and sustainability. While this study focused on early-stage cabbage, the framework’s core principles are adaptable to other crops with similar growth patterns. Future research will focus on addressing the identified limitations, including refining the object delineation algorithms to handle dense planting and overlapping plants (e.g., exploring watershed segmentation), investigating the use of hyperspectral imagery for enhanced stress detection, incorporating atmospheric correction techniques, and evaluating the framework’s performance across diverse environmental conditions and cabbage varieties. Ultimately, the goal is to develop a robust, real-time system for precision crop monitoring that can be widely adopted by farmers to improve agricultural practices.
None.
No potential conflict of interest relevant to this article was reported.
Table 1. Specifications of the UAV platform, RGB camera, and multispectral sensor used for data acquisition.
Equipment/Sensor | Model | Manufacturer | Flight Altitude (m) | Resolution (cm/pixel) | Spectral Bands (nm) |
---|---|---|---|---|---|
UAV Platform | DJI Matrice 300 RTK | DJI, Shenzhen, China | - | - | - |
RGB Sensor | Zenmuse H20T | DJI, Shenzhen, China | 30 | 1.1 | - |
Multispectral Sensor | RedEdge-MX | MicaSense, Seattle, WA, USA | 40 | 2.87 | Blue (475 ± 32), Green (560 ± 27), Red (668 ± 14), Red Edge (717 ± 12), NIR (842 ± 57) |