Korean J. Remote Sens. 2024; 40(6): 1051-1065
Published online: December 31, 2024
https://doi.org/10.7780/kjrs.2024.40.6.1.15
© Korean Society of Remote Sensing

Youngmin Seo1, Nam Gyun Kim2, Chan Ho Yeom2, Mi Na Jang3, Sun Jeoung Lee4, Yangwon Lee5*
1Master Student, Major of Spatial Information Engineering, Division of Earth Environmental System Science, Pukyong National University, Busan, Republic of Korea
2Director, Forest Fire Research Management Center, Korea Forest Fire Management Service Association, Daejeon, Republic of Korea
3Researcher, Division of Forest Fire, National Institute of Forest Science, Seoul, Republic of Korea
4Researcher, Forest Carbon on Climate Change, National Institute of Forest Science, Seoul, Republic of Korea
5Professor, Major of Geomatics Engineering, Division of Earth Environmental System Science, Pukyong National University, Busan, Republic of Korea

Correspondence to: Yangwon Lee
E-mail: modconfi@pknu.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The frequency and scale of large wildfires are increasing worldwide due to the rise in extreme weather events. Wildfires not only cause direct damage to human life and property but also lead to complex environmental issues, including forest ecosystem destruction and acceleration of climate change through massive greenhouse gas emissions. In South Korea, where forests cover 70% of the territory, accurate assessment and effective management of wildfire damage have emerged as critical challenges. Notably, 99.5% of domestic wildfires are small to medium fires under 100 ha, and their cumulative impact on forest ecosystems is substantial. Traditional burn severity assessment methods based on spectral indices such as normalized difference vegetation index (NDVI) and normalized burn ratio (NBR) have limitations in providing consistent evaluation criteria, as appropriate thresholds vary depending on forest types and regional characteristics. Therefore, this study developed a Transformer-based semantic segmentation model to classify burn severity into three levels: surface fire, crown scorch, and crown fire. To mitigate class imbalance issues, we conducted experiments with three different sampling approaches for the unburned class, achieving a mean intersection over union (mIoU) of 0.748 with the final model. Notably, the model demonstrated its practical applicability by achieving over 90% prediction accuracy in validating small wildfires under 1 ha. This study presents a novel methodology for rapid and accurate estimation of burn severity using deep learning-based segmentation. It is expected to provide a foundation for establishing burn severity assessment systems and effective forest restoration planning tailored to South Korea’s forest characteristics.
Keywords: Burn severity, Drone image, Deep learning, Swin transformer
In recent years, climate change has resulted in the occurrence of catastrophic wildfires across the globe. As reported by Global Forest Watch, the global burned area has increased by approximately 5.4% per year over the past 22 years (2001–2023). This trend is predicted to continue in the future. In Korea, the Korea Forest Service (2023) indicates that the mean number of wildfires per annum over the past decade (2014–2023) was 567, resulting in the loss of 4,004 ha of forest. In addition to the direct loss of trees, wildfires cause damage to the forest as a whole, reducing the vigor of surviving trees and increasing their vulnerability to pests (Stephens et al., 2018). Furthermore, secondary impacts, such as soil erosion and reduced water storage capacity, can elevate the probability of landslides and floods, thereby diminishing the public benefits of forests (Farid et al., 2024). It is therefore essential to conduct a rapid and accurate assessment of burn severity in order to minimize these impacts. Burn severity describes the degree of ecosystem alteration following a wildfire, enabling assessment of both immediate and long-term effects on vegetation and soils (Lentile et al., 2006; Chuvieco et al., 2006; De Santis and Chuvieco, 2007). It serves as a key indicator for determining ecosystem recovery potential and trajectory while providing a quantitative basis for estimating greenhouse gas emissions from wildfires (Brewer et al., 2005; National Institute of Forest Science, 2013). In 2022, large wildfires burned over 20,000 hectares along the east coast regions of Gangneung-Donghae and Uljin-Samcheok in South Korea. However, these fires account for only 1.5% of all wildfires. A statistical analysis of 15,201 wildfires from 1991 to 2023 revealed that 15,128 (99.5%) were classified as small to medium, with a total area of less than 100 ha. As indicated by the National Institute of Forest Science (2009), a 1 ha pine forest incinerated by a wildfire is responsible for the emission of approximately the same quantity of carbon dioxide as seven cars in a year. This demonstrates that even small to medium wildfires can significantly impact ecosystem carbon cycles, highlighting the importance of assessing cumulative effects from frequent small to medium wildfires alongside large events.
However, burn severity analyses in South Korea have predominantly focused on large wildfires. Previous studies are primarily limited to one to three large wildfires, such as those in Samcheok, Gangneung-Donghae, and Goseong-Sokcho. This may result in overfitting of specific cases and limit the ability to reflect the characteristics of wildfires in different environments (Sim et al., 2020; Lee and Jeong, 2019; Shin et al., 2019). In terms of methodology, existing studies rely on satellite-based spectral indices, such as the normalized difference vegetation index (NDVI) and the normalized burn ratio (NBR), comparing pre- and post-fire conditions. In particular, the NBR is a spectral index that quantifies wildfire damage using the moisture-sensitive shortwave infrared (SWIR) wavelength range. It has been actively used for burned area detection due to its higher detection accuracy compared to the NDVI (Schepers et al., 2014; van Gerrevink and Veraverbeke, 2021; Delcourt et al., 2021). Severity is then classified by calculating the spectral index for each wildfire, computing its mean and standard deviation, and deriving thresholds for each severity bin (Won et al., 2007; Escuin et al., 2008). However, it has been argued that the NBR was developed based on the U.S. forest status and therefore may not accurately reflect South Korean forest characteristics (Yang and Kim, 2023). Furthermore, the reliance on spectral indices alone reduces objectivity due to threshold variations across environmental conditions. Significant discrepancies have been observed between NBR-based severity classifications and field-validated assessments in South Korean forests (National Institute of Forest Science, 2013).
To address these limitations, studies employing supervised and unsupervised classification techniques have emerged. Hultquist et al. (2014) conducted a comparative analysis of machine learning models, including random forest and Gaussian process regression, for burn severity estimation. Similarly, Kim and Lee (2020) applied K-means and ISODATA clustering for severity classification. However, in contrast to large wildfires, small to medium wildfires exhibit subtle spectral differences between severity levels, challenging conventional classification methods based on general machine learning models or simple spectral comparisons. Accordingly, we put forth a novel approach to burn severity estimation by integrating high-resolution drone imagery with deep learning technology. Recent advancements in unmanned aerial vehicle (UAV) technology and sensor capabilities have increased the utilization of drones for burned area mapping (Bo et al., 2022; Beltrán-Marcos et al., 2023). Drone-based imagery offers distinct advantages: specialized sensors provide very high spatial resolution and rapid post-fire data acquisition before vegetation recovery begins (Pineda Valles et al., 2023; Míguez and Fernández, 2023). Additionally, in computer vision, vision transformers (ViT) have emerged as a promising solution to overcome conventional convolutional neural network (CNN) limitations. While CNNs are effective in extracting local features, they are limited in modeling long-range dependencies. In contrast, ViT can effectively capture global features through attention mechanisms (Park and Kim, 2022; Naseer et al., 2021). As a result of these advantages, ViT has demonstrated excellent performance in remote sensing (Gibril et al., 2024; Zhao et al., 2023). In this study, we aim to construct a burn severity estimation model that can effectively detect various damage aspects of wildfires in South Korea using a transformer-based segmentation model and verify its field applicability.
This study was conducted on 18 wildfires: 4 large wildfires that occurred in 2022 and 14 small to medium wildfires that occurred between 2022 and 2024. Small to medium wildfires were defined as those affecting less than 100 ha, according to the regulations of the Korea Forest Service. Detailed wildfire occurrence status for each event is presented in Table 1. The images of the wildfires were acquired by drones in the visible spectrum. For documenting small to medium wildfires, a DJI Phantom 4 RTK drone equipped with an FC6310R sensor was utilized. For large wildfires, a DJI M300 RTK drone with a Zenmuse P1 sensor was employed. The detailed specifications of the drones are presented in Table 2.
Table 1 The list of 18 wildfires with location, time, and size information used in this study
No. | Location | Start Time | Containment Time | Size (ha)
---|---|---|---|---
1 | Junggye-dong, Nowon-gu, Seoul | 2022-02-24 14:11 | 2022-02-24 17:18 | 0.54 |
2 | Haenggok-ri, Geunnam-myeon, Uljin-gun, Gyeongsangbuk-do | 2022-03-04 17:14 | 2022-03-05 18:00 | 220.86 |
3 | Jisa-dong, Gangseo-gu, Busan | 2022-03-06 11:41 | 2022-03-06 17:01 | 1.7 |
4 | Hwabuk-ri, Samgugyusa-myeon, Gunwi-gun, Daegu | 2022-04-10 13:10 | 2022-04-13 00:00 | 332.04 |
5 | Songcheong-ri, Yanggu-eup, Yanggu-gun, Gangwon-do | 2022-04-10 15:40 | 2022-04-12 21:30 | 739.74 |
6 | Guam-ri, Hwado-eup, Namyangju-si, Gyeonggi-do | 2022-04-20 11:41 | 2022-04-20 17:00 | 2.12 |
7 | Sinbok-ri, Okcheon-myeon, Yangpyeong-gun, Gyeonggi-do | 2022-05-04 21:45 | 2022-05-04 07:00 | 9.1 |
8 | Chunhwa-ri, Bubuk-myeon, Miryang-si, Gyeongsangnam-do | 2022-05-31 00:00 | 2022-06-05 13:30 | 576.98 |
9 | Hong-yeon-ri, Oksan-myeon, Buyeo-gun, Chungcheongnam-do | 2023-03-08 13:34 | 2023-03-08 20:29 | 27.92 |
10 | Geumjeong-ri, Simcheon-myeon, Yeongdong-gun, Chungcheongbuk-do | 2023-03-18 14:08 | 2023-03-18 19:00 | 13.77 |
11 | Yogok-ri, Masan-myeon, Seocheon-gun, Chungcheongnam-do | 2023-03-20 13:48 | 2023-03-20 00:00 | 15.31 |
12 | Imok-ri, Nangseong-myeon, Sangdang-gu, Cheongju-si, Chungcheongbuk-do | 2024-03-03 17:57 | 2024-03-03 19:45 | 0.17 |
13 | Guwol-ri, Gammul-myeon, Goesan-gun, Chungcheongbuk-do | 2024-03-11 11:01 | 2024-03-11 11:49 | 0.16 |
14 | Odong-ri, Janggye-myeon, Jangsu-gun, Jeonbuk-do | 2024-03-15 15:45 | 2024-03-15 17:00 | 0.19 |
15 | Hyanggyo-ri, Cheongung-myeon, Imsil-gun, Jeonbuk-do | 2024-03-16 14:15 | 2024-03-16 18:10 | 8.18 |
16 | Bugam-ri, Songnisan-myeon, Boeun-gun, Chungcheongbuk-do | 2024-03-22 12:34 | 2024-03-22 16:00 | 0.46 |
17 | Gaojak-ri, Guktojeongjungang-myeon, Yanggu-gun, Gangwon-do | 2024-04-12 13:14 | 2024-04-12 19:30 | 2.4 |
18 | Gigok-ri, Seolcheon-myeon, Muju-gun, Jeonbuk-do | 2024-04-13 13:50 | 2024-04-13 15:15 | 0.38 |
Table 2 Specifications of UAV systems used for data collection
Category | Specification | DJI Phantom 4 RTK | DJI M300 RTK
---|---|---|---
UAV | Weight | 1.391 kg | 6.3 kg
UAV | Max flight time | 30 min | 55 min
UAV | Max speed | 50 km/h | 83 km/h
Camera | Sensor | FC6310R | Zenmuse P1
Camera | Sensor size | 13.2 × 8.8 mm | 35.9 × 24 mm
Camera | Focal length | 8.8 mm | 24/35/50 mm
Camera | Pixel size | 2.4 μm | 3.76 μm
Camera | Resolution | 5,472 × 3,648 | 8,192 × 5,460
Flight parameters were optimized according to the spatial extent and topographic characteristics of each wildfire. To ensure geometric accuracy, the camera angle was maintained perpendicular to the ground surface, and images were captured at the maximum allowable altitude. Given the complex structural characteristics of forest canopies, we implemented 80–85% forward and side overlap to enhance burn severity estimation precision. In the preprocessing stage for orthomosaic generation, images with blurred focus or severely distorted GPS coordinate information were excluded. Small to medium wildfire images were processed using Pix4Dmapper, and large wildfire images were processed using DJI Terra software. For Pix4Dmapper processing, matching points were generated based on the scale-invariant feature transform (SIFT) algorithm, which were then used to create dense point cloud data and a digital surface model. Finally, orthomosaic images of the entire burned area were generated through georeferencing and mosaicking processes.
A specialist investigation team conducted a wildfire investigation to determine the burn perimeter. The entire burned area was surveyed by backtracking from the wildfire termination point to the ignition point. The burned area was then demarcated using GPS along its outline. Only wildfires of a relatively limited extent, defined as those covering less than 100 ha, were subjected to detailed field observations. For large wildfires, visual assessment was employed due to accessibility constraints imposed by the extensive damage, utilizing color changes along the damage boundary as assessment criteria. Burn severity was classified into three levels following criteria from the National Institute of Forest Science (2013) based on field observations: crown fire (high), crown scorch (moderate), and surface fire (low). Crown fires are characterized by completely blackened and carbonized trees where the entire crown layer has burned. Crown scorch is defined as more than 60% of the crown scorched by thermal radiation, resulting in tree mortality. Surface fire involves burning of surface fuels and understory vegetation, with more than 60% of the crown structure surviving.
The quality of the label data critically determined the performance of the deep learning model. Consistent annotation criteria were applied based on RGB band values from drone imagery and visual characteristics of the burned areas. Labels were manually assigned using vector editing tools in QGIS software. Given the considerable volume of data present in the high-resolution drone imagery, we employed a coarse labeling approach to delineate the boundaries of burn severity. While this approach increased labeling efficiency and reduced data construction costs, it potentially simplified fine differences between burn severity levels, which could affect model performance. The vector files were subsequently integrated and converted into raster files in GeoTIFF format, with values assigned as unburned (0), surface fire (1), crown scorch (2), and crown fire (3). The same resolution and coordinate system as the input images were applied to standardize the label data for training the deep learning model.
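For illustration, this vector-to-raster conversion can be reproduced with open-source geospatial tooling. The sketch below assumes a hypothetical vector file `labels.gpkg` with a `severity` attribute (0–3) and a reference orthomosaic `ortho.tif`; it is a minimal example of the conversion described above, not the authors' exact workflow.

```python
import geopandas as gpd
import rasterio
from rasterio import features

# Hypothetical inputs: a post-fire orthomosaic and manually digitized severity
# polygons with a "severity" attribute (0=unburned, 1=surface fire,
# 2=crown scorch, 3=crown fire).
with rasterio.open("ortho.tif") as src:
    meta = src.meta.copy()
    transform, shape, crs_wkt = src.transform, (src.height, src.width), src.crs.to_wkt()

polygons = gpd.read_file("labels.gpkg").to_crs(crs_wkt)

# Rasterize the polygons onto the orthomosaic grid so the label raster shares
# its resolution and coordinate system; the background stays 0 (unburned).
label = features.rasterize(
    ((geom, int(sev)) for geom, sev in zip(polygons.geometry, polygons["severity"])),
    out_shape=shape, transform=transform, fill=0, dtype="uint8",
)

meta.update(count=1, dtype="uint8", nodata=None)
with rasterio.open("label.tif", "w", **meta) as dst:
    dst.write(label, 1)
```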
Analysis of burn severity distribution from the constructed dataset revealed 9 cases involving crown fire, including large wildfires, while 6 cases exhibited only surface fire. Unburned areas constituted over 80% of the total imagery. Within burned areas, surface fires accounted for 72.1%, followed by crown scorch at 18.8% and crown fires at 9.1%.
To ensure model generalization, the 18 wildfire cases were randomly partitioned without overlap, allocating 14 cases for training and 2 cases each for validation and testing. The test set deliberately included both a large wildfire in Miryang-si, and a medium wildfire in Seocheon-gun, to evaluate model performance across different fire scales. For efficient processing, we restructured the burned area imagery into patch-based datasets. We configured 512 × 512 pixel patches and employed a sliding window approach, minimizing single-class patches while ensuring comprehensive coverage of the imagery.
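A sliding-window patch extraction of this kind can be sketched as follows; the inputs are assumed to be NumPy arrays of the orthomosaic and its label raster, and the single-class filtering threshold is an illustrative assumption rather than the study's exact setting.

```python
import numpy as np

def extract_patches(image, label, size=512, stride=512, min_minority=0.0):
    """Cut an orthomosaic (H, W, 3) and its label raster (H, W) into aligned
    size x size patches with a sliding window. Patches whose label is (almost)
    a single class can be skipped by requiring a minimum fraction of pixels
    outside the majority class (min_minority > 0)."""
    patches = []
    h, w = label.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            lab = label[y:y + size, x:x + size]
            counts = np.bincount(lab.ravel(), minlength=4)
            if 1.0 - counts.max() / counts.sum() < min_minority:
                continue  # mostly single-class patch
            patches.append((image[y:y + size, x:x + size], lab))
    return patches
```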
The vertical capture characteristics of drone imagery presented inherent challenges in assessing understory vegetation damage. This was particularly evident in surface fire, where surviving upper canopy masked understory damage, creating spectral similarities with unburned areas. To address this challenge, we implemented a multi-scale approach, integrating patches extracted at 2,048 and 4,096 pixel resolutions, subsequently downsampling to 512 × 512 pixels. This method enhanced spatial context integration, improving surface fire identification accuracy.
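The multi-scale variant can be approximated by cropping larger windows and resizing them to the model input size, for example with OpenCV; using nearest-neighbour resampling for the label is an assumption made here to keep class values intact.

```python
import cv2

def multiscale_patch(image, label, y, x, crop=2048, out=512):
    """Crop a larger window for added spatial context and downsample it to the
    model input size. Labels use nearest-neighbour resampling so that the
    integer class values are preserved."""
    img = image[y:y + crop, x:x + crop]
    lab = label[y:y + crop, x:x + crop]
    img_small = cv2.resize(img, (out, out), interpolation=cv2.INTER_AREA)
    lab_small = cv2.resize(lab, (out, out), interpolation=cv2.INTER_NEAREST)
    return img_small, lab_small
```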
Due to the post-event nature of wildfire data acquisition, there are inherent quantitative limitations in data collection. Particularly in South Korea, where small to medium wildfires predominate, this leads to quantitative imbalances in training data across different burn severity classes. To address this challenge and enhance model robustness, we implemented data augmentation techniques to artificially expand the size and diversity of the dataset, thereby improving training quality. The augmentation included both geometric transformations (horizontal flip, vertical flip, random rotation) and pixel intensity modifications (random brightness, HSV shift, Gaussian filter) with assigned probability values. This augmentation was applied threefold specifically to patches containing damaged class pixels.
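Such a pipeline can be expressed with an augmentation library such as Albumentations; the specific transforms and probabilities below are illustrative stand-ins for the operations listed above, not the exact values used in the study.

```python
import albumentations as A

# Geometric and pixel-level transforms applied jointly to image and mask.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.HueSaturationValue(p=0.3),
    A.GaussianBlur(p=0.2),
])

# Example call: returns a dict with augmented "image" and "mask" arrays.
# augmented = augment(image=image_patch, mask=label_patch)
```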
Class imbalance represents a significant challenge in the deep learning model, as training becomes biased toward majority classes, resulting in deteriorated performance for minority classes. This challenge can be addressed through two primary approaches: model optimization-based methods, such as class weight adjustment, and data-level methods, including sampling strategies (Xia et al., 2019). In this study, we implemented a data-level approach by designing three different dataset configurations. Specifically, we established ratios between the unburned and surface fire classes of 1:1, 1.5:1, and 3:1, respectively. The first configuration selectively included only patches containing burn severity classes; the second achieved a moderate ratio through random sampling of unburned patches; and the third utilized all unburned patches. These were sequentially designated as Datasets Us, Um, and Ul based on their unburned area proportions. This approach aimed to optimize multi-class segmentation performance by considering class balance ratios, with detailed compositions for each dataset configuration presented in Tables 4 and 5.
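One simple way to build these configurations is to keep all patches containing burned classes and add randomly selected unburned-only patches until a target unburned-to-surface-fire pixel ratio is reached. The following sketch illustrates this idea under the assumption that patches are available as NumPy (image, label) pairs; it is not the authors' exact sampling code.

```python
import random

def build_config(burned, unburned, target_ratio, seed=0):
    """burned / unburned: lists of (image, label) patch pairs, where labels use
    0=unburned, 1=surface fire, 2=crown scorch, 3=crown fire. All burned patches
    are kept; unburned-only patches are added at random until the overall
    unburned : surface-fire pixel ratio reaches target_ratio (e.g. 1.5)."""
    rng = random.Random(seed)
    pool = list(unburned)
    rng.shuffle(pool)
    surface_px = sum(int((lab == 1).sum()) for _, lab in burned)
    unburned_px = sum(int((lab == 0).sum()) for _, lab in burned)
    kept = []
    for img, lab in pool:
        if unburned_px / surface_px >= target_ratio:
            break
        kept.append((img, lab))
        unburned_px += lab.size  # unburned-only patch contributes all its pixels
    return burned + kept
```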
Table 3 Percentage distribution of burn severity classes in the annotated dataset
Annotation Class | Ratio (%) |
---|---|
Low | 72.1 |
Moderate | 18.8 |
High | 9.1 |
In this study, we employed Swin Transformer, a transformer-based model, as the backbone architecture. Swin Transformer addresses the computational limitations of the global attention mechanism in the original ViT (Liu et al., 2021). It effectively processes information at various resolutions by dividing input images into patches and constructing hierarchical feature maps through sequential patch merging. Notably, it introduces a window-based self-attention mechanism, performs attention operations on fixed-size windows at each layer, and enables feature information exchange between consecutive layers through the shifted windows technique.
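For intuition, the window partitioning and cyclic shift underlying Swin's shifted-window attention can be sketched in a few lines of PyTorch; this is a conceptual illustration of the mechanism, not the full attention block.

```python
import torch

def window_partition(x, window_size=7):
    """Split a (B, H, W, C) feature map into non-overlapping windows,
    returning a tensor of shape (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

# Shifted windows: cyclically roll the feature map by half a window before
# partitioning, so attention in the next layer mixes features across window borders.
x = torch.randn(1, 56, 56, 96)                       # toy feature map
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))
windows = window_partition(shifted)                  # shape: (64, 7, 7, 96)
```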
Among the model variants differentiated by size and computational complexity (Swin-T, Swin-S, Swin-B, Swin-L), we utilized Swin-L with the largest hidden dimension of 192. The extracted feature maps undergo segmentation through UperNet, based on the Feature Pyramid Network, which implements a structure that harmonizes features across different resolutions through bidirectional information exchange between layers (Xiao et al., 2018). It effectively processes multi-scale contextual information by combining two features: high-resolution spatial details from lower layers and semantic information from upper layers, resulting in feature pyramids with consistent channel dimensions.
In this study, we performed multi-class segmentation, and the correlation between model predictions and labels was evaluated through confusion matrices as presented in Fig. 4. The predictive performance of the model was quantitatively assessed by comparing predictions with labels at the pixel level, calculating various performance metrics including accuracy, intersection over union (IoU), F1-score, precision, and recall.
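These pixel-level metrics can all be derived from a single confusion matrix; the following generic sketch (not the authors' evaluation code) shows one way to compute them with NumPy.

```python
import numpy as np

def pixel_metrics(pred, label, num_classes=4):
    """Accumulate a pixel-level confusion matrix and derive per-class IoU,
    precision, recall, and F1-score, plus overall accuracy and mIoU."""
    idx = num_classes * label.ravel().astype(np.int64) + pred.ravel().astype(np.int64)
    cm = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as the class but predicted otherwise
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"IoU": iou, "mIoU": iou.mean(), "F1": f1, "precision": precision,
            "recall": recall, "accuracy": tp.sum() / cm.sum()}
```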
We implemented a structure utilizing Swin-L as the backbone for feature extraction, followed by a UperNet decoder for segmentation, initializing the backbone with weights pre-trained on ImageNet-22K. Cross-entropy loss was employed as the loss function, with inverse weighting applied based on pixel distribution per class to address the class imbalance. For optimization, we used AdamW with weight decay and implemented a polynomial (poly) learning rate scheduler to enhance model convergence. Experiments were conducted separately for each of the three dataset configurations, with mean IoU (mIoU) measured on the validation set every 1,000 iterations, and the model showing the highest performance was selected as the final model. We evaluated the final model on the test set to assess accuracy for each burn severity class.
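A minimal PyTorch sketch of this training setup (class-weighted cross-entropy, AdamW with weight decay, and a poly learning-rate schedule) is given below. The class pixel fractions, learning rate, and iteration budget are illustrative assumptions, and a 1 × 1 convolution stands in for the Swin-L + UperNet network.

```python
import torch
from torch import nn, optim

# Stand-in for the Swin-L backbone + UperNet decoder (4 output classes).
model = nn.Conv2d(3, 4, kernel_size=1)

# Inverse-frequency class weights from per-class pixel fractions
# (values here are placeholders; in practice they come from the training labels).
pixel_fraction = torch.tensor([0.501, 0.353, 0.091, 0.055])
weights = 1.0 / pixel_fraction
weights = weights / weights.sum() * len(weights)

criterion = nn.CrossEntropyLoss(weight=weights)
optimizer = optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.01)

# Poly learning-rate schedule: lr decays as (1 - iter / max_iters) ** power.
max_iters, power = 160_000, 0.9
scheduler = optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1.0 - it / max_iters) ** power)
```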
Comparing the three sampling configurations, Dataset Um, which randomly included unburned patches in balanced proportions, demonstrated superior performance with an mIoU of 0.748, outperforming the other combinations (Dataset Us: 0.722, Dataset Ul: 0.714), and was thus selected as the final model. The unburned class showed exceptionally high class IoU values above 0.95 across all configurations, reaching 0.962 in Dataset Um. Surface fires also exhibited their peak performance in Dataset Um, with a class IoU of 0.749. Notably, their recall improved to 0.873, surpassing 0.865 for Dataset Us and 0.807 for Dataset Ul, while achieving a high precision of 0.841, confirming enhanced detection accuracy. Similarly, crown scorch showed the highest accuracy in Dataset Um with a class IoU of 0.667, an approximately 14.6% improvement over the 0.582 of Dataset Ul. Crown fires, the most challenging class to estimate, exhibited significantly lower recall than precision across all configurations (Dataset Um: recall 0.647, precision 0.918), indicating that many crown fire pixels were misclassified as other classes.
To precisely analyze class-wise misdetection patterns, we generated confusion matrices normalized by pixel count per class, revealing crown fire misclassification instances. Particularly, crown fires showed high rates of false detection as surface fires across all configurations: 17.7% in Dataset Um, 21.7% in Dataset Us, and 23.3% in Dataset Ul (Fig. 6). The underdetection of crown fires is attributed to inter-class data imbalance. Therefore, quantitative improvements in model prediction performance could be achieved through balanced training data from future data acquisition.
The following section presents the qualitative results of our model. In the Miryang-si case, where a large wildfire affected over 660 ha, the model demonstrated overall accurate predictions with clear delineation between burn severity levels. The final model showed significantly improved detection performance for surface fires and crown scorch, as evidenced in (b) and (e) (Fig. 7). Notably, in case (c), where crown scorch was incorrectly annotated as surface fire in the label, our model predicted the actual burn pattern, effectively distinguishing the true damage characteristics.
In the small to medium wildfire case of Seocheon-gun, the final model exhibited robust prediction of surface fires across diverse land cover types, including forested areas (a) and bare ground (b) (Fig. 8). For crown scorch detection, as shown in (c) and (d), our model generated more continuous and natural boundaries compared to labels. In (e), despite label errors, the model accurately captured the characteristic textural patterns of crown scorch. However, unlike large wildfires, this case presented ambiguous distinctions between crown fires and other severity levels due to less pronounced canopy damage. Particularly in cases with shadowing effects on trees, as illustrated in (e), the spectral similarity between crown fires and surface fires led to the misdetection of crown fires as surface fires. These findings suggest the need for future incorporation of complex wildfire characteristics through topographical data integration and temporal analysis.
To evaluate the performance of the transformer-based model, we conducted comparative experiments using U-Net and HRNet as CNN-based backbones on Dataset Um, which showed the highest performance. The quantitative evaluation revealed that while U-Net and HRNet achieved mIoUs of 0.632 and 0.634, respectively, Swin Transformer attained a higher mIoU of 0.714. As shown in Table 7, Swin Transformer demonstrated superior IoU across all classes, with particularly notable improvements for the crown fire class, showing approximately 29% and 56% increases compared to U-Net (class IoU = 0.487) and HRNet (class IoU = 0.401), respectively. In qualitative analysis, for large wildfires (cases a and b), CNN-based models exhibited irregular boundaries between severity classes, whereas Swin Transformer produced more consistent predictions. These differences became more pronounced in small to medium wildfires (cases c and d), where the boundaries between severity classes were less prominent. Notably, while U-Net showed scattered, pixel-level predictions with significant noise for surface fire, Swin Transformer generated predictions that closely aligned with the labels. These results can be attributed to the self-attention mechanisms in transformers, which effectively capture long-range dependencies and spatial contextual information across images.
Table 6 Model performance metrics for each class across different dataset configurations
Dataset | Class | IoU | F1-score | Recall | Precision | Accuracy | mIoU
---|---|---|---|---|---|---|---
Us | Unburned | 0.955 | 0.977 | 0.962 | 0.991 | 0.945 | 0.722 |
Low | 0.726 | 0.841 | 0.865 | 0.819 | |||
Moderate | 0.623 | 0.768 | 0.954 | 0.642 | |||
High | 0.586 | 0.739 | 0.648 | 0.859 | |||
Um | Unburned | 0.962 | 0.981 | 0.973 | 0.988 | 0.954 | 0.748 |
Low | 0.749 | 0.857 | 0.873 | 0.841 | |||
Moderate | 0.667 | 0.800 | 0.914 | 0.712 | |||
High | 0.611 | 0.759 | 0.647 | 0.918 | |||
Ul | Unburned | 0.952 | 0.975 | 0.972 | 0.979 | 0.942 | 0.714 |
Low | 0.695 | 0.820 | 0.807 | 0.807 | |||
Moderate | 0.582 | 0.735 | 0.853 | 0.646 | |||
High | 0.626 | 0.770 | 0.706 | 0.847 |
While the model effectively predicted burn severity patterns, minor noise artifacts surrounding the burned area introduced inaccuracies in area calculations. Particularly in the small to medium wildfire case of Seocheon-gun, the mixed land cover characteristics of forest and bare ground resulted in numerous false positives, with the model incorrectly identifying burned areas in bare ground areas adjacent to the actual burned area. To address these artifacts, we refined the predictions using morphological operations from the OpenCV (Open Source Computer Vision) library. These operations employ structuring element kernels, composed of zeros and ones, to modify object morphology. We implemented opening operations using rectangular kernels to sequentially apply erosion and dilation, effectively removing noise while preserving object integrity. Undetected regions within the burned area that remained unfilled after opening operations were addressed by masking the perimeters and filling undetected pixels with the modal values of neighboring pixels. Results demonstrated that this post-processing approach successfully eliminated surrounding noise while maintaining the original prediction patterns in both wildfire cases (Fig. 10).
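The opening-based clean-up can be reproduced with OpenCV as follows; the kernel size is an assumed value, and the modal-value hole filling described above is omitted for brevity.

```python
import cv2
import numpy as np

def clean_prediction(pred, kernel_size=15):
    """Suppress isolated false positives around the burned area by applying a
    morphological opening (erosion then dilation) to the binary burned mask and
    masking the class prediction with the result."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    burned = (pred > 0).astype(np.uint8)
    opened = cv2.morphologyEx(burned, cv2.MORPH_OPEN, kernel)
    return np.where(opened == 1, pred, 0).astype(pred.dtype)
```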
To objectively evaluate the performance of our deep learning model, we conducted validation using independent datasets not included in the training process. Although the transformer-based model demonstrated robust performance under various conditions in previous tests, it was essential to assess potential overfitting and verify its applicability to real-world wildfire scenarios. For validation, we utilized datasets from two small wildfires (< 1 ha): one occurred in Namgok-ri, Dongi-myeon, Okcheon-gun, Chungcheongbuk-do on March 10, 2024, and another in Hugok-ri, Munui-myeon, Sangdang-gu, Cheongju-si, Chungcheongbuk-do on April 2, 2024 (Fig. 11). We compared our results with official burned area data from the Korea Forest Service. However, quantitative validation of burn severity predictions was limited due to the absence of field-measured intensity data. In the Okcheon-gun case, the model demonstrated high accuracy, predicting a burned area of 0.54 ha compared to the actual damage area of 0.57 ha. For the Cheongju-si case, the model predicted a total burned area of 0.77 ha, comprising 0.71 ha (92.21%) of surface fires and 0.06 ha (7.79%) of crown scorch, against the actual burned area of 0.83 ha, achieving 92.8% accuracy. These results indicate that our proposed model effectively detects and delineates small wildfires (< 1 ha) that are challenging to identify through visual inspection.
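Given the ground sampling distance (GSD) of the orthomosaic, predicted class areas in hectares follow directly from pixel counts, as in the sketch below; the 5 cm GSD is an assumed example value, not the study's actual resolution.

```python
import numpy as np

def class_areas_ha(pred, gsd_m=0.05):
    """Convert per-class pixel counts in a predicted severity map to hectares
    (1 ha = 10,000 m^2), given the orthomosaic ground sampling distance in metres."""
    pixel_area_m2 = gsd_m ** 2
    counts = np.bincount(pred.ravel(), minlength=4)
    names = ["unburned", "surface fire", "crown scorch", "crown fire"]
    return {name: count * pixel_area_m2 / 10_000 for name, count in zip(names, counts)}
```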
This study aimed to develop a burn severity estimation model using a transformer-based deep learning approach, analyzing drone imagery from 18 wildfire events between 2022 and 2024. The burn severity was categorized into three levels: surface fire, crown scorch, and crown fire, with surface fires representing the predominant class. To address the inter-class imbalance, we designed three dataset configurations with varying sampling strategies based on the proportion of unburned areas. These configurations were comparatively analyzed for their impact on multi-class segmentation performance. The dataset configuration that achieved a 1.5:1 ratio of unburned to surface fire through random sampling of unburned patches showed the highest performance (mIoU of 0.748) and was selected as the final model. Notably, this configuration demonstrated superior reliability with F1-scores of 0.857 and 0.800 for the surface fire and crown scorch classes, respectively, surpassing the other dataset configurations. Qualitative assessment confirmed the model's effectiveness in identifying actual burn patterns. Despite occasional label inconsistencies, the model demonstrated accurate predictions, suggesting that actual performance may exceed the quantitative metrics. However, the crown fire class exhibited consistently lower performance across all configurations, with a tendency toward under-estimation, likely due to limited sample availability. To address this limitation, future improvements could be achieved by incorporating additional wildfire cases from diverse environmental conditions.
Recent studies have demonstrated that generative adversarial network (GAN)-based data augmentation techniques can achieve high detection accuracy even under limited sample conditions, suggesting another promising approach for performance enhancement (Park and Lee, 2024; Chen et al., 2022). Future research directions include implementing alternative decoders such as Mask2Former with the transformer backbone. Additionally, the integration of environmental variables, including grey-level co-occurrence matrix texture analysis and digital elevation model data, will be explored to incorporate topographical characteristics. Further enhancement of post-processing methods for noise reduction will also be investigated. The final model demonstrated efficient segmentation performance on independent datasets, confirming its applicability to real-world wildfire scenarios. This model provides high-accuracy burn severity estimations that align well with field surveys, offering potential utility for post-fire restoration planning. While field validation is still necessary, this approach could significantly reduce assessment time and costs. Moreover, this study represents a significant contribution to the scientific basis for wildfire carbon emission estimation.
Table 4 Percentage distribution of burn severity classes in train, validation, and testing across different dataset configurations
Dataset | Split | Unburned (%) | Low (%) | Moderate (%) | High (%)
---|---|---|---|---|---
Us | Train | 40.9 | 41.8 | 10.8 | 6.6 |
Validation | 88.4 | 8.5 | 2.2 | 0.9 | |
Test | 81.3 | 13.9 | 3.8 | 1 | |
Um | Train | 50.1 | 35.3 | 9.1 | 5.5 |
Validation | 88.4 | 8.5 | 2.2 | 0.9 | |
Test | 81.3 | 13.9 | 3.8 | 1 | |
Ul | Train | 61.8 | 27 | 7 | 4.2 |
Validation | 88.4 | 8.5 | 2.2 | 0.9 | |
Test | 81.3 | 13.9 | 3.8 | 1 |
Table 5 Number of image patches in train, validation, and test datasets for each dataset configuration
Dataset | Patch Size | Train | Validation | Test |
---|---|---|---|---|
Us | 512 | 5,278 | 828 | 1,915 |
Um | 512 | 6,245 | 828 | 1,915 |
Ul | 512 | 8,139 | 828 | 1,915 |
Table 7 Comparison of per-class IoU and overall accuracy between transformer-based and CNN-based models for burn severity estimation
Model | Class IoU (Unburned) | Class IoU (Low) | Class IoU (Moderate) | Class IoU (High) | mIoU | mF1-score
---|---|---|---|---|---|---
U-Net | 0.942 | 0.590 | 0.507 | 0.487 | 0.632 | 0.760 |
HRNet | 0.952 | 0.637 | 0.547 | 0.401 | 0.634 | 0.758 |
Swin Transformer | 0.952 | 0.695 | 0.582 | 0.626 | 0.714 | 0.825 |
This research was supported by a grant (2021-MOIS37-002) from “Intelligent Technology Development Program on Disaster Response and Emergency Management” funded by the Ministry of Interior and Safety.
No potential conflict of interest relevant to this article was reported.
Korean J. Remote Sens. 2024; 40(6): 1051-1065
Published online December 31, 2024 https://doi.org/10.7780/kjrs.2024.40.6.1.15
Copyright © Korean Society of Remote Sensing.
Youngmin Seo1, Nam Gyun Kim2, Chan Ho Yeom2, Mi Na Jang3, Sun Jeoung Lee4, Yangwon Lee5*
1Master Student, Major of Spatial Information Engineering, Division of Earth Environmental System Science, Pukyong National University, Busan, Republic of Korea
2Director, Forest Fire Research Management Center, Korea Forest Fire Management Service Association, Daejeon, Republic of Korea
3Researcher, Division of Forest Fire, National Institute of Forest Science, Seoul, Republic of Korea
4Researcher, Forest Carbon on Climate Change, National Institute of Forest Science, Seoul, Republic of Korea
5Professor, Major of Geomatics Engineering, Division of Earth Environmental System Science, Pukyong National University, Busan, Republic of Korea
Correspondence to:Yangwon Lee
E-mail: modconfi@pknu.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The frequency and scale of large wildfires are increasing worldwide due to the rise in extreme weather events. Wildfires not only cause direct damage to human life and property but also lead to complex environmental issues, including forest ecosystem destruction and acceleration of climate change through massive greenhouse gas emissions. In South Korea, where forests cover 70% of the territory, accurate assessment and effective management of wildfire damage have emerged as critical challenges. Notably, 99.5% of domestic wildfires are small to medium fires under 100 ha, and their cumulative impact on forest ecosystems is substantial. Traditional burn severity assessment methods based on spectral indices such as normalized difference vegetation index (NDVI) and normalized burn ratio (NBR) have limitations in providing consistent evaluation criteria, as appropriate thresholds vary depending on forest types and regional characteristics. Therefore, this study developed a Transformer-based semantic segmentation model to classify burn severity into three levels: surface fire, crown scorch, and crown fire. To mitigate the class imbalance issues, we conducted experiments with three different sampling approaches for the unburned class, achieving a mean intersection over union (mIoU) of 0.748 with the final model. Notably, the model demonstrated its practical applicability by achieving over 90% prediction accuracy in validating small wildfires under 1ha. This study presents a novel methodology for rapid and accurate estimation of burn severity using deep learning-based segmentation. It is expected to provide a foundation for establishing burn severity assessment systems and effective forest restoration planning tailored to South Korea’s forest characteristics.
Keywords: Burn severity, Drone image, Deep learning, Swin transformer
In recent years, climate change has resulted in the occurrence of catastrophic wildfires across the globe. As reported by Global Forest Watch, the global burned area has increased by approximately 5.4% per year over the past 22 years (2001–2003). This trend is predicted to continue in the future. In Korea, Korea Forest Service (2023) indicates that the mean number of wildfires per annum over the past decade (2014–2023) was 567, resulting in the loss of 4,004 ha of forest. In addition to the direct loss of trees, wildfires cause damage to the forest as a whole, reducing the vigor of surviving trees and increasing their vulnerability to pests (Stephens et al., 2018). Furthermore, secondary impacts, such as soil erosion and reduced water storage capacity, can elevate the probability of landslides and floods, thereby diminishing the public benefits of forests (Farid et al., 2024). It is therefore essential to conduct a rapid and accurate assessment of burn severity in order to minimize the impact. Burn severity describes the degree of ecosystem alteration following a wildfire, enabling assessment of both immediate and long-term effects on vegetation and soils (Lentile et al., 2006; Chuvieco et al., 2006; De Santis and Chuvieco, 2007). It serves as a key indicator for determining ecosystem recovery potential and trajectory while providing a quantitative basis for estimating greenhouse gas emissions from wildfires (Brewer et al., 2005; National Institute of Forest Science, 2013). In 2022, large wildfires burned over 20,000 hectares along the east coast regions of Gangneung-Donghae and Uljin-Samcheok in South Korea. However, these fires account for only 1.5% of all wildfires. A statistical analysis of 15,201 wildfires from 1991 to 2023 revealed that 15,128 (99.5%) were classified as small to medium, with a total area of less than 100 ha. As indicated by the National Institute of Forest Science (2009), a 1 ha pine forest incinerated by a wildfire is responsible for the emission of approximately the same quantity of carbon dioxide as seven cars in a year. This demonstrates that even small to medium wildfires can significantly impact ecosystem carbon cycles, highlighting the importance of assessing cumulative effects from frequent small to medium wildfires alongside large events.
However, burn severity analyses in South Korea have predominantly focused on large wildfires. Previous studies are primarily limited to one to three large wildfires, such as those in Samcheok, Gangneung-Donghae, and Goseong-Sokcho. This may result in overfitting of specific cases and limit the ability to reflect the characteristics of wildfires in different environments (Sim et al., 2020; Lee and Jeong, 2019; Shin et al., 2019). In terms of methodology, existing studies rely on satellite-based spectral indices, such as the normalized difference vegetation index (NDVI) and the normalized burn ratio (NBR), comparing pre and post-fire conditions. In particular, the NBR is a spectral index that quantifies wildfire damage using the moisture-sensitive shortwave infrared (SWIR) wavelength range. It has been actively used for burned area detection due to its higher detection accuracy compared to the NDVI (Schepers et al., 2014; van Gerrevink and Veraverbeke, 2021; Delcourt et al., 2021). This is achieved by calculating the spectral index for each wildfire, subsequently calculating the mean, and standard deviation, and extracting thresholds for each bin to classify the severity (Won et al., 2007; Escuin et al., 2008). However, it has been argued that the NBR was developed based on the U.S. forest status and therefore may not accurately reflect South Korean forest characteristics (Yang and Kim, 2023). Furthermore, the reliance on spectral indices alone reduces objectivity due to threshold variations across environmental conditions. Significant discrepancies have been observed between NBR-based severity classifications and field-validated assessments in South Korean forests (National Institute of Forest Science, 2013).
To address these limitations, studies employing supervised and unsupervised classification techniques have emerged. Hultquist et al. (2014) conducted a comparative analysis of machine learning models, including random forest and Gaussian process regression, for burn severity estimation. Similarly, Kim and Lee (2020) applied K-means and ISODATA clustering for severity classification. However, in contrast to large wildfires, small to medium wildfires exhibit subtle spectral differences between severity levels, challenging conventional classification methods based on general machine learning models or simple spectral comparisons. Accordingly, we put forth a novel approach to burn severity estimation by integrating high-resolution drone imagery with deep learning technology. Recent advancements in unmanned aerial vehicle (UAV) technology and sensor capabilities have increased the utilization of drones for burned area mapping (Bo et al., 2022; Beltrán-Marcos et al., 2023). Drone-based imagery offers distinct advantages: specialized sensors provide very high spatial resolution and rapid post-fire data acquisition before vegetation recovery begins (Pineda Valles et al., 2023; Míguez and Fernández, 2023). Additionally, in computer vision, vision transformers (ViT) have emerged as a promising solution to overcome conventional convolutional neural network (CNN) limitations. While CNN is effective in extracting local features, it is limited in modeling long-range dependencies. In contrast, ViT can effectively capture global features through attention mechanisms (Park and Kim, 2022; Naseer et al., 2021). As a result of these advantages, it has demonstrated excellent performance in remote sensing (Gibril et al., 2024; Zhao et al., 2023). In this study, we aim to construct a burn severity estimation model that can effectively detect various damage aspects of wildfires in South Korea using a transformer-based segmentation model and verify its field applicability.
This study was conducted on 4 large wildfires that occurred in 2022 and 14 small to medium wildfires that occurred between 2023 and 2024. Small to medium wildfires were defined as those affecting less than 100 ha, according to the regulations of the Korea Forest Service. Detailed wildfire occurrence status for each event is presented in Table 1. The images of the wildfires were acquired by a drone in the visible spectrum. For documenting small to medium wildfires, a DJI Phantom4 RTK drone equipped with an FC6310R sensor was utilized. For large wildfires, a DJI M300 RTK drone with a Zenmuse P1 sensor was employed. The detailed specifications of the drones are presented in Table 2.
Table 1 . The list of 18 wildfires with location, time, and size information used in this study.
No. | Wildfire Occurrence Status | |||
---|---|---|---|---|
Location | Start Time | Containment Time | Size (ha) | |
1 | Junggye-dong, Nowon-gu, Seoul | 2022-02-24 14:11 | 2022-02-24 17:18 | 0.54 |
2 | Haenggok-ri, Geunnam-myeon, Uljin-gun, Gyeongsangbuk-do | 2022-03-04 17:14 | 2022-03-05 18:00 | 220.86 |
3 | Jisa-dong, Gangseo-gu, Busan | 2022-03-06 11:41 | 2022-03-06 17:01 | 1.7 |
4 | Hwabuk-ri, Samgugyusa-myeon, Gunwi-gun, Daegu | 2022-04-10 13:10 | 2022-04-13 00:00 | 332.04 |
5 | Songcheong-ri, Yanggu-eup, Yanggu-gun, Gangwon-do | 2022-04-10 15:40 | 2022-04-12 21:30 | 739.74 |
6 | Guam-ri, Hwado-eup, Namyangju-si, Gyeonggi-do | 2022-04-20 11:41 | 2022-04-20 17:00 | 2.12 |
7 | Sinbok-ri, Okcheon-myeon, Yangpyeong-gun, Gyeonggi-do | 2022-05-04 21:45 | 2022-05-04 07:00 | 9.1 |
8 | Chunhwa-ri, Bubuk-myeon, Miryang-si, Gyeongsangnam-do | 2022-05-31 00:00 | 2022-06-05 13:30 | 576.98 |
9 | Hong-yeon-ri, Oksan-myeon, Buyeo-gun, Chungcheongnam-do | 2023-03-08 13:34 | 2023-03-08 20:29 | 27.92 |
10 | Geumjeong-ri, Simcheon-myeon, Yeongdong-gun, Chungcheongbuk-do | 2023-03-18 14:08 | 2023-03-18 19:00 | 13.77 |
11 | Yogok-ri, Masan-myeon, Seocheon-gun, Chungcheongnam-do | 2023-03-20 13:48 | 2023-03-20 00:00 | 15.31 |
12 | Imok-ri, Nangseong-myeon, Sangdang-gu, Cheongju-si, Chungcheongbuk-do | 2024-03-03 17:57 | 2024-03-03 19:45 | 0.17 |
13 | Guwol-ri, Gammul-myeon, Goesan-gun, Chungcheongbuk-do | 2024-03-11 11:01 | 2024-03-11 11:49 | 0.16 |
14 | Odong-ri, Janggye-myeon, Jangsu-gun, Jeonbuk-do | 2024-03-15 15:45 | 2024-03-15 17:00 | 0.19 |
15 | Hyanggyo-ri, Cheongung-myeon, Imsil-gun, Jeonbuk-do | 2024-03-16 14:15 | 2024-03-16 18:10 | 8.18 |
16 | Bugam-ri, Songnisan-myeon, Boeun-gun, Chungcheongbuk-do | 2024-03-22 12:34 | 2024-03-22 16:00 | 0.46 |
17 | Gaojak-ri, Guktojeongjungang-myeon, Yanggu-gun, Gangwon-do | 2024-04-12 13:14 | 2024-04-12 19:30 | 2.4 |
18 | Gigok-ri, Seolcheon-myeon, Muju-gun, Jeonbuk-do | 2024-04-13 13:50 | 2024-04-13 15:15 | 0.38 |
Table 2 . Specifications of UAV systems used for data collection.
Specifications | DJI/Phantom4 RTK | DJI/M300RTK | |
---|---|---|---|
UAV | Equipment | ![]() | ![]() |
Weight | 1.391 kg | 6.3 kg | |
Max Flight time | 30 min | 55 min | |
Max Speed | 50 km/h | 83 km/h | |
Camera | Sensor | FC6310R | Zenmuse P1 |
Sensor Size | 13.2 x 8.8 mm | 35.9 x 24 mm | |
Focal length | 8.8 mm | 24/35/50 mm | |
Pixel Size | 2.4 μm | 3.76 μm | |
Resolution | 5,472 x 3,648 | 8,192 x 5,460 |
Flight parameters were optimized according to the spatial extent and topographic characteristics of each wildfire. To ensure geometric accuracy, the camera angle was maintained perpendicular to the ground surface, and images were captured at maximum allowable altitude. Given the complex structural characteristics of forest canopies, we implemented 80-85% forward and side overlap to enhance burn severity estimation precision. In the preprocessing stage for orthomosaic generation, images with blurred focus or severely distorted GPS coordinate information were excluded. Small to medium wildfire images were processed using Pix4Dmapper and large wildfire images were processed using DJI Terra software. For Pix4Dmapper processing, matching points were generated based on the scaleinvariant feature transform algorithm, which was then used to create dense point cloud data and a digital surface model. Finally, orthomosaic images of the entire burned area were generated through georeferencing and mosaicking processes.
A specialist investigation team conducted a wildfire investigation to determine the burn perimeter. The entire burned area was surveyed by backtracking from the wildfire termination point to the ignition point. The burned area was then demarcated using GPS along the outline of the burned area. Only wildfires of a relatively limited extent, defined as those covering less than 100 ha, were subjected to detailed field observations. For large wildfires, visual assessment was employed due to accessibility constraints imposed by the extensive damage, utilizing color changes along the damage boundary as assessment criteria. Burn severity was classified into three levels following criteria from the National Institute of Forest Science (2013) based on field observations: crown fire (high), crown scorch (moderate), and surface fire (low). Crown fires are characterized by completely blackened and carbonized trees where the entire crown layer has burned. Crown scorch is defined as more than 60% crown scorch due to thermal radiation, resulting in tree mortality. Surface fire shows surface fuel and understory vegetation burns, with more than 60% of the crown structure surviving.
The quality of the label data critically determined the performance of the deep learning model. Consistent annotation criteria were applied based on RGB band values from drone imagery and visual characteristics of the burned areas. Labels were manually assigned using vector editing tools in QGIS software. Given the considerable volume of data present in the high-resolution drone imagery, we employed a coarse labeling approach to delineate the boundaries of burn severity. While this approach increased labeling efficiency and reduced data construction costs, it potentially simplified fine differences between burn severity, which could affect model performance. The vector files were subsequently integrated and converted into raster files in GeoTIFF format, with values assigned as unburned (0), surface fire (1), crown scorch (2), and crown fire (3). The same resolution and coordinate system as the input images were applied to standardize the label data for training the deep learning model.
Analysis of burn severity distribution from the constructed dataset revealed 9 cases involving crown fire, including large wildfires, while 6 cases exhibited only surface fire. Unburned areas constituted over 80% of the total imagery. Within burned areas, surface fires accounted for 72.1%, followed by crown scorch at 18.8% and crown fires at 9.1%.
To ensure model generalization, the 18 wildfire cases were randomly partitioned without overlap, allocating 14 cases for training and 2 cases each for validation and testing. The test set deliberately included both a large wildfire in Miryang-si, and a medium wildfire in Seocheon-gun, to evaluate model performance across different fire scales. For efficient processing, we restructured the burned area imagery into patch-based datasets. We configured 512 × 512 pixel patches and employed a sliding window approach, minimizing single-class patches while ensuring comprehensive coverage of the imagery.
The vertical capture characteristics of drone imagery presented inherent challenges in assessing understory vegetation damage. This was particularly evident in surface fire, where surviving upper canopy masked understory damage, creating spectral similarities with unburned areas. To address this challenge, we implemented a multi-scale approach, integrating patches extracted at 2,048 and 4,096 pixel resolutions, subsequently downsampling to 512 ×512 pixels. This method enhanced spatial context integration, improving surface fire identification accuracy.
Due to the post-event nature of wildfire data acquisition, there are inherent quantitative limitations in data collection. Particularly in South Korea, where small to medium wildfires predominate, this leads to quantitative imbalances in training data across different burn severity classes. To address this challenge and enhance model robustness, we implemented data augmentation techniques to artificially expand the size and diversity of the dataset, thereby improving training quality. The augmentation included both geometric transformations (horizontal flip, vertical flip, random rotation) and pixel intensity modifications (random brightness, HSV shift, Gaussian filter) with assigned probability values. This augmentation was applied threefold specifically to patches containing damaged class pixels.
Class imbalance represents a significant challenge in the deep learning model, as training becomes biased toward majority classes, resulting in deteriorated performance for minority classes. This challenge can be addressed through two primary approaches: model optimization-based methods, such as class weight adjustment, and data-level methods, including sampling strategies (Xia et al., 2019). In this study, we implemented a data-level approach by designing three different dataset configurations. Specifically, we established ratios between unburned and surface fire classes: 1:1, 1.5:1, and 3:1 respectively. The first configuration selectively included only patches containing burn severity classes; the second achieved a moderate ratio through random sampling of unburned patches; and the third utilized all unburned patches. These were sequentially designated as Dataset Us, Um and Ul based on their unburned area proportions. This approach aimed to optimize multi-class segmentation performance by considering class balance ratios, with detailed compositions for each dataset configuration presented in Table 3.
Table 3 . Percentage distribution of burn severity classes in the annotated dataset.
Annotation Class | Ratio (%) |
---|---|
Low | 72.1 |
Moderate | 18.8 |
High | 9.1 |
In this study, we employed Swin Transformer, a transformerbased model, as the backbone architecture. Swin Transformer addresses the computational limitations of the global attention mechanism in the original ViT (Liu et al., 2021). It effectively processes information at various resolutions by dividing input images into patches and constructing hierarchical feature maps through sequential patch merging. Notably, it introduces a window-based self-attention mechanism, performs attention operations on fixed-size windows at each layer, and enables feature information exchange between consecutive layers through the shifted windows technique.
Among the model variants differentiated by size and computational complexity (Swin-T, Swin-S, Swin-B, Swin-L), we utilized Swin-L with the largest hidden dimension of 192. The extracted feature maps undergo segmentation through UperNet, based on the Feature Pyramid Network, which implements a structure that harmonizes features across different resolutions through bidirectional information exchange between layers (Xiao et al., 2018). It effectively processes multi-scale contextual information by combining two features: high-resolution spatial details from lower layers and semantic information from upper layers, resulting in feature pyramids with consistent channel dimensions.
In this study, we performed multi-class segmentation, and the correlation between model predictions and labels was evaluated through confusion matrices as presented in Fig. 4. The predictive performance of the model was quantitatively assessed by comparing predictions with labels at the pixel level, calculating various performance metrics including accuracy, intersection over union (IoU), F1-score, precision, and recall.
We implemented a structure utilizing Swin-L as the backbone for feature extraction, followed by a UperNet decoder for segmentation, initializing the backbone with weights pre-trained on ImageNet-22K. Cross entropy loss was employed as the loss function, with inverse weighting applied based on pixel distribution per class to address the class imbalance. For optimization, we used AdamW with weight decay and implemented a poly-learning rate scheduler to enhance model convergence. Experiments were conducted separately for each of the three dataset configurations, with mean IoU (mIoU) measured on the validation set every 1,000 iterations, and the model showing the highest performance was selected as the final model. We evaluated the final model on the test set to assess accuracy for each burn severity class.
Comparing three configurations of sampling approaches, Dataset Um with randomly included unburned patches in balanced proportions demonstrated superior performance with a mIoU of 0.748, outperforming other combinations (Dataset Us: 0.722, Dataset Ul: 0.714), and was thus selected as the final model. The unburned class showed exceptionally high-class IoU values above 0.95 across all configurations, particularly achieving 0.962 in Datase Um. Surface fires also exhibited peak performance with a class IoU of 0.749. Notably, its recall improved to 0.873, surpassing the performance of 0.807 for Dataset Us and 0.865 for Dataset Ul, while achieving a high precision of 0.841, confirming enhanced detection accuracy. Similarly, crown scorch showed the highest accuracy in Dataset Um with a class IoU of 0.667, representing approximately 14.6% improvement compared to 0.582 for Dataset Us. Crown fires, the most challenging to estimate, exhibited significantly lower recall compared to precision across all configurations (Dataset Um: Recall 0.647, Precision 0.918), indicating a high rate of misdetection of crown fire pixels to other classes.
To precisely analyze class-wise misdetection patterns, we generated confusion matrices normalized by pixel count per class, revealing crown fire misclassification instances. Particularly, crown fires showed high rates of false detection as surface fires across all configurations: 17.7% in Dataset Um, 21.7% in Dataset Us, and 23.3% in Dataset Ul (Fig. 6). The underdetection of crown fires is attributed to inter-class data imbalance. Therefore, quantitative improvements in model prediction performance could be achieved through balanced training data from future data acquisition.
The following section presents the qualitative results of the model. In the Miryang-si case, where a large wildfire affected over 660 ha, the model produced generally accurate predictions with clear delineation between burn severity levels. The final model showed markedly improved detection of surface fires and crown scorch, as evidenced in (b) and (e) (Fig. 7). Notably, in case (c), although the label incorrectly annotated crown scorch as surface fire, our model predicted the actual burn pattern and effectively distinguished the true damage characteristics.
In the small to medium wildfire case of Seocheon-gun, the final model predicted surface fires robustly across diverse land cover types, including forested areas (a) and bare ground (b) (Fig. 8). For crown scorch detection, as shown in (c) and (d), our model generated more continuous and natural boundaries than the labels. In (e), despite label errors, the model captured the characteristic textural patterns of crown scorch. However, unlike large wildfires, this case presented ambiguous distinctions between crown fires and the other severity levels because canopy damage was less pronounced. In particular, where trees were shadowed, as illustrated in (e), the spectral similarity between crown fires and surface fires led to crown fires being misclassified as surface fires. These findings suggest that future work should incorporate complex wildfire characteristics through topographic data integration and temporal analysis.
To evaluate the performance of the transformer-based model, we conducted comparative experiments on Dataset Um, which had shown the highest performance, using U-Net and HRNet as CNN-based baselines. The quantitative evaluation revealed that while U-Net and HRNet achieved mIoUs of 0.632 and 0.634, respectively, Swin Transformer attained a higher mIoU of 0.714. As shown in Table 7, Swin Transformer demonstrated superior IoU across all classes, with particularly notable improvement for the crown fire class, showing increases of approximately 29% and 56% over U-Net (class IoU = 0.487) and HRNet (class IoU = 0.401), respectively. In the qualitative analysis, for large wildfires (cases a and b), the CNN-based models produced irregular boundaries between severity classes, whereas Swin Transformer produced more consistent predictions. These differences became more pronounced in small to medium wildfires (cases c and d), where the boundaries between severity classes were less prominent. Notably, while U-Net showed scattered, pixel-level predictions with significant noise for surface fires, Swin Transformer generated predictions that closely aligned with the labels. These results can be attributed to the self-attention mechanism in transformers, which effectively captures long-range dependencies and spatial context across the image.
While the model effectively predicted burn severity patterns, minor noise artifacts surrounding the burned area introduced inaccuracies in area calculations. Particularly in the small to medium wildfire case of Seocheon-gun, the mixed land cover of forest and bare ground produced numerous false positives, with the model identifying burned pixels on bare ground adjacent to the actual burned area. To address these artifacts, we refined the predictions using morphological operations from the OpenCV (Open Source Computer Vision) library. These operations employ structuring-element kernels, composed of zeros and ones, to modify object morphology. We applied opening operations with rectangular kernels, i.e., erosion followed by dilation, which removed noise while preserving object integrity. Undetected regions within the burned area that remained unfilled after the opening operations were addressed by masking the perimeters and filling the undetected pixels with the modal values of neighboring pixels. This post-processing successfully eliminated the surrounding noise while maintaining the original prediction patterns in both wildfire cases (Fig. 10).
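A minimal sketch of the opening step, applied per severity class with OpenCV, is shown below; the kernel size is illustrative, as the value used is not reported, and the hole-filling step is omitted.

```python
import cv2
import numpy as np

def remove_speckle(pred, kernel_size=15):
    """Suppress isolated false-positive patches in a predicted severity map.

    pred: 2-D uint8 array of class indices (0 = unburned, 1-3 = burn severity).
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    cleaned = np.zeros_like(pred)
    for cls in range(1, int(pred.max()) + 1):
        mask = (pred == cls).astype(np.uint8)
        opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # erosion then dilation
        cleaned[opened == 1] = cls
    return cleaned
```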
To objectively evaluate the performance of the deep learning model, we conducted validation using independent datasets not included in the training process. Although the transformer-based model demonstrated robust performance under various conditions in the previous tests, it was essential to assess potential overfitting and verify its applicability to real-world wildfire scenarios. For validation, we used datasets from two small wildfires (< 1 ha): one that occurred in Namgok-ri, Dongi-myeon, Okcheon-gun, Chungcheongbuk-do on March 10, 2024, and another in Hugok-ri, Munui-myeon, Sangdang-gu, Cheongju-si, Chungcheongbuk-do on April 2, 2024 (Fig. 11). We compared our results with official burned area data from the Korea Forest Service; however, quantitative validation of the burn severity predictions was limited by the absence of field-measured severity data. In the Okcheon-gun case, the model demonstrated high accuracy, predicting a burned area of 0.54 ha against an actual damage area of 0.57 ha. In the Cheongju-si case, the model predicted a total burned area of 0.77 ha, comprising 0.71 ha (92.21%) of surface fire and 0.06 ha (7.79%) of crown scorch, against an actual burned area of 0.83 ha, corresponding to 92.8% accuracy. These results indicate that the proposed model effectively detects and delineates small wildfires (< 1 ha) that are difficult to identify through visual inspection.
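For reference, the reported areas follow from multiplying per-class pixel counts by the pixel footprint of the orthomosaic, as in the sketch below; the ground sampling distance used here is purely illustrative.

```python
import numpy as np

def burned_area_ha(pred, gsd_m=0.05):
    """Total and per-class burned area (ha) from a class-index map.

    pred: 2-D array of class indices (0 = unburned, 1 = surface fire,
          2 = crown scorch, 3 = crown fire); gsd_m: pixel size in metres (assumed).
    """
    pixel_area_ha = (gsd_m ** 2) / 10_000.0
    per_class = {c: int((pred == c).sum()) * pixel_area_ha for c in (1, 2, 3)}
    return sum(per_class.values()), per_class
```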
This study developed a burn severity estimation model using a transformer-based deep learning approach, analyzing drone imagery from 18 wildfire events between 2022 and 2024. Burn severity was categorized into three levels: surface fire, crown scorch, and crown fire, with surface fires representing the predominant class. To address the inter-class imbalance, we designed three dataset configurations with different sampling strategies for the unburned class and comparatively analyzed their impact on multi-class segmentation performance. The configuration with a 1.5:1 ratio of unburned to surface fire, obtained through random sampling of unburned patches, showed the highest performance (mIoU of 0.748) and was selected as the final model. Notably, this configuration demonstrated superior reliability, with F1-scores of 0.857 and 0.800 for the surface fire and crown scorch classes, respectively, surpassing the other dataset configurations. Qualitative assessment confirmed the model's effectiveness in identifying actual burn patterns. Despite occasional label inconsistencies, the model produced accurate predictions, suggesting that its actual performance may exceed the quantitative metrics. However, the crown fire class showed consistently lower performance across all configurations, with a tendency toward under-estimation, likely due to limited sample availability. This limitation could be addressed by incorporating additional wildfire cases from diverse environmental conditions.
Recent studies have demonstrated that generative adversarial network (GAN)-based data augmentation can achieve high detection accuracy even under limited sample conditions, suggesting another promising avenue for performance enhancement (Park and Lee, 2024; Chen et al., 2022). Future research directions include implementing alternative decoders, such as Mask2Former, with the transformer backbone. In addition, the integration of environmental variables, including grey-level co-occurrence matrix texture features and digital elevation model data, will be explored to incorporate topographic characteristics, and post-processing methods for noise reduction will be further refined. The final model demonstrated efficient segmentation performance on independent datasets, confirming its applicability to real-world wildfire scenarios. The model provides high-accuracy burn severity estimates that align well with field surveys, offering potential utility for post-fire restoration planning. While field validation remains necessary, this approach could significantly reduce assessment time and costs. Moreover, this study contributes to the scientific basis for estimating carbon emissions from wildfires.
This research was supported by a grant (2021-MOIS37-002) from the "Intelligent Technology Development Program on Disaster Response and Emergency Management" funded by the Ministry of the Interior and Safety.
No potential conflict of interest relevant to this article was reported.
Table 1. The list of 18 wildfires with location, time, and size information used in this study.
No. | Location | Start Time | Containment Time | Size (ha)
---|---|---|---|---
1 | Junggye-dong, Nowon-gu, Seoul | 2022-02-24 14:11 | 2022-02-24 17:18 | 0.54 |
2 | Haenggok-ri, Geunnam-myeon, Uljin-gun, Gyeongsangbuk-do | 2022-03-04 17:14 | 2022-03-05 18:00 | 220.86 |
3 | Jisa-dong, Gangseo-gu, Busan | 2022-03-06 11:41 | 2022-03-06 17:01 | 1.7 |
4 | Hwabuk-ri, Samgugyusa-myeon, Gunwi-gun, Daegu | 2022-04-10 13:10 | 2022-04-13 00:00 | 332.04 |
5 | Songcheong-ri, Yanggu-eup, Yanggu-gun, Gangwon-do | 2022-04-10 15:40 | 2022-04-12 21:30 | 739.74 |
6 | Guam-ri, Hwado-eup, Namyangju-si, Gyeonggi-do | 2022-04-20 11:41 | 2022-04-20 17:00 | 2.12 |
7 | Sinbok-ri, Okcheon-myeon, Yangpyeong-gun, Gyeonggi-do | 2022-05-04 21:45 | 2022-05-04 07:00 | 9.1 |
8 | Chunhwa-ri, Bubuk-myeon, Miryang-si, Gyeongsangnam-do | 2022-05-31 00:00 | 2022-06-05 13:30 | 576.98 |
9 | Hong-yeon-ri, Oksan-myeon, Buyeo-gun, Chungcheongnam-do | 2023-03-08 13:34 | 2023-03-08 20:29 | 27.92 |
10 | Geumjeong-ri, Simcheon-myeon, Yeongdong-gun, Chungcheongbuk-do | 2023-03-18 14:08 | 2023-03-18 19:00 | 13.77 |
11 | Yogok-ri, Masan-myeon, Seocheon-gun, Chungcheongnam-do | 2023-03-20 13:48 | 2023-03-20 00:00 | 15.31 |
12 | Imok-ri, Nangseong-myeon, Sangdang-gu, Cheongju-si, Chungcheongbuk-do | 2024-03-03 17:57 | 2024-03-03 19:45 | 0.17 |
13 | Guwol-ri, Gammul-myeon, Goesan-gun, Chungcheongbuk-do | 2024-03-11 11:01 | 2024-03-11 11:49 | 0.16 |
14 | Odong-ri, Janggye-myeon, Jangsu-gun, Jeonbuk-do | 2024-03-15 15:45 | 2024-03-15 17:00 | 0.19 |
15 | Hyanggyo-ri, Cheongung-myeon, Imsil-gun, Jeonbuk-do | 2024-03-16 14:15 | 2024-03-16 18:10 | 8.18 |
16 | Bugam-ri, Songnisan-myeon, Boeun-gun, Chungcheongbuk-do | 2024-03-22 12:34 | 2024-03-22 16:00 | 0.46 |
17 | Gaojak-ri, Guktojeongjungang-myeon, Yanggu-gun, Gangwon-do | 2024-04-12 13:14 | 2024-04-12 19:30 | 2.4 |
18 | Gigok-ri, Seolcheon-myeon, Muju-gun, Jeonbuk-do | 2024-04-13 13:50 | 2024-04-13 15:15 | 0.38 |
Table 2. Specifications of UAV systems used for data collection.
Category | Specification | DJI Phantom 4 RTK | DJI M300 RTK
---|---|---|---
UAV | Weight | 1.391 kg | 6.3 kg
 | Max flight time | 30 min | 55 min
 | Max speed | 50 km/h | 83 km/h
Camera | Sensor | FC6310R | Zenmuse P1
 | Sensor size | 13.2 × 8.8 mm | 35.9 × 24 mm
 | Focal length | 8.8 mm | 24/35/50 mm
 | Pixel size | 2.4 μm | 3.76 μm
 | Resolution | 5,472 × 3,648 | 8,192 × 5,460
Table 3. Percentage distribution of burn severity classes in the annotated dataset.
Annotation Class | Ratio (%)
---|---
Low | 72.1 |
Moderate | 18.8 |
High | 9.1 |
Table 4. Percentage distribution of burn severity classes in train, validation, and testing across different dataset configurations.
Dataset | Category | Unburned | Low | Moderate | High
---|---|---|---|---|---
Us | Train | 40.9 | 41.8 | 10.8 | 6.6
 | Validation | 88.4 | 8.5 | 2.2 | 0.9
 | Test | 81.3 | 13.9 | 3.8 | 1.0
Um | Train | 50.1 | 35.3 | 9.1 | 5.5
 | Validation | 88.4 | 8.5 | 2.2 | 0.9
 | Test | 81.3 | 13.9 | 3.8 | 1.0
Ul | Train | 61.8 | 27.0 | 7.0 | 4.2
 | Validation | 88.4 | 8.5 | 2.2 | 0.9
 | Test | 81.3 | 13.9 | 3.8 | 1.0
Table 5. Number of image patches in train, validation, and test datasets for each dataset configuration.
Dataset | Patch size (pixels) | Train | Validation | Test
---|---|---|---|---
Us | 512 | 5,278 | 828 | 1,915
Um | 512 | 6,245 | 828 | 1,915
Ul | 512 | 8,139 | 828 | 1,915
Table 6. Model performance metrics for each class across different dataset configurations.
Dataset | Class | IoU | F1-score | Recall | Precision | Accuracy | mIoU
---|---|---|---|---|---|---|---
Us | Unburned | 0.955 | 0.977 | 0.962 | 0.991 | 0.945 | 0.722
 | Low | 0.726 | 0.841 | 0.865 | 0.819 | |
 | Moderate | 0.623 | 0.768 | 0.954 | 0.642 | |
 | High | 0.586 | 0.739 | 0.648 | 0.859 | |
Um | Unburned | 0.962 | 0.981 | 0.973 | 0.988 | 0.954 | 0.748
 | Low | 0.749 | 0.857 | 0.873 | 0.841 | |
 | Moderate | 0.667 | 0.800 | 0.914 | 0.712 | |
 | High | 0.611 | 0.759 | 0.647 | 0.918 | |
Ul | Unburned | 0.952 | 0.975 | 0.972 | 0.979 | 0.942 | 0.714
 | Low | 0.695 | 0.820 | 0.807 | 0.807 | |
 | Moderate | 0.582 | 0.735 | 0.853 | 0.646 | |
 | High | 0.626 | 0.770 | 0.706 | 0.847 | |
Table 7. Comparison of per-class IoU and overall accuracy between transformer-based and CNN-based models for burn severity estimation.
Model | IoU (Unburned) | IoU (Low) | IoU (Moderate) | IoU (High) | mIoU | mF1-score
---|---|---|---|---|---|---
U-Net | 0.942 | 0.590 | 0.507 | 0.487 | 0.632 | 0.760
HRNet | 0.952 | 0.637 | 0.547 | 0.401 | 0.634 | 0.758
Swin Transformer | 0.952 | 0.695 | 0.582 | 0.626 | 0.714 | 0.825