Korean J. Remote Sens. 2024; 40(6): 1289-1294
Published online: December 31, 2024
https://doi.org/10.7780/kjrs.2024.40.6.1.33
© Korean Society of Remote Sensing
Correspondence to: Kyung-Soo Han
E-mail: kyung-soo.han@pknu.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study compares two resampling methods, the k-dimensional tree (kd-tree) and the Python interface for Terra Advanced Fusion Resample/Reprojection functions (PyTAF), using surface albedo data from the GEO-KOMSAT-2A (GK2A) Advanced Meteorological Imager (AMI) geostationary satellite, with and without consideration of Earth’s curvature. Evaluation metrics include Correlation Coefficient (R), Root Mean Square Deviation (RMSD), Relative RMSD (RRMSD), Bias, spatial distribution, and processing time. Additional quantitative analyses were performed based on viewing zenith angle (VZA) intervals, ranging from 0° to 80° at 20° increments. The results showed that both resampling methods exhibited similar performance in terms of quantitative metrics, but differences emerged in processing time and VZA-specific analysis. These differences were primarily attributed to variations in algorithm design. Specifically, as VZA increased, the panoramic effect caused each pixel to cover a larger geographic area, resulting in geometric distortions. Additionally, the influence of reflectance variability between snow-covered and non-snow-covered regions further exacerbated data uncertainty and geometric distortions. These combined factors contributed to reduced accuracy and increased errors during resampling, leading to higher RMSD and RRMSD values. This study provides empirical evidence of the performance differences between the two resampling methods, offering practical insights for selecting the optimal resampling technique based on research objectives and data conditions.
Keywords: Resampling, Nearest neighbor, PyTAF, kd-tree, GK2A/AMI, Surface albedo
According to the 67th session of the United Nations Committee for the Peaceful Uses of Outer Space (COPUOS), as of January 2023, approximately 1,200 remote sensing satellites are in operation, with the frequency of satellite launches having increased significantly in recent years (Emanuelli, 2024). This surge has led to a substantial growth in satellite data, which is now being utilized across diverse fields such as agriculture, oceanography, and meteorology (Kim et al., 2022; Woo et al., 2023; Eom et al., 2023). However, due to variations in spatial resolution and projection methods among satellites, harmonizing and aligning these datasets is necessary for comparative analysis and integration, a process achieved through resampling (Friedmann, 1981). Resampling is the process of transforming satellite imagery into a different coordinate system or pixel size by recalculating and assigning pixel values from the original image to the new grid. Previous studies have demonstrated that the choice of resampling method can significantly impact analytical results, highlighting the importance of selecting an appropriate resampling technique in satellite data analysis (Porwal and Katiyar, 2014; Dung et al., 2018).
Several resampling methods are commonly employed, including nearest neighbor (NN), bilinear interpolation, and bicubic interpolation. Among these, the nearest neighbor method is often preferred in satellite-based research due to its computational efficiency, particularly when handling large datasets (Moreno and Melia, 1994). Nearest neighbor resampling methods can be categorized based on whether they account for Earth’s curvature. Notable examples are the k-dimensional tree (kd-tree), which does not consider curvature, and the Python interface for Terra Advanced Fusion Resample/Reprojection functions (PyTAF), which does.
This study aims to analyze the differences between these two resampling approaches and evaluate their impacts on the performance and efficiency of satellite data resampling. By providing an objective comparison, this research seeks to offer practical insights for selecting the most suitable resampling method in satellite data processing workflows.
This study used daily surface albedo data with a spatial resolution of 2 km, provided by the geostationary satellite GEO-KOMSAT-2A (GK2A) Advanced Meteorological Imager (AMI), covering the full disk (FD) region that spans latitudes from 80°S to 80°N and longitudes from 47°E to 180°E. The data are provided in the Geostationary (GEOS) projection coordinate system, and the daily data for February 2022 were composited into a monthly average for analysis. Because GK2A/AMI offers continuous wide-area monitoring that includes observations at high viewing zenith angles (VZA), its data were used to evaluate resampling performance under varying geometric conditions. The composited data were resampled using kd-tree and PyTAF under two cases.
In the first case, the data were cropped to the East Asia region of interest, defined by the latitude range of 15°–50°N and longitude range of 105°–150°E, reprojected to the World Geodetic System 1984 (WGS84) coordinate system, and converted to a spatial resolution of 4 km. In the second case, the original GEOS projection and geographical extent were maintained, and only the spatial resolution was changed to 4 km. For both cases, the resampling process from 2 km to 4 km employed the nearest-neighbor method to search for the nearest points, followed by an averaging process. The resampled datasets were evaluated for performance and efficiency using the Correlation Coefficient (R), Root Mean Square Deviation (RMSD), Relative RMSD (RRMSD), Bias, spatial distribution, and processing time.
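The evaluation metrics listed above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `evaluation_metrics` is hypothetical, NaN is assumed to mark invalid pixels, and RRMSD is assumed to be the RMSD normalised by the mean of the reference data, since the text does not state its definition.

```python
import numpy as np


def evaluation_metrics(reference, estimate):
    """Return R, RMSD, RRMSD (%), and Bias for two arrays of equal shape."""
    ref = np.asarray(reference, dtype=float).ravel()
    est = np.asarray(estimate, dtype=float).ravel()

    # Keep only pixels that are valid in both arrays (NaN = invalid, assumed).
    valid = np.isfinite(ref) & np.isfinite(est)
    ref, est = ref[valid], est[valid]

    diff = est - ref
    r = np.corrcoef(ref, est)[0, 1]
    rmsd = np.sqrt(np.mean(diff ** 2))
    # Assumed definition: RMSD relative to the mean reference value, in percent.
    rrmsd = 100.0 * rmsd / np.mean(ref)
    bias = np.mean(diff)
    return {"R": r, "RMSD": rmsd, "RRMSD": rrmsd, "Bias": bias}
```

Under these assumptions, a constant offset between two otherwise identical fields gives R = 1 while Bias and RMSD both equal the offset, which is why the metrics are reported together rather than individually.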
Additionally, as the VZA increases, the panoramic effect enlarges the latitude and longitude range covered by a single pixel. This leads to geometric distortions, particularly in areas with large VZA, which can affect the spatial resolution and performance of satellite data (Kim et al., 2017). To quantitatively analyze the impact of VZA on resampling performance, errors were evaluated across VZA intervals ranging from 0° to 80° in 20° increments. This study was conducted on a desktop computer equipped with an Intel Core i5-9400F CPU (2.90 GHz) and 16 GB of RAM, running a 64-bit Windows operating system.
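The VZA-interval analysis described above could be organised along these lines. The sketch is illustrative only: `rmsd_by_vza` is a hypothetical helper, and the half-open bins [lower, upper) are an assumption because the text does not say how interval edges are treated.

```python
import numpy as np


def rmsd_by_vza(reference, estimate, vza, edges=(0, 20, 40, 60, 80)):
    """Compute RMSD separately for each VZA interval given by `edges` (degrees)."""
    ref = np.asarray(reference, dtype=float).ravel()
    est = np.asarray(estimate, dtype=float).ravel()
    vza = np.asarray(vza, dtype=float).ravel()

    results = {}
    for lower, upper in zip(edges[:-1], edges[1:]):
        # Half-open bin [lower, upper); this edge handling is an assumption.
        mask = (vza >= lower) & (vza < upper)
        mask &= np.isfinite(ref) & np.isfinite(est)
        if mask.any():
            diff = est[mask] - ref[mask]
            results[(lower, upper)] = float(np.sqrt(np.mean(diff ** 2)))
        else:
            results[(lower, upper)] = float("nan")  # no valid pixels in this bin
    return results
```

The same masking pattern extends to R and RRMSD by swapping the statistic computed inside the loop.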
PyTAF is software developed for resampling NASA’s Terra satellite data. Unlike conventional brute-force search methods, it employs a block indexing algorithm designed to efficiently reduce the search area for identifying the nearest pixel (Zhao et al., 2022). The block indexing algorithm divides the Earth’s surface into equally sized spatial blocks, restricts the search for each target pixel to a specific range, and identifies the nearest point using the geodesic distance (Eq. 1), which accounts for the Earth’s curvature. In Eq. (1), $R_{\mathrm{earth}}$ represents the Earth’s radius, and $s_i$ and $t_j$ denote the i-th source pixel and j-th target pixel, respectively. Latitude and longitude values, lat and lon, are expressed in degrees. The source pixel refers to a pixel in the original dataset, whereas the target pixel refers to a newly generated pixel created during the resampling process. By limiting the search range, this algorithm improves computational efficiency compared to traditional methods.
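To make the idea of a curvature-aware distance concrete, the sketch below computes a great-circle distance between a source and a target pixel with the haversine formula. This is an assumption made for illustration: the text does not give the exact form of Eq. (1), and PyTAF's internal formulation may differ. The function name and the mean Earth radius of 6371 km are likewise illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value for illustration


def great_circle_distance_km(lat_s, lon_s, lat_t, lon_t, radius_km=EARTH_RADIUS_KM):
    """Haversine distance between a source (s) and target (t) pixel, in km.

    Latitudes and longitudes are given in degrees.
    """
    phi_s = math.radians(lat_s)
    phi_t = math.radians(lat_t)
    dphi = phi_t - phi_s
    dlam = math.radians(lon_t - lon_s)

    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi_s) * math.cos(phi_t) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

With this formula, one degree of longitude at 60°N spans roughly half the distance it spans at the equator, which is the kind of latitude dependence a distance computed directly on degree coordinates would not capture.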
The kd-tree is a binary tree structure designed for efficient management and search of multidimensional data, first introduced by Bentley (1975). In this study, a kd-tree utilizing the sliding midpoint approach was applied. This method enhances search efficiency by considering the data distribution and adjusting the splitting point when data points are skewed to one side (Maneewongvatana and Mount, 1999). Additionally, the kd-tree calculates the straight-line distance between points using the Euclidean metric (Eq. 2), enabling efficient nearest-neighbor searches. In Eq. (2), d represents the Euclidean distance, and x and y refer to the coordinates of the source and target pixels. The cKDTree implementation in Python, which follows the same approach, was used. The cKDTree is implemented in C, making it faster and more efficient than the KDTree previously provided by SciPy. This enables efficient searches and computations in multidimensional data.
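A nearest-neighbor search followed by averaging, as described for the 2 km to 4 km conversion, might look like the following with SciPy's `cKDTree`. This is a sketch under stated assumptions rather than the study's implementation: the function name is hypothetical, averaging the k = 4 nearest source pixels is an assumed choice (a 4 km cell spans a 2 × 2 block of 2 km pixels), and `balanced_tree=False` is passed because SciPy documents it as selecting midpoint rather than median splits; the text does not state which settings were actually used.

```python
import numpy as np
from scipy.spatial import cKDTree


def nn_average_resample(src_xy, src_values, tgt_xy, k=4):
    """Average the k nearest source pixels (Euclidean metric) for each target pixel."""
    src_xy = np.asarray(src_xy, dtype=float)
    src_values = np.asarray(src_values, dtype=float)
    tgt_xy = np.asarray(tgt_xy, dtype=float)

    # balanced_tree=False requests midpoint-based splits instead of median splits.
    tree = cKDTree(src_xy, balanced_tree=False)

    # p=2 selects the Euclidean distance; idx holds indices into src_xy.
    _, idx = tree.query(tgt_xy, k=k, p=2)
    idx = idx.reshape(len(tgt_xy), -1)  # keeps a 2-D shape when k == 1

    # Ignore NaN source values when averaging (NaN = invalid, assumed).
    return np.nanmean(src_values[idx], axis=1)
```

The sketch assumes k does not exceed the number of source pixels. Because the tree measures straight-line distance in whatever coordinates it is given, the result depends on whether those coordinates are projected metres or latitude and longitude in degrees.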
This study evaluated the accuracy and efficiency of different resampling methods. Fig. 1 presents the quantitative evaluation metrics for each resampling method in cases 1 and 2. The correlation coefficients were 0.99 and 0.98, respectively, while RMSD values were low at 0.020 and 0.027. Bias was also close to zero.
Fig. 2 shows that RMSD remained below 0.1 in most areas, and Bias was generally near zero. However, minor differences in RMSD and Bias were observed in certain regions, likely due to differences in the resampling methods.
An evaluation of processing time and performance by VZA revealed significant differences between the two resampling methods. In the first case, PyTAF was approximately five times faster than kd-tree. In contrast, in the second case, kd-tree was approximately 45 times faster than PyTAF (Table 1). In the first case, PyTAF efficiently searched for the nearest points within a predefined range, whereas kd-tree performed an exhaustive search across the entire dataset. However, in the second case, as the latitude increased, the panoramic effect caused the latitude and longitude range of a single pixel to expand, leading to an increased search radius for PyTAF and significantly longer processing times.
Table 1 CPU time by case and resampling method
Case | Resampling method | CPU time (sec) |
---|---|---|
Case 1 | kd-tree | 12.90 |
Case 1 | PyTAF | 2.74 |
Case 2 | kd-tree | 27.58 |
Case 2 | PyTAF | 1228.21 |
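The speed ratios quoted in the text follow directly from the CPU times in Table 1. The short calculation below reproduces them; the dictionary layout and the helper name `speed_ratio` are illustrative only.

```python
# CPU times in seconds, copied from Table 1.
cpu_time_sec = {
    ("Case 1", "kd-tree"): 12.90,
    ("Case 1", "PyTAF"): 2.74,
    ("Case 2", "kd-tree"): 27.58,
    ("Case 2", "PyTAF"): 1228.21,
}


def speed_ratio(case, slower, faster):
    """How many times faster `faster` ran than `slower` within one case."""
    return cpu_time_sec[(case, slower)] / cpu_time_sec[(case, faster)]


case1 = speed_ratio("Case 1", "kd-tree", "PyTAF")
case2 = speed_ratio("Case 2", "PyTAF", "kd-tree")
print(f"Case 1: PyTAF is {case1:.1f}x faster than kd-tree")
print(f"Case 2: kd-tree is {case2:.1f}x faster than PyTAF")
```

The ratios come out at about 4.7 and 44.5, consistent with the "approximately five times" and "approximately 45 times" stated in the text.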
The analysis by VZA range (Fig. 3, Table 2) indicates that R increases with VZA, from 0.95 at 0°–20° to 0.98 at 60°–80°. This is interpreted as being due to the greater presence of snow in high-latitude regions with high VZA. Snow has a higher reflectance than typical land surfaces, and reflectance is more uniformly distributed within snow-covered areas. As a result, the observed values tend to approach a 1:1 match on scatter plots, which is identified as a major factor contributing to the increase in R. RMSD also increased steadily with VZA (from 0.004 to 0.043), and RRMSD was higher above 40° (0.14% and 0.11%) than below it (0.04% and 0.06%). This is attributed to the panoramic effect, where higher VZA causes each pixel to encompass a larger geographic area, leading to geometric distortions. These distortions reduce accuracy and introduce errors during the resampling process, resulting in increased RMSD and RRMSD values.
Table 2 R, RMSD, RRMSD values by VZA range
VZA range | R | RMSD | RRMSD (%) |
---|---|---|---|
0°–20° | 0.95 | 0.004 | 0.04 |
20°–40° | 0.97 | 0.009 | 0.06 |
40°–60° | 0.98 | 0.032 | 0.14 |
60°–80° | 0.98 | 0.043 | 0.11 |
Additionally, regions with high VZA, such as high-latitude areas, tend to have substantial snow cover. The high reflectance of snow significantly impacts the data values. The difference in reflectance between snow-covered and non-snow-covered areas leads to reduced data consistency, and this heterogeneity contributes to increased uncertainty during the resampling process. Therefore, if future studies use snow cover information to restrict the analysis to non-snow-covered pixels, the individual effects of geometric distortion and snow-related reflectance variability on RMSD and RRMSD could be more effectively disentangled.
This study evaluated the differences in resampling methods for geostationary satellite data, focusing on the impact of Earth’s curvature in nearest-neighbor calculations on satellite data processing performance and efficiency. Both resampling methods exhibited similar performance in most quantitative metrics; however, differences were observed in processing time and error characteristics associated with VZA. These differences are attributed to algorithmic design variations and environmental factors, such as the high reflectance of snow in high-latitude regions, which influenced the resampling process. In particular, PyTAF demonstrated advantages in handling complex geometric conditions, such as high VZA, by incorporating Earth’s curvature. However, its processing time increased significantly as the search radius expanded. In contrast, kd-tree exhibited strengths in rapidly processing large datasets with its straightforward and efficient search algorithm, though its performance at high VZA may be comparatively lower. These results suggest that the choice of resampling method should be guided by specific research objectives and data processing requirements. This study empirically highlights the differences between the two resampling methods and is expected to contribute to improving the efficiency and reliability of satellite data utilization.
This work was supported by the Korea Polar Research Institute (KOPRI) under Grant PE24040.
No potential conflict of interest relevant to this article was reported.
Seungkyoo Lee1, Hyun-Cheol Kim2, Daeseong Jung3, Sungwoo Park4, Sungwon Choi5, Kyung-Soo Han6*
1Master Student, Major of Spatial Information Engineering, Division of Earth Environmental System Sciences, Pukyong National University, Busan, Republic of Korea
2Principal Director, Center of Remote Sensing & GIS, Korea Polar Research Institute, Incheon, Republic of Korea
3PhD Candidate, Major of Spatial Information Engineering, Division of Earth Environmental Sciences, Pukyong National University, Busan, Republic of Korea
4Combined MS/PhD Student, Major of Spatial Information Engineering, Division of Earth Environmental Sciences, Pukyong National University, Busan, Republic of Korea
5Research Professor, Industry-University Cooperation Foundation, Pukyong National University, Busan, Republic of Korea
6Professor, Major of Spatial Information Engineering, Division of Earth Environmental Sciences, Pukyong National University, Busan, Republic of Korea