Korean J. Remote Sens. 2024; 40(6): 1095-1108
Published online: December 31, 2024
https://doi.org/10.7780/kjrs.2024.40.6.1.18
© Korean Society of Remote Sensing
Correspondence to : Yangwon Lee
E-mail: modconfi@pknu.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methane from rice fields has a strong greenhouse effect and its accurate estimation is essential to combat climate change. In this study, we conducted an analysis based on the Gradient Boosting Machine (GBM) model using Local Data Assimilation and Prediction System (LDAPS) data, Normalized Difference Vegetation Index (NDVI), and Normalized Difference Water Index (NDWI) from VIIRS and Moderate Resolution Imaging Spectroradiometer (MODIS), and FluxNet ground observations of the Cheorwon rice paddy region. This was used to estimate methane emissions from rice paddy fields in South Korea and to create a gridded spatial information map of methane concentrations. Using data with a spatial resolution of 1.5 kilometers, we identified detailed changes within the region and generated daily maps to analyze daily changes and seasonal characteristics. To predict methane concentration, we considered the correlation between meteorological factors such as latent heat flux, humidity, soil moisture, and soil temperature and methane emissions as key variables. Latent heat flux and humidity were selected as key variables considering that the migration of methane gas is affected by the evapotranspiration process. In addition, soil moisture, which creates the anaerobic conditions necessary for methane production, and soil temperature, which affects the activity of methanogenic microorganisms, were included in the analysis. Taking these various factors into consideration, we analyzed methane emission data from rice fields in Korea and visualized them on a map to understand the pattern of methane production in response to changing weather conditions. The developed model showed a correlation coefficient of 0.91 and Mean Absolute Error (MAE) of 28.97 in the 5-fold cross-validation and an average correlation coefficient of 0.87 and MAE of 35.46 in the Leave One Year Out (LOYO) cross-validation. 
These results are expected to contribute to the understanding of methane generation patterns under changing weather conditions and accurate methane emission estimation. In addition, the developed model and the constructed methane concentration map can be utilized as an important basis for establishing greenhouse gas reduction policies in the agricultural sector and effective climate change response strategies in the future.
Keywords Methane emission, Rice paddy, Machine learning, Meteorological data, Satellite image
As the severity of global warming and climate change intensifies, accurately identifying and reducing greenhouse gas emissions has emerged as a crucial task. Methane (CH4), in particular, is a potent greenhouse gas, exhibiting a warming effect 27.9 times stronger than carbon dioxide over 100 years (Intergovernmental Panel on Climate Change, 2021). The agricultural sector is one of the major sources of methane emissions, accounting for a significant portion of total methane emissions (Jeong et al., 2010), with rice cultivation being one of the primary anthropogenic methane sources, responsible for approximately 11% of global anthropogenic methane emissions (Yan et al., 2009). Methane emissions from rice paddies occur as the final stage of the anaerobic decomposition of organic matter in rice field soils, carried out by methanogenic archaea (Choi et al., 2017; Conrad, 2007). The flooding of soil in rice paddies is a prerequisite for continuous methane emissions (Wassmann et al., 2000), and the practice of keeping rice paddies filled with water for extended periods provides an environment conducive to methane emissions. Therefore, reducing methane emissions in the agricultural sector can play a crucial role in climate change mitigation strategies. Rice production offers good potential both to influence methane emissions and to increase soil carbon sequestration through improved water management, fertilizer use, and organic residue management (Smith et al., 2008). Consequently, it is necessary to accurately understand and estimate the relationship between rice cultivation periods and methane emissions.
Furthermore, methane emissions are influenced by various environmental factors, with weather conditions playing a particularly important role. Climate elements have been shown to have a significant impact on methane generation and emission. Soil temperature directly affects the activity of methanogens and the rate of organic matter decomposition, regulating methane production, and has been identified as one of the most important factors controlling methane generation and emission in rice fields (Khalil et al., 1998). Additionally, precipitation and irrigation management play crucial roles in determining soil moisture conditions, which are essential for creating anaerobic environments. Rice paddies are significant sources of CH4, which is released into the atmosphere through three pathways: molecular diffusion of dissolved methane at the air-water interface, ebullition of gas bubbles, and diffusive transport through the aerenchyma of rice plants (Hamamoto et al., 2024). The complex interaction of these meteorological factors acts as a key determinant in shaping methane emission patterns during the rice cultivation period.
In addition to meteorological factors, methane emissions from rice paddies are also influenced by the growth state of rice and water management. Therefore, it is important to include vegetation and moisture indices such as the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI). NDVI and NDWI indices derived from Moderate Resolution Imaging Spectroradiometer (MODIS) data have been successfully used to monitor rice growth stages and inundation conditions, which are closely related to methane emissions from rice fields (Xiao et al., 2006). NDVI, which uses the difference in reflectance between near-infrared and red light, is an indicator of vegetation health and density and is useful for assessing the growth state of rice. NDWI, which uses the difference in reflectance between near-infrared and short-wave infrared, is an indicator of vegetation moisture content and soil wetness and can be used to assess the water management status of rice paddies.
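The two indices described above are both normalized differences of band reflectances. The following is a minimal sketch of that arithmetic, assuming the common forms NDVI = (NIR − Red) / (NIR + Red) and NDWI = (NIR − SWIR) / (NIR + SWIR); the function names and the reflectance values are hypothetical and are not taken from the MODIS products used in the study.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    denom = a + b
    if denom == 0:
        return None
    return (a - b) / denom


def ndvi(nir, red):
    # Vegetation index: near-infrared versus red reflectance.
    return normalized_difference(nir, red)


def ndwi(nir, swir):
    # Water index: near-infrared versus short-wave infrared reflectance.
    return normalized_difference(nir, swir)


# Hypothetical surface reflectances for a single pixel (illustrative only).
pixel = {"red": 0.05, "nir": 0.45, "swir": 0.15}
print("NDVI:", round(ndvi(pixel["nir"], pixel["red"]), 3))
print("NDWI:", round(ndwi(pixel["nir"], pixel["swir"]), 3))
```

Both indices fall between -1 and 1, so they can be fed to a model alongside meteorological variables without unit conversion.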
In this study, to estimate methane emissions considering these various factors comprehensively, we utilized data from the Korea Meteorological Administration’s Local Data Assimilation and Prediction System (LDAPS) and the Ministry of Agriculture, Food and Rural Affairs’ electronic map of agricultural land, known as Farmmap. We applied a Gradient Boosting Machine (GBM) based model to model complex non-linear relationships and integrate various input variables for analysis. The main objective of this study is to develop a model for estimating methane concentrations in rice paddies and to construct high-resolution daily methane concentration estimation maps specialized for rice cultivation areas.
The current methods for evaluating and visualizing methane emissions in South Korea focus on generating maps for short-term and long-term methane emission assessments (Choi et al., 2018). Particularly in the methane emission mapping section, methods are presented for estimating methane emissions nationwide and visualizing them by administrative districts (Baek et al., 2023; Choi et al., 2020). Based on the results of these previous studies, this research focuses on estimating and spatially mapping methane concentrations in rice cultivation areas across the country. We estimated and spatially mapped methane emissions from rice cultivation areas with finer spatiotemporal resolution, constructing daily methane concentration estimation maps. Through this approach, we expect to accurately understand the characteristics of methane emissions in the agricultural sector and make practical contributions to the development of greenhouse gas reduction policies.
This study utilized ground observation data, meteorological data, and satellite data from 2015 to 2018 to estimate methane emissions from rice paddies. The entire country of South Korea was set as the study area to include all rice paddy regions for developing a methane concentration estimation model.
Fluxnet-CH4 is a global flux observation network that provides data from 81 observation sites including freshwater wetlands, coastal areas, highlands, and natural and managed ecosystems. It continuously measures methane exchange between the surface and atmosphere using the eddy covariance method, providing 30-minute and daily methane flux data. For estimating methane emissions from rice paddies in South Korea, daily methane flux Fluxnet-CH4 data (nmol CH4 m–2 s–1) from the Cheorwon flux tower (38°12′04″N, 127°15′02″E) were used.
The LDAPS data from the Korea Meteorological Administration were used. LDAPS has a horizontal resolution of 1.5 km and 40 vertical layers, enabling high-resolution numerical simulation within the atmospheric boundary layer and providing forecast data reflecting actual terrain and atmospheric conditions at 3-hour intervals. For methane emission estimation, soil temperature, soil moisture, relative humidity, specific humidity, latent heat flux, ground heat flux, and air pressure were selected as key variables. Soil temperature is an important factor affecting methane generation as it is related to the activity of methanogens and the decomposition of organic matter (Bridgham et al., 2013). Soil moisture and relative humidity provide an environment for methanogens to actively function by creating anaerobic conditions, and especially in flooded soils, anaerobic conditions are formed, promoting methane generation (Bridgham et al., 2013; Lai, 2009).
Observational data from the MODIS sensor were utilized. MODIS is a sensor mounted on the Terra and Aqua satellites, providing data through 36 spectral bands (0.405–14.385 μm) at resolutions of 250 m, 500 m, and 1,000 m. MODIS has long-term observation data since 1999, suitable for time series analysis, and provides stable data with a 1-2 times daily observation cycle. The NDVI was derived from the 16-day composite MOD13Q1 (Terra) and MYD13Q1 (Aqua) products, which have a spatial resolution of 250 m and a temporal resolution of 16 days. The NDWI from MODIS Terra has a spatial resolution of 463.313 m (nominally 500 m) and a temporal resolution of 1 day, is derived from the near-infrared and short-wave infrared bands of the MOD09GA_006 product, and was obtained from Google. NDWI reflects the soil moisture condition, allowing for the assessment of anaerobic condition formation, while NDVI represents the biomass and photosynthetic activity of rice, used to analyze the relationship with methane emissions (Serrano et al., 2019).
The integration of these multiple observational data sources contributes to a more comprehensive understanding of methane emission characteristics in rice paddy ecosystems by complementarily utilizing the temporal continuity of ground observations and the spatial representativeness of satellite observations (Zhang et al., 2016).
In this study, the following preprocessing steps were performed to integrate Fluxnet-CH4, LDAPS, and satellite data. Methane flux data collected from the Cheorwon flux tower (38°12′04″N, 127°15′02″E) were cleaned of missing values. From the 30-minute methane flux measurements, data containing missing values were removed, resulting in 1,122 days (76.8%) of valid data out of a total observation period of 1,461 days. The preprocessed data were then converted to daily average values for analysis.
Fig. 2 shows methane emission data collected from the Cheorwon rice paddy flux tower from 2015 to 2018. Time series analysis of the collected methane emission data revealed distinct seasonal variations. In particular, methane emissions tended to increase sharply during summer, with values of up to 600 nmol CH4 m–2 s–1 recorded. This phenomenon is thought to be due to the increased activity of methanogens as anaerobic conditions form when water fills the paddy fields during the rice cultivation period. This suggests that water management during rice cultivation can play an important role in controlling methane emissions.
Meteorological variables provided by LDAPS underwent time and spatial resolution matching. Data provided in Universal Time Coordinated (UTC) were converted to Korean Standard Time (KST), and 3-hourly data were aggregated to daily averages. Data for a total of 7 variables (soil heat flux, latent heat flux, specific humidity, relative humidity, soil moisture, soil temperature, and air pressure) were converted to the EPSG:4326 coordinate system and adjusted to the spatial extent of the study area.
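The time conversion and daily aggregation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' processing code: the function name `daily_mean_kst` and the sample values are hypothetical, timestamps are assumed to be timezone-aware UTC datetimes, and each day is taken to be a KST calendar day.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # Korean Standard Time is UTC+9


def daily_mean_kst(records):
    """Average (utc_datetime, value) pairs by KST calendar day.

    Each datetime must be timezone-aware; a naive datetime would be
    interpreted in the local system time zone by astimezone().
    """
    buckets = defaultdict(list)
    for utc_time, value in records:
        local_day = utc_time.astimezone(KST).date()
        buckets[local_day].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Hypothetical 3-hourly soil temperatures (K) for one UTC day.
start = datetime(2017, 7, 1, 0, 0, tzinfo=timezone.utc)
records = [(start + timedelta(hours=3 * i), 295.0 + i) for i in range(8)]

daily = daily_mean_kst(records)
for day, mean_value in daily.items():
    print(day, mean_value)
```

Because of the 9-hour offset, the last three UTC time steps of a day (15, 18, and 21 UTC) fall on the following KST day, so the conversion has to happen before the values are grouped.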
Satellite data preprocessing was conducted in three stages. For MODIS NDWI and NDVI data, coordinate system transformation was performed to unify them to EPSG:4326, and the spatial extent was set based on South Korea (Define Extent). The three types of preprocessed data were spatiotemporally matched based on the location of the Cheorwon flux tower. Point information from each dataset was extracted to construct a unified dataset, which was used as input data for the GBM model. In the integration process, the temporal resolution of all data was unified to daily units and spatial information was matched to the same coordinate system and extent.
The research flow diagram (Fig. 3) shows the process of estimating methane emissions from rice paddy fields in South Korea and creating a gridded spatial information map of methane concentrations using the GBM algorithm by matching Fluxnet data with LDAPS meteorological variables. Model input data were constructed by matching meteorological variables, satellite-based variables, and flux data. After generating predicted methane concentration maps through modeling, the final maps were constructed by masking only rice paddy areas using land cover maps.
The application of machine learning methods in methane emission estimation is gaining attention as it provides a more scalable and automated approach compared to traditional statistical methods. Machine learning models can efficiently process large-scale datasets, offering scalability and automation capabilities for continuous monitoring of methane emissions over extensive areas (Rouet-Leduc and Hulbert, 2024). Furthermore, these models can integrate and analyze various spatiotemporal data such as satellite imagery, meteorological data, and ground sensor data, enabling more comprehensive methane detection.
We selected GBM among various machine learning methods. GBM shows better performance in modeling complex non-linear relationships and achieves high prediction accuracy in various problems (Friedman, 2001). It generates an initial model with simple decision trees, calculates the difference (residuals) between actual and predicted values, learns new decision trees to predict the residuals from the previous step, and updates the overall model by adding these new models to the previous ones. This process is repeated for a specified number of times or until there is no performance improvement. Finally, it combines the predictions of all trees to generate the final prediction (Chen and Guestrin, 2016; Friedman, 2001).
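The residual-fitting loop described above can be illustrated with a deliberately small sketch. It is not the authors' implementation (the paper does not state the library or hyperparameters used); it assumes squared-error loss, a single input feature, depth-one trees (stumps), and a fixed learning rate, and all names and data values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split that minimises squared error.

    Requires at least two distinct feature values in xs.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # exclude max so the right leaf is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Add a shrunken copy of the new tree's output to the running prediction.
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the initial constant and the shrunken outputs of all stumps."""
    base, stumps, lr = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)


def training_mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: soil temperature (deg C) versus methane flux.
xs = [10, 14, 18, 22, 26, 30]
ys = [20, 35, 80, 210, 400, 520]
for n in (1, 10, 50):
    print(n, "trees, training MSE:", round(training_mse(fit_gbm(xs, ys, n_trees=n), xs, ys), 1))
```

The training error falls as trees are added, which is the gradual improvement the text refers to; in practice the number of trees and the learning rate are tuned against held-out data to avoid overfitting.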
Through this process, GBM gradually improves the model’s performance and can learn complex patterns. Particularly, these characteristics of GBM are expected to be very useful in analyzing complex environmental data such as methane emission estimation. In this study, we aim to develop a methane concentration estimation model and accurately identify methane emission characteristics in rice cultivation areas by utilizing these advantages of GBM.
To evaluate the performance and generalization of the model, we applied 5-fold cross-validation and Leave One Year Out (LOYO) cross-validation. In 5-fold cross-validation, the entire dataset was divided into five subsets, each of which was used in turn as the test set while the remaining four served as the training set. Because every sample is used for testing exactly once, this approach uses the entire dataset efficiently, reduces the dependence of the performance estimate on any single train-test split, and helps reveal overfitting.
LOYO cross-validation accounts for the time series nature of the data: data from one year are held out as the test set, the model is trained on the remaining years, and prediction performance is evaluated on the held-out year. In this study, LOYO validation was performed using data from 2015 to 2018. For example, we used the 2015 data as the test set, trained the model with data from 2016 to 2018, and repeated the process for each of the other years. By using both methods together, we comprehensively evaluated the model’s generalization performance and its estimation ability over time, validating its reliability.
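The two splitting schemes can be sketched as index generators. This is an illustrative sketch, not the study's code: the function names are hypothetical, and the k-fold helper uses contiguous folds without shuffling for simplicity, whereas the paper does not say whether its folds were shuffled.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        test = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, test


def k_fold(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical year label for each daily sample.
years = [2015, 2015, 2016, 2017, 2017, 2018]
for held_out, train, test in leave_one_year_out(years):
    print(held_out, "train:", train, "test:", test)

for train, test in k_fold(10, k=5):
    print("train:", train, "test:", test)
```

LOYO keeps all samples of a year together, so the test year is never seen during training; a k-fold split can place neighbouring days of the same season in both sets, which is one reason its scores tend to be more optimistic.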
The model’s evaluation metrics included Mean Bias Error (MBE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Correlation Coefficient (CC).
MBE is an indicator that evaluates the overall bias of the model by averaging the differences between predicted and actual values (Walther and Moore, 2005). A positive value indicates that the model overestimates compared to the actual value, while a negative value indicates underestimation, effectively identifying systematic bias in the prediction model. MBE provides directional information on errors, offering useful information for determining the direction of model calibration. The formula for MBE is given in Eq. (1).
MAE is the average of the absolute differences between predicted and actual values, assigning equal weight to all errors and reacting less sensitively to extreme errors (Willmott and Matsuura, 2005). It maintains the same unit as the original data, allowing for intuitive interpretation, and is particularly useful for evaluating model performance in datasets with outliers (Chai and Draxler, 2014). It is also effectively used in studies dealing with values of actual physical significance, such as time series predictions or climate modeling. The formula for MAE is given in Eq. (2), where $n$ is the total number of data samples, $y_i$ is the actual observed value, $\hat{y}_i$ is the model’s predicted value, $\bar{y}$ is the mean of actual values, and $\bar{\hat{y}}$ is the mean of predicted values.
RMSE is the square root of the average of the squared differences between predicted and actual values, giving more weight to larger errors (Chai and Draxler, 2014). As RMSE assigns greater penalties to larger error magnitudes, it is suitable for more sensitively evaluating large prediction errors. It is particularly effective in rigorously assessing model performance for outliers or extreme values and is widely used to evaluate prediction precision. The formula for RMSE is given in Eq. (3).
CC is an indicator that evaluates the agreement between predicted and actual values, ranging from -1 to 1, with values closer to 1 indicating higher agreement. CC can simultaneously evaluate the linear relationship and accuracy between two measurements, making it useful for comprehensively assessing the overall performance of prediction models. It is particularly effective in evaluating agreement between repeated measurements or different measurement methods. The formula for CC is given in Eq. (4).
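The equations referenced as Eqs. (1)–(4) are not reproduced in this text. The block below gives the standard definitions consistent with the descriptions above, with $y_i$ the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ and $\bar{\hat{y}}$ their means, and $n$ the number of samples. The sign of MBE follows the statement that positive values indicate overestimation. The form of CC is an assumption: it is written here as the Pearson correlation coefficient, although the description of CC as a measure of agreement could also fit a concordance-type coefficient.

```latex
\begin{align}
\mathrm{MBE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right) \tag{1}\\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{2}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}} \tag{3}\\
\mathrm{CC}   &= \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}
                      {\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}\,
                       \sqrt{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{\hat{y}}\right)^{2}}} \tag{4}
\end{align}
```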
A nationwide methane concentration estimation map was constructed by using the trained GBM model and combined with an electronic map of agricultural land to generate a methane concentration estimation map for rice paddy areas. This process consisted of three main steps: processing the electronic map of agricultural land, generating a nationwide methane concentration estimation map, and masking rice paddy areas.
Using the electronic map of agricultural land (Farmmap) provided by the Ministry of Agriculture, Food and Rural Affairs in 2023, we performed a process to accurately identify and rasterize rice paddy areas. Using QGIS software, we extracted only the rice paddy areas from the Farmmap and then converted the vector data to raster format. In this process, we adjusted the pixel size to have the same 500 m spatial resolution as the final methane concentration estimation map.
Based on the trained GBM model, we estimated the nationwide distribution of methane concentrations. We used meteorological variables (temperature, relative humidity, pressure, precipitation, etc.) provided by LDAPS and vegetation indices (NDVI, NDWI) based on satellite imagery as input data, and all input data were gridded to a unified 500 m spatial resolution. Daily methane concentrations were estimated by inputting the meteorological conditions and vegetation indices at each grid point into the model, thereby constructing a spatiotemporally continuous methane concentration distribution map for four years from 2015 to 2018.
We performed a masking process to extract only rice paddy areas from the nationwide methane concentration estimation map. This is because methane flux observation data were collected only from rice paddies, and the model’s estimation reliability for other land cover types (forests, urban areas, etc.) could not be guaranteed. Especially for forest areas, meteorological conditions differ greatly from rice paddies due to altitude and topographical characteristics, and the methane generation mechanism is also different, making it difficult to apply the same model. Therefore, we ensured the reliability of the model estimation by using the rasterized Farmmap as a mask to extract methane concentrations only for rice paddy areas. Using this method, we finally constructed high-resolution methane concentration estimation maps specialized for rice paddy areas for each day from 2015 to 2018.
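The masking step amounts to keeping estimates only where the rasterized paddy layer marks a cell as rice paddy. The sketch below assumes both grids are already on the same resolution and extent and that the mask uses 1 for paddy and 0 otherwise; the function name, the grid values, and the use of `None` as the no-data marker are hypothetical choices for illustration.

```python
def mask_to_paddy(estimate_grid, paddy_mask, nodata=None):
    """Keep estimates where the mask equals 1; set every other cell to nodata."""
    if len(estimate_grid) != len(paddy_mask):
        raise ValueError("grids must have the same number of rows")
    masked = []
    for est_row, mask_row in zip(estimate_grid, paddy_mask):
        if len(est_row) != len(mask_row):
            raise ValueError("grids must have the same number of columns")
        masked.append([value if flag == 1 else nodata
                       for value, flag in zip(est_row, mask_row)])
    return masked


# Hypothetical 3x3 daily estimate grid and rasterized paddy mask.
estimates = [
    [120.0, 135.5, 90.2],
    [310.4, 298.7, 75.0],
    [405.1, 388.9, 60.3],
]
paddy = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
for row in mask_to_paddy(estimates, paddy):
    print(row)
```

Cells outside the mask are dropped rather than set to zero, so that a missing estimate is not mistaken for a zero value.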
This study used 5-fold cross-validation and LOYO (Leave One Year Out) cross-validation methods to evaluate the prediction performance of the GBM model. The model’s performance was comprehensively analyzed based on various evaluation metrics (MAE, MBE, RMSE, CC).
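For readers who want to reproduce this kind of evaluation, the four metrics can be computed from paired observed and predicted values as sketched below. The function name and sample values are hypothetical, and CC is computed as the Pearson correlation coefficient, which is an assumption about the form used in the study.

```python
import math


def evaluate(observed, predicted):
    """Return MBE, MAE, RMSE, and Pearson CC for paired value lists."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # positive = overestimate
    mbe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cc = cov / math.sqrt(var_o * var_p)  # undefined if either series is constant

    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "CC": cc}


# Hypothetical observed and predicted fluxes.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]
scores = evaluate(observed, predicted)
for name, value in scores.items():
    print(name, round(value, 3))
```

MBE, MAE, and RMSE carry the unit of the flux itself, while CC is dimensionless, which is why the result tables state a unit even though it does not apply to the CC column.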
The 5-fold cross-validation results showed an MBE of 0.308 (nmol CH4 m–2 s–1), MAE of 28.97 (nmol CH4 m–2 s–1), RMSE of 48.053 (nmol CH4 m–2 s–1), and CC of 0.91, demonstrating the model’s high accuracy (Table 4). In particular, the MBE being close to 0 indicates that the predicted values do not show a significant tendency to overestimate or underestimate compared to the measured values. This suggests that the model performed stable and reliable predictions overall.
Table 4 5-fold cross-validation results for the GBM model (unit: nmol CH4 m–2 s–1)
N | MBE | MAE | RMSE | CC |
---|---|---|---|---|
1122 | 0.308 | 28.970 | 48.053 | 0.910 |
The LOYO cross-validation results showed an average MBE of 1.56 (nmol CH4 m–2 s–1), MAE of 35.46 (nmol CH4 m–2 s–1), RMSE of 57.276 (nmol CH4 m–2 s–1), and CC of 0.875, with correlation coefficients consistently ranging between 0.85 and 0.90 across years (Table 5). This indicates that the model’s performance did not significantly deteriorate even when excluding data from a specific year for validation. However, for 2015 and 2016, there were many missing values, which may have caused data imbalance during the model training process, potentially affecting the prediction trends. This is also evident in the MBE values (2015: 4.399, 2016: -10.270), which can be interpreted as a slight overestimation or underestimation of predicted values compared to measured values due to the lack of data for specific years.
Table 5 LOYO cross-validation results for the GBM model (unit: nmol CH4 m–2 s–1)
Year | N | MBE | MAE | RMSE | CC |
---|---|---|---|---|---|
2015 | 155 | 4.399 | 38.826 | 59.270 | 0.853 |
2016 | 275 | -10.270 | 36.053 | 65.352 | 0.850 |
2017 | 327 | 15.122 | 33.702 | 50.381 | 0.892 |
2018 | 351 | -3.011 | 33.260 | 54.127 | 0.904 |
Total | 1108 | 1.560 | 35.460 | 57.276 | 0.875 |
The scatter plot of observed versus predicted values (Fig. 5) confirmed a high correlation between the two. This indicates that the GBM model effectively reflects the observed data and can reliably predict methane emissions in the Cheorwon rice paddy area.
Upon examining the variable importance of the model, NDVI was found to be the most influential variable in methane emissions, which is interpreted as an important indicator reflecting the growth state and biomass of rice (Table 6). NDVI shows the highest importance of 0.661, suggesting a strong correlation between rice growth stages and methane emission amounts. The second most important variable is soil temperature (ST) with an importance of 0.204, which acts as an environmental factor regulating the activity of methanogens and organic matter decomposition. Latent heat flux (LE) was identified as the third most influential variable with an importance of 0.033, affecting evapotranspiration processes and methane circulation. Variables such as soil moisture (SM), specific humidity (SH), pressure (PRES), relative humidity (RH), NDWI, and soil heat flux (G) showed relatively low importance, but they also contributed to explaining the spatiotemporal variability of methane emissions.
Table 6 Variable importance results of the GBM model
No. | Variable | Importance |
---|---|---|
1 | NDVI (Normalized Difference Vegetation Index) | 0.661 |
2 | ST (Soil Temperature) | 0.204 |
3 | LE (Latent Heat Flux) | 0.033 |
4 | SM (Soil Moisture) | 0.025 |
5 | SH (Specific Humidity) | 0.023 |
6 | PRES (Air Pressure) | 0.019 |
7 | RH (Relative Humidity) | 0.014 |
8 | NDWI (Normalized Difference Water Index) | 0.011 |
9 | G (Soil Heat Flux) | 0.010 |
The Variance Inflation Factor (VIF) analysis revealed some multicollinearity among variables. Specific humidity (SH) showed the highest VIF value of 18.304, followed by ST with 14.639 (Table 7). This suggests that these two variables have significant linear relationships with other predictor variables. NDVI and LE showed moderate VIF values of 5.151 and 5.042, respectively. RH, PRES, G, NDWI, and SM all showed VIF values below 5, indicating that multicollinearity is not a major issue for these variables. In particular, G, NDWI, and SM showed low VIF values below 1.6, indicating that they maintain their independent characteristics well.
Table 7 Multicollinearity results in VIF for the GBM model
No. | Variable | VIF Value |
---|---|---|
1 | SH (Specific Humidity) | 18.304 |
2 | ST (Soil Temperature) | 14.639 |
3 | NDVI (Normalized Difference Vegetation Index) | 5.151 |
4 | LE (Latent Heat Flux) | 5.042 |
5 | RH (Relative Humidity) | 3.154 |
6 | PRES (Air Pressure) | 2.637 |
7 | G (Soil Heat Flux) | 1.589 |
8 | NDWI (Normalized Difference Water Index) | 1.578 |
9 | SM (Soil Moisture) | 1.554 |
Although high VIF values were observed for some variables, the GBM model is relatively robust to multicollinearity due to its ability to focus on characteristics that provide additional information not captured by existing trees through the sequential boosting process (Natekin and Knoll, 2013). Therefore, due to the non-linear nature of the model, the impact of this multicollinearity on model performance is expected to be limited.
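As a reminder of what the VIF measures, the VIF of predictor j equals 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on all other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below uses the second form. It is an illustration only, with hypothetical function names and made-up values, and does not reproduce the values in Table 7.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictor columns (illustrative values only).
predictors = {
    "SH": [8.0, 10.0, 14.0, 17.0, 19.0, 15.0],
    "ST": [12.0, 15.0, 20.0, 24.0, 26.0, 21.0],
    "SM": [0.31, 0.28, 0.35, 0.30, 0.33, 0.27],
}
for name, value in zip(predictors, vif(list(predictors.values()))):
    print(name, round(value, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; values far above that, as for SH and ST here, mean most of the predictor's variance can be explained by the remaining inputs.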
The constructed methane concentration grid maps revealed distinct seasonal variability and spatial distribution characteristics. Methane concentrations remained low (0-100 nmol CH4 m–2 s–1) from January to April, then increased sharply during the rice cultivation period, especially from June to September when the paddies are flooded.
During the transplanting period in May-June, methane concentrations began to increase to 100-200 nmol CH4 m–2 s–1 as the paddies were kept flooded for rice planting, strengthening anaerobic conditions. During the growth period in July-August, methane concentrations peaked at 400-500 nmol CH4 m–2 s–1 with the full-scale growth of rice, with the highest values recorded in August (Fig. 6). The high temperatures and precipitation during this period provided favorable conditions for methane generation, accelerating the increase in concentration.
During the heading and ripening periods in September–October, methane generation decreased significantly due to changes in water management practices such as mid-season drainage before harvest, lowering concentrations to 200–300 nmol CH4 m–2 s–1. After the rice harvest in November–December, as the paddies dried out and soil conditions became aerobic, methane emissions were minimized to 0–100 nmol CH4 m–2 s–1.
This study utilized a GBM model to estimate methane concentrations in rice paddy areas of South Korea and construct high-resolution maps. The analysis integrated data from the LDAPS, satellite imagery (NDVI, NDWI), and ground observations from the Cheorwon rice paddy region’s FluxNet.
The GBM model showed high performance in 5-fold cross-validation with a correlation coefficient of 0.91, MAE of 28.97 (nmol CH4 m–2 s–1), and RMSE of 48.053 (nmol CH4 m–2 s–1). In the LOYO cross-validation, it demonstrated stable performance with an average correlation coefficient of 0.875, MAE of 35.46 (nmol CH4 m–2 s–1), and RMSE of 57.276 (nmol CH4 m–2 s–1). Unlike previous studies, this research constructed daily methane concentration maps and analyzed data from nationwide rice paddies over four years, yielding more comprehensive and generalized results. Additionally, the use of satellite-based vegetation indices like NDVI and NDWI directly reflected rice growth stages and soil moisture conditions, enhancing the accuracy of methane emission estimates. This suggests that the model can effectively account for various weather conditions and annual variability.
The variable importance analysis revealed that NDVI had the most significant impact on methane emissions, followed by ST and LE (Table 6). This aligns with existing research findings on the crucial roles of soil environment and vegetation state in methane generation and emission processes. Multicollinearity analysis showed relatively high VIF values for SH and ST, but due to the characteristics of the GBM model, the negative impact of this is expected to be limited.
The constructed methane concentration maps clearly showed seasonal variations in methane emissions related to rice cultivation periods. A sharp increase in methane concentrations was observed between June and September, reflecting the influence of rice growth stages and water management practices on methane emissions. The changes in methane concentrations during the transplanting (May–June), growth (July–August), and heading and ripening (September–October) periods were identified. The main significance of this study includes:
(1) Presenting a new approach to effectively monitor and predict methane emissions in agricultural areas by combining meteorological data, satellite imagery, and machine learning techniques.
(2) Developing high-resolution methane concentration maps that can serve as important foundational data for establishing greenhouse gas reduction policies and effective climate change response strategies in the agricultural sector.
(3) Deepening the understanding of methane emission patterns during rice cultivation and quantitatively identifying methane emission characteristics for different rice cultivation periods.
This study is limited in that the model was trained and validated using data from a single flux tower. However, given the similarities in the regional characteristics of rice fields, we expect that the model’s predictions will apply to other rice field regions to some extent. In future research, we plan to use flux tower data from various regions to improve the generalization ability of the model and increase the accuracy of the mapping. We will also consider additional environmental variables, such as soil organic matter content and rice variety, to further improve the model’s regional prediction accuracy. Furthermore, the study could be extended to predict future changes in methane emissions by applying climate change scenarios. This research is expected to contribute to climate change response by improving understanding of methane emissions from rice paddies and providing a scientific basis for greenhouse gas management in the agricultural sector.
Table 1 Variables based on collected LDAPS meteorological data
Data Source | Variable | Spatial Resolution | Temporal Resolution |
---|---|---|---|
LDAPS | G (Soil Heat Flux) | 1.5 km | 3 hours |
LDAPS | LE (Latent Heat Flux) | 1.5 km | 3 hours |
LDAPS | SH (Specific Humidity) | 1.5 km | 3 hours |
LDAPS | RH (Relative Humidity) | 1.5 km | 3 hours |
LDAPS | SM (Soil Moisture) | 1.5 km | 3 hours |
LDAPS | ST (Soil Temperature) | 1.5 km | 3 hours |
LDAPS | PRES (Air Pressure) | 1.5 km | 3 hours |
Table 2 Variables based on collected satellite imagery
Short Name | Long Name | Spatial Resolution | Temporal Resolution | Variable | Data Period |
---|---|---|---|---|---|
MOD13Q1 | MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061 | 250 m | 16 days | NDVI | 2015.01.01~2018.12.31 |
MYD13Q1 | MODIS/Aqua Vegetation Indices 16-Day L3 Global 250m SIN Grid V061 | 250 m | 16 days | NDVI | 2015.01.01~2018.12.31 |
MOD09GA_006 NDWI | MODIS Terra Daily NDWI | 500 m | 1 day | NDWI | 2015.01.01~2018.12.31 |
Table 3 Input data constructed from ground observation data, LDAPS meteorological data, and MODIS satellite data
Data Source | Variable | Spatial Resolution | Temporal Resolution | Pre-processed Temporal Resolution |
---|---|---|---|---|
Fluxnet-CH4 | Methane Flux | Point | 1 day | 1 day |
MOD13Q1 | NDVI (normalized difference vegetation index) | 250 m | 16 days | 16 days |
MYD13Q1 | NDVI (normalized difference vegetation index) | 250 m | 16 days | 16 days |
MOD09GA | NDWI (normalized difference water index) | 500 m | 1 day | 1 day |
LDAPS | G (Soil Heat Flux) | 1.5 km | 3 hours | 1 day |
LDAPS | LE (Latent Heat Flux) | 1.5 km | 3 hours | 1 day |
LDAPS | SH (Specific Humidity) | 1.5 km | 3 hours | 1 day |
LDAPS | RH (Relative Humidity) | 1.5 km | 3 hours | 1 day |
LDAPS | SM (Soil Moisture) | 1.5 km | 3 hours | 1 day |
LDAPS | ST (Soil Temperature) | 1.5 km | 3 hours | 1 day |
LDAPS | PRES (Air Pressure) | 1.5 km | 3 hours | 1 day |
This research was supported by the Pukyong National University Development Project Research Fund (Philosopher of Next Generation, 2024).
No potential conflict of interest relevant to this article was reported.
Jiah Jang1, Geunah Kim2, Jaeung Sim1, Jaedong Kim1, Yangwon Lee3*
1Master Student, Major of Spatial Information Engineering, Division of Earth Environmental System Science, Pukyong National University, Busan, Republic of Korea
2PhD Student, Major of Spatial Information Engineering, Division of Earth Environmental System Sciences, Pukyong National University, Busan, Republic of Korea
3Professor, Major of Geomatics Engineering, Division of Earth Environmental System Science, Pukyong National University, Busan, Republic of Korea
Methane from rice fields has a strong greenhouse effect and its accurate estimation is essential to combat climate change. In this study, we conducted an analysis based on the Gradient Boosting Machine (GBM) model using Local Data Assimilation and Prediction System (LDAPS) data, Normalized Difference Vegetation Index (NDVI), and Normalized Difference Water Index (NDWI) from VIIRS and Moderate Resolution Imaging Spectroradiometer (MODIS), and FluxNet ground observations of the Cheorwon rice paddy region. This was used to estimate methane emissions from rice paddy fields in South Korea and to create a gridded spatial information map of methane concentrations. Using data with a spatial resolution of 1.5 kilometers, we identified detailed changes within the region and generated daily maps to analyze daily changes and seasonal characteristics. To predict methane concentration, we considered the correlation between meteorological factors such as latent heat flux, humidity, soil moisture, and soil temperature and methane emissions as key variables. Latent heat flux and humidity were selected as key variables considering that the migration of methane gas is affected by the evapotranspiration process. In addition, soil moisture, which creates the anaerobic conditions necessary for methane production, and soil temperature, which affects the activity of methanogenic microorganisms, were included in the analysis. Taking these various factors into consideration, we analyzed methane emission data from rice fields in Korea and visualized them on a map to understand the pattern of methane production in response to changing weather conditions. The developed model showed a correlation coefficient of 0.91 and Mean Absolute Error (MAE) of 28.97 in the 5-fold cross-validation and an average correlation coefficient of 0.87 and MAE of 35.46 in the Leave One Year Out (LOYO) cross-validation. 
These results are expected to contribute to the understanding of methane generation patterns under changing weather conditions and accurate methane emission estimation. In addition, the developed model and the constructed methane concentration map can be utilized as an important basis for establishing greenhouse gas reduction policies in the agricultural sector and effective climate change response strategies in the future.
Keywords: Methane emission, Rice paddy, Machine learning, Meteorological data, Satellite image
As global warming and climate change intensify, accurately identifying and reducing greenhouse gas emissions has emerged as a crucial task. Methane (CH4), in particular, is a potent greenhouse gas, exhibiting a warming effect 27.9 times stronger than carbon dioxide over 100 years (Intergovernmental Panel on Climate Change, 2021). The agricultural sector is one of the major sources of methane emissions, accounting for a significant portion of total methane emissions (Jeong et al., 2010), with rice cultivation being one of the primary anthropogenic methane sources, responsible for approximately 11% of global anthropogenic methane emissions (Yan et al., 2009). Methane emissions from rice paddies occur as the final stage of the anaerobic decomposition of organic matter in rice field soils, carried out by methanogenic archaea (Choi et al., 2017; Conrad, 2007). The flooding of soil in rice paddies is a prerequisite for continuous methane emissions (Wassmann et al., 2000), and the practice of keeping rice paddies filled with water for extended periods provides an environment conducive to methane emissions. Therefore, reducing methane emissions in the agricultural sector can play a crucial role in climate change mitigation strategies. Rice production offers good potential both to reduce methane emissions and to increase soil carbon sequestration through improved water management, fertilizer use, and organic residue management (Smith et al., 2008). Consequently, it is necessary to accurately understand and estimate the relationship between rice cultivation periods and methane emissions.
Furthermore, methane emissions are influenced by various environmental factors, with weather conditions playing a particularly important role. Climate elements have been shown to have a significant impact on methane generation and emission. Soil temperature directly affects the activity of methanogens and the rate of organic matter decomposition, regulating methane production, and has been identified as one of the most important factors controlling methane generation and emission in rice fields (Khalil et al., 1998). Additionally, precipitation and irrigation management play crucial roles in determining soil moisture conditions, which are essential for creating anaerobic environments. Rice paddies are significant sources of CH4, which is released into the atmosphere through three pathways: molecular diffusion of dissolved methane at the air-water interface, ebullition of gas bubbles, and diffusive transport through the aerenchyma of rice plants (Hamamoto et al., 2024). The complex interaction of these meteorological factors acts as a key determinant in shaping methane emission patterns during the rice cultivation period.
In addition to meteorological factors, methane emissions from rice paddies are also influenced by the growth state of rice and water management. Therefore, it is important to include vegetation and moisture indices such as the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI). NDVI and NDWI indices derived from Moderate Resolution Imaging Spectroradiometer (MODIS) data have been successfully used to monitor rice growth stages and inundation conditions, which are closely related to methane emissions from rice fields (Xiao et al., 2006). NDVI, which uses the difference in reflectance between near-infrared and red light, is an indicator of vegetation health and density and is useful for assessing the growth state of rice. NDWI, which uses the difference in reflectance between near-infrared and short-wave infrared, is an indicator of vegetation moisture content and soil wetness and can be used to assess the water management status of rice paddies.
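The two indices described above can be sketched as normalized band differences. This is a minimal illustration, assuming surface reflectance inputs; the function names and the sample reflectance values are hypothetical and are not taken from the study.

```python
def normalized_difference(band_a: float, band_b: float) -> float:
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    total = band_a + band_b
    if total == 0:
        return float("nan")
    return (band_a - band_b) / total


def ndvi(nir: float, red: float) -> float:
    """NDVI from near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(nir: float, swir: float) -> float:
    """NDWI from near-infrared and short-wave infrared reflectance."""
    return normalized_difference(nir, swir)


# Hypothetical reflectance values for a vegetated, moist pixel.
print(ndvi(nir=0.45, red=0.05))
print(ndwi(nir=0.45, swir=0.15))
```

Both indices fall between -1 and 1, so they can be fed to a model alongside meteorological variables without unit conversion.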
In this study, to estimate methane emissions considering these various factors comprehensively, we utilized data from the Korea Meteorological Administration’s Local Data Assimilation and Prediction System (LDAPS) and the Ministry of Agriculture, Food and Rural Affairs’ electronic map of agricultural land, known as Farmmap. We applied a Gradient Boosting Machine (GBM) based model to capture complex non-linear relationships and integrate various input variables for analysis. The main objective of this study is to develop a model for estimating methane concentrations in rice paddies and to construct high-resolution daily methane concentration estimation maps specialized for rice cultivation areas.
The current methods for evaluating and visualizing methane emissions in South Korea focus on generating maps for short-term and long-term methane emission assessments (Choi et al., 2018). Particularly in the methane emission mapping section, methods are presented for estimating methane emissions nationwide and visualizing them by administrative districts (Baek et al., 2023; Choi et al., 2020). Based on the results of these previous studies, this research focuses on estimating and spatially mapping methane concentrations in rice cultivation areas across the country. We estimated and spatially mapped methane emissions from rice cultivation areas with finer spatiotemporal resolution, constructing daily methane concentration estimation maps. Through this approach, we expect to accurately understand the characteristics of methane emissions in the agricultural sector and make practical contributions to the development of greenhouse gas reduction policies.
This study utilized ground observation data, meteorological data, and satellite data from 2015 to 2018 to estimate methane emissions from rice paddies. The entire country of South Korea was set as the study area to include all rice paddy regions for developing a methane concentration estimation model.
Fluxnet-CH4 is a global flux observation network that provides data from 81 observation sites including freshwater wetlands, coastal areas, highlands, and natural and managed ecosystems. It continuously measures methane exchange between the surface and atmosphere using the eddy covariance method, providing 30-minute and daily methane flux data. For estimating methane emissions from rice paddies in South Korea, daily methane flux Fluxnet-CH4 data (nmol CH4 m–2 s–1) from the Cheorwon flux tower (38°12′04″N, 127°15′02″E) were used.
The LDAPS data from the Korea Meteorological Administration were used. LDAPS has a horizontal resolution of 1.5 km and 40 vertical layers, enabling high-resolution numerical simulation within the atmospheric boundary layer and providing forecast data reflecting actual terrain and atmospheric conditions at 3-hour intervals. For methane emission estimation, soil temperature, soil moisture, relative humidity, specific humidity, latent heat flux, ground heat flux, and air pressure were selected as key variables. Soil temperature is an important factor affecting methane generation as it is related to the activity of methanogens and the decomposition of organic matter (Bridgham et al., 2013). Soil moisture and relative humidity provide an environment for methanogens to actively function by creating anaerobic conditions, and especially in flooded soils, anaerobic conditions are formed, promoting methane generation (Bridgham et al., 2013; Lai, 2009).
Observational data from the MODIS sensor were utilized. MODIS is a sensor mounted on the Terra and Aqua satellites, providing data through 36 spectral bands (0.405–14.385 μm) at resolutions of 250 m, 500 m, and 1,000 m. MODIS has long-term observation data since 1999, suitable for time series analysis, and provides stable data with one to two observations per day. The NDVI was derived using 16-day composite data from the MOD13Q1 (Terra) and MYD13Q1 (Aqua) products, which have a spatial resolution of 250 m and a temporal resolution of 16 days. The NDWI from MODIS Terra has a spatial resolution of 463.313 m (nominally 500 m) and a temporal resolution of 1 day (Table 2), derived from the near-infrared and short-wave infrared bands of the MOD09GA_006 product, and was obtained from Google. NDWI reflects the soil moisture condition, allowing for the assessment of anaerobic condition formation, while NDVI represents the biomass and photosynthetic activity of rice, used to analyze the relationship with methane emissions (Serrano et al., 2019).
The integration of these multiple observational data sources contributes to a more comprehensive understanding of methane emission characteristics in rice paddy ecosystems by complementarily utilizing the temporal continuity of ground observations and the spatial representativeness of satellite observations (Zhang et al., 2016).
In this study, the following preprocessing steps were performed to integrate Fluxnet-CH4, LDAPS, and satellite data. Methane flux data collected from the Cheorwon flux tower (38°12′04″N, 127°15′02″E) were cleaned of missing values. From the 30-minute methane flux measurements, data containing missing values were removed, resulting in 1,122 days (76.8%) of valid data out of a total observation period of 1,461 days. The preprocessed data were then converted to daily average values for analysis.
Fig. 2 shows methane emission data collected from the Cheorwon rice paddy flux tower from 2015 to 2018. Time series analysis of the collected methane emission data revealed distinct seasonal variations. In particular, methane emissions tended to increase sharply during summer, with values of up to 600 nmol CH4 m–2 s–1 recorded. This phenomenon is thought to be due to the increased activity of methanogens as anaerobic conditions form when water fills the paddy fields during the rice cultivation period. This suggests that water management during rice cultivation can play an important role in controlling methane emissions.
Meteorological variables provided by LDAPS underwent temporal and spatial resolution matching. Data provided in Coordinated Universal Time (UTC) were converted to Korea Standard Time (KST), and 3-hourly data were aggregated to daily averages. Data for a total of 7 variables (soil heat flux, latent heat flux, specific humidity, relative humidity, soil moisture, soil temperature, and air pressure) were converted to the EPSG:4326 coordinate system and adjusted to the spatial extent of the study area.
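The time handling described above can be sketched with the standard library alone. This is an illustrative outline, not the authors' processing code: the function name `daily_mean_kst` and the sample soil temperatures are hypothetical, and timestamps are assumed to be timezone-aware UTC values.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # Korea Standard Time is UTC+9


def daily_mean_kst(records):
    """Average (utc_datetime, value) pairs by KST calendar date."""
    buckets = defaultdict(list)
    for utc_time, value in records:
        local_date = utc_time.astimezone(KST).date()
        buckets[local_date].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Hypothetical 3-hourly soil temperatures (K) stamped in UTC.
# 15:00 UTC on 30 June is 00:00 KST on 1 July.
start = datetime(2015, 6, 30, 15, 0, tzinfo=timezone.utc)
records = [(start + timedelta(hours=3 * i), 295.0 + i) for i in range(8)]
daily = daily_mean_kst(records)
print(daily)
```

Converting to KST before grouping matters because a UTC day straddles two Korean calendar days, so grouping on the UTC date would mix afternoon values from one local day with morning values from the next.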
Satellite data preprocessing was conducted in three stages. For MODIS NDWI and NDVI data, coordinate system transformation was performed to unify them to EPSG:4326, and the spatial extent was set based on South Korea (Define Extent). The three types of preprocessed data were spatiotemporally matched based on the location of the Cheorwon flux tower. Point information from each dataset was extracted to construct a unified dataset, which was used as input data for the GBM model. In the integration process, the temporal resolution of all data was unified to daily units and spatial information was matched to the same coordinate system and extent.
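One simple way to extract point information at the tower location is a nearest-cell lookup on a regular latitude/longitude grid. The source does not state which extraction method was used, so the nearest-neighbour choice, the function names, and the 3 x 3 sample grid below are all assumptions made for illustration.

```python
def nearest_index(coords, target):
    """Index of the grid coordinate closest to the target value."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def extract_point(grid, lats, lons, lat, lon):
    """Value of the grid cell whose centre is nearest to (lat, lon)."""
    row = nearest_index(lats, lat)
    col = nearest_index(lons, lon)
    return grid[row][col]


# Cheorwon flux tower (38°12'04"N, 127°15'02"E) in decimal degrees.
TOWER_LAT = 38 + 12 / 60 + 4 / 3600
TOWER_LON = 127 + 15 / 60 + 2 / 3600

# Hypothetical cell-centre coordinates and values.
lats = [38.10, 38.20, 38.30]
lons = [127.15, 127.25, 127.35]
grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
value = extract_point(grid, lats, lons, TOWER_LAT, TOWER_LON)
print(value)
```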
The research flow diagram (Fig. 3) shows the process of estimating methane emissions from rice paddy fields in South Korea and creating a gridded spatial information map of methane concentrations using the GBM algorithm by matching Fluxnet data with LDAPS meteorological variables. Model input data were constructed by matching meteorological variables, satellite-based variables, and flux data. After generating predicted methane concentration maps through modeling, the final maps were constructed by masking only rice paddy areas using land cover maps.
The application of machine learning methods in methane emission estimation is gaining attention as it provides a more scalable and automated approach compared to traditional statistical methods. Machine learning models can efficiently process large-scale datasets, offering scalability and automation capabilities for continuous monitoring of methane emissions over extensive areas (Rouet-Leduc and Hulbert, 2024). Furthermore, these models can integrate and analyze various spatiotemporal data such as satellite imagery, meteorological data, and ground sensor data, enabling more comprehensive methane detection.
We selected GBM among various machine learning methods. GBM performs well in modeling complex non-linear relationships and achieves high prediction accuracy in various problems (Friedman, 2001). It generates an initial model with simple decision trees, calculates the difference (residuals) between actual and predicted values, fits new decision trees to predict the residuals from the previous step, and updates the overall model by adding these new trees to the previous ones. This process is repeated for a specified number of iterations or until there is no further performance improvement. Finally, it combines the predictions of all trees to generate the final prediction (Chen and Guestrin, 2016; Friedman, 2001).
Through this process, GBM gradually improves the model’s performance and can learn complex patterns. Particularly, these characteristics of GBM are expected to be very useful in analyzing complex environmental data such as methane emission estimation. In this study, we aim to develop a methane concentration estimation model and accurately identify methane emission characteristics in rice cultivation areas by utilizing these advantages of GBM.
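The residual-fitting loop described above can be sketched in a few lines. This is a toy illustration under several assumptions: it uses single-split stumps on one feature, starts from the mean of the targets rather than an initial tree, and applies a learning rate (shrinkage) that the text does not mention. The data values are hypothetical, and the study's actual implementation and hyperparameters are not given in the source.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error on the residuals.

    Assumes xs holds at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Boost stumps on residuals; returns (initial value, learning rate, stumps)."""
    base = sum(ys) / len(ys)  # initial model: the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict_gbm(model, x):
    """Sum the initial value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given samples."""
    return sum((predict_gbm(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: soil temperature (deg C) against methane flux.
soil_temp = [8, 11, 14, 17, 20, 23, 26, 29]
flux = [5.0, 8.0, 15.0, 40.0, 110.0, 260.0, 410.0, 450.0]
baseline = fit_gbm(soil_temp, flux, n_trees=0)
boosted = fit_gbm(soil_temp, flux, n_trees=50)
print(mse(baseline, soil_temp, flux), mse(boosted, soil_temp, flux))
```

Each round fits only what the current ensemble still gets wrong, which is why the training error of the boosted model falls below that of the mean-only baseline.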
To evaluate the performance and generalization of the model, we applied 5-fold cross-validation and Leave One Year Out (LOYO) validation. In 5-fold cross-validation, the entire dataset was divided into five subsets, each of which was used in turn as the test set while the remaining four served as the training set. Because every sample is used for both training and testing, this approach uses the entire dataset efficiently and reduces the dependence of the performance estimate on any single train-test split, which helps reveal overfitting.
LOYO cross-validation is a method that considers the characteristics of time series data, excluding data from a specific year as the test set, training the model with data from the remaining years, and then evaluating the prediction performance for the excluded year. In this study, LOYO validation was performed using data from 2015 to 2018. For example, we used 2015 data as the test set, trained the model with data from 2016 to 2018, and repeated the process for all years. By using both methods together, we comprehensively evaluated the model’s generalization performance and estimation ability over time, validating its reliability.
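The two splitting schemes can be sketched as index generators. This is an illustration only: the function names are hypothetical, shuffling the samples before forming the five folds is an assumption the source does not confirm, and the per-year sample counts are taken from the N column of Table 5.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx), one split per distinct year."""
    for held_out in sorted(set(years)):
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        yield held_out, train_idx, test_idx


# Year label of each daily sample, using the per-year counts in Table 5.
years = [2015] * 155 + [2016] * 275 + [2017] * 327 + [2018] * 351
kfold = list(k_fold_splits(len(years), k=5))
loyo = list(leave_one_year_out(years))
for year, train_idx, test_idx in loyo:
    print(year, len(train_idx), len(test_idx))
```

In the 5-fold scheme, days from the same season and year can land in both the training and test sets, whereas LOYO withholds a whole year, which is consistent with LOYO being the stricter test.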
The model’s evaluation metrics included Mean Bias Error (MBE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Correlation Coefficient (CC).
MBE is an indicator that evaluates the overall bias of the model by averaging the differences between predicted and actual values (Walther and Moore, 2005). A positive value indicates that the model overestimates compared to the actual value, while a negative value indicates underestimation, effectively identifying systematic bias in the prediction model. MBE provides directional information on errors, offering useful information for determining the direction of model calibration. The formula for MBE is given in Eq. (1).
MAE is the average of the absolute differences between predicted and actual values, assigning equal weight to all errors and reacting less sensitively to extreme errors (Willmott and Matsuura, 2005). It maintains the same unit as the original data, allowing for intuitive interpretation, and is particularly useful for evaluating model performance in datasets with outliers (Chai and Draxler, 2014). It is also effectively used in studies dealing with values of actual physical significance, such as time series predictions or climate modeling. The formula for MAE is given in Eq. (2), where $n$ is the total number of data samples, $y_i$ is the actual observed value, $\hat{y}_i$ is the model’s predicted value, $\bar{y}$ is the mean of actual values, and $\bar{\hat{y}}$ is the mean of predicted values.
RMSE is the square root of the average of the squared differences between predicted and actual values, giving more weight to larger errors (Chai and Draxler, 2014). As RMSE assigns greater penalties to larger error magnitudes, it is suitable for more sensitively evaluating large prediction errors. It is particularly effective in rigorously assessing model performance for outliers or extreme values and is widely used to evaluate prediction precision. The formula for RMSE is given in Eq. (3).
CC is an indicator that evaluates the agreement between predicted and actual values, ranging from -1 to 1, with values closer to 1 indicating higher agreement. CC can simultaneously evaluate the linear relationship and accuracy between two measurements, making it useful for comprehensively assessing the overall performance of prediction models. It is particularly effective in evaluating agreement between repeated measurements or different measurement methods. The formula for CC is given in Eq. (4).
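Eqs. (1)–(4) are not reproduced in this text. The standard definitions that match the descriptions above are sketched below, with $n$ the number of samples, $y_i$ the observed value, $\hat{y}_i$ the predicted value, and $\bar{y}$ and $\bar{\hat{y}}$ their respective means. The sign of MBE follows the statement that positive values indicate overestimation. Writing CC as the Pearson correlation coefficient is an assumption, since the source does not show the formula.

```latex
\begin{align}
\mathrm{MBE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right) \tag{1}\\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{2}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}} \tag{3}\\
\mathrm{CC}   &= \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}
                     {\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}\,
                      \sqrt{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{\hat{y}}\right)^{2}}} \tag{4}
\end{align}
```

MBE, MAE, and RMSE carry the unit of the flux (nmol CH4 m–2 s–1), while CC is dimensionless.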
A nationwide methane concentration estimation map was constructed by using the trained GBM model and combined with an electronic map of agricultural land to generate a methane concentration estimation map for rice paddy areas. This process consisted of three main steps: processing the electronic map of agricultural land, generating a nationwide methane concentration estimation map, and masking rice paddy areas.
Using the electronic map of agricultural land (Farmmap) provided by the Ministry of Agriculture, Food and Rural Affairs in 2023, we performed a process to accurately identify and rasterize rice paddy areas. Using QGIS software, we extracted only the rice paddy areas from the Farmmap and then converted the vector data to raster format. In this process, we adjusted the pixel size to have the same 500 m spatial resolution as the final methane concentration estimation map.
Based on the trained GBM model, we estimated the nationwide distribution of methane concentrations. We used meteorological variables (soil temperature, soil moisture, relative humidity, air pressure, etc.) provided by LDAPS and vegetation indices (NDVI, NDWI) based on satellite imagery as input data, and all input data were gridded to a unified 500 m spatial resolution. Daily methane concentrations were estimated by inputting the meteorological conditions and vegetation indices at each grid point into the model, thereby constructing a spatiotemporally continuous methane concentration distribution map for four years from 2015 to 2018.
We performed a masking process to extract only rice paddy areas from the nationwide methane concentration estimation map. This is because methane flux observation data were collected only from rice paddies, and the model’s estimation reliability for other land cover types (forests, urban areas, etc.) could not be guaranteed. Especially for forest areas, meteorological conditions differ greatly from rice paddies due to altitude and topographical characteristics, and the methane generation mechanism is also different, making it difficult to apply the same model. Therefore, we ensured the reliability of the model estimation by using the rasterized Farmmap as a mask to extract methane concentrations only for rice paddy areas. Using this method, we finally constructed high-resolution methane concentration estimation maps specialized for rice paddy areas for each day from 2015 to 2018.
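The masking step can be sketched as an element-wise selection between two grids on the same 500 m raster. The grids below are hypothetical, and representing the rasterized Farmmap as a 0/1 array with NaN as the no-data value is an assumption made for illustration.

```python
NODATA = float("nan")


def mask_to_paddy(estimates, paddy_mask):
    """Keep estimates where the mask is 1 (rice paddy); set other cells to NaN."""
    return [
        [value if is_paddy else NODATA for value, is_paddy in zip(est_row, mask_row)]
        for est_row, mask_row in zip(estimates, paddy_mask)
    ]


# Hypothetical 2 x 3 grid of daily estimates and a rasterized paddy mask.
estimates = [[120.0, 80.0, 310.0], [45.0, 200.0, 15.0]]
paddy_mask = [[1, 0, 1], [0, 1, 0]]
masked = mask_to_paddy(estimates, paddy_mask)
print(masked)
```

Masked cells are set to no-data rather than zero so that forests and urban areas are not read as zero-emission land.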
This study used 5-fold cross-validation and LOYO (Leave One Year Out) cross-validation methods to evaluate the prediction performance of the GBM model. The model’s performance was comprehensively analyzed based on various evaluation metrics (MAE, MBE, RMSE, CC).
The 5-fold cross-validation results showed an MBE of 0.308 (nmol CH4 m–2 s–1), MAE of 28.97 (nmol CH4 m–2 s–1), RMSE of 48.053 (nmol CH4 m–2 s–1), and CC of 0.91, demonstrating the model’s high accuracy (Table 4). In particular, the MBE being close to 0 indicates that the predicted values do not show a significant tendency to overestimate or underestimate compared to the measured values. This suggests that the model performed stable and reliable predictions overall.
Table 4. 5-fold cross-validation results for the GBM model (unit: nmol CH4 m–2 s–1).
N | MBE | MAE | RMSE | CC |
---|---|---|---|---|
1122 | 0.308 | 28.970 | 48.053 | 0.910 |
The LOYO cross-validation results showed an average MBE of 1.56 (nmol CH4 m–2 s–1), MAE of 35.46 (nmol CH4 m–2 s–1), RMSE of 57.276 (nmol CH4 m–2 s–1), and CC of 0.875, with correlation coefficients ranging from 0.850 to 0.904 across years (Table 5). This indicates that the model’s performance did not significantly deteriorate even when excluding data from a specific year for validation. However, for 2015 and 2016, there were many missing values, which may have caused data imbalance during the model training process, potentially affecting the prediction trends. This is also evident in the MBE values (2015: 4.399, 2016: -10.270), which can be interpreted as a slight overestimation or underestimation of predicted values compared to measured values due to the lack of data for specific years.
Table 5. LOYO cross-validation results for the GBM model (unit: nmol CH4 m–2 s–1).
Year | N | MBE | MAE | RMSE | CC |
---|---|---|---|---|---|
2015 | 155 | 4.399 | 38.826 | 59.270 | 0.853 |
2016 | 275 | -10.270 | 36.053 | 65.352 | 0.850 |
2017 | 327 | 15.122 | 33.702 | 50.381 | 0.892 |
2018 | 351 | -3.011 | 33.260 | 54.127 | 0.904 |
Total | 1108 | 1.560 | 35.460 | 57.276 | 0.875 |
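The Total row can be checked with a short calculation. Treating it as an unweighted mean over the four years is an assumption, but that reading reproduces the reported MBE, MAE, and CC. The same mean of the RMSE column gives about 57.28, close to but not identical with the reported 57.276, so that value may have been computed slightly differently.

```python
from statistics import mean

# Per-year LOYO results copied from Table 5: (N, MBE, MAE, RMSE, CC).
loyo_results = {
    2015: (155, 4.399, 38.826, 59.270, 0.853),
    2016: (275, -10.270, 36.053, 65.352, 0.850),
    2017: (327, 15.122, 33.702, 50.381, 0.892),
    2018: (351, -3.011, 33.260, 54.127, 0.904),
}

metric_names = ("MBE", "MAE", "RMSE", "CC")
averages = {
    name: mean(row[i + 1] for row in loyo_results.values())
    for i, name in enumerate(metric_names)
}
total_n = sum(row[0] for row in loyo_results.values())

for name, value in averages.items():
    print(f"{name}: {value:.3f}")
print("N:", total_n)
```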
The scatter plot of observed versus predicted values (Fig. 5) confirmed a high correlation between the two. This indicates that the GBM model effectively reflects the observed data and can reliably predict methane emissions in the Cheorwon rice paddy area.
Upon examining the variable importance of the model, NDVI was found to be the most influential variable in methane emissions, which is interpreted as an important indicator reflecting the growth state and biomass of rice (Table 6). NDVI shows the highest importance of 0.661, suggesting a strong correlation between rice growth stages and methane emission amounts. The second most important variable is soil temperature (ST) with an importance of 0.204, which acts as an environmental factor regulating the activity of methanogens and organic matter decomposition. Latent heat flux (LE) was identified as the third most influential variable with an importance of 0.033, affecting evapotranspiration processes and methane circulation. Variables such as soil moisture (SM), specific humidity (SH), pressure (PRES), relative humidity (RH), NDWI, and soil heat flux (G) showed relatively low importance, but they also contributed to explaining the spatiotemporal variability of methane emissions.
Table 6. Variable importance results of the GBM model.
No. | Variable | Importance |
---|---|---|
1 | NDVI (Normalized Difference Vegetation Index) | 0.661 |
2 | ST (Soil Temperature) | 0.204 |
3 | LE (Latent Heat Flux) | 0.033 |
4 | SM (Soil Moisture) | 0.025 |
5 | SH (Specific Humidity) | 0.023 |
6 | PRES (Air Pressure) | 0.019 |
7 | RH (Relative Humidity) | 0.014 |
8 | NDWI (Normalized Difference Water Index) | 0.011 |
9 | G (Soil Heat Flux) | 0.010 |
The Variance Inflation Factor (VIF) analysis revealed some multicollinearity among variables. Specific humidity (SH) showed the highest VIF value of 18.304, followed by ST with 14.639 (Table 7). This suggests that these two variables have significant linear relationships with other predictor variables. NDVI and LE showed moderate VIF values of 5.151 and 5.042, respectively. RH, PRES, G, NDWI, and SM all showed VIF values below 5, indicating that multicollinearity is not a major issue for these variables. In particular, G, NDWI, and SM showed low VIF values below 1.6, indicating that they maintain their independent characteristics well.
Table 7. Multicollinearity results in VIF for the GBM model.
No. | Variable | VIF Value |
---|---|---|
1 | SH (Specific Humidity) | 18.304 |
2 | ST (Soil Temperature) | 14.639 |
3 | NDVI (Normalized Difference Vegetation Index) | 5.151 |
4 | LE (Latent Heat Flux) | 5.042 |
5 | RH (Relative Humidity) | 3.154 |
6 | PRES (Air Pressure) | 2.637 |
7 | G (Soil Heat Flux) | 1.589 |
8 | NDWI (Normalized Difference Water Index) | 1.578 |
9 | SM (Soil Moisture) | 1.554 |
Although high VIF values were observed for some variables, the GBM model is relatively robust to multicollinearity due to its ability to focus on characteristics that provide additional information not captured by existing trees through the sequential boosting process (Natekin and Knoll, 2013). Therefore, due to the non-linear nature of the model, the impact of this multicollinearity on model performance is expected to be limited.
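For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. This equals the j-th diagonal element of the inverse correlation matrix, which the sketch below uses. The predictors are synthetic and do not reproduce the study's data; the variable names only echo the ones discussed above, and NumPy is assumed to be available.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF per column: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic predictors: two strongly related columns and one independent column.
rng = np.random.default_rng(0)
soil_temp = rng.normal(size=500)
specific_humidity = soil_temp + 0.2 * rng.normal(size=500)
soil_moisture = rng.normal(size=500)

X = np.column_stack([soil_temp, specific_humidity, soil_moisture])
vif = variance_inflation_factors(X)
print(dict(zip(["ST", "SH", "SM"], np.round(vif, 2))))
```

The two correlated columns receive large VIF values while the independent column stays near 1, the same qualitative pattern as the high values for SH and ST and the low values for G, NDWI, and SM in Table 7.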
The constructed methane concentration grid maps revealed distinct seasonal variability and spatial distribution characteristics. Methane concentrations remained low (0–100 nmol CH4 m–2 s–1) from January to April, then increased sharply during the rice cultivation period, especially from June to September when the paddies are flooded.
During the transplanting period in May–June, methane concentrations began to increase to 100–200 nmol CH4 m–2 s–1 as the paddies were kept flooded for rice planting, strengthening anaerobic conditions. During the growth period in July–August, methane concentrations peaked at 400–500 nmol CH4 m–2 s–1 with the full-scale growth of rice, with the highest values recorded in August (Fig. 6). The high temperatures and precipitation during this period provided favorable conditions for methane generation, accelerating the increase in concentration.
During the heading and ripening periods in September–October, methane generation decreased significantly due to changes in water management practices such as mid-season drainage before harvest, lowering concentrations to 200–300 nmol CH4 m–2 s–1. After the rice harvest in November–December, as the paddies dried out and soil conditions became aerobic, methane emissions were minimized to 0–100 nmol CH4 m–2 s–1.
This study utilized a GBM model to estimate methane concentrations in rice paddy areas of South Korea and construct high-resolution maps. The analysis integrated data from the LDAPS, satellite imagery (NDVI, NDWI), and ground observations from the Cheorwon rice paddy region’s FluxNet.
The GBM model showed high performance in 5-fold crossvalidation with a correlation coefficient of 0.91, MAE of 28.97 (nmol CH4 m–2 s–1), and RMSE of 48.053 (nmol CH4 m–2 s–1). In the LOYO cross-validation, it demonstrated stable performance with an average correlation coefficient of 0.875, MAE of 35.46 (nmol CH4 m–2 s–1), and RMSE of 57.276 (nmol CH4 m–2 s–1). Unlike previous studies, this research constructed daily methane concentration maps and analyzed data from nationwide rice paddies over four years, yielding more comprehensive and generalized results. Additionally, the use of satellite-based vegetation indices like NDVI and NDWI directly reflected rice growth stages and soil moisture conditions, enhancing the accuracy of methane emission estimates. This suggests that the model can effectively account for various weather conditions and annual variability.
The variable importance analysis (Table 6) showed that NDVI had the largest importance (0.661), followed by ST (0.204) and LE (0.033). This aligns with existing research findings on the crucial roles of vegetation state and soil environment in methane generation and emission processes. Multicollinearity analysis showed relatively high VIF values for SH and ST (Table 7), but because tree-based models such as GBM are comparatively insensitive to correlated predictors, the negative impact was limited.
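To make the VIF values easier to read, recall the standard definition: the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The short sketch below inverts that relation for the values listed in Table 7. The helper name `r_squared_from_vif` is hypothetical, and the snippet only illustrates the arithmetic, not the authors' workflow.

```python
def r_squared_from_vif(vif):
    """Invert VIF = 1 / (1 - R^2) to get the R^2 of a predictor
    regressed on the remaining predictors."""
    return 1.0 - 1.0 / vif


# VIF values copied from Table 7.
vif_table7 = {
    "SH": 18.304, "ST": 14.639, "NDVI": 5.151, "LE": 5.042, "RH": 3.154,
    "PRES": 2.637, "G": 1.589, "NDWI": 1.578, "SM": 1.554,
}

for name, vif in vif_table7.items():
    print(f"{name}: VIF={vif:.3f}, implied R^2={r_squared_from_vif(vif):.3f}")
```

Read this way, the SH and ST values imply that roughly 95% and 93% of their variance, respectively, is explained by the other inputs.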
The constructed methane concentration maps clearly showed seasonal variations in methane emissions related to rice cultivation periods. A sharp increase in methane concentrations was observed between June and September, reflecting the influence of rice growth stages and water management practices on methane emissions. The changes in methane concentrations during the transplanting (May–June), growth (July–August), and heading and ripening (September–October) periods were identified. The main contributions of this study are as follows:
(1) Presenting a new approach to effectively monitor and predict methane emissions in agricultural areas by combining meteorological data, satellite imagery, and machine learning techniques.
(2) Developing high-resolution methane concentration maps that can serve as important foundational data for establishing greenhouse gas reduction policies and effective climate change response strategies in the agricultural sector.
(3) Deepening the understanding of methane emission patterns during rice cultivation and quantitatively identifying methane emission characteristics for different rice cultivation periods.
This study is limited in that the model was validated using data from a single flux tower. However, given the similarities in the regional characteristics of rice fields, we expect that the model’s predictions will apply to other rice field regions to some extent. In future research, we plan to use flux tower data from various regions to improve the generalization ability of the model and increase the accuracy of the mapping. We will also consider additional environmental variables, such as soil organic matter content and rice variety, to improve the model’s regional prediction accuracy. Furthermore, the study could be extended to predict future changes in methane emissions by applying climate change scenarios. This research is expected to contribute to climate change response by improving understanding of methane emissions from rice paddies and providing a scientific basis for greenhouse gas management in the agricultural sector.
Table 1. Variables based on collected LDAPS meteorological data.
Data Source | Variable | Spatial Resolution | Temporal Resolution |
---|---|---|---|
LDAPS | G (Soil Heat Flux) | 1.5 km | 3 hours |
LDAPS | LE (Latent Heat Flux) | 1.5 km | 3 hours |
LDAPS | SH (Specific Humidity) | 1.5 km | 3 hours |
LDAPS | RH (Relative Humidity) | 1.5 km | 3 hours |
LDAPS | SM (Soil Moisture) | 1.5 km | 3 hours |
LDAPS | ST (Soil Temperature) | 1.5 km | 3 hours |
LDAPS | PRES (Air Pressure) | 1.5 km | 3 hours |
Table 2. Variables based on collected satellite imagery.
Short Name | Long Name | Spatial Resolution | Temporal Resolution | Variable | Data Period |
---|---|---|---|---|---|
MOD13Q1 | MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061 | 250 m | 16 days | NDVI | 2015.01.01~2018.12.31 |
MYD13Q1 | MODIS/Aqua Vegetation Indices 16-Day L3 Global 250m SIN Grid V061 | 250 m | 16 days | NDVI | 2015.01.01~2018.12.31 |
MOD09GA006 NDWI | MODIS Terra Daily NDWI | 500 m | 1 day | NDWI | 2015.01.01~2018.12.31 |
Table 3. Input data constructed from ground observation data, LDAPS meteorological data, and MODIS satellite data.
Data Source | Variable | Spatial Resolution | Temporal Resolution | Pre-processed Temporal Resolution |
---|---|---|---|---|
Fluxnet-CH4 | Methane Flux | Point | 1 day | 1 day |
MOD13Q1 | NDVI (normalized difference vegetation index) | 250 m | 16 days | 16 days |
MYD13Q1 | NDVI (normalized difference vegetation index) | 250 m | 16 days | 16 days |
MOD09GA | NDWI (normalized difference water index) | 500 m | 1 day | 1 day |
LDAPS | G (Soil Heat Flux) | 1.5 km | 3 hours | 1 day |
LDAPS | LE (Latent Heat Flux) | 1.5 km | 3 hours | 1 day |
LDAPS | SH (Specific Humidity) | 1.5 km | 3 hours | 1 day |
LDAPS | RH (Relative Humidity) | 1.5 km | 3 hours | 1 day |
LDAPS | SM (Soil Moisture) | 1.5 km | 3 hours | 1 day |
LDAPS | ST (Soil Temperature) | 1.5 km | 3 hours | 1 day |
LDAPS | PRES (Air Pressure) | 1.5 km | 3 hours | 1 day |
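Table 3 lists the LDAPS variables as 3-hourly inputs pre-processed to a daily resolution. One way such a step could look is sketched below. The aggregation rule is an assumption (a simple daily mean; the table does not state which statistic was used), and `daily_mean` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean(records):
    """Average (timestamp, value) pairs by calendar day.

    Assumption: a plain arithmetic mean of all 3-hourly values per day.
    """
    buckets = defaultdict(list)
    for timestamp, value in records:
        buckets[timestamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Hypothetical 3-hourly soil temperature values for one day (not real data).
sample = [(datetime(2015, 7, 1, hour), 20.0 + hour / 3.0) for hour in range(0, 24, 3)]
print(daily_mean(sample))
```

A different statistic (for example a daily sum for flux variables) would only change the final dictionary comprehension.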
This research was supported by the Pukyong National University Development Project Research Fund (Philosopher of Next Generation, 2024).
No potential conflict of interest relevant to this article was reported.
Table 4. 5-fold cross-validation results for the GBM model (unit: nmol CH4 m–2 s–1).
N | MBE | MAE | RMSE | CC |
---|---|---|---|---|
1122 | 0.308 | 28.970 | 48.053 | 0.910 |
Table 5. LOYO cross-validation results for the GBM model (unit: nmol CH4 m–2 s–1).
Year | N | MBE | MAE | RMSE | CC |
---|---|---|---|---|---|
2015 | 155 | 4.399 | 38.826 | 59.270 | 0.853 |
2016 | 275 | -10.270 | 36.053 | 65.352 | 0.850 |
2017 | 327 | 15.122 | 33.702 | 50.381 | 0.892 |
2018 | 351 | -3.011 | 33.260 | 54.127 | 0.904 |
Total | 1108 | 1.560 | 35.460 | 57.276 | 0.875 |
Table 6. Variable importance results of the GBM model.
No. | Variable | Importance |
---|---|---|
1 | NDVI (Normalized Difference Vegetation Index) | 0.661 |
2 | ST (Soil Temperature) | 0.204 |
3 | LE (Latent Heat Flux) | 0.033 |
4 | SM (Soil Moisture) | 0.025 |
5 | SH (Specific Humidity) | 0.023 |
6 | PRES (Air Pressure) | 0.019 |
7 | RH (Relative Humidity) | 0.014 |
8 | NDWI (Normalized Difference Water Index) | 0.011 |
9 | G (Soil Heat Flux) | 0.010 |
Table 7. Multicollinearity results (VIF) for the GBM model.
No. | Variable | VIF Value |
---|---|---|
1 | SH (Specific Humidity) | 18.304 |
2 | ST (Soil Temperature) | 14.639 |
3 | NDVI (Normalized Difference Vegetation Index) | 5.151 |
4 | LE (Latent Heat Flux) | 5.042 |
5 | RH (Relative Humidity) | 3.154 |
6 | PRES (Air Pressure) | 2.637 |
7 | G (Soil Heat Flux) | 1.589 |
8 | NDWI (Normalized Difference Water Index) | 1.578 |
9 | SM (Soil Moisture) | 1.554 |