Abstract
Semantic image segmentation techniques have recently gained widespread adoption in remote sensing for tasks such as classifying surface properties and extracting specific objects. Segmentation performance is influenced not only by the choice of deep learning model but also by the configuration of key hyperparameters, including the learning rate and batch size. Among these hyperparameters, the batch size is typically set to a large value to improve model performance. However, because the memory capacity of a typical deep learning system’s graphics processing unit (GPU) is limited, an appropriate batch size must be selected. This paper investigates the impact of batch size on building detection performance in deep learning systems for semantic image segmentation of satellite and aerial imagery. For the performance analysis, representative semantic segmentation models, including UNet, ResUNet, DeepResUNet, and CBAM-DRUNet, were used as baselines. In addition, transfer learning models such as UNet-VGG19, UNet-ResNet50, and CBAM-DRUNet-VGG19 were included for comparison. The training datasets comprised the WHU and INRIA datasets, which are commonly used for semantic image segmentation, as well as Kompsat-3A datasets. The experimental results show that a batch size of 2 or larger improved the F1 score across all models and datasets. For the WHU dataset, the smallest of the datasets, the F1 score initially increased with batch size but began to decline after a certain threshold, except for the CBAM-DRUNet-VGG19 model. In contrast, for the INRIA dataset, which is approximately 1.5 times larger than the WHU dataset, the transfer learning models maintained relatively stable F1 scores as the batch size increased, whereas the other models again showed an increase followed by a decrease. For the Kompsat-3A datasets, which are 4 to 5 times larger than the WHU dataset, all models showed a substantial increase in F1 score at a batch size of 2, beyond which the F1 score stabilized without further significant improvement. In terms of training time, increasing the batch size generally reduced training time for all models. Therefore, when the training dataset is sufficiently large, a batch size of 2 is sufficient to achieve a significant improvement in F1 score. Furthermore, a batch size greater than 2 may further reduce training time, provided that the system’s GPU has sufficient capacity to handle the larger batch size.