Abstract
In this study, we propose an automatic training dataset generation method for building a self-supervised, end-to-end matching network that extracts matching points between very high-resolution (VHR) satellite images. A homography matrix that changes the scale, rotation, and translation of a single VHR remote-sensing image is applied to generate reference and sensed image patches. The contrast and brightness of the sensed image patch are then adjusted, Gaussian and speckle noise are added, and shading and motion-blur effects are applied so that its characteristics differ from those of the reference image patch. Subsequently, multiple feature point extractors are combined with homographic adaptation to extract, from each image patch, feature points that are robustly detected by different detectors under various geometric conditions. The extracted feature points are refined using non-maximum suppression (NMS). Using the inverse homography matrix, feature point pairs whose distance error between the image patches is within 1 pixel are identified as matching points. The coordinates of these matching points, along with the homography matrix, are then used as pseudo-labels. The automated method was applied to a VHR remote-sensing database collected from various sources, and training data were generated only when more than 20 matching points were extracted. As a result, training and validation datasets comprising a total of 341,820 and 44,389 image patches, respectively, were generated. The end-to-end matching network trained with the proposed dataset extracted matching points more accurately than other matching methods and deep learning networks. Therefore, the proposed method can automatically generate high-quality pseudo-labels that reflect the characteristics of VHR satellite images, thereby improving the training efficiency of deep learning networks.
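The pseudo-labeling step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`similarity_homography`, `warp_points`, `match_by_inverse_homography`, `is_valid_sample`) and the nearest-neighbor pairing rule are assumptions made for illustration, and feature detection, homographic adaptation, and NMS are omitted. The sketch builds a homography from a scale, a rotation, and a translation, maps sensed-patch points back to the reference patch with the inverse homography, and keeps pairs whose distance error is within 1 pixel.

```python
import numpy as np

MAX_ERROR_PX = 1.0   # "within 1 pixel" threshold from the abstract
MIN_MATCHES = 20     # a sample is kept only with more than 20 matches


def similarity_homography(scale, angle_deg, tx, ty):
    """Hypothetical helper: 3x3 homography for scale, rotation, translation."""
    theta = np.deg2rad(angle_deg)
    c, s = scale * np.cos(theta), scale * np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])


def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def match_by_inverse_homography(ref_pts, sensed_pts, H, max_error=MAX_ERROR_PX):
    """Return (ref_index, sensed_index) pairs with distance error <= max_error.

    Sensed points are mapped back into the reference patch with inv(H);
    each reference point is paired with its nearest back-projected point
    (an assumed pairing rule, not taken from the paper).
    """
    ref_pts = np.asarray(ref_pts, dtype=float)
    back = warp_points(np.linalg.inv(H), sensed_pts)
    dists = np.linalg.norm(ref_pts[:, None, :] - back[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    errors = dists[np.arange(len(ref_pts)), nearest]
    keep = errors <= max_error
    return np.column_stack([np.nonzero(keep)[0], nearest[keep]])


def is_valid_sample(matches, min_matches=MIN_MATCHES):
    """Keep a patch pair only if it has more than min_matches matches."""
    return len(matches) > min_matches


# Toy example: three reference points, one sensed point perturbed by 3 px.
H = similarity_homography(scale=1.2, angle_deg=15.0, tx=5.0, ty=-3.0)
ref = np.array([[10.0, 20.0], [40.0, 35.0], [70.0, 15.0]])
sensed = warp_points(H, ref)
sensed[2] += 3.0  # simulates a detection that drifted off its true location
matches = match_by_inverse_homography(ref, sensed, H)
```

In this toy example the first two pairs are kept and the perturbed third point is rejected; with only two matches, `is_valid_sample(matches)` is false, so such a patch pair would not be written to the dataset under the more-than-20 rule.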