This study aims to determine the optimal tile size and stride for detecting objects in remote sensing images using point labels. We use address points as the labels (ground truth) and test different tile sizes and strides for splitting the image into small tiles. We assume that an object must lie wholly (or R%) within the input layer to trigger the classifier, and we estimate the stride from the probability that an object lies wholly (or R%) within a single tile when the image is split into small tiles to feed the neural network. We suggest that the stride should be half the length of the object, without considering the location accuracy of the point label.
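The dependence of the containment probability on tile size and stride can be illustrated with a minimal sketch. It is a simplified geometric model and not necessarily the derivation used in this study: it assumes square tiles on a regular grid, an axis-aligned rectangular object placed uniformly at random, and no image-border effects. The function names `containment_probability_1d`, `containment_probability_2d`, and `tile_origins` are hypothetical.

```python
def containment_probability_1d(tile, stride, length):
    """Probability that a segment of the given length, placed uniformly at
    random along one axis, lies wholly inside at least one tile.

    Assumption: tiles of size `tile` start at 0, stride, 2*stride, ...
    and border effects are ignored.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if length > tile:
        return 0.0  # the object can never fit inside a tile
    # The segment fits in a tile when its left edge falls in a window of
    # width (tile - length) that repeats every `stride` pixels.
    return min(1.0, (tile - length) / stride)


def containment_probability_2d(tile, stride, width, height):
    """Same model in 2D: the two axes are treated as independent."""
    return (containment_probability_1d(tile, stride, width)
            * containment_probability_1d(tile, stride, height))


def tile_origins(image_size, tile, stride):
    """Left (or top) coordinates of the tiles along one image axis."""
    return list(range(0, image_size - tile + 1, stride))
```

Under this model, a smaller stride raises the probability that an object is wholly contained in some tile, at the cost of more tiles to classify; for example, `containment_probability_2d(64, 64, 32, 32)` applies the model to non-overlapping 64-pixel tiles and a 32-pixel square object.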
Fig. 1 This study can be applied to point-label datasets.
Fig. 2 Relationship between the recall of address points and the probability that a house lies wholly within a tile.
Fig. 3 Detection results and examples. (a) Original address points (red dots). (b) Detection results (blue dots), which far outnumber the address points. (c) Image and address points in the area inside the red rectangle in (a). (d) Detection results for the same site as (c). Because the neural network detects houses tile by tile, it labels the tiles along the boundary of this large building as House. (e) A complex labeled with many blue dots. (f)–(j) Samples of single-house detections. The neural network labels these houses accurately.
Fig. 4 The convolutional neural network used in this research.