Building Training Datasets Using Existing Spatial Data

This research used different portions of a land cover dataset as training datasets for a neural network. By analyzing the classification results of the trained networks, the study examined how the size of the training dataset affects classification accuracy and other metrics.
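As an illustration of how "different portions" of a labelled dataset might be drawn, here is a minimal sketch. It assumes the land cover data has been cut into tiles with integer IDs, and it uses nested random subsets so that each larger training set contains the smaller ones. The function name, the tile IDs, and the fractions are all hypothetical; the study's actual sampling schemes are not described in the text.

```python
import random


def sample_training_subsets(tile_ids, fractions, seed=0):
    """Return {fraction: list of tile IDs} with nested random subsets.

    Assumption: tiles are interchangeable units of labelled data.
    A fixed seed keeps the subsets reproducible between runs.
    """
    rng = random.Random(seed)
    shuffled = list(tile_ids)
    rng.shuffle(shuffled)

    subsets = {}
    for frac in sorted(fractions):
        if not 0 < frac <= 1:
            raise ValueError(f"fraction must be in (0, 1], got {frac}")
        # Always keep at least one tile so every subset is usable.
        n = max(1, round(frac * len(shuffled)))
        # Taking a prefix of one shuffled list makes the subsets nested.
        subsets[frac] = shuffled[:n]
    return subsets


if __name__ == "__main__":
    tiles = list(range(100))  # hypothetical tile IDs
    for frac, ids in sample_training_subsets(tiles, [0.1, 0.5, 1.0]).items():
        print(frac, len(ids))
```

Nesting the subsets is a design choice, not a requirement: it means any accuracy difference between two training sizes comes from the added tiles rather than from a different random draw.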

We concluded that the size of the training dataset does affect the classification result. In our study case, a larger training dataset led to better performance, improving not only the total accuracy but also the kappa value and F1 score. From this case, we derived several guidelines for converting existing geospatial data into a training dataset.
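For readers unfamiliar with the three metrics named above, the following sketch computes them from a confusion matrix using their standard definitions: total (overall) accuracy, Cohen's kappa, and per-class F1. The function names and the toy labels are illustrative only and are not taken from the study.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def total_accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Agreement corrected for the agreement expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_observed = sum(cm[i][i] for i in range(n)) / total
    p_expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total**2
    if p_expected == 1.0:
        return 0.0  # degenerate case: only one class present
    return (p_observed - p_expected) / (1 - p_expected)


def f1_per_class(cm):
    """F1 = 2*TP / (2*TP + FP + FN) for each class."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    # Toy example: two classes, four pixels.
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
    print(total_accuracy(cm), cohen_kappa(cm), f1_per_class(cm))
```

Kappa is worth reporting alongside total accuracy because land cover classes are often unbalanced, and a classifier that favours the dominant class can score a high accuracy while its kappa stays low.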

  • More training data is needed to classify urban areas, or other classes with complicated boundaries and classification rules. For these, the data owner should build as large a training dataset as possible.
  • In rural areas, where there is less human-made infrastructure, the neural network needs less data to learn the features and achieves better class accuracy and boundary depiction.
  • When starting a land cover extraction task, a small training dataset prepared by a human operator can be used to classify rural areas or to check the class correctness of urban land cover.
  • Edge Ratio can be used to compare the results of different classifiers, or to estimate the potential for improvement relative to the ground truth. A classifier whose Edge Ratio is closer to that of the ground truth performs better.
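The text does not define Edge Ratio, so the sketch below rests on an assumed definition: the fraction of pixels in a classified map that have at least one 4-connected neighbour of a different class. Under that assumption, a noisy result has a higher ratio than the ground truth and an over-smoothed result has a lower one. The function names and the toy maps are hypothetical.

```python
def edge_ratio(label_map):
    """Fraction of pixels lying on a class boundary.

    ASSUMED definition, not taken from the study: a pixel counts as an
    edge pixel if any of its 4-connected neighbours has another class.
    """
    rows, cols = len(label_map), len(label_map[0])
    edge_pixels = 0
    for r in range(rows):
        for c in range(cols):
            value = label_map[r][c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and label_map[rr][cc] != value:
                    edge_pixels += 1
                    break  # count each pixel at most once
    return edge_pixels / (rows * cols)


def edge_ratio_gap(predicted, ground_truth):
    """Absolute difference between the two edge ratios (smaller is closer)."""
    return abs(edge_ratio(predicted) - edge_ratio(ground_truth))


if __name__ == "__main__":
    truth = [[0, 0, 1, 1] for _ in range(4)]      # one straight boundary
    noisy = [row[:] for row in truth]
    noisy[1][0] = 1                               # a single speckle
    smooth = [[0, 0, 0, 0] for _ in range(4)]     # boundary lost entirely
    print(edge_ratio(truth), edge_ratio_gap(noisy, truth),
          edge_ratio_gap(smooth, truth))
```

A single ratio cannot tell whether the boundaries are in the right place, only whether there is a similar amount of boundary, so it is best read together with the accuracy metrics.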

Fig1. Using different schemes to sample the land cover database into the training dataset.

Fig2. Image classification results of different sizes of the training dataset.

Fig3. Details of image classification results of different sizes of the training dataset.

Fig4. Details of image classification results of different sizes of the training dataset in the urban area.

Fig5. Details of image classification results of different sizes of the training dataset in the rural area.
