MapReduce, a parallel data processing framework pioneered by Google, has proven effective for handling big data challenges. Hadoop, an open source implementation of MapReduce, has gained increasing popularity over the past several years. However, Hadoop is not designed to handle spatiotemporal data. To bridge this gap, we propose a novel spatiotemporal indexing approach that significantly accelerates the querying and processing of big climate data with MapReduce while keeping the data in its native format.
Structure of the spatiotemporal index
The index bridges the gap between array-based data models and block-oriented HDFS storage models by linking the logical spatiotemporal information (space, time, and variables) to the physical location information (node, file, and byte). Based on the index, a grid partition algorithm was developed to optimize MapReduce processing performance by maximizing data locality and balancing the workload across cluster nodes.
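The mapping from logical coordinates to physical locations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the published implementation: the `IndexEntry` fields, the `query` and `assign_to_nodes` helpers, and the sample paths and host names are hypothetical, and the greedy least-loaded assignment is a simplified stand-in for the grid partition algorithm, not a reproduction of it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    # Logical side: which variable and time step this array chunk holds.
    variable: str
    time_step: int
    # Physical side: where the bytes of the chunk live (hypothetical layout).
    file_path: str
    byte_offset: int
    byte_length: int
    nodes: tuple  # hosts that store a replica of the enclosing block


def query(index, variable, t_start, t_end):
    """Return entries for `variable` with t_start <= time_step <= t_end."""
    return [e for e in index
            if e.variable == variable and t_start <= e.time_step <= t_end]


def assign_to_nodes(entries):
    """Greedy, locality-aware assignment (illustrative only).

    Each chunk is given to the least-loaded node that already stores it,
    so reads stay local while the byte load is spread across nodes.
    """
    load = defaultdict(int)   # bytes assigned to each node so far
    plan = defaultdict(list)  # node -> chunks it will process
    for e in sorted(entries, key=lambda e: -e.byte_length):
        target = min(e.nodes, key=lambda n: (load[n], n))
        plan[target].append(e)
        load[target] += e.byte_length
    return dict(plan)


# Hypothetical index contents for two variables spread over three nodes.
index = [
    IndexEntry("T", 0, "/data/a.nc", 0, 100, ("node1", "node2")),
    IndexEntry("T", 1, "/data/a.nc", 100, 100, ("node1", "node2")),
    IndexEntry("T", 2, "/data/b.nc", 0, 100, ("node2", "node3")),
    IndexEntry("U", 0, "/data/b.nc", 100, 100, ("node3",)),
]

hits = query(index, "T", 0, 2)
plan = assign_to_nodes(hits)
for node in sorted(plan):
    print(node, [(e.file_path, e.byte_offset) for e in plan[node]])
```

In this sketch a query touches only the index, and the resulting plan tells each node which byte ranges to read from files it already hosts, which is the sense in which an index of this kind can support data locality and workload balance.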
Li, Z., Hu, F., Schnase, J. L., Duffy, D. Q., Lee, T., Bowen, M. K., & Yang, C. (2017). A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce. International Journal of Geographical Information Science, 31(1), 17-35.