High-performance Computing Cluster
GIBD is equipped with a high-performance big data computing cluster (supercomputer) named GeoRapider, which serves as a testbed for geospatial big data analytics and compute-intensive research and applications. GeoRapider consists of 15 servers with a total of 232 CPU cores, 864 GB of memory, and 200 TB of storage. The cluster is housed in the Sumwalt building at UofSC and is maintained by GIBD and Research Computing in the Division of Information Technology. GIBD is also equipped with deep learning workstations powered by high-end NVIDIA Titan XP GPUs for geospatial artificial intelligence (GeoAI) research and development.
Big Data Computing Platform
On top of this computing infrastructure, GIBD develops a geospatial big data computing platform with a set of innovative parallel processing algorithms, spatiotemporal indices, query and analytical tools, and interactive web portals to efficiently manage, analyze, and visualize tens of billions of geotagged tweets, terabytes of climate data, and other big data such as mobile location data, taxi trip data, and electronic health records. For example, using the platform, massive volumes of tweets can be queried, extracted, and visualized within seconds based on spatial region, keywords, time period, and spatial resolution.
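The four query criteria named above (spatial region, keyword, time period, and spatial resolution) can be illustrated with a small in-memory sketch. This is not the platform's implementation: the `Tweet` record, the `query_tweets` function, and the sample data are all hypothetical, and a simple linear scan stands in for the parallel algorithms and spatiotemporal indices that a system at this scale would rely on.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
import math


@dataclass(frozen=True)
class Tweet:
    """Hypothetical minimal record for a geotagged tweet."""
    lon: float
    lat: float
    time: datetime
    text: str


def query_tweets(tweets, bbox, keyword, start, end, cell_deg):
    """Count matching tweets per grid cell.

    bbox is (min_lon, min_lat, max_lon, max_lat); cell_deg is the grid
    cell size in degrees and plays the role of the spatial resolution.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    needle = keyword.lower()
    counts = Counter()
    for t in tweets:
        # Spatial filter: keep tweets inside the bounding box.
        if not (min_lon <= t.lon <= max_lon and min_lat <= t.lat <= max_lat):
            continue
        # Temporal filter: half-open interval [start, end).
        if not (start <= t.time < end):
            continue
        # Keyword filter: case-insensitive substring match.
        if needle not in t.text.lower():
            continue
        # Aggregate to a grid cell at the requested resolution.
        col = math.floor((t.lon - min_lon) / cell_deg)
        row = math.floor((t.lat - min_lat) / cell_deg)
        counts[(col, row)] += 1
    return counts


# Made-up sample data, for illustration only.
tweets = [
    Tweet(-81.03, 34.00, datetime(2015, 10, 4, 12), "Flood on Main St"),
    Tweet(-81.02, 34.01, datetime(2015, 10, 4, 13), "flood water rising"),
    Tweet(-80.50, 34.40, datetime(2015, 10, 4, 14), "Road closed by FLOOD"),
    Tweet(-81.03, 34.00, datetime(2015, 10, 4, 12), "sunny day"),
    Tweet(-81.03, 34.00, datetime(2016, 1, 1, 0), "flood again"),
    Tweet(-90.00, 30.00, datetime(2015, 10, 4, 12), "flood elsewhere"),
]

counts = query_tweets(
    tweets,
    bbox=(-82.0, 33.0, -80.0, 35.0),
    keyword="flood",
    start=datetime(2015, 10, 1),
    end=datetime(2015, 11, 1),
    cell_deg=0.5,
)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

The per-cell counts are the kind of aggregate a map view could render directly; a coarser `cell_deg` yields fewer, larger cells.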
Geospatial Big Data Resource
GIBD has been streaming geotagged tweets for the whole world since 2015 using the official Twitter Streaming API. The lab has also processed the Twitter Stream Grab from the Internet Archive, which contains tweets (a ~1% sample of the entire Twitter stream) dating back to 2012. In total, we have over 18.6 billion tweets stored and managed on our secure data server. In addition, the lab maintains a place visitation database obtained from SafeGraph for human mobility research. This dataset includes visitation data for over 5 million places in the US (including HIV clinics, hospitals, and night bars).
All of these datasets are stored and managed in our secure high-performance big data computing cluster and can be rapidly queried and analyzed using our geospatial big data computing platform to support application areas such as disaster management and public health. For example, mining massive amounts of geotagged tweets can support disaster management by examining the physical infrastructure (e.g., road damage), the environment (e.g., flood extent), and nature-human interaction (e.g., evacuation) from spatial, temporal, and social dimensions.
The lab has access to a modern conference room with a large TV display for video conferencing and presentations, as well as an interactive big data visualization station.