Twitter Census

The study of migration and mobility has historically been severely limited by the absence of reliable data or by the temporal sparsity of the data that are available. With geospatial digital trace data, population movements can be measured much more precisely and dynamically. Our research seeks to develop a near real-time (one-day lag) Twitter census that gives a more temporally granular picture of local and non-local populations at the county level. Internal validation reveals over 80% accuracy when compared with users’ self-reported home locations. External validation results suggest that these stocks correlate with available statistics of residents and non-residents at the county level and can accurately reflect both regular events (seasonal tourism) and non-regular events such as the Great American Solar Eclipse of 2017. The findings demonstrate that Twitter holds the potential to introduce the dynamic component often lacking in population estimates. This study could benefit various fields such as demography, tourism, emergency management, and public health and create new opportunities for large-scale mobility analyses.

Scatter plot of 5-year ACS county population estimates and the daily average of total active Twitter residents per county (46 counties in SC).
A) Log-log plots of the relationship between monthly accommodation tax per county and the monthly average of non-resident Twitter users per county. B) R² values of this relationship across 2017.
Standard deviations from the mean of daily percentage (z-score) of visitors from August 19 to August 23 for the contiguous US.
A snapshot of the standard score of estimated visitors for four selected days in 2017 and 2018, visualized using the portal, revealing clear holiday and weekday-weekend patterns at a national scale.
Interactive web portal for data visualization and exploration
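
The two quantities named in the captions above, the z-score of the daily visitor percentage and the R² of a log-log relationship, can be illustrated with a minimal sketch. This is not the project's actual pipeline: the function names, the use of the population standard deviation, and the toy numbers are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def visitor_share(visitors, residents):
    """Percentage of active users in a county on one day who are visitors.

    Hypothetical helper: the inputs are daily counts of non-resident and
    resident users, however those were classified upstream.
    """
    total = visitors + residents
    return 100.0 * visitors / total if total else 0.0


def z_scores(values):
    """Standard scores of a series: (value - mean) / standard deviation.

    Assumption: the population standard deviation is used; the source
    does not say which estimator the study applied.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def log_log_r2(xs, ys):
    """R^2 of a simple linear fit of log10(y) on log10(x).

    For a one-predictor linear fit, R^2 equals the squared Pearson
    correlation of the two log-transformed series. All inputs must be > 0.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = mean(lx), mean(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Toy daily counts for one county (invented numbers, not study data).
    daily_visitors = [20, 22, 95, 24, 21]
    daily_residents = [180, 178, 185, 176, 179]
    shares = [visitor_share(v, r) for v, r in zip(daily_visitors, daily_residents)]
    print([round(z, 2) for z in z_scores(shares)])
```

In this toy series the third day stands out with a large positive z-score, which is the kind of signal the eclipse map is described as showing for counties with an unusual share of visitors.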

In follow-up projects, we intend to apply the method to other countries and make the information available in near real-time, which would be of great help in following dynamic processes such as evacuations or shelter-in-place orders during emergencies. We believe this new dataset could benefit various fields such as demography, geography, tourism, emergency management, and public health and create new opportunities for large-scale mobility analyses, as it can introduce the dynamic component often lacking in population estimates. For instance, worldwide real-time monitoring of the spatial behavior of the population would be particularly relevant during emergencies such as the COVID-19 crisis, when identifying where people are concentrating in near real-time, or determining whether they are complying with official shelter-in-place orders, becomes a first-order necessity for authorities. This is just an initial step, and more detailed analyses would be needed to blend Twitter with other sources such as surveys and SafeGraph data and to model the intrinsic biases associated with social media data. Even so, geotagged tweets, as one of the few forms of freely available geospatial digital trace data, hold promise for the understanding of human spatial behavior under normal and extraordinary conditions such as the COVID-19 global crisis.
