Wednesday, August 21, 2024

In Silico Human Mobility Data Science

In the past we have written about using simulation to build synthetic datasets for trajectory analysis, given the limited availability of comprehensive real-world datasets. Building on this work, we (Andreas Züfle, Dieter Pfoser, Carola Wenk, Hamdi Kavak, Taylor Anderson, Joon-Seok Kim, Nathan Holt, Andrew DiAntonio and myself) have a new vision paper entitled "In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data" published in Transactions on Spatial Algorithms and Systems.

In the paper we sketch out a framework for in silico mobility data science. The rationale is that mobility data alone does not tell us much about why people do what they do. To quote from the paper, "but imagine a world where we can go back in time to ask people about the purpose of their mobility to understand why an individual visited a place of interest." By building models (i.e., agent-based models) we can do just that, which in turn allows us to generate in silico human mobility data.

To build this argument, in the paper we review existing datasets of individual human mobility and their limitations in terms of size and representativeness. We then survey existing simulation frameworks that generate individual human mobility data and comment on their limitations, before presenting our vision of a scalable in silico world that captures realistic human patterns of life and allows us to generate massive datasets as sandboxes for human mobility data science. Building on this, we describe a small sample of applications and research directions that such massive individual human mobility datasets would enable if our vision came true.

If this sounds of interest, below we provide the abstract of the paper, some of the figures we use to highlight our argument, and our envisioned framework for a simulation that could exhibit both realistic behavior and realistic movement. Finally, at the bottom of the post we provide a reference and a link to the paper itself. As always, any thoughts or comments are most welcome.

Abstract:

Human mobility data science using trajectories or check-ins of individuals has many applications. Recently, we have seen a plethora of research efforts that tackle these applications. However, research progress in this field is limited by a lack of large and representative datasets. The largest and most commonly used dataset of individual human trajectories captures fewer than 200 individuals while data sets of individual human check-ins capture fewer than 100 check-ins per city per day. Thus, it is not clear if findings from the human mobility data science community would generalize to large populations. Since obtaining massive, representative, and individual-level human mobility data is hard to come by due to privacy considerations, the vision of this paper is to embrace the use of data generated by large-scale socially realistic microsimulations. Informed by both real data and leveraging social and behavioral theories, massive spatially explicit microsimulations may allow us to simulate entire megacities at the person level. The simulated worlds, which do not capture any identifiable personal information, allow us to perform “in silico” experiments using the simulated world as a sandbox in which we have perfect information and perfect control without jeopardizing the privacy of any actual individual. In silico experiments have become commonplace in other scientific domains such as chemistry and biology, permitting experiments that foster the understanding of concepts without any harm to individuals. This work describes challenges and opportunities for leveraging massive and realistic simulated alternate worlds for in silico human mobility data science.

Key Words: Spatial Simulation, Mobility Data Science, Trajectory Data, Location Based Social Network Data, In Silico

The envisioned in silico mobility data science process. (Left:) A massive microsimulation is created to simulate realistic human behavior specified by a user through an AI-supported builder tool. (Middle:) The microsimulation generates massive datasets, including high-fidelity trajectories of all individuals over years of simulation time. These data, which are 100% accurate and complete (in the simulated world), are then sampled to generate realistic datasets. (Right:) These datasets are then used to perform mobility data science tasks in the simulated in silico world as if it were the real world. The results of these tasks can then be compared to the ground truth data (of the simulated in silico world) for validation.
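To make the three panels a little more concrete, below is a minimal toy sketch in Python of the same loop: simulate complete ground truth, sample it to mimic sparse real-world data, run a simple mobility data science task (here, inferring each agent's home as its most frequently observed location) and validate the result against the ground truth. Everything in it is a hypothetical stand-in (the home/work schedule, the grid, the keep_prob sampling rate and all function names); it is not the Patterns of Life simulation or any code from the paper.

```python
import random
from collections import Counter


def simulate_ground_truth(n_agents, n_steps, grid_size, rng):
    """Hypothetical toy world: complete trajectories plus each agent's true home."""
    trajectories, homes = {}, {}
    for agent in range(n_agents):
        home = (rng.randrange(grid_size), rng.randrange(grid_size))
        work = (rng.randrange(grid_size), rng.randrange(grid_size))
        # Toy schedule: every third step at work, otherwise at home.
        trajectories[agent] = [work if t % 3 == 1 else home for t in range(n_steps)]
        homes[agent] = home
    return trajectories, homes


def sample_checkins(trajectories, keep_prob, rng):
    """Mimic sparse real-world data by keeping each location fix with keep_prob."""
    return {
        agent: [cell for cell in traj if rng.random() < keep_prob]
        for agent, traj in trajectories.items()
    }


def infer_home(observations):
    """A simple mobility data science task: home = most frequently observed cell."""
    if not observations:
        return None  # no data for this agent, so no inference is possible
    return Counter(observations).most_common(1)[0][0]


def validate(sampled, true_homes):
    """Compare inferred homes with the simulated ground truth (fraction correct)."""
    correct = sum(infer_home(obs) == true_homes[a] for a, obs in sampled.items())
    return correct / len(true_homes)


if __name__ == "__main__":
    rng = random.Random(42)
    trajectories, homes = simulate_ground_truth(1000, 30, 50, rng)
    for keep_prob in (1.0, 0.2, 0.05):
        sampled = sample_checkins(trajectories, keep_prob, rng)
        print(f"keep_prob={keep_prob}: home inference accuracy = "
              f"{validate(sampled, homes):.3f}")
```

Because the simulated world provides perfect information, the accuracy at each sampling rate is measured exactly against the ground truth, which is the kind of validation that is not possible with real, sparse data.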

The Patterns of Life Simulation. A video of the simulation can be found at: https://www.youtube.com/watch?v=rP1PDyQAQ5M.
Envisioned framework for a simulation that exhibits both realistic behavior and realistic movement.

Full reference: 

Züfle, A., Pfoser, D., Wenk, C., Crooks, A.T., Kavak, H., Anderson, T., Kim, J-S., Holt, N. and Diantonio, A. (2024), In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data (Vision Paper), Transactions on Spatial Algorithms and Systems (pdf).

Monday, July 01, 2024

Call for Abstracts: Future Map @ AGU


Call for Abstracts! 

At the 2024 American Geophysical Union (AGU) meeting, to be held from the 9th to the 13th of December in Washington, D.C., Carter Christopher, Wenwen Li, Gautam Thakur and myself are organizing a session entitled: “GC077: Future Map: The Convergence of Generative GeoAI, Population Synthesis, and Agent-Based Modeling to Develop Geographic Futures for Climate Assessments”

Abstract
The climate community has long developed reliable climate models grounded in trusted Earth systems data and physics, but it is only recently that human dynamics and feedbacks have been viewed as a necessary coupling within these models. Including human dynamics within integrated models necessitates a forecasted understanding of human transitions within the landscape. The geospatial science domain has typically not looked forward through simulations. Advances in agent-based modeling, synthetic population generation, and GeoAI/GenAI are presenting new opportunities for generating future-oriented representations of human landscapes, enabling the development of scenario-specific forecasted datasets, such as synthetic satellite imagery, land cover/land use, the built environment, and more. This session will explore the boundaries of geospatial modeling, data synthesis, and microsimulations for forecasting. Emphasis will be placed on research and studies that show how synthetic forecasted data can enable high fidelity assessments of climate futures and population impacts.

If this sounds of interest and you want to be part of this session, further details can be found at: https://agu.confex.com/agu/agu24/prelim.cgi/Session/229712

Key Thinkers on Space and Place


In the recent edition of Key Thinkers on Space and Place, edited by Mary Gilmartin, Phil Hubbard, Rob Kitchin and Sue Roberts, I was asked to write a chapter about Mike Batty.

While I have known Mike for a while, to say that writing the chapter was not easy is an understatement. We had a word limit (3,000 words plus references), and trying to sum up his biographical details and theoretical context, his spatial contributions, his key advances and controversies, and his key works was a challenge. Anyway, if you would like to read a draft of my contribution to the book and my attempt to sum up Mike's work, you can find the reference and the link to the chapter below.

Full reference:  
Crooks, A.T. (2024), Michael Batty, in Gilmartin, M., Hubbard, P., Kitchin, R. and Roberts, S. (eds.), Key Thinkers on Space and Place (3rd edition), Sage, London, UK. pp. 37-43. (pdf)

Friday, June 07, 2024

A comparison of social surveys and social media for vaccine hesitancy

In the past we have explored vaccine hesitancy in various ways, and keeping with this theme we have a new paper published in PLOS ONE entitled "Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media" with Kuleen Sasse, Ron Mahabir, Olga Gkountouna and Arie Croitoru.

In the paper we use social, demographic and economic variables (e.g., from the US Census) to predict COVID-19 vaccine hesitancy levels in the ten most populous US metropolitan statistical areas (MSAs). Using three machine learning algorithms (linear regression, random forest regression, and XGBoost regression), we compare a set of baseline models that contain only these variables with models that incorporate survey data and social media (i.e., Twitter) data separately.

We find that the algorithms differ in performance and that influential variables such as age, ethnicity, occupation, and political inclination vary across the five hesitancy classes (i.e., “definitely get a vaccine”, “probably get a vaccine”, “unsure”, “probably not get a vaccine”, and “definitely not get a vaccine”). Further, we find that applying the models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. In summary, this paper shows social media data’s potential for understanding vaccine hesitancy and for tailoring interventions to specific communities. If this sounds of interest, below we provide the abstract of the paper along with our mixed methods matrix, the data sources used and the results from the various MSAs. At the bottom of the post you can see the full reference and the link to the paper so you can read more if you so desire.

Abstract:
The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet, vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To effectively address this issue, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, recent sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan areas. We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that XGBoost algorithm consistently outperforms Random Forest and Linear Regression, with marginal differences between Random Forest and XGBoost. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of the latter data as a complementary information source. Results also reveal variations in influential variables across the five hesitancy classes, such as age, ethnicity, occupation, and political inclination. Further, the application of models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. 
In summary, this study underscores social media data’s potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.
Mixed methods matrix showing the data, processing, and model development steps used in our study.

Data sources used in our study.

MSA model performance (bolded adjusted R² values represent the best-performing model for each modeling technique and MSA).
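Because the survey- and Twitter-augmented models add predictors to the baseline, the choice of adjusted R² (rather than plain R²) in the table matters: it penalizes extra predictors via adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The minimal Python sketch below computes both from observed and predicted values. The function names and the toy numbers are hypothetical illustrations, not data, code or results from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


if __name__ == "__main__":
    # Hypothetical toy values (not from the study): observed vs. predicted
    # hesitancy shares for eight areas.
    observed = [0.12, 0.18, 0.25, 0.10, 0.30, 0.22, 0.15, 0.27]
    predicted = [0.13, 0.17, 0.23, 0.11, 0.28, 0.24, 0.16, 0.26]
    r2 = r_squared(observed, predicted)
    # The same fit scores lower once it has to "pay" for more predictors.
    for p in (2, 4):
        print(f"R^2 = {r2:.3f}, adjusted R^2 with {p} predictors = "
              f"{adjusted_r_squared(observed, predicted, p):.3f}")
```

In other words, an augmented model only comes out ahead on adjusted R² if the extra survey or social media variables improve the fit by more than the penalty for adding them.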

Monday, June 03, 2024

Skiing and Modeling

Locker room layouts
(Source: Gao et al., 2024)

One of my favorite winter activities is skiing, and now that all the ski areas in the Northeast have closed (for those interested, Killington, VT closed last Saturday), I thought it would be interesting to see how people have used various modeling techniques to explore ski areas. What follows is not a comprehensive list of all the works, but these are some that I have come across. If you know more, feel free to leave a comment below.

Models have ranged from looking at the spatial arrangement of locker rooms at ski resorts (Gao et al., 2024) to lift lines (congestion) in places such as La Plagne in the French Alps (Poulhès and Mirial, 2017) or the Austrian ski resort of Fanningberg (Heinrich et al., 2023). Others have simulated entire ski areas, including lift lines, the slopes used, etc. (Kappaurer, 2022). Pons et al. (2014) developed an agent-based model to see how climate change might impact where skiers go, while others have explored how climate change might impact ski areas and their associated water usage for making snow (e.g., Soboll and Schmude, 2011). Keeping with the climate theme, Revilloud et al. (2013) used agent-based simulations to simulate snow height on ski runs based on skiers' movements in order to facilitate snow cover management (i.e., to reduce the production cost of artificial snow and thus water and energy consumption). Murphy (2021) developed a simpler agent-based model of how skiers might ski during a powder day, which explores the area of terrain they may cover based on their ability.

Simulation of skiers (Source: Revilloud et al., 2013)

Similar to some of the other models above, but in light of COVID-19, Integrated Insight (2020), an analytics consulting company, shows in the video below how one can use simulation to explore crowd management in the base areas of ski resorts.



References / papers discussed above:
As noted above, if you know more, feel free to leave a comment below.