Friday, September 27, 2024

Genomic profiling and spatial SEIR modeling of COVID-19 transmission

Lineage distribution of SARS-CoV-2 across geographic regions of Ontario, Canada, Western New York, and New York City over time.
In the past we have posted on using agent-based models to explore the spread of diseases. We have been keeping up with this work, especially in light of COVID-19. To this end we are excited to introduce our new paper entitled "Genomic Profiling and Spatial SEIR Modeling of COVID-19 Transmission in Western New York" published in Frontiers in Microbiology. In this paper we have been collaborating with other researchers at the University at Buffalo who focus on the genomic sequencing of various SARS-CoV-2 lineages. What is special about this new paper is that we explore how such lineages change over space and time and how this relates to movement patterns. If this sounds of interest, below you can read the abstract of the paper, see some of the lineages in different regions and how they change over space and time, and see our agent-based model, which explores how different lineages might spread through people's movement patterns. At the bottom of the post, you can see the full reference and the link to the paper itself.

Abstract: 

The COVID-19 pandemic has prompted an unprecedented global effort to understand and mitigate the spread of the SARS-CoV-2 virus. In this study, we present a comprehensive analysis of COVID-19 in Western New York (WNY), integrating individual patient-level genomic sequencing data with a spatially informed agent-based disease Susceptible-Exposed-Infectious-Recovered (SEIR) computational model. The integration of genomic and spatial data enables a multi-faceted exploration of the factors influencing the transmission patterns of COVID-19, including genetic variations in the viral genomes, population density, and movement dynamics in New York State (NYS). Our genomic analyses provide insights into the genetic heterogeneity of SARS-CoV-2 within a single lineage, at region-specific resolutions, while our population analyses provide models for SARS-CoV-2 lineage transmission. Together, our findings shed light on localized dynamics of the pandemic, revealing potential cross-county transmission networks. This interdisciplinary approach, bridging genomics and spatial modeling, contributes to a more comprehensive understanding of COVID-19 dynamics. The results of this study have implications for future public health strategies, including guiding targeted interventions and resource allocations to control the spread of similar viruses.
Phylogenetic and spatial–temporal distribution of Omicron BA.2.12.1. (A) Geographic introduction and organization of BA.2.12.1 lineage from February 2022 to November 2022, by percentage of SARS-CoV-2 circulating in each county per month. N/A represents counties with no BA.2.12.1 cases sequenced. (B) Phylogenetic clustering of Jukes-Cantor distance estimations between consensus sequences of 2,737 samples. Lineages on the phylogenetic tree are color-coded by county: Erie County (pink), Monroe County (green), Onondaga County (blue), and Westchester County (chartreuse). (C) Hierarchical clustering of sample-to-sample distance estimation of 2,737 BA.2.12.1 lineages in four counties across NYS, with k-means clustering k = 4.
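For readers unfamiliar with the Jukes-Cantor distance mentioned in panel (B), here is a minimal sketch of the standard formula, d = -(3/4) ln(1 - (4/3) p), where p is the proportion of sites that differ between two aligned sequences. This is not the pipeline used in the paper; the function names and the choice to skip gaps and ambiguous bases are our own assumptions for illustration.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Assumption for this sketch: sites where either sequence has a gap
    or an ambiguous base (anything other than A, C, G, T) are skipped.
    """
    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at saturation (p >= 3/4).
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

A matrix of such pairwise distances is the kind of input that tree building or hierarchical clustering would then work from.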
SEIR model schematic and dynamics. (A) Schematics of SEIR model including general parameter and synthetic population parameter sets, and model initialization and function. (B) R0 = 3: Susceptible, Exposed, Infectious, and Recovered curves based on the introduction of two infected agents, monitored over time. (C) R0 = 5. (D) R0 = 8.
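To give a feel for how R0 shapes the curves in panels (B) to (D), below is a minimal sketch of a deterministic SEIR compartment model. Note that the model in the paper is agent-based and spatially explicit, so this is only a simplified stand-in. The two initial infections and the R0 values of 3, 5 and 8 come from the caption, while the population size, incubation period, infectious period and time step are purely illustrative assumptions.

```python
def simulate_seir(population, initial_infected, r0,
                  incubation_days=5.0, infectious_days=7.0,
                  days=300, dt=0.1):
    """Euler integration of a deterministic SEIR model.

    incubation_days, infectious_days, days and dt are illustrative
    assumptions, not parameters taken from the paper.
    Returns a list of (time, S, E, I, R) tuples.
    """
    sigma = 1.0 / incubation_days   # rate of moving from E to I
    gamma = 1.0 / infectious_days   # rate of moving from I to R
    beta = r0 * gamma               # transmission rate implied by R0

    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    history = [(0.0, s, e, i, r)]

    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((step * dt, s, e, i, r))
    return history


if __name__ == "__main__":
    for r0 in (3, 5, 8):
        run = simulate_seir(population=10_000, initial_infected=2, r0=r0)
        peak_infectious = max(row[3] for row in run)
        print(f"R0={r0}: peak infectious ~ {peak_infectious:.0f}")
```

In this simplified setting a higher R0 gives an earlier and taller infectious peak, which is the qualitative pattern the panels illustrate.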
Commuter behavior dynamics in WNY. Estimated commuter populations originating in a specific county. (A) Commuter behavior with Erie County origins. (B) Commuter behavior from Niagara County origin. (C) Commuter behavior from Monroe County origin. (D) Composite Commuter behavior network.

Full Reference: 

Bard, J.E., Jiang, N., Emerson, J., Bartz, M., Lamb, N.A., Marzullo, B.J., Pohlman, A., Boccolucci, A., Nowak, N.J., Yergeau, D.A., Crooks, A.T. and Surtees, J. (2024), Genomic Profiling and Spatial SEIR Modeling of COVID-19 Transmission in Western New York, Frontiers in Microbiology, 15. Available at  https://doi.org/10.3389/fmicb.2024.1416580  (pdf)

Wednesday, August 21, 2024

In Silico Human Mobility Data Science

In the past we have written about using simulation to build synthetic datasets for trajectory analysis due to the limited availability of comprehensive real-world datasets. In relation to this work we (Andreas Züfle, Dieter Pfoser, Carola Wenk, Hamdi Kavak, Taylor Anderson, Joon-Seok Kim, Nathan Holt, Andrew DiAntonio and myself) have a new vision paper entitled "In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data" published in Transactions on Spatial Algorithms and Systems.

In the paper we sketch out a framework for in silico mobility data science. The rationale is, in some ways, that mobility data alone does not tell us much about why people do what they do, and to quote from the paper: "but imagine a world where we can go back in time to ask people about the purpose of their mobility to understand why an individual visited a place of interest." By building models (aka agent-based models) we can do just that, which therefore allows us to build in silico human mobility data.

To build this argument, in the paper we review existing data sets of individual human mobility and their limitations in terms of size and representativeness. We then survey existing simulation frameworks that generate individual human mobility data and comment on their limitations before presenting our vision of a scalable in silico world that captures realistic human patterns of life and allows us to generate massive datasets as sandboxes for human mobility data science. Building off this we describe a small sample of applications and research directions that would be enabled by such massive individual human mobility datasets if our vision came true.

If this sounds of interest, below we provide the abstract to the paper, some of the figures we use to highlight our argument and our envisioned framework that could exhibit both realistic behavior and realistic movement. Finally at the bottom of the post we provide a reference and a link to the paper itself. As always, any thoughts or comments are most welcome. 

Abstract:

Human mobility data science using trajectories or check-ins of individuals has many applications. Recently, we have seen a plethora of research efforts that tackle these applications. However, research progress in this field is limited by a lack of large and representative datasets. The largest and most commonly used dataset of individual human trajectories captures fewer than 200 individuals while data sets of individual human check-ins capture fewer than 100 check-ins per city per day. Thus, it is not clear if findings from the human mobility data science community would generalize to large populations. Since obtaining massive, representative, and individual-level human mobility data is hard to come by due to privacy considerations, the vision of this paper is to embrace the use of data generated by large-scale socially realistic microsimulations. Informed by both real data and leveraging social and behavioral theories, massive spatially explicit microsimulations may allow us to simulate entire megacities at the person level. The simulated worlds, which do not capture any identifiable personal information, allow us to perform “in silico” experiments using the simulated world as a sandbox in which we have perfect information and perfect control without jeopardizing the privacy of any actual individual. In silico experiments have become commonplace in other scientific domains such as chemistry and biology, permitting experiments that foster the understanding of concepts without any harm to individuals. This work describes challenges and opportunities for leveraging massive and realistic simulated alternate worlds for in silico human mobility data science.

Key Words: Spatial Simulation, Mobility Data Science, Trajectory Data, Location Based Social Network Data, In Silico

The envisioned in silico mobility data science process. (left:) A massive microsimulation is created to simulate realistic human behavior specified by a user through an AI-supported builder tool. (middle:) The microsimulation generates massive datasets, including high-fidelity trajectories of all individuals over years of simulation time. This data, which is 100% accurate and complete (in the simulated world) is then sampled to generate realistic datasets. (right:) These datasets are then used to perform mobility data science tasks in the simulated in silico world as if it was the real world. The results of these tasks can then be compared to the ground truth data (of the simulated in silico world) for validation.
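To make the simulate, sample, analyze and validate loop in the figure a little more concrete, here is a toy sketch. It is not the Patterns of Life simulation or any code from the paper: the agents are simple random walkers on a grid, the "task" is estimating distance traveled from sparse observations, and all function names and parameter values are assumptions for illustration only.

```python
import random


def simulate_trajectory(steps, rng):
    """Ground truth: one (x, y) grid position per time step."""
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x += dx
        y += dy
        path.append((x, y))
    return path


def sample_observations(path, every):
    """Mimic a sparse dataset by keeping every `every`-th position."""
    return path[::every]


def path_length(points):
    """Total Manhattan distance along a sequence of positions."""
    return sum(abs(x2 - x1) + abs(y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def run_experiment(n_agents=50, steps=1000, every=10, seed=42):
    """Compare a distance estimate from sampled data with ground truth."""
    rng = random.Random(seed)
    true_total = 0
    estimated_total = 0
    for _ in range(n_agents):
        path = simulate_trajectory(steps, rng)
        true_total += path_length(path)
        estimated_total += path_length(sample_observations(path, every))
    return true_total, estimated_total, estimated_total / true_total


if __name__ == "__main__":
    truth, estimate, ratio = run_experiment()
    print(f"true distance={truth}, estimated={estimate}, ratio={ratio:.2f}")
```

Because the simulated world gives us the full path of every agent, we can measure exactly how much the sparse sample underestimates travel distance, which is something that cannot be checked with real-world data alone.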

The Patterns of Life Simulation. A video of the simulation can be found at: https://www.youtube.com/watch?v=rP1PDyQAQ5M.
Envisioned framework for a simulation that exhibits both realistic behavior and realistic movement.

Full reference: 

Züfle, A., Pfoser, D., Wenk, C., Crooks, A.T., Kavak, H., Anderson, T., Kim, J-S., Holt, N. and DiAntonio, A. (2024), In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data (Vision Paper), Transactions on Spatial Algorithms and Systems (pdf).

Monday, July 01, 2024

Call for Abstracts: Future Map @ AGU


Call for Abstracts! 

At the 2024 American Geophysical Union (AGU) meeting, to be held from the 9th to the 13th of December in Washington, D.C., Carter Christopher, Wenwen Li, Gautam Thakur and myself are organizing a session entitled: “GC077: Future Map: The Convergence of Generative GeoAI, Population Synthesis, and Agent-Based Modeling to Develop Geographic Futures for Climate Assessments”.

Abstract
The climate community has long developed reliable climate models grounded in trusted Earth systems data and physics, but it has not been until recently that human dynamics and feedbacks have been viewed as a necessary coupling within these models. Including human dynamics within integrated models necessitates a forecasted understanding of human transitions within the landscape. The geospatial science domain has typically not looked forward through simulations. Advances in agent-based modeling, synthetic population generation, and GeoAI/GenAI are presenting new opportunities for generating future-oriented representations of human landscapes, enabling the development of scenario-specific forecasted datasets, such as synthetic satellite imagery, land cover/land use, the built environment, and more. This session will explore the boundaries of geospatial modeling, data synthesis, and microsimulations for forecasting. Emphasis will be placed on research and studies that show how synthetic forecasted data can enable high fidelity assessments of climate futures and population impacts.

If this sounds of interest and you want to be part of this session, further details can be found at: https://agu.confex.com/agu/agu24/prelim.cgi/Session/229712

Key Thinkers on Space and Place


In the recent edition of Key Thinkers on Space and Place, edited by Mary Gilmartin, Phil Hubbard, Rob Kitchin and Sue Roberts, I was asked to write a chapter about Mike Batty.

While I have known Mike for a while, to say writing the chapter was not easy is an understatement, in the sense that we had a word constraint (3,000 words plus references), and trying to sum up his biographical details and theoretical context, his spatial contributions along with his key advances and controversies, and his key works was a challenge. Anyway, if you would like to read a draft of my contribution to the book and my attempt to sum up Mike's work, you can find the reference and the link to the chapter below.

Full reference:  
Crooks, A.T. (2024), Michael Batty, in Gilmartin, M., Hubbard, P., Kitchin, R. and Roberts, S. (eds.), Key Thinkers on Space and Place (3rd edition), Sage, London, UK. pp. 37-43. (pdf)

Friday, June 07, 2024

A comparison of social surveys and social media for vaccine hesitancy

In the past we have looked at various ways to explore vaccine hesitancy and, in keeping with this theme, we have a new paper published in PLOS ONE entitled "Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media" with Kuleen Sasse, Ron Mahabir, Olga Gkountouna and Arie Croitoru.

In the paper we use social, demographic and economic (e.g., US Census) variables to predict COVID-19 vaccine hesitancy levels in the ten most populous US metropolitan statistical areas (MSAs). By using machine learning algorithms (e.g., linear regression, random forest regression, and XGBoost regression) we compare a set of baseline models that contain only these variables with models that incorporate survey data and social media (i.e., Twitter) data separately.

We find that different algorithms perform differently, along with variations in influential variables such as age, ethnicity, occupation, and political inclination across the five hesitancy classes (i.e., “definitely get a vaccine”, “probably get a vaccine”, “unsure”, “probably not get a vaccine”, and “definitely not get a vaccine”). Further, we find that the application of the models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. But in summary, this paper shows social media data’s potential for understanding vaccine hesitancy and tailoring interventions to specific communities. If this sounds of interest, below we provide the abstract to the paper along with our mixed methods matrix, the data sources used and the results from the various MSAs. At the bottom of the post, you can see the full reference and the link to the paper so you can read more if you so desire.

Abstract:
The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet, vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To effectively address this issue, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, recent sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan areas. We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that XGBoost algorithm consistently outperforms Random Forest and Linear Regression, with marginal differences between Random Forest and XGBoost. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of the latter data as a complementary information source. Results also reveal variations in influential variables across the five hesitancy classes, such as age, ethnicity, occupation, and political inclination. Further, the application of models to different MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches. 
In summary, this study underscores social media data’s potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.
Mixed methods matrix showing the data, processing, and model development steps used in our study.

Data sources used in our study.

MSA model performance (Bolded adjusted R2 values represent the best performing model for each modeling technique and MSA).
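Since the table above reports adjusted R2 values, here is a minimal sketch of how that metric is computed from a model's predictions using the standard formula: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The function names are our own and this is not the code used in the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination for a set of predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R2 for the number of predictors in the model."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof
```

The adjustment matters when comparing a baseline model against one with extra survey or social media variables: adding predictors never lowers plain R2 on the training data, whereas adjusted R2 only improves if the new variables explain enough additional variance to offset the penalty.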