Thursday, February 06, 2025

From print to perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews

In previous posts we have noted how one can explore urban issues through newspapers, while at the same time we have used social media to explore trends in vaccinations. In a recently published paper in PLOS Digital Health entitled "From print to perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews" with Qingqing Chen, Adam Sullivan, Jennifer Surtees, Laurene Tumiel-Berhalter and myself, we thought we would explore how COVID-19 was reported in newspapers and how this varied from interviews. 

The rationale behind this was that the COVID-19 pandemic has led to diverse experiences influenced by public health measures like lockdowns and social distancing. To explore these dynamics, we introduce a novel ’big-thick’ data approach that integrates extensive U.S. newspaper data with detailed interviews. By employing natural language processing (NLP) and geoparsing techniques, we identify key topics related to the pandemic and vaccinations both in newspapers and personal narratives from interviews, and compare the (spatial) convergences and divergences between them. 
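To make the geoparsing step concrete, here is a minimal Python sketch of the general idea: finding place names in article text and attaching coordinates to them. It is not the pipeline from the paper. The tiny gazetteer, the sample sentences and the `geoparse` helper are all hypothetical, and real geoparsers use named-entity recognition and disambiguation rather than a simple dictionary lookup.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer (illustrative only): place name -> (lat, lon).
GAZETTEER = {
    "Buffalo": (42.8864, -78.8784),
    "New York City": (40.7128, -74.0060),
    "Albany": (42.6526, -73.7562),
}


def geoparse(text, gazetteer=GAZETTEER):
    """Return a (place, (lat, lon)) pair for each gazetteer name found in text."""
    hits = []
    # Check longer names first; word boundaries avoid matching inside other words.
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for _ in re.finditer(pattern, text):
            hits.append((name, gazetteer[name]))
    return hits


# Made-up sentences standing in for newspaper articles.
articles = [
    "Buffalo hospitals expand vaccine clinics as cases rise.",
    "New York City and Albany officials debate lockdown rules.",
    "Vaccine rollout reaches rural clinics near Buffalo.",
]

# Count how often each place is mentioned across the articles.
place_counts = Counter(name for text in articles for name, _ in geoparse(text))
print(place_counts.most_common())
```

Once mentions are tied to coordinates like this, counts can be aggregated to whatever spatial unit is of interest (e.g., county or state) and mapped.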

We found that both sources converge to highlight the profound impacts of the pandemic on daily life. However, newspapers provide a macro-level perspective, predominantly covering policy, public health efforts and economics, while interviews reveal the nuanced impacts at the micro-level, focusing on personal experiences, emotions and concerns. An intriguing finding from the interviews is the pronounced concern regarding the reliability of news information. By showcasing both convergences and divergences in identified topics, our study enhances the understanding of key issues that are both disseminated to and resonate with the public, contributing to the development of more effective communication strategies for future public health crises.

If this sounds of interest, below you can read the abstract of the paper and see some of the figures, which include our workflow and some of the results. At the bottom of the post you can find the full reference and a link to the paper itself, while the code we used for our analysis is available at https://figshare.com/s/339b1c0d059c189dd6a4?file=44583661.

Abstract:

In the face of the unprecedented COVID-19 pandemic, various government-led initiatives and individual actions (e.g., lockdowns, social distancing, and masking) have resulted in diverse pandemic experiences. This study aims to explore these varied experiences to inform more proactive responses for future public health crises. Employing a novel “big-thick” data approach, we analyze and compare key pandemic-related topics that have been disseminated to the public through newspapers with those collected from the public via interviews. Specifically, we utilized 82,533 U.S. newspaper articles from January 2020 to December 2021 and supplemented this “big” dataset with “thick” data from interviews and focus groups for topic modeling. Identified key topics were contextualized, compared and visualized at different scales to reveal areas of convergence and divergence. We found seven key topics from the “big” newspaper dataset, providing a macro-level view that covers public health, policies and economics. Conversely, three divergent topics were derived from the “thick” interview data, offering a micro-level view that focuses more on individuals’ experiences, emotions and concerns. A notable finding is the public’s concern about the reliability of news information, suggesting the need for further investigation on the impacts of mass media in shaping the public’s perception and behavior. Overall, by exploring the convergence and divergence in identified topics, our study offers new insights into the complex impacts of the pandemic and enhances our understanding of key issues both disseminated to and resonating with the public, paving the way for further health communication and policy-making.
An overview of the research workflow.

The monthly distribution of collected articles in the United States from January 2020 to December 2021.

An example of identified entities labeled with predefined entity types.

The spatial distribution of newspaper articles by different scales.


The spatial distribution of identified newspaper topics across different regions in New York State.

Ordered rank of identified topics by percentage from interviews.

Full reference:
Chen, Q., Crooks, A.T., Sullivan, A.J., Surtees, J.A. and Tumiel-Berhalter, L. (2025). From Print to Perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews, PLOS Digital Health. Available at https://doi.org/10.1371/journal.pdig.0000736. (pdf)

Friday, January 31, 2025

New Directions in Mapping the Earth’s Surface with Citizen Science and Generative AI

In previous posts, we have written about how large language models (LLMs) like ChatGPT can be used in various urban analytical applications. We have kept exploring this potential, especially with respect to citizen science applications. To this end we have just published a new paper in iScience, entitled "New Directions in Mapping the Earth’s Surface with Citizen Science and Generative AI". In the paper, led by Linda See, we discuss how multi-modal LLMs (MLLMs), which are like LLMs but can take different forms of input (e.g., text, images, video) and output multi-modal information (e.g., take an image and output a description), could be leveraged to enhance citizen science land cover/land use mapping campaigns. If this sounds of interest, below you can read the abstract of the paper and see some of the figures we use to build our argument, while at the bottom of the post you can see the full reference and a link to the actual paper.
Abstract: 
As more satellite imagery has become openly available, efforts in mapping the Earth’s surface have accelerated. Yet the accuracy of these maps is still limited by the lack of in-situ data needed to train machine learning algorithms. Citizen science has proven to be a valuable approach for collecting in-situ data through applications like Geo-Wiki and Picture Pile, but better approaches for optimizing volunteer time are still required. Although machine learning is being used in some citizen science projects, advances in generative Artificial Intelligence (AI) are yet to be fully exploited. This paper discusses how generative AI could be harnessed for land cover/land use mapping by enhancing citizen science approaches with multi-modal large language models (MLLMs), including improvements to the spatial awareness of AI.
Visual interpretation tasks undertaken by ChatGPT for (a) a wetland/mangrove landscape in South America (b) an agricultural area in central Europe.
Visual interpretation tasks undertaken by ChatGPT for identification of natural and non-natural ecosystems where ChatGPT misclassified the images as non-natural for locations in (a) Chad and (b) Austria. In (c), the image from Colombia was classified as unsure by validators but natural by ChatGPT.
Integrating multi-modal Large Language Models (MLLMs) in a citizen science visual interpretation workflow.
Full reference:
See, L., Chen, Q., Crooks, A., Bayas, J.C.L., Fraisl, D., Fritz, S., Georgieva, I., Hager, G., Hofer, M., Lesiv, M., Malek, Ž., Milenković, M., Moorthy, I., Orduña-Cabrera, F., Pérez-Guzmán, K., Schepaschenko, D., Shchepashchenko, M., Steinhauser, J. and McCallum, I. (2025), New Directions in Mapping the Earth’s Surface with Citizen Science and Generative AI, iScience, doi: https://doi.org/10.1016/j.isci.2025.111919 (pdf)

Saturday, December 14, 2024

AGU

This past week we attended the American Geophysical Union (AGU) Fall Meeting in Washington DC. At the AGU we presented two abstracts. 

The first follows on from our work on using synthetic populations within agent-based models. This work was with Na Jiang, Fuzhen Yin and Boyu Wang and is entitled "A Framework for Populating Urban Digital Twins with Agents." Or, more specifically, why digital twins need agents. Below you can see our abstract and a couple of figures showing our synthetic population workflow and how we integrate these populations into agent-based models.

Abstract:

Over the last few years, considerable efforts have been placed in creating digital twins from diverse fields ranging from engineering to urban planning and many things in-between. These digital twins have benefited from the growth and availability of computational power and data. For example, in urban planning the growth of computational resources and the explosion of spatial data sources (e.g., remote sensing) has led to the creation and widespread adoption of detailed virtual urban environments or urban digital twins. However, we would argue that many of such works emphasize only the physical infrastructure or the built environment of the city instead of considering the key actors of urban systems: the people who live in them. In this work we aim to remedy this by introducing a framework that utilizes agent-based modeling to add humans to such urban digital twins. This framework consists of two components: 1) synthetic populations generated with census data; and 2) a pipeline for using the population datasets for agent-based modeling applications within the urban digital twins domain. To demonstrate the utility of this framework, we have representative applications that showcase how digital twins can be created to study various urban phenomena (e.g., evacuation scenarios, traffic congestion and disease transmission). By doing so, we believe this framework will benefit researchers wishing to build urban digital twins and to explore complex urban issues with realistic populations.


Workflow of utilizing synthetic populations within agent-based models.
Examples of agent-based models utilizing our synthetic population.

In a different presentation, we return to how one can use social media to monitor the world around us, in this case dust storms. This work, entitled "Mining unconventional data sources: creating a social media-based catalog of dust events in the Western US", is a collaboration with Stuart Evans and Festus Adegbola. Generally speaking, we explore how social media has the potential to be a new, unconventional source of observations of windblown dust. If this sounds of interest, below you can read the abstract and see the visual overlap between social media posts about dust events and official National Weather Service (NWS) dust storm warning coverage.

Abstract:

Complete observations of dust events are difficult, as dust’s spatial and temporal variability means satellites may miss dust due to overpass time or cloud coverage, while ground stations may miss dust due to not being in the plume. As a result, an unknown number of dust events go unrecorded in traditional datasets. Dust’s importance both for atmospheric processes and as a health and travel hazard makes detecting dust events whenever possible important, and in particular, studies of the health impacts of dust are limited by detailed exposure information, i.e. where is there dust and when. In recent years, social media platforms have provided an opportunity to access vast user-generated data. This research utilizes geotagged Flickr and Twitter posts referencing dust in the western US, and compares it to traditional datasets including blowing dust reports from the National Weather Service and satellite observations from Suomi-VIIRS. Results show that this unconventional dataset broadly recreates the observed spatial and seasonal distributions of dust. Daily analysis of the locations of the social media posts creates a novel catalog of dust events in the western US that can be used for further research. While this catalog is necessarily incomplete, it nonetheless provides a complementary list of events to those detected by traditional means. Analysis of individual events in this catalog shows that social media captures many dust events that previously went undetected by traditional datasets.
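As a rough illustration of what a daily, social media-based catalog might look like, the Python sketch below groups geotagged posts by day and then checks which days are absent from an official record. This is only a sketch under stated assumptions, not the method used in the work: the posts, the official dates, the `build_catalog` helper and the `min_posts` threshold are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical geotagged posts that mention dust (made-up values).
posts = [
    {"day": date(2020, 6, 1), "lat": 33.45, "lon": -112.07},
    {"day": date(2020, 6, 1), "lat": 33.30, "lon": -111.90},
    {"day": date(2020, 6, 3), "lat": 35.08, "lon": -106.65},
    {"day": date(2020, 6, 7), "lat": 32.22, "lon": -110.97},
    {"day": date(2020, 6, 7), "lat": 32.25, "lon": -110.91},
]

# Hypothetical days with an official blowing dust report.
official_days = {date(2020, 6, 1)}


def build_catalog(posts, min_posts=2):
    """Group post locations by day; keep days with at least min_posts posts."""
    by_day = defaultdict(list)
    for post in posts:
        by_day[post["day"]].append((post["lat"], post["lon"]))
    return {day: locs for day, locs in by_day.items() if len(locs) >= min_posts}


catalog = build_catalog(posts)

# Candidate events seen on social media but missing from the official record.
social_only_days = sorted(set(catalog) - official_days)
print(social_only_days)
```

The threshold simply guards against treating a single stray post as an event; any real catalog would need a more careful definition of what counts as a dust event.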


References:

Crooks, A.T., Jiang, N., Yin, F. and Wang, B. (2024), A Framework for Populating Urban Digital Twins with Agents, American Geophysical Union (AGU) Fall Meeting, 9th–13th December, Washington, DC. (pdf)

Evans, S., Adegbola, F. and Crooks, A.T. (2024), Mining Unconventional Data Sources: Creating a Social Media-based Catalog of Dust Events in the Western US, American Geophysical Union (AGU) Fall Meeting, 9th–13th December, Washington, DC. (pdf)

Thursday, November 07, 2024

A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the US

In numerous posts, we have discussed synthetic populations and their use in agent-based modeling. But there are many other modeling styles that also utilize synthetic populations. In our own work we often spend significant amounts of time creating such synthetic populations, especially those grounded in data, because of the effort needed to collect and preprocess the data and to generate the final synthetic population. To alleviate this, we (Na (Richard) Jiang, Fuzhen Yin, Boyu Wang and myself) have a new paper published in Scientific Data, entitled "A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States." Our aim in this paper is to build and provide a geographically explicit synthetic population along with its social networks using open data, including that from the latest 2020 U.S. Census, which can be used in a variety of geo-simulation models.

Summary of the Resulting Datasets.

Specifically, in the paper we outline how we created a synthetic population of 330,526,186 individuals representing America's 50 states and Washington D.C. Each individual has a set of geographical locations that represent their home, work or school addresses. Additionally, these individuals are not isolated: they are embedded in a larger social setting based on their household, working and studying relationships (i.e., social networks).
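To give a feel for the kind of data structure this implies, here is a small Python sketch of individuals with home, work and school identifiers, and of social networks derived from shared households, workplaces and schools. It is an illustrative assumption rather than the schema or algorithm in our repository: the `Person` fields, the `build_network` helper and the choice to fully connect everyone who shares a location are all hypothetical simplifications.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Person:
    """A hypothetical synthetic individual (field names are illustrative)."""
    pid: int
    household_id: int
    work_id: Optional[int] = None
    school_id: Optional[int] = None


def build_network(people, attr):
    """Link every pair of people sharing the same non-empty value of attr."""
    groups = defaultdict(list)
    for person in people:
        key = getattr(person, attr)
        if key is not None:
            groups[key].append(person.pid)
    edges = set()
    for members in groups.values():
        # Assumption: members of a group are fully connected to each other.
        edges.update(combinations(sorted(members), 2))
    return edges


people = [
    Person(1, household_id=10, work_id=100),
    Person(2, household_id=10, work_id=101),
    Person(3, household_id=10, school_id=200),
    Person(4, household_id=11, work_id=100),
    Person(5, household_id=11, school_id=200),
]

household_edges = build_network(people, "household_id")
work_edges = build_network(people, "work_id")
school_edges = build_network(people, "school_id")
print(household_edges, work_edges, school_edges)
```

Keeping one edge set per setting (household, work, school) means a model can treat each type of contact differently, for example when simulating disease spread.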

The work (e.g., data collection, data preprocessing and generation processes) was coded using Python 3.12 and all the scripts used are available at: https://github.com/njiang8/geo-synthetic-pop-usa while the resulting datasets (85 GB uncompressed) are available at OSF: https://osf.io/fpnc2/.  

To give you a sense of the paper, below we provide its abstract, along with some results and our efforts to validate the synthetic population. The full reference and a link to the paper can be found at the bottom of the post.

Abstract:

Within the geo-simulation research domain, micro-simulation and agent-based modeling often require the creation of synthetic populations. Creating such data is a time-consuming task and often lacks social networks, which are crucial for studying human interactions (e.g., disease spread, disaster response) while at the same time impacting decision-making. We address these challenges by introducing a Python based method that uses the open data including that from 2020 U.S. Census data to generate a large-scale realistic geographically explicit synthetic population for America's 50 states and Washington D.C. along with the stylized social networks (e.g., home, work and schools). The resulting synthetic population can be utilized within various geo-simulation approaches (e.g., agent-based modeling), exploring the emergence of complex phenomena through human interactions and further fostering the study of urban digital twins.

Keywords: Synthetic Population, U.S. Census 2020, Agent-Based Modeling, Geo-Simulation, Social Networks.

Data Generation Workflow and Resulting Datasets.

A Sample of the Social Networks for One Household and their Home, Work and Educational Social Networks from the Generated Data.

Sample of Generated Social Networks Extracted from the City of Buffalo, New York: (a) Household; (b) Work; (c) School; (d) Daycare.

Validation of the Synthetic Population at Different Levels: (a) Population under Different 18 Age Groups; (b) Household under Different Household Types.

Full Reference:

Jiang, N., Yin, F., Wang, B. and Crooks, A.T. (2024), A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States, Scientific Data, 11, 1204. https://doi.org/10.1038/s41597-024-03970-1 (pdf)




Friday, November 01, 2024

Pattern of Life Human Mobility Simulation (Demo)

In the past we have written about how we can use agent-based models to capture basic patterns of life, and even developed simulations, but until now we have never really demonstrated how we go about this. At the SIGSPATIAL 2024 conference, however, we (Hossein Amiri, Will Kohn, Shiyang Ruan, Joon-Seok Kim, Hamdi Kavak, Dieter Pfoser, Carola Wenk, Andreas Zufle and myself) have a demonstration paper entitled "The Pattern of Life Human Mobility Simulation" in which we show:

  1. How to run the Patterns of Life Simulation with the graphical user interface (GUI) to visually explore the mobility patterns of a region.
  2. How to run the Patterns of Life Simulation headless (without GUI) for large-scale data generation.
  3. How to adapt the simulation to any region in the world using OpenStreetMap data.
  4. How recent scalability improvements allow us to simulate hundreds of thousands of agents.

If this sounds of interest, below we show the GUI of the model, along with the steps to generate a trajectory dataset or a new map for the simulation. At the bottom of the post you can find the paper's full reference and a link to download it, while the source code for the enhanced simulation and data-processing tools is available at https://github.com/onspatial/generate-mobility-dataset for you to experiment with.

Abstract: 

We demonstrate the Patterns of Life Simulation to create realistic simulations of human mobility in a city. This simulation has recently been used to generate massive amounts of trajectory and check-in data. Our demonstration focuses on using the simulation twofold: (1) using the graphical user interface (GUI), and (2) running the simulation headless by disabling the GUI for faster data generation. We further demonstrate how the Patterns of Life simulation can be used to simulate any region on Earth by using publicly available data from OpenStreetMap. Finally, we also demonstrate recent improvements to the scalability of the simulation that allow simulating up to 100,000 individual agents for years of simulation time. During our demonstration, as well as offline using our guides on GitHub, participants will learn: (1) The theories of human behavior driving the Patterns of Life simulation, (2) how to simulate to generate massive amounts of synthetic yet realistic trajectory data, (3) running the simulation for a region of interest chosen by participants using OSM data, (4) learn the scalability of the simulation and understand the properties of generated data, and (5) manage thousands of parallel simulation instances running concurrently.

Keywords: Patterns of Life, Simulation, Trajectory, Dataset, Customization

A screenshot of the graphical user interface of the Patterns of Life Simulation. The GUI shows the map and the movements of agents on the left side and the social network of agents and their statistical properties on the right side. 

Steps to generate a trajectory dataset.
Steps to generate a new map for the simulation.

Full reference:

Amiri, H., Kohn, W., Ruan, S., Kim, J-S., Kavak, H., Crooks, A.T., Pfoser, D., Wenk, C. and Zufle, A. (2024) The Pattern of Life Human Mobility Simulation (Demo Paper), ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Atlanta, GA. (pdf)