Population Health

March 25, 2025

Awardees of AI-focused population health pilot projects report progress in their work

The University of Washington Population Health Initiative awarded five $100,000 artificial intelligence-focused pilot grants in June 2024 to interdisciplinary teams of researchers seeking to develop the preliminary data or proof-of-concept needed to pursue follow-on funding to scale their research efforts.

The goal of this special funding call was to accelerate the application of large language models and generative AI to seemingly intractable grand challenges in population health, with awardees focusing on topics ranging from more effective diagnosis of tuberculosis to better assessment of brain health and pathology samples.

Each of the projects has now reached its midpoint, and the teams are reporting progress in the following areas:

Customizing LLMs for Reliable Clinical Reasoning Support

Investigators
Yulia Tsvetkov, Allen School of Computer Science & Engineering
Pang Wei Koh, Allen School of Computer Science & Engineering
Jonathan Ilgen, Department of Emergency Medicine

Project update
This project’s goal is to develop reliable, knowledgeable and socially aware medical large language model (LLM) assistants. It focuses on addressing challenges related to data privacy, model safety and the integration of medical knowledge. By synthesizing realistic data, incorporating uncertainty mechanisms, and augmenting LLMs with external knowledge, the project aims to create interactive prototypes that empower diverse patients, clinicians and researchers across various clinical applications.

Our main accomplishment thus far is the development of MediQ, an interactive benchmark we introduced to reliably evaluate and enhance the question-asking abilities of medical LLMs. Recognizing that static benchmarks inadequately reflect real-world interactive use cases, we designed MediQ to simulate realistic clinical dialogues, requiring models to proactively ask clarifying questions to gather missing patient details before making diagnostic decisions.

Our initial results demonstrate that simply prompting state-of-the-art LLMs to ask follow-up questions reduces diagnostic accuracy, highlighting the difficulty of adapting existing models to interactive, information-seeking contexts. We further explored confidence-based abstention strategies, improving diagnostic performance by 22.3%, although this still falls short of the ideal case with complete patient information upfront. Overall, MediQ establishes a novel pathway toward reliable medical LLM assistants and aligns closely with our project’s broader objectives of creating knowledgeable, socially aware and trustworthy clinical AI systems serving diverse populations.
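
To illustrate the abstention idea, here is a minimal sketch of a confidence-based interactive diagnosis loop: the model commits to an answer only when its confidence clears a threshold, and otherwise asks a clarifying question to gather missing patient details. The `llm` interface, the threshold and the turn limit are hypothetical stand-ins for illustration, not the MediQ implementation.

```python
# Minimal sketch of confidence-based abstention in an interactive setting.
# The llm methods, threshold and turn limit are hypothetical placeholders.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # assumed; would be tuned on a validation set
MAX_TURNS = 5

@dataclass
class SimulatedPatient:
    initial_info: str
    hidden_facts: dict  # details the simulated patient can reveal if asked

def diagnose_interactively(llm, patient: SimulatedPatient) -> str:
    context = [patient.initial_info]
    for _ in range(MAX_TURNS):
        # Hypothetical call: returns a candidate diagnosis plus a
        # confidence score in [0, 1] derived from the model's output.
        diagnosis, confidence = llm.propose_diagnosis("\n".join(context))
        if confidence >= CONFIDENCE_THRESHOLD:
            return diagnosis  # confident enough to commit
        # Abstain from answering; gather missing patient details instead.
        question = llm.ask_clarifying_question("\n".join(context))
        answer = patient.hidden_facts.get(question, "Not available.")
        context.append(f"Q: {question}\nA: {answer}")
    # Out of turns: return the best available answer.
    return llm.propose_diagnosis("\n".join(context))[0]
```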

A paper summarizing the MediQ framework was presented at NeurIPS 2024, a premier machine learning conference. All PIs collaborated on this project, and it supported a talented PhD student, who is leading our collaborations on LLMs for reliable clinical reasoning support.

PathFinder: A Multi-Modal Multi-Agent Framework for Diagnostic Decision-Making in Histopathology

Investigators
Linda Shapiro, Department of Electrical and Computer Engineering, Allen School of Computer Science & Engineering
Ranjay Krishna, Allen School of Computer Science & Engineering
Mehmet Saygin Seyfioglu, Department of Electrical & Computer Engineering
Fatemeh Ghezloo, Allen School of Computer Science & Engineering
Wisdom Ikezogwo, Allen School of Computer Science & Engineering

Project update
Diagnosing diseases through histopathology whole slide images (WSIs) is fundamental to modern pathology but is challenged by the gigapixel scale and complexity of WSIs. Trained histopathologists overcome this challenge by navigating the WSI, looking for relevant patches, taking notes, and compiling them to produce a final holistic diagnosis. Traditional AI approaches, such as multiple instance learning and transformer-based models, fall short of such a holistic, iterative, multi-scale diagnostic procedure, limiting their adoption in the real world.

We have developed PathFinder, a multi-modal, multi-agent framework that emulates the decision-making process of expert pathologists. PathFinder integrates four AI agents – the Triage Agent, Navigation Agent, Description Agent and Diagnosis Agent – that collaboratively navigate WSIs, gather evidence and provide comprehensive diagnoses with natural language explanations. The Triage Agent classifies the WSI as benign or risky; if risky, the Navigation and Description Agents iteratively focus on significant regions, generating importance maps and descriptive insights of sampled patches. Finally, the Diagnosis Agent synthesizes the findings to determine the patient’s diagnostic classification.
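
As a rough illustration of this four-agent control flow, the sketch below wires the agents together in the order described above. The agent objects and their method names (`classify`, `next_region`, `describe`, `synthesize`) are hypothetical placeholders, not the PathFinder codebase.

```python
# Sketch of the triage -> navigate/describe -> diagnose loop; all agent
# interfaces here are assumed for illustration.
def diagnose_wsi(wsi, triage, navigator, describer, diagnoser, max_patches=20):
    # 1. Triage Agent: whole-slide screen for benign vs. risky.
    if triage.classify(wsi) == "benign":
        return {"diagnosis": "benign", "evidence": []}

    # 2-3. Navigation + Description Agents: iteratively focus on
    # significant regions (via an importance map) and describe each patch.
    findings = []
    for _ in range(max_patches):
        region = navigator.next_region(wsi, findings)
        if region is None:  # navigator judges the evidence sufficient
            break
        findings.append(describer.describe(wsi.read_patch(region)))

    # 4. Diagnosis Agent: synthesize findings into a final classification
    # with a natural-language explanation.
    return diagnoser.synthesize(findings)
```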

Standalone Smartphone Pupillometry with Machine Learning and AI for Diagnosis of Neurological Disease

Investigators
Michael R. Levitt, Department of Neurological Surgery
Suman Jayadev, Department of Neurology
Shwetak Patel, Allen School of Computer Science & Engineering
Anthony Maxin, Department of Neurological Surgery

Project update
Our project has two principal aims: 1) Collect pupillometry data from a variety of neurological conditions and diseases, and 2) Produce machine learning binary classification models for studied disease states.

Within Aim 1, we have enrolled 11 subjects with Alzheimer’s dementia, eight subjects with mild cognitive impairment and 16 subjects with normal cognition for their age from the University of Washington Alzheimer’s Disease Research Center Clinical Core. All subjects have associated clinical and demographic characterization.

In our mild traumatic brain injury/concussion cohort, we have enrolled 41 subjects along with relevant clinical information to better characterize the extent of brain injury. Additionally, we have enrolled 24 acute ischemic stroke subjects and 13 hemorrhagic stroke subjects. Data collection continues in earnest with the goal of meeting our enrollment milestones by the end of the project period. Machine learning binary classification analysis will be completed once data collection goals have been achieved.

Using AI for Tuberculosis Classification Using Wearable Data

Investigators
Shwetak Patel, Allen School of Computer Science & Engineering, Department of Electrical & Computer Engineering
David Horne, Department of Medicine
Thomas R. Hawn, Department of Medicine

Project update
This project aims to create a dataset of coughs recorded on-body via wearables during daily activities, and a pipeline for using that data to classify tuberculosis infection in an individual. We are currently making progress on the front-end portion of the pipeline for classifying coughs in an audio file, as well as on models that use existing datasets to classify tuberculosis infectiousness level in the form of CASS positive/negative readings. In the next few months, we hope to finalize our evaluation of wearable data recording devices so that we can begin collecting new data with our partners in Kenya.

For the data collection, the original plan was to use a Fitbit and an audio recording device. Our partners in Kenya were sent one of the audio recording prototypes, a wrist-mounted audio recorder in a 3D-printed case; however, we have not started any data collection yet. Because long-term plans for the project beyond the pilot include both audio and biometric data, it is a better use of resources to switch to a device that can record both, which the Fitbit cannot. While the standalone audio recorder is very easy to use (on/off), we are exploring the use of a Pixel Watch 3 so that we can set up the pipeline to collect both sets of data simultaneously. We can go back to the standalone audio watch should the Pixel Watch not allow for easy data collection.

We have been experimenting with the initial audio-processing step of the pipeline: classifying and labeling coughs in noisy environments, which would remove the need for human labelers, whose work previously took up a considerable amount of time. We are using publicly available datasets of labeled coughs recorded with environmental noise to test our input pipeline. The first approach converts the input signal into a spectrogram and then looks for signal power within the frequency bands that coughs typically fall under; various configurations of this approach yielded very low classification accuracy. A second approach, using machine learning with MobileNetV2 and the mel-spectrogram of the audio as input, yielded much better classification accuracy (around 92%). We still need to add labeling so that we can extract the coughs from a longer audio recording.
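
For illustration, a minimal sketch of that second approach follows: convert a clip to a mel-spectrogram with librosa, then feed it to an ImageNet-pretrained MobileNetV2 with a two-class head. The file path, mel settings and normalization are illustrative assumptions, not the project’s exact pipeline.

```python
# Sketch of mel-spectrogram + MobileNetV2 cough classification, assuming
# librosa, PyTorch and torchvision are installed.
import librosa
import numpy as np
import torch
import torchvision

def mel_input(path: str, sr: int = 16000) -> torch.Tensor:
    """Load audio and convert it to a 3-channel 224x224 mel-spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Scale to [0, 1] and tile to 3 channels for the ImageNet backbone.
    mel_db = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    x = torch.tensor(mel_db, dtype=torch.float32).unsqueeze(0).repeat(3, 1, 1)
    return torch.nn.functional.interpolate(
        x.unsqueeze(0), size=(224, 224), mode="bilinear", align_corners=False)

model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1")
model.classifier[1] = torch.nn.Linear(model.last_channel, 2)  # cough vs. not
model.eval()

with torch.no_grad():
    logits = model(mel_input("clip.wav"))  # "clip.wav" is a placeholder path
print(logits.softmax(dim=1))
```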

While the primary goal of the pilot is to build the front end of the data collection and create a dataset, we also wanted to take advantage of the large amounts of data available from previous studies of TB cough to start refining the models. We are using the TBScreen dataset to evaluate models that output a CASS positivity classification: currently ResNet50, a CNN-based model, and MobileViT, a transformer-based model. With a random sampling of the dataset into training and test sets, we see 75% accuracy with an AUROC of 82% for ResNet50, and 88% accuracy with an AUROC of 95% for MobileViT. However, random sampling creates data leakage; when we instead do a patient-based split, so that the training and test sets never contain data from the same patient, accuracy unfortunately drops to around 40%-60%, which essentially indicates random guessing. We are working on data augmentation and rearchitecting the models to improve accuracy under the appropriate data split.
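
A minimal sketch of the leakage-free, patient-level split, using scikit-learn’s GroupShuffleSplit with patient IDs as the grouping variable (the arrays here are toy placeholders, not the TBScreen data):

```python
# Patient-level split: all recordings from one patient land on the same
# side of the train/test boundary, preventing the leakage described above.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))            # e.g. spectrogram features (toy)
y = rng.integers(0, 2, size=1000)          # CASS positive/negative labels
patient_ids = rng.integers(0, 50, 1000)    # 50 patients, many clips each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient appears in both sets, so a model cannot score well merely by
# memorizing patient-specific acoustics or recording conditions.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```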

AI-generated characterization of landscape risk for disease emergence in Washington

Investigators
Julianne Meisner, Department of Global Health
John Y. Choe, Department of Industrial & Systems Engineering
Shwetak Patel, Allen School of Computer Science & Engineering
Peter Rabinowitz, Department of Environmental & Occupational Health Sciences
Beth Lipton, Washington State Department of Health

Project update
The goal of this project is to create high-resolution and dynamic datasets of key risk factors for pandemic emergence in Washington state, validate them with members of the Washington State One Health Collaborative and develop a computational framework for forecasting these datasets.

The first project aim is to create high-resolution and dynamic datasets of risk factors for pandemic emergence, focusing on three sets of pandemic drivers: biodiversity, human and animal movement, and land use. Our team has collated human and animal movement data from the Movebank and eBird databases and produced visualizations from these data. We have not been able to secure access to Strava data because access is opened to academic researchers only once per year; we are currently exploring agency access through collaboration with partners at the WA Department of Health. We have also learned through the Washington State One Health Collaborative Network that Pacific Northwest National Laboratory is working with eBird data to create risk maps for exposure to West Nile virus and avian influenza.

We are meeting with faculty from the Department of Biology on March 10 to learn more about their workflow for Map of Life data specific to biodiversity. For land use, our team met with students and postdocs from the School of Environment and Forest Sciences in January 2025 to learn more about their workflow for Landsat and VIIRS data and their work with Google Earth Engine. We have also met with colleagues from the WA Department of Health to discuss the best approach to characterizing land use along a spectrum of agricultural intensification. From this, we have aligned on a strategy to wrap up Aim 1 activities, which will be the focus of our work in the coming months. The team has developed an initial prototype built on the Google Earth Engine API, which we will continue to extend over the next couple of quarters.
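
As a rough sketch of what such an Earth Engine workflow can look like, the snippet below pulls a monthly VIIRS nighttime-lights composite over Washington state with the Earth Engine Python API; the dataset choices, date range and reducer are illustrative assumptions, not the team’s prototype.

```python
# Sketch of one dynamic risk-factor layer via the Earth Engine Python API.
import ee

ee.Initialize()  # assumes prior `earthengine authenticate`

# Washington state boundary from the public TIGER census asset.
wa = (ee.FeatureCollection('TIGER/2018/States')
      .filter(ee.Filter.eq('NAME', 'Washington'))
      .geometry())

# Monthly VIIRS nighttime lights as a coarse proxy for human activity.
viirs = (ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
         .filterDate('2024-01-01', '2025-01-01')
         .select('avg_rad')
         .mean()
         .clip(wa))

# Mean radiance over the state: one simple, dynamic risk-factor summary.
stats = viirs.reduceRegion(
    reducer=ee.Reducer.mean(), geometry=wa, scale=500, maxPixels=1e9)
print(stats.getInfo())
```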

The second project aim is to develop a spatio-temporal computational framework capable of forecasting future values of these risk factors; this work will be completed between September 2025 and January 2026.

More information about this funding opportunity can be found by visiting its funding page.