Armeen Taeb
Assistant Professor
Department of Statistics
College of Arts and Sciences
ataeb@uw.edu
Taeb Faculty page
What is your Research Focus?
My research lies at the intersection of optimization and statistics. Broadly speaking, my work focuses on developing efficient methods for extracting useful and reliable information from complex datasets. One theme of my research is latent-variable modeling. Suppose you obtain measurements of a collection of variables — these could be water levels of a collection of reservoirs in California, or medical images of a collection of patients together with whether they have a particular disease — and we would like to understand the relationships among these variables. In the first case, in order to make more informed policies, we would like to understand how an increase in the water level of one reservoir affects the water levels of others. In the second case, we would like to learn a model that predicts from a medical image whether a patient has a disease.

Frequently, in trying to understand the relationships or interactions among these variables, some relevant variables are latent, or unobserved. For instance, in the reservoir example, there may be geopolitical factors that are not directly observable but influence the way reservoir levels vary. In the medical imaging example, clinicians are typically reluctant to trust models unless the factors identified for prediction are interpretable; however, many interpretable factors, such as the shape or size of an organ, are not directly observed and may be unknown. The question is: by looking at the original set of variables that we do get to observe, can we infer the existence of hidden effects? By accounting for these, we can discover interpretable and accurate models governing the original set of variables. We are working on developing efficient optimization-based tools to solve this problem, with applications to California reservoir modeling and medical imaging.
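One way to see the mathematical fingerprint a hidden variable leaves behind: in a Gaussian graphical model, marginalizing out a latent variable subtracts a low-rank term (a Schur complement) from an otherwise sparse precision matrix, so the observed variables look densely coupled even when their direct interactions are simple. The numbers below are entirely hypothetical; this is a minimal numpy sketch of that "sparse plus low-rank" phenomenon, not the estimation methodology itself.

```python
import numpy as np

# Hypothetical model: 4 observed variables with NO direct interactions
# (diagonal observed block) plus one latent variable coupled to all of them.
p = 4
K_oo = np.eye(p) * 2.0          # observed block of the precision matrix: sparse
K_oh = np.full((p, 1), 0.6)     # latent-observed couplings
K_hh = np.array([[2.0]])        # latent block

# Marginalizing out the latent variable (Schur complement) yields the
# precision matrix of the observed variables alone:
#   K_marg = K_oo - K_oh @ K_hh^{-1} @ K_ho   (sparse minus low-rank)
low_rank = K_oh @ np.linalg.inv(K_hh) @ K_oh.T
K_marg = K_oo - low_rank

print(np.linalg.matrix_rank(low_rank))   # 1: one hidden effect, rank-one term
print(np.count_nonzero(K_marg))          # 16: the marginal model looks dense
```

Recovering the sparse and low-rank pieces from data is what makes it possible to infer how many hidden effects are present and how they act on the observed variables.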
Another theme of my research is using causal reasoning to learn prediction models that generalize to unseen environments or domains. As a motivating example, suppose we are running a multi-center study of a new vaccine, where we gather data on various features of individuals (e.g., height, gender, weight) as well as the effectiveness of the vaccine for these individuals. Our goal is to learn an accurate prediction model that uses features of individuals to predict how effective the vaccine will be. Naturally, we are only able to obtain data across a few sub-populations, yet we want our prediction model to work well for many sub-populations that are potentially very different from what we have observed! Thankfully, causality provides a mathematical foundation for modeling perturbations, interventions, and changes to a system of variables. Using causal models, we can identify prediction models that extrapolate to unseen environments.
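A toy simulation of this idea (all data synthetic, not from any study): a regression coefficient on a genuinely causal feature stays invariant across environments, while a fit that exploits a non-causal feature shifts when the environment changes. This sketches the invariance principle behind causality-based generalization under assumed mechanisms, not any particular published method.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, shift):
    # Hypothetical system: x1 causes y, while x2 is a downstream effect of y
    # whose mechanism ("shift") differs across environments.
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = shift * y + rng.normal(size=n)
    return x1, x2, y

coefs_causal, coefs_both = [], []
for shift in (0.5, 2.0):                      # two observed training environments
    x1, x2, y = simulate(20000, shift)
    # Regress y on the causal feature alone...
    coefs_causal.append((x1 @ y) / (x1 @ x1))
    # ...versus on both features, including the non-causal one.
    X = np.column_stack([x1, x2])
    coefs_both.append(np.linalg.lstsq(X, y, rcond=None)[0])

print(coefs_causal)  # stable across environments (near the causal coefficient 2.0)
print(coefs_both)    # environment-dependent: the x2 shortcut does not transfer
```

The model using only the causal feature transfers to environments it never saw, whereas the ostensibly better-fitting model that leans on x2 breaks as soon as the mechanism for x2 changes.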
The final theme of my research is model selection in non-traditional settings. As a motivating example, suppose we have a ranking of hospitals in Washington from a few years ago, and the state of Washington allocated resources according to this ranking. Now, new data on metrics that assess clinical performance is available, and the goal is to update the previous ranking using this data; based on the newly obtained ranking, the state would update its existing resource infrastructure. Naturally, we want any changes (i.e., discoveries) to the original ranking to be accurate, since changes to the infrastructure based on an incorrect ranking can be very costly. How do we develop a model selection procedure that ensures we make few 'false discoveries'? Following the standard paradigm of multiple hypothesis testing, we might test the pairwise ranking of all the hospitals. Unfortunately, we run into a problem: simply combining hypothesis tests based on pairwise comparisons may lead to incompatibilities and violate transitivity. Similar issues arise in clustering and causal learning, where models must satisfy global constraints such as transitivity and acyclicity. Thus, in these sophisticated modeling paradigms, we need a new formalism for measuring true and false discoveries, and approaches that control them. It turns out that tools from combinatorics and order theory are very helpful in addressing this challenge! Specifically, we can organize classes of models as partially ordered sets, which leads to systematic approaches for defining analogs of true and false discoveries, as well as methodology for controlling false discoveries.
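The transitivity problem is already visible with three hospitals: each pairwise test can individually look reasonable while the three together rule out every possible ranking. A minimal sketch, with a hypothetical set of pairwise "discoveries":

```python
from itertools import permutations

# Hypothetical outcomes of three pairwise hypothesis tests on hospitals
# A, B, C: each tuple (x, y) means "x ranked above y" was declared.
discoveries = [("A", "B"), ("B", "C"), ("C", "A")]

def consistent_rankings(items, pairs):
    # A total order is compatible iff every declared pair respects it.
    return [order for order in permutations(items)
            if all(order.index(x) < order.index(y) for x, y in pairs)]

print(consistent_rankings("ABC", discoveries))  # []: the declared pairs form a
                                                # cycle, so no ranking exists
```

Dropping the last discovery leaves exactly one compatible ranking, A above B above C, which is why error control has to be defined over whole orderings (partially ordered sets of models) rather than over pairwise decisions in isolation.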
What opportunities at the UW excite you?
I am excited by the close interaction between faculty and students in statistics, biostatistics, mathematics, and electrical engineering. The university has a number of centers, such as the eScience Institute and the IFDS Institute, that bring together people from all of these areas who are broadly working on data science and its applications. While we may have different backgrounds, we share the common language of mathematics and an excitement for developing data-driven methods to advance science and technology. So even though we may be working on different things, this common language and shared interest enable us to talk to each other. I think the opportunity to learn from someone else is greatest when you share broadly the same language but bring a different perspective. You get exposed to very different ways of thinking.