Statistics and Data Science Seminars
Dec 03, 2025 01:00 PM
ZOOM

Host: Haotian Xu
SDS seminar’s website: https://auburn.edu/cosam/datascienceseminar/index.htm
DMS Statistics and Data Science (SDS) Seminar
Nov 19, 2025 01:00 PM
ZOOM

DMS Statistics and Data Science Seminar
Nov 12, 2025 01:00 PM
358 Parker Hall

First, LLMs are strongly biased: for example, they might (over)prefer the number 7 or certain names like Biden, and that bias comes straight from their training data.
DMS Statistics and Data Science Seminar
Oct 29, 2025 01:00 PM
ONLINE

Speaker: Weidong Ma (Univ. of Pennsylvania, Perelman School of Medicine, Biostatistics and Epidemiology)
Title: A Novel Framework for Addressing Disease Under-Diagnosis Using EHR Data
DMS Statistics and Data Science Seminar
Oct 22, 2025 01:00 PM
ONLINE

Speaker: Dmitrii Ostrovskii (Georgia Tech — Math and ISyE)
Title: Near-Optimal and Tractable Estimation under Shift-Invariance
Abstract: How hard is it to estimate a discrete-time signal \((x_1, \dots, x_n) \in \mathbb{C}^n\) satisfying an unknown linear recurrence relation of order \(s\) and observed in i.i.d. complex Gaussian noise? The class of all such signals is parametric but extremely rich: it contains all exponential polynomials over \(\mathbb{C}\) with total degree \(s\), including harmonic oscillations with \(s\) arbitrary frequencies. Geometrically, this class corresponds to the projection onto \(\mathbb{C}^n\) of the union of all shift-invariant subspaces of \(\mathbb{C}^{\mathbb{Z}}\) of dimension \(s\). We show that the statistical complexity of this class, as measured by the squared minimax radius of the \((1-\delta)\)-confidence \(\ell_2\)-ball, is nearly the same as for the class of \(s\)-sparse signals, namely \(O\big(s\log(en) + \log(\delta^{-1})\big) \cdot \log^2(es) \cdot \log(en/s)\). Moreover, the corresponding near-minimax estimator is tractable, and it can be used to build a test statistic with a near-minimax detection threshold in the associated detection problem. These statistical results rest upon an approximation-theoretic one: we show that finite-dimensional shift-invariant subspaces admit compactly supported reproducing kernels whose Fourier spectra have nearly the smallest possible \(\ell_p\)-norms, for all \(p \in [1, +\infty]\) at once.
Host: Haotian Xu
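As a small illustration of the signal class in the abstract above (not the speaker's estimator), the sketch below checks numerically that a sum of \(s\) complex harmonic oscillations satisfies a linear recurrence of order \(s\). The function names, frequencies, and amplitudes are hypothetical choices made for this example only; the recurrence coefficients are taken from the polynomial whose roots are \(e^{i\omega_k}\).

```python
import cmath

def harmonic_signal(freqs, amps, n):
    """x_t = sum_k amps[k] * exp(i * freqs[k] * t) for t = 0, ..., n-1."""
    return [sum(a * cmath.exp(1j * w * t) for a, w in zip(amps, freqs))
            for t in range(n)]

def recurrence_coeffs(freqs):
    """Coefficients [1, c_1, ..., c_s] of prod_k (z - exp(i * w_k)),
    listed from the highest degree down."""
    coeffs = [1 + 0j]
    for w in freqs:
        root = cmath.exp(1j * w)
        # Multiply the current polynomial by (z - root).
        new = coeffs + [0j]
        for j in range(1, len(new)):
            new[j] -= root * coeffs[j - 1]
        coeffs = new
    return coeffs

def recurrence_residuals(x, coeffs):
    """r_t = sum_j coeffs[j] * x[t - j] for t = s, ..., n-1.
    These vanish (up to rounding) when x obeys the recurrence."""
    s = len(coeffs) - 1
    return [sum(c * x[t - j] for j, c in enumerate(coeffs))
            for t in range(s, len(x))]

if __name__ == "__main__":
    freqs = [0.3, 1.1, 2.5]          # s = 3 arbitrary frequencies (illustrative)
    amps = [1.0, 0.5 - 0.2j, 2.0]    # arbitrary complex amplitudes (illustrative)
    x = harmonic_signal(freqs, amps, 50)
    res = recurrence_residuals(x, recurrence_coeffs(freqs))
    print(max(abs(r) for r in res))  # close to machine precision
```

In the estimation problem described in the abstract, the recurrence (and hence the frequencies) is unknown and the signal is observed in noise; the sketch only shows why such signals form a finite-parameter class.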
DMS Statistics and Data Science Seminar
Oct 08, 2025 01:00 PM
358 Parker Hall

Speaker: Prof. Bo Li (Department of Statistics and Data Science, Washington University in St. Louis)
Title: Spatially Varying Changepoint Detection with Application to Mapping the Impact of the Mount Pinatubo Eruption
Abstract: Significant events such as volcanic eruptions can exert global and long-lasting impacts on climate. These impacts, however, are not uniform across space and time. Motivated by the need to understand how the 1991 Mt. Pinatubo eruption influenced global and regional climate, we propose a Bayesian framework to simultaneously detect and estimate spatially varying temporal changepoints. Our approach accounts for the diffusive nature of volcanic effects and leverages spatial correlation. We then extend the changepoint detection problem to large-scale spherical spatiotemporal data and develop a scalable method for global applications. The framework enables Gibbs sampling for changepoints within MCMC, offering greater computational efficiency than the Metropolis–Hastings algorithm. To address the high dimensionality of global data, we incorporate spherical harmonic transformations, which substantially reduce the computational burden while preserving accuracy. We demonstrate the effectiveness of our method using both simulated datasets and real data on stratospheric aerosol optical depth and surface temperature to detect and estimate changepoints associated with the Mt. Pinatubo eruption.
DMS Statistics and Data Science Seminar
Sep 24, 2025 01:00 PM
ZOOM

Speaker: Xiaodong Li (University of California, Davis)
Title: Estimating SNR in High-Dimensional Linear Models: Robust REML and a Multivariate Method of Moments
Abstract: This talk presents two complementary approaches to estimating signal-to-noise ratios (and residual variances) in high-dimensional linear models, motivated by heritability analysis. First, I show that the REML estimator remains consistent and asymptotically normal under substantial model misspecification—fixed coefficients and heteroskedastic and possibly correlated errors. Second, I extend a method-of-moments framework to multivariate responses for both fixed- and random-effects models, deriving asymptotic distributions and heteroskedasticity-robust standard-error formulas. Simulations corroborate the theory and demonstrate strong finite-sample performance.
Host: Haoran Li
DMS Statistics and Data Science (SDS) Seminar
Sep 17, 2025 01:00 PM
352 Parker Hall

Speaker: Yin Tang (University of Kentucky)
Title: Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction
Abstract: We introduce a unified, flexible, and easy-to-implement framework of sufficient dimension reduction that can accommodate both linear and nonlinear dimension reduction, and both the conditional distribution and the conditional mean as the targets of estimation. This unified framework is achieved by a specially structured neural network -- the Belted and Ensembled Neural Network (BENN) -- that consists of a narrow latent layer, which we call the belt, and a family of transformations of the response, which we call the ensemble. By strategically placing the belt at different layers of the neural network, we can achieve linear or nonlinear sufficient dimension reduction, and by choosing the appropriate transformation families, we can achieve dimension reduction for the conditional distribution or the conditional mean. Moreover, thanks to the advantage of the neural network, the method is very fast to compute, overcoming a computational bottleneck of the traditional sufficient dimension reduction estimators, which involve the inversion of a matrix of dimension either p or n. We develop the algorithm and establish its convergence rate, compare it with existing sufficient dimension reduction methods, and apply it to two data examples.
https://arxiv.org/abs/2412.08961
Host: Haotian Xu
DMS Statistics and Data Science Seminar
Apr 23, 2025 02:00 PM
354 Parker Hall

Speaker: Dr. Andrés Felipe Barrientos (Assistant Professor, Department of Statistics, Florida State University)
Title: Bayesian nonparametric modeling of mixed-type bounded data
Abstract: We propose a Bayesian nonparametric model for mixed-type bounded data, where some variables are compositional and others are interval-bounded. Compositional variables are non-negative and sum to a given constant, such as the proportion of time an individual spends on different activities during the day or the fraction of different types of nutrients in a person's diet. Interval-bounded variables, on the other hand, are real numbers constrained by both a lower and an upper bound. Our approach relies on a novel class of random multivariate Bernstein polynomials, which induce a Dirichlet process mixture model of products of Dirichlet and beta densities. We study the theoretical properties of the model, including its topological support and posterior consistency. The model can be used for density and conditional density estimation, where both the response and predictors take values in the simplex space and/or hypercube. We illustrate the model's behavior through the analysis of simulated data and data from the 2005-2006 cycle of the U.S. National Health and Nutrition Examination Survey.
Joint work with Rufeng Liu, Claudia Wehrhahn, and Alejandro Jara.
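For readers unfamiliar with Bernstein polynomial densities, here is a minimal univariate sketch of the building block mentioned in the abstract above. It is a deliberately simplified assumption, not the authors' model: the weights are fixed rather than random, there is a single interval-bounded variable on \([0, 1]\), and no Dirichlet process is involved. It uses the standard fact that a Bernstein density of degree \(K\) is a mixture of Beta\((k, K-k+1)\) densities, \(k = 1, \dots, K\). The function name and the weights are hypothetical.

```python
import math

def bernstein_density(x, weights):
    """Bernstein polynomial density on [0, 1]:
        b(x) = sum_{k=1}^{K} w_k * Beta(x | k, K - k + 1),
    where the weights w_k are nonnegative and sum to one.
    For integer parameters, the Beta(k, K - k + 1) density equals
        K * C(K - 1, k - 1) * x^(k - 1) * (1 - x)^(K - k).
    """
    K = len(weights)
    return sum(
        w * K * math.comb(K - 1, k - 1) * x ** (k - 1) * (1 - x) ** (K - k)
        for k, w in enumerate(weights, start=1)
    )

if __name__ == "__main__":
    weights = [0.1, 0.2, 0.3, 0.4]   # illustrative mixture weights
    for x in (0.1, 0.5, 0.9):
        print(x, bernstein_density(x, weights))
```

With equal weights the density reduces to the uniform density on \([0, 1]\), which gives a quick sanity check. The model described in the abstract replaces the fixed weights with a random, multivariate construction over products of Dirichlet and beta densities.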
DMS Statistics and Data Science Seminar
Apr 16, 2025 02:00 PM
354 Parker Hall

Speaker: Dr. Jiwon Park (postdoctoral researcher, Department of Epidemiology, Johns Hopkins University)
Title: A Robust Pleiotropy Testing Method with Applications to Inflammatory Bowel Disease Subtypes with Sample Overlap
Abstract: Pleiotropy, where a genetic region influences multiple traits, is common in complex diseases and provides insight into shared biological mechanisms. However, identifying pleiotropic loci remains challenging, especially for correlated traits or case-control studies with overlapping samples. We present PLACO+, a statistical method for detecting pleiotropic associations using GWAS summary statistics from two traits. PLACO+ models a composite null hypothesis with an inflated variance structure, allowing for partial associations, and computes analytical p-values based on the distribution of the product of correlated Z-scores. Applied to genome-wide studies of inflammatory bowel disease (IBD) subtypes—Crohn’s disease and ulcerative colitis—PLACO+ identifies shared genetic loci missed by conventional approaches, particularly when effects are in opposite directions. These results demonstrate the utility of PLACO+ in uncovering novel pleiotropic signals in complex trait genetics.