Statistics and Data Science Seminars

Upcoming Statistics and Data Science Seminars

Past Statistics and Data Science Seminars

DMS Statistics and Data Science (SDS) Seminar
Feb 25, 2026 02:00 PM
358 Parker Hall

Speaker: Jiajin Sun (Florida State University, Department of Statistics)

Title: Efficient Analysis of Latent Spaces in Heterogeneous Networks

Abstract: This work proposes a unified framework for efficient estimation under latent space modeling of heterogeneous networks. We consider a class of latent space models that decompose latent vectors into shared and network-specific components across networks. We develop a novel procedure that first identifies the shared latent vectors and further refines estimates through efficient score equations to achieve statistical efficiency. Oracle error rates for estimating the shared and heterogeneous latent vectors are established simultaneously. The analysis framework offers remarkable flexibility, accommodating various types of edge weights under general distributions.

Speaker Bio: Jiajin Sun is an Assistant Professor of Statistics at Florida State University. He earned his Ph.D. in Statistics at Columbia University in 2024. His research spans network analysis, high-dimensional statistics, and semiparametric statistics.

SDS seminar’s website: https://auburn.edu/cosam/datascienceseminar/index.htm

DMS Statistics and Data Science Seminar
Feb 11, 2026 02:00 PM
358 Parker Hall

Speaker: Sayar Karmakar (University of Florida, Department of Statistics)

Title: Epidemic Changepoints: Applications in spatial anomaly detection and localizing LLM watermarks

Abstract: We present epidemic change-points as a unifying lens for two localization problems: (i) detecting spatial anomalies and (ii) segmenting watermarked regions in mixed-source text. For spatial data, we formalize a "spatial" change-point as an anomalous region (an epidemic in space), provide detection-accuracy results for single and multiple breaks, and propose a block-based scan that delivers substantial computational savings with guarantees. Next, we move to a seemingly unrelated but very pertinent topic.

As large language models proliferate, ensuring content provenance has become a statistical challenge. For the problem of localizing modified text segments, we introduce WISER, a fast epidemic-segmentation approach with finite-sample error bounds and consistency for multiple watermarked segments, and we demonstrate empirical gains over state-of-the-art baselines on benchmark datasets.

We emphasize how classical changepoint ideas tailored to epidemic and transient departures yield principled, scalable solutions to modern problems in text provenance and spatial anomaly detection. Simulations and empirical studies corroborate the theory and point to open questions for PhD-level research.

Joint work with Soham Bonnerjee and Subhrajyoty Roy (watermarks) and with Soham Bonnerjee and George Michailidis (spatial anomaly).

Speaker Bio: Sayar Karmakar is an Assistant Professor of Statistics at the University of Florida. His research spans high-dimensional time series, changepoints, spatial and spatiotemporal data, econometrics and applied probability.

DMS Statistics and Data Science (SDS) Seminar
Dec 03, 2025 01:00 PM
ZOOM

Speaker: Carlos-Misael Madrid-Padilla (Washington University in St. Louis)

Title: Temporal Spatial Model via Trend Filtering

Abstract: This talk explores the estimation of nonparametric regression functions in the presence of temporal-spatial dependencies. In such a context, trend filtering, a nonparametric estimator, is studied. To the best of our knowledge, this estimator has not previously been examined in a similar context. In the univariate case, the signals considered are assumed to have a k-th weak derivative with bounded total variation, allowing for a general degree of smoothness. In the multivariate setting, we study a variant of the K-nearest neighbor fused lasso estimator. For this case, the function is required to have bounded variation and satisfy a property that extends a piecewise Lipschitz continuity criterion, or the function is assumed to be piecewise Lipschitz. To enable efficient computation, we develop an ADMM algorithm. By aligning with lower bounds, the minimax optimality of the univariate and multivariate estimators is shown. A phase transition phenomenon, not previously observed in trend filtering studies, emerges through the analysis. Both simulation studies and real data applications underscore the superior performance of the method when compared with established techniques in the existing literature.

Bio:

Carlos-Misael Madrid-Padilla is a tenure-track Assistant Professor in the Department of Statistics and Data Science at Washington University in St. Louis. He earned a Ph.D. in Mathematics at the Department of Mathematics at the University of Notre Dame under the supervision of Dr. Daren Wang. During the first two years of his Ph.D., he received a master's degree in mathematics under the supervision of Dr. Alex Himonas. His research interests include high-dimensional statistics, nonparametric statistics, change point detection, causal inference, Bayes methodology, etc.

Host: Haotian Xu

DMS Statistics and Data Science (SDS) Seminar
Nov 19, 2025 01:00 PM
ZOOM

Speaker: Ruizhi Zhang (University of Georgia)

Title: Robust Sequential Change Detection: The Approach Based on Breakdown Points and Influence Functions

Abstract: Sequential change-point detection has many important applications in industrial quality control, signal detection, and clinical trials. However, many classical procedures may fail when the observed data are contaminated by outliers, even if the percentage of outliers is very small. In this paper, we focus on the problem of robust sequential change-point detection in the presence of a small proportion of random outliers. We first study the statistical detection properties of a general family of detection procedures under Huber’s gross error model. Moreover, we incorporate ideas of the breakdown point and the influence function from the classical offline robust statistics literature and propose their new definitions to quantify the robustness of general sequential change-point detection procedures. Then, we derive the breakdown points and influence functions of our proposed family of detection procedures, which provide a quantitative analysis of the robustness of these procedures. Moreover, we find the optimal robust bounded-influence procedure in that general family that has the smallest detection delay subject to the constraints on the false alarm rate influence function. It turns out the optimal procedure is based on the truncation of the scaled likelihood ratio statistic and has a simple form. Finally, we demonstrate the robustness and the detection efficiency of the optimal robust bounded-influence procedure through extensive simulations and compute numerical approximations of breakdown points and influence functions of some procedures to have a quantitative understanding of the robustness of different procedures.

Bio:

Ruizhi Zhang is an Associate Professor in the Department of Statistics at the University of Georgia, Athens. Before that, he was an Assistant Professor in the Department of Statistics at the University of Nebraska-Lincoln. He received his B.S. degree in Mathematics, graduating with honors, from the Hua Loo-Keng Talent Program in Mathematics at the University of Science and Technology of China (USTC) in 2014. He received his Ph.D. degree in Industrial Engineering from the School of Industrial and Systems Engineering at Georgia Institute of Technology in 2019, where he was co-advised by Prof. Yajun Mei and Prof. Jianjun Shi. His research interests include change-point detection, sequential analysis, robust statistics, and high-dimensional statistical inference.

Host: Haotian Xu

DMS Statistics and Data Science Seminar
Nov 12, 2025 01:00 PM
358 Parker Hall

Speaker: Dr. Anh Nguyen (Computer Science Department, Auburn University)

Title: How to make Vision Language Models see and explain themselves

Abstract: Large Language Models, or LLMs, with their massive world knowledge learned from text, have completely changed the game. They’ve introduced a new era: vision-language models (VLMs). In these models, images and text live in the same representation space, and instead of predicting from a fixed set of labels, they draw predictions from an open vocabulary. In this talk, I will walk you through three challenges of integrating vision capabilities into LLMs.

First, LLMs are strongly biased: for example, they might over-prefer the number 7 or certain names like Biden, and that bias comes straight from their training data.

Second, language bias is usually seen as a blessing that helps models generalize beyond training data, but it is also a curse in vision tasks that demand careful, detailed image analysis.

And third, it turns out that VLMs do not have very good "eyesight" when given tests similar to human eye exams. Because of this, VLMs can sometimes behave in ways we don’t expect, which calls for an interface that allows humans to understand the thought process of VLMs. However, there is not yet a natural way to explain VLM decisions on an image, the way chain-of-thought does for text.

I’ll share my proposal for a general Explainable Bottleneck, and our implementation of Part-Based Explainable and Editable Bottleneck (PEEB) networks. In fine-grained image classification, PEEB not only explains its predictions by describing each visual part of an object, but also lets users reprogram the classifier’s logic using natural language, right at test time.

DMS Statistics and Data Science Seminar
Oct 29, 2025 01:00 PM
ONLINE

Speaker: Weidong Ma (Univ. of Pennsylvania, Perelman School of Medicine, Biostatistics and Epidemiology)

Title: A Novel Framework for Addressing Disease Under-Diagnosis Using EHR Data

Abstract: Effective treatment of medical conditions begins with an accurate diagnosis. However, many conditions are often underdiagnosed, either being overlooked or diagnosed after significant delays. Electronic Health Records (EHRs) contain extensive patient health information, offering an opportunity to probabilistically identify underdiagnosed individuals. The rationale is that both diagnosed and underdiagnosed patients may display similar health profiles in EHR data, distinguishing them from condition-free patients. Thus, EHR data can be leveraged to develop models that assess an individual’s risk of having a condition. To date, this opportunity has largely remained unexploited, partly due to the lack of suitable statistical methods. The key challenge is the positive-unlabeled EHR data structure, which consists of data for diagnosed ("positive") patients and for the remaining ("unlabeled") patients, who include underdiagnosed patients and many condition-free patients. Therefore, data for patients who are unambiguously condition-free, essential for developing risk assessment models, are unavailable. To overcome this challenge, we propose ascertaining condition statuses for a small subset of unlabeled patients. We develop a novel statistical method for building accurate models using this supplemented EHR data to estimate the probability that a patient has the condition of interest. Building on the developed risk prediction model, we further study the potential factors that may contribute to under-diagnosis. Numerical simulation studies and real data applications are conducted to assess the performance of the proposed methods.

Host: Huan He

DMS Statistics and Data Science Seminar
Oct 22, 2025 01:00 PM
ONLINE

Speaker: Dmitrii Ostrovskii (Georgia Tech — Math and ISyE)

Title: Near-Optimal and Tractable Estimation under Shift-Invariance

Abstract: How hard is it to estimate a discrete-time signal \((x_1, \dots, x_n) \in \mathbb{C}^n\) satisfying an unknown linear recurrence relation of order \(s\) and observed in i.i.d. complex Gaussian noise? The class of all such signals is parametric but extremely rich: It contains all exponential polynomials over \(\mathbb{C}\) with total degree \(s\), including harmonic oscillations with \(s\) arbitrary frequencies. Geometrically, this class corresponds to the projection onto \(\mathbb{C}^n\) of the union of all shift-invariant subspaces of \(\mathbb{C}^{\mathbb{Z}}\) of dimension \(s\). We show that the statistical complexity of this class, as measured by the squared minimax radius of the \((1-\delta)\)-confidence \(\ell_2\)-ball, is nearly the same as for the class of \(s\)-sparse signals, namely \(O\left(s\log(en) + \log(\delta^{-1})\right) \cdot \log^2(es) \cdot \log(en/s)\). Moreover, the corresponding near-minimax estimator is tractable, and it can be used to build a test statistic with a near-minimax detection threshold in the associated detection problem. These statistical results rest upon an approximation-theoretic one: We show that finite-dimensional shift-invariant subspaces admit compactly supported reproducing kernels whose Fourier spectra have nearly the smallest possible \(\ell_p\)-norms, for all \(p \in [1, +\infty]\) at once.

Host: Haotian Xu

DMS Statistics and Data Science Seminar
Oct 08, 2025 01:00 PM
358 Parker Hall

Speaker: Prof. Bo Li (Department of Statistics and Data Science, Washington University in St. Louis)

Title: Spatially Varying Changepoint Detection with Application to Mapping the Impact of the Mount Pinatubo Eruption

Abstract: Significant events such as volcanic eruptions can exert global and long-lasting impacts on climate. These impacts, however, are not uniform across space and time. Motivated by the need to understand how the 1991 Mt. Pinatubo eruption influenced global and regional climate, we propose a Bayesian framework to simultaneously detect and estimate spatially varying temporal changepoints. Our approach accounts for the diffusive nature of volcanic effects and leverages spatial correlation. We then extend the changepoint detection problem to large-scale spherical spatiotemporal data and develop a scalable method for global applications. The framework enables Gibbs sampling for changepoints within MCMC, offering greater computational efficiency than the Metropolis–Hastings algorithm. To address the high dimensionality of global data, we incorporate spherical harmonic transformations, which further substantially reduce computational burden while preserving accuracy. We demonstrate the effectiveness of our method using both simulated datasets and real data on stratospheric aerosol optical depth and surface temperature to detect and estimate changepoints associated with the Mt. Pinatubo eruption.

DMS Statistics and Data Science Seminar
Sep 24, 2025 01:00 PM
ZOOM

Speaker: Xiaodong Li (University of California, Davis)

Title: Estimating SNR in High-Dimensional Linear Models: Robust REML and a Multivariate Method of Moments

Abstract: This talk presents two complementary approaches to estimating signal-to-noise ratios (and residual variances) in high-dimensional linear models, motivated by heritability analysis. First, I show that the REML estimator remains consistent and asymptotically normal under substantial model misspecification—fixed coefficients and heteroskedastic and possibly correlated errors. Second, I extend a method-of-moments framework to multivariate responses for both fixed- and random-effects models, deriving asymptotic distributions and heteroskedasticity-robust standard-error formulas. Simulations corroborate the theory and demonstrate strong finite-sample performance.

Host: Haoran Li

DMS Statistics and Data Science (SDS) Seminar
Sep 17, 2025 01:00 PM
352 Parker Hall

Speaker: Yin Tang (University of Kentucky)

Title: Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction

Abstract: We introduce a unified, flexible, and easy-to-implement framework of sufficient dimension reduction that can accommodate both linear and nonlinear dimension reduction, and both the conditional distribution and the conditional mean as the targets of estimation. This unified framework is achieved by a specially structured neural network -- the Belted and Ensembled Neural Network (BENN) -- that consists of a narrow latent layer, which we call the belt, and a family of transformations of the response, which we call the ensemble. By strategically placing the belt at different layers of the neural network, we can achieve linear or nonlinear sufficient dimension reduction, and by choosing the appropriate transformation families, we can achieve dimension reduction for the conditional distribution or the conditional mean. Moreover, thanks to the advantage of the neural network, the method is very fast to compute, overcoming a computation bottleneck of the traditional sufficient dimension reduction estimators, which involves the inversion of a matrix of dimension either p or n. We develop the algorithm and convergence rate of our method, compare it with existing sufficient dimension reduction methods, and apply it to two data examples.

https://arxiv.org/abs/2412.08961

Host: Haotian Xu
