Conference on Parsimony and Learning (CPAL)
March 2025, Stanford

Spotlight Track: Accepted Papers

Accepted Spotlight Track papers are presented as posters at CPAL 2025. See the full program for the precise time and location of each poster session.

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang

Keywords: Minima Stability, Edge-of-Stability, Generalization, Flat Local Minima, Curvature

Principal Component Trees and their Persistent Homology

Ben Kizaric, Daniel L. Pimentel-Alarcón

Keywords: subspace clustering, low-rank decomposition, unsupervised learning, manifold learning, dimensionality reduction, topological data analysis

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

Keywords: LayerNorm, LLM, Transformer

Geometric Algebra Planes: Convex Implicit Neural Volumes

Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci

Keywords: Volume representation, tensor decomposition, convex optimization, geometric algebra, NeRF

Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

Peihao Wang, Ruisi Cai, Yuehao Wang, Jiajun Zhu, Pragya Srivastava, Zhangyang Wang, Pan Li

Keywords: State Space Models, Large Language Models, Recency, Over-smoothing

Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding

Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang

Keywords: Positional Encoding, Equivariant Machine Learning, Large Language Models

WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Shizhou Xu, Thomas Strohmer

Keywords: randomized controlled trial, Wasserstein homogeneity, anti-clustering, diverse K-means, control/test group splitting, cross-validation

Diffusion models learn low-dimensional distributions via subspace clustering

Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

Keywords: diffusion models, mixture of low-rank Gaussians, phase transition, subspace clustering

Generative Learning for Solving Non-Convex Problem with Multi-Valued Input-Solution Mapping

Enming Liang, Minghua Chen

Keywords: Non-convex Optimization, Generative Modeling, Flow, ODE

Attention-Only Transformers via Unrolled Subspace Denoising

Peng Wang, Yifu Lu, Yaodong Yu, Druv Pai, Qing Qu, Yi Ma

Keywords: transformer, self-attention, unrolled optimization, subspace denoising

Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence

Berfin Simsek, Amire Bendjeddou, Daniel Hsu

Keywords: time complexity, gradient flow dynamics, hardness

On Generalization Bounds for Neural Networks with Low Rank Layers

Andrea Pinto, Akshay Rangamani, Tomaso A Poggio

Keywords: Gaussian Complexity, Low Rank, Neural Collapse

Simplifying DINO by Coding Rate Regularization

Ziyang Wu, Jingyuan Zhang, Druv Pai, Yi Ma

Keywords: Representation Learning, Self Supervised Learning, Coding Rate

Knowledge-aware Parsimony Learning: A Perspective from Relational Graphs

Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

Keywords: scaling law, Parsimony Learning, Graph Learning

Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling

Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu

Keywords: diffusion representation learning, representation learning, diffusion model

Geometry of Concepts in Next-token Prediction: Neural-Collapse Meets Semantics

Yize Zhao, Christos Thrampoulidis

Keywords: Large Language Models (LLMs), Neural Embeddings, Word Embeddings, Neural-Collapse, Interpretability, Optimization

FlowDAS: A Flow-Based Framework for Data Assimilation

Siyi Chen, Yixuan Jia, Qing Qu, He Sun, Jeffrey A Fessler

Keywords: Data Assimilation, Stochastic Dynamic System, Flow matching, Stochastic Interpolants, Inverse Problem

What’s in a Prior? Learned Proximal Networks for Inverse Problems

Zhenghan Fang, Sam Buchanan, Jeremias Sulam

Keywords: Inverse problems, Proximal operators, Plug-and-play, Explicit regularizer, Convergent PnP, Input convex neural networks

Pruning neural network models for gene regulatory dynamics using data and domain knowledge

Intekhab Hossain, Jonas Fischer, Rebekka Burkholz, John Quackenbush

Keywords: sparsification, pruning, lottery tickets, explainability, gene regulation, domain knowledge, neural architecture design, NeuralODEs

Certified Robustness against Sparse Adversarial Perturbations via Data Localization

Ambar Pal, Rene Vidal, Jeremias Sulam

Keywords: Adversarial Robustness, Certified Robustness, Sparse Perturbations, Data Localization

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Weixin Liang, Lili Yu, Liang Luo, Srini Iyer, Ning Dong, Chunting Zhou, Gargi Ghosh, Mike Lewis, Wen-tau Yih, Luke Zettlemoyer, Xi Victoria Lin

Keywords: Sparse architecture, Efficient deep architecture, Multi-modal foundation models, Mixture-of-Experts, Transformer

Provable Probabilistic Imaging using Score-based Generative Priors

Yu Sun, Zihui Wu, Yifan Chen, Berthy Feng, Katherine Bouman

Keywords: Diffusion models, inverse problem, image reconstruction, Langevin dynamics, Markov processes, plug-and-play priors, posterior sampling, regularized inversion, score-based generative models, uncertainty quantification

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

Kaijie Zhu, Jiaao Chen, Jindong Wang, Neil Zhenqiang Gong, Diyi Yang, Xing Xie

Keywords: Large Language Models, Evaluation, Data Contamination

Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Mohammed Adnan, Rohan Jain, Ekansh Sharma, Yani Ioannou

Keywords: Lottery Ticket Hypothesis, sparse training, linear mode connectivity, weight symmetry, deep learning, deep neural networks, random initialization, git re-basin, optimization

Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models

Wenda Li, Huijie Zhang, Qing Qu

Keywords: diffusion model, watermark, low-dimensional subspace, consistency, robustness

A Robust Kernel Statistical Test of Invariance: Detecting Subtle Asymmetries

Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet

Keywords: Invariance, Hypothesis Testing, Kernel Methods

WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models

Jinghan Jia, Jiancheng Liu, Yihua Zhang, Parikshit Ram, Nathalie Baracaldo, Sijia Liu

Keywords: Machine Unlearning, LLMs

Masks, Signs, And Learning Rate Rewinding

Advait Gadhikar, Rebekka Burkholz

Keywords: sparsity, pruning, lottery tickets, learning rate rewinding, iterative magnitude pruning

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei

Keywords: attention sink, mechanistic interpretability, language models, transformers

Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Saket Tiwari, Omer Gottesman, George Konidaris

Keywords: reinforcement learning, continuous control, geometry

Out-of-distribution generalization via composition: a lens through induction heads in Transformers

Jiajun Song, Zhuoyan Xu, Yiqiao Zhong

Keywords: out-of-distribution generalization, low-dimensional subspace, composition, large language models, emergent ability, in-context learning

Dynamic Rescaling for Training GNNs

Nimrah Mustafa, Rebekka Burkholz

Keywords: graph neural network, rescale invariance, generalization, network balance

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Sergey Levine, Yi Ma

Keywords: foundation model post-training

Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis

Shirin Shoushtari, Jiaming Liu, Edward P. Chandler, M. Salman Asif, Ulugbek S. Kamilov

Keywords: Computational Imaging, Plug-and-Play Priors, Imaging Inverse Problems, Mismatched Priors, Domain Adaptation

Relaxed Contrastive Learning for Federated Learning

Seonguk Seo, Jinkyu Kim, Geeho Kim, Bohyung Han

Keywords: dimensional collapse, transferability, federated learning, local deviation

Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity

Weixin Liang, Junhong Shen, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu

Keywords: Sparse architecture, Efficient deep architecture, Multi-modal foundation models, Mixture-of-Experts, State Space Model

Training Bayesian Neural Networks with Sparse Subspace Variational Inference

Junbo Li, Zichen Miao, Qiang Qiu, Ruqi Zhang

Keywords: Bayesian neural networks, sparse Bayesian learning, variational inference

SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems

Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang

Keywords: Image Restoration, Diffusion Models, Inverse Problems

Learning with Exact Invariances in Polynomial Time

Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet

Keywords: Learning with Invariances, Kernels, Spectral Theory

Dependence Induced Representations

Xiangxiang Xu, Lizhong Zheng

Keywords: representation learning, statistical dependence, maximal correlation, minimal sufficiency, neural collapse

Unlocking Global Optimality in Bilevel Optimization: A Pilot Study

Quan Xiao, Tianyi Chen

Keywords: bilevel optimization, global convergence

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

Yize Zhao, Tina Behnia, Vala Vakilian, Christos Thrampoulidis

Keywords: language models, neural embeddings, optimization, implicit regularization, low-rank matrix factorization, support-vector machines

CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents

Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie

Keywords: LLM-based Agent, Agent Based Modeling, Competition

Image Reconstruction Via Autoencoding Sequential Deep Image Prior

Ismail Alkhouri, Shijun Liang, Evan Bell, Qing Qu, Rongrong Wang, Saiprasad Ravishankar

Keywords: Image Reconstruction, Deep Image Prior, Generative Models

Understanding How Nonlinear Networks Create Linearly Separable Features for Low-Dimensional Data

Alec S Xu, Can Yaras, Peng Wang, Qing Qu

Keywords: union of subspaces, shallow nonlinear networks, random feature model

On the Crucial Role of Initialization for Matrix Factorization

Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He

Keywords: nonconvex optimization, initialization, quadratic rate, low rank adapter, LoRA

Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

Dominik Stöger, Yizhe Zhu

Keywords: non-convex optimization, factorized gradient descent, matrix sensing, sample complexity, virtual sequences

Deep Neural Regression Collapse

Akshay Rangamani, Altay Unal

Keywords: Neural Collapse, Regression, Low Rank

Visual Prompting Reimagined: The Power of Activation Prompts

Yihua Zhang, Hongkang Li, Yuguang Yao, Aochuan Chen, Shuai Zhang, Pin-Yu Chen, Meng Wang, Sijia Liu

Keywords: visual prompt, parameter efficient finetuning, learning theory, generalization analysis

Primal-Dual Spectral Representation for Off-policy Evaluation

Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai

Keywords: reinforcement learning, off-policy evaluation, spectral representation, primal-dual representation

Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Avrajit Ghosh, Soo Min Kwon, Rongrong Wang, Saiprasad Ravishankar, Qing Qu

Keywords: edge of stability, deep linear networks

Characterizing ResNet’s Universal Approximation Capability

Chenghao Liu, Enming Liang, Minghua Chen

Keywords: universal approximation, ResNet, optimal approximation rate