Spotlight Track: Accepted Papers
Presentation Format
Accepted papers will be presented in one of two spotlight poster sessions during the conference.
Sessions are numbered in chronological order. See the full program for the precise time and location of each spotlight poster session.
Spotlight Poster Session 1
Time: Day 2 (Jan 4) – Thursday – 5:00 PM to 6:30 PM
Low-Rank Matrix Completion Theory via Plücker Coordinates
Manolis C. Tsakiris
Keywords: algebraic geometry, Grassmannian, low-rank matrix completion, non-random observation patterns, Plücker coordinates
Variational Information Pursuit for Interpretable Predictions
Aditya Chattopadhyay, Kwan Ho Ryan Chan, Benjamin David Haeffele, Donald Geman, René Vidal
Keywords: Interpretable ML, Explainable AI, Information Pursuit
Classification Bias on a Data Diet
Tejas Pote, Mohammed Adnan, Yigit Yargic, Yani Ioannou
Keywords: data diet, model bias, classification bias, data pruning
Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity
Lu Yin, Shiwei Liu, Ajay Kumar Jaiswal, Souvik Kundu, Zhangyang Wang
Keywords: Junk DNA Hypothesis, low-magnitude weights, large-scale language models
FedNAR: Federated Optimization with Normalized Annealing Regularization
Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric Xing, Hongyi Wang
Keywords: Federated learning, weight decay, adaptive hyperparameters
Model Compression Beyond Size Reduction
Mubarek Mohammed
Keywords: Knowledge Distillation, Pruning, Model Compression, Neural Networks
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
Keywords: Transformer efficiency, activation sparsity, robustness, calibration
Block Coordinate Descent on Smooth Manifolds: Convergence Theory and Twenty-One Examples
Liangzu Peng, René Vidal
Keywords: Block Coordinate Descent, Alternating Minimization, Non-Convex Optimization, Manifold Optimization, Convergence Analysis
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli
Keywords: Implicit Bias, sparsity, SGD Dynamics, Implicit regularization, Learning rate schedule, Stochastic Gradient Descent, Invariant set, Attractive saddle points, Stochastic collapse, Permutation invariance, Simplicity bias, Teacher-student
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu
Keywords: Grokking, benign overfitting, deep learning
Neural Dependencies Emerging from Learning Massive Categories
Ruili Feng, Deli Zhao, Zheng-Jun Zha
Keywords: Deep Learning Theory, Interpretability
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
Xixu Hu
Keywords: Vision Transformer, Adversarial Robustness, Lipschitz Continuity, Computer Vision
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen
Keywords: Large Language Model, Efficient Inference, Sparsity
Towards a Better Theoretical Understanding of Independent Subnetwork Training
Egor Shulgin, Peter Richtárik
Keywords: Optimization, Distributed Learning, Independent Subnetwork Training, Federated Learning
Sparsity-aware generalization theory for deep neural networks
Ramchandran Muthukumar, Jeremias Sulam
Keywords: Generalization, Sparsity, Sensitivity, PAC-Bayes
GMRLNet: A graph-based manifold regularization learning framework for placental insufficiency diagnosis on incomplete multimodal ultrasound data
Jing Jiao, Yi Huang, Xiaokang Li, Yi Guo
Keywords: Manifold regularization learning, Incomplete multimodal learning, graph neural network, knowledge transfer, prenatal diagnosis
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Mykola Pechenizkiy, Yi Liang, Zhangyang Wang, Shiwei Liu
Keywords: Large language model, pruning, sparsity
Profiling and Pairing Catchments and Hydrological Models With Latent Factor Model
Yang Yang, Ting Fong May Chui
Keywords: Hydrological modeling, latent factor model, recommender system, machine learning
Model Sparsity Can Simplify Machine Unlearning
Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu
Keywords: Trustworthy AI, Sparsity, Privacy
Alternating Updates for Efficient Transformers
Cenk Baykal, Dylan J Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang
Keywords: efficiency, efficient transformers
Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Linear Networks
Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu
Keywords: implicit bias, low dimensional structures, deep linear networks
Dynamic Sparsity Is Channel-Level Sparsity Learner
Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu
Keywords: dynamic sparsity, dynamic sparse training, channel-level sparsity
Three-way trade-off in multi-objective learning: Optimization, generalization and conflict-avoidance
Lisha Chen, Heshan Devaka Fernando, Yiming Ying, Tianyi Chen
Keywords: multi-objective learning, generalization, algorithm stability, stochastic optimization
Sparse MoE with Language Guided Routing for Multilingual Machine Translation
Xinyu Zhao, Xuxi Chen, Yu Cheng, Tianlong Chen
Keywords: Sparse Mixture-of-Experts, Multilingual Machine Translation, Language Guided Routing
The Emergence of Reproducibility and Consistency in Diffusion Models
Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Liyue Shen, Qing Qu
Keywords: Diffusion model, consistent model reproducibility, phenomenon, uniquely identifiable encoding
Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective
Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki
Keywords: Graph Neural Network, Feature Learning, Graph Convolution, Deep Learning Theory, Benign Overfitting
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang
Keywords: Nonparametric classification, low dimensional manifolds, overparameterized ResNets, function approximation
$H_2O$: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen
Keywords: Large Language Models, Efficient Generative Inference
Sparse Mixture-of-Experts are Domain Generalizable Learners
Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu
Keywords: domain generalization, mixture-of-experts, algorithmic alignment, visual attributes
Principled and Efficient Transfer Learning of Deep Models via Neural Collapse
Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu
Keywords: representation learning, neural collapse, transfer learning
Spotlight Poster Session 2
Time: Day 3 (Jan 5) – Friday – 5:00 PM to 6:30 PM
Sparsity Enhances Non-Gaussian Data Statistics During Local Receptive Field Formation
William T Redman, Zhangyang Wang, Alessandro Ingrosso, Sebastian Goldt
Keywords: iterative magnitude pruning, sparse machine learning, statistics of internal representations, learning local receptive fields
Efficient Low-Dimensional Compression of Overparameterized Networks
Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu
Keywords: overparameterization, deep networks, low-dimensional modeling
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian, Yiping Wang, Beidi Chen, Simon Shaolei Du
Keywords: transformer, training dynamics, theoretical analysis, self-attention, interpretability, neural network understanding
High Probability Guarantees for Random Reshuffling
Hengxu Yu, Xiao Li
Keywords: random reshuffling, shuffled SGD, high-probability sample complexity, stopping criterion, last-iterate result
A Linearly Convergent GAN Inversion-based Algorithm for Reverse Engineering of Deceptions
Darshan Thaker, Paris Giampouras, René Vidal
Keywords: reverse engineering deceptions, GAN inversion, optimization, adversarial attacks, generative models, inverse problems
Simultaneous linear connectivity of neural networks modulo permutation
Ekansh Sharma, Devin Kwok, Tom Denton, Daniel M. Roy, David Rolnick, Gintare Karolina Dziugaite
Keywords: linear mode connectivity, loss landscape, permutation symmetry, iterative magnitude pruning, lottery ticket
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh
Keywords: Efficient ML, Pruning, optimization, sparsity
Dynamic Sparse Training with Structured Sparsity
Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou
Keywords: Machine Learning, dynamic sparse training, structured sparsity, N:M sparsity, efficient deep learning, RigL, SRigL, constant fan-in, dynamic neuron ablation, neuron ablation, structured and fine-grained sparsity, online inference, accelerating inference
Ultrafast Neural Estimation of Mutual Information
Zhengyang Hu, Song Kang, Qunsong Zeng, Kaibin Huang, Yanchao Yang
Keywords: Deep Learning, Efficient Mutual Information Estimation, Real-Time Correlation Computation, Maximum Correlation Coefficient
Understanding Hierarchical Representations in Deep Networks via Feature Compression and Discrimination
Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu
Keywords: representation learning, neural collapse, deep linear networks
Deep Neural Network Initialization with Sparsity Inducing Activations
Ilan Price, Nicholas Daultry Ball, Adam Christopher Jones, Samuel Chun Hei Lam, Jared Tanner
Keywords: Deep neural network, random initialisation, sparsity, Gaussian process
How Structured Data Guides Feature Learning: A Case Study of Sparse Parity Problem
Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, Denny Wu
Keywords: neural network optimization, representation learning, mean-field Langevin dynamics
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Kumar Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang
Keywords: Pre-trained Models, Sparsity, Emergence, Transformers, Pruning
Divided Attention: Unsupervised Multiple-object Discovery and Segmentation with Interpretable Contextually Separated Slots
Dong Lao, Zhengyang Hu, Francesco Locatello, Yanchao Yang, Stefano Soatto
Keywords: Moving object segmentation, Slot attention, Unsupervised object discovery
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Kumar Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang
Keywords: Compression, Large Language Models, Pruning, Quantization
On Bias-Variance Alignment in Deep Models
Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar
Keywords: bias-variance decomposition, ensemble, deep learning
Canonical Factors for Hybrid Neural Fields
Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma
Keywords: 3d representation learning, neural fields, NeRF, voxel grids, invariance, non-convex optimization
On Separability of Covariance in Multiway Data Analysis
Dogyoon Song, Alfred Hero
Keywords: Multiway data, Separable covariance, Kronecker PCA, Low-rank covariance model, Tensor decomposition, Frank-Wolfe method
Low Complexity Homeomorphic Projection to Ensure Neural-Network Solution Feasibility for Optimization over (Non-)Convex Set
Enming Liang, Minghua Chen, Steven Low
Keywords: Constraint optimization, Feasibility, Neural Network, Homeomorphism, Invertible Neural Network, Projection
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite
Keywords: large language model, scaling, pruning, sparsity
Generalized Neural Collapse for A Large Number of Classes
Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin G. Mixon, Chong You, Zhihui Zhu
Keywords: Neural Collapse, Tammes Problem, Sphere Packing, Deep Learning
Sparse MoE as a New Treatment: Addressing Forgetting, Fitting, Learning Issues in Multi-Modal Multi-Task Learning
Jie Peng, Kaixiong Zhou, Ruida Zhou, Thomas Hartvigsen, Yanyong Zhang, Zhangyang Wang, Tianlong Chen
Keywords: multi-task learning, multimodal learning, transformer
Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency
Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, Liyue Shen
Keywords: inverse problems, latent diffusion models
Unsupervised Manifold Linearizing and Clustering
Tianjiao Ding, Shengbang Tong, Kwan Ho Ryan Chan, Xili Dai, Yi Ma, Benjamin David Haeffele
Keywords: Clustering, Manifold Embedding, Manifold Clustering
Masked Completion via Structured Diffusion with White-Box Transformers
Druv Pai, Ziyang Wu, Sam Buchanan, Tianzhe Chu, Yaodong Yu, Yi Ma
Keywords: masked autoencoding, white-box transformers, coding rate reduction, representation learning
Approximately Equivariant Graph Networks
Ningyuan Teresa Huang, Ron Levie, Soledad Villar
Keywords: graph neural networks, equivariant machine learning, symmetry, generalization, statistical learning
Neural Collapse meets Differential Privacy: Curious behaviors of NoisySGD with Near-Perfect Representation Learning
Chendi Wang, Yuqing Zhu, Weijie J Su, Yu-Xiang Wang
Keywords: Neural collapse, differential privacy, representation learning
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du
Keywords: transformer, training dynamics, theoretical analysis, self-attention, interpretability, neural network understanding
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba, Murat A Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu
Keywords: random matrix theory, high-dimensional statistics, neural network, kernel method, representation learning
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
Nuoya Xiong, Lijun Ding, Simon Shaolei Du
Keywords: non-convex optimization, random initialization, global convergence, matrix recovery, matrix sensing
Robust Physics-based Deep MRI Reconstruction Via Diffusion Purification
Ismail Alkhouri, Shijun Liang, Rongrong Wang, Qing Qu, Saiprasad Ravishankar
Keywords: Robust MRI reconstruction, model-based deep learning, diffusion purification, computational imaging, machine learning
Neural Collapse in Multi-label Learning with Pick-all-label Loss
Pengyu Li, Yutong Wang, Xiao Li, Qing Qu
Keywords: Multi-label learning, Neural Collapse, Representation Learning