Conference on Parsimony and Learning (CPAL)
January 2024, HKU

Spotlight Track: Accepted Papers

Presentation Format

Accepted papers will be presented in one of two spotlight poster sessions during the conference.

Sessions are numbered in chronological order. See the full program for the precise time and location of each spotlight presentation session.

Spotlight Poster Session 1

Time: Day 2 (Jan 4) – Thursday – 5:00 PM to 6:30 PM

Low-Rank Matrix Completion Theory via Plücker Coordinates

Manolis C. Tsakiris

Keywords: algebraic geometry, Grassmannian, low-rank matrix completion, non-random observation patterns, Plücker coordinates

Variational Information Pursuit for Interpretable Predictions

Aditya Chattopadhyay, Kwan Ho Ryan Chan, Benjamin David Haeffele, Donald Geman, Rene Vidal

Keywords: Interpretable ML, Explainable AI, Information Pursuit

Classification Bias on a Data Diet

Tejas Pote, Mohammed Adnan, Yigit Yargic, Yani Ioannou

Keywords: data diet, model bias, classification bias, data pruning

Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity

Lu Yin, Shiwei Liu, Ajay Kumar Jaiswal, Souvik Kundu, Zhangyang Wang

Keywords: Junk DNA Hypothesis, low-magnitude weights, large-scale language models

FedNAR: Federated Optimization with Normalized Annealing Regularization

Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric Xing, Hongyi Wang

Keywords: Federated learning, weight decay, adaptive hyperparameters

Model Compression Beyond Size Reduction

Mubarek Mohammed

Keywords: Knowledge Distillation, Pruning, Model Compression, Neural Networks

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

Keywords: Transformer efficiency, activation sparsity, robustness, calibration

Block Coordinate Descent on Smooth Manifolds: Convergence Theory and Twenty-One Examples

Liangzu Peng, Rene Vidal

Keywords: Block Coordinate Descent, Alternating Minimization, Non-Convex Optimization, Manifold Optimization, Convergence Analysis

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli

Keywords: Implicit Bias, sparsity, SGD Dynamics, Implicit regularization, Learning rate schedule, Stochastic Gradient Descent, Invariant set, Attractive saddle points, Stochastic collapse, Permutation invariance, Simplicity bias, Teacher-student

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu

Keywords: Grokking, benign overfitting, deep learning

Neural Dependencies Emerging from Learning Massive Categories

Ruili Feng, Deli Zhao, Zheng-Jun Zha

Keywords: Deep Learning Theory, Interpretability

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

Xixu Hu

Keywords: Vision Transformer, Adversarial Robustness, Lipschitz Continuity, Computer Vision

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen

Keywords: Large Language Models, Efficient Inference, Sparsity

Towards a Better Theoretical Understanding of Independent Subnetwork Training

Egor Shulgin, Peter Richtárik

Keywords: Optimization, Distributed Learning, Independent Subnetwork Training, Federated Learning

Sparsity-aware generalization theory for deep neural networks

Ramchandran Muthukumar, Jeremias Sulam

Keywords: Generalization, Sparsity, Sensitivity, PAC-Bayes

GMRLNet: A graph-based manifold regularization learning framework for placental insufficiency diagnosis on incomplete multimodal ultrasound data

Jing Jiao, Yi Huang, Xiaokang Li, Yi Guo

Keywords: Manifold regularization learning, Incomplete multimodal learning, graph neural network, knowledge transfer, prenatal diagnosis

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Mykola Pechenizkiy, Yi Liang, Zhangyang Wang, Shiwei Liu

Keywords: Large language model, pruning, sparsity

Profiling and Pairing Catchments and Hydrological Models With Latent Factor Model

Yang Yang, Ting Fong May Chui

Keywords: Hydrological modeling, latent factor model, recommender system, machine learning

Model Sparsity Can Simplify Machine Unlearning

Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu

Keywords: Trustworthy AI, Sparsity, Privacy

Alternating Updates for Efficient Transformers

Cenk Baykal, Dylan J Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang

Keywords: efficiency, efficient transformers

Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Linear Networks

Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu

Keywords: implicit bias, low dimensional structures, deep linear networks

Dynamic Sparsity Is Channel-Level Sparsity Learner

Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu

Keywords: dynamic sparsity, dynamic sparse training, channel-level sparsity

Three-way trade-off in multi-objective learning: Optimization, generalization and conflict-avoidance

Lisha Chen, Heshan Devaka Fernando, Yiming Ying, Tianyi Chen

Keywords: multi-objective learning, generalization, algorithm stability, stochastic optimization

Sparse MoE with Language Guided Routing for Multilingual Machine Translation

Xinyu Zhao, Xuxi Chen, Yu Cheng, Tianlong Chen

Keywords: Sparse Mixture-of-Experts, Multilingual Machine Translation, Language Guided Routing

The Emergence of Reproducibility and Consistency in Diffusion Models

Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Liyue Shen, Qing Qu

Keywords: Diffusion model, consistent model reproducibility, phenomenon, uniquely identifiable encoding

Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective

Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki

Keywords: Graph Neural Network, Feature Learning, Graph Convolution, Deep Learning Theory, Benign Overfitting

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

Keywords: Nonparametric Classification, Low Dimensional Manifolds, Overparameterized ResNets, Function Approximation

H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Re, Clark Barrett, Zhangyang Wang, Beidi Chen

Keywords: Large Language Models, Efficient Generative Inference

Sparse Mixture-of-Experts are Domain Generalizable Learners

Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu

Keywords: domain generalization, mixture-of-experts, algorithmic alignment, visual attributes

Principled and Efficient Transfer Learning of Deep Models via Neural Collapse

Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu

Keywords: representation learning, neural collapse, transfer learning

Spotlight Poster Session 2

Time: Day 3 (Jan 5) – Friday – 5:00 PM to 6:30 PM

Sparsity Enhances Non-Gaussian Data Statistics During Local Receptive Field Formation

William T Redman, Zhangyang Wang, Alessandro Ingrosso, Sebastian Goldt

Keywords: iterative magnitude pruning, sparse machine learning, statistics of internal representations, learning local receptive fields

Efficient Low-Dimensional Compression of Overparameterized Networks

Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu

Keywords: overparameterization, deep networks, low-dimensional modeling

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer

Yuandong Tian, Yiping Wang, Beidi Chen, Simon Shaolei Du

Keywords: transformer, training dynamics, theoretical analysis, self-attention, interpretability, neural network understanding

High Probability Guarantees for Random Reshuffling

Hengxu Yu, Xiao Li

Keywords: random reshuffling, shuffled SGD, high-probability sample complexity, stopping criterion, last-iterate result

A Linearly Convergent GAN Inversion-based Algorithm for Reverse Engineering of Deceptions

Darshan Thaker, Paris Giampouras, Rene Vidal

Keywords: reverse engineering deceptions, GAN inversion, optimization, adversarial attacks, generative models, inverse problems

Simultaneous linear connectivity of neural networks modulo permutation

Ekansh Sharma, Devin Kwok, Tom Denton, Daniel M. Roy, David Rolnick, Gintare Karolina Dziugaite

Keywords: linear mode connectivity, loss landscape, permutation symmetry, iterative magnitude pruning, lottery ticket

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

Keywords: Efficient ML, Pruning, optimization, sparsity

Dynamic Sparse Training with Structured Sparsity

Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou

Keywords: Machine Learning, dynamic sparse training, structured sparsity, N:M sparsity, efficient deep learning, RigL, SRigL, constant fan-in, dynamic neuron ablation, neuron ablation, structured and fine-grained sparsity, online inference, accelerating inference

Ultrafast Neural Estimation of Mutual Information

Zhengyang Hu, Song Kang, Qunsong Zeng, Kaibin Huang, Yanchao Yang

Keywords: Deep Learning, Efficient Mutual Information Estimation, Real-Time Correlation Computation, Maximum Correlation Coefficient

Understanding Hierarchical Representations in Deep Networks via Feature Compression and Discrimination

Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu

Keywords: representation learning, neural collapse, deep linear networks

Deep Neural Network Initialization with Sparsity Inducing Activations

Ilan Price, Nicholas Daultry Ball, Adam Christopher Jones, Samuel Chun Hei Lam, Jared Tanner

Keywords: Deep neural network, random initialisation, sparsity, gaussian process

How Structured Data Guides Feature Learning: A Case Study of Sparse Parity Problem

Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, Denny Wu

Keywords: neural network optimization, representation learning, mean-field Langevin dynamics

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

Ajay Kumar Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang

Keywords: Pre-trained Models, Sparsity, Emergence, Transformers, Pruning

Divided Attention: Unsupervised Multiple-object Discovery and Segmentation with Interpretable Contextually Separated Slots

Dong Lao, Zhengyang Hu, Francesco Locatello, Yanchao Yang, Stefano Soatto

Keywords: Moving object segmentation, Slot attention, Unsupervised object discovery

Compressing LLMs: The Truth is Rarely Pure and Never Simple

Ajay Kumar Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang

Keywords: Compression, Large Language Models, Pruning, Quantization

On Bias-Variance Alignment in Deep Models

Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar

Keywords: bias-variance decomposition, ensemble, deep learning

Canonical Factors for Hybrid Neural Fields

Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma

Keywords: 3d representation learning, neural fields, NeRF, voxel grids, invariance, non-convex optimization

On Separability of Covariance in Multiway Data Analysis

Dogyoon Song, Alfred Hero

Keywords: Multiway data, Separable covariance, Kronecker PCA, Low-rank covariance model, Tensor decomposition, Frank-Wolfe method

Low Complexity Homeomorphic Projection to Ensure Neural-Network Solution Feasibility for Optimization over (Non-)Convex Set

Enming Liang, Minghua Chen, Steven Low

Keywords: Constraint optimization, Feasibility, Neural Network, Homeomorphism, Invertible Neural Network, Projection

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

Keywords: large language model, scaling, pruning, sparsity

Generalized Neural Collapse for A Large Number of Classes

Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin G. Mixon, Chong You, Zhihui Zhu

Keywords: Neural Collapse, Tammes Problem, Sphere Packing, Deep Learning

Sparse MoE as a New Treatment: Addressing Forgetting, Fitting, Learning Issues in Multi-Modal Multi-Task Learning

Jie Peng, Kaixiong Zhou, Ruida Zhou, Thomas Hartvigsen, Yanyong Zhang, Zhangyang Wang, Tianlong Chen

Keywords: multi-task learning, multimodal learning, transformer

Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency

Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, Liyue Shen

Keywords: inverse problems, latent diffusion models

Unsupervised Manifold Linearizing and Clustering

Tianjiao Ding, Shengbang Tong, Kwan Ho Ryan Chan, Xili Dai, Yi Ma, Benjamin David Haeffele

Keywords: Clustering, Manifold Embedding, Manifold Clustering

Masked Completion via Structured Diffusion with White-Box Transformers

Druv Pai, Ziyang Wu, Sam Buchanan, Tianzhe Chu, Yaodong Yu, Yi Ma

Keywords: masked autoencoding, white-box transformers, coding rate reduction, representation learning

Approximately Equivariant Graph Networks

Ningyuan Teresa Huang, Ron Levie, Soledad Villar

Keywords: graph neural networks, equivariant machine learning, symmetry, generalization, statistical learning

Neural Collapse meets Differential Privacy: Curious behaviors of NoisySGD with Near-Perfect Representation Learning

Chendi Wang, Yuqing Zhu, Weijie J Su, Yu-Xiang Wang

Keywords: Neural collapse, differential privacy, representation learning

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du

Keywords: transformer, training dynamics, theoretical analysis, self-attention, interpretability, neural network understanding

Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Jimmy Ba, Murat A Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu

Keywords: random matrix theory, high-dimensional statistics, neural network, kernel method, representation learning

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

Nuoya Xiong, Lijun Ding, Simon Shaolei Du

Keywords: non-convex optimization, random initialization, global convergence, matrix recovery, matrix sensing

Robust Physics-based Deep MRI Reconstruction Via Diffusion Purification

Ismail Alkhouri, Shijun Liang, Rongrong Wang, Qing Qu, Saiprasad Ravishankar

Keywords: Robust MRI reconstruction, model-based deep learning, diffusion purification, computational imaging, machine learning

Neural Collapse in Multi-label Learning with Pick-all-label Loss

Pengyu Li, Yutong Wang, Xiao Li, Qing Qu

Keywords: Multi-label learning, Neural Collapse, Representation Learning