Conference on Parsimony and Learning (CPAL)
March 2025, Stanford

Proceedings Track: Accepted Papers

Presentation Format

Accepted Proceedings Track papers are presented as posters at CPAL 2025. A select number are also presented as orals; these are labeled below with (Oral). See the full program for the precise time and location of each oral and poster session.

Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture

Shijin Duan, Yejia Liu, Gaowen Liu, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu

Keywords: Vector Symbolic Architecture, Batch Normalization, Knowledge Distillation

SGD with Weight Decay Secretly Minimizes the Ranks of Your Neural Networks

Tomer Galanti, Zachary S Siegel, Aparna Gupte, Tomaso A Poggio

Keywords: Low-Rank, SGD, Implicit Bias, Rank, Rank Minimization, Weight Decay

Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

Can Yaras, Siyi Chen, Peng Wang, Qing Qu

Keywords: multimodal learning, modality gap, contrastive learning

Collaborative and Efficient Personalization with Mixtures of Adaptors

Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč

Keywords: federated learning, personalization, multi-task learning, clustering, parameter-efficient

Are all layers created equal: A neural collapse perspective

Jinxin Zhou, Jiachen Jiang, Zhihui Zhu

Keywords: Deep Learning, Neural Collapse, Robustness, Generalization, Memorization, Understanding

White-box Error Correction Code Transformer

Ziyan Zheng, Chin Wa Lau, Nian Guo, Xiang Shi, Shao-Lun Huang

Keywords: Error Correction Codes, Neural Decoder, White-box Transformer, Sparse Rate Reduction, Tanner Graph

On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks

William T Redman, Zhangyang Wang, Alessandro Ingrosso, Sebastian Goldt

Keywords: iterative magnitude pruning, lottery tickets, sparse machine learning, Gaussian statistics

Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets (Oral)

Arthur Jacot, Alexandre Kaiser

Keywords: Low-rank bias, NeuralODE, Hamiltonian, Bottleneck structure

Streaming Kernel PCA Algorithm With Small Space

Yichuan Deng, Jiangxuan Long, Zhao Song, Zifan Wang, Han Zhang

Keywords: Principal Component Analysis, Kernel Method, Streaming Algorithm

Sufficient and Necessary Explanations (and What Lies in Between) (Oral)

Beepul Bharti, Paul Yi, Jeremias Sulam

Keywords: interpretability, explainability

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers (Oral)

Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

Keywords: N:M structured sparsity, sparsity, model compression, attention-based models, sparse training recipe

AgentHPO: Large Language Model Agent for Hyper-Parameter Optimization

Siyi Liu, Chen Gao, Yong Li

Keywords: Large Language Models, Agent, Hyperparameter Optimization

Sparse MoE as a New Treatment: Addressing Forgetting, Fitting, Learning Issues in Multi-Modal Multi-Task Learning

Jie Peng, Sukwon Yun, Kaixiong Zhou, Ruida Zhou, Thomas Hartvigsen, Yanyong Zhang, Zhangyang Wang, Tianlong Chen

Keywords: transformer, sparse mixture-of-experts, multi-modal learning, multi-task learning

Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks

Wei Huang, Wuyang Chen, Zhiqiang Xu, Zhangyang Wang, Taiji Suzuki

Keywords: Neural network dynamics, Feature Learning, Optimization

Vanishing Feature: Diagnosing Model Merging and Beyond (Oral)

Xingyu Qu, Samuel Horváth

Keywords: Model Merging, Efficiency, Deep Learning, Efficient Deep Learning

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

Zhenyu Zhang, Ajay Kumar Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

Keywords: Large Language Models, Memory Efficient Training, Low Rank

Enhancing Video Representation Learning with Temporal Differentiation

Siyi Chen, Minkyu Choi, Zesen Zhao, Kuan Han, Qing Qu, Zhongming Liu

Keywords: video representation learning, physics-inspired

FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Xue Feng, M. Paul Laiu, Thomas Strohmer

Keywords: federated learning, quasi-Newton methods, Anderson acceleration

Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning (Oral)

Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos

Keywords: Closure Discovery, Inductive Bias, Multi-Agent Reinforcement Learning

Fast and Efficient Matching Algorithm with Deadline Instances

Zhao Song, Weixin Wang, Chenbo Yin, Junze Yin

Keywords: online weighted matching problem, sketching

Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows

Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Keywords: Learned Effective Dynamics, Reduced-Order Modeling, Multiscale Systems, Turbulent Flows

RecCrysFormer: Refined Protein Structural Prediction from 3D Patterson Maps via Recycling Training Runs

Tom Pan, Evan Dramko, Mitchell D. Miller, George N Phillips Jr., Anastasios Kyrillidis

Keywords: Protein Structural Prediction, Transformers, Patterson Maps

Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization

Dongwei Wang, Huanrui Yang

Keywords: LLM quantization, Hessian trace, Noise-aware finetuning

Adversarially Robust Spiking Neural Networks with Sparse Connectivity

Mathias Schmolli, Maximilian Baronig, Robert Legenstein, Ozan Ozdenizci

Keywords: adversarial robustness, spiking neural networks, ANN-to-SNN conversion, sparsity, robust pruning

Quantum EigenGame for excited state calculation

David A. Quiroga, Jason Han, Anastasios Kyrillidis

Keywords: variational quantum algorithms, PCA, EigenGame, eigensolvers

Improving Neuron-level Interpretability with White-box Language Models (Oral)

Hao Bai, Yi Ma

Keywords: White-box models, deep learning architectures, neuron-level interpretation

You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time (Oral)

Xiaotian Han, Tianlong Chen, Kaixiong Zhou, Zhimeng Jiang, Zhangyang Wang, Xia Hu

Keywords: fairness, weight space, neural network subspace

Grouped Sequential Optimization Strategy - the Application of Hyperparameter Importance Assessment in Deep Learning

Ruinan Wang, Ian T. Nabney, Mohammad Golbabaee

Keywords: Optimization, Hyperparameter Optimization, Hyperparameter Importance Assessment, Model Efficiency, Search Space Exploration, Resource Allocation

The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity (Oral)

Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Keywords: State-Space Models, Mamba, Circuit Complexity, Computational Limits

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

Keywords: Time Series Forecasting, Transformer Generalization, Kernel Methods

Asymptotic Behavior of the Coordinate Ascent Variational Inference in Singular Models

Sean C Plummer, Anirban Bhattacharya, Debdeep Pati, Yun Yang

Keywords: Coordinate Ascent Variational Inference, Singular Models, Dynamical Systems

Theoretical and Empirical Advances in Forest Pruning

Albert Dorador

Keywords: Regression, Decision Trees, Ensemble Learning, Pruning, Interpretable Machine Learning

Bridging Domain Adaptation and Graph Neural Networks: A Tensor-Based Framework for Effective Label Propagation

Tao Wen, Elynn Chen, Yuzhou Chen, Qi Lei

Keywords: Graph Classification, Domain Adaptation, Label Propagation

Unlock the Theory behind Scaling 1-bit Neural Networks

Majid Daliri, Zhao Song, Chiwun Yang

Keywords: 1-bit neural network, neural tangent kernel, scaling law theory

MoXCo: How I learned to stop exploring and love my local minima?

Esha Singh, Shoham Sabach, Yu-Xiang Wang

Keywords: optimization, deep learning, adaptive methods

Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining

Jianwei Li, Yijun Dong, Qi Lei

Keywords: Efficient, Structured Pruning, LLMs

A unified framework for Sparse plus Low-Rank Matrix Decomposition for LLMs (Oral)

Mehdi Makni, Kayhan Behdin, Zheng Xu, Natalia Ponomareva, Rahul Mazumder

Keywords: model compression, sparse plus low-rank, optimization, inference acceleration, 2:4 sparsity, hardware and system co-design

FedPeWS: Personalized Warmup via Subnetworks for Enhanced Heterogeneous Federated Learning

Nurbek Tastan, Samuel Horváth, Martin Takáč, Karthik Nandakumar

Keywords: federated learning, heterogeneous federated learning, personalized warmup, subnetworks

Concept Bottleneck Model with Zero Performance Loss

Zhenzhen Wang, Aleksander Popel, Jeremias Sulam

Keywords: interpretability, explainability, concept bottleneck model, concept explanations

Meta ControlNet: Enhancing Task Adaptation via Meta Learning

Junjie Yang, Jinze Zhao, Peihao Wang, Zhangyang Wang, Yingbin Liang

Keywords: Meta Learning, Diffusion Models, Generalization

Provable Model-Parallel Distributed Principal Component Analysis with Parallel Deflation

Fangshuo Liao, Wenyi Su, Anastasios Kyrillidis

Keywords: Principal Component Analysis, Distributed Learning

Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation

Suman Sapkota, Binod Bhattarai

Keywords: Sparse Architectures, Structured Sparsity, Butterfly Sparsity, Butterfly MLP, Butterfly Attention, Long Range Arena (LRA), Solving Pathfinder-X, Patch Only MLP-Mixer, Dimension Mixer

Dual Reasoning: A GNN-LLM Collaborative Framework for Knowledge Graph Question Answering

Guangyi Liu, Yongqi Zhang, Yong Li, Quanming Yao

Keywords: Large Language Model, Knowledge Graph, Question Answering

A Validation Approach to Over-parameterized Matrix and Image Recovery

Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu

Keywords: Matrix recovery, low-rank, validation, gradient descent, nonconvex optimization

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

Abulikemu Abuduweili, Changliu Liu

Keywords: Optimization, Adam, Adaptive Gradient Descent, Neural Networks

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism

Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar

Keywords: Distributed training, adaptive batch size, data parallelism, model parallelism

Heterogeneous Decision Making in Mixed Traffic: Uncertainty-aware Planning and Bounded Rationality

Hang Wang, Qiaoyi Fang, Junshan Zhang

Keywords: Mixed Traffic, Reinforcement Learning, Planning, Bounded Rationality

Do Global and Local Perform Cooperatively or Adversarially in Heterogeneous Federated Learning?

Huiwen Wu, Shuo Zhang

Keywords: federated learning, multilevel optimization, learning dynamics

A Case Study of Low Ranked Self-Expressive Structures in Neural Network Representations (Oral)

Uday Singh Saini, William Shiao, Yahya Sattar, Yogesh Dahiya, Samet Oymak, Evangelos E. Papalexakis

Keywords: Subspace Clustering, Centered Kernel Alignment, Representation Similarity Measures

AdaProx: A Novel Method for Bilevel Optimization under Pessimistic Framework

Ziwei Guan, Daouda Sow, Sen Lin, Yingbin Liang

Keywords: pessimistic bilevel optimization, convergence analysis, nonconvex, gradient-based method

HSR-Enhanced Sparse Attention Acceleration

Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

Keywords: Half-Space Reporting, Attention Acceleration, Sparse Attention

Learning of Patch-Based Smooth-Plus-Sparse Models for Image Reconstruction

Stanislas Ducotterd, Sebastian Neumayer, Michael Unser

Keywords: Image reconstruction, sparsity, dictionary learning, deep equilibrium

Large-Scale Multiway Clustering with Seeded Clustering

Jiaxin Hu

Keywords: scalable algorithm, time complexity, space complexity, large-scale data, tensor clustering, seeded clustering

Fast John Ellipsoid Computation with Differential Privacy Optimization (Oral)

Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu

Keywords: Fast Optimization, Differential Privacy, John Ellipsoid Computation

Approximate Nullspace Augmented Finetuning for Robust Vision Transformers (Oral)

Haoyang Liu, Aditya Singh, Yijiang Li, Haohan Wang

Keywords: Robustness, Vision Transformer, Invariance