Conference on Parsimony and Learning (CPAL)
March 2025, Stanford

Poster Sessions at CPAL 2025

Presentation Format

All accepted papers at CPAL 2025, from both the Proceedings and Spotlight tracks, will be presented as posters at the conference. A select number of Proceedings-track papers will also be presented as orals, as listed on the orals page.

Sessions are numbered in chronological order. See the full program for the precise time and location of each poster session.

Reception + Poster Session 1

Time: Day 2 (Mar 25) – Tuesday – 4:45 PM to 6:15 PM

1. Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

Keywords: N:M structured sparsity, sparsity, model compression, attention-based models, sparse training recipe

2. Sparse MoE as a New Treatment: Addressing Forgetting, Fitting, Learning Issues in Multi-Modal Multi-Task Learning

Jie Peng, Sukwon Yun, Kaixiong Zhou, Ruida Zhou, Thomas Hartvigsen, Yanyong Zhang, Zhangyang Wang, Tianlong Chen

Keywords: transformer, sparse mixture-of-experts, multi-modal learning, multi-task learning

3. Theoretical and Empirical Advances in Forest Pruning

Albert Dorador

Keywords: Regression, Decision Trees, Ensemble Learning, Pruning, Interpretable Machine Learning

4. On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks

William T Redman, Zhangyang Wang, Alessandro Ingrosso, Sebastian Goldt

Keywords: iterative magnitude pruning, lottery tickets, sparse machine learning, gaussian statistics

5. Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation

Suman Sapkota, Binod Bhattarai

Keywords: Sparse Architectures, Structured Sparsity, Butterfly Sparsity, Butterfly MLP, Butterfly Attention, Long Range Arena (LRA), Solving Pathfinder-X, Patch Only MLP-Mixer, Dimension Mixer

6. HSR-Enhanced Sparse Attention Acceleration

Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

Keywords: Half-Space Reporting, Attention Acceleration, Sparse Attention

7. Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization

Dongwei Wang, Huanrui Yang

Keywords: LLM quantization, Hessian trace, Noise-aware finetuning

8. Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining

Jianwei Li, Yijun Dong, Qi Lei

Keywords: Efficient, Structured Pruning, LLMs

9. Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

Zhenyu Zhang, Ajay Kumar Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

Keywords: Large Language Models, Memory Efficient Training, Low Rank

10. Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism

Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar

Keywords: Distributed training, adaptive batch size, data parallelism, model parallelism

11. A unified framework for Sparse plus Low-Rank Matrix Decomposition for LLMs

Mehdi Makni, Kayhan Behdin, Zheng Xu, Natalia Ponomareva, Rahul Mazumder

Keywords: model compression, sparse plus low-rank, optimization, inference acceleration, 2:4 sparsity, hardware and system co-design

12. Unlock the Theory behind Scaling 1-bit Neural Networks

Majid Daliri, Zhao Song, Chiwun Yang

Keywords: 1-bit neural network, neural tangent kernel, scaling law theory

13. Adversarially Robust Spiking Neural Networks with Sparse Connectivity

Mathias Schmolli, Maximilian Baronig, Robert Legenstein, Ozan Ozdenizci

Keywords: adversarial robustness, spiking neural networks, ANN-to-SNN conversion, sparsity, robust pruning

14. SGD with Weight Decay Secretly Minimizes the Ranks of Your Neural Networks

Tomer Galanti, Zachary S Siegel, Aparna Gupte, Tomaso A Poggio

Keywords: Low-Rank, SGD, Implicit Bias, Rank, Rank Minimization, Weight Decay

15. Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

Mohammed Adnan, Rohan Jain, Ekansh Sharma, Yani Ioannou

Keywords: Lottery Ticket Hypothesis, sparse training, linear mode connectivity, weight symmetry, deep learning, deep neural networks, random initialization, git re-basin, optimization

16. Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Weixin Liang, Lili Yu, Liang Luo, Srini Iyer, Ning Dong, Chunting Zhou, Gargi Ghosh, Mike Lewis, Wen-tau Yih, Luke Zettlemoyer, Xi Victoria Lin

Keywords: Sparse architecture, Efficient deep architecture, Multi-modal foundation models, Mixture-of-Experts, Transformer

17. Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity

Weixin Liang, Junhong Shen, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu

Keywords: Sparse architecture, Efficient deep architecture, Multi-modal foundation models, Mixture-of-Experts, State Space Model

18. Training Bayesian Neural Networks with Sparse Subspace Variational Inference

Junbo Li, Zichen Miao, Qiang Qiu, Ruqi Zhang

Keywords: Bayesian neural networks, sparse Bayesian learning, variational inference

19. WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models

Jinghan Jia, Jiancheng Liu, Yihua Zhang, Parikshit Ram, Nathalie Baracaldo, Sijia Liu

Keywords: Machine Unlearning, LLMs

20. Masks, Signs, And Learning Rate Rewinding

Advait Gadhikar, Rebekka Burkholz

Keywords: sparsity, pruning, lottery tickets, learning rate rewinding, iterative magnitude pruning

21. Streaming Kernel PCA Algorithm With Small Space

Yichuan Deng, Jiangxuan Long, Zhao Song, Zifan Wang, Han Zhang

Keywords: Principal Component Analysis, Kernel Method, Streaming Algorithm

22. Collaborative and Efficient Personalization with Mixtures of Adaptors

Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč

Keywords: federated learning, personalization, multi-task learning, clustering, parameter-efficient

23. Pruning neural network models for gene regulatory dynamics using data and domain knowledge

Intekhab Hossain, Jonas Fischer, Rebekka Burkholz, John Quackenbush

Keywords: sparsification, pruning, lottery tickets, explainability, gene regulation, domain knowledge, neural architecture design, NeuralODEs

24. Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture

Shijin Duan, Yejia Liu, Gaowen Liu, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu

Keywords: Vector Symbolic Architecture, Batch Normalization, Knowledge Distillation

25. Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

Keywords: LayerNorm, LLM, Transformer

26. Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Haoyang Liu, Aditya Singh, Yijiang Li, Haohan Wang

Keywords: Robustness, Vision Transformer, Invariance

27. Learning of Patch-Based Smooth-Plus-Sparse Models for Image Reconstruction

Stanislas Ducotterd, Sebastian Neumayer, Michael Unser

Keywords: Image reconstruction, sparsity, dictionary learning, deep equilibrium

28. Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis

Shirin Shoushtari, Jiaming Liu, Edward P. Chandler, M. Salman Asif, Ulugbek S. Kamilov

Keywords: Computational Imaging, Plug-and-Play Priors, Imaging Inverse Problems, Mismatched Priors, Domain Adaptation

29. Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang

Keywords: Minima Stability, Edge-of-Stability, Generalization, Flat Local Minima, Curvature

30. Certified Robustness against Sparse Adversarial Perturbations via Data Localization

Ambar Pal, Rene Vidal, Jeremias Sulam

Keywords: Adversarial Robustness, Certified Robustness, Sparse Perturbations, Data Localization

31. The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity

Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Keywords: State-Space Models, Mamba, Circuit Complexity, Computational Limits

32. Fast John Ellipsoid Computation with Differential Privacy Optimization

Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu

Keywords: Fast Optimization, Differential Privacy, John Ellipsoid Computation

33. Understanding How Nonlinear Networks Create Linearly Separable Features for Low-Dimensional Data

Alec S Xu, Can Yaras, Peng Wang, Qing Qu

Keywords: union of subspaces, shallow nonlinear networks, random feature model

34. On Generalization Bounds for Neural Networks with Low Rank Layers

Andrea Pinto, Akshay Rangamani, Tomaso A Poggio

Keywords: Gaussian Complexity, Low Rank, Neural Collapse

35. Fast and Efficient Matching Algorithm with Deadline Instances

Zhao Song, Weixin Wang, Chenbo Yin, Junze Yin

Keywords: online weighted matching problem, sketching

Poster Session 2

Time: Day 3 (Mar 26) – Wednesday – 4:45 PM to 6:15 PM

1. AdaProx: A Novel Method for Bilevel Optimization under Pessimistic Framework

Ziwei Guan, Daouda Sow, Sen Lin, Yingbin Liang

Keywords: pessimistic bilevel optimization, convergence analysis, nonconvex, gradient-based method

2. Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

Abulikemu Abuduweili, Changliu Liu

Keywords: Optimization, Adam, Adaptive Gradient Descent, Neural Networks

3. Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks

Wei Huang, Wuyang Chen, Zhiqiang Xu, Zhangyang Wang, Taiji Suzuki

Keywords: Neural network dynamics, Feature Learning, Optimization

4. Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets

Arthur Jacot, Alexandre Kaiser

Keywords: Low-rank bias, NeuralODE, Hamiltonian, Bottleneck structure

5. Quantum EigenGame for excited state calculation

David A. Quiroga, Jason Han, Anastasios Kyrillidis

Keywords: variational quantum algorithms, PCA, EigenGame, eigensolvers

6. Asymptotic Behavior of the Coordinate Ascent Variational Inference in Singular Models

Sean C Plummer, Anirban Bhattacharya, Debdeep Pati, Yun Yang

Keywords: Coordinate Ascent Variational Inference, Singular Models, Dynamical Systems

7. Grouped Sequential Optimization Strategy - the Application of Hyperparameter Importance Assessment in Deep Learning

Ruinan Wang, Ian T. Nabney, Mohammad Golbabaee

Keywords: Optimization, Hyperparameter Optimization, Hyperparameter Importance Assessment, Model Efficiency, Search Space Exploration, Resource Allocation

8. Provable Model-Parallel Distributed Principal Component Analysis with Parallel Deflation

Fangshuo Liao, Wenyi Su, Anastasios Kyrillidis

Keywords: Principal Component Analysis, Distributed Learning

9. AgentHPO: Large Language Model Agent for Hyper-Parameter Optimization

Siyi Liu, Chen Gao, Yong Li

Keywords: Large Language Models, Agent, Hyperparameter Optimization

10. FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Xue Feng, M. Paul Laiu, Thomas Strohmer

Keywords: federated learning, quasi-Newton methods, Anderson acceleration

11. Unlocking Global Optimality in Bilevel Optimization: A Pilot Study

Quan Xiao, Tianyi Chen

Keywords: bilevel optimization, global convergence

12. On the Crucial Role of Initialization for Matrix Factorization

Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He

Keywords: nonconvex optimization, initialization, quadratic rate, low rank adapter, lora

13. Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Saket Tiwari, Omer Gottesman, George Konidaris

Keywords: reinforcement learning, continuous control, geometry

14. Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence

Berfin Simsek, Amire Bendjeddou, Daniel Hsu

Keywords: time complexity, gradient flow dynamics, hardness

15. Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Avrajit Ghosh, Soo Min Kwon, Rongrong Wang, Saiprasad Ravishankar, Qing Qu

Keywords: edge of stability, deep linear networks

16. Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity

Dominik Stöger, Yizhe Zhu

Keywords: non-convex optimization, factorized gradient descent, matrix sensing, sample complexity, virtual sequences

17. Relaxed Contrastive Learning for Federated Learning

Seonguk Seo, Jinkyu Kim, Geeho Kim, Bohyung Han

Keywords: dimensional collapse, transferability, federated learning, local deviation

18. Do Global and Local Perform Cooperatively or Adversarially in Heterogeneous Federated Learning?

Huiwen Wu, Shuo Zhang

Keywords: federated learning, multilevel optimization, learning dynamics

19. FedPeWS: Personalized Warmup via Subnetworks for Enhanced Heterogeneous Federated Learning

Nurbek Tastan, Samuel Horváth, Martin Takáč, Karthik Nandakumar

Keywords: federated learning, heterogeneous federated learning, personalized warmup, subnetworks

20. Characterizing ResNet’s Universal Approximation Capability

Chenghao Liu, Enming Liang, Minghua Chen

Keywords: universal approximation, ResNet, optimal approximation rate

21. A Validation Approach to Over-parameterized Matrix and Image Recovery

Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu

Keywords: Matrix recovery, low-rank, validation, gradient descent, nonconvex optimization

22. WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Shizhou Xu, Thomas Strohmer

Keywords: randomized controlled trial, Wasserstein homogeneity, anti-clustering, diverse K-means, control/test group splitting, cross-validation

23. What’s in a Prior? Learned Proximal Networks for Inverse Problems

Zhenghan Fang, Sam Buchanan, Jeremias Sulam

Keywords: Inverse problems, Proximal operators, Plug-and-play, Explicit regularizer, Convergent PnP, Input convex neural networks

24. Provable Probabilistic Imaging using Score-based Generative Priors

Yu Sun, Zihui Wu, Yifan Chen, Berthy Feng, Katherine Bouman

Keywords: Diffusion models, inverse problem, image reconstruction, langevin dynamics, markov processes, plug-and-play priors, posterior sampling, regularized inversion, score-based generative models, uncertainty quantification

25. Principal Component Trees and their Persistent Homology

Ben Kizaric, Daniel L. Pimentel-Alarcón

Keywords: subspace clustering, low-rank decomposition, unsupervised learning, manifold learning, dimensionality reduction, topological data analysis

26. FlowDAS: A Flow-Based Framework for Data Assimilation

Siyi Chen, Yixuan Jia, Qing Qu, He Sun, Jeffrey A Fessler

Keywords: Data Assimilation, Stochastic Dynamic System, Flow matching, Stochastic Interpolants, Inverse Problem

27. Large-Scale Multiway Clustering with Seeded Clustering

Jiaxin Hu

Keywords: scalable algorithm, time complexity, space complexity, large-scale data, tensor clustering, seeded clustering

28. Are all layers created equal: A neural collapse perspective

Jinxin Zhou, Jiachen Jiang, Zhihui Zhu

Keywords: Deep Learning, Neural Collapse, Robustness, Generalization, Memorization, Understanding

29. Geometry of Concepts in Next-token Prediction: Neural-Collapse Meets Semantics

Yize Zhao, Christos Thrampoulidis

Keywords: Large Language Models (LLMs), Neural Embeddings, Word Embeddings, Neural-Collapse, Interpretability, Optimization

30. Deep Neural Regression Collapse

Akshay Rangamani, Altay Unal

Keywords: Neural Collapse, Regression, Low Rank

31. Geometric Algebra Planes: Convex Implicit Neural Volumes

Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci

Keywords: Volume representation, tensor decomposition, convex optimization, geometric algebra, nerf

32. A Robust Kernel Statistical Test of Invariance: Detecting Subtle Asymmetries

Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet

Keywords: Invariance, Hypothesis Testing, Kernel Methods

33. Learning with Exact Invariances in Polynomial Time

Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet

Keywords: Learning with Invariances, Kernels, Spectral Theory

34. Primal-Dual Spectral Representation for Off-policy Evaluation

Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai

Keywords: reinforcement learning, off-policy evaluation, spectral representation, primal-dual representation

35. Dependence Induced Representations

Xiangxiang Xu, Lizhong Zheng

Keywords: representation learning, statistical dependence, maximal correlation, minimal sufficiency, neural collapse

36. MoXCo: How I learned to stop exploring and love my local minima?

Esha Singh, Shoham Sabach, Yu-Xiang Wang

Keywords: optimization, deep learning, adaptive methods

Coffee Break + Poster Session 3

Time: Day 4 (Mar 27) – Thursday – 11:00 AM to 12:30 PM

1. Improving Neuron-level Interpretability with White-box Language Models

Hao Bai, Yi Ma

Keywords: White-box models, deep learning architectures, neuron-level interpretation

2. Vanishing Feature: Diagnosing Model Merging and Beyond

Xingyu Qu, Samuel Horváth

Keywords: Model Merging, Efficiency, Deep Learning, Efficient Deep Learning

3. A Case Study of Low Ranked Self-Expressive Structures in Neural Network Representations

Uday Singh Saini, William Shiao, Yahya Sattar, Yogesh Dahiya, Samet Oymak, Evangelos E. Papalexakis

Keywords: Subspace Clustering, Centered Kernel Alignment, Representation Similarity Measures

4. You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time

Xiaotian Han, Tianlong Chen, Kaixiong Zhou, Zhimeng Jiang, Zhangyang Wang, Xia Hu

Keywords: fairness, weight space, neural network subspace

5. RecCrysFormer: Refined Protein Structural Prediction from 3D Patterson Maps via Recycling Training Runs

Tom Pan, Evan Dramko, Mitchell D. Miller, George N Phillips Jr., Anastasios Kyrillidis

Keywords: Protein Structural Prediction, Transformers, Patterson Maps

6. Dual Reasoning: A GNN-LLM Collaborative Framework for Knowledge Graph Question Answering

Guangyi Liu, Yongqi Zhang, Yong Li, Quanming Yao

Keywords: Large Language Model, Knowledge Graph, Question Answering

7. Meta ControlNet: Enhancing Task Adaptation via Meta Learning

Junjie Yang, Jinze Zhao, Peihao Wang, Zhangyang Wang, Yingbin Liang

Keywords: Meta Learning, Diffusion Models, Generalization

8. Bridging Domain Adaptation and Graph Neural Networks: A Tensor-Based Framework for Effective Label Propagation

Tao Wen, Elynn Chen, Yuzhou Chen, Qi Lei

Keywords: Graph Classification, Domain Adaptation, Label Propagation

9. Concept Bottleneck Model with Zero Performance Loss

Zhenzhen Wang, Aleksander Popel, Jeremias Sulam

Keywords: interpretability, explainability, concept bottleneck model, concept explanations

10. Enhancing Video Representation Learning with Temporal Differentiation

Siyi Chen, Minkyu Choi, Zesen Zhao, Kuan Han, Qing Qu, Zhongming Liu

Keywords: video representation learning, physics-inspired

11. Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

Can Yaras, Siyi Chen, Peng Wang, Qing Qu

Keywords: multimodal learning, modality gap, contrastive learning

12. Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows

Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Keywords: Learned Effective Dynamics, Reduced-Order Modeling, Multiscale Systems, Turbulent Flows

13. White-box Error Correction Code Transformer

Ziyan Zheng, Chin Wa Lau, Nian Guo, Xiang Shi, Shao-Lun Huang

Keywords: Error Correction Codes, Neural Decoder, White-box Transformer, Sparse Rate Reduction, Tanner Graph

14. Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

Keywords: Time Series Forecasting, Transformer Generalization, Kernel Methods

15. Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei

Keywords: attention sink, mechanistic interpretability, language models, transformers

16. Diffusion models learn low-dimensional distributions via subspace clustering

Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

Keywords: diffusion models, mixture of low-rank Gaussians, phase transition, subspace clustering

17. Visual Prompting Reimagined: The Power of Activation Prompts

Yihua Zhang, Hongkang Li, Yuguang Yao, Aochuan Chen, Shuai Zhang, Pin-Yu Chen, Meng Wang, Sijia Liu

Keywords: visual prompt, parameter efficient finetuning, learning theory, generalization analysis

18. Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling

Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu

Keywords: diffusion representation learning, representation learning, diffusion model

19. Simplifying DINO by Coding Rate Regularization

Ziyang Wu, Jingyuan Zhang, Druv Pai, Yi Ma

Keywords: Representation Learning, Self Supervised Learning, Coding Rate

20. Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning

Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos

Keywords: Closure Discovery, Inductive Bias, Multi-Agent Reinforcement Learning

21. Heterogeneous Decision Making in Mixed Traffic: Uncertainty-aware Planning and Bounded Rationality

Hang Wang, Qiaoyi Fang, Junshan Zhang

Keywords: Mixed Traffic, Reinforcement Learning, Planning, Bounded Rationality

22. CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents

Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie

Keywords: LLM-based Agent, Agent Based Modeling, Competition

23. DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

Kaijie Zhu, Jiaao Chen, Jindong Wang, Neil Zhenqiang Gong, Diyi Yang, Xing Xie

Keywords: Large Language Models, Evaluation, Data Contamination

24. Knowledge-aware Parsimony Learning: A Perspective from Relational Graphs

Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

Keywords: scaling law, Parsimony Learning, Graph Learning

25. Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

Peihao Wang, Ruisi Cai, Yuehao Wang, Jiajun Zhu, Pragya Srivastava, Zhangyang Wang, Pan Li

Keywords: State Space Models, Large Language Models, Recency, Over-smoothing

26. Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding

Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang

Keywords: Positional Encoding, Equivariant Machine Learning, Large Language Models

27. Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

Yize Zhao, Tina Behnia, Vala Vakilian, Christos Thrampoulidis

Keywords: language models, neural embeddings, optimization, implicit regularization, low-rank matrix factorization, support-vector machines

28. Dynamic Rescaling for Training GNNs

Nimrah Mustafa, Rebekka Burkholz

Keywords: graph neural network, rescale invariance, generalization, network balance

29. Image Reconstruction Via Autoencoding Sequential Deep Image Prior

Ismail Alkhouri, Shijun Liang, Evan Bell, Qing Qu, Rongrong Wang, Saiprasad Ravishankar

Keywords: Image Reconstruction, Deep Image Prior, Generative Models

30. SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems

Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang

Keywords: Image Restoration, Diffusion Models, Inverse Problems

31. SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Sergey Levine, Yi Ma

Keywords: foundation model post-training

32. Attention-Only Transformers via Unrolled Subspace Denoising

Peng Wang, Yifu Lu, Yaodong Yu, Druv Pai, Qing Qu, Yi Ma

Keywords: transformer, self-attention, unrolled optimization, subspace denoising

33. Out-of-distribution generalization via composition: a lens through induction heads in Transformers

Jiajun Song, Zhuoyan Xu, Yiqiao Zhong

Keywords: out-of-distribution generalization, low-dimensional subspace, composition, large language models, emergent ability, in-context learning

34. Sufficient and Necessary Explanations (and What Lies in Between)

Beepul Bharti, Paul Yi, Jeremias Sulam

Keywords: interpretability, explainability

35. Generative Learning for Solving Non-Convex Problem with Multi-Valued Input-Solution Mapping

Enming Liang, Minghua Chen

Keywords: Non-convex Optimization, Generative Modeling, Flow, ODE

36. Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models

Wenda Li, Huijie Zhang, Qing Qu

Keywords: diffusion model, watermark, low-dimensional subspace, consistency, robustness