
Poster Sessions at CPAL 2025
Presentation Format
All accepted papers at CPAL 2025, from both the Proceedings and Spotlight tracks, will be presented as posters at the conference. A select number of Proceedings track papers will also be presented as orals, as specified on the orals page.
Sessions are numbered in chronological order. See the full program for the precise time and location of each poster session.
Reception + Poster Session 1
Time: Day 2 (Mar 25) – Tuesday – 4:45 PM to 6:15 PM
1. Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna
Keywords: N:M structured sparsity, sparsity, model compression, attention-based models, sparse training recipe
2. Sparse MoE as a New Treatment: Addressing Forgetting, Fitting, Learning Issues in Multi-Modal Multi-Task Learning
Jie Peng, Sukwon Yun, Kaixiong Zhou, Ruida Zhou, Thomas Hartvigsen, Yanyong Zhang, Zhangyang Wang, Tianlong Chen
Keywords: transformer, sparse mixture-of-experts, multi-modal learning, multi-task learning
3. Theoretical and Empirical Advances in Forest Pruning
Albert Dorador
Keywords: Regression, Decision Trees, Ensemble Learning, Pruning, Interpretable Machine Learning
4. On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks
William T Redman, Zhangyang Wang, Alessandro Ingrosso, Sebastian Goldt
Keywords: iterative magnitude pruning, lottery tickets, sparse machine learning, gaussian statistics
5. Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation
Suman Sapkota, Binod Bhattarai
Keywords: Sparse Architectures, Structured Sparsity, Butterfly Sparsity, Butterfly MLP, Butterfly Attention, Long Range Arena (LRA), Solving Pathfinder-X, Patch Only MLP-Mixer, Dimension Mixer
6. HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Keywords: Half-Space Reporting, Attention Acceleration, Sparse Attention
7. Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang, Huanrui Yang
Keywords: LLM quantization, Hessian trace, Noise-aware finetuning
8. Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li, Yijun Dong, Qi Lei
Keywords: Efficient, Structured Pruning, LLMs
9. Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
Zhenyu Zhang, Ajay Kumar Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang
Keywords: Large Language Models, Memory Efficient Training, Low Rank
10. Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
Keywords: Distributed training, adaptive batch size, data parallelism, model parallelism
11. A unified framework for Sparse plus Low-Rank Matrix Decomposition for LLMs
Mehdi Makni, Kayhan Behdin, Zheng Xu, Natalia Ponomareva, Rahul Mazumder
Keywords: model compression, sparse plus low-rank, optimization, inference acceleration, 2:4 sparsity, hardware and system co-design
12. Unlock the Theory behind Scaling 1-bit Neural Networks
Majid Daliri, Zhao Song, Chiwun Yang
Keywords: 1-bit neural network, neural tangent kernel, scaling law theory
13. Adversarially Robust Spiking Neural Networks with Sparse Connectivity
Mathias Schmolli, Maximilian Baronig, Robert Legenstein, Ozan Ozdenizci
Keywords: adversarial robustness, spiking neural networks, ANN-to-SNN conversion, sparsity, robust pruning
14. SGD with Weight Decay Secretly Minimizes the Ranks of Your Neural Networks
Tomer Galanti, Zachary S Siegel, Aparna Gupte, Tomaso A Poggio
Keywords: Low-Rank, SGD, Implicit Bias, Rank, Rank Minimization, Weight Decay
15. Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
Mohammed Adnan, Rohan Jain, Ekansh Sharma, Yani Ioannou
Keywords: Lottery Ticket Hypothesis, sparse training, linear mode connectivity, weight symmetry, deep learning, deep neural networks, random initialization, git re-basin, optimization
16. Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Weixin Liang, Lili Yu, Liang Luo, Srini Iyer, Ning Dong, Chunting Zhou, Gargi Ghosh, Mike Lewis, Wen-tau Yih, Luke Zettlemoyer, Xi Victoria Lin
Keywords: Sparse architecture, Efficient deep architecture, Multi-modal foundation models, Mixture-of-Experts, Transformer
17. Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
Weixin Liang, Junhong Shen, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu
Keywords: Sparse architecture, Efficient deep architecture, Multi-modal foundation models, Mixture-of-Experts, State Space Model
18. Training Bayesian Neural Networks with Sparse Subspace Variational Inference
Junbo Li, Zichen Miao, Qiang Qiu, Ruqi Zhang
Keywords: Bayesian neural networks, sparse Bayesian learning, variational inference
19. WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia, Jiancheng Liu, Yihua Zhang, Parikshit Ram, Nathalie Baracaldo, Sijia Liu
Keywords: Machine Unlearning, LLMs
20. Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar, Rebekka Burkholz
Keywords: sparsity, pruning, lottery tickets, learning rate rewinding, iterative magnitude pruning
21. Streaming Kernel PCA Algorithm With Small Space
Yichuan Deng, Jiangxuan Long, Zhao Song, Zifan Wang, Han Zhang
Keywords: Principal Component Analysis, Kernel Method, Streaming Algorithm
22. Collaborative and Efficient Personalization with Mixtures of Adaptors
Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč
Keywords: federated learning, personalization, multi-task learning, clustering, parameter-efficient
23. Pruning neural network models for gene regulatory dynamics using data and domain knowledge
Intekhab Hossain, Jonas Fischer, Rebekka Burkholz, John Quackenbush
Keywords: sparsification, pruning, lottery tickets, explainability, gene regulation, domain knowledge, neural architecture design, NeuralODEs
24. Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture
Shijin Duan, Yejia Liu, Gaowen Liu, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu
Keywords: Vector Symbolic Architecture, Batch Normalization, Knowledge Distillation
25. Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li, Lu Yin, Shiwei Liu
Keywords: LayerNorm, LLM, Transformer
26. Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
Haoyang Liu, Aditya Singh, Yijiang Li, Haohan Wang
Keywords: Robustness, Vision Transformer, Invariance
27. Learning of Patch-Based Smooth-Plus-Sparse Models for Image Reconstruction
Stanislas Ducotterd, Sebastian Neumayer, Michael Unser
Keywords: Image reconstruction, sparsity, dictionary learning, deep equilibrium
28. Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis
Shirin Shoushtari, Jiaming Liu, Edward P. Chandler, M. Salman Asif, Ulugbek S. Kamilov
Keywords: Computational Imaging, Plug-and-Play Priors, Imaging Inverse Problems, Mismatched Priors, Domain Adaptation
29. Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang
Keywords: Minima Stability, Edge-of-Stability, Generalization, Flat Local Minima, Curvature
30. Certified Robustness against Sparse Adversarial Perturbations via Data Localization
Ambar Pal, Rene Vidal, Jeremias Sulam
Keywords: Adversarial Robustness, Certified Robustness, Sparse Perturbations, Data Localization
31. The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity
Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Keywords: State-Space Models, Mamba, Circuit Complexity, Computational Limits
32. Fast John Ellipsoid Computation with Differential Privacy Optimization
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu
Keywords: Fast Optimization, Differential Privacy, John Ellipsoid Computation
33. Understanding How Nonlinear Networks Create Linearly Separable Features for Low-Dimensional Data
Alec S Xu, Can Yaras, Peng Wang, Qing Qu
Keywords: union of subspaces, shallow nonlinear networks, random feature model
34. On Generalization Bounds for Neural Networks with Low Rank Layers
Andrea Pinto, Akshay Rangamani, Tomaso A Poggio
Keywords: Gaussian Complexity, Low Rank, Neural Collapse
35. Fast and Efficient Matching Algorithm with Deadline Instances
Zhao Song, Weixin Wang, Chenbo Yin, Junze Yin
Keywords: online weighted matching problem, sketching
Poster Session 2
Time: Day 3 (Mar 26) – Wednesday – 4:45 PM to 6:15 PM
1. AdaProx: A Novel Method for Bilevel Optimization under Pessimistic Framework
Ziwei Guan, Daouda Sow, Sen Lin, Yingbin Liang
Keywords: pessimistic bilevel optimization, convergence analysis, nonconvex, gradient-based method
2. Revisiting the Initial Steps in Adaptive Gradient Descent Optimization
Abulikemu Abuduweili, Changliu Liu
Keywords: Optimization, Adam, Adaptive Gradient Descent, Neural Networks
3. Exact and Rich Feature Learning Dynamics of Two-Layer Linear Networks
Wei Huang, Wuyang Chen, Zhiqiang Xu, Zhangyang Wang, Taiji Suzuki
Keywords: Neural network dynamics, Feature Learning, Optimization
4. Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets
Arthur Jacot, Alexandre Kaiser
Keywords: Low-rank bias, NeuralODE, Hamiltonian, Bottleneck structure
5. Quantum EigenGame for excited state calculation
David A. Quiroga, Jason Han, Anastasios Kyrillidis
Keywords: variational quantum algorithms, PCA, EigenGame, eigensolvers
6. Asymptotic Behavior of the Coordinate Ascent Variational Inference in Singular Models
Sean C Plummer, Anirban Bhattacharya, Debdeep Pati, Yun Yang
Keywords: Coordinate Ascent Variational Inference, Singular Models, Dynamical Systems
7. Grouped Sequential Optimization Strategy - the Application of Hyperparameter Importance Assessment in Deep Learning
Ruinan Wang, Ian T. Nabney, Mohammad Golbabaee
Keywords: Optimization, Hyperparameter Optimization, Hyperparameter Importance Assessment, Model Efficiency, Search Space Exploration, Resource Allocation
8. Provable Model-Parallel Distributed Principal Component Analysis with Parallel Deflation
Fangshuo Liao, Wenyi Su, Anastasios Kyrillidis
Keywords: Principal Component Analysis, Distributed Learning
9. AgentHPO: Large Language Model Agent for Hyper-Parameter Optimization
Siyi Liu, Chen Gao, Yong Li
Keywords: Large Language Models, Agent, Hyperparameter Optimization
10. FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration
Xue Feng, M. Paul Laiu, Thomas Strohmer
Keywords: federated learning, quasi-Newton methods, Anderson acceleration
11. Unlocking Global Optimality in Bilevel Optimization: A Pilot Study
Quan Xiao, Tianyi Chen
Keywords: bilevel optimization, global convergence
12. On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He
Keywords: nonconvex optimization, initialization, quadratic rate, low rank adapter, lora
13. Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
Saket Tiwari, Omer Gottesman, George Konidaris
Keywords: reinforcement learning, continuous control, geometry
14. Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
Berfin Simsek, Amire Bendjeddou, Daniel Hsu
Keywords: time complexity, gradient flow dynamics, hardness
15. Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability
Avrajit Ghosh, Soo Min Kwon, Rongrong Wang, Saiprasad Ravishankar, Qing Qu
Keywords: edge of stability, deep linear networks
16. Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity
Dominik Stöger, Yizhe Zhu
Keywords: non-convex optimization, factorized gradient descent, matrix sensing, sample complexity, virtual sequences
17. Relaxed Contrastive Learning for Federated Learning
Seonguk Seo, Jinkyu Kim, Geeho Kim, Bohyung Han
Keywords: dimensional collapse, transferability, federated learning, local deviation
18. Do Global and Local Perform Cooperatively or Adversarially in Heterogeneous Federated Learning?
Huiwen Wu, Shuo Zhang
Keywords: federated learning, multilevel optimization, learning dynamics
19. FedPeWS: Personalized Warmup via Subnetworks for Enhanced Heterogeneous Federated Learning
Nurbek Tastan, Samuel Horváth, Martin Takáč, Karthik Nandakumar
Keywords: federated learning, heterogeneous federated learning, personalized warmup, subnetworks
20. Characterizing ResNet’s Universal Approximation Capability
Chenghao Liu, Enming Liang, Minghua Chen
Keywords: universal approximation, ResNet, optimal approximation rate
21. A Validation Approach to Over-parameterized Matrix and Image Recovery
Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu
Keywords: Matrix recovery, low-rank, validation, gradient descent, nonconvex optimization
22. WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity
Shizhou Xu, Thomas Strohmer
Keywords: randomized controlled trial, Wasserstein homogeneity, anti-clustering, diverse K-means, control/test group splitting, cross-validation
23. What’s in a Prior? Learned Proximal Networks for Inverse Problems
Zhenghan Fang, Sam Buchanan, Jeremias Sulam
Keywords: Inverse problems, Proximal operators, Plug-and-play, Explicit regularizer, Convergent PnP, Input convex neural networks
24. Provable Probabilistic Imaging using Score-based Generative Priors
Yu Sun, Zihui Wu, Yifan Chen, Berthy Feng, Katherine Bouman
Keywords: Diffusion models, inverse problem, image reconstruction, langevin dynamics, markov processes, plug-and-play priors, posterior sampling, regularized inversion, score-based generative models, uncertainty quantification
25. Principal Component Trees and their Persistent Homology
Ben Kizaric, Daniel L. Pimentel-Alarcón
Keywords: subspace clustering, low-rank decomposition, unsupervised learning, manifold learning, dimensionality reduction, topological data analysis
26. FlowDAS: A Flow-Based Framework for Data Assimilation
Siyi Chen, Yixuan Jia, Qing Qu, He Sun, Jeffrey A Fessler
Keywords: Data Assimilation, Stochastic Dynamic System, Flow matching, Stochastic Interpolants, Inverse Problem
27. Large-Scale Multiway Clustering with Seeded Clustering
Jiaxin Hu
Keywords: scalable algorithm, time complexity, space complexity, large-scale data, tensor clustering, seeded clustering
28. Are all layers created equal: A neural collapse perspective
Jinxin Zhou, Jiachen Jiang, Zhihui Zhu
Keywords: Deep Learning, Neural Collapse, Robustness, Generalization, Memorization, Understanding
29. Geometry of Concepts in Next-token Prediction: Neural-Collapse Meets Semantics
Yize Zhao, Christos Thrampoulidis
Keywords: Large Language Models (LLMs), Neural Embeddings, Word Embeddings, Neural-Collapse, Interpretability, Optimization
30. Deep Neural Regression Collapse
Akshay Rangamani, Altay Unal
Keywords: Neural Collapse, Regression, Low Rank
31. Geometric Algebra Planes: Convex Implicit Neural Volumes
Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci
Keywords: Volume representation, tensor decomposition, convex optimization, geometric algebra, nerf
32. A Robust Kernel Statistical Test of Invariance: Detecting Subtle Asymmetries
Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet
Keywords: Invariance, Hypothesis Testing, Kernel Methods
33. Learning with Exact Invariances in Polynomial Time
Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet
Keywords: Learning with Invariances, Kernels, Spectral Theory
34. Primal-Dual Spectral Representation for Off-policy Evaluation
Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai
Keywords: reinforcement learning, off-policy evaluation, spectral representation, primal-dual representation
35. Dependence Induced Representations
Xiangxiang Xu, Lizhong Zheng
Keywords: representation learning, statistical dependence, maximal correlation, minimal sufficiency, neural collapse
36. MoXCo: How I learned to stop exploring and love my local minima?
Esha Singh, Shoham Sabach, Yu-Xiang Wang
Keywords: optimization, deep learning, adaptive methods
Coffee Break + Poster Session 3
Time: Day 4 (Mar 27) – Thursday – 11:00 AM to 12:30 PM
1. Improving Neuron-level Interpretability with White-box Language Models
Hao Bai, Yi Ma
Keywords: White-box models, deep learning architectures, neuron-level interpretation
2. Vanishing Feature: Diagnosing Model Merging and Beyond
Xingyu Qu, Samuel Horváth
Keywords: Model Merging, Efficiency, Deep Learning, Efficient Deep Learning
3. A Case Study of Low Ranked Self-Expressive Structures in Neural Network Representations
Uday Singh Saini, William Shiao, Yahya Sattar, Yogesh Dahiya, Samet Oymak, Evangelos E. Papalexakis
Keywords: Subspace Clustering, Centered Kernel Alignment, Representation Similarity Measures
4. You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time
Xiaotian Han, Tianlong Chen, Kaixiong Zhou, Zhimeng Jiang, Zhangyang Wang, Xia Hu
Keywords: fairness, weight space, neural network subspace
5. RecCrysFormer: Refined Protein Structural Prediction from 3D Patterson Maps via Recycling Training Runs
Tom Pan, Evan Dramko, Mitchell D. Miller, George N Phillips Jr., Anastasios Kyrillidis
Keywords: Protein Structural Prediction, Transformers, Patterson Maps
6. Dual Reasoning: A GNN-LLM Collaborative Framework for Knowledge Graph Question Answering
Guangyi Liu, Yongqi Zhang, Yong Li, Quanming Yao
Keywords: Large Language Model, Knowledge Graph, Question Answering
7. Meta ControlNet: Enhancing Task Adaptation via Meta Learning
Junjie Yang, Jinze Zhao, Peihao Wang, Zhangyang Wang, Yingbin Liang
Keywords: Meta Learning, Diffusion Models, Generalization
8. Bridging Domain Adaptation and Graph Neural Networks: A Tensor-Based Framework for Effective Label Propagation
Tao Wen, Elynn Chen, Yuzhou Chen, Qi Lei
Keywords: Graph Classification, Domain Adaptation, Label Propagation
9. Concept Bottleneck Model with Zero Performance Loss
Zhenzhen Wang, Aleksander Popel, Jeremias Sulam
Keywords: interpretability, explainability, concept bottleneck model, concept explanations
10. Enhancing Video Representation Learning with Temporal Differentiation
Siyi Chen, Minkyu Choi, Zesen Zhao, Kuan Han, Qing Qu, Zhongming Liu
Keywords: video representation learning, physics-inspired
11. Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning
Can Yaras, Siyi Chen, Peng Wang, Qing Qu
Keywords: multimodal learning, modality gap, contrastive learning
12. Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows
Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos
Keywords: Learned Effective Dynamics, Reduced-Order Modeling, Multiscale Systems, Turbulent Flows
13. White-box Error Correction Code Transformer
Ziyan Zheng, Chin Wa Lau, Nian Guo, Xiang Shi, Shao-Lun Huang
Keywords: Error Correction Codes, Neural Decoder, White-box Transformer, Sparse Rate Reduction, Tanner Graph
14. Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
Keywords: Time Series Forecasting, Transformer Generalization, Kernel Methods
15. Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei
Keywords: attention sink, mechanistic interpretability, language models, transformers
16. Diffusion models learn low-dimensional distributions via subspace clustering
Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu
Keywords: diffusion models, mixture of low-rank Gaussians, phase transition, subspace clustering
17. Visual Prompting Reimagined: The Power of Activation Prompts
Yihua Zhang, Hongkang Li, Yuguang Yao, Aochuan Chen, Shuai Zhang, Pin-Yu Chen, Meng Wang, Sijia Liu
Keywords: visual prompt, parameter efficient finetuning, learning theory, generalization analysis
18. Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling
Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu
Keywords: diffusion representation learning, representation learning, diffusion model
19. Simplifying DINO by Coding Rate Regularization
Ziyang Wu, Jingyuan Zhang, Druv Pai, Yi Ma
Keywords: Representation Learning, Self Supervised Learning, Coding Rate
20. Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning
Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos
Keywords: Closure Discovery, Inductive Bias, Multi-Agent Reinforcement Learning
21. Heterogeneous Decision Making in Mixed Traffic: Uncertainty-aware Planning and Bounded Rationality
Hang Wang, Qiaoyi Fang, Junshan Zhang
Keywords: Mixed Traffic, Reinforcement Learning, Planning, Bounded Rationality
22. CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents
Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie
Keywords: LLM-based Agent, Agent Based Modeling, Competition
23. DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
Kaijie Zhu, Jiaao Chen, Jindong Wang, Neil Zhenqiang Gong, Diyi Yang, Xing Xie
Keywords: Large Language Models, Evaluation, Data Contamination
24. Knowledge-aware Parsimony Learning: A Perspective from Relational Graphs
Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang
Keywords: scaling law, Parsimony Learning, Graph Learning
25. Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
Peihao Wang, Ruisi Cai, Yuehao Wang, Jiajun Zhu, Pragya Srivastava, Zhangyang Wang, Pan Li
Keywords: State Space Models, Large Language Models, Recency, Over-smoothing
26. Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
Jiajun Zhu, Peihao Wang, Ruisi Cai, Jason D. Lee, Pan Li, Zhangyang Wang
Keywords: Positional Encoding, Equivariant Machine Learning, Large Language Models
27. Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao, Tina Behnia, Vala Vakilian, Christos Thrampoulidis
Keywords: language models, neural embeddings, optimization, implicit regularization, low-rank matrix factorization, support-vector machines
28. Dynamic Rescaling for Training GNNs
Nimrah Mustafa, Rebekka Burkholz
Keywords: graph neural network, rescale invariance, generalization, network balance
29. Image Reconstruction Via Autoencoding Sequential Deep Image Prior
Ismail Alkhouri, Shijun Liang, Evan Bell, Qing Qu, Rongrong Wang, Saiprasad Ravishankar
Keywords: Image Reconstruction, Deep Image Prior, Generative Models
30. SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems
Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang
Keywords: Image Restoration, Diffusion Models, Inverse Problems
31. SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Sergey Levine, Yi Ma
Keywords: foundation model post-training
32. Attention-Only Transformers via Unrolled Subspace Denoising
Peng Wang, Yifu Lu, Yaodong Yu, Druv Pai, Qing Qu, Yi Ma
Keywords: transformer, self-attention, unrolled optimization, subspace denoising
33. Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
Keywords: out-of-distribution generalization, low-dimensional subspace, composition, large language models, emergent ability, in-context learning
34. Sufficient and Necessary Explanations (and What Lies in Between)
Beepul Bharti, Paul Yi, Jeremias Sulam
Keywords: interpretability, explainability
35. Generative Learning for Solving Non-Convex Problem with Multi-Valued Input-Solution Mapping
Enming Liang, Minghua Chen
Keywords: Non-convex Optimization, Generative Modeling, Flow, ODE
36. Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models
Wenda Li, Huijie Zhang, Qing Qu
Keywords: diffusion model, watermark, low-dimensional subspace, consistency, robustness