Conference on Parsimony and Learning (CPAL)
March 2026, Tübingen

Spotlight Track: Accepted Papers

Accepted Spotlight Track papers are presented as posters at CPAL 2026. See the full program for the precise time and location of each poster session.

REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression

Mike Lasby, Ivan Lazarevich, Nish Sinnadurai, Sean Lie, Yani Ioannou, Vithursan Thangarasa

Keywords: mixture-of-experts, moe, compression, expert pruning, expert merging, merging, pruning, LLM, evaluation

Stackelberg Control in Combinatorial Congestion Games without Differentiating Through Equilibria

Saeed Masiha, Sepehr Elahi, Negar Kiyavash, Patrick Thiran

Keywords: Stackelberg games, zeroth-order optimization, congestion games, zero-suppressed decision diagrams (ZDDs), compact combinatorial representations

Dimension-free error estimate for diffusion model and optimal scheduling

Valentin De Bortoli, Romuald Elie, Anna Kazeykina, Zhenjie Ren, Jiacheng Zhang

Keywords: diffusion model, score-matching, statistical error, optimal scheduling

Adversarial generalization of unfolding (model-based) networks

Vicky Kouni

Keywords: unfolding networks, adversarial generalization, adversarial Rademacher complexity

Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness

Boqian Wu, Qiao Xiao, Shunxin Wang, Nicola Strisciuglio, Mykola Pechenizkiy, Maurice van Keulen, Decebal Constantin Mocanu, Elena Mocanu

Keywords: Dynamic Sparse Training, Image Corruption Robustness

Mask in the Mirror: Implicit Sparsification

Tom Jacobs, Rebekka Burkholz

Keywords: Sparse Training, Continuous sparsification, Implicit bias, Mirror flow, Time-dependent Bregman function, Regularization, Rich regime

E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation

Boqian Wu, Qiao Xiao, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Decebal Constantin Mocanu, Maurice van Keulen, Elena Mocanu

Keywords: Medical Image Segmentation, Sparse Training, Feature Fusion

HyperINR: Ensuring Semantics in Weights with Implicit Function Theorem

Tianming Qiu, Christos Sonis, Hao Shen

Keywords: Implicit Function Theorem, Semantics in Weights, Weight Space Learning, Implicit Neural Representations, Hypernetworks

GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring

Celia Rubio-Madrigal, Adarsh Jamadandi, Rebekka Burkholz

Keywords: graph neural networks, over-squashing, graph rewiring, community structure, homophily, feature similarity

The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

Hoang Pham, The-Anh Ta, Tom Jacobs, Rebekka Burkholz, Long Tran-Thanh

Keywords: Pruning Network, Graphon, Neural Tangent Kernel

LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

Zihang Liu, Tianyu Pang, Oleg Balabanov, Chaoqun Yang, Tianjin Huang, Lu Yin, Yaoqing Yang, Shiwei Liu

Keywords: Reasoning, Sparse Fine-tuning, Low-Rank Approximation, Memory Efficiency

Beyond Scores: Proximal Diffusion Models

Zhenghan Fang, Mateo Diaz, Sam Buchanan, Jeremias Sulam

Keywords: Generative models, Diffusion models, Proximal operators, Backward discretization

Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms

Hiroshi Kera, Nico Pelleriti, Yuki Ishihara, Max Zimmer, Sebastian Pokutta

Keywords: Polynomial System Solving, Border Bases, Transformer, Computational Algebra, AI4Science, AI4Math

Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis

Ziquan Zhu, Hanruo Zhu, Si-Yuan Lu, Xiang Li, Yanda Meng, Yunxiao Zhang, Gaojie Jin, Lu Yin, Lijie Hu, Di Wang, Lu Liu, Tianjin Huang

Keywords: Adapter, Medical Image Analysis, Data-Limited Training

Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity

Arto Maranjyan, Peter Richtárik

Keywords: asynchronous SGD, data heterogeneity, optimal time complexity, nonconvex optimization, parallel methods, stochastic optimization

The Curse of Depth in Large Language Models

Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu

Keywords: Curse of Depth, Large Language Models, Pre-Layer Normalization

PoLAR: Polar-Decomposed Low-Rank Adapter Representation

Kai Lion, Liang Zhang, Bingcong Li, Niao He

Keywords: low-rank adaptation, architecture-optimizer co-design, large language models, lora, low-rank adapter, fine-tuning

Sign-In to the Lottery: Reparameterized Sparse Training

Advait Gadhikar, Tom Jacobs, Chao Zhou, Rebekka Burkholz

Keywords: pruning at initialization, sparse training, lottery ticket hypothesis, mirror flow, reparameterization, sign flips

Revisiting Glorot Initialization for Long-Range Linear Recurrences

Noga Bar, Mariia Seleznova, Yotam Alexander, Gitta Kutyniok, Raja Giryes

Keywords: Recurrent Networks, Initialization, Signal Propagation, Joint Scaling Limits

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu

Keywords: Large Language Models, Fine-Tuning, Sparse Training

Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry

Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin

Keywords: Dimension Reduction, Geometry

AdaBoost.SDM: Similarity and dissimilarity-based manifold regularized adaptive boosting algorithm

Azamat Mukhamediya, Amin Zollanvari

Keywords: Ensemble learning, Adaptive boosting, Manifold regularization

Hyperbolic Aware Minimization: Implicit Bias for Sparsity

Tom Jacobs, Advait Gadhikar, Celia Rubio-Madrigal, Rebekka Burkholz

Keywords: Sparsity, Implicit bias, Sign flip, Exponential update, Training dynamics, Bregman function

Connectivity determines the capability of sparse neural network quantum states

Brandon Barton, Juan Felipe Carrasquilla Alvarez, Christopher Robert Roth, Agnes Valenti

Keywords: Neural network quantum states, Pruning, Lottery ticket hypothesis

LOST: Low-rank and Sparse Pre-training for Large Language Models

Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Adarsh Kappiyath, LiWu Xu, Tianjin Huang, Wenwu Wang, Shiwei Liu, Xilu Wang

Keywords: Large language models, Low-rank, Sparse, Singular value decomposition, Pre-training

Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?

Tom Jacobs, Chao Zhou, Rebekka Burkholz

Keywords: Implicit bias, explicit regularization, weight decay, matrix sensing, LoRA, attention, mirror flow, time-dependent Legendre function

Fixed Aggregation Features Can Rival GNNs

Celia Rubio-Madrigal, Rebekka Burkholz

Keywords: deep learning, graph neural networks, node classification, kolmogorov-arnold representation, tabular learning, non-trainable aggregation

FOSL: A Foldable Sparse-and-Low-Rank Method for Efficient LLM Pre-training

Dong Wang, Francesco Corti, Yun Cheng, Olga Saukh

Keywords: efficient pre-training, low-rank adaptation, structured sparsity, large language models, model folding

SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training

Mohammed Adnan, Rohan Jain, Tom Jacobs, Ekansh Sharma, Rahul G Krishnan, Rebekka Burkholz, Yani Ioannou

Keywords: sparse training, dynamic sparse training, training dynamics, normalization layers

Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon

Tongtong Liang, Dan Qiao, Yu-Xiang Wang, Rahul Parhi

Keywords: Generalization bound, minima stability, gradient descent, large learning rate, ReLU neural network, minimax rate

Beyond the Ideal: Analyzing the Inexact Muon Update

Egor Shulgin, Sultan AlRashed, Francesco Orabona, Peter Richtárik

Keywords: Optimization, Muon

Pay Attention to Small Weights

Chao Zhou, Tom Jacobs, Advait Gadhikar, Rebekka Burkholz

Keywords: large model, finetuning, efficiency, catastrophic forgetting

SALAAD: Sparse and Low-Rank Adaptation via ADMM for Large Language Model Inference

Hao Ma, Melis Ilayda Bal, Liang Zhang, Bingcong Li, Niao He, Melanie Zeilinger, Michael Muehlebach

Keywords: Sparse and Low-Rank Learning, Large Language Models, Structured Optimization, Model Compression, Elastic Inference

Kernel von Mises Formula of the Influence Function

Yaroslav Mukhin

Keywords: Influence function, Fisher-Rao gradient, first variation, tangential interpolation, semiparametric estimation, distributional robustness, kernel PCA, Mercer kernel