
Spotlight Track: Accepted Papers
Accepted Spotlight Track papers are presented as posters at CPAL 2026. See the full program for the precise time and location of each poster session.
REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression
Mike Lasby, Ivan Lazarevich, Nish Sinnadurai, Sean Lie, Yani Ioannou, Vithursan Thangarasa
Keywords: mixture-of-experts, moe, compression, expert pruning, expert merging, merging, pruning, LLM, evaluation
Stackelberg Control in Combinatorial Congestion Games without Differentiating Through Equilibria
Saeed Masiha, Sepehr Elahi, Negar Kiyavash, Patrick Thiran
Keywords: Stackelberg games, zeroth-order optimization, congestion games, zero-suppressed decision diagrams (ZDDs), compact combinatorial representations
Dimension-free error estimate for diffusion model and optimal scheduling
Valentin De Bortoli, Romuald Elie, Anna Kazeykina, Zhenjie Ren, Jiacheng Zhang
Keywords: diffusion model, score-matching, statistical error, optimal scheduling
Adversarial generalization of unfolding (model-based) networks
Vicky Kouni
Keywords: unfolding networks, adversarial generalization, adversarial Rademacher complexity
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
Boqian Wu, Qiao Xiao, Shunxin Wang, Nicola Strisciuglio, Mykola Pechenizkiy, Maurice van Keulen, Decebal Constantin Mocanu, Elena Mocanu
Keywords: Dynamic Sparse Training, Image Corruption Robustness
Mask in the Mirror: Implicit Sparsification
Tom Jacobs, Rebekka Burkholz
Keywords: Sparse Training, Continuous sparsification, Implicit bias, Mirror flow, Time-dependent Bregman function, Regularization, Rich regime
E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation
Boqian Wu, Qiao Xiao, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Decebal Constantin Mocanu, Maurice van Keulen, Elena Mocanu
Keywords: Medical Image Segmentation, Sparse Training, Feature Fusion
HyperINR: Ensuring Semantics in Weights with Implicit Function Theorem
Tianming Qiu, Christos Sonis, Hao Shen
Keywords: Implicit Function Theorem, Semantics in Weights, Weight Space Learning, Implicit Neural Representations, Hypernetworks
GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
Celia Rubio-Madrigal, Adarsh Jamadandi, Rebekka Burkholz
Keywords: graph neural networks, over-squashing, graph rewiring, community structure, homophily, feature similarity
The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
Hoang Pham, The-Anh Ta, Tom Jacobs, Rebekka Burkholz, Long Tran-Thanh
Keywords: Network Pruning, Graphon, Neural Tangent Kernel
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
Zihang Liu, Tianyu Pang, Oleg Balabanov, Chaoqun Yang, Tianjin Huang, Lu Yin, Yaoqing Yang, Shiwei Liu
Keywords: Reasoning, Sparse Fine-tuning, Low-Rank Approximation, Memory Efficiency
Beyond Scores: Proximal Diffusion Models
Zhenghan Fang, Mateo Díaz, Sam Buchanan, Jeremias Sulam
Keywords: Generative models, Diffusion models, Proximal operators, Backward discretization
Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms
Hiroshi Kera, Nico Pelleriti, Yuki Ishihara, Max Zimmer, Sebastian Pokutta
Keywords: Polynomial System Solving, Border Bases, Transformer, Computational Algebra, AI4Science, AI4Math
Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
Ziquan Zhu, Hanruo Zhu, Si-Yuan Lu, Xiang Li, Yanda Meng, Yunxiao Zhang, Gaojie Jin, Lu Yin, Lijie Hu, Di Wang, Lu Liu, Tianjin Huang
Keywords: Adapter, Medical Image Analysis, Data-Limited Training
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
Arto Maranjyan, Peter Richtárik
Keywords: asynchronous SGD, data heterogeneity, optimal time complexity, nonconvex optimization, parallel methods, stochastic optimization
The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu
Keywords: Curse of Depth, Large Language Models, Pre-Layer Normalization
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li, Niao He
Keywords: low-rank adaptation, architecture-optimizer co-design, large language models, lora, low-rank adapter, fine-tuning
Sign-In to the Lottery: Reparameterized Sparse Training
Advait Gadhikar, Tom Jacobs, Chao Zhou, Rebekka Burkholz
Keywords: pruning at initialization, sparse training, lottery ticket hypothesis, mirror flow, reparameterization, sign flips
Revisiting Glorot Initialization for Long-Range Linear Recurrences
Noga Bar, Mariia Seleznova, Yotam Alexander, Gitta Kutyniok, Raja Giryes
Keywords: Recurrent Networks, Initialization, Signal Propagation, Joint Scaling Limits
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu
Keywords: Large Language Models, Fine-Tuning, Sparse Training
Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry
Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin
Keywords: Dimension Reduction, Geometry
AdaBoost.SDM: Similarity and dissimilarity-based manifold regularized adaptive boosting algorithm
Azamat Mukhamediya, Amin Zollanvari
Keywords: Ensemble learning, Adaptive boosting, Manifold regularization
Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Tom Jacobs, Advait Gadhikar, Celia Rubio-Madrigal, Rebekka Burkholz
Keywords: Sparsity, Implicit bias, Sign flip, Exponential update, Training dynamics, Bregman function
Connectivity determines the capability of sparse neural network quantum states
Brandon Barton, Juan Felipe Carrasquilla Alvarez, Christopher Robert Roth, Agnes Valenti
Keywords: Neural network quantum states, Pruning, Lottery ticket hypothesis
LOST: Low-rank and Sparse Pre-training for Large Language Models
Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Adarsh Kappiyath, LiWu Xu, Tianjin Huang, Wenwu Wang, Shiwei Liu, Xilu Wang
Keywords: Large language models, Low-rank, Sparse, Singular value decomposition, Pre-training
Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?
Tom Jacobs, Chao Zhou, Rebekka Burkholz
Keywords: Implicit bias, explicit regularization, weight decay, matrix sensing, LoRA, attention, mirror flow, time-dependent Legendre function
Fixed Aggregation Features Can Rival GNNs
Celia Rubio-Madrigal, Rebekka Burkholz
Keywords: deep learning, graph neural networks, node classification, kolmogorov-arnold representation, tabular learning, non-trainable aggregation
FOSL: A Foldable Sparse-and-Low-Rank Method for Efficient LLM Pre-training
Dong Wang, Francesco Corti, Yun Cheng, Olga Saukh
Keywords: efficient pre-training, low-rank adaptation, structured sparsity, large language models, model folding
SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training
Mohammed Adnan, Rohan Jain, Tom Jacobs, Ekansh Sharma, Rahul G Krishnan, Rebekka Burkholz, Yani Ioannou
Keywords: sparse training, dynamic sparse training, training dynamics, normalization layers
Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
Tongtong Liang, Dan Qiao, Yu-Xiang Wang, Rahul Parhi
Keywords: Generalization bound, minima stability, gradient descent, large learning rate, ReLU neural network, minimax rate
Beyond the Ideal: Analyzing the Inexact Muon Update
Egor Shulgin, Sultan AlRashed, Francesco Orabona, Peter Richtárik
Keywords: Optimization, Muon
Pay Attention to Small Weights
Chao Zhou, Tom Jacobs, Advait Gadhikar, Rebekka Burkholz
Keywords: large model, finetuning, efficiency, catastrophic forgetting
SALAAD: Sparse and Low-Rank Adaptation via ADMM for Large Language Model Inference
Hao Ma, Melis Ilayda Bal, Liang Zhang, Bingcong Li, Niao He, Melanie Zeilinger, Michael Muehlebach
Keywords: Sparse and Low-Rank Learning, Large Language Models, Structured Optimization, Model Compression, Elastic Inference
Kernel von Mises Formula of the Influence Function
Yaroslav Mukhin
Keywords: Influence function, Fisher-Rao gradient, first variation, tangential interpolation, semiparametric estimation, distributional robustness, kernel PCA, Mercer kernel