
Poster Sessions at CPAL 2026
Presentation Format
All accepted papers at CPAL 2026, from both the Proceedings and Spotlight tracks, will be presented as posters at the conference. A select number of Proceedings-track papers will also be presented as orals, as listed on the orals page.
See the full program for the precise times and locations of all poster and oral sessions.
Poster Session I
Time: Day 2 (Mar 24) – Tuesday – 4:00 PM to 6:00 PM
1. Optimal $k$-Discretization Learning
Tong Wang, Zhangyang Wang
Keywords: Clustering
2. AlphaFormer: End-to-End Symbolic Regression of Alpha Factors with Transformers
Haotong Huang, Jie Peng, Zezhen Ding, Pingzhi Li, Tianlong Chen
Keywords: Symbolic Regression, Alpha Mining, Time Series Generative Modeling
3. From sparse recovery to plug-and-play priors, understanding trade-offs for stable recovery with generalized projected gradient descent
Ali Joundi, Yann Traonmilin, Jean-François Aujol
Keywords: Inverse Problems, Sparse Recovery, Plug-and-Play, Deep Prior, Optimization
4. Generalized Radius and Integrated Codebook Transforms for Differentiable Vector Quantization
Haochen You, Heng Zhang, Hongyang He, Yuqi Li, Baojing Liu
Keywords: Vector Quantization, Discrete Representation Learning, Radius Surrogate, Codebook Transform, Gradient Coupling
5. Beyond Greedy Decoding: Model-Specific Strategy Selection via Multi-faceted Uncertainty Decomposition
Kwangje Baeg, Yubin Lim
Keywords: Uncertainty Decomposition, Adaptive Decoding, Model Heterogeneity, Behavioral Clustering, Instruction-Tuned Models
6. Parameter-Efficient Distributional RL via Normalizing Flows and a Geometry-Aware Cramér Surrogate
Simo Alami Chehboune, Rim Kaddah, Marie-Paule Cani, Jesse Read
Keywords: Distributional Reinforcement Learning, Generative models, Deep Learning, Optimal Transport
7. Cannistraci-Hebb Training with N:M Semi-Structured Sparsity for Pre-Training and Re-Training
Jiaqing Lyu, Ruijie Wang, Kangyou Bao, Yingtao Zhang, Carlo Vittorio Cannistraci
Keywords: Dynamic Sparse Training, Semi-Structured Sparsity, LLM, ViT
8. Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs
Xialie Zhuang, Peixian Ma, Zhikai Jia, Zane Cao, Shiwei Liu
Keywords: Small Reasoning Model, Reasoning, Reinforcement Learning
9. ShapLoRA: Allocation of Low-rank Adaption on Large Language Models via Shapley Value Inspired Importance Estimation
Colin Zhao, Qinghua Yao, Xinyuan Song, Wei Zhu
Keywords: LLM, LoRA
10. Simplex Deep Linear Discriminant Analysis
Maxat Tezekbayev, Arman Bolatov, Zhenisbek Assylbekov
Keywords: Deep LDA, Maximum likelihood, Simplex-constrained embeddings
11. Matrix Sensing with Kernel Optimal Loss: Robustness and Optimization Landscape
Xinyuan Song, Ziye Ma
Keywords: Matrix sensing, kernel loss function, optimization
12. Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation
Guangjing Yang, ZhangYuan Yu, Ziyuan Qin, Xinyuan Song, Huahui Yi, Qingbo Kang, Jun Gao, Yiyue Li, Chenlin Du, Qicheng Lao
Keywords: Reinforcement Fine-Tuning (RFT), Medical Vision-Language Models, Reward Design, Perception-Reasoning Augmentation, Visual Reinforcement Learning, Medical Image Understanding
13. Token-Aware Representation Augmentation for Fine-Grained Semi-Supervised Learning
Hongyang He, Yan Zhong, Xinyuan Song, Daizong Liu, Victor Sanchez
Keywords: Semi-supervised learning, FixMatch, consistency regularization, token-aware masking, token-level augmentation, high-confidence token suppression, feature diversity
14. Enhancing Long-Context Inference with Context-Position Duo-Mixture
Zhenyu Zhang, Sharath Nittur Sridhar, Zhangyang Wang, Souvik Kundu
Keywords: Long-Context, LLM, Efficiency
15. Can Less Be More? Benchmarking Lightweight Models Against State-of-the-Art Deep Learning Architectures for Deployable Seizure Detection
Isaiah Essien, Donna-lee Ginsberg, Jesse Thornburg
Keywords: Parsimonious Learning, Mobile Health, Seizure Detection, TensorFlow Lite, Deep Learning, Resource-Constrained Deployment, Global Health Equity
16. ERC-SVD: Error-Controlled SVD for Large Language Model Compression
Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, Huan Wang
Keywords: Model Compression, SVD, Large Language Models
17. Sparsity-Aware Prompt Tuning: A Simple and Effective Way to Fine-tune High-Sparsity LLMs
Yuxin Zhang, Weizhong Huang, Yuexiao Ma, Yunshan Zhong, Xiawu Zheng, Rongrong Ji
Keywords: Large language models, Network Pruning
18. Teaching LLMs According to Their Aptitude: Adaptive Switching Between CoT and TIR for Mathematical Problem Solving
Xin Xu, Yan Xu, Tianhao Chen, Yuchen Yan, Chengwu Liu, Zaoyu Chen, Yufei Wang, Yichun Yin, Yasheng Wang, Qun Liu, Lu Yin
Keywords: Large Language Models, math QA, chain-of-thought, tool-integrated reasoning, fine-tuning
19. Scalable LLM Reasoning Acceleration with Low-rank Distillation
Harry Dong, Bilge Acun, Beidi Chen, Yuejie Chi
Keywords: large language model, efficiency, distillation, reasoning, scaling, low-rank, inference
20. Sparse Mixture-of-Experts for Compositional Generalization: Empirical Evidence and Theoretical Foundations of Optimal Sparsity
Jinze Zhao, Peihao Wang, Junjie Yang, Ruisi Cai, Gaowen Liu, Jayanth Srinivasa, Ramana Rao Kompella, Yingbin Liang, Zhangyang Wang
Keywords: Compositional Generalization, Sparsity, Mixture of Experts
21. Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models
Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren
Keywords: continual learning, parameter efficient, foundation models
22. Prompt Stability Matters: Evaluating and Optimizing Auto-Generated Prompt in General-Purpose Systems
Ke Chen, Xucheng Yu, Yufei Zhou, Haohan Wang
Keywords: Prompt Stability, Prompt Evaluation, Multi-Agent System, General-Purpose System, Prompt Auto-Generation, Prompt Optimization
23. Dynamic SFT with Structured Measurements: Fast Queries, Fast Updates, Provable Guarantees
Yang Cao, Zhao Song
Keywords: sparse Fourier transform
24. Panza: Investigating the Feasibility of Fully-Local Personalized Text Generation
Armand Mihai Nicolicioiu, Eugenia Iofinova, Andrej Jovanovic, Eldar Kurtic, Mahdi Nikdan, Andrei Panferov, Ilia Markov, Nir N Shavit, Dan Alistarh
Keywords: LLMs, PEFT, LoRA, personalization, efficient ML
25. (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork
Tianjin Huang, Yong Tao, Meng Fang, Li Shen, Fan Liu, Yulong Pei, Mykola Pechenizkiy, Tianlong Chen
Keywords: Structure Pruning, Visual Prompt, Recurrent HyperNetwork
26. Concept based Ambiguity Resolution in LLMs
Zhibo Hu, Chen Wang, Yanfeng Shu, Hye-young Paik, Liming Zhu
Keywords: Language Ambiguity, Large Language Model, Sparse Autoencoder, Path Kernel
27. Data-Efficient and Robust Trajectory Generation through Pathlet Dictionary Learning
Yuanbo Tang, Yan Tang, Zihui Zhao, Zixuan Zhang, Yang Li
Keywords: trajectory generative model, dictionary learning, sparse representation
28. Sign-In to the Lottery: Reparameterized Sparse Training
Advait Gadhikar, Tom Jacobs, Chao Zhou, Rebekka Burkholz
Keywords: pruning at initialization, sparse training, lottery ticket hypothesis, mirror flow, reparameterization, sign flips
29. PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li, Niao He
Keywords: low-rank adaptation, architecture-optimizer co-design, large language models, lora, low-rank adapter, fine-tuning
30. Kernel von Mises Formula of the Influence Function
Yaroslav Mukhin
Keywords: Influence function, Fisher-Rao gradient, first variation, tangential interpolation, semiparametric estimation, distributional robustness, kernel PCA, Mercer kernel
31. Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
Ziquan Zhu, Hanruo Zhu, Si-Yuan Lu, Xiang Li, Yanda Meng, Yunxiao Zhang, Gaojie Jin, Lu Yin, Lijie Hu, Di Wang, Lu Liu, Tianjin Huang
Keywords: Adapter, Medical Image Analysis, Data-Limited Training
32. Adversarial generalization of unfolding (model-based) networks
Vicky Kouni
Keywords: unfolding networks, adversarial generalization, adversarial Rademacher complexity
33. Revisiting Glorot Initialization for Long-Range Linear Recurrences
Noga Bar, Mariia Seleznova, Yotam Alexander, Gitta Kutyniok, Raja Giryes
Keywords: Recurrent Networks, Initialization, Signal Propagation, Joint Scaling Limits
34. Mask in the Mirror: Implicit Sparsification
Tom Jacobs, Rebekka Burkholz
Keywords: Sparse Training, Continuous sparsification, Implicit bias, Mirror flow, Time-dependent Bregman function, Regularization, Rich regime
35. Dimension-free error estimate for diffusion model and optimal scheduling
Valentin De Bortoli, Romuald Elie, Anna Kazeykina, Zhenjie Ren, Jiacheng Zhang
Keywords: diffusion model, score-matching, statistical error, optimal scheduling
36. E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation
Boqian Wu, Qiao Xiao, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Decebal Constantin Mocanu, Maurice van Keulen, Elena Mocanu
Keywords: Medical Image Segmentation, Sparse Training, Feature Fusion
37. Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu
Keywords: Large Language Models, Fine-Tuning, Sparse Training
38. The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu
Keywords: Curse of Depth, Large Language Models, Pre-Layer Normalization
39. Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
Arto Maranjyan, Peter Richtárik
Keywords: asynchronous SGD, data heterogeneity, optimal time complexity, nonconvex optimization, parallel methods, stochastic optimization
40. FOSL: A Foldable Sparse-and-Low-Rank Method for Efficient LLM Pre-training
Dong Wang, Francesco Corti, Yun Cheng, Olga Saukh
Keywords: efficient pre-training, low-rank adaption, structured sparsity, large language models, model folding
41. LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
Zihang Liu, Tianyu Pang, Oleg Balabanov, Chaoqun Yang, Tianjin Huang, Lu Yin, Yaoqing Yang, Shiwei Liu
Keywords: Reasoning, Sparse Fine-tuning, Low-Rank Approximation, Memory Efficiency
42. Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry
Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin
Keywords: Dimension Reduction, Geometry
43. Stackelberg Control in Combinatorial Congestion Games without Differentiating Through Equilibria
Saeed Masiha, Sepehr Elahi, Negar Kiyavash, Patrick Thiran
Keywords: Stackelberg games, zeroth-order optimization, congestion games, zero-suppressed decision diagrams (ZDDs), compact combinatorial representations
Poster Session II
Time: Day 3 (Mar 25) – Wednesday – 3:30 PM to 5:30 PM
1. Stochastic Unrolled Neural Networks
Samar Hadou, Navid NaderiAlizadeh, Alejandro Ribeiro
Keywords: unrolled optimization, learning to learn, deep unfolding, interpretable deep architecture, constrained learning
2. Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs
Ruichen Zhang, Mufan Qiu, Zhen Tan, Mohan Zhang, Xiaopeng Lu, Jie Peng, Kaidi Xu, Leandro Z. Agudelo, Peter Zhenghao Qian, Tianlong Chen
Keywords: LLM, Agent, Knowledge Distillation, Web Agent, Symbiotic Cooperation, Privacy Preservation, Hybrid Mode
3. FocusDC: Real-World Scene Infusion for Robust Dataset Condensation
Youbing Hu, Yun Cheng, Olga Saukh, Firat Ozdemir, Anqi Lu, Zhiqiang Cao, Min Zhang, Zhijun Li
Keywords: Dataset Distillation and Condensation, Vision Transformer
4. Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion
Yangfan He, Sida Li, Jianhui Wang, Xinyuan Song, Kun Li, Xinhang Yuan, Kuan Lu, Menghao Huo, Jingqun Tang, Yi Xin, Jiaqi Chen, Keqin Li, Miao Zhang, Xueqian Wang
Keywords: Text-to-Image (T2I) Generation, Diffusion Models, Text-to-Video (T2V) Editing, Temporal Consistency, Spatial Consistency
5. MMA: Benchmarking Multi-Modal Large Language Models in Ambiguity Contexts
Ru Wang, Selena Song, Yuquan Wang, Liang Ding, Mingming Gong, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
Keywords: Multi-Modal Large Language Model, Ambiguity, Benchmark, Dataset
6. Byzantine-Robust Optimization under $(L_0,L_1)$-Smoothness
Arman Bolatov, Samuel Horváth, Martin Takáč, Eduard Gorbunov
Keywords: byzantine-robust optimization, federated learning, generalized smoothness, normalized SGD
7. ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning
Mingluo Su, Huan Wang
Keywords: Large language models, Unstructured pruning, Pruning order
8. Selective Collaboration for Robust Federated Learning
Nazarii Tupitsa, Samuel Horváth, Martin Takáč, Eduard Gorbunov
Keywords: federated learning, robust aggregation
9. Trainable Bitwise Soft Quantization for Input Feature Compression
Karsten Schrödter, Jan Stenkamp, Nina Herrmann, Fabian Gieseke
Keywords: Soft Quantization, Trainable Quantization, Input Compression, Tiny Machine Learning, Split Inference
10. Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework
Xinyuan Song, Yangfan He, Sida Li, Jianhui Wang, Hongyang He, Xinhang Yuan, Ruoyu Wang, Jiaqi Chen, Keqin Li, Kuan Lu, Menghao Huo, Ziqian Bi, Binxu Li, Pei Liu
Keywords: Adapter-based Methods, Diffusion Models, Video Editing, Temporal Consistency, DDIM Inversion, Prompt Learning, Theoretical Analysis
11. Learning of Discretized LSTMs
Nikolaus Kopp, Franz Pernkopf
Keywords: probabilistic, QAT, discrete LSTM, Gumbel-Softmax
12. FLIPR: FLexible and Interpretable Prediction Regions for time series
Eshant English, Christoph Lippert
Keywords: interpretable regions, time series, conformal prediction
13. SPIKE: Sparse Koopman Regularization for Physics-Informed Neural Networks
Jose Marie Antonio Miñoza
Keywords: Physics-Informed Neural Networks, Koopman Operator, Out-Of-Distribution Generalization, Dynamical Systems
14. GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks
Wenwu Tang, Dong Wang, Lothar Thiele, Olga Saukh
Keywords: Model Compression, Model Pruning, Model Folding, Model Compensation, LLM, Model Efficiency
15. What Scalable Second-Order Information Knows for Pruning at Initialization
Ivo Gollini Navarrete, Nicolas Mauricio Cuadrado, Martin Takáč, Samuel Horváth
Keywords: Pruning, Hessian, One-shot, Initialization, Hutchinson, Fisher
16. Superclass-Guided Representation Disentanglement for Spurious Correlation Mitigation
Chenruo Liu, Hongjun Liu, Zeyu Lai, Yiqiu Shen, Chen Zhao, Qi Lei
Keywords: Spurious Correlation, Group Robustness, Domain Generalization
17. Lattice-Based Vector Quantization for Low-Bit Quantization-Aware Training
Rishika Kohli, Soma S Dhavala, Shaifu Gupta, Manoj Singh Gaur
Keywords: compression, quantization, pruning, deep learning, vector quantization, quantization aware training, post training quantization, BERT
18. Deep Neural Regression Collapse
Akshay Rangamani, Altay Unal
Keywords: Neural Collapse, Low Rank, Neural Regression Collapse
19. Learning in the Null Space: Small Singular Values for Continual Learning
Cuong Anh Pham, Praneeth Vepakomma, Samuel Horváth
Keywords: continual learning, singular value decomposition, small singular values, null space
20. A Stein identity for $q$-Gaussians with bounded support
Sophia Sklaviadis, Thomas Möllenhoff, Mario A. T. Figueiredo, Andre Martins, Mohammad Emtiyaz Khan
Keywords: Generalized Stein identities, elliptical families, bounded-support q-Gaussians
21. LLMQ: Efficient Lower-Precision LLM Training for Consumer GPUs
Erik Schultheis, Dan Alistarh
Keywords: consumer GPU, quantized training
22. Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang, Wei Huang, Selena Song, Haoyu Zhang, Qian Niu, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
Keywords: Chain of Thought, Scaling Curve, Out-of-Distribution Generalization, Sample Efficiency
23. Analyzing and Mitigating Model Collapse in Reflow Methods
Huminhao Zhu, Fangyikang Wang, Tianyu Ding, Qing Qu, Zhihui Zhu
Keywords: Model Collapse, Self-training, Synthetic Data, Reflow, Rectified Flow
24. SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS
Ayush Pratap Singh, Harshit Singh, Nityanand Mathur, Akshat Mandloi, Sudarshan Kamath
Keywords: Knowledge Editing, Text to Speech, LLMs, Parameter Efficiency
25. KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration
Mohammad Amanlou, Erfan Shafiee Moghaddam, Mahdi Nouri, Yasaman Amou Jafary, Farhan Farsi, Behnam Bahrak
Keywords: Multiple-Choice Question Generation, Knowledge Graph, Difficulty Calibration, Question Answering Dataset
26. Emergence of Auditory Receptive Fields based on Surprise
Yashaswini, Sneha Dash, Sharba Bandyopadhyay
Keywords: Auditory receptive fields, Bayesian surprise, sparse coding, Oddball paradigm, predictive inference, Autoregressive generative modeling, efficient sensory coding, biologically inspired learning
27. Semantic Homogeneity As Demonstration: Batch-Structured Semi-Supervised In-Context Learning for Natural Language Understanding
Cheng Chen, Yuangang Pan, Ivor Tsang
Keywords: In-Context Learning, Natural Language Understanding, Prompt Engineering / Prompting, Aggregate Ranking
28. GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
Celia Rubio-Madrigal, Adarsh Jamadandi, Rebekka Burkholz
Keywords: graph neural networks, over-squashing, graph rewiring, community structure, homophily, feature similarity
29. Beyond Scores: Proximal Diffusion Models
Zhenghan Fang, Mateo Diaz Diaz, Sam Buchanan, Jeremias Sulam
Keywords: Generative models, Diffusion models, Proximal operators, Backward discretization
30. AdaBoost.SDM: Similarity and dissimilarity-based manifold regularized adaptive boosting algorithm
Azamat Mukhamediya, Amin Zollanvari
Keywords: Ensemble learning, Adaptive boosting, Manifold regularization
31. Pay Attention to Small Weights
Chao Zhou, Tom Jacobs, Advait Gadhikar, Rebekka Burkholz
Keywords: large model, finetuning, efficiency, catastrophic forgetting
32. Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
Boqian Wu, Qiao Xiao, Shunxin Wang, Nicola Strisciuglio, Mykola Pechenizkiy, Maurice van Keulen, Decebal Constantin Mocanu, Elena Mocanu
Keywords: Dynamic Sparse Training, Image Corruption Robustness
33. Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
Tongtong Liang, Dan Qiao, Yu-Xiang Wang, Rahul Parhi
Keywords: Generalization bound, minima stability, gradient descent, large learning rate, ReLU neural network, minimax rate
34. HyperINR: Ensuring Semantics in Weights with Implicit Function Theorem
Tianming Qiu, Christos Sonis, Hao Shen
Keywords: Implicit Function Theorem, Semantics in Weights, Weight Space Learning, Implicit Neural Representations, Hypernetworks
35. Connectivity determines the capability of sparse neural network quantum states
Brandon Barton, Juan Felipe Carrasquilla Alvarez, Christopher Robert Roth, Agnes Valenti
Keywords: Neural network quantum states, Pruning, Lottery ticket hypothesis
36. Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms
Hiroshi Kera, Nico Pelleriti, Yuki Ishihara, Max Zimmer, Sebastian Pokutta
Keywords: Polynomial System Solving, Border Bases, Transformer, Computational Algebra, AI4Science, AI4Math
37. REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Mike Lasby, Ivan Lazarevich, Nish Sinnadurai, Sean Lie, Yani Ioannou, Vithursan Thangarasa
Keywords: mixture-of-experts, MoE, compression, expert pruning, expert merging, merging, pruning, LLM, evaluation
38. Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Tom Jacobs, Advait Gadhikar, Celia Rubio-Madrigal, Rebekka Burkholz
Keywords: Sparsity, Implicit bias, Sign flip, Exponential update, Training dynamics, Bregman function
39. Beyond the Ideal: Analyzing the Inexact Muon Update
Egor Shulgin, Sultan AlRashed, Francesco Orabona, Peter Richtárik
Keywords: Optimization, Muon
40. The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
Hoang Pham, The-Anh Ta, Tom Jacobs, Rebekka Burkholz, Long Tran-Thanh
Keywords: Pruning Network, Graphon, Neural Tangent Kernel
41. Fixed Aggregation Features Can Rival GNNs
Celia Rubio-Madrigal, Rebekka Burkholz
Keywords: deep learning, graph neural networks, node classification, kolmogorov-arnold representation, tabular learning, non-trainable aggregation
42. SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training
Mohammed Adnan, Rohan Jain, Tom Jacobs, Ekansh Sharma, Rahul G Krishnan, Rebekka Burkholz, Yani Ioannou
Keywords: sparse training, dynamic sparse training, training dynamics, normalization layers
43. SALAAD: Sparse and Low-Rank Adaptation via ADMM for Large Language Model Inference
Hao Ma, Melis Ilayda Bal, Liang Zhang, Bingcong Li, Niao He, Melanie Zeilinger, Michael Muehlebach
Keywords: Sparse and Low-Rank Learning, Large Language Models, Structured Optimization, Model Compression, Elastic Inference
44. LOST: Low-rank and Sparse Pre-training for Large Language Models
Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Adarsh Kappiyath, LiWu Xu, Tianjin Huang, Wenwu Wang, Shiwei Liu, Xilu Wang
Keywords: Large language models, Low-rank, Sparse, Singular value decomposition, Pre-training