
Tutorials
The first day of the conference features six tutorial presentations, arranged into two parallel tracks across three sessions throughout the day.
See the program schedule for the times and locations of each tutorial.
List of Tutorials
Track 1 — MPH Lecture Hall, Max-Planck-Ring 6
Orthogonal Training for Foundation Models: Theory, Algorithm and Application
Presenters: Weiyang Liu (CUHK), Zeju Qiu (Max Planck Institute for Intelligent Systems)
Abstract
Orthogonal training offers a principled way to optimize foundation models without disturbing their core spectrum and geometry. This tutorial will present a unified framework of orthogonal training for large-scale neural networks, with a focus on language and text-to-image foundation models. We will start from first principles, showing how orthogonal training relates to the minimum-description-length principle and how this connection yields a generalizable inductive bias. Building on this foundation, we will present recent orthogonal training algorithms, including orthogonal finetuning and orthogonal pretraining. We will further analyze their impact on optimization stability, generalization, calibration, and robustness. Special attention will be given to memory- and compute-efficient implementations that enable billion-parameter models to be trained under tight hardware budgets, as well as to connections with sparse training and low-rank adaptation. Finally, we will discuss practical recipes, open-source implementations, and case studies on training and finetuning LLMs and other foundation models with orthogonal training, highlighting both current limitations and promising research frontiers. The tutorial targets researchers and practitioners interested in principled ways to scale foundation models more efficiently, reliably, and interpretably.
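As a toy illustration of the core idea (our own sketch, not the presenters' implementation), the snippet below finetunes a frozen weight matrix by multiplying it with an exactly orthogonal matrix obtained from a 2x2 Cayley transform; because the transform is orthogonal, the spectrum and total norm of the pretrained weights are untouched:

```python
import math

def cayley_2x2(a):
    """Cayley transform of the skew-symmetric matrix [[0, a], [-a, 0]].
    Returns an exactly orthogonal (rotation) matrix for any real a."""
    d = 1.0 + a * a
    return [[(1 - a * a) / d, -2 * a / d],
            [2 * a / d, (1 - a * a) / d]]

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Frozen pretrained weight (2 x 3) and a single trainable scalar a.
W0 = [[1.0, 2.0, 0.5],
      [0.0, -1.0, 3.0]]
a = 0.3
R = cayley_2x2(a)
W = matmul(R, W0)  # finetuned weight: same spectrum/geometry as W0

# Orthogonality preserves the total Frobenius norm of the weights.
fro = lambda M: math.sqrt(sum(x * x for row in M for x in row))
print(abs(fro(W) - fro(W0)) < 1e-9)  # True
```

In practice the same construction is applied per layer to large matrices, with the skew-symmetric parameter trained by gradient descent while the pretrained weights stay frozen.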
Inference Optimization Playbook for Serving LLMs in the Industry: Techniques from Architecture to AI Accelerators
Presenters: Kailash Budhathoki (AWS), Jonas Kuebler (AWS), Ashish Khetan (AWS)
Abstract
The explosive growth of Large Language Models (LLMs) in production environments has made inference efficiency a critical bottleneck for industrial adoption. This tutorial provides a comprehensive playbook covering the advanced, production-proven techniques used to minimize latency and maximize throughput when serving state-of-the-art LLMs. We will guide attendees through the complete LLM lifecycle, starting by establishing the hardware constraints imposed by modern AI accelerators, detailing the interplay between memory hierarchy and compute units. We will then examine architectural design choices (e.g., Grouped-Query Attention, MoE, Speculative-Decoding Aware Architectures) and progress to post-training optimization (quantization, sparsity, speculative decoding) with a focus on practical guardrails for accuracy preservation. A significant portion of the tutorial will dive into the core of high-performance inference engines, detailing essential concepts such as continuous batching, Paged Attention, KV caching, efficient scheduling, and the use of CUDA Graphs and kernel fusion. Attendees will gain a deep, end-to-end understanding of how to architect, optimize, and serve LLMs efficiently at scale in a real-world setting.
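To make one of these engine-level ideas concrete, here is a minimal, framework-free sketch of a single-head KV cache (illustrative only; the names are our own): each decode step appends one key/value pair and attends over everything cached so far, so the projections of earlier tokens are never recomputed:

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention of one query over cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
              for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(w)
    w = [x / z for x in w]
    d = len(values[0])
    return [sum(w[t] * values[t][j] for t in range(len(values)))
            for j in range(d)]

class KVCache:
    """Append-only key/value cache: each decode step attends over all
    previously cached tokens instead of reprocessing the whole prefix."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [0.5, 0.5])
out2 = cache.step([0.0, 1.0], [0.0, 1.0], [1.0, -1.0])
print(len(cache.keys))  # 2: one cached (k, v) pair per decoded token
```

Paged Attention refines exactly this structure by storing the cache in fixed-size blocks so memory can be allocated on demand and shared across requests.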
Reward Modeling in Large Language Models: Principles, Methods, and Challenges
Presenters: Meng Fang (University of Liverpool), Yudi Zhang (Eindhoven University of Technology), Mykola Pechenizkiy (Eindhoven University of Technology)
Abstract
Reward modeling has become a central mechanism for steering large language models (LLMs) toward desired behaviors. Beyond pure next-token prediction, modern systems rely on predefined or learned reward signals to encode preferences over outputs, shape reasoning processes, and perform test-time optimization. This tutorial provides a structured overview of how rewards are built and used in contemporary LLM pipelines. We first introduce how LLMs can be viewed as policies and how learned rewards are used both for post-training and for test-time reranking. We then illustrate reward modeling from preferences: how human feedback is collected, how explicit scalar reward models are trained, and how DPO-style methods use pairwise preference data to optimize the policy directly. Next, we compare outcome and process reward modeling, and discuss the training pipeline of process reward models. We then introduce verifiable rewards and how reinforcement learning from such signals is applied in practice. This is followed by LLM-as-reward settings, where judge models provide language-based, holistic feedback that differs from purely numeric rewards. Finally, we present a unifying view of this design space, evaluation metrics for reward models, key challenges such as data quality and reward hacking, and emerging directions including multimodal and online reward learning.
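The DPO objective mentioned above fits in a few lines. The sketch below uses hypothetical scalar inputs standing in for sequence log-probabilities and shows the standard loss for one preference pair:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (chosen y_w, rejected y_l).

    logp_* are sequence log-probabilities under the policy being trained;
    ref_logp_* are under the frozen reference model. Minimizing the loss
    increases the policy's implicit reward margin for y_w over y_l,
    with no explicit scalar reward model in the loop.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# When policy and reference agree, the margin is 0 and the loss is log 2.
print(abs(dpo_loss(-10.0, -12.0, -10.0, -12.0) - math.log(2.0)) < 1e-12)  # True
```

Note how the reference model appears only through log-probability differences: DPO regularizes toward the reference implicitly, playing the role that the KL penalty plays in RLHF.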
Track 2 — MPI Lecture Hall, Max-Planck-Ring 4
Training Neural Networks at Any Scale
Presenters: Leena Chennuru Vankadara (University College London), Volkan Cevher (EPFL, LIONS)
Abstract
At the heart of deep learning’s transformative impact lies the concept of scale – encompassing both data and computational resources, as well as their interaction with neural network architectures. Scale, however, presents critical challenges, such as increased instability during training and prohibitively expensive model-specific tuning. Given the substantial resources required to train such models, formulating high-confidence scaling hypotheses backed by rigorous theoretical research has become paramount. The first part of the tutorial will provide an overview of significant advances in the theory of scaling in deep learning, covering its historical foundations, recent breakthroughs, and practical implications for training large-scale models. To bridge theory and practice, the tutorial explores another key mathematical ingredient of scaling: the numerical solution algorithms commonly employed in deep learning, spanning domains from vision to language models. We unify these algorithms under a common master template, making their foundational principles transparent. In doing so, we reveal the interplay between adaptation to smoothness structures via online learning and the exploitation of optimization geometry through non-Euclidean norms. Our exposition moves beyond simply building larger models – it emphasizes strategic scaling, offering insights that promise to advance the field while economizing on resources.
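The role of non-Euclidean norms can be glimpsed in a toy example (our own illustration, not the presenters' template): steepest descent under different norms yields different familiar update rules. The sketch below contrasts the Euclidean case (plain gradient descent) with the l-infinity case (sign descent, a stripped-down relative of Adam):

```python
def steepest_descent_step(x, grad, lr, norm="l2"):
    """One step of steepest descent under a chosen norm.

    Under the Euclidean (l2) norm, the steepest-descent direction is the
    gradient itself, giving ordinary gradient descent. Under the l-infinity
    norm it is the sign of the gradient, recovering sign descent: every
    coordinate moves by the same magnitude regardless of gradient scale.
    """
    if norm == "l2":
        direction = grad
    elif norm == "linf":
        direction = [1.0 if g > 0 else -1.0 if g < 0 else 0.0 for g in grad]
    else:
        raise ValueError(f"unknown norm: {norm}")
    return [xi - lr * di for xi, di in zip(x, direction)]

x = [1.0, -4.0]
g = [0.5, -2.0]
print(steepest_descent_step(x, g, lr=0.1, norm="l2"))    # gradient step
print(steepest_descent_step(x, g, lr=0.1, norm="linf"))  # sign step
```

The choice of norm is exactly the kind of geometric design decision that changes how an algorithm behaves as models scale.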
Where to Spend Parameters: From Layerwise Efficiency to Federated Architecture Search
Presenters: Lu Yin (University of Surrey), Xilu Wang (University of Surrey)
Abstract
Large neural networks – especially Large Language Models (LLMs) and modern transformers – deliver strong performance, but their growing scale makes a central question unavoidable: where should we spend parameters to get the best capability per unit compute? This tutorial presents a unified answer across two complementary levels of design: allocation within a fixed model through layerwise efficiency, and allocation across models under distributed constraints through federated architecture search. We begin by showing that layerwise signals reveal substantial depthwise heterogeneity: different layers contribute unequally to model quality. Leveraging this structure enables targeted pruning for efficient inference and compute-aware finetuning by prioritizing updates on the most influential layers. We then examine the origin of layerwise imbalance as a pretraining deficiency and introduce simple training-time remedies that re-balance layer contributions across depth, improving utilization of the model’s capacity. Moving from “which layers to trust” to “which architectures to deploy,” we examine neural architecture search across increasingly complex settings. We begin with multitask scenarios where architectures must balance accuracy and efficiency across heterogeneous tasks and datasets, then extend to federated learning environments where additional constraints – distributed data, varying compute budgets, and privacy requirements – demand coordinated architecture discovery across participants. Together, these methods demonstrate that practical efficiency comes from learning to spend parameters deliberately – across layers within models, across architectures in model families, and across heterogeneous tasks and participants.
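As a toy illustration of non-uniform parameter allocation (our own sketch, not a method from the tutorial), the snippet below assigns higher pruning ratios to layers whose weights carry less total magnitude, a crude stand-in for a layerwise importance signal, while keeping the average sparsity fixed:

```python
def prune_layer(weights, sparsity):
    """Magnitude pruning for one layer: zero out the smallest-|w| fraction."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def allocate_sparsity(importances, mean_sparsity):
    """Assign higher sparsity to less important layers while keeping the
    average sparsity near mean_sparsity (clamped to [0, 0.95])."""
    mean_imp = sum(importances) / len(importances)
    raw = [mean_sparsity * (2.0 - imp / mean_imp) for imp in importances]
    return [min(0.95, max(0.0, s)) for s in raw]

layers = {
    "early": [0.9, -0.8, 0.7, 0.6],    # high total magnitude: more important
    "late":  [0.1, -0.05, 0.02, 0.2],  # low total magnitude: less important
}
importances = [sum(abs(w) for w in ws) for ws in layers.values()]
sparsities = allocate_sparsity(importances, mean_sparsity=0.5)
pruned = {name: prune_layer(ws, s)
          for (name, ws), s in zip(layers.items(), sparsities)}
print(pruned["late"])  # [0.0, 0.0, 0.0, 0.2]: only the largest weight survives
```

Real layerwise signals are of course richer than total magnitude, but the mechanism is the same: measure per-layer contribution, then spend the parameter budget where it matters.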
Theoretical Insights on Training Instability in Deep Learning
Presenters: Jingfeng Wu (University of California, Berkeley), Yu-Xiang Wang (UC San Diego), Maryam Fazel (University of Washington)
Abstract
The advances in deep learning build on the dark arts of gradient-based optimization. In deep learning, the optimization process is oscillatory, spiky, and unstable. This makes little sense in classical optimization theory, which primarily operates in a well-behaved, stable regime. Yet in practice the best-performing training configurations consistently operate in an unstable regime. This tutorial introduces recent theoretical progress in understanding the benign nature of training instabilities, providing new insights from both optimization and statistical learning perspectives. Participants will gain a solid understanding of training instabilities in deep learning, their theoretical and practical implications, and future research directions in this critical area.
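A one-dimensional quadratic already shows the classical stability threshold that this line of work revisits: gradient descent on f(x) = 0.5 * lam * x**2 converges only when the learning rate is below 2/lam, and oscillates with growing amplitude just above it. A minimal sketch:

```python
def gd_iterates(lam, lr, x0=1.0, steps=8):
    """Iterates of gradient descent on f(x) = 0.5 * lam * x**2.
    The update x <- x - lr * lam * x contracts only when lr < 2 / lam."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] * (1.0 - lr * lam))
    return xs

lam = 100.0                            # curvature (top Hessian eigenvalue)
stable = gd_iterates(lam, lr=0.005)    # lr < 2/lam = 0.02: decays
unstable = gd_iterates(lam, lr=0.021)  # lr > 2/lam: oscillates and grows
print(abs(stable[-1]) < 1e-2)          # True: converging toward the minimum
print(abs(unstable[-1]) > 1.0)         # True: diverging despite "descent"
```

The puzzle the tutorial addresses is that deep networks are routinely trained at or beyond this edge, with loss spikes and oscillations, yet still make progress, which classical theory does not predict.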