
CPAL Rising Stars Presentation Sessions
Presentation Format
Each awarded CPAL Rising Star will give a talk about their research in one of three sessions during the conference.
Each presentation is ten minutes long, with two minutes for Q&A.
Sessions are numbered chronologically, and presentations will be delivered in the order they are listed. See the full program for the precise time and location of each CPAL Rising Stars session.
Rising Stars Talks 1
Time: Day 2 (Mar 25) – Tuesday – 1:30 PM to 2:30 PM

Tianlong Chen
The University of North Carolina at Chapel Hill
Title: Breaking the Resource Monopoly from Industries: Sustainable and Reliable LLM Serving by Recycling Outdated and Resource-Constrained GPUs
Abstract: In recent years, Large Language Model (LLM) agents, exemplified by models like ChatGPT and PaLM, have showcased remarkable prowess in various tasks, owing to their vast number of parameters and emergent in-context learning capabilities. To serve these gigantic models with billions of parameters, it is increasingly necessary to explore how existing hardware, especially outdated hardware, can be used collectively to improve environmental sustainability, efficiency, and reliability for LLM serving. A few pioneering examples include Microsoft’s Project Natick, Google’s TPU Pod Optimization, Alibaba’s Cloud Server Repurposing, and Facebook’s Network Hardware Reuse. In this talk, I will survey my contributions in this area and highlight promising new directions, particularly emphasizing modularized LLM architecture (Part 1), in-storage sustainable computing (Part 2), and reliable serving against software and hardware attacks (Part 3).

Grigorios Chrysos
University of Wisconsin-Madison
Title: Stairway to Specialization: The Path of Scalable Experts
Abstract: The Mixture of Experts (MoE) paradigm used in large (language or multimodal) models makes it possible to tackle diverse tasks without task-specific training. MoE encourages specialization, and it simplifies debugging and model steering. However, scaling the number of experts to achieve fine-grained specialization presents a significant computational challenge unless low-rank structure is assumed. To that end, I will introduce the μMoE layer, which employs tensor algebra to perform implicit computations on large weight tensors in factorized form. This enables using thousands of experts at once without increasing the computational cost over a single MLP layer. I will showcase how the μMoE layer enhances specialization in both image and text applications, including GPT-2 models. This approach allows for on-demand model tailoring by selectively deactivating experts or posing counterfactual questions.
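The factorized-expert idea can be illustrated with a small sketch: store the stack of expert weight matrices as a CP-factorized tensor, so a soft mixture over thousands of experts is computed without ever materializing the full weight tensor. This is an assumed, generic illustration of factorized expert mixing, not the actual μMoE implementation; all names, shapes, and the rank R are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
E, d_in, d_out, R = 1024, 64, 64, 8  # many experts, small CP rank

# CP-factorized expert tensor: W[e, i, o] = sum_r A[e, r] * B[i, r] * C[o, r]
A = rng.standard_normal((E, R))
B = rng.standard_normal((d_in, R))
C = rng.standard_normal((d_out, R))

def moe_forward(x, s):
    """Mixture output sum_e s[e] * (x @ W[e]) computed in factorized form.

    Cost is O(R * (E + d_in + d_out)) instead of O(E * d_in * d_out).
    """
    t = s @ A            # (R,)  mix the expert factors
    u = x @ B            # (R,)  project the input onto the rank basis
    return C @ (t * u)   # (d_out,)

x = rng.standard_normal(d_in)
s = np.full(E, 1.0 / E)  # uniform expert weights for illustration
y = moe_forward(x, s)
```

Because the mixture weights s enter only through `t = s @ A`, deactivating experts (zeroing entries of s) changes the output without any recomputation of the factors, which is consistent with the on-demand tailoring described above.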

Congyue Deng
Stanford University
Title: Denoising Hamiltonian Network for Physical Reasoning
Abstract: Machine learning frameworks for physical problems are expected not only to model data distributions, but also to understand and enforce the physical constraints that preserve the key structures of physical systems. Many existing works address this by constructing physical operators inside neural networks. Despite their theoretically guaranteed physical properties, these methods face two key limitations: (i) they mainly focus on local temporal relations between adjacent time steps, omitting longer-range or abstract-level physical relations; and (ii) they primarily emphasize forward simulation and overlook other physical reasoning tasks of broader scope. To address these problems, we propose the Denoising Hamiltonian Network (DHN), a novel framework that generalizes the physical concepts of Hamiltonian mechanics with flexible neural network designs. By incorporating a denoising mechanism into the network, DHN also circumvents the inherent challenges of numerical integration. Moreover, we introduce global conditioning to facilitate multi-system modeling. We demonstrate its effectiveness on a range of physical reasoning tasks.
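For context, the classical structure that such frameworks build on is Hamiltonian mechanics: a system with generalized coordinates q and momenta p evolves under a scalar Hamiltonian H(q, p) (typically the total energy) via

```latex
\dot{q} = \frac{\partial H}{\partial p}, \qquad
\dot{p} = -\frac{\partial H}{\partial q},
```

and trajectories of these equations conserve H. This is standard background rather than the DHN formulation itself; the abstract describes DHN as generalizing these concepts beyond step-by-step numerical integration.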

Nived Rajaraman
University of California, Berkeley
Title: New data-centric frameworks for sequential-decision making
Abstract: As machine learning systems grow increasingly general-purpose and data-centric, there is a pressing need to develop approaches that mitigate the significant cost of collecting high-quality data. This challenge is exacerbated when agents are deployed in settings involving sequential decision making: in such changing environments, unseen situations are encountered frequently and undesirable behavior can be catastrophic. For sequential decision-making problems, a hybrid pipeline (1. pre-training a base policy from offline datasets, which is then 2. fine-tuned by online exploration) has emerged as one of the most effective ways to train performant agents. But how do we carry out pre-training and fine-tuning efficiently and robustly when access to high-quality data is one of the major bottlenecks? In this talk, I will discuss new approaches to this problem that build on insights derived from principled mathematical frameworks. I will present: (i) [Pre-training] a statistical framework for imitation learning, resulting in provably optimal algorithms with small data footprints in practice; and (ii) [Fine-tuning] a study of how verifier-based approaches (such as RL) appear to scale more favorably than verifier-free approaches under fixed data budgets. I will conclude with a discussion of future research directions and the longer-term goal of exploring the interplay between RL and modern approaches to sequence modeling.

Yihua Zhang
Michigan State University
Title: Authenticity and Resilience: New Frontiers in Machine Unlearning for Large Language Models
Abstract: Machine unlearning has emerged as a powerful approach for selectively removing harmful or undesirable knowledge from large language models (LLMs) while preserving their general capabilities. However, recent findings reveal significant pitfalls in existing unlearning methods, including ‘fake unlearning’—where knowledge is merely hidden rather than truly removed. Such incomplete removal can leave models highly vulnerable to malicious attacks or unintentional downstream fine-tuning. In this talk, we will explore how authenticity—the genuine erasure of targeted knowledge—and resilience—robustness to relearning and fine-tuning—can jointly serve as guiding principles for more effective machine unlearning. Drawing on both theoretical insights and empirical findings, we discuss novel strategies such as second-order optimization, weight attribution analysis, invariance-regularized training, and sharpness-aware unlearning. We show how these approaches not only address ‘fake unlearning’ but also provide further benefits. By mapping out these new frontiers, our work contributes practical insights and foundational ideas to help researchers and practitioners develop robust, efficient, and truly trustworthy unlearning solutions for the next generation of large language models.
Rising Stars Talks 2
Time: Day 3 (Mar 26) – Wednesday – 1:30 PM to 2:30 PM

Hadi Daneshmand
University of Virginia
Title: Learning to Compute
Abstract: Understanding the mechanisms of deep learning models with billions of parameters is a fundamental challenge in AI research. Recent findings reveal that feature extraction in these models progresses incrementally, step by step, across network layers. We will review these experimental observations and present theoretical studies that explain this incremental process. We show how it enables models to implement iterative algorithms capable of solving several problems, including linear regression, optimal transport, and policy evaluation for reinforcement learning, with theoretical guarantees. This computational view provides insights into effective practices such as prompt engineering for language models. These findings are steps toward learning from data to implement algorithms, a long-standing quest in neural computing research.

Wei Huang
RIKEN
Title: Advancing Feature Learning Theory: Optimization and Generalization for Foundation Models
Abstract: Foundation models, particularly Transformers, have revolutionized modern machine learning, showcasing remarkable capabilities such as in-context learning (ICL), multi-modal representation learning, and vision-specific applications. However, a deep theoretical understanding of their optimization dynamics, generalization mechanisms, and emergent behaviors remains incomplete. My recent research addresses these challenges, developing principled frameworks to unravel the intricate mechanisms of foundation models. This talk will explore three key contributions: (1) Optimization and Generalization in Transformers, where I analyze training dynamics and characterize the transition between effective and poor generalization in noisy data settings; (2) In-Context Learning, with a novel mathematical framework explaining how Transformers leverage multi-concept word semantics for efficient task adaptation; and (3) Multi-Modal Contrastive Learning, establishing a unified feature learning theory to explain why multi-modal learning outperforms single-modal approaches in both optimization and downstream generalization. These contributions bridge the gap between theoretical advancements and practical implementations, paving the way for the design of scalable, trustworthy, and efficient foundation models.

Souvik Kundu
Intel AI Lab
Title: AI Assisted Automation at Scale: Enabling Large Model Intelligence at Small Scale Devices
Abstract: With the emergence of large foundation models (LFMs), artificial intelligence (AI) has found use cases in various automation tasks across multiple modalities. With this surge of AI assistance, there has been increasing demand to deploy these models at the edge, including AI personal computers (AIPCs) and mobile devices. However, such deployments at scale face a fundamental challenge: running large models on a small computation and memory budget. Additionally, AI-assisted tasks like long-context reasoning require the additional memory overhead of long prefix storage. The problem further intensifies with the emergence of agents, where critical thinking may require assistance from multiple LFMs. Towards mitigating these roadblocks, this talk will focus on two major classes of solutions: (1) efficient and scalable optimizations for LFMs, to reduce their latency and improve operation throughput during autoregressive inference while maintaining downstream task performance; and (2) improved capabilities via post-training optimizations, to extend a model’s long-context understanding beyond its training-time effective receptive field. Specifically, we empirically demonstrate long-context understanding improvements for Mamba state space models (SSMs) of up to orders of magnitude, without any retraining of the pre-trained weights.

Denny Wu
New York University & Flatiron Institute
Title: Learning Single-Index Models with Neural Networks and Gradient Descent
Abstract: Single-index models (SIMs) are characterized by a univariate link function applied to a one-dimensional projection of the input. This framework has been extensively studied in the deep learning theory literature to investigate neural networks’ adaptivity to low-dimensional targets and the advantages of feature learning. In this talk, we will present recent advances in understanding the optimization dynamics of gradient-based feature learning for SIMs, drawing on analytical tools from high-dimensional statistics.
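Concretely, the model class described here has the standard form (notation assumed; this is textbook background, not the speaker’s specific setup):

```latex
y = g(\langle w^{*}, x \rangle) + \varepsilon, \qquad x \in \mathbb{R}^{d},
```

where g is a univariate link function and w* the index direction. The learning problem is that the network must simultaneously recover the one-dimensional subspace spanned by w* (feature learning) and fit g, which is what makes SIMs a natural testbed for adaptivity to low-dimensional targets.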

Ming Yin
Princeton University
Title: On the role of reinforcement learning in the era of generative AI
Abstract: The rise of generative AI has transformed the landscape of artificial intelligence, enabling unprecedented capabilities in creative problem-solving, content generation, and novel scientific discovery. However, as these models continue to scale, challenges related to alignment, safety, and decision-making in dynamic, real-world environments become increasingly prominent. Reinforcement learning (RL) offers a powerful framework to address these challenges by enabling agents to learn from feedback, optimize long-term outcomes, and adapt to complex scenarios. This talk explores the intersection of reinforcement learning and generative AI, highlighting how RL can enhance generative models in areas such as fine-tuning for user preferences, faster inference, and safe deployment. We will also discuss evaluation challenges for current generative AI.
Rising Stars Talks 3
Time: Day 4 (Mar 27) – Thursday – 3:00 PM to 4:00 PM

Ismail Alkhouri
DARPA; University of Michigan, Ann Arbor; Michigan State University
Title: Dataless Quadratic Differentiable Combinatorial Optimization
Abstract: Combinatorial Optimization (CO) encompasses many important problems, including the Maximum Independent Set (MIS) problem and the Maximum Cut (MaxCut) problem. Alongside exact and heuristic solvers, differentiable approaches have emerged, often relying on training data. Here, we propose a new dataless quadratic formulation for MIS and MaxCut. We characterize its local minimizers and stationary points and derive conditions relating them to optimal solutions. To tackle the non-convexity of the objectives, we optimize several initializations in parallel using momentum-based gradient descent. Our experimental results demonstrate the effectiveness of the proposed method compared to exact, heuristic, sampling-based, and data-centric approaches. Notably, our method avoids the out-of-distribution tuning and reliance on (un)labeled data required by data-centric methods. Additionally, a key advantage of our approach is that, unlike exact and heuristic solvers, its runtime scales only with the number of nodes in the graph, not the number of edges.
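As a point of reference (a standard textbook relaxation, not necessarily the authors’ exact objective), MIS on a graph G = (V, E) with adjacency matrix A_G admits the quadratic formulation

```latex
\min_{x \in [0,1]^{|V|}} \; f(x) = -\mathbf{1}^{\top} x + \gamma \, x^{\top} A_{G} \, x,
```

where the linear term rewards selecting vertices and the quadratic term penalizes selecting both endpoints of any edge; for sufficiently large γ, binary minimizers correspond to maximum independent sets. An objective of this shape is differentiable in x, which is what makes gradient-based optimization from multiple initializations applicable.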

Soufiane Hayou
Simons Institute, UC Berkeley
Title: A Theoretical Framework for Efficient Learning at Scale
Abstract: State-of-the-art performance is usually achieved via a series of modifications to existing neural architectures and their training procedures. A common feature of these networks is their large-scale nature: modern neural networks usually have billions – if not hundreds of billions – of trainable parameters. While empirical evaluations generally support the claim that increasing the scale of neural networks (width, depth, etc.) boosts model performance if done correctly, optimizing the training process across different scales remains a significant challenge, and practitioners tend to follow empirical scaling laws from the literature. In this talk, I will present a unified framework for efficient learning at large scale. The framework allows us to derive efficient learning rules that automatically adjust to model scale, ensuring stability and optimal performance. By analyzing the interplay between network architecture, optimization dynamics, and scale, we demonstrate how these theoretically grounded learning rules can be applied to both pretraining and finetuning. The results offer new insights into the fundamental principles governing neural network scaling and provide practical guidelines for training large-scale models efficiently.

Yingcong Li
University of Michigan, Ann Arbor
Title: Transformers as Support Vector Machines
Abstract: The remarkable success of large language models (LLMs) has drawn significant interest, but their underlying mechanisms remain underexplored, due both to the complexity of their architectures and to the strong dependence of their predictions on data. My research focuses on uncovering the fundamental reasons behind the effectiveness of LLMs. One key insight comes from analyzing attention mechanisms: our work shows that optimized attention acts like a support vector machine, highlighting relevant elements in the input sequence while suppressing irrelevant ones.

Yu Sun
Johns Hopkins University
Title: Provable Probabilistic Imaging using Score-based Generative Models
Abstract: Inverse problems in imaging often suffer from ill-posedness, where the task of recovering an unknown signal from incomplete and noisy measurements lacks a unique solution. Posterior sampling offers a principled approach to this challenge by estimating the full posterior distribution of the unknown signal, providing both reconstructions and uncertainty quantification. In this talk, I will introduce two complementary methods for provable posterior sampling in computational imaging using score-based diffusion models. The first is plug-and-play Monte Carlo (PnP-MC), which can be viewed as the sampling extension of the proximal gradient method; the second is plug-and-play Diffusion Model (PnP-DM), which mimics the dynamics of the alternating direction method of multipliers. Theoretical guarantees on the convergence of both methods will also be discussed. Our results on various imaging tasks, including nonlinear black hole imaging, demonstrate the superior performance of PnP-MC/PnP-DM in image reconstruction, as well as their high-fidelity uncertainty quantification.
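While the specific algorithms above differ, the general recipe behind plug-and-play posterior sampling with a learned score can be sketched as a Langevin-type update (notation assumed, presented as background rather than the speaker’s exact method): a pretrained score network s_θ(x) ≈ ∇ log p(x) stands in for the prior gradient,

```latex
x_{k+1} = x_k + \eta \, \nabla_{x} \log p(y \mid x_k) + \eta \, s_{\theta}(x_k) + \sqrt{2\eta}\, z_k, \qquad z_k \sim \mathcal{N}(0, I),
```

so the measurement-fit gradient and the learned score jointly drive the iterates toward samples from the posterior p(x | y), rather than toward a single point estimate.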

Yanchao Yang
The University of Hong Kong
Title: InfoBodied AI: Learning Mutual Information for Embodied AI
Abstract: Embodied AI strives to create agents capable of learning and tackling complex tasks involving physical interaction, with potential applications in many areas such as housekeeping, caregiving, and logistics. Such agents must be able to perceive their environment, construct scene representations, and carry out reasoning and actions to accomplish task-specific goals. However, existing learning approaches rely on human annotations or unrealistic simulations, leading to generalization problems in the real world. Thus, it is crucial to equip embodied agents with the ability to autonomously learn from real-world data, minimizing reliance on human supervision and enabling adaptability to new tasks. We propose that the key to autonomous learning for embodied agents lies in the mutual correlations present in unlabeled data. In this presentation, we will discuss how to efficiently compute the mutual information of data by developing novel neural estimators. We will also show how these freely available mutual correlations can help reduce human annotation effort in learning label-efficient perception, scene representations, and manipulation concepts for generalizable policies. Finally, we present a potential framework for building embodied agents that can learn in unseen environments and automatically acquire novel interaction skills by leveraging mutual information in unlabeled observational data.