Publications

(2026). How to Scale Mixture-of-Experts: From μP to the Maximally Scale-Stable Parameterization. Preprint; Oral at ICML HiLD Workshop 2026.
(2025). On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling. NeurIPS 2025 (spotlight).
(2024). On Feature Learning in Structured State Space Models. NeurIPS 2024. Oral and runner-up for best paper award at ICML NGSM Workshop 2024.
(2024). μP²: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling. NeurIPS 2024.
(2023). Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension. NeurIPS 2023.
(2023). Pitfalls of Climate Network Construction - A Statistical Perspective. Journal of Climate.