Research papers

See also DBLP and Google Scholar. Code for reproducing our empirical work is available on GitHub. Acknowledgement: Our research has been generously supported by funds from the National Science Foundation and from JP Morgan Chase.




Representative publications

These selected publications include my job market paper, more recent work with my PhD students at Northeastern, and collaborative work.

In this paper, we show that in the over-parameterized matrix sensing problem, gradient descent from a small random initialization converges to the ground-truth matrix without any explicit regularization in the loss objective. We recently used techniques from this line of work to tackle matrix completion from ultra-sparse samples.

We analyze the supervised fine-tuning algorithm, which starts from a pretrained model instead of a random initialization. We identify a Hessian-based generalization measure (derived from a PAC-Bayes noise perturbation analysis) that gives a non-vacuous bound on the generalization gap, and we empirically validate this bound for various neural network architectures.

We improve the state-of-the-art generalization bound for graph neural networks, specifically message-passing neural networks. We prove bounds based on the spectral norm of the graph diffusion matrix, whereas prior work shows bounds depending on the maximum degree of the graphs.

We rigorously formulate the problem of accurately identifying negative transfer in multitask learning. We introduce linear surrogate models for predicting the outcomes of multitask training and prove linear sample complexity bounds. This problem formulation has led to numerous follow-up works, including our latest work on task attribution at ICLR'26.

We give a precise quantification of negative transfer in the proportional limit regime for two linear regression tasks. We rigorously prove a phase transition from positive to negative transfer as the number of source-task samples increases, improving upon our initial analysis at ICLR'20.

* represents alphabetical authorship.

† indicates a Northeastern student co-author who was advised by me.