About Me

🚀 Hi there! I am Hao Sun, a final-year Ph.D. student at the University of Cambridge, supervised by Prof. Mihaela van der Schaar and working at the intersection of reinforcement learning (RL) and large language models (LLMs). During my M.Phil. studies at MMLab@CUHK, I was advised by Prof. Dahua Lin and Prof. Bolei Zhou. I hold a B.Sc. in Physics from Yuanpei College at Peking University and a B.Sc. from the Guanghua School of Management. My undergraduate thesis was supervised by Prof. Zhouchen Lin.

I am seeking research scientist and academic positions starting in 2025.

News!

📄 (2024.10) Our tutorial, Inverse RL Meets LLMs, has been accepted at AAAI-2025! Join us in Philadelphia to explore the potential of Inverse RL in the era of LLMs!
💬 (2024.10) New talk on Inverse RL Meets LLMs at the vdsLab2024 OpenHouse and the UCLA Zhou Lab. The talk summarizes our efforts in using IRL for better prompting, fine-tuning, and inference-time optimization. Slides are online.
📄 (2024.09) Our Data-Centric Reward Modeling paper is accepted by the Journal of Data-Centric Machine Learning Research (DMLR).
📄 (2024.08) InverseRLignment was presented at the RL Beyond Rewards workshop (accepted with a score of 9) at the 1st RLC.
📄 (2024.05) InverseRLignment is online; it builds reward models from SFT data.
📄 (2024.05) Our Dense Reward Model paper is accepted by ICML 2024.
📄 (2024.03) I wrote an article arguing that Supervised Fine-Tuning is Inverse Reinforcement Learning!
💬 (2024.03) Prompt-OIRL and RATP are featured at the Inspiration Exchange; the recording is online.
📄 (2024.02) Two RL+LLM papers are online! ABC uses the attention mechanism to solve the credit-assignment problem in RLHF; RATP uses MCTS to enhance the reasoning ability of LLMs with external documents.
📄 (2024.01) One RL+LLM paper is accepted by ICLR 2024! Prompt-OIRL uses Inverse RL to evaluate and optimize prompts for reasoning.
💬 (2024.01) Invited talk on RLHF at the Intuit AI Research Forum. Slides are online.
💬 (2023.12) Invited talk on RLHF at the Likelihood Lab. Slides are online.
💬 (2023.11) Invited talk on RLHF at the CoAI group, THU. Slides are online.
📄 (2023.10) Prompt-OIRL is selected as an oral presentation at the NeurIPS 2023 ENLSP workshop!
📄 (2023.10) I wrote an article on RLHF to share my thoughts as an RL researcher in the Era of LLMs.
📄 (2023.09) Two papers, on Interpretable Offline RL and Interpretable Uncertainty Quantification, are accepted by NeurIPS 2023.