About Me

🚀 Hi there, I am Hao Sun, a final-year PhD student at the University of Cambridge, supervised by Prof. Mihaela van der Schaar. During my M.Phil. study at MMLab@CUHK, I was advised by Prof. Dahua Lin and Prof. Bolei Zhou. I received a BSc in Physics from the Yuanpei Honor Program and a BSc from the Guanghua School of Management, both at Peking University. My undergraduate thesis was advised by Prof. Zhouchen Lin.

🤖️ I believe Reinforcement Learning is a vital component of the solution for achieving AGI. My previous work on deep reinforcement learning is motivated by reality-centric applications such as robotics🦾, healthcare💉, finance📈, and large language models🧠. My research keywords over the past years include:

  • (Inverse) RL in Language Models (2023-); Inverse RL (2021-); Interpretable RL (2023-);
  • Uncertainty Quantification (2022-); Data-Centric Reward Modeling (2022-);
  • Value-Based Deep-RL (2021-); Offline RL (2021-); Optimism in Exploration (2021-);
  • Continuous Control via Supervised Learning (2020-); Goal-Conditioned RL (2020-);
  • RL in Robotics (2019-)

🤝 I’m open to collaborations. Please drop me an email if you find my work interesting, and let’s push RL closer to genuine general intelligence! Here are some topics I’m actively working on:

  • Inverse RL in Language Modeling: Alignment, Multi-Objective/Meta-Learning, Uncertainty
  • Data-Centric Perspective of Reward Modeling

News

📄 (2024.03) I wrote an article arguing that Supervised Fine-Tuning is Inverse Reinforcement Learning!
💬 (2024.03) Prompt-OIRL and RATP are featured at the Inspiration Exchange; the recording is online.
📄 (2024.02) 2 RL+LLM papers are online! ABC uses the attention mechanism to solve the credit assignment problem in RLHF; RATP uses MCTS to enhance the reasoning ability of LLMs with external documents.
📄 (2024.01) 1 RL+LLM paper, Prompt-OIRL, is accepted by ICLR 2024! It uses Inverse RL to evaluate and optimize prompts for LLMs.
💬 (2024.01) Invited talk on RLHF at the Intuit AI Research Forum. slide
💬 (2023.12) Invited talk on RLHF at the Likelihood Lab. Discussion on Nash-Learning from Human Feedback is included! slide
💬 (2023.11) Invited talk on RLHF at the CoAI group, THU. Discussion on the problems of the Bradley-Terry Model is included. slide
📄 (2023.10) Our paper Prompt-OIRL is selected as an oral presentation at the NeurIPS 2023 ENLSP workshop!
📄 (2023.10) I wrote an article on RLHF to share my thoughts as an RL researcher in the Era of LLMs.
📄 (2023.9) 2 papers on Interpretable Offline RL and Interpretable Uncertainty Quantification are accepted by NeurIPS 2023.
💬 (2023.9) Invited talk on “Reinforcement Learning in the Era of LLMs” at Kuaishou Research. slide is online
📄 (2023.2) 2 papers are accepted by AISTATS 2023.
💬 (2022.11) Invited talk on value-based DRL at HW Cloud Research. slide is online
📄 (2022.9) 1 paper on Value-Based DeepRL is accepted by NeurIPS 2022. 2 papers are presented at the FMDM workshop, and 2 papers are presented at the DeepRL workshop.
📄 (2022.1) 1 paper on Offline GCRL is accepted by ICLR 2022.