About Me

🚀 I am a penultimate-year PhD student at the University of Cambridge, supervised by Prof. Mihaela van der Schaar. During my M.Phil. study at MMLab@CUHK, I was advised by Prof. Dahua Lin and Prof. Bolei Zhou. I received a BSc in Physics from the Yuanpei Honor Program at Peking University and a BSc from the Guanghua School of Management at Peking University. My undergraduate thesis was advised by Prof. Zhouchen Lin.

๐Ÿค–๏ธ I believe Reinforcement Learning is a vital component of the solution for achieving AGI. My previous work on deep reinforcement learning is motivated by reality-centric applications like robotics๐Ÿฆพ, healthcare๐Ÿ’‰, finance๐Ÿ“ˆ, and large language models๐Ÿง . My research keywords during the past years include:

  • RL in Language Models (2023-); Interpretable RL (2023-); Inverse RL (2021-);
  • Uncertainty Quantification (2022-); Data-Centric Off-Policy Evaluation (2022-);
  • Value-Based Deep-RL (2021-); Offline RL (2021-); Optimism in Exploration (2021-);
  • Continuous Control via Supervised Learning (2020-); Goal-Conditioned RL (2020-);
  • RL in Robotics (2019-)

๐Ÿค Iโ€™m open to collaborations. Please drop me an email if you find my work interesting. Let us push RL closer to genuine general intelligence!


📄 (2024.03) I wrote an article arguing that Supervised Fine Tuning is Inverse Reinforcement Learning!
💬 (2024.03) Prompt-OIRL and RATP are featured at the Inspiration Exchange; the recording is online.
📄 (2024.02) 2 RL+LLM papers are online! ABC uses the attention mechanism to solve the credit assignment problem in RLHF; RATP uses MCTS to enhance the reasoning ability of LLMs with external documents.
📄 (2024.01) 1 RL+LLM paper, Prompt-OIRL, is accepted by ICLR 2024! It uses Inverse RL to evaluate and optimize prompts for LLMs.
💬 (2024.01) Invited talk on RLHF at the Intuit AI Research Forum. slide
💬 (2023.12) Invited talk on RLHF at the Likelihood Lab. Discussion on Nash-Learning from Human Feedback is included! slide
💬 (2023.11) Invited talk on RLHF at the CoAI group, THU. Discussion on the problems of the Bradley-Terry Model is included. slide
📄 (2023.10) Our paper Prompt-OIRL is selected as an oral presentation at the NeurIPS 2023 ENLSP workshop!
📄 (2023.10) I wrote an article on RLHF to share my thoughts as an RL researcher in the Era of LLMs.
📄 (2023.9) 2 papers on Interpretable Offline RL and Interpretable Uncertainty Quantification are accepted by NeurIPS 2023.
💬 (2023.9) Invited talk on “Reinforcement Learning in the Era of LLMs” at Kuaishou Research. The slides are online.
📄 (2023.2) 2 papers are accepted by AISTATS 2023.
💬 (2022.11) Invited talk on value-based DRL at HW Cloud Research. The slides are online.
📄 (2022.9) 1 paper on Value-Based DeepRL is accepted by NeurIPS 2022. 2 papers are presented at the FMDM workshop, and 2 papers are presented at the DeepRL workshop.
📄 (2022.1) 1 paper on Offline GCRL is accepted by ICLR 2022.