Publications
My full publication list can be found on my Google Scholar profile.
Preprints
[RL x LLM] Supervised Fine-Tuning as Inverse Reinforcement Learning [Paper]
Hao Sun
- In this work, we question the efficacy of preference-based datasets in LLM alignment and explore scenarios where alignment with expert demonstrations is more realistic. Drawing on inverse reinforcement learning and imitation learning, we introduce several approaches to divergence minimization for LLM alignment and analyze their mass-covering and mode-seeking behaviors. Finally, we examine the pros and cons of classical supervised fine-tuning, elaborating on the scenarios where each method shines.
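The mass-covering vs. mode-seeking distinction above can be illustrated with a toy discrete example (all numbers below are illustrative assumptions, not from the paper): forward KL, the objective implicit in maximum-likelihood SFT, penalizes a model for missing any mode of the target, while reverse KL diverges when the model puts mass where the target has none, pushing it to collapse onto a single mode.

```python
import numpy as np

def forward_kl(p, q):
    # KL(p || q), summing only over outcomes where p > 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy bimodal target over three outcomes (hypothetical numbers).
p = np.array([0.5, 0.0, 0.5])
q_cover = np.array([0.45, 0.10, 0.45])   # spreads mass over both modes
q_mode  = np.array([0.90, 0.05, 0.05])   # concentrates on one mode

# Forward KL(p || q) -- the mass-covering objective -- heavily penalizes
# q for missing a mode, so it prefers the covering model. Reverse
# KL(q || p) blows up whenever q puts mass where p has none, which
# instead drives q toward a single mode (mode-seeking).
print(forward_kl(p, q_cover) < forward_kl(p, q_mode))  # True
```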
[RL x LLM] Reinforcement Learning in the Era of LLMs: What is Essential? What is Needed? [Paper]
Hao Sun
- (1) RLHF is online IRL rather than offline RL. (2) RLHF outperforms SFT because imitation learning alleviates the compounding-error problem. (3) Insights from reward modeling can be generalized to other LLM applications beyond alignment. (4) RLHF is more challenging than conventional IRL due to action-space dimensionality and reward sparsity. (5) The superiority of PPO in RLHF may originate from its stability.
Conference Papers
[ICLR 2024] Query-Dependent Prompt Evaluation and Optimization with Inverse RL [Paper] [Code]
Hao Sun, Alihan Hüyük, Mihaela van der Schaar
- We propose Prompt-OIRL, showing that inverse RL can be used for offline, query-dependent prompt evaluation and optimization. It requires no interaction with the LLM during learning, yet achieves superior performance on arithmetic reasoning tasks.
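The core idea of offline, query-dependent prompt selection can be sketched as follows: fit a reward model on logged (query, prompt, success) data, then pick the highest-scoring prompt per query without ever calling the LLM. Everything below (embedding shapes, the linear scorer, all names) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

# Hypothetical logged data: query embeddings, which prompt was used,
# and whether that prompt led to a correct answer.
rng = np.random.default_rng(0)
n_logged, d, n_prompts = 200, 8, 3
X = rng.normal(size=(n_logged, d))              # query embeddings
prompt_ids = rng.integers(n_prompts, size=n_logged)
W_true = rng.normal(size=(n_prompts, d))
y = (np.einsum("nd,nd->n", X, W_true[prompt_ids]) > 0).astype(float)

# Fit one linear scorer per prompt via least squares: a stand-in for
# the learned query-dependent reward model.
W_hat = np.zeros((n_prompts, d))
for p in range(n_prompts):
    mask = prompt_ids == p
    W_hat[p], *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)

def select_prompt(query_emb):
    # Offline optimization: score every candidate prompt for this
    # query with the reward model; no LLM call is needed.
    return int(np.argmax(W_hat @ query_emb))
```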
[NeurIPS 2023] Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples [Paper] [Code (Soon)]
Hao Sun, Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
- We introduce an effective algorithm to enhance interpretability and accountability in offline RL. This research is critical for responsibility-sensitive applications like finance and healthcare.
[NeurIPS 2023] DAUC: a Density-based Approach for Uncertainty Categorization [Paper (To Be Updated)] [Code (Soon)]
Hao Sun^, Boris van Breugel^, Jonathan Crabbé, Nabeel Seedat, Mihaela van der Schaar
- We propose a density-based approach to classify and explain the source of uncertainty.
[NeurIPS 2022] Exploiting Reward Shifting in Value-Based DRL [Paper] [Code]
Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou
- A positive reward shift leads to conservative exploitation, while a negative reward shift leads to curiosity-driven exploration.
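The mechanism behind this can be checked in a few lines: adding a constant b to every reward shifts every state's value by exactly b / (1 - γ) while leaving the greedy policy unchanged, so a zero-initialized Q-function becomes pessimistic (conservative) for b > 0 and optimistic (exploratory) for b < 0. The MDP below is a random toy instance, purely for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma, iters=500):
    """Tabular value iteration. P: (S, A, S) transitions, R: (S, A) rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P @ V)   # (S, A) action values
        V = Q.max(axis=1)
    return V

# Toy 2-state, 2-action MDP with random dynamics (illustrative only).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))   # stochastic transitions
R = rng.uniform(size=(2, 2))
gamma, b = 0.9, 1.0

V = value_iteration(P, R, gamma)
V_shifted = value_iteration(P, R + b, gamma)

# Every value shifts by b / (1 - gamma); relative to these larger true
# values, a zero-initialized Q-function is pessimistic, encouraging
# conservative exploitation (and vice versa for b < 0).
print(np.allclose(V_shifted - V, b / (1 - gamma)))  # True
```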
[ICLR 2022] Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL [Paper] [Code]
Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, Chongjie Zhang
- We optimize GCSL with a lower bound of the goal-reaching objective and explain the success of GCSL from the perspective of offline RL.
[IJCAI 2021] Hierarchical Multi-Scale Gaussian Transformer for Stock Movement Prediction [Paper]
Qianggang Ding, Sifan Wu, Hao Sun, Jiadong Guo, Jian Guo
- We adapt the Transformer architecture to stock movement prediction.
[AAAI 2021] Adaptive Regularization of Labels [Paper]
Qianggang Ding, Sifan Wu, Hao Sun, Jiadong Guo, Shu-Tao Xia
- We study the correlations between labels to improve model performance.
[NeurIPS 2019 (Spotlight)] Policy Continuation with Hindsight Inverse Dynamics [Paper] [Code] [Homepage]
Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou
- Supervised Learning can be used to solve goal-conditioned RL tasks.
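The hindsight idea that makes this possible can be sketched in a few lines: relabel each state reached later in a trajectory as the goal that was actually achieved, turning even failed rollouts into supervised (state, goal) → action pairs. This is a minimal sketch of the relabeling step only; the paper's Hindsight Inverse Dynamics additionally learns an inverse-dynamics model over such pairs.

```python
def hindsight_relabel(trajectory):
    """Relabel a trajectory of (state, action) steps into supervised
    ((state, achieved_goal), action) training pairs, where the goal is
    any state actually reached later in the same trajectory."""
    data = []
    for t, (s, a) in enumerate(trajectory):
        for s_future, _ in trajectory[t + 1:]:
            data.append(((s, s_future), a))
    return data

# Toy grid-world rollout (illustrative states and actions).
traj = [((0, 0), "right"), ((1, 0), "up"), ((1, 1), "up")]
pairs = hindsight_relabel(traj)
print(len(pairs))  # 3 relabeled (state, goal) -> action pairs
```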