This paper surveys research on reinforcement learning (RL)-enhanced large language models (LLMs). We provide a systematic review of the literature, covering:
- the basics of RL
- popular RL-enhanced LLMs
- two reward model-based RL techniques: reinforcement learning from human feedback (RLHF) and reinforcement learning from AI feedback (RLAIF)
- direct preference optimization (DPO), which bypasses the reward model to align LLM outputs directly with human preferences (see the sketch after this list)
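
As a minimal sketch of the last point, DPO's standard objective optimizes the policy directly on preference pairs rather than through a learned reward model; the notation below (policy $\pi_\theta$, frozen reference $\pi_{\mathrm{ref}}$, preferred/dispreferred responses $y_w$/$y_l$, temperature $\beta$) follows the common formulation and is illustrative, not specific to any one surveyed system:

```latex
% DPO objective: train \pi_\theta on preference pairs (x, y_w, y_l),
% where y_w is preferred over y_l, against a frozen reference policy
% \pi_{\mathrm{ref}}; \beta controls how far the policy may drift from it.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      \;-\;
      \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Intuitively, the difference of log-probability ratios plays the role of an implicit reward margin, which is why no separately trained reward model is needed.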