Wei Xiong
Verified email at illinois.edu
Title
Cited by
Year
RAFT: Reward ranked finetuning for generative foundation model alignment
H Dong, W Xiong, D Goyal, Z Yihan, C Winnie, R Pan, S Diao, J Zhang, ...
TMLR, 2023
Cited by: 290, Year: 2023
Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICML 2024, 2023
Cited by: 101*, Year: 2023
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
EMNLP 2024, 2023
Cited by: 86*, Year: 2023
A posterior sampling framework for interactive decision making
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962 2 (3), 2022
Cited by: 60*, Year: 2022
RLHF Workflow: From Reward Modeling to Online RLHF
H Dong, W Xiong, B Pang, H Wang, H Zhao, Y Zhou, N Jiang, D Sahoo, ...
TMLR, 2024
Cited by: 57, Year: 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
H Wang, W Xiong, T Xie, H Zhao, T Zhang
EMNLP 2024, 2024
Cited by: 50, Year: 2024
LMFlow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024, Best Demo Paper Award, 2023
Cited by: 50, Year: 2023
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
Cited by: 48, Year: 2022
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
Cited by: 47, Year: 2022
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS 2020, 2020
Cited by: 43, Year: 2020
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu, H Zhao, T Zhang
ACL 2024, 2024
Cited by: 38, Year: 2024
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2023, 2024
Cited by: 32*, Year: 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
C Ye, W Xiong, Y Zhang, N Jiang, T Zhang
NeurIPS 2024, 2024
Cited by: 28*, Year: 2024
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
ICML 2022, 2022
Cited by: 27, Year: 2022
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
Cited by: 26, Year: 2022
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
Cited by: 25, Year: 2024
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
C Shi, W Xiong, C Shen, J Yang
NeurIPS 2021, 2021
Cited by: 25, Year: 2021
Distributional reinforcement learning for multi-dimensional reward functions
P Zhang, X Chen, L Zhao, W Xiong, T Qin, TY Liu
NeurIPS 2021, 2021
Cited by: 23, Year: 2021
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
H Ye, W Xiong, T Zhang
arXiv preprint arXiv:2012.15010, 2020
Cited by: 16, Year: 2020
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
ECCV 2024, 2024
Cited by: 15, Year: 2024