Jan Leike

Cited by

	All	Since 2019
Citations	19039	18580
h-index	26	23
i10-index	32	27

Citations per year

Year	Citations
2015	48
2016	60
2017	84
2018	189
2019	289
2020	370
2021	504
2022	1132
2023	6815
2024	9408

Public access

10 articles available
0 articles not available

Based on funding mandates

Co-authors

Jeffrey Wu - OpenAI - Verified email at openai.com
Paul Christiano - National Institute of Standards and Technology - Verified email at nist.gov
John Schulman - Research Scientist, OpenAI - Verified email at openai.com
Ryan Lowe - OpenAI - Verified email at openai.com
Marcus Hutter - Researcher@DeepMind & Professor at ANU - Verified email at anu.edu.au
Dario Amodei - CEO and Co-Founder at Anthropic - Verified email at anthropic.com
Matthias Heizmann - University of Stuttgart, Germany - Verified email at heizmann.name
David Scott Krueger - University Assistant Professor, University of Cambridge - Verified email at cam.ac.uk
Ilya Sutskever - Co-Founder and Chief Scientist of OpenAI - Verified email at openai.com
Tom Everitt - Staff Research Scientist at Google DeepMind - Verified email at google.com
Pushmeet Kohli - DeepMind - Verified email at google.com
Andreas Podelski - Professor of Computer Science, Freiburg University - Verified email at informatik.uni-freiburg.de
Tegan Maharaj - Assistant Professor at University of Toronto - Verified email at polymtl.ca
Geoffrey Irving - UK AI Safety Institute (AISI) - Verified email at naml.us
Yuri Burda - OpenAI - Verified email at openai.com
William Saunders - OpenAI - Verified email at cs.toronto.edu
Collin Burns - Researcher, OpenAI - Verified email at openai.com
Pavel Izmailov - Anthropic; NYU - Verified email at anthropic.com
Adam Gleave - CEO at FAR AI - Verified email at far.ai
Andrew Trask - University of Oxford and OpenMined - Verified email at openmined.org

Jan Leike

OpenAI

Verified email at openai.com

reinforcement learning, deep learning, agent alignment


Title	Authors	Venue	Cited by	Year
Training language models to follow instructions with human feedback	L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ...	Advances in Neural Information Processing Systems 35, 27730-27744, 2022	7771	2022
GPT-4 technical report	OpenAI	arXiv, 2023	3378*	2023
Deep reinforcement learning from human preferences	PF Christiano, J Leike, T Brown, M Martic, S Legg, D Amodei	Advances in Neural Information Processing Systems 30, 4299-4307, 2017	2510	2017
Evaluating large language models trained on code	M Chen, J Tworek, H Jun, Q Yuan, HPO Pinto, J Kaplan, H Edwards, ...	arXiv preprint arXiv:2107.03374, 2021	2471	2021
Reward learning from human preferences and demonstrations in Atari	B Ibarz, J Leike, T Pohlen, G Irving, S Legg, D Amodei	Advances in Neural Information Processing Systems, 8011-8023, 2018	370	2018
AI Safety Gridworlds	J Leike, M Martic, V Krakovna, PA Ortega, T Everitt, A Lefrancq, L Orseau, ...	arXiv preprint arXiv:1711.09883, 2017	333	2017
Scalable agent alignment via reward modeling: a research direction	J Leike, D Krueger, T Everitt, M Martic, V Maini, S Legg	arXiv preprint arXiv:1811.07871, 2018	281	2018
Let's Verify Step by Step	H Lightman, V Kosaraju, Y Burda, H Edwards, B Baker, T Lee, J Leike, ...	arXiv preprint arXiv:2305.20050, 2023	263	2023
Recursively summarizing books with human feedback	J Wu, L Ouyang, DM Ziegler, N Stiennon, R Lowe, J Leike, P Christiano	arXiv preprint arXiv:2109.10862, 2021	215	2021
Learning to Understand Goal Specifications by Modelling Reward	D Bahdanau, F Hill, J Leike, E Hughes, P Kohli, E Grefenstette	arXiv preprint arXiv:1806.01946, 2018	197*	2018
Language models can explain neurons in language models	S Bills, N Cammarata, D Mossing, H Tillman, L Gao, G Goh, I Sutskever, ...	URL https://openaipublic.blob.core.windows.net/neuron-explainer/paper …, 2023	144	2023
Self-critiquing models for assisting human evaluators	W Saunders, C Yeh, J Wu, S Bills, L Ouyang, J Ward, J Leike	arXiv preprint arXiv:2206.05802, 2022	144	2022
Ranking Templates for Linear Loops	J Leike, M Heizmann	Logical Methods in Computer Science, 2015	97	2015
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision	C Burns, P Izmailov, JH Kirchner, B Baker, L Gao, L Aschenbrenner, ...	arXiv preprint arXiv:2312.09390, 2023	89	2023
Learning human objectives by evaluating hypothetical behavior	S Reddy, A Dragan, S Levine, S Legg, J Leike	International Conference on Machine Learning, 8020-8029, 2020	83	2020
Linear ranking for linear lasso programs	M Heizmann, J Hoenicke, J Leike, A Podelski	Automated Technology for Verification and Analysis, 365-380, 2013	65	2013
Institutionalizing ethics in AI through broader impact requirements	CEA Prunkl, C Ashurst, M Anderljung, H Webb, J Leike, A Dafoe	Nature Machine Intelligence 3 (2), 104-110, 2021	62	2021
Hidden Incentives for Auto-Induced Distributional Shift	D Krueger, T Maharaj, J Leike	arXiv preprint arXiv:2009.09153, 2020	59*	2020
Quantifying Differences in Reward Functions	A Gleave, M Dennis, S Legg, S Russell, J Leike	arXiv preprint arXiv:2006.13900, 2020	59	2020
Geometric nontermination arguments	J Leike, M Heizmann	International Conference on Tools and Algorithms for the Construction and …, 2018	53*	2018
