| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Llama 2: Open foundation and fine-tuned chat models | H Touvron, L Martin, K Stone, P Albert, A Almahairi, Y Babaei, ... | arXiv preprint arXiv:2307.09288 | 12687 | 2023 |
| OPT: Open pre-trained transformer language models | S Zhang, S Roller, N Goyal, M Artetxe, M Chen, S Chen, C Dewan, ... | arXiv preprint arXiv:2205.01068 | 3721* | 2022 |
| The Llama 3 herd of models | A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ... | arXiv preprint arXiv:2407.21783 | 2493 | 2024 |
| Efficient large scale language modeling with mixtures of experts | M Artetxe, S Bhosale, N Goyal, T Mihaylov, M Ott, S Shleifer, XV Lin, J Du, ... | arXiv preprint arXiv:2112.10684 | 149* | 2021 |
| Few-shot Learning with Multilingual Generative Language Models | XV Lin, T Mihaylov, M Artetxe, T Wang, S Chen, D Simig, M Ott, N Goyal, ... | Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing | 134* | 2022 |
| OPT-IML: Scaling language model instruction meta learning through the lens of generalization | S Iyer, XV Lin, R Pasunuru, T Mihaylov, D Simig, P Yu, K Shuster, T Wang, ... | arXiv preprint arXiv:2212.12017 | 103 | 2022 |
| A theory on Adam instability in large-scale machine learning | I Molybog, P Albert, M Chen, Z DeVito, D Esiobu, N Goyal, PS Koura, ... | arXiv preprint arXiv:2304.09871 | 27 | 2023 |
| BTS: Harmonizing Specialized Experts into a Generalist LLM | Q Zhang, P Bhargava, C Bi, CX Cai, J Foerster, J Fu, PS Koura, R Silva, ... | arXiv preprint arXiv:2502.00075 | | 2025 |
| Optimizing Pretraining Data Mixtures with LLM-Estimated Utility | W Held, B Paranjape, PS Koura, M Lewis, F Zhang, T Mihaylov | arXiv preprint arXiv:2501.11747 | | 2025 |