Don't stop pretraining: Adapt language models to domains and tasks. S Gururangan, A Marasović, S Swayamdipta, K Lo, I Beltagy, D Downey, et al. arXiv preprint arXiv:2004.10964, 2020. Cited by 2398.
Annotation artifacts in natural language inference data. S Gururangan, S Swayamdipta, O Levy, R Schwartz, SR Bowman, et al. arXiv preprint arXiv:1803.02324, 2018. Cited by 1262.
The Llama 3 herd of models. A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, et al. arXiv preprint arXiv:2407.21783, 2024. Cited by 1206.
RealToxicityPrompts: Evaluating neural toxic degeneration in language models. S Gehman, S Gururangan, M Sap, Y Choi, NA Smith. arXiv preprint arXiv:2009.11462, 2020. Cited by 1061.
All that's 'human' is not gold: Evaluating human evaluation of generated text. E Clark, T August, S Serrano, N Haduong, S Gururangan, NA Smith. arXiv preprint arXiv:2107.00061, 2021. Cited by 394.
Editing models with task arithmetic. G Ilharco, MT Ribeiro, M Wortsman, S Gururangan, L Schmidt, et al. arXiv preprint arXiv:2212.04089, 2022. Cited by 374.
Show your work: Improved reporting of experimental results. J Dodge, S Gururangan, D Card, R Schwartz, NA Smith. arXiv preprint arXiv:1909.03004, 2019. Cited by 278.
Variational pretraining for semi-supervised text classification. S Gururangan, T Dang, D Card, NA Smith. arXiv preprint arXiv:1906.02242, 2019. Cited by 142.
Detoxifying language models risks marginalizing minority voices. A Xu, E Pathak, E Wallace, S Gururangan, M Sap, D Klein. arXiv preprint arXiv:2104.06390, 2021. Cited by 126.
Branch-Train-Merge: Embarrassingly parallel training of expert language models. M Li, S Gururangan, T Dettmers, M Lewis, T Althoff, NA Smith, et al. arXiv preprint arXiv:2208.03306, 2022. Cited by 120.
DEMix layers: Disentangling domains for modular language modeling. S Gururangan, M Lewis, A Holtzman, NA Smith, L Zettlemoyer. arXiv preprint arXiv:2108.05036, 2021. Cited by 106.
LESS: Selecting influential data for targeted instruction tuning. M Xia, S Malladi, S Gururangan, S Arora, D Chen. arXiv preprint arXiv:2402.04333, 2024. Cited by 95.
Time waits for no one! Analysis and challenges of temporal misalignment. K Luu, D Khashabi, S Gururangan, K Mandyam, NA Smith. arXiv preprint arXiv:2111.07408, 2021. Cited by 80.
SILO language models: Isolating legal risk in a nonparametric datastore. S Min, S Gururangan, E Wallace, W Shi, H Hajishirzi, NA Smith, et al. arXiv preprint arXiv:2308.04430, 2023. Cited by 49.
kNN-Prompt: Nearest neighbor zero-shot inference. W Shi, J Michael, S Gururangan, L Zettlemoyer. arXiv preprint arXiv:2205.13792, 2022. Cited by 48.
OSWorld: Benchmarking multimodal agents for open-ended tasks in real computer environments. T Xie, D Zhang, J Chen, X Li, S Zhao, R Cao, TJ Hua, Z Cheng, D Shin, et al. arXiv preprint arXiv:2404.07972, 2024. Cited by 43.
Scaling expert language models with unsupervised domain discovery. S Gururangan, M Li, M Lewis, W Shi, T Althoff, NA Smith, L Zettlemoyer. arXiv preprint arXiv:2303.14177, 2023. Cited by 32.
Whose language counts as high quality? Measuring language ideologies in text data selection. S Gururangan, D Card, SK Dreier, EK Gade, LZ Wang, Z Wang, et al. arXiv preprint arXiv:2201.10474, 2022. Cited by 22.
M2D2: A massively multi-domain language modeling dataset. M Reid, V Zhong, S Gururangan, L Zettlemoyer. arXiv preprint arXiv:2210.07370, 2022. Cited by 19.
DataComp-LM: In search of the next generation of training sets for language models. J Li, A Fang, G Smyrnis, M Ivgi, M Jordan, S Gadre, H Bansal, E Guha, et al. arXiv preprint arXiv:2406.11794, 2024. Cited by 18.