Big Bird: Transformers for longer sequences. M Zaheer, G Guruganesh, KA Dubey, J Ainslie, C Alberti, S Ontañón, et al. Advances in Neural Information Processing Systems 33, 17283-17297, 2020. Cited by 2152.
Gemini: A family of highly capable multimodal models. Gemini Team (R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, et al.). arXiv preprint arXiv:2312.11805, 2023. Cited by 1548.
FNet: Mixing tokens with Fourier transforms. J Lee-Thorp, J Ainslie, I Eckstein, S Ontañón. arXiv preprint arXiv:2105.03824, 2021. Cited by 486.
ETC: Encoding long and structured inputs in transformers. J Ainslie, S Ontañón, C Alberti, V Cvicek, Z Fisher, P Pham, A Ravula, et al. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP); also arXiv preprint arXiv:2004.08483, 2020. Cited by 378.
GQA: Training generalized multi-query transformer models from multi-head checkpoints. J Ainslie, J Lee-Thorp, M de Jong, Y Zemlyanskiy, F Lebrón, S Sanghai. arXiv preprint arXiv:2305.13245, 2023. Cited by 302.
LongT5: Efficient text-to-text transformer for long sequences. M Guo, J Ainslie, D Uthus, S Ontañón, J Ni, YH Sung, Y Yang. arXiv preprint arXiv:2112.07916, 2021. Cited by 253.
RealFormer: Transformer likes residual attention. R He, A Ravula, B Kanagal, J Ainslie. arXiv preprint arXiv:2012.11747, 2020. Cited by 105.
FormNet: Structural encoding beyond sequential modeling in form document information extraction. CY Lee, CL Li, T Dozat, V Perot, G Su, N Hua, J Ainslie, R Wang, Y Fujii, et al. arXiv preprint arXiv:2203.08411, 2022. Cited by 79.
Making transformers solve compositional tasks. S Ontañón, J Ainslie, V Cvicek, Z Fisher. arXiv preprint arXiv:2108.04378, 2021. Cited by 78.
Sparse upcycling: Training mixture-of-experts from dense checkpoints. A Komatsuzaki, J Puigcerver, J Lee-Thorp, CR Ruiz, B Mustafa, J Ainslie, et al. arXiv preprint arXiv:2212.05055, 2022. Cited by 68.
CoLT5: Faster long-range transformers with conditional computation. J Ainslie, T Lei, M de Jong, S Ontañón, S Brahma, Y Zemlyanskiy, et al. arXiv preprint arXiv:2303.09752, 2023. Cited by 50.
Conditional adapters: Parameter-efficient transfer learning with fast inference. T Lei, J Bai, S Brahma, J Ainslie, K Lee, Y Zhou, N Du, V Zhao, Y Wu, B Li, et al. Advances in Neural Information Processing Systems 36, 8152-8172, 2023. Cited by 29.
FiDO: Fusion-in-decoder optimized for stronger performance and faster inference. M de Jong, Y Zemlyanskiy, J Ainslie, N FitzGerald, S Sanghai, F Sha, et al. arXiv preprint arXiv:2212.08153, 2022. Cited by 24.
Functional interpolation for relative positions improves long context transformers. S Li, C You, G Guruganesh, J Ainslie, S Ontañón, M Zaheer, S Sanghai, et al. arXiv preprint arXiv:2310.04418, 2023. Cited by 20.
Improving compositional generalization in classification tasks via structure annotations. J Kim, P Ravikumar, J Ainslie, S Ontañón. arXiv preprint arXiv:2106.10434, 2021. Cited by 15.
ReadTwice: Reading very large documents with memories. Y Zemlyanskiy, J Ainslie, M de Jong, P Pham, I Eckstein, F Sha. arXiv preprint arXiv:2105.04241, 2021. Cited by 15.
Generate-and-Retrieve: Use your predictions to improve retrieval for semantic parsing. Y Zemlyanskiy, M de Jong, J Ainslie, P Pasupat, P Shaw, L Qiu, et al. arXiv preprint arXiv:2209.14899, 2022. Cited by 13.