Sebastian Jaszczur
Verified email at uw.edu.pl
Title · Cited by · Year
Sparse is Enough in Scaling Transformers
S Jaszczur, A Chowdhery, A Mohiuddin, L Kaiser, W Gajewski, ...
Advances in Neural Information Processing Systems 34, 9895-9907, 2021
Cited by 71 · 2021
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
M Pióro, K Ciebiera, K Król, J Ludziejewski, S Jaszczur
arXiv preprint arXiv:2401.04081, 2024
Cited by 14 · 2024
Neural heuristics for SAT solving
S Jaszczur, M Łuszczyk, H Michalewski
arXiv preprint arXiv:2005.13406, 2020
Cited by 11 · 2020
Use of domain knowledge and feature engineering in helping AI to play Hearthstone
P Przybyszewski, S Dziewiątkowski, S Jaszczur, M Śmiech, M Szczuka
2017 Federated Conference on Computer Science and Information Systems …, 2017
Cited by 6 · 2017
Scaling Laws for Fine-Grained Mixture of Experts
J Krajewski, J Ludziejewski, K Adamczewski, M Pióro, M Krutul, ...
arXiv preprint arXiv:2402.07871, 2024
Cited by 1 · 2024
Structured Packing in LLM Training Improves Long Context Utilization
K Staniszewski, S Tworkowski, S Jaszczur, H Michalewski, Ł Kuciński, ...
arXiv preprint arXiv:2312.17296, 2023
2023
Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation
S Antoniak, S Jaszczur, M Krutul, M Pióro, J Krajewski, J Ludziejewski, ...
arXiv preprint arXiv:2310.15961, 2023
2023
Sparse attention neural networks
A Chowdhery, A Mohiuddin, H Michalewski, JM Kanerva, LM Kaiser, ...
US Patent App. 17/666,400, 2022
2022
Articles 1–8