Yupan Huang
Title
Cited by
Year
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Y Huang, T Lv, L Cui, Y Lu, F Wei
Proceedings of the 30th ACM International Conference on Multimedia, 2022
Cited by: 441; Year: 2022
Seeing out of the box: End-to-end pre-training for vision-language representation learning
Z Huang*, Z Zeng*, Y Huang*, B Liu, D Fu, J Fu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by: 294; Year: 2021
Probing inter-modality: Visual parsing with self-attention for vision-and-language pre-training
H Xue, Y Huang, B Liu, H Peng, J Fu, H Li, J Luo
Advances in Neural Information Processing Systems 34, 4514-4528, 2021
Cited by: 92; Year: 2021
Textdiffuser: Diffusion models as text painters
J Chen*, Y Huang*, T Lv, L Cui, Q Chen, F Wei
Advances in Neural Information Processing Systems 36, 2024
Cited by: 91; Year: 2024
Decoupling localization and classification in single shot temporal action detection
Y Huang, Q Dai, Y Lu
2019 IEEE International Conference on Multimedia and Expo (ICME), 1288-1293, 2019
Cited by: 60; Year: 2019
Unifying multimodal transformer for bi-directional image and text generation
Y Huang, H Xue, B Liu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 1138-1147, 2021
Cited by: 59; Year: 2021
Kosmos-2.5: A Multimodal Literate Model
T Lv*, Y Huang*, J Chen*, Y Zhao, Y Jia, L Cui, S Ma, Y Chang, S Huang, ...
arXiv preprint arXiv:2309.11419, 2023
Cited by: 42; Year: 2023
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei
European Conference on Computer Vision, 386-402, 2024
Cited by: 38; Year: 2024
Reinforced short-length hashing
X Liu, X Nie, Q Dai, Y Huang, L Lian, Y Yin
IEEE Transactions on Circuits and Systems for Video Technology 31 (9), 3655-3668, 2020
Cited by: 25; Year: 2020
Sparkles: Unlocking chats across multiple images for multimodal instruction-following models
Y Huang, Z Meng, F Liu, Y Su, N Collier, Y Lu
arXiv preprint arXiv:2308.16463, 2023
Cited by: 19; Year: 2023
A picture is worth a thousand words: A unified system for diverse captions and rich images generation
Y Huang, B Liu, J Fu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 2792-2794, 2021
Cited by: 8; Year: 2021
Be specific, be clear: Bridging machine and human captions by scene-guided transformer
Y Huang, Z Zeng, Y Lu
Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia …, 2021
Cited by: 8; Year: 2021
RedStone: Curating General, Code, Math, and QA Data for Large Language Models
Y Chang, L Cui, L Dong, S Huang, Y Huang, Y Huang, S Li, T Lv, S Ma, ...
arXiv preprint arXiv:2412.03398, 2024
Year: 2024