Zhengyuan Yang

Cited by

	All	Since 2019
Citations	4024	4010
h-index	26	26
i10-index	35	35

Citations per year: 2018: 12, 2019: 59, 2020: 130, 2021: 305, 2022: 632, 2023: 1863, 2024: 1016

Public access

14 articles available
0 articles not available

Based on funding mandates

Co-authors

Lijuan Wang - Microsoft GenAI - Verified email at microsoft.com
Jianfeng Wang - Microsoft - Verified email at microsoft.com
Zicheng Liu - Microsoft - Verified email at microsoft.com
Jiebo Luo - Albert Arendt Hopeman Professor of Engineering, University of Rochester - Verified email at cs.rochester.edu
Linjie (Lindsey) Li - Senior Researcher, Microsoft - Verified email at microsoft.com
Kevin Lin - Microsoft - Verified email at microsoft.com
Zhe Gan - Research Scientist, Apple - Verified email at apple.com
Liwei Wang - Assistant Professor at The Chinese University of Hong Kong - Verified email at cse.cuhk.edu.hk
Ce Liu - Partner Research Manager, Microsoft GenAI; IEEE Fellow - Verified email at microsoft.com
Jinsong Su - Xiamen University - Verified email at xmu.edu.cn
Jiajun Deng (邓家俊) - University of Adelaide, Australian Institute for Machine Learning - Verified email at adelaide.edu.au
Yuncheng Li - Google - Verified email at google.com
Jianwei Yang - Principal Researcher, Microsoft Research, Redmond - Verified email at microsoft.com
Chenglei Si - Stanford University - Verified email at stanford.edu
Boqing Gong - Research Scientist, Google - Verified email at google.com

Zhengyuan Yang

Researcher, Microsoft

Verified email at microsoft.com

Computer Vision, Multimedia, Vision + Language, Multimodal


Title	Cited by	Year
GIT: A generative image-to-text transformer for vision and language - J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang - Transactions on Machine Learning Research (TMLR), 2022	349	2022
A fast and accurate one-stage approach to visual grounding - Z Yang, B Gong, L Wang, W Huang, D Yu, J Luo - IEEE International Conference on Computer Vision (ICCV), 4683-4693, 2019	310	2019
An empirical study of GPT-3 for few-shot knowledge-based VQA - Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu, L Wang - Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 3081-3089, 2022	293	2022
TransVG: End-to-End Visual Grounding with Transformers - J Deng, Z Yang, T Chen, W Zhou, H Li - IEEE International Conference on Computer Vision (ICCV), 2021	247	2021
The dawn of LMMs: Preliminary explorations with GPT-4V(ision) - Z Yang, L Li, K Lin, J Wang, CC Lin, Z Liu, L Wang - arXiv preprint arXiv:2309.17421 9 (1), 1, 2023	224	2023
Scaling up vision-language pre-training for image captioning - X Hu, Z Gan, J Wang, Z Yang, Z Liu, Y Lu, L Wang - Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	215	2022
MM-REACT: Prompting ChatGPT for multimodal reasoning and action - Z Yang, L Li, J Wang, K Lin, E Azarnasab, F Ahmed, Z Liu, C Liu, M Zeng, ... - arXiv preprint arXiv:2303.11381, 2023	195	2023
Improving One-stage Visual Grounding by Recursive Sub-query Construction - Z Yang, T Chen, L Wang, J Luo - European Conference on Computer Vision (ECCV), 2020	187	2020
End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions - Z Yang, Y Zhang, J Yu, J Cai, J Luo - 2018 24th international conference on pattern recognition (ICPR), 2289-2294, 2018	185	2018
Action recognition with spatio–temporal visual attention on skeleton image sequences - Z Yang, Y Li, J Yang, J Luo - IEEE Transactions on Circuits and Systems for Video Technology 29 (8), 2405-2415, 2018	179	2018
Attentive relational networks for mapping images to scene graphs - M Qi, W Li, Z Yang, Y Wang, J Luo - IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3957-3966, 2019	169	2019
Prompting GPT-3 to be reliable - C Si, Z Gan, Z Yang, S Wang, J Wang, J Boyd-Graber, L Wang - International Conference on Learning Representations (ICLR 23), 2022	155	2022
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption - Z Yang, Y Lu, J Wang, X Yin, D Florencio, L Wang, C Zhang, L Zhang, ... - IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021	147	2021
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation - Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou, J Luo - Annual Meeting of the Association for Computational Linguistics (ACL), 2020	123	2020
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling - Z Yang, Z Gan, J Wang, X Hu, F Ahmed, Z Liu, Y Lu, L Wang - European Conference on Computer Vision (ECCV), 521-539, 2022	120*	2022
MM-Vet: Evaluating large multimodal models for integrated capabilities - W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu, X Wang, L Wang - arXiv preprint arXiv:2308.02490, 2023	119	2023
PromptCap: Prompt-guided task-aware image captioning - Y Hu, H Hua, Z Yang, W Shi, NA Smith, J Luo - arXiv preprint arXiv:2211.09699, 2022	72*	2022
Multimodal foundation models: From specialists to general-purpose assistants - C Li, Z Gan, Z Yang, J Yang, L Li, L Wang, J Gao - arXiv preprint arXiv:2309.10020 1 (2), 2, 2023	71	2023
Dynamic context-guided capsule network for multimodal machine translation - H Lin, F Meng, J Su, Y Yin, Z Yang, Y Ge, J Zhou, J Luo - Proceedings of the 28th ACM International Conference on Multimedia, 1320-1329, 2020	69	2020
SAT: 2D Semantics Assisted Training for 3D Visual Grounding - Z Yang, S Zhang, L Wang, J Luo - IEEE International Conference on Computer Vision (ICCV), 2021	68	2021

Articles 1–20