Puyuan Peng

Cited by

	All	Since 2019
Citations	198	198
h-index	6	6
i10-index	6	6

Year	2020	2021	2022	2023	2024
Citations	1	2	30	107	56

Co-authors

David Harwath	The University of Texas at Austin	Verified email at utexas.edu
Karen Livescu	TTI-Chicago	Verified email at ttic.edu
Shinji Watanabe	Carnegie Mellon University	Verified email at cmu.edu
Brian Yan	Carnegie Mellon University	Verified email at cs.cmu.edu
Herman Kamper	Stellenbosch University	Verified email at sun.ac.za
Shang-Wen Daniel Li	FAIR - Research manager	Verified email at fb.com
Cheng-I Jeff Lai	Massachusetts Institute of Technology	Verified email at mit.edu
Freda Shi	Toyota Technological Institute at Chicago	Verified email at ttic.edu
James Glass	MIT Computer Science and Artificial Intelligence Laboratory	Verified email at mit.edu
Kevin Gimpel	QuillBot	Verified email at ttic.edu
Shiyu Chang	University of California, Santa Barbara	Verified email at cs.ucsb.edu
David Cox	VP, AI Models; IBM Director, MIT-IBM Watson AI Lab, IBM Research	Verified email at ibm.com
Raymond Mooney	Professor of Computer Science, University of Texas at Austin	Verified email at cs.utexas.edu
Jonathan Le Roux	MERL	Verified email at merl.com
Abdelrahman Mohamed	Research scientist, Facebook AI Research	Verified email at fb.com
Chiori Hori	MERL	Verified email at merl.com

Puyuan Peng

PhD student, The University of Texas at Austin

Verified email at utexas.edu

Speech Processing, Multimodal Learning, Computer Vision, Artificial Intelligence


Title	Authors	Venue	Cited by	Year
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer	A Baade, P Peng, D Harwath	Interspeech 2022, 2022	73	2022
Word discovery in visually grounded, self-supervised speech models	P Peng, D Harwath	Interspeech 2022, 2022	31	2022
Fast-slow transformer for visually grounding speech	P Peng, D Harwath	ICASSP 2022, 2022	26	2022
Self-supervised representation learning for speech using visual grounding and masked language modeling	P Peng, D Harwath	AAAI 2022 SAS Workshop, 2022	24	2022
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization	P Peng, B Yan, S Watanabe, D Harwath	Interspeech 2023, 2023	18	2023
A correspondence variational autoencoder for unsupervised acoustic word embeddings	P Peng, H Kamper, K Livescu	NeurIPS 2020 SAS Workshop, 2020	15	2020
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model	P Peng, SW Li, O Räsänen, A Mohamed, D Harwath	Interspeech 2023, 2023	3	2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models	Y Tseng, L Berry, YT Chen, I Chiu, HH Lin, M Liu, P Peng, YJ Shih*, ...	preprint, 2023	2	2023
Zero-shot Video Moment Retrieval With Off-the-Shelf Models	A Diwan, P Peng, RJ Mooney (* denotes equal contribution)	NeurIPS 2022 TL4NLP, 2022	2	2022
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild	P Peng, PY Huang, D Li, A Mohamed, D Harwath	arXiv preprint arXiv:2403.16973, 2024	1	2024
Audio-Visual Neural Syntax Acquisition	CIJ Lai, F Shi, P Peng*, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ...	ASRU 2023, 2023	1	2023
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos	C Hori, P Peng, D Harwath, X Liu, K Ota, S Jain, R Corcodel, D Jha, ...	Interspeech 2023, 2023	1	2023
Textless phrase structure induction from visually-grounded speech	CI Lai, F Shi, P Peng, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ...		1	2022
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data	HF Wang, YJ Shih, HJ Chang, L Berry, P Peng, H Lee, HM Wang, ...	arXiv preprint arXiv:2402.06959, 2024		2024
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model	HC Fang, NX Ye, YJ Shih, P Peng, HF Wang, L Berry, H Lee, D Harwath	arXiv preprint arXiv:2402.05819, 2024		2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models	Z Zheng, P Peng, Z Ma, X Chen, E Choi, D Harwath	arXiv preprint arXiv:2402.01591, 2024		2024

Articles 1–16
