Is space-time attention all you need for video understanding? G Bertasius, H Wang, L Torresani ICML 2 (3), 4, 2021 | 2042 | 2021 |
DeepEdge: A multi-scale bifurcated deep network for top-down contour detection G Bertasius, J Shi, L Torresani Proceedings of the IEEE conference on computer vision and pattern …, 2015 | 605 | 2015 |
Object detection in video with spatiotemporal sampling networks G Bertasius, L Torresani, J Shi Proceedings of the European Conference on Computer Vision (ECCV), 331-346, 2018 | 280 | 2018 |
Semantic segmentation with boundary neural fields G Bertasius, J Shi, L Torresani Proceedings of the IEEE conference on computer vision and pattern …, 2016 | 243 | 2016 |
High-for-low and low-for-high: Efficient boundary detection from deep object features and its applications to high-level vision G Bertasius, J Shi, L Torresani Proceedings of the IEEE international conference on computer vision, 504-512, 2015 | 215 | 2015 |
Classifying, segmenting, and tracking object instances in video with mask propagation G Bertasius, L Torresani Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 196 | 2020 |
Convolutional random walk networks for semantic image segmentation G Bertasius, L Torresani, SX Yu, J Shi Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 172 | 2017 |
Learning temporal pose estimation from sparsely-labeled videos G Bertasius, C Feichtenhofer, D Tran, J Shi, L Torresani Advances in neural information processing systems 32, 2019 | 106* | 2019 |
Am I a baller? Basketball performance assessment from first-person videos G Bertasius, H Soo Park, SX Yu, J Shi Proceedings of the IEEE international conference on computer vision, 2177-2185, 2017 | 102 | 2017 |
TallFormer: Temporal Action Localization with a Long-Memory Transformer F Cheng, G Bertasius European Conference on Computer Vision, 503-521, 2022 | 93 | 2022 |
SimpleClick: Interactive image segmentation with simple vision transformers Q Liu, Z Xu, G Bertasius, M Niethammer Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 92 | 2023 |
Long movie clip classification with state-space video models MM Islam, G Bertasius European Conference on Computer Vision, 87-104, 2022 | 73 | 2022 |
Vx2Text: End-to-end learning of video-based text generation from multimodal inputs X Lin, G Bertasius, J Wang, SF Chang, D Parikh, L Torresani Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 71 | 2021 |
Learning to recognize procedural activities with distant supervision X Lin, F Petroni, G Bertasius, M Rohrbach, SF Chang, L Torresani Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 70 | 2022 |
Automatic lymph node cluster segmentation using holistically-nested neural networks and structured optimization in CT images I Nogues, L Lu, X Wang, H Roth, G Bertasius, N Lay, J Shi, Y Tsehay, ... International Conference on Medical Image Computing and Computer-Assisted …, 2016 | 70 | 2016 |
First person action-object detection with EgoNet G Bertasius, HS Park, SX Yu, J Shi arXiv preprint arXiv:1603.04908, 2016 | 68 | 2016 |
VindLU: A recipe for effective video-and-language pretraining F Cheng, X Wang, J Lei, D Crandall, M Bansal, G Bertasius Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 66 | 2023 |
Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives K Grauman, A Westbury, L Torresani, K Kitani, J Malik, T Afouras, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 60 | 2024 |
Vision transformers are parameter-efficient audio-visual learners YB Lin, YL Sung, J Lei, M Bansal, G Bertasius Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 56 | 2023 |
Long-short temporal contrastive learning of video transformers J Wang, G Bertasius, D Tran, L Torresani Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 53 | 2022 |