Zifan Wang
Verified email at andrew.cmu.edu
Title / Cited by / Year
Score-CAM: Score-weighted visual explanations for convolutional neural networks
H Wang, Z Wang, M Du, F Yang, Z Zhang, S Ding, P Mardziel, X Hu
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020
Cited by 1198 · 2020
Universal and transferable adversarial attacks on aligned language models
A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter, M Fredrikson
arXiv preprint arXiv:2307.15043, 2023
Cited by 915 · 2023
Representation engineering: A top-down approach to AI transparency
A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ...
arXiv preprint arXiv:2310.01405, 2023
Cited by 242 · 2023
Globally-Robust Neural Networks
K Leino, Z Wang, M Fredrikson
Proceedings of ICML 2021, 2021
Cited by 154 · 2021
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal
M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu, E Sakhaee, N Li, ...
arXiv preprint arXiv:2402.04249, 2024
Cited by 124 · 2024
The WMDP benchmark: Measuring and reducing malicious use with unlearning
N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ...
arXiv preprint arXiv:2403.03218, 2024
Cited by 72 · 2024
Towards frequency-based explanation for robust CNN
Z Wang, Y Yang, A Shrivastava, V Rawal, Z Ding
arXiv preprint arXiv:2005.03141, 2020
Cited by 54 · 2020
Smoothed Geometry for Robust Attribution
Z Wang, H Wang, S Ramkumar, M Fredrikson, P Mardziel, A Datta
Proceedings of NeurIPS 2020, 2020
Cited by 53 · 2020
Consistent counterfactuals for deep models
E Black, Z Wang, M Fredrikson, A Datta
arXiv preprint arXiv:2110.03109, 2021
Cited by 51 · 2021
Robust models are more interpretable because attributions look normal
Z Wang, M Fredrikson, A Datta
arXiv preprint arXiv:2103.11257, 2021
Cited by 23 · 2021
Can LLMs Follow Simple Rules?
N Mu, S Chen, Z Wang, S Chen, D Karamardian, L Aljeraisy, B Alomair, ...
arXiv preprint arXiv:2311.04235, 2023
Cited by 22 · 2023
Interpreting interpretations: Organizing attribution methods by criteria
Z Wang, P Mardziel, A Datta, M Fredrikson
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020
Cited by 20 · 2020
Machine learning explainability and robustness: connected at the hip
A Datta, M Fredrikson, K Leino, K Lu, S Sen, Z Wang
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data …, 2021
Cited by 15 · 2021
Influence Patterns for Explaining Information Flow in BERT
K Lu, Z Wang, P Mardziel, A Datta
arXiv preprint arXiv:2011.00740, 2020
Cited by 14 · 2020
Scaling in depth: Unlocking robustness certification on ImageNet
K Hu, A Zou, Z Wang, K Leino, M Fredrikson
arXiv preprint arXiv:2301.12549, 2023
Cited by 11 · 2023
Unlocking deterministic robustness certification on ImageNet
K Hu, A Zou, Z Wang, K Leino, M Fredrikson
Advances in Neural Information Processing Systems 36, 2024
Cited by 8 · 2024
A recipe for improved certifiable robustness: Capacity and data
K Hu, K Leino, Z Wang, M Fredrikson
arXiv preprint arXiv:2310.02513, 2023
Cited by 8 · 2023
Improving robust generalization by direct PAC-Bayesian bound minimization
Z Wang, N Ding, T Levinboim, X Chen, R Soricut
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by 7 · 2023
Learning modulo theories
M Fredrikson, K Lu, S Vijayakumar, S Jha, V Ganesh, Z Wang
arXiv preprint arXiv:2301.11435, 2023
Cited by 5 · 2023
Reconstructing Actions To Explain Deep Reinforcement Learning
X Chen, Z Wang, Y Fan, B Jin, P Mardziel, C Joe-Wong, A Datta
arXiv preprint arXiv:2009.08507, 2020
Cited by 5* · 2020
Articles 1–20