Zhehao Zhang

PhD Student in Computer Science & Engineering (Language Agent Safety + LLM Alignment)

I am a first-year PhD student in Computer Science & Engineering at The Ohio State University. My research interests are Language Agent Safety, Robustness of Large Language Models (LLMs), and Alignment. My work focuses on evaluating and mitigating the refusal behavior of LLMs and on developing methods that improve the safety and reliability of language models in real-world applications.
Previously, I worked as an Applied Scientist Intern at Amazon and collaborated with researchers at Stanford SALT Lab, Adobe Research, and Microsoft Research Lab – Asia on NLP research and applications.

Education

2025 — Present
Ph.D. in Computer Science & Engineering
The Ohio State University, Columbus, OH
Advisor: Yu Su
Research focus: Language Agent Safety, Robustness of LLMs, and Alignment
2023 — 2024
M.S. in Computer Science
Dartmouth College, Hanover, NH
Research focus: Natural Language Processing
2019 — 2023
B.Eng. in Artificial Intelligence (Honor Class)
Shanghai Jiao Tong University, Shanghai, China

Industry Research Experience

Jun 2026 — Aug 2026
Netflix, Los Gatos, CA
Machine Learning Intern, Netflix
Work as a machine learning intern on large language models and language agents.
Nov 2024 — Jun 2025
Amazon, Seattle, WA
Applied Scientist Intern, People eXperience and Technology (PXT) Central Science
Worked as an applied scientist intern on evaluating and mitigating the refusal behavior of LLMs.
Jun 2024 — Aug 2024
Adobe Research, San Jose, CA
Research Intern, Adobe Research
Research in multi-modal large language models and visual perception enhancement.
Dec 2022 — Aug 2023
Microsoft Research Lab – Asia, Beijing, China
Research Intern, Data, Knowledge, and Intelligence Group
Research in hierarchical table analysis and complex reasoning question answering over tabular data.

Academic Research Experience

2025 — Present
The Ohio State University, Columbus, OH
PhD Student, OSU NLP Lab
Mentor: Yu Su
Research in Language Agent Safety, Robustness of LLMs, and Alignment. Advised by Prof. Yu Su and Prof. Huan Sun.
2023 — 2024
Stanford University, Stanford, CA
Research Intern, Social and Language Technologies (SALT) Lab
Mentor: Diyi Yang
Research in Natural Language Processing, focusing on synthetic data and dynamic evaluation of large language models.

Honors and Awards

2026
ICML Gold Reviewer
2025
Graduate Fellowship
awarded by Ohio State University
2025
COLM 2025 Travel Grant
2025
ICLR Notable Reviewer
2023-2025
Merit Scholarship
awarded by Dartmouth College
2019-2023
Zhiyuan Honor Scholarship and Merit Scholarship
awarded by SJTU

Publications

For the most up-to-date list of publications, please refer to my Google Scholar profile.

Conference

C9
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
Jaylen Jones*, Zhehao Zhang*, Yuting Ning, Eric Fosler-Lussier, Pierre-Luc St-Charles, Yoshua Bengio, Dawn Song, Yu Su, Huan Sun
The 43rd International Conference on Machine Learning (ICML). 2026.
*Authors contributed equally
C8
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
Yuting Ning, Jaylen Jones, Zhehao Zhang, Chentao Ye, Weitong Ruan, Junyi Li, Rahul Gupta, Huan Sun
The 43rd International Conference on Machine Learning (ICML). 2026.
C7
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang, Weijie Xu, Fanyou Wu, Chandan K Reddy
Conference on Language Modeling (COLM). 2025.
C6
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang, Jiaao Chen, Diyi Yang
Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada, 2024.
C5
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use
Zhehao Zhang, Ryan A. Rossi, Tong Yu, Franck Dernoncourt, Ruiyi Zhang, Jiuxiang Gu, Sungchul Kim, Xiang Chen, Zichao Wang, Nedim Lipka
The 40th Annual AAAI Conference on Artificial Intelligence (AAAI). 2026.
C4
Is GPT-4V(ision) All You Need for Automating Academic Data Visualization? Exploring Vision-Language Models' Capability in Reproducing Academic Charts
Zhehao Zhang, Weicheng Ma, Soroush Vosoughi
Findings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP). Miami, FL, USA, 2024.
C3
E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit, and Extrapolate
Zhehao Zhang, Yan Gao, Jian-Guang Lou
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Mexico City, Mexico, 2024.
C2
CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular Data
Zhehao Zhang, Xitao Li, Yan Gao, Jian-Guang Lou
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2023.
C1
Mitigating Biases in Hate Speech Detection from A Causal Perspective
Zhehao Zhang, Jiaao Chen, Diyi Yang
Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2023.

Journal

J2
Personalization of Large Language Models: A Survey
Zhehao Zhang, Ryan A. Rossi, Branislav Kveton, Yijia Shao, Diyi Yang, Hamed Zamani, Franck Dernoncourt, Joe Barrow, Tong Yu, Sungchul Kim, Ruiyi Zhang, Jiuxiang Gu, Tyler Derr, Hongjie Chen, Junda Wu, Xiang Chen, Zichao Wang, Subrata Mitra, Nedim Lipka, Nesreen Ahmed, Yu Wang
Transactions on Machine Learning Research (TMLR). 2025.
J1
Can Large Language Models Transform Computational Social Science?
Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang
Computational Linguistics (CL). 2023.

Preprint

3
SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction
Yuting Ning*, Zhehao Zhang*, Yash Kumar Lal, Boyu Gou, Junyi Li, Weitong Ruan, Chentao Ye, Rahul Gupta, Diyi Yang, Yu Su, Huan Sun
arXiv preprint (arXiv). 2026.
*Authors contributed equally
2
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
Jian Xie, Tianhe Lin, Zilu Wang, Yuting Ning, Yuekun Yao, Tianci Xue, Zhehao Zhang, Zhongyang Li, Kai Zhang, Yufan Wu, Shijie Chen, Boyu Gou, Mingzhe Han, Yifei Wang, Vint Lee, Xinpeng Wei, Xiangjun Wang, Yu Su, Huan Sun
arXiv preprint (arXiv). 2026.
1
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
Zhehao Zhang, Weijie Xu, Shixian Cui, Chandan K Reddy
arXiv preprint (arXiv). 2025.

Service

Reviewer: EMNLP 2023, 2024; NeurIPS 2023, 2024, 2025; NAACL 2024; ACL 2024, 2025; COLM 2024; CIKM 2024, 2025; ICLR 2025; COLING 2025; IJCAI 2025; IEEE TNNLS (journal)
Volunteer: EMNLP 2023; NAACL 2024

References

Prof. Yu Su
Associate Professor, Computer Science & Engineering
The Ohio State University
su.809@osu.edu

Prof. Huan Sun
Associate Professor, Computer Science & Engineering
The Ohio State University
sun.397@osu.edu

Prof. Diyi Yang
Assistant Professor, Computer Science
Stanford University
diyiy@cs.stanford.edu

Dr. Ryan Rossi
Principal Research Scientist
Adobe Research
ryrossi@adobe.com