Zhehao Zhang

PhD Student in Computer Science & Engineering (Language Agent Safety + LLM Alignment)

I am a first-year PhD student in Computer Science & Engineering at The Ohio State University, with research interests in Language Agent Safety and Robustness of Large Language Models, as well as Alignment. My work focuses on evaluating and mitigating the refusal behavior of LLMs and on developing methods that improve the safety and reliability of language models in real-world applications.

Previously, I worked as an Applied Scientist Intern at Amazon and collaborated with researchers at the Stanford SALT Lab, Adobe Research, and Microsoft Research Lab – Asia on NLP research and applications.

Education

2025 — Present

Ph.D. in Computer Science

The Ohio State University, Columbus, OH

Advisor: Yu Su

Research focus: Language Agent Safety and Robustness of LLMs, and Alignment

2023 — 2024

M.S. in Computer Science

Dartmouth College, Hanover, NH

Research focus: Natural Language Processing

2019 — 2023

B.Eng. in Artificial Intelligence (Honor Class)

Shanghai Jiao Tong University, Shanghai, China

Honor Class in Artificial Intelligence

Industry Research Experience

Jun 2026 — Aug 2026

Netflix, Los Gatos, CA

Machine Learning Intern, Netflix

Work as a machine learning intern on large language models and language agents.

Nov 2024 — Jun 2025

Amazon, Seattle, WA

Applied Scientist Intern, People eXperience and Technology (PXT) Central Science

Worked as an applied scientist intern on evaluating and mitigating the refusal behavior of LLMs.

Jun 2024 — Aug 2024

Adobe Research, San Jose, CA

Research Intern, Adobe Research

Research in multi-modal large language models and visual perception enhancement.

Dec 2022 — Aug 2023

Microsoft Research Lab – Asia, Beijing, China

Research Intern, Data, Knowledge, and Intelligence Group

Research in hierarchical table analysis and complex reasoning question answering over tabular data.

Academic Research Experience

2025 — Present

The Ohio State University, Columbus, OH

PhD Student, OSU NLP Lab

Mentor: Yu Su

Research in Language Agent Safety and Robustness of LLMs, and Alignment. Advised by Prof. Yu Su and Prof. Huan Sun.

2023 — 2024

Stanford University, Stanford, CA

Research Intern, Social and Language Technologies (SALT) Lab

Mentor: Diyi Yang

Research in Natural Language Processing, focusing on synthetic data and dynamic evaluation of large language models.

Honors and Awards

2026

ICML Gold Reviewer

2025

Graduate Fellowship

awarded by Ohio State University

2025

COLM 2025 Travel Grant

2025

ICLR Notable Reviewer

2023-2025

Merit Scholarship

awarded by Dartmouth College

2019-2023

Zhiyuan Honor Scholarship and Merit Scholarship

awarded by SJTU

Publications

For the most up-to-date list of publications, please refer to my Google Scholar profile.

Conference

When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents

Jaylen Jones*, Zhehao Zhang*, Yuting Ning, Eric Fosler-Lussier, Pierre-Luc St-Charles, Yoshua Bengio, Dawn Song, Yu Su, Huan Sun

The 43rd International Conference on Machine Learning (ICML). 2026.

*Authors contributed equally

When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

Yuting Ning, Jaylen Jones, Zhehao Zhang, Chentao Ye, Weitong Ruan, Junyi Li, Rahul Gupta, Huan Sun

The 43rd International Conference on Machine Learning (ICML). 2026.

FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning

Zhehao Zhang, Weijie Xu, Fanyou Wu, Chandan K Reddy

Conference on Language Modeling (COLM). 2025.

DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

Zhehao Zhang, Jiaao Chen, Diyi Yang

Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada, 2024.

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

Zhehao Zhang, Ryan A. Rossi, Tong Yu, Franck Dernoncourt, Ruiyi Zhang, Jiuxiang Gu, Sungchul Kim, Xiang Chen, Zichao Wang, Nedim Lipka

The 40th Annual AAAI Conference on Artificial Intelligence (AAAI). 2026.

Is GPT-4V(ision) All You Need for Automating Academic Data Visualization? Exploring Vision-Language Models' Capability in Reproducing Academic Charts

Zhehao Zhang, Weicheng Ma, Soroush Vosoughi

Findings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP). Miami, FL, USA, 2024.

E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit, and Extrapolate

Zhehao Zhang, Yan Gao, Jian-Guang Lou

Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Mexico City, Mexico, 2024.

CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular Data

Zhehao Zhang, Xitao Li, Yan Gao, Jian-Guang Lou

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2023.

Mitigating Biases in Hate Speech Detection from A Causal Perspective

Zhehao Zhang, Jiaao Chen, Diyi Yang

Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2023.

Journal

Personalization of Large Language Models: A Survey

Zhehao Zhang, Ryan A. Rossi, Branislav Kveton, Yijia Shao, Diyi Yang, Hamed Zamani, Franck Dernoncourt, Joe Barrow, Tong Yu, Sungchul Kim, Ruiyi Zhang, Jiuxiang Gu, Tyler Derr, Hongjie Chen, Junda Wu, Xiang Chen, Zichao Wang, Subrata Mitra, Nedim Lipka, Nesreen Ahmed, Yu Wang

Transactions on Machine Learning Research (TMLR). 2025.

Can Large Language Models Transform Computational Social Science?

Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang

Computational Linguistics (CL). 2023.

Preprint

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Yuting Ning*, Zhehao Zhang*, Yash Kumar Lal, Boyu Gou, Junyi Li, Weitong Ruan, Chentao Ye, Rahul Gupta, Diyi Yang, Yu Su, Huan Sun

arXiv preprint (arXiv). 2026.

*Authors contributed equally

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Jian Xie, Tianhe Lin, Zilu Wang, Yuting Ning, Yuekun Yao, Tianci Xue, Zhehao Zhang, Zhongyang Li, Kai Zhang, Yufan Wu, Shijie Chen, Boyu Gou, Mingzhe Han, Yifei Wang, Vint Lee, Xinpeng Wei, Xiangjun Wang, Yu Su, Huan Sun

arXiv preprint (arXiv). 2026.

Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

Zhehao Zhang, Weijie Xu, Shixian Cui, Chandan K Reddy

arXiv preprint (arXiv). 2025.

Service

Reviewer: EMNLP 2023, 2024; NeurIPS 2023, 2024, 2025; NAACL 2024; ACL 2024, 2025; COLM 2024; CIKM 2024, 2025; ICLR 2025; COLING 2025; IJCAI 2025; IEEE TNNLS Journal
Volunteer: EMNLP 2023; NAACL 2024

References

Prof. Yu Su, Associate Professor, Computer Science & Engineering, The Ohio State University, su.809@osu.edu

Prof. Huan Sun, Associate Professor, Computer Science & Engineering, The Ohio State University, sun.397@osu.edu

Prof. Diyi Yang, Assistant Professor, Computer Science, Stanford University, diyiy@cs.stanford.edu

Dr. Ryan Rossi, Principal Research Scientist, Adobe Research, ryrossi@adobe.com