Biography
I am currently a second-year Master's student and an incoming Ph.D. student (Fall 2025) at ShanghaiTech University, where I am fortunate to be advised by Prof. Wenjie Wang. Prior to that, I received my Bachelor's degree in Physics from ShanghaiTech University in 2023.
My research interests lie in real-world AI safety issues and their mitigation techniques, with a specific emphasis on language model behavior control and adversarial manipulation.
News
- [Jun. 2025] My first work, DSN, was accepted to ACL 2025 Findings.
- [Apr. 2025] Invited as a guest lecturer for the CS246: Trustworthy ML course (LLM Jailbreaking).
- [Oct. 2024] Secured first place on JailbreakBench, an open-source jailbreak leaderboard.
- [Oct. 2024] Recognized as the best white-box method in CLAS, a NeurIPS 2024 contest.
- [Aug. 2024] Awarded "Outstanding Student" (top 10%) for the 2023–2024 academic year.
- [Sep. 2023] Joined ASPIRE Lab to finally begin my exploration of CS.
Publications & Preprints
Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
Yukai Zhou, Sibei Yang, Wenjie Wang
ArXiv 2025. [PDF] [Code] [Website] [Dataset]
Don't Say No: Jailbreaking LLM by Suppressing Refusal
Yukai Zhou, Jian Lou, Zhijie Huang, Zhan Qin, Sibei Yang, Wenjie Wang
ACL 2025 Findings. [PDF] [Code]
Research Interests
Real-world AI Safety Issues
- Jailbreaking Attacks on Large Language Models.
- Real-world Implications of Jailbreaking.
- Agent Safety.
Mitigations for These Safety Issues
- Alignment & LLM Post-training.
- Adversarial Attack & Training.
- Defensive Techniques.
Awards and Services
Conference Reviewer:
ACL ARR 2025, etc.
Guest Lecturer:
Lecture on LLM jailbreaking in the Trustworthy ML course, CS246 (April 2, 2025)
Outstanding Student:
Awarded to the top 10% of students for the 2023–2024 academic year
Teaching Assistant:
Introduction to Information Science and Technology, SI100b, Spring 2024
Undergraduate Mentor:
Dadao College, Sep. 2023 - Jan. 2024