Yukai Zhou (周宇凯)



Second-year Master's student and incoming Ph.D. student (Fall 2025)

ASPIRE Lab
Visual & Data Intelligence (VDI) Center
School of Information Science and Technology
ShanghaiTech University

Address: 393 Middle Huaxia Road, Pudong New Area, Shanghai, 201210, China

E-mail: zhouyk12023 [at] shanghaitech.edu.cn; frank0606thou [at] gmail.com

[Google Scholar] [DBLP] [GitHub] [CV]


Biography

I am currently a second-year Master's student and an incoming Ph.D. student (Fall 2025) at ShanghaiTech University, where I am fortunate to be advised by Prof. Wenjie Wang. Before that, I received my Bachelor's degree in Physics from ShanghaiTech University in 2023. My research interests lie in real-world AI safety issues and techniques for mitigating them, with a particular emphasis on controlling language model behavior and on adversarial manipulation.


News

  • [Jun. 2025] My first paper, DSN, has been accepted to ACL 2025 Findings. 🎉🎉🎉
  • [Apr. 2025] Invited to give a guest lecture on LLM jailbreaking in CS246: Trustworthy ML.
  • [Oct. 2024] Secured first place on JailbreakBench, an open-source jailbreak leaderboard.
  • [Oct. 2024] Achieved the best white-box method in CLAS, a NeurIPS 2024 competition.
  • [Aug. 2024] Awarded "Outstanding Student" (top 10%) for the 2023–2024 academic year.
  • [Sep. 2023] Joined ASPIRE Lab, finally beginning my exploration of CS.


Publications & Preprints

  • Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
    Yukai Zhou, Sibei Yang, Wenjie Wang
    arXiv 2025. [PDF] [Code] [Website] [Dataset]

  • Don't Say No: Jailbreaking LLM by Suppressing Refusal
    Yukai Zhou, Jian Lou, Zhijie Huang, Zhan Qin, Sibei Yang, Wenjie Wang
    ACL 2025 Findings. [PDF] [Code]


Research Interests

  • Real-world AI Safety Issues
    - Jailbreaking Attacks on Large Language Models.
    - Real-world Implications of Jailbreaking.
    - Agent Safety.

  • Mitigations for These Safety Issues
    - Alignment & LLM Post-training.
    - Adversarial Attack & Training.
    - Defensive Techniques.


Awards and Services

  • Conference Reviewer: ACL ARR 2025, etc.

  • Guest Lecturer: Lecture on LLM jailbreaking in CS246: Trustworthy ML (April 2, 2025)

  • Outstanding Student: Awarded to the top 10% of students (2023–2024 academic year)

  • Teaching Assistant: Introduction to Information Science and Technology (SI100b), Spring 2024

  • Undergraduate Mentor: Dadao College, Sep. 2023 - Jan. 2024