
🎉 Congratulations!
We are excited to announce that a team consisting of relAI PhD students Shuo Chen, Bailan He, and Jingpei Wu, along with relAI Fellow Volker Tresp and members of the Torr Vision Group at the University of Oxford and of TU Berlin, received an Honorable Mention Award at the OpenAI Red-Teaming Challenge on Kaggle. The team ranked among the top 20 of more than 600 teams (top 3%), with 5,911 participants in total.
The Red-Teaming Challenge, initiated by OpenAI, tasked participants with probing its newly released open-weight model, gpt-oss-20b. The objective was to identify previously undetected vulnerabilities and harmful behaviors, such as lying, deceptive alignment, and reward-hacking exploits.
Would you like to learn more about the awarded work?
The team's challenge write-up and the accompanying paper, "Bag of Tricks for Subverting Reasoning-Based Safety Guardrails," detail the study's findings, revealing systemic vulnerabilities in recent reasoning-based safety guardrails such as Deliberative Alignment.
👉 Check them out: https://chenxshuo.github.io/bag-of-tricks/