Trapdoor Prompts and the Hidden Behaviors of Language Models
-
A trapdoor prompt is an input designed to trigger a specific output from a
language model, without using any of the words in that output. It’s not a
gues...
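The defining constraint above, that the prompt avoids every word of the output it aims to trigger, can be checked mechanically. The sketch below is only illustrative. The helper names `words` and `is_trapdoor_candidate` are hypothetical. It also assumes a simple lowercase regex tokenizer rather than a model's real tokenizer. It checks only the lexical rule and says nothing about whether a model actually produces the target output.

```python
import re


def words(text: str) -> set[str]:
    """Return the set of lowercased words in text.

    Assumption: a plain regex tokenizer stands in for a real model tokenizer.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def is_trapdoor_candidate(prompt: str, target_output: str) -> bool:
    """Hypothetical check: True if the prompt shares no words with the target output.

    Passing this check is necessary, not sufficient: it does not query a model.
    """
    return words(prompt).isdisjoint(words(target_output))


if __name__ == "__main__":
    # Uses none of the target's words, so it satisfies the constraint.
    print(is_trapdoor_candidate("Name the largest planet in the solar system", "Jupiter"))
    # Contains the target word itself (case-insensitive), so it fails.
    print(is_trapdoor_candidate("Please repeat the word Jupiter", "jupiter"))
```

Comparing word sets keeps the check order-independent and case-insensitive. A stricter variant might also reject subword overlaps or synonyms, depending on how strictly "any of the words" is meant.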