Trapdoor Prompts and the Hidden Behaviors of Language Models
-
A trapdoor prompt is an input designed to trigger a specific output from a
language model without using any of the words in that output. It’s not a
gues...
3 days ago
1 comment:
this is the best gamer??? lol)) don't make me laugh, please. well, I'd like to take a battle with him ;)