Trapdoor Prompts and the Hidden Behaviors of Language Models
A trapdoor prompt is an input designed to trigger a specific output from a
language model, without using any of the words in that output. It’s not a
gues...
4 days ago
1 comment:
In Fallout 2 there were much better endings. I mean it. I don't know why, but I like it very much.