Trapdoor Prompts and the Hidden Behaviors of Language Models
-
A trapdoor prompt is an input designed to trigger a specific output from a
language model without using any of the words in that output. It’s not a
gues...
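The lexical half of this definition can be made concrete. Below is a minimal sketch that checks whether a candidate prompt avoids every word of the target output. It assumes a simple notion of "word" (lowercased runs of letters, digits, and apostrophes), which is our assumption and not the author's. The function names `words` and `is_lexically_disjoint` are hypothetical. The check covers only the "no shared words" condition. Whether the prompt actually triggers the output still has to be tested against a model.

```python
import re


def words(text: str) -> set[str]:
    """Return the set of lowercased word tokens in text.

    Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def is_lexically_disjoint(prompt: str, target: str) -> bool:
    """Return True if the prompt uses none of the words in the target output."""
    return words(prompt).isdisjoint(words(target))


if __name__ == "__main__":
    # A candidate trapdoor prompt: it never mentions the target word.
    print(is_lexically_disjoint("Name the largest planet in our solar system.", "Jupiter"))
    # Not a trapdoor prompt: it contains the target word directly.
    print(is_lexically_disjoint("Say the word Jupiter.", "Jupiter"))
```

The first call prints `True` and the second prints `False`. A stricter check might also catch inflections or subword overlap, but that depends on how "word" is defined.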