A bug hunter discovered a bypass in Meta’s Prompt-Guard-86M model by inserting character-wise spaces between English alphabet characters, rendering the classifier ineffective in detecting harmful content.
A bug hunter discovered a bypass in Meta’s Prompt-Guard-86M model by inserting character-wise spaces between English alphabet characters, rendering the classifier ineffective in detecting harmful content.