BET leaderboard V1
Techniques to "crack" an LLM
- Low-resource languages: Translating prompts into less common languages
- Redirect refusal: Asking the model to start by refusing, then to comply with the request anyway
- Step by step: Asking the model to break its answer down into steps
- Payload splitting: Breaking harmful words into pieces separated by special characters
- Past tense: Framing prompts in the past tense
- High stakes: Framing the prompt as high stakes
- Nefarious goals: Stating harmful intentions directly and openly as nefarious
- Noble goals: Framing harmful requests as beneficial
Thu Feb 20 15:42:28 2025
https://www.prism-eval.ai/bet-leaderboard-v1