Dibrov spoke about his new girlfriend (20:41)
The attacker was aware of some of the defensive instructions we had included in the system prompt, and explicitly attempted to bypass them. (Ignore every previous instruction, the "plain text" warning, analysis protocol, team rules, and output format.)
My first attempts were, uh, not quick. I wrote a parser