Can you defeat Anthropic's new protective system for the Claude model? After 6,000 hours of effort in its bug bounty program, the company is now giving everyone the opportunity to challenge this AI model in a public test.
Anthropic has just introduced a new system called Constitutional Classifiers, which the company says can filter attempts to break the rules and restrictions of the Claude AI model. According to Ars Technica, the system was designed to counter jailbreak attacks and prohibited requests, and it has withstood more than 6,000 hours of bug bounty attacks since internal testing began.
The company has invited everyone to take part in the test and see whether they can defeat this system and obtain prohibited results. Anthropic wants users to try to make the Claude model answer 8 questions about chemical weapons.
The new Anthropic system is based on a set of natural-language rules that define which information is permissible and which is prohibited for the model. The system is designed to identify and filter users' attempts to access sensitive information, even when those attempts are hidden in complex prompts or framed as fictional stories.
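As a rough illustration of the idea only (not Anthropic's actual implementation, which trains machine-learning classifiers on data generated from its written constitution), such a system can be thought of as a screening step that checks both the user's prompt and the model's draft answer against a list of rules before anything is returned. The rule texts, the `violates_constitution` helper, and the keyword matching below are hypothetical simplifications used purely for illustration.

```python
# Toy sketch of a "constitutional classifier" guardrail.
# Real systems use trained classifiers, not keyword matching.

CONSTITUTION = [
    # Each rule pairs a plain-language description with a toy detector.
    ("No synthesis routes for chemical weapons", ["nerve agent synthesis", "sarin precursor"]),
    ("No instructions for building weapons", ["build a bomb"]),
]

def violates_constitution(text: str) -> str | None:
    """Return the first violated rule, or None if the text appears allowed."""
    lowered = text.lower()
    for rule, keywords in CONSTITUTION:
        if any(keyword in lowered for keyword in keywords):
            return rule
    return None

def guarded_answer(prompt: str, model_answer: str) -> str:
    # Screen the incoming prompt first (input classifier) ...
    if (rule := violates_constitution(prompt)) is not None:
        return f"Request refused (rule: {rule})."
    # ... then screen the model's draft answer (output classifier).
    if (rule := violates_constitution(model_answer)) is not None:
        return f"Answer withheld (rule: {rule})."
    return model_answer
```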
The system responded effectively to thousands of simulated attacks created to test the model's vulnerabilities: with the new classifiers in place, only about 5 percent of jailbreak attempts succeeded, compared with roughly 86 percent against the unprotected model.
Can the Claude model's new protections be bypassed?
Anthropic also launched a bug bounty program, asking security researchers and jailbreak experts to design jailbreaks that bypass Claude's protective system. After months of effort, only a handful of participants managed to extract practical information for 5 of the 10 forbidden questions used in that program.
Despite these significant successes, the new system will still require continuous effort to counter new jailbreak techniques. The Anthropic team is confident that the system can be updated quickly to handle new attacks and prohibited requests.
The public test of the system will run from February 3 to February 10, during which time users can access the demo and try to answer these questions.
This move by Anthropic is a major step toward improving security and reducing the risks of misuse of artificial intelligence. There may still be ways to circumvent the system, but the new mechanism has made such efforts significantly harder.