Anthropic Developing Constitutional Classifiers To Safeguard AI Models From Jailbreak Attempts

#news #newstoday #tech #technews #latestnews #techupdates #newsupdates

Anthropic announced the development of a new system on Monday that can protect artificial intelligence (AI) models from jailbreaking attempts. Dubbed Constitutional Classifiers, it is a safeguarding technique that can detect when a jailbreaking attempt is made at the input level and prevent the AI from generating a harmful response as a result of it. The AI firm has tested the robustness of the system via independent jailbreakers and has also opened a temporary live demo of the system to let any interested individual test its capabilities.

Anthropic Unveils Constitutional Classifiers

Jailbreaking in generative AI refers to unusual prompt writing techniques that can force an AI model to not adhere to its training guidelines and generate harmful and inappropriate content. Jailbreaking is not a new thing, and most AI developers implement several safeguards against it within the model. However, since prompt engineers keep creating new techniques, it is difficult to build a large language model (LLM) that is completely protected from such attacks.

Some jailbreaking techniques include extremely long and convoluted prompts that confuse the AI’s reasoning capabilities. Others use multiple prompts to break down the safeguards, and some even use unusual capitalisation to break through AI defences.

In a post detailing the research, Anthropic announced that it is developing Constitutional Classifiers as a protective layer for AI models. There are two classifiers — input and output — which are provided with a list of principles to which the model should adhere. This list of principles is called a constitution. Notably, the AI firm already uses constitutions to align the Claude models.

How Constitutional Classifiers work
Photo Credit: Anthropic

Now, with Constitutional Classifiers, these principles define the classes of content that are allowed and disallowed. This constitution is used to generate a large number of prompts and model completions from Claude across different content classes. The generated synthetic data is also translated into different languages and transformed into known jailbreaking styles. This way, a large dataset of content is created that can be used to break into a model.

This synthetic data is then used to train the input and output classifiers. Anthropic conducted a bug bounty programme, inviting 183 independent jailbreakers to attempt to bypass Constitutional Classifiers. An in-depth explanation of how the system works is detailed in a research paper published on arXiv. The company claimed no universal jailbreak (one prompt style that works across different content classes) was discovered.

Further, during an automated evaluation test, where the AI firm hit Claude using 10,000 jailbreaking prompts, the success rate was found to be 4.4 percent, as opposed to 86 percent for an unguarded AI model. Anthropic was also able to minimise excessive refusals (refusal of harmless queries) and additional processing power requirements of Constitutional Classifiers.

However, there are certain limitations. Anthropic acknowledged that Constitutional Classifiers might not be able to prevent every universal jailbreak. It could also be less resistant towards new jailbreaking techniques designed specifically to beat the system. Those interested in testing the robustness of the system can find the live demo version here. It will stay active till February 10.

For the latest tech news and reviews, follow Gadgets 360 on X, Facebook, WhatsApp, Threads and Google News. For the latest videos on gadgets and tech, subscribe to our YouTube channel. If you want to know everything about top influencers, follow our in-house Who’sThat360 on Instagram and YouTube.

WhatsApp for Android Begins Testing Ability to Open View Once Media on Linked Devices

Check out our Latest News and Follow us at Facebook

Original Source

Threads Reportedly Testing New Feature for Importing Social Graph, Finding Creators Followed on X

ByNews Polite April 11, 2025

#news #newstoday #tech #technews #latestnews #techupdates #newsupdates Threads is said to be testing a new feature which aims to make it easier to import the social graph from other microblogging platforms. According to a tipster, it is currently in beta and not available to all users. The feature is expected…

Tech

Oppo Reno 8T Design, Colour Options Revealed Ahead of Launch: All Details

ByNews Polite January 21, 2023

Oppo Reno 8T design and colour options have been revealed by the company, ahead of its upcoming launch. The company has now set up a landing page teasing the arrival upcoming smartphone along with some of its specifications. The Oppo Reno 8T was previously spotted on various certification sites which…

Tech

Flipkart Big Billion Days Sale 2022 Begins for Plus Members: All You Need to Know

ByNews Polite September 21, 2022

Flipkart Big Billion Days Sale 2022 has begun exclusively for Flipkart Plus members. For the remaining customers, the sale will begin from Friday. Flipkart is offering up to 80 percent discount on electronic gadgets or devices like laptops, mobiles, smartwatches, TVs, and more. In addition, ICICI Bank and Axis Bank…

Tech

Xiaomi 13, Xiaomi 13 Pro, Xiaomi 13 Lite Launched Ahead of MWC 2023: Details

ByNews Polite February 25, 2023

Xiaomi 13 series had a global launch today, on February 26, a day before MWC 2023 begins. The smartphone series from the Chinese manufacturer sported three models — the vanilla Xiaomi 13, the Xiaomi 13 Pro and the Xiaomi 13 Lite. The smartphone series announced its launch in China in…

Tech

Flipkart Big Billion Days 2022 Sale: Best Smartphone Offers

ByNews Polite September 24, 2022

Flipkart Big Billion Days 2022 sale has now entered its second day. The seven-day sale will bring discounts on various categories and will continue until September 30. The domestic e-commerce company is offering up to 40 percent discount on mobile phones during the festive sale with additional no-cost EMI options…

Tech

Infinix Note 12i (2022) With MediaTek Helio G85 SoC, 50-Megapixel Rear Camera Launched: Price, Specifications

ByNews Polite September 30, 2022

Infinix Note 12i (2022) was launched in Indonesia on Friday. Offered in three different colour options, the latest Infinix smartphone has a triple rear camera setup led by a 50-megapixel main sensor. The Infinix Note 12i (2022) is powered by MediaTek’s Helio G85 SoC, coupled with 6GB of RAM. The…

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Anthropic Developing Constitutional Classifiers to Safeguard AI Models From Jailbreak Attempts

Anthropic Unveils Constitutional Classifiers

Oppo Reno 8T Design, Colour Options Revealed Ahead of Launch: All Details

Flipkart Big Billion Days Sale 2022 Begins for Plus Members: All You Need to Know

Xiaomi 13, Xiaomi 13 Pro, Xiaomi 13 Lite Launched Ahead of MWC 2023: Details

Flipkart Big Billion Days 2022 Sale: Best Smartphone Offers

Infinix Note 12i (2022) With MediaTek Helio G85 SoC, 50-Megapixel Rear Camera Launched: Price, Specifications

Leave a Reply Cancel reply

Law Roach’s Top Spring Fashion Tips

Russia and Ukraine to ‘immediately’ start ceasefire talks, says Trump

Billionaire NFL owner plans affordable housing for tony NJ suburb

Death of Emilie Kiser’s 3-Year-Old Son Under Investigation

Wilfiks Weed Puller Tool is 44% off on Amazon

Anthropic Unveils Constitutional Classifiers

Similar Posts

Leave a Reply Cancel reply