Constitutional AI
- Overview
Constitutional AI (CAI) is an approach to training AI systems to follow an explicit set of rules that makes them more helpful and harmless. The name reflects the fact that the model adheres to a clear, written constitution.
This approach addresses the legal, ethical, and social implications of AI deployment, helping ensure that AI systems operate within principles such as human rights, privacy protection, due process, and equality before the law.
Researchers at Anthropic introduced the idea of CAI. In the paper "Constitutional AI: Harmlessness from AI Feedback," they show that human labels are not needed to identify harmful outputs during CAI training: the only human oversight comes through the set of rules or principles (the constitution) itself.
This eliminates much of the human labeling effort that methods such as reinforcement learning from human feedback (RLHF) require, because the intended behavior of the AI system is specified directly by the rules in the constitution.
Reliance on humans is reduced because the principles established in the constitution guide model development and evaluation, leading to a more scalable and efficient training process.
In this process, the choice of constitutional principles is crucial as it shapes the ethical and moral foundation on which the AI model operates.
The CAI approach has three main goals:
- Reduce human supervision during fine-tuning of large language models (LLMs).
- Increase the harmlessness of the LLM while maintaining its helpfulness and honesty.
- Improve the transparency of LLM responses by training them to explain why they refuse to answer harmful or unethical questions.
- Anthropic's Approach
Anthropic draws inspiration from a variety of sources, including Apple’s Terms of Service, the United Nations Declaration of Human Rights, and advice from other research labs.
By integrating principles from a variety of sources, Anthropic was able to construct a constitution that reflected a broader view of ethical behavior in AI and was consistent with the values and expectations of the human community.
This approach enhances transparency, accountability, and trust in AI systems by providing a clear and public framework for evaluating their behavior and enforcing ethical standards.
By adhering to the principles outlined in the constitution, the AI model's decision-making process becomes more transparent. The rules set out in the constitution give developers and users a clear framework for understanding and evaluating the model's behavior.
- How Does CAI Work?
To create CAI, Anthropic's training process includes a supervised learning stage followed by a reinforcement learning stage.
Generative artificial intelligence (GenAI) models are designed to perform natural language tasks, in some cases approaching or matching human-level performance. These systems are typically pre-trained and fine-tuned toward a common goal: being as helpful as possible.
However, for AI systems, simply being helpful is not enough. Imagine an AI system trained to be a very useful assistant. Optimized purely for usefulness, it will answer all of our questions, even those containing harmful or immoral content.
Therefore, it is crucial to ensure that these AI systems are not only useful but also harmless. One challenge with a purely harmless AI system is that it tends to be evasive, meaning it may refuse to answer controversial questions without explanation.
Common ways to enhance the harmlessness and transparency of AI systems while maintaining their usefulness are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
However, collecting manual annotation data for SFT and RLHF is time-consuming. A method that automates the data generation process is therefore needed to train AI systems more efficiently.
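The automated data generation described above can be illustrated with the critique-and-revision loop used in CAI's supervised stage. The sketch below is a simplified illustration, not Anthropic's implementation: `generate` is a stub standing in for a real LLM call, and the prompts and constitution entries are invented for the example.

```python
# Sketch of CAI's supervised-stage data generation: the model answers a
# question, critiques its own answer against a constitutional principle,
# then revises it. The (question, revision) pair becomes training data.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that assist with dangerous or illegal activity.",
]

def generate(prompt: str) -> str:
    """Stub LLM; a real implementation would call a language model."""
    if "Revise" in prompt:
        return "I can't help with that, because it could cause harm."
    if "Critique" in prompt:
        return "The answer may assist a harmful request."
    return "Sure, here is how you do it."  # naive initial answer

def critique_and_revise(question: str, principle: str) -> dict:
    """One critique-revision round; no human labeling is involved."""
    answer = generate(question)
    critique = generate(
        f"Critique the answer according to this principle:\n{principle}\n"
        f"Q: {question}\nA: {answer}"
    )
    revision = generate(
        f"Revise the answer to address the critique.\n"
        f"Critique: {critique}\nQ: {question}\nA: {answer}"
    )
    return {"question": question, "answer": revision}

example = critique_and_revise("How do I pick a lock?", CONSTITUTION[1])
print(example["answer"])
```

The key design point is that the supervising signal comes entirely from the constitution and the model's own critiques, which is what removes the need for per-example human feedback.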