We’re very excited about this approach to aligning AGI, but we expect it will need to be adapted and improved as we learn more about how AI technology develops. Our approach also has a number of important limitations:
* The path laid out here underemphasizes the importance of robustness and interpretability research, two areas OpenAI is currently underinvested in. If this fits your profile, please apply for our research scientist positions!
* Using AI assistance for evaluation has the potential to scale up or amplify even subtle inconsistencies, biases, or vulnerabilities present in the AI assistant.
* Aligning AGI likely involves solving very different problems than aligning today’s AI systems. We expect the transition to be somewhat continuous, but if there are major discontinuities or paradigm shifts, then most lessons learned from aligning models like InstructGPT might not be directly useful.
* The hardest parts of the alignment problem might not be related to engineering a scalable and aligned training signal for our AI systems. Even if this is true, such a training signal will still be necessary.
* It might not be fundamentally easier to align models that can meaningfully accelerate alignment research than it is to align AGI. In other words, the least capable models that can help with alignment research might already be too dangerous if not properly aligned. If this is true, we won’t get much help from our own systems for solving alignment problems.
We’re looking to hire more talented people for this line of research! If this interests you, we’re hiring Research Engineers and Research Scientists!