
OpenAI to Advance AI Ethics with a Focus on Controlling Superintelligent Systems


Just after Sam Altman’s brief departure from and return to OpenAI, and the leadership reshuffle that followed, the company’s Superalignment team has continued to make significant strides in addressing the challenges of controlling superintelligent AI systems.

While Altman’s exit generated substantial attention, the Superalignment team, led by OpenAI co-founder Ilya Sutskever, remains focused on the critical task of steering and governing AI systems with intelligence surpassing that of humans.

During an interview at NeurIPS, the annual machine learning conference, members of the Superalignment team – Collin Burns, Pavel Izmailov, and Leopold Aschenbrenner – discussed OpenAI’s latest efforts to ensure the responsible behavior of AI systems. The team, formed in July, aims to develop frameworks for regulating and governing superintelligent AI systems, meaning systems whose intelligence would far exceed human capabilities.

Collin Burns pointed to the field’s current limits in aligning models smarter than humans, stating, “Today, we can basically align models that are dumber than us, or maybe around human-level at most. Aligning a model that’s actually smarter than us is much, much less obvious — how we can even do it?”

Despite recent organizational changes and controversy surrounding Altman’s departure, Ilya Sutskever continues to lead the Superalignment team. The team acknowledges the skepticism surrounding the concept of superalignment within the AI research community, with some considering it premature or a distraction from pressing regulatory issues.

The Superalignment team is working on governance and control frameworks that can be applied to future, far more powerful AI systems. Its current approach uses a weaker AI model (e.g., GPT-2) to guide a more advanced model (GPT-4) toward desired behavior, as a way of studying how AI behavior can be kept aligned with human intent; a toy sketch of the idea follows below.
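As a rough, hypothetical illustration of that weak-to-strong setup (the classifiers and synthetic data below are stand-ins, not OpenAI’s actual code or models), the Python sketch trains a simple “weak” model on ground truth, uses its predictions as labels for a more capable “strong” model, and then checks whether the strong model recovers performance beyond its weak supervisor on held-out data.

# Hypothetical weak-to-strong supervision sketch (not OpenAI's code;
# the classifiers and synthetic data stand in for GPT-2 and GPT-4).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic labeled data split three ways: the weak supervisor's
# training set, a transfer set, and a held-out test set.
X, y = make_classification(n_samples=6000, n_features=20, n_informative=5, random_state=0)
X_weak, y_weak = X[:2000], y[:2000]
X_transfer = X[2000:4000]                 # ground-truth labels deliberately unused here
X_test, y_test = X[4000:], y[4000:]

# "Weak" supervisor (stand-in for a smaller model such as GPT-2).
weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)
weak_labels = weak.predict(X_transfer)    # noisy pseudo-labels

# "Strong" student (stand-in for a larger model such as GPT-4),
# trained only on the weak model's labels, never on ground truth.
strong = GradientBoostingClassifier(random_state=0).fit(X_transfer, weak_labels)

print("weak supervisor accuracy:", accuracy_score(y_test, weak.predict(X_test)))
print("strong student accuracy: ", accuracy_score(y_test, strong.predict(X_test)))
# If the strong student beats its weak supervisor on held-out ground truth,
# that gap is the kind of weak-to-strong generalization the team is probing.

The interesting question, in this toy setting as in the team’s framing, is whether a student model can end up better aligned and more capable than the imperfect supervision it was trained on.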

Addressing concerns about transparency and accessibility, the Superalignment team emphasizes that OpenAI’s research, including its code, as well as the work of grant recipients, will be shared publicly. The team views this commitment as essential to its mission of ensuring that AI is safe and beneficial for humanity.

To further advance their research and encourage collaboration, OpenAI is launching a $10 million grant program dedicated to superintelligent alignment. The program, supported in part by former Google CEO Eric Schmidt, aims to fund technical research from academic labs, nonprofits, individual researchers, and graduate students. OpenAI also plans to host an academic conference on superalignment in early 2025 to share and promote the work of grant recipients.
