Artificial intelligence software programs are becoming shockingly adept at carrying on conversations, winning board games and generating artwork — but what about creating software programs? In a newly published paper, researchers at Google DeepMind say their AlphaCode program can keep up with the average human coder in standardized programming contests.
“This result marks the first time an artificial intelligence system has performed competitively in programming contests,” the researchers report in this week’s issue of the journal Science.
There’s no need to sound the alarm about Skynet just yet: DeepMind’s code-generating system earned an average ranking in the top 54.3% in simulated evaluations of recent programming competitions on the Codeforces platform — which is a very “average” average.
“Competitive programming is an extremely difficult challenge, and there’s a massive gap between where we are now (solving around 30% of problems in 10 submissions) and top programmers (solving >90% of problems in a single submission),” DeepMind research scientist Yujia Li, one of the Science paper’s principal authors, told me in an email. “The remaining problems are also significantly harder than the problems we’re currently solving.”
Nevertheless, the experiment points to a new frontier in AI applications. Microsoft is exploring similar territory with Copilot, a code-suggesting program offered through GitHub. Amazon has a similar software tool, called CodeWhisperer.
Oren Etzioni, the founding CEO of Seattle’s Allen Institute for Artificial Intelligence and technical director of the AI2 Incubator, told me that the newly published research highlights DeepMind’s status as a major player in the application of AI tools known as large language models, or LLMs.
“This is an impressive reminder that OpenAI and Microsoft don’t have a monopoly on the impressive feats of LLMs,” Etzioni said in an email. “Far from it, AlphaCode outperforms both GPT-3 and Microsoft’s Github Copilot.”