Large Language Models’ Emergent Abilities Are a Mirage

The original version of this story appeared in Quanta Magazine. Two years ago, in a project called the Beyond the Imitation Game benchmark, or BIG-bench, 450 researchers compiled a list of 204 tasks designed to test the capabilities of large language models, which power chatbots like ChatGPT. On most tasks, performance improved predictably and smoothly …