GPT-4 Turbo reclaims ‘best AI model’ crown from Anthropic’s Claude 3


OpenAI has been on an update hot streak, making the latest GPT-4 Turbo available to developers and paid ChatGPT subscribers last week. When launching the model, OpenAI said the new GPT-4 Turbo boasts several improvements over its predecessor, and users are finding that to be true.
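For developers, the updated model is reached through OpenAI's standard chat completions endpoint under the snapshot name gpt-4-turbo-2024-04-09. As a rough illustration (not official OpenAI sample code), a call using the official openai Python client, assuming an OPENAI_API_KEY is set in the environment, might look like this:

```python
# Minimal sketch: calling the updated GPT-4 Turbo snapshot via the OpenAI chat completions API.
# Assumes the official `openai` Python package (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",  # the updated GPT-4 Turbo snapshot named in this article
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a one-line Python list comprehension that squares 1 through 10."},
    ],
)

print(response.choices[0].message.content)
```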

Also: Zoom gets its first major overhaul in 10 years, powered by generative AI

On Thursday, the updated version of GPT-4 Turbo, gpt-4-turbo-2024-04-09, reclaimed the number one spot on the Large Model Systems Organization (LMSYS) Chatbot Arena, a crowdsourced open platform where users can evaluate large language models (LLMs).

🔥Exciting news — GPT-4-Turbo has just reclaimed the No. 1 spot on the Arena leaderboard again! Woah!
We collect over 8K user votes from diverse domains and observe its strong coding & reasoning capability over others. Hats off to @OpenAI for this incredible launch!
To offer… pic.twitter.com/IxbN2Q9ecJ

— lmsys.org (@lmsysorg) April 11, 2024

In the Chatbot Arena, users can chat with two LLMs side by side and compare their responses to each other without knowing the identity of each model. 

After viewing the responses, users can continue chatting until they feel comfortable deciding which model won, whether the result is a tie, or whether both responses are bad.


Those results are then used to rank the 82 LLMs in the Chatbot Arena on the leaderboard, which includes all of the most popular LLMs on the market such as Gemini Pro, the Claude 3 family of LLMs, and Mistral-Large-2402. 
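The article doesn't spell out the ranking math, but LMSYS has described computing Elo-style ratings from these pairwise votes. The sketch below is purely illustrative: the battle data is made up, and the real leaderboard's rating scheme and parameters may differ.

```python
# Illustrative sketch only: turning pairwise "battle" votes into Elo-style ratings,
# the kind of scheme LMSYS has described for its leaderboard. The battles below are
# made-up examples, not real Arena data.
from collections import defaultdict

K = 32  # update step size (a common Elo K-factor; the real leaderboard may use different values)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(ratings: dict, model_a: str, model_b: str, outcome: float) -> None:
    """outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(ratings[model_a], ratings[model_b])
    ratings[model_a] += K * (outcome - e_a)
    ratings[model_b] += K * ((1.0 - outcome) - (1.0 - e_a))

ratings = defaultdict(lambda: 1000.0)  # every model starts at the same baseline rating

# Hypothetical votes: (model A, model B, outcome from A's perspective)
battles = [
    ("gpt-4-turbo-2024-04-09", "claude-3-opus", 1.0),
    ("claude-3-opus", "gpt-4-1106-preview", 1.0),
    ("gpt-4-turbo-2024-04-09", "gpt-4-1106-preview", 0.5),
]

for a, b, outcome in battles:
    update(ratings, a, b, outcome)

for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")
```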


As of the latest Chatbot Arena update on April 13, the updated version of GPT-4 Turbo holds the lead in the overall, coding, and English categories. 

Also: The best AI chatbots: ChatGPT isn’t the only one worth trying

This means that less than a month after overtaking GPT-4 Turbo in the Chatbot Arena, Anthropic’s Claude 3 Opus has been pushed into second place in the overall category, followed by GPT-4-1106-preview, an older version of GPT-4 Turbo, in third place. 

These results could be attributed to gpt-4-turbo-2024-04-09's improved coding, math, logical reasoning, and writing capabilities, demonstrated by its higher scores on a series of benchmarks used to test the proficiency of AI models.

Interested in comparing gpt-4-turbo-2024-04-09’s performance against other LLMs for yourself? You can visit the Chatbot Arena website and click on the Arena (side-by-side) option to select which models you want to compare.

Also: Adobe Premiere Pro’s two new AI tools blew my mind. Watch them in action for yourself

It is worth noting that because you know the identity of the models in the side-by-side option, you can't vote there. If you want your votes to count toward the leaderboard, use the Arena (battle) option, which compares randomly selected, anonymous models.

If you’d rather skip the testing and jump straight into using gpt-4-turbo-2024-04-09 in ChatGPT, all you have to do is become a ChatGPT Plus subscriber, which costs $20 per month.
