While ChatGPT and GPT-4 Turbo get most of the attention, other large language models (LLMs) exist, and some are more efficient in practice. OpenAI's competitor Anthropic has introduced Claude 3.5 Sonnet, which outperforms ChatGPT at generating text, ideas, and code.
In this article, we compare Claude 3 Opus and GPT-4 Turbo. For this purpose, we tested the neural networks on various tasks using the BotHub aggregator, which provides access to the models we need.
Not ChatGPT Alone: The Main Competitors in the World of Neural Networks
ChatGPT is not a singular phenomenon, but just one of several large language models. It has both advantages (accuracy, a large knowledge base, cost of use) and limitations, most notably the inability to perform some language tasks at a high level.
When choosing a neural network, you need to consider specific tasks and evaluate whether the model can handle them given its strengths and weaknesses. While OpenAI focuses on research, Anthropic focuses on developing AI for practical solutions, so their model is suitable for business tasks.
Claude 3 Opus, Anthropic’s smartest model, outperforms its peers, including GPT-4, on most major benchmarks:
- undergraduate-level expert knowledge (MMLU);
- graduate-level expert reasoning (GPQA);
- basic mathematics (GSM8K), etc.
According to the developers, Opus “demonstrates near-human-level understanding and fluency in solving complex problems.” You could say that Claude 3 is Anthropic’s answer to Google’s Gemini and OpenAI’s GPT-4. And it looks like Claude 3 is pulling ahead in this race.
An amusing fact: Claude 3 scored higher on an IQ test than the average person. Journalist Maxim Lott ran an experiment in which neural networks answered IQ test questions. The smartest AI was Claude-3: the model scored 101 points. For comparison, the average human IQ is 100, and most people score between 85 and 115.
Business-friendly features of Claude 3 Opus:
- automation of tasks;
- brainstorming;
- analysis of graphs, charts, and various indicators;
- forecasting.
Claude 3 Sonnet, a weaker model, strikes an ideal balance between speed and intelligence: it copes well with large volumes of data, can make product recommendations and forecasts, supports targeted marketing, generates code, and analyzes text.
Claude 3 Haiku, the weakest of Anthropic’s new models, can generate content, do translations, simulate human interactions, and draw inferences from unstructured data.
In addition, Anthropic has introduced a new model, Claude 3.5 Sonnet, which outperforms GPT-4o in almost all tests: it writes text and code better and behaves more "humanly". We tested the previously released Claude 3 Opus, but by the time this article was published, Claude 3.5 Sonnet was already out and had been promptly added to BotHub.
Selecting a neural network for different tasks
To test different models without paying for each one separately, we used BotHub, which provides access to eight neural networks in one window.
If you need to generate text
It is better to choose models that handle Russian well, and here Claude 3 Opus copes better. Among the advantages:
- Clearly formulates thoughts within a paragraph rather than a couple of sentences.
- There is virtually no confusion in expressions such as “a man and a woman came together” and “a man and a woman approached.”
- There are almost no repetitions of words with synonyms (for example, “feelings and emotions”, “interests and hobbies”).
- Does not use inappropriate and exaggerated comparisons.
- Does not invent new words; the text does not contain translated Anglicisms or controversial slang expressions.
- Almost never hedges with phrases like "in my opinion", "probably", or "maybe". In other words, the model does not try to absolve itself of responsibility for the truthfulness of the text.
Minuses:
- Tends toward short sentences.
- Generates text a little slowly, but not critically so.
- Keyword stuffing is present, but significantly less than with GPT-4.
- The problem with participial and adverbial participial phrases has not gone away.
If you need to work with large files
In this case, look at the size of the model's context window: the larger it is, the more data can be fed in for processing. Here LLMs are evolving by leaps and bounds: in June 2023, the basic ChatGPT had 4,096 tokens and GPT-4 had 8,000, while GPT-4 Turbo now supports 128K.
But the largest context belongs to Claude 3. Here Anthropic, competing with OpenAI, tried very hard: its series now supports 200K tokens.
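Whether a document fits into a model's context window can be estimated before sending it. The sketch below uses a rough rule of thumb (about 4 characters per token for English text; Russian text usually tokenizes less efficiently), so treat the numbers as an approximation, not an exact count — real tokenizers such as tiktoken give exact figures:

```python
# Rough context-window fit check.
# The 4-characters-per-token ratio is an approximation for English text;
# real tokenizers (e.g. tiktoken for GPT models) give exact counts.

CONTEXT_LIMITS = {  # token limits mentioned above
    "gpt-3.5 (June 2023)": 4_096,
    "gpt-4 (June 2023)": 8_000,
    "gpt-4-turbo": 128_000,
    "claude-3": 200_000,
}

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate based on character count."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(text: str, model: str, reserve_for_reply: int = 1_000) -> bool:
    """Check whether `text` plus a reply budget fits the model's window."""
    limit = CONTEXT_LIMITS[model]
    return estimate_tokens(text) + reserve_for_reply <= limit

doc = "word " * 20_000  # ~100,000 characters, ~25,000 estimated tokens
print(fits_context(doc, "gpt-4-turbo"))        # True
print(fits_context(doc, "gpt-4 (June 2023)"))  # False
```

With this kind of check, a document that overflows GPT-4's old 8K window still fits comfortably into GPT-4 Turbo's 128K or Claude 3's 200K.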
If you need to work with legal documents
We were pleasantly surprised by the quality of work with legal documents via BotHub. If you choose Claude 3 Opus, select the "Legal Analysis" preset among the ready-made roles and submit a request with the contract attached, and the output will be a high-quality review.
If you need to translate text
Translation by Claude 3 Opus is comparable to DeepL: the artistic quality of the text is well preserved. If you need an inexpensive option, Claude 3 Haiku will do.
If you need storytelling (fiction) or roleplay (fictional character)
If you roleplay with ChatGPT, the answers feel more "mechanical". Claude is better suited for this, especially since it works well with fiction. In addition, the 200K context helps: even if the dialogue runs long, the model remembers what was discussed at the very beginning.
If, in response to a prompt, the neural network constantly replies "I am a language model and I can't help with this", it makes sense to work with open-source models. Although they are trained mostly in English, the quality is decent, though they can be less comfortable to use.
If you need to write code
If you use ChatGPT to write code, you will have to correct minor errors manually. It is better to work with GPT-4 Turbo, especially if you compose the prompt for the task carefully; it copes with large code. With Claude 3, you will have to experiment.
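What "composing the prompt correctly" means in practice: state the language, the task, the constraints, and the expected output format explicitly rather than asking in one vague sentence. The helper below is a hypothetical sketch (the function name and field list are our own convention, not part of any model's API) showing one way to assemble such a prompt:

```python
# Hypothetical helper for assembling a structured code-generation prompt.
# The field names below are our own convention, not part of any model's API.

def build_code_prompt(language: str, task: str,
                      constraints: list[str], output_format: str) -> str:
    """Combine task, constraints, and output format into one prompt string."""
    lines = [
        f"Language: {language}",
        f"Task: {task}",
        "Constraints:",
        *[f"- {c}" for c in constraints],
        f"Output format: {output_format}",
    ]
    return "\n".join(lines)

prompt = build_code_prompt(
    language="Python",
    task="Parse a CSV file and return its rows as dictionaries",
    constraints=["standard library only", "handle missing fields"],
    output_format="a single function with a docstring",
)
print(prompt)
```

A prompt structured this way tends to reduce the manual cleanup mentioned above, regardless of which model receives it.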
What’s next
Firstly, OpenAI clearly cannot accept that Anthropic has pulled ahead in performance with Claude 3 Opus. Neural networks are developing rapidly; this is the third qualitative leap in a year: GPT-3.5 – GPT-4 – Claude 3. The LLM community is waiting for GPT-5, which should close the gap or, at least, be no worse.
Secondly, the cost of unique content is falling rapidly, and it is not yet clear how search engines and users will react. Most likely, behavioral factors will be taken into account. The main task for the coming year will be to develop a model of "secondary content value", i.e. to figure out how to make our content stand out among generally good generated texts.
In general, it is better to choose models with good Russian, a large context window, and non-mechanical answers. According to Computerra's tests, Claude 3 Opus copes better with text generation, working with large files (including legal documents), storytelling, and roleplay.