OpenAI Launches GPT-4.1: Better Coding & Million-Token Context

OpenAI has unveiled GPT-4.1, a new family of artificial intelligence models that the company says surpasses its previous versions in coding and long-text processing.

The announcement is the first in a series of updates that Sam Altman, the company's CEO, indicated would be rolled out gradually starting today.

GPT-4.1 Capabilities, Performance, and Pricing

According to official data, the new models can process a context of one million tokens, roughly equivalent to reading 750,000 words at once, which is more than the length of the novel "War and Peace".
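The figures above imply a ratio of about 0.75 words per token. The following is a minimal sketch of that arithmetic, assuming the ratio holds uniformly (in practice it varies with language and content). The helper names `estimate_tokens` and `fits_in_context` are hypothetical and are not part of any OpenAI library.

```python
# Ratio implied by the article's figures: 750,000 words per 1,000,000 tokens.
# This is an approximation, not a tokenizer.
WORDS_PER_TOKEN = 750_000 / 1_000_000  # 0.75


def estimate_tokens(word_count: int) -> int:
    """Roughly estimate how many tokens a text of `word_count` words uses."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_tokens: int = 1_000_000) -> bool:
    """Check whether the estimated token count fits in the context window."""
    return estimate_tokens(word_count) <= context_tokens


if __name__ == "__main__":
    print(estimate_tokens(750_000))   # estimated tokens for 750,000 words
    print(fits_in_context(900_000))   # does a 900,000-word text fit?
```

For an exact count, the text would have to be run through the model's actual tokenizer; the estimate is only useful for quick sizing.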

On the standard programming benchmark SWE-bench, the main GPT-4.1 model achieved an accuracy ranging from 52% to 54.6%.

By comparison, models from Google and Anthropic scored higher (63.8% and 62.3%, respectively).

In a separate evaluation, OpenAI tested GPT-4.1's ability to comprehend video content using Video-MME, a benchmark designed specifically to measure this skill. The model's result is considered the highest score recorded in this category of tests.

GPT-4.1 Model Pricing

The new family is currently available only through OpenAI's API and has not yet been integrated into the free or paid versions of ChatGPT.

  • GPT-4.1: The most powerful model, costing $2.00 per million input tokens.
  • GPT-4.1 Mini: A lower-cost version ($0.40 per million input tokens), with a slight trade-off in accuracy.
  • GPT-4.1 Nano: OpenAI's fastest and cheapest model to date ($0.10 per million input tokens), ideal for rapid tasks.
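To make the prices above concrete, here is a minimal sketch that estimates input cost for a given token count. It covers input tokens only, since those are the only prices listed here. The dictionary keys are the display names used in this article, not API model identifiers, and `input_cost` is a hypothetical helper.

```python
# Input prices in USD per one million tokens, as quoted in the list above.
# Keys are the article's display names, not official API model identifiers.
INPUT_PRICE_PER_MILLION = {
    "GPT-4.1": 2.00,
    "GPT-4.1 Mini": 0.40,
    "GPT-4.1 Nano": 0.10,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the estimated input cost in USD for `input_tokens` tokens."""
    return INPUT_PRICE_PER_MILLION[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    # Cost of sending a full one-million-token context to each model.
    for model in INPUT_PRICE_PER_MILLION:
        print(f"{model}: ${input_cost(model, 1_000_000):.2f}")
```

Under these prices, filling the entire one-million-token context once costs 20 times more on GPT-4.1 than on GPT-4.1 Nano, which is why the smaller models suit high-volume tasks.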

Challenges and Limitations

Despite their capabilities, the GPT-4.1 models still have difficulty with:

  • Maintaining accuracy as the input grows longer (accuracy drops from 84% on shorter inputs to 50% when handling one million tokens).
  • Avoiding security vulnerabilities in the code they generate, according to independent studies.
  • Understanding implicit context, which means users need to give clearer, more explicit instructions.

AI Companies' Competition in the Coding Space

Companies have recently been competing to produce high-accuracy coding models.

In this context, Google recently announced updates for Gemini 2.5 Pro, while Anthropic is preparing to release upgraded versions of Claude.

Meanwhile, in China, the company DeepSeek stands out as a strong player with its enhanced V3 model.

For its part, OpenAI aims to develop an "intelligent software engineer" capable of managing entire projects, from design to documentation.

During a tech conference in London, Sarah Friar, the company's CFO, explained that upcoming models will possess capabilities "akin to humans in creativity and accuracy".
