OpenAI Launches GPT-4.1: Better Coding & Million-Token Context

OpenAI has unveiled GPT-4.1, a new family of artificial intelligence models that it says outperforms previous versions at coding and long-text processing.

The announcement is the first in a series of updates that Sam Altman, the company's CEO, said would be rolled out gradually starting today.

GPT-4.1 Capabilities, Performance, and Pricing

According to official data, the new models can process a context of one million tokens, equivalent to roughly 750,000 words at once, which is more than the length of the novel "War and Peace".
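The figures above imply a ratio of about 0.75 words per token. As a rough illustration only, the sketch below uses that ratio to estimate whether a text of a given word count would fit in a one-million-token window. The function names are hypothetical, and real token counts depend on the tokenizer and the language of the text, so this is an approximation rather than how OpenAI measures context length.

```python
# Ratio implied by the article: 1,000,000 tokens ~ 750,000 words.
# This is an approximation; actual tokenization varies by text and language.
ARTICLE_TOKENS = 1_000_000
ARTICLE_WORDS = 750_000
CONTEXT_LIMIT_TOKENS = 1_000_000


def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from a word count, rounding up (integer arithmetic)."""
    return (word_count * ARTICLE_TOKENS + ARTICLE_WORDS - 1) // ARTICLE_WORDS


def fits_in_context(word_count: int, limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Return True if the estimated token count is within the context limit."""
    return estimate_tokens(word_count) <= limit


if __name__ == "__main__":
    for words in (100_000, 750_000, 900_000):
        print(words, estimate_tokens(words), fits_in_context(words))
```

Under this estimate, a 900,000-word document would need about 1.2 million tokens and would have to be split before being sent to the model.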

On SWE-bench, a standard programming benchmark, the main GPT-4.1 model scored between 52% and 54.6%.

By comparison, models from Google and Anthropic scored noticeably higher (63.8% and 62.3%, respectively).

In a separate evaluation, OpenAI tested GPT-4.1's ability to comprehend video content using Video-MME, a benchmark designed specifically to measure this skill. According to the company, the model achieved the highest score recorded in this category of tests.

GPT-4.1 Model Pricing

The new family is currently available through OpenAI's API and has not yet been integrated into the free or paid versions of ChatGPT.

GPT-4.1: The most powerful model, costing $2.00 per million input tokens.
GPT-4.1 Mini: A lower-cost version ($0.40 per million input tokens), with a slight trade-off in accuracy.
GPT-4.1 Nano: OpenAI's fastest and cheapest model to date ($0.10 per million input tokens), suited to quick, lightweight tasks.
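To make the price differences concrete, the following minimal sketch computes the input-side cost of a request from the per-million-token prices listed above. The dictionary keys and the function name are labels chosen for illustration, and output-token prices are left out because the article does not give them, so a real bill would be higher.

```python
# Input prices in USD per one million tokens, as quoted in the article.
# Keys are illustrative labels, not confirmed API identifiers.
INPUT_PRICE_PER_MILLION_USD = {
    "gpt-4.1": 2.00,
    "gpt-4.1-mini": 0.40,
    "gpt-4.1-nano": 0.10,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD (output tokens are not included)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return INPUT_PRICE_PER_MILLION_USD[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    # Cost of filling the full one-million-token context once per model.
    for name in INPUT_PRICE_PER_MILLION_USD:
        print(f"{name}: ${input_cost_usd(name, 1_000_000):.2f}")
```

At these prices, filling the entire one-million-token context once costs $2.00 in input tokens on the main model and $0.10 on Nano, a twentyfold difference.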

Challenges and Limitations

Despite these capabilities, the GPT-4.1 models still have difficulty with:

Maintaining accuracy as input length grows (accuracy drops from 84% on shorter inputs to 50% at one million tokens).
Avoiding security vulnerabilities in generated code, according to independent studies.
Understanding implicit context, which means users need to give clearer, more explicit instructions.

AI Companies' Competition in the Coding Space

AI companies are racing to produce high-accuracy coding models.

In this context, Google recently announced updates for Gemini 2.5 Pro, while Anthropic is preparing to release upgraded versions of Claude.

Meanwhile, in China, DeepSeek stands out as a strong player with its enhanced V3 model.

For its part, OpenAI aims to develop an "intelligent software engineer" capable of managing entire projects, from design to documentation.

At a tech conference in London, Sarah Friar, the company's CFO, said that upcoming models will possess capabilities "akin to humans in creativity and accuracy".