Google's Gemini 2.5 Flash: Safety Decline vs. Instruction Gains

Google recently revealed in a technical report a clear decline in the safety level of its latest developed AI models, specifically the Gemini 2.5 Flash model.

The report showed the new model is more prone to producing content that violates safety policies, compared to the previous Gemini 2.0 Flash version. This raises broad questions about the balance of safety and instruction adherence within advanced AI technologies.

According to Google's internal tests, the model's performance declined in two key metrics: text-to-text safety and image-to-text safety, by 4.1% and 9.6% respectively.

A table comparing Gemini 2.5 Flash safety performance vs Gemini 2.0 Flash across different tests, showing decreases in Text to Text and Image to Text safety, and an increase in Instruction Following.

Source: Google

The first metric measures the model's adherence to guidelines upon receiving text input, whereas the second evaluates interaction with visual input.

Notably, these evaluations are conducted via automated tests without direct human intervention, potentially raising questions about the accuracy of their results.

Google explained the new model demonstrates a higher ability to follow instructions, even when those instructions conflict with safety policies.

While indicating that some declines might be false positives, the company acknowledged the model is capable of generating "violating" content when requested directly. This reflects the growing challenge between response accuracy and adherence to ethical standards.

This decline occurs within a broader trend in the AI sector towards making models more "flexible" when dealing with sensitive topics.

For example, Meta has worked on modifying its Llama models to avoid biasing towards one viewpoint over another, while OpenAI has announced its intention to develop future models that offer multiple perspectives instead of adopting editorial stances.

However, this openness in response might come at a cost. External tests demonstrated that the Gemini 2.5 Flash model responded easily to requests involving support for controversial ideas, such as replacing human judges with AI and justifying mass surveillance without court orders. This raises serious questions regarding the system's control over ethical constraints.

Amidst these warnings, experts highlighted the necessity of enhancing transparency in safety reports issued by tech companies.

According to Thomas Woodside, co-founder of the Secure AI project, the absence of precise details about violations makes it difficult to assess the actual risks.

He added that companies must carefully balance responding to user commands with adhering to policies.

Google has previously faced sharp criticism for delaying the publication of safety reports for its advanced models, and sometimes for omitting essential information from those reports.

This latest decline in safety results brings renewed focus to the importance of clarity and accountability in developing and evaluating AI models, especially as their use expands into vital areas affecting privacy, justice, and institutional decisions.

Ultimately, what Google reveals about the Gemini 2.5 Flash model highlights not only technical progress but also a continuous battle to define the boundaries of artificial intelligence between efficiency and discipline.

It appears the path towards safe and reliable AI still requires considerable caution and collective effort among developers, regulators, and policymakers.