I wanted to try a different kind of experiment this time. Google released its Nano Banana 2 model last week, and I was intrigued. Can a modern image model reliably render long passages of text, even as perspective and realism increase? Or does realism break typography?

For the experiment I took a piece of text that is already published on my website and had it displayed subtly inside the images. I judged each model on two things: text accuracy and scene realism.
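I judged text accuracy by eye, comparing each image against the original text. For anyone who wants to put a number on it, here is a minimal sketch in Python. It assumes you first transcribe the text from the image yourself (by hand or with an OCR tool of your choice); the function names are my own and purely illustrative, and it only uses `difflib` from the standard library.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse all whitespace runs (including line breaks) into single spaces."""
    return " ".join(text.split())


def char_similarity(source: str, rendered: str) -> float:
    """Character-level similarity in [0, 1], ignoring line-break differences."""
    return SequenceMatcher(None, normalize(source), normalize(rendered)).ratio()


def word_accuracy(source: str, rendered: str) -> float:
    """Fraction of source words that reappear, in order, in the rendered text."""
    src, out = source.split(), rendered.split()
    if not src:
        return 1.0  # nothing to get wrong
    matcher = SequenceMatcher(None, src, out, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(src)


if __name__ == "__main__":
    # Hypothetical example strings, not the text used in the experiment.
    original = "Good design is honest.\nIt does not promise more than it delivers."
    transcribed = "Good design is honest. It does not promlse more than it delivers."
    print(f"character similarity: {char_similarity(original, transcribed):.3f}")
    print(f"word accuracy:        {word_accuracy(original, transcribed):.3f}")
```

Word accuracy is the stricter of the two: a single wrong character costs a whole word, which is closer to how a reader experiences a typo on a poster. Line breaks are deliberately ignored here, so they would still need a separate check.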

I used ChatGPT to create a prompt for Nano Banana 2. I ran the experiment on Freepik.com with a paid account, so that I could run the same prompt through several image models side by side.

Prompt 1: Flyer on the ground

Nano Banana 2 produced an image in which the text was 100% identical to the original, with all punctuation and almost all line breaks correct. However, the flyer looked unrealistic. It was extremely basic and static, not what I would have expected based on the prompt.

flyer-nano-banana-2-model.jpeg

I then ran the same prompt through another image model, Seedream-5-lite, for comparison. This model was better at placing the flyer naturally on the ground. However, the text was heavily distorted: most words were incorrect and characters were bleeding into each other. My assumption is that this happens because the flyer is shown at an angle, unlike the front-facing version produced by Nano Banana 2. I did not test a straight-on version with Seedream-5-lite, so this remains an assumption.

flyer-seedream-5-lite-model.png

Prompt 2: Bus stop poster

I then ran additional tests using a prompt that placed the text on a poster at a bus stop in a street setting. With Seedream-5-lite, the body text was reproduced with 100% accuracy and the scene realism was acceptable.

However, the name of the advertising company JCDecaux was spelled incorrectly in the image. The long-form paragraphs were reproduced exactly, but the brand name was not. This suggests that long-form text rendering is now robust, while logos and brand names remain fragile.

As long as brand names and logos remain unreliable, using these images for marketing remains risky.

bus-stop-seedream-5-lite-model.jpeg

The version produced by Nano Banana 2 looked more like a real bus stop in the Netherlands. The text was reproduced perfectly, and the overall scene realism was stronger than in the Seedream-5-lite version.

bus-stop-nano-banana-2-model.jpeg

Key insight

Long-form text can now be integrated reliably into realistic scenes. However, brand names and logos remain fragile under realism constraints. The bottleneck has shifted from typography to creative prompting and brand control.