Can Two Image Models Make the Storybook Costs Work?

Today I worked on the custom storybook idea again. I'm still a bit stuck on the image generation costs. They add up faster than I expected.

So I tried a different setup.

I used GPT-image-2 for the base “world spreads”. These are the scenes without the main character, just the environment. They only need to be generated once per story.

Then for inserting the child into the scene, I switched to Nano Banana 2. That part happens on top of the existing image.

This split actually works. Quality stays good enough, and the expensive part is limited to a one-time step.
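Here is a minimal sketch of how that split could be wired up. It is not tied to any real API: `generate_base` and `insert_child` are hypothetical stand-ins for the two model calls, and `SpreadPipeline`, `story_id`, and `spread_index` are names made up for illustration. The point is only that the base spread is cached per story, while the insertion step runs once per child.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for the two model calls (assumptions, not real APIs).
BaseGenerator = Callable[[str], bytes]               # prompt -> image bytes
CharacterInserter = Callable[[bytes, str], bytes]    # base image, child description -> image bytes


class SpreadPipeline:
    """Caches character-free world spreads and personalizes them per child."""

    def __init__(self, generate_base: BaseGenerator, insert_child: CharacterInserter):
        self._generate_base = generate_base
        self._insert_child = insert_child
        self._base_cache: Dict[Tuple[str, int], bytes] = {}
        self.base_calls = 0      # how often the expensive model was called
        self.insert_calls = 0    # how often the cheaper insertion step was called

    def base_spread(self, story_id: str, spread_index: int, prompt: str) -> bytes:
        """Return the world spread, generating it only on the first request."""
        key = (story_id, spread_index)
        if key not in self._base_cache:
            self._base_cache[key] = self._generate_base(prompt)
            self.base_calls += 1
        return self._base_cache[key]

    def personalized_spread(
        self, story_id: str, spread_index: int, prompt: str, child_description: str
    ) -> bytes:
        """Insert the child on top of the (cached) world spread."""
        base = self.base_spread(story_id, spread_index, prompt)
        self.insert_calls += 1
        return self._insert_child(base, child_description)
```

With three spreads and two children, this would call the base generator three times and the insertion step six times. In a real setup the cache would probably live on disk or in object storage rather than in memory.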

The main idea is simple: don’t use the best model for everything, only for the parts that get reused.

What I also notice is that I still think in “AI is expensive” mode, even when the setup already works better. That mindset is probably slowing things down more than the actual costs right now.

Key Insight

I was optimizing cost per image, but the real thing to optimize is cost per reusable asset. Once I looked at it like that, the setup became obvious.
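To make "cost per reusable asset" concrete, here is a rough back-of-the-envelope sketch. All numbers are made-up placeholders, not real prices for either model, and the function names are hypothetical. It assumes every book of one story uses the same world spreads, so the base cost is paid once and spread over all books of that story.

```python
def split_cost_per_book(
    spreads: int, base_price: float, insert_price: float, books: int
) -> float:
    """Average image cost per book when base spreads are shared across `books` books."""
    if books < 1:
        raise ValueError("books must be at least 1")
    one_time = spreads * base_price     # paid once per story
    per_book = spreads * insert_price   # paid for every personalized book
    return one_time / books + per_book


def single_model_cost_per_book(spreads: int, price: float) -> float:
    """Cost per book when every spread is generated from scratch with one model."""
    return spreads * price


# Placeholder numbers for illustration only (not actual pricing).
SPREADS = 12
BASE_PRICE = 0.20     # assumed price per high-quality base spread
INSERT_PRICE = 0.04   # assumed price per character insertion

for books in (1, 10, 100):
    split = split_cost_per_book(SPREADS, BASE_PRICE, INSERT_PRICE, books)
    single = single_model_cost_per_book(SPREADS, BASE_PRICE)
    print(f"{books:>3} books: split {split:.2f} per book, single model {single:.2f} per book")
```

With these placeholder numbers the split is more expensive for the very first book, because both steps are paid in full, and it gets cheaper as soon as the same story is reused for a second child. That is exactly the shift from cost per image to cost per reusable asset.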