I keep a folder on my phone called "How did they make this?" It's full of screenshots from X and LinkedIn: editorial illustrations, article covers, infographics, all made with Gemini's Nano Banana. The style control, the text rendering, the sheer polish. Every week, someone posts something that makes me stop scrolling.
So naturally, I tried it myself. I had a specific use case: I write articles for Learned Context and need editorial-quality cover illustrations. The visual identity I'm after is a collision of 18th-century copperplate engraving with modern subjects. Think The Economist meets Victorian printmaking. Bold colour field backgrounds. Deliberate anachronisms. A visual metaphor, never a literal depiction.
Two days later, I had a prompt engineering system that works better on ChatGPT than on Gemini.
That wasn't the plan. The plan was to learn Nano Banana. I ended up building a five-step prompt framework with style anchors, metaphor development, production decision matrices, and a master template. The system produces reliable, editorially distinctive results. But the cleanest outputs (the crispness, the compositional fidelity, the artistic touch) come from running those prompts through ChatGPT, not Gemini.
Which raises the question I still can't answer: am I doing Gemini wrong?
Prompt dialect: the specific way an AI image generator interprets language. Each platform has its own. Gemini rewards technical precision and spatial specifications; ChatGPT rewards narrative intent and mood descriptions. The same words produce different images across tools. Mastering AI image generation prompt engineering means learning the dialect your tool speaks.
- AI image generation tools each have a "prompt dialect": the same words produce different results across platforms
- Gemini's Nano Banana excels at editing, text rendering, subject consistency, and real-world knowledge. ChatGPT is stronger on artistic fidelity and creative instruction-following
- Prompt ambiguity causes roughly 41% of first-try failures. Structured prompts with explicit style, spatial, and colour parameters close most of the gap
- Below is the five-step system I built. It's designed for editorial illustration, but the architecture applies to any AI image generation workflow
The tools moved fast. The skills didn't.
The Nano Banana story is remarkable by any measure. Google launched the original model (Gemini 2.5 Flash Image) in August 2025. According to Google DeepMind, it attracted 13 million new users within four days and generated over 5 billion images by mid-October 2025. By November, Nano Banana Pro arrived with studio-quality controls, 4K resolution, and multilingual text rendering built on Gemini 3. Then in late February 2026, Nano Banana 2 (Gemini 3.1 Flash Image) rolled out as the default across the Gemini app, Google Search, and the API, combining Pro's fidelity with Flash's speed.
On the other side, OpenAI hasn't been idle. ChatGPT now uses GPT Image 1.5 natively, meaning the language model understands conversational context when generating images. You can describe what you want in plain English, refine through follow-up prompts, and the system maintains coherence across iterations. Multiple reviewers in early 2026 have noted that ChatGPT produces the strongest results for artistic and creative image generation, while Gemini leads on editing, consistency, and factual accuracy in visuals.
Both tools are exceptional. But they're exceptional at different things.
The showcase-to-reality gap
Here's what I think is happening. The people producing jaw-dropping Nano Banana outputs on social media have learned Gemini's specific dialect. They've internalised how the model interprets prompts: which keywords trigger which rendering behaviours, how spatial language maps to composition, which style references the model "understands" versus the ones it flattens into generic output.
I hadn't learned that dialect. I was writing prompts the way I think: conversationally, with emphasis on mood and artistic intent rather than technical parameters. And that, it turns out, is exactly how ChatGPT wants to be prompted.
According to a prompt engineering analysis by Banana Thumbnail, prompt ambiguity accounts for roughly 41% of first-try image generation failures on Gemini. Style inconsistency, where the model mixes photorealistic and illustrative elements, is the second biggest issue. Google's own prompting guide for Nano Banana Pro confirms this: their recommended prompt structure reads more like a technical specification than a creative brief. Define the subject's appearance, specify camera angle and lighting, declare the artistic medium, use spatial language for composition.
Gemini wants a cinematographer. ChatGPT wants a creative director.
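To make the difference concrete, here is the same cover concept written in each dialect. Both prompts are illustrative sketches written for this comparison, not examples pulled from my template or from either vendor's guidance:

```python
# The same cover concept, written in each tool's "dialect".
# Both prompts are illustrative examples, not canonical guidance.

# Gemini-style: a technical specification (the cinematographer).
GEMINI_STYLE = (
    "Copperplate engraving of a hot-air balloon tethered to a server rack. "
    "Low-angle worm's-eye view, 85mm portrait lens perspective. "
    "Balloon occupies the upper-left third; rack anchors the lower-right. "
    "Flat background field #0F4C5C, single spot colour #E36414 on the balloon. "
    "No photorealism, no text, no watermarks."
)

# ChatGPT-style: narrative intent and mood (the creative director).
CHATGPT_STYLE = (
    "An editorial illustration in the spirit of an 18th-century engraver who "
    "has just seen a data centre: a hot-air balloon straining against its "
    "tether to a server rack, hopeful and slightly absurd. Keep it to one "
    "bold colour field and one accent, and let the metaphor carry the image."
)
```

The first prompt tells the model where everything goes; the second tells it what the image should mean and trusts it to compose.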
The chart above reflects my experience, calibrated against published side-by-side tests. Gemini's strengths (editing precision, text rendering accuracy, subject consistency, speed) are real and well-documented. But for what I was trying to do (create original editorial illustrations with a specific artistic style and compositional intent), ChatGPT's instruction-following and artistic interpretation felt materially stronger.
The AI image generation prompt system I actually built
After enough failed attempts, I stopped prompting freehand and built a system. The target was editorial illustrations for Learned Context articles: copperplate engraving subjects against bold colour fields, with deliberate anachronisms and visual metaphors. But the architecture applies to any workflow where you need consistent, stylistically controlled image generation.
The system has five steps. The first two are about thinking. The last three are about execution. Most people skip straight to execution, which is why most people get generic results.
Step 4 is where the template does the heavy lifting. Every prompt I generate follows the same six-block structure, regardless of which model I'm sending it to.
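As a minimal sketch, here's how that structure looks assembled in code. The block names, hex values, and example content are illustrative stand-ins reconstructed from the rules below, not the exact production template:

```python
# A hypothetical reconstruction of the six-block prompt template.
# Block names, ordering, and all example values are assumptions
# based on the rules described in this article.
PROMPT_TEMPLATE = """\
STYLE ANCHOR: copperplate engraving, 18th-century scientific-plate register.
SUBJECT & METAPHOR: {metaphor}
NARRATIVE SCENE: {narrative}
RENDERING: fine cross-hatched linework, engraved stipple shading, {framing}.
COLOUR: background field {background_hex}; single spot colour {spot_hex}.
NEGATIVE: no photorealism, no text, no watermarks, no mixed media.
"""

prompt = PROMPT_TEMPLATE.format(
    metaphor="a Benin bronze head wearing a modern headset",
    narrative=(
        "The head sits on a plinth of stacked manuscripts, cables "
        "coiling down like engraved vines."
    ),
    framing="85mm portrait lens perspective",
    background_hex="#0F4C5C",
    spot_hex="#E36414",
)
```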
Six rules govern how I write the narrative block:
- Narrative prose, not keywords (the model was trained on descriptions, not tag lists)
- Cinematic framing language ("85mm portrait lens perspective," "worm's-eye view")
- Exact hex colour values, never colour names
- Locked terminology: "copperplate engraving" in every prompt, never "etching" or "woodcut," because Gemini treats synonyms as different instructions
- Negative boundaries, always
- One concept per prompt
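Several of these rules are mechanically checkable, so a small linter pays for itself. This is a rough sketch under the same assumptions as the template above; the banned-synonym list and the NEGATIVE block label are mine:

```python
import re

# Rough checks for the mechanically verifiable rules above.
# The banned-synonym list and the NEGATIVE block label are assumptions
# tied to the template sketch, not a published spec.
BANNED_SYNONYMS = ("etching", "woodcut")  # locked term: "copperplate engraving"
HEX_COLOUR = re.compile(r"#[0-9A-Fa-f]{6}")
COLOUR_NAMES = re.compile(r"\b(red|blue|green|teal|orange|crimson)\b", re.I)

def lint_prompt(prompt: str) -> list[str]:
    """Return rule violations; an empty list means the prompt passes."""
    lowered = prompt.lower()
    problems = []
    if "copperplate engraving" not in lowered:
        problems.append("missing locked term 'copperplate engraving'")
    problems += [f"banned synonym: {w!r}" for w in BANNED_SYNONYMS if w in lowered]
    if not HEX_COLOUR.search(prompt):
        problems.append("no exact hex colour value")
    if COLOUR_NAMES.search(prompt):
        problems.append("colour name used instead of a hex value")
    if "NEGATIVE:" not in prompt:
        problems.append("no negative-boundaries block")
    return problems

# A prompt built from the template should pass cleanly.
assert lint_prompt(
    "Copperplate engraving of a compass. COLOUR: field #0F4C5C. NEGATIVE: no text."
) == []
```

The rules a linter can't check (one concept per prompt, narrative prose over keywords) are exactly the ones that take judgment, which is why they live in steps 1 and 2.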
I also maintain a colour palette with pre-mapped spot colour pairings and a gallery of six style anchor images that get uploaded alongside every prompt for visual consistency. The anchors range from a red lobster on parchment (scientific register) to classical caryatids holding a computer against a dark background (anachronism register).
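In code form, the palette and gallery are nothing more than a lookup table and a file list. The hex pairings and filenames below are illustrative placeholders, not my actual palette:

```python
# Hypothetical palette map: background field hex -> pre-approved spot colour.
# Values are illustrative placeholders, not the production palette.
SPOT_PAIRINGS = {
    "#0F4C5C": "#E36414",  # deep teal field, burnt-orange spot
    "#9A031E": "#FCDC4D",  # oxblood field, mustard spot
    "#1B1B1E": "#5AA9E6",  # near-black field, cool-blue spot
}

# Style anchor gallery, uploaded alongside every prompt for consistency.
# Filenames are placeholders for the six reference images described above.
STYLE_ANCHORS = [
    "anchors/lobster_parchment.png",   # scientific register
    "anchors/caryatids_computer.png",  # anachronism register
    # ...four more anchors covering the remaining registers
]
```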
The system has a detail I'm particularly proud of: an African visual vocabulary that sits alongside the default Greco-Roman motifs. Benin bronze heads for authority and institutional memory. Nok terracotta for first principles. Great Zimbabwe for infrastructure. Timbuktu manuscripts for knowledge systems. Kente textile patterns for background treatment. Same engraving technique, same production rules, only the subject descriptions change.
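The practical effect is that the vocabulary is just another lookup table: a concept maps to a subject description, and everything else in the prompt stays fixed. The table below paraphrases the pairings above; the Greco-Roman column is filled in with illustrative equivalents of my own choosing:

```python
# Hypothetical motif table: editorial concept -> subject description,
# one column per visual vocabulary. The Greco-Roman entries are
# illustrative, not quoted from the system.
MOTIFS = {
    "authority":         {"african": "Benin bronze head",
                          "greco_roman": "laurelled marble bust"},
    "first_principles":  {"african": "Nok terracotta figure",
                          "greco_roman": "fluted column fragment"},
    "infrastructure":    {"african": "Great Zimbabwe stone walls",
                          "greco_roman": "Roman aqueduct"},
    "knowledge_systems": {"african": "Timbuktu manuscript leaves",
                          "greco_roman": "scroll-laden library niche"},
}

def subject_for(concept: str, vocabulary: str) -> str:
    # Same engraving technique, same production rules either way;
    # only the subject description changes.
    return MOTIFS[concept][vocabulary]
```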
Where I landed, and what surprised me
Here's the thing. The system works. It produces consistent, editorially distinctive illustrations that match a specific visual identity across 15+ published article covers. The metaphor development step alone improved my output quality more than any technical prompt parameter.
But when I run the same template through ChatGPT (same narrative, same rendering instructions, same colour specifications), the output tends to be crisper. The compositions feel more intentional. The engraving textures have more depth and variation. The artistic interpretation of the metaphor is bolder.
Gemini's editing capability is genuinely impressive: the ability to say "remove all text artefacts" or "make only the compass coloured" and get a targeted fix rather than a full regeneration. ChatGPT doesn't have an equivalent. When I need to iterate on a specific element, I still go to Gemini.
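For reference, a targeted edit pass looks something like this with the google-genai Python SDK. The model id and response handling reflect my best understanding at the time of writing; check the current Gemini API docs before copying:

```python
# A minimal sketch of a targeted edit pass via the google-genai SDK.
# The model id and response handling are assumptions; verify against
# the current Gemini API docs.
from google import genai
from PIL import Image

client = genai.Client()  # reads the API key from the environment

draft = Image.open("cover_draft.png")
response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # "Nano Banana"; the exact id may differ
    contents=[
        draft,
        "Remove all text artefacts. Make only the compass coloured; "
        "keep everything else monochrome copperplate engraving.",
    ],
)

# Save the first image part returned alongside any text commentary.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("cover_edited.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```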
The honest conclusion: I use the system to think, ChatGPT to generate, and Gemini to edit.
Am I missing something?
This is where I have to be honest about my own limits. Gemini's Nano Banana models are clearly capable of extraordinary output. The evidence is all over social media, and the technical specs (consistency across up to five characters, fidelity when blending up to 14 objects, multilingual text rendering) are impressive. The system I built was originally designed for Gemini. The style anchors, the locked terminology, the spatial language: all of that came from Google's own prompting guidance.
So the question isn't whether Gemini is good. It is. The question is whether my prompting approach is genuinely calibrated to what Gemini needs, despite my attempts to calibrate it.
I think the answer is partially yes and partially something else. The "something else" is that ChatGPT's conversational interface is more forgiving. You can be imprecise about artistic intent and it infers what you mean reasonably well. Gemini's model, built on a fundamentally different architecture, rewards precision over inference. That's not a flaw. It's a design choice that suits different workflows. But it does mean the learning curve is steeper for someone who thinks in creative briefs rather than camera specifications.
If you're an expert Nano Banana prompter reading this and shaking your head, I genuinely want to hear what I'm getting wrong. My DMs are open.
We've moved past the question of whether AI can generate images. The answer is yes, at a quality level that would have been absurd three years ago. The new question is whether you can generate images. And the answer depends on whether you've learned the specific dialect your tool speaks.
I built a system because I couldn't learn the dialect fast enough. The system works. But I suspect the ceiling for someone who speaks fluent Nano Banana is higher than anything my framework can produce on ChatGPT. That's the tension I'm sitting with: the framework is a bridge, not a destination. And I still don't know which side of the bridge I should be walking toward.
The skill ceiling for AI image generation isn't the model. It's you.
- Google DeepMind, "Nano Banana launch announcement," August 2025
- Google Keyword Blog, "Nano Banana Pro prompting tips," November 2025
- TechCrunch, "Nano Banana 2 coverage," February 2026
- Banana Thumbnail, "Gemini image generation failure analysis," February 2026
- ImprovePrompt.ai, "AI image generation prompt guide," March 2026
