I keep a folder on my phone called "How did they make this?" It's full of screenshots from X and LinkedIn: editorial illustrations, article covers, infographics, all made with Gemini's Nano Banana. The style control, the text rendering, the sheer polish. Every week, someone posts something that makes me stop scrolling.
So naturally, I tried it myself. I had a specific use case: I write articles for Learned Context and need editorial-quality cover illustrations. The visual identity I'm after is a collision of 18th-century copperplate engraving with modern subjects. Think The Economist meets Victorian printmaking. Bold colour field backgrounds. Deliberate anachronisms. A visual metaphor, never a literal depiction.
Two days later, I had a prompt engineering system that works better on ChatGPT than on Gemini.
That wasn't the plan. The plan was to learn Nano Banana. I ended up building a five-step prompt framework with style anchors, metaphor development, production decision matrices, and a master template. The system produces reliable, editorially distinctive results. But the cleanest outputs (the crispness, the compositional fidelity, the artistic touch) come from running those prompts through ChatGPT, not Gemini.
Which raises the question I still can't answer: am I doing Gemini wrong?
Prompt dialect: the specific way an AI image generator interprets language. Each platform has its own. Gemini rewards technical precision and spatial specifications; ChatGPT rewards narrative intent and mood descriptions. The same words produce different images across tools. Mastering AI image generation prompt engineering means learning the dialect your tool speaks.
- AI image generation tools each have a "prompt dialect": the same words produce different results across platforms
- Gemini's Nano Banana excels at editing, text rendering, subject consistency, and real-world knowledge. ChatGPT is stronger on artistic fidelity and creative instruction-following
- Prompt ambiguity causes roughly 41% of first-try failures. Structured prompts with explicit style, spatial, and colour parameters close most of the gap
- Below is the five-step system I built. It's designed for editorial illustration, but the architecture applies to any AI image generation workflow
The tools moved fast. The skills didn't.
The Nano Banana story is remarkable by any measure. Google launched the original model (Gemini 2.5 Flash Image) in August 2025. According to Google DeepMind, it attracted 13 million new users within four days and generated over 5 billion images by mid-October 2025. By November, Nano Banana Pro arrived with studio-quality controls, 4K resolution, and multilingual text rendering built on Gemini 3. Then in late February 2026, Nano Banana 2 (Gemini 3.1 Flash Image) rolled out as the default across the Gemini app, Google Search, and the API, combining Pro's fidelity with Flash's speed.
On the other side, OpenAI hasn't been idle. ChatGPT now uses GPT Image 1.5 natively, meaning the language model understands conversational context when generating images. You can describe what you want in plain English, refine through follow-up prompts, and the system maintains coherence across iterations. Multiple reviewers in early 2026 have noted that ChatGPT produces the strongest results for artistic and creative image generation, while Gemini leads on editing, consistency, and factual accuracy in visuals.
Both tools are exceptional. But they're exceptional at different things.
The showcase-to-reality gap
Here's what I think is happening. The people producing jaw-dropping Nano Banana outputs on social media have learned Gemini's specific dialect. They've internalised how the model interprets prompts: which keywords trigger which rendering behaviours, how spatial language maps to composition, which style references the model "understands" versus the ones it flattens into generic output.
I hadn't learned that dialect. I was writing prompts the way I think: conversationally, with emphasis on mood and artistic intent rather than technical parameters. And that, it turns out, is exactly how ChatGPT wants to be prompted.
According to a prompt engineering analysis by Banana Thumbnail, prompt ambiguity accounts for roughly 41% of first-try image generation failures on Gemini. Style inconsistency, where the model mixes photorealistic and illustrative elements, is the second biggest issue. Google's own prompting guide for Nano Banana Pro confirms this: their recommended prompt structure reads more like a technical specification than a creative brief. Define the subject's appearance, specify camera angle and lighting, declare the artistic medium, use spatial language for composition.
Gemini wants a cinematographer. ChatGPT wants a creative director.
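To make the difference concrete, here is the same cover concept written in each dialect. Both prompts are illustrative sketches written for this comparison, not examples pulled from my template or from either vendor's guidance:

```python
# The same cover concept, written in each tool's "dialect".
# Both prompts are illustrative examples, not canonical guidance.

# Gemini-style: a technical specification (the cinematographer).
GEMINI_STYLE = (
    "Copperplate engraving of a hot-air balloon tethered to a server rack. "
    "Low-angle worm's-eye view, 85mm portrait lens perspective. "
    "Balloon occupies the upper-left third; rack anchors the lower-right. "
    "Flat background field #0F4C5C, single spot colour #E36414 on the balloon. "
    "No photorealism, no text, no watermarks."
)

# ChatGPT-style: narrative intent and mood (the creative director).
CHATGPT_STYLE = (
    "An editorial illustration in the spirit of an 18th-century engraver who "
    "has just seen a data centre: a hot-air balloon straining against its "
    "tether to a server rack, hopeful and slightly absurd. Keep it to one "
    "bold colour field and one accent, and let the metaphor carry the image."
)
```

The first prompt tells the model where everything goes; the second tells it what the image should mean and trusts it to compose.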
The chart above reflects my experience, calibrated against published side-by-side tests. Gemini's strengths (editing precision, text rendering accuracy, subject consistency, speed) are real and well-documented. But for what I was trying to do (create original editorial illustrations with a specific artistic style and compositional intent), ChatGPT's instruction-following and artistic interpretation felt materially stronger.
The AI image generation prompt system I actually built
After enough failed attempts, I stopped prompting freehand and built a system. The target was editorial illustrations for Learned Context articles: copperplate engraving subjects against bold colour fields, with deliberate anachronisms and visual metaphors. But the architecture applies to any workflow where you need consistent, stylistically controlled image generation.
The system has five steps. The first two are about thinking. The last three are about execution. Most people skip straight to execution, which is why most people get generic results.
Step 4 is where the template does the heavy lifting. Every prompt I generate follows the same six-block structure, regardless of which model I'm sending it to.
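As a minimal sketch, here's how that structure looks assembled in code. The block names, hex values, and example content are illustrative stand-ins reconstructed from the rules below, not the exact production template:

```python
# A hypothetical reconstruction of the six-block prompt template.
# Block names, ordering, and all example values are assumptions
# based on the rules described in this article.
PROMPT_TEMPLATE = """\
STYLE ANCHOR: copperplate engraving, 18th-century scientific-plate register.
SUBJECT & METAPHOR: {metaphor}
NARRATIVE SCENE: {narrative}
RENDERING: fine cross-hatched linework, engraved stipple shading, {framing}.
COLOUR: background field {background_hex}; single spot colour {spot_hex}.
NEGATIVE: no photorealism, no text, no watermarks, no mixed media.
"""

prompt = PROMPT_TEMPLATE.format(
    metaphor="a Benin bronze head wearing a modern headset",
    narrative=(
        "The head sits on a plinth of stacked manuscripts, cables "
        "coiling down like engraved vines."
    ),
    framing="85mm portrait lens perspective",
    background_hex="#0F4C5C",
    spot_hex="#E36414",
)
```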
Six rules govern how I write the narrative block:
- Narrative prose, not keywords (the model was trained on descriptions, not tag lists)
- Cinematic framing language ("85mm portrait lens perspective," "worm's-eye view")
- Exact hex colour values, never colour names
- Locked terminology: "copperplate engraving" in every prompt, never "etching" or "woodcut," because Gemini treats synonyms as different instructions
- Negative boundaries, always
- One concept per prompt
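Several of these rules are mechanically checkable, so a small linter pays for itself. This is a rough sketch under the same assumptions as the template above; the banned-synonym list and the NEGATIVE block label are mine:

```python
import re

# Rough checks for the mechanically verifiable rules above.
# The banned-synonym list and the NEGATIVE block label are assumptions
# tied to the template sketch, not a published spec.
BANNED_SYNONYMS = ("etching", "woodcut")  # locked term: "copperplate engraving"
HEX_COLOUR = re.compile(r"#[0-9A-Fa-f]{6}")
COLOUR_NAMES = re.compile(r"\b(red|blue|green|teal|orange|crimson)\b", re.I)

def lint_prompt(prompt: str) -> list[str]:
    """Return rule violations; an empty list means the prompt passes."""
    lowered = prompt.lower()
    problems = []
    if "copperplate engraving" not in lowered:
        problems.append("missing locked term 'copperplate engraving'")
    problems += [f"banned synonym: {w!r}" for w in BANNED_SYNONYMS if w in lowered]
    if not HEX_COLOUR.search(prompt):
        problems.append("no exact hex colour value")
    if COLOUR_NAMES.search(prompt):
        problems.append("colour name used instead of a hex value")
    if "NEGATIVE:" not in prompt:
        problems.append("no negative-boundaries block")
    return problems

# A prompt built from the template should pass cleanly.
assert lint_prompt(
    "Copperplate engraving of a compass. COLOUR: field #0F4C5C. NEGATIVE: no text."
) == []
```

The rules a linter can't check (one concept per prompt, narrative prose over keywords) are exactly the ones that take judgment, which is why they live in steps 1 and 2.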
I also maintain a colour palette with pre-mapped spot colour pairings and a gallery of six style anchor images that get uploaded alongside every prompt for visual consistency. The anchors range from a red lobster on parchment (scientific register) to classical caryatids holding a computer against a dark background (anachronism register).
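In code form, the palette and gallery are nothing more than a lookup table and a file list. The hex pairings and filenames below are illustrative placeholders, not my actual palette:

```python
# Hypothetical palette map: background field hex -> pre-approved spot colour.
# Values are illustrative placeholders, not the production palette.
SPOT_PAIRINGS = {
    "#0F4C5C": "#E36414",  # deep teal field, burnt-orange spot
    "#9A031E": "#FCDC4D",  # oxblood field, mustard spot
    "#1B1B1E": "#5AA9E6",  # near-black field, cool-blue spot
}

# Style anchor gallery, uploaded alongside every prompt for consistency.
# Filenames are placeholders for the six reference images described above.
STYLE_ANCHORS = [
    "anchors/lobster_parchment.png",   # scientific register
    "anchors/caryatids_computer.png",  # anachronism register
    # ...four more anchors covering the remaining registers
]
```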
The system has a detail I'm particularly proud of: an African visual vocabulary that sits alongside the default Greco-Roman motifs. Benin bronze heads for authority and institutional memory. Nok terracotta for first principles. Great Zimbabwe for infrastructure. Timbuktu manuscripts for knowledge systems. Kente textile patterns for background treatment. Same engraving technique, same production rules, only the subject descriptions change.
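The practical effect is that the vocabulary is just another lookup table: a concept maps to a subject description, and everything else in the prompt stays fixed. The table below paraphrases the pairings above; the Greco-Roman column is filled in with illustrative equivalents of my own choosing:

```python
# Hypothetical motif table: editorial concept -> subject description,
# one column per visual vocabulary. The Greco-Roman entries are
# illustrative, not quoted from the system.
MOTIFS = {
    "authority":         {"african": "Benin bronze head",
                          "greco_roman": "laurelled marble bust"},
    "first_principles":  {"african": "Nok terracotta figure",
                          "greco_roman": "fluted column fragment"},
    "infrastructure":    {"african": "Great Zimbabwe stone walls",
                          "greco_roman": "Roman aqueduct"},
    "knowledge_systems": {"african": "Timbuktu manuscript leaves",
                          "greco_roman": "scroll-laden library niche"},
}

def subject_for(concept: str, vocabulary: str) -> str:
    # Same engraving technique, same production rules either way;
    # only the subject description changes.
    return MOTIFS[concept][vocabulary]
```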
Where I landed, and what surprised me
Here's the thing. The system works. It produces consistent, editorially distinctive illustrations that match a specific visual identity across 15+ published article covers. The metaphor development step alone improved my output quality more than any technical prompt parameter.
But when I run the same template through ChatGPT (same narrative, same rendering instructions, same colour specifications), the output tends to be crisper. The compositions feel more intentional. The engraving textures have more depth and variation. The artistic interpretation of the metaphor is bolder.
Gemini's editing capability is genuinely impressive: the ability to say "remove all text artefacts" or "make only the compass coloured" and get a targeted fix rather than a full regeneration. ChatGPT doesn't have an equivalent. When I need to iterate on a specific element, I still go to Gemini.
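For reference, a targeted edit pass looks something like this with the google-genai Python SDK. The model id and response handling reflect my best understanding at the time of writing; check the current Gemini API docs before copying:

```python
# A minimal sketch of a targeted edit pass via the google-genai SDK.
# The model id and response handling are assumptions; verify against
# the current Gemini API docs.
from google import genai
from PIL import Image

client = genai.Client()  # reads the API key from the environment

draft = Image.open("cover_draft.png")
response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # "Nano Banana"; the exact id may differ
    contents=[
        draft,
        "Remove all text artefacts. Make only the compass coloured; "
        "keep everything else monochrome copperplate engraving.",
    ],
)

# Save the first image part returned alongside any text commentary.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("cover_edited.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```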
The honest conclusion: I use the system to think, ChatGPT to generate, and Gemini to edit.
Am I missing something?
This is where I have to be honest about my own limits. Gemini's Nano Banana models are clearly capable of extraordinary output. The evidence is all over social media, and the technical specs (consistency across up to five characters, fidelity when blending up to 14 objects, multilingual text rendering) are impressive. The system I built was originally designed for Gemini. The style anchors, the locked terminology, the spatial language: all of that came from Google's own prompting guidance.
So the question isn't whether Gemini is good. It is. The question is whether my prompting approach is genuinely calibrated to what Gemini needs, despite my attempts to calibrate it.
I think the answer is partially yes and partially something else. The "something else" is that ChatGPT's conversational interface is more forgiving. You can be imprecise about artistic intent and it infers what you mean reasonably well. Gemini's model, built on a fundamentally different architecture, rewards precision over inference. That's not a flaw. It's a design choice that suits different workflows. But it does mean the learning curve is steeper for someone who thinks in creative briefs rather than camera specifications.
If you're an expert Nano Banana prompter reading this and shaking your head, I genuinely want to hear what I'm getting wrong. My DMs are open.
We've moved past the question of whether AI can generate images. The answer is yes, at a quality level that would have been absurd three years ago. The new question is whether you can generate images. And the answer depends on whether you've learned the specific dialect your tool speaks.
I built a system because I couldn't learn the dialect fast enough. The system works. But I suspect the ceiling for someone who speaks fluent Nano Banana is higher than anything my framework can produce on ChatGPT. That's the tension I'm sitting with: the framework is a bridge, not a destination. And I still don't know which side of the bridge I should be walking toward.
The skill ceiling for AI image generation isn't the model. It's you.
- Google DeepMind, "Nano Banana launch announcement," August 2025
- Google Keyword Blog, "Nano Banana Pro prompting tips," November 2025
- TechCrunch, "Nano Banana 2 coverage," February 2026
- Banana Thumbnail, "Gemini image generation failure analysis," February 2026
- ImprovePrompt.ai, "AI image generation prompt guide," March 2026
