Introduction
While experimenting with Google’s Nano Banana 🍌 image model (the popular nickname for Gemini 2.5 Flash Image), I realized something interesting:
AI image quality doesn’t depend only on the model; it depends heavily on how you prompt it.
In this post, I’ll share a simple prompting framework I learned that makes AI-generated images more controlled, expressive, and realistic, even for beginners.
This blog is written from a learning-by-doing perspective, not a theoretical one.
What Is Google Nano Banana?
Google Nano Banana is a lightweight multimodal AI model that focuses on:
- Image understanding
- Reasoning-based generation
- Predicting what happens next instead of just static outputs
The real power comes from structured prompts.
The 5-Step Prompt Formula (Core Learning)
Through experimentation, I found that breaking prompts into components dramatically improves results.
The 5 Key Prompt Elements
- Subject – Who or what is in the image
- Action – What the subject is doing
- Scene – Where it happens
- Style – Visual aesthetic or era
- Composition – Camera angle or framing
Example Prompt
Create an image of me (subject) laughing (action) in a 1960s café (scene). Make it a close-up shot (composition) in a vintage photography style (style).
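To make the five elements harder to forget, they can be kept as separate fields and joined only at the end. The snippet below is a minimal sketch of that idea in plain Python. The `ImagePrompt` class and its `render` method are hypothetical names invented for this illustration, not part of any Google SDK, and the code only builds the prompt text without calling a model.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """Hypothetical container for the five prompt elements (illustration only)."""
    subject: str      # who or what is in the image
    action: str       # what the subject is doing
    scene: str        # where it happens
    style: str        # visual aesthetic or era
    composition: str  # camera angle or framing

    def render(self) -> str:
        # First sentence: who, doing what, where.
        # Second sentence: how the image should be framed and styled.
        return (
            f"Create an image of {self.subject} {self.action} in {self.scene}. "
            f"Make it a {self.composition} in a {self.style} style."
        )


prompt = ImagePrompt(
    subject="me",
    action="laughing",
    scene="a 1960s café",
    style="vintage photography",
    composition="close-up shot",
)
print(prompt.render())
```

Because every field is required, leaving out the composition or style raises an error right away instead of quietly producing a vaguer prompt.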
Going Beyond Static Images: “What If” Reasoning
One of the coolest things about Nano Banana is reasoning-based continuation.
Step 1: Set a clear stage
Generate an image of a person standing and holding a 3-tier cake.

Step 2: Trigger an action
Now generate an image showing what would happen if they tripped.

The model doesn’t just redraw the scene; it predicts the next logical outcome, including:
- Body posture
- Object movement
- Environmental reaction
This feels closer to storytelling than to image generation.
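The two-step pattern above (set a stage, then trigger an action) can be written as a tiny helper that produces both prompts in order. This is only a sketch: `what_if_prompts` is a hypothetical function name, and actually sending the prompts is left out. The assumption is that both go into the same conversation so the second one can refer back to the first image.

```python
def what_if_prompts(stage: str, event: str) -> list[str]:
    """Return the setup prompt and the follow-up 'what if' prompt, in order."""
    # Step 1: describe a clear, static starting scene.
    setup = f"Generate an image of {stage}."
    # Step 2: ask for the consequence of one event in that scene.
    follow_up = f"Now generate an image showing what would happen if {event}."
    return [setup, follow_up]


turns = what_if_prompts(
    stage="a person standing and holding a 3-tier cake",
    event="they tripped",
)
for number, text in enumerate(turns, start=1):
    print(f"Step {number}: {text}")
```

Keeping the stage and the event as separate arguments makes it easy to reuse one stage with several different events.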
What I Learned from This Experiment
Key Takeaways
- AI models perform better with structured context
- “What if” prompts unlock reasoning ability
- Prompting is becoming a skill, not just typing text
- Composition matters as much as description
Common Mistakes Beginners Make
- Writing very long, unstructured prompts
- Mixing multiple scenes at once
- Ignoring camera composition
- Expecting AI to “guess” intent
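Two of the mistakes above, overly long prompts and missing camera composition, are easy to catch with a rough pre-flight check. The sketch below is a heuristic I am assuming for illustration. The `review_prompt` function, the 60-word limit, and the keyword list are all made-up choices, not rules from the model's documentation.

```python
# Hypothetical keyword list; extend it with the framing terms you actually use.
COMPOSITION_HINTS = ("close-up", "wide shot", "low angle", "overhead", "portrait", "framing")


def review_prompt(prompt: str, max_words: int = 60) -> list[str]:
    """Return a list of warnings for a prompt (empty list means no warnings)."""
    warnings = []
    # Assumed threshold: long prompts tend to be unstructured, so flag them.
    if len(prompt.split()) > max_words:
        warnings.append("Prompt is long; consider trimming it.")
    # Flag prompts that never mention how the shot is framed.
    lowered = prompt.lower()
    if not any(hint in lowered for hint in COMPOSITION_HINTS):
        warnings.append("No camera composition mentioned.")
    return warnings


print(review_prompt("A dog running on a beach"))
print(review_prompt("A close-up shot of a dog running on a beach"))
```

A check like this cannot judge whether a prompt is good. It only points out the two omissions it was written to look for.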
Best Practices for Prompting
- Think like a director, not a user
- Separate what, where, and how
- Add actions to make images dynamic
- Test small changes and iterate
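"Test small changes and iterate" is easiest when only one element changes per attempt, so any difference in the output can be traced to that change. Here is a minimal sketch of that approach. The `render` and `vary` helpers and the dictionary keys are hypothetical names chosen for this example, and the code only generates prompt text.

```python
def render(parts: dict[str, str]) -> str:
    """Join the five prompt elements into one prompt string."""
    return (
        f"Create an image of {parts['subject']} {parts['action']} in {parts['scene']}. "
        f"Make it a {parts['composition']} in a {parts['style']} style."
    )


def vary(base: dict[str, str], field: str, options: list[str]) -> list[str]:
    """Render one prompt per option, changing only `field` and leaving `base` untouched."""
    return [render({**base, field: option}) for option in options]


base = {
    "subject": "me",
    "action": "laughing",
    "scene": "a 1960s café",
    "style": "vintage photography",
    "composition": "close-up shot",
}

# Try three framings while holding everything else constant.
variants = vary(base, "composition", ["close-up shot", "wide shot", "low-angle shot"])
for text in variants:
    print(text)
```

Comparing the resulting images side by side then shows what the composition change alone contributed.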
