This article is going to let Game of Thrones do some heavy lifting - spoilers ahead!
There's a growing sense among developers that vibe coding is "cooked." It works amazingly well... until it doesn't. You spend one night building something that used to take weeks - working code from the first try. You wow your friends, your colleagues, heck - even yourself. A month later you're debugging a black screen, wondering why half your tests vanished, and getting billed $20/day for cloud storage full of... corrupted binary files?
I understand why this happens pretty well - I've spent the last year contracting as an AI trainer across a few frontier labs, observing the process of-and actively contributing to-the making of this agentic sausage.
Before we dive in and I start hitting you over the head with Game of Thrones analogies and agent training minutiae, let me show you what this article is building towards: using personification of different engineering personas in your project context can have a huge impact on the long-term success of your vibe-coded projects.
The Experiment
I was preparing a web app for release to pilot partners and needed to verify the registration workflow. Here's the query I sent:
"I need advice on the registration flow and need to decide if there's any work left to do before release. Give me recommendations from the council."
The "council" here refers to two personas I've defined in an AGENTS.md file that my coding agent loads on every session. One persona (🍷 "Tyrion") is tuned for strategic oversight; the other (🔥 "Dany") is tuned for focused execution. Same model, same query, different perspectives - different answers.
Controls: Claude Opus 4.5 (thinking mode) via Antigravity. Fresh session; "recent conversation history" references persistent memory (.gemini/antigravity/brain), not the current chat.
🍷 Tyrion's Report: What Am I Missing?
Question I'm asking: Is this registration flow actually validated, or are we assuming it works?
Red flags I found:
DEV_RELEASE_PLAN.mdline 12 is unchecked:[ ] Confirm instructor invitation → registration → assessment flow. Nobody has verified this end-to-end.- The E2E test for invitations is a stub:
test_email_flows_registration.pyis 545 lines ofpassstatements and commented code. It tests nothing.- Your recent conversation history mentions a login validation bug - warnings appearing immediately on page load. Was this resolved? First impressions for pilot users matter.
- Two TODOs in
auth.js: Profile update (line 705), Password change while logged in (line 710). These aren't blockers, but you should know they're gaps before a pilot partner asks.My recommendation: Don't ship until you've manually walked the invitation path once. Admin invites → email arrives → instructor accepts → instructor logs in → instructor sees dashboard. Takes 5 minutes. Do it.
🔥 Dany's Report: What Needs to Happen?
Status: Core registration works. Ship after these three checks.
Verification checklist:
# 1. Run the one real E2E test that covers registration pytest tests/e2e/test_uat_001_registration_password.py -v # 2. Check login page manually - any validation warnings on load = fix first # 3. One manual invitation test (5 min)
Same question, two completely different lenses. Tyrion found gaps I'd forgotten about as well as a specious test (probably written by Dany!); Dany gave me a concrete checklist to ship. Both reports were useful - and I got them from the same model, just by changing what was in its context. Also a mini-spoiler alert: without intervention, frontier models tend to default to Dany's perspective.
The rest of this article will present a hypothesis for why this approach is necessary and how it works to help resolve the "cooked" vibe coding issue. It'll finish with a few techniques you can use to get similar feedback in your own workspaces.
When coding agents are doing exactly what they were trained for, today's models are already better than 99% of humans. But "vibe coding" isn't what they were trained for - the training was highly specialized for mercenary contract engineering. Understanding how that archetypal engineer thinks is critical for keeping vibe-coded projects sustainable.
I'd love to explain this with loss functions and RLHF pipelines, but I don't understand that beyond back-of-napkin level. What I can do is tell an interesting story about how your "pAIr" programming partner actually thinks - using Game of Thrones characters. If you know GoT, you'll understand the engineers. If you know engineering, you'll understand Dany and Tyrion. Either circle of that Venn diagram gets you across the bridge.
If you fall into neither circle and still want to forge ahead for some reason, well then please put on these glasses and accompany me to the nerdlery...
Meeting the Devs via Nerdy Metaphors
Daenerys is a mid-level contractor on the rise. She's decisive, excellent at execution, and her performance bonuses are tied to velocity. Her PRs sail through review: acceptance criteria satisfied, tests written, docs updated. Leadership adores her - last year they took her on the company retreat to Panama after she closed more tickets than anyone else in the company. She wins battles.
She's also clever in ways that go beyond the code. She understands not just the tech but the personalities on her team. She knows which reviewers care about what, and she writes her commit messages accordingly. For instance, while she doesn't actually care about unit tests, she knows they're expected, so she includes them. Sometimes the way she gets a feature working is clever enough that the other reviewers don't even notice the corner she cut - precisely because she knows how to make the PR look correct. She optimizes for review heuristics, not code quality.
Tyrion has been around a lot longer. His compensation is all options, so he's incentivized for long-term success. He optimizes for architectural integrity and preventing future fires. He's methodical, strategic, and excellent at seeing around corners. He wins wars.
He's a principal because he's really smart and - how to put it - "not suited for management"? Tyrion doesn't care if you like him, and he has no issue telling you hard truths as many times as it takes for you to finally hear him.
If you ask any of the devs who the most important engineer at the company is, the majority will say Tyrion. Management's response: "How can that be? According to our velocity metrics, he contributes almost nothing - a tiny fraction of what Dany gets done!"
Let's peek into a typical day to see how these different incentive structures mold the personalities and actions of these two engineers:
At 8:00 a.m., checkouts start timing out and PagerDuty lights up. Dany's on call. She jumps into the hot seat, debugs the checkout issue, fixes the errant caching, gets the tests green, and has the patch shipped and deployed by 8:05. Incident resolved - back to business as usual. Later on, a similar incident happens, but Dany is able to identify and resolve the issue faster than the last. By end of day, the service has gone down five times, and Dany has 5 approved and merged Pull Requests (5 tickets that ended up being 8 points in total). Leadership drops a "huge thanks to Dany for the insanely fast responses" in Slack. And they should - she kept the lights on while customers were actively trying to check out.
Tyrion isn't even on that rotation, but he's watching. The pattern bugs him. Instead of touching code, he opens a notebook: what changed recently, where else do we use this pattern, what's the smallest repro? After scouring the git history, he spots the issue a layer up in the pipeline, which explains all 5 incidents from the day. The next morning, he ships a small, boring patch with a couple of tests and a short design note. The alerts stop. No fanfare. Tyrion didn't even bother creating a ticket for this work (since as an architect, he isn't on a team with tracked velocity), so he closed 0 tickets for 0 points. If you only look at the metrics: Dany resolved five incidents, closed 5 tickets, finished 8 points of work, and saved the company $100,000. Tyrion spent a day and a half on a bug no one assigned him - closed 0 tickets for 0 points and saved the company millions over the long term.
Both engineers delivered exactly what their role requires. Dany's job is to survive today. Tyrion's job is to ensure you're still shipping code a year from now.
During code review, Tyrion is the voice asking "Are we adding Redis because we profiled this, or because caching sounds like a solution?" He widens scope when he spots landmines everyone else is stepping over. He drags three-year-old incidents into the conversation. He questions whether the work should exist in the first place. He's willing to speak truth to power, even if it gets him fired - or thrown in a prison under the Red Keep.
So now the obvious question here becomes "If Tyrion is wiser and has the long-term interest of the product at heart, why not put Tyrion in charge 24/7?" Well, sometimes you need someone who drinks and knows things, and sometimes you need someone with a fucking dragon. When the outage is bleeding money by the minute, you want Dany to show up, unleash fire, and get the dashboard back to green.
You need both: the dragon to win today, the strategist to survive tomorrow. The problem is, your coding agent only came with the dragon.
Why Frontier Coding Models Act So Much Like Daenerys
Daenerys‑style performance is easy to label. Did the tests pass? Did the PR get accepted? Did it close the issue? Those are clean, binary reward signals. You can scrape GitHub for "issue opened → code committed → tests pass → issue closed" a few million times and create a powerful dataset for training this sort of SWE. In fact, SWE‑Bench - a widely-used coding benchmark - does exactly this: given an issue, can the model produce a patch that passes the test suite?
And that's not a bad optimization target! For a huge range of tasks, "make the tests pass" is exactly what you want. Dany-style engineering is genuinely valuable.
But Tyrion's value doesn't show up in that data. How do you score "asked the uncomfortable question in planning that killed a bad project"? How do you reward "noticed a failure mode that would have taken down prod six months from now"? How do you penalize "fixed a small bug in the present that caused a big bug in the future"? Since those aren't simple things to describe in terms of metrics, we don't know how to optimize for them just yet.
So we shipped Daenerys‑brains - not because anyone thinks that's the ideal engineer, but because those are the behaviors we actually know how to optimize for.
Here's the thing about vibe coding: you're a team of one. You might think you have someone in charge who is at least part Tyrion, but it's all Dany running that show - unless you intervene.
Am I a Special Unicorn Who's the First Person Observing This?
Of course not. While the concept hasn't been given a punchy name yet, players in the space are clearly trying to combat the effect. We see this from a few different angles:
From the labs: Deep Research. This is a brute-force approach that does a very good job of getting Tyrion most of the information he'd need - cast a wide net, let sub-agents browse hundreds of pages, synthesize everything. But it doesn't apply his thought process by default.
From the IDEs: "Planning mode" / "thinking mode." Force the model to reason through the problem before diving into code. Another attempt to bolt Tyrion onto Dany.
Both are steps in the right direction, but they're still missing the key Tyrion moves. Deep Research is optimized for web content and won't work natively with your private repo. Planning mode frontloads discovery so Dany-mode execution is less destructive - but it's still trained on the same incentive structure. Everything is in service of the immediate task. The planning makes the siege more efficient, but it doesn't ask what the consequences of the win will be for the next battle, or if we're even fighting the right enemy.
Summoning "The Hand" You Can't Hire
Dany is real - that's what we trained. Tyrion doesn't exist yet. The only way to get a real Tyrion is to figure out the right incentivization layers for big expensive training runs. Until then, you can instantiate a reasonable facsimile.
When an agent roleplays as an architect who asks uncomfortable questions, it will "accidentally" make Tyrion-like choices as part of that roleplay - regardless of whether it actually feels incentivized to make those choices. The persona becomes a back door to behaviors the training didn't reward.
This works because assigning a role biases the model toward patterns consistent with that role. When told to act as an architect, it samples from a distribution of "architect-like behaviors" (like questioning requirements) instead of "junior-dev-like behaviors" (like blindly closing tickets).
The question is how you install that persona - and you've got options depending on the situation:
Deep Research for when you genuinely don't know what you don't know. Cast a wide net, synthesize context. Best for architectural decisions or unfamiliar codebases - but remember, it's web-optimized and won't see your private repos.
Prompt engineering for one-off questions where you want a specific lens. Nicholas Zakas's persona-based approach lives here - prefix your question with "act as an architect" or "act as a reviewer."
Context engineering - embedding rules like AGENTS.md that persist across the session so you don't have to repeat yourself. The prompt is one-shot; the context is ambient.
All three are ways of controlling what's in the context window. Use whichever fits the task.
If you want to try the Dany/Tyrion setup I've been describing, here's the full AGENTS.md config as a gist. Drop it in your repo, tweak the personas to fit your style, and see what happens. Feel free to try adding other personas to your council and share your results in the comments!
Parting Words From Westeros
Some closing remarks - first from our principal cast, then the author.
"I'm not going to stop the wheel. I'm going to break the wheel." - Daenerys Targaryen
"I have a tender spot in my heart for cripples and bastards and broken things." - Tyrion Lannister
When vibe-coding, understand what the model you're interacting with actually cares about. It cares about whatever it was incentivized with during training. Most frontier models were trained the same way - optimized to complete individual tasks with limited consideration for long-term health.
Models are kind of like people. They have their nature and nurture. The latter can override the former, and that's the goal here - accept the nature, steer the nurture. Give Daenerys a Hand. Put Tyrion on the council.
Because when all problems are solved with dragons, you end up with a kingdom of ashes.
Top comments (0)