Last week, we talked about why debugging may be frustrating. But that’s only half the battle. When you don’t understand the magic (because, hey, it’s magic), how do you explain it to your bosses?
But let’s say you’ve actually done it. You’ve caught the ghost on camera. Or a screenshot. There’s a real bug in the system, and it causes your agent to spit out “weirdness.”
Now comes the hard part: The Meeting.
You’re sitting across from your CTO or VP of R&D. They want to know if the agent is “ready.” You tell them, “It feels a bit finicky, but we see the weird cases a lot less.”
The thing is, the CTO doesn’t speak “weird.” Or “vibes.” Or “off-ness.” They hear “unreliable” and “unpredictable.” To them, it sounds like “This guy has no idea when this will be finished.”
Now, they’ve heard this before. Once, and then twice. But it keeps repeating: chasing AI ghosts has become almost a full-time activity in the office.
You need to calm them down. But more importantly, you need them to understand. And so, you need to explain in a way they’d understand.
That’s not just part of the job. It’s key to surviving in the job.
Communication is a big part of our job. I always say that our job as testers is learning about the system, then reporting what we find to the stakeholders. They, in turn, make decisions: release the feature, or delay it until we fix the bugs. But they can’t make decisions (the good kind, anyway) if they don’t understand what we’ve reported.
Maybe we can’t explain why a model failed, or why it returned what it did. But that’s not what matters for making those decisions. We should explain impact: what will happen if this bug reaches a user?
Will the users lose money? Will our company lose money? Will it leave us open to a lawsuit? Or maybe we’ll “just” offend the user, and they’ll abandon us? If we speak in those terms, we’ll be able to alert, or calm, management.
But then, they will probably want to know what we know. And we don’t want them to lose faith in us. So let’s see what we can do.
Remember our “Ghost Plan”? Our API Analysis Agent was supposed to generate a test plan for the GET /orders API. Instead, it gave us a plan that referenced parameters that don’t exist, and suggested edge cases only a genie (a very creative one) could come up with.
If we mention ghosts (or genies), or go back to “weirdness,” we’re in horror territory – that’s no way to give bosses confidence in us.
So, let’s analyze the situation to find our needle(s) in our haystack (which is made of more haystacks):
Now, that’s coming prepared for a discussion.
When you go back to that CTO, you leave “weird” out of the room (also hopefully, the ghosts didn’t follow you in). And you talk like a professional. Because that’s what you are.
You say: “We analyzed the recent ‘Ghost Plan’ failures. We identified a 50% failure rate in our Retrieval layer because it was pulling legacy docs, and a 30% failure rate caused by our context truncation logic. We also suspect that the temperature setting makes the model too creative.”
Kneel. You are no longer a tester. You are knighted as Architect of Quality. With that kind of communication power (and those fancy testing skills), you are going to stay here for a long time.
Of course, without the testing skills to back it up, you won’t get anywhere. They are just as vital as the communication skills.
Take the Next Step
If you’re struggling to explain your AI risks, behaviors, impacts, and reasoning to stakeholders, it may be time for an upgrade in the communication department.
On February 18th, I’m hosting a webinar on The Anatomy of AI Quality. We’ll discuss laying the foundations of quality for AI-powered systems, including how to communicate different types of faults in the system. Professionally.
Save My Seat!

Want to help me translate technical AI stuff to VP-ish? Explain the complexity and weird behavior of AI systems in a helpful way to VPs and CEOs? Help me build this taxonomy. Reply to my latest newsletter or DM me.
You know the drill. Something’s broken. You’re not sure where. You add a console.log. Then another. Then twelve more. You run it again. The bug doesn’t appear. You add more logs. Now it appears but the logs don’t help. You start guessing.
This is debugging by wandering. It works eventually, but it’s slow and frustrating.
Here’s what I’ve learned: AI is annoyingly good at debugging. Not because it’s smarter, but because it’s systematic. You describe the symptoms, AI lists the likely causes in order of probability, and you work through them methodically.
While you’re guessing, AI is pattern matching against every bug it’s ever seen.
Debugging is pattern matching. You have a symptom. You need to find the cause. The space of possible causes is large.
AI has seen millions of bugs. It recognizes patterns: “silently fails sometimes” often means race condition, async issue, or swallowed error. “Works locally but not in production” often means environment variables, database differences, or timing issues.
When you describe a bug, AI immediately narrows to the most likely categories. Then it helps you test each one systematically.
Here’s the prompt that starts most debugging sessions:
I have a bug. Help me find it.
Symptom: [describe what's happening]
Expected: [describe what should happen]
Frequency: [always / sometimes / rarely]
Environment: [local / staging / production]
Relevant code:
[paste the code you think is involved]
Recent changes:
[paste recent commits if relevant]
What I've tried:
[what have you already ruled out?]
Help me:
1. List the top 5 most likely causes
2. For each cause, how would I confirm or rule it out?
3. What logging would help narrow it down?
Here’s how the card transfer bug went:
Me:
I have a bug. Help me find it.
Symptom: Card transfers sometimes fail silently. The API returns
success, but the card stays with the original owner.
Expected: Card ownership should transfer to the new owner.
Frequency: Sometimes. Maybe 1 in 100 transfers.
Environment: Production only. Can't reproduce locally.
Relevant code:
[pasted the transferCard function]
What I've tried:
- Added logging to the function
- Checked for database errors
- Verified the API is being called correctly
AI:
Top 5 most likely causes for intermittent silent failures:
1. Race condition in concurrent transfers
The ownership check and update aren't atomic. Two concurrent
transfers could both pass the ownership check before either writes.
Confirm: Add logging with timestamps before check and after write.
Look for overlapping timestamps in production logs.
2. Database transaction isolation
Default isolation level might allow dirty reads where the ownership
check sees stale data.
Confirm: Check your database isolation level. Test with two
simultaneous transfers of the same card.
3. Stale cache
If you're caching card data, the check might use cached ownership
while the write goes to the database.
Confirm: Check for any caching layer. Try disabling cache and
testing.
4. Optimistic locking failure without proper handling
If using optimistic locking, version conflicts might be silently
swallowed.
Confirm: Search for where version conflicts are caught. Check if
they're being logged.
5. Unhandled promise rejection
An async operation might be failing without the error bubbling up.
Confirm: Add try/catch around all async operations. Add
.catch() to any promises.
The race condition hypothesis led me to add timestamp logging. Sure enough, I found cases where two transfers were within 50ms of each other. Both passed the check. Both tried to write. Only one succeeded. The other silently did nothing because the ownership had already changed.
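To make the failure mode concrete, here’s a minimal, self-contained sketch of the vulnerable pattern. An in-memory store stands in for the real database, and all names are illustrative, not the actual transferCard code from the post:

```typescript
// Illustrative reconstruction of the race: the ownership check and the write
// are separate steps, and the write's "rows affected" result is never checked.

type Card = { id: string; ownerId: string };

const cards = new Map<string, Card>([["card-1", { id: "card-1", ownerId: "alice" }]]);

const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Mimics a conditional UPDATE: only changes the row if the expected owner still matches.
function conditionalUpdateOwner(cardId: string, fromUser: string, toUser: string): number {
  const card = cards.get(cardId);
  if (!card || card.ownerId !== fromUser) return 0; // nothing updated
  cards.set(cardId, { id: cardId, ownerId: toUser });
  return 1;
}

async function transferCard(cardId: string, fromUser: string, toUser: string) {
  const card = cards.get(cardId);
  if (!card || card.ownerId !== fromUser) {
    return { success: false };
  }
  await delay(50); // simulated I/O: the window where a concurrent transfer can sneak in
  conditionalUpdateOwner(cardId, fromUser, toUser); // result ignored -- the silent failure
  return { success: true };
}

// Two transfers of the same card inside the race window: both pass the check,
// both report success, but only one write actually changes ownership.
async function main() {
  const results = await Promise.all([
    transferCard("card-1", "alice", "bob"),
    transferCard("card-1", "alice", "carol"),
  ]);
  console.log(results, cards.get("card-1")); // both { success: true }, owner is "bob"
}

main();
```

Running this prints two successes while the card ends up with only one of the new owners, which is exactly the “silently did nothing” symptom from production.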
Good debugging is scientific. Hypothesis, test, refine.
Based on our debugging so far:
Hypothesis: [what you now think the cause is]
Evidence: [what supports this hypothesis]
Against: [what doesn't fit this hypothesis]
Next test: [what would confirm or refute this]
Generate the specific code or query I should run to test this hypothesis.
This keeps the debugging focused. No more random changes hoping something works.
When you have logs but can’t see the pattern:
Analyze these logs for the bug we're tracking.
Bug: [describe the symptom]
Logs from successful operation:
[paste logs]
Logs from failed operation:
[paste logs]
Compare them:
1. What's different between success and failure?
2. What's missing in the failure case?
3. What sequence of events leads to failure?
4. What timestamp patterns do you notice?
AI is good at spotting differences humans miss. Different order of operations. Missing log entries. Timing anomalies.
When you have a stack trace but don’t understand it:
Explain this stack trace and help me find the root cause.
Error: [paste the error message]
Stack trace:
[paste the full stack trace]
For each frame:
1. What file and function?
2. What was it trying to do?
3. Is this library code or our code?
Then:
1. Where did the error actually originate?
2. What caused the error?
3. What's the fix?
Sometimes you just need to explain the problem:
I'm stuck on a bug. Let me explain it to you, and ask questions
that help me think through it.
The bug: [describe it]
The code: [paste it]
Ask me questions about:
1. What I've already tried
2. What I expect vs what happens
3. Any recent changes
4. Edge cases I might have missed
AI asking you questions often reveals assumptions you didn’t realize you were making.
When the bug could be anywhere:
Help me narrow down where this bug lives.
System overview: [describe the components involved]
The bug: [describe the symptom]
Help me create a binary search:
1. What's the midpoint? What can I test to determine if the bug
is in the first half or second half of the flow?
2. Based on that test, what's the next midpoint?
Goal: narrow to a single component in as few tests as possible.
AI recognizes these patterns instantly:
“Works sometimes” → Race condition, caching, timing issue
“Works locally, fails in production” → Environment config, data volume, network latency
“Worked yesterday, broken today” → Recent commit, dependency update, data change
“First request works, subsequent ones fail” → State mutation, connection pool, memory leak
“Works for me, fails for users” → Permissions, data differences, browser/client differences
Tell AI which pattern matches your bug, and it knows where to look.
When you need more visibility:
I need to add logging to debug this issue.
The bug: [describe it]
The code: [paste it]
Generate logging that will help me:
1. Trace the exact path execution takes
2. See the state at each decision point
3. Capture timing information
4. Include enough context to identify the specific request
Use our logging format: [describe your logging pattern]
Make the logging easy to add and remove.
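For the card-transfer bug, the output of a prompt like that might look roughly like this sketch; the format and field names are assumptions, not the post’s actual logging:

```typescript
// Structured log line with a timestamp, a request identifier, and the state at
// each decision point, so overlapping transfers are easy to spot afterwards.
function debugLog(requestId: string, stage: string, details: Record<string, unknown>): void {
  console.log(
    JSON.stringify({
      ts: new Date().toISOString(), // timing information for detecting overlap
      requestId,                    // ties all lines from one transfer together
      stage,                        // e.g. "ownership-check" or "after-write"
      ...details,
    })
  );
}

// Sketch of where the calls would go inside the transfer flow:
// debugLog(requestId, "ownership-check", { cardId, currentOwner });
// ...perform the write...
// debugLog(requestId, "after-write", { cardId, rowsAffected });
```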
Once you think you’ve found it:
I think I found the bug and have a fix.
The bug: [describe it]
The cause: [describe what was wrong]
The fix: [paste your fix]
Review this fix:
1. Does it actually address the root cause?
2. Could it introduce new bugs?
3. What tests should I add to prevent regression?
4. Are there other places with the same bug pattern?
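For the race condition above, the fix pasted into that prompt might collapse the check and the write into one atomic statement and stop swallowing the “nothing updated” case. A hedged sketch using node-postgres; the SQL, table, and column names are assumptions:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the environment

async function transferCard(cardId: string, fromUser: string, toUser: string): Promise<void> {
  // The ownership check and the update happen in a single atomic statement.
  const result = await pool.query(
    "UPDATE cards SET owner_id = $1 WHERE id = $2 AND owner_id = $3",
    [toUser, cardId, fromUser]
  );
  if (result.rowCount === 0) {
    // Previously this case was silent; now the caller learns the transfer lost the race.
    throw new Error(`Transfer failed: ${fromUser} no longer owns card ${cardId}`);
  }
}
```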
AI code has predictable bug patterns:
This is AI-generated code that has a bug.
The bug: [describe it]
The code: [paste it]
Common AI code bugs:
- Off-by-one errors in loops
- Missing null checks
- Incorrect async/await handling
- Wrong array methods (map vs forEach vs filter)
- Missing error handling
- Incorrect type coercion
Check for these patterns first.
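As a tiny illustration of two of those patterns (an invented snippet, not code from the post): forEach silently discards the promises an async callback returns, where map plus Promise.all is what’s actually needed.

```typescript
// Buggy: forEach ignores the returned promises, so callers can't know
// when (or whether) the saves finished, and rejections go unhandled.
async function saveAll(items: string[], save: (item: string) => Promise<void>): Promise<void> {
  items.forEach(async (item) => {
    await save(item);
  });
}

// Fixed: map to promises and await them all.
async function saveAllFixed(items: string[], save: (item: string) => Promise<void>): Promise<void> {
  await Promise.all(items.map((item) => save(item)));
}
```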
Debugging is reactive. You find bugs after they exist. But what about bugs in production where you can’t just add console.log?
Tomorrow I’ll cover production debugging: when it’s on fire and you need to find problems with only the observability you already have.
Notice how quickly AI narrows to likely categories. That systematic approach is what makes AI debugging faster than random guessing.
Next time you hit a bug, start with AI instead of ending with it.
At the end of 2025 I was happy to take a long break to enjoy the incredible summer that the southern hemisphere provides. I’m back and writing my first post of 2026, which also happens to be my last post for the AWS News Blog (more on this later).
The AWS community is starting the year strong, with various AWS re:Invent re:Caps being hosted around the globe and some communities already running their AWS Community Day events; AWS Community Day Tel Aviv 2026 was held last week.
Last week’s launches
Here are last week’s launches that caught my attention:
Additional updates
These projects, blog posts, and news articles also caught my attention:
Upcoming AWS events
Join us January 28 or 29 (depending on your time zone) for Best of AWS re:Invent, a free virtual event where we bring you the most impactful announcements and top sessions from AWS re:Invent. Jeff Barr, AWS VP and Chief Evangelist, will share his highlights during the opening session.
There is still time until January 21 to compete for $250,000 in prizes and AWS credits in the Global 10,000 AIdeas Competition (yes, the second letter is an I as in Idea, not an L as in like). No code required yet: simply submit your idea, and if you’re selected as a semifinalist, you’ll build your app using Kiro within AWS Free Tier limits. Beyond the cash prizes and potential featured placement at AWS re:Invent 2026, you’ll gain hands-on experience with next-generation AI tools and connect with innovators globally.
Earlier this month, applications opened for the 2026 Community Builders program. The application is open until January 21st, midnight PST, so here’s your last chance to make sure you don’t miss out.
If you’re interested in these opportunities, join the AWS Builder Center to learn with builders in the AWS community.
With that, I close one of my most meaningful chapters here at AWS. It’s been an absolute pleasure to write for you, and I thank you for taking the time to read the work that my team and I pour our hearts into. I’ve grown from the close collaborations with the launch teams and the feedback from all of you. The Sub-Sahara Africa (SSA) community has grown significantly, and I want to dedicate more time to it. I’m still at AWS, and I look forward to meeting you at an event near you!
Check back next Monday for another Weekly Roundup!
About the show
Sponsored by us! Support our work through:
Connect with the hosts
Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 11am PT. Older video versions available there too.
Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.
Brian #1: Better Django management commands with django-click and django-typer
Michael #2: PSF Lands a $1.5 million sponsorship from Anthropic
Brian #3: How uv got so fast
uv’s design decisions make this possible: uv drops many backwards-compatible decisions kept by pip.

Michael #4: PyView Web Framework
Extras
Brian:
Michael:
Joke: Reverse Superman