I Interviewed My AI Coding Agent
How I turned a long Codex session into version-controlled instructions for how I work, write, ship, and make decisions.
I spent part of today doing something that feels obvious in hindsight.
I interviewed my AI coding agent.
Not as a gimmick. Not as a prompt hack. I opened my agent context repo, asked Codex to interview me one question at a time, and turned the answers into version-controlled instructions.
This was partly inspired by Garry Tan’s thread about giving agents a real operating context. I am not trying to restate his point here. The lower-level thing I wanted to figure out was more practical:
How do I make Codex understand how I actually work?
Not in a vague way. In a specific way.
- When should it push back?
- When should it stop asking and just execute?
- What does “done” mean?
- How should it write in my voice?
- What are my blind spots?
- What should it never store?
That context is easy to talk about and hard to maintain. So I put it in files.
The setup
I have a small repo for agent context.
The important files are:
- SOUL.md for voice, judgment, and operating character
- USER.md for how I work and what I care about
- AGENTS.md for global operating rules
- generated/AGENTS.md for the rendered Codex instructions
The source files are written by hand. The generated file is installed into Codex.
That matters because I do not want a giant blob of custom instructions that I never review again. I want this to behave like code. If I learn something about how I work with AI, I can update the source, render the output, install it, and commit the change.
That gives me a real history.
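I am not showing my real render script here, but the shape of it is simple enough to sketch. A minimal version, assuming concatenation is the whole render step; the script name, file order, install path, and commit message are all placeholders, not my actual setup.

```python
# render_agents.py: a minimal sketch of the render, install, commit loop.
# The source file names are real; the install path and everything else is assumed.
import shutil
import subprocess
from pathlib import Path

REPO = Path(__file__).resolve().parent
SOURCES = ["SOUL.md", "USER.md", "AGENTS.md"]    # hand-written sources
OUTPUT = REPO / "generated" / "AGENTS.md"        # rendered instructions
INSTALL = Path.home() / ".codex" / "AGENTS.md"   # assumed Codex install target

def render() -> None:
    """Concatenate the source files into the generated instructions."""
    OUTPUT.parent.mkdir(exist_ok=True)
    OUTPUT.write_text("\n\n".join((REPO / s).read_text() for s in SOURCES))

def install() -> None:
    """Copy the rendered file to where Codex reads it."""
    INSTALL.parent.mkdir(exist_ok=True)
    shutil.copy(OUTPUT, INSTALL)

def commit(message: str) -> None:
    """One small commit per change, so the history stays reviewable."""
    subprocess.run(["git", "add", "-A"], cwd=REPO, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=REPO, check=True)

if __name__ == "__main__":
    render()
    install()
    commit("Update operating context")
```

The pipeline is boring on purpose. Concatenate, copy, commit.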
How the session worked
My first version of the prompt was too broad.
I asked Codex to interview me in batches of 10 to 15 questions. Almost immediately, I changed that. Too many questions at once creates generic answers.
So I switched to one question at a time.
That was the right move.
After each meaningful answer, Codex updated the source files, rendered the generated AGENTS.md, installed it locally, and committed the change.
The commits were intentionally small. One commit for writing style. One for autonomy. One for verification. One for product taste. One for privacy. One for consolidation.
That made the process feel less like “configure your assistant” and more like maintaining a working system.
The questions Codex asked me
The exact list mattered less than the shape of the questions.
The useful pattern was to ask about categories of behavior:
Voice and writing
- Where does the agent’s writing stop sounding like me?
- When should it preserve my phrasing, and when should it rewrite for clarity?
- How should it handle dictated notes versus fast typed notes?
- What should public writing avoid?
Personality and taste
- How much personality should show up in work sessions?
- Which references or callbacks are fun, and which would get annoying?
- Should those be hard rules or soft preferences?
Autonomy and permission
- What should the agent just do by default?
- What actions still need explicit approval?
- Where is speed useful, and where does it create real risk?
Engineering workflow
- What does “done” mean for code work?
- Which checks should run before a change is considered complete?
- Should the local server already be running after a change?
- Which tooling paths should be tried first?
Pushback and decision-making
- When should the agent challenge me?
- Should it lead with the risk, the recommendation, or the blunt objection?
- What warning signs show that I am moving too fast or over-scoping?
- What decisions should be sanity-checked even if it slows the session down?
Goals and prioritization
- Which goals should shape tradeoffs right now?
- Which project ideas are worth keeping warm?
- When should the agent help me stay focused instead of chasing a new idea?
Privacy and boundaries
- What should never be stored?
- What details are fine for private working context but not public writing?
- Which kinds of production, account, or identity-related actions require extra care?
That reads like a lot, but the session did not feel heavy once it moved to one question at a time.
Each answer had a specific place to go.
- If it changed the agent’s voice, it went into SOUL.md.
- If it described how I work, it went into USER.md.
- If it changed behavior for future sessions, it went into AGENTS.md.
That separation helped a lot.
The questions that mattered most
Some questions were more useful than others.
What should I stop asking permission for and just do by default?
This clarified autonomy.
For me, local reversible work should usually just happen. Read files. Edit intended files. Run local checks. Start local dev servers. Fix local breakage caused by the task.
But destructive actions, production API use, deploys, pushes, paid actions, and hard-to-undo external state changes should still get an explicit check.
That one distinction makes the agent much faster without making it reckless.
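In the files this is prose, not code. But the distinction is mechanical enough to sketch as a gate, so here is what it would look like if I encoded it. The category lists below are illustrative, not my real rules.

```python
# A sketch of the autonomy rule as a gate. The categories are illustrative;
# the real rule lives in AGENTS.md as prose.
SAFE_BY_DEFAULT = {"read", "edit", "run-tests", "start-dev-server", "fix-local"}
NEEDS_APPROVAL = {"deploy", "push", "delete-data", "prod-api", "spend-money"}

def requires_confirmation(action: str) -> bool:
    """Reversible local work just runs. Hard-to-undo work waits for a yes."""
    if action in NEEDS_APPROVAL:
        return True
    # Unknown actions default to asking. Speed without recklessness.
    return action not in SAFE_BY_DEFAULT
```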
What kind of verification is required before code work is done?
This became one of the most important rules.
For user-facing work, “done” means the full documented verification suite. Tests, build, typecheck, lint where applicable, and smoke tests for the changed path.
If something deploys to production, local smoke tests are not enough. The live deployment has to be checked too.
That is now part of the default workflow.
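A checklist like that is easy to skip under time pressure, so I think of it as a script. A minimal sketch, assuming an npm project; the command names are placeholders for whatever the repo actually documents. The production check is separate, because it has to run against the live deployment.

```python
# verify.py: a sketch of "done" as a script instead of a checklist.
# Assumes an npm project; swap in the commands your repo documents.
import subprocess
import sys

CHECKS = [
    ["npm", "test"],              # tests
    ["npm", "run", "build"],      # build
    ["npm", "run", "typecheck"],  # typecheck (placeholder script name)
    ["npm", "run", "lint"],       # lint, where applicable
    ["npm", "run", "smoke"],      # smoke test for the changed path (placeholder)
]

def main() -> int:
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print("not done:", " ".join(cmd), "failed")
            return 1
    print("done: all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```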
Where do agents most often get your intent wrong?
My answer was writing style.
AI often writes with too many commas, too many dashes, and long sentences that technically make sense but do not sound like me.
So I added rules for that:
- rewrite scattered thoughts for clarity
- keep the voice direct and plain
- avoid dash-heavy and comma-heavy prose
- do not make drafts sound corporate or assistant-like
- treat dictated input differently from typed input
That last one matters. Sometimes I dictate with speech-to-text, so the input looks clean but is still spoken raw material. Other times I type quickly in lowercase with random dashes and fragments. Neither surface style is the final target.
The agent has to infer the message and write the clean version.
When should I push back?
This helped define the relationship.
I do not want pushback framed as vague caution. I want risk-first pushback.
Name the risk. Explain why the current decision creates it. Recommend the stronger path.
That is especially useful when I am over-scoping, skipping verification, moving too fast, or looping on a small decision for too long.
What does “high-agency engineering partner” mean?
This one made the tooling preferences much sharper.
High agency means wasting no time.
Prefer native tools, first-party APIs, official CLIs, and scriptable workflows. Avoid GUI clicking when a reliable command-line or API path exists. Browser work is still valid for visual checks, but it should not replace a better automated path.
That is now written down.
What context should be private or never stored?
This was the cleanup question.
I am pretty open to using AI as a utility, but some things do not belong in durable context: secrets, production keys, auth tokens, exact home addresses, social security numbers, and anything identity-compromising.
That line is now explicit.
What changed by the end
The agent now has durable guidance for:
- how I want writing to sound
- when it should ask permission
- when it should just execute
- how to verify work before calling it done
- when local servers should already be running
- how to push back
- how to handle public writing
- how to handle privacy
- how to think about product taste
- how to keep my main goals in view
- when to call out focus drift
- which tooling paths to prefer
That is a lot of context, but it is not random.
It is operating context.
The version-controlled part is the key
I do not want important working preferences trapped in one chat thread.
I want them reviewed, edited, committed, and improved.
The biggest benefit of putting this in Git is not just backup. It is accountability.
- If an instruction gets too broad, I can tighten it.
- If a preference becomes stale, I can remove it.
- If I notice the agent repeatedly making the same mistake, I can turn the correction into a source-file update.
That feels much better than hoping the next session remembers what I meant.
This will be different for everyone
My setup is not the right setup for everyone.
Some people want more permission checks. I want local reversible work to move quickly.
Some people want the agent to plan more. I want it to sanity-check, decide, execute, verify, and ship.
Some people want polished writing. I want plain English that sounds like me.
Some people want a lot of UI controls. I prefer simple products where every button and step earns its place.
The point is not to copy my files.
The point is to interview your agent about how you work.
My takeaway
The model matters.
The tools matter.
But the operating context around the agent might matter just as much.
If you use AI agents for real work, ask them to interview you. Make the questions concrete. Answer honestly. Turn the answers into version-controlled instructions. Then keep refining them as your workflow changes.
That process made Codex feel less like a tool I prompt and more like a system I can tune.
That feels like the right direction.