I Interviewed My AI Coding Agent
How I turned a long Codex session into version-controlled instructions for how I work, write, ship, and make decisions.
I spent part of today doing something that feels obvious in hindsight.
I interviewed my AI coding agent.
Not as a gimmick. Not as a prompt hack. I opened my agent context repo, asked Codex to interview me one question at a time, and turned the answers into version-controlled instructions.
This was partly inspired by Garry Tan’s thread about giving agents a real operating context. I am not trying to restate his point here. The lower-level thing I wanted to figure out was more practical:
How do I make Codex understand how I actually work?
Not in a vague way. In a specific way.
- When should it push back?
- When should it stop asking and just execute?
- What does “done” mean?
- How should it write in my voice?
- What are my blind spots?
- What should it never store?
That context is easy to talk about and hard to maintain. So I put it in files.
The setup
I have a small repo for agent context.
The important files are:
- SOUL.md for voice, judgment, and operating character
- USER.md for how I work and what I care about
- AGENTS.md for global operating rules
- generated/AGENTS.md for the rendered Codex instructions
The source files are written by hand. The generated file is installed into Codex.
That matters because I do not want a giant blob of custom instructions that I never review again. I want this to behave like code. If I learn something about how I work with AI, I can update the source, render the output, install it, and commit the change.
That gives me a real history.
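I am not showing my real render script here, but the shape of it is simple enough to sketch. A minimal version, assuming concatenation is the whole render step; the script name, file order, install path, and commit message are all placeholders, not my actual setup.

```python
# render_agents.py: a minimal sketch of the render, install, commit loop.
# The source file names are real; the install path and everything else is assumed.
import shutil
import subprocess
from pathlib import Path

REPO = Path(__file__).resolve().parent
SOURCES = ["SOUL.md", "USER.md", "AGENTS.md"]    # hand-written sources
OUTPUT = REPO / "generated" / "AGENTS.md"        # rendered instructions
INSTALL = Path.home() / ".codex" / "AGENTS.md"   # assumed Codex install target

def render() -> None:
    """Concatenate the source files into the generated instructions."""
    OUTPUT.parent.mkdir(exist_ok=True)
    OUTPUT.write_text("\n\n".join((REPO / s).read_text() for s in SOURCES))

def install() -> None:
    """Copy the rendered file to where Codex reads it."""
    INSTALL.parent.mkdir(exist_ok=True)
    shutil.copy(OUTPUT, INSTALL)

def commit(message: str) -> None:
    """One small commit per change, so the history stays reviewable."""
    subprocess.run(["git", "add", "-A"], cwd=REPO, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=REPO, check=True)

if __name__ == "__main__":
    render()
    install()
    commit("Update operating context")
```

The pipeline is boring on purpose. Concatenate, copy, commit.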
How the session worked
My first version of the prompt was too broad.
I asked Codex to interview me in batches of 10 to 15 questions. Almost immediately, I changed that. Too many questions at once creates generic answers.
So I switched to one question at a time.
That was the right move.
After each meaningful answer, Codex updated the source files, rendered the generated AGENTS.md, installed it locally, and committed the change.
The commits were intentionally small. One commit for writing style. One for autonomy. One for verification. One for product taste. One for privacy. One for consolidation.
That made the process feel less like “configure your assistant” and more like maintaining a working system.
The questions Codex asked me
The exact list mattered less than the shape of the questions.
The useful pattern was to ask about categories of behavior:
Voice and writing
- Where does the agent’s writing stop sounding like me?
- When should it preserve my phrasing, and when should it rewrite for clarity?
- How should it handle dictated notes versus fast typed notes?
- What should public writing avoid?
Personality and taste
- How much personality should show up in work sessions?
- Which references or callbacks are fun, and which would get annoying?
- Should those be hard rules or soft preferences?
Autonomy and permission
- What should the agent just do by default?
- What actions still need explicit approval?
- Where is speed useful, and where does it create real risk?
Engineering workflow
- What does “done” mean for code work?
- Which checks should run before a change is considered complete?
- Should the local server already be running after a change?
- Which tooling paths should be tried first?
Pushback and decision-making
- When should the agent challenge me?
- Should it lead with the risk, the recommendation, or the blunt objection?
- What warning signs show that I am moving too fast or over-scoping?
- What decisions should be sanity-checked even if it slows the session down?
Goals and prioritization
- Which goals should shape tradeoffs right now?
- Which project ideas are worth keeping warm?
- When should the agent help me stay focused instead of chasing a new idea?
Privacy and boundaries
- What should never be stored?
- What details are fine for private working context but not public writing?
- Which kinds of production, account, or identity-related actions require extra care?
That reads like a lot, but the session did not feel heavy once it moved to one question at a time.
Each answer had a specific place to go.
- If it changed the agent’s voice, it went into SOUL.md.
- If it described how I work, it went into USER.md.
- If it changed behavior for future sessions, it went into AGENTS.md.
That separation helped a lot.
The questions that mattered most
Some questions were more useful than others.
What should I stop asking permission for and just do by default?
This clarified autonomy.
For me, local reversible work should usually just happen. Read files. Edit intended files. Run local checks. Start local dev servers. Fix local breakage caused by the task.
But destructive actions, production API use, deploys, pushes, paid actions, and hard-to-undo external state changes should still get an explicit check.
That one distinction makes the agent much faster without making it reckless.
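In the files this is prose, not code. But the distinction is mechanical enough to sketch as a gate, so here is what it would look like if I encoded it. The category lists below are illustrative, not my real rules.

```python
# A sketch of the autonomy rule as a gate. The categories are illustrative;
# the real rule lives in AGENTS.md as prose.
SAFE_BY_DEFAULT = {"read", "edit", "run-tests", "start-dev-server", "fix-local"}
NEEDS_APPROVAL = {"deploy", "push", "delete-data", "prod-api", "spend-money"}

def requires_confirmation(action: str) -> bool:
    """Reversible local work just runs. Hard-to-undo work waits for a yes."""
    if action in NEEDS_APPROVAL:
        return True
    # Unknown actions default to asking. Speed without recklessness.
    return action not in SAFE_BY_DEFAULT
```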
What kind of verification is required before code work is done?
This became one of the most important rules.
For user-facing work, “done” means the full documented verification suite. Tests, build, typecheck, lint where applicable, and smoke tests for the changed path.
If something deploys to production, local smoke tests are not enough. The live deployment has to be checked too.
That is now part of the default workflow.
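A checklist like that is easy to skip under time pressure, so I think of it as a script. A minimal sketch, assuming an npm project; the command names are placeholders for whatever the repo actually documents. The production check is separate, because it has to run against the live deployment.

```python
# verify.py: a sketch of "done" as a script instead of a checklist.
# Assumes an npm project; swap in the commands your repo documents.
import subprocess
import sys

CHECKS = [
    ["npm", "test"],              # tests
    ["npm", "run", "build"],      # build
    ["npm", "run", "typecheck"],  # typecheck (placeholder script name)
    ["npm", "run", "lint"],       # lint, where applicable
    ["npm", "run", "smoke"],      # smoke test for the changed path (placeholder)
]

def main() -> int:
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print("not done:", " ".join(cmd), "failed")
            return 1
    print("done: all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```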
Where do agents most often get your intent wrong?
My answer was writing style.
AI often writes with too many commas, too many dashes, and long sentences that technically make sense but do not sound like me.
So I added rules for that:
- rewrite scattered thoughts for clarity
- keep the voice direct and plain
- avoid dash-heavy and comma-heavy prose
- do not make drafts sound corporate or assistant-like
- treat dictated input differently from typed input
That last one matters. Sometimes I dictate with speech-to-text, so the input looks clean but is still spoken raw material. Other times I type quickly in lowercase with random dashes and fragments. Neither surface style is the final target.
The agent has to infer the message and write the clean version.
When should I push back?
This helped define the relationship.
I do not want pushback framed as vague caution. I want risk-first pushback.
Name the risk. Explain why the current decision creates it. Recommend the stronger path.
That is especially useful when I am over-scoping, skipping verification, moving too fast, or looping on a small decision for too long.
What does “high-agency engineering partner” mean?
This one made the tooling preferences much sharper.
High agency means wasting no time.
Prefer native tools, first-party APIs, official CLIs, and scriptable workflows. Avoid GUI clicking when a reliable command-line or API path exists. Browser work is still valid for visual checks, but it should not replace a better automated path.
That is now written down.
What context should be private or never stored?
This was the cleanup question.
I am pretty open to using AI as a utility, but some things do not belong in durable context: secrets, production keys, auth tokens, exact home addresses, social security numbers, and anything identity-compromising.
That line is now explicit.
What changed by the end
The agent now has durable guidance for:
- how I want writing to sound
- when it should ask permission
- when it should just execute
- how to verify work before calling it done
- when local servers should already be running
- how to push back
- how to handle public writing
- how to handle privacy
- how to think about product taste
- how to keep my main goals in view
- when to call out focus drift
- which tooling paths to prefer
That is a lot of context, but it is not random.
It is operating context.
The version-controlled part is the key
I do not want important working preferences trapped in one chat thread.
I want them reviewed, edited, committed, and improved.
The biggest benefit of putting this in Git is not just backup. It is accountability.
- If an instruction gets too broad, I can tighten it.
- If a preference becomes stale, I can remove it.
- If I notice the agent repeatedly making the same mistake, I can turn the correction into a source-file update.
That feels much better than hoping the next session remembers what I meant.
This will be different for everyone
My setup is not the right setup for everyone.
Some people want more permission checks. I want local reversible work to move quickly.
Some people want the agent to plan more. I want it to sanity-check, decide, execute, verify, and ship.
Some people want polished writing. I want plain English that sounds like me.
Some people want a lot of UI controls. I prefer simple products where every button and step earns its place.
The point is not to copy my files.
The point is to interview your agent about how you work.
My takeaway
The model matters.
The tools matter.
But the operating context around the agent might matter just as much.
If you use AI agents for real work, ask them to interview you. Make the questions concrete. Answer honestly. Turn the answers into version-controlled instructions. Then keep refining them as your workflow changes.
That process made Codex feel less like a tool I prompt and more like a system I can tune.
That feels like the right direction.