My latest project is Sumitos, a Coaching Operating System built to make my #CTOCoaching operation smoother, something I urgently needed after a big influx of new CTO clients. Working on it, I reached the point where I no longer look at the code I ship.
In the beginning I worked on the project like on every other project over the last decades: by reading code, thinking, and writing code. I did use LLMs to generate some of the code for me, with Claude Code and Cursor, in bigger and bigger chunks.
I now no longer look at the code and just let Claude Code do its thing.
This sounds incredible. But it works. Let’s see how.
Let’s look at some reasons why this works:
I have 40 years of coding experience in many different languages and frameworks. I might be able to write better prompts because I know what can go wrong.
Obviously, always use Git. When using more than one agent to change code, I use git worktrees. Most often, though, I have two Claude Code instances running, one coding and one planning with me.
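As a minimal sketch, such a worktree setup can look like this (paths and branch names are made up):

```
# second checkout on its own branch, so another agent can work
# without touching the main working tree
git worktree add -b feature/invoicing ../sumitos-invoicing

# see what is checked out where, clean up after the merge
git worktree list
git worktree remove ../sumitos-invoicing
```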
From working with other agents, I can say that Claude Code, especially with Sonnet 4.5, is by far the best coding agent. People might think there is little difference between coding agents, but there is.
I use the $100 Max plan from Anthropic, which mostly gets me through the day. Sometimes Anthropic changes the algorithm and token usage and I run out of tokens; as of this writing, I’m not running out of tokens with the $100 Max plan.
Agile work - small changes, agile discovery. Feature by feature, not huge chunks of code at once. I check the result; if I don’t like it, I change it, and if I like it, I ship it.
My work and the code structure I started with act as a great template for the LLM to base its code changes on. Without a good starting template the LLM goes haywire from the beginning. I suspect coding frameworks with great templates, like Django and Rails, work better than code projects without great starter templates. My - structured - code was my starter template.
My code is set up in vertical modules, not in layers. Layers are confusing to LLMs because they have too many dependencies: essentially the whole top layer depends on the layer below, and changes to that layer ripple through the code base. A Modulith with decoupled modules works much better for LLMs, as it keeps the impact on other code low and fills the context window with the right context instead of those large layers. I suspect this has a huge impact on the success of the LLM.
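As an illustration (not the actual Sumitos layout), a vertical module structure in Go might look like this, with each module owning its handlers, storage, and domain logic:

```
internal/
  coaching/        # one vertical module
    handler.go     # HTTP handlers and HTMX fragments
    store.go       # persistence for this module only
    coaching.go    # domain types and logic
  invoicing/
    handler.go
    store.go
    invoicing.go
  scheduling/
    ...
```

A change to invoicing stays inside internal/invoicing, so the agent only needs that one module in its context window.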
Use Claude Code plan mode for everything! Don’t jump into prompting. Iterate on the plan, not on the prompt. Iterating on the prompt just creates more and more wrong code until the LLM chokes. I write my requirements into Claude Code plan mode. I read the plan and add or remove requirements until I’m happy with it. I tell Claude in plan mode to check the code and the documentation (with URLs) to make sure the plan works. I tell Claude Code to check if there is pre-existing functionality to reuse for the plan. I point Claude Code to libraries it should use. Only when I’m happy do I tell Claude to start coding.
I have many tests in place. After each feature, I ask Claude Code to refactor the new code to extract side-effect-free functions and write tests for them. Claude Code runs the tests after changes, and I have told it to decide for failing tests whether the tests need to be adapted or the code has a bug. Otherwise, Claude Code might undo the code changes just to make the tests pass again.
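As a toy example of what such an extraction can look like (the names are invented, not taken from Sumitos): a pure function pulled out of a handler, plus a test for it.

```go
// file: coaching/schedule.go
package coaching

import "time"

// NextSessionDate is side-effect free: given the last session and a
// cadence in days, it returns the next session date. No I/O, no state.
func NextSessionDate(last time.Time, cadenceDays int) time.Time {
	return last.AddDate(0, 0, cadenceDays)
}

// file: coaching/schedule_test.go
package coaching

import (
	"testing"
	"time"
)

func TestNextSessionDate(t *testing.T) {
	last := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	got := NextSessionDate(last, 14)
	want := time.Date(2025, 1, 15, 0, 0, 0, 0, time.UTC)
	if !got.Equal(want) {
		t.Fatalf("got %v, want %v", got, want)
	}
}
```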
The stack I use is very simple: Go, Alpine.js, and HTMX. It might be that having a compiler, a very simple language, and the simple type system of #Go - most often there is only one simple way to do things in Go - makes Claude Code work better than with some other languages. Go also has a big code base on GitHub for LLMs to be trained on. Finally, the compiler is really fast, and as Claude Code compiles while changing code, this does not slow down the AI.
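To give a flavor of the stack, a minimal, hypothetical Go handler that returns an HTML fragment for HTMX could look like this:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// HTMX swaps the returned fragment into the page, so the handler
	// renders plain HTML instead of JSON.
	http.HandleFunc("/clients/count", func(w http.ResponseWriter, r *http.Request) {
		count := 42 // in a real app this would come from the module's store
		fmt.Fprintf(w, `<span id="client-count">%d active clients</span>`, count)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The page then only needs something like `<button hx-get="/clients/count" hx-target="#client-count">Refresh</button>`, with Alpine.js for purely client-side state.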
My build process includes all the Go linters like govet, goimports, gosec, golangci-lint, nilaway, govulncheck, staticcheck, and more. CLAUDE.md tells Claude Code to run the “make lint” target after it finishes coding and to fix the detected problems.
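A sketch of what such a lint target might look like (the exact set of tools and flags varies by project):

```makefile
.PHONY: lint
lint:
	go vet ./...
	goimports -l .
	staticcheck ./...
	gosec ./...
	govulncheck ./...
	nilaway ./...
	golangci-lint run
```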
Sometimes I tell Claude to create a small Python script to make changes or to understand data, instead of changing code directly.
After some features, I run prompts to check for security, architecture, testing opportunities, and weak code, then let Claude Code suggest changes - plan mode! - to make the code better.
My workflow model is:
Let the LLM code, code, code, then I do some refactoring through the LLM. The LLM is a probability machine; it creates the code that is most probable for the existing codebase. Starting from scratch, or with a bad codebase, the LLM will make it much worse. LLM, LLM, LLM, refactor, LLM, LLM, LLM, refactor leads to good code. With bad code, LLMs come to a point where they can’t move forward, because one thing breaks and the fix breaks something else.

This is a game of trust. Many developers do not trust an LLM.
My question is: Do you trust your coworkers? If you do, why?

The setup works very well for greenfield projects and is very suitable for startups. Hire experienced senior developers and you will be able to let LLMs write most of the code and gain maximum development speed. Make developers product engineers and teach them business and product understanding. You can easily outpace competitors who work in traditional Scrum/senior developer setups.
Caveat: I would not trust AIs with the core of a large existing system, or a complex code base, in fintech, health tech, etc., or in other CTO positions I’ve held in the past - not yet. To get more and more modules in your company to work this way, the key is to have proper guardrails in place.
Overall, think of the old times, when a Head of QA gave a green or red light for a release based on the guardrails - the tests - not by looking at the code. Neither should you need to look at the code to make that decision.



