How We Build With AI: Architecture and Flexibility Over Keystrokes

TL;DR. We use AI to write a large share of our code, and that is a deliberate choice, not a shortcut. The reason is simple: an agent is fast at producing code and bad at owning a system. So we let it do the typing and we move our engineers up the stack: to architecture, to flexibility, to the whole-system decisions that determine whether a codebase is still pleasant to change a year from now. The industry caught up to this in 2026: 84% of developers now use AI tools daily, and the winning pattern is "generate aggressively, verify rigorously." The catch is that AI optimizes every file locally while nobody optimizes the codebase globally, which is exactly why the codebases we audit all break in the same six ways. Our answer is to keep a human on the part AI cannot see: the architecture. This is how we build, and why.
The job changed, and we changed with it
For most of software's history, the bottleneck was typing. Turning a clear idea into working code was the slow, expensive part, so that is where engineering effort went and where seniority was measured.
That bottleneck is gone. As the June 2026 state of the developer world lays out, AI coding tools have moved from a curiosity to the baseline: 84% of developers reach for them every day, TypeScript overtook Python as the most-used language partly because typed code keeps AI honest, and the most productive engineers no longer write more code by hand — they orchestrate agents that do. The "10x engineer" became a manager of machines.
We did not resist this; we reorganized around it. If an agent can produce a working feature in minutes, then the scarce, valuable human skill is no longer "can you write this function." It is "should this function exist, where does it belong, what happens when the requirements change, and how does it behave when ten thousand people hit it at once." Those are architecture and flexibility questions. They are precisely the questions AI is worst at — and the ones that decide whether a product lives or dies.
The 2026 baseline, in four data points:
- 84% use AI daily. AI coding tools are the baseline now, not an optional add-on.
- 46% distrust, 3% highly trust. Far more developers distrust AI output than trust it — which is why verification is not optional.
- Multi-agent is the new normal. The most productive engineers direct fleets of agents instead of writing more code by hand.
- TypeScript is #1. Typed languages overtook Python partly because they keep generated code honest.
What we hand to AI, and what we keep
The clearest way to describe our process is to split the work in two: the part we delegate to machines, and the part we refuse to.
We hand AI the keystrokes. Boilerplate, CRUD endpoints, the tenth variation of a form, data-shuffling glue, first-draft tests, migrations, the tedious mechanical transforms that used to eat afternoons — agents are genuinely excellent at all of it. A model can write a correct-looking implementation faster than a person can describe it. So we let it. There is no pride in typing out a paginated list view by hand in 2026.
We keep the architecture. Where does state live? What is the single source of truth for this domain? Which boundaries are allowed to know about which others? How does this scale from one instance to twelve? What is the failure mode when the payment provider times out? How will this shape bend when the client asks, three months from now, for the thing they swear they will never ask for? None of those have a "shortest path to running code" answer — and shortest-path-to-running-code is the only thing an agent optimizes for.
This division is not a compromise. It is the highest-leverage use of both parties. The machine does what it is fast at; the humans do what only humans currently do well. The phrasing going around the industry — "Codex for keystrokes, Claude Code for commits" — is really a statement about altitude: the closer you get to a decision that the whole system depends on, the more a human belongs in the loop.
Why architecture has to stay human
Here is the uncomfortable truth we see every week, and it is the engine behind everything we just described.
We run code audits on a steady stream of AI-built applications. The codebases are wildly different, but the findings almost never are: no tests, the same helper duplicated four times with slight drift, two screens in one project written as if by two different teams, complex state collapsed into callback hell, code that assumes a single machine and corrupts itself the moment it runs on two, secrets committed straight to the repo. Six problems, over and over.
Every one of those is a whole-system property. None of them can be prevented by a single good prompt, because no single prompt can see the whole. A model generating code has a limited context window and no memory of the decision it made three files ago. It makes the locally optimal choice every time, and the sum of locally optimal choices is a codebase that works today and resists every change tomorrow. The drift is not a bug in any one generation; it is the inevitable shape of generation without a global owner.
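To make the single-machine finding concrete, here is a minimal sketch of how it tends to go wrong. Everything in it is hypothetical and in-memory: `SharedCounterStore` stands in for whatever database or cache a real deployment shares, and the two instance classes stand in for two copies of the same service.

```typescript
// Hypothetical stand-in for a shared database or cache, kept in memory for the sketch.
class SharedCounterStore {
  private value = 0;
  read(): number { return this.value; }
  write(next: number): void { this.value = next; }
  increment(by: number): number { this.value += by; return this.value; }
}

// Written as if it were the only process: keeps a private copy and writes it back.
class SingleMachineInstance {
  private store: SharedCounterStore;
  private cached: number;
  constructor(store: SharedCounterStore) {
    this.store = store;
    this.cached = store.read();
  }
  handleRequest(): void {
    this.cached += 1;
    this.store.write(this.cached); // overwrites whatever the other instance wrote
  }
}

// Keeps no private copy: every update goes through the shared store.
class SharedStateInstance {
  private store: SharedCounterStore;
  constructor(store: SharedCounterStore) { this.store = store; }
  handleRequest(): void { this.store.increment(1); }
}

// Two instances alternate over four requests against one shared store.
function countAfterFourRequests(useSharedState: boolean): number {
  const store = new SharedCounterStore();
  const a = useSharedState ? new SharedStateInstance(store) : new SingleMachineInstance(store);
  const b = useSharedState ? new SharedStateInstance(store) : new SingleMachineInstance(store);
  a.handleRequest();
  b.handleRequest();
  a.handleRequest();
  b.handleRequest();
  return store.read();
}

console.log(countAfterFourRequests(false)); // 2: half the requests were lost
console.log(countAfterFourRequests(true));  // 4
```

Each class looks reasonable in isolation, which is the point: the bug only exists once there are two instances. In a real system the fix would usually be an atomic update in the shared store rather than an in-memory method, but the ownership question is the same.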
So the role of the human did not shrink when AI started writing code. It moved — from producing lines to guaranteeing the properties that only exist across lines. Coherent architecture, a single source of truth, state modeled as state instead of a pile of racing callbacks, awareness of the real deployment environment, security that is on by default. That is the connective tissue. It is the difference between a demo and a product, and an agent left alone will skip all of it. We do not let it.
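As one illustration of "state modeled as state," here is a minimal sketch of a request lifecycle expressed as a single explicit state value with a reducer, rather than flags set from several callbacks. The names (`ChatState`, `ChatEvent`, `reduce`) and the request-id scheme are assumptions made for the example, not a description of any particular codebase.

```typescript
// Hypothetical request lifecycle, modeled as one explicit state value.
type ChatState =
  | { status: "idle" }
  | { status: "streaming"; requestId: number; text: string }
  | { status: "done"; text: string }
  | { status: "failed"; reason: string };

type ChatEvent =
  | { type: "send"; requestId: number }
  | { type: "chunk"; requestId: number; text: string }
  | { type: "finish"; requestId: number }
  | { type: "error"; requestId: number; reason: string };

// Every transition lives in one place; events from a stale request are ignored.
function reduce(state: ChatState, event: ChatEvent): ChatState {
  switch (event.type) {
    case "send":
      // A new request replaces whatever was in flight.
      return { status: "streaming", requestId: event.requestId, text: "" };
    case "chunk":
      if (state.status !== "streaming" || state.requestId !== event.requestId) return state;
      return { ...state, text: state.text + event.text };
    case "finish":
      if (state.status !== "streaming" || state.requestId !== event.requestId) return state;
      return { status: "done", text: state.text };
    case "error":
      if (state.status !== "streaming" || state.requestId !== event.requestId) return state;
      return { status: "failed", reason: event.reason };
  }
}
```

Because a late chunk from an abandoned request carries the wrong `requestId`, it cannot overwrite the current response. That is the kind of race that callback-based code tends to leave open.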
Vibe and verify: how we keep AI code safe
There is a real trust gap in the data, and we take it seriously: across the industry, 46% of developers actively distrust AI output and only 3% highly trust it. The mature response is not to stop using AI — it is to treat everything it produces as untrusted third-party code until proven otherwise. Generate aggressively; verify rigorously. "Vibe and verify."
In practice that means every line an agent writes goes through the same gate a contractor's pull request would:
- It gets read by a human, not skimmed. The confidence trap with AI code is that it looks finished, so the 20% that is not finished is invisible. We read for the things models reliably get wrong — authorization that is missing behind a login screen, input concatenated straight into a query, an endpoint that returns the whole user object when the UI needed a name.
- It gets tests, on purpose. Generating a feature and generating tests for it are two different requests, and the second one rarely happens on its own. We make it happen: a smoke test that the app boots, real coverage around money and auth, and a regression test for every bug we fix.
- It gets measured against the architecture, not just "does it run." A feature that works but invents its own state pattern, or duplicates an existing helper, or reaches across a boundary it should not know about, gets sent back — even when it passes its happy path.
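One of the review findings above, an endpoint that returns the whole user object when the UI needed a name, can be sketched in a few lines. The `UserRecord` and `PublicProfile` shapes and the `toPublicProfile` helper are hypothetical, chosen only to show the pattern of an explicit projection at the boundary.

```typescript
// Hypothetical internal record: includes fields that must never leave the server.
interface UserRecord {
  id: string;
  name: string;
  email: string;
  passwordHash: string;
  isAdmin: boolean;
}

// Hypothetical response shape: only what the UI actually needs.
interface PublicProfile {
  id: string;
  name: string;
}

// Build the response field by field instead of returning the record itself,
// so a new column on UserRecord cannot leak by default.
function toPublicProfile(user: UserRecord): PublicProfile {
  return { id: user.id, name: user.name };
}

const record: UserRecord = {
  id: "u1",
  name: "Ada",
  email: "ada@example.com",
  passwordHash: "not-a-real-hash",
  isAdmin: false,
};

console.log(JSON.stringify(toPublicProfile(record))); // {"id":"u1","name":"Ada"}
```

A regression test that asserts on the exact serialized shape is a cheap way to keep this property from drifting once more generated code lands on top of it.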
This is the same discipline we wrote up for founders shipping AI prototypes in what breaks between the demo and production. The shorthand we keep coming back to: AI gets you 80% of the way in 5% of the time — and the last 20% is the entire product. We let AI own the 80%. We own the 20%.
Orchestrating agents instead of writing more code
The other shift the 2026 landscape names is multi-agent development: instead of one assistant, specialized agents running in parallel — one drafting the feature, one writing tests, one reviewing for security, one handling the mechanical migration — with an engineer directing the whole thing.
This is how our engineers actually spend a day now. Less time inside a single function, more time deciding what the agents should build, in what order, against which interfaces, and then checking that what came back fits the system. It feels less like typing and more like technical leadership of a very fast, very literal team that needs precise direction and careful review. The leverage is enormous, but only if the person conducting the orchestra has a score, an architecture, in their head. Point an agent fleet at a codebase with no spine and you get the same six audit findings, only faster.
The leverage cuts both ways
Agents amplify whatever direction they are given. Pointed at a clear architecture, they make a strong codebase grow faster. Pointed at no architecture, they make a tangled one tangle faster. The human deciding the direction is not overhead on top of the AI — it is the thing that decides whether the AI is an accelerator or an avalanche.
Flexibility is the whole point
If there is one word for what we are optimizing for, it is not speed — speed is now cheap. It is flexibility: the ability to absorb a change in requirements without a rewrite.
This is where the architecture-first approach pays off most visibly. The AI-built codebases we audit are fast to start and brutal to change — they calcify, not because the code is good and precious, but because no one dares touch it without tests, without a coherent structure, without knowing what else will break. That is the opposite of flexible. A product that cannot change is dead the day the market moves.
When a human owns the architecture and AI owns the typing, you get both halves: the speed of generation and a system that bends. A new requirement comes in, and there is one source of truth to update instead of four copies to hunt down; one state model to extend instead of a callback maze to untangle; one clear boundary to work within instead of a guess about which of two screens does it the "right" way. We did exactly this on Arcana, a streaming AI chat client: the unglamorous work of getting the architecture, streaming, and state model right is what turned a fast prototype into something we could keep evolving instead of rebuilding.
That is the whole thesis: we use AI so our engineers can stop spending their best hours on keystrokes and spend them on the decisions that keep software flexible. The code is the cheap part now. The architecture is the product.
Frequently asked questions
Does building with AI mean lower-quality code? No, because we never treat AI output as finished. We follow the pattern the industry settled on in 2026: generate aggressively, then verify rigorously. Every line an agent writes is reviewed by a human like a contractor pull request, gets real tests around the parts that matter, and is checked against the system architecture before it ships. AI handles the typing; humans guarantee the quality. Used this way, the result is faster and at least as solid as fully hand-written code, because engineers spend their attention on design instead of boilerplate.
Build fast, stay flexible
The teams winning in 2026 are not the ones who refused AI, and they are not the ones who let it run unsupervised. They are the ones who put the machine on the keystrokes and a human on the architecture. That is how we build every project: AI for speed, engineers for the decisions that keep software flexible enough to survive its own success.
If you have an AI-built app and you are not sure whether it can keep evolving, run it through an AI code audit, or book a free assessment. We will tell you where the architecture is solid, where it is about to bite you, and what it takes to fix — with a fixed-scope quote, not a guess.
