Introduction
This is the post I don’t think many people expected me to write, and I have (rightly!) surrounded myself with people who range from generally uncomfortable with to somewhat hostile toward “AI”, mostly for good reasons, though I’ll get into the many caveats on that as I go.
As activists mitigating the harms of “AI”, we need to be well informed, and we need to understand what the specific harms are. Treating it with a hands-clean purist mindset will be extremely difficult and, as activism, more alienating than effective. These are genuinely useful tools, and pretending they aren’t will not in fact win many hearts and minds.
This post is going to be very long, because in addition to technical background, I’m touching on social issues, discourse norms, context in a culture of rising technocracy and fascism funded by venture capital, and the erosion of our information systems and cultural norms all at once. I can’t get into it all here, but I am not staying away from it on purpose.
Overall, I still believe that LLMs are a net negative on humanity, that the destruction of our infosphere is going to have generational consequences, and that if the whole thing disappeared from the face of the earth tomorrow, I wouldn’t be sad. The damage would still be out there, but the cheapness of bullshit pervading everything would at least drop back to human content-mill scale. Not to say that that was good before LLMs came along and made it this bad, but it was better.
That said, that’s not going to happen, and the amount of effort required to make it happen would be much better spent on organizing labor and climate action. The AI industry may collapse like a house of cards. I think it somewhat likely, considering the amount of financial trickery these companies are using. But as someone I know put it: we’re not just going to forget that computers can write code now. We aren’t.
I want you to think about all of this with an intensely skeptical mind. Not hostile, mind you, but skeptical. Every claim someone makes may well be checkable. You can check! I recommend you do so. My math in this essay will be rough back-of-envelope calculation, but I think that is appropriate given the tendency of the costs of technology to change by orders of magnitude, and for things to vary situationally by at least a factor of two.
And since we’re both operating in the domain of things not long ago considered science fiction, and because the leadership of AI companies tend to be filled with people with a love of science fiction, many of whom won’t hesitate to, as is said, create the Torment Nexus from the popular science fiction novel Don’t Create The Torment Nexus, I suggest one story to read and keep in mind: Marshall Brain’s “Manna – Two Views of Humanity’s Future”.
TL;DR
- There are open models and closed ones; good coding work needs models that require very high-end hardware to run, at least in part.
- Chinese models are quite good, and structured differently as companies.
- Don’t bother running models on your own hardware at home to write code unless you’re a weird offline-first free software zealot. I kind of am, and still I see the futility with the hardware I have on hand.
- Nobody agrees on the right way to do things.
- Everyone is selling something. Usually a grand vision handwaving the hard and bad parts.
- I’ll write more about how to actually use the tools in another segment.
- A lot of the people writing about this stuff are either executives who want to do layoffs, or now-rich people who made it big in some company’s IPO. Take what they say with the grain of salt you’d use for someone who is insulated by money and can find free time relatively easily. They are absolutely handwaving over impacts they themselves will not experience.
A note on terms
I am writing this with as much verbal precision as I can muster. I loathe terms like “Vibe Code”, and in general I am not jumping on any marketing waves and hype trains. I’m being specifically conservative in the words I use. I say LLM, not “AI”, when talking about the text generation models at the heart of most of the “AI” explosion. I’ll prefer technical terms to marketing buzzwords the whole way through, even at the cost of being awkward and definitely a little stodgy. Useful precision beats vacuous true statements every time, and the difference now very much matters.
The Models
There are a zillion models out there. Generally the latest and greatest models by the most aggressive companies are called “frontier” models, and they are quite capable. The specific sizes and architectures are somewhat treated as trade secrets, at least among the American companies, so things like power required to operate them and the kind of equipment required is the sort of things analysts in the tech press breathe raggedly over.
The American frontier models include:
- Anthropic’s “Claude Opus”
- OpenAI’s GPT-5.2
- Google Gemini 3 Pro
- something racist from xAI called Grok.
The frontier models are a moving target as they’re always the most sophisticated things each company can put forth as a product, and quite often they’re very expensive to run. Most of the companies have tools that cleverly choose cheap models for easy things and the expensive models for difficult things. Remember this when evaluating anything resembling a benchmark: it’s an easy place to play sleight of hand.
When you use a frontier model company’s products, most of the time you interact with a mix of models. The main mode is usually a somewhat cheaper-to-run version of the frontier model, with the true best model sometimes offered as an option or invoked automatically, and the whole thing is hidden behind a façade that makes it all look the same. Version numbers often resemble cell phone marketing, with a race to have bigger numbers, and “X” and “v” in places to make it seem exciting. There is no linear progression nor any comparison to be made among the numbers in the names of models or products.
I largely have no interest in interacting with the American frontier model companies, as their approach is somewhat to dominate the industry and burn the world doing it. Anthropic is certainly the best of the bunch but I really don’t want to play their games.
I do not know this for sure, but I expect these models run into the terabytes of weights, more than a trillion parameters, plus they are products with a lot of attached software — tools they can invoke, memory and databases and user profiles fed into the system.
Behind them are the large models from other AI companies, largely Chinese, producing research models that they and others operate as services, and often they are released openly (called “open weights models”). Additionally some of the frontier model companies will release research models for various purposes. All core AI companies pretty much style themselves as research organizations first, and product companies second. Note that nearly every AI company calls its best model a frontier model, whether it fits with the above or not.
Chinese companies and therefore models often have a drive for efficiency that the American ones do not. They are not the same kind of market-dominating, monopolist-oriented sorts that VC-funded American companies are. They aren’t as capable, but they do more with less. They’re very pragmatic in their approach compared to the science-fiction-fueled leadership of American AI companies. These models run in the hundreds of gigabytes and have hundreds of billions of parameters, though most can be tweaked to run some parts on a GPU and the rest on a CPU in main memory, if slowly. They can run on regular PC hardware, if extremely high-end hardware, and distillations and quantizations of these models, while they lose some fidelity, fit on even more approachable hardware. Still larger than most people own, but these are not strictly datacenter-only beasts.
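To give a feel for the sizes involved, here’s a rough sketch of the memory math. The parameter counts and bytes-per-weight figures are round illustrative numbers I picked, not the specs of any particular model.

```python
# Rough sketch of why quantization matters for running open models locally.
# Parameter counts and bytes-per-weight are round illustrative numbers,
# not the specs of any particular model.
def weights_gb(params_billions: float, bytes_per_weight: float) -> float:
    # Weights only; real deployments also need room for KV cache and overhead.
    return params_billions * bytes_per_weight

for params in (120, 350):  # "hundreds of billions" of parameters
    for name, bytes_per_weight in (("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
        print(f"{params}B @ {name}: ~{weights_gb(params, bytes_per_weight):.0f} GB")
# A ~350B model wants ~700 GB of memory at fp16 but ~175 GB at 4-bit: the
# difference between "datacenter only" and "extremely high-end workstation".
```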
Large, capable open models (Mostly Chinese) include:
- z.AI’s GLM-4.7 and GLM-5
- Kimi K2.5
- MiniMax M2.1
- Deepseek-V3.2
- Alibaba’s Qwen3-Max
- Mistral Large 3
- Trinity Large
Mistral Large 3 comes out of Europe. Trinity comes out of the US, but has a less “win the AI race” mindset. There’s a lot of superpower “We need our own sovereign solution” going on. China, the US and Europe are all making sure they have a slice of the AI pie.
I’m sure there’s more — the field is ever changing, and information about the models from Chinese companies percolates slowly compared to the American frontier models.
Behind these models are specialized smaller models, often sort-of good for code writing tasks if one isn’t challenging them, but I actually think this is where the line of usefulness is drawn.
Medium-small coding models include:
- Qwen2.5-Coder
- GPT-OSS 120b
- Mistral’s Codestral
- GLM-4.7-Flash
- Claude Haiku
- Gemini 2.5 Coder
- Smaller versions of Qwen3
- Smaller versions of many other models
There’s also some much smaller models that will run on large gaming GPUs. I don’t think they’re particularly useful; they’re very attractive toys that people can get to do some truly impressive things, but they’re not all that. They are, however, about the capability that kneejerk AI-haters expect: error-prone lossy toys that, if anyone called them “the future”, I’d laugh in their face or spit at their feet. Notice how far down the list this is.
The Economics
LLMs are expensive pieces of software to run, full stop. Anything with broad utility requires a GPU beefier than most high-end gaming PCs, and quite a lot of RAM. I am setting a high bar here for utility, because AI boosters tend to have a frustrating way of equivocating, showing low numbers for costs when it suits them and high ones for performance, despite the two not coming from the same models. There are domain-specific tasks and models that can work with a small GPU or even Raspberry Pi levels of computation, but for general purpose “reasoning” tasks and coding specifically, right now in 2026, with current model efficiencies and current hardware, if you want to use LLMs for writing software, you will be throwing a lot of computing power at it. A $5000 budget would barely suffice to run something like gpt-oss 120b (OpenAI’s open model that is okay at code-writing tasks). Additionally, if you kept the model busy 100% of the time, you might be talking $50-$200 a month in electricity, depending on local prices.
If you spent $15,000 and tripled the electricity, you could run something like GLM-4.7 at a really good pace.
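To sanity-check that electricity figure, here’s the back-of-envelope version. The sustained power draw is my assumption for a high-end GPU workstation under constant load, not a measured number.

```python
# Back-of-envelope check of the "$50-$200/month" electricity claim.
# The 800 W sustained draw is an assumption, not a measurement.
power_draw_kw = 0.8          # assumed draw of a GPU workstation under constant load
hours_per_month = 24 * 30    # model kept busy 100% of the time
kwh_per_month = power_draw_kw * hours_per_month  # ~576 kWh

for price_per_kwh in (0.10, 0.20, 0.30):  # residential rates vary a lot by region
    print(f"${price_per_kwh:.2f}/kWh -> ${kwh_per_month * price_per_kwh:.0f}/month")
# ~$58, ~$115, ~$173: the same order of magnitude as the range quoted above.
```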
Water cooling for data centers is probably the most talked about environmental cost, but I think it’s actually a distraction most of the time. Dear god why do people build data centers in Arizona, that’s a travesty, but also that’s a specific decision made by specific people with names and addresses who should be protested specifically.
Datacenter growth at the cost of people driving up electricity demand is a big problem, and we need to get back on the solar train as fast as possible.
This is not inexpensive software to run. However, it’s not an unfathomable amount of power.
Training models is wildly expensive, but it amortizes. There are in fact difficult economic conversations we need to be having here, but it’s all obscured by the fog of “what about the water?” and “AI will save us all and change everything!” that pervades the discourse. The framing of the arguments at large is fundamentally misleading, by basically everyone, pro or anti-AI, and is much more affiliative rhetoric than argumentative. We need to have the arguments, and actually look for and persuade people of the truths. They’re uncomfortable, so I fully understand why we don’t do that very often, but if we want to actually solve crises, we need to talk with actual truths in mind.
With prices of $200/month for “Max” plans, if one uses the tools well, a company would in fact be making a smart decision to get their developers using them. These plans are definitely priced below cost, probably by at least 3-5x. Maybe 10x. (Remember that a price shock will come at some point; keep that in mind before depending on the economics of these systems in existential ways for a business.)
Even at cost the math works out for a great many use cases.
Light plans are $20/month, and I think that for intermittent use, with good time sharing, that’s quite sustainable. In my experimentation I’m paying even less than that, and while I don’t think those prices will be sustained, I don’t think they’re impossible either.
Most of the big providers and almost all of the hosted open model providers have a pay-by-the-token API option. This is an unpackaged à-la-carte offering, in the style of cloud providers. They nickel-and-dime you. The pricing model, while transparent, is hard to calculate in advance. The usual rates are in prices per million input tokens and per million output tokens. Input tokens are cheaper, but interactions with tools will re-send them over and over, so you get charged for them multiple times. Output tokens are more expensive but closer to one-time things. Expensive models can be $25 per million output tokens and $5 per million input tokens (Claude Opus 4.6). I expect this reflects a decent margin on the true costs, but I don’t have a ton to back this expectation up. Most open models run in the realm of $0.50-$3 per million input tokens and $1-$5 per million output tokens. Given that a lot of the open models are run by companies with no other business than running models, I expect these represent near true financial costs. There’s no other business nor investment to hide any complexity in.
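To see how the per-token pricing plays out, here’s a hedged illustration. The session token counts are invented for the example; the per-million rates are the ones quoted above.

```python
# Illustration of how per-token API pricing adds up over a coding session.
# The token counts are invented for the example; the rates are quoted above.
def session_cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    """Rates are dollars per million tokens."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# A long agentic session re-sends context constantly, so input tokens dominate.
input_tokens, output_tokens = 4_000_000, 200_000  # assumed totals for one session

print(f"expensive frontier model: ${session_cost(input_tokens, output_tokens, 5.0, 25.0):.2f}")
print(f"typical open model:       ${session_cost(input_tokens, output_tokens, 1.0, 3.0):.2f}")
# Roughly $25 versus $4.60: the repeatedly re-sent context is where the money goes.
```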
The Tools
Most of the tools can talk to most of the models in some way. Usually each has a preferred model provider, and doing anything else will be a lesson in configuration files and API keys. Some more so than others.
Most of the tools are roughly as secure as running some curl | bash command. They kinda try to mitigate the damage that could happen, but not completely, and it’s a losing battle with fundamentally insecure techniques. Keep this in mind. There are ways to mitigate it (do everything in containers) but you will need to be quite competent with at least Docker to make that happen. I have not done that; I’m going for being a micromanaging busybody and not using anything resembling “YOLO mode”. I also back everything up and am not giving permission to write to remote repos, just local directories.
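If you do go the container route, the shape of it is roughly the sketch below. This is a minimal illustration, not a recipe: the image name and tool command are placeholders, and you’d have to decide how much network access the tool genuinely needs to reach its model provider.

```python
# Minimal sketch of the "do everything in containers" idea: run the coding tool
# in a throwaway container that can only write to one project directory.
# The image name and command are placeholders, not specific recommendations.
import subprocess
from pathlib import Path

project = Path("~/src/myproject").expanduser()  # the only directory the tool may touch

subprocess.run([
    "docker", "run", "--rm", "-it",
    "--network", "none",           # loosen this if the tool must reach a hosted model API
    "-v", f"{project}:/work",      # mount just the project, nothing else from the host
    "-w", "/work",
    "coding-tool-image:latest",    # placeholder image with your tool of choice installed
    "coding-tool",                 # placeholder command for the tool itself
], check=True)
```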
I know terminal-based tools more than IDEs, though I’ll touch on IDE-integrated things a bit. I haven’t used any web-based tools. I grew up in terminals and that’s kinda my jam.
- Claude Code is widely acknowledged as best in class, has a relatively good permission model, and lots of tools hook into it. It’s the only tool Anthropic allows with their basic consumer subscriptions. If you want to use other tools with Claude, you have to pay by the token. Can use other models, but it’s a bit of a fight, and a lot of APIs don’t support Anthropic’s “Messages” API yet.
- OpenAI Codex is OpenAI’s tooling. It’s got decent sandboxing, so that what the model suggests to run can’t escape and trash your system nearly so easily. It’s not perfect but it’s quite a bit better than the rest. It’s a bit of a fight to use other models.
- OpenCode touts itself as open source, when in reality most of these tools are. It’s a bit less “please use my company’s models” than most tools, and it’s the tool I’ve had the best luck with. It has two modes — Build and Plan — and using them both is definitely a key to using the tool well. Plan mode creates documents and written plans. Build does everything else and actually changes files on disk.
- Kilo Code is both a plugin for VS Code, and a tool in the terminal. It has not just two modes but five, and more can be customized: “Code”, “Architect”, “Ask”, “Debug”, and “Orchestrator”. Orchestrator mode is interesting in that it’s using one stream of processing with one set of prompts to evaluate the output of other modes. This should allow more complex tasks to complete without failing, because there’s a level of oversight. I’ve not used this yet, but I will be experimenting more. Its permission model is pretty laughable, but at least it starts out asking you if it can run commands instead of just running them.
- Charmbracelet Crush is aesthetically cute but also infuriating, and it’s very insistent on advertising itself in commit messages. I’ve not yet seen if I can make it stop, but it did make me switch back to OpenCode.
- Cursor — App and terminal tool. Requires an account and using their models at least in part, though you can bring your own key to use models through other services.
- Cline — Requires an account. IDE plugins and terminal tools.
- TRAE — IDE with orchestration features. Intended to let it run at tasks autonomously. I’ve not used it.
- Factory Droid. Requires an account. Can bring your own key.
- Zed. IDE editor, with support for a lot of providers and models.
TL;DR
I like OpenCode; Kilo Code and Charmbracelet Crush are runners-up. The (textual) user interface is decent in all three, and it’s not loud, it’s not fancy, but it’s pretty capable. At some point I’ll try orchestration and then maybe Kilo Code will win the day. You’re not stuck with just one tool either.
Antagonistic Structures and The Blurry JPEG of the Internet
At its core, you can think of LLMs as extremely tight lossy data compression. The idea that it is a “blurry jpeg of the internet” is not wrong in kind, though in scope it understates it. Data compression is essentially predicting what’s next, and that’s exactly what LLMs do. Very different specifics, but in the end, small bits of stuff go in, large outputs come out. It’s also “fancy autocomplete”, but that too undersells it because when you apply antagonistic independent chains of thought on top, you get some much more useful emergent behavior.
A pattern that you have to internalize is that while lots of these tools and models are sloppy and error-prone, anything you can do to antagonize that into being better will be helpful. This is the thing where I show you how LLM code tools can be a boon to an engineer who wants to do things well. Suddenly, we have a clear technical reason to document everything, to use a clear type system, to clarify things with schemas and plans, to communicate technical direction before we’re in the weeds of editing code. All the things that developers are structurally pushed to do less of, even though they’re always a net win, are rewarded.
You will want your LLM-aided code to be heavily tested. You will want data formats fully described. You will want every library you use to have accurate documentation. You will use it. Your tools will use it.
You will want linters. You will want formatters. Type systems help.
This pattern goes deep, too. Things like Kilo Code’s “Orchestrator” mode and some of Claude Code’s features work as antagonistic checks on other models. When one model says “I created the code and all the tests pass” by deleting all the failing tests, the other model which is instructed to be critical will say “no, put that back, try again”.
One of the big advances in models was “reasoning”, which is internally a similar thing: if you make a request, the model is no longer simply completing what you prompted, but instead running several internal chains of thought that approach it critically, and then, when some threshold is met, continuing on to complete from there. All the useful coding models are reasoning models. The model internally antagonizes itself until it produces something somewhat sensible. Repeat as needed to get good results.
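To make the antagonistic pattern concrete, here’s a minimal sketch of the generate-and-critique loop that features like Orchestrator mode implement in far more elaborate form. Nothing here is any particular tool’s internals; the function names and prompts are mine, and call_model is a stub you’d wire to whatever model provider you use.

```python
# Minimal sketch of an antagonistic generate/critique loop.
# This illustrates the pattern; it is not any tool's actual implementation.
def call_model(role_prompt: str, content: str) -> str:
    raise NotImplementedError("wire this to your model provider of choice")

def generate_with_critic(task: str, max_rounds: int = 3) -> str:
    work = call_model("You are a careful developer. Do the task.", task)
    for _ in range(max_rounds):
        review = call_model(
            "You are a hostile reviewer. Look for deleted tests, stubbed-out "
            "code, and 'implement later' comments. Reply OK only if none exist.",
            work,
        )
        if review.strip() == "OK":
            break
        # Feed the criticism back in and make the first model try again.
        work = call_model(
            "Revise the work to address the review.",
            f"{task}\n\nReview:\n{review}\n\nPrevious attempt:\n{work}",
        )
    return work
```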
Even then, with enough runtime, Claude will decide that the best path forward is to silence failing tests, turn off formatters, or put in comments saying //implement later for things that aren’t enforced. Writing code with these tools is very much a management task. It’s not managing people, but sometimes you will be tempted to think so.
The Conservative Pressure
So here’s the thing about LLMs. They’re really expensive to train.
There’s two phases: “pre-training” (which is really building the raw model, and is most of the training), and “post-training” (tailoring a general model into one for certain kinds of tasks).
Models learn things like ‘words’ and ‘grammar’ in pre-training, along with embedded, fuzzy knowledge of most things in their training set.
Post-training can sort-of add more knowledge, giving it a refresher course in what happened since it came out. There’s always a lag, too. It takes time to train models.
The thing is, though, that the models really do mostly know only about what they were trained on. Any newer information almost certainly comes from searches that the model and tools do together and stuff into the context window of the current session, but the model doesn’t really know anything new.
The hottest new web framework of 2027 will not in fact be new, because the models don’t know about it and won’t write code for it.
Technology you can invent from first principles will work fine. Technology that existed and was popular in 2025 will be pretty solid. With something novel or niche, code generation will go off the rails much more easily without a lot of tooling.
This is, in the case of frontend frameworks, maybe a positive development in that the treadmill of new frameworks is a long-hated feature of a difficult to simplify problem space for building things that real people touch.
In general, however, it will be a force for conservatism in technology. Expect everything to be written in boring ways, as broken as they ever were in 2025, for a while here.
They’re Making Bullets in Gastown
There’s a sliding scale of LLM tools, with chats on one end and full orchestration systems of independent streams of work being managed by yet more LLM streams of work at the other. The most infamous of these is Gastown, which is a vibe-coded slop-heap of “what if we add more layers of management, all LLM, and let it burn through tokens at a prodigious rate?”
Automating software development as a whole will look a lot like this - if employers want to actually replace developers, this is what they’ll do, just with more corporate styling and less vibe-coded “let’s turn it all up to eleven” going on.
Steve’s point in the Gastown intro is that most people aren’t ready for Gastown and it may eat their lunch, steal their baby, and empty their bank account. This is true. Few of us are used to dealing with corporate amounts of money and effort, and while we think a lot about the human management of it, we don’t usually try to make it into a money-burning code-printer. I think there’s a lot of danger for our whole world here. Unfettering business has never yielded unambiguously good results.
Other tools like this are coming. While I was writing this, multi-claude was released, and there’s more too: Shipyard, Supacode, everyone excited about replacing people is building tools to burn more tokens faster with less human review. They’re writing breathless articles and hand-waving about the downsides (or assuming they can throw more LLMs at it to fix problems.)
I personally want little part in this.
Somewhere much further down the scale of automation are things like my friend David’s claude-reliability plugin, which is a pile of hacks to automate Claude Code to keep going when it stops for stupid reasons. Claude is trained on real development work, and “I’ll do it later” is entirely within its training set. It really does stop and put todos on the hard parts. A whack upside the head and telling it to keep going sure helps make it make software that sucks less.
Automating the automation is always going to be a little bit of what’s going on. Just hopefully with some controls and not connecting it to a money-funnel and saying full speed ahead on a gonzo clown car of violence.
There’s a lot of this sort of thing.
The Labor Issue
The labor left has had its sights on AI for a while as the obvious parallel to the steam-looms that reshaped millwork from home craft to extractive industry. We laud the Luddites, who, contrary to popular notions about them, were not anti-technology per se; they just saw the extractive nature of the businesses using these machines, turning a craft one might make a small profit at into a job where people get used up, mind and body, and exhausted. They destroyed equipment and tried to make a point. In the end they had only moderate success, though they and the rest of the labor movement won us such concepts as “the weekend” and “the 8-hour day”.
Even the guy who made Gastown sees how extractive businesses can - or even must! - be. Maybe especially that guy. We’re starting to see just how fast we can get the evil in unfettered business, capital as wannabe monopolists, to show itself.
Ethan Marcotte knows what’s up: We need to unionize. That’s one of the only ways out of this mess. We, collectively, have the power. But only collectively. We don’t have to become the protectionist unions of old, but we need to start saying “hey no, we’re not doing that” en masse for the parts that bring harms. We need to say “over my dead body” when someone wants to run roughshod over things like justice, equality, and not being a bongo-playing extractive douchecanoe. We’ve needed to unionize for a long time now, and not to keep wages up but because we’re at the tip of a lot of harms, and we need to stop them. The world does not have to devolve into gig work and widening inequality.
Coding with automated systems like this is intoxicating. It’s addictive, because it’s the lootbox effect. We don’t get addicted to rewards. We get addicted to potential rewards. Notice that gamblers aren’t actually motivated by having won. They’re motivated by maybe winning next time. It can lead us to the glassy-eyed stare with a bucket of quarters at a slot machine, and it can lead us to 2am “one more prompt, maybe it’ll work this time” in a hurry. I sure did, writing webtty.
There’s Something About Art…
Software, while absolutely an art in many ways, is built on a huge commons of open work done by millions of volunteers. This is not unambiguously always good, but the structure of this makes the ethics of code generation more complex and nuanced than it is for image generation, writing generation, and video generation.
AI image generation and video generation can get absolutely fucked. It was already hard to make it as an artist because the value of art is extremely hard to capture. And we broke it. Fuck the entire fucking AI industry for this and I hope whoever decided to make it a product first can’t sleep soundly for the rest of their life. I hope every blog post with a vacuously related image with no actual meaning finds itself in bit rot with alacrity.
Decoding the Discourse
It’s helpful to know that the words used to describe “AI” systems are wildly inconsistent in how people use them. Here’s a bit of a glossary.
Agent:
- A separate instance of a model with a task assigned to it.
- A coding tool.
- A tool of some kind for a user to use but that can operate in the background in some way.
- A tool models can invoke
- A service being marketed that uses AI internally.
- A tool that other agents can use.
Agentic: in some way related to AI.
Orchestration: Yo dawg I heard you liked AI in your AI, so I put AI in your AI so your AI can AI while you AI your AI.
Vibe Coding:
- coding with LLMs.
- using LLMs to write code without looking or evaluating the results.
A coda
In the time it took to write this over a week or more, Claude Opus 4.5 gave way to Claude Opus 4.6. GLM-4.7 was surpassed by GLM-5 just today as I write this bit, but z.ai is now overloaded trying to bring it online and has no spare computing power. All my tools have had major updates this week. The pace of change is truly staggering. This is not a particularly good thing.
I may edit this article over time. No reason we can’t edit blog posts, you know. Information keeps changing with new data and context.
Now go out there and try your best to make the world better.