AI Coding

Despite the massive step forward by Codex, OpenAI still has a large gap to close with Claude on the product side. Opus 4.6 is another step in the right direction, and Claude Code feels like a great experience. It’s approachable and it tends to work across the wide range of tasks I throw at it, which should help Claude gain much broader adoption than Codex. If I’m going to recommend a coding agent to an audience with limited-to-no software experience, it’s certainly going to be Claude. At a time when agents are just emerging into general use, this is a massive advantage, both in mindshare and in feedback from usage data.

Part of the problem is that most people are using the free version of AI tools. The free version is over a year behind what paying users have access to. Judging AI based on free-tier ChatGPT is like evaluating the state of smartphones by using a flip phone. The people paying for the best tools, and actually using them daily for real work, know what’s coming.

One of them, the managing partner at a large firm, spends hours every day using AI. He told me it’s like having a team of associates available instantly. He’s not using it because it’s a toy. He’s using it because it works.

This is how I’ve described Claude Code: it’s like I have a couple of interns doing work for me.

Dario Amodei, who is probably the most safety-focused CEO in the AI industry, has publicly predicted that AI will eliminate 50% of entry-level white-collar jobs within one to five years. And many people in the industry think he’s being conservative. Given what the latest models can do, the capability for massive disruption could be here by the end of this year. It’ll take some time to ripple through the economy, but the underlying ability is arriving now.

This is different from every previous wave of automation, and I need you to understand why. AI isn’t replacing one specific skill. It’s a general substitute for cognitive work. It gets better at everything simultaneously. When factories automated, a displaced worker could retrain as an office worker. When the internet disrupted retail, workers moved into logistics or services. But AI doesn’t leave a convenient gap to move into. Whatever you retrain for, it’s improving at that too.

A lot of people find comfort in the idea that certain things are safe. That AI can handle the grunt work but can’t replace human judgment, creativity, strategic thinking, empathy. I used to say this too. I’m not sure I believe it anymore.

The most recent AI models make decisions that feel like judgment. They show something that looks like taste: an intuitive sense of what the right call is, not just the technically correct one. A year ago that would have been unthinkable. My rule of thumb at this point is: if a model shows even a hint of a capability today, the next generation will be genuinely good at it. These things improve exponentially, not linearly.

What you should actually do

I’m not writing this to make you feel helpless. I’m writing this because I think the single biggest advantage you can have right now is simply being early. Early to understand it. Early to use it. Early to adapt.

Start using AI seriously, not just as a search engine. Sign up for the paid version of Claude or ChatGPT. It’s $20 a month. But two things matter right away. First: make sure you’re using the best model available, not just the default. These apps often default to a faster, dumber model. Dig into the settings or the model picker and select the most capable option. Right now that’s GPT-5.2 on ChatGPT or Claude Opus 4.6 on Claude, but it changes every couple of months. If you want to stay current on which model is best at any given time, you can follow me on X (@mattshumer_). I test every major release and share what’s actually worth using.

Second, and more important: don’t just ask it quick questions. That’s the mistake most people make. They treat it like Google and then wonder what the fuss is about. Instead, push it into your actual work. If you’re a lawyer, feed it a contract and ask it to find every clause that could hurt your client. If you’re in finance, give it a messy spreadsheet and ask it to build the model. If you’re a manager, paste in your team’s quarterly data and ask it to find the story. The people who are getting ahead aren’t using AI casually. They’re actively looking for ways to automate parts of their job that used to take hours. Start with the thing you spend the most time on and see what happens.

And don’t assume it can’t do something just because it seems too hard. Try it. If you’re a lawyer, don’t just use it for quick research questions. Give it an entire contract and ask it to draft a counterproposal. If you’re an accountant, don’t just ask it to explain a tax rule. Give it a client’s full return and see what it finds. The first attempt might not be perfect. That’s fine. Iterate. Rephrase what you asked. Give it more context. Try again. You might be shocked at what works. And here’s the thing to remember: if it even kind of works today, you can be almost certain that in six months it’ll do it near perfectly. The trajectory only goes one direction.

Just Talk To It - the no-bs Way of Agentic Engineering | Peter Steinberger

This is a good read for AI coding. (Peter is the author of OpenClaw.)

The above reminds me of all the complex plugins I’ve looked at, and I get a little intimidated with my doc-driven development plugin. So it is refreshing to see this perspective - you don’t need to make it complex - use the model!

What about $openmodel

I keep an eye on China’s open models, and it’s impressive how quickly they catch up. GLM 4.6 and Kimi K2.1 are strong contenders that are slowly reaching Sonnet 3.7 quality, but I don’t recommend them as a daily driver.

The benchmarks only tell half the story. IMO agentic engineering moved from “this is crap” to “this is good” around May with the release of Sonnet 4.0, and we hit an even bigger leap from good to “this is amazing” with gpt-5-codex.

Plan Mode & Approach

What benchmarks miss is the strategy that the model+harness pursue when they get a prompt. codex is far FAR more careful and reads many more files in your repo before deciding what to do. It pushes back harder when you make a silly request. Claude/other agents are much more eager and just try something. This can be mitigated with plan mode and rigorous structure docs, but to me that feels like working around a broken system.

I rarely use big plan files now with codex. codex doesn’t even have a dedicated plan mode - however it’s so much better at adhering to the prompt that I can just write “let’s discuss” or “give me options” and it will diligently wait until I approve it. No harness charade needed. Just talk to it.

What about MCPs

Other people wrote plenty about MCPs. IMO most are something for the marketing department to make a checkbox and be proud. Almost all MCPs really should be clis. I say that as someone who wrote 5 MCPs myself.

I can just refer to a cli by name. I don’t need any explanation in my agents file. The agent will try $randomcrap on the first call, the cli will present the help menu, context now has full info on how this works, and from now on we’re good. I don’t have to pay a price for any tools, unlike MCPs which are a constant cost and garbage in my context. Use GitHub’s MCP and see 23k tokens gone. Heck, they did make it better, because it was almost 50,000 tokens when it first launched. Or use the gh cli, which has basically the same feature set, models already know how to use it, and you pay zero context tax.
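To make the "the cli will present the help menu" idea concrete, here is a minimal sketch of how a harness could load a tool's help text on demand instead of carrying a tool definition in every request. The helper names `read_cli_help` and `rough_token_count` are hypothetical, and the 4-characters-per-token ratio is only a rough assumption, not how any particular model tokenizes. The Python interpreter stands in for the CLI so the example runs anywhere.

```python
import subprocess
import sys


def read_cli_help(command: list[str], timeout: float = 10.0) -> str:
    """Run `<command> --help` and return whatever it printed.

    Hypothetical helper: stdout and stderr are combined because
    some tools print their usage text to stderr.
    """
    result = subprocess.run(
        [*command, "--help"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit still usually carries usage text
    )
    return result.stdout + result.stderr


def rough_token_count(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return len(text) // 4


# Stand-in CLI: the current Python interpreter.
help_text = read_cli_help([sys.executable])
print(f"help text is roughly {rough_token_count(help_text)} tokens, "
      "and it only enters context if the tool is actually used")
```

If the gh cli is installed, passing `["gh"]` instead of `[sys.executable]` gives a feel for how small its top-level help is next to the token counts quoted above.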

This again reinforces that the terminal will remain the primary interface for the foreseeable future. I’ve heard multiple times now that MCPs blow up context and gobble tokens.

Conclusion

Don’t waste your time on stuff like RAG, subagents, Agents 2.0 or other things that are mostly just charade. Just talk to it. Play with it. Develop intuition. The more you work with agents, the better your results will be.

And yes, writing good software is still hard. Just because I don’t write the code anymore doesn’t mean I don’t think hard about architecture, system design, dependencies, features or how to delight users. Using AI simply means that expectations of what to ship went up.

PS: This post is 100% organic and hand-written. I love AI, I also recognize that some things are just better done the old-fashioned way. Keep the typos, keep my voice. :high_speed_train::victory_hand:

Shipping at Inference-Speed

Another good article on AI coding by Peter Steinberger.

It’s also become clear to me that LLMs actively reward existing top tier software engineering practices:

  • Automated testing. If your project has a robust, comprehensive and stable test suite, agentic coding tools can fly with it. Without tests? Your agent might claim something works without having actually tested it at all, plus any new change could break an unrelated feature without you realizing it. Test-first development is particularly effective with agents that can iterate in a loop.
  • Planning in advance. Sitting down to hack something together goes much better if you start with a high level plan. Working with an agent makes this even more important—you can iterate on the plan first, then hand it off to the agent to write the code.
  • Comprehensive documentation. Just like human programmers, an LLM can only keep a subset of the codebase in its context at once. Being able to feed in relevant documentation lets it use APIs from other areas without reading the code first. Write good documentation first and the model may be able to build the matching implementation from that input alone.
  • Good version control habits. Being able to undo mistakes and understand when and how something was changed is even more important when a coding agent might have made the changes. LLMs are also fiercely competent at Git—they can navigate the history themselves to track down the origin of bugs, and they’re better than most developers at using git bisect. Use that to your advantage.
  • Having effective automation in place. Continuous integration, automated formatting and linting, continuous deployment to a preview environment—all things that agentic coding tools can benefit from too. LLMs make writing quick automation scripts easier as well, which can help them then repeat tasks accurately and consistently next time.
  • A culture of code review. This one explains itself. If you’re fast and productive at code review you’re going to have a much better time working with LLMs than if you’d rather write code yourself than review the same thing written by someone (or something) else.
  • A very weird form of management. Getting good results out of a coding agent feels uncomfortably close to getting good results out of a human collaborator. You need to provide clear instructions, ensure they have the necessary context and provide actionable feedback on what they produce. It’s a lot easier than working with actual people because you don’t have to worry about offending or discouraging them—but any existing management experience you have will prove surprisingly useful.
  • Really good manual QA (quality assurance). Beyond automated tests, you need to be really good at manually testing software, including predicting and digging into edge-cases.
  • Strong research skills. There are dozens of ways to solve any given coding problem. Figuring out the best options and proving an approach has always been important, and remains a blocker on unleashing an agent to write the actual code.
  • The ability to ship to a preview environment. If an agent builds a feature, having a way to safely preview that feature (without deploying it straight to production) makes reviews much more productive and greatly reduces the risk of shipping something broken.
  • An instinct for what can be outsourced to AI and what you need to manually handle yourself. This is constantly evolving as the models and tools become more effective. A big part of working effectively with LLMs is maintaining a strong intuition for when they can best be applied.
  • An updated sense of estimation. Estimating how long a project will take has always been one of the hardest but most important parts of being a senior engineer, especially in organizations where budget and strategy decisions are made based on those estimates. AI-assisted coding makes this even harder—things that used to take a long time are much faster, but estimations now depend on new factors which we’re all still trying to figure out.

If you’re going to really exploit the capabilities of these new tools, you need to be operating at the top of your game.

:100: