Harnessing AI for Production
Yes, it's possible to ship non-slop to prod.
Reality is somewhere in the middle.
From the beginning of the AI movement, there’s been a constant bombardment of AI-related content. On social media, it seems to break down into three main camps: those who hype it up as the way to Utopia, those who claim it’s garbage and will bring about Dystopia, and those often quieter folks who live somewhere in the middle. I’ve found that the middle crowd is usually filled with people who either worked in machine learning (ML) before the AI boom or have extensive engineering backgrounds: excited about technology changes, but reasonable in their approach, and continuously learning.
I started off closer to the pessimistic side. Having some, albeit comparatively small, experience with ML, I saw AI as just a trained word calculator; for LLMs, that’s essentially what they are. Despite that, I’ve learned how to harness their capabilities and use them every day. It seems we’re now at a bit of a plateau (even though the founders say otherwise) in AI models’ ability to support engineering and programming. And because of this, the harness around the agent loop is where the biggest day-to-day engineering gains exist.
What is harness engineering?
Harness engineering is a new hot topic going around the ether. Mitchell Hashimoto recently wrote a post on this that provides his own simple definition:
I don't know if there is a broad industry-accepted term for this yet, but I've grown to calling this "harness engineering." It is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.
This definition resonates strongly with me because it takes a simple engineering and critical-thinking approach to a non-deterministic system. Why wouldn’t we want to engineer for correctness with these new AI tools? That’s what the model trainers are doing by baking in tool calls directly, after all. Why shouldn’t engineers, and really any AI user, optimize their AI systems to produce the best results?
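To make the definition concrete, here’s an entirely hypothetical example of engineering away a recurring mistake: suppose an agent keeps leaving FIXME markers in generated code. Instead of correcting it each time, you encode the check once as a script. The script name, the directory layout, and the specific check are my own illustration, not something from this post or Hashimoto’s.

```shell
# guardrail.sh: encode "never make that mistake again" as an automated check.
# Hypothetical scenario: the agent kept committing unresolved FIXME markers,
# so the harness now runs this check before every commit step.
cat > guardrail.sh <<'EOF'
#!/bin/sh
# Fail the commit step if any source file still contains a FIXME marker.
if grep -rn "FIXME" src/ 2>/dev/null; then
  echo "guardrail: unresolved FIXME found" >&2
  exit 1
fi
echo "guardrail: ok"
EOF
chmod +x guardrail.sh
```

The harness’s instruction files can then tell the agent to run the script before committing, turning a one-off correction into a permanent rule.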
Seems obvious, right?
Bootstrapping the harness
Taking inspiration from Dex Horthy’s talks Advanced Context Engineering for Agents and No Vibes Allowed, I decided to bootstrap my own AI setup and make it (mostly) reproducible across agentic tools and devices. It’s heavily inspired by and derived from the original Advanced Context Engineering for Agents talks, with the added iteration of tuning prompts and skills instead of relying solely on commands.
My workflow has been simple and consistent, allowing me to better recognize the unique patterns that each model has. And it’s enabled me to build amazing prototypes, ship code all the way to production, operate platforms, build home labs, and contribute more readily to Open Source.
I’ve published this harness on my GitHub, creatively named ai-engineering-harness – I know, super original. As of this writing, it works with Claude Code, OpenCode (my primary agentic tool), and Gemini CLI.
This repository can be used in multiple ways. It can be cloned down as-is and stowed onto your filesystem. It can be incorporated into an MDM to bootstrap an employee machine. Or it can be bundled into a custom-coded local development accelerator. The key is that you can start from day one with a tuned workflow that produces quality results.
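As a sketch of the “clone and stow” install pattern: GNU stow symlinks a per-tool package from the repo into your home directory. The package layout, file names, and paths below are my own assumptions for illustration; check the repo’s README for its actual structure.

```shell
# Stand-in for a cloned harness repo, organized as per-tool stow packages
# (layout assumed, not taken from ai-engineering-harness itself).
mkdir -p harness/opencode/.config/opencode
echo "placeholder agent config" > harness/opencode/.config/opencode/AGENTS.md

# With GNU stow installed, you would run from inside the repo:
#   stow -d harness -t "$HOME" opencode
# which symlinks the package's tree into your home directory.
# A plain-ln equivalent of what stow does for this one package:
mkdir -p fakehome/.config
ln -s "$PWD/harness/opencode/.config/opencode" fakehome/.config/opencode

cat fakehome/.config/opencode/AGENTS.md
```

The symlink approach means a `git pull` in the repo updates every machine’s agent config in place.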
The bootstrapped workflow
Context engineering puts thoughtfulness upfront as part of the workflow. It’s not an “I’m going to throw in a sentence of an abstract thought and hope that somehow the code comes out” workflow. It’s an “I’m intentionally building something where I want high confidence that the generated code will produce the proper behavior” workflow.
It works like this:

1. I think about what I want to build and the behaviors I’m looking for, and do some up-front research, all written down into a “ticket.”
2. Then I run /create_plan against the ticket. The harness agents will read through the ticket, analyze the codebase, research the web and documentation, and come back with a proposed high-level plan draft, along with some questions.

Often what I’ve found from this workflow is that the models will infer ideas, formulated into questions, that I hadn’t thought about, or had but didn’t write into the original ticket. It enables some decent up-front dialog that fully formulates the idea and context for the plan. From here:

3. I review the plan and run /implement_plan.
4. I run /commit, or occasionally iterate once I’ve reviewed the git diff.
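Concretely, a session under this workflow might look like the following. The ticket contents, the filename, and the exact slash-command invocation syntax are all illustrative assumptions on my part, not prescribed by the harness.

```shell
# 1. Write the thinking down as a ticket (contents are made up).
mkdir -p tickets
cat > tickets/0001-rate-limit.md <<'EOF'
# Add rate limiting to the public API
Behavior: reject clients exceeding 100 requests/minute with HTTP 429.
Up-front research: a token-bucket middleware fits our existing router.
Open question: limit per IP or per API key?
EOF

# 2. Inside the agentic tool, run the harness commands against it:
#      /create_plan tickets/0001-rate-limit.md
#    review the draft plan and answer the agent's questions, then:
#      /implement_plan
#      /commit        (or iterate after reviewing the git diff)
echo "ticket ready: tickets/0001-rate-limit.md"
```

The point is that the ticket already carries behavior, research, and open questions before any model sees it.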
The plan created is not the normal “plan” mode that agentic tools come pre-bundled with. It’s deterministic and sequential: the models and agents don’t deviate from it during implementation. Where /create_plan is the non-deterministic, exploratory half of the workflow, execution against the finished plan is basically deterministic.
Spending up-front time in reading and editing the plan is where the biggest production grade benefits are had. It’s also where learning happens. It’s the opportunity to learn about what will be generated and why. It’s where one can learn new patterns and coding paradigms.
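For a sense of what that review time is spent on, here’s the kind of sequential, phase-by-phase plan file worth reading and editing. The structure and contents are invented by me for illustration; they are not actual /create_plan output.

```shell
# An invented example of a reviewable, sequential plan file.
cat > plan.md <<'EOF'
# Plan: add rate limiting to the public API

## Phase 1: token-bucket middleware
- Add a TokenBucket type with an Allow() method
- Unit tests covering refill and burst behavior

## Phase 2: wire into the router
- Apply the middleware to the public API routes
- Return HTTP 429 with a Retry-After header when rejected

## Verification
- Run the full test suite
- Manual check: the 101st request within a minute is rejected
EOF
grep "^## " plan.md   # each phase is an explicit, ordered step
```

Editing a file like this before implementation is where you catch a wrong pattern, or learn a new one, before any code exists.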
This is the workflow the harness brings to any machine. And it’s something that should be continually iterated on to improve the behaviors and results of AI agents.
Fin
As engineers, we like to say that “writing code” was never the problem. And while I think that statement is generally true, it certainly is a bottleneck at times (especially if you’re one of those senior folks who loves to engineer but also has way too many meetings every day). But as engineers, we’ve spent most of our time reading code rather than writing it, and that holds even more true today.
When used right, and with the right quality of output, AI can accelerate our learning as well. While it’s amazing to experiment, and we should be constantly experimenting with new tools, patterns, and technologies, we also have a lot of knowledge and experience of our own that can guide these new AI tools to the outcomes we’re actually looking for. AI harness engineering not only brings forth a paved path to production for AI-assisted engineering, but it also sets a pattern for the continuous iteration and learning required for tuning the outputs of agentic systems.
All while shipping to production more reliably and with higher levels of confidence.

