by
Andrey Pozdnyakov

In part 1 of our series Make AI Code Your Way we walked through how project-scoped SKILL.md files solved most of our "this code does not look like ours" problem on Claude Code. Two skills, a couple of TRIGGER clauses, and a one-line table at the bottom of CLAUDE.md were enough to make Claude reliably reach for the right skill mid-plan.
In this installment, we take the same skills, conventions and expectations to a different agent: Qwen Code CLI running against a locally hosted Qwen3-Coder-80B-A3B on our own local inference hardware.
The skills format is a de facto standard, and identical skills should work across coding agents. In Qwen Code CLI, however, the behavior was not identical. It took some extra effort to get Qwen3-Coder-80B-A3B to call our skills spontaneously.
Running Qwen Locally
We use Anthropic's frontier models Sonnet and Opus quite a bit, but for some projects we require locally hosted models. Here’s why Qwen3-Coder-80B-A3B is the local model we reach for the most:
Privacy
Some work can’t leave our network. A local model wins by default for any code we can’t send to a third party.
Cost
Frontier models earn their price when the problem is hard. But for the 60% of coding work that’s routine, a local model running on hardware we already own is much cheaper.
Qwen3-Coder-80B-A3B is lightning fast and relatively accurate on our RTX 6000 GPUs. But fast and accurate in the abstract is not the same as fast and accurate on a Qt/QML codebase that uses an aggressively dependency-injected, SOLID-shaped architecture. (Read more about that in Part 1).
Two things had to be true for Qwen to be productive on our project. First, the skills had to be written for it. Second, the skills had to actually get invoked.
Smaller Models Need More Hand-Holding
While Claude Sonnet or Opus could often infer our conventions from one or two existing classes, Qwen couldn’t. We had to spell out the conventions. That’s why our Qwen-facing skills are noticeably longer than the Claude-facing originals. There are more worked examples, more explicit "here is the wrong way, here is the right way" tables, and more discussion of edge cases.
It is said that free isn’t free if your time is worth money. We had to invest time in improving and verifying our skills and in managing the model’s context. That time is an investment, though, because the same skills and techniques carry over to all of our Qt projects. These patterns helped us a lot:
Toolkit documentation via internal RAG
We run an internal RAG server that indexes Qt and QML documentation. Qwen calls into it through MCP for every Qt API question instead of answering from memory. Without it, Qwen hallucinates Qt signatures and enum values at an intolerable rate. But with it, accuracy on Qt APIs is vastly better.
Verbose, opinionated skills
Anywhere Claude would "follow the existing pattern," the Qwen-facing version needed specific examples. The skill body grew, while the variance in the generated code shrank. Turns out the improved skill is also backwards compatible with Claude so it is a win all around.
What’s the lesson here? A local model is not a drop-in replacement for a frontier model. It’s a different tool with different failure modes. We discovered that skills are a powerful way to close the gap, but only if we’re willing to invest in them more heavily than we do for the frontier model.
Qwen Rarely Invoked Skills On Its Own
What nearly sank our Qwen rollout was not code quality. RAG and careful skill content fixed many of the Qt/QML code generation issues we saw. The real problem was that Qwen almost never loaded those skills on its own.
With the same project, the same skills and the same CLAUDE.md-style quick-reference (a QWEN.md), Qwen Code CLI was visibly worse than Claude Code at deciding "this prompt matches a skill, I should load it."
Explicit /skill invocation worked. Direct prompts sometimes worked. Plan execution almost never worked. A plan step that said "add a new QML component and test" would just start writing code without the skill loaded into context.
This may improve in a few versions since Qwen Code ships fast. But we needed something that worked now.
A Brief Word on Hooks
Qwen Code recently added a hooks system very similar to Claude Code's. A hook is a helper (in our case, a shell command) the agent runs at a specific session lifecycle point: before a prompt is submitted, after a tool call, on session start, and so on. The hook can place the contents of that shell command's stdout into context, and the agent folds that output into its next turn. It is a clean escape hatch for "I want to inject some text right here, every single time."
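To make the mechanism concrete, here is a toy model of what a harness does with a prompt-submit hook: run a command, capture its stdout, and place that text next to the user's prompt. This is only a sketch of the idea, not Qwen Code's implementation; the function names `run_hook` and `build_turn` are hypothetical, and the choice to ignore a failing hook is an assumption made for the example.

```python
import subprocess
import sys


def run_hook(command: list[str], timeout_s: float = 5.0) -> str:
    """Run a hook command and return its stdout, or "" if it fails."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    if result.returncode != 0:
        # Assumption for this sketch: a failing hook injects nothing.
        return ""
    return result.stdout.strip()


def build_turn(user_prompt: str, hook_command: list[str]) -> str:
    """Place the hook output directly above the user's prompt."""
    injected = run_hook(hook_command)
    if not injected:
        return user_prompt
    return f"{injected}\n\n{user_prompt}"


if __name__ == "__main__":
    # Stand-in hook: a Python one-liner that prints a reminder to stdout.
    hook = [sys.executable, "-c", "print('Reminder: load the matching skill first.')"]
    print(build_turn("Add a new QML component and test.", hook))
```

The point of the model is proximity: whatever the hook prints ends up adjacent to the prompt on every turn, regardless of how long the session has been running.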
UserPromptSubmit + skill_reminder.json
We hooked into UserPromptSubmit. On every prompt, we injected the contents of a short skill_reminder.json file that tells Qwen which skills exist and when to use each one. It’s essentially the same idea as the QWEN.md table, but instead of relying on QWEN.md to stay salient through a long session, we force the reminder into the top of every user turn.
The hook configuration
The hook is registered in ~/.qwen/settings.json (or the project-level equivalent).
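As a rough illustration of the shape such a registration could take, the sketch below builds a settings fragment and prints it as JSON. It is not our actual configuration. The key layout (`hooks`, `UserPromptSubmit`, `type`, `command`) is borrowed from the hooks schema that Claude Code documents, on the assumption that Qwen Code's similar system accepts the same structure; check the Qwen Code documentation for your version before relying on it. The script path in `HOOK_COMMAND` is hypothetical.

```python
import json

# Hypothetical path to the script that prints the reminder; not from the article.
HOOK_COMMAND = "python3 .qwen/hooks/emit_reminder.py"


def hook_settings(command: str) -> dict:
    """Build a settings fragment that runs `command` on every user prompt.

    ASSUMPTION: the key layout mirrors Claude Code's documented hooks schema.
    """
    return {
        "hooks": {
            "UserPromptSubmit": [
                {"hooks": [{"type": "command", "command": command}]}
            ]
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before merging it by hand.
    print(json.dumps(hook_settings(HOOK_COMMAND), indent=2))
```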
The command should write the reminder text to stdout, in the JSON format the hook system expects.
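One way to write such a command is a small script that reads the reminder file, checks that it parses as JSON, and writes it to stdout. The sketch below does that and stays silent on any error, so a broken file never blocks a prompt. The file location in `REMINDER_PATH` and the function name `load_reminder` are hypothetical, and the fail-silent behavior is a design choice for this example rather than something the hook system requires.

```python
import json
import sys
from pathlib import Path

# Hypothetical location; the article only gives the file name.
REMINDER_PATH = Path(".qwen/skill_reminder.json")


def load_reminder(path: Path) -> str:
    """Return the file's JSON re-serialized as one string, or "" if unusable."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON: inject nothing.
        return ""
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    # Emit only valid JSON to stdout; write nothing otherwise.
    sys.stdout.write(load_reminder(REMINDER_PATH))
```

A plain `cat` of the file would do the same job in the happy path; the script only adds the validation step.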
The skill_reminder.json file
Two points worth highlighting. First, this is the same content as the QWEN.md table. We are not telling Qwen anything new. Second, the file is intentionally short. A long reminder injected into every turn would burn tokens and crowd out user content. The whole point of the hook is to keep this short and reliable.
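To illustrate both points, the sketch below renders a reminder from a small skill table and refuses to produce one that exceeds a size budget. Everything specific in it is an assumption: the skill names and trigger phrases are placeholders, the 600-character budget is arbitrary, and the envelope fields (`hookSpecificOutput`, `hookEventName`, `additionalContext`) are borrowed from the hook output format Claude Code documents, which Qwen Code may or may not share exactly.

```python
import json

# Hypothetical skill names and triggers; the real ones live in the project's skills.
SKILLS = {
    "qml-component": "creating or modifying a QML component or its tests",
    "cpp-service": "adding a dependency-injected C++ service class",
}

MAX_CHARS = 600  # Arbitrary budget to keep the per-turn cost small.


def reminder_text(skills: dict[str, str]) -> str:
    """Render one line per skill: its name and when to load it."""
    lines = ["Before writing code, load the matching skill:"]
    lines += [f"- {name}: use when {when}" for name, when in skills.items()]
    return "\n".join(lines)


def reminder_payload(skills: dict[str, str]) -> dict:
    """Wrap the reminder text in a JSON-serializable envelope.

    ASSUMPTION: field names are borrowed from Claude Code's hook output format.
    """
    text = reminder_text(skills)
    if len(text) > MAX_CHARS:
        raise ValueError(f"reminder is {len(text)} chars; budget is {MAX_CHARS}")
    return {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": text,
        }
    }


if __name__ == "__main__":
    print(json.dumps(reminder_payload(SKILLS), indent=2))
```

Enforcing the budget in code keeps the reminder from growing quietly as more skills are added, which matters because its cost is paid on every turn.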
Here’s What Changed with the Hook Enabled
Qwen immediately began invoking skills on its own, both for direct prompts and, crucially, during plan execution. The table was not new information. It was information that now sat right next to the prompt instead of several thousand tokens up the context window. Two key differences we noticed:
Plan steps that previously generated code from memory now started with the matching skill load, then wrote code that matched our architecture.
Direct prompts hit the right skill on the first turn instead of needing a follow-up reminder. The number of times we had to Ctrl-C and remind Qwen "use the skill" dropped to near zero on the projects where we deployed the hook.
This Applies Far Beyond Qwen
The lesson here is not Qwen-specific. It is about how reliably an agent will follow instructions that live somewhere other than the active turn. Frontier models will tolerate a lot of distance between an instruction and the moment it matters. Smaller, faster, cheaper models tolerate a lot less. Hooks let you trade a tiny bit of context window for guaranteed proximity, and for important workflow rules, we’ve discovered that trade is almost always worth it.
If you’re running any small or self-hosted model in an agent harness, you should look at every "the model should remember to do X" rule and question whether it would be more reliable as a UserPromptSubmit injection than as a paragraph in an agent file.
In our experience, the answer is usually yes.
Building a Reliable Second Tier
Qwen3-Coder-80B-A3B is a real workhorse on hardware we already own. It produces good code on our Qt/QML projects once two things are true: the skills are written verbosely enough for it, and the skills actually get invoked. Project-specific skills, which we covered in part 1, handle the first. A UserPromptSubmit hook injecting a small skill_reminder.json file handles the second.
Together they’ve made Qwen a reliable second tier in our agent stack, capable of handling the routine 60% of our coding load while we save the frontier models for the hardest problems. In part 3 we’ll cover in detail when to use which model.
Disclaimer: Behavior described here was observed in our own Qt/QML evaluation harness running Qwen Code CLI against a self-hosted Qwen3-Coder-80B-A3B. Your mileage on a different stack may vary.
About the author
Andrey Pozdnyakov
Andrey is a Qt software developer at ICS with 20+ years of experience building high-performance user interfaces and data visualization systems across industries from agriculture to energy. He also specializes in AI-driven development, using LLMs like Qwen and Claude to create precise, model-aware coding systems that generate production-quality code on demand.

