Why the HMRC AI Auditor Is a Myth, and How to Actually Survive a Compliance Check

You are staring at a 14-page compliance check letter from HMRC. It arrived this morning. The questions feel robotic, generic, and completely disconnected from the custom logistics software your team spent eight months building.
You immediately assume a machine wrote it. You assume an AI scanned your R&D tax claim, misunderstood the codebase, and automatically issued a rejection.
You aren't alone. Every founder I speak to right now thinks they are fighting a rogue algorithm. They are convinced their £60k tax credit is being held hostage by ChatGPT's bureaucratic cousin. They are wrong.
The phantom auditor myth
The phantom auditor myth is the false belief that HMRC uses generative AI to read technical narratives and automatically reject SME tax claims. It's a comforting lie that founders tell themselves when a claim gets flagged. It shifts the blame from poor record-keeping to an unfeeling machine. But the tribunal records show a completely different reality.
In September 2025, following a two-year Freedom of Information battle, HMRC finally disclosed its actual AI usage [source](https://easyrnd.co.uk/tribunal-orders-hmrc-to-disclose-its-use-of-ai-in-rd-tax-reviews/). They confirmed that no generative AI is used by the R&D Tax Relief Compliance Team. Zero. All correspondence is drafted and reviewed by human caseworkers. No automated letters are sent to businesses.
So why does the letter feel so robotic? Because human caseworkers use standard templates. They copy and paste generic questions from a central database.
The system flagging your claim isn't an advanced neural network reading your PDF. It's a basic algorithmic risk-profiling script. This script looks at structured data. It compares your Additional Information Form against your CT600 tax return. It checks for sudden spikes in subcontractor costs. It flags mismatches in Standard Industrial Classification codes.
If your numbers trip a threshold, the script puts your file in a queue. A human picks it up, glances at it, and sends a templated letter.
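To make the distinction concrete, here is a toy sketch of the kind of rule-based check described above. HMRC's actual rules and thresholds are not public, so every field name and number here is an illustrative assumption, not a reconstruction of their system:

```python
def flag_claim(aif: dict, ct600: dict, last_year: dict) -> list[str]:
    """Return the reasons a claim would land in the review queue.

    Purely structured-data checks: no document is read at this stage.
    Field names and thresholds are hypothetical.
    """
    flags = []

    # The Additional Information Form must reconcile with the CT600.
    if aif["qualifying_expenditure"] != ct600["rnd_expenditure"]:
        flags.append("AIF/CT600 expenditure mismatch")

    # Sudden year-on-year spike in subcontractor costs.
    prev = last_year.get("subcontractor_costs", 0)
    if prev and aif["subcontractor_costs"] > 2.5 * prev:
        flags.append("subcontractor cost spike")

    # SIC code inconsistent with the claimed activity.
    if aif["sic_code"] not in aif["plausible_sic_codes"]:
        flags.append("SIC code mismatch")

    return flags
```

Note what this script never does: open a PDF. If it returns an empty list, no human reads your narrative at all.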
The phantom auditor myth persists because the alternative is harder to swallow. It's easier to blame a rogue AI than to admit your claim data was structurally messy. Once you understand that the initial filter is just a dumb script, your entire approach to compliance has to change.
Why the obvious fix fails
The most common and catastrophic fix SMEs try is using an LLM to generate a massive, jargon-heavy technical report. When founders think an AI is reading their claim, they try to fight fire with fire. They export hundreds of Jira tickets, dump them into a Claude window, and ask it to write an HMRC-compliant narrative.
The result is usually a 40-page PDF filled with dense, repetitive paragraphs. The founder thinks this wall of text will satisfy the algorithm. This is the exact opposite of what you need to do.
Here's the mechanism you are missing. HMRC's risk profiling engine doesn't read your PDF. It can't parse the nuance of your prompt-engineered masterpiece. The initial flag is triggered purely by the financial inputs and categorical data in your digital submission. The PDF is ignored until the file lands on a human caseworker's desk.
Pay attention to this part. When that caseworker opens your AI-generated report, they are already tired. They have fifty files to review this week. They start reading a document that uses the phrase "technological uncertainty" seventeen times on the first page, but never actually explains what broke.
The AI has smoothed over the rough edges. It has removed the specific, messy details of your engineering failures and replaced them with corporate jargon. The caseworker can't find the actual innovation. They get frustrated, assume the claim is inflated, and issue a rejection. End of.
In my experience reviewing these setups, a £50k R&D claim backed by an AI-generated narrative actually doubles your risk of a prolonged enquiry. You are giving a human reviewer a document that feels evasive. You are hiding the real engineering work behind a wall of synthetic text. Not smart. Not effective.
The approach that actually works

*Diagram: a technical architecture for automated compliance capture using Slack, Make, and Claude to log engineering failures into structured Airtable records.*
The only reliable way to survive a compliance check is to build an automated system that captures raw engineering failures as they happen. Stop writing retrospective narratives at year-end. Structure the data weekly instead, so the human caseworker eventually gets exactly what they need.
Here's how you actually build this. You start where your engineers already live. You create a dedicated Slack channel called #rnd-log. You tell your team to drop a quick note whenever they hit a wall. No formal writing. Just a raw brain dump of what failed.
A webhook in Make listens to that specific Slack channel. When a message lands, the webhook triggers a Claude API call. This is where you use AI, but you use it strictly. You enforce a rigid JSON schema in the API call.
The schema forces Claude to extract three specific things: the project name, the technical baseline, and the uncertainty encountered. The webhook then takes the parsed JSON and POSTs a new record to an Airtable base.
Let's look at a real example. An engineer types: "Stripe webhook is double-firing on the custom checkout flow, the payload is dropping the session ID and I have no idea why." Claude extracts "Stripe webhook double-firing dropping session ID" as the uncertainty. Airtable logs it with a timestamp.
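The parsing step in the middle of that pipeline can be sketched in a few lines of stdlib Python. This is a minimal illustration, not Make's actual implementation: the LLM call (via the Anthropic SDK) and the Airtable HTTP request are reduced to comments, and the field names are assumptions that should match whatever your Airtable base uses:

```python
import json
from datetime import datetime, timezone

# The rigid schema: the prompt tells Claude to reply with JSON
# containing exactly these keys, and nothing else.
REQUIRED_FIELDS = {"project", "baseline", "uncertainty"}

def parse_extraction(reply: str) -> dict:
    """Validate the model's JSON reply against the rigid schema.

    Raises rather than logging a half-formed record: a schema
    violation means the Slack message needs a human follow-up.
    """
    data = json.loads(reply)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"schema violation, missing: {sorted(missing)}")
    return {k: data[k] for k in REQUIRED_FIELDS}

def airtable_payload(record: dict) -> dict:
    """Shape a parsed record for Airtable's create-record endpoint.

    The real call is a POST to
    https://api.airtable.com/v0/<base_id>/<table> with a Bearer token.
    """
    return {"fields": {**record,
                       "logged_at": datetime.now(timezone.utc).isoformat()}}
```

The point of the hard schema is that a malformed or incomplete extraction fails loudly at capture time, not silently at year-end.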
Come year-end, you don't write a story. You export a structured CSV from Airtable containing 40 specific, time-stamped engineering failures. You hand that to your accountant. When the HMRC caseworker reads it, they see real work. They see dates, specific technical hurdles, and actual human frustration.
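The year-end export is the simplest part of the system. A sketch, assuming the same illustrative field names as the capture step (in practice you would pull the rows from the Airtable API first):

```python
import csv

def export_log(rows: list[dict], path: str) -> int:
    """Write time-stamped failure records to the CSV the accountant gets."""
    fields = ["logged_at", "project", "baseline", "uncertainty"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```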
You can build this pipeline in about two weeks. It costs roughly £2k to £4k in setup time, depending on how complex your Airtable base is. The running cost is maybe £20 a month in API credits.
The main failure mode is garbage input. If an engineer just types "fixed the database", the Claude API extracts "database fix". The uncertainty field is effectively null. You catch this by adding a Slack bot response.
If the JSON returns a vague uncertainty, the Make scenario pings the engineer back in a thread: "What exactly broke?" It forces clarity at the point of failure. It stops the silent data loss that ruins claims nine months later.
Where this breaks down
This continuous-capture system breaks down entirely if your R&D involves physical hardware or messy external subcontractor invoices. You need to check your operational reality before you build it. It relies on your team communicating technical problems in text, in real time.
If you're a manufacturing SME building physical prototypes on a factory floor, the uncertainties aren't captured in Slack. They are captured in scrapped materials and physical testing logs. A Slack integration is useless here. You need a completely different capture mechanism, usually involving tablet-based forms at the testing station.
It also fails if your subcontractor data is a mess. If your external dev shop sends you scanned PDF invoices with no line-item detail, internal tracking won't save you. The HMRC risk script will flag the vague subcontractor costs long before anyone reads your Airtable logs.
If your invoices come in as scanned TIFFs from a legacy accounting system, you need OCR first. Once you introduce OCR, the error rate jumps from 1% to around 12%. You end up spending more time fixing bad transcription than you would have spent just writing the report. Fix the data source first.
Don't build a digital tracking system for a physical process. And don't expect a slick Airtable base to hide the fact that your core financial data is unstructured. The foundation has to be solid.
Three mistakes to avoid
The biggest anti-patterns in R&D compliance stem from misunderstanding who is actually reviewing the claim.
- Don't prompt-engineer your way out of a weak claim. Founders love feeding vague project notes into an LLM and asking it to generate a technical narrative. The output looks professional, but it lacks the specific engineering failures that qualify for relief. A human caseworker will spot the synthetic fluff immediately. When they can't find the actual technological advance, they will reject the claim.
- Don't ignore the digital data structure. HMRC's initial risk profiling is entirely based on the numbers you submit in the Additional Information Form and the CT600. If your categorical data is wrong, or your subcontractor costs spike inexplicably, you'll trigger an automatic flag. No amount of brilliant technical writing can bypass this initial algorithmic filter. Get the structured data right first.
- Don't assume a machine is making the final call. The tribunal disclosure proved that human caseworkers handle every flagged claim. They are reading your documents. If you format your submission to appease a hypothetical AI auditor, you're alienating the actual person who holds the power to approve your tax credit. Write for a tired, overworked human who just wants to see clear evidence of technological uncertainty.
Get our UK AI insights.
Practical reads on AI for UK businesses — teardowns, how-to guides, regulatory news. Unsubscribe anytime.