YUFAN & CO.

Why Pure AI Workflows Fail and the Shift to Deterministic Pipelines

Yufan Zheng
Founder · ex-ByteDance · MSc Peking University

You watch the Make scenario history flash red. The webhook fired, the JSON payload landed, and the ChatGPT API step confidently hallucinated a discount code that does not exist in your Stripe account. You manually fix the invoice, send an awkward apology to the client, and wonder why the automation that looked flawless in the YouTube tutorial is now creating more work than it saves.

You aren't the only one quietly turning off AI workflows. The enterprise giants are hitting the exact same wall, and they are finally admitting it out loud.

The eight-step amnesia gap

The eight-step amnesia gap is the point at which a large language model silently drops critical business rules because its context window is overloaded with operational instructions. You feed it a perfectly crafted prompt detailing exactly how to handle a customer return, and it works beautifully for the first three tests. Then, on day four, it skips a crucial validation step and authorises a full refund for a product you don't even sell.

This isn't a temporary glitch that a software update will fix. It is a structural limitation of how probabilistic models process text.

Salesforce recently admitted this exact problem at enterprise scale. After redeploying roughly 4,000 support staff to bet heavily on AI agents, their executives confessed that their confidence in pure LLMs had sharply declined (https://timesofindia.indiatimes.com/technology/tech-news/after-claiming-to-redeploy-4000-employees-and-automating-their-work-with-ai-agents-salesforce-executives-admit-we-were-more-confident-about/articleshow/116999999.cms). Their CTO, Muralidhar Krishnaprasad, noted a brutal technical reality: once a model is given more than eight directives, it simply starts omitting instructions (https://timesofindia.indiatimes.com/technology/tech-news/after-claiming-to-redeploy-4000-employees-and-automating-their-work-with-ai-agents-salesforce-executives-admit-we-were-more-confident-about/articleshow/116999999.cms).

Think about your own customer service SOPs. A standard ticket resolution easily involves a dozen micro-decisions. Check the Shopify order status. Verify the Stripe payment. Read the Xero invoice. Cross-reference the 30-day return policy. If the LLM drops just one of those directives, the entire workflow fails.
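For contrast, here is a minimal sketch of the same kind of checklist written as plain code, where a rule cannot be silently dropped. The `Ticket` fields and check names are hypothetical stand-ins for data you would pull from Shopify, Stripe and Xero; no real API is called here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Ticket:
    # Hypothetical fields; in a real flow each would come from an API lookup.
    order_fulfilled: bool    # e.g. from the Shopify order status
    payment_captured: bool   # e.g. from the Stripe payment
    invoice_issued: bool     # e.g. from the Xero invoice
    delivered_on: date
    requested_on: date


RETURN_WINDOW = timedelta(days=30)

# Every rule lives in this list, so every rule runs on every ticket.
CHECKS = [
    ("order fulfilled", lambda t: t.order_fulfilled),
    ("payment captured", lambda t: t.payment_captured),
    ("invoice issued", lambda t: t.invoice_issued),
    (
        "within 30-day return window",
        lambda t: timedelta(0) <= t.requested_on - t.delivered_on <= RETURN_WINDOW,
    ),
]


def failed_checks(ticket):
    """Return the names of all checks that did not pass (empty list means approve)."""
    return [name for name, check in CHECKS if not check(ticket)]
```

A return is only approved when `failed_checks(ticket)` comes back empty. Adding a thirteenth rule means adding one entry to `CHECKS`, not rewording a prompt and hoping it sticks.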

Salesforce is now pivoting its flagship Agentforce product towards deterministic frameworks to eliminate the inherent randomness of large models (https://timesofindia.indiatimes.com/technology/tech-news/after-claiming-to-redeploy-4000-employees-and-automating-their-work-with-ai-agents-salesforce-executives-admit-we-were-more-confident-about/articleshow/116999999.cms). They realised that raw intelligence is useless without strict guardrails.

For SME owners, this is massive validation. You aren't failing at AI because your prompts are bad. You are failing because you are asking a text predictor to execute rigid business logic. Every time the model forgets a rule, you pay for it in manual rework, angry customers, and broken data.

Why the "mega-prompt" Zapier flow fails

Most founders try to fix AI unreliability by writing longer, more aggressive prompts in Zapier or Make. They think more instructions equal more control.

The opposite is true. When your ChatGPT step in Zapier spits out a bad JSON payload, the instinct is to add "CRITICAL: ALWAYS output valid JSON and NEVER invent a product ID."

Here's what actually happens at the API level. Every word you add competes for the model's attention, so each individual rule carries less weight. By trying to patch the eight-step amnesia gap with a 1,500-word mega-prompt, you increase the probability of hallucination.

The model doesn't read your prompt like a checklist. It calculates the statistical likelihood of the next token. If your customer's email contains the phrase "I demand a full refund," the model's training data (millions of internet arguments where refunds are eventually given) overpowers your custom instructions.

I see SMEs burn £500 a month on premium AI wrapper subscriptions, thinking the software will handle the logic. You connect it to your Gmail and Outlook, upload a PDF of your company policies, and hit go.

It works for basic FAQs. But when a customer asks a compound question, such as "Can I change my shipping address for order #1234 and add a second item?", the LLM tries to solve both problems simultaneously in a single API call.

It grabs the new address, hallucinates a stock confirmation for the second item, and replies to the customer. It doesn't actually hit the Shopify API to check inventory because you asked it to generate text, not execute a database query.
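One way to picture the alternative: the model only splits the message into structured intents, and plain code handles each one against real data. The sketch below assumes the LLM has already returned a list of intents as JSON. The intent shapes, handler names and the `INVENTORY` dictionary are all hypothetical; in production the stock lookup would be an actual inventory API call.

```python
# Hypothetical in-memory stand-in for a real inventory lookup.
INVENTORY = {"SKU-RED-MUG": 3, "SKU-BLUE-MUG": 0}


def handle_change_address(intent):
    return f"Address change for order {intent['order_id']} queued."


def handle_add_item(intent):
    # The stock figure comes from data, never from generated text.
    in_stock = INVENTORY.get(intent["sku"], 0)
    if in_stock < intent["quantity"]:
        return f"Cannot add {intent['sku']}: only {in_stock} in stock."
    return f"Added {intent['quantity']} x {intent['sku']} to order {intent['order_id']}."


HANDLERS = {
    "change_address": handle_change_address,
    "add_item": handle_add_item,
}


def route(intents):
    """Run each extracted intent through its own deterministic handler."""
    results = []
    for intent in intents:
        handler = HANDLERS.get(intent.get("type"))
        if handler is None:
            # Unknown request types go to a person instead of being guessed at.
            results.append("Escalated to a human: unrecognised request.")
        else:
            results.append(handler(intent))
    return results
```

For the compound question above, the model's only job would be to produce two intents. Whether the second item can be added is decided by the stock figure, not by the model.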

You cannot prompt your way out of a missing deterministic workflow. Adding more capitalised words to a Zapier node won't make an LLM behave like a strict API integration. It just makes the inevitable failure harder to debug.

Building deterministic AI workflows

The deterministic AI pipeline. Claude extracts the JSON data, but n8n handles the actual Xero API validation and routing.

The only way to build reliable automation is to separate the reasoning from the execution. You use the LLM strictly to extract data from messy human inputs, and you use hard-coded APIs to execute the business logic.

Here is a real system for processing inbound supplier invoices.

This does not mean forwarding an email to a generic AI assistant and hoping it updates your accounting software. You build a rigid, multi-step pipeline.

First, an n8n webhook triggers when a new email lands in a specific Google Workspace inbox.

Second, n8n sends the email body and the attached PDF to the Claude API. Pay attention to this part. You do not ask Claude to "process the invoice." You use Claude's strict JSON schema mode. You tell it to extract exactly four fields: Supplier Name, Invoice Number, Total Amount, and Due Date. Nothing else.
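Even with a schema-constrained model call, it is worth checking the response yourself before anything downstream trusts it. The following is a minimal sketch of that check using only the Python standard library. The snake_case field names and the `ExtractionError` class are assumptions made for illustration, not part of any Claude or n8n API.

```python
import json
from decimal import Decimal, InvalidOperation

# Assumed key names for the four fields described above.
REQUIRED_FIELDS = {"supplier_name", "invoice_number", "total_amount", "due_date"}


class ExtractionError(ValueError):
    """Raised when the model output does not match the expected shape."""


def parse_extraction(raw):
    """Parse the model's raw text output and enforce the four-field contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ExtractionError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ExtractionError("expected a JSON object")

    # Exactly four fields: nothing missing, nothing extra.
    if set(data) != REQUIRED_FIELDS:
        missing = sorted(REQUIRED_FIELDS - set(data))
        extra = sorted(set(data) - REQUIRED_FIELDS)
        raise ExtractionError(f"missing={missing} extra={extra}")

    for field in ("supplier_name", "invoice_number", "due_date"):
        if not isinstance(data[field], str) or not data[field].strip():
            raise ExtractionError(f"{field} must be a non-empty string")

    try:
        amount = Decimal(str(data["total_amount"]))
    except InvalidOperation as exc:
        raise ExtractionError("total_amount is not a number") from exc
    if not amount.is_finite() or amount <= 0:
        raise ExtractionError("total_amount must be a positive number")

    return {**data, "total_amount": amount}
```

Anything that raises `ExtractionError` never reaches the accounting step. The amount is held as a `Decimal` rather than a float so that money values are not subject to binary rounding.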

Third, n8n takes that JSON output and runs a deterministic check. It queries the Xero API to see if the Supplier Name exists in your contacts.

If Xero returns a match, n8n proceeds to the next step. If Xero returns null (maybe the supplier is new, or Claude misread a blurry logo), the workflow stops. It does not guess. It routes the extracted data to a Slack channel for human review.
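The match-or-stop gate can be sketched in a few lines. Here the `contacts` dictionary is a hypothetical stand-in for the result of a Xero contacts lookup, and the `Decision` type and action names are invented for illustration. The point is the branching, not the API call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str                      # "create_draft_bill" or "human_review"
    reason: str
    contact_id: Optional[str] = None


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences still match."""
    return " ".join(name.casefold().split())


def decide(extracted, contacts):
    """Proceed only on an exact (normalised) supplier match; otherwise stop.

    `contacts` maps normalised supplier names to contact ids. It stands in
    for a real lookup against the accounting system.
    """
    contact_id = contacts.get(normalise(extracted["supplier_name"]))
    if contact_id is None:
        return Decision("human_review", "supplier not found in contacts")
    return Decision("create_draft_bill", "exact supplier match", contact_id)
```

Note that there is deliberately no fuzzy matching. A near miss such as a misspelt supplier name lands in the review queue, which is the "does not guess" behaviour described above.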

If the match is successful, n8n makes a final API call to Xero to create the draft bill.

This is what Salesforce means by predictable, rule-based automation (https://timesofindia.indiatimes.com/technology/tech-news/after-claiming-to-redeploy-4000-employees-and-automating-their-work-with-ai-agents-salesforce-executives-admit-we-were-more-confident-about/articleshow/116999999.cms). The LLM is only responsible for reading the messy PDF. The actual database updates are handled by deterministic API calls that either succeed or fail safely with an error code.

Building a pipeline like this in n8n takes 2 to 3 weeks of build time and costs between £4,000 and £8,000, depending on how messy your existing Xero data is.

The failure modes are entirely predictable. If Claude hallucinates a date format, swapping US and UK formats, the Xero API rejects the payload because it expects a strict ISO-8601 date string. The automation dies safely. No phantom bills are created. You catch the error in the n8n logs, adjust the JSON schema prompt to enforce the correct date format, and run it again.
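You can also catch a bad date before the payload ever leaves your workflow. This is a minimal sketch of a strict check, assuming the downstream API wants a plain `YYYY-MM-DD` string. The function name is hypothetical.

```python
import re
from datetime import date

# Four digits, two digits, two digits, separated by hyphens. Nothing else passes.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_iso_date(value):
    """Return a date for a strict YYYY-MM-DD string, or raise ValueError."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        raise ValueError(f"not a YYYY-MM-DD date: {value!r}")
    year, month, day = (int(part) for part in value.split("-"))
    # date() itself rejects impossible values such as month 13 or 30 February.
    return date(year, month, day)
```

A slash-separated date such as `03/04/2025` is rejected outright, so the run stops with an error you can see in the logs. One limit worth knowing: a format check cannot tell whether the day and month were swapped inside an otherwise valid date, so ambiguous dates still deserve a look in human review.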

You isolate the probabilistic AI inside a deterministic cage.

Where deterministic routing breaks down

Strict routing is not a universal fix. You need to know where the boundaries are before you commit time and capital to an n8n build.

If your core operating system lacks modern APIs, deterministic workflows become brittle. If your invoices come in as scanned TIFFs from legacy accounting software, you need a dedicated OCR layer before the LLM even sees the document. Once you introduce OCR, the error rate jumps from 1% to roughly 12%. The LLM will struggle to parse the garbage text, and your deterministic Xero API calls will constantly fail validation.

It also fails when the output requires genuine empathy or complex negotiation. You can build a strict pipeline to process a standard return, but you cannot build a deterministic flow to handle a furious client threatening to cancel a £50k contract.

Once a workflow requires subjective judgement rather than data extraction, you have to route it to a human. If you try to force an LLM to follow a strict decision tree for a complex emotional escalation, you will hit the eight-step amnesia gap all over again.

Deterministic AI works for high-volume, low-variance tasks. If every input is a unique snowflake, no amount of API routing will save you. Build rigid pipes for your data. Leave the relationship management to your team.

Three questions to sit with

The enterprise software market is quietly walking back its wildest AI promises. Salesforce's pivot is a warning signal for every SME owner trying to automate their operations. You don't have to learn this lesson the hard way by breaking your own customer relationships. Before you pay for another AI wrapper subscription, or spend your weekend wrestling with a broken Zapier flow, ask yourself:

  1. Are you asking an LLM to make a definitive database decision that should actually be handled by a hard-coded API call?
  2. When your current AI automation fails, does it fail safely by alerting a human, or does it silently write hallucinated data into your core accounting systems?
  3. Have you structurally separated the messy, probabilistic job of reading human inputs from the strict, deterministic job of executing your business rules?

Stop trying to prompt your way to reliability. Build the deterministic pipes, cage the AI inside them, and let your team get back to actual work. End of.
