Managing the Synthetic DSAR Payload: AI-Automated Legal Demands

Your HR manager opens their inbox at 8:45 AM. A disgruntled former sales rep has just submitted a Data Subject Access Request. But it isn't a standard two-line email asking for a personnel file. It's a four-page, perfectly formatted legal demand.
The text asks for Microsoft 365 telemetry data, Slack message edit histories, raw server logs, and building swipe-card records. The rep didn't write this. ChatGPT wrote it. They just typed a prompt asking for a punishing GDPR request for their old boss and hit send.
You're now legally obligated to respond within 30 days. You have 400 gigabytes of unstructured data to sift through. This is the new reality of offboarding.
The synthetic DSAR payload
The synthetic DSAR payload is a data access request that an LLM artificially inflates to demand every conceivable metadata log, draft, and communication record across your entire tech stack. It's a structural shift in how ex-employees and angry customers weaponise GDPR.
Before recent AI tools, a complex data request took actual legal knowledge to draft. Now, it takes a free OpenAI account. The requester spends ten seconds generating the text. Your operations team then spends three weeks manually redacting PDFs. The asymmetry of effort is brutal.
You might think you can reject these outright. You can't. The ICO addressed this exact scenario at the Data Protection Practitioners Conference 2025. They confirmed that a request isn't invalid just because an algorithm wrote it. You still have to process the underlying demand.
Osborne Clarke's December 2025 HR guidance highlights how these automated demands are breaking SME compliance teams. They note that employees increasingly use these tools to intentionally disrupt operations. Founders are pulling senior staff off active revenue-generating projects just to read through thousands of Slack messages and Outlook threads.
The problem persists because regulators built the legal framework for humans asking humans for specific files. They didn't build it for machines generating maximum-friction legal traps. When an LLM asks for all associated processing metadata, your junior accounts assistant doesn't know what that means. They just know the 30-day deadline is ticking and the fines are massive.
Why throwing basic AI at the problem fails
Using a standard ChatGPT Plus subscription to parse and redact your company data is a massive breach of privacy that actively creates more legal liability. Most SMEs think they can fight AI with AI. They receive a synthetic DSAR payload and try to solve it with a prompt.
The usual instinct is to export all emails to a CSV, upload it to a web-based LLM, and ask the tool to remove other people's names. Here's what actually happens: When you upload a 50MB file of raw Slack exports into a public AI interface, you act as a data controller sharing personally identifiable information with an unauthorised third-party processor. You just breached GDPR to comply with GDPR. The fines for unauthorised data sharing often exceed the penalties for a delayed DSAR response.
Even if you pay for an enterprise tier with zero data retention, the technical approach misses the mark. Off-the-shelf LLMs fail at redaction because they are next-token predictors, not rule-based search engines.
They guess the most likely next word based on patterns. If you ask a model to redact the name Sarah, it will dutifully replace the word Sarah with a black box. But it will completely ignore the surrounding context. It leaves phrases like "the new marketing director who started last month" or "John's wife" intact. The identity is still entirely exposed. Any employee reading that document knows exactly who the text refers to.
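You can see the same failure with a few lines of exact-match string replacement, which behaves no better on context. The message text and names below are invented for illustration:

```python
import re

def naive_redact(text: str, names: list[str]) -> str:
    """Blank out exact name matches and nothing else."""
    for name in names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[REDACTED]", text)
    return text

msg = "Sarah approved the pay cut. John's wife, the new marketing director, objected."
print(naive_redact(msg, ["Sarah"]))
```

The name disappears, but "John's wife" and the job title survive, and either one identifies the person to any colleague reading the export.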
In my experience running test data through these models, standard prompt-based redaction misses contextual identifiers in about one out of every five documents. A human reviewer then has to read the entire output anyway to catch the mistakes.
You end up paying for a tool that doesn't save you time. You just shifted the bottleneck from initial reading to quality control. The 30-day deadline keeps ticking, and your legal exposure hasn't dropped at all.
The local redaction and extraction pipeline

The local extraction pipeline showing n8n routing Microsoft Purview exports to the Claude API for filtering, bypassing standard web interfaces.
A secure extraction pipeline combines deterministic search tools to gather the data with a strict, API-driven LLM to evaluate relevance before any human reads it. You don't want a human reading 10,000 irrelevant emails. You also don't want an LLM guessing at redactions.
Take a specific example. A former employee demands all internal communications mentioning their performance. Your initial search across Microsoft 365 yields 4,500 emails and Teams messages.
Step one is containment. You use Microsoft Purview eDiscovery to run the initial keyword and date-bound search across Outlook and Teams. You export this as a secure JSON file.
Step two is filtering. An n8n workflow picks up the raw export. n8n chunks the text and calls the Claude API. You use Claude because it returns strict JSON reliably when instructed. You configure the API call to drop the temperature to zero. You want absolutely deterministic behaviour, not creative thinking.
The prompt forces a strict schema output. It asks Claude one simple question: Does this message explicitly discuss the subject's performance? Return true or false.
Claude isn't reading to redact. It's reading to filter. The 4,500 messages drop to 120 actual hits.
Step three is deterministic redaction. For those 120 messages, you skip the LLM entirely. You route the text through a dedicated natural language processing tool like AWS Comprehend. Comprehend uses named entity recognition to identify third-party names, addresses, and financial details. It replaces them with fixed tags like [PERSON_1] or [LOCATION_1]. Because it operates on deterministic rules rather than generative guesses, the redaction is consistent.
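A sketch of the tagging step, assuming entity dicts in the shape boto3's `detect_pii_entities` returns (Type, BeginOffset, EndOffset); the sample text and tag naming are illustrative:

```python
def tag_entities(text: str, entities: list[dict]) -> str:
    """Replace each entity span with a stable numbered tag like [NAME_1].

    The same surface string always gets the same tag, so a person stays
    consistently labelled across a whole document. `entities` would come
    from something like:
        boto3.client("comprehend").detect_pii_entities(
            Text=text, LanguageCode="en")["Entities"]
    """
    tags: dict = {}    # (entity type, lowercased surface text) -> tag
    counts: dict = {}  # entity type -> next index
    # First pass assigns tags in reading order.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"]):
        key = (ent["Type"], text[ent["BeginOffset"]:ent["EndOffset"]].lower())
        if key not in tags:
            counts[ent["Type"]] = counts.get(ent["Type"], 0) + 1
            tags[key] = f"[{ent['Type']}_{counts[ent['Type']]}]"
    # Second pass splices right-to-left so earlier offsets stay valid.
    out = text
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        key = (ent["Type"], text[ent["BeginOffset"]:ent["EndOffset"]].lower())
        out = out[:ent["BeginOffset"]] + tags[key] + out[ent["EndOffset"]:]
    return out
```

Because the mapping is rule-based, running the same export twice produces identical tags, which is exactly what a reviewer signing off the final PDF needs.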
This system takes about 2-3 weeks of build time. Expect to spend £6k-£12k depending on how messy your existing Microsoft 365 or Google Workspace permissions are.
The main failure mode is rate limiting. When n8n fires 4,500 concurrent requests at the Claude API, the endpoint will throttle you and the workflow will die. And yes, that's annoying. You catch this by building exponential backoff into the n8n HTTP request node. You force the system to pause and retry when it hits a limit.
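n8n's HTTP Request node has retry settings for this, but the logic amounts to the sketch below. `RateLimitError` stands in for whatever HTTP 429 exception your API client actually raises:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the API client's HTTP 429 exception."""

def call_with_backoff(send, max_attempts: int = 6, base_delay: float = 1.0):
    """Retry send() on rate limits, doubling the pause each attempt.

    Jitter (a random extra fraction of the base delay) stops thousands
    of throttled requests from all retrying in the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the failure to the workflow
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

With the defaults above, the pauses run roughly 1s, 2s, 4s, 8s, 16s before the workflow gives up and flags the batch.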
Your HR manager now only reviews 120 pre-redacted messages instead of 4,500 raw emails. They do a final sanity check, approve the generated PDF, and send it off to the former employee. You hit the 30-day deadline with weeks to spare, and your core team never had to stop their actual work.
Where the automated pipeline breaks down
Automated extraction pipelines fail completely when your historical business data is trapped in scanned images or on legacy on-premise servers. This isn't a magic fix for bad data hygiene.
Before you commit to building a custom n8n flow, you need to check your inputs. If your HR records, disciplinary notes, or supplier disputes exist as scanned TIFFs from a legacy filing system, the text is invisible to an API. You need to run Optical Character Recognition first to convert those pixels into text.
Once you introduce OCR, the error rate jumps from near zero to roughly 15%. A smudge on a scanned 2021 performance review turns the name Colin into Coin. Your deterministic redaction tool will miss it entirely. The unredacted name slips through to the final export.
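One cheap mitigation is a fuzzy second pass that flags tokens suspiciously close to names you already know about, then queues them for human review. A sketch using Python's standard library, assuming the name list comes from your HR records and a 0.75 similarity cutoff (both are assumptions, not prescribed by anything above):

```python
import difflib
import re

def flag_ocr_near_misses(text: str, known_names: list[str],
                         cutoff: float = 0.75) -> list[str]:
    """Return tokens that look like OCR-garbled versions of known names.

    Exact matches are skipped (deterministic redaction already caught
    them); anything merely similar, like 'Coin' for 'Colin', gets
    flagged for a human to check.
    """
    lowered = [n.lower() for n in known_names]
    flagged = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token.lower() in lowered:
            continue  # already handled by the exact-match redaction pass
        if difflib.get_close_matches(token.lower(), lowered, n=1, cutoff=cutoff):
            flagged.append(token)
    return flagged
```

This doesn't fix the OCR; it just stops the smudged names slipping silently into the final export.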
The same applies to shadow IT. If your sales reps use WhatsApp on personal phones to discuss client accounts, no Microsoft Purview search will find those messages. No API can pull that data, either technically or legally. You are back to asking staff to manually screenshot their phones.
Check where your data actually lives. If it's unstructured and offline, fix your storage before you try to automate your compliance.
Three mistakes to avoid
- Don't ignore the request just because a machine wrote it. It's tempting to delete a four-page demand that clearly came from ChatGPT. Don't do this. The ICO has made it clear that the origin of the text doesn't invalidate the legal right of the individual. If you ignore it, the requester will escalate to the regulator. You will miss your 30-day window and face an immediate compliance investigation.
- Don't dump raw system exports onto the requester to save time. When faced with a massive request, some founders just export the entire Slack channel history and hand it over. This is a catastrophic error. That export contains personal data of other employees, proprietary business logic, and third-party supplier details. You solve the DSAR deadline but instantly trigger a massive data breach. Always filter and redact.
- Don't try to build your extraction pipeline while the clock is ticking. You can't design, test, and ship a secure n8n and Claude API workflow in the middle of an active 30-day compliance window. You will rush the JSON schema, skip the exponential backoff, and leak data in the testing phase. Build the system when your inbox is quiet. Test it on dummy data. Have it ready on the shelf for when the inevitable request lands.
Get our UK AI insights.
Practical reads on AI for UK businesses — teardowns, how-to guides, regulatory news. Unsubscribe anytime.