A year ago, a developer on a team I was advising pasted a production database dump into Copilot Chat to ask why a query was slow. He got a helpful answer. He also sent several thousand records of customer PII to a third-party inference endpoint. Nothing was logged on our side. Nothing was flagged. He did not realize any of this had happened until I explained it to him.

Every AI workflow is a new data egress vector. Most organizations have not drawn the map. Here is how I think about it.

Five distinct leak paths

Training data leakage. If your vendor uses your prompts to train their next model, your data may resurface as someone else's completion. Most enterprise tiers now contractually exclude training use. Free and consumer tiers frequently do not. Read the terms you agreed to, not the marketing page.

Prompt-as-exfil. A user pastes sensitive data into a prompt. The data is now on the vendor's servers, regardless of whether the model trains on it. Retention periods in the contract matter here. "Thirty days of operational logs" is a very different posture from "never stored."

Secrets in prompts. API keys, passwords, connection strings. Developers paste them accidentally while debugging. Engineers include them in RAG corpora because nobody scanned the source docs. Once the secret is in someone else's logs, rotating it is your only defense, and you have to notice first.
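A pre-send scan catches the accidental paste before it leaves the building. A minimal sketch, with illustrative patterns only (real scanners ship far larger rule sets, and every name here is hypothetical):

```python
import re

# Illustrative patterns only -- a real deployment would use a maintained
# rule set, not three hand-rolled regexes.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(r"(?i)\b(api[_-]?key|secret)\b\s*[:=]\s*\S{16,}"),
    "connection_string": re.compile(r"\b\w+://[^:\s]+:[^@\s]+@[\w.-]+"),
}

def scan_for_secrets(text: str) -> list[str]:
    """Return the names of any secret patterns found in the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

prompt = "debug this: postgres://app:hunter2@db.internal/prod"
hits = scan_for_secrets(prompt)
if hits:
    print(f"prompt blocked, possible secrets: {hits}")
```

The point of running this client-side, before the request goes out, is that blocking is cheap and rotation is expensive: a false positive costs the user a retry, a false negative costs you a credential.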

Retrieval leakage across users. A RAG system pulls a document into context that the current user should not have access to. The model dutifully summarizes it. ACLs enforced at the UI layer but not at the retrieval layer are the classic failure mode.
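The fix is structural: filter retrieved chunks against the user's permissions before anything enters the context window. A sketch under assumed names (the `ACL` store, group model, and `Chunk` type are all hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str

# Hypothetical ACL store: doc_id -> set of groups allowed to read it.
ACL = {
    "handbook": {"*"},          # public
    "salaries-2024": {"hr"},    # restricted
}

def user_can_read(user_groups: set[str], doc_id: str) -> bool:
    allowed = ACL.get(doc_id, set())  # default deny for unknown docs
    return "*" in allowed or bool(user_groups & allowed)

def filter_retrieved(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    # Enforce the ACL at the retrieval layer, not the UI: anything the
    # user cannot read never reaches the model at all.
    return [c for c in chunks if user_can_read(user_groups, c.doc_id)]

hits = [Chunk("handbook", "..."), Chunk("salaries-2024", "...")]
visible = filter_retrieved(hits, user_groups={"engineering"})
```

Default deny on unknown documents matters as much as the filter itself: a chunk whose ACL entry was never written should be invisible, not public.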

Output leakage into downstream systems. The model writes a summary into a ticket that gets emailed to a vendor. Sensitive details were in the prompt, now they are in the summary, now they are in a third party's inbox. The data crossed a boundary and nobody logged it.

Reading vendor data use clauses

Three clauses matter more than the rest.

First, training use. "We do not train on your data" is the baseline. Make sure it is in the contract, not the FAQ.

Second, retention. How long are the raw prompt and completion stored? Where? Who at the vendor can see them? Thirty days is typical. Zero retention is available from most major vendors on enterprise tiers, sometimes at a premium. For regulated data, pay the premium.

Third, subprocessors. Many providers subcontract parts of the stack (hosting, moderation, evaluation). Your data goes to their subprocessors too. The list should be disclosed and updated. If it is not, that is a red flag.

Auditing patterns that work

You cannot secure what you cannot see. For every AI integration we run, the pipeline logs: the full prompt, the retrieved context identifiers (not the content, to avoid doubling the exposure), the model response, and the user. This lives in our own logging stack with the same retention policy as other sensitive logs. When a question comes up, forensic reconstruction is possible. When it does not, it mostly sits there.
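The record itself can be one JSON line per inference call. A sketch of the shape described above (field names are assumptions, not a standard):

```python
import json
import datetime

def audit_record(user: str, prompt: str, context_ids: list[str], response: str) -> str:
    """One JSON line per inference call, destined for our own log stack.

    We log retrieved-context *identifiers*, not content, so the audit
    trail does not double the exposure of the underlying documents.
    """
    rec = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "prompt": prompt,
        "context_ids": context_ids,
        "response": response,
    }
    return json.dumps(rec)

line = audit_record("alice", "summarize Q3 churn", ["doc-412", "doc-97"], "Churn rose...")
parsed = json.loads(line)
```

One line per call means the forensic question "what did this user send, and what came back?" is a grep, not a project.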

DLP at the egress helps too. Our web proxy flags requests to known AI inference endpoints that contain patterns matching API keys, credit-card numbers, or social security numbers. It is noisy. It has also caught things I am glad we caught.

Isolated inference environments

For the data where the risk is real and the regulatory exposure is high, the answer is to keep inference inside your perimeter. We run a small Llama deployment on vLLM inside a VPC with no internet egress for exactly this class of workload. Quality is not as good as Claude's. It also does not have to be frontier quality to beat the alternative of "we cannot touch this data with AI at all."
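Operationally this is less exotic than it sounds. vLLM exposes an OpenAI-compatible API, so existing client code mostly just needs a different base URL. A sketch, with the hostname and model choice as placeholders:

```shell
# Serve Llama on vLLM inside the VPC (model and host are illustrative).
vllm serve meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 8000

# Clients inside the perimeter point at the internal endpoint instead of
# the vendor's. No request ever leaves the VPC.
#   export OPENAI_BASE_URL=http://llm.internal.example:8000/v1
```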

The organizational piece

None of this works without policy. Ours is short: two paragraphs that say what data classes go where, what the sanctioned tools are, and what happens if you send customer data to a consumer AI endpoint. The point of the policy is not to punish. It is to give people a clear enough rule that they can self-correct before they paste.

Then you pair that with sanctioned alternatives that are actually good. If your enterprise AI tool is worse than ChatGPT, your policy is a piece of paper.