The moment you paste a secret, a customer's email, or an internal pricing sheet into an AI prompt, you have lost control of that data — understanding what to keep out of context is the first rule of responsible AI use.
Categories of Sensitive Data That Must Never Enter AI Context
AI assistants operate by sending your input — your entire prompt — to a remote server for processing. Even if the provider claims not to train on your data, that data travels over a network, lands on infrastructure you do not control, and may be logged for abuse monitoring, debugging, or compliance purposes. This means the discipline of context hygiene is not optional; it is a baseline professional responsibility.
The following categories of data should be treated as off-limits by default:
Credentials and secrets. API keys, OAuth tokens, database connection strings, private SSH keys, JWT signing secrets, and any value that grants access to a system. Even a partial key or a rotated key that is "no longer active" should be excluded — pasting it trains bad habits, and you may be wrong about whether it is truly inactive.
Personally Identifiable Information (PII). Real names paired with emails or phone numbers, government ID numbers, passport details, dates of birth, location history, and any other data that could identify a specific person. This includes your own colleagues' information — do not paste a Slack thread containing someone's personal details to ask the AI to summarize it.
Customer data. Production database records, support ticket contents that contain end-user information, analytics events tied to real users, and payment-related data (card numbers, bank account details) are all in this category. The bar here is high: even if data has been partially anonymized, residual identifiers may still make it PII under regulations like GDPR.
Internal confidential information. Unreleased roadmaps, financial projections, acquisition targets, internal pricing models, legal correspondence, and intellectual property that has not been publicly disclosed. Sharing this with a cloud AI tool is equivalent to sharing it with an external contractor without an NDA.
Learning tip: Create a personal pre-prompt checklist — before every AI interaction, spend five seconds asking: "Does this text contain a secret, a real person's details, a customer record, or something under NDA?" That five-second pause will prevent the majority of accidental disclosures.
How AI Tool Providers Handle Your Context Data
Understanding provider data policies is not bureaucratic box-checking — it changes how you should use each tool. Policies vary significantly across providers and product tiers.
Consumer vs. enterprise tiers. The free or personal tier of most AI tools (including ChatGPT Free, Claude.ai free, and Gemini consumer) may use your conversations to improve the model. Enterprise and API tiers typically include contractual commitments not to train on your data. If your company has not purchased the enterprise tier, your engineers using the consumer product are operating without those protections.
What "not training on data" actually means. Even providers who contractually commit to not training on your data still receive and process it. It transits their network, may be held temporarily in memory or logs, and could be accessed by employees for abuse investigation. "Not training" does not mean "never stored" — it means the data is not used to update model weights.
The API vs. the product interface. If you call an AI model via its API with an enterprise agreement, the data handling terms are defined in your contract. If an employee uses the web chat interface under a free account, those terms may differ entirely. Engineering teams must decide which interface is approved and communicate that clearly.
Learning tip: Bookmark your primary AI tool's data processing agreement (DPA) and read the section on "data retention" and "training." If you cannot find a DPA, that tool is consumer-grade and should not receive any data your company classifies as confidential.
Practical Substitution Patterns
The goal is not to make AI useless by never sharing context — it is to share the structural information the AI needs while stripping out the sensitive values.
Replace secret values with clearly fake placeholders.
Instead of:
```
My database URL is postgres://admin:s3cr3tPass@prod-db.internal:5432/users
```
Use:
```
My database URL is postgres://USER:PASSWORD@HOST:PORT/DB_NAME
```
The AI can still help you parse the connection string, write a connection pool, or debug a timeout — it does not need the actual credentials.
Mask PII before sharing.
Instead of:
```
User Jane Doe ([email protected], DOB 1985-03-14) is getting a 403 on checkout.
```
Use:
```
A user (email: [USER_EMAIL], DOB: [REDACTED]) is getting a 403 on checkout.
```
The AI can still diagnose the authorization error without knowing who the user is.
Summarize instead of paste.
Instead of pasting a raw CSV of production records, describe the schema:
```
I have a users table with columns: id (uuid), email (varchar), created_at (timestamp), subscription_tier (enum: free/pro/enterprise).
```
The AI can write queries, migrations, and data processing logic without ever seeing a single real row.
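When the schema lives in a database rather than in migration files, you can extract structure without any rows. A minimal sketch, assuming Postgres and the hypothetical `users` table above:

```bash
# Dump DDL only (no data), limited to one table. Skim the output before
# pasting it anywhere: comments and defaults can still reveal internal names.
pg_dump --schema-only --table=users "$DATABASE_URL" > users_schema.sql
```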
Learning tip: Build a VS Code snippet or shell alias that automatically wraps selected text in `[REDACTED: REASON]` tags. This makes masking fast enough that engineers will actually do it under time pressure.
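A minimal shell take on the same idea, assuming macOS's `pbpaste`/`pbcopy` (substitute `xclip` or `wl-copy` on Linux) and two illustrative patterns, emails and long token-like strings:

```bash
# Mask common sensitive patterns in the clipboard, then put the masked
# text back so it is safer to paste into a chat window.
mask-clipboard() {
  pbpaste \
    | sed -E 's#[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}#[REDACTED: EMAIL]#g' \
    | sed -E 's#[A-Za-z0-9_/+-]{40,}#[REDACTED: POSSIBLE TOKEN]#g' \
    | pbcopy
}
```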
How to Review Prompts Before Sending
Developing a habit of reviewing prompts before hitting Enter is like reviewing code before pushing — it takes ten seconds and catches mistakes that would otherwise be expensive.
A practical review involves scanning your prompt for:
- Any sequence that looks like a token or key (long random strings, strings with colons that look like `key:value` auth headers)
- Email addresses, phone numbers, or names that refer to real people
- File paths that reveal internal system architecture (e.g., `/home/prod-user/services/payment-gateway/`)
- SQL snippets that contain real table names from a production schema you have not disclosed
- Error messages that include stack traces with internal hostnames or IP addresses
If you find any of these, apply substitution before sending.
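For long prompts, a mechanical scan is a useful backstop to the eyeball pass. A minimal sketch, assuming the draft prompt is saved to a file first; the patterns map roughly to the checklist above and are illustrative, not exhaustive:

```bash
#!/bin/bash
# Pre-send prompt scan. Usage: ./scan-prompt.sh prompt.txt
# Patterns (in order): long random strings that could be tokens, email
# addresses, HTTP auth headers, home-directory paths, IPv4 addresses.
grep -nE \
  -e '[A-Za-z0-9_/+-]{32,}' \
  -e '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' \
  -e 'Authorization: *(Bearer|Basic)' \
  -e '/home/[^[:space:]]+' \
  -e '([0-9]{1,3}\.){3}[0-9]{1,3}' \
  "$1" && echo "Matches found: review before sending."
```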
Learning tip: Treat your prompt review like a diff review — read it as if a stranger were going to see it. If you would not paste this into a public GitHub issue, do not paste it into a consumer AI tool.
Tooling to Detect Secrets Before They Leave Your Machine
Relying solely on human vigilance is not sufficient at scale. Use tooling as a backstop.
git-secrets scans commits and staged files for patterns that look like credentials. It is lightweight and integrates as a pre-commit hook, which means it catches secrets before they enter version control — a natural checkpoint before they might be pasted anywhere.
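A typical per-repository setup uses git-secrets' own commands; the custom pattern added below (a classic GitHub personal access token) is just one example — add patterns that match your own stack:

```bash
# One-time setup in a repository: install the hook, register the
# built-in AWS patterns, and add a custom pattern of your own.
git secrets --install
git secrets --register-aws
git secrets --add 'ghp_[A-Za-z0-9]{36}'

# On-demand scans of the working tree and of full history:
git secrets --scan
git secrets --scan-history
```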
truffleHog performs deep scanning using entropy analysis and regex pattern matching to find secrets in git history, file systems, and even S3 buckets. It is useful for retrospective audits of repositories.
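For example, with the TruffleHog v3 CLI (subcommand names have changed across major versions, so check your installed version):

```bash
# Scan the full git history of the repository in the current directory.
trufflehog git file://.

# Scan a plain directory tree that never went through git.
trufflehog filesystem /path/to/project
```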
gitleaks is a fast, configurable secret scanner that supports custom rule sets and integrates with CI pipelines. You can run it as a GitHub Actions step or a local pre-commit hook.
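A typical one-shot scan, run locally or as a CI step:

```bash
# Scan the repository at the current path; --redact masks any secret
# values it finds so they do not end up in CI logs, -v lists findings.
gitleaks detect --source . --redact -v
```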
Clipboard managers with redaction. Some clipboard tools can be configured to auto-clear clipboard contents after a timeout, reducing the risk that a copied secret sits in your clipboard while you are working in an AI chat window.
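Where your clipboard manager lacks that feature, a shell function can approximate it. A rough sketch assuming macOS's `pbcopy` (substitute `xclip` or `wl-copy` on Linux):

```bash
# Copy stdin to the clipboard, then clear the clipboard after 45 seconds
# so a copied secret does not linger while you work in a chat window.
copy-briefly() {
  pbcopy
  ( sleep 45; pbcopy < /dev/null ) &
}

# Usage: printf '%s' "$DB_PASSWORD" | copy-briefly
```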
None of these tools scan what you type directly into an AI chat. The human review step before sending remains irreplaceable — but tooling reduces the surface area of accidental exposure through adjacent channels.
Learning tip: Add gitleaks or git-secrets as a pre-commit hook on every new repository as part of your standard project setup template. Making it the default means it is always there without requiring individual engineers to remember to install it.
Hands-On: Building a Context Hygiene Habit
Work through this exercise to establish a personal and team-level context hygiene practice.
Step 1: Audit a recent prompt you sent.
Look at your AI chat history from the past week. Copy one of the longer prompts you sent and paste it into a text editor. Scan it using the checklist above.
```
Review the following prompt text for security and privacy risks. Identify any secrets, PII, internal confidential data, or system details that should have been redacted before sending. For each issue found, suggest a safe substitution.

[PASTE YOUR PREVIOUS PROMPT HERE]
```
Expected result: The AI will identify specific phrases that represent exposure risks and suggest masked alternatives. This builds your pattern recognition for future prompts.
Step 2: Create a sanitized version of a real debug scenario.
Take a real bug you are currently investigating. Write the prompt twice: once with the actual details, once with all sensitive information substituted. Compare them.
```
I am debugging a 500 error in our payment processing service. The error occurs when a user with subscription tier "enterprise" attempts to upgrade to the "enterprise_plus" tier. The relevant code calls an external billing API. Here is the stack trace (hostnames replaced with [HOST], tokens replaced with [TOKEN]):

[PASTE SANITIZED STACK TRACE]

What are the most likely causes and how should I investigate?
```
Expected result: You should find that the sanitized prompt gets equally useful debugging guidance. This demonstrates that sensitive details rarely add value to the AI's response.
Step 3: Install gitleaks as a pre-commit hook.
```bash
# Install gitleaks (Homebrew shown; prebuilt binaries are also available).
brew install gitleaks

# Write a pre-commit hook that scans staged changes. gitleaks exits
# nonzero when it finds a secret, which aborts the commit.
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
gitleaks protect --staged -v
EOF
chmod +x .git/hooks/pre-commit
```
Expected result: The next time you attempt to commit a file containing a pattern that matches a known secret format, gitleaks will block the commit and show you which file and line triggered the alert.
Step 4: Write your team's context hygiene one-pager.
```
Help me write a one-page context hygiene policy for a 12-person engineering team that uses Claude and GitHub Copilot daily. The policy should cover: what data categories are prohibited from AI context, what substitution patterns to use, how to handle accidental disclosure, and who to notify if a disclosure occurs. Keep it practical and short enough that engineers will actually read it.
```
Expected result: A draft policy document ready for team review, typically 400–600 words covering prohibitions, substitutions, incident response, and tooling recommendations.
Step 5: Build a pre-prompt template.
Create a reusable template to prepend to a prompt whenever you are in doubt:
```
CONTEXT HYGIENE REMINDER: I have reviewed this prompt and confirmed it does not contain: (1) API keys, tokens, or credentials, (2) real PII including names, emails, or government IDs, (3) customer data from production systems, (4) unreleased roadmap, financial, or legal information. All sensitive values have been replaced with [PLACEHOLDER] notation.

[YOUR ACTUAL PROMPT FOLLOWS]
```
Expected result: Using this template as a mental forcing function trains the habit of review before every send.
Key Takeaways
- Secrets, PII, customer data, and confidential internal information should never appear in AI context — use placeholder substitution to provide structure without sensitive values.
- Consumer and enterprise AI tiers have meaningfully different data handling agreements; verify which tier your team is using before treating it as safe for business data.
- Prompt review before sending is the most important human-layer control — five seconds of scanning catches most accidental disclosures.
- Tool-layer controls (gitleaks, truffleHog, git-secrets) reduce exposure through adjacent channels but do not replace the human review step.
- A written team context hygiene policy, even a short one, dramatically improves consistency — ambiguity about what is allowed leads to the worst-case defaults.