How to Redact PII from OpenAI API Calls
Every time your application calls the OpenAI API with user data, that data lands on OpenAI's servers, in its logs, and potentially in its training pipelines. That includes names, email addresses, phone numbers, Social Security numbers, and credit card numbers.
With EU AI Act enforcement starting in August 2026, on top of existing regulations like GDPR, CCPA, and HIPAA, sending unredacted PII to a third-party API is becoming a legal liability, and redaction is no longer just a best practice.
This guide shows three approaches to redacting PII from OpenAI API calls, from manual to fully automated.
The Problem
Here's a typical customer support summarization call:
import openai

# ticket_text holds the raw ticket body, PII included
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Summarize this ticket: {ticket_text}"
    }]
)
If ticket_text contains "Hi, my name is Sarah Johnson, email sarah.j@acme.com, SSN 078-05-1120", all of that goes to OpenAI. Even if OpenAI doesn't train on API data, the data still traverses their infrastructure, sits in logs, and is subject to their data processing terms.
Approach 1: Manual Regex (Fragile)
import re

def strip_pii(text):
    # Order matters: match SSNs before the looser phone pattern
    text = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[EMAIL]', text)
    text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]', text)
    text = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]', text)
    return text
Problems:
- Doesn't catch names (regex can't do named entity recognition)
- Misses edge cases: SSNs with spaces, international phone formats, accented names
- Trivially evaded: zero-width characters, Cyrillic lookalike letters
- No restoration: the response comes back with [EMAIL] instead of the original address
- You maintain the regex forever
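To make the evasion point concrete, here is a minimal sketch showing a zero-width space defeating the SSN pattern, along with one possible mitigation: dropping invisible format characters and applying NFKC normalization before matching. The `normalize_for_matching` helper is hypothetical and only addresses invisible characters. Cyrillic lookalike letters survive NFKC and would need a separate confusables mapping.

```python
import re
import unicodedata

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def normalize_for_matching(text: str) -> str:
    """Drop invisible format characters (Unicode category Cf), then apply NFKC."""
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return unicodedata.normalize("NFKC", visible)

# A zero-width space (U+200B) is hidden just before the second hyphen.
evasive = "SSN 078-05\u200b-1120"

raw_match = SSN_RE.search(evasive)                           # pattern is defeated
clean_match = SSN_RE.search(normalize_for_matching(evasive))  # pattern matches again

print(raw_match)
print(clean_match.group() if clean_match else None)
```

Normalizing a copy of the text for detection while redacting the original is a design choice left out of this sketch; it matters if the exact input must be preserved.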
Approach 2: Microsoft Presidio (Better, But Assembly Required)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

results = analyzer.analyze(text=ticket_text, language="en")
anonymized = anonymizer.anonymize(text=ticket_text, analyzer_results=results)

# Now call OpenAI with anonymized.text
# But you still need to: build the proxy, handle streaming,
# map placeholders back, handle tool_calls, prevent evasion...
Problems:
- Presidio is a library, not a service — you build and maintain the proxy
- No built-in response restoration
- No streaming support
- No evasion resistance (zero-width chars, homoglyphs pass through)
- Doesn't handle OpenAI's content array format, tool_calls, or function_call fields
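The missing restoration step can be sketched without Presidio itself. The example below assumes detection has already produced non-overlapping character spans (Presidio's analyzer results expose `start`, `end`, and `entity_type`). The `Span` tuple, the `pseudonymize`, `restore`, and `span_of` helpers, and the `<TYPE_n>` placeholder format are illustrative assumptions, not part of Presidio.

```python
from typing import NamedTuple

class Span(NamedTuple):
    """A detected entity; mirrors the start/end/entity_type fields of a Presidio result."""
    start: int
    end: int
    entity_type: str

def pseudonymize(text: str, spans: list[Span]) -> tuple[str, dict[str, str]]:
    """Swap each span for a numbered placeholder.

    Returns the rewritten text and a placeholder -> original mapping.
    Assumes spans do not overlap.
    """
    mapping: dict[str, str] = {}
    by_value: dict[tuple[str, str], str] = {}  # repeated values reuse one placeholder
    pieces: list[str] = []
    cursor = 0
    for span in sorted(spans):  # NamedTuples sort by start offset first
        value = text[span.start:span.end]
        key = (span.entity_type, value)
        if key not in by_value:
            index = sum(1 for t, _ in by_value if t == span.entity_type) + 1
            by_value[key] = f"<{span.entity_type}_{index}>"
            mapping[by_value[key]] = value
        pieces += [text[cursor:span.start], by_value[key]]
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a model response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

def span_of(text: str, value: str, entity_type: str) -> Span:
    """Stand-in for a real detector: locate a known value in the text."""
    start = text.index(value)
    return Span(start, start + len(value), entity_type)

ticket = "Hi, my name is Sarah Johnson, email sarah.j@acme.com"
spans = [
    span_of(ticket, "Sarah Johnson", "PERSON"),
    span_of(ticket, "sarah.j@acme.com", "EMAIL_ADDRESS"),
]
safe_text, mapping = pseudonymize(ticket, spans)
print(safe_text)

model_reply = "Summary: <PERSON_1> asked for a callback at <EMAIL_ADDRESS_1>."
print(restore(model_reply, mapping))
```

The mapping has to live somewhere for the duration of the request, which is part of why a library alone leaves you building a proxy around it.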
Approach 3: Veil (One Line Change)
Veil is a drop-in proxy that handles all of this automatically. Change your base URL and you're done:
import os
from openai import OpenAI

# Before
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

# After: PII never reaches OpenAI
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://veil-api.com/v1",
    default_headers={
        "Authorization": f"Bearer {os.environ['VEIL_API_KEY']}",
        "x-upstream-key": os.environ["OPENAI_API_KEY"],
    }
)
What happens behind the scenes:
- Your app sends the request to Veil
- Veil detects and replaces 79+ types of PII with cryptographic tokens
- The sanitized request goes to OpenAI
- OpenAI's response comes back with tokens
- Veil restores the original values
- Your app gets a clean response with real names, emails, etc.
What OpenAI sees: <<VEIL_PERSON_a8f2c3d1e4f5>> instead of "Sarah Johnson".
What your app gets back: "Sarah Johnson" — fully restored.
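Restoration is harder when the response is streamed, because a token can be split across chunks. The sketch below is not Veil's implementation. It only illustrates one way a proxy could hold back a possibly incomplete token until the rest arrives, using the token shape shown above. The `StreamRestorer` class, the regex, and the mapping are all assumptions made for illustration.

```python
import re

# Assumed token shape, based on the example <<VEIL_PERSON_a8f2c3d1e4f5>>.
TOKEN_RE = re.compile(r"<<VEIL_[A-Z_]+_[0-9a-f]+>>")

class StreamRestorer:
    """Restore tokens in streamed text, buffering any unfinished token tail."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping  # token -> original value
        self.buffer = ""

    def _safe_length(self) -> int:
        """Length of the buffer prefix that cannot contain a partial token."""
        start = self.buffer.rfind("<<")
        if start != -1 and self.buffer.find(">>", start) == -1:
            return start                 # "<<" opened but not yet closed
        if self.buffer.endswith("<"):
            return len(self.buffer) - 1  # a lone "<" might become "<<"
        return len(self.buffer)

    def _substitute(self, text: str) -> str:
        return TOKEN_RE.sub(lambda m: self.mapping.get(m.group(), m.group()), text)

    def feed(self, chunk: str) -> str:
        """Accept one streamed chunk; return the text that is safe to emit now."""
        self.buffer += chunk
        cut = self._safe_length()
        ready, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return self._substitute(ready)

    def flush(self) -> str:
        """Emit whatever is still buffered once the stream ends."""
        ready, self.buffer = self.buffer, ""
        return self._substitute(ready)

restorer = StreamRestorer({"<<VEIL_PERSON_a8f2c3d1e4f5>>": "Sarah Johnson"})
chunks = ["Hello <<VEIL_PER", "SON_a8f2c3d1e4f5>>, your ticket", " is closed."]
emitted = [restorer.feed(c) for c in chunks] + [restorer.flush()]
print("".join(emitted))
```

This version buffers indefinitely if the text contains an unclosed `<<` (for example a bit-shift operator in code), so a production version would likely cap the held-back length.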
What Veil Catches That Regex Doesn't
| Category | Examples |
|---|---|
| Personal info | Names, emails, phones, SSNs, addresses across 18 countries |
| Financial | Credit cards, IBANs, bank account numbers, routing numbers |
| Government IDs | Passports, driver's licenses, national IDs (US, UK, DE, IT, IN, KR, etc.) |
| Secrets | AWS keys, GitHub tokens, Stripe keys, GCP keys, JWTs, private keys |
| Crypto | Ethereum, Bitcoin, Litecoin, Monero wallets |
| Evasion attempts | Zero-width chars, Cyrillic homoglyphs, accent-based evasion |
Works with Any Provider
Veil isn't just for OpenAI. Set the x-upstream-provider header to route through Anthropic, Together, Groq, Mistral, DeepSeek, Fireworks, Perplexity, or any of 41 supported providers. Same code, same PII protection.
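As a rough sketch of provider routing, the helper below builds the headers to pass as `default_headers` in the client configuration from Approach 3. The header names come from this article, but the `veil_headers` helper, the header value `"anthropic"`, the placeholder keys, and the idea that `x-upstream-key` carries the other provider's key are assumptions. Check Veil's documentation for the accepted values.

```python
import os
from typing import Optional

def veil_headers(veil_key: str, upstream_key: str,
                 provider: Optional[str] = None) -> dict[str, str]:
    """Build Veil request headers; provider is omitted to keep the default upstream."""
    headers = {
        "Authorization": f"Bearer {veil_key}",
        "x-upstream-key": upstream_key,
    }
    if provider is not None:
        headers["x-upstream-provider"] = provider  # value format is an assumption
    return headers

headers = veil_headers(
    veil_key=os.environ.get("VEIL_API_KEY", "veil-key-placeholder"),
    upstream_key=os.environ.get("ANTHROPIC_API_KEY", "upstream-key-placeholder"),
    provider="anthropic",
)
print(sorted(headers))
```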
Compliance Coverage
- GDPR — personal data never reaches third-party processors
- EU AI Act — enforcement begins August 2026
- CCPA — California consumer data stays in your control
- HIPAA — PHI redacted before hitting external APIs
Try Veil Free
100 requests/month on the free tier. No credit card required. Change one URL and your LLM calls are compliant.