How to Redact PII from OpenAI API Calls

April 3, 2026 · 5 min read

Every time your application calls the OpenAI API with user data, that data lands on OpenAI's servers, in its logs, and potentially in training pipelines if data sharing is enabled. Names, email addresses, phone numbers, Social Security numbers, credit card numbers: all of it.

With EU AI Act enforcement starting in August 2026 and regulations like GDPR, CCPA, and HIPAA already in force, sending raw PII to a third-party API is becoming a legal liability, and redaction is no longer just a best practice.

This guide shows three approaches to redacting PII from OpenAI API calls, from manual to fully automated.

The Problem

Here's a typical customer support summarization call:

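import openai  # the module-level client reads OPENAI_API_KEY from the environment

# ticket_text holds the raw, unredacted support ticket
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Summarize this ticket: {ticket_text}"
    }]
)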

If ticket_text contains "Hi, my name is Sarah Johnson, email sarah.j@acme.com, SSN 078-05-1120", all of that goes to OpenAI. Even if OpenAI doesn't train on API data, the data still traverses their infrastructure, sits in logs, and is subject to their data processing terms.

Approach 1: Manual Regex (Fragile)

import re

def strip_pii(text):
    text = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[EMAIL]', text)
    text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]', text)
    text = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]', text)
    return text

Problems:

Doesn't catch names (regex can't do named entity recognition)
Misses edge cases: SSNs with spaces, international phone formats, accented names
Trivially evaded: zero-width characters, Cyrillic lookalike letters
No restoration — the response comes back with [EMAIL] instead of the original address
You maintain the regex forever
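To make the evasion problem concrete, here is a minimal sketch of how a single zero-width space defeats the SSN pattern, along with one possible normalization pre-pass. The names normalize and redact_ssn are illustrative, not from any library, and the pre-pass only handles invisible characters and compatibility forms. Cyrillic lookalikes would still need a separate confusables map.

```python
import re
import unicodedata

# Invisible characters commonly used to split a pattern (illustrative, not exhaustive).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def normalize(text: str) -> str:
    """Fold compatibility forms (e.g. fullwidth hyphens) and drop zero-width characters."""
    return unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)


def redact_ssn(text: str) -> str:
    """Replace anything matching the SSN pattern with a fixed placeholder."""
    return SSN_RE.sub("[SSN]", text)


evasive = "SSN 078\u200b-05-1120"      # zero-width space hidden after "078"
print(redact_ssn(evasive))             # unchanged: the hidden character breaks the match
print(redact_ssn(normalize(evasive)))  # SSN [SSN]
```

Normalizing before matching closes this particular hole, but every new trick means another rule to write and test.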

Approach 2: Microsoft Presidio (Better, But Assembly Required)

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

results = analyzer.analyze(text=ticket_text, language="en")
anonymized = anonymizer.anonymize(text=ticket_text, analyzer_results=results)

# Now call OpenAI with anonymized.text
# But you still need to: build the proxy, handle streaming,
# map placeholders back, handle tool_calls, prevent evasion...

Problems:

Presidio is a library, not a service — you build and maintain the proxy
No built-in response restoration
No streaming support
No evasion resistance (zero-width chars, homoglyphs pass through)
Doesn't handle OpenAI's content array format, tool_calls, or function_call fields
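Response restoration is the piece you would most likely end up writing yourself. As a rough sketch of the idea (standard library only, emails only, and not how Presidio or any particular proxy does it), you can replace each distinct value with a numbered placeholder, keep the mapping, and reverse it on the model's reply. The redact and restore helpers and the <EMAIL_n> placeholder format are assumptions made for illustration.

```python
import re

EMAIL_RE = re.compile(r"\b[\w.-]+@[\w.-]+\.\w+\b")


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a numbered placeholder; return text and mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder

    def _sub(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            placeholder = f"<EMAIL_{len(seen) + 1}>"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(_sub, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a model response."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


safe, mapping = redact("Reply to sarah.j@acme.com and cc sarah.j@acme.com")
print(safe)                        # Reply to <EMAIL_1> and cc <EMAIL_1>
reply = "I will email <EMAIL_1> today."  # stand-in for the model's response
print(restore(reply, mapping))     # I will email sarah.j@acme.com today.
```

Reusing the same placeholder for repeated values lets the model tell that two mentions refer to the same person. The mapping has to live somewhere for the duration of the request, which is one more piece of state to manage.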

Approach 3: Veil (One Line Change)

Veil is a drop-in proxy that handles all of this automatically. Point your client at Veil's base URL, pass your keys as headers, and you're done:

import os
from openai import OpenAI

# Before
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

# After: PII never reaches OpenAI
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://veil-api.com/v1",
    default_headers={
        "Authorization": f"Bearer {os.environ['VEIL_API_KEY']}",
        "x-upstream-key": os.environ["OPENAI_API_KEY"],
    }
)

What happens behind the scenes:

Your app sends the request to Veil
Veil detects and replaces 79+ types of PII with cryptographic tokens
The sanitized request goes to OpenAI
OpenAI's response comes back with tokens
Veil restores the original values
Your app gets a clean response with real names, emails, etc.

What OpenAI sees: <<VEIL_PERSON_a8f2c3d1e4f5>> instead of "Sarah Johnson".

What your app gets back: "Sarah Johnson" — fully restored.
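Restoration gets trickier with streaming, because a token can arrive split across chunks. The sketch below shows one way a restorer might handle that: swap complete tokens, and hold back any trailing fragment that could still grow into one. It is not Veil's implementation. The StreamRestorer class, the in-memory mapping, and the length bound are assumptions, and the token pattern is inferred from the single example above.

```python
import re

# Token shape inferred from the example above: <<VEIL_TYPE_ + 12 hex chars + >>
TOKEN_RE = re.compile(r"<<VEIL_[A-Z_]+_[0-9a-f]{12}>>")
# A trailing "<", "<<word", or "<<word>" that might be the start of a token.
PARTIAL_RE = re.compile(r"<(<[A-Za-z0-9_]*>?)?\Z")
MAX_TOKEN_LEN = 64  # assumed upper bound; longer fragments are flushed as-is


class StreamRestorer:
    """Restores tokens in streamed text, buffering a tail that may be an unfinished token."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping  # token -> original value
        self.buffer = ""

    def _lookup(self, match: re.Match) -> str:
        # Unknown tokens are passed through unchanged.
        return self.mapping.get(match.group(0), match.group(0))

    def feed(self, chunk: str) -> str:
        """Accept one streamed chunk and return the text that is safe to emit now."""
        text = TOKEN_RE.sub(self._lookup, self.buffer + chunk)
        partial = PARTIAL_RE.search(text)
        if partial and len(text) - partial.start() <= MAX_TOKEN_LEN:
            keep_from = partial.start()
        else:
            keep_from = len(text)
        self.buffer = text[keep_from:]
        return text[:keep_from]

    def flush(self) -> str:
        """Emit whatever is still buffered once the stream ends."""
        out, self.buffer = self.buffer, ""
        return out


restorer = StreamRestorer({"<<VEIL_PERSON_a8f2c3d1e4f5>>": "Sarah Johnson"})
chunks = ["Hello <<VEIL_PER", "SON_a8f2c3", "d1e4f5>>, your ticket is closed."]
print("".join(restorer.feed(c) for c in chunks) + restorer.flush())
# Hello Sarah Johnson, your ticket is closed.
```

The trade-off is a little added latency: output stalls briefly whenever a chunk ends in something that looks like the start of a token.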

What Veil Catches That Regex Doesn't

Category	Examples
Personal info	Names, emails, phones, SSNs, addresses across 18 countries
Financial	Credit cards, IBANs, bank account numbers, routing numbers
Government IDs	Passports, driver's licenses, national IDs (US, UK, DE, IT, IN, KR, etc.)
Secrets	AWS keys, GitHub tokens, Stripe keys, GCP keys, JWTs, private keys
Crypto	Ethereum, Bitcoin, Litecoin, Monero wallets
Evasion attempts	Zero-width chars, Cyrillic homoglyphs, accent-based evasion

Works with Any Provider

Veil isn't just for OpenAI. Set the x-upstream-provider header to route through Anthropic, Together, Groq, Mistral, DeepSeek, Fireworks, Perplexity, or any of 41 supported providers. Same code, same PII protection.

Compliance Coverage

GDPR — personal data never reaches third-party processors
EU AI Act — enforcement begins August 2026
CCPA — California consumer data stays in your control
HIPAA — PHI redacted before hitting external APIs

Try Veil Free

100 requests/month on the free tier. No credit card required. Change one URL and PII is stripped from your LLM calls before they reach the provider.
