PCI DSS & Payment Data with LLMs
Understanding why cardholder data should never enter LLM systems and how to safely use AI for payment-related workflows
The Payment Card Industry Data Security Standard (PCI DSS) sets strict requirements for organizations that handle credit card information. Other compliance frameworks may permit regulated data in LLM systems under certain conditions. PCI DSS's rules on storage, transmission, and third-party service providers mean that, in practice, cardholder data should NEVER be transmitted to third-party LLM services. Understanding these restrictions and implementing safe alternatives is critical for any business processing payments.
⚠️ Critical Warning
PCI DSS violations can result in fines of $5,000-$100,000 per month, card brand penalties, increased transaction fees, and potential loss of the ability to process credit card payments. A single data breach can cost millions in remediation, legal fees, and reputational damage.
What is Cardholder Data (CHD)?
PCI DSS protects two categories of payment data that must NEVER be sent to LLM services:
Primary Account Number (PAN) - The Card Number
- The 13-19 digit number on the front of payment cards
- Partial PANs that expose more than the permitted digits (PCI DSS 4.0 allows at most the BIN and the last 4 digits to be displayed)
- Truncated or masked versions in certain contexts
- Tokenized equivalents if reversible to the original PAN
Sensitive Authentication Data (SAD) - NEVER Store After Authorization
- CVV/CVC codes (3-4 digit security code)
- Full magnetic stripe data or chip equivalent
- PIN or PIN blocks
- These must NEVER be stored after transaction authorization, even encrypted
Additional cardholder data that must be protected if stored alongside PAN:
- Cardholder name
- Expiration date
- Service code
Why Cardholder Data Can't Go to LLMs
PCI DSS 4.0 Requirement 3.3 prohibits storing sensitive authentication data after authorization, even if it is encrypted. Requirement 3.5 (numbered 3.4 in PCI DSS 3.2.1) requires stored PAN to be rendered unreadable through specific methods. Sending CHD to third-party LLM services conflicts with these requirements:
❌ Loss of Control
Once CHD enters an LLM service, you lose control over how it's stored, processed, and protected. The data may be logged, cached, or used for model training.
❌ Cardholder Data Environment (CDE) Breach
Transmitting CHD to an LLM expands your CDE to include the LLM provider's infrastructure, which you cannot validate for PCI compliance.
❌ Inadequate Encryption
PCI DSS requires specific encryption methods with key management controls. HTTPS in transit is insufficient if data is decrypted at the LLM service.
❌ Vendor Compliance Gap
Most LLM vendors are not PCI DSS compliant service providers and have not been assessed as such.
⚠️ Important: Third-Party Service Providers
If a third party processes, stores, or transmits CHD on your behalf, they must be PCI DSS compliant. Since general-purpose LLM providers are not designed for payment processing and typically don't offer PCI compliance, they cannot be used for CHD.
PCI DSS Compliance Levels
Merchants are assigned PCI compliance levels based on annual transaction volume. Regardless of level, the prohibition against sending CHD to unauthorized third parties applies equally:
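Level 1 (typically more than 6 million card transactions per year)
- Annual on-site security assessment by Qualified Security Assessor (QSA)
- Quarterly network scans by Approved Scanning Vendor (ASV)
- Attestation of Compliance (AOC) required
Level 2 (typically 1 to 6 million transactions per year)
- Annual Self-Assessment Questionnaire (SAQ)
- Quarterly network scans by ASV
- May require on-site assessment depending on card brand
Level 3 (typically 20,000 to 1 million e-commerce transactions per year)
- Annual Self-Assessment Questionnaire (SAQ)
- Quarterly network scans by ASV
Level 4 (typically fewer than 20,000 e-commerce transactions, or up to 1 million total transactions, per year)
- Annual Self-Assessment Questionnaire (SAQ) may be required
- Compliance requirements set by merchant bank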
Safe Alternative: Tokenization
Tokenization replaces cardholder data with a non-sensitive equivalent (token) that has no exploitable value. If you need to reference payment methods in AI workflows, use tokenization:
✅ How Tokenization Works
Step 1: Customer provides PAN at checkout
Step 2: Payment gateway/processor immediately replaces PAN with a random token (e.g., "tok_1a2b3c4d5e6f")
Step 3: Token is stored in your database; actual PAN stays only with the PCI-compliant payment processor
Step 4: For future charges, send the token back to the payment processor, who maps it to the real PAN
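The four steps above can be sketched with a toy in-memory vault. This is an illustration only: `ToyTokenVault` is a hypothetical class standing in for the payment processor, the `tok_` plus hex format is an assumption, and in a real integration the vault lives entirely inside the PCI-compliant processor, never in your application.

```php
<?php
// Hypothetical stand-in for the processor's vault; illustration only.
final class ToyTokenVault
{
    /** @var array<string, string> token => PAN, held only by the "processor" */
    private array $pans = [];

    // Step 2: swap the PAN for a random token with no mathematical link to it.
    public function tokenize(string $pan): string
    {
        $token = 'tok_' . bin2hex(random_bytes(8));
        $this->pans[$token] = $pan;
        return $token;
    }

    // Step 4: only the vault can map a token back to the PAN in order to charge it.
    public function charge(string $token, int $amountCents): string
    {
        if (!isset($this->pans[$token])) {
            throw new InvalidArgumentException('Unknown token');
        }
        $last4 = substr($this->pans[$token], -4);
        return sprintf('charged %d cents to card ending in %s', $amountCents, $last4);
    }
}

$vault = new ToyTokenVault();
$token = $vault->tokenize('4242424242424242'); // Step 1: a well-known test PAN
// Step 3: the merchant stores and logs only $token.
echo $vault->charge($token, 1999), PHP_EOL;
```

Because the token is random rather than derived from the PAN, it reveals nothing about the card if it ends up in a log or an LLM prompt.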
Popular Tokenization Services
Stripe
Stripe.js creates tokens client-side; PAN never touches your server. Tokens like tok_visa can safely be logged and referenced.
PayPal/Braintree
Hosted fields or SDK collect payment data directly. Returns payment method tokens for future use.
Square
Web Payment SDK creates card nonces. Your server never handles raw card numbers.
💡 LLM-Safe Payment References
You CAN safely send these to LLM services:
- Payment tokens from your gateway (e.g., tok_1a2b3c)
- Last 4 digits only (e.g., "card ending in 4242")
- Card brand only (e.g., "Visa")
- Transaction IDs/order numbers
- Payment status (succeeded, failed, refunded)
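One way to enforce the list above is an allowlist applied just before a prompt is built, so that any unexpected field is dropped by default. The sketch below assumes hypothetical field names (`payment_token`, `card_last4`, and so on) and a hypothetical helper `buildLlmPaymentContext`; adapt both to your own payment records.

```php
<?php
// Hypothetical field names; these mirror the LLM-safe references listed above.
const LLM_SAFE_FIELDS = ['payment_token', 'card_last4', 'card_brand', 'order_id', 'status'];

/**
 * Keep only allowlisted fields so anything unexpected (PAN, CVV, expiry)
 * is dropped by default rather than by remembering to exclude it.
 */
function buildLlmPaymentContext(array $payment): array
{
    return array_intersect_key($payment, array_flip(LLM_SAFE_FIELDS));
}

$payment = [
    'payment_token' => 'tok_1a2b3c',
    'card_last4'    => '4242',
    'card_brand'    => 'Visa',
    'order_id'      => 'ord_1001',
    'status'        => 'refunded',
    'pan'           => '4242424242424242', // must never leave the CDE
    'cvv'           => '123',              // must never be stored at all
];

echo json_encode(buildLlmPaymentContext($payment)), PHP_EOL;
```

An allowlist fails safe: a new sensitive column added to the record later stays out of prompts until someone deliberately adds it to the list.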
Scope Reduction: Keep CHD Out of Your Environment
The best PCI strategy is to minimize or eliminate systems that handle CHD. Every system that stores, processes, or transmits CHD is in-scope for PCI DSS:
🎯 Recommended Architecture: Fully Outsourced
Use hosted payment pages or client-side tokenization so CHD never touches your servers:
- Stripe Checkout: Customer enters card on Stripe's page, you receive only a payment intent ID
- PayPal Commerce Platform: Customer stays on your site but payment fields are in PayPal iframes
- Square Web SDK: Card data collected in Square-controlled fields, SDK returns nonce
Result: Your PCI scope is reduced to SAQ A (22 questions) instead of SAQ D (300+ questions)
⚠️ Higher Scope: Direct POST to Payment Gateway
If the payment form is on your domain but POSTs directly to the payment gateway (not your server):
- CHD passes through your website (in browser) but not your server
- Still in-scope for PCI: SAQ A-EP (approximately 150 questions)
- Must ensure no JavaScript logs or caches CHD
❌ Highest Scope: CHD Touches Your Server
If your server receives, processes, or stores CHD (even briefly):
- Full PCI DSS compliance required: SAQ D (300+ questions) or on-site audit
- Network segmentation, encryption at rest, penetration testing, etc.
- Quarterly vulnerability scans by Approved Scanning Vendor
- NEVER send this data to LLM services
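The three tiers above amount to a simple lookup from integration style to questionnaire. The sketch below uses hypothetical integration labels (`hosted_page_or_iframe`, `direct_post`, `server_handles_chd`) and a hypothetical helper `saqForIntegration`; your acquirer or QSA makes the actual determination.

```php
<?php
// Hypothetical integration labels; the SAQ names come from the tiers described above.
function saqForIntegration(string $integration): string
{
    $map = [
        'hosted_page_or_iframe' => 'SAQ A',    // CHD never touches your site or server
        'direct_post'           => 'SAQ A-EP', // CHD passes through the browser on your page
        'server_handles_chd'    => 'SAQ D',    // CHD reaches your server
    ];
    if (!isset($map[$integration])) {
        throw new InvalidArgumentException("Unknown integration type: $integration");
    }
    return $map[$integration];
}

echo saqForIntegration('direct_post'), PHP_EOL;
```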
Safe LLM Use Cases for Payment Workflows
You CAN safely use LLMs for payment-adjacent workflows that don't involve CHD:
✅ Fraud Pattern Analysis
Analyze transaction metadata (amounts, timestamps, locations) without CHD to detect suspicious patterns.
✅ Customer Support Chatbots
Handle refund requests and payment status inquiries using order IDs and the last 4 digits only.
✅ Chargeback Response Generation
Draft responses to disputes using order details, shipping confirmations, and customer communication logs.
✅ Payment Reminder Personalization
Generate friendly payment reminders referencing payment method type (not full details).
✅ Financial Report Summaries
Summarize revenue, refund rates, and payment metrics from aggregated data.
✅ Subscription Lifecycle Emails
Draft renewal reminders, upgrade offers, and cancellation feedback requests.
⚠️ Key Principle
Before sending any payment-related data to an LLM, ask: "Could this data be used to make fraudulent charges?" If yes, it's likely CHD and should not be sent. Use tokens, transaction IDs, and aggregated data instead.
Preventing Accidental CHD Transmission
Even with the best intentions, CHD can accidentally leak into LLM prompts through error messages, logs, or support tickets. Implement these safeguards:
1. Input Validation and Filtering
Detect and redact potential CHD before sending to LLM:
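// Luhn checksum: rules out most random digit runs (order numbers, phone numbers)
function isValidLuhn(string $digits): bool {
    $sum = 0;
    for ($i = strlen($digits) - 1, $double = false; $i >= 0; $i--, $double = !$double) {
        $d = (int) $digits[$i] * ($double ? 2 : 1);
        $sum += $d > 9 ? $d - 9 : $d;
    }
    return $sum % 10 === 0;
}

function sanitizeForLLM(string $text): string {
    // Redact 13-19 digit runs (spaces or dashes allowed between digits) that pass the Luhn check
    $text = preg_replace_callback('/\b\d(?:[ -]?\d){12,18}\b/', function ($match) {
        $digits = preg_replace('/\D/', '', $match[0]);
        return isValidLuhn($digits) ? '[REDACTED-CARD]' : $match[0];
    }, $text);
    // Redact CVV/CVC values that follow an identifying keyword
    $text = preg_replace('/\b(cvv|cvc|security code)[\s:]*\d{3,4}\b/i', '[REDACTED-CVV]', $text);
    return $text;
}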
2. Error Message Sanitization
Never include CHD in error logs or messages sent to LLMs for analysis:
❌ DON'T
✅ DO
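As a minimal sketch of the DON'T/DO contrast, the unsafe pattern is shown only as a comment and the safe pattern as a function. `describePaymentError` and its field names are hypothetical, chosen for illustration.

```php
<?php
// DON'T: interpolate raw request data into the message, e.g.
//   "Charge failed for card {$request['card_number']} cvv {$request['cvv']}"

/**
 * DO: describe the failure using only non-sensitive references.
 * All field names here are hypothetical.
 */
function describePaymentError(array $payment, string $declineCode): string
{
    return sprintf(
        'Charge failed for order %s (%s ending in %s): %s',
        $payment['order_id'],
        $payment['card_brand'],
        $payment['card_last4'],
        $declineCode
    );
}

$message = describePaymentError(
    ['order_id' => 'ord_1001', 'card_brand' => 'Visa', 'card_last4' => '4242'],
    'insufficient_funds'
);
echo $message, PHP_EOL;
```

Building the message from named, non-sensitive fields means the raw request never needs to be in scope where the log line or LLM prompt is assembled.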
3. Support Ticket Screening
If using LLMs to categorize or route support tickets, scan for CHD first:
- Detect card number patterns (13-19 digits, Luhn algorithm validation)
- Flag keywords like "CVV", "security code", "card number", "expiration"
- Quarantine tickets containing potential CHD for human review only
- Train support staff to NEVER include full card details in tickets
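The first three checks above could be combined into a pre-routing screen along these lines. This is a sketch under assumptions: `screenTicket` and `passesLuhn` are hypothetical names, the keyword list is the one given above, and a non-empty result is meant to send the ticket to human review instead of the LLM.

```php
<?php
// Luhn checksum over a string of digits.
function passesLuhn(string $digits): bool
{
    $sum = 0;
    $double = false;
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9;
            }
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 === 0;
}

/**
 * Returns the reasons a ticket should be quarantined; an empty array means
 * no CHD indicators were found and the ticket may go to the LLM.
 */
function screenTicket(string $body): array
{
    $reasons = [];

    // 13-19 digits, optionally separated by single spaces or dashes.
    preg_match_all('/\b\d(?:[ -]?\d){12,18}\b/', $body, $matches);
    foreach ($matches[0] as $candidate) {
        if (passesLuhn(preg_replace('/\D/', '', $candidate))) {
            $reasons[] = 'luhn-valid card number pattern';
            break;
        }
    }

    // Keyword flags from the checklist above.
    if (preg_match('/\b(cvv|cvc|security code|card number|expiration)\b/i', $body) === 1) {
        $reasons[] = 'payment keyword';
    }

    return $reasons;
}

var_dump(screenTicket('My card 4242 4242 4242 4242 was declined'));
var_dump(screenTicket('Where is my refund for order ord_1001?'));
```

The screen deliberately errs toward false positives: a ticket that merely mentions "expiration" is held for a human, which is cheaper than letting a real card number reach a third-party service.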
4. Database Query Result Filtering
If LLMs analyze query results, ensure CHD columns are excluded:
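-- ❌ DON'T: Select all columns (may include PAN or other CHD columns)
SELECT * FROM payments WHERE customer_id = 123;
-- ✅ DO: Select only the non-CHD columns you need
-- (card_last4 is an assumed column holding the card's last 4 digits;
-- the last 4 characters of a token are not the card's last 4 digits)
SELECT
    payment_id,
    amount,
    status,
    CONCAT('****', card_last4) AS card_display,
    created_at
FROM payments
WHERE customer_id = 123;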
PCI DSS 4.0 Requirements at a Glance
PCI DSS 4.0 (published March 2022; it replaced version 3.2.1 on March 31, 2024) has 12 core requirements across 6 control objectives:
Build and Maintain a Secure Network and Systems
Req 1: Install and maintain network security controls
Req 2: Apply secure configurations to all system components
Protect Cardholder Data
Req 3: Protect stored account data
Req 4: Protect cardholder data with strong cryptography during transmission over open, public networks
Maintain a Vulnerability Management Program
Req 5: Protect all systems and networks from malicious software
Req 6: Develop and maintain secure systems and software
Implement Strong Access Control Measures
Req 7: Restrict access to system components and cardholder data by business need to know
Req 8: Identify users and authenticate access to system components
Req 9: Restrict physical access to cardholder data
Regularly Monitor and Test Networks
Req 10: Log and monitor all access to system components and cardholder data
Req 11: Test security of systems and networks regularly
Maintain an Information Security Policy
Req 12: Support information security with organizational policies and programs
💡 Key Takeaway for LLM Usage
Requirement 3.3 prohibits storing sensitive authentication data after authorization. Requirement 12.8 requires you to manage the third-party service providers that handle CHD, including written agreements and monitoring of their PCI DSS compliance status. Most LLM providers are not PCI-compliant service providers, and you cannot adequately monitor their handling of CHD per Requirement 10, so transmitting CHD to them violates multiple requirements.
Best Practices for Payment Data & LLMs
DO
- Use payment tokens from your gateway (Stripe, Braintree, etc.) in LLM prompts
- Reference last 4 digits only (e.g., "Visa ending in 4242")
- Implement client-side tokenization to keep CHD off your servers entirely
- Use Luhn algorithm validation to detect and redact potential card numbers
- Analyze aggregated transaction data (amounts, dates, locations) without CHD
- Document your PCI scope and which systems handle CHD
DON'T
- Send full credit card numbers (PAN) to any LLM service
- Include CVV/CVC codes in error logs, support tickets, or LLM prompts
- Send the expiration date alongside card brand and last 4 digits; share only the minimum needed to identify a payment method
- Assume HTTPS alone makes CHD transmission to LLMs compliant (it doesn't)
- Store CHD in application logs, even temporarily
- Use SELECT * queries that might include CHD columns when feeding data to LLMs