PCI DSS & Payment Data with LLMs

Understanding why cardholder data should never enter LLM systems and how to safely use AI for payment-related workflows

The Payment Card Industry Data Security Standard (PCI DSS) sets strict requirements for organizations that handle credit card information. Other compliance frameworks may permit regulated data in LLM systems under certain conditions; PCI DSS leaves far less room. The standard does not mention LLMs by name, but its rules on storage, transmission, and third-party service providers mean that, in practice, cardholder data should NEVER be transmitted to a third-party LLM service that has not been validated as a PCI DSS compliant service provider. Understanding these restrictions and implementing safe alternatives is critical for any business processing payments.

⚠️ Critical Warning

PCI DSS is enforced by the card brands through acquiring banks, not by a regulator. Non-compliance can result in fines commonly cited at $5,000-$100,000 per month, additional card brand penalties, increased transaction fees, and potential loss of the ability to process credit card payments. A single data breach can cost millions in remediation, legal fees, and reputational damage.

What is Cardholder Data (CHD)?

PCI DSS protects account data, which has two parts: cardholder data (the PAN plus related details) and sensitive authentication data. Neither may EVER be sent to LLM services:

Primary Account Number (PAN) - The Card Number

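The 13-19 digit number embossed or printed on the payment card
Partial PANs that reveal more than the first six (BIN) and last four digits; that is the most PCI DSS allows to be displayed, and the last four alone is the safer default
Truncated or masked versions when they sit alongside other data (such as a hash of the same PAN) that would let the full number be reconstructed
Tokens or encrypted values that could be reversed to the original PAN by you or by the LLM provider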

Sensitive Authentication Data (SAD) - NEVER Store After Authorization

CVV/CVC codes (3-4 digit security code)
Full magnetic stripe data or chip equivalent
PIN or PIN blocks
These must NEVER be stored after transaction authorization, even encrypted

Additional cardholder data that must be protected if stored alongside PAN:

Cardholder name
Expiration date
Service code

Why Cardholder Data Can't Go to LLMs

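PCI DSS v4.0 Requirement 3.3 requires that sensitive authentication data is not stored after authorization, even if encrypted. Requirement 3.5 (numbered 3.4 in v3.2.1) requires PAN to be rendered unreadable wherever it is stored, using methods such as strong cryptography, truncation, one-way hashing, or index tokens. Sending CHD to a third-party LLM service conflicts with these requirements and creates further problems: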

❌ Loss of Control

Once CHD enters an LLM service, you lose control over how it's stored, processed, and protected. The data may be logged, cached, or used for model training.

❌ Cardholder Data Environment (CDE) Scope Expansion

Transmitting CHD to an LLM expands your CDE to include the LLM provider's infrastructure, which you cannot validate for PCI compliance.

❌ Inadequate Encryption

PCI DSS requires strong cryptography backed by documented key management controls. HTTPS protects data only in transit; once the LLM service decrypts the request, the CHD sits in plaintext on infrastructure you do not control.

❌ Vendor Compliance Gap

Most LLM vendors are not PCI DSS compliant service providers and have not been assessed as such.

⚠️ Important: Third-Party Service Providers

If a third party processes, stores, or transmits CHD on your behalf, they must be PCI DSS compliant. Since general-purpose LLM providers are not designed for payment processing and typically don't offer PCI compliance, they cannot be used for CHD.

PCI DSS Compliance Levels

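Merchants are assigned PCI compliance levels based on annual transaction volume. The thresholds below follow the commonly used Visa and Mastercard definitions; other card brands may set different ones. Regardless of level, the prohibition against sending CHD to unauthorized third parties applies equally: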

Level 1: 6+ million transactions/year

Annual on-site security assessment by Qualified Security Assessor (QSA)
Quarterly network scans by Approved Scanning Vendor (ASV)
Attestation of Compliance (AOC) required

Level 2: 1-6 million transactions/year

Annual Self-Assessment Questionnaire (SAQ)
Quarterly network scans by ASV
May require on-site assessment depending on card brand

Level 3: 20,000-1 million e-commerce transactions/year

Annual Self-Assessment Questionnaire (SAQ)
Quarterly network scans by ASV

Level 4: Fewer than 20,000 e-commerce transactions/year or up to 1 million total

Annual Self-Assessment Questionnaire (SAQ) may be required
Compliance requirements set by merchant bank

Safe Alternative: Tokenization

Tokenization replaces cardholder data with a non-sensitive equivalent (token) that has no exploitable value. If you need to reference payment methods in AI workflows, use tokenization:

✅ How Tokenization Works

Step 1: Customer provides PAN at checkout

Step 2: Payment gateway/processor immediately replaces PAN with a random token (e.g., "tok_1a2b3c4d5e6f")

Step 3: Token is stored in your database; actual PAN stays only with the PCI-compliant payment processor

Step 4: For future charges, send the token back to the payment processor, who maps it to the real PAN
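The four steps above can be sketched with a toy in-memory vault. This is only an illustration of the idea, not a real integration: `ToyTokenVault`, its methods, and the `merchant_db` dictionary are hypothetical names, and in a real deployment the vault lives inside the PCI-compliant processor, never in your own application code.

```python
import secrets


class ToyTokenVault:
    """Stands in for the payment processor's vault (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._pan_by_token: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        # Step 2: the token is random, so it carries no information about the PAN.
        token = "tok_" + secrets.token_hex(8)
        self._pan_by_token[token] = pan
        return token

    def charge(self, token: str, amount_cents: int) -> str:
        # Step 4: only the vault can map the token back to the real PAN.
        if token not in self._pan_by_token:
            raise KeyError("unknown token")
        return f"charged {amount_cents} cents"


vault = ToyTokenVault()
token = vault.tokenize("4111111111111111")  # Step 1: well-known test PAN, not a real card
merchant_db = {"customer_42": token}        # Step 3: the merchant stores only the token
receipt = vault.charge(merchant_db["customer_42"], 4999)
```

Because the token is generated randomly rather than derived from the PAN, nothing in `merchant_db` can be reversed into a card number without access to the vault.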

Popular Tokenization Services

Stripe

Stripe.js creates tokens client-side; PAN never touches your server. Tokens like tok_visa can safely be logged and referenced.

PayPal/Braintree

Hosted fields or SDK collect payment data directly. Returns payment method tokens for future use.

Square

Web Payment SDK creates card nonces. Your server never handles raw card numbers.

💡 LLM-Safe Payment References

You CAN safely send these to LLM services:

Payment tokens from your gateway (e.g., tok_1a2b3c)
Last 4 digits only (e.g., "card ending in 4242")
Card brand only (e.g., "Visa")
Transaction IDs/order numbers
Payment status (succeeded, failed, refunded)
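One way to enforce this list in code is an allowlist: copy only known-safe fields into the prompt context and drop everything else by default. The sketch below assumes a payment record stored as a dictionary; the field names (`payment_token`, `last4`, and so on) are hypothetical and should be mapped to your own schema.

```python
# Fields considered safe to share, mirroring the list above (field names are hypothetical).
LLM_SAFE_FIELDS = {"payment_token", "last4", "brand", "transaction_id", "status"}


def llm_safe_view(payment: dict) -> dict:
    """Return only allowlisted fields; anything not listed is dropped by default."""
    return {key: value for key, value in payment.items() if key in LLM_SAFE_FIELDS}


record = {
    "transaction_id": "ch_abc123",
    "brand": "Visa",
    "last4": "4242",
    "status": "succeeded",
    "pan": "4111111111111111",  # test PAN; must never reach the prompt
    "cvv": "123",
}
safe = llm_safe_view(record)
```

An allowlist fails closed: a new sensitive column added to the record later stays out of prompts until someone deliberately adds it to `LLM_SAFE_FIELDS`.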

Scope Reduction: Keep CHD Out of Your Environment

The best PCI strategy is to minimize or eliminate systems that handle CHD. Every system that stores, processes, or transmits CHD is in-scope for PCI DSS:

🎯 Recommended Architecture: Fully Outsourced

Use hosted payment pages or client-side tokenization so CHD never touches your servers:

Stripe Checkout: Customer enters card on Stripe's page, you receive only a payment intent ID
PayPal Commerce Platform: Customer stays on your site but payment fields are in PayPal iframes
Square Web SDK: Card data collected in Square-controlled fields, SDK returns nonce

Result: Your PCI scope is reduced to SAQ A (22 questions under PCI DSS v3.2.1) instead of SAQ D (300+ questions); exact question counts vary by PCI DSS version

⚠️ Higher Scope: Direct POST to Payment Gateway

If payment form is on your domain but POSTs directly to payment gateway (not your server):

CHD passes through your website (in browser) but not your server
Still in-scope for PCI: SAQ A-EP (well over 100 questions; the exact count varies by PCI DSS version)
Must ensure no JavaScript logs or caches CHD

❌ Highest Scope: CHD Touches Your Server

If your server receives, processes, or stores CHD (even briefly):

Full PCI DSS compliance required: SAQ D (300+ questions) or on-site audit
Network segmentation, encryption at rest, penetration testing, etc.
Quarterly vulnerability scans by Approved Scanning Vendor
NEVER send this data to LLM services
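The three tiers above amount to a lookup from integration style to self-assessment type. The sketch below records that mapping so it can be referenced in architecture reviews; the `Integration` enum and `expected_saq` function are hypothetical names, and the final SAQ type for a real merchant should be confirmed with the acquirer or a QSA.

```python
from enum import Enum


class Integration(Enum):
    HOSTED_PAGE_OR_IFRAME = "hosted"   # provider serves the card fields
    DIRECT_POST = "direct_post"        # your page, form posts straight to the gateway
    SERVER_HANDLES_CHD = "server"      # your server receives card data


# SAQ types as described in the tiers above; confirm with your acquirer or QSA.
SAQ_BY_INTEGRATION = {
    Integration.HOSTED_PAGE_OR_IFRAME: "SAQ A",
    Integration.DIRECT_POST: "SAQ A-EP",
    Integration.SERVER_HANDLES_CHD: "SAQ D",
}


def expected_saq(integration: Integration) -> str:
    """Look up the self-assessment type for an integration style."""
    return SAQ_BY_INTEGRATION[integration]


saq = expected_saq(Integration.DIRECT_POST)
```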

Safe LLM Use Cases for Payment Workflows

You CAN safely use LLMs for payment-adjacent workflows that don't involve CHD:

✅ Fraud Pattern Analysis

Analyze transaction metadata (amounts, timestamps, locations) without CHD to detect suspicious patterns.

"Review these 50 transactions for fraud: Order #12345 ($499, NYC, 2pm), Order #12346 ($12, LA, 2:05pm)... Flag unusual patterns."

✅ Customer Support Chatbots

Handle refund requests, payment status inquiries using order IDs and last 4 digits only.

"Customer asking about charge on Visa ending 4242. Transaction ID: ch_abc123. Status: completed. Amount: $49.99."

✅ Chargeback Response Generation

Draft responses to disputes using order details, shipping confirmations, and customer communication logs.

"Draft chargeback response for Order #789. Product delivered per tracking #1Z999. Customer confirmed receipt via email on 1/15."

✅ Payment Reminder Personalization

Generate friendly payment reminders referencing payment method type (not full details).

"Write a payment failed email for customer John. Payment method: Visa ending 4242. Amount due: $29/month for Pro plan."

✅ Financial Report Summaries

Summarize revenue, refund rates, and payment metrics from aggregated data.

"Summarize Q1 payment trends: $450K revenue, 2.1% refund rate, top countries: US (60%), UK (20%), CA (12%)."

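For report summaries, the aggregation can happen before the prompt is built, so the LLM only ever sees totals. The following is a minimal sketch under the assumption that transactions are available as dictionaries holding metadata only; the row layout and the `summarize` function are hypothetical.

```python
from collections import Counter

# Hypothetical transaction rows holding metadata only (no card fields).
transactions = [
    {"amount": 100.0, "country": "US", "refunded": False},
    {"amount": 50.0, "country": "US", "refunded": True},
    {"amount": 25.0, "country": "UK", "refunded": False},
    {"amount": 25.0, "country": "CA", "refunded": False},
]


def summarize(rows: list[dict]) -> dict:
    """Collapse rows into totals so no per-customer detail reaches the prompt.

    Assumes rows is non-empty.
    """
    revenue = sum(r["amount"] for r in rows if not r["refunded"])
    refund_rate = sum(r["refunded"] for r in rows) / len(rows)
    by_country = Counter(r["country"] for r in rows)
    return {
        "revenue": revenue,
        "refund_rate_pct": round(100 * refund_rate, 1),
        "top_countries": by_country.most_common(2),
    }


summary = summarize(transactions)
prompt = f"Summarize these payment trends: {summary}"
```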
✅ Subscription Lifecycle Emails

Draft renewal reminders, upgrade offers, and cancellation feedback requests.

"Write renewal email for customer Sarah. Plan: Basic ($9/mo). Renewal date: Feb 1. Card: Amex ending 1005."

⚠️ Key Principle

Before sending any payment-related data to an LLM, ask: "Could this data be used to make fraudulent charges?" If yes, it's likely CHD and should not be sent. Use tokens, transaction IDs, and aggregated data instead.

Preventing Accidental CHD Transmission

Even with the best intentions, CHD can accidentally leak into LLM prompts through error messages, logs, or support tickets. Implement these safeguards:

1. Input Validation and Filtering

Detect and redact potential CHD before sending to LLM:

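// Luhn (mod 10) checksum over a string of digits
function isValidLuhn(string $digits): bool {
    $sum = 0;
    $double = false;
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        // Double every second digit from the right; subtract 9 if the result exceeds 9
        if ($double && ($d *= 2) > 9) {
            $d -= 9;
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 === 0;
}

function sanitizeForLLM(string $text): string {
    // Redact 13-19 digit runs (optionally separated by spaces or dashes) that pass the Luhn check
    $text = preg_replace_callback('/\b\d(?:[ -]?\d){12,18}\b/', function ($match) {
        $digits = preg_replace('/\D/', '', $match[0]);
        return isValidLuhn($digits) ? '[REDACTED-CARD]' : $match[0];
    }, $text);

    // Redact CVV/CVC codes that follow an identifying keyword
    $text = preg_replace('/\b(cvv|cvc|security code)[\s:]*\d{3,4}\b/i', '[REDACTED-CVV]', $text);

    return $text;
}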

2. Error Message Sanitization

Never include CHD in error logs or messages sent to LLMs for analysis:

❌ DON'T

"Payment failed for card 4532-1234-5678-9010. Error: Invalid CVV 123."

✅ DO

"Payment failed for card ending 9010. Error: Payment authentication required."

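The DON'T-to-DO rewrite above can be automated before an error message is logged or forwarded. This is a rough sketch, assuming plain-text messages; `scrub_error` and the two patterns are hypothetical names. It deliberately skips a Luhn check and treats any 13-19 digit run as a card number, preferring over-redaction to a leak.

```python
import re

# 13-19 digits, optionally separated by single spaces or dashes.
PAN_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# A security code that follows an identifying keyword.
CVV_PATTERN = re.compile(r"\b(?:cvv|cvc|security code)[\s:]*\d{3,4}\b", re.IGNORECASE)


def scrub_error(message: str) -> str:
    """Replace card numbers with 'ending NNNN' and drop security codes entirely."""

    def last_four(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return f"ending {digits[-4:]}"

    message = PAN_PATTERN.sub(last_four, message)
    return CVV_PATTERN.sub("[REDACTED-CVV]", message)


raw = "Payment failed for card 4532-1234-5678-9010. Error: Invalid CVV 123."
clean = scrub_error(raw)
# clean == "Payment failed for card ending 9010. Error: Invalid [REDACTED-CVV]."
```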
3. Support Ticket Screening

If using LLMs to categorize or route support tickets, scan for CHD first:

Detect card number patterns (13-19 digits, Luhn algorithm validation)
Flag keywords like "CVV", "security code", "card number", "expiration"
Quarantine tickets containing potential CHD for human review only
Train support staff to NEVER include full card details in tickets
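A screening step along these lines could sit in front of the LLM router. The sketch below is one possible shape, not a complete detector: `screen_ticket`, `passes_luhn`, and the return labels are hypothetical, and the keyword list is taken from the bullets above.

```python
import re

PAN_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
CHD_KEYWORDS = re.compile(
    r"\b(cvv|cvc|security code|card number|expiration)\b", re.IGNORECASE
)


def passes_luhn(digits: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def screen_ticket(text: str) -> str:
    """Return 'quarantine' if the ticket may contain CHD, else 'route_to_llm'."""
    for match in PAN_CANDIDATE.finditer(text):
        if passes_luhn(re.sub(r"\D", "", match.group())):
            return "quarantine"
    if CHD_KEYWORDS.search(text):
        return "quarantine"
    return "route_to_llm"
```

Quarantining on keywords alone will produce false positives (a customer asking what a CVV is, for example). That trade-off favors human review over an accidental disclosure; a softer policy could flag keyword-only hits instead of quarantining them.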

4. Database Query Result Filtering

If LLMs analyze query results, ensure CHD columns are excluded:

-- ❌ DON'T: Select all columns
SELECT * FROM payments WHERE customer_id = 123;

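-- ✅ DO: Select only an explicit list of non-CHD columns
SELECT
    payment_id,
    amount,
    status,
    -- card_last4 is an assumed column holding the card's last four digits;
    -- the last characters of a token are not the card's last four
    CONCAT('****', card_last4) AS card_display,
    created_at
FROM payments
WHERE customer_id = 123;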

PCI DSS 4.0 Requirements at a Glance

PCI DSS v4.0 (published March 2022; mandatory since March 31, 2024, when v3.2.1 was retired) has 12 core requirements across 6 control objectives:

Build and Maintain a Secure Network and Systems

Req 1: Install and maintain network security controls

Req 2: Apply secure configurations to all system components

Protect Cardholder Data

Req 3: Protect stored account data

Req 4: Protect cardholder data with strong cryptography during transmission over open, public networks

Maintain a Vulnerability Management Program

Req 5: Protect all systems and networks from malicious software

Req 6: Develop and maintain secure systems and software

Implement Strong Access Control Measures

Req 7: Restrict access to system components and cardholder data by business need to know

Req 8: Identify users and authenticate access to system components

Req 9: Restrict physical access to cardholder data

Regularly Monitor and Test Networks

Req 10: Log and monitor all access to system components and cardholder data

Req 11: Test security of systems and networks regularly

Maintain an Information Security Policy

Req 12: Support information security with organizational policies and programs

💡 Key Takeaway for LLM Usage

Requirement 3.3 prohibits storing sensitive authentication data after authorization. Requirement 12.8 requires you to manage the third-party service providers that handle CHD, including written agreements and monitoring of their PCI DSS compliance status. Since most LLM providers have not been assessed as PCI DSS service providers, and you cannot log and monitor their handling of CHD as Requirement 10 expects, transmitting CHD to them puts you out of compliance with multiple requirements.

Best Practices for Payment Data & LLMs

DO

Use payment tokens from your gateway (Stripe, Braintree, etc.) in LLM prompts
Reference last 4 digits only (e.g., "Visa ending in 4242")
Implement client-side tokenization to keep CHD off your servers entirely
Use Luhn algorithm validation to detect and redact potential card numbers
Analyze aggregated transaction data (amounts, dates, locations) without CHD
Document your PCI scope and which systems handle CHD

DON'T

Send full credit card numbers (PAN) to any LLM service
Include CVV/CVC codes in error logs, support tickets, or LLM prompts
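Send more card details than the task needs (e.g., expiration date + card brand + last 4 together); PCI DSS does not ban this combination outright, but each extra detail makes the data more useful to a fraudster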
Assume HTTPS alone makes CHD transmission to LLMs compliant (it doesn't)
Store CHD in application logs, even temporarily
Use SELECT * queries that might include CHD columns when feeding data to LLMs
