8 min read · Sanolith Engineering

How HIPAA-compliant is ChatGPT Enterprise, really?

OpenAI signs a BAA. That's necessary but not sufficient. Here's what actually happens to PHI in ChatGPT Enterprise, and where the gaps that matter to a privacy officer hide.

Tags: HIPAA, Compliance, ChatGPT, Buyer guide

OpenAI signs a BAA for ChatGPT Enterprise. Marketing teams quote that fact as if it ends the conversation. It doesn't. A BAA is a contract; it says "if PHI leaks, we'll help you handle the breach." It doesn't say PHI won't leak.

Here's what a privacy officer should actually ask about any LLM platform claiming HIPAA compliance.

What the BAA does and doesn't cover

A Business Associate Agreement legally binds the vendor to:

  • Use PHI only for the services they're providing
  • Implement appropriate safeguards
  • Report breaches promptly
  • Allow audits
  • Help with patient data-access requests

It does NOT cover:

  • Whether PHI is actually scrubbed before model inference
  • Whether your prompts and completions are used to train a future model
  • Whether other tenants on the same infrastructure can ever see your data
  • What happens to the data when you churn

These are the questions that decide whether a BAA is the start of compliance work or the end of it.

Where PHI lives in a ChatGPT Enterprise prompt

When a clinician types into ChatGPT Enterprise, the prompt:

  1. Travels from their browser to OpenAI's API edge
  2. Gets routed to a GPU pod running the model
  3. Generates a completion
  4. Returns to the user
  5. May land in workspace activity logs

At every step, the prompt carries unredacted PHI. Nothing redacts it automatically. If a clinician pastes a chart note that includes a patient's name, DOB, MRN, and a question about contraindications, all of that crosses every boundary.

OpenAI's BAA covers what happens AFTER PHI is in their system. It doesn't make the PHI disappear before it gets there.

The fail-closed alternative

A fail-closed PHI redactor is a service that sits BETWEEN your clinician and the model. Every prompt passes through it. The redactor:

  1. Identifies names, MRN, DOB, SSN, phone, addresses, dates within 1 day of admission, and 40+ other PHI categories
  2. Replaces them with [REDACTED] placeholders before the prompt is sent for inference
  3. Fails the request entirely if redaction errors; never silently lets PHI through

"Fail-closed" is the key phrase. A fail-OPEN redactor that errors and lets the original prompt pass is worse than no redactor at all; it gives false confidence. Production HIPAA systems must fail closed.

ChatGPT Enterprise does not ship with a fail-closed redactor. You can build one in front of their API, but you're now in the business of operating a HIPAA-grade redactor service. Which is real work.
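To make the fail-closed behaviour concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production redactor: the four regex patterns, the `RedactionError` class, and the `send_to_model` callable are all hypothetical names introduced for this example, and real coverage of names, addresses, and free-text dates needs far more than regular expressions.

```python
import re
from dataclasses import dataclass, field


class RedactionError(Exception):
    """Raised when a prompt cannot be safely redacted; the request is blocked."""


# Illustrative patterns only (assumption). A real redactor covers many more
# categories and usually combines rules with a trained entity recognizer.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s#]*\d{6,10}\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


@dataclass
class RedactionResult:
    text: str
    categories: dict = field(default_factory=dict)  # category -> match count


def redact(prompt: str) -> RedactionResult:
    """Replace every pattern match with a [REDACTED] placeholder."""
    text, counts = prompt, {}
    for category, pattern in PATTERNS.items():
        text, n = pattern.subn("[REDACTED]", text)
        if n:
            counts[category] = n
    return RedactionResult(text=text, categories=counts)


def send_fail_closed(prompt, send_to_model, redactor=redact):
    """Forward the prompt only if redaction succeeded; otherwise raise."""
    try:
        result = redactor(prompt)
    except Exception as exc:
        # Fail closed: any redactor error blocks the request outright.
        raise RedactionError("redaction failed; request blocked") from exc
    # Second pass: refuse to send if any known pattern still matches.
    for category, pattern in PATTERNS.items():
        if pattern.search(result.text):
            raise RedactionError(f"residual {category} after redaction")
    # Only the redacted text ever reaches the model.
    return send_to_model(result.text)
```

The design point is that the original prompt is never passed to `send_to_model`. A fail-open variant would catch the exception and forward `prompt` anyway, which is exactly the behaviour to avoid.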

Audit trails: what counts

Workspace activity logs in ChatGPT Enterprise tell you WHO talked to the model and WHEN. That's the floor.

What a privacy officer wants:

  • WHAT was the prompt (post-redaction)
  • WHICH identifiers were redacted (in what categories)
  • WHICH model received the prompt
  • WHICH tools the model called
  • WHICH retrieved documents the model used
  • The output as it left the system
  • A tamper-evident log of all of the above

That's an audit ledger, not an activity log. If your privacy officer can't replay a clinician's session three months later from the ledger alone, the audit isn't going to satisfy a compliance review.
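One common way to make such a ledger tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below assumes that approach; the `AuditLedger` class and the record field names are hypothetical, chosen to mirror the list above, and are not any vendor's actual schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLedger:
    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        """Add a record, chaining it to the hash of the entry before it."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = entry_hash(prev, record)
        self.entries.append({"record": dict(record), "prev_hash": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev:
                return False
            if entry["hash"] != entry_hash(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


ledger = AuditLedger()
ledger.append({
    "user": "clinician-17",                       # WHO
    "timestamp": "2025-01-01T12:00:00Z",          # WHEN
    "prompt_redacted": "Pt [REDACTED], contraindications for warfarin?",
    "redacted_categories": {"MRN": 1},            # WHICH identifiers
    "model": "example-model",                     # WHICH model (hypothetical name)
    "tools_called": [],
    "documents_retrieved": [],
    "output": "Example completion text.",
})
```

A chain like this only detects tampering if the latest hash is also stored somewhere the log's operator cannot rewrite, such as a separate system or a periodic export. Without that anchor, someone with write access could rebuild the whole chain.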

Data residency and training

Two questions that should be in every healthcare LLM RFP:

  1. Where physically does my data live? (Region, country, sub-processors)
  2. Will my prompts ever be used to improve the model?

ChatGPT Enterprise: data stays in US regions by default, and OpenAI's enterprise terms say workspace data is not used for training. Both answers are fine, but each must be in writing, signed, and verifiable through audits. "We don't train on your data" without a contractual penalty for doing so is marketing.

The fair summary

ChatGPT Enterprise + a BAA is an acceptable choice for SOME healthcare use cases: non-PHI workflows, drafting non-clinical documents, internal ops. It's NOT acceptable as the only safeguard for prompts that may contain PHI. There is no automatic redaction; the user is the redactor.

If your clinicians type PHI into prompts (and they will), you need a redactor in front of any LLM. Whether you build it yourself or buy it is a question of where you want your engineering time to go.

What to verify with any vendor

Use this as a checklist on every sales call:

  • Is PHI redacted before inference? Fail-closed or fail-open?
  • What's the redactor coverage (names, MRN, DOB, etc.)?
  • Is the audit trail tamper-evident? Hash-chained? Exportable?
  • What happens to my data on churn? Deletion SLA? Certified destruction?
  • Can I bring my own model? Self-host inference?
  • Are sub-processors disclosed? Can I refuse one?
  • Is the training data isolated per tenant?

If a vendor can't answer all seven, the BAA is doing more work than it can carry.