Red-Teaming Your Health Plan: Demetri Giannikopoulos on Responsible AI, the Cures Act, and an Adversarial Approach to AI Outputs
Demetri Giannikopoulos is Chief Innovation Officer at RadAI, overseeing clinical integration of AI in radiology across US health systems. He has lived with multiple sclerosis for roughly twenty years. His wife is a nurse practitioner and a cancer survivor. He speaks from both sides of the microphone: as a patient who has been navigating a broken insurance apparatus for two decades, and as an executive deploying AI into radiology workflows.
Summary
Demetri's AI use pattern inverts the common assumption. For MS management he trusts his Johns Hopkins care team and does not lean on AI. Where LLMs have been "absolutely priceless" is the administrative layer — specifically, decoding US health insurance. He describes a case study in which he fed roughly 50 pages of underwriting documents into ChatGPT, prompted it adversarially ("imagine you are an insurance underwriter trying to design a plan where the patient pays the least"), and discovered that a high-deductible plan actually beat the "better" premium plan by several thousand dollars for his and his wife's combined imaging and oncology needs. He then widens the lens: to the Cures Act's unintended consequence of weekend anxiety over auto-released radiology reports; to AI-generated radiology impressions catching legacy speech-recognition errors where the word "no" had been dropped from dictations; and to the fragmented, per-site AI governance slowing hospital adoption. His framing metaphor — AI as nuclear power, where regulation should sit at the application layer, not the discovery layer — is the most operationally useful in the series.
Key Warnings
1. Simple prompts produce simple, wrong answers. Demetri's opening rule: "Never ask a simple question because you'll often get a simple answer, and that can take you down the wrong course." For insurance, a single copy-pasted plan summary missed the loopholes; only pulling the full underwriting document surfaced the real cost structure.
2. The scribe-era review burden is higher, not lower. Legacy speech-recognition was so clearly imperfect that clinicians knew to spot-check. Modern AI scribes produce output so close to what was said that errors are harder to catch — and the burden falls on both clinician and patient to verify the record.
3. Medication histories in EHRs are frequently wrong. Not hypothetically — Demetri states this as a known failure mode. Patients need to verify their own medication list in the chart.
4. The Cures Act created an anxiety gap no one is filling. Results drop to patients the moment they're available; the physician doesn't review until days later. The interval is being filled by ChatGPT, often over a weekend, often with worst-case interpretations.
5. Per-site AI governance is a fragmentation crisis. Some sites require two pages of documentation to onboard an AI tool; others require hundreds. The result is a patchwork of review that slows adoption without improving safety.
6. "Responsible AI" in insurance should mean asymmetric automation. AI-approved coverage: acceptable. AI-denied coverage without human clinical review: not. The well-publicized insurance case where reviewers were instructed not to deviate more than 0.1% from the algorithm is, in his words, "poor system design" — not a case for banning AI but for designing the human loop correctly.
Key Insights
1. The killer app for patient AI is administrative, not clinical. The time-and-money ROI for a patient prompting a frontier model is concentrated in insurance, billing, prior authorization, and benefits comparison — domains where the system is deliberately opaque and where the patient is motivated to reverse-engineer it.
2. Adversarial prompting as the default patient technique. "Always red team. Use those exact words. Red team." He advises patients to say "conduct a red team analysis of this" — which leverages the model's training on the term to produce genuinely critical output. It is the most transferable single prompt pattern in the series.
3. Feed the model everything, not one document. "Don't feed in a single episode, a single piece of information. That's not the way healthcare works." His analysis used 50+ pages of source material, cross-compared three plan options simultaneously, and factored in his wife's expected utilization.
4. AI surfaces errors that pre-date the AI era. His RadAI example: an AI-generated impression flags a cancer diagnosis; the clinician checks and finds that the legacy, non-AI speech-recognition system had dropped the word "no" from the original dictation, flipping "no evidence of neoplasm" into a positive finding. The new system catches the old system's errors. This is a latent quality-control opportunity for health systems sitting on decades of legacy dictation; a crude sketch of this kind of polarity check follows this list.
5. Nuclear power is the right regulatory metaphor. "We regulate the atom. We regulate the application of it. But once it hits the distribution side, it's going back into your traditional already-regulated energy grid." Translation: regulate the medical-device AI and regulate the point of clinical application, but don't try to regulate every downstream use. The challenge: "we're creating more power than we know what to do with."
6. Model cards and SOC 2-style third-party audits are the realistic validation path. He applauds the Coalition for Health AI's ambition but says industry-wide validation at scale "never quite emerged." The workable analogue is how finance handles trust: independent standards bodies (NIST, ISO), independent auditors (the Deloittes of the world), and a certification badge that neither vendor nor buyer issues to themselves.
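Insight 4 suggests a quality-control pass a health system could run over legacy reports. The sketch below is a deliberately crude heuristic, not RadAI's method: it flags reports where a finding term appears affirmed in one text and negated in the other, the way a dropped "no" flipped "no evidence of neoplasm". The function names, negation cue list, and 40-character window are assumptions for illustration:

```python
import re

NEGATION_CUES = re.compile(r"\b(no|not|without|absent|negative for)\b", re.I)

def polarity(text: str, term: str) -> bool | None:
    """True if `term` appears affirmed, False if a negation cue precedes it
    within 40 characters, None if the term is absent. A crude window
    heuristic, nothing like production clinical NLP."""
    m = re.search(rf"\b{term}\b", text, re.I)
    if not m:
        return None
    window = text[max(0, m.start() - 40):m.start()]
    return not NEGATION_CUES.search(window)

def polarity_flip(dictation: str, impression: str, term: str) -> bool:
    """Flag reports where dictation and impression disagree on a finding."""
    d, i = polarity(dictation, term), polarity(impression, term)
    return d is not None and i is not None and d != i

# The dropped-"no" failure mode from insight 4:
assert polarity_flip("Evidence of neoplasm in the left lobe.",
                     "No evidence of neoplasm.", "neoplasm")
```

Even a heuristic this blunt would surface candidate reports for human re-review across decades of archived dictation.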
Key Tips (actionable)
For insurance decisions: download the full underwriting document, not the plan summary. Feed all candidate plans simultaneously. Frame the prompt adversarially from the insurer's perspective. Factor in expected-utilization events (surgeries, scans, known meds) explicitly. A sketch combining these steps with the cross-model check from the last tip follows this list.
For clinical records: read the impression on radiology reports first. If something seems wrong, check the full dictation.
Before using an AI tool in a clinical vendor context: ask for the model card. If they don't have one, that is itself information.
Use voice notes to build long, context-rich prompts. Demetri's typical prompt is "a couple thousand words" because he dictates them.
Before trusting an AI answer, ask the model to challenge itself. Different models have different failure modes — he cross-checks with Anthropic when he suspects ChatGPT of telling him what he wants to hear.
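Pulled together, the insurance tips above look something like the following sketch. It assumes the OpenAI and Anthropic Python SDKs; the model names, file paths, and prompt wording are illustrative, not Demetri's exact prompts:

```python
from pathlib import Path

from openai import OpenAI
import anthropic

def load_plans(paths: list[str]) -> str:
    """Concatenate the FULL underwriting documents, not the one-page
    summaries. Assumes the PDFs were already extracted to plain text."""
    return "\n\n---\n\n".join(Path(p).read_text() for p in paths)

RED_TEAM_PROMPT = """Imagine you are an insurance underwriter. Conduct a red \
team analysis of the plans below for a household expecting regular MRI \
imaging and oncology follow-up. Compare total annual out-of-pocket cost \
under each plan, not premiums alone, and identify every loophole.

{plans}"""

def analyze(plans: str) -> str:
    # First pass: adversarial comparison with one frontier model.
    resp = OpenAI().chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user",
                   "content": RED_TEAM_PROMPT.format(plans=plans)}],
    )
    return resp.choices[0].message.content

def cross_check(answer: str) -> str:
    # Second pass: a different vendor's model challenges the first answer.
    msg = anthropic.Anthropic().messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=2000,
        messages=[{"role": "user", "content":
                   "Conduct a red team analysis of this plan comparison. "
                   "What costs or loopholes did it miss?\n\n" + answer}],
    )
    return msg.content[0].text

if __name__ == "__main__":
    # Hypothetical file names for three candidate plans.
    plans = load_plans(["plan_a.txt", "plan_b.txt", "plan_c.txt"])
    print(cross_check(analyze(plans)))
```

The adversarial framing and the second-vendor challenge are the two load-bearing pieces; everything else is plumbing.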
Key Learning Lessons
Use AI where the system is most broken — insurance, billing, care-coordination, benefits — not where it works adequately.
Red-team everything as a standing prompt pattern.
Full documents beat excerpts; healthcare decisions are context-dependent.
Automated approval is fine; automated denial is not — the human loop belongs on the denial side.
The review burden has shifted: modern AI output is closer to correct, which makes residual errors more dangerous.
Regulate the application layer, not the atom. Over-regulating definitions freezes innovation against a technology that changes every six months.
Patient and Family Care Councils (e.g., the American College of Radiology PFCC) exist, are looking to recruit patients, and are where AI-informed patients can meaningfully shape system design.