The Challenge
Medicare pays over $1 trillion annually. Roughly $70 billion of that -- about 7 cents on every dollar -- is estimated to be improper payments. The GAO has repeatedly criticized CMS for lacking quantifiable fraud detection metrics. The existing model is reactive: pay first, chase later, often years after the money is gone.
The ACT-IAC 2026 AI Hackathon posed a direct question: Can AI shift CMS from reactive to proactive program integrity?
We had 13 days.
The Constraint
No protected health information. No data use agreements. No claims-level data. Everything had to work on publicly available CMS datasets -- the same data anyone can download from data.cms.gov.
This is a harder problem than it sounds. Public data is annual aggregates, not individual claims. You can see that a provider billed $2.3M last year, but not whether they billed the same patient twice on the same day. Most academic fraud detection work assumes claims-level access. We didn't have that luxury.
What We Built
Integrity is a single Go binary that ingests 22 CMS datasets (548 million rows), scores 1.47 million Medicare providers across 10 years of history, and serves an interactive investigation dashboard -- all from one server.
The Data Pipeline
The system ingests data from CMS, OIG, DEA, Census, and other public sources into an embedded DuckDB database. No external database server. No ETL orchestration platform. Just direct CSV-to-columnar ingestion with SHA-256 deduplication.
| Metric | Value |
|---|---|
| Rows ingested | 548M+ |
| Providers scored | 1.47M |
| Coverage | 2013-2023 |
| Peer groups | 1,046 (specialty x region x year) |
| Graph nodes | 5.8M |
| Graph edges | 82.6M |
27 Features, 8 Categories
Every provider gets scored on 27 features in eight categories:
- Billing Volume -- billing far more than specialty peers, sudden spikes, solo practitioners billing like groups.
- Procedure Mix -- upcoding E&M visits, concentrating in expensive codes, deviating from specialty norms.
- Drug Prescribing -- opioid prescribing intensity, high-risk medications for elderly, rare drug combinations.
- Payment Network -- Open Payments manufacturer relationships correlated with prescribing patterns.
- Sanctions -- LEIE exclusions, Corporate Integrity Agreements, SAM exclusions.
- Geographic -- billing deviation from county baselines, cross-payer risk signals.
- Graph Analysis -- six custom network algorithms: prescribing isolation, opioid network contagion, manufacturer concentration, exclusion proximity, geographic deviation, co-prescribing anomalies.
- Enrichment -- career stage anomalies, controlled substance supply chain deviations from DEA ARCOS data.
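Because features are scored against specialty-by-region-by-year peer groups rather than national averages, a minimal sketch of the normalization looks like a percentile rank within the group. The percentile method and function name here are illustrative assumptions, not the project's exact normalization:

```go
package main

import (
	"fmt"
	"sort"
)

// normalizePeer maps each provider's raw feature value to a 0-100 score
// by its percentile rank within a peer group (e.g. cardiologists in one
// region in one year). Ties are not specially handled in this sketch.
func normalizePeer(values []float64) []float64 {
	n := len(values)
	scores := make([]float64, n)
	if n < 2 {
		return scores
	}
	// Sort indices by value, then assign each its rank as a percentile.
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return values[idx[a]] < values[idx[b]] })
	for rank, i := range idx {
		scores[i] = 100 * float64(rank) / float64(n-1)
	}
	return scores
}

func main() {
	// Annual billing totals for a small peer group; the last is the outlier.
	billing := []float64{180_000, 210_000, 195_000, 2_300_000}
	fmt.Println(normalizePeer(billing)) // the outlier scores 100
}
```

Percentile-style normalization is robust to the heavy right skew of billing data: a $2.3M outlier gets 100 whether it is 10x or 100x the peer median.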
Three-Way Ensemble Scoring
Each provider's final score combines three layers:
1. Deterministic -- 27 features normalized to 0-100, combined with tunable weights. Reproducible and transparent.
2. Unsupervised -- Isolation Forest, K-Means, and Local Outlier Factor, with a corroboration boost of up to +30 points when detectors agree.
3. Supervised -- trained on LEIE exclusion labels. Validation AUC: 0.965.
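The deterministic layer plus corroboration boost can be sketched as follows. The +10-per-detector split and the weighting scheme are illustrative assumptions; only the 0-100 feature range, tunable weights, and the +30 cap come from the description above:

```go
package main

import "fmt"

// ensembleScore combines a weighted mean of 0-100 features with a
// corroboration boost: each unsupervised detector (Isolation Forest,
// K-Means, LOF) that also flags the provider adds +10, capped at +30.
// The per-detector +10 split is an assumption for illustration.
func ensembleScore(features, weights []float64, detectorFlags []bool) float64 {
	var score, wsum float64
	for i, f := range features {
		score += f * weights[i]
		wsum += weights[i]
	}
	score /= wsum // weighted mean stays in 0-100

	boost := 0.0
	for _, flagged := range detectorFlags {
		if flagged {
			boost += 10
		}
	}
	if boost > 30 {
		boost = 30
	}
	score += boost
	if score > 100 {
		score = 100
	}
	return score
}

func main() {
	features := []float64{80, 60, 40} // e.g. billing volume, opioid intensity, upcoding
	weights := []float64{0.5, 0.3, 0.2}
	flags := []bool{true, true, false} // two of three detectors agree
	fmt.Printf("%.1f\n", ensembleScore(features, weights, flags))
}
```

Because the deterministic layer is a plain weighted sum, re-weighting and re-ranking all 1.47M providers is a single linear pass -- which is what makes the dashboard's live weight adjustment feasible.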
The Dashboard
National heatmap, state/county drill-downs, deep provider profiles. Peer comparison radar charts, top contributing factors, 10-year billing trajectories, network visualizations.
Key differentiator: live weight adjustment. Investigators tune feature weights in real-time and watch scores re-rank instantly.
Every score shows exactly how it was computed. The SQL queries are visible. The weights are visible. The peer groups are visible. Not a black box.
The Results
Backtest Performance (train 2013-2019, validate 2020-2023)
42% precision at the top 100 means 42 of every 100 highest-flagged providers are confirmed fraud. Against a base rate of 0.047%, that's an 890x lift over random selection.
6.9-year lead time means the system would have flagged providers nearly 7 years before OIG excluded them. ~$3.4M prevented waste per provider caught early.
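Both backtest metrics reduce to simple arithmetic over the ranked list. A minimal sketch (identifiers illustrative, not the project's evaluation code):

```go
package main

import "fmt"

// precisionAtK: of the k highest-scored providers, what fraction were
// later confirmed (e.g. appeared in LEIE exclusions)?
func precisionAtK(ranked []string, confirmed map[string]bool, k int) float64 {
	hits := 0
	for _, id := range ranked[:k] {
		if confirmed[id] {
			hits++
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	// Tiny illustration with made-up NPIs.
	ranked := []string{"npiA", "npiB", "npiC"}
	confirmed := map[string]bool{"npiA": true}
	fmt.Printf("P@2 = %.2f\n", precisionAtK(ranked, confirmed, 2)) // P@2 = 0.50

	// Lift = precision / base rate; 0.42 / 0.00047 is roughly 890x.
	fmt.Printf("lift: %.0fx\n", 0.42/0.00047)
}
```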
Out-of-Sample Validation
We searched every non-LEIE provider in our top 250 against DOJ press releases, OIG enforcement actions, court records, and state medical boards. 37 confirmed fraud cases. $500M+ total fraud value.
Highlights
| Rank | Finding |
|---|---|
| #2 | DOJ settlement $1.2M |
| #7 | 55,000 false claims on deceased patients |
| #8 | $14.3M wound care fraud |
| #22 | $250M opioid conspiracy |
| #52 | $6.5M false claims (independently investigated by ProPublica) |
Architecture
Single server. Zero external runtime dependencies. Under $800/month. Everything runs in one process: the Go binary with DuckDB embedded, GraphWizard in-process, and an HTMX + Alpine + ECharts frontend compiled into the binary.
| Component | Choice | Why |
|---|---|---|
| Language | Go | Single binary, embedded assets, goroutines for pipeline |
| Database | DuckDB (embedded) | No server, columnar analytics on 548M rows |
| Graph engine | GraphWizard (in-process) | Replaced 100GB RAM Memgraph dependency |
| Frontend | HTMX + Alpine + ECharts | Embedded in binary, no build step |
| ML | Go ML libraries | Isolation Forest, K-Means, LOF, GBT -- all in-process |
| Infrastructure | Single VPS | <$800/mo for the entire system |
GraphWizard
Born from necessity. Open-sourced for everyone.
Built during and after the hackathon to replace Memgraph: 40+ algorithms in pure Go, no dependencies beyond gonum, 97.3% test coverage, MIT licensed. It swapped a 100GB-RAM Memgraph server for an in-process library that analyzes the full 82.6M-edge graph in about 90 seconds.
| Metric | Value |
|---|---|
| Algorithms | 40+ |
| Dependencies | gonum only |
| Test coverage | 97.3% |
| License | MIT |
| Graph size handled | 82.6M edges, 5.8M nodes |
| Analysis time | ~90 seconds |
| Replaces | Memgraph (100GB RAM server) |
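One of the six graph features above, exclusion proximity, can be sketched as a multi-source breadth-first search from known-excluded providers. This pure-Go, adjacency-list version is illustrative only; GraphWizard's actual API differs:

```go
package main

import "fmt"

// exclusionProximity returns each node's hop distance to the nearest
// LEIE-excluded provider via multi-source BFS over the provider graph
// (referrals, co-prescribing, shared addresses). Nodes absent from the
// result are unreachable from any excluded provider.
func exclusionProximity(adj map[int][]int, excluded []int) map[int]int {
	dist := make(map[int]int)
	queue := make([]int, 0, len(excluded))
	for _, e := range excluded {
		dist[e] = 0
		queue = append(queue, e)
	}
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		for _, v := range adj[u] {
			if _, seen := dist[v]; !seen {
				dist[v] = dist[u] + 1
				queue = append(queue, v)
			}
		}
	}
	return dist
}

func main() {
	// Tiny co-prescribing chain 1-2-3-4, with provider 1 excluded.
	adj := map[int][]int{1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
	d := exclusionProximity(adj, []int{1})
	fmt.Println(d[2], d[3], d[4]) // 1 2 3
}
```

Multi-source BFS is linear in nodes plus edges, which is why a full pass over 5.8M nodes and 82.6M edges fits comfortably in the ~90-second analysis budget.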
The Numbers
Hackathon
| Metric | Value |
|---|---|
| Duration | 13 days |
| Team size | 4 people |
| Result | 1st place, ACT-IAC 2026 |
Fraud Detection
| Metric | Value |
|---|---|
| Confirmed cases | 37 |
| Total fraud value | $500M+ |
| P@100 | 42% |
| Lift over random | 890x |
| Avg lead time | 6.9 years |
Infrastructure
| Metric | Value |
|---|---|
| Monthly cost | <$800 |
| External runtime deps | 0 |
Open Source
| Metric | Value |
|---|---|
| Project | GraphWizard |
| Algorithms | 40+ |
| Language | Pure Go |