Data Security

Decoy

AI-powered synthetic data generation behind your trust boundary. We build your application on the decoy. You verify we never saw the original.

In Development

Decoy deploys inside your security boundary as a single binary with read-only access to your data sources. An AI agent — running in your environment, authorized to see your data — analyzes schemas, discovers relationships, and generates a complete synthetic dataset that matches the original's shape, volume, and statistical properties. Only the synthetic output crosses the trust boundary. We never see the real data.

The Problem

Building enterprise applications requires realistic data — correct schemas, plausible distributions, proper relationships, enough volume to test performance. But the data these systems touch is sensitive: financial records, healthcare information, acquisition details, personnel data. AI-first development makes this worse — every dataset is one prompt away from being sent to a model provider. NDAs and access controls manage the risk. Decoy eliminates it.

What Decoy Does

AI-Powered Analysis

An AI agent running in your environment reads schemas, infers field semantics, discovers relationships (explicit foreign keys and soft references such as matching GUIDs), identifies embedded structures, and records data quirks.
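As an illustration of what soft-reference discovery can look like, here is a minimal sketch that flags column pairs where nearly every value in one column also appears in a unique-valued column elsewhere. It is not Decoy's actual analysis: the function name `find_soft_references`, the in-memory table layout, and the `min_overlap` threshold are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory layout: table name -> column name -> list of values.
Tables = Dict[str, Dict[str, List[str]]]


def find_soft_references(
    tables: Tables, min_overlap: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Return (child column, parent column, overlap) candidates.

    A column is treated as a possible parent key only if its values are
    unique. Overlap is the share of distinct child values found in the parent.
    """
    columns = [
        (f"{table}.{column}", values)
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    candidates = []
    for child_name, child_values in columns:
        child_set = set(child_values)
        if not child_set:
            continue
        for parent_name, parent_values in columns:
            if parent_name == child_name:
                continue
            parent_set = set(parent_values)
            # Skip columns with duplicates: they cannot act as a key.
            if len(parent_set) != len(parent_values):
                continue
            overlap = len(child_set & parent_set) / len(child_set)
            if overlap >= min_overlap:
                candidates.append((child_name, parent_name, overlap))
    return candidates


# Illustrative data only: owner_ref is not declared as a foreign key,
# but its values all match accounts.account_guid.
tables = {
    "accounts": {
        "account_guid": ["a1", "a2", "a3"],
        "name": ["x", "y", "z"],
    },
    "invoices": {
        "invoice_id": ["i1", "i2", "i3", "i4"],
        "owner_ref": ["a1", "a1", "a3", "a2"],
    },
}
print(find_soft_references(tables))
```

Value overlap alone produces false positives on small or low-cardinality columns, so a real analysis would likely combine it with field names, types, and semantics.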

Provider Code Generation

The AI writes custom data generation rules for every field, every relationship, every distribution. If generation fails — wrong types, broken constraints — the AI debugs and fixes it automatically.
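The generate, validate, and fix cycle described above can be sketched as a simple loop. This is an assumption-laden illustration, not Decoy's implementation: `validate`, `generate_with_repair`, and the `repair` callback (which stands in for the AI debugging step) are hypothetical names, and the schema here checks only field presence and Python types.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
RowGenerator = Callable[[], Row]
# Hypothetical stand-in for the AI step: takes the failing generator and
# the error list, returns a revised generator.
Repair = Callable[[RowGenerator, List[str]], RowGenerator]


def validate(row: Row, schema: Dict[str, type]) -> List[str]:
    """Return human-readable errors for missing or mistyped fields."""
    errors = []
    for field, expected in schema.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            actual = type(row[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


def generate_with_repair(
    generator: RowGenerator,
    schema: Dict[str, type],
    repair: Repair,
    max_attempts: int = 3,
) -> Row:
    """Generate a row, handing validation errors to `repair` on failure."""
    errors: List[str] = []
    for _ in range(max_attempts):
        row = generator()
        errors = validate(row, schema)
        if not errors:
            return row
        generator = repair(generator, errors)
    raise RuntimeError(f"generation still failing after {max_attempts} attempts: {errors}")
```

Bounding the number of attempts keeps a generator that cannot be fixed from looping forever; the final error list is surfaced so a person can step in.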

Full-Scale Synthetic Output

1:1 volume match. If your JSON has 1.2M lines, the synthetic version has 1.2M lines. Same field lengths, nested structures, referential integrity. Performance testing on synthetic data matches production behavior.
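To make the volume and referential-integrity claims concrete, here is a minimal sketch that produces synthetic parent keys and child references with the same row counts and the same fan-out profile as the originals, without reusing any real key. The function `synthesize_keys` is hypothetical, and the sketch assumes every child reference points at an existing parent key.

```python
import random
import uuid
from collections import Counter
from typing import List, Tuple


def synthesize_keys(
    parent_keys: List[str], child_refs: List[str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Return (synthetic parent keys, synthetic child references).

    Row counts match the inputs 1:1, and every synthetic child reference
    points at a synthetic parent. Assumes each child ref is in parent_keys.
    """
    rng = random.Random(seed)
    # Fan-out profile: number of child rows per parent, zeros included.
    fan_out = Counter(child_refs)
    counts = [fan_out.get(key, 0) for key in parent_keys]
    # Shuffle so a synthetic parent's position does not reveal which
    # real parent had that many children.
    rng.shuffle(counts)
    synthetic_parents = [
        str(uuid.UUID(int=rng.getrandbits(128))) for _ in parent_keys
    ]
    synthetic_refs = [
        parent
        for parent, count in zip(synthetic_parents, counts)
        for _ in range(count)
    ]
    rng.shuffle(synthetic_refs)
    return synthetic_parents, synthetic_refs
```

Because only counts are carried over, the synthetic keys share nothing with the real ones except the shape of the relationship, which is what joins and performance tests depend on.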

Documentation Package

Schema maps, field profiles, relationship graphs, data quirks log, and provider reference. Everything developers need to understand data they can't see. Yours to keep — great for onboarding and transitions.

310+ Built-in Data Types
0 Real Values Exposed
1:1 Volume Match
100% Customer Auditable
Only synthetic data crosses the trust boundary — real data never leaves
Standard Fake Data
  • Random values by type
  • No cross-table relationships
  • Uniform distributions
  • Fixed row counts
  • Breaks on first real query
Decoy
  • AI-analyzed field semantics
  • Full referential integrity
  • Real distribution shapes
  • Production-scale volume
  • Applications work identically

Supported Data Sources

S3 buckets, relational databases, data lakes, Microsoft Dataverse, JSON, YAML, Parquet, CSV, XML, and dozens more formats. Decoy is configured with a straightforward file that specifies data source connections and, optionally, AI prompts for corner cases. Access is read-only: Decoy never writes to your data.

A note on availability: Yes, we know there's interest in buying Decoy as a standalone product. As of now, you have to hire us to get it.