01
Every attribute cites a source
Marginals are drawn from published US references: ACS, NHANES, CDC NDSS, KFF, MEPS, BLS. Relationships between attributes hold too: BMI equals weight over height squared, ZIP codes match states, and diabetes prevalence rises with age. Distributions are verified with Kolmogorov-Smirnov (KS) and chi-squared tests.
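As a rough illustration of two of the checks named above, the sketch below derives BMI from weight and height and computes a chi-squared goodness-of-fit statistic for one categorical attribute. The function names, the age-band shares, and the counts are made-up assumptions for this example, not values from the product or from any cited reference.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms over height in metres squared."""
    return weight_kg / height_m ** 2


def chi_squared_stat(observed: list[int], expected_share: list[float]) -> float:
    """Pearson chi-squared statistic for observed counts vs. reference shares."""
    total = sum(observed)
    stat = 0.0
    for obs, share in zip(observed, expected_share):
        exp = total * share  # expected count under the reference distribution
        stat += (obs - exp) ** 2 / exp
    return stat


# Hypothetical reference shares for three age bands (invented for this sketch).
reference = [0.30, 0.45, 0.25]
# Hypothetical counts observed in a generated population of 1,000 people.
generated = [310, 440, 250]

stat = chi_squared_stat(generated, reference)
# With 2 degrees of freedom, the 5% critical value is about 5.991.
fits = stat < 5.991
```

A statistic below the critical value means the generated counts are consistent with the reference shares at that significance level. A KS test plays the same role for continuous attributes such as height.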
02
No source dataset required
Most synthetic-data tools need your real records first, then learn from them. This one doesn't. It is built entirely from public reference data, so no real PII ever enters the system.
03
Deterministic by seed
Same seed, same people: byte for byte, every time. Pin your test fixtures, reproduce a bug, version a dataset. Change the seed and you get a different, equally calibrated population.
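The idea can be sketched in a few lines, assuming a toy generator rather than the product's actual one. `generate_people` and its fields are hypothetical. The point is only that a privately seeded generator plus a stable serialization gives identical bytes on every run.

```python
import hashlib
import json
import random


def generate_people(seed: int, n: int) -> bytes:
    """Return n toy person records as JSONL bytes, fully determined by seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    lines = []
    for i in range(n):
        person = {
            "id": i,
            "age": rng.randint(18, 90),
            "height_m": round(rng.uniform(1.50, 1.95), 2),
        }
        # sort_keys pins the field order so the serialized bytes are stable
        lines.append(json.dumps(person, sort_keys=True))
    return "\n".join(lines).encode("utf-8")


def digest(data: bytes) -> str:
    """SHA-256 hex digest, used to compare two outputs byte for byte."""
    return hashlib.sha256(data).hexdigest()


same = digest(generate_people(42, 100)) == digest(generate_people(42, 100))
different = digest(generate_people(42, 100)) != digest(generate_people(43, 100))
```

Storing the digest next to a fixture is one way to detect an accidental change to a pinned dataset.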
04
Bulk export, async
Need millions of rows? Submit a job and the async endpoint streams up to 5,000,000 records as JSONL to object storage. Poll for status, then download the file. The download link stays live for 90 days.
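A client for the submit, poll, download flow might look roughly like the sketch below. The real endpoint paths and response fields are not given here, so everything in it is an assumption: the status values `"running"`, `"done"`, and `"failed"`, the `download_url` field, and the caller-supplied `fetch_status` function are all hypothetical.

```python
import json
import time
from typing import Callable, Iterable, Iterator


def wait_for_job(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll until the job finishes, then return the download URL.

    fetch_status is a caller-supplied function (hypothetical) that returns
    a dict with a "status" key and, once finished, a "download_url" key.
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job["status"] == "done":
            return job["download_url"]
        if job["status"] == "failed":
            raise RuntimeError("export job failed")
        time.sleep(poll_seconds)
    raise TimeoutError("export job did not finish in time")


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL lazily: one JSON object per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)
```

Reading the export line by line keeps memory flat, which matters when the file holds millions of records.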