SimpleIDGen
Synthetic · Not real PII | 65 attributes / record | Calibrated · NHANES / ACS / CDC | Deterministic by seed

Synthetic Person Data Generator

Synthetic person records with 65 jointly-distributed attributes — demographics, health, behavioral, financial — calibrated against public US reference data.

Demographic-first generator. Returns synthetic person records with 65 jointly-distributed attributes across 9 domains (identity, geography, social, financial, behavioral, health basics, health conditions, healthcare utilization, medications). Each marginal distribution cites a public source (ACS 2022, NHANES 2017-2020, CDC NDSS, KFF 2023, MEPS 2022, BLS 2023, USPS L005 2024).

Cross-field invariants are enforced: bmi = weight_kg / (height_cm / 100)², the ZIP code matches the state per USPS SCF ranges, and on_insulin is true only for records with diabetes.

Output is deterministic by seed. Three locales are supported: en-US at full fidelity, and en-GB / en-IN with locale-native identity fields and en-US health data as a fallback, disclosed via locale_data_source.

Bulk generation is asynchronous: submit a job, then download the resulting JSONL or CSV file via a download URL.

Parameters

Name      Type     Required  Default                 Description
count     integer  optional  100000                  Number of person records to generate. Range: 1–5,000,000.
seed      integer  optional  (derived from job_id)   RNG seed for reproducibility. Same seed + same params = byte-identical records.
locale    string   optional  en-US                   Locale: en-US, en-GB, en-IN. Health attributes use en-US fallback for en-GB / en-IN.
idFormat  string   optional  ulid                    ID format: ulid, uuidv7, uuid, nanoid, cuid2.
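
The seed row promises byte-identical output for the same seed and parameters. One way to check that claim on your own downloads is to compare file hashes. The sketch below is illustrative only: the helper name `sha256_of_stream` is hypothetical, and the two in-memory buffers merely stand in for two downloaded files.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file-like object in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-ins for two downloads made with the same seed and parameters.
# With real files, pass open(path, "rb") instead of a BytesIO buffer.
first = io.BytesIO(b'{"id": "a"}\n{"id": "b"}\n')
second = io.BytesIO(b'{"id": "a"}\n{"id": "b"}\n')
same = sha256_of_stream(first) == sha256_of_stream(second)
```

If the two digests differ, the files are not byte-identical, which would suggest that the seed or one of the other parameters differed between the two jobs.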

Example record

{
  "id": "64PG6RYQXXD7XFEKZJ6AW616M7",
  "given_name": "Elizabeth", "family_name": "Robinson",
  "age": 31, "sex_at_birth": "female",
  "race": "white", "ethnicity": "hispanic",
  "locale": "en-US", "country": "US", "state": "IL", "urbanicity": "suburban",
  "education": "some_college", "insurance_type": "marketplace",
  "height_cm": 171.8, "weight_kg": 76.4, "bmi": 25.9, "waist_circumference_cm": 89.1,
  "diabetes_status": "diagnosed_t2dm", "family_history_diabetes": true,
  "visits_past_year": 7, "number_of_prescriptions": 1, "on_insulin": false
  // ... 49 more attributes
}
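
The cross-field invariants described above can be spot-checked on downloaded records. The following is a minimal sketch, not the generator's own validation: the function name, the tolerance, and the set of statuses treated as diabetic are all assumptions. The sample values are copied from the example record.

```python
def check_invariants(record, tolerance=0.06):
    """Return a list of violated cross-field invariants for one record."""
    problems = []

    # bmi = weight_kg / (height_cm / 100)^2. The tolerance is an assumption:
    # slightly more than the 0.05 rounding error of a one-decimal bmi.
    height_m = record["height_cm"] / 100
    expected_bmi = record["weight_kg"] / height_m ** 2
    if abs(expected_bmi - record["bmi"]) > tolerance:
        problems.append(f"bmi {record['bmi']} != {expected_bmi:.2f}")

    # on_insulin should only be true for records with diabetes.
    # Which statuses count as diabetic is an assumption here.
    diabetic = {"diagnosed_t2dm", "type1"}
    if record["on_insulin"] and record["diabetes_status"] not in diabetic:
        problems.append("on_insulin set for a non-diabetic record")

    return problems


# Values taken from the example record.
example = {
    "height_cm": 171.8, "weight_kg": 76.4, "bmi": 25.9,
    "diabetes_status": "diagnosed_t2dm", "on_insulin": False,
}
violations = check_invariants(example)
```

For the example record, 76.4 / 1.718² is about 25.89, which rounds to the listed bmi of 25.9, so no violations are reported.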

Call it

curl

# 1. Register once — returns your clientId and sets a session cookie
curl -sS -c cookies.txt -X POST https://api.simpleidgen.com/v1/auth/register \
  -H 'Content-Type: application/json' \
  -d '{"name":"You","email":"you@company.com","password":"your-password"}'

# 2. Submit a generation job (uses the saved cookie)
curl -sS -b cookies.txt -X POST https://api.simpleidgen.com/v1/datasets/person \
  -H 'Content-Type: application/json' \
  -d '{"clientId":"<your client id>","count":100000,"seed":42}'

# 3. Poll status, then download the JSONL once completed
curl -sS -b cookies.txt https://api.simpleidgen.com/v1/datasets/<job_id>

JavaScript

// After registering or logging in (session cookie set), submit a job:
const res = await fetch('https://api.simpleidgen.com/v1/datasets/person', {
  method: 'POST',
  credentials: 'include',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ clientId: '<your client id>', count: 100000, seed: 42 }),
});
const { jobId, statusUrl } = await res.json();

Python

import requests
s = requests.Session()
s.post('https://api.simpleidgen.com/v1/auth/login', json={'email': 'you@company.com', 'password': '...'})
job = s.post('https://api.simpleidgen.com/v1/datasets/person', json={'clientId': '<your client id>', 'count': 100000, 'seed': 42}).json()
print(job['jobId'], job['statusUrl'])

Get started

Generation requires a free account — it takes about 10 seconds and gives you a client ID and an API session.


Endpoint
POST /v1/datasets/person

Async — submit a job, poll /v1/datasets/{job_id}, then download JSONL.
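
The submit-then-poll flow could be wrapped in a small helper like the one below. This is a sketch under stated assumptions: the shape of the status response is not documented on this page, so the "status" key and its "completed" / "failed" values are hypothetical, as is the helper name `wait_for_job`. The status request itself is passed in as a callable so the loop stays independent of any HTTP client.

```python
import time


def wait_for_job(fetch_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll until the job finishes, then return the final status body.

    fetch_status: zero-argument callable returning the parsed JSON of
    GET /v1/datasets/{job_id}. The "status" field and its values are
    assumptions, not documented fields.
    """
    waited = 0.0
    while True:
        body = fetch_status()
        state = body.get("status")
        if state == "completed":
            return body
        if state == "failed":
            raise RuntimeError(f"job failed: {body}")
        if waited >= timeout_seconds:
            raise TimeoutError("job did not complete in time")
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical usage with a requests.Session `s` and a submitted `job`:
#   done = wait_for_job(lambda: s.get(job["statusUrl"]).json())
```

Whether statusUrl is absolute or relative is not stated here, so the usage line may need a base URL prepended.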

Multiple datasets — 10 × 200K records

Variance evidence: 10 independent regenerations of ~200K rows each (a different base seed per run), pooled and cross-checked across 45 pairwise comparisons.
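
The figure of 45 comparisons follows from pairing each of the 10 datasets with every other one exactly once, i.e. 10 choose 2. A short check, using hypothetical run labels:

```python
from itertools import combinations
from math import comb

# Hypothetical labels for the 10 regenerated datasets.
runs = [f"run_{i}" for i in range(1, 11)]

# Every unordered pair of distinct runs.
pairs = list(combinations(runs, 2))
pair_count = len(pairs)  # equals comb(10, 2)
```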

person-profile-advanced v0.5 · 10 datasets · 2,000,000 records pooled
Calibrated against ACS 2022 · NHANES 2017-2020 · CDC-NDSS 2022 · CDC-mortality 2022 · KFF 2023 · MEPS 2022 · BLS 2023 · USPS-L005 2024.

2M records · 100.00% unique rows · 100.00% unique IDs · 100.00% valid emails · 100.00% valid phones

Categorical fidelity

Chi-squared test vs reference; effect size is Cramér's V. A small effect size = the synthetic distribution tracks the reference.

Attribute            Verdict  Cramér’s V
ckd_status           match    0.097
diabetes_status      match    0.072
education            match    0.069
employment_status    match    0.054
ethnicity            match    0.007
hypertension_status  match    0.017
insurance_type       match    0.081
marital_status       match    0.043
race                 match    0.097
sex_at_birth         match    0.001
smoking_status       match    0.057
state                match    0.004
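
For readers who want to reproduce this kind of check, here is a sketch of a chi-squared goodness-of-fit statistic with Cramér's V as the effect size. The page does not state the exact formula it uses, so the one-sample form V = sqrt(χ² / (n · (k − 1))) is an assumption, and the toy counts and shares are made up for illustration rather than taken from the report.

```python
from math import sqrt


def cramers_v_gof(observed_counts, reference_shares):
    """Chi-squared goodness-of-fit statistic and Cramér's V against reference shares.

    observed_counts: {category: count} from the generated data.
    reference_shares: {category: proportion}, expected to sum to 1.
    """
    n = sum(observed_counts.values())
    k = len(reference_shares)
    chi2 = 0.0
    for category, share in reference_shares.items():
        expected = n * share
        observed = observed_counts.get(category, 0)
        chi2 += (observed - expected) ** 2 / expected
    v = sqrt(chi2 / (n * (k - 1)))
    return chi2, v


# Toy example with made-up numbers: 100 rows against a 50/50 reference.
chi2, v = cramers_v_gof({"a": 60, "b": 40}, {"a": 0.5, "b": 0.5})
```

With very large samples the chi-squared p-value becomes significant for tiny deviations, which is presumably why the table reports the effect size instead.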

Numeric fidelity

Kolmogorov–Smirnov distance vs the reference distribution; lower D = closer fit.

Attribute               Verdict  KS D    Reference
a1c_value               close    0.0959  NHANES 2017-2020 (LBXGH adult mean)
age                     match    0.0176  ACS 2022 (US adults)
bmi                     match    0.0195  NHANES 2017-2020 (US adults)
height_cm               match    0.0224  NHANES 2017-2020 (adult height)
waist_circumference_cm  match    0.0106  NHANES 2017-2020 (BMXWAIST adult mean)
weight_kg               match    0.0132  NHANES 2017-2020 (adult weight)
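
The KS distance D is the largest vertical gap between two cumulative distribution functions. The sketch below computes the two-sample version from raw values. Whether the report compares against a reference sample or a fitted reference curve is not stated, so the two-sample form is an assumption, and the input lists are toy values.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at data points,
    # so the largest gap is found by evaluating them at every observed value.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


identical = ks_distance([1, 2, 3, 4], [1, 2, 3, 4])
shifted = ks_distance([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_distance([1, 2], [3, 4])
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap at all).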

Correlations

Observed pairwise correlation vs the published reference.

Pair                          Observed  Reference  Verdict
age × a1c_value               0.285     0.140      match
age × bmi                     0.001     0.100      match
bmi × waist_circumference_cm  0.760     0.850      match
bmi × weight_kg               0.888     0.780      match
height_cm × weight_kg         0.516     0.450      match
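
An observed pairwise correlation can be recomputed from two columns of a download. The page does not say which coefficient is reported, so Pearson's r is an assumption in this sketch, and the input lists are toy values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy columns: perfectly linear, increasing and decreasing.
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [6, 4, 2])
```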

Distributions

Each attribute's generated distribution against its reference curve.

diabetes status

Generated category counts across 2,000,000 records.

Category          Count      Share
none              1,175,791  58.8%
prediabetic       577,777    28.9%
diagnosed_t2dm    179,760    9.0%
undiagnosed_t2dm  52,350     2.6%
type1             14,322     0.7%
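
Count-and-share tables like this one can be rebuilt from a downloaded JSONL file by streaming it line by line. The following is a minimal sketch: the helper name `category_shares` is hypothetical, and the in-memory lines stand in for a real file.

```python
import io
import json
from collections import Counter


def category_shares(lines, field):
    """Count the values of one field across JSONL lines.

    Returns (category, count, share) tuples, most frequent first.
    """
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)[field]] += 1
    total = sum(counts.values())
    return [(category, n, n / total) for category, n in counts.most_common()]


# Stand-in for a downloaded file. With a real file, pass open(path, encoding="utf-8").
sample = io.StringIO(
    '{"diabetes_status": "none"}\n'
    '{"diabetes_status": "prediabetic"}\n'
    '{"diabetes_status": "none"}\n'
)
rows = category_shares(sample, "diabetes_status")
```

Iterating over the file object keeps memory use flat, which matters for multi-million-row downloads.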

household income bracket

Generated category counts across 2,000,000 records.

Category    Count    Share
r50to100k   576,766  28.8%
r25to50k    548,066  27.4%
under25k    440,838  22.0%
r100to150k  252,466  12.6%
over150k    181,864  9.1%

hypertension status

Generated category counts across 2,000,000 records.

Category     Count      Share
none         1,095,100  54.8%
diagnosed    769,296    38.5%
undiagnosed  135,604    6.8%

insurance type

Generated category counts across 2,000,000 records.

Category     Count    Share
employer     993,868  49.7%
medicare     363,217  18.2%
medicaid     263,121  13.2%
uninsured    186,952  9.3%
marketplace  84,181   4.2%
other        70,757   3.5%
military     37,904   1.9%

race

Generated category counts across 2,000,000 records.

Category      Count      Share
white         1,202,905  60.1%
black         252,023    12.6%
multi_racial  239,046    12.0%
other         167,835    8.4%
asian         120,230    6.0%
aian          13,877     0.7%
nhpi          4,084      0.2%

sex at birth

Generated category counts across 2,000,000 records.

Category  Count      Share
female    1,010,597  50.5%
male      989,403    49.5%

smoking status

Generated category counts across 2,000,000 records.

Category  Count      Share
never     1,305,530  65.3%
former    428,437    21.4%
current   266,033    13.3%

state

Generated category counts across 2,000,000 records.

Category  Count    Share
CA        235,867  11.8%
TX        179,501  9.0%
FL        134,192  6.7%
NY        115,876  5.8%
PA        78,106   3.9%
IL        76,501   3.8%
OH        69,882   3.5%
GA        65,993   3.3%
NC        64,088   3.2%
MI        60,052   3.0%
NJ        56,127   2.8%
VA        51,926   2.6%
WA        46,128   2.3%
AZ        43,850   2.2%
TN        42,133   2.1%
MA        41,909   2.1%
IN        39,959   2.0%
MO        38,007   1.9%
MD        37,880   1.9%
WI        36,239   1.8%
CO        34,111   1.7%
MN        33,783   1.7%
SC        32,029   1.6%
AL        29,890   1.5%
KY        28,008   1.4%
LA        27,987   1.4%
OR        26,088   1.3%
OK        23,784   1.2%
CT        22,139   1.1%
NV        20,138   1.0%
IA        20,021   1.0%
UT        19,809   1.0%
AR        18,067   0.9%
KS        17,904   0.9%
MS        17,803   0.9%
NM        12,120   0.6%
NE        12,067   0.6%
ID        11,860   0.6%
WV        9,924    0.5%
ME        8,181    0.4%
NH        8,018    0.4%
HI        7,925    0.4%
DE        6,105    0.3%
SD        6,061    0.3%
RI        6,041    0.3%
MT        5,899    0.3%
ND        4,052    0.2%
VT        4,023    0.2%
DC        4,005    0.2%
WY        3,987    0.2%
AK        3,955    0.2%

[Charts: cross overlap, drift, bmi by age, diabetes by age]

Attribute               Count      Mean    Std dev  Min     Max
a1c value               2,000,000  5.69    1.00     4.00    14.00
age                     2,000,000  47.47   18.49    18.00   95.00
bmi                     2,000,000  29.49   6.71     15.00   65.00
height cm               2,000,000  168.48  9.90     140.00  210.00
waist circumference cm  2,000,000  97.56   15.81    55.00   175.00
weight kg               2,000,000  84.16   21.91    34.90   192.30

Single large dataset — 1 × 2M records

Scale evidence: a single 2M-row dataset generated end-to-end via the async endpoint and streamed to object storage by multipart upload.

person-profile-advanced v0.5 · 2,000,000 records · generated in 100s
Calibrated against ACS 2022 · NHANES 2017-2020 · CDC-NDSS 2022 · CDC-mortality 2022 · KFF 2023 · MEPS 2022 · BLS 2023 · USPS-L005 2024.

2M records · 100.00% unique rows · 100.00% unique IDs · 100.00% valid emails · 100.00% valid phones

Categorical fidelity

Chi-squared test vs reference; effect size is Cramér's V. A small effect size = the synthetic distribution tracks the reference.

Attribute            Verdict  Cramér’s V
ckd_status           match    0.098
diabetes_status      match    0.071
education            match    0.068
employment_status    match    0.055
ethnicity            match    0.008
hypertension_status  match    0.016
insurance_type       match    0.080
marital_status       match    0.043
race                 match    0.098
sex_at_birth         match    0.000
smoking_status       match    0.058
state                match    0.006

Numeric fidelity

Kolmogorov–Smirnov distance vs the reference distribution; lower D = closer fit.

Attribute               Verdict  KS D    Reference
a1c_value               close    0.0955  NHANES 2017-2020 (LBXGH adult mean)
age                     match    0.0178  ACS 2022 (US adults)
bmi                     match    0.0193  NHANES 2017-2020 (US adults)
height_cm               match    0.0226  NHANES 2017-2020 (adult height)
waist_circumference_cm  match    0.0103  NHANES 2017-2020 (BMXWAIST adult mean)
weight_kg               match    0.0137  NHANES 2017-2020 (adult weight)

Correlations

Observed pairwise correlation vs the published reference.

Pair                          Observed  Reference  Verdict
age × a1c_value               0.285     0.140      match
age × bmi                     0.000     0.100      match
bmi × waist_circumference_cm  0.760     0.850      match
bmi × weight_kg               0.889     0.780      match
height_cm × weight_kg         0.516     0.450      match

Distributions

Each attribute's generated distribution against its reference curve.

diabetes status

Generated category counts across 2,000,000 records.

Category          Count      Share
none              1,176,973  58.8%
prediabetic       576,882    28.8%
diagnosed_t2dm    179,525    9.0%
undiagnosed_t2dm  52,264     2.6%
type1             14,356     0.7%

household income bracket

Generated category counts across 2,000,000 records.

Category    Count    Share
r50to100k   576,052  28.8%
r25to50k    546,984  27.3%
under25k    442,394  22.1%
r100to150k  252,651  12.6%
over150k    181,919  9.1%

hypertension status

Generated category counts across 2,000,000 records.

Category     Count      Share
none         1,095,162  54.8%
diagnosed    768,701    38.4%
undiagnosed  136,137    6.8%

insurance type

Generated category counts across 2,000,000 records.

Category     Count    Share
employer     993,573  49.7%
medicare     363,193  18.2%
medicaid     264,295  13.2%
uninsured    186,011  9.3%
marketplace  83,809   4.2%
other        70,945   3.5%
military     38,174   1.9%

race

Generated category counts across 2,000,000 records.

Category      Count      Share
white         1,202,075  60.1%
black         251,762    12.6%
multi_racial  240,089    12.0%
other         167,598    8.4%
asian         120,381    6.0%
aian          14,100     0.7%
nhpi          3,995      0.2%

sex at birth

Generated category counts across 2,000,000 records.

Category  Count      Share
female    1,009,844  50.5%
male      990,156    49.5%

smoking status

Generated category counts across 2,000,000 records.

Category  Count      Share
never     1,305,731  65.3%
former    427,442    21.4%
current   266,827    13.3%

state

Generated category counts across 2,000,000 records.

Category  Count    Share
CA        236,404  11.8%
TX        179,838  9.0%
FL        133,057  6.7%
NY        116,194  5.8%
PA        77,490   3.9%
IL        75,583   3.8%
OH        70,350   3.5%
GA        65,994   3.3%
NC        63,791   3.2%
MI        59,955   3.0%
NJ        56,203   2.8%
VA        52,271   2.6%
WA        46,135   2.3%
AZ        43,866   2.2%
MA        42,193   2.1%
TN        42,139   2.1%
IN        40,041   2.0%
MO        37,958   1.9%
MD        37,712   1.9%
WI        35,775   1.8%
MN        34,061   1.7%
CO        34,015   1.7%
SC        31,898   1.6%
AL        30,210   1.5%
LA        28,341   1.4%
KY        28,190   1.4%
OR        25,969   1.3%
OK        24,175   1.2%
CT        22,169   1.1%
UT        20,070   1.0%
NV        19,894   1.0%
IA        19,830   1.0%
MS        18,146   0.9%
AR        17,929   0.9%
KS        17,745   0.9%
NE        12,308   0.6%
NM        12,116   0.6%
ID        11,888   0.6%
WV        9,901    0.5%
HI        8,026    0.4%
ME        7,968    0.4%
NH        7,899    0.4%
DE        6,132    0.3%
RI        6,082    0.3%
MT        6,054    0.3%
SD        5,864    0.3%
VT        4,176    0.2%
AK        4,013    0.2%
DC        3,998    0.2%
WY        3,998    0.2%
ND        3,986    0.2%

[Charts: bmi by age, diabetes by age]

Attribute               Count      Mean    Std dev  Min     Max
a1c value               2,000,000  5.69    1.00     4.00    14.00
age                     2,000,000  47.49   18.49    18.00   95.00
bmi                     2,000,000  29.49   6.72     15.00   65.00
height cm               2,000,000  168.48  9.90     140.00  210.00
waist circumference cm  2,000,000  97.56   15.82    55.00   175.00
weight kg               2,000,000  84.18   21.93    34.90   200.10