UC BERKELEY · CAPSTONE 2026

Ground it before you simulate it.

A framework for demographically grounded LLM simulations of public opinion — replacing implicit assumptions with empirically validated design choices.

CORPUS: 79 waves (Pew ATP, 2021–2024)
RESPONDENTS: 38,449 unique panelists
OPINION ITEMS: 1,426 across 10 policy domains
CENSUS FRAME: ~2.5M / yr (ACS PUMS adults)
01 Overview

A simulated population is only useful if it’s the right one.

When a government tests a housing or minimum wage policy on a simulated population, the quality of the resulting predictions depends on one thing: whether the simulated people match the real ones. We argue that without empirical demographic grounding, current LLM-based simulations of public opinion introduce systematic, measurable distortions into who is represented and whose preferences are modeled. These distortions are directional, not random, and they compound existing biases in LLM-internal opinion representations.

CivicSim provides three empirical demonstrations of these distortions and proposes a corrective framework. Every demographic modeling decision is treated as an empirical question, not a design preference.

PROBLEM

The survey data conditioning today’s LLM agents is not a faithful replica of the U.S. population.

PROBLEM

Conventional demographic variables capture less than 11% of the available opinion signal.

PROBLEM

Geographic conditioning is applied uniformly when its necessity varies sharply by domain.

02 Architecture

Two orthogonal grounding streams.

The structural stream draws agents from ACS Census microdata. The behavioral stream attaches opinion priors from Pew ATP, filtered by information-theoretic variable selection. Both fuse into a grounded persona that conditions every LLM call.

Figure: CivicSim architecture diagram.
Structural stream: ACS PUMS → stratified draw → TVD-validated demographic profile dᵢ.
Behavioral stream: Pew ATP → IG variable selection → geo-conditioned prior P(y | d, q).
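To make the fusion step concrete, here is a minimal sketch of how a demographic profile dᵢ and an opinion prior P(y | d, q) might be combined into the text that conditions an LLM call. The `Profile` fields, the `build_persona_prompt` function, and the prompt wording are all illustrative assumptions, not CivicSim's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Structural-stream output: one demographic profile (hypothetical fields)."""
    age_band: str
    race: str
    census_division: str


def build_persona_prompt(profile: Profile, question: str, prior: dict[str, float]) -> str:
    """Fuse a profile with an opinion prior P(y | d, q) into prompt text."""
    total = sum(prior.values())
    if total <= 0:
        raise ValueError("prior must carry positive mass")
    # Normalize defensively so the prompt always shows a proper distribution.
    lines = [f"- {answer}: {mass / total:.0%}" for answer, mass in prior.items()]
    return (
        f"You are a {profile.age_band} {profile.race} adult living in the "
        f"{profile.census_division} Census division.\n"
        f"Question: {question}\n"
        "People with your background answered as follows:\n" + "\n".join(lines)
    )
```

Keeping the profile and the prior as separate inputs mirrors the two-stream design: either one can be swapped or ablated without touching the other.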
03 Empirical Findings

Three failures, three corrections.

STUDY 01

The survey is not the population.

Post-stratification weighting corrects marginal demographic distributions, but not joint distributions in sparse, systematically excluded subgroups. Rural and intersectional populations face the largest representation gaps.

0.321 TVD: income gap for young Black Americans (18–29), with nearly a third of probability mass in the wrong bracket
0.303 TVD: geographic gap (Census Division) for the rural × low-income subgroup
14× baseline: rural geographic misalignment relative to the full sample
Figure: ACS vs. ATP demographic distributions.
ACS (true U.S. population) vs. weighted Pew ATP across all eight demographic dimensions. Marginal weighting closes most gaps but cannot recover joint distributions in systematically excluded subgroups.
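The marginal-versus-joint distinction can be illustrated with a small sketch. It computes total variation distance (half the L1 distance between two distributions) on a toy two-variable example in which every marginal matches exactly while the joint does not. All numbers are invented for illustration and are not taken from ACS or ATP.

```python
from collections import Counter


def tvd(p: dict, q: dict) -> float:
    """Total variation distance: half the L1 distance between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def marginal(joint: dict, axis: int) -> dict:
    """Collapse a joint distribution keyed by tuples onto one axis."""
    out = Counter()
    for cell, mass in joint.items():
        out[cell[axis]] += mass
    return dict(out)


# Toy joint distributions over (rurality, income); values are made up.
census = {("rural", "low"): 0.25, ("rural", "high"): 0.25,
          ("urban", "low"): 0.25, ("urban", "high"): 0.25}
survey = {("rural", "low"): 0.05, ("rural", "high"): 0.45,
          ("urban", "low"): 0.45, ("urban", "high"): 0.05}

rurality_gap = tvd(marginal(census, 0), marginal(survey, 0))
income_gap = tvd(marginal(census, 1), marginal(survey, 1))
joint_gap = tvd(census, survey)
```

In this toy case both marginal gaps are zero while the joint gap is 0.4, which is the pattern that marginal weighting alone cannot detect or repair.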
STUDY 02

Marginal rankings are the wrong selection tool.

Information gain across all 127 non-empty subsets of 7 demographics × 1,426 opinion items shows the conventional conditioning set captures barely a tenth of available opinion signal — and that the most important variable in the joint ranks only third in marginal importance.

CONVENTIONAL {age, income, education}: 10.6% of full 7-variable joint signal
vs.
GREEDY OPTIMAL {race, location, age}: 25.2% of full 7-variable joint signal
−53.5%
The structural failure. Removing Census division from the full joint causes a 53.5% drop in opinion information gain. That is the largest unique contribution of any variable, even though Census division ranks only third on marginal importance. Selecting on marginal rank alone can therefore drop the variable that matters most.
Figure: Greedy vs. conventional ablation curves. Greedy-optimal (blue) vs. conventional (red) coverage by variable count.
Figure: Leave-one-out contribution by variable. Leave-one-out drop in joint IG; Census division dominates.
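A leave-one-out ablation of this kind can be sketched as follows, assuming information gain is defined as IG(S) = H(Y) − H(Y | X_S) and estimated from empirical counts. The helper names and the toy data are hypothetical. The data is an XOR interaction, chosen because each variable carries zero marginal signal while the pair carries all of it.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy in bits of an iterable of hashable labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, cols, target="y") -> float:
    """IG(S) = H(Y) - H(Y | X_S), estimated from empirical counts."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[c] for c in cols), []).append(r[target])
    n = len(rows)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(r[target] for r in rows) - conditional


def leave_one_out(rows, cols, target="y") -> dict:
    """Relative drop in joint IG when each variable is removed in turn."""
    full = info_gain(rows, cols, target)
    if full <= 0:
        raise ValueError("full variable set carries no information")
    return {
        c: (full - info_gain(rows, [x for x in cols if x != c], target)) / full
        for c in cols
    }


# Toy XOR data: y depends on a and b only through their interaction.
rows = [{"a": a, "b": b, "y": a ^ b} for a in (0, 1) for b in (0, 1)]
marginal_a = info_gain(rows, ["a"])
joint_ab = info_gain(rows, ["a", "b"])
drops = leave_one_out(rows, ["a", "b"])
```

Here the marginal gain of `a` is zero, the joint gain is one bit, and removing either variable erases the whole signal. Real survey data is far less extreme, but the same mechanism lets a variable rank low marginally and still dominate the leave-one-out ranking.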
STUDY 03

Geography is domain-specific — not universal.

We measured whether opinion distributions transfer across Census regions after demographic conditioning, using Jensen-Shannon distance. Technology and environment pool nationally, most other domains treat geography as optional, and international affairs requires explicit geographic conditioning. Young respondents need geography even where the domain doesn’t.
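For reference, Jensen-Shannon distance is the square root of the Jensen-Shannon divergence. The sketch below uses only the standard library and assumes log base 2, which bounds the distance to [0, 1]. The base used in the study is not stated here, so treat that choice as an assumption. Inputs are expected to be aligned probability vectors that each sum to 1.

```python
import math


def js_distance(p, q) -> float:
    """Jensen-Shannon distance between two aligned probability vectors (base 2)."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero mass in x contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))


identical = js_distance([0.5, 0.5], [0.5, 0.5])
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])
```

Identical distributions give 0 and distributions with disjoint support give 1. SciPy offers an equivalent in `scipy.spatial.distance.jensenshannon`, which may be preferable in a larger pipeline.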

not needed

Pool nationally after demographic conditioning.

  • Technology 0.119
  • Environment / Climate 0.119

optional

Include for sensitive analyses; required for young agents.

  • Health 0.129
  • Family & Society 0.134
  • Economy 0.135
  • Religion 0.141
  • Politics & Government 0.142
  • Race & Inequality 0.145
  • Immigration 0.150

required

Geographic conditioning mandatory — views covary with local immigrant community composition.

  • International 0.174
+70%
The age modifier. Young respondents (18–29) exhibit up to 70% more geographic variation than older cohorts — across every income tier and every domain. For this group, geographic conditioning should be applied one tier more aggressively than the domain-level classification suggests.
04 Framework

Three corrective steps.

CivicSim operationalizes a single principle: the decision of who to simulate, and along which demographic axes, is an empirical question, not a design preference.

STEP 01

Draw agents from census microdata

Sample synthetic agents from ACS PUMS (~2.5M adult records per year) rather than survey sample data. Population representativeness becomes a property of the data, not a research question.

→ Validated by Study 01
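One way Step 01 could look in code is a weighted, stratified draw over person-level records. This is a sketch under assumptions: the column names `PWGTP` (person weight), `DIVISION`, and `AGEP` follow ACS PUMS naming as commonly documented, while the `stratified_draw` function, the toy records, and the proportional allocation rule are illustrative and not necessarily CivicSim's procedure.

```python
import random
from collections import defaultdict


def stratified_draw(records, n, stratum_key, weight_key="PWGTP", seed=0):
    """Draw n agents, allocating draws to strata in proportion to weighted size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[stratum_key]].append(rec)
    totals = {s: sum(r[weight_key] for r in rs) for s, rs in strata.items()}
    grand = sum(totals.values())
    # Largest-remainder allocation so per-stratum counts sum exactly to n.
    quotas = {s: n * t / grand for s, t in totals.items()}
    counts = {s: int(q) for s, q in quotas.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - counts[s], reverse=True)
    for s in by_remainder[:leftover]:
        counts[s] += 1
    agents = []
    for s, rs in strata.items():
        # Within a stratum, sample with replacement in proportion to person weight.
        agents += rng.choices(rs, weights=[r[weight_key] for r in rs], k=counts[s])
    return agents


# Toy records standing in for PUMS rows; values are invented.
records = [
    {"DIVISION": 5, "PWGTP": 30, "AGEP": 25},
    {"DIVISION": 5, "PWGTP": 45, "AGEP": 60},
    {"DIVISION": 9, "PWGTP": 25, "AGEP": 40},
]
adults = [r for r in records if r["AGEP"] >= 18]
agents = stratified_draw(adults, n=8, stratum_key="DIVISION")
```

Fixing the per-stratum counts up front keeps small strata from vanishing by chance, which matters most for the sparse subgroups Study 01 flags.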
STEP 02

Select conditioning variables empirically

Run a leave-one-out or greedy IG ablation over the survey corpus for the target domains. Always include race and Census division — interaction-dominated signal cannot be recovered from marginal rankings.

→ Validated by Study 02
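The greedy variant of Step 02 can be sketched as forward selection: at each round, add the variable that most increases joint information gain given what is already chosen. The function names, the neutral variable names `x1` to `x3`, and the synthetic rows are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy in bits of an iterable of hashable labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, cols, target="y") -> float:
    """IG(S) = H(Y) - H(Y | X_S), estimated from empirical counts."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[c] for c in cols), []).append(r[target])
    n = len(rows)
    return entropy(r[target] for r in rows) - sum(
        len(g) / n * entropy(g) for g in groups.values()
    )


def greedy_select(rows, candidates, k, target="y"):
    """Forward selection: repeatedly add the variable with the largest joint IG."""
    chosen, curve = [], []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: info_gain(rows, chosen + [c], target))
        chosen.append(best)
        remaining.remove(best)
        curve.append(info_gain(rows, chosen, target))
    return chosen, curve


# Synthetic rows: y is determined by (x1, x2); x3 is pure noise.
rows = [
    {"x1": a, "x2": b, "x3": c, "y": (a, b)}
    for a in range(4) for b in range(2) for c in range(2)
]
chosen, curve = greedy_select(rows, ["x3", "x2", "x1"], k=3)
```

Dividing each entry of `curve` by the gain of the full variable set gives a coverage curve of the kind plotted in Study 02. Greedy selection still starts from a marginal comparison in its first round, so pairing it with a leave-one-out check guards against signal that is carried mainly by interactions.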
STEP 03

Apply tiered geographic conditioning

Use the domain classification from Study 03 to determine whether geography is required, optional, or unnecessary. For young agents (18–29), tier up by one level regardless of domain.

→ Validated by Study 03
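The tiering rule in Step 03 reduces to a lookup plus an age adjustment. The sketch below encodes the domain classification listed in Study 03 and bumps agents aged 18–29 up one tier, capped at "required". The function and constant names are hypothetical.

```python
TIERS = ["not needed", "optional", "required"]

# Domain classification as listed in Study 03.
DOMAIN_TIER = {
    "Technology": "not needed",
    "Environment / Climate": "not needed",
    "Health": "optional",
    "Family & Society": "optional",
    "Economy": "optional",
    "Religion": "optional",
    "Politics & Government": "optional",
    "Race & Inequality": "optional",
    "Immigration": "optional",
    "International": "required",
}


def geo_tier(domain: str, age: int) -> str:
    """Return the geographic conditioning tier for an agent in a given domain."""
    level = TIERS.index(DOMAIN_TIER[domain])
    if 18 <= age <= 29:
        # Young agents move up one tier, but never past "required".
        level = min(level + 1, len(TIERS) - 1)
    return TIERS[level]
```

For example, `geo_tier("Technology", 45)` returns "not needed", while the same domain for a 22-year-old agent returns "optional".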
05 Paper

Read the full work.


Ground It Before You Simulate It: The Case for Demographically Grounded LLM Simulations

We argue that current LLM-based public opinion simulations are not approximations of a representative population but consistent, predictable distortions at the input level — and that fixing this is methodologically prior to all other concerns about LLM agent quality.

CivicSim Team · UC Berkeley