
Cooperative Election Study FAQs

Below are answers to some of the questions users frequently have about the CES (Authors: Brian Schaffner, Shiro Kuriwaki). Please address any other questions to the PIs.

Participation

  • A large portion of CES respondents are YouGov panelists: people who have created an account on yougov.com to receive periodic notifications about new surveys. Others are recruited live from online advertisements or from other survey providers. Thus, while panelists are prompted to participate in the CES, they opt in to being YouGov panelists in the first place.

    In order to make the sample representative, not all respondents who complete the CES questionnaire end up in the final dataset. To read more about the pruning process used to match the sample to the target population, please refer to the guide.

  • The CCES has always been an online survey, but the devices respondents use have changed over the years. In 2018, 35 percent took the survey on a desktop computer, 56 percent on a smartphone, and 9 percent on a tablet (in contrast, 90 percent used a desktop in 2012).

  • YouGov respondents are compensated with points for taking each survey. Respondents can exchange accumulated points for gift cards and other prizes.

  • The pre-election wave of the CES is designed to take approximately 20 minutes, though the actual time varies across respondents. The post-election wave is designed to take 10 minutes. The odd-year CES surveys consist of a single 20-minute wave.

  • We would be cautious about using the timing of responses as data. The start and end times of interviews are not random: for example, respondents who are easier to reach or more likely to opt in tend to respond to the CCES first. Survey vendors may also encourage their panels to respond at different times and with different intensity.

Representativeness

  • The CCES is designed to be representative of the national adult population.

  • Yes, the sample is designed to be representative at the state level: state is one of the variables used to construct the target population.

  • No. But you may be interested in population distributions estimated at the CD level from CES data (Kuriwaki et al. 2023), which can be used to construct weights.

Variables and Questions

  • Search https://cooperativeelectionstudy.shinyapps.io/ccsearch/ for Common Content questions, and https://cooperativeelectionstudy.shinyapps.io/CESquestionsearch/ for team module questions in 2016, 2018, and 2020.

  • There are two operationalizations:

    1. Set those with race == "Hispanic" to 1; set everyone else to 0.
    2. Set those with race == "Hispanic", as well as those who identified as Hispanic/Latino in the follow-up question about ethnicity, to 1; set everyone else to 0.

    The follow-up question, "Are you of Spanish, Latino, or Hispanic origin or descent?", is only asked of respondents who do not select "Hispanic" as their race. It is called "hispanic" in most CCES datasets.

    In short, the second option counts Hispanic Blacks and Hispanic Whites as Latino/Hispanic. When the CCES weights to population distributions of race, we use the second definition. A sketch of both codings follows.
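
    As an illustration, here is a minimal Python (pandas) sketch of the two codings. The column names and value labels follow the description above but are simplified; the exact codings vary by year, so treat this as a template rather than the canonical recode.

      import pandas as pd

      # Toy data standing in for a CES extract; real files store these
      # variables as labelled codes, shown here as strings for readability.
      df = pd.DataFrame({
          "race":     ["White", "Hispanic", "Black", "White"],
          "hispanic": ["No",    None,       "Yes",   "No"],  # follow-up, asked only if race != "Hispanic"
      })

      # Operationalization 1: the race question alone.
      df["hispanic_v1"] = (df["race"] == "Hispanic").astype(int)

      # Operationalization 2: the race question OR the ethnicity follow-up.
      df["hispanic_v2"] = (
          (df["race"] == "Hispanic") | (df["hispanic"] == "Yes")
      ).astype(int)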

Geolocation

  • We use the full address if we have it on file and the respondent confirms they live there. If we do not have a full address, we use their zip code.

  • No. The mapping between zip codes and state legislative districts is much less reliable than the mapping to congressional districts.

  • “inputzip” is the zip code each respondent enters by hand in response to a question of the form, “So that we can ask you about the news and events in your area, in what zip code do you currently reside?”

    Some datasets have a separate field for the zip code where an individual is registered (“regzip”) in addition to the zip code of residence. The “regzip” is the response to the question “Is [inputzip] the zip code where you are registered to vote?”, which includes the option “No (I am registered to vote at this zip code: [User enters zip code])”.

    “lookupzip” is the zip code used to look up a voter’s state and congressional district. It is the zip code of a person’s residence unless the zip code in which they are registered to vote is different, in which case it is their registration zip code. In other words, if both “regzip” and “inputzip” are provided and differ, the former is used for “lookupzip”. This variable is never missing and is used as the main zip code for, e.g., the cumulative dataset. A sketch of this logic follows.
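
    For concreteness, a minimal pandas sketch of the rule just described, using the variable names above (the actual derivation is performed by the CES team before the data are released):

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({
          "inputzip": ["02155", "90210", "60601"],
          "regzip":   [None,    "90211", "60601"],  # missing when no separate registration zip was given
      })

      # The registration zip wins when it is present and differs from the
      # residence zip; otherwise fall back to the residence zip.
      df["lookupzip"] = np.where(
          df["regzip"].notna() & (df["regzip"] != df["inputzip"]),
          df["regzip"],
          df["inputzip"],
      )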

Weighting

  • CCES weighting generally consists of two steps: matching and post-stratification weighting. The approach and algorithm change somewhat across studies; each even-year guide on Dataverse provides details on the weighting approach taken for that particular survey. A generic illustration of the second step follows.
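
    To make the second step concrete, here is a minimal raking (iterative proportional fitting) sketch in Python. This is not the CES's actual algorithm or code, only an illustration of what post-stratification-style weighting does: adjust weights until the weighted sample margins match target population margins.

      import pandas as pd

      def rake(df, margins, max_iter=50, tol=1e-6):
          """Iterative proportional fitting: return weights under which the
          weighted margins of df match the target margins (proportions)."""
          w = pd.Series(1.0, index=df.index)
          for _ in range(max_iter):
              max_shift = 0.0
              for var, targets in margins.items():
                  current = w.groupby(df[var]).sum() / w.sum()
                  ratio = pd.Series(targets) / current  # per-category adjustment factor
                  max_shift = max(max_shift, (ratio - 1).abs().max())
                  w = w * df[var].map(ratio)
              if max_shift < tol:
                  break
          return w * len(df) / w.sum()  # normalize to mean 1

      # Toy example: rake a lopsided sample to 50/50 gender and 30/70 education.
      sample = pd.DataFrame({
          "gender": ["M"] * 70 + ["F"] * 30,
          "educ": (["College"] * 40 + ["No college"] * 30
                   + ["College"] * 10 + ["No college"] * 20),
      })
      weights = rake(sample, {"gender": {"M": 0.5, "F": 0.5},
                              "educ": {"College": 0.3, "No college": 0.7}})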

  • 2016 and prior: For election-year Common Contents that include both a plain weight variable and a weight variable with the "_vv" suffix (2012, 2016), the best weights to use for all analyses are the "_vv" weights. Those weights were calculated after the vote validation and are therefore the most accurate. The weight variables without the "_vv" suffix are included only so that scholars can replicate analyses they may have produced prior to the vote validation. Where the two versions do not exist (2006-2010, 2014, 2018-), the weight variables were computed after vote validation. We have seen that this is often a source of confusion, and we will employ a clearer naming scheme moving forward.

    2018 and onwards: "vvweight" assigns weights only to active registered voters, while "commonweight" provides weights for the adult population. See the 2018 guide for details.

    The key point is that "_vv" weights are not weights specific to the turnout electorate, especially in 2016 and prior. They apply to voters and non-voters alike; they are simply named that way because they were computed after vote validation was performed.

    In 2012, 2016, and 2020, there are also weights with a "_post" suffix or term. We recommend using these post-election weights whenever an analysis includes variables from the post-election wave of the study; they help adjust for attrition between the pre- and post-election waves. A schematic summary of these rules appears below.
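
    As a rough decision aid, here is a hypothetical Python helper encoding the rules of thumb above. The returned strings are placeholders, not exact column names: naming varies by year, so match them against the codebook for your dataset.

      def pick_weight(year, uses_post_wave=False, registered_voters_only=False):
          """Schematic rule of thumb from this FAQ; returned names are
          placeholders to be matched against the year's codebook."""
          if year >= 2018:
              # 2018 onward: "commonweight" covers the adult population,
              # "vvweight" only active registered voters (see the 2018 guide).
              base = "vvweight" if registered_voters_only else "commonweight"
          elif year in (2012, 2016):
              # Years with both versions: prefer the post-validation "_vv" weights.
              base = "weight_vv"
          else:
              base = "weight"  # only post-validation weights were released
          # Use "_post" weights whenever post-election-wave variables are analyzed.
          return base + "_post" if uses_post_wave else base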

  • What is the difference between "weight" and "weight_cumulative" in the cumulative Common Content?

    “weight_cumulative” is a simple transformation of the most up-to-date "weight" in each year's Common Content that only adjusts for the different sample size in each year. Use "weight_cumulative" for multi-year analyses in which you want each year to be weighted the same despite differing sample sizes. A sketch of one such rescaling follows.
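
    For illustration, a minimal pandas sketch, assuming the goal is that each year contributes the same total weight regardless of its sample size (this captures the spirit of the adjustment; the exact constant used in the released dataset may differ):

      import pandas as pd

      # Toy stacked Common Content with per-year weights.
      df = pd.DataFrame({
          "year":   [2016] * 4 + [2018] * 2,
          "weight": [0.8, 1.2, 1.0, 1.0, 0.9, 1.1],
      })

      # Rescale so every year's weights sum to the same constant (here 1),
      # neutralizing differences in sample size across years.
      year_totals = df.groupby("year")["weight"].transform("sum")
      df["weight_cumulative_sketch"] = df["weight"] / year_totals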

Validated Vote

  • We have created some guides for earlier datasets to help explain the use of these variables in a bit more detail. In more recent years, this information has been integrated into the guides themselves. We recommend you consult these guides as the validation process and variables change slightly from year to year. However, some brief information follows.

    Typically, the main variable of interest will be the variable indicating the vote method used by a respondent. This typically takes on values such as “absentee,” “mail,” “early,” “polling,” and “unknown.” If a respondent has any of these values, they have a validated vote record (“unknown” means that the state did not record what method the individual used to vote, but the individual did vote).

    It should be noted that a record may not be matched to the voter file either because the individual is not registered to vote or because incomplete or inaccurate information prevented a match. Matches are made only when there is a high level of confidence that the respondent is being assigned to the correct record. However, even with a high confidence threshold there will still be some false positives, which should be kept in mind when using the validation records.

    For identifying non-voters, researchers may take several different approaches. These options are laid out in the guides, but the most common approach is to simply treat all individuals who are not validated voters as non-voters (regardless of whether they were matched to the voter files). The justification for this approach is that the most common reason the voter file firm will not have a record for an individual is that the individual is not registered to vote. Indeed, rates of self-reported non-registration and non-voting are much higher among unmatched respondents than among those for whom there is a match. A sketch of this coding follows.
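
    As a concrete example of the most common approach, a minimal pandas sketch. The column name and value labels are illustrative, based on the method values listed above; check the codebook for your year.

      import pandas as pd

      df = pd.DataFrame({
          "vote_method": ["polling", "mail", None, "unknown", None, "early"],
      })

      # Any recorded method (including "unknown") means a validated vote;
      # everyone else, matched or not, is treated as a non-voter.
      validated_methods = {"absentee", "mail", "early", "polling", "unknown"}
      df["validated_voter"] = df["vote_method"].isin(validated_methods).astype(int)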

Panel

  • The main CES studies are based on different cross-sectional samples in each year. Thus, these do not constitute a panel survey where the same respondents are being re-interviewed year after year. However, the CES did conduct a panel survey in 2010, 2012, and 2014 and you can find the data for that study here. 

  • This panel survey was born out of the sample of respondents who took the 2010 common content, but those respondents were reserved for the panel survey in subsequent years. 19,000 of those who are in the 2010 common content dataset were re-interviewed in 2012 and 9,500 of that group were re-interviewed again in 2014. (See the guides for those datasets for more information on how the panel was constructed.) Thus, respondents in the panel datasets will overlap with respondents in the 2010 common content dataset, but they will not overlap with the 2012 and 2014 common content datasets.

Planning and Support

  • The principal investigators write the Common Content questions in the summer before the election, with input from others. Questions in team modules are written by the owners of the module. YouGov also maintains standard panel questions asked of all members of its panel. These are often indicated by descriptive variable names (such as “race”, “educ”, and “birthyr”).

  • If you are using the CES data then you have benefited from the National Science Foundation's support of the project. Whenever possible, it is helpful to acknowledge this support by citing the National Science Foundation and the correct award number for the year you are using. The table below shows the correct award number for each year that the NSF has supported the project. 

    Study year         NSF Award Number
    2022               2148907
    2020               1948863
    2018               1756447
    2016               1559125
    2014               1430505
    2012               1225750
    2010               0924191
    2010-2014 panel    1430473 and 1154420

  • The CES PIs receive IRB approval for the Common Content survey along with other "hub" surveys (such as the 2018 Competitive Districts Study and the 2022 Recontact Survey). The table below provides the relevant institution and study number for the surveys conducted each year since 2014. (For surveys conducted before 2014, please contact the PIs.)

    Teams are expected to seek IRB approval for their modules. 

    Year                   Institution                          Protocol #
    2023                   Tufts University                     STUDY00004327
    2022                   Harvard University                   IRB19-1411
    2021                   Harvard University                   IRB19-1411
    2020                   Tufts University                     1909024
    2019                   Harvard University                   IRB19-1411
    2018                   Harvard University                   IRB18-1007
    2017                   Harvard University                   IRB16-0257
    2016                   Harvard University                   IRB16-0257
    2015                   Harvard University                   IRB15-3796
    2014                   Harvard University                   IRB14-2229
    2010-2014 Panel Study  University of Massachusetts Amherst  2014-1991

Peer Reviewed Documentation

More general discussion of methodology can be found in the following peer-reviewed academic articles.

On online surveys as opposed to phone and mail: Stephen Ansolabehere and Brian Schaffner. 2014. "Does Survey Mode Still Matter? Findings from a 2010 Multi-Mode Comparison." Political Analysis 22(3): 285–303.

On the cooperative structure of the CES: Stephen Ansolabehere and Douglas Rivers. 2013. "Cooperative Survey Research." Annual Review of Political Science 16(1): 307–329.

On the voter validation: Stephen Ansolabehere and Eitan Hersh. 2012. "Validation: What Big Data Reveal About Survey Misreporting and the Real Electorate." Political Analysis 20(4): 437–459.

Content Types

Common Content consists of approximately 15 minutes of questions, with about 10 minutes in the pre-election wave and about 5 minutes in the post-election wave. These questions are included on all administered surveys and together yield a national sample of roughly 60,000 adults. Common Content is asked at the beginning of each survey.

In addition to these questions, YouGov provides demographic indicators, party identification, ideology, and validated vote. 

Team Content comes from research teams that purchase modules on the CES. Each research team that wishes to be involved in the project purchases a 1,000-person sample survey that is connected to the Common Content. The Common Content is asked of everyone, and then each individual team determines the other half of the questions asked of its 1,000-person sample. This amounts to 10 minutes of team content on the pre-election questionnaire and 5 minutes on the post-election questionnaire. A separate dataset is produced for each team’s content.