
Cooperative Election Study FAQs

Below are answers to some of the questions users frequently have about the CES (Authors: Brian Schaffner, Shiro Kuriwaki). Please address any other questions to the PIs.

Participation

  • A large portion of CES respondents are YouGov panelists: people who have created an account on yougov.com to receive periodic notifications about new surveys. Others are recruited live from online advertisements or from other survey providers. Thus, while panelists are prompted to participate in the CES, they opt in to being YouGov panelists in the first place.

    In order to make the sample representative, not all respondents who complete the CES questionnaire end up in the final dataset. To read more about the pruning process used to match the sample to the target population, please refer to the guide.

  • The CCES has always been an online survey, but the devices respondents use have changed over the years. In 2018, 35 percent took the survey on a desktop computer, 56 percent on a smartphone, and 9 percent on a tablet (in contrast, 90 percent used a desktop in 2012).

  • YouGov respondents are compensated with points for taking each survey. Respondents can exchange accumulated points for gift cards and other prizes.

  • The pre-election wave of the CES is designed to take approximately 20 minutes, though the actual time varies across respondents. The post-election wave is designed to take 10 minutes. The odd-year CES surveys consist of a single 20-minute wave.

  • We would be cautious about using the timing of responses as data. The start and end times of interviews are not random: for example, respondents who are easier to reach or more likely to opt in tend to respond to the CCES first. Survey vendors may also encourage their panels to respond at different times and with different intensity.

Representativeness

  • The CCES is designed to be representative of the national adult population.

  • Yes, the sample is designed to be representative at the state level: state is one of the variables used to construct the target population.

  • No. But you may be interested in population distributions estimated at the CD level from CES data (Kuriwaki et al. 2023), which can be used to construct weights.

Variables and Questions

  • Search https://cooperativeelectionstudy.shinyapps.io/ccsearch/ for Common Content questions, and https://cooperativeelectionstudy.shinyapps.io/CESquestionsearch/ for team module questions in 2016, 2018, and 2020.

  • There are two operationalizations:

    1. Set those with race == "Hispanic" to 1; set everyone else to 0.
    2. Set those with race == "Hispanic", as well as those who identified as Hispanic/Latino in the follow-up question about ethnicity, to 1; set everyone else to 0.

    The follow-up question, "Are you of Spanish, Latino, or Hispanic origin or descent?", is only asked of respondents who do not select "Hispanic" as their race. It is called "hispanic" in most CCES datasets.

    In short, the second option counts Hispanic Blacks and Hispanic Whites as Latino/Hispanic. When the CCES weights to population distributions of race, we use the second definition. A sketch of both codings follows.
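
    As an illustration, here is a minimal Python (pandas) sketch of the two codings. The column names and value labels follow the description above but are simplified; the exact codings vary by year, so treat this as a template rather than the canonical recode.

      import pandas as pd

      # Toy data standing in for a CES extract; real files store these
      # variables as labelled codes, shown here as strings for readability.
      df = pd.DataFrame({
          "race":     ["White", "Hispanic", "Black", "White"],
          "hispanic": ["No",    None,       "Yes",   "No"],  # follow-up, asked only if race != "Hispanic"
      })

      # Operationalization 1: the race question alone.
      df["hispanic_v1"] = (df["race"] == "Hispanic").astype(int)

      # Operationalization 2: the race question OR the ethnicity follow-up.
      df["hispanic_v2"] = (
          (df["race"] == "Hispanic") | (df["hispanic"] == "Yes")
      ).astype(int)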

Geolocation

  • We use the full address if we have it on file and the respondent confirms they live there. If we do not have a full address, we use their zip code.

  • No. The mapping between zip codes and state legislative districts is much less reliable than the mapping to congressional districts.

  • “inputzip” is the zip code each respondent enters by hand in response to a question of the form, “So that we can ask you about the news and events in your area, in what zip code do you currently reside?”

    Some datasets have a separate field for the zip code where an individual is registered (“regzip”) in addition to the zip code of residence. The “regzip” is the response to the question “Is [inputzip] the zip code where you are registered to vote?”, which includes the option “No (I am registered to vote at this zip code: [User enters zip code])”.

    “lookupzip” is the zip code used to look up a voter’s state and congressional district. It is the zip code of a person’s residence unless the zip code in which they are registered to vote is different, in which case it is their registration zip code. In other words, if both “regzip” and “inputzip” are provided and differ, the former is used for “lookupzip”. This variable is never missing and is used as the main zip code for, e.g., the cumulative dataset. A sketch of this logic follows.
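
    For concreteness, a minimal pandas sketch of the rule just described, using the variable names above (the actual derivation is performed by the CES team before the data are released):

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({
          "inputzip": ["02155", "90210", "60601"],
          "regzip":   [None,    "90211", "60601"],  # missing when no separate registration zip was given
      })

      # The registration zip wins when it is present and differs from the
      # residence zip; otherwise fall back to the residence zip.
      df["lookupzip"] = np.where(
          df["regzip"].notna() & (df["regzip"] != df["inputzip"]),
          df["regzip"],
          df["inputzip"],
      )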

Weighting

  • CCES weighting generally consists of two steps: matching and post-stratification weighting. The approach and algorithm change somewhat across studies; each even-year guide on Dataverse provides details on the weighting approach taken for that particular survey. A generic illustration of the second step follows.
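
    To make the second step concrete, here is a minimal raking (iterative proportional fitting) sketch in Python. This is not the CES's actual algorithm or code, only an illustration of what post-stratification-style weighting does: adjust weights until the weighted sample margins match target population margins.

      import pandas as pd

      def rake(df, margins, max_iter=50, tol=1e-6):
          """Iterative proportional fitting: return weights under which the
          weighted margins of df match the target margins (proportions)."""
          w = pd.Series(1.0, index=df.index)
          for _ in range(max_iter):
              max_shift = 0.0
              for var, targets in margins.items():
                  current = w.groupby(df[var]).sum() / w.sum()
                  ratio = pd.Series(targets) / current  # per-category adjustment factor
                  max_shift = max(max_shift, (ratio - 1).abs().max())
                  w = w * df[var].map(ratio)
              if max_shift < tol:
                  break
          return w * len(df) / w.sum()  # normalize to mean 1

      # Toy example: rake a lopsided sample to 50/50 gender and 30/70 education.
      sample = pd.DataFrame({
          "gender": ["M"] * 70 + ["F"] * 30,
          "educ": (["College"] * 40 + ["No college"] * 30
                   + ["College"] * 10 + ["No college"] * 20),
      })
      weights = rake(sample, {"gender": {"M": 0.5, "F": 0.5},
                              "educ": {"College": 0.3, "No college": 0.7}})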

  • 2016 and prior: For election-year Common Contents that include both a plain weight variable and a weight variable with the "_vv" suffix (2012, 2016), the best weights to use for all analyses are the "_vv" weights. Those weights were calculated after the vote validation and are therefore the most accurate. The weight variables without the "_vv" suffix are included only so that scholars can replicate analyses they may have produced prior to the vote validation. Where the two versions do not exist (2006-2010, 2014, 2018-), the weight variables were computed after vote validation. We have seen that this is often a source of confusion, and we will employ a clearer naming scheme moving forward.

    2018 and onwards: "vvweight" assigns weights only to active registered voters, while "commonweight" provides weights for the adult population. See the 2018 guide for details.

    The key point is that "_vv" weights are not weights specific to the turnout electorate, especially in 2016 and prior. They apply to voters and non-voters alike; they are simply named that way because they were computed after vote validation was performed.

    In 2012, 2016, and 2020, there are also weights with a "_post" suffix or term. We recommend using these post-election weights whenever an analysis includes variables from the post-election wave of the study; they help adjust for attrition between the pre- and post-election waves. A schematic summary of these rules appears below.
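
    As a rough decision aid, here is a hypothetical Python helper encoding the rules of thumb above. The returned strings are placeholders, not exact column names: naming varies by year, so match them against the codebook for your dataset.

      def pick_weight(year, uses_post_wave=False, registered_voters_only=False):
          """Schematic rule of thumb from this FAQ; returned names are
          placeholders to be matched against the year's codebook."""
          if year >= 2018:
              # 2018 onward: "commonweight" covers the adult population,
              # "vvweight" only active registered voters (see the 2018 guide).
              base = "vvweight" if registered_voters_only else "commonweight"
          elif year in (2012, 2016):
              # Years with both versions: prefer the post-validation "_vv" weights.
              base = "weight_vv"
          else:
              base = "weight"  # only post-validation weights were released
          # Use "_post" weights whenever post-election-wave variables are analyzed.
          return base + "_post" if uses_post_wave else base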

  • What is the difference between "weight" and "weight_cumulative" in the cumulative Common Content?

    “weight_cumulative” is a simple transformation of the most up-to-date "weight" in each year's Common Content that only adjusts for the different sample size in each year. Use "weight_cumulative" for multi-year analyses in which you want each year to be weighted the same despite differing sample sizes. A sketch of one such rescaling follows.
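
    For illustration, a minimal pandas sketch, assuming the goal is that each year contributes the same total weight regardless of its sample size (this captures the spirit of the adjustment; the exact constant used in the released dataset may differ):

      import pandas as pd

      # Toy stacked Common Content with per-year weights.
      df = pd.DataFrame({
          "year":   [2016] * 4 + [2018] * 2,
          "weight": [0.8, 1.2, 1.0, 1.0, 0.9, 1.1],
      })

      # Rescale so every year's weights sum to the same constant (here 1),
      # neutralizing differences in sample size across years.
      year_totals = df.groupby("year")["weight"].transform("sum")
      df["weight_cumulative_sketch"] = df["weight"] / year_totals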

Validated Vote

  • We have created some guides for earlier datasets to help explain the use of these variables in a bit more detail. In more recent years, this information has been integrated into the guides themselves. We recommend you consult these guides as the validation process and variables change slightly from year to year. However, some brief information follows.

    Typically, the main variable of interest will be the variable indicating the vote method used by a respondent. This typically takes on values such as “absentee,” “mail,” “early,” “polling,” and “unknown.” If a respondent has any of these values, they have a validated vote record (“unknown” means that the state did not record what method the individual used to vote, but the individual did vote).

    It should be noted that a record may not be matched to the voter file either because the individual is not registered to vote or because incomplete or inaccurate information prevented a match. Matches are made only when there is a high level of confidence that the respondent is being assigned to the correct record. However, even with a high confidence threshold there will still be some false positives, which should be kept in mind when using the validation records.

    For identifying non-voters, researchers may take several different approaches. These options are laid out in the guides, but the most common approach is to simply treat all individuals who are not validated voters as non-voters (regardless of whether they were matched to the voter files). The justification for this approach is that the most common reason the voter file firm will not have a record for an individual is that the individual is not registered to vote. Indeed, rates of self-reported non-registration and non-voting are much higher among unmatched respondents than among those for whom there is a match. A sketch of this coding follows.
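
    As a concrete example of the most common approach, a minimal pandas sketch. The column name and value labels are illustrative, based on the method values listed above; check the codebook for your year.

      import pandas as pd

      df = pd.DataFrame({
          "vote_method": ["polling", "mail", None, "unknown", None, "early"],
      })

      # Any recorded method (including "unknown") means a validated vote;
      # everyone else, matched or not, is treated as a non-voter.
      validated_methods = {"absentee", "mail", "early", "polling", "unknown"}
      df["validated_voter"] = df["vote_method"].isin(validated_methods).astype(int)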

Panel

  • The main CES studies are based on different cross-sectional samples in each year. Thus, these do not constitute a panel survey where the same respondents are being re-interviewed year after year. However, the CES did conduct a panel survey in 2010, 2012, and 2014 and you can find the data for that study here. 

  • This panel survey was born out of the sample of respondents who took the 2010 common content, but those respondents were reserved for the panel survey in subsequent years. 19,000 of those who are in the 2010 common content dataset were re-interviewed in 2012 and 9,500 of that group were re-interviewed again in 2014. (See the guides for those datasets for more information on how the panel was constructed.) Thus, respondents in the panel datasets will overlap with respondents in the 2010 common content dataset, but they will not overlap with the 2012 and 2014 common content datasets.

Planning and Support

  • The principal investigators write the Common Content questions in the summer before the election, with input from others. Questions in team modules are written by the owners of the module. YouGov also maintains standard panel questions asked of all members of its panel. These are often indicated by descriptive variable names (such as “race”, “educ”, and “birthyr”).

  • If you are using the CES data then you have benefited from the National Science Foundation's support of the project. Whenever possible, it is helpful to acknowledge this support by citing the National Science Foundation and the correct award number for the year you are using. The table below shows the correct award number for each year that the NSF has supported the project. 

    Study year         NSF Award Number
    2022               2148907
    2020               1948863
    2018               1756447
    2016               1559125
    2014               1430505
    2012               1225750
    2010               0924191
    2010-2014 panel    1430473 and 1154420

  • The CES PIs receive IRB approval for the Common Content survey along with other "hub" surveys (such as the 2018 Competitive Districts Study and the 2022 Recontact Survey). The table below provides the relevant institution and study number for the surveys conducted each year since 2014. (For surveys conducted before 2014, please contact the PIs.)

    Teams are expected to seek IRB approval for their modules. 

    Year                   Institution                          Protocol #
    2023                   Tufts University                     STUDY00004327
    2022                   Harvard University                   IRB19-1411
    2021                   Harvard University                   IRB19-1411
    2020                   Tufts University                     1909024
    2019                   Harvard University                   IRB19-1411
    2018                   Harvard University                   IRB18-1007
    2017                   Harvard University                   IRB16-0257
    2016                   Harvard University                   IRB16-0257
    2015                   Harvard University                   IRB15-3796
    2014                   Harvard University                   IRB14-2229
    2010-2014 Panel Study  University of Massachusetts Amherst  2014-1991

Peer Reviewed Documentation

More general discussion of methodology can be found in the following peer-reviewed academic articles.

On online surveys as opposed to phone and mail: Stephen Ansolabehere and Brian Schaffner. 2014. "Does Survey Mode Still Matter? Findings from a 2010 Multi-Mode Comparison." Political Analysis 22(3): 285–303.

On the cooperative structure of the CES: Stephen Ansolabehere and Douglas Rivers. 2013. "Cooperative Survey Research." Annual Review of Political Science 16(1): 307–329.

On the voter validation: Stephen Ansolabehere and Eitan Hersh. 2012. "Validation: What Big Data Reveal About Survey Misreporting and the Real Electorate." Political Analysis 20(4): 437–459.

Content Types

Common Content consists of approximately 15 minutes of questions, with about 10 minutes in the pre-election wave and about 5 minutes in the post-election wave. These questions are included on all administered surveys and together yield a national sample of roughly 60,000 adults. Common Content is asked at the beginning of each survey.

In addition to these questions, YouGov provides demographic indicators, party identification, ideology, and validated vote. 

Team Content comes from research teams that purchase modules on the CES. Each research team that wishes to be involved in the project purchases a 1,000-person sample survey that is connected to the Common Content. The Common Content is asked of everyone, and then each individual team determines the other half of the questions asked of its 1,000-person sample. This amounts to 10 minutes of team content on the pre-election questionnaire and 5 minutes on the post-election questionnaire. A separate dataset is produced for each team’s content.