SRL sends out weekly news bulletins that cover various aspects of survey research and best practices. Below is an archive of the bulletins sent out to date by category; to view, click on any title.
If you would like to start receiving these weekly emails, please send your name, affiliation, and e-mail to email@example.com.
In an effort to encourage an open science of survey research, the American Association for Public Opinion Research (AAPOR) has formally launched a Transparency Initiative, designed to educate about and encourage the disclosure of methodological details regarding publicly released survey data. Building on its Code of Professional Ethics and Practices, the Transparency Initiative outlines a minimal set of technical details that it requires member organizations to disclose when reporting survey data. These include the following:
- Who sponsored the research study, who conducted it, and who funded it, including, to the extent known, all original funding sources.
- The exact wording and presentation of questions and responses whose results are reported.
- A definition of the population under study, its geographic location, and a description of the sampling frame used to identify this population. If the sampling frame was provided by a third party, the supplier shall be named. If no frame or list was utilized, this shall be indicated.
- A description of the sample design, giving a clear indication of the method by which the respondents were selected (or self-selected) and recruited, along with any quotas or additional sample selection criteria applied within the survey instrument or post-fielding. The description of the sampling frame and sample design should include sufficient detail to determine whether the respondents were selected using probability or nonprobability methods.
- Sample sizes and a discussion of the precision of the findings, including estimates of sampling error for probability samples and a description of the variables used in any weighting or estimating procedures. The discussion of the precision of the findings should state whether or not the reported margins of sampling error or statistical analyses have been adjusted for the design effect due to clustering and weighting, if any.
- Which results are based on parts of the sample, rather than on the total sample, and the size of such parts.
- Method and dates of data collection.
We strongly encourage all researchers to plan proactively to disclose these details about their surveys, along with other disclosure elements mentioned in AAPOR's Code, when reporting findings from their research.
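As an illustration of the precision disclosure described above, the following sketch computes a margin of sampling error for a proportion with and without a design-effect adjustment. It assumes Kish's approximation for the design effect due to unequal weighting and a normal approximation for the interval; the function names and the example weights are hypothetical, and a clustered design would call for a fuller variance estimator than this.

```python
import math

def kish_design_effect(weights):
    """Approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

def margin_of_error(p, n, deff=1.0, z=1.96):
    """Half-width of an approximate 95% interval for a proportion p."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)

# Hypothetical sample: 1,000 respondents, half weighted 0.5 and half weighted 1.5.
weights = [0.5] * 500 + [1.5] * 500
deff = kish_design_effect(weights)
unadjusted = margin_of_error(0.5, len(weights))
adjusted = margin_of_error(0.5, len(weights), deff=deff)
print(f"deff={deff:.2f}  unadjusted={unadjusted:.3f}  adjusted={adjusted:.3f}")
```

Reporting both figures, or at least stating which one is reported, is what allows a reader to tell whether the weighting has been accounted for.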
In 2014, SRL reached its 50th anniversary, a landmark event that was celebrated by a symposium in Urbana and one in Chicago. Dr. Richard Warnecke, former director of SRL, spoke about the history of SRL, and Dr. Jon Krosnick and Dr. Norbert Schwarz gave invited lectures about the present and future of survey methodology. Many current and former SRL staff, students, and faculty attended the event. SRL's history is documented in a publication released this week, which includes a summary of the history of the lab and brief descriptions of every study conducted at SRL in the past 50 years. It is available at www.srl.uic.edu/Publist/50_Year_History.pdf.
The 8th edition of the Standard Definitions document was recently released by the American Association for Public Opinion Research (AAPOR). This document contains updated standardized lists of disposition codes and formulas for the calculation of response rates, cooperation rates, and refusal rates for telephone and in-person household surveys, mail and Internet surveys of specifically named persons, mixed-mode surveys, and establishment surveys. It includes updated sections on establishment surveys and dual-frame telephone surveys. AAPOR plans to continue to update this document with new disposition codes and rate estimation best practices to address evolving survey practices and new technologies. Using these standardized disposition codes and formulas facilitates meaningful comparisons across surveys, and many professional journals now require their use when reporting findings from primary survey data collection efforts.
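To make the rate calculations concrete, here is a minimal sketch of three of the simplest outcome rates as they are commonly defined in Standard Definitions: Response Rate 1, Cooperation Rate 1, and Refusal Rate 1. The disposition counts are invented for illustration and the class and function names are hypothetical. The Standard Definitions document itself remains the authority on which final dispositions belong in each category and which rate variant suits a given design.

```python
from dataclasses import dataclass

@dataclass
class Dispositions:
    """Final case counts by broad disposition category."""
    complete: int    # I: complete interviews
    partial: int     # P: partial interviews
    refusal: int     # R: refusals and break-offs
    noncontact: int  # NC: eligible cases never contacted
    other: int       # O: other eligible nonrespondents
    unknown: int     # UH + UO: cases of unknown eligibility

def _all_cases(d):
    """Eligible cases plus cases of unknown eligibility."""
    return d.complete + d.partial + d.refusal + d.noncontact + d.other + d.unknown

def response_rate_1(d):
    """RR1: completes over all eligible and unknown-eligibility cases."""
    return d.complete / _all_cases(d)

def cooperation_rate_1(d):
    """COOP1: completes over all eligible cases that were contacted."""
    return d.complete / (d.complete + d.partial + d.refusal + d.other)

def refusal_rate_1(d):
    """REF1: refusals over all eligible and unknown-eligibility cases."""
    return d.refusal / _all_cases(d)

# Hypothetical final dispositions for a sample of 1,000 cases.
d = Dispositions(complete=400, partial=20, refusal=250,
                 noncontact=200, other=30, unknown=100)
print(f"RR1={response_rate_1(d):.3f}  COOP1={cooperation_rate_1(d):.3f}  "
      f"REF1={refusal_rate_1(d):.3f}")
```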
For more information, see
The American Association for Public Opinion Research. (2015). Standard definitions: Final dispositions of case codes and outcome rates for surveys (8th ed.). AAPOR.
The American Association for Public Opinion Research (AAPOR) recently updated its Code of Professional Ethics and Practices. The Code is reviewed and updated every five years to ensure it remains relevant for the continuing practice of survey research. A key section of the Code focuses on disclosure of research methods. AAPOR believes that "Good professional practice imposes the obligation upon all public opinion and survey researchers to disclose sufficient information about how the research was conducted to allow for independent review and verification of research claims." In addition to revising its standards for disclosure of survey research methodology, the updated Code includes for the first time standards for disclosure of (1) qualitative methodologies, including focus groups, and (2) content analyses.
The revised Code of Professional Ethics and Practices is available at
No. 42. Online education opportunities in survey methods and public opinion research at the University of Illinois.
For those interested in learning more about surveys and public opinion research, there are many online educational opportunities available at the University of Illinois. In addition to the Survey News Bulletins, the Survey Research Laboratory conducts free webinars each semester on a variety of topics related to survey research methods (see www.srl.uic.edu/seminars.htm for a list of the live fall webinars and access to recordings of past webinars). Also, the Institute of Government and Public Affairs will host a live screening of a webinar about racial attitudes and public opinion co-sponsored by the Midwest Association for Public Opinion Research and the American Association for Public Opinion Research on the UIC campus (see https://igpa.uillinois.edu/event/webinar-screening-racial-attitudes-and-public-opinion for details and to register to attend the live screening and www.aapor.org/AAPORKentico/Education-Resources/Online-Education/Webinar-Details.aspx?webinar=WEB1015 for information about how to register for the webinar if you cannot attend the live screening). More formal online education opportunities are available through courses offered as part of the Survey Research Methods Online Certificate Program (see www.uic.edu/cuppa/pa/srm/index.html) that are open to both current and nondegree students.
Between 1975 and 2013, ten periodic conferences concerned with health survey research methodology were held in the United States. These conferences, supported primarily by government agencies, have continuously tracked developments, innovations, and challenges in the design and implementation of health surveys. Proceedings from these conferences contain summaries of the presentations given at each conference and represent a valuable resource for health survey researchers about advances in health survey methods. The Survey Research Laboratory has been a leader in organizing and hosting many of these meetings and has collected on its Web site PDF versions of the proceedings from all ten conferences. These documents can be accessed at http://www.srl.uic.edu/links/proceedings.html.
The U.S. Department of Health and Human Services’ Office of Research Integrity (ORI) and Office for Human Research Protections (OHRP) have an online training module called The Research Clinic designed to teach both clinical and social researchers how to avoid research misconduct and protect subjects. In this module, participants can assume the role of principal investigator, clinical research coordinator, research assistant, or IRB coordinator. It is a valuable tool for teaching everyone involved in the research process--especially those new to subject recruitment and/or data collection--the consequences of deviating from established protocols.
The Research Clinic can be found at http://ori.hhs.gov/TheResearchClinic
Ever agree to participate in a survey, only to realize after answering a few questions that you are really in the middle of a sales pitch or fund-raising call? If so, you may have been the victim of sugging (i.e., “selling-under-the-guise-of-research”) or frugging (i.e., “fund-raising-under-the-guise-of-research”). These commonplace and highly unethical practices, which disguise either selling or fund-raising as a scientific survey, give legitimate researchers a bad name and lead many people to be suspicious of all research and resistant to almost any request for survey participation. Sugging and frugging have both been publicly condemned by the American Association for Public Opinion Research (AAPOR) and other professional research associations.
For more information, visit the following web pages:
AAPOR Condemned Survey Practices:
AAPOR Statement on Trump/Pence Campaign Web Survey:
Response rates (i.e., the proportion of eligible respondents who participate in a survey) have been decreasing, particularly in telephone surveys, and are of great concern to researchers. One response has been research on the usefulness of responsive design procedures, in which contact efforts are guided by information and evidence collected from previous contacts. One source of such evidence is ratings from interviewers in both face-to-face and telephone surveys. Interviewer ratings of response likelihood were predictive of cooperation in a telephone survey, suggesting that interviewer ratings could prove to be a useful source of information for responsive design procedures.
Eckman, S., Sinibaldi, J., & Montmann-Hertz, A. (2013). Can interviewers effectively rate the likelihood of cases to cooperate? Public Opinion Quarterly, 77, 561-573.
Groves, R. M., & Heeringa, S. G. (2006). Responsive design for household surveys: Tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society: Series A, 169, 439-457.
Everyone wants their data fast, but it takes time to design and conduct a high-quality survey. A typical Web survey can take 3-5 months; a mail survey, 4-6 months; and the time needed to conduct a telephone or face-to-face survey depends on many factors that don't lend themselves to predictable time frames (such as geographical dispersion of the population and sample size). Steps that affect data collection time include the following:
- Questionnaire development and testing (especially when new questions or measures are being developed, or when the questionnaire must be programmed for self-administration via the Web or for interviewer administration in CAI software). This can be particularly time-consuming if many stakeholders are involved in the questionnaire development process.
- Sample frame development.
- IRB review and approval (always leave time to respond to requested modifications; do not expect approval upon initial submission).
- Cognitive pretesting.
- A thorough pilot study.
- Time to amend the IRB protocol based on the pilot study.
- Adequate time to collect data--plan for more than you think (you may decide to do another mailing, for example, if returns have been slow to come in).
- Time for data processing and cleaning before the final data set is ready.
Meta-analyses of randomized experiments have demonstrated that providing potential respondents a prepaid (i.e., noncontingent) incentive is more effective than a promised (i.e., contingent) incentive for increasing survey response rates in both self-administered mail (Church, 1993) and interviewer-mediated telephone and face-to-face surveys (Singer et al., 1999). In addition, monetary incentives are consistently found to be more effective than gifts and other forms of non-monetary incentives for increasing response rates.
Church, A. H. (1993). Estimating the effect of incentives on mail survey response rates: A meta-analysis. Public Opinion Quarterly, 57, 62-79.
Singer, E., Van Hoewyk, J., Gebler, N., et al. (1999). The effect of incentives on response rates in interviewer-mediated surveys. Journal of Official Statistics, 15, 217-230. (www.jos.nu/Articles/article.asp)
Developing a survey requires trade-offs between data quality and the cost of obtaining the data. Beyond the labor costs associated with professional time for survey/sample design, questionnaire development, and data analysis, there are myriad expenses that apply depending on your mode of data collection and study design. Whether you are fielding your own data collection effort or hiring a professional survey research firm to collect data, here is a sample list of expenses and cost drivers you should anticipate:
- Questionnaire length (for mail surveys, affects printing, postage, and data entry costs; for in-person and telephone interviews, affects total interview time and interviewer costs)
- Geographic dispersion of the sample
- Whether screening is required to find target population
- Printing costs
- Telephone charges
- Materials (envelopes, paper)
- Translation of questionnaire and recruitment materials into other languages
- Respondent incentives and disbursement (if they are mailed to subjects later, labor, materials, and postage on top of the incentive)
- Number of contact attempts
- Interviewer travel time and mileage
- Data entry and processing
- Software license fees for Web or computer-assisted interview questionnaires
- Equipment purchases (computers or other electronic data collection tools such as tablets for in-person surveys in particular)
See Blair, J. E., Czaja, R. F., & Blair, E. A. (2013). Designing surveys (3rd ed., pp. 337-343). Thousand Oaks, CA: Sage.
Health-related research relies heavily on the use of survey methodologies, and these methodologies have become increasingly sophisticated in recent decades. Two recent books summarize much of the research literature on this topic. Aday and Cornelius (2006), in the third edition of Designing & conducting health surveys, walk investigators through each phase of the survey process, from conceptualization through operationalization, data collection, and basic analyses. Johnson's (2015) edited volume Handbook of health survey methods presents 29 chapters written by experts in various aspects of health survey methodology, covering detailed topics under the headings of Design and Sampling, Measurement, Field Data Collection, Special Populations, and Data Management and Analysis.
For further information about health survey research methods, see
Aday, L. A., & Cornelius, L. J. (2006). Designing & conducting health surveys (3rd ed.). San Francisco: Jossey-Bass.
Johnson, T. P. (Ed.) (2015). Handbook of health survey methods. Hoboken, NJ: John Wiley & Sons.
Many researchers choose to hire students or graduate research assistants to collect data, particularly with list samples of subjects. It is important to consider when you are most likely to reach your population so that you can staff the study accordingly. For example, a list of high school principals would only need contact on weekdays during business hours. But if you are trying to reach parents who have children in day care, you will need evening and weekend contact attempts to maximize your chance of reaching your subjects. You therefore need to hire your staff with the availability that closely matches that of your population. Moreover, varying the dates and times of contact attempts will affect your response rate. If you have students who only work Monday, Wednesday, and Friday, then you are not making any contact attempts on the other days of the week. Also avoid making contact attempts only in the early evening or Saturday morning, for example, as you will repeat the same patterns of noncontact; you need variety not only in the days of the week attempts are made but also in the times of day. At a minimum, 10 varied contact attempts should be made on each case before finalizing it as a noncontact. Finally, consider how student semester schedules overlap with your data collection schedule. You should avoid fielding your study during known breaks when students will most likely be absent.
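The guidance above on varied contact attempts can be turned into a simple check on a case's call history before it is finalized as a noncontact. The sketch below is one possible way to do this, not an established procedure: the three time slots and the 5 p.m. cut-off are assumptions made for illustration, and the function names and example dates are hypothetical. Only the minimum of 10 attempts comes from the text.

```python
from datetime import datetime

MIN_ATTEMPTS = 10  # minimum number of attempts suggested in the text
# Assumed coarse time slots; adjust to the population being studied.
REQUIRED_SLOTS = ("weekday daytime", "weekday evening", "weekend")

def time_slot(attempt):
    """Classify one attempt by day of week and hour (assumed 5 p.m. cut-off)."""
    if attempt.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return "weekend"
    return "weekday evening" if attempt.hour >= 17 else "weekday daytime"

def review_case(attempts):
    """Return reasons to keep the case open; an empty list means it can be finalized."""
    problems = []
    if len(attempts) < MIN_ATTEMPTS:
        problems.append(f"only {len(attempts)} of {MIN_ATTEMPTS} attempts made")
    covered = {time_slot(a) for a in attempts}
    missing = [slot for slot in REQUIRED_SLOTS if slot not in covered]
    if missing:
        problems.append("no attempts in: " + ", ".join(missing))
    return problems

# Hypothetical history: ten attempts, all on Mondays, Wednesdays, and Fridays at 6 p.m.
mwf_evenings = [datetime(2024, 3, day, 18, 0)
                for day in (4, 6, 8, 11, 13, 15, 18, 20, 22, 25)]
print(review_case(mwf_evenings))

# Adding a Tuesday morning and a Saturday morning attempt covers every slot.
varied = mwf_evenings + [datetime(2024, 3, 5, 10, 30), datetime(2024, 3, 9, 11, 0)]
print(review_case(varied))
```

The first history meets the count but repeats one pattern of noncontact, which is exactly the situation a staff that works only certain days tends to produce.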
Panel studies involve the collection of data over time from the same sample of respondents. Unlike other forms of longitudinal studies, panels allow for the study of individual behavior change over time. However, because the same individuals are followed, there is eventual attrition, or nonresponse, after the baseline data collection wave. Attrition is either due to the researcher’s inability to locate the respondent for additional waves of data collection or to the respondent declining to participate when located. Since the value of panel surveys is dependent upon the ability to study the same respondents at different points in time, reducing attrition is of major concern in social and behavioral research. Loss of respondents over time raises the possibility of bias if those who are lost to follow-up differ from those who remain in the panel on key dependent variables. Therefore, panel attrition can affect both the internal and external validity of the study (Cook and Campbell, 1979). There are three main factors that will affect the degree of attrition in any panel study: (1) recruiting the respondent into the study; (2) successfully locating the respondent for subsequent interviews; and (3) maintaining the respondent’s commitment to the panel.
For more information, see:
Parsons, J. A. (2015). Longitudinal research: Panel retention. In J. D. Wright (Ed.-in-chief), International Encyclopedia of the Social & Behavioral Sciences (2nd ed., Vol. 14, pp. 354–357). Oxford: Elsevier.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Geneva, IL: Houghton-Mifflin Company.
Training of interviewers for telephone or face-to-face data collection should always include practice interviews. Sometimes referred to as "mock" interviews, this hands-on practice should be as close as possible to an actual interview interaction and involve a trainer or supervisor playing the role of the respondent. Interviewers should be expected to read and record exactly as they would in an actual interview, and trainers should provide realistic practice scenarios.
Mock interviews with a screening questionnaire allow interviewers to practice introductions and refusal aversion and answer common respondent questions about the study. Group sessions of practice interviews with the screener and introduction can be conducted with each interviewer taking turns reading the same text and responding to scenarios posed by the trainer. As these practice interviews progress, they can be used to test interviewers in various scenarios and allow evaluation of reading, coding, and note taking on open-ended captures.
During these mock interview sessions, interviewers should be given immediate feedback on pacing, verbatim reading, probing, and following instructions. Effective mock interviews will engage interviewers in the training process, help them work through nerves, and allow them to listen to other interviewers. They also reinforce the importance of standardization and establish the role of feedback in the interview process.
For more information:
Lavrakas, P. J. (2008). Role playing. In P. J. Lavrakas (Ed.), Encyclopedia of survey research methods: Volume 1 (p. 768). Thousand Oaks, CA: Sage.
Fowler, F. J. Jr. (2009). Survey research methods (4th ed., pp. 127-145). Thousand Oaks, CA: SAGE Publications, Inc. doi: http://dx.doi.org/10.4135/9781452230184.n8
Monitoring the work of interviewers is an essential part of data collection. Monitoring is done to gauge interviewer productivity, assess the quality of each interviewer’s work, minimize errors in data collection, and guard against falsification.
In a phone center, monitoring is typically done remotely, with a monitor listening to both the interviewer and respondent and watching data entry as it happens. For face-to-face interviewing, monitoring can be done by accompanying an interviewer into the field and observing the interview process. It can also be supported by validation, a process by which some respondents are re-contacted to confirm that the interview was conducted properly. Among the questions to consider during monitoring:
* Are interviewers providing study information accurately to informants and respondents?
* Are interviewers averting refusals effectively?
* Are procedures for dialing cases or visiting sampled addresses being followed?
* Are cases being coded correctly?
* Is screening being conducted correctly?
* Are notes accurate, succinct, and sufficiently detailed?
* Are interviewers working efficiently?
* Are questions being read verbatim, at the right pace, and with correct emphasis?
* Are interviewers probing when necessary?
* Is probing neutral and thorough?
* Is data entry accurate?
Feedback given to interviewers immediately after monitoring should point out and validate correct behaviors and offer constructive comments on things interviewers need to do better. It should be supported by specific examples from the interviewer’s work. Comments should be written and saved so there is a record of each monitoring session, the feedback given, and an overall snapshot of interviewer performance.
All new interviewers should be monitored early in a study to make sure they are following study procedures. Monitoring should continue throughout a study to make sure procedures are (still) being followed.
For more information:
Steve, K. W. (2008). Interviewer monitoring. In P. J. Lavrakas (Ed.), Encyclopedia of survey research methods: Volume 1 (pp. 372-375). Thousand Oaks, CA: Sage.
Czaja, R., & Blair, J. (2005). Designing surveys: A guide to decisions and procedures (2nd ed.). Thousand Oaks, CA: Pine Forge Press.
As part of training, interviewers should be given specific instructions about how to respond to commonly asked informant or respondent questions. Scripted answers to frequently asked questions (often called FAQs) provide study-specific information in everyday language. They are designed to help interviewers address respondent questions and concerns, particularly concerns that might be barriers to participation.
Reading the FAQs aloud during training is a first step in helping interviewers learn the responses they will have to provide while they are on the phone or in the field; the FAQs should be handy during mock practice interviews and refusal aversion practice (interviewer training specifically designed to help interviewers avoid respondent refusals). During a study, they are a resource for answers to questions that are asked less often or for providing details that can be hard to remember. For telephone surveys, FAQs are typically posted in booths in a phone center; they are carried by face-to-face interviewers in the field.
Answers should be scripted for any and all concerns that can be identified a priori, including sampling (How did you get this number? / Why did you pick me? / Can’t you interview someone else?); confidentiality (Who will see my answers? / How will this information be used?); respondent burden (How long will this take?); survey topic or knowledge concerns (What is this about? / Why do you want to know about x?); and whom to call for more information. FAQs can also script quick study-specific comebacks to refusals such as “I’m not interested” or “I’m too busy.” While the goal is for interviewers to be able to use the information in the FAQs to provide responses in their own words or phrasing as they gain experience on a study, FAQs should be short enough that they can be committed to memory early on. Researchers may want to revise or add to the FAQs if additional common questions or concerns are identified after data collection has begun.
For more information:
Moore, D. L., & Tarnai, J. (2008). Interviewer training. In P. J. Lavrakas (Ed.), Encyclopedia of survey research methods: Volume 1 (pp. 372-375). Thousand Oaks, CA: Sage.
Frey, J. H. (1989). Survey research by telephone (2nd ed.). Newbury Park, CA: Sage.
The way questions are presented for an interviewer to read helps achieve the goal of standardized questionnaire administration. Standardized reading can be an ongoing challenge when multiple interviewers are working on a study, and attending to the details of question formatting and writing helps in this process.
- Questions should be scripted so that interviewers are not tempted (or forced) to add anything to make a question sound complete. Questions that seem complete on paper may not be readable aloud. An absence of question stems (Would you say…) can lead to different interviewer readings. One interviewer may add a stem, but another may not. Similarly, interviewers may read response categories differently in the absence of punctuation such as commas.
- Complicated question formats, while they can save space on a screen, often leave out things (such as question stems, punctuation, or even words) that make for good reading.
- Use standard formatting for emphasis, text that should be read vs. not read, and for notation for acceptable readings in repetitive question series. Interviewers should understand the conventions that are used to express all of these things. Inconsistent question formatting or notation for emphasis can lead to different interviewer readings.
- Avoid parentheticals (words that clarify other words); they usually are not readable aloud.
- If interviewers are allowed to provide definitions, or if specific instructions are required for data entry or coding, they should be provided on screen, rather than left to interviewer memory.
- Providing scripted transitions between sections can help interviewers avoid the temptation to add words in an effort to be "conversational."
- The scripting of one mode (such as Web) will likely not transfer seamlessly to interviewer administration.
Read-throughs with interviewers and mock interview practice can reveal surprising things about how interviewers see questions and can be helpful in identifying shortcomings before a questionnaire is fielded.
For more information, see:
Fowler, F. J., & Mangione, T. W. (1990). Standardized survey interviewing: Minimizing interviewer-related error. Newbury Park, CA: Sage.
Houtkoop-Steenstra, H. (2000). Interaction and the standardized survey interview: The living questionnaire. New York: Cambridge.
A reality of survey research is that falsification by interviewers does happen, and the risk applies not only to large, federally funded surveys administered by survey research organizations but also to smaller studies where the principal investigator directly supervises a staff of interviewers. Falsification involves the interviewer’s intentional deviation from the study protocol and includes fabricating all or part of an interview, changing outcomes of contact attempts with subjects, miscoding an answer to a question in order to skip out of follow-up questions, and interviewing a non-sampled person in order to reduce the amount of effort required to complete an interview. Preventing falsification involves fostering extrinsic and intrinsic motivation in study interviewers (Koczela et al., 2015). Detection of falsification requires resources that must be allocated at the budgeting and planning phase of your research.
For a summary of best practices on preventing and detecting interviewer falsification, see:
Interviewer Falsification in Survey Research: Current Best Methods for Prevention, Detection and Repair of Its Effects (http://www.aapor.org/Education-Resources/Resources/Interviewer-Falsification-Practices-and-Policies.aspx)
Interviewers may unintentionally influence respondent behaviors in systematic ways during survey interactions. For example, considerable empirical research suggests that an interviewer’s observable characteristics -- such as gender, age and race/ethnicity -- may cue respondents to relevant social norms that then become integrated into their answers. This is believed to be most likely to happen when interviewer characteristics are directly relevant to the questions being asked. For example, interviewer gender may become relevant when respondents are answering questions about gender-related topics. These differential answers as a consequence of varying social identifiers are commonly referred to as interviewer effects.
In contrast, interviewer variance represents generalized differences across interviewers that are more idiosyncratic in nature, for example, how they phrase questions or probe responses. These differences may account for measurable amounts of unique variance across individual interviewers.
In most survey data analyses, both interviewer effects and interviewer variance remain unexamined, despite the fact that they may have significant influence on statistical estimates. The good news is that these can be evaluated using readily available software programs. Elliott and West (2015) present an example of an interviewer variance analysis. Davis et al. (2010) review the literature on interviewer effects.
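As a rough illustration of what an interviewer variance analysis involves, the sketch below estimates the intra-interviewer correlation from a one-way analysis of variance and converts it into an approximate design effect, 1 + ICC × (average workload − 1). This is a textbook estimator chosen for brevity and is not the method used by Elliott and West (2015). The function names and data are hypothetical, the example values are deliberately exaggerated, and the estimate is only interpretable as an interviewer effect when respondents have been assigned to interviewers more or less at random.

```python
def interviewer_icc(responses_by_interviewer):
    """One-way ANOVA estimate of the intra-interviewer correlation."""
    groups = [g for g in responses_by_interviewer.values() if g]
    k = len(groups)                        # number of interviewers
    n_total = sum(len(g) for g in groups)  # number of respondents
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average workload, which allows for unequal workloads.
    m0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (m0 - 1) * ms_within)

def interviewer_design_effect(icc, mean_workload):
    """Approximate variance inflation due to clustering of answers by interviewer."""
    return 1 + icc * (mean_workload - 1)

# Hypothetical answers on a 1-9 scale, grouped by interviewer ID.
ratings = {"int_01": [1, 2, 3], "int_02": [4, 5, 6], "int_03": [7, 8, 9]}
icc = interviewer_icc(ratings)
deff = interviewer_design_effect(icc, mean_workload=3)
print(f"ICC = {icc:.3f}, design effect = {deff:.2f}")
```

In real surveys the correlation is typically far smaller than in this toy data, but because workloads are often large, even a small value can inflate variances noticeably.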
For more information, see:
Davis, R. E., Couper, M. P., Janz, N. K., Caldwell, C. H., & Resnicow K. (2010). Interviewer effects in public health surveys. Health Education Research, 25(1), 14-26.
Elliott, M. R., & West, B. T. (2015). “Clustering by interviewer”: A source of variance that is unaccounted for in single-stage health surveys. American Journal of Epidemiology, 182(2), 118-126.
One of the most reliable findings in the survey nonresponse literature is the consistent, and sometimes large, urban-rural difference in survey response rates. The response rates obtained in rural areas to both telephone and in-person surveys continue to be higher than in more densely populated urban areas. This “urbanicity” effect has been documented both in terms of the ease with which respondents can be contacted (i.e., contact rates) and in terms of respondent willingness to participate in surveys once contacted (i.e., cooperation rates). In urban areas, longer work commuting times, greater proportions of single-person households, and greater numbers of restricted-access residences all pose barriers to successfully contacting potential respondents. Once respondents are contacted, crime fears, reluctance to engage with strangers, and the reduced social cohesion typical of many urban areas contribute to lower cooperation rates. These challenges suggest greater effort is necessary to complete field work in urban environments. This effort may include more, and more varied, attempts to contact sampled households and individuals; longer data collection periods; decreased interviewer workloads; higher incentives; and more careful tailoring of interviewer-respondent interactions. More research, of course, is needed to address urban-rural disparities in survey response.
Surveys use computer-assisted survey information collection (CASIC) to aid in survey data collection in a variety of ways. Although CASIC may have advantages in terms of efficiency and error reduction, one of its main advantages is that it allows researchers to tailor survey questionnaires to each respondent to minimize respondent burden. Two specific strategies used to do so are programmed skip patterns and text fills.
Programmed skip patterns ask questions of respondents based on their responses to earlier questions. For example, a health survey might only ask about treatments for hypertension of those respondents who report they have been diagnosed with hypertension; or a post-election survey might only ask respondents who say they voted for whom they voted. The advantage of implementing skip patterns with CASIC is that it is possible in self-administered surveys to only show respondents the questions that apply to them. Similarly, in interviewer administered surveys, programmed skip patterns restrict interviewers to only see questions that should be asked of a given respondent. This is an advantage over paper-and-pencil questionnaires where either respondents (self-administered) or interviewers (interviewer-administered) need to follow instructions about which questions should be answered. Programmed skip patterns shorten the instrument, reduce cognitive burden and fatigue, and help to keep respondents engaged in the task of answering survey questions.
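A programmed skip pattern can be pictured as a condition attached to each question that is checked against the answers collected so far. The sketch below is a minimal illustration using the two examples from the paragraph above. The question wording, identifiers, and the `administer` helper are hypothetical, and real CASIC packages have their own syntax for routing.

```python
# Each question may carry an "ask_if" rule that looks at earlier answers.
QUESTIONS = [
    {"id": "htn_dx", "text": "Have you ever been diagnosed with hypertension?",
     "ask_if": None},
    {"id": "htn_tx", "text": "Are you currently being treated for hypertension?",
     "ask_if": lambda answers: answers.get("htn_dx") == "yes"},
    {"id": "voted", "text": "Did you vote in the most recent election?",
     "ask_if": None},
    {"id": "vote_choice", "text": "For whom did you vote?",
     "ask_if": lambda answers: answers.get("voted") == "yes"},
]

def administer(questions, get_answer):
    """Ask each applicable question in order and return the recorded answers."""
    answers = {}
    for question in questions:
        rule = question["ask_if"]
        if rule is None or rule(answers):
            answers[question["id"]] = get_answer(question)
        # Otherwise the question is skipped and never shown.
    return answers

# Scripted answers stand in for a live respondent or interviewer.
scripted = {"htn_dx": "no", "voted": "yes", "vote_choice": "Candidate A"}
recorded = administer(QUESTIONS, lambda question: scripted[question["id"]])
print(list(recorded))
```

Because the treatment question depends on a "yes" to the diagnosis question, it never appears for this respondent, and neither the respondent nor an interviewer has to follow a written skip instruction.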
Text fills are another CASIC tool. Text fills involve filling in a word, phrase, or value in a question based on a respondent’s answers to one or more previous questions. For example, a respondent might first be asked to indicate the most important problem facing the country today. Later questions can be tailored to reference the problem s/he identified. A respondent who reports that “the economy” is the most important problem facing the country could be asked “Do you have more confidence in the Republican Party or the Democratic Party to deal with the economy?”, whereas a respondent who reports that the most important problem facing the country is "the moral climate" might be asked “Do you have more confidence in the Republican Party or the Democratic Party to deal with the moral climate?” Text fills may also be used to provide memory cues for later questions. For example, a question might ask a respondent the date of their last doctor’s appointment. That date can then be used as a cue in later questions about the appointment (e.g., "When you visited the doctor on [filled date], were you satisfied or dissatisfied with the amount of time the doctor spent with you?"). Text fills can also draw from more than one previous question and can involve calculations. For example, a series of questions in a survey might ask a respondent about the number of people in different age categories (e.g., under 5, 5-12, 13-17, and 18 or older) living in the respondent's household. A text fill in a later question could use the total household size (i.e., the sum of all these responses). Text fills simplify the task of answering follow-up questions because the respondent doesn’t need to remember his or her responses to earlier questions.
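A text fill can be sketched in a few lines of Python using ordinary string templates. The variable names (`mip`, `hh_size`, and the age-category keys) and the wording of the household question are assumptions made for illustration only.

```python
# Answers recorded earlier in the (hypothetical) interview.
answers = {
    "mip": "the economy",   # most important problem
    "under5": 1,
    "age5_12": 2,
    "age13_17": 0,
    "age18plus": 2,
}

# A calculated fill: total household size is the sum of the age-category counts.
answers["hh_size"] = sum(
    answers[key] for key in ("under5", "age5_12", "age13_17", "age18plus")
)


def fill(template: str, answers: dict) -> str:
    """Insert earlier answers into the named placeholders of a question template."""
    return template.format(**answers)


confidence_q = fill(
    "Do you have more confidence in the Republican Party or the "
    "Democratic Party to deal with {mip}?",
    answers,
)
household_q = fill(
    "Thinking about all {hh_size} people living in your household, "
    "how many are currently employed?",
    answers,
)
print(confidence_q)
print(household_q)
```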
Both skip patterns and text fills are made substantially easier by CASIC. These procedures help researchers avoid asking respondents questions that are irrelevant or unnecessary or requiring respondents to remember answers to previous questions. As a result, respondents are able to put their efforts toward answering the survey questions more carefully and are less likely to become fatigued or bored as the survey progresses.
For more information, see
Couper, M. P., Baker, R. P., Bethlehem, J., Clark, C. Z. F., Martin, J., Nicholls, W. L., & O’Reilly, J. M. (1998). Computer Assisted Survey Information Collection. New York: Wiley.
Individuals typically serve as the unit of analysis in most survey research conducted in this country. Other units of analysis, of course, are possible and one that is used often is the establishment. Establishments can represent one of many organizational forms. They can be for-profit or not-for-profit organizations, and they can have varying functions, such as business, government, education, or criminal justice, to name a few. They can also vary in size and can have multiple locations. Each of these dimensions can present challenges not usually found in traditional person-level surveys. For example, the optimal informant(s) within an establishment need to be identified prior to collecting data. There may be a distinction between those who have the authority and those who have the ability to report data about the establishment. In addition, it is important to recognize that participating in a survey is not always consistent with organizational priorities. A few tips for executing a successful establishment survey include the following:
- Avoid times when workloads are likely to be heaviest and survey requests are most likely to be given low priority
- Be prepared to negotiate with gatekeepers, particularly when trying to reach the leader(s) of an establishment
- Remember that appeals for survey participation are most likely to be successful when framed as being consistent with organizational goals
- Be sure to collect information regarding the position of the informants who actually complete the survey, as informants within organizations are often assigned based on convenience, rather than because they are the most qualified to respond
For additional information regarding establishment surveys:
Special Issue on Establishment Surveys. (2014). Journal of Official Statistics 30(4). (https://www.degruyter.com/view/j/jos.2014.30.issue-4/issue-files/jos.2014.30.issue-4.xml).
One of the first and most central decisions to be made when designing a survey is the mode (or modes) in which survey data are to be collected. Each mode has advantages and disadvantages, and the specific choice of mode will depend on a variety of factors including the goals of the research project, the type of data one wants to collect, the planned sampling approach, the information available about potential respondents or households in the sampling frame, and the resources available to conduct the research. One of the four major modes of survey data collection used today is in-person interviewing.
The major advantages of in-person interviews are that they tend to have higher cooperation rates than other modes and that interviewers can use nonverbal cues in their communication with respondents to build rapport and identify problems. In-person interviews also allow researchers to present complex visual stimuli, conduct relatively long interviews, and provide the best mode for collecting non-self-report data (e.g., biophysical measures). They allow interviewers to provide documentation to establish the legitimacy of the survey request, provide clarification to respondents, and address respondent questions or problems; respondents interviewed in-person may work hard to answer survey questions carefully. Most in-person interviews today also use computer-assisted personal interviewing (CAPI), and respondents' answers are entered directly into a laptop or other electronic device, eliminating the need for additional data entry and allowing for complex skip patterns, randomizations, and/or fills to be programmed into the survey instrument. The disadvantages of in-person interviews are that they tend to be expensive and time consuming to conduct. They also have greater potential for interviewer bias and provide less privacy to respondents than do self-administered surveys. It is more difficult to supervise and monitor in-person interviews, and interviewer falsification may therefore be more likely than with telephone interviewing.
See: Holbrook, A. L., Green, M. C., & Krosnick, J. A. (2003). Telephone vs. face-to-face interviewing of national probability samples with long questionnaires: Comparisons of respondent satisficing and social desirability response bias. Public Opinion Quarterly, 67, 79-125.
Lyberg, L. E., & Kasprzyk, D. (1991). Data collection methods and measurement error: An overview. In P. P. Biemer, R. M. Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Measurement errors in surveys (pp. 237-257). New York: Wiley.
One of the first and most central decisions to be made when designing a survey is the mode (or modes) in which survey data are to be collected. Each mode has advantages and disadvantages and the specific choice of mode will depend on a variety of factors including the goals of the research project, the type of data one wants to collect, the planned sampling approach, the information available about potential respondents or households in the sampling frame, and the resources available to conduct the research. One of the four major modes of survey data collection used today is telephone interviewing.
The major advantage of telephone interviewing is that it can be used to collect data quickly. It is generally less expensive than in-person interviewing but more expensive than either mail or Web surveys (although this is not always the case). Data are entered directly into the computer by interviewers, eliminating the need for additional data entry and allowing for complex skip patterns, randomizations, or fills to be programmed into the survey instrument. Further, it allows for efficient monitoring and supervising of interviewers through the use of a centralized telephone facility and an electronic monitoring system that allows monitors to listen to interviewers as they make calls and talk to respondents. The disadvantages of telephone interviewing are that respondent cooperation rates are lower and the practical length of telephone interviews is generally shorter than in-person interviews. Although interviewers are available to answer questions and address potential problems, they are limited to verbal communication and cannot make use of nonverbal communication to build rapport with respondents or identify respondent problems. It is also more difficult for telephone interviewers to establish the legitimacy of the survey request because they cannot provide written documentation or ID. Telephone interviews have greater potential for interviewer effects on data than self-administered modes and may be less well-suited for asking sensitive questions than are other modes. Future challenges for telephone surveys include declining cooperation rates and the continuing development of sampling and interviewing strategies that incorporate cell phones.
For more information, see
Dillman, D., Smyth, J., & Christian, L. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: Wiley.
Groves, R. M., Biemer, P. P., Lyberg, L. E., Massey, J. T., Nicholls, W. L., & Waksberg, J. (2001). Telephone survey methodology. Hoboken, NJ: Wiley.
One of the first and most central decisions to be made when designing a survey is the mode (or modes) in which survey data are to be collected. Each mode has advantages and disadvantages and the specific choice of mode will depend on a variety of factors including the goals of the research project, the type of data one wants to collect, the planned sampling approach, the information available about potential respondents or households in the sampling frame, and the resources available to conduct the research. One of the four major modes of survey data collection used today is mailed paper and pencil self-administered questionnaires.
Mailed paper and pencil questionnaires have the advantage of generally being less expensive than modes that require interviewers (although this is not always the case). Mail surveys (and self-administered questionnaires more generally) are preferable to interviewer-administered surveys for asking sensitive questions because they maximize respondent privacy. Mailed questionnaires also allow respondents to complete the survey at their own pace. It is possible to present visual material in mail surveys, but some forms of complex stimuli cannot be presented (e.g., a video clip). Mail surveys generally have lower response rates than either telephone or in-person interviews, although optimally designed mail surveys can in some cases obtain higher response rates. All communication in mail surveys is typically done in writing, so the legitimacy of the survey request and all instructions must be clear and well written. Relying on written communication also assumes, however, that respondents read materials and instructions carefully and the researcher has little or no control over whether or not they do so. Skip patterns must be conveyed in writing to respondents and require them to follow instructions. There is no interviewer present to answer questions or address problems. In addition, data must be entered into a computer once questionnaires are returned and this is an additional cost and a place where error can be introduced. Double entry of all questionnaires (with checks of any inconsistencies) is becoming industry standard for mail surveys. Finally, mail surveys are generally more suited to surveys of known individuals than to household surveys where within-household selection of a respondent is necessary, as it can be difficult to implement a within-household selection process without the presence of an interviewer.
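The double-entry check mentioned above can be sketched as a field-by-field comparison of two independent keyings of the same questionnaires. The case IDs, field names, and values below are invented for illustration; in practice each flagged discrepancy would be resolved by consulting the paper questionnaire.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Discrepancy = Tuple[str, str, Optional[str], Optional[str]]


def compare_entries(first: Dict[str, Record],
                    second: Dict[str, Record]) -> List[Discrepancy]:
    """Return (case_id, field, first_value, second_value) for every disagreement.

    A case or field present in only one keying is reported with None
    for the missing side.
    """
    discrepancies: List[Discrepancy] = []
    for case_id in sorted(set(first) | set(second)):
        rec1 = first.get(case_id, {})
        rec2 = second.get(case_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                discrepancies.append((case_id, field, v1, v2))
    return discrepancies


# Two hypothetical keyings of the same two returned questionnaires.
entry_a = {"1001": {"q1": "3", "q2": "1"}, "1002": {"q1": "2", "q2": "4"}}
entry_b = {"1001": {"q1": "3", "q2": "7"}, "1002": {"q1": "2", "q2": "4"}}

to_resolve = compare_entries(entry_a, entry_b)
print(to_resolve)
```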
For more information, see
Dillman, D., Smyth, J., & Christian, L. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: Wiley.
One of the first and most central decisions to be made when designing a survey is the mode (or modes) in which survey data are to be collected. Each mode has advantages and disadvantages, and the specific choice of mode will depend on a variety of factors including the goals of the research project, the type of data one wants to collect, the planned sampling approach, the information available about potential respondents or households in the sampling frame, and the resources available to conduct the research. One of the four major modes of survey data collection used today is Web or Internet surveys.
Web surveys are increasingly popular and have a number of advantages. First, because respondents enter data directly, they avoid the additional cost and potential error introduced by data entry found with mailed questionnaires. Web surveys often can be conducted quite inexpensively and quickly by a single researcher. They also allow for the use of complex skip patterns, randomizations, or fills to be programmed into the survey instrument and provide respondents with privacy when completing the questionnaire, making them good for asking sensitive questions. However, like paper-and-pencil self-administered questionnaires, all materials and instructions must be written. As a result, the potential for errors or misunderstandings is greater than for interviewer-administered surveys. In addition, the researcher has little ability (beyond written instructions) to motivate respondents to answer survey questions carefully and completely. Perhaps the biggest limitation of Web surveys is that they are primarily appropriate when the sample frame includes e-mail addresses. This limits their utility for conducting surveys with probability samples of many populations including those of the general population (e.g., adults in a particular geographic area 18 and older). Nonprobability sampling approaches are quite popular with Web surveys because of these limitations, but these bring their own set of problems.
For more information, see
Couper, M. (2000). Web surveys: A review of issues and approaches. Public Opinion Quarterly, 64, 464-494.
Dillman, D., Smyth, J., & Christian, L. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: Wiley.
One of the first and most central decisions to be made when designing a survey is the mode (or modes) in which survey data are to be collected. Each of the four major modes (in-person interviewing, telephone interviewing, mailed paper-and-pencil self-administered questionnaires, and Web or Internet questionnaires) has advantages and disadvantages that have been reviewed in the last four Survey News Bulletins (Nos. 35-38). Sometimes instead of choosing one mode or another, researchers combine modes into what are called "mixed mode" designs. This often is done to capitalize on the different advantages of the modes to be combined, and it can be done in a number of ways. In some surveys, respondents are given a choice of modes in which to respond. Ironically, however, there is some evidence that this may actually decrease response rates in some cases (in particular when mail surveys are provided with a Web option). In other cases, respondents who initially do not respond in one mode may be recontacted in another mode (e.g., nonrespondents to a mail survey may be contacted via telephone), and this is generally a more successful strategy for increasing participation. Other mixed-mode surveys involve using one mode to recruit respondents and another to interview them (e.g., recruiting respondents from a telephone sample to participate in an in-person interview) or use different modes in different waves of a panel survey. Finally, some surveys may use one mode embedded in another. For example, respondents in an in-person interview may be asked to answer a subset of particularly sensitive questions by entering their responses directly into a laptop computer (known as computer-assisted self-interviewing or CASI) in order to maximize privacy for those questions.
One disadvantage of a mixed-mode design is that it can confound mode (which can influence survey results) with other variables such as willingness to participate (e.g., when follow-up contacts are done in a different mode), respondent characteristics (e.g., when respondents self-select themselves into a particular mode), time (e.g., when waves of a panel survey are conducted in different modes), or question sensitivity (e.g., when a self-administered mode such as CASI is embedded in an in-person interview). As a result, the use of mixed-mode designs should be considered carefully within the goals of each particular study.
For more information see:
Dillman, D., Smyth, J., & Christian, L. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: Wiley.
Key informant interviews refer broadly to the collection of information about a particular organization or social problem through in-depth interviews of a select, nonrandom group of experts most knowledgeable of the organization or issue. They often are used as part of program evaluations and needs assessments, though they also can be used to supplement survey findings, particularly for the interpretation of survey findings. In survey studies, key informant interviews can be valuable in the questionnaire development process, so that all question areas and possible response options are understood. Further, relying on this method is appropriate when the focus of study requires in-depth, qualitative information that cannot be collected from representative survey respondents or archival records. While the selection of key informants is not random, it is important that there be a mix of persons interviewed, reflecting all possible sides of the issue at study. Key informant interviews are most commonly conducted in person and can include closed- and open-ended questions. They often are recorded and transcribed so that qualitative analyses of the interviews can be performed. Key informant interviews have a useful role in the beginning stages of research studies where information gathering and hypothesis building are the goal.
Parsons, J. A. (2008). Key informant. In P. J. Lavrakas (Ed.), Encyclopedia of survey research methods: Volume 1 (p. 407). Thousand Oaks, CA: Sage.
Focus groups are in-depth qualitative interviews with a small number of carefully selected people brought together to discuss a host of topics. They are often used to aid with designing survey questions or understand how to get cooperation from a target population. They are concerned with understanding attitudes, experiences, and motivation rather than measuring them. Their interactive nature allows a discussion to address "how" and "why." They are often less costly than surveys. However, the analysis is subjective, and one cannot generalize to the population (more for exploring, not representing).
When planning for a focus group session, make sure you have a clear “focus” for the group, define and locate your population dependent on the topic, and consider hiring a professional moderator. In most cases, you will need to submit your discussion guide, consent form, recruiting and screening materials, and protocol to the Institutional Review Board. Remember when designing your discussion guide to start with (3) clear goals / objectives, ask questions that require reflection (how or why, not yes or no), and use the “funnel approach”: begin with broad questions / topics and move to narrow and specific towards the end. Try to place more neutral questions before sensitive ones. Be ready with appropriate follow-ups and probes.
Your recruitment will need to include a screening / scheduling questionnaire with key questions to screen out ineligible people (e.g., those under 18 years old). You can locate members of the population you are targeting through flyers, lists of members of relevant organizations, mailed invitations to people or households that are likely to be eligible, telephone screening, on-line list-servs, Craigslist, or ads in local newspapers (the latter is particularly effective if you are targeting geographical areas). Set up a unique e-mail address and / or voicemail for potential participants to contact you. E-mail a confirmation letter with directions and contact information soon after scheduling. We recommend scheduling 12 to 15 participants in order to have 8 attend (you can adjust this ratio as you go). Make reminder calls / texts two business days in advance of the group meeting (this gives you time to possibly replace people who can no longer attend).
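One simple way to adjust the scheduling ratio as you go is to use the show rate observed in earlier sessions. The Python sketch below assumes that show rates stay roughly stable from one group to the next, which may not hold for every population; the function name and the example counts are hypothetical.

```python
import math
from fractions import Fraction


def participants_to_schedule(target_attendees: int,
                             past_scheduled: int,
                             past_attended: int) -> int:
    """Number of people to schedule so that about `target_attendees` show up.

    Uses the show rate from earlier sessions (attended / scheduled) and
    rounds up. Fractions avoid floating-point rounding surprises.
    """
    if past_attended <= 0 or past_scheduled < past_attended:
        raise ValueError("need a positive, plausible attendance history")
    show_rate = Fraction(past_attended, past_scheduled)
    return math.ceil(target_attendees / show_rate)


# If 8 of 15 scheduled people attended the last group, schedule 15 again.
print(participants_to_schedule(8, past_scheduled=15, past_attended=8))
# If attendance was better (10 of 12), fewer invitations are needed.
print(participants_to_schedule(8, past_scheduled=12, past_attended=10))
```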
Avoid putting people who know each other or who are in a chain of command (supervisors and employees) in the same group. As much as possible, avoid “professional respondents” who have participated in many other focus groups and research studies (and whom you may attract using Craigslist). When appropriate, consider matching moderators to participants by gender and / or race-ethnicity.
Analysis can include meeting afterwards to discuss observations, a summary report by each observer / moderator, basic demographic questionnaires, documentation of nonverbal behaviors, and transcriptions of the audio files to aid in identifying common themes and patterns (and deviations from these patterns).
For more information, see:
Krueger, R. A., & Casey, M. A. (2009). Focus groups: A practical guide for applied research. New York: Sage.
It is tempting to want to offer respondents a choice of survey modes upon first contact with the hope of maximizing response rates. For example, a mail survey might also include a URL for the respondent to complete the questionnaire online. However, research consistently indicates that more than one option depresses response rates. Offering more than one mode may make the respondent’s response decision more complex, which leads to a delayed response as they mull over which option to select. A 2012 meta-analysis of studies that offered a concurrent Web option in mail surveys offers further evidence of this effect (Medway & Fulton, 2012). Best practices dictate that a concurrent option be offered at later or final contact attempts. For an example on how to offer more than one mode to maximize response rates, see Chapter 2 of Dillman, Smyth, & Christian (2014).
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method (4th ed.). Hoboken, NJ: Wiley.
Medway, R. L., & Fulton, J. (2012). When more gets you less: A meta-analysis of the effect of concurrent web options on mail survey response rates. Public Opinion Quarterly, 76(4), 733–746.
Web surveys conducted with panels of survey respondents are increasingly popular. However, these Web panels differ in one important way: some of them are based on probability sampling and others on nonprobability sampling. In the past three years, the American Association for Public Opinion Research (AAPOR) has appointed two task forces to evaluate nonprobability Web panels and explore when using probability versus nonprobability samples affects results. (See here, here, and here.)
Populations can be hard to survey for a variety of reasons. Tourangeau et al. (2014) distinguish populations that are hard to sample, hard to identify, hard to find or contact, hard to persuade, and hard to interview. Hard-to-sample populations typically consist of groups that make up a small percentage of the overall population. Unless they are physically clustered in some way, it is often prohibitively expensive to screen the general population to find them. Populations that are hard to identify often are stigmatized in some way (e.g., illicit drug users or men who have sex with men). Thus, they are unlikely to admit membership in a population to an interviewer. Examples of hard-to-find populations are migrant laborers, homeless persons, or college students. The mobility of such populations makes them difficult to locate. Hard-to-persuade populations are those who refuse to participate in surveys when contacted. Research suggests that those who refuse to participate are less engaged civically than those who agree to participate. Finally, the hard-to-interview populations are challenging because of physical or cognitive barriers, language barriers, or vulnerability (e.g., prisoners or children).
For further information about various types of hard-to-survey populations, challenges specific to different populations, and strategies employed by researchers to obtain survey data from members of these populations, see Tourangeau, R., Edwards, B., Johnson, T. P., Wolter, K. M., & Bates, N. (Eds.). (2014). Hard-to-survey populations. Cambridge University Press.
The 7th edition of the Standard Definitions document, published in 2011 by the American Association for Public Opinion Research (AAPOR), contains standardized formulas for the calculation of response rates, cooperation rates, and refusal rates for telephone, in-person, mail and internet surveys. Use of these formulas facilitates meaningful comparisons across surveys and many professional journals now require their use when reporting findings from primary survey data collection efforts. We strongly encourage their use. For more information, see
The American Association for Public Opinion Research. (2011). Standard definitions: Final dispositions of case codes and outcome rates for surveys (7th ed.). AAPOR.
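As a rough sketch of how such outcome rates are computed, the Python function below implements what we understand to be the most conservative variant of each rate (Response Rate 1, Cooperation Rate 1, and Refusal Rate 1) from final disposition counts. Standard Definitions describes several other variants and mode-specific disposition codes that are not shown here, so check the document itself before reporting rates. The disposition counts in the example are invented.

```python
def outcome_rates(I, P, R, NC, O, UH, UO):
    """Minimum response, cooperation, and refusal rates from disposition counts.

    I  = complete interviews          P  = partial interviews
    R  = refusals and break-offs      NC = non-contacts
    O  = other eligible non-interviews
    UH = unknown if household/occupied housing unit
    UO = unknown eligibility, other
    """
    all_cases = (I + P) + (R + NC + O) + (UH + UO)
    return {
        # RR1: completes over all eligible and unknown-eligibility cases.
        "RR1": I / all_cases,
        # COOP1: completes over all eligible cases that were contacted.
        "COOP1": I / ((I + P) + R + O),
        # REF1: refusals over all eligible and unknown-eligibility cases.
        "REF1": R / all_cases,
    }


# Hypothetical final dispositions for a sample of 1,000 cases.
rates = outcome_rates(I=600, P=50, R=200, NC=100, O=20, UH=20, UO=10)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```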
When conducting a mail survey of the general population, how do you select a respondent from the household to complete the survey? Battaglia et al. (2008) tested three respondent selection methods:
- Any adult in the household
- Adult with the next birthday
- All adults in the household
Through follow-up telephone interviews, they were able to get detailed information on who completed the questionnaire (and why other household members did not complete the questionnaire). They found that the next-birthday and all-adults methods show promise as selection methods: their household-level response rates were comparable to that of the any-adult method. At the respondent level, however, the response rate for the all-adults method was lower.
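The next-birthday rule itself is easy to state in code. The Python sketch below picks the adult whose birthday falls soonest on or after a reference date; treating a birthday that falls on the reference date as "next" and mapping February 29 to March 1 in non-leap years are both assumptions of this sketch, not part of the Battaglia et al. study. In a mail survey the rule is applied by household members following written instructions, so the code only illustrates the logic.

```python
from datetime import date
from typing import List, Tuple


def days_until_birthday(month: int, day: int, today: date) -> int:
    """Days from `today` to the next occurrence of month/day (0 if it is today)."""
    for year in (today.year, today.year + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:
            # Feb 29 in a non-leap year: use Mar 1 (an assumption of this sketch).
            candidate = date(year, 3, 1)
        if candidate >= today:
            return (candidate - today).days
    raise AssertionError("a birthday always occurs within the next year")


def next_birthday_adult(adults: List[Tuple[str, int, int]], today: date) -> str:
    """Return the name of the adult with the next birthday.

    `adults` holds (name, birth_month, birth_day) tuples for household adults.
    """
    return min(adults, key=lambda a: days_until_birthday(a[1], a[2], today))[0]


# Hypothetical household of three adults.
household = [("Ana", 7, 4), ("Ben", 3, 22), ("Cy", 12, 1)]
selected = next_birthday_adult(household, today=date(2015, 3, 25))
print(selected)
```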
For more information, see Battaglia, M. P., Link, M. W., Frankel, M. R., Osborn, L. & Mokdad, A. H. (2008). An evaluation of respondent selection methods for household mail surveys. Public Opinion Quarterly, 72(3), 459-469.
The proportion of adults living in cell-phone-only (CPO) households has increased dramatically in recent years, from less than 5 percent in 2003 to almost 40 percent at the end of 2013. Moreover, 65.7% of adults aged 25-29 live in CPO households while only 13.6% of adults aged 65 and older do. Adults who rent their homes, live in poverty, or are Hispanic are also more likely to live in CPO households. Because so many people live in households without landlines, and because those who have landlines are older, wealthier, and more likely to own their homes, survey researchers can no longer sample exclusively from landline phone numbers to attain a sample that represents the general population. In an attempt to include the entire adult population in phone studies, practitioners are combining landline and cellular phone numbers into a dual frame. The proportion of interviews that should be completed with respondents on a cellphone vs. a landline depends on the proportion of the population that is CPO, the proportion that uses both cell and landline, the proportion that is landline only, as well as the cost differential between calling landline vs. cell frames.
For information about the demographics of cellphone usage, see Blumberg, S. J., & Luke, J. V. (2014, July). Wireless substitution: Early release of estimates from the National Health Interview Survey, July-December 2013. National Center for Health Statistics.
For information about combining cellular and landline phone sample into a dual frame, see Levine, B., & Harter, R. (2015). Optimal allocation of cell-phone and landline respondents in dual-frame surveys. Public Opinion Quarterly, 79, 91-104.
In an effort to maximize coverage (and thus generalizability), many survey researchers are using address-based sampling (ABS) frames. ABS frames use the household address as the unit of analysis. In urban areas, the coverage of ABS frames is virtually complete. While coverage is not as high in rural areas, as rural-type addresses are updated to city-style addresses (i.e., addresses with street numbers and names) for the 911 system, the ABS coverage will continue to improve in rural areas.
Addresses are sampled from the U.S. Postal Service Delivery Sequence File (DSF) to which sampling vendors have access. ABS samples are ordered based on geography, which can be defined at almost any level: states, counties, tracts, blocks, ZIP codes, street boundaries, or even by radius from a single point. Once the geography is determined, a random sample of addresses can be selected.
Samples selected using ABS frames can be used on their own for face-to-face studies or mail studies. Because some addresses in ABS frames can be linked to phone numbers, ABS frames can also be used for multi-mode studies that include telephone interviews.
For further information, see:
Iannacchione, V. G. (2011). Research synthesis: The changing role of address-based sampling in survey research. Public Opinion Quarterly, 75, 556-575.
Snowball sampling, also known as chain referral sampling, is a nonprobability method of survey sample selection that is commonly used to locate rare or difficult-to-find populations. Although there are several variations, this approach involves a minimum of two stages: (a) the identification of a sample of respondents with characteristic x during the initial stage, and (b) the solicitation of referrals to other potentially eligible respondents believed to have characteristic x during subsequent snowball stages. In many applications, this referral process continues (or snowballs) until an acceptable number of eligible respondents have been located. Statistical inferences can be drawn from the first stage of a snowball sample, assuming that probability methods of selection were used. Samples drawn during snowball stages, and samples that combine the initial and snowball stages are not representative, however, and cannot be used to make statistical inferences.
Beyond nonrandom selection procedures, other limitations include correlations between social network size and selection probabilities, reliance on the subjective judgments of informants, and confidentiality concerns.
Key advantages include low cost and the potential time efficiency with which samples can be recruited.
The following resources provide more information about snowball sampling:
Biernacki, P. & Waldorf, D. (1981). Snowball sampling: Problems and techniques of chain referral sampling. Sociological Methods & Research, 10, 141-163.
Sudman, S. (1976). Applied sampling. New York: Academic Press.
Respondent-driven sampling (RDS) can be used to recruit individuals who belong to hidden, hard-to-reach, stigmatized populations, where the members of the population are known to each other—for example, illegal drug users.
RDS includes several steps. The recruitment of the initial respondents—called seeds—is done by the researcher; however, all subsequent recruitment is done by the selected respondents. Following initial interviews, the seeds are given coupons that they are asked to give to other eligible members of the network. If the second-stage recruits wish to reveal their identity, they can contact the researcher to be interviewed. Researchers also give incentives to respondents when they participate and tell them that they will get an additional incentive if their recruits also participate. If the second-stage respondents complete an interview, they are in turn given coupons and incentives for the recruitment of third-stage respondents. The process continues until the needed number of completed interviews has been attained.
The coupons that are given to respondents are returned to the researcher at the time of the interview; information included on the coupon enables the researcher to trace the links between initial and subsequent respondents.
See the references for discussions of the modeling and weighting that are required to attain unbiased estimates from RDS samples. The first two references are papers by Heckathorn, who developed the RDS methodology. The third reference is an edited volume that contains several papers that discuss RDS.
For more information, see:
Heckathorn, D. D. (1997). Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44(2), 174-99.
Heckathorn, D.D. (2002). Respondent-driven sampling II: Deriving valid population estimates from chain-referral samples of hidden populations. Social Problems, 49(1), 11-34.
Tourangeau, R., Edwards, B., Johnson, T. P., Wolter, K. M., & Bates, N. (Eds.). (2014). Hard-to-survey populations. Cambridge University Press.
The Internet can be employed to dramatically accelerate the recruitment of hard-to-find populations using Web-based respondent-driven sampling (WebRDS). In recent years, researchers have tested strategies for extending respondent-driven sampling [see also SRL News Bulletin 51] for use on the Internet. The references below provide key information regarding the development and testing of this methodology. Key advantages of WebRDS include the low cost and speed with which online data can be collected and the privacy afforded by online self-administration. Limitations include the requirement that members of the target population have access to e-mail, that recruitment remains open long enough to compensate for variable e-mail usage, and the challenge of avoiding duplicate responses from individuals using more than one e-mail address.
For more information, see:
Bauermeister, J. A., Zimmerman, M. A., Johns, M. M., Glowacki, P., Stoddard, D., & Volz, E. (2012). Innovative recruitment using online networks: Lessons learned from an online study of alcohol and other drug use utilizing a web-based, respondent-driven sampling (webRDS) strategy. Journal of Studies on Alcohol and Drugs, 73, 834-838.
Strömdahl, S., Lu, X., Bengtsson, L., Liljeros, F., & Thorson, A. (2015). Implementation of web-based respondent driven sampling among men who have sex with men in Sweden. PLOS One, 10(10), e0138599.
Wejnert, C., & Heckathorn, D. D. (2008). Web-based network sampling: Efficiency and efficacy of respondent-driven sampling for online research. Sociological Methods & Research, 37(1), 105-134.
Network sampling, also known as multiplicity sampling, is a probability sampling methodology that can be used to locate members of a rare population. At the outset, a random sample of respondents is selected from the population; those respondents are asked if anyone in their network, including themselves, has the characteristic that makes them a member of the rare population.
The network must be clearly defined; one example is a family network made up of parents, siblings, and children. The initial respondent must be able to identify the individuals in the network.
In addition, the initial respondent must know whether or not network members have the sought-after rare characteristic. For example, the diagnosis of cancer would likely be known within the family network defined above, whereas tax evasion may not be something that even close family members would know.
Because network sizes will differ, the probability of identifying eligible respondents will differ across networks. For this reason, weights must be constructed to adjust for the different selection probabilities.
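The weighting idea above can be sketched in a few lines. This is a minimal illustration, assuming a multiplicity-style estimator in which each reported case is down-weighted by the number of people who could have reported it; the function name, the base weight of 100, and the multiplicities are all hypothetical, and a real study would have to collect each case's multiplicity during the interview.

```python
def multiplicity_estimate(case_multiplicities, base_weight):
    """Estimate the size of a rare population from network reports.

    case_multiplicities holds one entry per reported case: the number of
    population members who could have reported that case (hypothetical data).
    Each case counts as 1/multiplicity so that well-connected cases are not
    overcounted; base_weight is the respondent's sampling weight.
    """
    return base_weight * sum(1 / m for m in case_multiplicities)

# Four cases reported by sampled respondents, each respondent standing in
# for 100 population members (hypothetical numbers).
case_multiplicities = [4, 2, 4, 1]
estimate = multiplicity_estimate(case_multiplicities, base_weight=100)
```

A case that four network members could report contributes one quarter of a case each time it is reported, which offsets its higher chance of turning up in the sample.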
For more on network (multiplicity) sampling, please see:
Blair, E., & Blair, J. (2015). Applied survey sampling. Sage Publications, Inc.
Sirken, M. G. (1970). Household surveys with multiplicity. Journal of the American Statistical Association, 65(329), 257-266.
A census is the enumeration or administration of a questionnaire to an entire population. A sample survey is the collection of data from a subset of the population of interest. While researchers may be tempted to use a census so all members of the population are included, a well-designed survey based on a representative sample can provide an accurate representation of the study population.
The major disadvantages of conducting a census are that it can be very time consuming and expensive. A census is most appropriate when it is important to have information for the entire population, such as the US decennial census that is mandated by law.
A survey of a well-designed probability sample provides results that are representative of the population. It is faster and more cost effective than a census and allows for resources to be spent on follow-up efforts, possibly resulting in higher response rates than a census. It may also allow for faster results. A sample survey is most appropriate when the population size is larger than the number of observations required for the desired statistical power and when the resources available to conduct a high quality, rigorous census are not available.
Once the target population for a survey is defined, one of the first steps to selecting a sample will be to develop a list of the elements that represent that population. This list of elements is known as a sample frame.
For example, if the population of the State of Illinois is the target population, two possible sample frames would be a list of all the addresses of household residences within the State of Illinois or a list of all census blocks or tracts within the state. If the target population is college students within Illinois, an initial sample frame would be a list of all colleges and universities in the state. A sample frame could also be a list of the members of a professional organization.
Because the sampled cases will be selected from the sample frame, developing a quality frame is crucial to ensuring that the sampled cases will have good coverage of the target population. Only cases that are included on the sample frame have any possibility of being selected into the sample. To the extent that eligible cases are excluded from the sampling frame, and/or ineligible cases are included in the sample frame, coverage error may exist.
Over the next few weeks, we will build upon this information and discuss some common sample designs.
For more information on sample frames, please see:
Kalton, G. (1983). "Introduction to Survey Sampling." Sage University Paper series on Quantitative Applications in the Social Sciences, 35. Beverly Hills, New Delhi, and London: Sage Pubns.
Maisel, R., & Persell, C. H. (1996). How Sampling Works. Thousand Oaks, California: Pine Forge Press.
In order to generalize survey results to the population from which the sample was drawn, the sample must be a probability sample. Such a sample is, by definition, one where the elements are randomly selected with a known, nonzero probability of selection.
In order to draw a sample with a known probability of selection, one must start with a sample frame [see SNB #74 for a description of sample frames]. If n cases are randomly selected from a sample frame that has N cases, the probability of selection for those cases is n/N.
The most basic type of probability sample is a Simple Random Sample. This is a sample where every element in the sample frame has the same probability of being selected and every combination of n elements has the same probability of selection.
Elements from a sample frame can be randomly selected using a random number generator that is available within most software programs.
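As a minimal sketch of this step, the following Python example draws a simple random sample without replacement using the standard library's random number generator. The frame of 1,000 numbered elements, the sample size of 100, and the seed are hypothetical values chosen only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the frame without replacement.

    Every element has the same selection probability, n / N.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sample frame listing 1,000 element IDs.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, n=100, seed=2024)

# Probability of selection for each case: n / N.
selection_probability = len(sample) / len(frame)
```

Recording the seed alongside the frame is one way to make the selection auditable later.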
In the coming weeks we will introduce stratified and cluster sampling.
For more information on Probability Sampling, including Simple Random Sampling, see:
Blair, E., & Blair, J. (2015). Applied Survey Sampling. Thousand Oaks, CA: Sage Publications.
One strategy often used to ensure that a sample resembles some aspect of the population from which it is drawn is a stratified sampling design.
With stratification, members of the population of interest are assigned to mutually exclusive and exhaustive groups, referred to as strata. Members are then sampled from each stratum independently.
If, for example, we want to select a sample of undergraduates from a university and we want to be sure that the selected sample matches the population of undergraduates vis-à-vis their year in school, we can take a stratified random sample.
In a proportionate stratified sample, the same probability of selection is used for each of the strata.
Sometimes, when the strata are of different sizes and the researcher wants to make comparisons between two or more of the strata, a disproportionate stratified sample design should be used. If one wants to compare the views of freshmen and seniors who live in undergraduate dormitories, and assuming that there are many more freshmen than seniors living in the dorms, disproportionate stratified sampling can be used. After the sample frame containing all freshmen and seniors living in dormitories is divided into mutually exclusive strata, a simple random sample using a different probability of selection is taken from each of the strata. A higher probability of selection should be used to oversample the smaller stratum (the seniors), and a lower probability of selection should be used to sample the larger stratum (the freshmen). The sampling fractions can be calculated such that an equal number of cases is selected from each stratum. This equal number of cases will allow comparisons among the strata.
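The dormitory example can be sketched as follows. This is only an illustration under assumed numbers: the stratum sizes (2,000 freshmen, 500 seniors), the target of 100 cases per stratum, and the function name are hypothetical.

```python
import random

def disproportionate_sample(frame_by_stratum, n_per_stratum, seed=None):
    """Draw the same number of cases from every stratum.

    Returns the sampled cases and the sampling fraction used in each stratum.
    """
    rng = random.Random(seed)
    sample, fractions = {}, {}
    for name, members in frame_by_stratum.items():
        fractions[name] = n_per_stratum / len(members)  # n_h / N_h
        sample[name] = rng.sample(members, n_per_stratum)
    return sample, fractions

# Hypothetical dormitory frame: 2,000 freshmen and 500 seniors.
frame_by_stratum = {
    "freshmen": [f"F{i}" for i in range(2000)],
    "seniors": [f"S{i}" for i in range(500)],
}
sample, fractions = disproportionate_sample(frame_by_stratum, n_per_stratum=100, seed=1)
```

With these numbers the seniors are sampled at a rate of 100/500 and the freshmen at 100/2,000, so seniors are four times as likely to be selected and the two groups end up with equal sample sizes.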
In order to use a stratified sample, the sample frame must include the information necessary (e.g., year in school, race, or gender) for assigning members to a stratum.
It is important to note that prior to data analysis, weights must be calculated when cases have been selected with different probabilities. Future Survey News Bulletins will cover weight construction.
For more information, see Chapter 5, "Stratification and Stratified Random Sampling," in:
Levy, P. S., & Lemeshow, S. (2008). Sampling of populations: Methods and applications (4th ed.). Hoboken, New Jersey: John Wiley & Sons.
Cluster sampling is a design in which a group of individual enumeration units, which are in some way associated with one another, is sampled as a whole. The association is generally physical proximity (households on a block, students in a classroom, etc.). Cluster designs are used when a sample frame that lists individual enumeration units is not available or when such a list is available, but drawing a random sample from it would result in prohibitive interview costs. In the first case, one example is a survey of high school students. One straightforward way to interview students would be to draw a random sample of classes in a school, then administer the questionnaire to all students in each of those selected classrooms. The classroom is the cluster and the students are associated by virtue of being in the same class.
Another example of a cluster is a city block. If one were conducting a face-to-face household survey in the City of Chicago, for example, it is possible to obtain a complete listing of all addresses in Chicago. However, drawing a random sample from this list, then sending interviewers all over the city to interview households would be time-consuming and inefficient. To reduce travel costs and expenses, a sample of city blocks could first be drawn, then households on those blocks would be interviewed. The block is the cluster and households are associated by geography.
In a simple cluster design, once the clusters are sampled, all units within a selected cluster are subsequently sampled. In a multistage design, there are two or more stages of selection. In the first stage, the clusters are sampled. In subsequent stages, smaller clusters or individual sample units are sampled from the original clusters. For example, one might draw a sample of schools, then classrooms within the school, then students within the classroom.
Clusters can be selected using simple random sampling, with the probability of selection for each cluster being one divided by the number of clusters in the sample frame. Or, clusters can be selected using Probability Proportionate to Size (PPS) sampling, where the larger clusters have a greater probability of being selected than smaller clusters. If PPS sampling is used, the measure of size will be determined by information in the sampling frame, such as the number of students in the colleges or universities or the number of housing units on a block. The probability of a cluster being sampled is the number of elements in the cluster divided by the total number of elements across all eligible clusters.
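A short sketch of the PPS calculation described above, using hypothetical block names and housing-unit counts. It covers the probability for a single draw only; when several clusters are drawn, the overall chance that a cluster ends up in the sample depends on the selection scheme used, which is not modeled here.

```python
import random

def pps_selection_probabilities(cluster_sizes):
    """Single-draw selection probability for each cluster:
    its measure of size divided by the total size across all clusters."""
    total = sum(cluster_sizes.values())
    return {cluster: size / total for cluster, size in cluster_sizes.items()}

# Hypothetical frame: number of housing units on each block.
blocks = {"block_a": 50, "block_b": 150, "block_c": 300}
probs = pps_selection_probabilities(blocks)

# Draw one block with probability proportionate to size.
rng = random.Random(7)
chosen = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
```

Here block_c holds 300 of the 500 housing units, so it is six times as likely to be drawn as block_a.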
For more information, see:
Levy, Paul S., and Stanley Lemeshow. Sampling of populations: methods and applications. John Wiley & Sons, 2013.
Lohr, Sharon L. Sampling: Design and Analysis. Cengage Learning, 2009.
A complex sample design is one in which many different components of sampling, such as stratification, clustering, or unequal probabilities of selection, are used in the same survey. While not technically the same as a multistage design (see Bulletin #77, Cluster Sampling), complex survey designs generally incorporate multiple stages of selection. Many large, federally funded surveys, such as the National Health Interview Survey (NHIS), the National Health and Nutrition Examination Survey (NHANES), and the American Community Survey (ACS), use multistage, stratified, cluster designs with over-sampling of populations of particular interest.
Primary sampling units (PSUs), such as Census blocks, are often stratified by some demographic characteristic, such as population size or race. Then the PSUs are sampled from the strata, housing units are sampled from the PSUs, and individuals are sampled from the households. In some cases, households with particular characteristics (e.g., dependent children in the household) are oversampled. Because clusters and individuals can be sampled at different rates, the computation of sample weights (see upcoming bulletin for more detail) is necessary to adjust for unequal probabilities of selection.
In addition, complex samples often produce larger variance estimates than simple random samples. As a result, statistical procedures that assume a simple random sample will underestimate variances and overstate statistical significance. Thus, it is necessary to analyze data from complex surveys with software that can take into account the sample design. Many programs (e.g., STATA, SAS, and SPSS) have this functionality. If conducting secondary data analysis on a dataset collected with a complex sample design, it is critical to read the documentation carefully.
For more information, see:
Levy, Paul S., and Stanley Lemeshow. Sampling of populations: methods and applications. John Wiley & Sons, 2013.
Lohr, Sharon L. Sampling: Design and Analysis. Cengage Learning, 2009.
In the collection of survey data, a user may need to calculate sample weights. In its simplest form, a weight is the inverse of the probability of selection and indicates the number of population units each sample unit represents. For example, in a simple random sample in which 100 units are drawn from a population of 1,000, the probability of selection of each unit is 100/1,000, or 1/10. The base sample weight for each unit is therefore 10 and the weights sum to the population size. A sample in which each unit has the same probability of selection, and therefore the same weight, is referred to as a self-weighting sample. However, few surveys actually incorporate simple random samples. Many use disproportionate stratified samples, cluster samples, or complex, multistage designs. In these sample types, probabilities of selection are not equal. Sample weights are thus used to correct for oversampling of some units and undersampling of others. For example, if a population of 1,000 consisted of 800 men and 200 women, and we drew a disproportionate stratified sample of 50 men and 50 women, the probability of selection of women would be 50/200 (.25), while the probability of selection of men would be 50/800 (.0625), so women would have 4 times the chance of being sampled as men. If sample weights were not included in this analysis, the results would overrepresent women.
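The arithmetic in this example can be checked with a few lines of Python. The figures come from the text above; the function and variable names are illustrative only.

```python
def base_weight(population_size, n_sampled):
    """Inverse of the selection probability n / N."""
    return population_size / n_sampled

# Simple random sample: 100 units drawn from 1,000.
srs_weight = base_weight(1000, 100)      # each unit represents 10 population units

# Disproportionate stratified sample: 50 men from 800, 50 women from 200.
men_weight = base_weight(800, 50)        # 1 / .0625 = 16
women_weight = base_weight(200, 50)      # 1 / .25 = 4

# The weights sum back to the population size.
weighted_total = 50 * men_weight + 50 * women_weight

# Share of women in the sample with and without weights.
unweighted_share_women = 50 / (50 + 50)
weighted_share_women = 50 * women_weight / weighted_total
```

Unweighted, women make up half of the sample; once the weights are applied, their share returns to the 20% they hold in the population.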
Additional types of weights will be presented in Bulletin #80.
For more information, see:
Lohr, S. (2009). Sampling: design and analysis. Nelson Education.
In Survey News Bulletin #79, we introduced sample weights in their most basic form. This bulletin expands on that by covering other types of weights. A base weight is the inverse of the probability of selection and is equal to the number of population units each sample unit represents. In complex sample designs, there are multiple stages of selection, and therefore multiple probabilities of selection. In these designs, the probability of selection of the final sample unit is the product of the probability of selection at each stage. For example, a household survey utilizing a multistage design could include the probability of selection of the Primary Sampling Unit (PSU), which may be a census tract or block group; the probability of selection of the household; and the probability of selection of the respondent in the household. The base weight in this example would have three components.
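As an illustration of a three-component base weight, the sketch below multiplies the stage probabilities and inverts the product. The three probabilities (1 in 50 PSUs, 1 in 20 households, 1 of 2 eligible adults) are hypothetical; exact fractions are used so the result is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import prod

def multistage_base_weight(stage_probabilities):
    """Base weight = 1 / (product of the selection probabilities at each stage)."""
    return 1 / prod(stage_probabilities)

# Hypothetical probabilities of selection at each stage.
stage_probabilities = [
    Fraction(1, 50),  # PSU (e.g., census tract or block group)
    Fraction(1, 20),  # household within the PSU
    Fraction(1, 2),   # respondent within the household
]
overall_probability = prod(stage_probabilities)
weight = multistage_base_weight(stage_probabilities)
```

Under these assumed rates the overall probability of selection is 1/2,000, so each respondent represents 2,000 members of the population.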
Adjusting for different probabilities of selection is often insufficient, as most surveys include some level of nonresponse and nonresponse rates can vary across sample strata. Nonresponse weights are designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics. For example, assume we have a sample that is stratified by gender and women have a response rate of 50% compared to 40% for men. Even after adjusting for probabilities of selection, men would be underrepresented, because they participated at a lower rate. Nonresponse weights are the inverse of the response rate (in this case, 2 for women and 2.5 for men), and they indicate how many sampled units each responding unit represents.
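The nonresponse adjustment in this example can be sketched as below. The response rates match the text; the sample counts of 100 per stratum and the base weight of 10 are hypothetical, and multiplying the base weight by the adjustment is shown as one common way of combining the two, not as the only option.

```python
def nonresponse_weight(n_sampled, n_responded):
    """Inverse of the response rate: sampled units per responding unit."""
    return n_sampled / n_responded

# Hypothetical counts giving response rates of 50% (women) and 40% (men).
women_adjustment = nonresponse_weight(n_sampled=100, n_responded=50)
men_adjustment = nonresponse_weight(n_sampled=100, n_responded=40)

# Combine with a hypothetical base weight of 10 for every sampled unit.
hypothetical_base_weight = 10
women_final_weight = hypothetical_base_weight * women_adjustment
men_final_weight = hypothetical_base_weight * men_adjustment
```

Each responding man ends up carrying more weight than each responding woman, which offsets the lower response rate among men.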
Post-stratification weights are used to bring the sample proportions in demographic subgroups into agreement with the population proportion in the subgroups. After selection and nonresponse weights are calculated, the sample distribution of demographic characteristics may still vary from the population distribution. For example, the weighted sample may be 55% female and 45% male, but the population distribution may be 52% female and 48% male. A post-stratification weight is the ratio of the population percentage to the sample percentage. In this example, the post-stratification weight for women would be 52/55 or .95. In most surveys that use them, post-stratification weights are computed for multiple demographic variables. The use of post-stratification weights requires an auxiliary dataset that allows the user to compare the sample characteristics to the population from which the sample was drawn.
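The post-stratification ratio from this example, worked for both groups, might look like the following sketch. The percentages are those given in the text; the function name is illustrative.

```python
def poststratification_weights(population_pct, sample_pct):
    """Ratio of the population percentage to the weighted sample percentage."""
    return {group: population_pct[group] / sample_pct[group] for group in population_pct}

population_pct = {"female": 52, "male": 48}
sample_pct = {"female": 55, "male": 45}
weights = poststratification_weights(population_pct, sample_pct)

# Applying the weights brings the sample shares back to the population shares.
adjusted_pct = {group: sample_pct[group] * weights[group] for group in sample_pct}
```

The weight for women is 52/55, which rounds to .95 as stated above, and the weight for men is 48/45, a little above 1.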
Many data analysis programs (SAS, STATA, SPSS) have weight statements that allow users to specify the variable containing the weight they would like to use. Thus, it is straightforward to specify the weight to use. However, some surveys (e.g., many federally funded surveys such as the NHANES, CPS, etc.) have complicated sample designs and therefore complicated weighting schemes, with different weights being used for different subpopulations. When performing secondary data analysis on complex surveys such as these, it is critical to read the documentation and use the appropriate weights.
Groves, R. M., Fowler Jr, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2011). Survey methodology (Vol. 561). John Wiley & Sons.
Missing completely at random is a term used to describe missing responses to items in a survey. Data are missing completely at random (MCAR) when the probability of being missing is independent of all of the variables in the model being estimated. For example, if missing data on an income question are unrelated to the respondent's actual income and to the other variables in the model, the data are MCAR. Data can also be missing at random (MAR). Data are MAR when the probability of a variable being observed is independent of the true value of that variable, controlling for one or more variables. For example, if missing income is unrelated to income within varying levels of education, then the data are MAR. When the likelihood of being observed is dependent on the variable being observed, even when controlling for other factors, the data are not missing at random (NMAR). Strategies that can be used to address missing data in analysis are dependent on whether the data are MCAR, MAR, or NMAR. For further information, see:
McKnight, P. E., McKnight, K. M., Sidani, S., & Figueredo, A. J. (2007). Missing data: A gentle introduction. New York: Guilford Publications.
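To make the three mechanisms concrete, the sketch below uses a tiny hypothetical population and assumed response probabilities (none of these numbers come from a real survey). It computes the mean income one would expect to see among people who answer the question, in a large sample, under each mechanism.

```python
# Hypothetical population: (education, income in $1,000s).
population = [
    ("low", 20), ("low", 30), ("low", 40),
    ("high", 60), ("high", 80), ("high", 100),
]

def expected_observed_mean(rows, p_observed):
    """Large-sample mean income among those who answer, where
    p_observed(education, income) is each person's chance of answering."""
    probs = [p_observed(edu, inc) for edu, inc in rows]
    return sum(p * inc for p, (_, inc) in zip(probs, rows)) / sum(probs)

def mcar(edu, inc):
    return 0.5                              # same chance for everyone

def mar(edu, inc):
    return 0.8 if edu == "low" else 0.4     # depends only on education

def nmar(edu, inc):
    return 1 - inc / 125                    # depends on income itself

true_mean = sum(inc for _, inc in population) / len(population)
high = [row for row in population if row[0] == "high"]
true_high_mean = sum(inc for _, inc in high) / len(high)

mcar_mean = expected_observed_mean(population, mcar)
mar_mean = expected_observed_mean(population, mar)
mar_high_mean = expected_observed_mean(high, mar)
nmar_high_mean = expected_observed_mean(high, nmar)
```

Under MCAR the observed mean matches the true mean. Under MAR the overall observed mean is too low, but the mean within the high-education group is still correct, so an analysis that conditions on education can recover it. Under NMAR the observed mean is too low even within an education group.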
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Originally, paradata were defined as process data that are an offshoot of data collection efforts, such as number of call attempts (Couper, 1998; 2005). However, the definition has broadened to include any data collected or observed by interviewers that are not part of the questionnaire. This may include interviewer observations of neighborhood conditions or respondent characteristics, and specific information about contact attempts, such as disposition or time of contact. Olson (2013) describes five categories of paradata: interviewer observations of (1) the sampled unit's neighborhood, (2) the sampled housing unit, and (3) persons in the sampled housing unit. The last two categories are (4) call record information and (5) interviewers' observations about their interaction with the respondents. With the continuing decline in response rates, paradata have become increasingly important in the analysis and adjustment of nonresponse bias. A complete presentation of the types and uses of paradata can be found in Kreuter (2013).
Couper, M. P. (1998). Measuring survey quality in a CASIC environment. In Proceedings of the Survey Research Methods Section (pp. 41-49). Alexandria, VA: American Statistical Association.
Couper, M. P. (2005). Technology trends in survey data collection. Social Science Computer Review, 23(4), 486-501.
Kreuter, F. (Ed.). (2013). Improving surveys with paradata: Analytic uses of process information. John Wiley & Sons.
Olson, K. (2013). Paradata for nonresponse adjustment. The Annals of the American Academy of Political and Social Science, 645, 142-170.
Big data is "data that is so large in context that handling the data becomes a problem in and of itself. Data can be hard to handle due its size (volume) and/or the speed of [sic] which it's generated (velocity) and/or the format in which it is generated, like documents of text or pictures (variety)" (AAPOR, 2015). In recent years, there has been a great deal of interest in Big Data as a potential alternative or complement to survey data. Big Data can be used to analyze economic or social systems at a macro level. Examples of Big Data are tweets posted on Twitter (or other social media messages) or the content of online searches (such as the "Google flu index"). Big Data are often secondary data that are "found" or "organic" rather than "made" or "design based" and researchers often cannot control or affect the specific format in which data is generated. Big Data represent a huge opportunity, but there are many challenges in using Big Data, including establishing the validity and reliability of measures based on Big Data, developing new analysis approaches to deal with the complexity of it, and establishing best practices and ethical guidelines for analyzing and reporting analysis of Big Data. Earlier this year, the American Association for Public Opinion Research (AAPOR) released a Report on Big Data (see below).
For more information, see:
AAPOR Big Data Task Force. (2015, February). AAPOR report on big data. AAPOR.
p-values are commonly employed when analyzing survey data. A p-value is the probability of getting an observed result, or a more extreme result, under the assumption that the null hypothesis is true. For example, if a researcher wanted to test the efficacy of a drug compared to a gold standard, the null hypothesis would be that there is no difference in the efficacy of the two drugs. If the study showed that the new drug reduced symptoms to a greater degree than the gold standard and that result was statistically significant with a p-value of .05, it means that the probability of getting that result or a more extreme result (i.e., greater symptom reduction than what was observed) given the null hypothesis (i.e., no difference) is 5%. The p-value is routinely misunderstood and misused, so much so that the American Statistical Association recently issued a statement on statistical significance and p-values (http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108). p-values do not measure the probability that a hypothesis is true, or the probability that the data were produced by random chance alone. p-values provide a measure of statistical significance and provide no information about the substantive significance of a finding. A trivial outcome can be statistically significant if the sample size is large enough. A p-value is only one piece of information that can speak to the value of a scientific finding, but in and of itself, should not be considered as evidence of the truth of a model or hypothesis.
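As a small worked illustration of the definition, consider a hypothetical study (not the one described above) in which 10 patients each try both drugs and 9 do better on the new one. If the null hypothesis of no difference is true, each patient is equally likely to do better on either drug, so the p-value is the probability of seeing 9 or more such patients out of 10 by chance. This is a one-sided sign test, chosen here only because it is simple to compute.

```python
from math import comb

def one_sided_binomial_p_value(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null):
    the probability of the observed result or a more extreme one under the null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 9 of 10 patients do better on the new drug.
p_value = one_sided_binomial_p_value(successes=9, trials=10)
```

Here the p-value is 11/1024, or about .011. It says how surprising the data would be if the drugs were equally effective; it does not say how likely it is that they are equally effective.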
The use of null hypothesis significance testing, which drives the use of p-values, has received some criticism (e.g., Gliner, Leech, & Morgan, 2002), but there are few viable alternatives to this statistical approach to testing whether a difference or relationship is meaningful. One criticism is that reliance on p-values biases the research process so that the process is designed to obtain significant results (see Greenwald, 1975). In addition to these broader potential biases in the research process, individual researchers may engage in what has come to be known as “p-hacking” – using one or more strategies specifically designed to obtain significant statistical tests (typically p-values less than .05). These strategies can include using some criteria to exclude participants, transforming the data, including covariates in models, analyzing and reporting results from only a subset of conditions in an experimental study, reporting results only from measures that show statistically significant results (while not reporting on results with measures that do not show statistically significant results), and stopping data collection once statistical tests are significant. While some of the strategies can be used for legitimate reasons (e.g., transforming variables), using one or more of these strategies specifically to obtain significant results is frowned upon and considered to be a form of statistical cheating.
For additional information, see:
American Statistical Association (www.amstat.org)
Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
Gliner, J. A., Leech, N. L., & Morgan, G. A. (2002). Problems with null hypothesis significance testing (NHST): What do the textbooks say? Journal of Experimental Education, 71(1), 83-92.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82(1), 1-20.
Nuzzo, R. (2014). Statistical errors. Nature, 506(7487), 150-152.
Survey respondents are often sampled using complex sample designs, such as stratification, clustering, or a combination of the two. Analysis of survey data that ignores this complexity can result in incorrectly calculated standard errors and therefore overstated statistical significance. To avoid this, it is necessary to use a statistical analysis package that can incorporate the sample design into the analysis and to use the correct procedures within those programs. Statistical programs that can adjust for complex designs include STATA, SAS, and SPSS.
Statistical procedures that can incorporate complex sample designs include, but are not limited to:
- Power analysis
- Descriptive statistics
- Bivariate analysis of categorical data
- Linear, logistic, multinomial logit, and other forms of regression
- Structural equations modeling
The Survey Research Laboratory (SRL) at the University of Illinois has extensive experience in both designing complex surveys and analyzing data collected from such surveys. Clients who have a need to analyze complex survey data can benefit from SRL’s expertise. If you would like assistance conducting analysis with complex survey data, please contact Linda Owens at firstname.lastname@example.org.
For more information, see:
Lohr, S. (2009). Sampling: design and analysis. Nelson Education.
Heeringa, S. G., West, B. T., & Berglund, P. A. (2010). Applied survey data analysis. CRC Press.
Election polling represents one of the most visible examples of survey research. Especially during the campaign leading up to presidential elections, there are many media polls showing a variety of results. There are many aspects of survey design that can influence the results of a poll and one of these is the approach taken to analyze the data. To illustrate this point, Nate Cohn of the New York Times Upshot recently gave raw data from a pre-election poll conducted by Siena College to four pollsters. The data were also analyzed by researchers at the NYT Upshot.
These five different pollsters/organizations took different approaches to adjusting and analyzing the data to estimate support for Trump and Clinton. First, they took different approaches to making the survey sample representative of the population, using different estimates of the population (e.g., the Census or voter registration files) and different approaches for doing so (e.g., traditional weighting versus statistical modeling). Pre-election polls are unique in that their accuracy also is dependent on predicting who votes. The five analysts used different definitions of who is a likely voter (using self-report, voter history, or a combination of the two) and therefore included different subsets of the respondents when estimating support for the two presidential candidates. The findings for the presidential election question varied quite a bit, from one analysis showing Clinton up by 4 points to one showing Trump up by 1 point.
This exercise illustrates the importance of analysis decisions in affecting the outcome of political polls and surveys more broadly. It also demonstrates why it is valuable to average or aggregate results across surveys that use different methodologies or analysis approaches. Aggregating political polls is a strategy that has been made visible by websites like RealClearPolitics (http://www.realclearpolitics.com/) and bloggers like Nate Silver (http://projects.fivethirtyeight.com/) and Sam Wang (http://election.princeton.edu/).
For more information about the Upshot exercise, see: http://www.nytimes.com/interactive/2016/09/20/upshot/the-error-the-polling-world-rarely-talks-about.html
The results of the 2016 presidential election are in and they raise some interesting questions about the methods used for pre-election polling (and survey methods more broadly). Although pre-election polls showed the race being tight, most showed Hillary Clinton winning by a small margin. Current reports of the vote outcome show that Clinton appears to have won the popular vote by a very small margin, but less than was predicted by recent national surveys. Evidence from state polls in battleground states also suggests that they consistently overestimated support for Clinton relative to the election outcome. Although the causes of discrepancies between pre-election polls and the election outcome will not be fully understood for some time pending analysis by survey experts, there are a number of possible explanations:
- The potential for nonresponse bias in pre-election polls is high because response rates are very low (typically less than 10%). Survey predictions of vote choice in an election will be inaccurate to the extent that vote choices among those who participate are different from those who do not. In other words, if Trump supporters were less likely to participate in surveys than Clinton supporters, surveys might have systematically overestimated support for Clinton.
- The margin of error that is reported for polling estimates reflects sampling error, but it does not take into account non-sampling errors such as measurement error, coverage error, or nonresponse bias. The effects of these “non-sampling” sources of error are much more difficult to estimate and account for in our interpretation and analysis of survey data.
- Surveys predominantly rely on respondents’ self-reports. That means that the accuracy of pre-election polls depends on respondents' willingness to honestly and accurately answer survey questions.
- Predicting elections depends not only on predicting vote choice, but also on predicting the level of turnout and who will turn out to vote. Past polling failures to predict election outcomes (e.g., Jesse Ventura in the 1998 Minnesota gubernatorial election) have often been attributed to failures to accurately predict and understand turnout.
- Finally, much pre-election polling has come to rely on nonprobability sampling approaches, including conducting surveys with members of Web panels constructed using nonprobability methods or simply interviewing the person who picks up the telephone in telephone surveys. The results of surveys that use nonprobability sampling may or may not accurately represent a population, and it is nearly impossible to accurately estimate potential error in estimates from these surveys.
Although pre-election polling represents only a small portion of survey research, thinking about the potential sources of error and inaccuracy in pre-election surveys is also useful for understanding sources of error in other surveys.
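To make concrete what the reported margin of error does and does not capture, here is a minimal sketch of the standard margin of sampling error for a proportion. The function name, the sample size, and the optional design-effect argument are illustrative assumptions, not figures from any particular poll.

```python
import math

def margin_of_error(p, n, z=1.96, deff=1.0):
    """Margin of sampling error for a proportion p from a sample of size n.

    z=1.96 corresponds to a 95% confidence level; deff is the design effect
    (1.0 means a simple random sample).
    """
    return z * math.sqrt(deff * p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, 50% support.
moe = margin_of_error(0.5, 1000)
print(f"+/- {100 * moe:.1f} points")
```

Nothing in this calculation reflects nonresponse bias, coverage error, measurement error, or turnout modeling, which is why the total error in a poll can exceed its stated margin.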
We are often asked at SRL whether or not sample weights should be used when conducting multivariate analyses of survey data. This is one of those questions that has passionate believers on both sides of the fence. Because sample weights are commonly used to adjust for sample design and differential selection probabilities, failure to employ them may leave the analyst with a sample that is not representative of the population being examined. Employing weights comes with some cost, however, as they can lead to increased variance and standard errors for regression parameters. Hence, the decision as to whether or not to use weights is often based on the models to be examined. In particular, if the variables included in sample weights (e.g., education or age) are of substantive interest as model covariates, we would recommend not using the sample weights, as they may distort those relationships. We also recommend that analysts plan to compare models with and without weights to determine the degree to which their inclusion has an effect on the coefficients of interest, and report those comparisons (in a footnote if possible). Major differences between weighted and unweighted models are an indicator that the model may be misspecified, requiring further elaboration of the model being examined.
For more information:
Bollen, K. A., Biemer, P. P., Karr, A. F., Tueller, S., & Berzofsky, M.E. (2016). Are survey weights needed? A review of diagnostic tests in regression analysis. Annual Review of Statistics and its Application, 3, 375–392.
Winship, C., & Radbill, L. (1994). Sampling weights and regression analysis. Sociological Methods and Research, 23(2), 230–257.
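The recommended comparison of weighted and unweighted models can be sketched as follows for a single predictor. The data and weights are simulated, and `fit_line` is a hypothetical helper; this is not one of the diagnostic tests reviewed by Bollen et al., only a side-by-side look at the coefficients.

```python
import random

def fit_line(x, y, w=None):
    """Intercept and slope minimizing sum(w * residual**2); w=None means equal weights."""
    if w is None:
        w = [1.0] * len(x)
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

# Simulated data: every value below is made up for illustration.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]
w = [random.uniform(0.5, 3.0) for _ in range(n)]  # hypothetical sample weights

unweighted = fit_line(x, y)
weighted = fit_line(x, y, w)
print("unweighted (intercept, slope):", [round(c, 2) for c in unweighted])
print("weighted   (intercept, slope):", [round(c, 2) for c in weighted])
```

The sketch produces point estimates only. Standard errors that properly reflect weighting and sample design require survey-analysis software and are not computed here.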
In a recent article examining the use of "some/other" questions (i.e., those that are introduced by saying that "Some people think that…, but other people think…."), researchers concluded that these questions increase question complexity and length without improving the validity of responses. Respondents took longer to answer these questions, but there was little evidence that they improved respondents' answers. The authors recommend instead short, direct questions that avoid unconventional response option order.
Yeager, D. S., & Krosnick, J. A. (2012). Does mentioning "some people" and "other people" in an opinion question improve measurement quality? Public Opinion Quarterly, 76, 131-141.
Researchers often want to provide instructions for respondents to clarify inclusion/exclusion criteria when asking a question in a survey (e.g., respondents must know what to count or not count as a "shoe" in a question asking them about the number of pairs of shoes they own). Recent evidence suggests that it is better to give instructions before the question than after the question, but that decomposing the question into composite parts (e.g., asking separate questions about different types of shoes) may result in the most accurate responses, although asking multiple questions took longer than providing instructions (see Redline, C. (2013). Clarifying categorical concepts in a Web survey. Public Opinion Quarterly, 77, 81-105.)
The reliability and validity of attitude reports can be improved greatly by breaking up the respondents' rating task into a series of steps (branching) rather than asking for a single rating in one step. For example, the branching alternative to a question asking respondents to rate how positive or negative they feel about an attitude object would be to first ask respondents how they feel toward the object (positive, negative, or neutral), and follow up with a question about the degree (extremely, somewhat, etc.) to which they feel this way. More recent research suggests that the optimal way to implement branching would be to offer respondents three options for the first part (e.g., positive, negative, or neutral) and in the follow-up offer three options (e.g., extremely, somewhat, and slightly) only to those who select the endpoints (i.e., positive or negative). The outcome will be a quality attitudinal report collected on a seven-point bipolar scale.
Malhotra, N., Krosnick, J. A., & Thomas, R. K. (2009). Optimal design of branching questions to measure bipolar constructs. Public Opinion Quarterly, 73, 304-324. (https://pprg.stanford.edu/wp-content/uploads/2009-Branching-Research-Note.pdf)
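One way to combine the two branching answers into a single seven-point bipolar score is sketched below. The numeric coding from -3 to +3 and the function name are assumptions made for illustration; the source specifies only the response options, not how they are scored.

```python
DEGREE = {"slightly": 1, "somewhat": 2, "extremely": 3}

def seven_point_score(direction, degree=None):
    """Combine a branching pair of answers into a score from -3 to +3."""
    if direction == "neutral":
        return 0  # neutral respondents get no follow-up question
    if direction not in ("positive", "negative"):
        raise ValueError(f"unknown direction: {direction!r}")
    sign = 1 if direction == "positive" else -1
    return sign * DEGREE[degree]

print(seven_point_score("negative", "extremely"))  # -3
print(seven_point_score("neutral"))                # 0
print(seven_point_score("positive", "somewhat"))   # 2
```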
Although there are many variations, the basic committee translation approach involves several steps. First, a translation committee of three or more bilingual translators, including a team leader, is identified. Second, each translator is assigned to translate a random section of the questionnaire. Third, the committee meets to review the complete translated document and determine the extent to which it is correct in meaning, grammar, and syntax, and uses language that is familiar and culturally appropriate to the target population. Corrections are made where there is consensus regarding them. Fourth, where there is no consensus, the team leader makes the decision regarding the final version. Fifth, the final translation is pilot tested to identify any issues that require additional meetings of the translation committee to resolve.
(See https://www.census.gov/srd/papers/pdf/rsm2005-06.pdf; also see https://www.census.gov/srd/papers/pdf/ssm2013-27.pdf; http://www.census.gov/srd/papers/pdf/ssm2012-05.pdf; https://www.census.gov/srd/papers/pdf/ssm2012-04.pdf; https://www.census.gov/srd/papers/pdf/rsm2007-18.pdf)
Researchers routinely ask people to answer a series of questionnaire items all using the identical rating scale for recording responses. Nondifferentiation occurs when a person selects the same or similar response to all items in the series so as to invest minimal effort while responding, rather than because these ratings genuinely reflect the person's views; this behavior is a form of "satisficing." Nondifferentiation is more likely to occur toward the end of a long questionnaire, when fatigue presumably sets in and motivation to provide optimal answers declines. Studies have found a negative relationship between nondifferentiation and respondents' educational levels, consistent with the notion that satisficing is more likely among respondents with less cognitive ability. Data quality could likely be improved by designing questionnaires to reduce the likelihood of nondifferentiation. One simple way to do this is to present questions individually rather than as a series.
Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213-236. (http://web.a.ebscohost.com/ehost/pdfviewer/pdfviewer?sid=18496d30-56f8-4ae5-822f-44f7f12dddff%40sessionmgr4001&vid=1&hid=4112)
Krosnick, J. A., Narayan, S., & Smith, W. R. (1996). Satisficing in surveys: Initial evidence. New Directions for Evaluation, 70, 29-44.
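A simple way to flag possible nondifferentiation in a battery of items is sketched below. The index (the share of a respondent's ratings equal to his or her most frequent rating) is one of several possible indicators and is an assumption made here, not the measure used in the studies cited above; the responses are hypothetical.

```python
from collections import Counter

def nondifferentiation(ratings):
    """Share of a respondent's ratings equal to their most frequent rating (1.0 = all identical)."""
    counts = Counter(ratings)
    return max(counts.values()) / len(ratings)

# Hypothetical responses to a six-item battery on a 1-5 scale.
respondents = {
    "r1": [3, 3, 3, 3, 3, 3],
    "r2": [1, 4, 2, 5, 3, 4],
}
for rid, ratings in respondents.items():
    print(rid, round(nondifferentiation(ratings), 2))
```

A high value does not prove satisficing, since identical ratings can also reflect genuine views, so an index like this is best treated as a screening aid.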
Agree-disagree questions are subjective survey questions that ask respondents to report whether they agree or disagree with a statement or a series of statements. These include questions that ask only for direction (e.g., Do you agree or disagree with the following statement?) and those that include intensity (e.g., Do you strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, or strongly disagree with the following statement?). These questions are problematic because they are subject to acquiescence response bias (ARB), whereby respondents agree with statements regardless of their content. ARB may bias estimates based on either single agree-disagree items or scales composed of agree-disagree items. ARB is also influenced by cultural factors and has been shown to vary across countries and across cultural groups within countries. One suggested solution to ARB, asking multiple agree-disagree questions (some positively worded and some negatively worded) and averaging them together, results in unnecessarily long and inefficient scales. Furthermore, those respondents who demonstrate ARB in response to these types of scales are given scale values in the middle of the range of values, which may artificially reduce variance, and respondents from some cultures may be reluctant to endorse negatively worded statements. The recommended alternative to agree-disagree items is to use questions with construct-specific response options. In order to do this, the researcher should determine the underlying construct of interest and write survey questions explicitly designed to measure that construct. For example, one could conclude that a question asking respondents whether they agree or disagree with the statement "My doctor often treats me with respect" is designed to assess how much of the time the respondent's doctor treats him or her with respect.
One could more directly assess this by asking respondents: "How often does your doctor treat you with respect? Would you say never, rarely, sometimes, often, or always?" The latter construct-specific question is more direct, minimizes the cognitive burden for respondents, and avoids ARB.
For more information, see
Saris, W., Revilla, M., Krosnick, J. A., & Shaeffer, E. M. (2010). Comparing questions with agree/disagree response options to questions with item-specific response options. Survey Research Methods, 4, 61-79.
Yes-no survey questions are those that ask respondents to answer yes or no to a question. For example, a survey respondent might be asked "Do you think the government has the responsibility to ensure that all Americans have access to jobs?" A problem with these questions is that they present only one possibility. In the example question, the other possibility is that the respondent does not think the government has this responsibility, but this possibility remains unspoken. As a result, respondents tend to start thinking about their answer to this question by considering reasons why they agree with the statement (i.e., reasons why this is the government's responsibility) rather than reasons why they disagree with it (i.e., reasons why this is not the government's responsibility). Because respondents' thinking may be biased in a confirmatory direction and because not all respondents may fully go through the process of considering both possibilities (i.e., they may stop before considering or fully considering why this is not the government's responsibility), respondents may demonstrate a form of acquiescence response bias (ARB) to these questions. ARB occurs when respondents agree, or in this case say "yes," to questions regardless of their content. Thus, yes-no questions may be biased toward overestimating the proportion of respondents who take the position (or engage in the behavior) framed as the "yes" response. The recommended solution to this problem is to revise the questions to present both positions in a more balanced way. For example, the question about government responsibility for providing jobs could be rewritten to read: "Do you think the government has the responsibility to ensure that all Americans have access to jobs, or do you not think that the government has this responsibility?" This presents both potential responses in an equal and balanced way.
Although this question is slightly longer than the simpler yes-no question first presented, initial evidence suggests that it takes no longer to administer the balanced question in an interviewer-administered survey than it does to administer the simple yes-no question (Anand et al., 2010).
Anand, S., Owens, L. K., & Parsons, J. A. (2010). Forced-choice vs. yes-no questions: Data quality and administrative effort. Paper presented at the annual conference of the World Association for Public Opinion Research, Chicago.
Time bounding means giving respondents a time window for their response. If the reference period is appropriate, respondents are more likely to actually count or estimate rather than use a vague quantifier. For instance, "How often do you exercise?" will yield a vague quantifier ("sometimes").
"How often in the past 30 days did you exercise?" is better. The respondent will probably figure out the number of times per day and multiply by 30, or the number of times per week and multiply by 4.
"How often have you exercised today?" is best; the respondent will count the number of times.
Often, when we give a time reference (past 30 days), we add date prompts that tell the respondent what those exact dates are. Without them, respondents interpret the reference differently. Asking about the past 30 days in early August, for instance, could mean "July" to a respondent, while asking about the past 30 days in mid-August could mean the first half of August to that same respondent. Similarly, when asked about the "past 7 days" or "past week" late in a given week (e.g., on a Thursday), respondents tend to leave out the previous weekend. This can be avoided by using the prompt "Since last Friday…"
Also see Rockwood, T. (2015). Assessing physical health. In T. P. Johnson (Ed.), Handbook of health survey methods (pp. 107-142). Hoboken, NJ: Wiley.
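Date prompts of this kind can be generated automatically from the interview date. The sketch below assumes that "the past 30 days" means the 30 days ending the day before the interview; that convention, the function name, and the example date are illustrative assumptions rather than a documented standard.

```python
from datetime import date, timedelta

def date_prompt(interview_date, days=30):
    """Return the first and last dates covered by 'the past `days` days'."""
    start = interview_date - timedelta(days=days)
    end = interview_date - timedelta(days=1)
    return start, end

start, end = date_prompt(date(2017, 8, 3))
print(f"In the past 30 days, that is, from {start:%B} {start.day} "
      f"through {end:%B} {end.day}, how often did you exercise?")
```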
One challenge of using surveys to collect data is that they rely almost exclusively on self-report data. As such, they rely on respondents to be both able and willing to honestly and completely answer survey questions. One type of question that is a particular challenge in surveys is the sensitive question. Such questions measure, for example, constructs like whether a respondent has engaged in risky sexual behavior or used illegal drugs, the extent to which a respondent holds negative racial attitudes, or whether a student has cheated on an exam. Tourangeau and Yan (2007) define sensitive questions as those that are "intrusive," questions where there is a "threat of disclosure," or questions for which there are answers that might make the respondent appear "socially undesirable" (p. 860). People may deal with surveys that contain sensitive questions by not participating in the survey (unit nonresponse), not answering specific questions (item nonresponse), or not answering them honestly (socially desirable responding). There are many factors that may affect responses to sensitive questions, including the respondent's tendency to engage in socially desirable responding, the mode of data collection, and interviewer characteristics and behavior. Indirect questioning techniques like the randomized response technique (RRT) and list technique (aka item count technique or unmatched count technique) are strategies used in surveys that allow respondents to answer an interviewer's question in a way that protects their anonymity. Unfortunately, these methods are sometimes quite complex to implement, often result in reduced statistical power, and may not always work as intended. Another strategy is to try to normalize the undesirable behavior or opinion being asked about or to provide reassurances about the confidentiality of responses.
See Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133, 859–883.
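As a sketch of how the list technique yields an estimate, assume its usual design: respondents are randomly assigned to a control list of nonsensitive items or to a treatment list that adds the sensitive item, and each respondent reports only how many items apply. The difference in mean counts then estimates the prevalence of the sensitive item. The counts and the function name below are hypothetical.

```python
from statistics import mean

def list_estimate(control_counts, treatment_counts):
    """Estimated prevalence of the sensitive item: difference in mean item counts."""
    return mean(treatment_counts) - mean(control_counts)

# Hypothetical counts of items endorsed (control list: 3 items; treatment list: 4 items).
control = [1, 2, 0, 3, 2, 1, 2, 1]
treatment = [2, 2, 1, 3, 2, 2, 3, 1]
print(f"estimated prevalence: {list_estimate(control, treatment):.2f}")
```

Because the estimate is a difference between two noisy group means, it is less precise than a direct question asked of the same number of respondents, which is the loss of statistical power noted above.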
One growing issue in survey research today has to do with the effect of culture on survey responses and survey data collection. This is of increasing concern both because cross-national surveys are becoming more widespread (e.g., the European Social Survey) and because cultural heterogeneity is growing even within countries or other geographic areas. For example, the U.S. Census Bureau projects that by 2043, non-Hispanic Whites will remain the largest single racial or ethnic group in the U.S. but will no longer be a majority (see https://www.census.gov/newsroom/releases/archives/population/cb12-243.html). These population and research trends emphasize the importance of designing surveys that equivalently measure constructs across cultures. A great deal of research in survey methods has addressed these issues, and a 2015 special issue of Public Opinion Quarterly was dedicated to this methodological issue. The articles in this special volume addressed comparability of measurement (e.g., Davidov et al., 2015; Yu & Yang, 2015); differences in the likelihood of responding to surveys across cultural groups (e.g., Banducci & Stevens, 2015); response effects and response styles across cultures (e.g., Banducci & Stevens, 2015; He & Van De Vijver, 2015); and the measurement of cultures and values across cultures (Lakotos, 2015).
See http://poq.oxfordjournals.org/content/79/S1.toc for the special volume of POQ on cross-cultural survey methods.
Question-by-Question Guidelines (Q x Qs) should be part of the materials provided to interviewers for every telephone and face-to-face interviewer training. A Q x Q is a complete and annotated version of the questionnaire and the screener (i.e., questions used to determine eligibility for the survey) and thus is an important tool for interviewers as they learn how to administer a questionnaire. It includes not only the questions interviewers will be asking respondents, but also instructions about additional data to be collected (e.g., observations or procedures) and additional details and clarification about the survey questions. As part of training, allow ample time for a read-through of the Q x Q on paper with the trainer leading discussion of the annotations and answering interviewer questions. The Q x Q can also serve as a resource for interviewers and supervisors after training.
The Q x Q should put in writing as many details as possible about the questionnaire and interview process that an interviewer might need to refer to later. Not all questions require annotation, but notes should
- point out question-level things that are important for collecting good data, such as data entry guidelines and probing instructions,
- call attention to critical data entry (e.g., why certain pieces of data are collected, especially if they are to be used later on, such as capturing phone numbers for validation),
- document skip patterns and explain the overall purpose of sections of the questionnaire, and
- always provide details about the screening process or interview requirements.
Discussion of the Q x Q will also help interviewers learn how to better manage the answers they are likely to get on certain questions. They should be encouraged to use their copies of the Q x Q to take notes during training, and also to refer to it later on as they learn the questionnaire and participate in mock interviews.
Survey researchers often want to measure respondents’ age. Although it might seem most straightforward to simply ask respondents “How old are you?”, this type of question (one that requires respondents to give an integer numeric response) may result in what is called response heaping or rounding. Response heaping occurs when respondents disproportionately give answers that are divisible by 5 or 10 (see Holbrook et al., 2014; Pudney, 2008), and this type of heaping has been observed in reports of ages in the U.S. and in other countries. A simple histogram of age reports typically shows spikes at values that are divisible by 5 or 10. Since true ages are not concentrated at these values, the spikes represent a source of error in age data. Because of the potential for this error in estimates of age, current best practices are to ask respondents their year (or date) of birth and to calculate age based on that report.
For more information, see:
Holbrook, A. L., Anand, S., Johnson, T. P., Cho, Y. I., Shavitt, S., Chavez, N., et al. (2014). Response heaping in interviewer-administered surveys: Is it really a form of satisficing? Public Opinion Quarterly, 78, 591-633.
Pudney, S. (2008). Heaping and leaping: Survey response behavior and the dynamics of self-reported consumption expenditure. Institute for Social and Economic Research, No. 2008-09. Available at https://www.iser.essex.ac.uk/research/publications/working-papers/iser/2008-09
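Both ideas, calculating age from a reported date of birth and checking reported ages for heaping, can be sketched in a few lines. The function names and the example data are hypothetical, and the 20% benchmark is only a rough expectation that assumes ages are spread fairly evenly over final digits.

```python
from datetime import date

def age_on(birth_date, reference_date):
    """Age in completed years on reference_date."""
    had_birthday = (reference_date.month, reference_date.day) >= (birth_date.month, birth_date.day)
    return reference_date.year - birth_date.year - (0 if had_birthday else 1)

def share_divisible_by_5(ages):
    """Share of reported ages divisible by 5; roughly 0.20 is expected without heaping."""
    return sum(a % 5 == 0 for a in ages) / len(ages)

print(age_on(date(1980, 6, 15), date(2017, 3, 1)))   # 36
reported = [30, 35, 40, 42, 45, 50, 51, 60, 65, 67]  # hypothetical self-reported ages
print(share_divisible_by_5(reported))                # 0.7
```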
Cognitive pretesting (sometimes called cognitive interviewing) is used to develop and evaluate survey questionnaires. It originated in the Cognitive Aspects of Survey Methodology (CASM) movement where cognitive psychology has been applied to the survey context. Cognitive pretesting is based on the theory that respondents in a survey go through a series of cognitive steps to answer a survey question. In general, respondents must understand the question and its purpose, retrieve relevant information from memory, integrate that information into a summary judgment, and report that judgment using the response format requested.
There are two broad approaches to cognitive pretesting, although some cognitive interviews combine the two. The first is think-aloud interviews in which respondents are read proposed questions from a survey. However, instead of answering each question, respondents are instructed to report aloud their thoughts as they go through the process of thinking about and answering each question. Advantages of this approach are that it provides rich, unbiased data about respondents' cognitive processes. Disadvantages are that it is fatiguing for respondents (and very difficult for some), and the process of articulating thoughts may actually change the answer a respondent would give. The data may also not always address concerns that researchers have about specific survey questions.
A second cognitive pretesting approach involves structured probes. Respondents answer each proposed survey question and then are asked a series of follow-up questions about their thought processes. For example, they may be asked how they interpreted a specific term or word in the question, how they came to an estimate of how frequently they performed a particular behavior, or whether or not they thought most other people would be comfortable answering a particular question. The advantage of this approach is that a researcher can ask specific questions about specific concerns that he or she may have about a question. It is also substantially easier for respondents than "think alouds", and the data are easier to analyze and interpret. Disadvantages are that the researcher only collects information related to the specific probes he or she includes and may therefore miss important problems with proposed survey questions.
Some cognitive interviewing combines elements of these two approaches. Both rely on respondents' ability and willingness to report on their own cognitive processes. Cognitive interviewing is a very common part of the questionnaire development process (particularly when new questions or instruments are being developed) and is primarily used to identify problems with questions in order to revise and improve them. Unfortunately, less is known about how best to revise potentially problematic items so as to avoid introducing new problems, so it can be useful to use multiple iterations of cognitive pretesting to evaluate revised items.
For more information, see
Presser, S., Rothgeb, J. M., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., & Singer, E. (2004). Methods for testing and evaluating survey questionnaires. Hoboken, NJ: Wiley and Sons.
Willis, G. (2004). Cognitive interviewing: A tool for improving questionnaire design. Thousand Oaks, CA: Sage Publications.
One aspect of questionnaire design that researchers often struggle with is whether to use verbatim questions from past questionnaires, to modify questions from previous questionnaires, or to write new questions. We encourage researchers to be thoughtful about this decision and to use several criteria for doing so.
First, one goal in designing a questionnaire should be to reduce measurement error. Advances in questionnaire design mean that best practices for writing good questions have changed over time and it, therefore, often does not make sense to use questions based on old practices or knowledge. Also, not all previously used survey questions were created using equivalent methods. Some questions have been developed, pretested, and validated extensively and others have not, so it is important to consider whether there is evidence about the quality of questions when deciding whether or not to include them.
Second, it is important to keep in mind that the optimal question wording may change over time or with the population being studied. For example, in the 1970s, the General Social Survey asked questions about Negroes, which would clearly not be appropriate terminology today. Similarly, questions about communication and social support have evolved to include electronic communication, as technology has become an increasingly important part of how we communicate. The target population can also impact optimal question design. For example, questions designed for adults may need to be rewritten or revised to be appropriate for children.
A third factor that researchers should consider is the purpose of the questions. In some cases, there may not be existing questions that measure exactly the construct of interest. In that case, it is better to write a new question, or modify an existing one, that is targeted to the study’s purpose rather than to use an existing question; even a question that has been pretested and validated should not be used if it is not well-suited to the study’s purpose. On the other hand, researchers may sometimes want to make direct (even statistical) comparisons to previous data. In this case, using exact questions from previous research may be important to the research goals. Even verbatim question wording, however, does not ensure that data from a survey will be directly comparable to data from previous surveys if the meaning of words in the question has changed over time. For example, Smith (1987) notes that a question asked in a Gallup survey in 1954 ("Which American city do you think has the gayest night life?") would be interpreted very differently if asked in the 1980s.
There are reasons to use existing questions under some conditions and reasons not to use existing questions under other conditions. We encourage researchers to thoughtfully consider whether it’s best to use existing questions or whether new or revised questions can reduce measurement error or better measure the construct of interest. We strongly encourage researchers to not simply include questions from past surveys and use precedent as the rationale for doing so.
Smith, T. W. (1987). The art of asking questions, 1936-1985. Public Opinion Quarterly, 51, 95-108.
Traditionally, survey questions simply asked for the respondent’s sex and offered two choices—male or female. However, while that question may address biological sex at birth, it does not capture the variations in gender identity. One option for asking gender identity is:
Which of the following best describes your gender identity?
c. Transgender Female
d. Transgender Male
e. Gender variant/non-conforming
f. Other (specify)
g. Prefer not to answer
There are other options as well. Please consult the list of references below.
Sexual orientation is separate from gender and involves three separate components—identity, behavior, and attraction. For example, a man may feel attracted to another man, or even engage in sexual activity with him, but may not identify as gay. Thus, a question asking about sexual orientation needs to specify which of the three separate components it seeks information about. SMART (Sexual Minority Assessment Research Team) makes the following recommendations:
Self-identification: how one identifies one’s sexual orientation (gay, lesbian, bisexual, or heterosexual).
Recommended Item: Do you consider yourself to be:
a. Heterosexual or straight;
b. Gay or lesbian; or
Sexual behavior: the sex of sex partners (i.e., individuals of the same sex, different sex, or both sexes).
Recommended Item: In the past (time period e.g., year) with whom have you had sex?
a. Men only
b. Women only
c. Both men and women
d. I have not had sex in the past [time period]
Sexual attraction: the sex or gender of individuals that someone feels attracted to.
Recommended Item: People are different in their sexual attraction to other people. Which best describes your feelings? Are you:
a. Only attracted to females?
b. Mostly attracted to females?
c. Equally attracted to females and males?
d. Mostly attracted to males?
e. Only attracted to males?
f. Not sure?
Sexual Minority Assessment Research Team (SMART). (2009). Best practices for asking questions about sexual orientation on surveys. The Williams Institute. Retrieved March 1, 2017, from http://williamsinstitute.law.ucla.edu/wp-content/uploads/SMART-FINAL-Nov-2009.pdf
Federal Interagency Working Group on Improving Measurement of Sexual Orientation and Gender Identity in Federal Surveys. (2016). Current measures of sexual orientation and gender identity in federal surveys. Working Paper. Retrieved March 1, 2017, from https://isgmh.northwestern.edu/files/2017/01/WorkingGroupPaper1_CurrentMeasures_08-16-1xnai8d.pdf
Miller, K., & Ryan, J. M. (2011). Design, development and testing of the NHIS sexual identity question. Questionnaire Design Research Laboratory, Office of Research and Methodology, National Center for Health Statistics. Retrieved March 1, 2017, from https://wwwn.cdc.gov/qbank/report/Miller_NCHS_2011_NHIS%20Sexual%20Identity.pdf
Fryrear, A. (2016). How to write gender questions for a survey. Survey Gizmo. Retrieved March 1, 2017, from https://blog.surveygizmo.com/how-to-write-survey-gender-questions
Question wording plays a critical role in how respondents interpret and answer survey questions. Question wording in surveys can change over time in response to advances in questionnaire design, changes in society or culture, or changes in definitions. This is particularly true when survey researchers are using terminology that is associated with a medical diagnosis or legal definition.
One such change occurred in October 2010, when President Obama signed Rosa’s Law. This legislation required the federal government to replace the term “mental retardation” with “intellectual disability.” The law is named after Rosa Marcellino, a girl with Down syndrome who was nine years old when it became law, and who, according to President Obama, “worked with her parents and her siblings to have the words 'mentally retarded' officially removed from the health and education code in her home state of Maryland.” Rosa’s Law is part of a series of modifications, beginning in the early 1990s, to the terminology used to describe persons with what we now refer to as intellectual disabilities.
One result of this law is that federal surveys such as the National Health Interview Survey changed the terminology used in survey questions from asking about “mental retardation” to asking about “intellectual disability, also known as mental retardation.” Survey researchers using the NHIS data on intellectual disabilities should be aware of this change and the possible implications for prevalence estimates, particularly if data from before and after 2010 are being compared or combined. In addition, researchers who are designing surveys that measure intellectual disabilities may want to use terminology and question wording that is consistent with federal guidelines.
Language of Rosa’s Law: www.congress.gov/111/plaws/publ256/PLAW-111publ256.pdf
Zablotsky, B., Black, L. I., Maenner, M. J., Schieve, L. A., & Blumberg, S. J. (2015). Estimating prevalence of autism and other developmental disabilities following questionnaire changes in the 2014 National Health Interview Survey. National Health Statistics Reports, No. 87. Available at: www.cdc.gov/nchs/data/nhsr/nhsr087.pdf
One common response format in surveys is a response scale. A survey question that uses a response scale is a closed-ended question that presents respondents with a series of response options that fall along a continuum or dimension (e.g., satisfaction, amount, frequency). Response scales can be unipolar—meaning that one end of the scale represents the complete absence of or a minimum level of the dimension (i.e., a “0” point) and the other represents the maximum level of the dimension. For example, “Over the past month, how often have you felt tired? Would you say never, occasionally, sometimes, often, or always?” Other dimensions are bipolar—meaning that the two ends of the scale represent equally intense opposites and the midpoint of the scale represents a “0” or a neutral point. For example, “Do you think spending on education in the U.S. should be increased a great deal, increased some, increased a little, stay at current levels, decreased a little, decreased some, or decreased a great deal?” (see Survey News Bulletin #9 under “Questionnaire Design” at www.srl.uic.edu/Publist/bulletin.html for a discussion of using branching in these types of bipolar questions).
Some constructs are inherently unipolar (e.g., quantity, frequency) and should be measured using unipolar scales. Others are inherently bipolar (e.g., comparative judgments, change, evaluations of liking or quality) and should be measured using bipolar scales. Some constructs, however, can be assessed using either a unipolar or a bipolar scale (e.g., satisfaction). Bipolar scales are more cognitively difficult and rely on the assumption that the two scale endpoints are opposing viewpoints on a single dimension (which might not always be the case). Unipolar scales are less cognitively difficult than bipolar scales and are much more clearly focused on a single dimension. Other aspects of response scales, such as the number and labeling of scale points for unipolar and bipolar scales, will be addressed in future Survey News Bulletins.
One question that arises when constructing survey scales is the number of scale points or response choices to offer respondents. Response scales used in questionnaires have ranged from 2 or 3 to as many as 101 scale points (see the scale for feeling thermometers used in the American National Election Studies, questionnaires available at http://www.electionstudies.org/). Researchers have tested the optimal number of scale points for both unipolar and bipolar scales, typically by manipulating the number of scale points and assessing the effect on data quality (e.g., reliability or validity). The preponderance of evidence suggests that 5 scale points are optimal for unipolar scales, and 7 scale points are optimal for bipolar scales (e.g., Krosnick & Fabrigar, 1997). However, others have argued that the optimal number may be as high as 10 or 11 scale points, although the evidence regarding increases in data quality obtained by increasing the number of scale points above 7 is mixed (see Krosnick & Presser, 2010, for a review). These mixed findings may arise because the effect of adding more points to a scale varies across individuals and contexts. For example, the benefits of including more scale points might depend on factors such as how able respondents are to make more fine-grained judgments, whether scale points are fully verbally labeled, and individual differences in ability and motivation to provide optimal responses to survey questions.
Krosnick, J. A., & Fabrigar, L. R. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg, P. Biemer, M. Collins, L. Decker, E. DeLeeuw, C. Dippo, N. Schwarz, and D. Trewin (Eds.), Survey Measurement and Process Quality. New York: Wiley-Interscience.
Krosnick, J. A., & Presser, S. (2010). Questionnaire design. In J. D. Wright & P. V. Marsden (Eds.), Handbook of Survey Research (Second Edition). West Yorkshire, England: Emerald Group.
Survey questions that use a response scale format present respondents with a series of response options that fall along a continuum or dimension (e.g., satisfaction, amount, or frequency). The last two Survey News Bulletins have dealt with the difference between unipolar and bipolar response scales and the optimal number of scale points they should have (see Bulletins No. 88 and 89 in the Questionnaire Design category at www.srl.uic.edu/Publist/bulletin.html).
In addition, researchers need to make decisions about how to label the various points on such scales. Particularly in the case of visually presented scales (such as in web or mail surveys), there are several options for how to label scale points. Scale points can be labeled using numbers only, verbal labels only, or a combination. Scale endpoints alone can be labeled, scale endpoints and the midpoint can be labeled, or all scale points can be labeled. Because numeric values may be interpreted in ways not intended by the researcher (e.g., Schwarz et al., 1991), some researchers advocate using only verbal labels (Krosnick & Presser, 2010). However, if both verbal labels and numbers are being used, it is better to match verbal labels and numerical labels—for example to use negative values for the negative side of a bipolar scale and to use values from 0 to a positive number for a unipolar scale (Saris & Gallhofer, 2007). Several studies have also shown that scales with all points labeled (i.e., fully-labeled scales) provide more reliable data than do scales with labels on only some points (i.e., partially labeled scales; Alwin, 2007; Krosnick & Fabrigar, 1997; Saris & Gallhofer, 2007).
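The recommendation to match numeric values to the type of scale can be illustrated with a minimal Python sketch. The function name `numeric_codes` is hypothetical, the response labels are borrowed from the example questions in Bulletin No. 88, and the sketch assumes a fully labeled scale and, for bipolar scales, an odd number of points so that the midpoint is coded 0.

```python
def numeric_codes(labels, bipolar):
    """Pair each verbal label with a numeric value that matches the scale type.

    labels  -- verbal labels ordered from the low (or negative) end to the high end
    bipolar -- True for a bipolar scale, False for a unipolar scale
    """
    n = len(labels)
    if bipolar:
        # Assumption of this sketch: a bipolar scale has a midpoint (odd length).
        if n % 2 == 0:
            raise ValueError("this sketch assumes a bipolar scale with a midpoint")
        half = n // 2
        codes = range(-half, half + 1)   # e.g., -3 ... 0 ... +3
    else:
        codes = range(0, n)              # e.g., 0 ... 4
    return list(zip(codes, labels))


frequency = ["never", "occasionally", "sometimes", "often", "always"]
spending = [
    "decreased a great deal", "decreased some", "decreased a little",
    "stay at current levels",
    "increased a little", "increased some", "increased a great deal",
]

unipolar_scale = numeric_codes(frequency, bipolar=False)
bipolar_scale = numeric_codes(spending, bipolar=True)
```

In this sketch the unipolar scale runs from 0 ("never") to 4 ("always"), while the bipolar scale runs from -3 to +3 with the neutral option coded 0, so the sign of each number agrees with the direction of its verbal label.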
Alwin, D. F. (2007). Margins of error: A study of reliability in survey measurement. New York: John Wiley & Sons.
Krosnick, J. A., & Fabrigar, L. R. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg, P. Biemer, M. Collins, L. Decker, E. DeLeeuw, C. Dippo, N. Schwarz, and D. Trewin (Eds.), Survey Measurement and Process Quality. New York: Wiley-Interscience.
Krosnick, J. A., & Presser, S. (2010). Questionnaire design. In J. D. Wright & P. V. Marsden (Eds.), Handbook of Survey Research (Second Edition). West Yorkshire, England: Emerald Group.
Saris, W. E., & Gallhofer, I. N. (2007). Design, evaluation, and analysis of questionnaires for survey research. New York: John Wiley & Sons.
Schwarz, N., Knauper, B., Hippler, H. J., Noelle-Neumann, E., & Clark, L. (1991). Rating scales: Numeric values may change the meaning of scale labels. Public Opinion Quarterly, 55(4), 570–582.
The last several bulletins have dealt with the use of response scales in survey questions. In this Survey News Bulletin we discuss whether or not to include a midpoint in response scales. Bulletin No. 89 (see the Questionnaire Design section of bulletins at www.srl.uic.edu/Publist/bulletin.html) recommends using 5 scale points for unipolar scales and 7 for bipolar scales, both of which include a midpoint. The primary argument for including a midpoint, particularly in bipolar scales, is that some respondents legitimately have a neutral position and should not be forced to choose an option that indicates that their position is closer to one end of the scale than the other (e.g., Schuman and Presser, 1981). However, some researchers have expressed concerns about providing a midpoint for response scales, particularly for bipolar ones. Specifically, researchers have expressed concerns that respondents might select the midpoint of a scale not because that response accurately reflects their answer to the question, but as a way of avoiding giving a potentially unfavorable response (i.e., social desirability), as a strategy for quickly providing an answer to the question without carefully constructing a response (i.e., satisficing; Krosnick, 1991), or as a way of saying “don’t know.”
Although Schuman and Presser (1981) found that there is somewhat less midpoint responding when a “don’t know” response is explicitly offered than when one is not, the bulk of the empirical evidence suggests that providing a midpoint does not have a negative effect on data quality and that respondents who select a midpoint do so because it best reflects their true answer. Narayan and Krosnick (1996) found that midpoint selection was not associated with cognitive abilities, as one would expect if it were the result of satisficing. Malhotra, Krosnick, and Thomas (2009) found that using a follow-up question asking respondents who selected the midpoint whether they leaned toward one end of the scale or the other lowered data quality. Both of these findings suggest that respondents do not select the midpoint because they are unable or unmotivated to do the cognitive work necessary to answer the question carefully and completely. Finally, Weijters, Cabooter, and Schillewaert (2010) found that including a midpoint reduces the likelihood that respondents will give contradictory responses to two items that have opposite meanings or are coded in opposite directions (e.g., agreeing, or disagreeing, with two statements that are the opposite of one another), and they recommended using scales with a midpoint unless there are compelling reasons not to do so.
Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5, 213-236.
Malhotra, N., Krosnick, J. A., & Thomas, R. K. (2009). Optimal design of branching questions to measure bipolar constructs. Public Opinion Quarterly, 73(2), 304-324.
Narayan, S., & Krosnick, J. A. (1996). Education moderates some response effects in attitude measurement. Public Opinion Quarterly, 60(1), 58-88.
Schuman, H. & Presser, S. (1981). Questions and answers in attitude surveys: Experiments in question form, wording, and content. New York: Harcourt Brace Jovanovich.
Weijters, B., Cabooter, E., & Schillewaert, N. (2010). The effect of rating scale format on response styles: The number of response categories and response category labels. International Journal of Research in Marketing, 27(3), 236-247.
A recent article suggests that items shown higher up on the screen in a web survey are rated more positively than the same items shown lower on the screen. The studies manipulated screen position in several different ways (e.g., showing the target above or below the rating scale, rotating two items to be shown at the top or bottom of the screen) and also varied other characteristics of the question (e.g., how familiar the target was, the rating scale order). A meta-analysis of all six studies showed that the effect was homogeneous and reliable across studies.
Tourangeau, R., Couper, M. P., & Conrad, F. G. (2013). "Up means good": The effect of screen position on evaluative ratings in Web surveys. Public Opinion Quarterly, 77, 69-88.
Web surveys conducted with panels of respondents are increasingly popular. One concern with using panels of respondents is that "experienced" respondents may somehow be substantially different from "inexperienced" ones. However, several recent articles suggest that panel conditioning does not affect substantive survey responses and that "experienced" respondents may actually give higher quality responses.
Binswanger, J., Schunk, D., & Toepoel, V. (2013). Panel conditioning in difficult attitudinal questions. Public Opinion Quarterly, 77, 783-797.
Dennis, J. M. (2001). Are Internet panels creating professional respondents? Marketing Research, 13, 34-38. www.gfksay.com/insights/docs/Panel%20Effects.pdf
Software packages that allow respondents to complete a survey questionnaire while online have been available for some time now. These packages are usually relatively inexpensive and offer menu-driven programming and sample management options. Owing to the proliferation of mobile devices, the software also allows programmed questionnaires to be displayed on and answered using mobile devices such as smartphones and tablets. A recent feature being incorporated into these packages is the completion of survey questionnaires while offline using mobile devices. The data are collected offline and can be uploaded to the server once the device is back online. This feature might come in handy, for example, when collecting data from students in a classroom where wireless Internet access is not available, or from participants at events such as fairs, without the need to set up kiosks or booths.
In Web surveys, respondents often face the task of answering a series of questions using the same rating scale, presented in a grid or matrix format, with the items along the rows and the rating scale points in the columns. The use of this grid/matrix format might increase the tendency of respondents to nondifferentiate (i.e., to select the same or similar response to all items in the series to minimize cognitive effort while responding). Research has found higher correlations among items presented in this format, which is consistent with the occurrence of nondifferentiation. Also, reverse-worded items presented in this format were more likely to elicit a response opposite to the one expected, suggesting that respondents were not devoting cognitive effort to providing optimal answers. However, research also suggests that there are lower rates of missing data in grids and that grids take less time to complete than when the same items are presented on separate Web pages.
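One way to screen grid data for nondifferentiation is sketched below in Python. This is an illustrative check, not a method taken from the studies cited here: the function names, the respondent IDs, and the ratings are hypothetical, and a flagged respondent may simply hold the same view on every item.

```python
from statistics import pstdev


def straightlined(responses):
    """Return True when every answered item in the grid received the same rating."""
    answered = [r for r in responses if r is not None]
    # A single answered item cannot show (non)differentiation.
    return len(answered) > 1 and len(set(answered)) == 1


def differentiation(responses):
    """Population standard deviation of the answered ratings (0.0 means none)."""
    answered = [r for r in responses if r is not None]
    return pstdev(answered) if len(answered) > 1 else 0.0


# Hypothetical ratings on a 5-item grid; None marks a skipped item.
grid_answers = {
    "r001": [4, 4, 4, 4, 4],
    "r002": [5, 2, 4, 1, 3],
    "r003": [3, None, 3, 3, 3],
}

flags = {rid: straightlined(answers) for rid, answers in grid_answers.items()}
```

Because identical answers can be genuine, a flag like this is better treated as a prompt to inspect reverse-worded items or completion times than as grounds for dropping a case.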
Tourangeau, R., Couper, M. P., & Conrad, F. (2004). Spacing, position, and order: Interpretive heuristics for visual features of survey questions. Public Opinion Quarterly, 68, 368-393. (http://poq.oxfordjournals.org/content/68/3/368.full.pdf+html)
Couper, M. P., Traugott, M. W., & Lamias, M. J. (2001). Web survey design and administration. Public Opinion Quarterly, 65, 230-253. (http://poq.oxfordjournals.org/content/65/2/230.full.pdf)
Toepoel, V., Das, M., & Van Soest, A. (2008). Effects of design in Web surveys comparing trained and fresh respondents. Public Opinion Quarterly, 72, 985-1007. (http://poq.oxfordjournals.org/content/72/5/985.full.pdf)
Progress indicators are designed to provide Web survey respondents with continual feedback regarding how much of the questionnaire they have completed as they work their way through an online instrument. Providing this feedback is intended to minimize respondent breakoffs before completing the questionnaire by providing information regarding progress and a sense of accomplishment. Experimental evidence suggests that progress indicators will decrease breakoffs when the feedback is perceived as encouraging (i.e., for short questionnaires), but they can also increase breakoffs when perceived as discouraging (i.e., for long questionnaires). Consequently, it may be best to display progress indicators only intermittently when deploying long questionnaires, for which progress will be made at a slower pace than for shorter questionnaires. For more information, see
Conrad, F. G., Couper, M. P., Tourangeau, R., & Peytchev, A. (2010). Impact of progress indicators on task completion. Interacting with Computers, 22, 417-427.
Yan, T., Conrad, F. G., Tourangeau, R., & Couper, M. P. (2011). Should I stay or should I go: The effects of progress feedback, promised task duration, and length of questionnaire on completing Web surveys. International Journal of Public Opinion Research, 23, 131-147.
When designing Web questionnaires, an important caveat to remember is that researchers’ choices about the visual presentation of information may be processed and used by respondents. As a result, aspects of formatting or design such as colors, fonts, and images must be chosen carefully so as not to provide respondents with unintentional cues. For example, an experiment by Couper, Conrad, and Tourangeau (2007) documented how including an image of a person exercising vs. a person lying in a hospital bed can influence self-rated health by serving as a frame for personal comparison. Respondents viewing an image of a person actively exercising were found to rate their personal health as lower than did those viewing an image of a person in a hospital bed. Respondents also use formatting and layout information in forming their judgments in other survey modes (e.g., self-administered mail or telephone or in-person interviews), but the myriad of choices for formatting and design in Web surveys makes it a particular concern in this mode.
Couper, M. P., Conrad, F. G., & Tourangeau, R. (2007). Visual context effects in Web surveys. Public Opinion Quarterly, 71, 91-112.
One of the advantages of Web surveys is that pictures and other media can be used. However, pictures can also systematically influence respondents' answers to survey questions. For example, Witte et al. (2004) found in a National Geographic survey that images increased support for species protection. Similarly, Couper et al. (2007) found that when a picture of a fit person was shown with a question about respondents' health, respondents reported consistently lower health than when the same question was shown with a picture of a sick person. Toepoel and Couper (2011) found that respondents reacted to the content of images shown, giving higher-frequency reports when pictures of high-frequency events were shown and lower-frequency reports when pictures of low-frequency events were shown. In this study, the effects of pictures on survey responses were similar to assimilation effects found with verbal instructions (i.e., ratings became more similar or were "assimilated" to the context). Verbal and visual cues had independent effects and also interacted. Verbal instructions had stronger effects, were attended to first (before pictures), and took longer to process than did pictures. The effects of verbal instructions could counteract the effect of pictures when both were present and contradictory. This suggests that survey respondents pay attention to verbal instructions more than visual cues such as pictures and that good question writing with clear instruction can reduce context effects from visual cues.
Also see: Couper, M. P., Conrad, F. G., & Tourangeau, R. (2007). Visual context effects in Web surveys. Public Opinion Quarterly, 71, 633-634.
Toepoel, V., & Couper, M. P. (2011). Can verbal instructions counteract visual context effects in Web surveys? Public Opinion Quarterly, 75, 1-18.
Witte, J., Pargas, R., Mobley, C., & Hawdon, J. (2004). Instrument effects of images in Web surveys. Social Science Computer Review, 22, 1-7.
Slider bars are a type of response format used in Web surveys in which respondents are asked to slide a marker along a continuum to indicate their response. For example, respondents might be asked to report their attitudes toward a target object on a continuum labeled with "strongly like" at one end and "strongly dislike" at the other. The hypothesized advantages of sliders are that they are more enjoyable for respondents (in line with the notion of gamification of the survey response process) and that they allow respondents to choose a response anywhere along the continuum, unlike traditional scales with a fixed number of response options. However, research investigating sliders suggests that they take longer to complete than traditional radio buttons or more traditional visual analog scales in which respondents are asked to click on a continuum rather than dragging and dropping a marker (as they are asked to do in a slider). Furthermore, sliders may reduce the response rate particularly on mobile devices and may be difficult to use across a wide range of mobile devices. The distribution of responses obtained is similar to those obtained from more traditional radio buttons, so there is little advantage to using slider bars in Web surveys. Current best practices suggest avoiding this response format.
See also Couper, M. P., Tourangeau, R., Conrad, F. G., & Singer, E. (2006). Evaluating the effectiveness of visual analog scales: A web experiment. Social Science Computer Review, 24, 227.
Funke, F. (2015). Negative effects of slider scales compared to visual analogue scales and radio button scales. Social Science Computer Review, published online.
As smartphones have become more prevalent, an increasing number of respondents may choose to complete web surveys using these mobile devices. Smartphones are different from desktop and laptop computers (and even tablets) because of their portability and small screen size. Typing a response is also very different on a smartphone because most involve touch screens or very small keyboards. The increased use of smartphones and other mobile devices by respondents to complete web surveys has led researchers to study the effects of mobile technologies on survey responding. Similar responses to sensitive questions and rates of item nonresponse are found for computers and mobile devices. However, researchers have found that completing web surveys using mobile devices can decrease the length of responses to open-ended questions and reduce the quality of responses to particular types of questions (in particular, those that require scrolling to view on mobile devices). It also takes respondents longer to complete web questionnaires when they complete them on a mobile device. This has led researchers to begin to develop web questionnaire formats that work well across device types and to collect data about the device on which a respondent completes a web survey.
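Collecting data about the device a respondent used could be approached, for example, by classifying the browser's user-agent string, as in the minimal Python sketch below. The function name, the patterns, and the sample strings are illustrative assumptions rather than a procedure from the references that follow. User-agent strings vary widely and can be misleading, so a production survey would normally rely on a maintained detection library or on paradata supplied by the survey platform.

```python
import re

# Heuristic patterns (assumptions of this sketch, not an exhaustive list).
TABLET_PATTERN = re.compile(r"ipad|tablet", re.IGNORECASE)
PHONE_PATTERN = re.compile(r"iphone|mobi", re.IGNORECASE)


def classify_device(user_agent):
    """Roughly classify a user-agent string as smartphone, tablet, or computer."""
    if not user_agent:
        return "unknown"
    lowered = user_agent.lower()
    if TABLET_PATTERN.search(user_agent):
        return "tablet"
    # Common convention: Android phones include "Mobile", Android tablets do not.
    if "android" in lowered and "mobile" not in lowered:
        return "tablet"
    if PHONE_PATTERN.search(user_agent):
        return "smartphone"
    return "computer"


# Shortened, made-up user-agent strings for illustration only.
samples = {
    "phone": "Mozilla/5.0 (Linux; Android 13; Pixel 7) Chrome/120.0 Mobile Safari/537.36",
    "tablet": "Mozilla/5.0 (Linux; Android 13; SM-X700) Chrome/120.0 Safari/537.36",
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
}
devices = {name: classify_device(ua) for name, ua in samples.items()}
```

Storing the resulting category alongside each completed questionnaire would let an analyst compare completion times or open-ended response length by device type.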
For more information, see:
Buskirk, T.D., & C.H. Andrus. (2014). Making mobile browser surveys smarter: Results from a randomized experiment comparing online surveys completed via computer or smartphone. Field Methods 26(4): 322-342.
de Bruijne, M. & A. Wijnant. (2013). Can mobile web surveys be taken on computers? A discussion on a multi-device survey design. Survey Practice, 6(4), 1–8. Available at
Callegaro, M. (2010). Do you know which device your respondent has used to take your online survey? Survey Practice, 3(6): 1–12. Available at
Mavletova, A. (2013). Data Quality in PC and Mobile Web Surveys. Social Science Computer Review, 31(6), 725-743. Available at
Revilla, M., Toninelli, D., & Ochoa, C. (2016). Personal computers vs. smartphones in answering web surveys: Does the device make a difference? Survey Practice, 9(4), 1-6. Available at
Online survey panels are sample frames of email addresses linked to individuals who indicate their willingness to participate in future Web surveys. Some organizations have online panels that include tens of thousands of individuals who have volunteered or otherwise agreed to be contacted; others claim to have panels representing millions of persons. Most panels are recruited using mostly or exclusively non-probability methods; a smaller number are based on probability sampling, which is more expensive and time-consuming. Researchers interested in developing representative estimates of population characteristics should avoid the use of non-probability online panels. Those considering the use of an online panel also need to look carefully at how the panel was initially recruited in order to understand whether or not it is probability-based. The degree to which those who maintain the panel are transparent regarding how it is constructed and maintained will determine the extent to which researchers are able to discern whether or not the sample is representative of the population from which it was drawn. The American Association for Public Opinion Research has released a Task Force Report on Online Panels that will be useful to researchers considering the use of an online panel for their research. This report is available free at: https://www.aapor.org/AAPOR_Main/media/MainSiteFiles/AAPOROnlinePanelsTFReportFinalRevised1.pdf.
Additional information regarding online panels can be found in these references:
Callegaro, M., Baker, R., Bethlehem, J., Goritz, A. S., Krosnick, J. A., & Lavrakas, P. J. (2014). Online Panel Research: A Data Quality Perspective. Chichester, United Kingdom: Wiley.
Spijkerman, R., Knibbe, R., Knoops, K., Van De Mheen, D., & Van Den Eijndem, R. (2009). The utility of online panel surveys versus computer-assisted interviews in obtaining substance-use prevalence estimates in the Netherlands. Addiction, 104(10), 1641-1645.