Published on August 26th, 2013
Can We Trust Opinion Polls? The Central Limit Theorem, Binomial Proportion Confidence Intervals, and Likely Voters
By Tor G. Jakobsen
Ever since the 1824 straw poll of the U.S. presidential election showed Andrew Jackson leading John Quincy Adams, opinion polls have become more and more popular. In 1936 George Gallup introduced the concept of representative sampling, which means that the people asked should be a mirror image of the population under scrutiny.
However, even if we draw a random sample of the population we cannot be certain that it is a true mirror image. The numbers we get from opinion polls are associated with statistical uncertainty. To understand the logic of this uncertainty we need to get acquainted with what is known as the central limit theorem. Let us say that we conduct a survey of 1000 persons and ask if they prefer the candidate Obama or one of the other candidates. We find that 54 percent of those we ask say they will vote for Obama.
Our next step is to imagine that we conducted several opinion polls (for example, one million polls), asking 1000 persons each time whether or not they would vote for Obama. According to the central limit theorem, the results of all of these polls will center around the true population percentage of people who favor Obama, and their distribution will follow the shape of a bell curve (the normal distribution).
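This thought experiment can be simulated. The sketch below assumes a hypothetical true support of 56 percent (the value used later in the article) and repeats a 1000-person poll 2000 times; the individual poll results cluster around the true value, with a spread predicted by the formula sqrt(p(1−p)/n):

```python
import math
import random
import statistics

random.seed(42)

TRUE_P = 0.56   # hypothetical true share of Obama voters
N = 1000        # respondents per poll
POLLS = 2000    # number of simulated polls

# Each poll: draw N random voters, record the share favoring Obama.
results = []
for _ in range(POLLS):
    favor = sum(1 for _ in range(N) if random.random() < TRUE_P)
    results.append(favor / N)

mean_p = statistics.mean(results)
sd_p = statistics.stdev(results)

# Theory: poll results center on TRUE_P with standard deviation
# sqrt(p * (1 - p) / n), the standard error of a proportion.
theory_sd = math.sqrt(TRUE_P * (1 - TRUE_P) / N)
print(f"mean of polls: {mean_p:.4f}  (true value: {TRUE_P})")
print(f"sd of polls:   {sd_p:.4f}  (theory: {theory_sd:.4f})")
```

Plotting a histogram of `results` would show the bell curve the theorem predicts.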
95 percent of our estimates of Obama's percentage will fall within 1.96 standard deviations of the true population mean of people who will vote for him. The standard deviation is a measure of the typical distance of each opinion poll's result from the true population mean.
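The figure 1.96 comes from the normal distribution itself. A quick check with Python's standard library (my choice of tool, not the article's) confirms that about 95 percent of a normal distribution lies within 1.96 standard deviations of its mean, using the identity P(|Z| < z) = erf(z / sqrt(2)):

```python
import math

# Share of a normal distribution within 1.96 standard deviations
# of the mean: P(|Z| < 1.96) = erf(1.96 / sqrt(2)).
within = math.erf(1.96 / math.sqrt(2))
print(f"{within:.4f}")  # very close to 0.95
```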
But how can we calculate our confidence interval when we have not performed more than one opinion poll (the one where we found that 54 percent will vote Obama)? Well, we can insert a standard deviation using formulas determined by statistical inference. What we are doing here is calculating what is known as a binomial proportion confidence interval.
There are in fact several ways to calculate the confidence interval, and the normal approximation interval is the simplest formula. This calculation is based on the central limit theorem, and applies poorly when the sample size is less than 30, or if the percentage who say they will vote for Obama is close to 0 or 100. However, for most instances this is an appropriate and easy-to-grasp calculation.
The normal approximation interval is given by

p ± Z(1−α/2) × √( p(1 − p) / n )

where:

p = the proportion stating they will vote for Obama
n = sample size
α = the significance level (1 minus the desired confidence)
Z(1−α/2) = 1.96 for 95 percent confidence
Z(1−α/2) = 2.576 for 99 percent confidence
It is common practice in statistics to operate with a 95 percent confidence interval, which means that we accept 5 percent uncertainty (a 2.5 percent chance that the true number lies below our confidence interval, and a 2.5 percent chance that it lies above). We therefore insert into the formula the number 1.96 (which gives us the 95 percent), the proportion 0.54 (those in our survey who said they would vote for Obama), and the sample size of 1000. We can now calculate our confidence interval:
We see that we have a confidence interval of ±3.09 percentage points around our estimate of 54 percent. Thus, we can state with 95 percent certainty that the true percentage of people who will vote for Obama lies between 50.91 and 57.09. A true value of, say, 56 percent falls within this interval (though 5 percent of the time an interval constructed this way will miss the true value).
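The calculation above can be reproduced in a few lines of Python:

```python
import math

p, n, z = 0.54, 1000, 1.96  # proportion, sample size, 95% z-value

# Normal approximation interval: p +/- z * sqrt(p * (1 - p) / n)
margin = z * math.sqrt(p * (1 - p) / n)
low, high = p - margin, p + margin
print(f"54% ± {margin * 100:.2f} points -> "
      f"[{low * 100:.2f}%, {high * 100:.2f}%]")
```

This prints a margin of ±3.09 points and the interval from 50.91 to 57.09 percent, matching the figures above.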
From the formula we see that the width of the confidence interval varies with the percentage who said they will vote for Obama. This is illustrated in the table below, where 95 percent intervals are calculated for different percentages, all with a sample size of 1000 (remember that the larger the sample size, the more precise our estimates). Remember also that this calculation does not work well for n < 30 or for values close to 0 and 100.
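Such a table can be regenerated directly from the formula; the percentages below are illustrative choices, all at n = 1000:

```python
import math

n, z = 1000, 1.96  # sample size, 95% z-value

# Tabulate the 95% margin of error for a range of poll percentages.
print(" p%   95% margin")
for pct in (5, 10, 20, 30, 40, 50):
    p = pct / 100
    margin = z * math.sqrt(p * (1 - p) / n) * 100
    print(f"{pct:3d}   ±{margin:.2f}")
```

The margin is widest at 50 percent (about ±3.10 points) and shrinks toward the extremes, which is exactly where the normal approximation also becomes unreliable.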
We can also note that the confidence interval narrows as our sample grows, so if we interview 1200 persons we will get better estimates than if we ask 600. The larger our sample, the closer we can assume our sample is to the true values of the population.
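Because the sample size sits under a square root in the formula, doubling it shrinks the margin only by a factor of about 1.4, not 2. A quick check at the worst-case proportion of 50 percent:

```python
import math

p, z = 0.5, 1.96  # worst-case proportion, 95% z-value
for n in (600, 1200):
    margin = z * math.sqrt(p * (1 - p) / n) * 100
    print(f"n={n}: ±{margin:.2f} points")
```

The margin goes from about ±4.00 points at n = 600 to about ±2.83 at n = 1200.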
Now that you know how to interpret the statistical uncertainty associated with random sampling, we will move on to the other things you as a critical reader should be concerned with when viewing the opinion polls before an election.
Eligible voters vs. likely voters
An additional concern when it comes to opinion polls is whether or not you are drawing a sample from the correct population. One can easily argue that when trying to determine who will win an election one should not be interested in a mirror image of the voting population, but rather with a mirror image of those that are actually going to vote. One way of determining this is to ask the respondent if he or she intends to vote in the upcoming election.
However, more people say they will vote than actually do. If an opinion poll has not taken this into account, the results will be skewed in favor of certain political parties or candidates. Parties and candidates with a strong party identity and a well-resourced voter base will thus be underestimated in the polls, as their supporters are more likely to actually vote than people without party identification or from less privileged backgrounds.
If this bias was not corrected for, the Democratic Party in the United States would do considerably better in the polls than in the elections, and vice versa for the Republican Party. Luckily U.S. polling bureaus are aware of this issue, and ask questions to determine whether or not the respondent is a likely voter. These may include whether the person has voted in the previous election, if they know where to vote, and if they have given much thought to the election.
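As a toy illustration of how such a screen might work (the questions, scoring, and cutoff below are invented for illustration and do not reflect any polling bureau's actual method):

```python
# Hypothetical likely-voter screen: score respondents on screening
# questions of the kind mentioned above and keep those above a cutoff.
def is_likely_voter(answers: dict) -> bool:
    score = sum([
        1 if answers.get("voted_last_election") else 0,
        1 if answers.get("knows_polling_place") else 0,
        1 if answers.get("follows_campaign_closely") else 0,
    ])
    return score >= 2  # illustrative threshold

respondents = [
    {"voted_last_election": True, "knows_polling_place": True,
     "follows_campaign_closely": False},   # passes the screen
    {"voted_last_election": False, "knows_polling_place": False,
     "follows_campaign_closely": True},    # screened out
]
likely = [r for r in respondents if is_likely_voter(r)]
print(len(likely))  # 1
```

Real screens are more elaborate (and often weight respondents rather than drop them outright), but the filtering logic is the same in spirit.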
Additional concerns include what are known as nonresponse bias and response bias. The former pertains to the fact that some people are unwilling to answer the phone or refuse to answer the poll. These sub-groups are not randomly drawn from the population: they have certain characteristics separating them from those who answer the phone and state whom they will vote for. The result is that the remaining sample is a true mirror image of neither the electorate nor the likely voters. Naturally, one should always strive for the highest possible response rate in order to minimize this bias.
There is also the problem of response bias: people state that they will vote for a certain candidate or political party, but vote for another candidate or party on Election Day. This can lead to overrepresentation of either more extreme or more moderate views.
In the spur of the moment it can be easier to state that you will vote for what is perceived as a fringe party or candidate when asked in a poll, but once in the voting booth the voter changes his or her mind and votes for the better-known moderate alternative. The opposite can also take place: a respondent may be unwilling to admit to the interviewer that he or she supports a candidate associated with views that are not considered politically correct (e.g., sexism or racism), but can in the privacy of the voting booth express support for that candidate. Whether this effect favors the extreme or the moderate differs from one country to another.
There are also instances when such an effect comes into play with regard to unwillingness to even admit to voting for mainstream candidates (for example, voting for conservatives over social democrats, or a white candidate over a black candidate).
Political opinion polls are meant to give a picture of the strength of candidates or parties at a given time before an election, and this picture will change as the electoral campaign advances. The less party loyalty there is in a society, the more the opinion polls will swing. It is important to take these polls with a grain of salt, and to think about the points mentioned in this article. Some polls are more accurate than others (U.S. polls in particular have good methods of determining who the likely voters are), but some problems associated with polling remain difficult to get around.
*Cover photo by Muhammad Ghafari, interview photo by gauge opinion, voting booth photo by Sam Felder, women photo by Paolo Fefe.