Wednesday, September 19, 2012

Are Election Polls Oversampling Democrats? Not Really.


As of right now, Obama leads Romney by 2.9 points nationally, according to RealClearPolitics. And his lead among swing states is even higher. Thanks to Fox News and NBC/WSJ/Marist, Obama leads by 4.8 points in Ohio, possibly the most important swing state in the election. His lead is also at 4.7 points in Virginia, and 2.0 points in Florida. Thanks to polling like this, Intrade has Obama's chances of winning at 67.4%.
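For reference, an RCP-style topline is just an unweighted mean of each poll's margin. A minimal sketch (the individual poll margins below are hypothetical, not RCP's actual inputs):

```python
# An RCP-style average is the unweighted mean of each poll's
# (Obama minus Romney) margin, in percentage points.
# The margins below are hypothetical, for illustration only.

def polling_average(margins):
    return sum(margins) / len(margins)

hypothetical_margins = [4.0, 1.0, 3.0, 3.7]
print(round(polling_average(hypothetical_margins), 1))  # 2.9
```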

However, some Republicans are skeptical of this lead, arguing that these polls oversample Democrats, making Obama's lead look larger than it actually is. Attempts to "unskew" these polls have resulted in Romney leads of 9%! So it is a legitimate question to ask: have these polls been oversampling Democrats in order to make Obama look stronger than Romney, perhaps to create some kind of self-fulfilling prophecy?

No doubt conservative "skeptics" of the mainstream media would be quick to answer with a resounding "YES!" Could you have any more proof of a liberal media bias? Of course, the truly skeptical mind would probably see a few red flags first.

"Starting in 1992, EVERY Pew poll appears to lean in one direction — always towards the Democrat, and by an average of more than 5 percentage points. Worse, this is a reflection of the “final” poll, which even the Democratic firm, Public Policy Polling, usually gets right." (emphasis not mine)
After listing Pew's final polls for the 1988-2008 presidential elections, the Number-Cruncher wonders why Pew doesn't try to adjust their numbers. Actually, the answer is rather simple: the numbers he used were for Registered Voters (RV), not Likely Voters (LV). LV models exist precisely because not all registered voters actually vote, so they better predict the outcome of an election. Nate Silver at FiveThirtyEight points out that, when choosing between polls of LVs versus RVs, always go with the former. This may not necessarily be the case early on, before the conventions, but it is definitely the case now. As a result, 6 of the 7 polls included in the RCP average are LV polls. And if we look at Pew's results for their final LV polls, they are actually quite accurate:
  • 2008 error: D-1 (Republican numbers were accurate but Democrat numbers off by 1)
  • 2004 error: 0
  • 2000 error: R+2
  • 1996 error: D+6
Other than 1996, their recent track record has been very good, with errors of only 0-2 points.

In fact, many of the most heavily criticized polls do extremely well at predicting the vote. This prompts the obvious question: so why are people saying there is a problem?

Although many of these critiques point to RV polls, even LV polls often show more Democrats than Republicans in their samples:
 Note: I did not include Rasmussen because I do not have access to their numbers.

In recent elections, party turnout has swung from a tie between Democrats and Republicans in 2004 and 2010 to a 7-point lead for Democrats in 2008, an unprecedented year for Democrat voter turnout. So it may be a stretch to assume Democrats will turn out in numbers like the Pew poll suggests.

So why is this the case? Some conservatives have speculated that pollsters may be oversampling Democrats to replicate the unusual 2008 Democratic turnout. However, Emily Perkins of reason.com finds there is little support for this theory:
It is hard to say whether pollsters are in fact relying too heavily on 2008 partisan turnout, because it is extraordinarily difficult to track down how these pollsters define likely voters.
According to Chris Jackson at Ipsos-Reuters, “most research organizations use a combination of prior voting behavior, interest in the election and self-report likelihood to vote to categorize likely voters. ...Some pollsters also use ‘voter lists’ or commercial lists of people who voted in the last election instead of screening these individuals from the population.”
Rasmussen gives a vague explanation here, “The questions involve voting history, interest in the current campaign, and likely voting intentions. Rasmussen Reports determines its partisan weighting targets through a dynamic weighting system that takes into account the state’s voting history, national trends, and recent polling in a particular state or geographic area.”
ABC News explains, they “develop a range of ‘likely voter’ models, employing elements such as self-reported voter registration, intention to vote, attention to the race, past voting, age, respondents’ knowledge of their polling places, and political party identification.”
As Huffington Post’s Mark Blumenthal reports, “CNN has published no explanation of how they select likely voters.” (emphasis mine)

In addition, there is little reason for polls to adjust their samples to fit 2008 at all. Pew explains:
"While all of our surveys are statistically adjusted to represent the proper proportion of Americans in different regions of the country; younger and older Americans; whites, African Americans and Hispanics; and even the correct share of adults who rely on cell phones as opposed to landline phones, these are all known, and relatively stable, characteristics of the population that can be verified off of U.S. Census Bureau data or other high quality government data sources."
"Party identification is another thing entirely. Most fundamentally, it is an attitude, not a demographic. To put it simply, party identification is one of the aspects of public opinion that our surveys are trying to measure, not something that we know ahead of time like the share of adults who are African American, female, or who live in the South."
...
"In effect, standardizing, smoothing, or otherwise tinkering with the balance of party identification in a survey is tantamount to saying we know how well each candidate is doing before the survey is conducted."
In other words, while pollsters may adjust party identification in LV models to reflect the fact that not all registered voters actually vote, there is little they can do to adjust party turnout in broader surveys (RV, all adults) other than apply weights based on census demographics. And since the appearance of Democrat oversampling is stronger in those broader surveys than in LV polls, there is little reason to think LV adjustments are responsible for it. The appearance of Democrat oversampling may simply exist because more voters actually consider themselves Democrats at the time the polls are conducted.
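Pew's distinction can be made concrete. Below is a minimal sketch of the kind of demographic weighting Pew describes; the age groups and all shares are hypothetical, and real pollsters use more elaborate schemes (e.g. raking across several variables at once):

```python
# Sketch of post-stratification: respondents are re-weighted so the
# sample matches known census benchmarks. Party ID has no such
# benchmark, which is why Pew refuses to weight on it.
# All shares below are hypothetical.

def poststratification_weights(sample_shares, census_shares):
    """Weight for each group = census share / sample share."""
    return {g: census_shares[g] / sample_shares[g] for g in sample_shares}

sample = {"18-29": 0.15, "30-64": 0.60, "65+": 0.25}   # who answered the phone
census = {"18-29": 0.22, "30-64": 0.58, "65+": 0.20}   # known population shares

weights = poststratification_weights(sample, census)
# Young respondents are up-weighted (~1.47x); seniors are down-weighted (0.8x).
```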

Now, it is highly unlikely we will see the same composition of voters once election day comes around. However, this is not much of a problem, since party identification, unlike registration, changes over the election season. Pew explains:
"Particularly in an election cycle, the balance of party identification in surveys will ebb and flow with candidate fortunes, as it should, since the candidates themselves are the defining figureheads of those partisan labels."
This "ebb and flow" can be drastic. Rasmussen measured party identification in July 2012 at a 1-point advantage for Republicans, and again in August 2012 at a 4-point advantage for Republicans. Gallup saw an even larger bounce from Nov 7-9, 2008, when Democrats outnumbered Republicans by 5 points, to Nov 13-16, 2008, when Democrats outnumbered Republicans by 13 points. That is an 8-point change in about a week! From Aug 20-22, 2012, Gallup measured a 3-point lead for Democrats. A few weeks later, from Sep 6-9, that lead had extended to 8 points. So there is even less reason to adjust polling, since it will adjust itself if necessary anyway. Pollsters that do try to adjust based on party identification risk skewing the results to show a situation not reflective of the country as a whole.

So we are left with one more question: Why do the results of party identification differ so much between polls? Other than sampling error, this can be explained by differences in methodology. Nate Silver explains one such difference:
"Although there are exceptions on either side, like the Gallup national tracking poll, for the most part Mr. Obama seems to be getting stronger results in polls that use live interviewers and that include cellphones in their samples — enough to suggest that he has a clear advantage in the race.
In the polls that use an automated dialing method (“robopolls”) or which exclude cellphones, Mr. Obama’s bounce has been much harder to discern, and the race looks considerably closer."
And there is good reason why this is the case:
"These results are consistent with some past research. Roughly one third of American households rely solely on mobile phones and do not have landlines, meaning they will simply be excluded by polls that call landlines only. Potential voters who rely on cellphones belong to more Democratic-leaning demographic groups than those which don’t, and there is reasonably strong empirical evidence that the failure to include them in polls can bias the results against Democrats, even after demographic weightings are applied."
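Silver's point is simple arithmetic: if a Democratic-leaning slice of the electorate can never be reached, the poll's topline shifts right. A toy illustration, where the support levels are hypothetical and only the one-third cellphone-only share comes from the quote above:

```python
# Toy model of landline-only bias. Support levels are hypothetical;
# the one-third cellphone-only share is the figure quoted above.

cell_only_share = 1 / 3
obama_cell_only = 0.58   # hypothetical: cell-only voters lean Democratic
obama_landline = 0.48    # hypothetical: landline voters lean slightly Republican

full_population = (cell_only_share * obama_cell_only
                   + (1 - cell_only_share) * obama_landline)
landline_only_poll = obama_landline   # cell-only voters are never dialed

print(f"full population:    {full_population:.1%}")     # 51.3%
print(f"landline-only poll: {landline_only_poll:.1%}")  # 48.0%
```

Demographic weighting cannot fully repair this, because the excluded group differs politically even from demographically similar landline households.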
Pew has confirmed this trend in the past, and Nate Silver has confirmed it is the case for the 2012 election as well.

Since Rasmussen uses only landlines, while others like Fox News, NBC/WSJ, and Quinnipiac use a mixture of landlines and cellphones, it is easy to see why Rasmussen tends to sample relatively more Republicans than Democrats, and thus tends to poll to the right of other pollsters. This effect has become more pronounced over the last few years, leading to a decline in Rasmussen's ability to predict election results.

So, to answer the question originally posed in this article: any appearance of party oversampling is most likely caused by one party being better represented in the population at a given time than the other. And since these polls measure party identification, not registration, we should expect this to change over time, so we should not necessarily expect to see the same party identification distribution on election day. Where one poll does appear biased relative to the others, poor sampling methodology, such as excluding cellphones, is a better explanation than deliberate oversampling.

