Sunday, December 27, 2015

Divergent Polls

Last week saw a rash of polls released. Here's a quick summary of those numbers:
This is a good crop of fairly consistent polls, all within the expected sampling error of each other. From these polls, we can conclude that Trump is on another major surge, reaching for that magic 40% range he needs to avoid a brokered convention. Meanwhile, Cruz is showing minor gains and Rubio appears to have stalled.

The most interesting thing, however, is not in these numbers. It's in these other numbers:
This is the same race, across the same nation, during roughly the same period of time. These polls tell a very different story, where Trump is struggling to expand his support beyond the 28% ceiling we've seen for months. Cruz has launched a major surge towards the front-runner slot, and Rubio's growth continues at a slow but steady pace.

Two Sets of Polls, Two Sets of Reality 
This is not a case of an outlier poll, as most political stories seem to be casting it. The odds of generating the Quinnipiac poll as a statistical outlier from the first grouping are roughly a quadrillion times lower than the odds of generating the Fox News poll. That's a 1 with 15 zeros behind it. If that isn't unlikely enough, think about the odds of that happening three times.
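
To make the "outlier" argument concrete, here is a minimal sketch of how one might estimate the chance that two polls of the same population differ by a given amount through sampling noise alone. It uses a standard two-proportion z-score with a normal approximation, which is not necessarily the calculation behind the figure above. The function names, vote shares, and sample sizes are hypothetical and chosen only for illustration; they are not taken from the polls discussed here.

```python
import math

def two_poll_z(p1, n1, p2, n2):
    """z-score for the gap between two independent poll shares."""
    # Unpooled standard error of the difference of two proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

def two_sided_tail(z):
    """Chance of a gap at least this large from sampling noise alone
    (normal approximation)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical inputs: one poll at 39% (n=400), another at 28% (n=500).
z = two_poll_z(0.39, 400, 0.28, 500)
p_single = two_sided_tail(z)

# Assumption: the three divergent polls are independent of one another,
# so the chance of all three being flukes is the product.
p_three = p_single ** 3

print(f"z = {z:.2f}, one poll: {p_single:.2e}, three polls: {p_three:.2e}")
```

The point of cubing the probability is the same as in the paragraph above: one fluke is unlikely, and three independent flukes landing in the same place are far less likely still.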

This means that these are two very different modes, each clustered within sampling error and self-consistent across multiple candidates. That's two different answers for what is happening, and they cannot be treated as polling the same reality.

What the *!@? is Happening Here?
The big question: what's so different about these two groups? For once, it's not cell phones. Both groups are using a similar mixture of landlines and cell phones with live interviews. It's also not registered voters vs likely voters, or live calls vs robocalls, or any of the other usual suspects. Looking at the rest of the methodologies doesn't show any other tell-tale signs. The truth appears to be more subtle than that.

Unfortunately, most of the pollsters don't like to reveal any of the deep internals of their models, or report any of the raw results. However, a deep dive into the cross tabs does show something interesting. I performed some statistical forensics to back out roughly how many minorities were interviewed in each poll, inferred from the statistical variation between different categories of registered Republicans, such as "All Males" vs "White Males." Keep in mind that Gallup polls and exit polls from 2012 and 2014 put the number of minorities who are registered or self-identifying as Republicans somewhere around 11-13%. If these groups are interviewed at this proportion from the general populace, even on non-racial or non-political issues, we would expect to see a certain amount of difference between the categories solely from sampling variance. As the number of minority respondents goes down, so does the difference between the "all" and "white only" figures. In short: if you think white people and minorities agree on everything down to 1%, you didn't bother to ask one of the groups.
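
The reasoning behind this kind of forensics can be sketched with a simple mixture model: the "all" figure is a weighted average of the "white" figure and the "minority" figure, so the gap between "all" and "white" scales with the minority share of the sample. This is a simplified illustration under stated assumptions, not the exact procedure used for this post. The function names are hypothetical, and the 20-point difference between groups is an assumed input, not a measured one.

```python
def all_vs_white_gap(minority_share, p_white, p_minority):
    """Gap between the 'all respondents' figure and the 'white only' figure
    when the sample is a simple mixture of the two groups."""
    p_all = (1 - minority_share) * p_white + minority_share * p_minority
    return p_all - p_white

def implied_minority_share(gap, assumed_group_difference):
    """Minority share implied by an observed all-vs-white gap, given an
    ASSUMED difference between minority and white responses."""
    if assumed_group_difference == 0:
        raise ValueError("groups must differ for the share to be identifiable")
    return abs(gap) / abs(assumed_group_difference)

# With a 12% minority share and groups 20 points apart (assumed),
# the 'all' figure should sit about 2.4 points away from the 'white' figure.
expected_gap = all_vs_white_gap(0.12, 0.40, 0.20)

# Conversely, a gap of only 1 point under the same assumed 20-point
# difference implies a minority share of about 5%.
share = implied_minority_share(0.01, 0.20)

print(f"expected gap: {expected_gap:+.3f}, implied share: {share:.2%}")
```

Sampling noise sits on top of this, so a single small gap proves little. A gap that stays under one point across every question is harder to explain unless the minority share of the sample is small.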

For all of the first group of polls, the number of minority respondents appears to be very low, bordering on non-existent. Across all categories, the variation between these two groups is never more than one percent. That includes estimating Trump's support. If minorities seem to support a racist candidate like Trump as much as white people, there's something strange going on. 

In the second grouping, computing the crosstabs is a little different. Two of the pollsters (Suffolk & Quinnipiac) are university polls that flat out tell us how many minorities they reached. While they didn't break it down by ethnicity within the Republicans, both pollsters hit very close to the target number of Blacks without weighting, and did a good job of representing Hispanics and Asians with only a little undersampling. Furthermore, they seem to give a better natural distribution of very low income groups (<$30k/year) and younger age groups than the first grouping.

I dug a little deeper, and looked at results for the first group, going back three months. This is roughly the time period where we have seen the bifurcation of the polls that has so recently accelerated. It is also during this time that the undersampling of minorities seems to take place. The further back in time you go, the more "Whites" seem to differ from "All Republicans."

Conclusion A: Quality of Sample Matters
While excluding minorities isn't enough to explain the difference in polls by itself, it is a clear indication that the quality of the sample is bad for roughly half of the major pollsters. No amount of re-weighting will account for having only three Hispanics answer your poll. Also, this consistent degradation of the quality likely applies to other subgroups as well, since it is highly doubtful that these pollsters are just being racist somehow.
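
One way to see why re-weighting cannot rescue a thin subgroup is Kish's effective sample size, which shrinks as weights become more unequal. The sketch below is an illustration only: the sample of 500 and the 6% target share are hypothetical numbers, not drawn from any of the polls above, and real pollsters weight on several variables at once.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

# Hypothetical poll: 500 respondents, only 3 from a group that should
# make up 6% of the sample (30 respondents' worth of weight).
n_total, n_group, target_share = 500, 3, 0.06
group_weight = target_share * n_total / n_group                       # 10.0 each
other_weight = (1 - target_share) * n_total / (n_total - n_group)     # just under 1

weights = [group_weight] * n_group + [other_weight] * (n_total - n_group)
n_eff = kish_effective_sample_size(weights)

print(f"nominal n = {n_total}, effective n = {n_eff:.0f}")
```

In this made-up case, the weighting costs roughly a third of the sample's statistical power, and the group's estimate still rests on three people.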

Conclusion B: Trump Might Be In For a Rude Awakening
As it stands, I could believe that many polls are not only suffering from higher-than-reported margins of error but are also inadvertently introducing a consistent bias into the results. I caution readers to start treating the first group of pollsters with a certain amount of skepticism until February 9th (New Hampshire's primary), when we have a little more information to go on. The second group of polls, which shows Trump stalled and Cruz rising fast, seems much more believable once you look under the hood.

It is quite possible that Trump's 'lead' is still at -22%.
