Friday, November 13, 2015

The Exaggerated Demise of the Political Poll

Political Polling is Dead!
I have heard a lot of discussion recently about the death of political polling. The failure of polls to come even close to predicting the results of recent elections has been widespread, including in the USA, Canada, Britain, Poland, Israel, Bihar (India), and, just this month, Turkey. Pollsters themselves have been warning of mounting challenges in gathering the data, citing lower response rates and the rise of people who have ditched land lines for cell-phone-only households (which cannot be polled as easily or as cheaply). The growing consensus among pollsters and politicos is that the art of political polling is in danger of dying off.

Long Live Political Polling!
Ironically, at the same time that political polls are being declared a dying breed, their importance to the media and to campaigns is growing quickly. Polls are being conducted earlier in the season, and more often than ever. Politicians rely on internal polls to decide when and whether to declare their candidacy. In these increasingly data-driven campaigns, polls determine where to focus resources, how to micro-target messages, and even which positions to take on key issues. Mega-donors, who drive the bulk of spending through the super PACs in our American elections, are clearly attuned to the most recent poll numbers when deciding whom to back and how much to bet. The Republicans have gone so far as to use polls to decide who gets on the debate stage, and where they sit. Even as the pollsters are telling us that their results are garbage, we clamor for more. Clearly, polls are king, and they aren't going to be allowed to go anywhere.

The Data is Fine, the Math is Bad
The problem, everyone agrees, stems from the rise of the cell-phone-only household (41% and growing). By federal regulation, cell phone numbers must be dialed by human interviewers, not by automated systems. Furthermore, response rates on cell phones are abysmal: about 95% of calls by pollsters are not answered or are hung up on. This makes calling cell phones prohibitively expensive, and makes it very difficult to reach a large, representative sample within a 2-3 day window, especially for cash-strapped news organizations. The result tends to be small samples, skewed populations, and a reliance on land lines for a disproportionate share of the responses.

This, of course, gives a very biased view of the electorate, one that fails to reach young, urban, or less affluent voters. The primary observable factors affecting cell phone use and response rate are age, gender, race, and income. Pollsters know this, of course, and attempt to correct for it through a process of re-weighting. Basically, if they "know" that 18% of a state is black females, but only 6% of their responses came from black females, each of those responses is counted three times as much as normal. Using key demographic questions and a base distribution of the population (e.g. census data), they try to realign their data with the true population.
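For concreteness, here is a minimal sketch of that re-weighting step, using the invented 18%-vs-6% example above as the only demographic cell:

```python
# A minimal sketch of the basic re-weighting step.  The demographic cells and
# their shares are invented for illustration; real pollsters use census-style
# targets for many cells at once.

population_share = {"black female": 0.18, "everyone else": 0.82}  # "true" population
sample_share     = {"black female": 0.06, "everyone else": 0.94}  # what the poll reached

# Each response is weighted so that the sample proportions match the population.
weights = {cell: population_share[cell] / sample_share[cell] for cell in population_share}

print(weights["black female"])   # ~3.0 -> each such response counts 3x
print(weights["everyone else"])  # ~0.87 -> everyone else counts slightly less
```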

The first problem with this approach is that demographic data is coarse and outdated. Outdated, in the sense that the last comprehensive census was in 2010. Coarse, in the sense that you rarely get fine-grained demographics, like urban Hispanic females earning between $30k-$50k. The demographics may tell you the percentage that are urban, and independently the percentage that are Hispanic, but rarely both together. Attempting to re-weight simultaneously across these coarse statistics is something best explained in terms of hex runes and animal sacrifice.
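The usual machinery behind those hex runes is some form of raking (iterative proportional fitting): rescale the weights to match one marginal, then the next, and repeat until they stop moving. A minimal sketch, with a made-up 2x2 joint table of respondents and made-up population targets:

```python
import numpy as np

# Raking (iterative proportional fitting) over two coarse marginals.
# The joint table and the population targets are made up for illustration.

# Respondent counts: rows = urban / rural, columns = Hispanic / non-Hispanic.
counts = np.array([[10.0, 40.0],
                   [ 5.0, 45.0]])

row_targets = np.array([0.55, 0.45])   # urban / rural shares of the population
col_targets = np.array([0.20, 0.80])   # Hispanic / non-Hispanic shares

cells = counts / counts.sum()          # start from the raw sample proportions
for _ in range(100):
    cells *= (row_targets / cells.sum(axis=1))[:, None]   # match the row marginal
    cells *= (col_targets / cells.sum(axis=0))[None, :]   # match the column marginal

print(cells)                 # adjusted cell proportions
print(cells.sum(axis=1))     # now matches row_targets
print(cells.sum(axis=0))     # now matches col_targets
```

Note that the procedure only guarantees the marginals; the joint cells it produces are still a guess, which is exactly where the animal sacrifice comes in.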

The other main problem is that this is just wrong. Wrong in so many ways that I can't even verbalize them coherently here. To summarize, it gives the wrong answer, and it gives the wrong amount of uncertainty. For example, consider a population in which 5% of people are over 65 years old. Your poll of 300 people only reached 5 people in this category when you expected 15, and they are split 4 for Candidate A vs 1 for Candidate B. Re-weighting would change this count to 12 and 3, and include it in the "corrected" totals. First off, it should be clear that the uncertainty in the first case is much higher than in the second. Was that one vote for Candidate B a fluke, or is there real minority support? Furthermore, that minority support could easily be anything from 10-30% and still "match" the data perfectly. Secondly (in a topic best relegated to its own post), the predictive distribution for a 4:1 split is not 80%/20%, but really 71%/29%, while a 12:3 split is closer to 77%/23%. While this example may seem like a trivially small group, keep in mind that we are talking about simultaneously categorizing people by age, gender, race, and income (four factors heavily influencing both the exclusive use of cell phones and voting habits). According to voter registries, white males 18-30 making <$30k/year are the largest subgroup in FL, and represent approximately 6-7% of the total. So small samples are normal.
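A minimal sketch of one way to get those predictive numbers, assuming a uniform prior on the underlying support (Laplace's rule of succession), which reproduces the 71%/29% and roughly 77%/23% figures above:

```python
# Posterior-predictive support under a uniform Beta(1, 1) prior
# (Laplace's rule of succession): (successes + 1) / (n + 2).

def predictive(successes: int, n: int) -> float:
    """Probability that the next respondent in the group backs Candidate A."""
    return (successes + 1) / (n + 2)

print(predictive(4, 5))    # 4 of 5   -> ~0.714, not the naive 0.80
print(predictive(12, 15))  # 12 of 15 -> ~0.765, much closer to the naive 0.80
```

The exact numbers shift with the choice of prior, but the direction of the effect does not: small groups carry less certainty than their re-weighted counts pretend.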

The pollsters have valid data. It would be different if people were lying to interviewers (pollsters lying about the people is a separate issue - see Strategic Vision, DataUSA). Instead, the data is valid, it just comes from a highly biased sample set, with some good information on how that set is biased. The problem is that the analysts attempting to make sense of this data don't know how to handle it. That's not unexpected, either - these are traditional statisticians who have built careers out of using old rules of thumb that used to work pretty well. When your sample bias is small and fairly random, re-weighting is "approximately correct" and the errors tend to cancel out over time. The same is true for the reported uncertainties. Basically, these are practical people, not theoreticians, who are leveraging tried-and-true mathematical theorems about sampling. However, these theorems are based on certain axioms (assumptions about the data) and these axioms are no longer true.

Saving an Endangered Species
It's clear that political polls are increasingly important, regardless of their quality. It's also clear that the old statistical rules don't apply anymore. Putting another band-aid on the models or adding more epicycles to the orbits (see Apollonius) isn't going to cut it. We need to rethink what information we have, and what it's telling us. Then we might be able to figure out what we want to know.

What we have are several hundred answers to one key question. The people who gave these answers are not at all representative of the general electorate. However, we have no reason to suspect that they are not representative of a specific sub-population of the electorate. If we talk on the phone with 100 people who are white males 18-30 making <$30k/year, their political opinions are likely distributed the same as those of any other random group of 100 white males 18-30 making <$30k/year. We may reach more or fewer people within one of these subgroups than their share of the total population, but within that group there is minimal bias. So, instead of trying to Frankenstein together a sample that equals the population as a whole (with some people more equal than others), we should instead model each sub-population and then stitch them together using my favorite tool from probability, marginalization. This transforms a single political poll into a whole bunch of much smaller polls. Each new poll is completely separate from the others, with its own number of samples, its own results, and its own amount of uncertainty. When we put these separate polls back together, we join them based on what we know about each piece of the population - poll results and demographics. This gives a much more accurate and reliable understanding of the whole by consciously examining the pieces.
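A minimal sketch of that recombination step, with made-up subgroup counts and population shares (and only two of the many demographic cells shown):

```python
# Combine per-subgroup estimates by marginalizing over the demographic cells.
# Subgroup names, counts, and population shares are invented for illustration.

subgroups = {
    # group: (responses for Candidate A, total responses, share of electorate)
    "white male 18-30, <$30k": (22, 40, 0.065),
    "black female 31-50":      (18, 20, 0.090),
    # ... one entry per (age, gender, race, income) cell ...
}

weighted_support = 0.0
total_share = 0.0
for name, (a_votes, n, pop_share) in subgroups.items():
    p_group = (a_votes + 1) / (n + 2)    # per-group estimate (uniform prior, as above)
    weighted_support += pop_share * p_group
    total_share += pop_share

# Normalize by the shares actually listed, since only a subset of cells is shown.
print(f"Estimated support for Candidate A: {weighted_support / total_share:.1%}")
```

The key design point is that each cell keeps its own sample size and its own uncertainty all the way through, instead of being inflated into fake respondents before anything is estimated.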

Meanwhile, Back In Reality
Alright, I’m a realist. No pollster is going to give me access to their raw data, and let me “fix” it for them. The best crosstabs that I can get from most of these pollsters might report one or two of the relevant pieces of information, and never as a joint crosstab. I’m pretty much stuck with the results as they have reweighted them. However, that doesn’t mean my hands are tied.

Each pollster uses a very similar method of polling people each time, which means that its bias and uncertainty will remain roughly the same over time. Comparing two different pollsters, we should therefore see (on average) the same differences between them from poll to poll. If we look at all of the polls together, we can build a model of the error and uncertainty for each pollster, and taken together, those models let us make very informed decisions about the nation as a whole. This is the approach that I took four years ago, in both the primaries and the general election, and I did very well with it.
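As a crude illustration (not the actual model I used four years ago), here is a minimal sketch that measures each pollster's average offset from the consensus of all polls; the pollster names and margins are hypothetical, and a real model would also account for timing and sample size:

```python
import statistics
from collections import defaultdict

# Hypothetical (pollster, Candidate A margin in points) results for one race.
polls = [
    ("Pollster A", 4.0), ("Pollster A", 5.0), ("Pollster A", 6.0),
    ("Pollster B", 1.0), ("Pollster B", 2.0),
    ("Pollster C", 3.0), ("Pollster C", 3.5),
]

consensus = statistics.mean(margin for _, margin in polls)

by_pollster = defaultdict(list)
for name, margin in polls:
    by_pollster[name].append(margin)

# House effect = average distance from the consensus; the spread of a
# pollster's own results is a rough stand-in for its uncertainty.
for name, margins in by_pollster.items():
    house_effect = statistics.mean(margins) - consensus
    spread = statistics.pstdev(margins)
    print(f"{name}: house effect {house_effect:+.1f} pts, spread {spread:.1f} pts")
```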

The problem this time around is that the pollsters, as a whole, are wrong. That's the big punchline of the past two years: no one is getting even close to correct. The big question becomes: how do you model the aggregate errors of the pollsters? Well, we have a lot of data on our side. We are averaging nearly twenty polls a week (and growing). Also, we know the demographic breakdowns of each state, and we know who uses cell phones. Additionally, we can verify any of our models of how the pollsters are skewed against the individual primary states as they vote, providing a long string of guess-and-test experiments (known more formally as The Scientific Method). How exactly are we going to go about putting all of this together? Well... I'll keep you posted.

2 comments:

  1. You need to learn to spell -- all right -- not "alright"

  2. This is a fascinating analysis. Proper analysis of this campaign season has lacked substance, to this point, so it's a breath of fresh air to see some substantive thoughts. BTW, "alright" is the correct word, as was drilled into me by my grammar teachers.
