Thursday, November 19, 2015

Ranking the Pollsters

Trump Surges Ahead! Again! No, Really, For Real This Time!
The headlines and talk shows are filled with deep, thoughtful-sounding people explaining why the GOP is flocking to Trump in recent polls. Theories range from his aggressive stance toward ISIS, to a growing fear for border security, to Carson's recent fumbling of foreign policy.

The truth is much simpler. Trump's support hasn't changed. 

The real culprit is that most pollsters tend to release their numbers on a monthly basis, though not all at the same time. Right now we have Bloomberg (Trump +4) and PPP (Trump +7), with Quinnipiac (Trump +1) and NBC/Wall Street Journal (Carson +6) as the last polls to compare against. If they keep to schedule, we should see CNN (Trump +5) and ABC (Trump +10) polls coming out soon, which are likely to continue the perceived trend of "Return of the Trump" before the next CBS/NY Times (Carson +4) and NBC polls cause pundits to wonder if Trump's decline could spell the beginning of the end of his campaign.

I can't be sure which pollster is right (yet), but let me show you some trends in Trump's versus Carson's poll numbers:
21 vs  5 (08/02)   21 vs 16 (09/21)   24 vs 20 (11/17)    Bloomberg
29 vs 15 (08/30)   27 vs 17 (10/04)   26 vs 19 (11/17)    PPP
28 vs 12 (08/25)   25 vs 17 (09/21)   24 vs 23 (11/02)    Quinnipiac
19 vs 10 (07/30)   21 vs 20 (09/24)   23 vs 29 (10/29)    NBC/WSJ
24 vs  9 (08/16)   24 vs 14 (09/19)   27 vs 22 (10/17)    CNN/ORC
24 vs  6 (07/19)   33 vs 20 (09/10)   33 vs 22 (10/18)    ABC/WaPo

If we look at the individual pollsters, Trump has been holding steady since August. Carson has been building support, though that may have slowed recently. Using this data, we can predict that the new CNN and ABC numbers will be very similar to their October 17 and 18 numbers, with Trump holding steady and Carson gaining around 2-4 points. However, these numbers are still noisy, so any individual poll might be off by as much as five points. The trend and the aggregate are much more stable. Looking too closely at any one poll is like blindly touching the trunk of an elephant and deciding you’ve caught a snake.
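
To see why the aggregate is steadier than any single poll, here's a back-of-the-envelope sketch (in Python) of how averaging several independent polls shrinks the expected error. The +/-5 point single-poll error is the figure assumed above, and the sqrt(k) scaling assumes the polls' errors are independent.

import math

# Assumed: any single poll can be off by about +/-5 points (per the text above).
single_poll_error = 5.0

# Averaging k independent polls shrinks that error by roughly sqrt(k).
for k in (1, 3, 6):
    print(f"average of {k} poll(s): expected error ~ +/-{single_poll_error / math.sqrt(k):.1f} points")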

The Hidden Romney-Trump Connection
This effect of differing pollsters is of course not new to this election. A look at the last election is illustrative, and can perhaps serve as a starting point for interpreting these latest polls.

In 2012, I tracked all of the general election polls for President and created an excellent predictor for the results of the national popular vote, as well as each of the individual states. Looking at the final month of predictions and comparing them against the actual results in each state, I computed the bias (consistent error) and standard deviation (noise level) of each major pollster. The results are shown below, with a positive bias indicating a consistent over-prediction for Romney.

2016 Bias     2012 Bias      2012 S.D.   Pollster
-3.3 Trump    -2.3 Romney       2.2      ABC/Washington Post
     --       -1.3 Romney       5.1      Marist College
-3.1 Trump    -0.7 Romney       4.0      PPP
-0.7 Trump    -0.3 Romney       3.4      CNN/ORC
 0.3 Trump     0.8 Romney       4.3      Quinnipiac
 1.9 Trump         --            --      Bloomberg
     --        1.2 Romney       3.4      IBD/TIPP
 7.7 Trump     1.3 Romney       2.6      NBC/Wall Street Journal
     --        1.3 Romney       3.6      Fox

This shows that even in 2012, the pollsters were all over the place. However, the three most accurate were CNN/ORC, PPP, and Quinnipiac, and their biases mostly cancelled each other out. ABC/WaPo was way off in one direction while NBC/WSJ was off in the other (though not quite as extreme). If we’re brave enough to carry that lesson forward, we would expect to see the best three pollsters in the middle again, flanked on either side by ABC/WaPo and NBC/WSJ. In fact, that’s not too different from what we see across the past three months, with Quinnipiac and CNN/ORC sitting in the middle and NBC and ABC sitting at the two extremes.
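
For the curious, here's a minimal sketch of how per-pollster bias and noise figures like the 2012 numbers above can be computed from a month of poll errors. The error values below are made-up placeholders, not the actual 2012 data.

from statistics import mean, stdev

# Each error is predicted margin minus actual margin for one final-month poll
# (positive = over-predicted Romney). Placeholder values for illustration only.
pollster_errors = {
    "Pollster A": [2.0, 3.1, 1.5, 2.8],
    "Pollster B": [-1.2, 0.4, -2.5, -0.9],
}

for name, errors in pollster_errors.items():
    bias = mean(errors)    # consistent error
    noise = stdev(errors)  # standard deviation around that bias
    print(f"{name}: bias {bias:+.1f}, s.d. {noise:.1f}")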

The really interesting thing is the strong correlation between a bias towards Romney and an apparent bias towards Trump. The complete ordering from most to least biased is preserved. This would lead one to believe that something consistent is playing out in the pollsters and their underlying models of the electorate. Perhaps correcting the pollsters is more tractable than it looks. However, determining how to capture that correlation, and how it relates back to the actual election, will be the trick. I’ll keep you posted.
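
One quick way to check the "ordering is preserved" claim is a rank correlation over the five pollsters that appear in both bias columns of the table above. A small sketch, assuming scipy is available:

from scipy.stats import spearmanr

# (2016 apparent Trump bias, 2012 Romney bias) for the five pollsters with both
# numbers in the table: ABC/WaPo, PPP, CNN/ORC, Quinnipiac, NBC/WSJ.
bias_2016 = [-3.3, -3.1, -0.7, 0.3, 7.7]
bias_2012 = [-2.3, -0.7, -0.3, 0.8, 1.3]

rho, p_value = spearmanr(bias_2016, bias_2012)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
# rho comes out to 1.0: the ordering from most to least biased is identical.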

Tuesday, November 17, 2015

A Quick Poll Round-up, And It's Ugly

Obama Job Approval
We were recently treated to a rash of polls on Obama's job approval ratings. As I discussed earlier, these reported poll results are getting less and less reliable. This latest round of polls underscores my point nicely:

49% Approve,  47% Disapprove   =   +2%      Gallup  (1500 adults, 11/14 - 11/16)
44% Approve,  54% Disapprove   =   -10%     Rasmussen (1500 "likely voters", 11/12 - 11/16)
43% Approve,  50% Disapprove   =   -7%       Reuters / Ipsos (1586 adults, 11/7 - 11/11)
45% Approve,  47% Disapprove   =   -2%       CBS News / NY Times (1495 adults, 11/6 - 11/10)
45% Approve,  48% Disapprove   =   -3%       Economist / YouGov (2000 adults, 11/6 - 11/10)

The numbers are all over the place, despite all being conducted within 10 days of each other. This is clearly outside the reported 3% margin of error for these polls (the margin of error in this case means that 95% of the time the spread is within X% of the true underlying distribution in the population). In fact, if you take these polls at face value, the scatter implies a margin of error closer to 9% - hardly the kind of number that inspires confidence. However, let’s look at an earlier set of polls from each pollster.
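
A rough sketch of where that "closer to 9%" figure comes from: take the five net-approval spreads at face value and back a 95%-style error band out of their scatter. The "margin of error is about twice the standard deviation" shortcut is my assumption here, not anything the pollsters report.

from statistics import mean, stdev

# Net approval (approve minus disapprove) from the five polls above.
spreads = [+2, -10, -7, -2, -3]

implied_moe = 2 * stdev(spreads)  # crude 95%-style band around the mean
print(f"mean spread: {mean(spreads):+.1f}%, implied margin of error: ~{implied_moe:.0f}%")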

Previous Polls
49% Approve,  48% Disapprove   =   +1%      Gallup  (1500 adults, 10/24 - 10/26)
46% Approve,  52% Disapprove   =   -6%       Rasmussen (1500 "likely voters", 10/18 - 10/20)
43% Approve,  51% Disapprove   =   -8%       Reuters/Ipsos (1586 adults, 10/15 - 10/21)
45% Approve,  46% Disapprove   =   -1%       CBS News/NY Times (1495 adults, 10/4 - 10/8)
44% Approve,  48% Disapprove   =   -4%       Economist/YouGov (2000 adults, 10/9 - 10/13)

Observed Change
+1%  Gallup
 -4%  Rasmussen
+1%  Reuters/Ipsos
 -1%  CBS News / NY Times
+1%  Economist / YouGov

This group is also widely scattered, still closer to the 9% margin of error than the reported 3%. However, there's a very good reason for that: the amount of change within each pollster is much smaller, with a likely poll-to-poll margin of error of about 4%. So what we are seeing is more likely a distribution of different biases from different pollsters. Now, this bias is unlikely to be intentional, or to necessarily reflect the views of the pollster. It is much more likely that each pollster is applying some magical weighting process - their own secret sauce - to the poll respondents. They want to get it right, they just don't know how. So which pollster is right? Well, you, lucky reader, just might be able to find out. I am working on solving that problem right now, though it might take me a little while to be sure. Like maybe three months, when the first primary results come in to provide a little ground truth.
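
And the same back-of-the-envelope treatment for the within-pollster changes, which is roughly where the "about 4%" figure comes from (again using the twice-the-standard-deviation shortcut):

from statistics import stdev

# Change in net approval within each pollster between the two rounds above:
# Gallup, Rasmussen, Reuters/Ipsos, CBS/NYT, Economist/YouGov.
changes = [+1, -4, +1, -1, +1]

within_pollster_moe = 2 * stdev(changes)
print(f"implied poll-to-poll margin of error: ~{within_pollster_moe:.0f}%")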

Democratic Primary
Let's be complete and give the same treatment to the Democratic primary, to see where that race sits.

52% Clinton,   33% Sanders,   5% O'Malley    CBS News / NY Times (11/6 - 11/10)
56% Clinton,   31% Sanders,   2% O'Malley    FOX News (11/1 - 11/3)
57% Clinton,   35% Sanders,   4% O'Malley    McClatchy / Marist (10/30 - 11/5)
53% Clinton,   35% Sanders,   0% O'Malley    Quinnipiac (10/29 - 11/2)

The quick summary of these polls shows Clinton up by 21% over Sanders, with a possible margin of error around 6%. The interesting thing is that this is actually much closer to the reported margin of error of 4.5% (since the sample sizes are much smaller than for the job approval polls, the reported margin of error goes up). Could it be that the pollsters are better at modeling registered Democrats than the population as a whole? Possibly. Or this group of pollsters just happens to agree with each other more, especially when they have one fewer dimension to weight their respondents by (party affiliation).

Now look back at the previous results for these pollsters.

Previous Polls
46% Clinton,   27% Sanders,   0% O'Malley    CBS News / NY Times (10/4-10/8)
45% Clinton,   25% Sanders,   1% O'Malley    FOX News (10/10 - 10/12)
      (data not available)                                     McClatchy / Marist
43% Clinton,   25% Sanders,   0% O'Malley    Quinnipiac (9/17 - 9/21)

Observed Change
  0% CBS News / NY Times
+5% Fox News
  0% Quinnipiac

Not much reported change in the spread. However, in all of these polls (that we have data for) both Clinton and Sanders seem to have increased their share significantly (by 6-11%!). This could reflect a firming up of the electorate, with a shrinking pool of undecided primary voters. With only about 10% of voters still in the 'undecided' camp, there really isn't much room left for Sanders to get a majority. If he's in this to win, he needs to start changing some minds. The one area where voters don't really know Sanders is foreign policy. That also happens to be a subject thrust into the forefront in recent days. Expect Sanders to start heavily detailing his stance on Syria and ISIS. Anything less, and he's just playing for the Veep slot.

Republican Primary
OK, this is just a mess. The margin of error is a statistic specifically calculated for a two-way race. When someone like O'Malley on the Democrats' side is in the low single digits, it isn't much of a stretch to make the old formulas fit "well enough." However, the Republicans have four major players right now, with another six making up about 17-25% of the voters. There's no way that any self-respecting mathematician could continue to use "margin of error" calculations, yet the polls keep rolling out these meaningless numbers nonetheless.
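
For reference, here's a sketch of one more honest way to put an error bar on the lead in a multi-candidate race, using the variance of a difference of two multinomial proportions. The sample size of 400 likely primary voters is an assumption for illustration, not a number reported by these polls.

import math

def lead_moe(p1, p2, n):
    """95% margin of error on (p1 - p2) when both shares come from the same
    multinomial sample of size n."""
    variance = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n
    return 1.96 * math.sqrt(variance)

# Trump vs Carson at roughly 25% and 24%, with an assumed n of 400 respondents.
print(f"MOE on the Trump-Carson lead: +/-{100 * lead_moe(0.25, 0.24, 400):.1f} points")

The point is that the error bar on the lead comes out noticeably wider than the single per-candidate number that gets reported.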

First, the "raw" polls:
26% Trump   23% Carson  11% Rubio  11% Cruz   4% Bush   17% Other  Fox News (11/1-11/3)
23% Trump   24% Carson  12% Rubio    8% Cruz   8% Bush   20% Other  McClatchy/Marist (10/20-11/4)
24% Trump   23% Carson  14% Rubio  13% Cruz   4% Bush   13% Other  Quinnipiac (10/29-11/2)
23% Trump   29% Carson  11% Rubio  10% Cruz   8% Bush   14% Other  NBC / WSJ (10/25-10/29)
28% Trump   23% Carson  11% Rubio    6% Cruz   6% Bush   11% Other  IBD / TIPP (10/24-10/29)

Treating this as a horse race between Trump and Carson, they appear essentially tied, with Trump holding less than a half-point advantage. Taken at face value, Rubio would seem to have a 2% lead over Cruz, while still polling 12% behind Trump or Carson - basically garnering half the support of the front runners. Even without an easy sense of the "noise" on these polls, it's easy to see clear stratification of the race, at least according to the pollsters' models.

Previous Polls
24% Trump  23% Carson    9% Rubio 10% Cruz   8% Bush  17% Other  Fox News (10/10 - 10/12)
     (data not available)                                                                          McClatchy/Marist
25% Trump  17% Carson    9% Rubio   7% Cruz 10% Bush  20% Other  Quinnipiac (9/17 - 9/21)
25% Trump  22% Carson  13% Rubio   9% Cruz   8% Bush  16% Other  NBC / WSJ (10/15 - 10/18)
17% Trump  24% Carson  11% Rubio   6% Cruz   8% Bush  21% Other  IBD / TIPP (9/26 - 10/1)

Observed Change
 +2 Trump    0 Carson   +2 Rubio   +1 Cruz   -4 Bush     0 Other   Fox News
 -1 Trump   +6 Carson   +5 Rubio   +6 Cruz   -6 Bush    -7 Other   Quinnipiac
 -2 Trump   +7 Carson   -2 Rubio   +1 Cruz    0 Bush    -2 Other   NBC / WSJ
+11 Trump   -1 Carson    0 Rubio    0 Cruz   -2 Bush   -10 Other   IBD / TIPP

Looking at the earlier polls from the same firms, it's clear that the noise on the Republican primary is much larger. Whether that reflects true volatility in the race, an artifact of sampling so many candidates, or real instability in the pollsters' models, these polls require a healthy dose of skepticism when reading the "latest breaking poll" on the Republican race.

But what is the latest breaking poll? And how should we interpret it?? Well... I'll keep you posted. 

Late Breaking News!

As of the time of this post, Bobby Jindal has just suspended his campaign. Given that most pollsters actually didn't find a single voter supporting his candidacy, this will have absolutely no effect on the race whatsoever.

Sunday, November 15, 2015

Democratic Primary Update

Democratic Debate: More Than Just Business, Yet Still Business As Usual
The Democrats took to the stage Saturday night to hold their second debate. The event was filled with facts, policy, and positions in response to the moderators' sometimes-pointed questions. Many of the facts provided were even true and relevant. In other words, nothing at all like the Republican debates. The only way the two parties' debates could be more different is if the GOP gaggle actually started throwing things (or knifing each other, or brandishing randomly concealed handguns).

In response to the attacks in Paris on Friday, the first half hour of the debate was devoted to foreign policy, before continuing with the regularly scheduled questions on the economy. All three candidates (yes, Martin O'Malley is still in the race) provided solid reiterations of their stated positions and reasoning. This made for a great introduction to the candidates for anyone who didn't know what each of them stood for, but it didn't provide any shockers or revelations on either domestic or foreign policy.

And the Winner Is... [insert your candidate here]
Right after the debate, various news organizations attempted to declare a "winner," whatever that means. The reports from various "experts" were a little mixed, but I think they were best summed up by CBS News. Immediately after the debate, they assembled a representative panel of Democrats and independents they thought were likely to vote. 51% of this panel declared Clinton the winner, versus 28% who chose Sanders and 7% for O'Malley. Coincidentally, CBS News just released a poll (conducted Nov 6-10) which found Clinton going into the debate with a lead of 52%, with Sanders at 33% and O'Malley at 5% - very similar to what they got last month. So, ignoring the potential problems with political polls, one might safely assume that after hearing the same arguments on the same topics, primary voters are declaring the same candidate as both their favorite and the winner.

Pragmatically speaking, the winner of any event (debates, campaign events, meet and greets, SNL skits - whatever) is the candidate whose number of supporters grows as a result. So, until we see the new polls at the end of the week, it's impossible to tell who won. My prediction, however, based on early reactions, is a stalemate that failed to change anyone's minds.

Which is a strategic victory for Hillary.

Clinton is Way in the Lead
Meanwhile, in the poll that really matters, Clinton is way in the lead. A recent survey of Democratic super-delegates finds that 359 of the 710 are publicly supporting Clinton, with 343 unwilling to endorse at this point. These super-delegates are individuals who (by virtue of their position within the party) get a vote at the convention, unbound by popular opinion. In a somewhat undemocratic process, these super-delegates represent 15% of total delegates to the convention. So, three months before the first caucus or primary allows mere mortals to vote, Clinton is already 15% of the way to getting the nomination (possibly even 28% of the way, if the uncommitted super-delegates are privately split the same way). After being out-flanked by Obama in the 2008 primary, Clinton is laser-focused on collecting the all-important delegates, playing the game by the rules as they are written, instead of the "popular vote" that everyone tends to think of.

As for what is happening in the GOP Circus of 2016, well... I'll keep you posted.

Friday, November 13, 2015

The Exaggerated Demise of the Political Poll

Political Polling is Dead!
I have heard a lot of discussion recently about the death of political polling. The failure of polls to come even close to predicting the results of recent elections is widespread, including in the USA, Canada, Britain, Poland, Israel, Bihar (India), and, just this month, Turkey. Pollsters themselves have been warning of more and more challenges in gathering the data, citing lower response rates and the rise of people who have ditched land lines for cell-phone-only households (which cannot be polled as easily or as cheaply). The growing consensus among pollsters and politicos is that the art of political polling is in danger of dying off.

Long Live Political Polling!
Ironically, at the same time that political polls are being declared a dying breed, their importance to media and campaigns is growing quickly. Polls are being conducted earlier in the season and more often than ever. Politicians rely on internal polls to decide when and if to declare their candidacy. In these increasingly data-driven campaigns, polls determine where to focus resources, how to micro-target messages, and even which positions to take on key issues. Mega-donors, who drive the bulk of spending through the superPACs in our American elections, are clearly attuned to the most recent poll numbers when deciding whom to back and how much to bet. The Republicans have gone so far as to use polls to decide who gets on the debate stage, and where they stand. Even as the pollsters are telling us that their results are garbage, we clamor for more. Clearly, polls are king, and they aren't going to be allowed to go anywhere.

The Data is Fine, the Math is Bad
The problem, everyone agrees, stems from the rise of the cell-phone-only household (41% and growing). By federal regulation, cell phones must be dialed and interviewed by human beings. Furthermore, response rates on cell phones are abysmal - about 95% of calls by pollsters are not answered or are hung up on. This makes calling cell phones prohibitively expensive, and makes it very difficult to reach a large representative sample within a 2-3 day window, especially for cash-strapped news organizations. This tends to mean small samples, skewed populations, and a tendency to rely on land lines for a disproportionate share of the responses.

This, of course, gives a very biased view of the electorate, one that fails to reach young, urban, or less affluent voters. The primary observable factors affecting cell phone use and response rate are age, gender, race, and income. Pollsters know this, of course, and attempt to correct for it through a process of re-weighting. Basically, if they "know" that 18% of a state is black females, but only 6% of their responses came from black females, each of those responses is counted three times as much as normal. Using key demographic questions and a base distribution of the population (e.g. census data), they try to realign their data with the true population.
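
In code, that re-weighting step boils down to something like this toy sketch (the shares and support numbers are made up, and real pollsters layer on far more categories than this):

# Toy post-stratification: weight each response by how under- or
# over-represented its demographic group is relative to the population.
population_share = {"black_female": 0.18, "everyone_else": 0.82}  # e.g. census
sample_share     = {"black_female": 0.06, "everyone_else": 0.94}  # who the phones reached

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # black_female -> 3.0, i.e. each such response counts 3x

# A weighted support estimate then just applies those weights to the raw results.
support_in_group = {"black_female": 0.80, "everyone_else": 0.45}  # made-up poll results
estimate = sum(sample_share[g] * weights[g] * support_in_group[g] for g in weights)
print(f"re-weighted support: {estimate:.1%}")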

The first problem with this approach is that demographic data is coarse and out-dated. Out-dated in the sense that the last comprehensive census was in 2010. Coarse in the sense that you rarely get fine-grained demographics, like urban Hispanic females earning between $30k and $50k. The demographics may tell you the percentage that are urban, and independently the percentage that are Hispanic, but rarely both together. Attempting to re-weight simultaneously across these coarse statistics is something best explained in terms of hex runes and animal sacrifice.

The other main problem is that this is just wrong. Wrong in so many ways that I can't even verbalize them coherently here. To summarize, it gives the wrong answer, and it gives the wrong amount of uncertainty. For example, consider a population in which 5% of people are over 65 years old. Your poll of 300 people only reached 5 people in this category when you expected 15, and they are split 4 for Candidate A vs 1 for Candidate B. Re-weighting would change this count to 12 and 3, and include it in the "corrected" count. First off, it should be clear that the uncertainty in the first case is much higher than in the second. Was that one vote for Candidate B a fluke, or is there real minority support? Furthermore, that minority support could easily be anything from 10-30% and still "match" the data perfectly. Secondly (in a topic best relegated to its own post), the predictive distribution for a 4:1 ratio is not 80%/20% but really 71%/29%, while a 12:3 ratio is closer to 77%/23%. While this example may seem like a trivially small group, keep in mind that we are talking about simultaneously categorizing people based on age, gender, race, and income (four factors heavily influencing both the exclusive use of cell phones and voting habits). According to voter registries, white males 18-30 making <$30k/year are the largest such subgroup in FL, and they represent only approximately 6-7% of the total. So small samples are normal.
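
Those 71%/29% and 77%/23% figures are what you get from a posterior predictive rather than the raw ratio. Here's a minimal sketch using the rule-of-succession form; the uniform prior is my assumption, since the post doesn't spell one out.

def predictive(support_a, support_b):
    """Posterior predictive probability that the next respondent backs A,
    assuming a uniform Beta(1, 1) prior (Laplace's rule of succession)."""
    n = support_a + support_b
    return (support_a + 1) / (n + 2)

for a, b in [(4, 1), (12, 3)]:
    p = predictive(a, b)
    print(f"{a}:{b} -> {p:.1%} / {1 - p:.1%}")
# 4:1  -> 71.4% / 28.6%  (not 80/20)
# 12:3 -> 76.5% / 23.5%  (the re-weighted count pretends to more certainty
#                         than the 5 real responses can justify)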

The pollsters have valid data. It would be different if people were lying to interviewers (pollsters lying about the people is a separate issue - see Strategic Vision, DataUSA). Instead, the data is valid; it just comes from a highly biased sample set, with some good information on how that set is biased. The problem is that the analysts attempting to make sense of this data don't know how to handle it. That's not unexpected, either - these are traditional statisticians who have built careers out of using old rules of thumb that used to work pretty well. When your sample bias is small and fairly random, re-weighting is "approximately correct" and the errors tend to cancel out over time. The same is true for the reported uncertainties. Basically, these are practical people, not theoreticians, who are leveraging tried-and-true mathematical theorems about sampling. However, those theorems are based on certain axioms (assumptions about the data), and those axioms are no longer true.

Saving an Endangered Species
It's clear that political polls are increasingly important, regardless of their quality. It's also clear that the old statistical rules don't apply anymore. Putting another band-aid on the models or adding more epicycles to the orbits (see Apollonius) isn't going to cut it. We need to rethink what information we have, and what it's telling us. Then we might be able to figure out what we want to know.

What we have are several hundred answers to one key question. The people who gave these answers are not at all representative of the general electorate. However, we have no reason to suspect that they are not representative of a specific sub-population of the electorate. If we talk on the phone to 100 people who are white males 18-30 making <$30k/year, their political opinions are likely the same as any other random group of 100 white males 18-30 making <$30k/year. We may reach more or fewer people within one of these subgroups than their share of the total population, but within that group there is minimal bias. So, instead of trying to Frankenstein together a sample that equals the population as a whole (with some people more equal than others), we should model each sub-population and then stick them together, using my favorite tool from probability: marginalization. This transforms a single political poll into a whole bunch of much smaller polls. Each new poll is completely separate from the others, with its own number of samples, its own results, and its own amount of uncertainty. When we put these separate polls back together, we join them up based on what we know about each piece of the population - poll results and demographics. This gives a much more accurate and reliable understanding of the whole by consciously examining the pieces.
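
A minimal sketch of what that marginalization looks like in practice: poll each subgroup on its own, then combine the subgroup estimates using the known demographic shares. All the numbers below are made up for illustration, and treating the subgroups as independent samples is an assumption.

import math

# Per-subgroup mini-polls: (population share, respondents reached, support for A).
subgroups = {
    "young_urban":   (0.30, 40, 0.60),
    "older_rural":   (0.25, 120, 0.40),
    "everyone_else": (0.45, 140, 0.50),
}

# Marginalize: overall support is the share-weighted sum of subgroup estimates,
# and the subgroup variances add with squared weights.
estimate = sum(w * p for w, _, p in subgroups.values())
variance = sum((w ** 2) * p * (1 - p) / n for w, n, p in subgroups.values())

print(f"overall support: {estimate:.1%} +/- {1.96 * math.sqrt(variance):.1%}")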

Meanwhile, Back In Reality
Alright, I’m a realist. No pollster is going to give me access to their raw data, and let me “fix” it for them. The best crosstabs that I can get from most of these pollsters might report one or two of the relevant pieces of information, and never as a joint crosstab. I’m pretty much stuck with the results as they have reweighted them. However, that doesn’t mean my hands are tied.

Each pollster uses a very similar method of polling people each time. This means that their bias and uncertainty should remain roughly the same over time, so when we compare two different pollsters, we should see (on average) the same differences between them. If we look at all of the polls together, we can build a model of the error and uncertainty of each pollster, and taken together, we can make very informed decisions about the nation as a whole. This is the approach I took four years ago, in both the primaries and the general election, and it did very well.
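
As a rough sketch of that kind of cross-pollster comparison, one can line the polls up in time and estimate each pollster's "house effect" as its average deviation from the contemporaneous cross-pollster average. This is a simplified stand-in for the actual model, and the numbers are placeholders, not real polls.

from collections import defaultdict
from statistics import mean

# (pollster, month, reported margin) - placeholder data.
polls = [
    ("A", 1, 5.0), ("B", 1, 1.0), ("C", 1, 3.5),
    ("A", 2, 6.0), ("B", 2, 2.0), ("C", 2, 4.0),
]

# Average margin per month across all pollsters.
by_month = defaultdict(list)
for pollster, month, margin in polls:
    by_month[month].append(margin)
monthly_avg = {m: mean(v) for m, v in by_month.items()}

# House effect = pollster's average deviation from the monthly average.
deviations = defaultdict(list)
for pollster, month, margin in polls:
    deviations[pollster].append(margin - monthly_avg[month])

for pollster, devs in deviations.items():
    print(f"pollster {pollster}: house effect {mean(devs):+.1f} points")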

The problem this time around is that the pollsters, as a whole, are wrong. That's the big punchline of the past two years: no one is getting even close to correct. The big question becomes: how do you model the aggregate errors of the pollsters? Well, we have a lot of data on our side. We are averaging nearly twenty polls a week (and growing). Also, we know the demographic breakdowns of each state, and we know who uses cell phones. Additionally, we can verify any of our models of how the pollsters are skewed against the individual primary states as they vote, providing a long string of guess-and-test experiments (known more formally as the Scientific Method). How exactly are we going to go about putting all of this together? Well... I’ll keep you posted.