Brexit and now The Donald -- how wrong can you be? The pollsters have been thwarted once more in their predictions -- even the all-knowing Nate Silver, who had called it Hillary's way, looked like just another statistical charlatan who had simply been lucky in prior elections. Should we now turn back to astrology, Ouija boards and clairvoyants for our punditry? What went so wrong?
The first thing is that it didn't really go wrong from a numbers perspective. I think it was more the media's interpretation of those numbers. A poll is basically a sample of people that we use to represent the electorate.
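To make the idea of a sample concrete, here is a minimal sketch of the textbook margin-of-error calculation for a poll. It assumes a simple random sample, which real polls only approximate, and the poll figures (1,000 respondents, 48% support) are invented for illustration.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a share from a simple random sample.

    Assumption: simple random sampling; z=1.96 is the usual 95% normal quantile.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)

# Hypothetical poll: 1,000 respondents, 48% backing candidate A.
moe = margin_of_error(0.48, 1000)
print(f"48% plus or minus {moe * 100:.1f} points")
```

With a sample of this size the uncertainty is around three points either way, so a headline lead of one or two points is well inside the noise.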
Here's the first problem. The U.S. doesn't have compulsory voting, so the population you are trying to represent is not easily identifiable. That's a major issue when you are interpreting the results. If a particular group that favours one candidate decides to get out and vote when historically it hasn't, that can have a real impact on the validity of the poll.
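A rough sketch of why turnout matters so much: the same stated preferences give a different result depending on who actually shows up. The two groups, their sizes, support levels and turnout rates below are all hypothetical, and `projected_share` is an illustrative helper, not any pollster's actual method.

```python
def projected_share(groups, turnout_key):
    """Candidate A's share of votes cast, given each group's turnout rate."""
    votes_cast = sum(g["size"] * g[turnout_key] for g in groups)
    votes_for_a = sum(g["size"] * g[turnout_key] * g["support_a"] for g in groups)
    return votes_for_a / votes_cast

# Hypothetical electorate: every number here is invented for illustration.
groups = [
    # A group that votes regularly and leans towards candidate A.
    {"size": 600, "support_a": 0.58, "assumed_turnout": 0.60, "actual_turnout": 0.60},
    # A group that historically stays home but turns out this time.
    {"size": 400, "support_a": 0.35, "assumed_turnout": 0.40, "actual_turnout": 0.65},
]

modelled = projected_share(groups, "assumed_turnout")
actual = projected_share(groups, "actual_turnout")
print(f"modelled: {modelled:.1%}, actual: {actual:.1%}")
```

In this toy case candidate A is ahead on the assumed turnout (about 51%) and behind on the actual turnout (about 48%), even though nobody changed their mind.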
Even though Hillary Clinton won the popular vote nationally, the U.S. political system does not choose a president based on the national share, but through the Electoral College. We ended up with a situation similar to the one we saw in 2000: the winner of the national vote is not the one who will be moving into the White House. So you can call the national winner correctly and, as we have seen, still not name the person actually elected. Most polls were therefore not far out, and Nate Silver showed what would have happened if just 1 in 100 voters had voted the other way (the result would have been bang on the poll forecasts).
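The gap between the national vote and the Electoral College, and the effect of 1 in 100 voters switching, can be sketched with a toy example. The three states, their vote counts and elector numbers are entirely hypothetical, and the sketch assumes every state awards all of its electors to its winner, which is a simplification.

```python
def tally(states):
    """Return (popular, electoral) totals for candidates 'a' and 'b'.

    Assumption: winner-takes-all in every state; an exact tie goes to 'b'.
    """
    popular = {"a": 0, "b": 0}
    electoral = {"a": 0, "b": 0}
    for s in states:
        popular["a"] += s["a"]
        popular["b"] += s["b"]
        winner = "a" if s["a"] > s["b"] else "b"
        electoral[winner] += s["electors"]
    return popular, electoral

def shift_to_a(states, fraction):
    """Move `fraction` of each state's total voters from B to A."""
    shifted = []
    for s in states:
        moved = round((s["a"] + s["b"]) * fraction)
        shifted.append({**s, "a": s["a"] + moved, "b": s["b"] - moved})
    return shifted

# Hypothetical states: A piles up votes in X and narrowly loses Y and Z.
states = [
    {"name": "X", "electors": 10, "a": 650, "b": 350},
    {"name": "Y", "electors": 8, "a": 495, "b": 505},
    {"name": "Z", "electors": 8, "a": 495, "b": 505},
]

popular, electoral = tally(states)
popular_shifted, electoral_shifted = tally(shift_to_a(states, 0.01))
print("as polled:", popular, electoral)
print("1 in 100 switch:", popular_shifted, electoral_shifted)
```

Candidate A wins the popular vote in both runs, but only takes the electors once 1 in 100 voters in each state switches sides. A national headline number cannot show that kind of knife-edge.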
Polling also relies on the pollster being able to use experience as well as expertise to predict accurately. Pollsters have to rely on historical trends to correct issues in many of their models. Analytics has advanced tremendously in recent years, allowing us to build historical patterns into the algorithms that capture nuances in order to predict the result. This is where we do have an issue -- as shown in this election and the Brexit vote, there was no comparable historical data to draw on.
If you look at the composition of support for Donald Trump, it was not a regular Republican victory. That's because he wasn't a regular Republican candidate. At times like these, you can despair of predicting anything, because your model assumptions do not hold. I found it interesting that pollsters were still struggling to read their models right up until election day -- I wouldn't be surprised if they come out and declare they hadn't corrected appropriately (e.g. they looked at specific groups and modelled these, but when they analyse the data, these groups are not found to be valid predictors). It's a key lesson for any analytics model that uses historical data as an input. Anything too far outside the norm and the prediction can suffer.
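One simple safeguard, sketched here under stated assumptions, is to check whether a new input sits far outside anything the model has seen before trusting its prediction. The historical shares and the threshold of three standard deviations are hypothetical choices for illustration, not a standard any pollster is known to use.

```python
import statistics

def z_score(value, history):
    """How many sample standard deviations `value` sits from the historical mean."""
    return (value - statistics.mean(history)) / statistics.stdev(history)

def outside_the_norm(value, history, threshold=3.0):
    """Flag inputs far outside the historical range (threshold is an assumption)."""
    return abs(z_score(value, history)) > threshold

# Hypothetical share of one voter group backing a party in past elections.
history = [0.52, 0.55, 0.53, 0.54, 0.56]

for this_election in (0.55, 0.67):
    flag = outside_the_norm(this_election, history)
    print(f"{this_election:.2f} -> outside the norm: {flag}")
```

A flag like this does not fix the model, but it does tell you when the historical corrections you are leaning on may no longer apply.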
Also, we didn't have a popularity contest, we had an unpopularity contest. This was yet another factor that made the result hard to predict. The same was true of the Brexit vote, where the issue on the ballot was not really what people thought about when they voted. In both cases you had a protest vote, rather than a pro-vote. That is tricky to deal with in prediction models set up to measure the popularity of a candidate, not the underlying issues, which are not directly linked to the pollsters' results.
My main takeaways from the polls -- don't let a headline number get in the way of the analysis. Any good insight professional knows that one number doesn't tell you the whole story. And when you have a situation that is unprecedented, question whether the "tried and true" is appropriate -- modelling is about building in contingencies for variables, not ignoring them.
Finally, and most importantly, don't believe everything you read or hear. Polls help create and direct headlines at a time when the real result cannot be determined. Catchy headlines and a good story provide just that, and won't necessarily deliver an accurate interpretation of what is going on in the numbers. Leave that to the analytics.