September 13, 2004
Why the Race Is Closer Than People Think
Is Bush ahead by a little or a lot? Is it close to a tie ball game or has Bush surged to a commanding lead?
The conventional wisdom inclines to the latter not the former. The reason has a great deal to do with two persistent problems with contemporary polls that--at least at this point in time--tend to considerably inflate Bush's apparent lead. But once you dissect the available data with these problems in mind, a truer picture of the race comes into focus which suggests that the race continues to be very close.
The two problems are: (1) samples that have an unrealistic number of Republican identifiers and hence tend to favor Bush; and (2) the widespread and highly questionable practice of using likely voters (LVs) instead of registered voters (RVs) to measure voter sentiment this far before the election.
First, the issue of partisan distribution in samples. Lately, and very suddenly, many polls have been turning up more Republican identifiers than Democratic identifiers in their samples--in some cases, many more (as high as a 9-10 point Republican advantage).
How realistic is it to be suddenly turning up a Republican lead on party ID, much less a large one? Not very. The weight of the academic evidence is that, while the distribution of party ID among voters can and does change over time, it changes slowly, not in big lurches from week to week.
And the weight of the empirical evidence is that the distribution of party ID among voters has favored and continues to favor the Democrats. In 2000, the exit polls showed Democrats with a 4 point advantage over Republicans. In 1996, it was also 4 points; in 1992, it was 3 points and in 1988 it was also 3 points.
The data also indicate that there were two shifts in party ID over the 2001-04 period which largely cancelled each other out. The first shift, in the period after 9/11, shaved several points off the Democrats' lead and brought the Republicans close to even (but never ahead) in party ID. The second shift took place in late 2003 and 2004 and reconstituted the Democrats' lead on party ID to about 4 points, exactly where it was in the 2000 election according to the exit polls (see this useful study "Democrats Gain Edge in Party Identification" by the Pew Research Center for more details).
So, if polls are suddenly turning up too many Republican identifiers and that is unrealistic and skews reported horse race results toward Bush, what, if anything, should be done?
One possible solution is to weight poll results by a more reasonable distribution of party ID. The issue of whether to use this approach to the problem is well-summarized by Alan Reifman in his invaluable essay "Weighting Pre-Election Polls for Party Composition: Should Pollsters Do It or Not?" on his website.
As Reifman puts it:
One factor (among many) that may contribute to discrepancies between different outfits' polls in their Bush-Kerry margins....is polling firms' different philosophies as to whether it's advisable to mathematically adjust their samples -- after all the interviews have been completed -- to make the percentages of D's and R's in their survey sample match the partisan composition that is likely to be evident at the polls on Election Day. The latter can be estimated from exit polls from previous elections, party registration figures (in states where citizens declare a party ID when registering to vote), and surveys.
(Another issue that often comes up in evaluating pre-election surveys, with which many of you may be familiar, is whether results are reported for "registered" or "likely" voters. That is a different issue from what is being discussed [in this essay]. Whether a pollster reports results for registered voters, likely voters, or both, weighting by party ID is a separate, independent decision.)
Note well Reifman's point that the issue of whether and how to use LVs, not RVs, to report results is separate from the issue of whether and how to do party-weighting. I discuss the LV issue below after the party-weighting discussion.
Given that party ID does shift some over time, my instinct has generally been to avoid party-weighting if possible and promote a full-disclosure approach. This is how I put it in a recent post:
[B]ecause the distribution of party ID does shift some over time....polls should be able to capture this. What I do favor is release and prominent display of sample compositions by party ID, as well as basic demographics, whenever a poll comes out. Consumers of poll data should not have to ferret out this information from obscure places--it should be given out-front by the polling organizations or sponsors themselves. Then people can use this information to make judgements about whether and to what extent they find the results of the poll plausible.
But this approach increasingly seems unrealistic to me. The polling organizations and sponsors do not routinely release the data I call for and certainly do not prominently display them. And even if they did, the typical consumer of polling data lacks the time and skills to use these data to re-weight or adjust reported results. The fact of the matter is that people pay attention to reported results period; therefore they are at the mercy of whichever results are reported and emphasized (an issue that also looms large in the LVs vs. RVs issue, discussed below).
This suggests that weighting poll results by a reasonable distribution of party ID may be necessary to avoid giving the public distorted impressions of the state of the race.
What is a reasonable distribution of party ID to use in such weighting? One obvious candidate is the exit poll distribution from 2000: 39D/35R/26I. Moreover, the Democratic advantage in this distribution--4 points--closely matches the average Democratic advantage in 2004, as measured by the Pew Research Center (see above) and other polling organizations, making it an even more attractive option.
But political analyst Charlie Cook probably has the best idea, even though it can really only be implemented by the polling organizations themselves: "dynamic party identification weighting". Cook's idea is that polls should weight their samples by a rolling average of their unweighted party ID numbers taken over the previous several months. This would allow the distribution of party ID to change some over time, but eliminate the effects of sudden spikes in partisan identifiers in samples (such as we are experiencing now).
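To make the rolling-average idea concrete, here is a minimal sketch in Python. It is not Cook's actual procedure: the window length, the choice to average only prior polls (excluding the current one), the function name `rolling_party_target` and the sample numbers are all assumptions made up for illustration.

```python
def rolling_party_target(prior_samples, window=3):
    """Average the unweighted party ID shares of the most recent `window`
    prior polls and normalize so the shares sum to 1.

    `prior_samples` is a list of dicts like {"D": 36, "R": 32, "I": 32},
    oldest first. The window length is an assumption, not Cook's figure.
    """
    if not prior_samples:
        raise ValueError("need at least one prior sample")
    recent = prior_samples[-window:]
    parties = recent[0].keys()
    avg = {p: sum(s[p] for s in recent) / len(recent) for p in parties}
    total = sum(avg.values())
    return {p: v / total for p, v in avg.items()}


# Invented unweighted party ID percentages from one pollster's earlier
# surveys (hypothetical, not taken from any real poll).
prior_samples = [
    {"D": 36, "R": 32, "I": 32},
    {"D": 35, "R": 33, "I": 32},
    {"D": 37, "R": 33, "I": 30},
]

target = rolling_party_target(prior_samples)
print({p: round(v, 3) for p, v in target.items()})
```

A new sample that suddenly shows a large Republican edge would be weighted back toward `target`, while a shift that persists over several polls would gradually move the target itself.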
Lacking such a dynamic weighting, however, the best we can probably do at this point is to use the exit poll distribution mentioned above. How much difference would this make if we applied it to recent polls?
Quite a bit. Here are Bush's leads in a number of recent polls, ordered by size of his lead, once the horse race question is weighted by the 2000 exit poll distribution (note: not all recent polls can be included because you need the horse race figures among Democrats, Republicans and independents separately to do this procedure and not all polls release these figures; in addition Zogby and Rasmussen results are party-weighted to begin with and therefore do not have to be re-weighted; RV results used unless only LV results available):
CBS News, September 6-8 RVs: +5
Zogby, September 8-9 LVs: +2
Rasmussen, September 10-12 LVs: +1
Fox News: September 7-8 LVs: +1
Washington Post, September 6-8 RVs: +1
Newsweek, September 9-10 RVs: -2
Gallup, September 3-5 RVs: -4
These data present a clear picture of a tight race, with Bush likely running a small lead, but not the solid--and even large--advantage that has been conveyed to the public.
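For readers who want to see the arithmetic behind the re-weighting, here is a minimal sketch in Python. The target distribution is the 2000 exit poll figure given above (39D/35R/26I); the per-party margins and the sample mix are invented for illustration and do not come from any of the polls listed. The function name `reweighted_margin` is likewise hypothetical.

```python
# 2000 exit poll party ID distribution cited in the post: 39D/35R/26I.
EXIT_POLL_2000 = {"D": 0.39, "R": 0.35, "I": 0.26}


def reweighted_margin(margin_by_party, party_shares=EXIT_POLL_2000):
    """Return the Bush-minus-Kerry margin (in points) when each party
    group's margin is weighted by that group's share in `party_shares`."""
    total = sum(party_shares.values())
    weighted = sum(party_shares[p] * margin_by_party[p] for p in party_shares)
    return weighted / total


# Hypothetical poll: Bush-minus-Kerry margin within each party group.
# These numbers are made up, not taken from any real survey.
margin_by_party = {"D": -80.0, "R": 86.0, "I": 2.0}

# Hypothetical sample composition with a 5 point Republican edge.
sample_mix = {"D": 0.33, "R": 0.38, "I": 0.29}

as_sampled = reweighted_margin(margin_by_party, sample_mix)
exit_poll_weighted = reweighted_margin(margin_by_party)

print(f"margin with the sample's own party mix: {as_sampled:+.1f}")
print(f"margin weighted to 2000 exit poll mix: {exit_poll_weighted:+.1f}")
```

In this made-up case nobody's candidate preference differs between the two calculations. Only the assumed share of Democrats, Republicans and independents changes, and that alone moves the reported margin by several points.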
The other problem that is afflicting the polls and considerably inflating perceptions of Bush's lead is the widespread, and highly questionable, use of LVs, instead of RVs, to report horse race results far in advance of the actual election. The reason why using LVs instead of RVs is a bad idea is simple: the LV approach is being asked to do a job--gauge voter sentiment and how it changes from week-to-week (and even day-to-day)--that it was never designed to do. What the LV approach was designed to do was measure voter sentiment on the eve of an election and predict the outcome. That was, and remains, an appropriate application of the LV approach.
But applied as many polling organizations currently do, it is highly inappropriate and frequently very misleading. As political scientists Robert Erikson, Costas Panagopoulos and Christopher Wlezien put it in their important forthcoming paper, "Likely (and Unlikely) Voters and the Assessment of Campaign Dynamics" in Public Opinion Quarterly:
[E]stimates of who may be likely voters in the weeks and months prior to Election Day in large part reflect transient political interest on the day of the poll, which might have little bearing on voter interests on the day of the election. Likely voters early in the campaign do not necessarily represent likely voters on Election Day. Early likely voter samples might well represent the pool of potential voters sufficiently excited to vote if a snap election were to be called on the day of the poll. But these are not necessarily the same people motivated to vote on Election Day.
And of course, since the group of people "sufficiently excited to vote if a snap election were to be called on the day of the poll" changes from poll to poll, it raises the uncomfortable possibility that observed changes in the sentiments of "likely voters" represent not actual changes in voter sentiment, but rather changes in the composition of likely voter samples as political enthusiasm waxes and wanes among the different parties' supporters. Or, as Erikson et al. put it:
At one time, Democratic voters may be excited and therefore appear more likely to vote than usual. The next period the Republicans may appear more excited and eager to vote. As Gallup’s likely voter screen absorbs these signals of partisan energy, the party with the surging interest gains in the likely-voter vote. As compensation, the party with sagging interest must decline in the likely-voter totals.
And this is exactly what their analysis of Gallup data from the 2000 election finds--"shifts in voter classification as likely or unlikely account for more observed change in the preferences of likely voters than do actual changes in voters’ candidate preferences".
This is an important result and helps nail down what has always been disturbing about the use of likely voter methods far in advance of the actual election. Instead of giving you a better picture of voter sentiment and how it is changing than conventional RV data, it gives you a worse one since true changes in voter sentiment are swamped by changes in who is classified as a likely voter.
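A toy calculation in Python shows how this can happen. It is only an illustration of the mechanism, not the method Erikson et al. use, and every number in it is invented: two equally sized blocs of registered voters whose candidate preferences never change, with only the share of each bloc passing a likely voter screen varying between two poll waves.

```python
def rv_and_lv_margins(groups):
    """Return (RV margin, LV margin) as Bush-minus-Kerry points.

    Each group is a dict with:
      share   - the group's share of registered voters
      margin  - Bush-minus-Kerry margin inside the group, in points
      lv_rate - fraction of the group that passes the likely voter screen
    """
    rv_total = sum(g["share"] for g in groups)
    rv = sum(g["share"] * g["margin"] for g in groups) / rv_total
    lv_total = sum(g["share"] * g["lv_rate"] for g in groups)
    lv = sum(g["share"] * g["lv_rate"] * g["margin"] for g in groups) / lv_total
    return rv, lv


# Wave 1 (invented): Kerry-leaning voters are the more enthusiastic bloc.
wave1 = [
    {"share": 0.5, "margin": 80.0, "lv_rate": 0.60},   # Bush-leaning bloc
    {"share": 0.5, "margin": -80.0, "lv_rate": 0.70},  # Kerry-leaning bloc
]
# Wave 2 (invented): same preferences, but enthusiasm has flipped.
wave2 = [
    {"share": 0.5, "margin": 80.0, "lv_rate": 0.70},
    {"share": 0.5, "margin": -80.0, "lv_rate": 0.60},
]

rv1, lv1 = rv_and_lv_margins(wave1)
rv2, lv2 = rv_and_lv_margins(wave2)
print(f"RV margin: {rv1:+.1f} -> {rv2:+.1f}")
print(f"LV margin: {lv1:+.1f} -> {lv2:+.1f}")
```

In this invented example the RV margin is identical in both waves, while the LV margin swings by double digits purely because of who is classified as likely to vote.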
Does this matter? You bet it does. When Gallup told the world on September 6 that Bush was leading Kerry by 7 points among LVs, the world listened and absorbed that figure as a trustworthy indicator of where the race was. Completely lost, except to those who bother to look at such things, was the Gallup finding that Bush led by only a single point among RVs--in other words, that the race was about tied. Gallup and its sponsoring organizations implicitly and explicitly encouraged people to treat the LV finding as the real story and the RV finding as an unreliable afterthought (after all, those voters aren't "likely"!). The incredible irony, of course, is that the real situation was exactly the reverse: as the Erikson et al. findings suggest, it was the RV data that provided the best gauge of voter sentiment and the LV data that should have been an unreliable afterthought.
Or take the Gallup data gathered in the last two months in Ohio, perhaps the key state in this election and the subject of endless media stories about "the battle for Ohio". On September 8, Gallup released data showing Bush ahead of Kerry by 8 points among LVs in Ohio, a 14 point swing from late July when Kerry led by 6. Again, completely lost in the Gallup, newspaper and television reports on the poll was the poll's finding that Bush had just a 1 point lead among RVs in the state, representing a much more modest swing of 6 points since late July.
Guess which figures are still with us as coverage of the battle for Ohio continues? That's right: Bush's 8 point lead among LVs and 14 point swing. In fact, just this Sunday, The New York Times practically built its Ohio campaign story around these figures, which showed just how well Bush is doing! And just how much the situation has changed!
In short, these LV figures, especially from Gallup, are contributing mightily to the impression that Bush has built a substantial lead and is even surging ahead in some of the key swing states. But, as we have seen, these LV data are fundamentally inappropriate for measuring the state of the race, and how it is changing, this far ahead of election day. For that, you need the RV data and they suggest something far different: the race is damn close and Bush's substantial lead is a myth.
Posted by Ruy Teixeira at 11:42 PM