Poll Results May 18 2019

By Bill James

May 18, 2019

Julian Castro and Tulsi Gabbard were the winners in yesterday’s poll; Michael Bennet was the main candidate who did not do well, and Jeff Flake (a) is not actually a candidate, but (b) didn’t do too well, either, although he wasn’t off as much as Bennet was. In this week’s polls I am polling in "heats" grouped by their strength of representation in previous polls, so the four candidates in each group, particularly yesterday’s group, came in with about equal expectation, each expected to get about 25% of the vote. In the early voting Julian Castro was dramatically outperforming expectations, getting 40% at one point while Tulsi Gabbard was down around 17%, but then Gabbard started moving up, and at the end of the day Gabbard and Castro were at 35% each, with Gabbard indicated by Twitter’s color coding as the winner of the poll, meaning that she had the higher 35%. I presume that what happened was that the poll got re-tweeted into a pool of Gabbard supporters somewhere, and somebody took the initiative to urge people to support her in the poll. She made news yesterday with her social media campaign (not the kind of news you want to make), but this could be a second indication that her campaign is social-media savvy. Anyway, here are the updated poll standings:

Rank	First	Last	Current	Change
1	Elizabeth	Warren	1234
2	Pete	Buttigieg	1031
3	Joe	Biden	952
4	Kamala	Harris	897
5	John	Kasich	711
6	Stacey	Abrams	549
7	Beto	O'Rourke	535
8	Bernie	Sanders	499
9	Donald	Trump	469
10	Cory	Booker	453
11	Amy	Klobuchar	410
12	John	Hickenlooper	349
13	Bill	Weld	276
14	Kirsten	Gillibrand	249
15	Andrew	Yang	237
16	Julian	Castro	209	UP
17	Howard	Schultz	199
18	Tulsi	Gabbard	193	UP
19	Jeff	Flake	180	DOWN
20	Jay	Inslee	165
21	Tim	Ryan	158
22	Michael	Bennet	156	DOWN
23	Eric	Swalwell	117
24	Mike	Gravel	79
25	John	Delaney	73
26	Seth	Moulton	70
27	Marianne	Williamson	36
28	Wayne	Messam	34

Gabbard reportedly had a social media "enemies list". The term "enemies list" is emotionally loaded because it invokes the memory of Richard Nixon using the powers of government to go after his enemies, but there is actually nothing at all wrong with a political campaign keeping track of those who don’t like them. It is in the nature of politics that somebody is actively trying to see that you lose. It’s just common sense to be aware of who is working to try to defeat you.

I had a question from a reader in regard to yesterday’s poll:

Bill, is there a way to translate these poll standings into a 2-person contest - say, Warren vs Trump? (at least as far as it is reflecting your Twitter readers, of course.) Or a 3-person contest- say, if Schultz puts himself on the ticket as an Independent in a Warren/Trump contest?

The theory of the method is that the ratio of the numbers represents how they would do against one another in a future poll. With Warren at 1234 and Trump at 469, a ratio of 72 to 28, Warren (in theory) would beat Trump 72 to 28 in a head-on poll. If you added Schultz (199) to the poll, that would make it Warren 65, Trump 25, Schultz 10.
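As a minimal sketch of that arithmetic, the snippet below turns standing scores into expected shares by dividing each score by the total of the candidates in the contest. The function name `poll_shares` is made up for illustration; the scores are the ones from the standings table.

```python
def poll_shares(scores):
    """Convert standing scores into expected percentage shares.

    Each candidate's share is simply their score divided by the
    combined score of everyone in the hypothetical poll.
    """
    total = sum(scores.values())
    return {name: 100 * value / total for name, value in scores.items()}


# Two-person contest: Warren vs. Trump
two_way = poll_shares({"Warren": 1234, "Trump": 469})

# Three-person contest: add Schultz as an Independent
three_way = poll_shares({"Warren": 1234, "Trump": 469, "Schultz": 199})

for label, shares in (("Two-way", two_way), ("Three-way", three_way)):
    print(label, {name: round(pct) for name, pct in shares.items()})
# Two-way {'Warren': 72, 'Trump': 28}
# Three-way {'Warren': 65, 'Trump': 25, 'Schultz': 10}
```

The same function works for any subset of the 28 candidates, since only the ratios between scores matter.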

In reality, Warren would probably beat Trump a little worse than 72-28, probably more like 76-24, among my twitter followers. The reason this is true is that in the daily polls, since the list of 28 candidates is 23 Democrats, 4 Republicans and 1 Independent, Warren is almost always splitting the "Democratic" vote, while Trump is very rarely splitting the "Republican" vote. This does not cause Warren’s support to be under-estimated or Trump’s to be over-estimated, but it does cause Warren’s support to be under-estimated relative to Trump’s, because it doesn’t pick up on the "second choice" effect. Those who are voting for Kamala Harris in my poll are not likely to switch to Donald Trump if Harris is not in the poll; they are almost certain to switch to Elizabeth Warren.

It is not the purpose of the poll to predict the election, because, if it were, we would surely fail. The purpose is to build understanding. The point is to establish the strength of one candidate in relation to another, and thus track how they are doing, who is moving up, who is falling behind, who is a third-tier candidate and who is fourth-tier, etc.

When I post each day’s poll I make an internal prediction for who will get what percentage in that poll, based on the previous polls. I don’t usually publish these, because publishing a prediction about the poll could be construed as influencing the voting, but I do make my own prediction for how the poll will go. For the poll of May 14, this was the prediction:

Candidate	Predicted
Abrams	29
Kasich	27
O'Rourke	22
Sanders	21

And this is how the poll actually went (poll number 35, with the prediction repeated for comparison):

Poll 35	Abrams	Kasich	O'Rourke	Sanders
Predicted	29	27	22	21
Actual	20	34	22	23

This is the same data for the polls of May 15, 16 and 17:

Poll 36	Booker	Hickenlooper	Klobuchar	Trump
Predicted	24	26	22	28
Actual	36	10	28	27

Poll 37	Gillibrand	Schultz	Weld	Yang
Predicted	22	24	31	23
Actual	43	12	20	25

Poll 38	Bennet	Castro	Flake	Gabbard
Predicted	24	26	25	24
Actual	13	35	17	35

You can see that the predictions for today’s poll, based on previous polls, are never completely accurate, but that they’re never entirely crazy, either. We always get some things right and some things wrong.
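The size of those misses can be tabulated with a short sketch. It assumes that, for each poll above, the first row of numbers is the prediction and the second row is the actual result; the names `polls`, `misses` and `mean_abs_miss` are made up for illustration.

```python
# (predicted, actual) percentages for polls 35 through 38,
# ASSUMING the first row of each pair is the prediction.
polls = {
    35: {"Abrams": (29, 20), "Kasich": (27, 34), "O'Rourke": (22, 22), "Sanders": (21, 23)},
    36: {"Booker": (24, 36), "Hickenlooper": (26, 10), "Klobuchar": (22, 28), "Trump": (28, 27)},
    37: {"Gillibrand": (22, 43), "Schultz": (24, 12), "Weld": (31, 20), "Yang": (23, 25)},
    38: {"Bennet": (24, 13), "Castro": (26, 35), "Flake": (25, 17), "Gabbard": (24, 35)},
}


def misses(poll):
    """Actual minus predicted, in percentage points, per candidate."""
    return {name: actual - predicted for name, (predicted, actual) in poll.items()}


def mean_abs_miss(poll):
    """Average size of the miss, ignoring direction."""
    diffs = misses(poll).values()
    return sum(abs(d) for d in diffs) / len(diffs)


for number, poll in polls.items():
    print(number, misses(poll), f"mean miss: {mean_abs_miss(poll):.2f}")
```

A positive number means the candidate beat the prediction; a negative number means the candidate fell short.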

In yesterday’s poll, Michael Bennet fell 11 points short of expectations, and Tulsi Gabbard beat expectations by 11 points. These discrepancies could be because:

(a) The previous polling had not yet accurately measured the strength of the candidate’s support,

(b) This particular poll was atypical, or

In regard to Bennet and Gabbard, I am relatively confident that Bennet was down because his previous measurement, his previous number, was too high. My previous polling for him, while not wildly inaccurate, wasn’t entirely accurate. He had been polled only three times previously, and one of those three was an atypical poll in which there were two Republicans and a very weak Democrat, and this caused Bennet’s share of the vote to be atypically high. In the case of Tulsi Gabbard, I am relatively sure that her +11 result from yesterday indicated a social media bubble; somebody who was in her camp caught the poll and urged supporters to get there and vote for her.

But it doesn’t make any difference. It doesn’t matter what it was. If Bennet’s previous number was inaccurate, we’ll still get it right over time, with repeated polling. If Gabbard’s number is misleading because of a social media effort, then either she will not be able to repeat this in future polls, in which case it won’t matter, or else she WILL be able to repeat this in future polls, which will mean that it is a REAL strength, as opposed to an illusion, and that it SHOULD be measured as such.

The situation is very much like this. A great many people, including those who call themselves statisticians or statistical analysts, don’t have any real understanding of how statistics work. In baseball, sometime in the 1980s I decided I wanted to track the ability of different baserunners to go from first to third on a single. A lot of other people said "Yeah, that’s great, why don’t we track that." But for a long, long time—25 years or so—people wanted to "filter" the data as they were recording it. First, people didn’t think we should count those "opportunities" to go first to third when there was also a runner on SECOND base, because, if the runner from second doesn’t score, then the runner from first can’t go to third, so we don’t count those. Second, runners very rarely go from first to third on a single to LEFT field, whereas they commonly go from first to third on a single to right field, so people wanted to filter out the base hits to left field, and just focus on the base hits to right field. And, of course, people don’t go from first to third on an INFIELD hit, so you have to filter those out, and then there are four or five other reasons that people wanted to filter out of the data this play or that one.

It took me 25 years to convince people—no no no; you don’t do ANY of that. Just leave the data alone; don’t filter out plays with another runner on base, hits to left field, infield hits; just leave it alone. If you leave the data alone, those things will take care of themselves. A regular player has 25 to 40 chances to go first to third on a single over the course of a season. Francisco Lindor last year was 14-for-27 going first to third on a single; Victor Martinez was 1-for-18. That’s what you need to know; the data tells you what you need to know.

If you just count everything, the data will sort itself out, and it will be apparent who is where. But if you DO filter stuff out, then you’re going to eliminate 80% of the trials, so you’re going to have one guy at 2-for-6 and the other guy at 3-for-5 and the league leader at 4-for-7; you don’t have anything.
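A rough way to see why the filtered numbers tell you so little is to compare the uncertainty in a success rate at full sample size with the uncertainty after 80% of the trials are thrown away. The sketch below uses the ordinary binomial standard error, which treats every chance as an independent trial with the same odds; that is a simplifying assumption for illustration, not something measured here, and the function name is hypothetical.

```python
import math


def rate_standard_error(rate, attempts):
    """Binomial standard error of a success rate over `attempts` trials.

    ASSUMES independent trials with a constant success probability.
    """
    return math.sqrt(rate * (1 - rate) / attempts)


lindor_rate = 14 / 27  # Lindor's first-to-third rate from the text

# All 27 chances counted
full = rate_standard_error(lindor_rate, 27)

# Same rate, but with 80% of the chances filtered out
filtered = rate_standard_error(lindor_rate, 27 * 0.2)

print(f"unfiltered: +/- {full:.3f}")
print(f"filtered:   +/- {filtered:.3f}")
print(f"filtered uncertainty is {filtered / full:.2f}x larger")
```

Under those assumptions the unfiltered rate is uncertain by roughly 10 percentage points and the filtered rate by roughly 22, so cutting the sample to a fifth more than doubles the noise.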

Just this week, I had the same discussion with a nice gentleman who didn’t understand how we could measure objectively a catcher’s ability to block a pitch in the dirt/prevent a runner from advancing. It’s dead simple: (a) is there a runner on base, (b) is the ball in the dirt, and (c) does the runner advance? There’s no subjective issue here.

But the gentleman was thinking "Balls in the dirt are all different. All of these plays are different. Sometimes the runner is Billy Hamilton; he’s going to advance. Other times it is Kendrys Morales; he’s not going anywhere. Sometimes a ball in the dirt is two feet outside; sometimes it is right at the catcher. Sometimes it takes an unpredictable hop; sometimes it bounces right to you. They’re all different."

They’re all different as individual plays. But if you don’t filter ANY of that out, the data will take care of itself. The data will tell you how good the catcher is at blocking balls in the dirt—so long as you don’t try to filter the input data. If you filter the input data, you’re not REMOVING bias; you’re INTRODUCING a bias. You’re preventing the process of data collection from working the way it is supposed to work.

And this is the exact same problem, with Bennet and Gabbard: people are worrying about filtering the input data. It’s a systemic problem in polling. One of the biggest problems with poll data is that people worry WAY too much about filtering the input data, and not nearly enough about collecting as much data as they can. If you collect enough data, many of those things will sort themselves out.

COMMENTS (3 Comments, most recent shown first)

OldBackstop
"In reality, Warren would probably beat Trump a little worse than 72-28, probably more like 76-24, among my twitter followers."

Kay, among professional polling this week, that is basically a dead heat.

Do you think Warren, from Mass, is overly represented among your followers, who would include a disproportionate amount of Red Sox fans?

If you just want to point out that you are only talking about your twitter followers, again, then that is certainly your prerogative, but it seems important to me.
12:12 AM May 21st

nettles9
Correction: Bravo!! Well-said, Mr. Bill, sir.
2:47 PM May 18th

nettles9
Bravo!! Well-stayed, Mr. Bill, sir.
1:32 PM May 18th
