
The MVP Vote Bias Detector Part I

By Bill James

November 23, 2011

Theory and Explanation

Every category has a won-lost record in MVP voting. That’s the basic idea of this article; RBI have a won-lost record in MVP voting. Batting Average has a won-lost record in MVP voting. The categories that the voters "like"—the things that they over-rate and over-value—have "winning" records. The things that they under-rate and under-value have losing records. That’s the idea.

Ryan Braun and Matt Kemp this year had 37 Win Shares each. Actually, Braun and Kemp have very similar stats across the board; they are separated by just 6 runs, 8 hits, 5 doubles, 2 triples, 6 homers, 15 RBI, 16 walks, 7 stolen bases, 8 points in batting average, 2 points in on base percentage, 11 points in slugging percentage, 9 points of OPS... there really isn’t very much difference between them anywhere except in strikeouts (Kemp struck out 66 more times).

The line of thought that resulted in this article started with Braun and Kemp (and with a question from a reader about Braun and Kemp), but the fact that their numbers are so much the same makes this a bad case to use as an illustration. I need to find a case where two players’ value is the same but their stats are very different. 1997 National League would be good; the two best players in the league were Mike Piazza and Tony Gwynn, with 39 Win Shares each. That would be a good illustrative case, except that the MVP Award was won by a third player, Larry Walker, who had 32 Win Shares but stats inflated by playing in Colorado.

Keep looking, keep looking... 1991 National League is the same; the best players were Barry Bonds and Ryne Sandberg, with 37 Win Shares each, but the MVP Award was won by Terry Pendleton. 1979 National League is another one; the best players in the NL were Dave Winfield and Mike Schmidt, with 33 Win Shares each, but the MVP Award vote ended in a tie, between Keith Hernandez and Willie Stargell. 1977 National League is another one; the co-equal "true" MVPs were Mike Schmidt and Dave Parker, with 33 Win Shares each, but the MVP Award was won by George Foster, who had 32 Win Shares.

OK; I give up. Let’s go back to 1962 National League. In the 1971 movie The Steagle there is a funny scene in which Richard Benjamin, a college professor, is teaching a boring class to bored students on something that he doesn’t really care anything about. It’s the time of the Cuban missile crisis, and Benjamin thinks that the world is about to blow up and he’s wasting his life doing something he doesn’t actually care anything about, so in the middle of his lecture he suddenly starts ranting about the 1962 National League MVP Award voting. The Award was given to Maury Wills, who hit .299 with 6 homers, 48 RBI for a second-place team, in preference to Willie Mays, who hit .304 with 49 homers, 141 RBI for a first-place team, which Benjamin’s character thinks is a terrible injustice.

Let us assume for the sake of argument that Willie Mays and Maury Wills are the only two candidates for the 1962 National League MVP Award, which is not true, and let us assume that Mays and Wills are of equal value, which is not true, either, but history has conspired to deprive us of a clean example with which to illustrate a simple point. Mays out-homered Wills, 49-6, but Wills had 104 stolen bases to Mays’ 18. The vote for Wills, rather than Mays, then, can be seen as a vote for stolen bases, or as a vote against home runs. Since Wills played shortstop and Mays played center field, the vote for Wills rather than Mays may also be seen as a vote for shortstops, but against center fielders. Since Wills played in Los Angeles (a bigger city) and Mays played in San Francisco (not a small city, either, but smaller than Los Angeles) the vote for Wills may be seen as a vote for a big-city player, and against a smaller-city player. Wills had more hits than Mays (208-189), but Mays had more RBI than Wills (141-48), so the vote for Wills can be seen as a vote for hits and against RBI. Mays had more doubles, Wills had more triples, so the vote... oh, you’ve got it now? Sorry. Sometimes I’m not quite sure when the horse is completely dead.

My first thought here was that we could learn something about the biases and preferences of MVP voters by focusing on the cases in which players were of essentially equal value—like Kemp and Braun—and asking ourselves which "categories" the voters choose when the players are equal. Do they tend to choose doubles, or triples? Do they tend to choose on base percentage, or slugging percentage? Do they tend to choose walks, or stolen bases?

That might have been an interesting study just on that level, but the problem is that there aren’t that many cases in baseball history where two players are of equal value and one of them wins. There are a few cases in which two players are of equal value and some other yahoo wins, and there are a handful of cases in which two players are of equal value and one of them wins, like the American League in 1938, but that would have been a very limited study.

What opened the study up and made it something more than that is the realization that you can use this technique even when the players are not equal—like the National League in 1962. Of course Maury Wills in 1962 was not the equal of Willie Mays; Maury Wills was not the equal of Willie Mays on the best day of his life. (The author is well aware that there is no doubt some day on which Willie Mays went 0-for-4 with 4 strikeouts while Wills in the same game went 4-for-5 with three stolen bases and 4 runs scored; I’m aware. Don’t send me letters.) Anyway, we have Mays in 1962 with 41 Win Shares, Wills with 32. Richard Benjamin was right. Richard Benjamin forgot about Frank Robinson, who also had 41 Win Shares, but we’ll get there. Forget I mentioned that; we’re still pretending that this is a two-man contest between Wills and Mays.

When the two players are not equal in fact, that doesn’t give us less information about the biases of the MVP voters; that gives us more information about the biases of the MVP voters. I started by counting it as a "win" for stolen bases (in the case of Wills vs. Mays) but a "loss" for home runs. But then I realized that it’s a bigger win for stolen bases (and a bigger loss for home runs) when the players are not equal.

The next step, then, was to count it as one point for stolen bases (one point against home runs) when the players were equal, plus one point for each difference of one Win Share. I said one point for and one point against; we actually count it as WINS for stolen bases and LOSSES for home runs. Since Mays is 9 Win Shares better than Wills, we count Wills’ MVP triumph as 10 wins for Stolen Bases (1 if they were tied, 9 more because Wills is actually 9 Win Shares behind), and 10 losses for Home Runs.

But that’s not quite right, is it? Take the case of the 1977 MVP vote, where the MVP, George Foster, has one Win Share less than Dave Parker. Foster led the league in home runs, with 52; Parker led in batting average, at .338, thus the vote may be seen as Wins for home runs, and Losses for batting average. However, there’s just one Win Share between them, 33-32. In reality, we’re not really sure that Parker was better than Foster. That would be our best estimate, but... one Win Share, you can’t really be sure. 41 to 32, like the 1962 contest, you can be pretty confident you’ve got the right man; 33 to 32, not so much. But in this system, we’re doubling the advantage for Home Runs (and doubling the losses for Batting Average) based on a difference in Win Shares so small that we can’t really be sure there’s a real difference. If it was 33 to 31, we would be tripling the value, based on a difference that still might not be real or accurately measured.

So I changed the won-lost counts, in this way: If two players were equal in Win Shares but one of them won the MVP Award, I counted that as three wins for the categories in which the winning player had an advantage. Then, for each Win Share difference between the two players, I added a point (added a win or added a loss.)
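The two weighting rules just described can be put side by side in a few lines of Python. This is only a rough sketch: the function names are invented for the illustration, the gap is assumed to be zero or more (the MVP is never ahead of the player he is compared with), and the Win Share totals are the ones quoted in this article.

```python
def weight_first_attempt(win_share_gap):
    """Earlier rule: 1 point for a tie, plus 1 per Win Share of gap."""
    return 1 + win_share_gap


def weight_final(win_share_gap):
    """Final rule: 3 points for a tie, plus 1 per Win Share of gap."""
    return 3 + win_share_gap


# Gap = (passed-over player's Win Shares) - (MVP's Win Shares),
# using the totals quoted in the article.
gaps = {
    "1962 NL, Mays 41 vs. Wills 32": 41 - 32,
    "1977 NL, Parker 33 vs. Foster 32": 33 - 32,
}

for label, gap in gaps.items():
    print(label,
          "| first attempt:", weight_first_attempt(gap),
          "| final:", weight_final(gap))
```

Under the first rule a single Win Share of difference doubles the weight of a dead tie (2 against 1). Under the final rule it only moves the weight from 3 to 4, while the 1962 gap of 9 still counts for a lot (12).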

This, then, is the final system; I have reached the end of my explanation of the system, except for nagging details. (If you over-explain a nagging detail, could you say that you are beating a dead nag?) Anyway, Maury Wills in 1962 played 165 games; Willie Mays played 162, so we count that as 12 Wins for Games Played (3 + 9).

Wills had more at bats than Mays, so we count that as 12 Wins for At Bats.

Wills and Mays had the same number of Runs Scored (130 each), so we make no entry for Runs Scored.

Wills had more hits than Mays, so we count that as 12 Wins for Hits.

Mays had more doubles than Wills, but lost the MVP contest, so we count that as 12 Losses for Doubles.

Wills had more triples than Mays, so we count that as 12 Wins for Triples.

Mays had more homers than Wills, so we count that as 12 Losses for Homers.

Mays had more RBI than Wills, so we count that as 12 Losses for RBI.

Mays had more walks than Wills, so we count that as 12 Losses for Walks.

Wills had fewer strikeouts than Mays, so we count that as 12 Wins for Strikeouts.

Wills had more stolen bases than Mays, so we count that as 12 Wins for Stolen Bases.

Mays had a higher batting average than Wills, so we count that as 12 Losses for Batting Average.

Mays had a higher on base percentage than Wills, so we count that as 12 Losses for On Base Percentage.

Mays had a higher slugging percentage than Wills (slightly), so we count that as 12 Losses for Slugging Percentage.

Mays had a higher OPS than Wills, so we count that as 12 Losses for OPS.
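The bookkeeping in the list above can be sketched in Python. This is an illustration under stated assumptions, not the actual program behind the study: the `tally` function and its record layout are hypothetical, and only the 1962 figures quoted in this article are entered, so categories the article describes without numbers (at bats, doubles, walks and so on) are left out.

```python
from collections import defaultdict

# Only the 1962 figures quoted in the article are used here.
WILLS = {"Games": 165, "Runs": 130, "Hits": 208, "Home Runs": 6,
         "RBI": 48, "Stolen Bases": 104, "Batting Average": 0.299}
MAYS = {"Games": 162, "Runs": 130, "Hits": 189, "Home Runs": 49,
        "RBI": 141, "Stolen Bases": 18, "Batting Average": 0.304}

# Categories where the smaller number is the advantage
# (the article treats strikeouts this way).
LOWER_IS_BETTER = {"Strikeouts"}


def tally(winner, loser, winner_ws, loser_ws, record=None):
    """Add wins and losses per category for one MVP vs. one passed-over player."""
    if record is None:
        record = defaultdict(lambda: [0, 0])  # category -> [wins, losses]
    weight = 3 + (loser_ws - winner_ws)  # 3 for a tie, plus 1 per Win Share
    for cat in winner.keys() & loser.keys():
        w, l = winner[cat], loser[cat]
        if w == l:
            continue  # equal totals: no entry for this category
        winner_has_edge = (w < l) if cat in LOWER_IS_BETTER else (w > l)
        record[cat][0 if winner_has_edge else 1] += weight
    return record


record = tally(WILLS, MAYS, winner_ws=32, loser_ws=41)
for cat in sorted(record):
    wins, losses = record[cat]
    print(f"{cat}: {wins}-{losses}")
```

Passing the same `record` back into `tally` for another award would accumulate the won-lost totals across seasons, which is the form the category records take in the rest of the study.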

And there are a lot of details that I haven’t explained, but I think I’d better move ahead, or I may lose the audience.

If the player who "should" win the MVP Award (by Win Shares) does in fact win the Award, that’s a non-event from our standpoint, since it reveals no evidence of bias if the right man wins. Since the BBWAA MVP vote began in 1931, and not including 2011, there have been 160 MVP Awards (going to 161 players, with the tie in the 1979 voting). Of those 160 awards, 60 have in fact gone to the player who led his league in Win Shares. The other 100 awards "contribute" to the study; that is, the other 100 Awards can be studied for the bias that they may reveal toward one type of player or another.

We can sort awards into four categories:

a) Best Man Wins,

b) There’s a tie in league leadership in Win Shares, and ONE of the best men wins,

c) There’s a "near tie"; the MVP doesn’t LEAD the league in Win Shares, but does come within three points of so doing (assuming that three Win Shares or less are too few to assert with any confidence that some other player should have won), and

d) Bad Selections (not meaning really that it is a bad selection, because after all, maybe the voters were right and we’re wrong, but meaning simply that there is a clear disagreement between who the voters chose and who Win Shares would have chosen.)
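The four-way sort above can be written as a small Python function. This is a sketch of the rule as stated, not the code used for the study; the function name, the labels and the `near_tie_margin` parameter are assumptions made for the illustration.

```python
def classify_award(mvp_ws, other_ws, near_tie_margin=3):
    """Sort one MVP award into the four groups.

    mvp_ws   -- Win Shares of the player who won the award
    other_ws -- Win Shares of the other leading players in the league
    """
    best_other = max(other_ws)
    if mvp_ws > best_other:
        return "a) best man wins"
    if mvp_ws == best_other:
        return "b) tie, one of the best men wins"
    if best_other - mvp_ws <= near_tie_margin:
        return "c) near tie"
    return "d) bad selection"


# Win Share totals quoted in the article.
print(classify_award(32, [33, 33]))  # 1977 NL: Foster vs. Schmidt and Parker
print(classify_award(32, [41, 41]))  # 1962 NL: Wills vs. Mays and Robinson
print(classify_award(32, [39, 39]))  # 1997 NL: Walker vs. Piazza and Gwynn
```

By this rule the 1977 award lands in the near-tie group, while 1962 and 1997 are clear disagreements between the voters and Win Shares.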

Of the 160 awards, the count is 60-6-26-68; 60 good awards, 68 bad awards, 6 ties and 26 near-ties. That can be summarized as 47% agreement with the award. For what it is worth, the agreement has grown slightly over time. The agreement in the first ten years of the vote was actually very good (51%), which disguises the fact that several of the "bad awards" were really bad awards, meaning that the winner not only didn’t lead the league in Win Shares, he wasn’t anywhere near leading. After that pretty good start (1931-1940), the agreement dropped to 38% in the 1940s (1941-1950) and 35% in the 1950s (1951-1960). For the first 40 years of the vote the agreement was 42%; for the second 40 years it was 51%. The agreement for the last ten years has been 55%, a figure exceeded only by the 1970s, when it was 63%.

COMMENTS (3 Comments, most recent shown first)

raincheck
This is an interesting topic. But the case of Wills vs. Mays is a pretty unique one to use as an example. Mays was CLEARLY the better player. But Wills did something unique. He set the all time steals record for a season. His 104 was more than 3 times either the AL leader (Aparicio with 31) or the NL runner up (teammate Willie Davis with 32). Voters were mesmerized and went for Wills. So applying some steal/vote math on that misses the point. It was the excitement around what Wills had done, not the numbers per se.
9:16 AM Nov 25th

bjjp2
Couldn't you accomplish the same thing with a standard regression analysis?

10:33 PM Nov 23rd

yorobert
maybe i'm missing something, but shouldn't the agreement percent be 57.5% (60+6+26/160)?
5:43 PM Nov 23rd
