The Perfect Voting Structure
The purpose of this research is to estimate how often a group of MVP voters will wind up getting the vote right, given different sets of voting parameters. In other words, if we vote this way with this number of voters, we’ll wind up getting the right MVP xx% of the time, whereas if we vote that way with that number of voters, we’ll wind up getting the right MVP xx% of the time.
I am studying that problem through a series of models. As I do this, I’ll have to explain exactly how the research was done, not because this is interesting but because someone else, at some point in the future, may want to follow up on the research, and it is better if he or she does not start out at zero.
This issue could of course be studied in other ways, but there are two advantages to modeling the problem. One is that it greatly expands the sample size. We have a hundred years of real-life history with MVP votes; we can create—and I have created—thousands of years of simulated votes. The other advantage is that, in a model, we absolutely KNOW who the Most Valuable Player is. In real life, you may have your thoughts about who should be the MVP, I may have mine, but there is no absolute knowledge about the subject, thus no way to say for certain whether the voting system got the answer right or wrong. Combining these advantages, no one can say with much confidence whether the MVP vote reaches the best conclusion 90% of the time, 70%, 40%. . .no one really knows. Working with a model, we can know.
I have a friend; I’ll call him Jerry because that is not his name. Jerry has some wonderful qualities and some really annoying qualities. One of the latter is that he condemns things that are not the way they were when he learned to love them. He used to like college basketball, but they ruined college basketball for him when they adopted the shot clock and the 3-point basket, so he no longer has any interest in college basketball. When I first knew him he liked movies sometimes, but the last good movie he saw was. . ..I forget what it was; there was a movie he liked in the 1990s. He holds every movie that he sees to the standard of Casablanca, and thus rejects them all. You can go to lunch with him, but there are only like four restaurants in town that he won’t find some reason not to go to, and he’s not real happy with two of those. I’m guessing he doesn’t like poetry because it doesn’t rhyme anymore, so he’s right about that one; a blind pig will get a Valentine once in a while, I’m told. I’m sure you know people like that; you feel so bad for them because they are cutting themselves off from things that they once enjoyed and should still enjoy, but there’s really nothing you can do about it.
So one time, maybe 1985, it was November and they were getting ready to announce the Cy Young Awards. I asked Jerry who he thought should win, but he responded, "Oh, who cares; it’s all bullshit now anyway."
Excuse me?
It turns out that Jerry is upset about the fact that, in voting for the Cy Young Award, each voter votes for three pitchers, not just one like they used to. Through 1969, each voter voted for just one pitcher; whoever had the most votes got the trophy. Obviously it is a matter of time until that system winds up in a tie, and time ran out in 1969, when Denny McLain and Mike Cuellar tied for the American League Cy Young Award with ten votes each out of 24 cast. Nobody remembers this, but in the National League MVP battle that same year, Willie McCovey and Tom Seaver tied for the most first-place votes, with 11 each out of 24. Nobody remembers that because the MVP used a sensible voting structure with 10 names on each ballot and points given for each name, so that the vote didn’t wind up in a tie. But, in Jerry’s world, "Who the hell cares who they think should be second or third? It’s all bullshit."
That’s another of Jerry’s less charming qualities; the man has the analytical skills of Lou Dobbs. I tried to explain to him that, when you collect more information in the vote, you get a more reliable, more valid outcome. You can probably guess how that went.
Anyway, this article deals with that issue: what is the best way to vote for the MVP? How often does the best player actually win the MVP Award, do you suppose? How much better is the voting system now than it was in years past? How could it be done better than it is?
The Basic Model
I "created" 100 players to represent the players in a league, with the value of each player created by the formula:
100 * random * random
That is, 100, times a random number, times another random number. The reason you create player values with two random numbers, rather than one, is that it creates a more realistic distribution of values. If you just create each value as 100 times a random number, then there will be as many players between 90 and 100 as there are between zero and 10. In the real world, of course, there are many more players near the bottom of the value scale than near the top of the scale. The one-random-number alternative over-populates the high end of the scale, thus making it less clear who should be the MVP.
With this system, you have 100 players to choose from but a limited number of high-impact players, and it thus becomes relatively clear who the MVP candidates are—like real life. With 100 players in a league, the best player will usually have a value around 88 to 90, but sometimes the best player will be at 75 (or even lower), and sometimes he will be at 99.5 (or even higher). The average player will have a value a little bit less than 25. Sometimes you will have three players who are almost the same, and sometimes you will have one player who is far better than everybody else—like real life.
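A minimal sketch of this player-generation step, in Python (the function name and defaults here are my own, not taken from any original program):

```python
import random

def make_league(n_players=100):
    """Create n_players actual values as 100 * random * random.

    Multiplying two uniform draws skews the distribution toward the
    bottom: the mean works out to about 25 (100 * 1/2 * 1/2), and
    values near 100 are rare -- so the league has many marginal
    players and only a handful of MVP-grade ones.
    """
    return [100 * random.random() * random.random() for _ in range(n_players)]
```

Running `make_league()` and taking `max()` of the result gives you the "true" MVP for one simulated season.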
Having created the players, we must then create each voter’s perception of each player. I set up the system so that each voter’s perception of each player was the player’s actual value, plus 15 points, minus 30 times a random number. Using "AV" for actual value and "PV" for Perceived Value:
PV = AV + 15 – (30 * random)
So the player’s perceived value—that is, his value as seen by the voter--can be as much as 15 points higher than his actual value, or 15 points lower. A player with an Actual Value of 90 can have a perceived value by the voter as high as 105, or as low as 75. An average player (25) can have a perceived value as high as 40, or as low as 10. The voter will never think that an average player is the best player in the league, and he will never think that a player with a value of 60 is better than a player with a value of 90, but he MAY think that a player with a value of 70 is better than a player with a value of 90.
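The perception step can be sketched the same way (again, the naming is mine, and I am assuming each voter draws an independent error for each player):

```python
import random

def perceived_value(actual_value):
    """One voter's view of one player: PV = AV + 15 - (30 * random).

    The error is uniform on [-15, +15], so a voter can over-rate or
    under-rate any player by up to 15 points, independently of how
    every other voter sees him.
    """
    return actual_value + 15 - 30 * random.random()
```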
Note that these parameters were established mostly to model the problem of identifying the MVP of a league. If you were modeling the Cy Young Award, for example, you could probably use 40 players rather than 100, and if you were modeling the problem of identifying the Most Valuable Player on a team, then you could probably create 10 players, rather than 100. You would have to adjust the model for what you are trying to study.
Bias and Error
Each voter is subject to both bias and error, which may be seen as different things. Bias is systematic, and applies to groups of players. Error may be individual, and apply to only one player. For example, a voter may be a fan of one team, and over-value the members of that team, or he may be prejudiced against a group of players or a type of player, or he may just not want to vote for a pitcher for the MVP, without regard to individual value. That’s bias. The voter may believe that only players from championship teams should win the Award, which is bias in favor of a set of teams. But the voter may also have happened to see games in which a player did not play well or played super-well, or he may have a false belief about the player’s baserunning skill or his defense, or he may believe that someone is a great team leader when in reality the player is just good with the press. That is error.
As bias tends to apply to groups of players, it also tends to be seen in groups of voters. If one person has a bias, probably others have the same bias—for example, the now-discredited belief in the reliability of won-lost records was a form of bias common to all the sportswriters of the 1950s/1960s.
I thought about modeling bias and error separately, but ultimately concluded that bias was simply one form of error—thus, that all of them could be accounted for at one time.
I should also note that statistical systems like WAR and Win Shares are also subject to both bias and error. These systems are built on assumptions, generalizations and estimates which, while hidden within the calculations, are nonetheless forms of bias and error.
While I ultimately decided to model bias and error as one, I bring this up because someone else might pick up this research and execute it in more detail than I have done, and that person might want to create bias and error as separate elements, with bias being shared among voters to a certain extent. We’ll return to the issue of bias later on, with our definition of bias being "shared assumptions which are not valid."
Our First Results
Our first question is, Given this set of assumptions, how often would the individual voter be "right" in selecting the MVP?
In 16,384 trials with this set of assumptions, the "voter" picked the right MVP 8,457 times, and picked a different player 7,927 times. The voter was right 51.6% of the time, and the individual voter was wrong 48.4% of the time.
Reality Check
After I calculated that figure—voter is right 52% of the time—I had the thought that "I wish there was some way to check what the number is in the real world." And then it occurred to me: There is, sort of.
Suppose that you assume that the MVP Award winner is always the right person, the person who should have been selected. If that was the case, then you could get the number we want by asking the objective question, what percentage of all first-place MVP votes go to the person who wins the Award?
From the time the BBWAA began voting on the MVP Awards in 1931 through 2019, there were 4,348 ballots cast in MVP voting. 48 of those ballots are unaccounted for, meaning that in six of the early votes we do not know how many of the voters voted for the winner. 99% of the ballots, however, are accounted for—4,300 out of 4,348.
Of those 4,300 ballots, 2,875 have listed the MVP winner as the #1 man. That’s 66.9%. It’s almost exactly two-thirds. Two-thirds of MVP votes are cast in favor of the eventual winner.
We do not know, of course, that the actual MVP Award winner is the most-deserving MVP candidate. We do not know that, but there are two possibilities. If the deserving MVP is in fact the winner, then he gets 66.9% of the first-place vote. If the deserving MVP is NOT the winner, then he gets LESS THAN 66.9% of the first-place vote.
What we know, then, is that the percentage of the ballots which go to the "right" man cannot be higher than 66.9%--and, unless the right man is ALWAYS selected, then it must overall be LESS THAN 66.9% of the vote.
OK, there is a tiny bit of space there created by the fact that the most-deserving candidate could have gotten a higher percentage of the FIRST-PLACE vote, but a lower percentage of the OVERALL vote. That could in theory happen, but as a practical matter it almost never does—and even when it does, it’s not mathematically significant. The MVP winner has gotten fewer votes than another candidate only once in the last 50 years, and then it was a margin of one vote. As a practical matter, we know that, unless the right candidate is ALWAYS the winner, then the overall percentage of votes which go to the right candidate must be less than 66.9%.
How much less? Well, that depends on how often you think the voters get it wrong. It is my opinion that, while the majority of awards do go to the most deserving candidate, there have been a substantial number of awards which went to the wrong man. In 1958 Mickey Mantle led the American League in WAR by a wide margin—but didn’t draw a single first-place MVP vote. The same year, Frank Lary had the highest WAR for pitchers by a good margin, but was not mentioned in the Cy Young voting. The Cy Young Award winner, Bob Turley, was not in the top 20 in WAR by a pitcher (combining the leagues, as the Cy Young Award was at that time a combined-league award). It is reasonable to argue that everybody got it wrong—everybody in the AL MVP vote, everybody in the Cy Young vote. The NL MVP, Ernie Banks, also did not lead the league in WAR, so one could argue that only 3 out of 24 National League MVP voters got it right. The American League Rookie of the Year, Albie Pearson, had only 0.9 WAR, while other candidates had 2.9, 2.3 and 1.4.
I don’t "know" that these voters all got it wrong; my point is that it is reasonable to argue that, in 1958 at least, the MVP voters were almost unanimous in voting for the wrong candidates. It is possible to argue that the actual percentage of voters voting for the "right" candidate is significantly lower than 66.9%. It cannot be higher than that. 51.6% actually seems to me like a pretty good estimate.
The Three-Judge Panel
Suppose that you have a three-judge panel voting on an award, with each judge listing just one player. How often is that going to arrive at the right result?
Three-judge panels are commonly used for post-season series, in which the award is going to be announced as soon as the series is over. Very often the two in-booth announcers and one other guy, the on-field interviewer or a producer, will just vote quickly on the award. If we assume that each voter is 51.6% accurate, how often will the three of them get it right?
We don’t have to model this one separately; it’s just simple math. If each voter gets it right 51.6% of the time, then:
All three voters will be right 13.75% of the time,
Two of the three will be right 38.67%,
Only one of the three will be right 36.25%, and
All three will be wrong 11.33%.
Obviously, if two or all three of the voters are right, then the right person wins the award, so 52.4% of the time the award will go to the right person outright. If all three are wrong, obviously the award goes to the wrong person. The only complication is that, if only one voter gets it right, then the result could be either a tie in the voting, or an award given to a less-deserving candidate.
If only one of the three voters gets it right, then the award will be given to the wrong man about 30% of the time, and there will be a tie about 70% of the time. Don’t ask me how I know this; it’s just an estimate, and it doesn’t make any difference anyway. It’s just how you split 36.25% when only one of the three voters sees the right answer. If two voters agree on the wrong answer 30% of the time, then 10.9 of those 36.25 will go to a lesser candidate, and the other 25.4 will wind up in three-way ties. Thus, we can estimate that, with a three-judge panel, each person voting only once:
The panel will get it right outright 52.4% of the time,
The panel will get it wrong 22.2% of the time (all three voters wrong, plus the cases in which the two wrong voters agree on the same man), and
There will be a three-way tie, with the most-deserving candidate being one of three winners, 25.4% of the time.
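The arithmetic here is just the binomial distribution with p = .516; a few lines of Python (my own sketch) reproduce the percentages:

```python
p = 0.516                      # chance a single voter picks the right MVP
q = 1 - p

all_right  = p ** 3            # about 0.137: all three voters right
two_right  = 3 * p ** 2 * q    # about 0.387: exactly two right
one_right  = 3 * p * q ** 2    # about 0.363: exactly one right
none_right = q ** 3            # about 0.113: all three wrong

# With two or three voters right, the deserving player wins outright:
print(all_right + two_right)   # about 0.524
```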
Eight-Person Panel, One Vote Each
Suppose that you have an eight-person panel voting on the MVP Award, with each man casting just one vote. I’m not aware that this system has ever been used in Major League baseball, although it is fairly common in amateur leagues. In the old Big 8, for example, the coaches would vote for the Coach of the Year, eight votes; somebody would win 3-2 or 4-2 or something. Suppose that you voted on the MVP that way. How often would that result in the right player being elected?
That system would result in the right MVP being selected about 68.6% of the time, depending on how you score the ties and how you break the ties. You’d have a lot of ties.
I created a model of the problem in the manner outlined before, and ran the process 512 times. The "right" MVP candidate:
Was a unanimous selection 15 times,
Won 7 of the 8 votes 36 times,
Won 6 of the 8 votes 62 times,
Won 5 of the 8 votes 89 times,
Won 4 of the 8 votes 104 times,
Won 3 of the 8 votes 97 times,
Won 2 of the 8 votes 71 times,
Won 1 of the 8 votes 37 times, and
Did not win any of the 8 votes one time.
Obviously, if the "true" MVP gets 5 or more votes then he will win the Award, so there’s 202 times out of the 512 that the true MVP wins the Award.
If the most-deserving MVP gets 4 of the 8 votes, he can either win the Award outright or tie for it. In the 104 times that the most-deserving player got four votes, he won the award outright 86 times, and tied for the award (one other player getting all four of the other votes) 18 times.
If the most-deserving MVP gets 3 of the 8 votes, he can (a) win the Award, (b) lose the Award outright, or (c) finish in a 3-3 tie with another candidate. There were 97 times that the "true" MVP got 3 of the 8 votes. In those 97 trials, the true MVP:
(a) Won the award outright 29 times,
(b) Lost it outright 29 times, and
(c) Tied for the award 39 times.
If the most-deserving MVP gets 2 of the 8 votes, he could, in theory, still win the award, as six other candidates could get one vote each. In 512 trials there were 71 times when the most-deserving candidate got only two votes, but it never happened that this was enough to win the Award anyway. The 71 trials led to 11 ties and 60 outright losses.
The results of the 512 trials with this model are summarized in the following chart:
First Place Votes    Occurs    Wins    Ties    Losses
                8        15      15       0         0
                7        36      36       0         0
                6        62      62       0         0
                5        89      89       0         0
                4       104      86      18         0
                3        97      29      39        29
                2        71       0      11        60
                1        37       0       0        37
                0         1       0       0         1
            Total       512     317      68       127
This voting structure will result in a tie about 13% of the time. If we split the ties and count them as half-victories, then the "right" player wins 351 MVP Awards in 512 trials, or 68.6%.
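Counting ties as half-victories, the 68.6% figure falls straight out of the chart (the numbers below are transcribed from it):

```python
# (first-place votes) -> (occurrences, wins, ties, losses), from the chart
chart = {
    8: (15, 15, 0, 0),
    7: (36, 36, 0, 0),
    6: (62, 62, 0, 0),
    5: (89, 89, 0, 0),
    4: (104, 86, 18, 0),
    3: (97, 29, 39, 29),
    2: (71, 0, 11, 60),
    1: (37, 0, 0, 37),
    0: (1, 0, 0, 1),
}
trials = sum(row[0] for row in chart.values())               # 512 trials
credit = sum(row[1] + row[2] / 2 for row in chart.values())  # ties = half wins
print(credit / trials)   # 0.6855... -> about 68.6%
```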
Sixteen-Person Panel, One Vote Each
OK, we move on now to a 16-person panel, with each voter casting just one vote. This is the system that was actually used in the Cy Young vote from 1956 to 1960, before expansion added two voters per team.
I ran 640 trial seasons with the assumptions outlined before, and a 16-person, one-vote panel. In those 640 trials the most-deserving MVP candidate won the vote outright 481 times, which is almost exactly 75%. He lost the vote outright 113 times (18%) and the vote ended in a tie the other 46 times, or 7%. This is a fuller breakdown of the results:
First Place Votes    Occurs    Wins    Ties    Losses
               16        13      13       0         0
               15         8       8       0         0
               14        21      21       0         0
               13        24      24       0         0
               12        42      42       0         0
               11        46      46       0         0
               10        77      77       0         0
                9        66      66       0         0
                8        76      71       5         0
                7        70      57       4         9
                6        68      36      16        16
                5        66      20      16        30
                4        28       0       4        24
                3        19       0       1        18
                2        10       0       0        10
                1         5       0       0         5
                0         1       0       0         1
            Total       640     481      46       113
To be sure the chart is clear: there were 70 times in those 640 trials when the most-deserving MVP got 7 first-place votes. In those 70 trials, the most-deserving MVP won the Award 57 times, lost it 9 times, and the vote ended in a tie 4 times. There was one time in the 640 trials when not a single MVP voter voted for the most-deserving candidate.
If we assume that there is a tie-breaker process in place and that the most-deserving MVP wins the tie-breaker 50% of the time, then the 16-person panel should find the right man 78.75% of the time. 79%.
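The 78.75% figure is the same half-credit arithmetic applied to the 640-trial totals:

```python
# Totals from the 16-voter chart above
wins, ties, losses, trials = 481, 46, 113, 640
assert wins + ties + losses == trials

# Assume the deserving candidate survives a tie-breaker half the time
accuracy = (wins + ties / 2) / trials
print(accuracy)   # 0.7875
```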
Observation about the 16-person Panel and the Cy Young
Our study above concludes that a 16-person panel with one vote per person should get the answer right about 79% of the time. This voting structure was actually used in determining the Cy Young Award from 1956 to 1960, and one would think that, because there are fewer candidates in the Cy Young competition, fewer serious contenders, the voting results should be MORE accurate than that. . .more accurate in a Cy Young vote than in an MVP vote, which is the basis of our model.
But in reality, if you look at those Cy Young votes, one can make a good argument that the voters got all five of them wrong. I think it is probably true that the voters got all five of them wrong. By WAR, they missed them all, and missed most of them by very wide margins. If that is true, that means that the actual results—granted, it is a sample of five—but the actual results do not seem to be consistent with the theoretical model. That forces us to ask why. What are we missing here?
I think it is bias. The voters in that era had a shared assumption that won-lost records were reliable, and thus, that the best pitchers would have the best won-lost records. They missed it because they were all voting on the same wrong assumption. It was Groupthink.
There’s a second indicator that that is what happened. Our study shows that 7% of such votes SHOULD result in a tie. This suggests that there should probably have been a tie in the Cy Young voting long before there actually was. With a 7% chance of a tie in each vote, there is a 52% chance of a tie in the first ten votes. In fact, there was no tie in the first 16 votes.
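The 52% claim checks out: with a 7% tie rate per vote, the chance of at least one tie somewhere in the first ten votes is one minus the chance of ten straight tie-free votes. (Extending the same arithmetic to 16 votes is my addition, not the author's.)

```python
p_tie = 0.07                      # roughly 46 ties in 640 simulated votes

chance_in_10 = 1 - (1 - p_tie) ** 10
print(round(chance_in_10, 3))     # 0.516 -> the "52% chance" in the text

# The real-world Cy Young went 16 votes without a tie; under the model
# the odds of at least one tie by then are close to 70%:
chance_in_16 = 1 - (1 - p_tie) ** 16
print(round(chance_in_16, 3))
```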
Why?
Groupthink. They never had a tie, because they all thought the same way, all shared SOME of the same error, which created a certain amount of consensus where logic and individual error of observation would not have yielded consensus.
The 24-Person, 10-Vote Ballot
Let us step forward now to a 24-person panel in which each voter votes for 10 players, and ranks them 10-9-8-7-6-5-4-3-2-1, with the best player getting 10 points and the 10th best player getting one. This exact voting structure has never been used to determine the BBWAA MVP Award, although it is close to the system which was used for many years. From 1931 to 1937 the BBWAA used a 10-9-8-7-6-5-4-3-2-1 voting structure, but with only eight voters, one per team. In 1938 they made two changes, changing from an eight-person panel to a 24-person voting group, and also changing from a 10-9-8-7-6-5-4-3-2-1 weighting system to a 14-9-8-7-6-5-4-3-2-1 system; in other words, the same as before except that the person listed first on the ballot gets 14 points, rather than 10. That voting system was used from 1938 to 1960, then was used again in the American League from 1969 to 1976, and in the National League from 1969 to 1992. When the first expansion happened in 1961, the BBWAA cut the number of voters from 3 per team to 2 per team, thus cutting the number of ballots from 24 to 20, but when the second expansion happened in 1969 then there were 12 teams in each league, so the number of ballots went back up to 24. It was 24 in each league, then, until that league expanded.
In my studies, this system was 86.7% accurate at determining the best-qualified candidate for the Most Valuable Player Award. I did two studies of this, and got that identical percentage both times. In the first study I simulated 128 seasons. Of the 128 seasons, the correct MVP was identified by the process in 111 seasons—86.72%. In the second round of studies I figured out a more time-efficient method to run the studies, and I was able to do 512 seasons in fewer work hours than it had required to do the first 128. In that series, the correct MVP was identified in 444 out of 512 trials—precisely the same percentage. Sticking just with the second group, the "true" MVP finished:
1st in the voting 444 times,
2nd in the voting 62 times,
3rd in the voting 5 times, and
5th in the voting once.
Turning it around the other way, the voted MVP was:
The best candidate 444 times,
The second-best candidate 55 times,
The third-best candidate 11 times, and
The fourth-best candidate 2 times.
In the 512 trials, there were only two contests which ended in a tie.
Discrepancy Noted
In my 512-season simulation there was not a single case of a unanimous MVP selection with 24 ballots. In the real world, there have been 18 unanimous selections, although I think that count includes a couple of Awards that predate the BBWAA taking over the vote in 1931. I believe that most of those 18 unanimous selections were with less than 24 votes, and, as you get more voters, you get less chance of unanimity, but still, it’s a pretty significant discrepancy between the model and the real world.
There would seem to be three possible sources for the discrepancy. First, it could result from Groupthink, from people all agreeing on something that isn’t necessarily true. I could build this into the model by creating systematic bias—that is, having all of the "voters" or most of the voters agree on some value that isn’t actually there.
Probably some of the unanimous selections did result from Groupthink bias. In 1967, for example, Orlando Cepeda was a unanimous MVP selection in the National League, albeit with only 20 voters, but still, that seems like a Groupthink selection. I’m not really certain that Cepeda was the MVP at all, now that you mention it. He was 5th in the league in WAR, but he led the league in RBI, which was a huge deal at that time, and his team won the pennant after two seasons very near .500, which may have unduly influenced some voters, and the player who perhaps should have been the MVP, Roberto Clemente, had won the Award the previous season, which probably discouraged some voters from voting for him again. There have been other unanimous MVP selections which seem to me to have perhaps been the result of collective bias, and also there are other telling details all over the study which suggest that there is some Groupthink that influences the voting.
However, there have also been cases in which players won the Award unanimously, and it would seem like they should indeed have done so. Al Rosen in 1953, or Mike Schmidt in 1980; it seems like you would have to be pretty dense to miss the fact that this was the best player in the league.
Second, it could be in some cases more obvious who is the deserving MVP than my model has allowed for. We could create this "occasional separation from the pack" by adding another random element to the value model, thus occasionally allowing one player to separate himself by a wider margin.
Third, it could be that perceptual error in real life is less than it is in my model. In my model I allowed each voter’s perception of each candidate to be 15 points better than the player’s actual ability, or 15 points worse, as a theoretical maximum. It would be a simple matter to change that to 14 points, or 10 points; in other words, to reduce the perceptual error of each voter.
I estimated that, using this voting structure, the voters would get the right result 87% of the time. The key question here is whether this discrepancy indicates that the actual vote is MORE accurate than I have estimated—that is, that the voters get the answer right more than 87% of the time—or whether it indicates that they are less accurate. If the discrepancies result from Groupthink in the voting, then the real-life voting is probably less accurate than 87%. If, on the other hand, the right MVP stands out from the group sometimes more than my model believes, or if the relative perceptual error is less than I have built into the model, then the real-life voting would probably be more than 87% accurate.
I’m not going to re-run these studies to try to resolve the issue, because (a) these studies represent more than a week’s work, and I don’t have another week to put into this project, and (b) I don’t really know which direction to go in reconstructing the model—building in Groupthink bias, reducing the Perceptual Error, or creating a feature which would occasionally allow one player to stand out from the group by a wider margin.
Also, the real goal here is not to build a perfect model; it is, rather, to understand how different variables affect the accuracy of the voting. Does increasing the number of voters meaningfully increase the accuracy of the voting? Does using a 14-9-8-7 system rather than a 10-9-8-7 system actually improve the accuracy of the selection? That’s really what I am trying to get to. The answers to those questions are probably the same regardless of what causes this discrepancy.
The 1938 Model
So let’s get to that question: does using the 14-9-8-7-6-5-4-3-2-1 voting weight, rather than a 10-9-8-7-6-5-4-3-2-1 system, actually increase the reliability of the system for identifying the most deserving MVP?
It does not.
I will call this the 1938 Model, which is not intended in any way to suggest that this is an outdated or antique model, like a 1938 Ford or something; that’s not what I am saying. From 1931 to 1937 the BBWAA used an 8-person panel and weighted votes by the 10-9-8-7-6-5-4-3-2-1 method; in 1938 they switched to a 24-person panel and to the 14-9-8-7-6-5-4-3-2-1 system. That exact model has been used essentially ever since; the number of voters has varied from as low as 20 to as high as 32, but it is essentially the same system.
If you have read my stuff over the years, you know that I have generally spoken well of this system. I have always described it as an intelligently designed system which generally does an excellent job of finding the right MVP. That’s the conclusion here, as well: this system generally works.
And one can understand why the change from 10-9-8-7 to 14-9-8-7 was made. The BBWAA members at that time were saying that, while they wanted to know who the voters thought all of the best players in the league were, there should be a special emphasis on knowing who the voters thought was the BEST player in the league, the #1 guy. It was sort of like what my friend Jerry was saying, back at the start of the article: that the only thing that should REALLY matter was who is the number one man? Intuitively, it makes sense.
But mathematically, it doesn’t make sense, and mathematically, it doesn’t really work. Mathematically, what you are doing by giving 14 points for a first-place vote, rather than 10, is arbitrarily giving additional weight to a distinction which there is no reason to believe is especially reliable. This doesn’t cause the system to work better; it actually causes it to work slightly worse.
Well, I don’t want to overstate that; overstating it will cause confusion. There actually is a mathematical reason to give extra weight to the #1 selection. Given an array of player values in a competitive environment, it is likely that the difference between the #1 player and the #2 player is greater than the difference between the #2 player and the #3 player. It is virtually certain that the difference between the #1 player and the #2 player would be larger than the difference between the #65 player and the #66 player. If you studied WAR, for example, or Win Shares, you would certainly find that the difference between the #1 player in the league and the #2 player in the league was, on average, much greater than the difference between #2 and #3. This difference would, in fact, justify a mathematical model which places extra weight on who the voters perceive as being #1 man.
But the key question is "how much"? How much extra weight?
The 4 extra points for the first-place vote are probably way too much. Even giving one extra point to the first-place vote—an 11-9-8-7 system—would probably be too much. Giving 4 extra points is CERTAINLY too much.
The thing is, there are cases, like Al Rosen in 1953 or Mookie Betts in 2018, where one player is pretty obviously better than everybody else. But when one player is far better than everybody else, the voters are going to see that, anyway. The system does not benefit from those 4 extra points for the first-place vote, because that kind of player will usually win the award without the extra help.
In the simulation study, rather than re-running the data for all 512 simulated seasons to compare the 14-point system to the 10-point system, I simply took out all of the 10-point votes, and replaced them all with 14-point votes. This saved me several hours of work, but it also seemed like the more appropriate way to do it, because it creates a more direct comparison between the systems.
When I made this change, there were 17 cases (in the 512 simulated votes) in which adding the 4 points to the first-place votes changed the award recipient from the wrong selection to the right one. The problem is, there were 20 cases in which it changed the award winner from the right player to the wrong one. The net effect was to reduce the number of awards going to the right man from 444 out of 512 to 441 out of 512. It reduced the accuracy of the voting system from 87% to 86%. The 4 extra points simply add emphasis to some random perceptual error.
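The rescoring comparison described above can be sketched with a small tally function. Everything below is hypothetical: players "A" through "E" and the ballot counts are invented for illustration, but they show how the same set of ballots can crown different winners under the two weightings.

```python
def tally(ballots, first_place_points):
    """Score ballots Borda-style: rank 1 gets first_place_points,
    ranks 2 through 10 get 9, 8, 7, ..., 1."""
    weights = [first_place_points] + list(range(9, 0, -1))
    totals = {}
    for ballot in ballots:
        for rank, player in enumerate(ballot):
            totals[player] = totals.get(player, 0) + weights[rank]
    return totals

# A hypothetical 24-ballot vote: "A" draws 10 first-place votes but sits
# lower on the other ballots, while "B" is first on only 4 ballots but
# second or better on nearly every other one.
ballots = ([["A", "B", "C", "D", "E"]] * 10 +
           [["B", "A", "C", "D", "E"]] * 4 +
           [["C", "B", "A", "D", "E"]] * 4 +
           [["D", "B", "A", "C", "E"]] * 3 +
           [["E", "B", "A", "C", "D"]] * 3)

t10 = tally(ballots, 10)   # B edges A, 220 to 216
t14 = tally(ballots, 14)   # A jumps ahead, 256 to 236
```

The 4-point bonus hands "A" 40 extra points (10 first-place votes times 4 points) while "B" picks up only 16, flipping the result even though most voters preferred "B".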
I could use one of those cases to illustrate the point, but that would involve making a lot of statements like "In Simulated Season number 378, player 51 had an actual value of 83.421, whereas Player 74 had an actual value of 83.247. However, voter number 16 had a perceptual error of. . .. " You get why that’s not helpful.
Instead, I would suggest that you look at a real-life case, which is the National League MVP vote in 1979. The Pittsburgh Pirates won the National League East in 1979 with what was then a very fun team to watch, although the story soured a couple of years later. The ’79 Pirates adopted Sister Sledge’s "We Are Family" as their theme song, wore very cool retro hats, played aggressive, exciting baseball, and won 98 games, taking the NL East by three games.
Willie Stargell at that time was very much like David Ortiz toward the end of David’s career. He was a greatly respected veteran leader in the clubhouse, and a beloved Old Hand by the public, just as David was, and also, although he was old and slow and couldn’t really play the field, he could still mash. He was a left-handed power bat, and a good one. He had a tremendously quick bat, even quicker than Ortiz, and he had formidable strength in his wrists, which enabled him to propel a heavy bat at a high rate of speed and still make contact with the ball.
He had terrible wheels, however, and, because the National League had no DH rule, and also because the National League at that time had a LOT of primitive artificial turf that was as hard as cement and played hell with Stargell’s aching feet, he played in only 126 games, 16 of those as a pinch hitter. He had only 480 plate appearances, no defensive value at all. He hit well, .281 with 32 homers, but he was not the best hitter in the league, by a pretty good margin. He had been the best hitter in the league 1973-1974, but by 1979 he wasn’t really close to that.
He was, however, the emotional center of the championship team, and a FUN championship team, at that. He was "Pops"—again parallel to Ortiz, who was "Papi". In September, he delivered a few game-breaking big hits down the stretch. When you look at the factual record, though, it wasn’t all that huge of a deal; you’re really talking about just five big hits, in games on September 1, 5, 11, 18 and 25. Not that those five hits were not important, but Stargell hit just .222 in September with only 18 RBI, hardly phenomenal numbers. The Pirates, six games ahead on September 1, wound up winning the pennant by three games. Still, a narrative started to develop that Pops was The Guy on this team; he was the guy delivering big, game-breaking hits day after day after day as his team drove to the pennant with a September surge.
It was, to be blunt, kind of a bullshit narrative. Stargell’s WAR for the season was only 2.5, while four players were over 7.5, which I will grant you is not a precisely accurate comparison, either; it may not give Stargell enough credit for his big hits in September, and it gives him no credit at all for his leadership.
Still, Stargell was really NOT the best player in the league, and the majority of the MVP voters knew that. Stargell received ten first-place votes, with the other 14 going to players who had more WAR, most of them to players with three times as much WAR, but split among those players, with no one player getting more than four first-place votes. Stargell finished well down the ballot on the other 14 ballots.
Had the votes been counted on a 10-9-8-7 basis, Stargell would have finished a distant second in the MVP voting. But given an extra 40 points by the 14-9-8-7 weighting, Stargell wound up in a tie for first place—the only tied vote in the history of the award—and wound up with a half-share of the MVP Award.
It’s one case, of course, but I think it illustrates why the 14 points for the first place ballot is not actually helpful in identifying the true Most Valuable Player. Emphasizing the first-place selection gives additional weight to a distinction in the mind, rather than to a distinction on the field. There is narrative value—that is, a story which explains why this player is the Most Valuable—and there is production value, which is imperfectly measured by mathematical tools. The 4-point bonus for a first-place vote gives weight to an excited minority of the voters who have convinced one another of a narrative which selects certain facts as the "important" facts, but gives no weight to all of the other boring facts, those boring home runs in July and those boring doubles and triples and all that boring defensive play.
The Thirty-Man Panel
Excuse Me, the Thirty-Person Panel
In modern baseball, of course, we use a 30-person voting group, two representatives from each team. This brings up the next question: Are 30 voters meaningfully more likely to get the answer right than 24 voters?
It depends on how you define "meaningfully", but yes, 30 voters are more accurate than 24. Using the 10-9-8-7 weighting system for ballots, the 24-person panel got the "right" answer 444 times in 512 trials, with the second-best candidate winning 55 of the other 68. Using the same system but with 30 voters, the voters got the right answer in 449 of 512 trials, with the second-best candidate winning 56 of the other 63. The number of awards going to players who should have finished third or lower dropped from 13 to 7, and the "correct decision" percentage increased from 87% to 88%. And the change is a little more meaningful than that one point suggests, because the net effect is not merely to transfer awards from the second-most deserving candidate to the best candidate. Some awards go from the third-most deserving candidate to the second-most, some from the second-most deserving to the most deserving, so the net effect is to move awards from the third-best candidate to the best.
That is using the 10-9-8-7 ballot. Using the 14-9-8-7 ballot which is actually used, the larger voting panel (30 as opposed to 24) increases the number of correct choices from 441 to 447 (from 86% to 87%), and also decreases the number of selections going to the third-best candidate from 13 to 9.
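The kind of comparison described above can be reproduced in miniature. The sketch below is NOT the author's model; the player count, value distribution, and error size are invented stand-ins. But the structure follows the description in the text: each voter ranks players by true value plus a private perceptual error, ballots are tallied 10-9-8-7, and we count how often the truly best player wins.

```python
import random

random.seed(3)

def simulate(n_voters, n_seasons=1500, n_players=40,
             error_sd=15.0, first_place_points=10):
    """Toy re-creation of the study: each simulated season has one truly
    best player; each voter ranks the players by true value plus an
    independent perceptual error; ballots are tallied Borda-style; we
    count how often the truly best player wins the vote."""
    weights = [first_place_points] + list(range(9, 0, -1))
    correct = 0
    for _ in range(n_seasons):
        true_values = [random.gauss(100, 10) for _ in range(n_players)]
        best = max(range(n_players), key=lambda p: true_values[p])
        totals = [0] * n_players
        for _ in range(n_voters):
            # Each voter sees every player's value through private noise
            perceived = [(true_values[p] + random.gauss(0, error_sd), p)
                         for p in range(n_players)]
            ballot = [p for _, p in sorted(perceived, reverse=True)[:10]]
            for rank, p in enumerate(ballot):
                totals[p] += weights[rank]
        if max(range(n_players), key=lambda p: totals[p]) == best:
            correct += 1
    return correct / n_seasons

accuracy_24 = simulate(24)
accuracy_45 = simulate(45)
```

With these made-up parameters the absolute percentages will not match the article's figures, but the direction, larger panels getting the right answer more often, should hold.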
If you think about it from a certain perspective, it becomes logically obvious that increasing the voter panel would increase the predictability of the outcome. If you think about MVP votes simply in terms of "preference"—I think it is this guy, you think it is that guy, one opinion is as good as another—then it does not seem that increasing the voting panel has much effect; you would still have differences of opinion if you had 100 voters.
But if you assume that there IS one player who is more valuable than any other player—an assumption which I believe is necessarily implicit in voting for a Most Valuable Player—then the votes for other players are not merely differences of opinion, but errors. If you think about it that way, it’s obvious that increasing the voting panel increases the accuracy of the outcome. There MUST be observational errors, right? Otherwise everybody would see who the Most Valuable Player actually was. If there were no observational error, you’d just need one voter to decide the thing. The reason you need a larger voting panel is to balance out the observational error. The "preference" does not reside in the ACTUAL value; it resides in the observational error.
In mathematical terms, suppose that one player has a value of 91 Whatsis and the other player has a value of 90 Whatsis. The player who has 91 Whatsis will win the vote unless the sum of the observational errors favors the lesser player by at least 1 per voter. Assuming that Observational Errors are a random variable centering at zero, then the more voters are involved, the less chance there is that the average of the Observational Errors is larger than the difference in value between the two players.
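That claim is easy to check numerically. In this sketch the 1-Whatsis gap and the 15-point error size come from the article's own numbers; the assumption that errors are independent and normally distributed around zero is mine:

```python
import random

random.seed(7)

def upset_rate(n_voters, value_gap=1.0, error_sd=15.0, trials=20000):
    """Estimate the chance that the average of the voters' observational
    errors favors the lesser player by more than the true value gap."""
    upsets = 0
    for _ in range(trials):
        mean_error = sum(random.gauss(0, error_sd)
                         for _ in range(n_voters)) / n_voters
        if mean_error > value_gap:
            upsets += 1
    return upsets / trials

# More voters shrink the spread of the averaged error, so a 1-Whatsis
# edge in true value survives the vote more often.
rates = [upset_rate(n) for n in (24, 30, 45)]
```

The standard deviation of the averaged error falls with the square root of the number of voters, which is why each additional voter helps, but with diminishing returns.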
What If You Added a Third Voter Per Team?
What if we added a third voter per team, increasing the voter panel from 30 to 45 voters? What difference would that make?
It would make a quite significant difference. Using a 10-9-8-7 ballot, a 30-voter panel got the right answer 449 times in 512 trials, or 88%, as stated before. A 45-person panel got the right answer 463 times in 512 trials, or just over 90%.
Using the 14-9-8-7 weighting system which is actually used, the number of correct decisions increased from 447 out of 512 to 462 out of 512, which, again, is 90%.
An increase in reliability from 86 or 87% up to 90% may not seem to be a big deal, but if you focus instead on the number of incorrect votes, it seems much larger. Using the actual voting structure, the number of expected incorrect voting results in 512 trials drops from 65 to 50—a 23% decrease. That’s quite significant, in my opinion.
But What About. . .
The only argument that I can think of against adding a third voter for each team would be that the third voter might be less well-informed than the previous two, and thus might have a larger range of error.
It seems to me that this is tremendously unlikely. The modern world, compared to the world of 1960, is vastly better at creating and distributing information. In the 1960s, maybe the voter didn’t know that much about the other teams unless he traveled with one team. In the modern world, many of us have the MLB-TV package that enables us to watch a very large number of games. I’ll bet I saw 50 Oakland A’s games this season. In the 1970s, Peter Gammons made himself a national institution by, among other things, getting on the telephone and sharing information with beat writers around the country. In the modern world, information of that type is shared seamlessly with people who would never have qualified for access to detailed information in the pre-internet universe. I don’t think that there is a shortage of qualified voters, frankly.
Nor do I believe that it’s a highly relevant issue. One of the things that could be done with studies of this type would be to vary the observational error, to see how the conclusions change with different levels of observational error. In other words, if the present method is 87% accurate if we assume that the observational error is potentially 15 points per player, what would the accuracy be if we assumed that the potential observational error was 10 points per player, or 20 points per player?
I have not done THOSE studies, but I’ll tell you what I think. I don’t think it would make a great deal of difference. My belief, which I must have published 5,000 times over the years, is that the external world is vastly more complicated than the human mind, billions of times more complicated, and, because this is true, everyone’s understanding of anything and everything is unreliable.
People think that getting "qualified" voters is the key to getting a good result, but is it really? I doubt it. I don’t believe it, because I don’t believe that anyone actually understands the world. We approach understanding only by working together; that is the foundation of science. For that reason, I believe that a 45-person panel would do a substantially better job of producing an accurate MVP vote than a 30-person panel.
Thank you for reading.