Three Looks at the MVPs
This research began with a Twitter discussion that I got involved in last week. Somebody in the discussion was arguing. . .
Well, it was Twitter, so it’s hard to say for sure what anybody’s point was. Somebody in the discussion seemed to be arguing, as best I understood it, that Sabermetrics has not actually changed the way that people think about baseball; it has merely taken credit for doing so. His evidence for this was Babe Ruth. Babe Ruth was the biggest star of all time before Sabermetrics. Babe Ruth is recognized now, with modern methods, as the greatest player of all time, so nothing has really changed.
Ordinarily I would not respond to an argument of that nature (which reminds me of one of the greatest lines in the history of Hollywood: Usually one must go to a Bowling Alley to meet a woman of your stature.) Anyway, it happened that, on this day, I had been looking at some old MVP votes which were starkly inconsistent with modern voting practices. I tried to ask the gentleman how he would explain these discrepancies, if the way that we evaluate players has not really changed. Yes, Babe Ruth was always recognized as great, and yes, Babe Ruth WAS great, but what about Don Baylor in 1979, or Jackie Jensen in 1958, to name just a couple of MVP selections from the past which are not likely to be mirrored in the present?
I realized that I was stumbling into a Twitter argument, which is the modern equivalent of interrupting elephants during mating, so I exited the situation as gracefully as I could, but it started me thinking, again, about a subject I have thought about many times. How could we document the changes in MVP voting patterns over time? We THINK we know that MVP voting patterns have changed, but we think we know a lot of things that aren’t true. How could we objectively test this?
I have worked on that problem several times before, but up to now I have just wasted a lot of hours doing research that turned out to be time-consuming and complicated, but not useful. I don’t think I have published anything on this exact issue before; maybe I have, I don’t know, but I don’t think so. But this time I just looked at it from exactly the right angle, and I saw how this could be measured. So here we are.
The Simplest Thing
I’m talking about three things here, three different looks at the MVP of each season: the Most Valuable Player as selected by the BBWAA, the #1 player in the league in terms of Baseball-Reference WAR, and the #1 player in the league in Win Shares. How often do these three searches for the best player in the league come home with the same specimen? Is that number higher now than it was in the 1950s, or the 1970s?
Yes, it is, but we’ll get there. I need to explain also what we are NOT talking about in this article. We are not here to talk about:
1) Who "should" have been the MVP in any season, or
2) Whether Win Shares or WAR is the better method to locate the best player.
There is a time and a place to argue about those things, but this isn’t it. This is about whether modern analytic tools, Win Shares and WAR, are (or are not) more closely allied with MVP voting now than they were in the past.
The simplest thing to do would be just to count how often the MVP Award goes to the player with the highest WAR (baseball reference WAR) in the league, and how often it goes to the player with the most Win Shares, and how often it goes to some other no-account shuffler. Since the BBWAA started voting for the MVP Award in 1931 there have been 178 Awards, one of which was split between two players.
Of those 178 Awards, 58 have gone to the player who led the league in bWAR, and 73 have gone to the player who led the league in Win Shares. In other words, the bWAR MVP and the BBWAA MVP have been the same a little bit less than one-third of the time (58/178), while the Win Shares leader and the BBWAA MVP have been the same a little bit more than 40% of the time (73/178). A fourth of the difference between those two is explained by the fact that ties are more common in Win Shares than in bWAR, and if either of the players who tied for the league lead in Win Shares was the MVP, I counted that as a match. It has happened 7 times that a player TIED for the league lead in Win Shares and was the BBWAA MVP. If we counted half of those as "matches" for Win Shares, then Win Shares would lead bWAR in matches to the MVP award only 69.5 to 58, rather than 73 to 58.
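If you want to check that arithmetic, the bookkeeping looks like this; a minimal Python sketch, using only the totals quoted above (the award-by-award data itself is not reproduced here):

```python
# The article's totals, not the underlying vote data.
TOTAL_ELECTIONS = 178

bwar_matches = 58          # MVP was also the league's bWAR leader
win_share_matches = 73     # MVP led, or tied for the lead, in Win Shares
win_share_tie_matches = 7  # of those 73, times the MVP only TIED for the lead

bwar_rate = bwar_matches / TOTAL_ELECTIONS    # a bit under one-third
ws_rate = win_share_matches / TOTAL_ELECTIONS  # a bit over 40%

# Counting each tie as half a match trims the Win Shares total to 69.5:
ws_adjusted = win_share_matches - 0.5 * win_share_tie_matches
```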
The Nuanced Method
The simplest method, however, is only a binary count of matches, rather than an actual measurement of the degree of agreement. Comparing WAR and the MVP Award, there are two possible results: either they match, or they don’t match. A binary outcome.
The binary outcome leaves a great deal of information on the shelf. In 2019, for example, the American League MVP was Mike Trout, while the league leader in WAR was Alex Bregman. Bregman, however, was second to Trout in MVP voting, and a close second at that (355 to 335), while Trout was second to Bregman in WAR, and a VERY close second, 8.4 to 8.3. The two systems disagree, but they disagree by only a small amount. We could say—and we will say, in a moment—that the two are 90% in agreement, but just a little bit not so.
In the National League in 1998, on the other hand, Sammy Sosa was the MVP, while Kevin Brown had the league’s highest WAR. Sosa, however, not only was not second in WAR, he also was not third, or fourth, or anything like that. He was not in the top 10. I’m not sure where he was; he might have been 13th, 17th, something like that. And Kevin Brown, the leader in WAR, was not second in the MVP voting, or third, or fourth; he was 16th.
The binary count records these two contests—2019 AL and 1998 NL--as being the same. They are both in the "don’t agree" category. In reality, they are nothing like the same. The 2019 split is a tiny, narrow disagreement between who is a hair ahead and who is a little bit behind, but there is a consensus that those two are the best players in the league. The 1998 split was a complete and total disagreement, with the MVP voters saying that the league leader in WAR was not any kind of an MVP candidate, and the WAR calculators saying that the MVP was not among the best players in the league, not in the top 10 at least.
So this is what we did instead. . . .I mentioned that I just happened to look at this problem in the right way and see what could be done, and this was that way. Suppose that we say that, when the two agree, that is 100% agreement. If the MVP is not first in WAR, but is second, then we list that as -1 of 10, which means that there is 90% agreement there. If the MVP is third in WAR, we list that as -2, or 80% agreement; if the MVP is fourth in WAR, we list that as -3, or 70% agreement. If the MVP is not in the top ten in the league in WAR, we list that as -10, or 0% agreement, 100% disagreement.
It is, however, a two-sided process. The "degree of agreement" between MVP Voting and WAR is measured by how the MVP does in WAR, but also by how the league leader in WAR does in MVP voting. It is, then, not a 10-point system, but a 20-point system.
In the American League in 2019 (Trout and Bregman), that scores as 18 out of 20, or 90%--9 out of 10 because the MVP is second in WAR, and 9 out of 10 because the leader in WAR is second in MVP voting. In the National League in 1998 (Sosa vs. Brown), that scores as 0 out of 20, or 0%--0 out of 10 because the MVP is not in the top 10 in WAR, and also 0 out of 10 because the league leader in WAR is not in the top 10 in MVP voting.
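The scoring rule can be written out as a small function; a minimal sketch, where the two rank arguments come from the examples above (1 = led the league), and any rank outside the top ten is capped at the full 10-point penalty:

```python
def agreement(mvp_rank_in_war: int, war_leader_rank_in_vote: int) -> float:
    """Degree of agreement between MVP voting and WAR, on a 20-point scale.

    mvp_rank_in_war: where the elected MVP finished in the league in WAR.
    war_leader_rank_in_vote: where the WAR leader finished in MVP voting.
    A rank of 1 costs nothing; each place below first costs one point,
    capped at 10 points per side (i.e., outside the top ten).
    """
    penalty = (min(mvp_rank_in_war - 1, 10)
               + min(war_leader_rank_in_vote - 1, 10))
    return (20 - penalty) / 20

# 2019 AL: Trout 2nd in WAR, Bregman 2nd in MVP voting -> 90% agreement.
# 1998 NL: Sosa and Brown each outside the other's top ten -> 0%.
```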
That’s very rare. A complete disagreement is a rare outcome. An agreement between WAR and MVP voters, as I said above, has happened 58 times in 178 elections. A complete disagreement between WAR and MVP voting, however, has happened only seven times.
The seven times that there has been a complete disagreement between the two are the National League in 1931, the American League in 1976, the American League in 1984, the American League in 1987, the American League in 1992, the American League in 1993, and the National League in 1998. And a complete disagreement between the MVP voting and Win Shares has happened only twice, those two being in the National League in 1931 and in the American League in 1984.
The Five Possible Outcomes
In a simple form, there are five possible outcomes:
1) All three systems agree as to who the MVP should be,
2) The MVP voters and WAR agree, but Win Shares gets a different answer,
3) The MVP voters and Win Shares agree, but WAR gets a different answer,
4) The two analytical systems agree, but the MVP voters don’t, and
5) The three systems get three different answers.
In 178 leagues, we have:
39 cases in which all three methods agree
18 cases in which the MVP voters and WAR agree, but Win Shares does not,
34 cases in which the MVP voters and Win Shares agree, but WAR does not,
38 cases in which WAR and Win Shares agree, but the MVP voters did not, and
49 cases in which the three methods reach three different answers.
In the "nuanced" form, there are 20 possible answers for the extent of agreement—0%, 5%, 10%, 15%, 20%, etc., up to 90%, and then 100%. It is not possible to get 95% agreement since, if the elected MVP is not also the league leader in WAR, then the league leader in WAR cannot be the elected MVP. You can’t have -1 over 20; you can have -0 or -2, but not -1. But you can have -3, -4. . .whatever.
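That claim—that 95% is unreachable—can be checked by brute force. A small sketch, resting on the observation that if the MVP leads the league in WAR, then the WAR leader IS the MVP, so the two per-side penalties are either both zero or both at least one:

```python
# Enumerate every reachable agreement score on the 20-point scale.
# Per-side penalties run 0..10; a zero on one side forces a zero on the other.
reachable = sorted({
    (20 - a - b) / 20
    for a in range(11)
    for b in range(11)
    if (a == 0) == (b == 0)
})

# Twenty distinct values: 100%, then 90% down to 0% in 5% steps -- no 95%.
```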
The Three Methods Agree
Just listing these for your perusal and enjoyment, these are the 39 cases in which all three methods agree as to who the Most Valuable Player was:
1932 American League, Jimmie Foxx
1933 American League, Jimmie Foxx
1936 National League, Carl Hubbell-Telescope
1937 National League, Joe Medwick
1938 American League, Jimmie Foxx
1939 National League, Bucky Walters
1943 National League, Stan Musial
1945 American League, Hal Newhouser
1946 American League, Ted Williams
1946 National League, Stan Musial
1948 National League, Stan Musial
1949 American League, Ted Williams
1953 American League, Al Rosen
1954 National League, Willie Mays
1956 American League, Mickey Mantle
1957 American League, Mickey Mantle
1965 National League, Willie Mays
1967 American League, Carl Yastrzemski
1968 National League, Bob Gibson
1975 National League, Joe Morgan
1976 National League, Joe Morgan
1977 American League, Rod Carew
1980 American League, George Brett
1981 National League, Mike Schmidt
1982 American League, Robin Yount
1983 American League, Cal Ripken
1984 National League, Ryne Sandberg
1990 National League, Barry Bonds
1991 American League, Cal Ripken
2001 National League, Barry Bonds
2002 National League, Barry Bonds
2003 American League, Alex Rodriguez
2004 National League, Barry Bonds
2005 American League, Alex Rodriguez
2005 National League, Albert Pujols
2007 American League, Alex Rodriguez
2009 National League, Albert Pujols
2015 National League, Bryce Harper
2016 National League, Kris Bryant
Why Can’t We All Just Get Along?
The 1931 National League race, on the other hand, illustrates the concept of near-complete disagreement among the three. The National League MVP, by the voters, was Frankie Frisch. The league leader in WAR, however, was Brooklyn pitcher Watty Clark, and the league leader in Win Shares was Boston outfielder Wally Berger.
Frankie Frisch, the MVP, was not in the Top 10 in the league in WAR, and Watty Clark, the league leader in WAR, was not in the Top 10 in MVP voting. That’s a complete disagreement between the two systems—10 points off because the MVP was not in the Top 10 in WAR, and 10 points off because the league leader in WAR was not in the Top 10 in MVP voting. There is a 0% agreement between the two systems.
The league leader in Win Shares was a third player, Wally Berger. Again, a 0% agreement with the MVP voters. Frisch, the MVP, was not among the Top 10 in Win Shares, and Berger, the league leader in Win Shares, was not in the Top 10 in MVP voting.
Between Win Shares and WAR, however, there is a small degree of agreement. Watty Clark, the league leader in WAR, had only 22 Win Shares, which was (a) more than Frisch, but (b) still not in the Top 10 in the league. Wally Berger, the league leader in Win Shares, however, was 5th in the league in WAR, so that is only a -4 there. The agreement between Win Shares and WAR, then, is scored at 20 minus 10 minus 4, divided by 20, or 30%:
(20 – 10 – 4) / 20 = .30
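Writing the 1931 National League arithmetic out in full; the per-side penalties below are the ones given above (10 for anyone outside the other system’s top ten, 4 for Berger finishing 5th in WAR), and the overall "consensus" is just the average of the three pairwise scores:

```python
# Each pairwise score uses the same 20-point scale.
def pair_score(penalty_a: int, penalty_b: int) -> float:
    return (20 - penalty_a - penalty_b) / 20

vote_vs_war = pair_score(10, 10)  # Frisch and Clark: each outside the other's top ten
vote_vs_ws  = pair_score(10, 10)  # Frisch and Berger: likewise
war_vs_ws   = pair_score(10, 4)   # Clark not in WS top ten; Berger 5th in WAR

# Average of the three pairwise agreements: the 10% consensus of 1931 NL.
consensus = (vote_vs_war + vote_vs_ws + war_vs_ws) / 3
```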
The 1931 National League vote/calculation is the MOST divided in history. There is zero percent agreement between MVP voters and WAR, and 0% agreement between MVP voters and Win Shares. There is a 30% agreement between WAR and Win Shares, so the sum total of the agreement is .30, on a scale in which 3.00 is the maximum, so that’s a 10% consensus. These are the 10 most-divided contests of all time; I guess it is actually twelve because there is a tie for the 9-10-11-12 spots:
1931 National League
MVP: Frankie Frisch
WAR Leader: Watty Clark
Win Shares Leader: Wally Berger
Agreement: 10%
1992 American League
MVP: Dennis Eckersley
WAR Leader: Roger Clemens
Win Shares Leader: Roberto Alomar
Agreement: 17%
1998 American League
MVP: Juan Gonzalez
WAR Leader: Alex Rodriguez
Win Shares Leader: Albert Belle
Agreement: 22%
1987 National League
MVP: Andre Dawson
WAR Leader: Tony Gwynn
Win Shares Leader: Tim Raines
Agreement: 28%
1984 American League
MVP: Willie Hernandez
WAR Leader: Cal Ripken
Win Shares Leader: Cal Ripken
Agreement: 33%
(There was 100% agreement between the two analytical approaches, but 0% agreement of either one with the MVP voting, since Hernandez was not in the top 10 in either analytical stat, and Ripken was not in the top 10 in the BBWAA voting. Ripken, in fact, received only one tenth-place vote.)
2006 American League
MVP: Justin Morneau Canada!
WAR Leader: Johan Santana
Win Shares Leader: Derek Jeter
Agreement: 33%
1976 American League
MVP: Thurman Munson
WAR Leader: Mark (the Bird) Fidrych
Win Shares Leader: George Brett
Agreement: 37%
1998 National League
MVP: Sammy Sosa
WAR Leader: Kevin Brown
Win Shares Leader: Mark McGwire
Agreement: 40%
1987 American League
MVP: George Bell
WAR Leader: Roger Clemens
Win Shares Leader: Alan Trammell
Agreement: 42%
1993 American League
MVP: Frank Thomas
WAR Leader: Kevin Appier
Win Shares Leader: John Olerud
Agreement: 42%
1995 American League
MVP: Mo Vaughn
WAR Leader: Randy Johnson
Win Shares Leader: Edgar Martinez
Agreement: 42%
2004 American League
MVP: Vladimir Guerrero
WAR Leader: Ichiro Suzuki
Win Shares Leader: Gary Sheffield
Personal Opinion: I would have voted for David Ortiz
Agreement: 42%
You will notice, in the above, that there is a period of time in which almost all of the widest splits occur. More on that in the next segment.
Conclusions
Excuse me. . ..
Conclusions!
Ah, the advantages of not working in an academic environment.
Anyway, there are three real conclusions which can be drawn from this study.
1) It is clearly true that the Analytical age has strongly influenced MVP voting. MVP voting matches with the analytical stats, and in particular with WAR, much more strongly now than it did 20 years ago.
2) Throughout all of MVP voting history, Win Shares matches MVP voting much, much more closely than WAR does. I’m not saying that Win Shares is right and WAR is wrong; I am merely saying that Win Shares is much closer to MVP voting than WAR is.
3) Both Win Shares and WAR need to be updated. They’ve both got some hickeys which are revealed by a careful review of this process.
1) It is clearly true that the Analytical age has strongly influenced MVP voting.
The "Degree of Agreement" between the MVP votes and the analytical stats has shot up in the last 20 years, obviously as a result of MVP voters being influenced by the analytical stats.
In the early days of the MVP vote, there was a high degree of agreement between the voting and the analytical stats, which of course were not developed until decades after these votes took place, but retrospectively, there was a high degree of agreement:
Years           MVP Vote to WAR Agreement    MVP Vote to Win Shares Agreement
1931 to 1939              71%                              79%
1940 to 1949              82%                              84%
In the 1940s, to a lesser extent in the 1930s, there were just a lot of very obvious MVP votes. In the 1940s two superstars—Williams and Musial—were consensus picks for the award a total of five times. Hal Newhouser was a consensus pick once.
From 1950 to 1999, the "degree of agreement" between the MVP votes and the analytical stats, particularly WAR, went steadily downward, dropping to 49% agreement in the 1990s:
Years           MVP Vote to WAR Agreement    MVP Vote to Win Shares Agreement
1931 to 1939              71%                              79%
1940 to 1949              82%                              84%
1950 to 1959              61%                              74%
1960 to 1969              61%                              80%
1970 to 1979              55%                              84%
1980 to 1989              52%                              78%
1990 to 1999              49%                              74%
The 1940s number is just kind of a fluke. If you replace that 82% from the 1940s with, let’s say, 67% for WAR and 77% for Win Shares, then you can see that the chart is in almost perfect order.
As to why the degree of agreement between MVP votes and analytical stats declined for 50 years (or 70 years, if we write the 1940s off as a fluke). . .there are multiple causes for that, and I’ll write them up in a separate section, "Why the MVP votes and the Analytical Stats drifted apart for a Half-Century." The point here is only that they did—and, absent an outside force, they would have continued to do so into the 21st century.
In the last 20 years, however, the MVP votes have moved dramatically closer to the analytical stats:
Years           MVP Vote to WAR Agreement    MVP Vote to Win Shares Agreement
1931 to 1939              71%                              79%
1940 to 1949              82%                              84%
1950 to 1959              61%                              74%
1960 to 1969              61%                              80%
1970 to 1979              55%                              84%
1980 to 1989              52%                              78%
1990 to 1999              49%                              74%
2000 to 2009              72%                              89%
2010 to 2019              83%                              86%
There is just no room for doubt, I don’t think, about this conclusion.
2) Throughout all of MVP voting history, Win Shares matches MVP voting much, much more closely than WAR does.
I did expect that Win Shares would match MVP voting more closely than WAR does; however, I was extremely surprised by the extent to which this is true.
First, Win Shares matches MVP voting better than WAR does in every decade of MVP voting history without exception; see chart above.
Second, if you focus not on the "agreement percentage" but on the "disagreement percentage". . . .the disagreement percentage for WAR is often twice as high, or more than twice as high. In the 1960s there is a 20% disagreement rate between Win Shares and the MVP voting, a 39% disagreement rate for WAR. In the 1970s it is 16% for Win Shares, 45% for WAR. In the 1980s it is 22% for Win Shares, 48% for WAR, and in the 1990s, it is 26% for Win Shares, 51% for WAR. In the first decade of this century it was 11% for Win Shares, 28% for WAR.
Third, consider this stat. In MVP voting history, there have been 58 times that the league leader in WAR won the MVP Award, but 34 times when the league leader in WAR was not in the top 10 in MVP voting. 58-34. For Win Shares, the split is 73-7. There have only been 7 times in history when the league leader in Win Shares was not in the top ten in MVP voting (list will follow).
In the last decade, WAR has gained rapidly on Win Shares as a predictor of MVP voting. In all of baseball history through 2009, there are only 11 cases in which WAR agrees with the MVP voters, but Win Shares is an outlier, only 11 times in 78 years. In the last ten years that has happened seven times—
2010 American League, WAR and the MVP voters both think that Josh Hamilton was the best player in the American League; Win Shares would have preferred Jose Bautista or Robinson Cano.
2011, WAR and the MVP voters agree that Justin Verlander was the American League MVP; Win Shares would have chosen his teammate, Miguel Cabrera.
2012, WAR and the MVP voters agree that Watch It, Buster Posey was the National League MVP; Win Shares would have chosen Andrew McCutchen.
2016, WAR and the MVP voters agree that Mike Trout was the American League’s best MVP pick; Win Shares would have chosen Jose Altuve.
2017, WAR and the MVP voters agree that Giancarlo Stanton was the National League’s MVP; Win Shares would have chosen Charlie Blackmon.
2018, WAR and the MVP voteratti agree, and I agree, that Mookie Betts was the American League’s MVP; Win Shares would have chosen Mike Trout.
2019, WAR and the MVP voters agree that Cody Bellinger was the National League MVP; Win Shares would have chosen Christian Yelich. (Do Jewish families ever name their sons Jewish Rothstein or anything? Just wondering.)
However, even in the last decade, Win Shares still tracks with MVP voting better than WAR does. There are also several cases in the last decade in which WAR was the outlier, while Win Shares and the voters agreed.
These are the 7 players who led their league in Win Shares, but were not in the top 10 in MVP voting:
1931 National League, Wally Berger
1945 National League, Stan Hack
1951 American League, Ted Williams
1952 American League, Larry Doby
1954 American League, Mickey Mantle
1984 American League, Cal Ripken
1995 National League, Barry Bonds
Why the MVP votes and the Analytical Stats
drifted apart for a Half-Century
There are several factors at work here. . .well, three, let us say.
First, there is the compression of talent over time, noted by Stephen Jay Gould and others. The stars of the 1930s, Jimmie Foxx and Lou Gehrig and Lefty Grove, etc., towered over their competitors to a significantly greater extent than through most of baseball post-1950.
Second, there is expansion. When you have more entries into a competition, any competition, it becomes less likely that one will be obviously better than all of the others.
Third, or so people will say, after the Cy Young Award was introduced in 1956, and in particular after there was a Cy Young vote in each league beginning in 1967, it gradually became less common for a pitcher to win the MVP Award.
But let’s drill down on that. Yes, the essential condition of WAR not matching the MVP voting IS caused mostly by pitchers not winning the MVP Award even when they are seen by WAR as the most valuable players in the league. Of the 34 players who are seen by WAR as being the most valuable player in the league, but who did not finish in the Top 10 in the league in MVP voting, 29 are pitchers. Only 5 are position players.
BUT.
But don’t rush from that to the conclusion that this is just a Cy Young Award effect. It isn’t.
First, the Win Shares system ALSO measures pitchers and position players on a common scale, just as WAR does—but the Win Shares system does not have this condition, of a sharply widening split between the league leader and the voted MVP in the years 1950 to 1999. So why is that?
Second, there are 34 cases in which there is a player who led the league in WAR but did not finish in the Top 10 in MVP voting, and 29 of those are pitchers, but in the Win Shares method, there are only 7 cases in which there is a player who led the league but did not finish in the Top 10 in MVP voting, and NONE of those are pitchers. We listed them above, see? None of them are pitchers. So why is that?
Third, the problem of pitchers not winning the MVP award because they have their own award is not closely connected in time to the Cy Young Award. Look at this breakdown:
Years 1931-1945   MVP Awards: 30   Won by Pitchers: 9   (30%)
Years 1946-1955   MVP Awards: 20   Won by Pitchers: 2   (10%)
Years 1956-1966   MVP Awards: 22   Won by Pitchers: 2    (9%)
Years 1967-1979   MVP Awards: 26   Won by Pitchers: 3   (11%)
Years 1980-1999   MVP Awards: 40   Won by Pitchers: 4   (10%)
Years 2000-2019   MVP Awards: 40   Won by Pitchers: 2    (5%)
So where does the decline in pitchers winning the MVP Award actually occur—1956 (the first Cy Young Award), 1967 (the split of the Cy Young into two leagues), or 1946 (the beginning of the Post-War era)?
Yes, there is a second drop-off beginning 2000, but remember, since 2000, WAR and the MVP Award are much more closely connected, so the 2000 drop-off is not related to our essential problem, which is "Why are there 29 pitchers who led their league in WAR, but didn’t finish in the Top 10 in MVP voting?"
Here’s what I am trying to say. MAYBE the fact that pitchers don’t win MVP Awards is not related to the Cy Young Award at a high level. MAYBE it is caused by the fact that many voters don’t perceive pitchers as the most valuable players in the league except in exceptional cases—as the Win Shares system also does not.
What actually happens is, the WAR system goes through this weird 30-year-period in which it usually thinks that some pitcher should win the MVP Award.
In the 1940s, WAR chooses a pitcher as the MVP 6 times. In the 1950s, it is 5 times; in the 1960s, 6 times. In the years 1940 to 1969, WAR chooses a pitcher as the MVP 28% of the time—17 out of 60, a reasonable percentage.
But then in the 1970s, WAR chooses a pitcher as the Most Valuable Player in the League 13 times. The number goes down to 8 times in the 1980s, but then is back up to 13 in the 1990s. In the years 1970 to 1999, WAR chooses a pitcher as the MVP 34 times in 60 awards, or 57%.
Is that reasonable, do you think?
Reasonable or not, what really happened is NOT that the voters stopped choosing pitchers as MVPs. In the years 1970 to 1999, voters chose pitchers as MVPs with the same frequency that they had since 1946—10%. What ACTUALLY happened is that, for some reason, WAR STARTED picking a pitcher as the MVP most of the time. The "disagreement" between WAR and the MVP voting in those years is not caused by the MVP voters; it is clearly caused by WAR.
Summation
Oh, I’m sorry. . .
Summation!!!
So here is what I think, having put about a week into pouring this data into forms and then studying the forms, this is what I have concluded; you can take it for whatever you think it is worth. I think that this study shows that both Win Shares and baseball reference WAR need to be updated.
Win Shares was created 20 years ago. I made some mistakes in the original design of the system. I should have figured Win Shares and Loss Shares separately and then united them into one value. I didn’t. The system has, in certain respects, gotten behind the times in terms of the incorporation of modern defensive values into the process.
The process of saying what a player’s "value" is is immensely complicated, and in that long and complicated process the analyst has to make hundreds of choices about the interpretation of data. The analyst has to make some decision about the treatment of fluke outcomes. Norm Cash had almost exactly the same three true outcomes in 1962 as he had in 1961, but his batting average dropped 118 points. Do you treat that as a fluke, or do you treat it as a reality? There is no clearly correct answer. Do you measure park effects in one-year increments, or five-year aggregates? There is no clearly correct answer. If a team wins 90 games but has numbers which suggest that they should have won 80, do you treat them as a 90-win group of players, or an 80-win group of players? One answer is not necessarily better than the other.
We’re choosing a pathway through a forest of choices. You make one choice, you wind up at a lake; you make the other choice, you wind up on a mountain.
At times, in designing Win Shares, I was absolutist when I should have chosen a middle ground. I chose a narrow pathway when I should have chosen a broader one. Also, I made the system so damned complicated that almost nobody really understands it; people say all kinds of things about Win Shares that are clearly not true, but it’s my own fault for making the system so complicated. Of perhaps more importance, making it so complicated makes it hard to fix, hard to update, hard to program.
But. . .this is just my opinion; take it for what it is worth. I think the problems of Win Shares are trivial compared with the problems of WAR. Sabermetrics is supposed to be, as much as possible, an open road toward insight on an issue. The designers of WAR—friends of mine, almost without exception—have made choices which create an extremely narrow pathway through the forest of problems.
If you think about it, if you create a logical pathway toward Wins Above Replacement, you first have to measure WINS. Right? If you’re measuring Wins, and you are measuring Wins Above Replacement, which problem do you come to first? AFTER you measure how many WINS each player has contributed to his team, THEN you are in a position to ask "How many of those wins would have been contributed by a Replacement Level Player, and how many are Wins Above Replacement?"
This has never been done. The designers of WAR skipped the first problem, and tried to take a shortcut toward the second.
The problem of how many wins a player has contributed ABOVE replacement level is necessarily more complicated than the problem of how many Wins he has contributed. In order to reach Wins Above Replacement, you have to solve all of the problems associated with measuring Wins, and then you have to solve an additional set of problems.
This has never been done. I spent two, three years working essentially full-time on Win Shares, trying to think through every little problem as best I was able. I made some mistakes. But if you’re REALLY going to measure Wins Above Replacement, rather than merely pretending that you are measuring it, you’re going to have to take a couple of years’ sabbatical from whatever else it is that you are doing, and think through all of the problems. I don’t believe that anyone has ever done this, and I don’t believe that the structure of WAR was ever really thought through in a logical fashion.
If you think about it, this should be obvious: that your measurement of the number of Wins a player is above Replacement can never be more accurate than your measure of his Wins.
And then, WAR is a derivative stat, derived from an estimate of the player’s Wins and an estimate of the Replacement Level, one subtracted from the other. But a derivative stat of this nature is inherently less accurate than EITHER of its component measurements. It absolutely has to be.
I propose this, as a thought experiment. This complicated math that we go through to find Win Shares or WAR, it is like a scale. It is a scale that measures value—and, frankly, it is not a tremendously accurate scale. It’s a best-we-can-do scale.
This is my thought experiment. Suppose that you create a universe of players, and suppose that, for each player, you create (a) a number of wins for him, and (b) a number of replacement-level wins, to be subtracted from his wins to find his value. Suppose, however, that the scale on which you measure each one of those things is 10% inaccurate—just 10%. Will the resulting derivative stat also be 10% inaccurate?
No; in fact, it will be something like 35% inaccurate. Suppose that a player’s true Wins Contributed is 7.0, but that the replacement level player would have contributed 4.0 (a normal ratio of wins to WAR.) His true value is 3.0 WAR. But if each of the two major components is measured with a potential error of 10%, then the player’s measured Wins Above Replacement can be anywhere from 1.9 (that is, 6.3 minus 4.4) up to 4.1 (that is, 7.7 minus 3.6). With a potential 10% measurement error on each element, a player with a WAR of 3.0 can be measured anywhere from 1.9 to 4.1.
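The worst-case arithmetic can be laid out directly, using the numbers above (7.0 true wins, 4.0 replacement-level wins, a 10% error band on each component):

```python
# Error propagation when WAR is the difference of two mis-measured quantities.
true_wins = 7.0
replacement_wins = 4.0
true_war = true_wins - replacement_wins  # 3.0
error = 0.10

# The widest miss in each direction pairs a low read on one component
# with a high read on the other.
low_war = true_wins * (1 - error) - replacement_wins * (1 + error)   # 6.3 - 4.4
high_war = true_wins * (1 + error) - replacement_wins * (1 - error)  # 7.7 - 3.6

# Relative error of the derivative stat: roughly 37%, not 10%.
worst_relative_error = (high_war - true_war) / true_war
```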
What I think that almost no one who uses WAR understands is how fantastically accurate your process measurements would have to be to get an accurate WAR. A derivative estimate contains all of the inaccuracy in any of the components from which it is derived, combined in a geometric fashion. And, in fact, they have NOT arrived at an accurate WAR.
WAR is. . . it’s not a fraud, because a fraud is a DELIBERATE attempt to mislead. No one involved in the creation or promulgation of WAR has attempted to mislead you; they have merely over-estimated their ability to measure baseball value accurately, based on what we know. We’re not there yet; we have not yet reached the point at which WAR estimates are even reasonably reliable. Due to the remarkable skills of Sean Foreman, and his devotion to the concept of WAR, millions of people have come to attribute to WAR a reliability that the stat simply does not have.
When you review WAR estimates in the way that I have spent the last week doing, this becomes obvious. B-WAR leads the user along a narrow pathway through the forest of decisions, and tells us that the best player in the American League in 1962 was: Hank Aguirre. This is a weird idea. I hate to tell you this, but Hank Aguirre was really NOT the best player in the American League in 1962—nor one of the five best, nor one of the 20 best. B-WAR leads the user along a narrow pathway through the statistics of the 1966 season, and tells you that, if you buy ALL of their choices, the best player in the American League was not Frank Robinson, it was Earl Wilson. This is a weird idea. It is a weird conclusion, and it is logically indefensible.
And there are quite a few of them.
Do you know who WAR says was the Most Valuable Player in the American League in 2008? Nick Markakis. He wasn’t mentioned in the American League’s MVP voting, but. . .that’s what they want us to believe.
Nick Markakis back then was a good player. These are Markakis’ stat lines from 2007 through 2009, copied directly from Sean Forman’s wonderful site:
| Year | G   | AB  | R   | H   | 2B | 3B | HR | RBI | SB | CS | BB | SO  | BA   | OBP  | SLG  | OPS  |
|------|-----|-----|-----|-----|----|----|----|-----|----|----|----|-----|------|------|------|------|
| 2007 | 161 | 637 | 97  | 191 | 43 | 3  | 23 | 112 | 18 | 6  | 61 | 112 | .300 | .362 | .485 | .848 |
| 2008 | 157 | 595 | 106 | 182 | 48 | 1  | 20 | 87  | 10 | 7  | 99 | 113 | .306 | .406 | .491 | .897 |
| 2009 | 161 | 642 | 94  | 188 | 45 | 2  | 18 | 101 | 6  | 2  | 56 | 98  | .293 | .347 | .453 | .801 |
Doesn’t it look to you, kind of, like Nick Markakis was the same player in 2008 that he was in 2007 or 2009? Isn’t that the conclusion that you would tend to reach?
But no, WAR says that Markakis’ value was 4.2 in 2007 and 2.9 in 2009, but 7.4 in 2008. His value in 2008 was greater than his combined value in 2007 AND 2009. It is, frankly, a weird thing to say. His walks were up by 38 but his RBI were down by 25, his other stats really the same. I buy it to the extent of saying that he had SOME more value in 2008 than in the other years. If you said he was 4.2 in 2007 but 5.2 in 2008, I’d be OK with that. Win Shares shows his value in those three seasons as 20-23-16—a moderate increase for 2008. The conclusion that he was, for some reason, the American League’s best player in 2008 is weird.
I should not leave the impression that the 2008 calculation is mysterious and I don’t understand it, or some nitwit will write and explain it to me. It results from a combination of his offensive and his defensive stats. His walks spiked upward in 2008, leading to an increase in offensive value, and his defensive value also spiked upward. B-WAR says that his dWAR is negative in every season of his career up to 2015, except 2008, when it is tremendously positive. His dWAR by season, beginning in 2006, was -0.1, -0.1, +1.8, -0.8, -1.7, -0.1, -1.2, -0.5, -1.4. The spike in defensive value in 2008 explains most of why he was the American League’s best player that year. I’m not saying that I don’t understand it; I’m saying that I don’t believe it.
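Simple arithmetic on those quoted dWAR figures shows how far the 2008 number sits from Markakis’s typical defensive value:

```python
# Markakis's season-by-season dWAR, 2006-2014, as quoted above.
dwar = [-0.1, -0.1, 1.8, -0.8, -1.7, -0.1, -1.2, -0.5, -1.4]

# Every season except 2008 (the +1.8) is negative.
others = [d for d in dwar if d != 1.8]
mean_others = sum(others) / len(others)  # about -0.74

# The 2008 figure sits roughly 2.5 wins above his typical defensive value.
gap = 1.8 - mean_others
print(round(mean_others, 2), round(gap, 2))
```

A swing of about two and a half wins of defensive value, appearing for one season and then vanishing, is most of the distance between a 4-to-5 WAR season and the 7.4 that made him the league’s "best player."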
WAR chooses a narrow pathway through the forest of numbers, to lead you to that conclusion—and people say, "Oh. Okay. If that’s what the formulas say, I guess that’s his value."
Well. . .one of the differences between WAR and Win Shares is in the proportion of value that we assign to top-flight pitchers. WAR, as I have suggested, probably assigns too much value to top-flight pitchers—and Win Shares almost certainly assigns too little.
Clayton Kershaw in 2014 went 21-3 with a 1.77 ERA, and won the National League’s Most Valuable Player Award. He made only 27 starts and pitched only 198 innings, so Win Shares values him at only 22 Win Shares although he was nearly perfect in his 27 starts. 22 Win Shares is not one of the Top 10 totals in the National League.
Win Shares is wrong about that. Somehow, we have undervalued him. We have undervalued many of the VERY top pitchers, the Verlander-in-2011 type seasons. The system needs to be re-evaluated on that issue.
I have been trying for several years to re-work Win Shares as Win Shares and Loss Shares, but it’s an enormously complicated issue, and I have just never been able to find the time to work it all the way to the finish line. I’ll hope to get that finished this year.
Thanks for reading. I’ll open this up for comments tomorrow. If anyone involved with WAR wants to post a response article defending WAR, of course we’ll be happy to post that.