Three Looks at the MVPs
This research began with a Twitter discussion that I got involved in last week. Somebody in the discussion was arguing. . .
Well, it was Twitter, so it’s hard to say for sure what anybody’s point was. Somebody in the discussion seemed to be arguing, as best I understood it, that Sabermetrics has not actually changed the way that people think about baseball; it has merely taken credit for doing so. His evidence for this was Babe Ruth. Babe Ruth was the biggest star of all time before Sabermetrics. Babe Ruth is recognized now, with modern methods, as the greatest player of all time, so nothing has really changed.
Ordinarily I would not respond to an argument of that nature (which reminds me of one of the greatest lines in the history of Hollywood: Usually one must go to a Bowling Alley to meet a woman of your stature.) Anyway, it happened that, on this day, I had been looking at some old MVP votes which were starkly inconsistent with modern voting practices. I tried to ask the gentleman how he would explain these discrepancies, if the way that we evaluate players has not really changed. Yes, Babe Ruth was always recognized as great, and yes, Babe Ruth WAS great, but what about Don Baylor in 1979, or Jackie Jensen in 1958, to name just a couple of MVP selections from the past which are not likely to be mirrored in the present?
I realized that I was stumbling into a Twitter argument, which is the modern equivalent of interrupting elephants during mating, so I exited the situation as gracefully as I could, but it started me thinking, again, about a subject I have thought about many times. How could we document the changes in MVP voting patterns over time? We THINK we know that MVP voting patterns have changed, but we think we know a lot of things that aren’t true. How could we objectively test this?
I have worked on that problem several times before, but up to now I have just wasted a lot of hours doing research that turned out to be time-consuming and complicated, but not useful. I don’t think I have published anything on this exact issue before; maybe I have, I don’t know, but I don’t think so. But this time I just looked at it from exactly the right angle, and I saw how this could be measured. So here we are.
The Simplest Thing
I’m talking about three things here, three different looks at the MVP of each season: the Most Valuable Player as selected by the BBWAA, the #1 player in the league in terms of Baseball-Reference WAR, and the #1 player in the league in Win Shares. How often do these three searches for the best player in the league come home with the same specimen? Is that number higher now than it was in the 1950s, or the 1970s?
Yes, it is, but we’ll get there. I need to explain also what we are NOT talking about in this article. We are not here to talk about:
1) Who "should" have been the MVP in any season, or
2) Whether Win Shares or WAR is the better method to locate the best player.
There is a time and a place to argue about those things, but this isn’t it. This is about whether modern analytic tools, Win Shares and WAR, are (or are not) more closely allied with MVP voting now than they were in the past.
The simplest thing to do would be just to count how often the MVP Award goes to the player with the highest WAR (baseball reference WAR) in the league, and how often it goes to the player with the most Win Shares, and how often it goes to some other no-account shuffler. Since the BBWAA started voting for the MVP Award in 1931 there have been 178 Awards, one of which was split between two players.
Of those 178 Awards, 58 have gone to the player who led the league in bWAR, and 73 have gone to the player who led the league in Win Shares. In other words, the bWAR MVP and the BBWAA MVP have been the same a little bit less than one-third of the time (58/178), while the Win Shares leader and the BBWAA MVP have been the same a little bit more than 40% of the time (73/178). A fourth of the difference between those two is explained by the fact that ties are more common in Win Shares than in bWAR, and if either of the players who tied for the league lead in Win Shares was the MVP, I counted that as a match. It has happened 7 times that a player TIED for the league lead in Win Shares and was the BBWAA MVP. If we counted half of those as "matches" for Win Shares, then Win Shares would lead bWAR in matches to the MVP award only 69.5 to 58, rather than 73 to 58.
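If you want to check that arithmetic, the bookkeeping looks like this; a minimal Python sketch, using only the totals quoted above (the award-by-award data itself is not reproduced here):

```python
# The article's totals, not the underlying vote data.
TOTAL_ELECTIONS = 178

bwar_matches = 58          # MVP was also the league's bWAR leader
win_share_matches = 73     # MVP led, or tied for the lead, in Win Shares
win_share_tie_matches = 7  # of those 73, times the MVP only TIED for the lead

bwar_rate = bwar_matches / TOTAL_ELECTIONS    # a bit under one-third
ws_rate = win_share_matches / TOTAL_ELECTIONS  # a bit over 40%

# Counting each tie as half a match trims the Win Shares total to 69.5:
ws_adjusted = win_share_matches - 0.5 * win_share_tie_matches
```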
The Nuanced Method
The simplest method, however, is only a binary count of matches, rather than an actual measurement of the degree of agreement. Comparing WAR and the MVP Award, there are two possible results: either they match, or they don’t match. A binary outcome.
The binary outcome leaves a great deal of information on the shelf. In 2019, for example, the American League MVP was Mike Trout, while the league leader in WAR was Alex Bregman. Bregman, however, was second to Trout in MVP voting, and a close second at that (355 to 335), while Trout was second to Bregman in WAR, and a VERY close second, 8.4 to 8.3. The two systems disagree, but they disagree by only a small amount. We could say—and we will say, in a moment—that the two are 90% in agreement, but just a little bit not so.
In the National League in 1998, on the other hand, Sammy Sosa was the MVP, while Kevin Brown had the league’s highest WAR. Sosa, however, not only was not second in WAR, he also was not third, or fourth, or anything like that. He was not in the top 10. I’m not sure where he was; he might have been 13th, 17th, something like that. And Kevin Brown, the leader in WAR, was not second in the MVP voting, or third, or fourth; he was 16th.
The binary count records these two contests—2019 AL and 1998 NL--as being the same. They are both in the "don’t agree" category. In reality, they are nothing like the same. The 2019 split is a tiny, narrow disagreement between who is a hair ahead and who is a little bit behind, but there is a consensus that those two are the best players in the league. The 1998 split was a complete and total disagreement, with the MVP voters saying that the league leader in WAR was not any kind of an MVP candidate, and the WAR calculators saying that the MVP was not among the best players in the league, not in the top 10 at least.
So this is what we did instead. . . .I mentioned that I just happened to look at this problem in the right way and see what could be done, and this was that way. Suppose that we say that, when the two agree, that is 100% agreement. If the MVP is not first in WAR, but is second, then we list that as -1 of 10, which means that there is 90% agreement there. If the MVP is third in WAR, we list that as -2, or 80% agreement; if the MVP is fourth in WAR, we list that as -3, or 70% agreement. If the MVP is not in the top ten in the league in WAR, we list that as -10, or 0% agreement, 100% disagreement.
It is, however, a two-sided process. The "degree of agreement" between MVP Voting and WAR is measured by how the MVP does in WAR, but also by how the league leader in WAR does in MVP voting. It is, then, not a 10-point system, but a 20-point system.
In the American League in 2019 (Trout and Bregman), that scores as 18 out of 20, or 90%--9 out of 10 because the MVP is second in WAR, and 9 out of 10 because the leader in WAR is second in MVP voting. In the National League in 1998 (Sosa vs. Brown), that scores as 0 out of 20, or 0%--0 out of 10 because the MVP is not in the top 10 in WAR, and also 0 out of 10 because the league leader in WAR is not in the top 10 in MVP voting.
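The scoring rule can be written out as a small function; a minimal sketch, where the two rank arguments come from the examples above (1 = led the league), and any rank outside the top ten is capped at the full 10-point penalty:

```python
def agreement(mvp_rank_in_war: int, war_leader_rank_in_vote: int) -> float:
    """Degree of agreement between MVP voting and WAR, on a 20-point scale.

    mvp_rank_in_war: where the elected MVP finished in the league in WAR.
    war_leader_rank_in_vote: where the WAR leader finished in MVP voting.
    A rank of 1 costs nothing; each place below first costs one point,
    capped at 10 points per side (i.e., outside the top ten).
    """
    penalty = (min(mvp_rank_in_war - 1, 10)
               + min(war_leader_rank_in_vote - 1, 10))
    return (20 - penalty) / 20

# 2019 AL: Trout 2nd in WAR, Bregman 2nd in MVP voting -> 90% agreement.
# 1998 NL: Sosa and Brown each outside the other's top ten -> 0%.
```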
That’s very rare. A complete disagreement is a rare outcome. An agreement between WAR and MVP voters, as I said above, has happened 58 times in 178 elections. A complete disagreement between WAR and MVP voting, however, has happened only seven times.
The seven times that there has been a complete disagreement between the two are the National League in 1931, the American League in 1976, the American League in 1984, the American League in 1987, the American League in 1992, the American League in 1993, and the National League in 1998. And a complete disagreement between the MVP voting and Win Shares has happened only twice, those two being in the National League in 1931 and in the American League in 1984.
The Five Possible Outcomes
In a simple form, there are five possible outcomes:
1) All three systems agree as to who the MVP should be,
2) The MVP voters and WAR agree, but Win Shares gets a different answer,
3) The MVP voters and Win Shares agree, but WAR gets a different answer,
4) The two analytical systems agree, but the MVP voters don’t, and
5) The three systems get three different answers.
In 178 leagues, we have:
39 cases in which all three methods agree
18 cases in which the MVP voters and WAR agree, but Win Shares does not,
34 cases in which the MVP voters and Win Shares agree, but WAR does not,
38 cases in which WAR and Win Shares agree, but the MVP voters did not, and
49 cases in which the three methods reach three different answers.
In the "nuanced" form, there are 20 possible answers for the extent of agreement—0%, 5%, 10%, 15%, 20%, etc., up to 90%, and then 100%. It is not possible to get 95% agreement since, if the elected MVP is not also the league leader in WAR, then the league leader in WAR cannot be the elected MVP. You can’t have -1 over 20; you can have -0 or -2, but not -1. But you can have -3, -4. . .whatever.
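That claim—that 95% is unreachable—can be checked by brute force. A small sketch, resting on the observation that if the MVP leads the league in WAR, then the WAR leader IS the MVP, so the two per-side penalties are either both zero or both at least one:

```python
# Enumerate every reachable agreement score on the 20-point scale.
# Per-side penalties run 0..10; a zero on one side forces a zero on the other.
reachable = sorted({
    (20 - a - b) / 20
    for a in range(11)
    for b in range(11)
    if (a == 0) == (b == 0)
})

# Twenty distinct values: 100%, then 90% down to 0% in 5% steps -- no 95%.
```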
The Three Methods Agree
Just listing these for your perusal and enjoyment, these are the 39 cases in which all three methods agree as to who the Most Valuable Player was:
1932 American League, Jimmie Foxx
1933 American League, Jimmie Foxx
1936 National League, Carl Hubbell-Telescope
1937 National League, Joe Medwick
1938 American League, Jimmie Foxx
1939 National League, Bucky Walters
1943 National League, Stan Musial
1945 American League, Hal Newhouser
1946 American League, Ted Williams
1946 National League, Stan Musial
1948 National League, Stan Musial
1949 American League, Ted Williams
1953 American League, Al Rosen
1954 National League, Willie Mays
1956 American League, Mickey Mantle
1957 American League, Mickey Mantle
1965 National League, Willie Mays
1967 American League, Carl Yastrzemski
1968 National League, Bob Gibson
1975 National League, Joe Morgan
1976 National League, Joe Morgan
1977 American League, Rod Carew
1980 American League, George Brett
1981 National League, Mike Schmidt
1982 American League, Robin Yount
1983 American League, Cal Ripken
1984 National League, Ryne Sandberg
1990 National League, Barry Bonds
1991 American League, Cal Ripken
2001 National League, Barry Bonds
2002 National League, Barry Bonds
2003 American League, Alex Rodriguez
2004 National League, Barry Bonds
2005 American League, Alex Rodriguez
2005 National League, Albert Pujols
2007 American League, Alex Rodriguez
2009 National League, Albert Pujols
2015 National League, Bryce Harper
2016 National League, Kris Bryant
Why Can’t We All Just Get Along?
The 1931 National League race, on the other hand, illustrates the concept of near-complete disagreement among the three. The National League MVP, by the voters, was Frankie Frisch. The league leader in WAR, however, was Brooklyn pitcher Watty Clark, and the league leader in Win Shares was Boston outfielder Wally Berger.
Frankie Frisch, the MVP, was not in the Top 10 in the league in WAR, and Watty Clark, the league leader in WAR, was not in the Top 10 in MVP voting. That’s a complete disagreement between the two systems—10 points off because the MVP was not in the Top 10 in WAR, and 10 points off because the league leader in WAR was not in the Top 10 in MVP voting. There is a 0% agreement between the two systems.
The league leader in Win Shares was a third player, Wally Berger. Again, a 0% agreement with the MVP voters. Frisch, the MVP, was not among the Top 10 in Win Shares, and Berger, the league leader in Win Shares, was not in the Top 10 in MVP voting.
Between Win Shares and WAR, however, there is a small degree of agreement. Watty Clark, the league leader in WAR, had only 22 Win Shares, which was (a) more than Frisch, but (b) still not in the Top 10 in the league. Wally Berger, the league leader in Win Shares, however, was 5th in the league in WAR, so that is only a -4 there. The agreement between Win Shares and WAR, then, is scored at 20 minus 10 minus 4, divided by 20, or 30%:
(20 – 10 – 4) / 20 = .30
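Writing the 1931 National League arithmetic out in full; the per-side penalties below are the ones given above (10 for anyone outside the other system’s top ten, 4 for Berger finishing 5th in WAR), and the overall "consensus" is just the average of the three pairwise scores:

```python
# Each pairwise score uses the same 20-point scale.
def pair_score(penalty_a: int, penalty_b: int) -> float:
    return (20 - penalty_a - penalty_b) / 20

vote_vs_war = pair_score(10, 10)  # Frisch and Clark: each outside the other's top ten
vote_vs_ws  = pair_score(10, 10)  # Frisch and Berger: likewise
war_vs_ws   = pair_score(10, 4)   # Clark not in WS top ten; Berger 5th in WAR

# Average of the three pairwise agreements: the 10% consensus of 1931 NL.
consensus = (vote_vs_war + vote_vs_ws + war_vs_ws) / 3
```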
The 1931 National League vote/calculation is the MOST divided in history. There is zero percent agreement between MVP voters and WAR, and 0% agreement between MVP voters and Win Shares. There is a 30% agreement between WAR and Win Shares, so the sum total of the agreement is .30, on a scale in which 3.00 is the maximum, so that’s a 10% consensus. These are the 10 most-divided contests of all time; I guess it is actually twelve because there is a tie for the 9-10-11-12 spots:
1931 National League
MVP: Frankie Frisch
WAR Leader: Watty Clark
Win Shares Leader: Wally Berger
Agreement: 10%
1992 American League
MVP: Dennis Eckersley
WAR Leader: Roger Clemens
Win Shares Leader: Roberto Alomar
Agreement: 17%
1998 American League
MVP: Juan Gonzalez
WAR Leader: Alex Rodriguez
Win Shares Leader: Albert Belle
Agreement: 22%
1987 National League
MVP: Andre Dawson
WAR Leader: Tony Gwynn
Win Shares Leader: Tim Raines
Agreement: 28%
1984 American League
MVP: Willie Hernandez
WAR Leader: Cal Ripken
Win Shares Leader: Cal Ripken
Agreement: 33%
(There was 100% agreement between the two analytical approaches, but 0% agreement of either one with the MVP voting, since Hernandez was not in the top 10 in either analytical stat, and Ripken was not in the top 10 in the BBWAA voting. Ripken, in fact, received only one tenth-place vote.)
2006 American League
MVP: Justin Morneau Canada!
WAR Leader: Johan Santana
Win Shares Leader: Derek Jeter
Agreement: 33%
1976 American League
MVP: Thurman Munson
WAR Leader: Mark (the Bird) Fidrych
Win Shares Leader: George Brett
Agreement: 37%
1998 National League
MVP: Sammy Sosa
WAR Leader: Kevin Brown
Win Shares Leader: Mark McGwire
Agreement: 40%
1987 American League
MVP: George Bell
WAR Leader: Roger Clemens
Win Shares Leader: Alan Trammell
Agreement: 42%
1993 American League
MVP: Frank Thomas
WAR Leader: Kevin Appier
Win Shares Leader: John Olerud
Agreement: 42%
1995 American League
MVP: Mo Vaughn
WAR Leader: Randy Johnson
Win Shares Leader: Edgar Martinez
Agreement: 42%
2004 American League
MVP: Vladimir Guerrero
WAR Leader: Ichiro Suzuki
Win Shares Leader: Gary Sheffield
Personal Opinion: I would have voted for David Ortiz
Agreement: 42%
You will notice, in the above, that there is a period of time in which almost all of the widest splits occur. More on that in the next segment.
Conclusions
Excuse me. . ..
Conclusions!
Ah, the advantages of not working in an academic environment.
Anyway, there are three real conclusions which can be drawn from this study.
1) It is clearly true that the Analytical age has strongly influenced MVP voting. MVP voting matches with the analytical stats, and in particular with WAR, much more strongly now than it did 20 years ago.
2) Throughout all of MVP voting history, Win Shares matches MVP voting much, much more closely than WAR does. I’m not saying that Win Shares is right and WAR is wrong; I am merely saying that Win Shares is much closer to MVP voting than WAR is.
3) Both Win Shares and WAR need to be updated. They’ve both got some hickeys which are revealed by a careful review of this process.
1) It is clearly true that the Analytical age has strongly influenced MVP voting.
The "Degree of Agreement" between the MVP votes and the analytical stats has shot up in the last 20 years, obviously as a result of MVP voters being influenced by the analytical stats.
In the early days of the MVP vote, there was a high degree of agreement between the voting and the analytical stats, which of course were not developed until decades after these votes took place, but retrospectively, there was a high degree of agreement:
Years           MVP Vote to WAR Agreement    MVP Vote to Win Shares Agreement
1931 to 1939              71%                              79%
1940 to 1949              82%                              84%
In the 1940s, to a lesser extent in the 1930s, there were just a lot of very obvious MVP votes. In the 1940s two superstars—Williams and Musial—were consensus picks for the award a total of five times. Hal Newhouser was a consensus pick once.
From 1950 to 1999, the "degree of agreement" between the MVP votes and the analytical stats, particularly WAR, went steadily downward, dropping to 49% agreement in the 1990s:
Years           MVP Vote to WAR Agreement    MVP Vote to Win Shares Agreement
1931 to 1939              71%                              79%
1940 to 1949              82%                              84%
1950 to 1959              61%                              74%
1960 to 1969              61%                              80%
1970 to 1979              55%                              84%
1980 to 1989              52%                              78%
1990 to 1999              49%                              74%
The 1940s number is just kind of a fluke. If you replace that 82% from the 1940s with, let’s say, 67% for WAR and 77% for Win Shares, then you can see that the chart is in almost perfect order.
As to why the degree of agreement between MVP votes and analytical stats declined for 50 years (or 70 years, if we write the 1940s off as a fluke). . .there are multiple causes for that, and I’ll write them up in a separate section, "Why the MVP votes and the Analytical Stats drifted apart for a Half-Century." The point here is only that they did—and, absent an outside force, they would have continued to do so into the 21st century.
In the last 20 years, however, the MVP votes have moved dramatically closer to the analytical stats:
Years           MVP Vote to WAR Agreement    MVP Vote to Win Shares Agreement
1931 to 1939              71%                              79%
1940 to 1949              82%                              84%
1950 to 1959              61%                              74%
1960 to 1969              61%                              80%
1970 to 1979              55%                              84%
1980 to 1989              52%                              78%
1990 to 1999              49%                              74%
2000 to 2009              72%                              89%
2010 to 2019              83%                              86%
There is just no room for doubt, I don’t think, about this conclusion.
2) Throughout all of MVP voting history, Win Shares matches MVP voting much, much more closely than WAR does.
I did expect that Win Shares would match MVP voting more closely than WAR does; however, I was extremely surprised by the extent to which this is true.
First, Win Shares matches MVP voting better than WAR does in every decade of MVP voting history without exception; see chart above.
Second, if you focus not on the "agreement percentage" but on the "disagreement percentage". . . .the disagreement percentage for WAR is often twice as high, or more than twice as high. In the 1960s there is a 20% disagreement rate between Win Shares and the MVP voting, a 39% disagreement rate for WAR. In the 1970s it is 16% for Win Shares, 45% for WAR. In the 1980s it is 22% for Win Shares, 48% for WAR, and in the 1990s, it is 26% for Win Shares, 51% for WAR. In the first decade of this century it was 11% for Win Shares, 28% for WAR.
Third, consider this stat. In MVP voting history, there have been 58 times that the league leader in WAR won the MVP Award, but 34 times when the league leader in WAR was not in the top 10 in MVP voting. 58-34. For Win Shares, the split is 73-7. There have only been 7 times in history when the league leader in Win Shares was not in the top ten in MVP voting (list will follow).
In the last decade, WAR has gained rapidly on Win Shares as a predictor of MVP voting. In all of baseball history through 2009, there are only 11 cases in which WAR agrees with the MVP voters, but Win Shares is an outlier, only 11 times in 78 years. In the last ten years that has happened seven times—
2010 American League, WAR and the MVP voters both think that Josh Hamilton was the best player in the American League; Win Shares would have preferred Jose Bautista or Robinson Cano.
2011, WAR and the MVP voters agree that Justin Verlander was the American League MVP; Win Shares would have chosen his teammate, Miguel Cabrera.
2012, WAR and the MVP voters agree that Watch It, Buster Posey was the National League MVP; Win Shares would have chosen Andrew McCutchen.
2016, WAR and the MVP voters agree that Mike Trout was the American League’s best MVP pick; Win Shares would have chosen Jose Altuve.
2017, WAR and the MVP voters agree that Giancarlo Stanton was the National League’s MVP; Win Shares would have chosen Charlie Blackmon.
2018, WAR and the MVP voteratti agree, and I agree, that Mookie Betts was the American League’s MVP; Win Shares would have chosen Mike Trout.
2019, WAR and the MVP voters agree that Cody Bellinger was the National League MVP; Win Shares would have chosen Christian Yelich. (Do Jewish families ever name their sons Jewish Rothstein or anything? Just wondering.)
However, even in the last decade, Win Shares still tracks with MVP voting better than WAR does. There are also several cases in the last decade in which WAR was the outlier, while Win Shares and the voters agreed.
These are the 7 players who led their league in Win Shares, but were not in the top 10 in MVP voting:
1931 National League, Wally Berger
1945 National League, Stan Hack
1951 American League, Ted Williams
1952 American League, Larry Doby
1954 American League, Mickey Mantle
1984 American League, Cal Ripken
1995 National League, Barry Bonds
Why the MVP votes and the Analytical Stats
drifted apart for a Half-Century
There are several factors at work here. . .well, three, let us say.
First, there is the compression of talent over time, noted by Stephen Jay Gould and others. The stars of the 1930s, Jimmie Foxx and Lou Gehrig and Lefty Grove, etc., towered over their competitors to a significantly greater extent than through most of baseball post-1950.
Second, there is expansion. When you have more entries into a competition, any competition, it becomes less likely that one will be obviously better than all of the others.
Third, or so people will say, after the Cy Young Award was introduced in 1956, and in particular after there was a Cy Young vote in each league beginning in 1967, it gradually became less common for a pitcher to win the MVP Award.
But let’s drill down on that. Yes, the essential condition of WAR not matching the MVP voting IS caused mostly by pitchers not winning the MVP Award even when they are seen by WAR as the most valuable players in the league. Of the 34 players who are seen by WAR as being the most valuable player in the league, but who did not finish in the Top 10 in the league in MVP voting, 29 are pitchers. Only 5 are position players.
BUT.
But don’t rush from that to the conclusion that this is just a Cy Young Award effect. It isn’t.
First, the Win Shares system ALSO measures pitchers and position players on a common scale, just as WAR does—but the Win Shares system does not have this condition, of a sharply widening split between the league leader and the voted MVP in the years 1950 to 1999. So why is that?
Second, there are 34 cases in which there is a player who led the league in WAR but did not finish in the Top 10 in MVP voting, and 29 of those are pitchers, but in the Win Shares method, there are only 7 cases in which there is a player who led the league but did not finish in the Top 10 in MVP voting, and NONE of those are pitchers. We listed them above, see? None of them are pitchers. So why is that?
Third, the problem of pitchers not winning the MVP award because they have their own award is not closely connected in time to the Cy Young Award. Look at this breakdown:
Years 1931-1945   MVP Awards: 30   Won by Pitchers: 9   (30%)
Years 1946-1955   MVP Awards: 20   Won by Pitchers: 2   (10%)
Years 1956-1966   MVP Awards: 22   Won by Pitchers: 2    (9%)
Years 1967-1979   MVP Awards: 26   Won by Pitchers: 3   (11%)
Years 1980-1999   MVP Awards: 40   Won by Pitchers: 4   (10%)
Years 2000-2019   MVP Awards: 40   Won by Pitchers: 2    (5%)
So where does the decline in pitchers winning the MVP Award actually occur—1956 (the first Cy Young Award), 1967 (the split of the Cy Young into two leagues), or 1946 (the beginning of the Post-War era)?
Yes, there is a second drop-off beginning 2000, but remember, since 2000, WAR and the MVP Award are much more closely connected, so the 2000 drop-off is not related to our essential problem, which is "Why are there 29 pitchers who led their league in WAR, but didn’t finish in the Top 10 in MVP voting?"
Here’s what I am trying to say. MAYBE the fact that pitchers don’t win MVP Awards is not related to the Cy Young Award at a high level. MAYBE it is caused by the fact that many voters don’t perceive pitchers as the most valuable players in the league except in exceptional cases—as the Win Shares system also does not.
What actually happens is, the WAR system goes through this weird 30-year-period in which it usually thinks that some pitcher should win the MVP Award.
In the 1940s, WAR chooses a pitcher as the MVP 6 times. In the 1950s, it is 5 times; in the 1960s, 6 times. In the years 1940 to 1969, WAR chooses a pitcher as the MVP 28% of the time—17 out of 60, a reasonable percentage.
But then in the 1970s, WAR chooses a pitcher as the Most Valuable Player in the League 13 times. The number goes down to 8 times in the 1980s, but then is back up to 13 in the 1990s. In the years 1970 to 1999, WAR chooses a pitcher as the MVP 34 times in 60 awards, or 57%.
Is that reasonable, do you think?
Reasonable or not, what really happened is NOT that the voters stopped choosing pitchers as MVPs. In the years 1970 to 1999, voters chose pitchers as MVPs with the same frequency that they had since 1946—10%. What ACTUALLY happened is that, for some reason, WAR STARTED picking a pitcher as the MVP most of the time. The "disagreement" between WAR and the MVP voting in those years is not caused by the MVP voters; it is clearly caused by WAR.
Summation
Oh, I’m sorry. . .
Summation!!!
So here is what I think, having put about a week into pouring this data into forms and then studying the forms, this is what I have concluded; you can take it for whatever you think it is worth. I think that this study shows that both Win Shares and baseball reference WAR need to be updated.
Win Shares was created 20 years ago. I made some mistakes in the original design of the system. I should have figured Win Shares and Loss Shares separately and then united them into one value. I didn’t. The system has, in certain respects, gotten behind the times in terms of the incorporation of modern defensive values into the process.
The process of saying what a player’s "value" is is immensely complicated, and in that long and complicated process the analyst has to make hundreds of choices about the interpretation of data. The analyst has to make some decision about the treatment of fluke outcomes. Norm Cash had almost exactly the same three true outcomes in 1962 as he had in 1961, but his batting average dropped 118 points. Do you treat that as a fluke, or do you treat it as a reality? There is no clearly correct answer. Do you measure park effects in one-year increments, or five-year aggregates? There is no clearly correct answer. If a team wins 90 games but has numbers which suggest that they should have won 80, do you treat them as a 90-win group of players, or an 80-win group of players? One answer is not necessarily better than the other.
We’re choosing a pathway through a forest of choices. You make one choice, you wind up at a lake; you make the other choice, you wind up on a mountain.
At times, in designing Win Shares, I was absolutist when I should have chosen a middle ground. I chose a narrow pathway when I should have chosen a broader one. Also, I made the system so damned complicated that almost nobody really understands it; people say all kinds of things about Win Shares that are clearly not true, but it’s my own fault for making the system so complicated. Of perhaps more importance, making it so complicated makes it hard to fix, hard to update, hard to program.
But. . .this is just my opinion; take it for what it is worth. I think the problems of Win Shares are trivial compared with the problems of WAR. Sabermetrics is supposed to be, as much as possible, an open road toward insight on an issue. The designers of WAR—friends of mine, almost without exception—have made choices which create an extremely narrow pathway through the forest of problems.
If you think about it, if you create a logical pathway toward Wins Above Replacement, you first have to measure WINS. Right? If you’re measuring Wins, and you are measuring Wins Above Replacement, which problem do you come to first? AFTER you measure how many WINS each player has contributed to his team, THEN you are in a position to ask "How many of those wins would have been contributed by a Replacement Level Player, and how many are Wins Above Replacement?"
This has never been done. The designers of WAR skipped the first problem, and tried to take a shortcut toward the second.
The problem of how many wins a player has contributed ABOVE replacement level is necessarily more complicated than the problem of how many Wins he has contributed. In order to reach Wins Above Replacement, you have to solve all of the problems associated with measuring Wins, and then you have to solve an additional set of problems.
This has never been done. I spent two, three years working essentially full-time on Win Shares, trying to think through every little problem as best I was able. I made some mistakes. But if you’re REALLY going to measure Wins Above Replacement, rather than merely pretending that you are measuring it, you’re going to have to take a couple of years’ sabbatical from whatever else it is that you are doing, and think through all of the problems. I don’t believe that anyone has ever done this, and I don’t believe that the structure of WAR was ever really thought through in a logical fashion.
If you think about it, this should be obvious: that your measurement of the number of Wins a player is above Replacement can never be more accurate than your measure of his Wins.
And then, WAR is a derivative stat, derived from an estimate of the player’s Wins and an estimate of the Replacement Level, one subtracted from the other. But a derivative stat of this nature is inherently less accurate than EITHER of its component measurements. It absolutely has to be.
I propose this, as a thought experiment. This complicated math that we go through to find Win Shares or WAR, it is like a scale. It is a scale that measures value—and, frankly, it is not a tremendously accurate scale. It’s a best-we-can-do scale.
This is my thought experiment. Suppose that you create a universe of players, and suppose that, for each player, you create (a) a number of wins for him, and (b) a number of replacement-level wins, to be subtracted from his wins to find his value. Suppose, however, that the scale on which you measure each one of those things is 10% inaccurate—just 10%. Will the resulting derivative stat also be 10% inaccurate?
No; in fact, it will be something like 35% inaccurate. Suppose that a player’s true Wins Contributed is 7.0, but that the replacement level player would have contributed 4.0 (a normal ratio of wins to WAR.) His true value is 3.0 WAR. But if each of the two major components is measured with a potential error of 10%, then the player’s measured Wins Above Replacement can be anywhere from 1.9 (that is, 6.3 minus 4.4) up to 4.1 (that is, 7.7 minus 3.6). With a potential 10% measurement error on each element, a player with a WAR of 3.0 can be measured anywhere from 1.9 to 4.1.
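The worst-case arithmetic can be laid out directly, using the numbers above (7.0 true wins, 4.0 replacement-level wins, a 10% error band on each component):

```python
# Error propagation when WAR is the difference of two mis-measured quantities.
true_wins = 7.0
replacement_wins = 4.0
true_war = true_wins - replacement_wins  # 3.0
error = 0.10

# The widest miss in each direction pairs a low read on one component
# with a high read on the other.
low_war = true_wins * (1 - error) - replacement_wins * (1 + error)   # 6.3 - 4.4
high_war = true_wins * (1 + error) - replacement_wins * (1 - error)  # 7.7 - 3.6

# Relative error of the derivative stat: roughly 37%, not 10%.
worst_relative_error = (high_war - true_war) / true_war
```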
What I think that almost no one who uses WAR understands is how fantastically accurate your process measurements would have to be to get an accurate WAR. A derivative estimate contains all of the inaccuracy in any of the components from which it is derived, combined in a geometric fashion. And, in fact, they have NOT arrived at an accurate WAR.
WAR is. . . it’s not a fraud, because a fraud is a DELIBERATE attempt to mislead. No one involved in the creation or promulgation of WAR has attempted to mislead you; they have merely over-estimated their ability to measure baseball value accurately, based on what we know. We’re not there yet; we have not yet reached the point at which WAR estimates are even reasonably reliable. Due to the remarkable skills of Sean Foreman, and his devotion to the concept of WAR, millions of people have come to attribute to WAR a reliability that the stat simply does not have.
When you review WAR estimates in the way that I have spent the last week doing, this becomes obvious. B-WAR leads the user along a narrow pathway through the forest of decisions, and tells us that the best player in the American League in 1962 was: Hank Aguirre. This is a weird idea. I hate to tell you this, but Hank Aguirre was really NOT the best player in the American League in 1962—nor one of the five best, nor one of the 20 best. B-WAR leads the user along a narrow pathway through the statistics of the 1966 season, and tells you that, if you buy ALL of their choices, the best player in the American League was not Frank Robinson, it was Earl Wilson. This is a weird idea. It is a weird conclusion, and it is logically indefensible.
And there are quite a few of them.
Do you know who WAR says was the Most Valuable Player in the American League in 2008? Nick Markakis. He wasn’t mentioned in the American League’s MVP voting, but. . .that’s what they want us to believe.
Nick Markakis back then was a good player. These are Markakis’ stat lines from 2007 through 2009, copied directly from Sean Forman’s wonderful site:
| Year | G   | AB  | R   | H   | 2B | 3B | HR | RBI | SB | CS | BB | SO  | BA   | OBP  | SLG  | OPS  |
|------|-----|-----|-----|-----|----|----|----|-----|----|----|----|-----|------|------|------|------|
| 2007 | 161 | 637 | 97  | 191 | 43 | 3  | 23 | 112 | 18 | 6  | 61 | 112 | .300 | .362 | .485 | .848 |
| 2008 | 157 | 595 | 106 | 182 | 48 | 1  | 20 | 87  | 10 | 7  | 99 | 113 | .306 | .406 | .491 | .897 |
| 2009 | 161 | 642 | 94  | 188 | 45 | 2  | 18 | 101 | 6  | 2  | 56 | 98  | .293 | .347 | .453 | .801 |
Doesn’t it look to you, kind of, like Nick Markakis was the same player in 2008 that he was in 2007 or 2009? Isn’t that the conclusion that you would tend to reach?
But no, WAR says that Markakis’ value was 4.2 in 2007 and 2.9 in 2009, but 7.4 in 2008. His value in 2008 was greater than his combined value in 2007 AND 2009. It is, frankly, a weird thing to say. His walks were up by 38 but his RBI were down by 25, his other stats really the same. I buy it to the extent of saying that he had SOME more value in 2008 than in the other years. If you said he was 4.2 in 2007 but 5.2 in 2008, I’d be OK with that. Win Shares shows his value in those three seasons as 20-23-16—a moderate increase for 2008. The conclusion that he was, for some reason, the American League’s best player in 2008 is weird.
I should not leave the impression that the 2008 calculation is mysterious and I don’t understand it, or some nitwit will write and explain it to me. It results from a combination of his offensive and his defensive stats. His walks spiked upward in 2008, leading to an increase in offensive value, and his defensive value also spiked upward. B-WAR says that his dWAR is negative in every season of his career up to 2015, except 2008, when it is tremendously positive. His dWAR by season, beginning in 2006, was -0.1, -0.1, +1.8, -0.8, -1.7, -0.1, -1.2, -0.5, -1.4. The spike in defensive value in 2008 explains most of why he was the American League’s best player that year. I’m not saying that I don’t understand it; I’m saying that I don’t believe it.
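Simple arithmetic on those quoted dWAR figures shows how far the 2008 number sits from Markakis’s typical defensive value:

```python
# Markakis's season-by-season dWAR, 2006-2014, as quoted above.
dwar = [-0.1, -0.1, 1.8, -0.8, -1.7, -0.1, -1.2, -0.5, -1.4]

# Every season except 2008 (the +1.8) is negative.
others = [d for d in dwar if d != 1.8]
mean_others = sum(others) / len(others)  # about -0.74

# The 2008 figure sits roughly 2.5 wins above his typical defensive value.
gap = 1.8 - mean_others
print(round(mean_others, 2), round(gap, 2))
```

A swing of about two and a half wins of defensive value, appearing for one season and then vanishing, is most of the distance between a 4-to-5 WAR season and the 7.4 that made him the league’s "best player."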
WAR chooses a narrow pathway through the forest of numbers, to lead you to that conclusion—and people say, "Oh. Okay. If that’s what the formulas say, I guess that’s his value."
Well. . .one of the differences between WAR and Win Shares is in the proportion of value that we assign to top-flight pitchers. WAR, as I have suggested, probably assigns too much value to top-flight pitchers—and Win Shares almost certainly assigns too little.
Clayton Kershaw in 2014 went 21-3 with a 1.77 ERA, and won the National League’s Most Valuable Player Award. He made only 27 starts and pitched only 198 innings, so Win Shares values him at only 22 Win Shares although he was nearly perfect in his 27 starts. 22 Win Shares is not one of the Top 10 totals in the National League.
Win Shares is wrong about that. Somehow, we have undervalued him. We have undervalued many of the VERY top pitchers, the Verlander-in-2011 type seasons. The system needs to be re-evaluated on that issue.
I have been trying for several years to re-work Win Shares as Win Shares and Loss Shares, but it’s an enormously complicated issue, and I have just never been able to find the time to work it all the way to the finish line. I’ll hope to get that finished this year.
Thanks for reading. I’ll open this up for comments tomorrow. If anyone involved with WAR wants to post a response article defending WAR, of course we’ll be happy to post that.