On Valuing Closers as Hall of Fame Candidates
The sabermetric community places a very different value on Closers than does much of the baseball world. I am merely trying here to state what is obvious in a neutral, non-judgmental way. A good many from the traditional sportswriting world feel that Trevor Hoffman is an obvious Hall of Famer. In our community, we are unable to document the level of value in Hoffman’s career that would justify that.
In the recent Hall of Fame election, Trevor Hoffman (Closer) received 67% of the vote, although his career WAR according to Baseball Reference is only 28.4. Mike Mussina (Starter) received only 43% of the vote, although his career WAR is almost three times as great (83.0).
Lee Smith (Closer) received 34% of the vote, although his career WAR is only 29.6. Larry Walker (Outfielder) received only 15.5% of the vote, although his career WAR is more than twice as great as Smith's.
Billy Wagner (Closer) received 10.5% of the vote, enough to stay on the ballot, although his career WAR is only 28.1. Jim Edmonds (Outfielder) received less than one-fourth of Wagner's vote percentage (2.5%), although his career WAR is more than twice as high as Wagner's (60.3). In FanGraphs WAR, Edmonds is three times as valuable as Wagner.
Sportswriters talk about relief pitching as if every pennant race and almost every game revolved around the bullpen, but here is the problem. Trevor Hoffman in his last ten seasons in the major leagues (2001 to 2010) was credited with 330 saves, 33 per season, but pitched an average of fewer than 51 innings per season. Mike Mussina in his last ten seasons in the majors (1999-2008) pitched just short of 200 innings per season (199.4). In his major league career, Mussina pitched 3,563 innings, and Hoffman pitched 1,089. For Hoffman to be more valuable than Mussina, then, he has to be three to four times more valuable per inning pitched.
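The "three to four times" figure can be checked in a few lines of Python. This is a minimal sketch that uses only the innings quoted above and assumes, for simplicity, that total value is per-inning value times innings; the function name `break_even_multiplier` is illustrative, not from any baseball library.

```python
# Innings figures quoted in the text.
MUSSINA_CAREER_IP = 3563
HOFFMAN_CAREER_IP = 1089
MUSSINA_LAST10_IP_PER_SEASON = 199.4
HOFFMAN_LAST10_IP_PER_SEASON = 51  # the text says "fewer than 51"


def break_even_multiplier(more_innings: float, fewer_innings: float) -> float:
    """Per-inning value multiple the pitcher with fewer innings needs just to
    match the other pitcher's total, assuming value = rate x innings."""
    return more_innings / fewer_innings


career = break_even_multiplier(MUSSINA_CAREER_IP, HOFFMAN_CAREER_IP)
last_ten = break_even_multiplier(MUSSINA_LAST10_IP_PER_SEASON,
                                 HOFFMAN_LAST10_IP_PER_SEASON)

print(f"Career innings: {career:.2f}x")              # 3.27x
print(f"Last ten seasons: more than {last_ten:.1f}x")  # 3.9x
```

Because Hoffman's per-season figure is an upper bound, the second ratio is a floor; together the two numbers bracket the three-to-four range.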
We in the sabermetric community . . . well, I am averse to speaking for the town, but I am just trying here to state what, I think, could be agreed upon. We in the sabermetric community agree that the innings pitched by Closers have a disproportionate impact on the team's won-lost record. One inning pitched by a Closer counts for more than a random inning pitched by a Starting Pitcher; we agree with that. What we disagree about is the extent. The "traditional" camp thinks the ratio is 3 to 1 or higher. We think it is lower.
I am not quarrelling here with the position of the sabermetric side. I agree with the position of the sabermetric side. If I were to be gifted with a Hall of Fame ballot, I would certainly vote for Mike Mussina and Tim Raines and the fellas before I would vote for any of Glenn Hoffman’s relatives.
However, while I agree with our side of the argument, there are a couple of dissenting points that I think should be made. First, while the argument against relievers having value comparable to that of other players may well be solid, it is not intuitively obvious, and I'm not actually sure that I understand all of it. If I don't understand the mathematics behind an argument, it seems to me, it is not that likely that the traditional sportswriting world will understand it. And second, perhaps the discussion about the Hall of Fame should not be turned into a subset of WAR.
On the first point, look at it this way. Let us suppose that value for a pitcher consists in having an ERA below 5.00. Just saying. . .a pitcher with an ERA of 5.00 isn't really worth very much. Mike Mussina in his career pitched 3,563 innings with an ERA of 3.68. That makes Mike Mussina 420 runs better than a pitcher with an ERA of 5.00. Trevor Hoffman pitched 1,089 innings in his career, with an ERA of 2.87. That makes him 227 runs better than a pitcher with an ERA of 5.00. Mussina 420, Hoffman 227.
Suppose, however, that Mussina has a leverage index of 1.0, and Hoffman a leverage index of 2.0. Then, applying the leverage, Hoffman would be ahead, 454 to 420.
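The leverage step can be sketched in Python. This minimal sketch treats a leverage index as a simple multiplier on runs saved, which is the simplification the paragraph uses, and takes the 420 and 227 run totals above as given inputs; the function names are hypothetical. One caveat on the inputs: figured directly as (5.00 - ERA) x innings / 9, the two totals would come out larger (about 523 for Mussina and 258 for Hoffman), so the quoted figures may have been computed on total runs allowed rather than earned runs; the sketch simply uses them as stated.

```python
# Runs better than a 5.00-ERA pitcher, as quoted in the text.
MUSSINA_RUNS = 420
HOFFMAN_RUNS = 227


def leveraged_runs(runs: float, leverage_index: float) -> float:
    """Weight runs saved by a leverage index (1.0 = an average situation)."""
    return runs * leverage_index


def break_even_leverage(rival_runs: float, runs: float) -> float:
    """Leverage index at which `runs` matches `rival_runs` (rival at 1.0)."""
    return rival_runs / runs


mussina = leveraged_runs(MUSSINA_RUNS, 1.0)
hoffman = leveraged_runs(HOFFMAN_RUNS, 2.0)
print(f"Mussina {mussina:.0f}, Hoffman {hoffman:.0f}")  # Mussina 420, Hoffman 454

even = break_even_leverage(MUSSINA_RUNS, HOFFMAN_RUNS)
print(f"Break-even leverage for Hoffman: {even:.2f}")  # 1.85
```

On these inputs any leverage index above about 1.85 puts Hoffman ahead, which is why the size of the leverage index applied to his role matters so much.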
Look, I understand generally what is supposed to be wrong with that math. It assumes that the replacement level for a pitcher is an ERA of 5.00, while Baseball Reference and FanGraphs both apparently believe that the actual replacement level for Mussina, given his league ERAs and the parks that he pitched in, was somewhat over 6.00. When you make the replacement level ERA higher, that works to the advantage of Mussina as opposed to Hoffman. But my point is, if the pro-Mussina, anti-Hoffman argument is not obvious to me, then it isn't obvious. Expecting your typical baseball fan to get it is like ordering from McDonald's and expecting to receive a culinary masterpiece.
This argument is also flawed because it assumes that the replacement-level ERA for Mussina is the same as it is for Hoffman, which is not true. Baseball Reference apparently feels (if I understand their math) that the Replacement Level for Mussina is an ERA over 6.00, which seems reasonable, while it (apparently) believes that the Replacement Level for Hoffman is in the fours, which seems. . .well, I’m not SURE that’s reasonable; maybe it is, maybe not. The American League ERA from 1991 to 2008 (Mussina’s years) was 4.54, while the National League ERA from 1993 to 2010 (Hoffman’s years) was 4.29, so there is that, and the park factors for San Diego were almost always in the 80s, so there’s that.
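A rough Python sketch of how the two differences in this paragraph (league ERA and park) could pull Hoffman's baseline down. It assumes, purely for illustration, that a replacement-level ERA scales in proportion to league ERA times park factor; that is a hypothetical simplification, not a description of how Baseball Reference computes it. The league ERAs come from the text; the park factors (100 for Mussina, treated as neutral, and 85 for Hoffman, a stand-in for "in the 80s") are hypothetical.

```python
# League ERAs over each pitcher's career span, from the text.
MUSSINA_LEAGUE_ERA = 4.54  # American League, 1991-2008
HOFFMAN_LEAGUE_ERA = 4.29  # National League, 1993-2010

# Hypothetical park factors (100 = neutral); "in the 80s" is rendered as 85.
MUSSINA_PARK = 100
HOFFMAN_PARK = 85


def run_environment(league_era: float, park_factor: float) -> float:
    """Park-adjusted league ERA: a crude measure of the run environment."""
    return league_era * park_factor / 100


ratio = (run_environment(HOFFMAN_LEAGUE_ERA, HOFFMAN_PARK)
         / run_environment(MUSSINA_LEAGUE_ERA, MUSSINA_PARK))

# If Mussina's replacement level is 6.00, scale it to Hoffman's environment.
mussina_replacement = 6.00
hoffman_replacement = mussina_replacement * ratio
print(f"Environment ratio: {ratio:.2f}")                              # 0.80
print(f"Hoffman's proportional baseline: {hoffman_replacement:.2f}")  # 4.82
```

Under these assumptions Hoffman's baseline lands in the fours, which would be consistent with what Baseball Reference apparently uses; whether the assumptions themselves are right is exactly the open question.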
From my perspective, the WAR for Mussina versus Hoffman may be presumed to be more right than wrong, but it is not absolutely, totally, completely and perfectly RIGHT, either. There are issues. First, the replacement level is just an estimate. Mussina's run advantage over Hoffman is much larger if you assume a replacement level (for both pitchers) of 6.00 than if you assume a replacement level of 5.00, and it may well be that the replacement level should be closer to 5.00 than to 6.00.
"Replacement Level" is an important but imprecise concept. We have fallen into the habit of referring to Replacement Level as if it were a known constant in terms of winning percentage contribution, when in reality it is an unknown variable. Sometimes, on some teams, the pitching is so good that nobody needs an average pitcher, whereas at other times, on other teams, there isn't an average pitcher on the staff, and if you could find somebody with a .250 effective winning percentage, that would be super.
In a more sophisticated sabermetric analysis, we would recognize that Replacement Levels vary widely from team to team, and that sometimes they are around .300, but equally often they are over .350 and many times they are over .400. The Replacement Levels that the analysts are using now ARE too low IF THEY ARE CONSIDERED TO BE FIXED. I think they’re using a Replacement Level around .310. The real replacement level is USUALLY higher than that; it varies, but if you are using one figure to represent the whole world, .310 is too low. Most teams, most of the time, can find a player better than that.
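The sensitivity to the replacement level can be made concrete. This Python sketch takes the 420 and 227 run figures quoted earlier as the values at a 5.00 baseline and shifts them to other baselines, using the fact that each extra run of baseline ERA is worth innings / 9 runs. The leverage index of 2.0 for Hoffman is the hypothetical one from the earlier example, and the function name is illustrative.

```python
MUSSINA_IP, HOFFMAN_IP = 3563, 1089
# Runs better than a 5.00-ERA pitcher, as quoted in the text.
MUSSINA_AT_5, HOFFMAN_AT_5 = 420, 227
HOFFMAN_LEVERAGE = 2.0  # hypothetical, as in the text's example


def runs_at_baseline(runs_at_5: float, innings: float, baseline: float) -> float:
    """Shift a runs-above-baseline total from a 5.00 baseline to another one.
    Each extra run of baseline ERA adds innings / 9 runs."""
    return runs_at_5 + (baseline - 5.00) * innings / 9


for baseline in (5.00, 5.50, 6.00):
    m = runs_at_baseline(MUSSINA_AT_5, MUSSINA_IP, baseline)
    h = HOFFMAN_LEVERAGE * runs_at_baseline(HOFFMAN_AT_5, HOFFMAN_IP, baseline)
    print(f"baseline {baseline:.2f}: Mussina {m:.0f}, Hoffman (leveraged) {h:.0f}")
```

Each run of baseline adds about 396 runs to Mussina's total but only 242 to Hoffman's leveraged total, so on these inputs Mussina moves back in front once the baseline passes roughly 5.22, and leads 816 to 696 at 6.00.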
Second issue: the "Leverage Index" applied to Hoffman's role could POSSIBLY, in my view, be larger than the figures current analysts use.
Third issue (and I'm not sure about this, because I don't really understand other people's methodologies): some of the people who are figuring WAR may be adjusting the replacement level for Hoffman because he is a Closer. IF they are doing that (I don't know whether they are or not), they shouldn't be, and they need to take it out. (I may have advocated this myself in the past. If I did, I was wrong.) It confuses the discussion. The leverage index has to be calculated based on normative performance. If you credit Hoffman with a "leverage index" but then lower his replacement-level ERA because he is a Closer, all you are doing is giving him a break with one hand and taking it away with the other. In other words, you're confusing "leverage" with "replacement level", so that (if you are doing that) you're not ACTUALLY crediting him with a leverage index at all.
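A small Python sketch of why the two adjustments cancel. It uses the simple credit formula leverage x (replacement ERA - ERA) x innings / 9, which is an illustrative assumption rather than any published WAR formula, with Hoffman's 2.87 ERA and 1,089 innings from the text and a hypothetical 5.00 general replacement level. It solves for the closer-specific replacement ERA that would exactly wipe out a 2.0 leverage index.

```python
import math

HOFFMAN_ERA, HOFFMAN_IP = 2.87, 1089
REPLACEMENT_ERA = 5.00  # hypothetical general replacement level


def credit(leverage: float, replacement_era: float, era: float, innings: float) -> float:
    """Leverage-weighted runs saved relative to a replacement-level ERA."""
    return leverage * (replacement_era - era) * innings / 9


def offsetting_replacement(leverage: float, replacement_era: float, era: float) -> float:
    """Closer-specific replacement ERA at which `leverage` adds nothing."""
    return era + (replacement_era - era) / leverage


lowered = offsetting_replacement(2.0, REPLACEMENT_ERA, HOFFMAN_ERA)
print(f"Offsetting replacement ERA: {lowered:.3f}")  # 3.935

# Leverage 2.0 against the lowered baseline equals no leverage at 5.00.
same = math.isclose(credit(2.0, lowered, HOFFMAN_ERA, HOFFMAN_IP),
                    credit(1.0, REPLACEMENT_ERA, HOFFMAN_ERA, HOFFMAN_IP))
print(f"Leverage fully cancelled: {same}")  # True
```

Any leverage index paired with its offsetting baseline gives the same result as no leverage at all, which is the break-on-one-hand, take-it-away-on-the-other problem in numerical form.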
Fourth issue: some people who figure WAR may be (again, I'm not certain who is doing what) calculating value based not on actual runs allowed, but on formula estimates of the number of runs the pitcher could have been expected to allow based on his peripheral numbers. Trevor Hoffman has a 2.87 career ERA but a 3.08 FIP (Fielding Independent Pitching) based on his strikeouts, walks, and home runs allowed, whereas Mike the Moose has a 3.68 career ERA but a 3.57 FIP based on his peripherals.
Again, this is a questionable adjustment. Pitchers do lots of things other than getting strikeouts and allowing walks, home runs and hit batsmen that have SOME impact on their ERA. They also:
a) Hold baserunners well or poorly,
b) Pick runners off,
c) Field their position,
d) Throw wild pitches,
e) Induce ground balls, and
f) Pitch to the situation, well or poorly.
When figuring data for a SEASON, the discrepancies between ERA and FIP are probably mostly due to random factors which the pitcher does not control, and we should probably prefer FIP to ERA. But when dealing with CAREER records of several hundred innings or more (and career records are what we are dealing with here), it is much more likely that the discrepancies between ERA and FIP are created by factors (a) through (f) above, in which case the actual ERA is almost certainly the more instructive number. In a more sophisticated sabermetric analysis, we would rely more on FIP when dealing with small samples, but more on actual ERA when an individual pitcher's innings total is larger.
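The sample-size rule in the last sentence can be sketched as a weighted blend of ERA and FIP in which the weight on ERA grows with innings. This is a minimal Python illustration under stated assumptions: the weighting form innings / (innings + k) is a common shrinkage-style choice, and the constant k = 500 innings is hypothetical, not an established figure. The career ERA and FIP values are the ones quoted above.

```python
HALF_WEIGHT_INNINGS = 500  # hypothetical: innings at which ERA and FIP count equally


def blended_era(era: float, fip: float, innings: float,
                k: float = HALF_WEIGHT_INNINGS) -> float:
    """Lean on FIP for small samples and on actual ERA for large ones."""
    weight_on_era = innings / (innings + k)
    return weight_on_era * era + (1 - weight_on_era) * fip


# Career figures from the text.
print(f"Hoffman: {blended_era(2.87, 3.08, 1089):.2f}")  # 2.94
print(f"Mussina: {blended_era(3.68, 3.57, 3563):.2f}")  # 3.67
# A hypothetical 60-inning season with Hoffman's ERA and FIP leans on FIP.
print(f"60 innings: {blended_era(2.87, 3.08, 60):.2f}")  # 3.06
```

With career-length samples the blend stays close to actual ERA; over a single relief season it moves most of the way to FIP, which is the behavior the paragraph argues for.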
There is also a little issue of ERAs of starters vs. relievers. . .when a starter and a reliever share an inning, the starter tends to get charged with the run allowed no matter what, which can skew the ERA. But that's not a real issue for modern closers, because modern closers usually enter the game at the start of the inning. It's a legitimate issue for pitchers before 1995, and it's a legitimate issue for 7th- and 8th-inning pitchers.
I believe that 30 years from now, when sabermetric analysis is more sophisticated than it is now, our calculations will still show Mike Mussina as much more valuable than Trevor Hoffman. But I think it is possible, and indeed likely, that it will show a somewhat smaller difference than current analysis is showing.
Now, the second major issue. I raised two major issues at the start of the article; I know I am always doing first point and second point and third point, and it is confusing. I'm sorry. Anyway, my second point was that the Hall of Fame discussion perhaps should not be treated as a subset of the VALUE discussion, or the WAR discussion.
The Hall of Fame is not about value (I am just being the Devil's Advocate here); the Hall of Fame is not about VALUE, it is about EXCELLENCE. Perhaps his managers should have assigned Trevor Hoffman a different role. Perhaps conventional wisdom placed Trevor Hoffman in a 55-inning role when it should have put him in a 90-inning role, and perhaps conventional wisdom had him protecting three-run leads because three-run leads will get you a save, even though they are not actually high-leverage situations. . .perhaps, perhaps. This isn't about value; it is about excellence, and about historic standards. In the role that he was assigned, Trevor Hoffman had historic impact.
Here again, a fallacy. The Hall of Fame is (unarguably) about historic performance. But we really don’t know yet whether Trevor Hoffman’s 600 saves are or are not a historic performance. Closers have only been in their modern role, piling up 45 saves a year, since about 1990. It may be that, in 30 years, there will be 20 pitchers with 600 saves, and it may not be all that notable.
I think the sabermetric community is guilty of confusing what is permanent (Hall of Fame selection) with what is temporary (our current best estimates of players' value), but I also think the traditional sportswriting world is guilty of the same thing, in assuming that Trevor Hoffman's 600 Saves are a historic accomplishment, when in fact they may not be. I'm not sure I would vote for Trevor Hoffman over Lee Smith or Billy Wagner; in fact, I think I probably wouldn't. I probably would vote for Smith or Wagner first, despite the career Saves numbers.
But would the Hall of Fame be better, without any Closers? Would a baseball team be better without a Closer? I don’t think so. I think you have to have Closers in the Hall of Fame, regardless of what Baseball Reference thinks their WAR is. I don’t think you can make the Hall of Fame discussion into a wholly owned subsidiary of WAR.
Sophisticated measurements are temporary instruments. If they weren’t temporary instruments then we should all retire, because we would have nothing more to say to the baseball world at large. Just run the numbers, dude; that’s all we know. I don’t see it that way. I think that WAR (and other measurements) are useful tools that WE use to try to see the truth, but that when they start to dictate to us who we can vote for and who we can’t, then it’s time to bring out the Tasers and drive them back into their cages.
One reason that sportswriters believe in the super-importance of Saves (and Closers) is that they believe certain types of losses are particularly damaging. Blow a lead in the ninth inning, the argument goes, and that loss undermines the team's confidence and costs it momentum.
This is not an inherently unreasonable thing to believe. In the sabermetric community we are not big fans of momentum, for two reasons. First, we're skeptics; we tend to believe only in what we can document to be true, which is sometimes a nasty habit. Second, there have been thousands of studies of "momentum" in various forms, all or almost all of which have failed to establish that momentum has any value. A hot team has zero tendency to remain hot; a hitter on a hot streak is no more likely to get a hit his next time up than he would be in a slump.
Studies searching for momentum almost always fail, but this does not entitle us (on our side of the canyon) to dismiss out of hand a possibility such as "certain types of losses are particularly damaging." We have to allow that it could be true unless and until it is shown to be false.
If you ask me whether I honestly believe that traditional sportswriters exaggerate the importance of Saves by asserting things that are not true, such as that late-inning blown leads are particularly damaging, the answer is yes, that is what I am inclined to believe. What I am NOT inclined to believe is that we have all of the issues completely figured out and should cancel the rest of the discussion right now. We don't have the world all figured out, and we don't have all the answers. I'm not voting for Trevor Hoffman for the Hall of Fame, but if somebody else wants to, it's really not my place to pass judgment on his judgment.