Three pitcher seasons. You will probably recognize these seasons, even absent the names and years:
| W-L | ERA | K | FIP | fWAR |
|---|---|---|---|---|
| 27-9 | 1.73 | 317 | 2.07 | 9.0 |
| 24-4 | 1.54 | 268 | 2.13 | 8.9 |
| 10-9 | 1.76 | 269 | 1.99 | 9.1 |
Any guesses? One is recent, one dates back to Reagan’s years, and one was in the swinging 60’s. All Cy Young seasons:
| Player | Year | W-L | ERA | K | FIP | fWAR |
|---|---|---|---|---|---|---|
| Sandy Koufax | 1966 | 27-9 | 1.73 | 317 | 2.07 | 9.0 |
| Doc Gooden | 1985 | 24-4 | 1.54 | 268 | 2.13 | 8.9 |
| Jacob deGrom | 2018 | 10-9 | 1.76 | 269 | 1.99 | 9.1 |
1966 was Koufax’s final, brilliant season, while 1985 and 2018 mark the years when Gooden and deGrom emerged as the once-and-current ace of the New York Metropolitans. Famous seasons.
For this table I’ve used FanGraphs’ version of WAR, which contextualizes pitchers using their Fielding-Independent Pitching (FIP) metric. That metric tries to understand a pitcher through his rate of strikeouts, walks, and home runs relative to league average and park effects.
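For readers who want to see what goes into FIP, here is a minimal sketch of the commonly published formula. The function name `fip` and the default `constant` are assumptions for illustration: the real constant is recalculated every season so that league FIP matches league ERA, and FanGraphs applies further league and park adjustments before turning FIP into WAR. The strikeout and inning totals below match deGrom's 2018 line from the table; the home run, walk, and hit-by-pitch counts are placeholder values.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding-Independent Pitching, using the widely published weights.

    `constant` rescales the result to the league ERA and changes each
    season; 3.10 is only a placeholder, not an official value.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Illustrative stat line: K and IP come from the table above,
# HR / BB / HBP are placeholder values (assumptions).
example = fip(hr=10, bb=46, hbp=5, k=269, ip=217.0)
print(f"FIP for the illustrative line: {example:.2f}")
```

Note that nothing in the formula rewards a pitcher for throwing more innings: innings only appear in the denominator, so FIP is a pure rate.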
FanGraphs WAR tells us that these three pitchers were worth about nine wins more than a replacement-level player. That’s a great season, nine wins…nine wins gets a .500 team to contention. These are great, great seasons.
* * *
Hey: I’m sorry to come back to the subject of WAR. This will be a short article, and then I’ll stay away from it for a while. Please bear with me. Or bare whipped cream. Whichever your preference.
Onward.
* * *
Three more seasons. Different pitchers. Still famous, but perhaps 28% less famous than that other group. Similar eras as that other group. Direct competitors, all in the NL.
| W-L | ERA | K | ERA+ | bWAR |
|---|---|---|---|---|
| 23-13 | 2.13 | 240 | 169 | 10.3 |
| 24-9 | 2.34 | 195 | 162 | 10.2 |
| 17-9 | 2.37 | 224 | 173 | 10.2 |
We’re using a different measure of WAR this time: Baseball-Reference’s version of WAR. The Baseball-Reference version considers the runs allowed by pitchers, instead of the strikeouts and walks and homers. Instead of listing their FIP, I’ve listed each pitcher’s Adjusted ERA (ERA+).
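As a rough illustration of what ERA+ measures, here is a simplified sketch. It assumes the basic form of the stat (league ERA divided by the pitcher's ERA, scaled to 100, with a single park multiplier); Baseball-Reference's actual park adjustment is more involved. The league ERA of 4.10 is a placeholder chosen for the example, not a looked-up figure.

```python
def era_plus(era, league_era, park_factor=1.0):
    """Simplified Adjusted ERA: 100 is league average, higher is better.

    `park_factor` is a single multiplier (1.0 = neutral park); the real
    adjustment used by Baseball-Reference is more detailed.
    """
    return 100 * (league_era * park_factor) / era


# A 2.37 ERA against a placeholder league ERA of 4.10 in a neutral park.
print(round(era_plus(2.37, 4.10)))
```

Like FIP, this is a rate: it says how well a pitcher prevented runs per inning, not how many innings he supplied.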
Baseball-Reference tends to rate pitchers a little higher than FanGraphs, so these seasons have a slightly higher WAR than the previous trio. That’s just the arithmetic adjusting: these guys probably weren’t quite as good as the previous three.
Names:
| Player | Year | W-L | ERA | K | ERA+ | bWAR |
|---|---|---|---|---|---|---|
| Juan Marichal | 1965 | 23-13 | 2.13 | 240 | 169 | 10.3 |
| Steve Carlton | 1980 | 24-9 | 2.34 | 195 | 162 | 10.2 |
| Aaron Nola | 2018 | 17-9 | 2.37 | 224 | 173 | 10.2 |
Juan Marichal was the perennial bridesmaid to Koufax. Lefty Carlton won four Cy Young Awards…this was one of those, but it’s not his famous season (1972). Aaron Nola had a fine 2018 season that got kind of skipped over because of Jacob deGrom.
And bWAR…a different version of WAR…says that these are similarly valuable seasons. These pitchers were worth about ten wins to their team’s ledger, over a replacement-level pitcher.
Pretty, pretty good.
* * *
Looking at each trio again:
| Player | FIP | fWAR | Player | ERA+ | bWAR |
|---|---|---|---|---|---|
| Sandy Koufax | 2.07 | 9.0 | Juan Marichal | 169 | 10.3 |
| Doc Gooden | 2.13 | 8.9 | Steve Carlton | 162 | 10.2 |
| Jacob deGrom | 1.99 | 9.1 | Aaron Nola | 173 | 10.2 |
Each group demonstrates similar levels of effectiveness, at least according to the metric that each website uses to calculate WAR. Koufax, Gooden, and deGrom have FIP rates that are very close, while the ERA+ for Marichal, Carlton, and Nola is about the same.
Pitch-for-pitch, inning-for-inning, these pitchers are equivalent in quality.
But the quantity of their performances is different. Vastly different.
First our FanGraphs trio:
| Player | Year | IP | FIP | fWAR |
|---|---|---|---|---|
| Sandy Koufax | 1966 | 323.0 | 2.07 | 9.0 |
| Doc Gooden | 1985 | 276.2 | 2.13 | 8.9 |
| Jacob deGrom | 2018 | 217.0 | 1.99 | 9.1 |
Sandy Koufax threw 106 more innings in 1966 than Jacob deGrom threw in 2018, the equivalent of a dozen extra complete games. Dwight Gooden was sixty innings ahead of deGrom. Given that their FIP is essentially the same, why is the difference in innings pitched not reflected in FanGraphs’ WAR?
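The innings gaps are easy to check, with one wrinkle: box-score innings use `.1` and `.2` to mean one-third and two-thirds of an inning, not decimals. A small sketch, where the helper name `parse_ip` is my own invention:

```python
from fractions import Fraction


def parse_ip(ip: str) -> Fraction:
    """Convert box-score innings ("276.2" means 276 and 2/3) to an exact fraction."""
    whole, _, outs = ip.partition(".")
    return int(whole) + Fraction(int(outs or 0), 3)


# Innings pitched, copied from the table above.
innings = {
    "Sandy Koufax, 1966": "323.0",
    "Doc Gooden, 1985": "276.2",
    "Jacob deGrom, 2018": "217.0",
}

baseline = parse_ip(innings["Jacob deGrom, 2018"])
gaps = {name: parse_ip(ip) - baseline for name, ip in innings.items()}

for name, gap in gaps.items():
    # Express each gap in innings and in nine-inning complete games.
    print(f"{name}: {float(gap):.1f} IP more than deGrom, "
          f"about {float(gap) / 9:.1f} complete games")
```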
The same thing happens with Baseball-Reference’s version of WAR:
| Player | Year | IP | ERA+ | bWAR |
|---|---|---|---|---|
| Juan Marichal | 1965 | 295.1 | 169 | 10.3 |
| Steve Carlton | 1980 | 304.0 | 162 | 10.2 |
| Aaron Nola | 2018 | 212.0 | 173 | 10.2 |
Marichal and Carlton each threw more than eighty innings beyond Aaron Nola’s total, all in seasons where their teams played 162 games. That’s nine additional complete games where the Giants had Marichal and the 1980 Phillies had Carlton, while the 2018 Phillies had Ben Lively or Ranger Suarez.
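One way to see the puzzle is to divide each pitcher's bWAR by his innings. This is my own back-of-the-envelope rate, not a stat that Baseball-Reference publishes, and it uses only the numbers in the table above:

```python
# (bWAR, innings pitched) from the table; 295.1 in box-score notation is 295 1/3.
seasons = {
    "Juan Marichal, 1965": (10.3, 295 + 1 / 3),
    "Steve Carlton, 1980": (10.2, 304.0),
    "Aaron Nola, 2018": (10.2, 212.0),
}

# WAR credited per 100 innings pitched.
war_per_100 = {name: 100 * war / ip for name, (war, ip) in seasons.items()}

for name, rate in war_per_100.items():
    print(f"{name}: {rate:.2f} bWAR per 100 IP")
```

With nearly identical ERA+ figures, Nola is credited with far more WAR per inning than Marichal or Carlton, which is exactly the thing that needs explaining.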
Ranger Suarez: that is a great baseball name. He has an 0.93 ERA this year. Never noticed him.
How does this happen? How do you take pitchers with parallel quality but different quantity and call them even?
* * *
Here’s the surprising answer: WAR isn’t wrong.
I didn’t understand this a month ago, and I understand it now, and I’m writing because maybe it will help you understand something.
WAR isn’t wrong.
WAR, an innocent statistic that gets picked on a lot on this site, is trying to answer a specific question: within the context of season X, how many wins was pitcher Y worth for his team over a player they could reasonably replace him with?
WAR gets the answer correct. Jacob deGrom, relative to his peers in 2018 and understood through the lens of Fielding-Independent Pitching, was worth about nine wins more than a replacement-level pitcher from 2018. Juan Marichal, relative to his peers in 1965 and understood through the metric of runs allowed, was worth about ten wins above a replacement-level player.
The issue is ‘relative to his peers.’
WAR is always taking the measure of a player within the contexts that he is playing in. WAR understands Jacob deGrom through the context of how starting pitchers are used in 2018 or 2021, and it calculates his value against his peers from those years. It does the same thing for Koufax or Gooden or Carlton.
That’s a problem.
* * *
Why?
It is a problem because a lot of us use WAR to understand players across contexts. We use WAR to tell us who had a better peak, or who was a greater player.
But different contexts allow for different possibilities of value. In a context where pitchers routinely throw 260 or 280 innings, that is the line an elite pitcher has to cross to start gaining value. If that line drops to 200 or 180 innings, the line to build value is closer.
Who had a greater peak, Pedro or Sandy? Four-year stretch:
| Player | Years | W-L | ERA | ERA+ | IP |
|---|---|---|---|---|---|
| Sandy | ’63-’66 | 97-27 | 1.86 | 172 | 1192 |
| Pedro | ’97-’00 | 77-25 | 2.16 | 219 | 905 |
Pedro has the better relative ERA, but Koufax averaged seventy more innings per season. Who are you going to take?
| Player | bWAR |
|---|---|
| Sandy | 36.3 |
| Pedro | 37.7 |
WAR…Baseball-Reference’s version…tells us that Pedro was more valuable. The metric isn’t wrong: relative to their respective peers, Pedro Martinez probably won more games for the Expos and Red Sox than Koufax won for the Dodgers.
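The per-season arithmetic behind that comparison can be sketched in a few lines. The figures are the four-year totals from the tables above; the per-100-innings rate is my own rough yardstick, not an official statistic.

```python
SEASONS = 4  # both peaks are four-year stretches

# Four-year totals copied from the tables above.
peaks = {
    "Sandy Koufax": {"ip": 1192, "bwar": 36.3},
    "Pedro Martinez": {"ip": 905, "bwar": 37.7},
}

# Average innings per season, and bWAR credited per 100 innings.
ip_per_season = {name: p["ip"] / SEASONS for name, p in peaks.items()}
war_per_100 = {name: 100 * p["bwar"] / p["ip"] for name, p in peaks.items()}

ip_gap = ip_per_season["Sandy Koufax"] - ip_per_season["Pedro Martinez"]

for name in peaks:
    print(f"{name}: {ip_per_season[name]:.1f} IP per season, "
          f"{war_per_100[name]:.2f} bWAR per 100 IP")
print(f"Koufax's extra innings per season: {ip_gap:.2f}")
```

Pedro ends up with slightly more total WAR on far fewer innings, so each of his innings is being credited at a much higher rate.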
But who was more valuable in reality? Who won more actual games for their team: the pitcher who was making 38 starts each season and completing most of them, or the pitcher who was averaging 30 starts a year and completing fewer than half? Who won more games: not games against replacement peers, but actual baseball games?
It would be Koufax. If your choice is a brilliant pitcher who throws 220 innings and a brilliant pitcher who throws 320 innings, you’re going to take the guy throwing 320 innings.
So what is happening?
What is happening, or at least what I think is happening, is that pitching WAR is giving modern starting pitchers all of the credit for the contexts that they’re pitching in, and none of the demerits for those contexts.
In the modern game, a starting pitcher like Jacob deGrom can come in and throw 100 miles-an-hour, because there is no expectation that he will go nine innings. He can pitch, in essence, like a closer: go hard, as long as you can, and leave the last two-thirds of the game to the bullpen.
Those weren’t the conditions Koufax operated within, and they weren’t the conditions that Seaver operated within, and they weren’t the conditions Greg Maddux operated within.
The WAR metric is neutral on that difference. That is absolutely appropriate: that is what the metric is aiming to understand.
But when we are considering players across contexts, it is crucial to consider the contexts that impact a player’s performance.
* * *
Jacob deGrom is having a fantastic season. But a big part of why his season is so fantastic is because of the current contexts that exist around starting pitchers. It is fantastic that he has a sub-1.00 ERA, but Bob Gibson tallied his 1.12 ERA in a season where he finished twenty-eight of his starts.
deGrom has a 0.69 ERA over 78 IP this season. That is treated as if he is occupying rare air, but Orel Hershiser had a 0.44 ERA over 82 IP in 1988. And Hershiser’s stretch, which closed his season, consisted of eight straight complete games and then a start where he threw ten scoreless innings. deGrom has had just two starts of eight innings plus this year…and four starts where he's thrown five innings or fewer.
Jacob deGrom strikes out most of the batters he faces. The FanGraphs version of WAR gives him a great deal of credit for this, and then adjusts that strikeout rate to his peers. But all pitchers today have an advantage when it comes to racking up strikeouts, in that everyone pitches at full effort, all of the time. Adjusting for the context of deGrom’s peers doesn’t mitigate the advantages of the current context that he is pitching within.
That leads to broader flaws of interpretation. WAR wants to tell us that Jacob deGrom is on pace to have a historically great season as a starting pitcher, in a season where deGrom is going to struggle to cross 160 innings pitched. It is up to you to decide if you want to believe that a starting pitcher’s season where he throws just 160 innings can really rate as historically significant. I am not convinced.
But many, many people are convinced of that conclusion. That is one problem: we are giving Jacob deGrom all kinds of credit for contexts he didn’t choose, and using that credit to jump to conclusions about him that are difficult to rationalize within the broader arc of the game’s history.
But the larger problem is that baseball – the whole game – is buying into the same conclusion.
Baseball thinks it is a good thing to have starting pitchers throw 100 mph out of the gate. Baseball is coming around to that being the new normal, the new standard. Baseball thinks the number of strikeouts a pitcher can rack up is a more important measure of a pitcher’s ability than the number of innings they can pitch. Baseball is fine with ace pitchers throwing five or six innings and then calling it a day.
That isn’t the fault of the metric. WAR didn’t rise up out of a spreadsheet and throttle the brains of every GM until they decided that paying a pitcher $30 million for 180 innings was a reasonable fiduciary decision. The metric just answered the question it is being asked to answer.
We’re the ones making the mistake, and we compound that mistake when we normalize something that shouldn’t be normal. Jacob deGrom is the best pitcher in baseball today, but the contexts of today make it impossible for his value to his team to in any way compare with the likes of Koufax or Seaver or Clemens or Pedro. It isn’t deGrom’s fault that pitchers throw six innings when they used to throw nine.
But perhaps he shouldn’t get the credit for that change, either.
David Fleming is a writer living in southwestern Virginia. He welcomes comments, questions, and suggestions here and at dfleming1986@yahoo.com.