BILL JAMES ONLINE

Jacob deGrom's Challenge with Context

July 1, 2021
 
Three pitcher seasons. You will probably recognize these seasons, even absent the names and years:
 
W-L     ERA    K     FIP    fWAR
27-9    1.73   317   2.07   9.0
24-4    1.53   268   2.13   8.9
10-9    1.70   269   1.99   9.1

Any guesses? One is recent, one dates back to the Reagan years, and one was in the swinging ’60s. All Cy Young seasons:

Player          Year   W-L    ERA    K     FIP    fWAR
Sandy Koufax    1966   27-9   1.73   317   2.07   9.0
Doc Gooden      1985   24-4   1.53   268   2.13   8.9
Jacob deGrom    2018   10-9   1.70   269   1.99   9.1
 
1966 was Koufax’s final, brilliant season, while 1985 and 2018 mark the years when Gooden and deGrom emerged as the once-and-current ace of the New York Metropolitans. Famous seasons.
 
For this table I’ve used FanGraphs’ version of WAR, which contextualizes pitchers using their Fielding-Independent Pitching (FIP). That metric tries to understand a pitcher through his rates of strikeouts, walks, and home runs, relative to league average and park effects.
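For readers who want the mechanics, the standard FIP formula (as documented by FanGraphs) weights home runs, walks plus hit batters, and strikeouts per inning, then adds a league-season constant so the result sits on the ERA scale. The sketch below assumes only that basic form; the function name, the hypothetical stat line, and the constant value are illustrative, and FanGraphs’ WAR layers park, league, and replacement-level adjustments on top that are not modeled here.

```python
def fip(hr: int, bb: int, hbp: int, k: int, innings: float, c_fip: float) -> float:
    """Basic Fielding Independent Pitching.

    FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + cFIP
    `innings` must be true innings (276.2 in box-score notation is 276 + 2/3).
    `c_fip` is the league-season constant; it is supplied by the caller, not derived here.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + c_fip


# Hypothetical stat line and constant, chosen only to illustrate the arithmetic.
example = fip(hr=10, bb=40, hbp=5, k=250, innings=200.0, c_fip=3.10)
print(f"FIP: {example:.3f}")
```

Note that strikeouts carry a negative weight, which is why a high-strikeout pitcher can post a FIP well below the league constant.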
 
FanGraphs WAR tells us that these three pitchers were worth about nine wins more than a replacement-level player. That’s a great season, nine wins…nine wins gets a .500 team to contention. These are great, great seasons.
 
*            *            *
 
 
Hey: I’m sorry to come back to the subject of WAR. This will be a short article, and then I’ll stay away from it for a while. Please bear with me. Or bare whipped cream. Whichever your preference.
 
Onward.
 
*            *            *
 
 
Three more seasons. Different pitchers. Still famous, but perhaps 28% less famous than that other group. Similar eras to that other group. Direct competitors, all in the NL.
 
W-L     ERA    K     ERA+   bWAR
22-13   2.13   240   169    10.3
24-9    2.34   286   162    10.2
17-6    2.37   224   173    10.2

We’re using a different measure of WAR this time: Baseball-Reference’s version of WAR. The Baseball-Reference version considers the runs allowed by pitchers, instead of the strikeouts and walks and homers. Instead of listing their FIP, I’ve listed each pitcher’s Adjusted ERA (ERA+).

Baseball-Reference tends to rate pitchers a little higher than FanGraphs, so these seasons have a slightly higher WAR than the previous trio. That’s just the arithmetic adjusting: these guys probably weren’t quite as good as the previous three.

Names:

Player          Year   W-L     ERA    K     ERA+   bWAR
Juan Marichal   1965   22-13   2.13   240   169    10.3
Steve Carlton   1980   24-9    2.34   286   162    10.2
Aaron Nola      2018   17-6    2.37   224   173    10.2
 
Juan Marichal was the perennial bridesmaid to Koufax. Lefty Carlton won four Cy Young Awards…this was one of those, but it’s not his famous season (1972). Aaron Nola had a fine 2018 season that got kind of skipped over because of Jacob deGrom.
 
And bWAR…a different version of WAR…says that these are similarly valuable seasons. These pitchers were worth about ten wins to their teams’ ledgers over a replacement-level pitcher.
 
Pretty, pretty good.
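ERA+, the figure the second trio is matched on, is Baseball-Reference’s adjusted ERA: the league ERA, adjusted for the pitcher’s park, divided by the pitcher’s ERA and scaled so that 100 is league average. The sketch below assumes a single multiplicative park factor, which is a simplification of how Baseball-Reference actually builds its park adjustment; the function name and inputs are hypothetical.

```python
def era_plus(era: float, league_era: float, park_factor: float = 1.0) -> float:
    """Adjusted ERA: 100 * park-adjusted league ERA / pitcher ERA.

    park_factor is a simplifying assumption: 1.0 is neutral, above 1.0 is a
    hitter-friendly park (raising the bar the pitcher is measured against).
    """
    if era <= 0:
        raise ValueError("ERA must be positive for this ratio")
    return 100 * league_era * park_factor / era


# Hypothetical inputs: a 2.00 ERA in a 3.50-ERA league, neutral park.
print(round(era_plus(2.00, 3.50), 1))
```

Because the pitcher’s ERA sits in the denominator, higher is better, and the same ERA earns a higher ERA+ in a higher-scoring league or park.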
 
*            *            *
 
 
Looking at each trio again:
 
Player          FIP    fWAR      Player           ERA+   bWAR
Sandy Koufax    2.07   9.0       Juan Marichal    169    10.3
Doc Gooden      2.13   8.9       Steve Carlton    162    10.2
Jacob deGrom    1.99   9.1       Aaron Nola       173    10.2
 
Each group demonstrates similar levels of effectiveness, at least according to the metric that each website uses to calculate WAR. Koufax, Gooden, and deGrom have FIP rates that are very close, while the ERA+ for Marichal, Carlton, and Nola is about the same.
 
Pitch-for-pitch, inning-for-inning, these pitchers are equivalent in quality.
 
But the quantity of their performances is different. Vastly different.
 
First our FanGraphs trio:
 
Player          Year   IP      FIP    fWAR
Sandy Koufax    1966   323.0   2.07   9.0
Doc Gooden      1985   276.2   2.13   8.9
Jacob deGrom    2018   217.0   1.99   9.1
 
Sandy Koufax threw 106 more innings in 1966 than Jacob deGrom threw in 2018, the equivalent of a dozen extra complete games. Dwight Gooden was sixty innings ahead of deGrom. Given that their FIP is essentially the same, why is the difference in innings pitched not reflected in FanGraphs’ WAR?
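Box-score innings use a shorthand in which the digit after the point counts outs, so 276.2 means 276 and two-thirds innings. The short sketch below converts that notation and expresses each gap as nine-inning games, which is the arithmetic behind the “dozen extra complete games” figure; the helper name is hypothetical.

```python
def ip_to_innings(ip: str) -> float:
    """Convert box-score IP notation ('276.2' = 276 and 2/3 innings) to true innings."""
    whole, _, outs_text = ip.partition(".")
    outs = int(outs_text or "0")
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid IP notation: {ip!r}")
    return int(whole) + outs / 3


seasons = {"Koufax 1966": "323.0", "Gooden 1985": "276.2", "deGrom 2018": "217.0"}
degrom = ip_to_innings(seasons["deGrom 2018"])

for name, ip in seasons.items():
    gap = ip_to_innings(ip) - degrom
    # Nine innings is one complete game, so gap / 9 is the complete-game equivalent.
    print(f"{name}: {gap:+.1f} IP vs deGrom, about {gap / 9:.1f} complete games")
```

Koufax comes out about 106 innings (roughly 11.8 games) ahead and Gooden about 60 innings (roughly 6.6 games) ahead.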
 
The same thing happens with Baseball-Reference’s version of WAR:
 
Player          Year   IP      ERA+   bWAR
Juan Marichal   1965   295.1   169    10.3
Steve Carlton   1980   304.0   162    10.2
Aaron Nola      2018   212.0   173    10.2
 
Marichal and Carlton each out-threw Aaron Nola by eighty-plus innings, all in seasons where their teams played 162 games. That’s nine or ten additional complete games in which the Giants had Marichal and the Phillies had Carlton, while the 2018 Phillies had Ben Lively or Ranger Suarez.
 
Ranger Suarez: that is a great baseball name. He has an 0.93 ERA this year. Never noticed him.
 
How does this happen? How do you take pitchers with parallel quality but different quantity and call them even?
 
*            *            *
 
 
Here’s the surprising answer: WAR isn’t wrong.
 
I didn’t understand this a month ago, and I understand it now, and I’m writing because maybe it will help you understand something.
 
WAR isn’t wrong.
 
WAR, an innocent statistic that gets picked on a lot on this site, is trying to answer a specific question: within the context of season X, how many wins was pitcher Y worth to his team over a player they could reasonably replace him with?
 
WAR gets the answer correct. Jacob deGrom, relative to his peers in 2018 and understood through the lens of Fielding-Independent Pitching, was worth about nine wins more than a replacement-level pitcher from 2018. Juan Marichal, relative to his peers in 1965 and understood through the metric of runs allowed, was worth about ten wins above a replacement-level player.
 
The issue is ‘relative to his peers.’
 
WAR is always taking the measure of a player within the context he is playing in. WAR understands Jacob deGrom through the context of how starting pitchers are used in 2018 or 2021, and it calculates his value against his peers from those years. It does the same thing for Koufax or Gooden or Carlton.
 
That’s a problem.
 
*            *            *
 
 
Why?
 
It is a problem because a lot of us use WAR to understand players across contexts. We use WAR to tell us who had a better peak, or who was a greater player.
 
But different contexts allow for different possibilities of value. In a context where pitchers routinely throw 260 or 280 innings, that is the line an elite pitcher has to cross to start gaining value. If that line drops to 200 or 180 innings, the threshold for building value drops with it.
 
Who had a greater peak, Pedro or Sandy? Four-year stretch:
 
Player   Years     W-L     ERA    ERA+   IP
Sandy    ’63-’66   97-27   1.86   172    1192
Pedro    ’97-’00   77-25   2.16   219    905
 
Pedro has the better relative ERA, but Koufax averaged seventy more innings per season. Who are you going to take?
 
Player   bWAR
Sandy    36.3
Pedro    37.7
 
WAR…Baseball-Reference’s version…tells us that Pedro was more valuable. The metric isn’t wrong: relative to their respective peers, Pedro Martinez probably won more games for the Expos and Red Sox than Koufax won for the Dodgers.
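One way to see what the bWAR comparison is saying is to divide it back out by innings. The sketch below uses only the four-season totals printed in the tables above (innings as printed, in whole innings) and computes innings per season and bWAR per nine innings; the function name is hypothetical, and a per-inning rate says nothing by itself about whether extra innings should count for more.

```python
def war_per_nine(war: float, innings: float) -> float:
    """Rate version of a counting stat: WAR earned per nine innings pitched."""
    return 9 * war / innings


# Four-season totals as printed in the tables above.
stretches = {
    "Sandy": {"bwar": 36.3, "innings": 1192, "seasons": 4},
    "Pedro": {"bwar": 37.7, "innings": 905, "seasons": 4},
}

for name, s in stretches.items():
    per_season = s["innings"] / s["seasons"]
    rate = war_per_nine(s["bwar"], s["innings"])
    print(f"{name}: {per_season:.0f} IP per season, {rate:.3f} bWAR per 9 IP")
```

On this rate view Pedro is worth about 0.37 bWAR per nine innings to Koufax’s 0.27, while Koufax pitched roughly 72 more innings a season; the question is how much weight that volume deserves.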
 
But who was more valuable in reality? Who won more actual games for their team: the pitcher who was making 38 starts each season and completing most of them, or the pitcher who was averaging 30 starts a year and completing fewer than half? Who won more games: not games against replacement peers, but actual baseball games?
 
It would be Koufax. If your choice is a brilliant pitcher who throws 220 innings and a brilliant pitcher who throws 320 innings, you’re going to take the guy throwing 320 innings.
 
So what is happening?
 
What is happening, or at least what I think is happening, is that pitching WAR is giving modern starting pitchers all of the credit for the contexts that they’re pitching in, and none of the demerits for those contexts.
 
In the modern game, a starting pitcher like Jacob deGrom can come in and throw 100 miles-an-hour, because there is no expectation that he will go nine innings. He can pitch, in essence, like a closer: go hard, as long as you can, and leave the last two-thirds of the game to the bullpen.
 
Those weren’t the conditions Koufax operated within, and those weren’t the conditions Seaver operated within, and they weren’t the conditions Greg Maddux operated within.
 
The WAR metric is neutral on that difference. That is absolutely appropriate: that is what the metric is aiming to understand.
 
But when we are considering players across contexts, it is crucial to consider the contexts that impact a player’s performance.
 
 
*            *            *
 
 
Jacob deGrom is having a fantastic season. But a big part of why his season is so fantastic is the current context around starting pitchers. It is fantastic that he has a sub-1.00 ERA, but Bob Gibson posted his 1.12 ERA in a season where he completed twenty-eight of his starts.
 
deGrom has a 0.69 ERA over 78 IP this season. That is treated as if he is in rare air, but Orel Hershiser had a 0.44 ERA over 82 IP in 1988. And Hershiser’s stretch, which closed his season, consisted of eight straight complete games and then a start where he threw ten scoreless innings. DeGrom has had just two starts of eight-plus innings this year…and four starts where he’s thrown five innings or fewer.
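Because ERA is earned runs per nine innings, the two stretches can be translated back into approximate earned-run totals (approximate because both ERAs are rounded to two decimals). A minimal sketch; the helper name is hypothetical.

```python
def earned_runs(era: float, innings: float) -> float:
    """ERA is earned runs per nine innings, so ER = ERA * IP / 9."""
    return era * innings / 9


stretches = {"deGrom 2021": (0.69, 78), "Hershiser 1988": (0.44, 82)}

for name, (era, innings) in stretches.items():
    print(f"{name}: about {earned_runs(era, innings):.1f} earned runs in {innings} IP")
```

That works out to roughly six earned runs for deGrom and four for Hershiser, over a similar number of innings.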
 
Jacob deGrom strikes out nearly half of the batters he faces. The FanGraphs version of WAR gives him a great deal of credit for this, and then adjusts that strikeout rate to his peers. But all pitchers today have an advantage when it comes to racking up strikeouts, in that everyone pitches at full effort, all of the time. Adjusting for the context of deGrom’s peers doesn’t mitigate the advantages of the current context that he is pitching within.
 
That leads to broader flaws of interpretation. WAR wants to tell us that Jacob deGrom is on pace to have a historically great season as a starting pitcher, in a season where deGrom is going to struggle to cross 160 innings pitched. It is up to you to decide if you want to believe that a starting pitcher’s season where he throws just 160 innings can really rate as historically significant. I am not convinced.
 
But many, many people are convinced of that conclusion. That is one problem: we are giving Jacob deGrom all kinds of credit for contexts he didn’t choose, and using that credit to jump to conclusions about him that are difficult to rationalize within the broader arc of the game’s history.
 
But the larger problem is that baseball – the whole game – is buying into the same conclusion.
 
Baseball thinks it is a good thing to have starting pitchers throw 100 mph out of the gate. Baseball is coming around to that being the new normal, the new standard. Baseball thinks the number of strikeouts a pitcher can rack up is a more important measure of a pitcher’s ability than the number of innings they can pitch. Baseball is fine with ace pitchers throwing five or six innings and then calling it a day.
 
That isn’t the fault of the metric. WAR didn’t rise up out of a spreadsheet and throttle the brains of every GM until they decided that paying a pitcher $30 million for 180 innings was a reasonable fiduciary decision. The metric just answered the question it is being asked to answer.
 
We’re the ones making the mistake, and we compound that mistake when we normalize something that shouldn’t be normal. Jacob deGrom is the best pitcher in baseball today, but the contexts of today make it impossible for his value to his team to compare in any way with the likes of Koufax or Seaver or Clemens or Pedro. It isn’t deGrom’s fault that pitchers throw six innings when they used to throw nine.
 
But perhaps he shouldn’t get the credit for that change, either.
 
 
David Fleming is a writer living in southwestern Virginia. He welcomes comments, questions, and suggestions here and at dfleming1986@yahoo.com.
 
    
 
 

COMMENTS (25 Comments, most recent shown first)

rjazzguy
…and yes, I know I left a quotation mark open. My mistake.
7:04 PM Aug 11th
 
rjazzguy
In the third-to-last paragraph of your sixth “section,” (assuming “sections are separated by asterisks) the correct grammar would be, “Those weren’t the conditions…”

Those that object to this correction: To hell with you; the dumbing down of America is due to such objections.
7:02 PM Aug 11th
 
pgups6
Thanks Dave, this is fantastic.

I know we all want one all encompassing number to evaluate talent but just like all statistics, we need to understand where it is derived and the context.

Relative to their peers, deGrom's 2018 may be "equal" to Koufax's 1966, but that doesn't mean deGrom's 2018 is Koufax's 1966. Pitching in 1966 compared to 2018 is pretty much an apples-to-oranges comparison. In Win Shares, Koufax had 35.1 and deGrom 20.3. Does that mean Koufax was almost twice as good (1.7) as deGrom? I don't think so. That's why we should take it all into account and not just look to WAR all the time.

2:26 PM Jul 8th
 
docfordock
Dave:

The basic components of bWAR are RA/9 and IP (with adjustments for team fielding, park, opposition quality, etc.). The basic components of fWAR are FIP and IP, again with adjustments. Both formulas already take account of IP - it's a core part of both of them.

Your objection is that for the purpose of cross-era comparisons, the formula isn't properly taking into account the different contexts of performance. That's true in a sense. The WAR framework can tell you that player X in 2018 contributed approximately 9 wins as against a construct of a "replacement player" in 2018 and that player Y in 1966 contributed 9 wins as against a 1966 replacement level construct. It can't tell you that the respective replacement level constructs for the two years are equivalent or that the "wins" it is measuring in 1966 or 2018 are of equal worth. There is an incommensurability across time.

So far, so good. But your argument goes further to say (in the case of Koufax and deGrom) that Koufax's 1966 performance is more valuable because "pitching WAR is giving modern starting pitchers all of the credit for the contexts that they’re pitching in, and none of the demerits for those contexts.":

In the modern game, a starting pitcher like Jacob deGrom can come in and throw 100 miles-an-hour, because there is no expectation that he will go nine innings. He can pitch, in essence, like a closer: go hard, as long as you can, and leave the last two-thirds of the game to the bullpen.

That wasn’t the conditions Koufax operated within . . .


Let's assume that's true. In the WAR framework, that means that deGrom will have a relative advantage in terms of RA/9 or FIP but a disadvantage in terms of IP. The 6-inning start giveth and it taketh away. To the extent there is an RA or FIP advantage, it is not an advantage unique to deGrom; it is an advantage for anyone pitching under today's conditions. I.e., deGrom is being compared against other pitchers that turn the dial to 11 on every pitch, while Koufax is being compared to other pitchers that (supposedly) paced themselves in an effort to get complete games.

The difference in conditions doesn't give deGrom an advantage over Koufax, unless deGrom has some special personal ability to take advantage of those conditions as compared to Koufax's ability to take advantage of his. But if true (and I don't know what the basis would be for reaching that conclusion), that seems to be working by design. A player's ability to take advantage of their context to deliver outperformance that helps his team win games is one of the things the WAR framework is attempting to capture.
9:56 AM Jul 8th
 
steve161
Dave, if I understand you correctly, you're saying that WAR reflects a pitcher's ability more than it does his value.

Now, if I remember correctly, Win Shares (the original system) assigns claim points to pitchers based, among other things, on runs allowed per inning pitched. Does this mean that it will better assess value than WAR?
9:06 AM Jul 8th
 
OBS2.0
It occurs to me that the same disdain we have for pitchers "Wins" has to be scrupulously scrubbed from all thoughts and metrics about pitching.
12:17 AM Jul 7th
 
DaveFleming
One point I'm trying to get across is that 'Wins Above a Replacement Player' does not automatically translate to 'wins contributed to a team.' The average saber-inclined fan now thinks those are parallel, but they aren't.

If you take two pitchers who are identical in either their ability to prevent runs relative to league/park effects (ERA+), or identical in their rates of strikeouts/walks/HR to league averages (FIP), and one pitcher throws 200 innings while the other throws 250 innings, the pitcher who threw the additional 50 innings is contributing more to his team's success.

What makes those two pitchers appear 'equal' by which version of WAR you prefer isn't what they did, but what the people around them did. The metric adjusts their contributions for the contexts of how pitchers are utilized, because that's what the metric is designed to do.

But when we use that metric to argue that a pitcher throwing 200 innings at level X is equal to a pitcher throwing 250 innings at the same level, we're allowing context to be credited. In my opinion, that's a mistake.

It's a mistake that leads us to conclusions that are preposterous.

Right now, deGrom is credited with a pitching WAR of 4.8 at Fangraphs, over 85 IP. Doubling that, he can be expected to be credited with a 9.6 WAR over 170 IP.

Sandy Koufax threw 323 innings in 1966. The metric credits him with a pitching WAR of 9.1.

I'm not interested in the ethics of pitching 300 innings or 170 innings. What I want to ask is who contributed more to his team's ACTUAL wins and losses, Koufax or deGrom? Who, assuming deGrom stays on his current course, will have more impact?

A lot of saber-leaning fans will instinctively say it's deGrom, because that's what WAR says. That isn't the fault of the metric: it's the fault of people not really understanding what the structures of the metric are, and where the limits exist. deGrom is equivalent to Koufax in relation to their peers, but their seasons in no way parallel each other, any more than a good Mariano Rivera season parallels a good Greg Maddux season.
4:14 PM Jul 6th
 
docfordock
"But who was more valuable in reality? Who won more actual games for their team: the pitcher who was making 38 starts each season and completing most of them, or the pitcher who was averaging 30 starts a year and completing fewer than half? Who won more games: not games against replacement peers, but actual baseball games?"

I don't really understand the question as framed. Individual pitchers don't win baseball games, teams do. Sandy Koufax didn't win any games in reality; the Dodgers did. All one can do is try to measure the individual's contribution to the team's ability to win. The WAR framework is one method for answering that question, but it necessarily does so by examining player performance in the context in which he played. You can't evaluate what deGrom's contribution would be to the 1966 Dodgers because deGrom didn't play for the Dodgers in 1966.

The concept of replacement is abstract but to make it more concrete, in 1966 Koufax had the 27-9 record, 1.73 ERA in 323 IP. The next year he was out of baseball. He was replaced. His place in the rotation was taken by Bill Singer 12-8 2.64 ERA in 204 IP and Jim Brewer 5-4 2.68 ERA in 100 IP (neither had significant IP the prior years). That's close to a point of ERA difference; or about 36 ER over 323 IP.

deGrom pitched in 2019, so one can't do the same comparison, but one can note that the Mets' No. 5 and No. 6 starters in 2018 had ERAs in the high 5s. I think it is reasonable to conclude that there was no easily available replacement for deGrom who would be posting a 2.70 ERA or anything close. So it isn't crazy to conclude his performance had equivalent value to his team.

We can make these kinds of comparisons all we want, but what we can't do is move Koufax to the 2018 NL and move deGrom to the 1966 NL. We can't resolve the inherent incommensurability of the different contexts, either because it simply isn't possible or the tools to do it haven't been developed yet.

I think the more interesting questions are:
1) In the present day, are the prevailing usage patterns really optimal, or would there be some benefit from trying to extend certain starters even if it means pacing effort or bucking the third-time-through-the-order effect?
2) Looking at past eras - can one argue that the usage patterns were suboptimal back then? Did managers/GM underrate the potential of relief pitchers at the time and leave some low hanging performance fruit unpicked?
11:32 PM Jul 5th
 
Doodles
Bravo!

3:41 AM Jul 4th
 
shinsplint
Regarding Marichal, Carlton, Nola---Nola only looked so high up on bWAR because of the apparently lousy defense he had to overcome that year. His RA9def for 2018 is -.61, which is pretty low. Carlton's in 1980 was +.02. Marichal's in 1965 was -.02. So basically bWAR figures he was much better than his ERA+ indicated. If so, then Nola was much better than Carlton or Marichal on a per-inning basis in regards to the years in question, which makes up for his much fewer innings. As 110Phil says, that RA9def for Nola is questionable, but that's a different issue than suggesting that bWAR is context adjusting for innings pitched.

I decided to look at deGrom/Koufax/Gooden for the same years as Dave, but this time in regards to bWAR instead of fWAR. As shown below, deGrom IS lower than Koufax or Gooden for bWAR. deGrom's bWAR is roughly proportionately less than Koufax's and Gooden's when considering his fewer innings. But bWAR thinks deGrom is better than his ERA+ indicates, given his (again apparently) worse fielding. Just eyeballing it, if anything it looks like deGrom is being shortchanged here.


year   IP      ERA+   RA9def   bWAR   pitcher
2018   217.0   218    -.38     8.0    deGrom
1966   323.0   190    -.07     10.3   Koufax
1985   276.2   229    +.01     12.2   Gooden

Bottom line, I don't see any evidence that bWAR is somehow context-adjusting in such a way that a pitcher gets value easier in these days of fewer innings pitched. Could be that fWAR is doing that, though.

I do, however, see a greater truth that Dave alludes to here that is exhibited by the Pedro vs. Sandy comparison. 2 great pitchers, but one throws 100 more innings a year. Pedro is solidly better in terms of runs allowed per inning, but there must be some level of diminishing returns for pitching a great game. Maybe Sandy won some games 4-2 where Pedro won 4-1. A win is a win, so pitching slightly less excellently may be penalized too much relative to the benefit received for pitching more innings.



2:27 PM Jul 3rd
 
110phil
Checking ... B-Ref assumes the average pitcher replacing Koufax would give up 3.83 R/9, after adjustments for league, park, role, and defense.

For deGrom, it's 4.69 R/9.

That's probably how deGrom catches up to Koufax in fewer innings. He's being credited with an extra 0.86 R/9 over Koufax, for an essentially identical ERA.

Oops ... just realized for Koufax/deGrom, you're using fWAR, not bWAR. But I bet it's the same reason, era/park mostly.

12:29 PM Jul 3rd
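110phil’s point about baselines can be made concrete. If runs saved are modeled simply as (baseline R/9 minus the pitcher’s R/9) times innings divided by nine, a higher baseline lets fewer innings produce the same total. The sketch below uses the 3.83 and 4.69 baselines quoted in that comment (his figures for an average pitcher after league, park, role, and defense adjustments), a hypothetical identical 2.00 R/9 for both pitchers, and the innings from the article. It is a simplified model, not Baseball-Reference’s full calculation, and it stops at runs because converting runs to wins needs a runs-per-win factor that is not modeled here.

```python
def runs_above_baseline(baseline_r9: float, pitcher_r9: float, innings: float) -> float:
    """Simplified model: runs saved versus a baseline pitcher over the same innings."""
    return (baseline_r9 - pitcher_r9) * innings / 9


HYPOTHETICAL_R9 = 2.00  # same run rate for both pitchers, for illustration only

koufax = runs_above_baseline(3.83, HYPOTHETICAL_R9, 323)
degrom = runs_above_baseline(4.69, HYPOTHETICAL_R9, 217)
print(f"Koufax 1966: {koufax:.1f} runs above baseline in 323 IP")
print(f"deGrom 2018: {degrom:.1f} runs above baseline in 217 IP")
```

Under these assumptions the two totals land within a run of each other (about 65.7 and 64.9) despite the 106-inning gap, which is the mechanism the comment describes.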
 
110phil
Dave:

But is that what WAR does? I was pretty sure it included raw playing time, not playing time relative to a player's peers. As I understand it, WAR has no idea how many innings other starters pitch, for either Koufax or deGrom.

I think WAR thinks deGrom was a much better pitcher *per inning*. Perhaps that's because of a higher run environment today, and/or Koufax pitching in Dodger Stadium?

Or I could be wrong about how WAR works. Perhaps someone else will chime in if so.
12:24 PM Jul 3rd
 
DaveFleming
110phil: the point is deGrom's quality wasn't 50% or 30% better than Koufax's. It was exactly the same.

deGrom's quality, adjusted against his peers and league contexts, was a FIP of 1.99. Koufax was 2.07. That's the same: they were equally good in terms of their quality of performance.

It is only in quantity where there is a gap....and it is a big gap.

The reason WAR doesn't acknowledge that gap is that deGrom was pitching in an era where the average starter might pitch 180-220 innings, whereas in Koufax's era the average starter would pitch 280-300 innings. WAR is perfectly accurate in doing that, but...there is also the reality that Koufax was pitching 100 more innings than deGrom at the same level of quality.

WAR is, in essence, giving deGrom credit for the lesser work loads of his era. That's fine: that's what WAR is meant to do.

It's a very tricky thing to wrap the mind around: deGrom and Koufax were probably both worth 9 or 10 wins 'above a replacement player.'

But Koufax was worth more ACTUAL WINS to his team than deGrom, because a pitcher with a FIP of 2.00 over 320 innings has to provide more value to his team than a pitcher with a FIP of 2.00 over 220 innings.
12:01 PM Jul 3rd
 
laferrierelouis
Nice piece, M. Fleming. Koufax was running marathons while deGrom is running 1500 meters. And we try to compare them: I agree with you, there is a problem here.
9:19 AM Jul 3rd
 
OBS2.0
Nice piece D-Flem!
6:35 AM Jul 3rd
 
110phil
My understanding is that WAR measures how many more wins than a replacement player FOR THAT AMOUNT OF PLAYING TIME.

If DeGrom beat Koufax while pitching only 2/3 as many innings, it's because he was 50 percent better than Koufax (compared to replacement) in the innings he did pitch.

If that's the case, then what's the problem?

Yes, you'd think that deGrom would have a lower WAR if he was asked to pitch more innings, which is why they don't ask him to do that. And, maybe Koufax would have a lower WAR if he were able to pitch full-out in fewer innings like deGrom, which is why they didn't ask *him* to do that.

Or, it could be that they didn't ask either of them to do that because it wasn't normal for the times in which they pitched. But in that case, why come down on deGrom and not Koufax? It's just as likely that deGrom could pull a Koufax (if he had been asked to) as that Koufax could pull a deGrom (if he had been asked to). Your argument makes it sound like Koufax could have matched deGrom (pitching harder but less), but deGrom couldn't have matched Koufax.

----

(BTW, Aaron Nola's huge bWAR for 2018 in a middling number of innings is, I think, because he's given way too much credit for having a bad fielding team behind him. I wrote about that here: blog.philbirnbaum.com/2020/12/splitting-defensive-credit-between_29.html )
2:28 AM Jul 3rd
 
MWeddell
I read the article and then skimmed it a second time. I don't get the point; I don't get why you think there's a flaw in comparing starting pitchers' seasonal WAR.

The other commenters think this is a brilliant article, so maybe it's just me.
4:36 PM Jul 2nd
 
Gibbo1224
Great job Dave, this is something I have been going over in my mind the last few years, and it just so happens I made a comment to Dan Marks on Starting Pitchers ranked 21 thru 30. My comment was comparing Seaver to DeGrom and how you can win Cy Young awards with 10 and 11 wins. Seaver pitched for the Mets when they were a losing team but still won 20 or more games 5 times. The guys like Seaver control the outcome of the game by pitching 9 innings +/- while SP that pitch 6 innings are leaving the game in control of pitchers not as good as them. The most important thing here is to win the game. Your article really gives clarity here and much appreciated!!
9:52 AM Jul 2nd
 
snerze
Brilliant and very helpful. Why not pure WINS as extrapolated from the current WAR stat instead of comparing to replacement level or average (both of which will unfairly make DeGrom look equal to Koufax's season)? If fWAR and bWAR were simply fW and bW, we would get a good comparison across eras, and it would be a fair counting stat.
6:45 AM Jul 2nd
 
MichaelPat
Here's another 9.3 WAR pitching season (BRef WAR):
Rube Waddell 1905
27-10 1.48 ERA 181ERA+ 328 2/3 IP

Then, of course, we have Cy Young, with nine seasons of 9 WAR or more (four of them after 1900), maxing out at 13.9 in 1892 and 12.5 in 1901.
Obviously his value was far greater than anything accomplished by Koufax, Seaver or Clemens.
Koufax never pitched more than 335 innings in a season, a total Cy surpassed fourteen times. 14!!

Was it a 'mistake' to move away from using pitchers the way Young and Waddell were used?

No, the game changed.

If a clone of Waddell, Young, Koufax, Seaver, or Clemens were to appear today, do you really think there would be any possible chance they would be used the way they were when they pitched?

The game has changed, and the only fair way to evaluate pitchers is to compare them with their peers.

Yes, pitching more innings means you will contribute more wins (and more losses) to your team's total. So does that necessarily then mean that all the really great pitchers took to the mound before 1905?

In today's game, where a team will use thirty pitchers in a season (9 was the average in 1900, 15 in 1960, 19 in 1990) the expectations for how much any one pitcher should work are far different.

It used to be that guys who could throw 95 mph were few and far between.... Now nearly every team has twenty or more guys who can do that.

I don't have any problem with the idea that deGrom in 160 to 180 innings today could be the equal of Koufax throwing 300 in 1965 or Cy Young throwing 400 in 1894.



12:51 AM Jul 2nd
 
hortonwho
Excellent! You have articulated very well something that has been nagging at me for years!
9:51 PM Jul 1st
 
DaveFleming
Sorry...Elizabeth Warren?

Those numbers are Pedro's stats from 1997-2000, not 1999-2003. It is a four-year stretch (Pedro's best), but I called them the wrong years. The numbers are correct...just ignore the dates.
6:14 PM Jul 1st
 
doncoffin
Everything is within the context of the time in which it is being accomplished. For example, the average compensation for CEOs in the US in 1965 was $835,000. In 2016, $15,360,000. Let's adjust that for inflation...$835,000 is roughly equivalent to $6,500,000, about 40% as much as current CEOs get paid. Do CEOs today work 2 to 3 times as hard, or 2 to 3 times as effectively? Ultimately, the question is how we determine the value of someone's performance.


I think it's plausible to argue that (back to baseball) there has been a reconsideration of what starting pitchers "ought" to do. Koufax, for example, was done at 30. (And Drysdale, effectively, at 31.) Gooden didn't have a lot of value after his age-29 season. Hershiser made it to age 40, but his last season with more than 210 IP was at age 30 (256 IP in 1989). Yes, there were people who pitched for a long time with heavy loads (Carlton pitched nearly 300 innings at age 38).

In a sense, then, the problem is how to place a value on pitcher performance that allows us to compare Koufax to DeGrom. I agree that WAR (of whatever flavor) does not do that job. And I don't have any suggestions.
5:41 PM Jul 1st
 
elwarren
For your Pedro Martinez vs. Koufax comparison, '99-'03 represents a five year span, while '63-'66 represents a four year span. Did you mistype? Wanted to compare vs. Win Shares, then realized there may be an issue.
5:08 PM Jul 1st
 
BobGill
"... pitching WAR is giving modern starting pitchers all of the credit for the contexts that they’re pitching in, and none of the demerits for those contexts."

Yes! That sums it up perfectly. I've been trying to find the words to express this exact point for literally years, and now you've done it for me.
4:55 PM Jul 1st
 
 
© 2011 Be Jolly, Inc. All Rights Reserved.