A few weeks ago, Michael Scalfino at FiveThirtyEight posted an article about Chris Sale’s generally great season and his strikeout rate. The headline of the article was "Chris Sale Still Has Nothing on Pedro." The sub-header read: "If Pedro Martinez was pitching today, he’d have even more strikeouts than the Red Sox current ace."
I do not, as a general practice, like to tell other people when they’re wrong. I am wrong a good portion of the time, and when I’m not wrong I’m probably only ever circling the drain of right-ness. All of us are endeavoring in good faith to understand a complex world, and I’m reluctant to dismiss the efforts of other souls wandering through the dark woods, trying to see the dawn.
Having said that, Mr. Scalfino’s article featured some really bad uses of statistics, and because he happened to be writing about a subject that’s been on my mind recently, I thought I should contribute my two cents.
Let’s start at the top.
Chris Sale circa 2017 isn’t as good as Pedro circa 1999. That’s the conclusion. Sale’s pretty dominant, but Pedro was greater.
The evidence that the writer uses to back up this claim is a table which shows the strikeout rates of individual pitchers, compared to the strikeout rates of their league. The data goes back to 1961. Sorry, Rube Waddell.
Here’s the table from the article:
| Rank | Year | Pitcher | League K% | Pitcher K% | Difference |
| --- | --- | --- | --- | --- | --- |
| 1 | 1999 | Pedro Martinez | 16.4 | 37.5 | 21.1 |
| 2 | 2001 | Randy Johnson | 17.3 | 37.4 | 20.1 |
| 3 | 2000 | Pedro Martinez | 16.5 | 34.8 | 18.3 |
| 4 | 2000 | Randy Johnson | 16.5 | 34.7 | 18.2 |
| 5 | 1995 | Randy Johnson | 16.2 | 33.9 | 17.7 |
| 6 | 1984 | Dwight Gooden | 14.0 | 31.4 | 17.4 |
| 7 | 1999 | Randy Johnson | 16.4 | 33.7 | 17.3 |
| 8 | 1997 | Randy Johnson | 17.1 | 34.2 | 17.1 |
| 9 | 1998 | Kerry Wood | 16.9 | 33.3 | 16.4 |
| 10 | 1989 | Nolan Ryan | 14.8 | 30.5 | 15.7 |
| 11 | 1998 | Randy Johnson | 16.9 | 32.4 | 15.5 |
| 12 | 2002 | Randy Johnson | 16.8 | 32.3 | 15.5 |
| 13 | 1987 | Nolan Ryan | 15.5 | 30.9 | 15.4 |
| 14 | 1997 | Pedro Martinez | 17.1 | 32.2 | 15.1 |
| 15 | 1962 | Sandy Koufax | 14.1 | 29.0 | 14.9 |
| 16 | 1976 | Nolan Ryan | 12.7 | 27.3 | 14.6 |
| 17 | 1973 | Nolan Ryan | 13.7 | 28.2 | 14.5 |
| 18 | 1997 | Curt Schilling | 17.1 | 31.6 | 14.5 |
| 19 | 1991 | Nolan Ryan | 15.2 | 29.7 | 14.5 |
| 20 | 2017 | Chris Sale | 21.6 | 36.1 | 14.5 |
| 21 | 1993 | Randy Johnson | 15.1 | 29.5 | 14.4 |
| 22 | 2002 | Curt Schilling | 16.8 | 31.1 | 14.3 |
| 23 | 2017 | Corey Kluber | 21.6 | 35.8 | 14.2 |
| 24 | 1979 | J.R. Richard | 12.5 | 26.6 | 14.1 |
| 25 | 2017 | Max Scherzer | 21.6 | 35.7 | 14.1 |
When I first glanced at this table, I thought, ‘Yeah, that looks about right.’
The chart has Pedro as the best. Randy Johnson is number two. No one can really take issue with those conclusions, right? It seems right. It passes our intuition…most of us would expect Randy and Pedro to be at the top of the list, and there they are. Looks alright to me.
But if you look closer than that, the whole list starts to crack. For one thing, the top of the list is really crowded by Pedro and Randy Johnson. Randy Johnson has half of the fourteen best seasons, which is surprising. I mean, Randy Johnson was a fine pitcher, but do we really believe that he is responsible for seven of the fourteen best strikeout seasons since 1961? Do you think Johnson’s seventh best strikeout season was better than Koufax’s best, or Gibson’s best, or Sam McDowell’s best?
And doesn’t it seem like a lot of these seasons are clustered around the turn of the millennium? I mean, I like Kerry Wood just fine, but it’s weird that half of the top dozen seasons happened in 1998, 1999, and 2000. Shouldn’t we see a wider spread?
How come there’s just one season from the sixties, and three from the eighties? That’s a little strange, right?
And what the hell is up with Nolan Ryan? Do any of us seriously believe that Ryan’s best strikeout seasons were in 1989, when he was a billion years old, and 1987, when he was two years shy of a billion? What’s the problem with those Angels years?
And the math…. the math is just weird, right? I mean, the final tally uses subtraction to make its case. Who uses subtraction for anything? What kind of conclusions can be trusted by an equation that uses subtraction? Why doesn’t this table divide a pitcher’s strikeout rate by the league average? Wouldn’t that get us a more accurate ratio?
Of course it would.
Think it through. Who is a better strikeout pitcher: a guy who strikes out 20% of hitters when the league average is 10%, or a guy who strikes out 9% of hitters when the league average is 3%?
| Pitcher | Pitcher K% | League K% |
| --- | --- | --- |
| Pitcher A | 20% | 10% |
| Pitcher B | 9% | 3% |
If you just use subtraction, Pitcher A comes out on top. But Pitcher A’s strikeout rate doubles his league rate. Pitcher B has tripled his league average…he is the much better strikeout pitcher, at least relative to his league.
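As a minimal sketch of the two approaches, here are the same two hypothetical pitchers run through both calculations. The function names `k_difference` and `adjusted_k` are my own labels for illustration, not anything from the FiveThirtyEight article.

```python
def k_difference(pitcher_k_pct, league_k_pct):
    """Subtraction method: percentage points above the league rate."""
    return pitcher_k_pct - league_k_pct


def adjusted_k(pitcher_k_pct, league_k_pct):
    """Ratio method: pitcher rate over league rate, scaled so 100 is average."""
    return round(100 * pitcher_k_pct / league_k_pct)


# (pitcher K%, league K%) for the two hypothetical pitchers in the text
pitchers = {"Pitcher A": (20.0, 10.0), "Pitcher B": (9.0, 3.0)}

for name, (pitcher_k, league_k) in pitchers.items():
    diff = k_difference(pitcher_k, league_k)
    ratio = adjusted_k(pitcher_k, league_k)
    print(f"{name}: difference = {diff:.1f} points, adjusted = {ratio}")
```

Subtraction puts Pitcher A ahead (10 points to 6), while the ratio puts Pitcher B ahead (300 to 200).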
So let’s try that.
Here’s the same list of players from FiveThirtyEight, but instead of listing them by the difference between their strikeout rate and the league rate (x – y), we’re listing them by their adjusted rate: the individual player’s strikeout rate divided by the league rate. We then multiplied the answer by 100, to give us a nice positive integer. It’s essentially OPS+ for a pitcher’s strikeouts: 100 is average. 200 is twice the average.
| Rank | Year | Pitcher | League K% | Pitcher K% | Adj. K% |
| --- | --- | --- | --- | --- | --- |
| 1 | 1999 | Pedro Martinez | 16.4 | 37.5 | 229 |
| 2 | 1984 | Dwight Gooden | 14.0 | 31.4 | 224 |
| 3 | 2001 | Randy Johnson | 17.3 | 37.4 | 216 |
| 4 | 1976 | Nolan Ryan | 12.7 | 27.3 | 215 |
| 5 | 1979 | J.R. Richard | 12.5 | 26.6 | 213 |
| 6 | 2000 | Pedro Martinez | 16.5 | 34.8 | 211 |
| 7 | 2000 | Randy Johnson | 16.5 | 34.7 | 210 |
| 8 | 1995 | Randy Johnson | 16.2 | 33.9 | 209 |
| 9 | 1989 | Nolan Ryan | 14.8 | 30.5 | 206 |
| 10 | 1973 | Nolan Ryan | 13.7 | 28.2 | 206 |
| 11 | 1962 | Sandy Koufax | 14.1 | 29.0 | 206 |
| 12 | 1999 | Randy Johnson | 16.4 | 33.7 | 205 |
| 13 | 1997 | Randy Johnson | 17.1 | 34.2 | 200 |
| 14 | 1987 | Nolan Ryan | 15.5 | 30.9 | 199 |
| 15 | 1998 | Kerry Wood | 16.9 | 33.3 | 197 |
| 16 | 1991 | Nolan Ryan | 15.2 | 29.7 | 195 |
| 17 | 1993 | Randy Johnson | 15.1 | 29.5 | 195 |
| 18 | 2002 | Randy Johnson | 16.8 | 32.3 | 192 |
| 19 | 1998 | Randy Johnson | 16.9 | 32.4 | 192 |
| 20 | 1997 | Pedro Martinez | 17.1 | 32.2 | 188 |
| 21 | 2002 | Curt Schilling | 16.8 | 31.1 | 185 |
| 22 | 1997 | Curt Schilling | 17.1 | 31.6 | 185 |
| 23 | 2017 | Chris Sale | 21.6 | 36.1 | 167 |
| 24 | 2017 | Corey Kluber | 21.6 | 35.8 | 166 |
| 25 | 2017 | Max Scherzer | 21.6 | 35.7 | 165 |
That’s a better list, right?
Pedro is still the best, but the number two guy changes. Instead of Randy Johnson taking the second slot, Dwight Gooden’s 1984 season charts as the second-best strikeout season of the twenty-five seasons listed. And while Randy Johnson still gets a lot of credit for his strikeout seasons, some other pitchers jump up on the list. J.R. Richard gets a nice boost. Sandy Koufax clips the top of the list, which is exactly right. Kerry Wood declines a bit. And Nolan Ryan gets jumbled around: his 1976 season with the Angels rates as his best strikeout year, and 1973 slips in ahead of 1987.
The second list seems righter. It doesn’t tilt so favorably to Pedro and Randy. It gives Sandy his due.
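To see the reshuffling in miniature, here is a rough sketch that ranks five of the seasons above both ways. The rates are copied from the tables; the helper names and the choice of these five seasons are mine, purely for illustration.

```python
# (season, league K%, pitcher K%) copied from the tables above
seasons = [
    ("1999 Pedro Martinez", 16.4, 37.5),
    ("2001 Randy Johnson", 17.3, 37.4),
    ("1984 Dwight Gooden", 14.0, 31.4),
    ("1976 Nolan Ryan", 12.7, 27.3),
    ("2017 Chris Sale", 21.6, 36.1),
]


def difference(season):
    """Percentage points above the league rate (the subtraction method)."""
    _, league, pitcher = season
    return pitcher - league


def adjusted_k(season):
    """Pitcher rate divided by league rate, times 100 (100 is average)."""
    _, league, pitcher = season
    return round(100 * pitcher / league)


by_difference = sorted(seasons, key=difference, reverse=True)
by_ratio = sorted(seasons, key=lambda s: s[2] / s[1], reverse=True)

print("By difference:", [s[0] for s in by_difference])
print("By ratio:     ", [(s[0], adjusted_k(s)) for s in by_ratio])
```

In this small sample the only change is Gooden moving past Johnson into second place, which matches what happens in the full table.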
But it’s not right either.
* * *
So Mr. Scalfino subtracted when he should have divided. That’s not a big deal: I’m sure I’ve made sillier mistakes in my own efforts to understand baseball. I’m not trying to take any kind of a cheap shot at him. I'm just trying to think this through a bit.
What interested me about this article is that Mr. Scalfino was using Chris Sale’s brilliant season to approach a subject that I’ve been thinking about a lot lately: how do we measure strikeout ability of pitchers across generations?
Let’s look at the top-five of the list again:
| Rank | Year | Pitcher | League K% | Pitcher K% | Adj. K% |
| --- | --- | --- | --- | --- | --- |
| 1 | 1999 | Pedro Martinez | 16.4 | 37.5 | 229 |
| 2 | 1984 | Dwight Gooden | 14.0 | 31.4 | 224 |
| 3 | 2001 | Randy Johnson | 17.3 | 37.4 | 216 |
| 4 | 1976 | Nolan Ryan | 12.7 | 27.3 | 215 |
| 5 | 1979 | J.R. Richard | 12.5 | 26.6 | 213 |
That is a good list. It’s very tempting to rate individual strikeout seasons by Adjusted Strikeout Rate. It’s a fine metric, and you can’t argue with the results too much. Peak Pedro was as good a pitcher as I’ll ever see. Dwight Gooden was pretty amazing. Randy Johnson and Nolan Ryan have struck out more hitters than any other pitchers in baseball history. It’s a good list.
But Pedro’s 229 mark isn’t the best in history. It isn’t close to the best.
Consider this guy’s six-year run:
| Year | Pitcher K% | Lg. K% | Adj K% |
| --- | --- | --- | --- |
| x | x | x | 235 |
| x | x | x | 298 |
| x | x | x | 287 |
| x | x | x | 276 |
| x | x | x | 223 |
| x | x | x | 251 |
I’ve left the year and strikeout rates empty for a moment, just to cultivate a sense of suspense.
This pitcher doesn’t just do better than Peak Pedro when it comes to his adjusted strikeout rate…he kicks his ass. Over these six years, our unidentified pitcher averaged an Adjusted Strikeout Percentage of 258.
Is this guy better at striking out hitters than Peak Pedro Martinez? According to his Adjusted Strikeout Percentage, he certainly is. But I'm not absolutely convinced by this conclusion either.
Let’s flesh out the numbers:
| Year | Pitcher K% | Lg. K% | Adj K% |
| --- | --- | --- | --- |
| 1923 | 16.6% | 7.1% | 235 |
| 1924 | 21.5% | 7.2% | 298 |
| 1925 | 20.3% | 7.1% | 287 |
| 1926 | 19.6% | 7.1% | 276 |
| 1927 | 16.4% | 7.4% | 223 |
| 1928 | 17.8% | 7.1% | 251 |
| Total | 18.6% | 7.2% | 258 |
The pitcher, of course, is Brooklyn Robins ace Dazzy Vance, who dominated the National League during the Roaring Twenties.
Is Dazzy Vance a greater strikeout pitcher than Pedro Martinez? If we were to judge them by their Adjusted Strikeout Percentage, Vance would seem the better strikeout pitcher. After all, Vance nearly tripled the league average in 1924. Pedro never came close to tripling his league’s rate.
If you think that through for just a second, you’ll realize the problem with simply ranking pitchers by their relative strikeout percentages.
- Dazzy Vance, in 1924, would have had to strike out 21.6% of the batters he faced to triple the league average. He just missed that total.
- Pedro Martinez, in 1999, would have had to strike out 49.2% of the batters he faced to triple the mark. That’s a massive amount of strikeouts. That’s every other batter.
The problem with a metric like Adjusted Strikeout Percentage is that a strikeout percentage has an upper limit: no one can strike out more than 100% of the batters he faces. As the overall percentage of strikeouts rises, the game gets closer and closer to that upper limit.
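Here is a rough sketch of that ceiling, using the league rates quoted above (7.2% for Vance’s 1924, 16.4% for Pedro’s 1999, 21.6% for 2017). The function names are my own, and the "ceiling" assumes a purely theoretical pitcher who strikes out every batter he faces.

```python
def k_pct_needed(league_k_pct, multiple):
    """Strikeout percentage a pitcher needs to reach a multiple of the league rate."""
    return league_k_pct * multiple


def max_adjusted_k(league_k_pct):
    """Highest possible Adj. K%: a pitcher who strikes out 100% of batters faced."""
    return round(100 * 100.0 / league_k_pct)


# League strikeout rates taken from the tables in the article
league_rates = {"1924": 7.2, "1999": 16.4, "2017": 21.6}

for year, league in league_rates.items():
    triple = k_pct_needed(league, 3)
    ceiling = max_adjusted_k(league)
    print(f"{year}: tripling the league takes {triple:.1f}% K; max Adj. K% is {ceiling}")
```

The theoretical maximum drops from 1389 in Vance’s league to 463 in 2017, which is one way to see how much less room a modern pitcher has to separate himself from the league.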
Looking at our table of pitchers since 1961 again:
| Rank | Year | Pitcher | League K% | Pitcher K% | Adj. K% |
| --- | --- | --- | --- | --- | --- |
| 1 | 1999 | Pedro Martinez | 16.4 | 37.5 | 229 |
| 2 | 1984 | Dwight Gooden | 14.0 | 31.4 | 224 |
| 3 | 2001 | Randy Johnson | 17.3 | 37.4 | 216 |
| 4 | 1976 | Nolan Ryan | 12.7 | 27.3 | 215 |
| 5 | 1979 | J.R. Richard | 12.5 | 26.6 | 213 |
| 6 | 2000 | Pedro Martinez | 16.5 | 34.8 | 211 |
| 7 | 2000 | Randy Johnson | 16.5 | 34.7 | 210 |
| 8 | 1995 | Randy Johnson | 16.2 | 33.9 | 209 |
| 9 | 1989 | Nolan Ryan | 14.8 | 30.5 | 206 |
| 10 | 1973 | Nolan Ryan | 13.7 | 28.2 | 206 |
| 11 | 1962 | Sandy Koufax | 14.1 | 29.0 | 206 |
| 12 | 1999 | Randy Johnson | 16.4 | 33.7 | 205 |
| 13 | 1997 | Randy Johnson | 17.1 | 34.2 | 200 |
| 14 | 1987 | Nolan Ryan | 15.5 | 30.9 | 199 |
| 15 | 1998 | Kerry Wood | 16.9 | 33.3 | 197 |
| 16 | 1991 | Nolan Ryan | 15.2 | 29.7 | 195 |
| 17 | 1993 | Randy Johnson | 15.1 | 29.5 | 195 |
| 18 | 2002 | Randy Johnson | 16.8 | 32.3 | 192 |
| 19 | 1998 | Randy Johnson | 16.9 | 32.4 | 192 |
| 20 | 1997 | Pedro Martinez | 17.1 | 32.2 | 188 |
| 21 | 2002 | Curt Schilling | 16.8 | 31.1 | 185 |
| 22 | 1997 | Curt Schilling | 17.1 | 31.6 | 185 |
| 23 | 2017 | Chris Sale | 21.6 | 36.1 | 167 |
| 24 | 2017 | Corey Kluber | 21.6 | 35.8 | 166 |
| 25 | 2017 | Max Scherzer | 21.6 | 35.7 | 165 |
The three active pitchers on this list are all at the bottom, despite having Pedro-level strikeout percentages. This happens because they are pitching in an era of historically high strikeout rates: they’re at the same altitude as Pedro, but the water has risen around them.
Dazzy Vance, the greatest strikeout pitcher of his era, could imagine tripling the league strikeout rate. I doubt Vance did think about a metric so esoteric as that, all the way back in 1924, but he certainly could have. It wasn’t out of the realm of possibility.
But it’s out of the realm of today’s possible. Chris Sale or Corey Kluber or Max Scherzer can’t think about tripling the league strikeout percentage, because that would mean they’d be striking out two-thirds of the batters they faced. Not the batters they manage to get out…all of the batters they face.
And they can’t really approach doubling the strikeout rate, either. Chris Sale strikes out a lot of hitters, but even on the nights when he strikes out 12 or 13, he can expect the guys on the other side to strike out six or seven of his teammates. To double the league rate, he’d have to pitch like that every time out.
He can’t do that. It’s not impossible, but it is highly, highly improbable.
And as the strikeout rate rises, it will become impossible.
So a metric like Adjusted Strikeout Percentage doesn’t give a fair shake to active pitchers, and it probably overrates the guys who notched a lot of strikeouts in low-whiff eras. Certainly, Dazzy Vance has to rate as one of the greatest strikeout pitchers of all-time, but I am not at all certain that he was the greatest.
So what could work?
* * *
This is where my limited math abilities come crashing into a wall of ignorance: I have no idea how to calculate this. I’m asking for your help.
What I’d like to know is how to adjust for league-relative strikeout rate and for the rate’s proximity to its absolute limit. I understand how to do the first part, but I have no idea how to do the second part.
One thought I had is that we might be better off talking about Strikeouts-Per-Nine-Innings Pitched (K/9), instead of Strikeout Percentage (K%).
Why would that be better?
Well…let’s think about Chris Sale.
Chris Sale has a Strikeout Percentage of 37% this year…he’s about a third of the way to the absolute limit of that metric (100%).
But Sale’s Strikeouts-Per-Nine rate is at 12.9, which is nearly halfway to the limit of that metric…you can only strike out twenty-seven hitters every nine innings pitched…unless your catcher misses a few third strikes.
Which one is the more accurate ceiling? Is it more accurate to judge a pitcher on how he does against every batter he faced (that’d be K%), or should we understand them based on the outs they generate (the K/9 metric)?
I don’t know. Honestly, I have no idea which is better.
I think that a metric based on K% would give less weight to the ‘ceiling’ problem than a metric based on K/9, but that’s just a guess. I think the best strategy would be to try both, and see which one works better…see which one comes up with a list that seems more accurate, one that shows less bias towards specific generations.
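For what it’s worth, the comparison of the two ceilings is simple arithmetic. This sketch uses Sale’s figures as quoted above (37% and 12.9) and assumes a K/9 ceiling of 27, which ignores dropped third strikes; the names are mine and it is only meant to show how far apart the two views are.

```python
K_PCT_CEILING = 100.0  # every batter faced strikes out
K9_CEILING = 27.0      # assumption: 27 outs per nine innings, no dropped third strikes


def share_of_ceiling(value, ceiling):
    """How far along a pitcher is toward the metric's absolute limit (0 to 1)."""
    return value / ceiling


# Chris Sale's 2017 figures as quoted in the text
sale_k_pct = 37.0
sale_k9 = 12.9

print(f"K%:  {share_of_ceiling(sale_k_pct, K_PCT_CEILING):.0%} of the ceiling")
print(f"K/9: {share_of_ceiling(sale_k9, K9_CEILING):.0%} of the ceiling")
```

By K% Sale is a little over a third of the way to the limit; by K/9 he is nearly halfway there, so the choice of metric changes how close to the ceiling today’s pitchers appear to be.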
So that’s the challenge for you guys: how do we adjust for the ceiling of strikeouts? How do we adjust for the fact that Vance could triple his league’s strikeout percentage, while Pedro could double the percentage in his league, while Chris Sale, pitching in the highest strikeout era in the game’s history, can only reach about 1.7 times the league average? How do we factor that in?
I don't know. But until we do, I don't think that we can say with any certainty that 1999 Pedro Martinez was a more dominant strikeout pitcher than Chris Sale.
So c’mon math majors. Help us figure this out.
Dave Fleming is a writer living in western Virginia. He welcomes comments, questions, and math lessons here and at dfleming1986@yahoo.com.