Chris Sale Chases Pedro

August 18, 2017
 
A few weeks ago, Michael Scalfino at FiveThirtyEight posted an article about Chris Sale’s generally great season and his strikeout rate. The headline of the article was "Chris Sale Still Has Nothing on Pedro." The sub-header read: "If Pedro Martinez was pitching today, he’d have even more strikeouts than the Red Sox current ace."
 
I do not, as a general practice, like to tell other people when they’re wrong. I am wrong a good portion of the time, and when I’m not wrong I’m probably only ever circling the drain of right-ness. All of us are endeavoring in good faith to understand a complex world, and I’m reluctant to dismiss the efforts of other souls wandering through the dark woods, trying to see the dawn.
 
Having said that, Mr. Scalfino’s article featured some really bad uses of statistics, and because he happened to be writing about a subject that’s been on my mind recently, I thought I should contribute my two cents.
 
Let’s start at the top.
 
Chris Sale circa 2017 isn’t as good as Pedro circa 1999. That’s the conclusion. Sale’s pretty dominant, but Pedro was greater.
 
The evidence that the writer uses to back up this claim is a table which shows the strikeout rates of individual pitchers, compared to the strikeout rates of their league. The data goes back to 1961. Sorry, Rube Waddell.
 
Here’s the table from the article:
 
Rank  Year  Pitcher         League K%  Pitcher K%  Difference
1     1999  Pedro Martinez  16.4       37.5        21.1
2     2001  Randy Johnson   17.3       37.4        20.1
3     2000  Pedro Martinez  16.5       34.8        18.3
4     2000  Randy Johnson   16.5       34.7        18.2
5     1995  Randy Johnson   16.2       33.9        17.7
6     1984  Dwight Gooden   14.0       31.4        17.4
7     1999  Randy Johnson   16.4       33.7        17.3
8     1997  Randy Johnson   17.1       34.2        17.1
9     1998  Kerry Wood      16.9       33.3        16.4
10    1989  Nolan Ryan      14.8       30.5        15.7
11    1998  Randy Johnson   16.9       32.4        15.5
12    2002  Randy Johnson   16.8       32.3        15.5
13    1987  Nolan Ryan      15.5       30.9        15.4
14    1997  Pedro Martinez  17.1       32.2        15.1
15    1962  Sandy Koufax    14.1       29.0        14.9
16    1976  Nolan Ryan      12.7       27.3        14.6
17    1973  Nolan Ryan      13.7       28.2        14.5
18    1997  Curt Schilling  17.1       31.6        14.5
19    1991  Nolan Ryan      15.2       29.7        14.5
20    2017  Chris Sale      21.6       36.1        14.5
21    1993  Randy Johnson   15.1       29.5        14.4
22    2002  Curt Schilling  16.8       31.1        14.3
23    2017  Corey Kluber    21.6       35.8        14.2
24    1979  J.R. Richard    12.5       26.6        14.1
25    2017  Max Scherzer    21.6       35.7        14.1
 
When I first glanced at this table, I thought, ‘Yeah, that looks about right.’
 
The chart has Pedro as the best. Randy Johnson is number two. No one can really take issue with those conclusions, right? It seems right. It passes our intuition…most of us would expect Randy and Pedro to be at the top of the list, and there they are. Looks alright to me.
 
But if you look closer than that, the whole list starts to crack. For one thing, the top of the list is really crowded by Pedro and Randy Johnson. Randy Johnson has half of the fourteen best seasons, which is surprising. I mean, Randy Johnson was a fine pitcher, but do we really believe that he is responsible for seven of the fourteen best strikeout seasons since 1961? Do you think Johnson’s seventh best strikeout season was better than Koufax’s best, or Gibson’s best, or Sam McDowell’s best?
 
And doesn’t it seem like a lot of these seasons are clustered around the turn of the millennium? I mean, I like Kerry Wood just fine, but it’s weird that half of the top dozen seasons happened in 1998, 1999, and 2000. Shouldn’t we see a wider spread?
 
How come there’s just one season from the sixties, and three from the eighties? That’s a little strange, right?
 
And what the hell is up with Nolan Ryan? Do any of us seriously believe that Ryan’s best strikeout seasons were in 1989, when he was a billion years old, and 1987, when he was two years shy of a billion? What’s the problem with those Angels years?   
 
And the math…. the math is just weird, right? I mean, the final tally uses subtraction to make its case. Who uses subtraction for anything? What kind of conclusions can be drawn from an equation that uses subtraction? Why doesn’t this table divide a pitcher’s strikeout rate by the league average? Wouldn’t that get us a more accurate ratio?
 
Of course it would.
 
Think it through. Who is a better strikeout pitcher: a guy who strikes out 20% of hitters when the league average is 10%, or the guy who strikes out 9% of hitters when the league average is 3%?
 
Pitcher    Pitcher K%  League K%
Pitcher A  20%         10%
Pitcher B  9%          3%
 
If you just use subtraction, Pitcher A comes out on top. But Pitcher A’s strikeout rate doubles his league rate. Pitcher B has tripled his league average…he is the much better strikeout pitcher, at least relative to his league.
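To make the difference between the two approaches concrete, here is a minimal Python sketch using the two made-up pitchers from the example. The function names are mine, chosen only for illustration.

```python
# Hypothetical pitchers from the example: (pitcher K%, league K%).
pitchers = {
    "Pitcher A": (20.0, 10.0),
    "Pitcher B": (9.0, 3.0),
}

def difference(pitcher_k, league_k):
    """Subtraction: percentage points above the league rate."""
    return pitcher_k - league_k

def ratio(pitcher_k, league_k):
    """Division: how many times the league rate the pitcher reached."""
    return pitcher_k / league_k

# Pick the "best" pitcher under each method.
by_difference = max(pitchers, key=lambda name: difference(*pitchers[name]))
by_ratio = max(pitchers, key=lambda name: ratio(*pitchers[name]))

print("Best by subtraction:", by_difference)
print("Best by division:", by_ratio)
```

The two methods disagree on the same inputs, which is the whole problem: the choice of arithmetic decides the ranking.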
 
So let’s try that.
 
Here’s the same list of players from FiveThirtyEight, but instead of listing them by the difference between their strikeout rate and the league rate (x – y), we’re listing them by their adjusted rate: the individual player’s strikeout rate divided by the league rate. We then multiplied the answer by 100, to give us a nice positive integer.  It’s essentially OPS+ for a pitcher’s strikeouts: 100 is average. 200 is twice the average.
 
Rank  Year  Pitcher         League K%  Pitcher K%  Adj. K%
1     1999  Pedro Martinez  16.4       37.5        229
2     1984  Dwight Gooden   14.0       31.4        224
3     2001  Randy Johnson   17.3       37.4        216
4     1976  Nolan Ryan      12.7       27.3        215
5     1979  J.R. Richard    12.5       26.6        213
6     2000  Pedro Martinez  16.5       34.8        211
7     2000  Randy Johnson   16.5       34.7        210
8     1995  Randy Johnson   16.2       33.9        209
9     1989  Nolan Ryan      14.8       30.5        206
10    1973  Nolan Ryan      13.7       28.2        206
11    1962  Sandy Koufax    14.1       29.0        206
12    1999  Randy Johnson   16.4       33.7        205
13    1997  Randy Johnson   17.1       34.2        200
14    1987  Nolan Ryan      15.5       30.9        199
15    1998  Kerry Wood      16.9       33.3        197
16    1991  Nolan Ryan      15.2       29.7        195
17    1993  Randy Johnson   15.1       29.5        195
18    2002  Randy Johnson   16.8       32.3        192
19    1998  Randy Johnson   16.9       32.4        192
20    1997  Pedro Martinez  17.1       32.2        188
21    2002  Curt Schilling  16.8       31.1        185
22    1997  Curt Schilling  17.1       31.6        185
23    2017  Chris Sale      21.6       36.1        167
24    2017  Corey Kluber    21.6       35.8        166
25    2017  Max Scherzer    21.6       35.7        165
 
That’s a better list, right?
 
Pedro is still the best, but the number two guy changes. Instead of Randy Johnson taking the second slot, Dwight Gooden’s 1984 season charts as the second-best strikeout season from the twenty-five seasons listed. And while Randy Johnson still gets a lot of credit for his strikeout seasons, some other pitchers jump up on the list. J.R. Richard gets a nice boost. Sandy Koufax clips the top of the list, which is exactly right. Kerry Wood declines a bit. And Nolan Ryan gets jumbled around: his 1976 season with the Angels rates as his best strikeout year, and 1973 slips in ahead of 1987.
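For anyone who wants to check the arithmetic, the adjusted rate is easy to reproduce. This is a small Python sketch over five of the seasons from the table; the function name `adjusted_k` is my own label, and the inputs are the rounded percentages shown in the table.

```python
# (year, pitcher, league K%, pitcher K%), copied from the table.
seasons = [
    (1999, "Pedro Martinez", 16.4, 37.5),
    (2001, "Randy Johnson", 17.3, 37.4),
    (1984, "Dwight Gooden", 14.0, 31.4),
    (1976, "Nolan Ryan", 12.7, 27.3),
    (2017, "Chris Sale", 21.6, 36.1),
]

def adjusted_k(league_k, pitcher_k):
    """Pitcher rate divided by league rate, scaled so 100 is league average."""
    return round(100 * pitcher_k / league_k)

# Sort by the ratio itself, best first.
ranked = sorted(seasons, key=lambda s: s[3] / s[2], reverse=True)

for year, name, league_k, pitcher_k in ranked:
    print(year, name, adjusted_k(league_k, pitcher_k))
```

With these five rows, Gooden moves ahead of Johnson and Sale lands last, matching the order in the full table.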
 
The second list seems righter. It doesn’t tilt so favorably to Pedro and Randy. It gives Sandy his due.
 
But it’s not right either.
 
*             *             *
 
So Mr. Scalfino subtracted when he should have divided. That’s not a big deal: I’m sure I’ve made sillier mistakes in my own efforts to understand baseball. I’m not trying to take any kind of a cheap shot at him. I'm just trying to think this through a bit. 
 
What interested me about this article is that Mr. Scalfino was using Chris Sale’s brilliant season to approach a subject that I’ve been thinking about a lot lately: how do we measure strikeout ability of pitchers across generations?
 
Let’s look at the top five of the list again:
 
Rank  Year  Pitcher         League K%  Pitcher K%  Adj. K%
1     1999  Pedro Martinez  16.4       37.5        229
2     1984  Dwight Gooden   14.0       31.4        224
3     2001  Randy Johnson   17.3       37.4        216
4     1976  Nolan Ryan      12.7       27.3        215
5     1979  J.R. Richard    12.5       26.6        213
 
That is a good list. It’s very tempting to rate individual strikeout seasons by Adjusted Strikeout Rate. It’s a fine metric, and you can’t argue with the results too much. Peak Pedro was as good a pitcher as I’ll ever see. Dwight Gooden was pretty amazing. Randy Johnson and Nolan Ryan have struck out more hitters than any other pitcher in baseball history. It’s a good list.
 
But Pedro’s 229 mark isn’t the best in history. It isn’t close to the best.
 
Consider this guy’s six-year run:
 
Year  Pitcher K%  Lg. K%  Adj K%
x     x           x       235
x     x           x       298
x     x           x       287
x     x           x       276
x     x           x       223
x     x           x       251
 
I’ve left the year and strikeout rates empty for a moment, just to cultivate a sense of suspense.
 
This pitcher doesn’t just do better than Peak Pedro when it comes to his adjusted strikeout rate…he kicks his ass. Over these six years, our unidentified pitcher averaged an Adjusted Strikeout Percentage of 258.
 
Is this guy better at striking out hitters than Peak Pedro Martinez? According to his Adjusted Strikeout Percentage, he certainly is. But I'm not absolutely convinced by this conclusion either. 
 
Let’s flesh out the numbers:
 
Year   Pitcher K%  Lg. K%  Adj K%
1923   16.6%       7.1%    235
1924   21.5%       7.2%    298
1925   20.3%       7.1%    287
1926   19.6%       7.1%    276
1927   16.4%       7.4%    223
1928   17.8%       7.1%    251
Total  18.6%       7.2%    258
 
The pitcher, of course, is Brooklyn Robins ace Dazzy Vance, who dominated the National League during the Roaring Twenties.
 
Is Dazzy Vance a greater strikeout pitcher than Pedro Martinez? If we were to judge them by their Adjusted Strikeout Percentage, Vance would seem the better strikeout pitcher. After all, Vance nearly tripled the league average in 1924. Pedro never came close to tripling his league’s rate.
 
If you think that through for just a second, you’ll realize the problem with simply ranking pitchers by their relative strikeout percentages.
 
-          Dazzy Vance, in 1924, would have had to strike out 21.6% of batters he faced to triple the league average. He just missed that total.
 
-          Pedro Martinez, in 1999, would have had to strike out 49.2% of the batters he faced to triple the mark. That’s a massive amount of strikeouts. That’s every other batter.
 
The problem with a metric like Adjusted Strikeout Percentage is that the possible strikeout percentage has an upward limit. As the overall percentage of strikeouts rises, the game gets closer and closer to that upward limit.
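The ceiling can be put in numbers. The sketch below is my own illustration in Python, using the league rates quoted in this article: it computes the strikeout rate a pitcher would need to triple his league, and the highest Adjusted Strikeout Percentage that is even possible, which is what a pitcher would post if he struck out every batter he faced.

```python
# League strikeout rates (percent) as quoted in the article.
league_k = {1924: 7.2, 1999: 16.4, 2017: 21.6}

def rate_needed(league_pct, multiple):
    """Strikeout percentage needed to reach a given multiple of the league rate."""
    return league_pct * multiple

def max_adjusted_k(league_pct):
    """Adjusted K% of a pitcher who strikes out 100% of batters faced."""
    return 100 * 100.0 / league_pct

for year, league_pct in league_k.items():
    needed = rate_needed(league_pct, 3)
    ceiling = max_adjusted_k(league_pct)
    print(year, "needs", round(needed, 1), "% to triple; ceiling is", round(ceiling))
```

The ceiling on the adjusted rate shrinks as the league rate grows: it is well over 1,000 in Vance’s league and under 500 in Sale’s.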
 
Looking at our table of pitchers since 1961 again:
 
Rank  Year  Pitcher         League K%  Pitcher K%  Adj. K%
1     1999  Pedro Martinez  16.4       37.5        229
2     1984  Dwight Gooden   14.0       31.4        224
3     2001  Randy Johnson   17.3       37.4        216
4     1976  Nolan Ryan      12.7       27.3        215
5     1979  J.R. Richard    12.5       26.6        213
6     2000  Pedro Martinez  16.5       34.8        211
7     2000  Randy Johnson   16.5       34.7        210
8     1995  Randy Johnson   16.2       33.9        209
9     1989  Nolan Ryan      14.8       30.5        206
10    1973  Nolan Ryan      13.7       28.2        206
11    1962  Sandy Koufax    14.1       29.0        206
12    1999  Randy Johnson   16.4       33.7        205
13    1997  Randy Johnson   17.1       34.2        200
14    1987  Nolan Ryan      15.5       30.9        199
15    1998  Kerry Wood      16.9       33.3        197
16    1991  Nolan Ryan      15.2       29.7        195
17    1993  Randy Johnson   15.1       29.5        195
18    2002  Randy Johnson   16.8       32.3        192
19    1998  Randy Johnson   16.9       32.4        192
20    1997  Pedro Martinez  17.1       32.2        188
21    2002  Curt Schilling  16.8       31.1        185
22    1997  Curt Schilling  17.1       31.6        185
23    2017  Chris Sale      21.6       36.1        167
24    2017  Corey Kluber    21.6       35.8        166
25    2017  Max Scherzer    21.6       35.7        165
 
The three active pitchers on this list are all at the bottom, despite having Pedro-level strikeout percentages. This happens because they are pitching in an era of historically high strikeout rates: they’re at the same altitude as Pedro, but the water has risen around them.
 
Dazzy Vance, the greatest strikeout pitcher of his era, could imagine tripling the league strikeout rate. I doubt Vance did think about a metric so esoteric as that, all the way back in 1924, but he certainly could have. It wasn’t out of the realm of possibility.
 
But it’s out of the realm of today’s possible. Chris Sale or Corey Kluber or Max Scherzer can’t think about tripling the league strikeout percentage, because that would mean they’d be striking out two-thirds of the batters they faced. Not the batters they manage to get out…all of the batters they face.
 
And they can’t really approach doubling the strikeout rate. Chris Sale strikes out a lot of hitters, but on the nights when he strikes out 12 or 13 hitters, he can expect the guys on the other side to strike out six or seven of his teammates.
 
He can’t do that. It’s not impossible, but it is highly, highly improbable.
 
And as the strikeout rate rises, it will become impossible.
 
So a metric like Adjusted Strikeout Percentage doesn’t give a fair shake to active pitchers, and it probably overrates the guys who notched a lot of strikeouts in low-whiff eras. Certainly, Dazzy Vance has to rate as one of the greatest strikeout pitchers of all-time, but I am not at all certain that he was the greatest.
 
So what could work?
 
*             *             *
 
This is where my limited math abilities come crashing into a wall of ignorance: I have no idea how to calculate this. I’m asking for your help.
  
What I’d like to know is how to adjust for league-relative strikeout rate and for the rate’s proximity to its absolute limit. I understand how to do the first part, but I have no idea how to do the second part.
 
One thought I had is that we might be better off talking about Strikeouts-Per-Nine-Innings Pitched (K/9), instead of Strikeout Percentage (K%).
 
Why would that be better?
 
Well…let’s think about Chris Sale.
 
Chris Sale has a Strikeout Percentage of 37% this year…he’s about a third of the way to the absolute limit of that metric (100%).
 
But Sale’s Strikeouts-Per-Nine rate is at 12.9, which is nearly halfway to the limit of that metric…you can only strike out twenty-seven hitters every nine innings pitched…unless your catcher misses a few third strikes.  
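Here is that comparison as a quick Python sketch, using the two Sale figures quoted above. The constant names are mine, and the 27-strikeout limit ignores the dropped-third-strike wrinkle.

```python
# Absolute limits of each metric.
K_PCT_CEILING = 100.0  # every batter faced strikes out
K9_CEILING = 27.0      # every out in nine innings is a strikeout

# Chris Sale's 2017 figures as quoted in the article.
sale_k_pct = 37.0
sale_k9 = 12.9

# How far along each scale Sale already is.
share_of_k_pct_ceiling = sale_k_pct / K_PCT_CEILING
share_of_k9_ceiling = sale_k9 / K9_CEILING

print("Share of K% ceiling:", round(share_of_k_pct_ceiling, 2))
print("Share of K/9 ceiling:", round(share_of_k9_ceiling, 2))
```

The same season sits at very different distances from the limit depending on which denominator you pick, batters faced or outs recorded.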
 
Which one is the more accurate ceiling? Is it more accurate to judge a pitcher on how he does against every batter he faced (that’d be K%), or should we understand them based on the outs they generate (the K/9 metric)?
 
I don’t know. Honestly, I have no idea which is better.
 
I think that a metric based on K% would give less weight to the ‘ceiling’ problem than a metric based on K/9, but that’s just a guess. I think the best strategy would be to try both, and see which one works better….see which one comes up with a list that seems more accurate, one that shows less bias towards specific generations.
 
So that’s the challenge for you guys: how do we adjust for the ceiling of strikeouts? How do we adjust for the fact that Vance could triple his league’s strikeout percentage, while Pedro could double the percentage in his league, while Chris Sale, pitching in the highest strikeout era in the game’s history, can only hit about 1.7 times the league average? How do we factor that in?
 
I don't know. But until we do, I don't think that we can say with any certainty that 1999 Pedro Martinez was a more dominant strikeout pitcher than Chris Sale. 
 
So c’mon math majors. Help us figure this out.
 
 
Dave Fleming is a writer living in western Virginia. He welcomes comments, questions, and math lessons here and at dfleming1986@yahoo.com.  
 
 

COMMENTS (36 Comments, most recent shown first)

lidsky
Played with the Odds ratio - quite cool. From what I saw in the few years that I looked at, it gives similar results to the n-1 Standard Deviation method I was using. (Recall (K/BFP - avgK/BFP)/SD.)

My first question was whether one is better than the other in comparing across vastly different eras. I used Pedro 1999 vs. Vance 1924. In both methods, Vance reigned supreme. I figured for them to be reasonable, Pedro had to have a shot of being "equal" to Vance with respect to the league.

Off a baseline of 313K of 835 BFP.
Pedro would have needed 26 more Ks to equal Vance using the SD method.
Pedro would have needed 30 more Ks to equal Vance using the Odds Ratio.

That to me is an indication that, for that case, either seems reasonable. All else being equal, the Odds Ratio is superior for its simplicity. I'll have to look more to see if there are any meaningful differences.

In general, looking across the whole population of pitchers with enough innings, both results are monotonic, but the SD method has a steeper slope away from the averages. Not sure what to make of that yet.
6:44 PM Aug 21st
 
DaveFleming
Thanks, TTango! I'll start fooling around with the odds ratio method soon. Fun stuff, this math.
8:53 AM Aug 21st
 
lidsky
Yeah - I saw that in my original analysis which is why I clipped it at 100 innings and asked Charles what he used for his population size originally.

Anyway, clipping at 100 got rid of the tail and the peak near zero as expected. It didn't change the irregular shape of the rise and fall which is why I stuck with the n-1 method. I'll take the time to look at more years as I only looked at 5 different years so far so my sample size isn't large enough to speak with real intelligence. (I'll also check if 120 fixes the shape).

I'll compare methods, the shape may be regular enough that it's close enough to binomial etc, but I want to see if I can find a better way to pump through the data than importing year by year into Excel and manipulating the numbers.

I haven't spent much time with Access - but given it has visual basic support I'm guessing I should be able to write scripts to figure this all out year by year automatically. I'll have to teach myself Visual Basic and Access so it will probably be a couple/three weeks before I get to it and through it.
11:07 PM Aug 20th
 
tangotiger
Relievers are a problem. Set your IP limit to at least 120 IP.
7:39 PM Aug 20th
 
lidsky
That last comment was on Odds Ratio.

Charles - can you explain why you think the distribution is binomial? It doesn't look that way to me. (Note - my stats background is in electrical engineering where I know from physics exactly what the distributions are, so this is my first dive into it applied to baseball - I apologize if these are basic questions)

I have to run for a while (my kid has a dog walking business to attend to, which means I do too!)
12:35 PM Aug 20th
 
lidsky
Thanks. For me it all hinges on the last sentence. Can you point me to the discussion on the assumption that the talent of the average pitcher is equal. Here is what I did:

I was using the Lahman database moved into excel.
I plotted for all pitchers both K/9 and K/BFP
For each I noted the distribution looked irregular. Here's what it looks like
-> A spike at around zero
-> A long very low tail going out on the high end
-> The shape of main distribution going up to the peak is different than the shape going down.

So given that irregular distribution (clearly not bimodal nor Gaussian), I clipped the data for any pitchers with fewer than 100 innings pitched. This solved the tail problem but the distribution still didn't look regular. This indicates to me that the average pitcher performance summary may not be right - I need to look up the discussion on that as perhaps this assumption by me is wrong.

So given that, I used Excel to get the standard deviation using the n-1 method and then for each pitcher in my new distribution I calculated (K/9 - avg(K/9))/standard deviation. I did the same for K/BFP.

The results for some of the years of K/9 are below.

I think whether you use the Odds Ratio or this or any other method, the tail should be clipped by using minimal innings pitched. My next step will be to compare "my method" vs. the odds ratio method.
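A minimal Python sketch of the standard-score calculation described in this comment follows. The K/9 values are invented purely for illustration and are not real pitcher data; `statistics.stdev` uses the n-1 (sample) denominator, which is the method named above.

```python
import statistics

# Hypothetical K/9 values for qualifying pitchers in one season (made up).
k9_values = [5.0, 6.0, 7.0, 8.0, 14.0]

mean_k9 = statistics.mean(k9_values)
sd_k9 = statistics.stdev(k9_values)  # sample standard deviation (n-1)

# Standard score for each pitcher: (K/9 - avg(K/9)) / standard deviation.
z_scores = [(value - mean_k9) / sd_k9 for value in k9_values]

for value, z in zip(k9_values, z_scores):
    print(value, round(z, 2))
```

The same function applies unchanged to K/BFP; only the input list differs.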
12:31 PM Aug 20th
 
CharlesSaeger
Lidsky: I did K/BFP, but K/(IP*3) would also be binomial. Tom's odds ratio, which I didn't think would work myself, does get at the same thing: you need to account for the fact that there's a declining number of not-strikeouts as strikeouts increase.
12:27 PM Aug 20th
 
tangotiger
You want the odds ratio method. Suppose the league average is 0.20 K per PA, and your pitcher has 0.333 K per PA. This is what you do:

lg = 0.2 / (1-.2) = 0.25
pitcher = 0.333 / (1-.333) = 0.50

Pitcher odds / lg odds = 0.50 / 0.25 = 2.00

This of course assumes that the talent of the average pitcher in the leagues in question is equal. That's a separate discussion. But the above math is what you want.
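The same arithmetic as a short Python sketch, for anyone who wants to run it on other seasons. The function names are only illustrative; the first example is the one worked above, and the other three use the rounded rates quoted in the article.

```python
def odds(rate):
    """Convert a strikeout rate (0 to 1) into odds: K per non-K."""
    return rate / (1 - rate)

def odds_ratio(pitcher_rate, league_rate):
    """Pitcher's strikeout odds divided by the league's strikeout odds."""
    return odds(pitcher_rate) / odds(league_rate)

# (pitcher K rate, league K rate)
examples = {
    "worked example": (0.333, 0.20),
    "Vance 1924": (0.215, 0.072),
    "Martinez 1999": (0.375, 0.164),
    "Sale 2017": (0.361, 0.216),
}

for label, (pitcher_rate, league_rate) in examples.items():
    print(label, round(odds_ratio(pitcher_rate, league_rate), 2))
```

Because the odds blow up as the rate approaches 100%, the ratio has no upper limit, which is what lets it handle the ceiling problem.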
8:17 AM Aug 20th
 
lidsky
Correction. I didn't look at the whole population - I restricted the population to those with 100+ innings, as with fewer innings a stat like K/9 isn't an accurate representation.
10:13 PM Aug 19th
 
lidsky
Hi Charles: I must be missing something. Can you explain how K/9 would have a binomial distribution? It doesn't seem like it to me, but perhaps I'm missing something. So then if I assume it's a binomial anyway, the standard deviation you got is much smaller than I would expect - and much smaller than when I look at the whole population and use the n-1 method. My stats may be rusty so I warrant I could be wrong.

David - good point on the smaller leagues, but I think the real issue is that when somebody focuses on a skill others don't, they will stand way out. I thought my example below showed that using the standard deviation gave a reasonable and believable comparison. Whether it's an accurate comparison is impossible to say of course. 1922 nobody stood out. 1924 Vance stood out, but not so much that it would have been impossible for Pedro to get there.

I think until I can pump through numbers I remain unconvinced, but it seems reasonable so far.
10:11 PM Aug 19th
 
evanecurb
Dave:

I think the limit on ERA is 0.00. Gibson and Walter Johnson each came within 1.1 run per nine innings of that limit. Even in pitcher's eras, when the league ERA might be as low as 3.00, they were around 35% of the league ERA. So they got to 35% vs. a limit of 0%.
8:19 PM Aug 19th
 
CharlesSaeger
I merely used the 25 pitchers listed, and Dazzy Vance, using the exact formula I mentioned. There's no need for a minimum here, since this is a binomial.
3:23 PM Aug 19th
 
DaveFleming
I'm a little wary of using standard deviation, just because I suspect that it would favor players in smaller leagues and earlier eras, where it was a little easier to stand out. Like Babe Ruth in 1920 out-homering everyone....I think it was a bit easier to be a distance away from the league back then. I suppose it would be possible to check that, but then we'd be looking at standard deviations of standard deviations, which would be a little like the snake swallowing its tail.

To evanecurb's point about OPS or ERA....the big differences between those stats and strikeout rate/percentage are that a) those other metrics have fluctuated over the years, while the strikeout rate has only really risen, and b) those other metrics aren't as close to their possible ceiling as strikeouts.

That's the thing that seems crucial to me...the ceiling issue. The highest possible OPS is 5.000....Barry came within 30% of that metric's limit, but it's pretty rare to see anyone tick past 25%. No one has come close to the limit on ERA, because the limit for that is infinite.

But with strikeouts....three guys are putting in top-ten seasons of K/9 this YEAR (Sale, Scherzer and Kluber). In about five years, if the trend continues, the league leaders will tip across 50% of the absolute limit of K/9. And unlike those other metrics (which ebb and flow), the strikeout rate continues to increase steadily.

* * *

On a completely different note, if anyone wants to discuss something that isn't about high levels of math...

For a while, I think the consensus among us nerdy types is that strikeouts are one of the better ways of measuring a pitcher's ability. But if the trends of this season hold (escalating strikeouts, escalating dingers), will we hit a point where strikeouts become less indicative of a pitcher's ability than a pitcher's ability to prevent homers?

This is a Rick Reuschel thing...he wasn't a great strikeout pitcher, but he was exceptional at keeping the ball in the yard. That skill had SOME value in the 1970's, especially in Wrigley, but it might have more value going forward.

Anyway....just thinking out loud...
2:56 PM Aug 19th
 
lidsky
Charles - did you require a minimum number of innings pitched when you did that? 19.3 is a very large number of deviations and indicates a very irregular sample shape. I required 100 innings and got the data I presented below.
2:09 PM Aug 19th
 
CharlesSaeger
I tried it this way:

Pitcher's K/BFP - League K/BFP

divided by the League Standard Deviation of K/BFP, which is:

Square root of (LgK/BFP*(1-LgK/BFP)/Pitcher's BFP)

Tom Tango did this with a whole bunch of goalies here:

www.insidethebook.com/ee/index.php/site/comments/how_to_figure_out_how_much_talent_there_is_example_with_nhl_goalies/

Anyways, when I do this, I get Petey 1999 on top, as before, but now Ryan 1973 second. The three 2017 pitchers are still at the bottom, likely due to their lower BFP: the season isn't done yet, and there will be a higher standard deviation in fewer trials. (The AL's K/BFP is slightly higher than the NL's as of this post. As K/BFP approaches 100%, it really doesn't matter if the batter is a pitcher, a DH, or a potted plant.) Pedro's 1999 is 17.3 standard deviations higher than the league average while Ryan's 1973 is 16.3 higher, giving credit to facing 62% more batters.

As for Dazzy Vance? His 1924 season beats Martínez's 1999, at 19.3 standard deviations above the mean. The rest don't.
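A minimal Python sketch of the formula in this comment, for readers who want to try it. The inputs are assumptions on my part: 313 strikeouts in 835 batters faced for Pedro in 1999 (the baseline quoted elsewhere in this thread) and the rounded 16.4% league rate from the article's table. With those rounded inputs the result comes out a little below the 17.3 quoted above, presumably because the commenter used a more precise league figure.

```python
import math

def binomial_z(strikeouts, batters_faced, league_rate):
    """(Pitcher K/BFP - league K/BFP) divided by the binomial standard deviation
    of a league-average pitcher over the same number of batters faced."""
    pitcher_rate = strikeouts / batters_faced
    sd = math.sqrt(league_rate * (1 - league_rate) / batters_faced)
    return (pitcher_rate - league_rate) / sd

# Assumed inputs: 313 K in 835 BFP, league K/BFP of 0.164 (rounded).
pedro_1999 = binomial_z(313, 835, 0.164)
print(round(pedro_1999, 1))
```

Note that the standard deviation shrinks as batters faced grows, so this method rewards workload as well as rate.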
1:33 PM Aug 19th
 
Mike137
Gfletch,

I do not claim that adjustments are not possible. I only claim that the method of adjustment needs to be justified with evidence and that so far that has not been done for the issue in question. Otherwise the adjustment is arbitrary and ultimately meaningless.
1:10 PM Aug 19th
 
Mike137
yorobert wrote:
Well, Mike137, you said that unless an unanswerable question ("what if there are now more good strikeout pitchers?) can't be answered

I never said any such thing or anything that can be reasonably interpreted that way. I even suggested an approach to finding the answer to the question.

yorobert wrote:
then "the question (about strikeout pitchers) cannot be answered."

That is called "logic". It is a useful skill.

yorobert wrote:
So I guess in your comparison, you're the cop who tells everybody to go home, because the keys are never going to be found anywhere.

No, I am the guy saying that you need a flashlight, because otherwise you are just wasting your time.

The claim was made in the 538 article that "If Pedro Martinez was pitching today, he’d have even more strikeouts than the Red Sox current ace." Nothing that has been said here either supports or refutes that assertion.
1:06 PM Aug 19th
 
Gfletch
I think I understand mike137. Comparisons across eras are difficult at the best of times and approach the level of impossibility the wider the gulf in time. Indeed, that's the essential problem with all statistical comparisons in baseball: adjusting for environment.

It's like a great hunter tracking grizzly bears in northern Canada, trying to compare him to a great hunter tracking giraffes in Kenya, or a great hunter tracking great white sharks off the coast of Australia. The environmental differences soon become impossible to adjust for.
12:58 PM Aug 19th
 
yorobert
Well, Mike137, you said that unless an unanswerable question ("what if there are now more good strikeout pitchers?) can't be answered, then "the question (about strikeout pitchers) cannot be answered." So I guess in your comparison, you're the cop who tells everybody to go home, because the keys are never going to be found anywhere.
12:21 PM Aug 19th
 
Mike137
MarisFan wrote:
Probably by the late '80's (give or take), it was well-known that sabermetrics was saying that strikeouts by batters were far less costly, compared to other outs, than had been believed and followed. Whether or not the change within baseball is attributable to this, it is clear that in these last 3 decades and probably with a continuing trend throughout the period, teams have cared less and less about batter strikeouts; maybe by now it has plateaued, maybe there have been blips in time where it reversed a little.

The general trend to more strikeouts has been going on forever. Strikeouts peaked in 1967, then declined quite a bit in the 70's, then went up again to new heights by the mid 90's, then leveled out, then began a rapid rise starting in about 2007.

I took a quick look at the distribution of strikeout rates for all hitters who qualified for the batting title in either 2007 or 2016. It looks like a combination of a general shift to more strikeouts (like one might expect if it is pitching that is responsible) and of there being relatively more high strikeout hitters (as implied by MarisFan's theory). I have little doubt that both contribute, but at this point have no idea as to the relative importance.

Plus there may be other factors. Just when did the league start enforcing a consistent strike zone, rather than leaving the umpires a lot of discretion?
11:42 AM Aug 19th
 
jpc1957
Back in 2004, I had an article on this very topic appear in SABR's Baseball Research Journal. Its title was "Baseball's Most Dominant Strikeout Pitchers." If you're interested, you can check it out (and all of the other articles in that issue, too) here: https://100incheon.files.wordpress.com/2012/02/brj33-final1-69-73-400-club.pdf

I also thought that division was better than subtraction. (By the way, Baseball Digest disagreed - I originally submitted the article to BD, but they wanted me to redo it using subtraction.)

The conclusion of the article was that Vance, Waddell, Grove, Martinez, Score, Ryan, R.Johnson, Feller, and Dean all had seasons among the top 25.

The most notable absences (from a list of the top 60 seasons) were Koufax, Clemens, Seaver, Carlton, and McDowell.

I also included a similar list for relief pitchers. Not surprisingly, Gagne, Wagner, Dibble, and Lidge were at the top. I haven't updated it since 2004, but I suppose Kimbrel, Marmol, and Chapman would rank pretty high.

Anyway, I agree that a ratio is not fair to modern-day pitchers. The proper way to do it might be similar to what Bill Deane proposed when he discussed normalized winning percentage for pitchers' won-lost records (which you can find here, if you scroll down far enough:
https://www.fantasyguruelite.com/mlb-sabermetric-primer).

In any case, an interesting topic!

11:05 AM Aug 19th
 
evanecurb
Congratulations, Dave. You have managed to combine the two most hated forms of test into one question: a math test and an essay question. Don't try this in the classes you teach until after you've achieved full tenure. Your students will hate you.

Seriously, you've hit upon an issue that I've struggled with in baseball statistics across eras. I think you run into the same problem with something like relative ERA, or OPS+, or other stats that are used to compare performance across eras. I've noticed that the best relative stats tend to occur in eras that are either dominated by offense or by defense. In more balanced eras, you have fewer performances that stand out. Or so it seems.
10:44 AM Aug 19th
 
MarisFan61
Of course it's easier to get strikeouts now, so of course you have to do some accounting for when it was done.
(BTW the "of course" is pretending :-) .....there are no of courses on these things, but this is pretty close to one. Because.....)

It's an of-course if for no other reason than that there has been (apparently) a decision within baseball that makes it so. And, I'd add, I think it's a decision that is largely attributable to the influence of sabermetrics.

Probably by the late '80's (give or take), it was well-known that sabermetrics was saying that strikeouts by batters were far less costly, compared to other outs, than had been believed and followed. Whether or not the change within baseball is attributable to this, it is clear that in these last 3 decades and probably with a continuing trend throughout the period, teams have cared less and less about batter strikeouts; maybe by now it has plateaued, maybe there have been blips in time where it reversed a little.

This thing (assuming it's true) has operated on at least 2 separate levels:
-- Batters' approaches, and
-- Teams' selection and development of players.

Those are separate, and they add together.
The first of those, of course. If players know or are told that it doesn't matter that much if you strike out, just take your best swings, they'll be more likely to strike out.

And if teams believe in this, they won't mind a young hitter's strikeout tendency that much when they select him, they won't be so inclined to pick players because of their not striking out, and in the development of players they won't be so inclined to emphasize contact over power.

One reason that this "of course" isn't an of-course is that other things also change over time, and I don't know that there haven't been other factors, unrelated to this, that work in the other direction. Like (just making this up; I don't mean I think it's true), if batters' eyesight or hand-eye coordination has gotten better, that would counter the above. But it would seem (IMO) that the above is such a distinct and strong thing that it would be hard for other things to cancel it out.
10:02 AM Aug 19th
 
Mike137
Mathematicians often make conjectures, which are things that are thought to be true, but have not been proven. One use of those is to break up a problem into smaller problems. So here are three conjectures for consideration.

(1) Pitcher success strongly correlates with the ability of a pitcher to strike out batters.

(2) Batter success correlates only weakly with the ability of a batter to avoid strikeouts.

(3) There is a long term upward trend in the quality of major league players.

Taken together, the above implies that there should be a long term upward trend in strikeouts due to pitchers being better. But there may be additional reasons for the trend.

If the trend is entirely due to better pitching (questionable), then strikeouts by pitchers should be directly comparable, without adjusting for era. But to compare strikeouts by batters, we would have to make adjustments for era.
8:16 AM Aug 19th
 
Mike137
steve161 wrote:
if I understand you correctly, your point is the same one Stephen Jay Gould made about .400 hitters: it has to do with a change in the level of competition. Is that a fair statement?

I don't know, since I don't know what point Gould made.

My point is that if you want to normalize for something that has changed, you have to know what has changed. What has changed to cause strikeouts to be more common than they were 20 years ago? I can't answer that question. Without being able to answer that question, we can't make a meaningful comparison of Sale to Pedro.
8:03 AM Aug 19th
 
steve161
Mike137, if I understand you correctly, your point is the same one Stephen Jay Gould made about .400 hitters: it has to do with a change in the level of competition. Is that a fair statement? If not, please elaborate for the benefit of readers (if any) who are as dense as I am.
7:15 AM Aug 19th
 
lidsky
So for a few seasons I calculated the leaders in the following way: (K/9-LGAVGK/9)/STDDEV(LGK/9)

Don't know how well that will read, so in words: I found the distribution for each year of interest in terms of standard deviation. I then took the player's K/9, subtracted the league average K/9, and divided the result by the standard deviation for that year.

Thus I get how many stddev above average somebody is. It is cumbersome so I haven't done many yet.
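
The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the league values are hypothetical, and it assumes the league average is the plain mean of the individual pitchers' K/9 figures and that a population standard deviation is used, neither of which the comment specifies.

```python
from statistics import mean, pstdev


def k9_z_score(pitcher_k9, league_k9_values):
    """Return how many standard deviations a pitcher's K/9 sits above the league mean.

    Assumption: league_k9_values holds one K/9 figure per qualifying pitcher
    for a single season, and the spread is a population standard deviation.
    """
    league_avg = mean(league_k9_values)
    league_sd = pstdev(league_k9_values)
    return (pitcher_k9 - league_avg) / league_sd


# Hypothetical league, NOT real data: five pitchers' K/9 for one season.
league = [4.0, 5.0, 6.0, 7.0, 8.0]
z = k9_z_score(9.0, league)
print(round(z, 2))
```

Because the result is expressed in units of that season's own spread, a score from a low-strikeout year and one from a high-strikeout year can be put side by side.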

Pedro 1999 was 4.8 std deviations above the league average of 6.19
Gooden 1984 was 4.12 std deviations above the league average of 5.26

I'll do the rest of the list and add to Readers Posts, but it will take me a bit. I looked at a few and the stat gives similar leaders as in the article.

Then I decided to check the issue when the strike out rate is low, so I looked at Vance's 1924 season:

5.32 Std dev over the average of 2.6

So at first I thought - okay, not a good method for those early years. But I noticed nobody else was near Vance, whereas in terms of Std Dev, there were a couple of players closer to Pedro and Gooden. So I thought, maybe Vance really was comparably that good. To check the small number issue, I checked 1922. Interesting result in that the top pitcher, Pruett, didn't blow away the league even though the numbers were low:

2.83 Std dev over an average of 2.79

I'll look at a lot more when I have time, but the metric seems to pass the sniff test of being able to work for both eras, in that if Pedro had struck out 21 more batters he would equal Vance's 5.32 std dev above average.

Perhaps, in terms of K/9, Vance was better than Pedro. (Banish the thought says this Sox fan!)


1:26 AM Aug 19th
 
Dhandforth
Thanks for another great article: Am surprised Roger Clemens and Bob Gibson are not on the list. But let's leave that for another time.

I'm stuck on why we need to consider the league's average at all. Why not just look at a pitcher's strikeout rate and adjust it using another number? lidsky's idea to look at distributions from the mean could work. But I'd like to take up Mike137's questions and ask if we are inadvertently penalizing great strikeout pitchers who happen to be clustered (i.e., have a great year) together.

I understand the argument that we use league average to equalize batter behavior. In other words, if the high strike out rate is because batters are trying for more home runs, then it might be "easier" to strike them out. But that also means that batters are hitting more home runs (true for this season).

So for this particular argument, we might define a great pitcher as one who gets the most outs while giving up the fewest runs. In which case, we could define the adjusted strikeout ratio as strikeouts minus home runs given up. I haven't had time to run the tables, but it would be interesting to look at strikeouts minus bases given up, too. Plus, we may have to adjust for park effects.

Still, I'll bet we will find fewer Randy Johnson appearances in the top 25.

1:02 AM Aug 19th
 
SteveN
It even sounds dumb to me when I think it, but would calculating the number of batters not struck out be a better indicator? It's late and I am not thinking too clearly right now, so maybe this is extra dumb.
10:54 PM Aug 18th
 
Mike137
I wonder if one could make progress by looking at the career progressions of hitters and pitchers in an era in which strikeouts are rising, like the last dozen years, compared to an era in which strikeouts were steady, like the previous decade, and one when they were declining, like the '70s. If it is getting easier to strike batters out, then pitchers' strikeout rates should be improving with age compared to other eras. If pitchers are getting better at striking batters out, then hitters' strikeout rates should be increasing with age in comparison with other eras.
10:16 PM Aug 18th
 
Mike137
A police officer came upon a drunk anxiously searching the sidewalk under a streetlight. The officer asked the man what he was looking for.
"My house keys", the drunk replied.
The cop asked "are you sure you lost them here?"
"No I lost them in that alley."
"Then why are you looking for them here?"
"The light is better".

yorobert wrote: "all we can do is compare players to their contemporaries to see which players were the best of their peers, and by how much. Thus, I think this article is on the right track."

Only if the drunk is on the right track.

This article, and the 538 article it referred to, starts by making an assumption that is not only unproven, it is unexamined. That is no way to make progress in understanding.



10:10 PM Aug 18th
 
lidsky
I believe by using just the average we are restricting ourselves. Shouldn't we be looking at the distribution for a given year and then looking at where a pitcher lies, normalized to that year's particular distribution? We could then compare pitchers across years by seeing how many standard deviations above average each was for his year.
9:45 PM Aug 18th
 
yorobert
Mike137,
Your objection is true, but it is true for every comparison between eras, so all we can do is compare players to their contemporaries to see which players were the best of their peers, and by how much. Thus, I think this article is on the right track.

I'm guessing that cross-generational comparisons begin to falter when the numbers/statistics are so drastically different between eras, however, resulting in p-values that prevent any sort of accurate conclusions. I wish I knew enough about math to know how to do this myself.
8:03 PM Aug 18th
 
Mike137
I think there is a more fundamental problem here. Why are the league strikeout rates higher? The assumption made here is that it is because hitters are now easier to strike out. But what if the higher strikeout rates are because there are now more good strikeout pitchers? Then it would be inappropriate to adjust for the league average rate. The truth probably lies somewhere in between. But without knowing where it lies, the question can not be answered.
6:38 PM Aug 18th
 
MarisFan61
Good and interesting and thorough work.

But, I'd argue with one of your points:
It's far from clear that it's better to "divide" than "subtract." In fact, I think it's better how 538 did it -- i.e. subtract. I agree that your eventual lists are better than theirs, but I don't think that the "divide/subtract" thing is a reason.

And funnily, the main reason I think 'subtract' is better is this thing that you said, in a different context:

"The possible strikeout percentage has an upward limit. As the overall percentage of strikeouts rises, the game gets closer and closer to that upward limit."

This makes it hard to compare pitchers from different times in any straightforward way, but I think it causes more of a problem if you use division than if you use subtraction, because when the league rate is very low, you can get very high results, even without exceeding the league rate by that much -- far higher than it's possible to get in a higher-strikeout time.
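
A quick sketch of that point, with the arithmetic spelled out. Pedro's 1999 figures (37.5% against a league rate of 16.4%) come from the 538 table quoted in the article; the low-strikeout pitcher and league are hypothetical numbers chosen only to illustrate the effect, and the function name is made up for this example.

```python
def relative_measures(pitcher_pct, league_pct):
    """Return (difference, ratio) of a pitcher's K% relative to the league K%."""
    return pitcher_pct - league_pct, pitcher_pct / league_pct


# Pedro Martinez, 1999, from the table quoted in the article.
pedro_diff, pedro_ratio = relative_measures(37.5, 16.4)

# Hypothetical pitcher in a hypothetical low-strikeout league (NOT real data).
low_diff, low_ratio = relative_measures(21.0, 7.0)

# K% a pitcher would need in a 16.4% league to match the hypothetical ratio.
needed_pct = low_ratio * 16.4

print(f"Pedro:        difference {pedro_diff:.1f}, ratio {pedro_ratio:.2f}")
print(f"Hypothetical: difference {low_diff:.1f}, ratio {low_ratio:.2f}")
print(f"K% needed in a 16.4% league to match that ratio: {needed_pct:.1f}")
```

By subtraction the hypothetical pitcher trails Pedro by a wide margin; by division he leads him, and matching his ratio in Pedro's league would mean striking out nearly half of all batters faced.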


4:45 PM Aug 18th
 
matt_okeefe
I didn't try this on any individual season, but my initial thought was, why not multiply the adjusted K% by K/inning? Would that seem to balance it enough?
4:20 PM Aug 18th
 
 