Clayton Kershaw vs. Walter Johnson

September 14, 2020

 

            This is a follow-up to the study that I published yesterday, in which I put all major league pitchers starting a game since 1921 into a common pool, and ranked them according to their won-lost success in head to head matchups.   At the end of the article, I wondered aloud what would happen "If you limited the study to games in which both pitchers made 200 major league starts, that would get you about 100,000 games, I think.   Come to think of it, that’s the answer; that’s what I should have done."

            I actually did not think of that alternative until I was writing that last paragraph, and then I felt so strongly that that was what I should have done that I decided to do it.   It is my sad duty to report that it doesn’t work, either; in fact, it fails worse than the original version of the study did—"Fails" in the sense of delivering a list of the greatest starting pitchers ever that no one would take seriously.  

            First, I pared back the data so that it included only games in which the starting pitchers on both teams made at least 200 major league starts.   This gave me a list of 567 pitchers and 97,622 games to study, so my estimate there (100,000) was about right.  The mechanics of the study were also much, much, much easier this time, as I thought they would be, both because (a) fewer than a third as many games were involved, and (b) I had learned from the mistakes I made the first time I ran the study.   This allowed me to run through 20 "cycles" of the study in maybe 90 minutes or two hours of work.   All good.
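The paring-back step can be sketched in a few lines. This is only an illustration under stated assumptions: the game records are hypothetical `(starter_a, starter_b, winner)` tuples, the function name `filter_games` is invented for the example, and career starts are assumed to be countable from the game list itself, which may not match how the actual data was assembled.

```python
from collections import Counter

def filter_games(games, min_starts=200):
    """Keep only games in which BOTH starters reached min_starts career starts.

    `games` is a list of (starter_a, starter_b, winner) tuples; this layout
    is an assumption made for the sketch, not the study's real data format.
    """
    # Count career starts over the full, unfiltered pool of games.
    starts = Counter()
    for starter_a, starter_b, _winner in games:
        starts[starter_a] += 1
        starts[starter_b] += 1

    # Pitchers who qualify on their own career total.
    qualified = {p for p, n in starts.items() if n >= min_starts}

    # A game survives only if both of its starters qualify.
    kept = [g for g in games if g[0] in qualified and g[1] in qualified]
    return qualified, kept

# Toy data with placeholder names and a threshold of 3 instead of 200.
games = [("A", "B", "A"), ("A", "B", "B"), ("A", "C", "A"), ("B", "C", "C")]
qualified, kept = filter_games(games, min_starts=3)
print(sorted(qualified), len(kept))
```

Note that a pitcher can qualify on career starts and still contribute only a small number of games to the filtered pool, because the filter applies to his opponents as well.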

            Except that it doesn’t work. 

            After the first cycle of comparisons based on head to head performance, with everyone starting out even, the top 10 pitchers in my study were 1. Lefty Grove, 2. Clayton Kershaw, 3. Lefty Gomez, 4. Chris Sale, 5. Sandy Koufax, 6. Orlando Hernandez, 7. Urban Shocker, 8. Chris Tillman, 9. Walter Johnson, and 10. Randy Johnson. 

            Most of those are really good pitchers, but. . .Chris Tillman?   Why is Chris Tillman listed ahead of Randy and Walter Johnson?  

            Chris Tillman, for those of you who have not been following the AL East carefully in recent years, was the Baltimore Orioles’ ace from 2012 to 2016, posting won-lost records of 9-3, 16-7, 13-6, 11-11 and 16-6 in those years.   Still, his major league won-lost record was only 74-60, hardly the kind of record you would associate with Sandy Koufax and Randy Johnson.   How did he wind up on the list?

            Tillman made 205 major league starts, which qualifies him for the study, but when we limit ourselves to games against other pitchers who ALSO made 200 or more starts, then we’re dealing with only 83 starts.  It happens, whether by coincidence or not, that in those 83 starts, Tillman’s personal won-lost record was 35-19, a .648 winning percentage, and the won-lost record of his teams was 54-29, a .651 winning percentage.   He was under .500 when matched against starting pitchers who have made fewer than 200 major league starts (so far), but around .650 when matched against pitchers who DO have 200 or more starts. 
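The two percentages quoted above follow directly from the records given. A quick check, where `win_pct` is just a helper name chosen for this example:

```python
def win_pct(wins, losses):
    """Winning percentage as wins divided by total decisions."""
    return wins / (wins + losses)

# Tillman's personal record in the 83 qualifying starts.
personal = win_pct(35, 19)
# His teams' record in those same starts (54 + 29 = 83 games).
team = win_pct(54, 29)

print(f"personal: {personal:.3f}, team: {team:.3f}")
```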

            Now, I think, personally, that that is an interesting fact to know, if we are discussing Chris Tillman.   But it is really NOT a persuasive argument to rank Chris Tillman as a greater pitcher than Walter Johnson.  

            So it turns out that, doing the study that way, I actually WOULD need to include "balance starts" to express skepticism about pitchers who do well in small sample sizes—that, or this approach just is not going to work, anyway.  
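One common way to express that kind of skepticism is to pad every pitcher's record with a fixed number of imaginary games at a neutral winning percentage. The sketch below assumes 100 balance starts at .500; both numbers, and the function name `regressed_pct`, are illustrative choices rather than values taken from the study.

```python
def regressed_pct(wins, losses, balance_starts=100, balance_pct=0.500):
    """Winning percentage after adding neutral 'balance starts'.

    balance_starts and balance_pct are assumed values for illustration.
    """
    padded_wins = wins + balance_starts * balance_pct
    padded_games = wins + losses + balance_starts
    return padded_wins / padded_games

# Tillman's team record in his 83 qualifying starts.
small_sample = regressed_pct(54, 29)

# A hypothetical pitcher with roughly the same raw percentage over 500 games.
large_sample = regressed_pct(325, 175)

print(f"{small_sample:.3f} vs {large_sample:.3f}")
```

With the same raw winning percentage, the 83-game record is pulled much closer to .500 than the 500-game record, which is the intended effect: a small sample has to be more extreme to earn the same rating.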

            The other question that I really wanted to get to was whether this approach would pick up the "bias of history", that is, whether it would rank 21st-century pitchers, in general, higher than older pitchers.   Boy, does it.  It ranks them SO MUCH higher than the older pitchers that, frankly, it makes the study pretty useless.   After 20 cycles through the data (not actually a large number), the top 20 pitchers on the list are all 21st-century pitchers except Lefty Grove, who has dropped to 19th.  Sandy Koufax is still hanging around, in the #21 spot, but I am quite confident that, if I ran the data through another 20 cycles, both Grove and Koufax (and all other pitchers from before 1990) would drop entirely out of the top 100, and would still be dropping as we ran more cycles.  

            That is problematic, but not really unexpected.   While it MAY be true that baseball is better than ever and that players now deserve an advantage when compared to older players, that isn’t actually why this happens in this data.   The reason it happens is that the great majority of pitchers are better in the first halves of their careers than in the second halves; there are some exceptions.  But Robin Roberts was 179-120 over the first nine seasons of his career, and 107-125 over the second ten years.    CC Sabathia was 191-102 over the first 12 years of his career, 60-59 over the last seven years.  That’s more the rule than the exception.   Most pitchers have their best years when they are young and healthy.  

            That means that, when a young Dwight Gooden pitches against an aging Tom Seaver or Steve Carlton, Dwight Gooden is usually going to win.  (I checked, after I wrote that.  Gooden was 2-0 against Carlton, and never started against Seaver.)    But Gooden was 1-3 against Orel Hershiser, and 1-5 against Doug Drabek.  They weren’t younger than he was, but they were at an earlier point in their careers.   You know what I mean; it’s a young man’s game. 

            That bias corrodes the network, and, over time, over repeated cycles, would probably make almost all 1980s pitchers look better than almost all 1960s pitchers.   So when the system shows us that the greatest pitchers ever are all 21st-century pitchers, no one really can be expected to believe that, and then there is the problem that it isn’t even always the RIGHT 21st-century stars.  These are the top 20 pitchers, after 20 cycles through the data:

First     Last         Score
Clayton   Kershaw      103.8
Chris     Sale         103.8
Chris     Tillman      103.6
David     Price        103.4
Orlando   Hernandez    103.1
Tim       Hudson       102.8
Jon       Lester       102.8
Cliff     Lee          102.7
Mark      Mulder       102.7
Stephen   Strasburg    102.7
Randy     Johnson      102.7
Johan     Santana      102.7
Adam      Wainwright   102.7
Pedro     Martinez     102.6
Clay      Buchholz     102.5
Andy      Pettitte     102.4
Jered     Weaver       102.4
Justin    Verlander    102.4
Lefty     Grove        102.4
Jake      Arrieta      102.4

 

            Even if I were willing to accept the premise that all of the greatest pitchers ever are 21st-century pitchers, which I am not, I would rather have Zack Greinke, Max Scherzer, Roy Halladay, Roger Clemens, CC Sabathia and Greg Maddux on the list than Chris Tillman, Orlando Hernandez, Mark Mulder, Clay Buchholz, Jered Weaver and Jake Arrieta. 

            It’s got three problems: 

(1)   Inadequate samples,

(2)  Timeline bias, and

(3)  The inherent problems of relying solely on won-lost records. 

 

So I give up.  It doesn’t work.  

 
 

COMMENTS (7 Comments, most recent shown first)

Brock Hanke
trn6229 - What you are dealing with here is not exactly what Bill was using in the article. Bill's study, which is very good, focuses on pitcher records against other good pitchers. You made a comment about how often pitchers got used against strong pitchers instead of lesser ones. That's not exactly the same thing.

The practice you've seen goes WAY back in baseball history, although it probably doesn't show up until the 1900s, because 19th-century starting pitchers pitched so many games that it was impossible to leverage them. But I know of one clear example, the 1903 Pirates. Manager Fred Clarke had two very similar pitchers as his top two: Deacon Phillippe and Sam Leever. If you look at the numbers without any leveraging, Leever's season looks a bit better than Phillippe's. But Chris Jaffe took a very serious look at that year, and figured out that Phillippe was getting the large majority of his starts against other teams' aces. It's clear that Phillippe was the ace of the staff and Leever was not, at least in Fred Clarke's mind.

Also, this concept of leveraging was widely used for advertising in the Dead Ball Era. You'd see advertisements for a ballgame that, instead of saying "Giants vs. Cubs" would say "Christy Mathewson vs. Three Finger Brown." So, you certainly have a point, but it goes back MUCH further than you thought.
4:24 AM Sep 22nd
 
raincheck
Thank you. This is part of the process. One of the problems with medical research is that researchers are much less likely to publish failed studies. It reduces the growth of knowledge.

I work with a foundation that funds research for a specific group of diseases. We have an annual round table of researchers who help us to decide which projects to fund. It is amazing how often we are considering funding something that sounds promising, and one of the researchers around the table will say, “we tried that, it doesn’t work”

Those failures are secrets, and knowing what DOESN’T work is as important as what does work.
2:13 PM Sep 15th
 
trn6229
I enjoy reading all your articles. I know that Casey Stengel and some other managers leveraged their starting pitchers. During the time Whitey Ford and Casey Stengel were together, Whitey started many games against the Indians and White Sox, the two best teams other than the Yankees. He had Don Larsen start against the Senators and Athletics. I think on the other side, Billy Pierce also started many games against the Yankees and Indians. When Ralph Houk managed the Yankees, Whitey pitched in a rotation and won 20 games for the first time in 1961.
2:04 PM Sep 15th
 
evanecurb
I always gave it a little extra when I was facing a tough opponent. Sorry I messed up your study.

Regards,

Chris Tillman
10:32 AM Sep 15th
 
esolo25
Really interesting, Bill. Important, I think, to show and describe the studies that DON'T work as well as the ones that do. Good insight into your process as well. This site continues to be a tremendous value, thanks!!
8:13 AM Sep 15th
 
dfan
It's not the kind of study that you personally would want to do, but this sort of data is perfect for a model called Whole History Rating. It assumes that 1) the probability of winning any individual matchup is computable with log5 or what have you, given the pitchers' individual skills at the time of the game, and 2) a pitcher's skill is a random walk over the course of their career (it's more likely to change a little at a time than a lot). Then it finds the most probable career arcs for everyone, at once, given the results we have to go on. It deals with the small-sample problem nicely, as someone who goes 3-0 in their career will have a mediocre score because everything is done in terms of probabilities and it's so much more probable that they had a few lucky games than that they were amazing.
8:07 AM Sep 15th
 
DaveNJnews
I realize the study didn’t work out the way you wanted, but I appreciate that you posted this. It is informative to see your processes, and know the types of things you attempt.
6:38 PM Sep 14th
 
 