Zack Greinke versus Cy Young

September 13, 2020

            I have done a little study here which didn’t exactly work and which I had to abandon before it really produced anything, but it didn’t completely fail, either, and there might be something in there that would work another time, a little different effort, so I’ll take the trouble to write it up and report it.

            Suppose that you compared all pitchers in history, just based on who wins and who loses in head-to-head matchups between them.   Of course, Zack Greinke never pitched against Cy Young, but Zack Greinke pitched (started) against Tom Glavine, and Glavine started against Rick Reuschel, and Rick Reuschel started against Bob Gibson, and Bob Gibson started against Warren Spahn (many times), and Warren Spahn started against Johnny Vander Meer, and Vander Meer started against Waite Hoyt, and Waite Hoyt started against Walter Johnson, and Walter Johnson started against Cy Young.  You can, in fact, put Zack Greinke in the same pool of contestants as Cy Young.   You can do it through that string of common opponents, or through any of hundreds of other strings of common opponents. 

            I am aware that pitchers’ won-lost records have lost all standing in sabermetrics, and I know that I started that snowball rolling down the hill, but the point of the game is to win.  The point of the game is to be better today than the other guy.   Suppose that we just ignored the problems with Wins and Losses, and ranked all pitchers by the "common opponents" process. 

            The common opponents process is the way that we rank college football or basketball teams or NFL teams, where not everybody plays everybody but all of the teams compete in the same pool of opponents.   My file of pitchers’ starts, 329,988 of them, has the great majority of starts by pitchers between the years 1921 and 2018, so there are about 100 years of pitchers’ starts in there.  I don’t actually have Cy Young, or Walter Johnson at his peak, but I have 8 years of Walter Johnson starts, and he was still the best pitcher in the league the first half of that time. 

            Anyway, I evaluated every game in that file in this way.  First of all, everyone is assigned an "initial value" of 100.000.   It doesn’t matter whether the initial value is 100.000 or 20 or 1000 or 500; it’s just a starting point.  If the average is 100, it also doesn’t matter whether everybody is assigned the same number or different numbers; if the process were worked to the end, you could give Tom Seaver an initial value of zero and Tyler Glasnow an initial value of 1000, and it wouldn’t make any difference; the numbers are going to go where they are going to go, based on the data. 

            The pitcher has an "initial value" for the game and an "output value".   You add the initial values for the two starting pitchers together, divide by 2, and:

1)      Add 5 if the pitcher’s team won the game and he was the winning pitcher,

2)     Add 2 if the pitcher’s team won the game but he was not the winning pitcher,

3)     Subtract 2 if the pitcher’s team lost the game but he was not the losing pitcher,

4)     Subtract 5 if the pitcher’s team lost the game and he was the losing pitcher. 

            So when Tom Glavine faced Zack Greinke in Kansas City on June 13, 2004, (I was there) both Glavine and Greinke had initial values of 100.   Glavine won the game and Greinke lost, so we give Glavine a score for the game of 105, and Greinke a score of 95.   We do this for every game of each pitcher’s career, and then find the average for all of their starts.   Their output average value for the first round then becomes the input initial value for the second round, and we run the cycle again. 

            After the first cycle of games Greinke has an average value of 100.66 and Glavine an average value of 100.72, so when we re-run the cycle, Glavine’s score in the second round is not 105, but 105.69.  The value of the game has gone up because Glavine beat what we now know was a good pitcher, Greinke. 
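The scoring rule for a single start can be sketched in a few lines of Python. This is only an illustration of the rule as described above; the function name game_score and its parameter names are made up for the example and are not from the original spreadsheet work.

```python
def game_score(own_value, opp_value, team_won, got_decision):
    """Output value for one start.

    Average the two starters' initial values, then add or subtract
    5 (pitcher got the win or loss) or 2 (no decision).
    """
    base = (own_value + opp_value) / 2
    if team_won:
        bonus = 5 if got_decision else 2
    else:
        bonus = -5 if got_decision else -2
    return base + bonus


# Round 1: both starters enter the June 13, 2004 game at 100.
round1_glavine = game_score(100.0, 100.0, team_won=True, got_decision=True)
round1_greinke = game_score(100.0, 100.0, team_won=False, got_decision=True)

# Round 2 (before re-centering): Glavine 100.72, Greinke 100.66.
round2_glavine = game_score(100.72, 100.66, team_won=True, got_decision=True)

print(round1_glavine, round1_greinke, round(round2_glavine, 2))
# prints: 105.0 95.0 105.69
```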

            Actually, it is not QUITE that simple.  We want to keep the average centered at 100.000.   Well, we want to keep the system anchored at some point, so it doesn’t go drifting off to sea.  At the end of the first round, at the end of each round, we figure the average values for all of the pitchers, and adjust all of the averages so that they move back to 100.  After the first round, the average pitcher has a score of 98.29229.  To move this back to 100, we multiply each pitcher’s score by 1.017374. This moves the average back to 100, but also, it moves Greinke up to 102.41 and Glavine up to 102.47.   So when we re-evaluate the Glavine-vs.-Greinke game, Glavine’s output score moves up to 107.44, for that game, because he beat a good pitcher, and Greinke’s output score moves up to 97.44, because, while he lost the game, he lost to a really good pitcher.
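One full cycle, including the re-centering step, might look like the sketch below. It assumes the data has been reduced to one record per start, holding the pitcher, the opposing starter and the +5/+2/-2/-5 bonus, and it assumes the re-centering average is taken over pitchers rather than over starts. The function name run_cycle and the three toy pitchers A, B and C are hypothetical.

```python
from collections import defaultdict


def run_cycle(starts, values, center=100.0):
    """Run one evaluation cycle and re-center the results.

    starts: list of (pitcher, opponent, bonus), one entry per start
    values: dict mapping pitcher -> value going into this cycle
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pitcher, opponent, bonus in starts:
        totals[pitcher] += (values[pitcher] + values[opponent]) / 2 + bonus
        counts[pitcher] += 1

    # Each pitcher's new value is the average over all of his starts.
    averages = {p: totals[p] / counts[p] for p in totals}

    # Multiply every value by the same factor so the pitcher average is `center`.
    mean = sum(averages.values()) / len(averages)
    scale = center / mean
    return {p: v * scale for p, v in averages.items()}


# Made-up data: A beats B, A beats C, B's team beats C's team with no decisions.
toy_starts = [
    ("A", "B", 5), ("B", "A", -5),
    ("A", "C", 5), ("C", "A", -5),
    ("B", "C", 2), ("C", "B", -2),
]
toy_values = {p: 100.0 for p in "ABC"}
after_one = run_cycle(toy_starts, toy_values)
after_two = run_cycle(toy_starts, after_one)
```

With the real figures, the same re-centering factor is 100 / 98.29229, or about 1.017374, which is what lifts Greinke from 100.66 to 102.41.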

            (The average drops to 98.3 after one round essentially because the pitchers who average over 100 get many more starts than the pitchers who average under 100.  The average drops for two or three rounds because of this, then the system adjusts for that, and the average after a cycle will be like 100.00476, or something.  But we still have to move it back to 100.) 
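The gap between the two kinds of average can be seen with a few made-up numbers. In the sketch below, the pitcher names, start counts and averages are all invented for illustration; the point is only that an average taken over pitchers falls below 100 when the low scorers have short careers, even if the average taken over starts is exactly 100.

```python
# Hypothetical pitchers: one long-career ace and three short-career pitchers.
career_starts = {"ace": 30, "short_a": 2, "short_b": 2, "short_c": 2}
average_score = {"ace": 101.0, "short_a": 95.0, "short_b": 95.0, "short_c": 95.0}

# Average over pitchers: every pitcher counts once.
pitcher_mean = sum(average_score.values()) / len(average_score)

# Average over starts: every start counts once.
total_starts = sum(career_starts.values())
start_mean = sum(
    average_score[p] * career_starts[p] for p in career_starts
) / total_starts

print(pitcher_mean, start_mean)
# prints: 96.5 100.0
```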

            After one round of calculations, the highest-scoring pitchers are these 10:

 

Rank  Pitcher            Score
 1    Lefty Grove        103.5363
 2    Spud Chandler      103.4518
 3    Dizzy Dean         103.4289
 4    Lefty Gomez        103.3126
 5    Whitey Ford        103.3097
 6    Parker Bridwell    103.2635
 7    Johnny Beazley     103.1391
 8    Don Gullett        103.1228
 9    Ted Wilks          103.1222
10    Clayton Kershaw    103.1183

 

            Of the top 10, four are in the Hall of Fame, so we can see that the system has some potential to deliver reasonable rankings of pitchers.   But who the hell is Parker Bridwell?

            Parker Bridwell was drafted by the Baltimore Orioles in the 9th round in 2010.  After a minor league career of no particular distinction, the Orioles sold him to the Angels, perhaps on the advice of the same guy who told them to trade Mike Yastrzemski to the Giants for Tyler Herb. In 20 starts for the Angels in 2017, he went 10-3.  Then he won one more game in 2018, then he got hurt and went back to the minor leagues, where he still is today, or would be if the minor leagues were playing. 

            I knew that this was going to be an issue.  In data like this there has to be one pitcher in there somewhere who only made one start in his career and beat Lefty Grove or Carl Hubbell or Bob Gibson or somebody.  In a system like this HIS ranking depends entirely on the ranking given to Lefty Grove or Carl Hubbell or whoever, and then his number will stop moving and come to rest at 10 points higher than Carl Hubbell or whoever.   Here he is; Josh Rupe had only one start in his major league career, but he beat Felix Hernandez.   Steve Olin had only one start in his career, but he pitched really well, and he beat Teddy Higuera. 
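The "10 points higher" resting place follows from the scoring rule itself. As a short derivation, write x for the value of the one-start winner and h for the value of the pitcher he beat, and ignore the small effect of re-centering (the symbols x and h are introduced here only for the derivation):

```latex
% Each cycle replaces x with the score of his single start:
x_{n+1} = \frac{x_n + h}{2} + 5
% At rest, x_{n+1} = x_n = x:
x = \frac{x + h}{2} + 5
\quad\Longrightarrow\quad \frac{x}{2} = \frac{h}{2} + 5
\quad\Longrightarrow\quad x = h + 10
% The gap to the resting point halves every cycle:
x_{n+1} - (h + 10) = \tfrac{1}{2}\bigl(x_n - (h + 10)\bigr)
```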

            I knew that would be an issue, and I had four or five different plans as to how to counteract that.  One of my plans was that I would assign every pitcher in the data one "balance loss" against a pitcher whose value never moves from the starting point (100).  One loss assigned to Felix Hernandez or Teddy Higuera isn’t going to mean anything, and anyway we adjust everything back to 100.00 at the end of the cycle, but one balance loss keeps Josh Rupe from ranking as 10 points better than Felix Hernandez. 
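The effect of one balance loss on a one-start pitcher can be sketched as follows. This assumes the balance loss is scored like an ordinary loss (minus 5) against an opponent fixed at 100, and it ignores re-centering. The function name settle and the opponent value of 103 are hypothetical, chosen only to make the arithmetic visible.

```python
def settle(opponent_value, balance_losses, anchor=100.0, cycles=200):
    """Value at which a pitcher with one real win comes to rest.

    opponent_value: value of the pitcher he beat (held fixed here)
    balance_losses: number of imaginary losses to an opponent worth `anchor`
    """
    x = 100.0
    for _ in range(cycles):
        scores = [(x + opponent_value) / 2 + 5]               # the real win
        scores += [(x + anchor) / 2 - 5] * balance_losses     # imaginary losses
        x = sum(scores) / len(scores)
    return x


no_balance = settle(103.0, balance_losses=0)   # rests 10 above the opponent
one_balance = settle(103.0, balance_losses=1)  # rests between 100 and the opponent
print(round(no_balance, 2), round(one_balance, 2))
# prints: 113.0 101.5
```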

            But the problem of scale doesn’t by any means end there.   A pitcher who has a career record of 200 wins, 120 losses should rank ahead of a pitcher who finished 100-60, other things being equal.  As I said, I had numerous plans for how to deal with this issue, but I thought that maybe the data itself, as the cycles repeated, would push the Parker Bridwells and Terry Leaches and Atley Donalds and Howie Krists off of the leader boards, and allow the best pitchers to assert themselves more consistently.

            And that does happen, to an extent.  After one cycle, Zack Greinke ranks 135th on the list, which is actually in the top 3 percent.  After a second cycle he moves up to 117th, after the third cycle 114th, after the fourth cycle 107th.   Between the third and fourth cycles he passes Gustavo Chacin, Bob Grim, Jeff Niemann and Miles Mikolas.   Something is happening within the data which gradually allows the best pitchers to elbow their way toward the top, while Parker Bridwell does begin to gradually drop lower. 

            But here’s the rub.  After one cycle, Parker Bridwell ranks as the #6 starting pitcher among the 4,876 pitchers in the data.   After five cycles, he ranks 8th.  He probably should rank about 3,000th, I would guess.   At the rate the system is catching up, this is going to take one hell of a lot of cycles. 

 

            The system does not completely fail.   After 20 cycles of re-evaluation, the system gives us a Top 20 list that includes Lefty Grove, Dizzy Dean, Whitey Ford, Clayton Kershaw, Lefty Gomez, Pedro Martinez, Roy Halladay, Ron Guidry, Roger Clemens, Sandy Koufax, Johan Santana and Mike Mussina.  It’s not like that’s a dart-board selection; those are all really good pitchers.    But it also includes Parker Bridwell, Terry Leach and Bob Wickman, among others who shouldn’t be there.  

            At this point I gave up on the system, because it is apparent that to get truly meaningful or reliable ratings would take hundreds of cycles, and it is just too much work to do that.   With 329,988 data lines in the system, which have to be repeatedly sorted this way and that way, and data copied from here to there:

1)     It takes too much time,

2)     I made little mistakes, which are costly, and

3)     With this much data, sometimes the computer itself makes mistakes.   Sometimes it just says "this is too much data; I can’t handle it, the hell with you all." 

 

Lots of studies like this, I set them up so that you can run through them hundreds of cycles, but you just can’t do that here, because there is too much data. 

I acknowledge that there are inherent problems with just using wins and losses, and that you will never ENTIRELY escape those problems with this data set.  But the system appears to be working, at a certain level, to such an extent that, with better equipment (a bigger and better computer), with better programming skills, with a few different assumptions to cut the data, I think you COULD get something out of it.  One question that I was trying to pursue was whether, in a system like this, the average quality of pitchers would be shown to be improving over time.  I suspect that it would, but I couldn’t push the system to the point at which I could really be sure.    

If I was going to re-start the study. . .we’ll have bigger and better computers in a few years, and I’ll be able to fill in the Game Logs for pitchers probably back to 1900, at least.   If I was going to try the study again, here’s what I would do differently. 

You remember the "balance games", the imaginary losses put into the system to limit the effect of a few good starts?   One balance game isn’t enough, and, as I learned through trial and error, three balance games wouldn’t be enough, either, and five wouldn’t be enough.   You would need to assign every pitcher somewhere between 10 and 50 balance games.  That would wipe the Parker Bridwells and Bob Wickmans off of the leader boards immediately, but also, it would make the pitcher who was 300-180 rank higher than the pitcher who was 200-120 or the pitcher who was 100-60.  It would make career length matter—as it should matter, but the $64,000 question is "How much?"  How much "balance" do you put in there, so that career length is given the right amount of credit?
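How balance games make career length matter can be sketched under some strong simplifying assumptions: every opponent is worth exactly 100, every start ends in a win or a loss for the starter, each balance game is scored as a loss to a 100-value opponent, and re-centering is ignored. Under those assumptions a pitcher's value comes to rest at 100 plus twice his average bonus. The function name settled_value and the choice of 30 balance games are hypothetical.

```python
def settled_value(wins, losses, balance_losses, anchor=100.0):
    """Resting value when every opponent (real or imaginary) is worth `anchor`.

    Solving x = (x + anchor) / 2 + mean_bonus gives x = anchor + 2 * mean_bonus.
    """
    games = wins + losses + balance_losses
    mean_bonus = (5 * wins - 5 * (losses + balance_losses)) / games
    return anchor + 2 * mean_bonus


records = [(100, 60), (200, 120), (300, 180)]

# No balance games: the same winning percentage gives the same value.
raw = [settled_value(w, l, balance_losses=0) for w, l in records]

# Thirty balance games: the longer career now rates higher.
balanced = [settled_value(w, l, balance_losses=30) for w, l in records]

print(raw)
print([round(v, 2) for v in balanced])
# prints: [102.5, 102.5, 102.5]
# prints: [100.53, 101.43, 101.76]
```

The sketch shows the direction of the effect only; it says nothing about how many balance games would give career length the right amount of credit.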

I don’t know.  Another thing I would probably do is, I would probably take out the individual pitchers in the data who only pitched a few games and who only pitched within one season, and assign them all a group name, like "1933 group" or "1968 group".   The rankings for the worst pitchers in the system are not accurate or meaningful or interesting, and tracking their data adds weight to the process while not adding anything of value. 
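Folding the short-career pitchers into season groups could be sketched like this. The cutoff for "a few games" is not specified above, so the max_starts threshold of 3 is an assumption, and the function name and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict


def relabel_short_careers(starts, max_starts=3):
    """Replace short, single-season pitchers with a season group label.

    starts: list of (pitcher, season), one entry per start
    Returns a list of labels in the same order as `starts`.
    """
    career_starts = Counter(pitcher for pitcher, _ in starts)
    seasons = defaultdict(set)
    for pitcher, season in starts:
        seasons[pitcher].add(season)

    labels = []
    for pitcher, season in starts:
        short = career_starts[pitcher] <= max_starts
        one_season = len(seasons[pitcher]) == 1
        labels.append(f"{season} group" if short and one_season else pitcher)
    return labels


# Made-up data: a regular with starts in two seasons, two brief call-ups.
sample = [
    ("regular", 1933), ("regular", 1933), ("regular", 1968), ("regular", 1968),
    ("callup_x", 1933), ("callup_y", 1933), ("callup_z", 1968),
]
print(relabel_short_careers(sample))
# prints: ['regular', 'regular', 'regular', 'regular',
#          '1933 group', '1933 group', '1968 group']
```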

(As an aside, I did find, while doing this, a game in which BOTH pitchers were making their only major league start.  September 29, 1946, Pittsburgh at Cincinnati (Crosley Field).  The last day of the 1946 season, two bad second-division teams.  28-year-old Walter Alvin Tate started for Pittsburgh, his only major league start, while a regular major league outfielder, Al Libke, started for Cincinnati, the last game of his major league career, and his only start as a pitcher.  The game was over in one hour and 30 minutes.)

Another way to do the study would be to put some limits on who is included in the study.   What you could not do is to limit the study to only, let us say, what you believe to be the top 1% of the pitchers.   If you did that, then you would only include in the study those games in which BOTH pitchers are in the top 1%, which would not be enough games to really do the study.   You couldn’t do it with the top 10%, because that would only get you 1% of the games (ignoring the fact that the best pitchers have longer careers.) 
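The arithmetic behind those percentages is just a square. As a rough sketch, suppose the selected pitchers make a fraction p of all starts and that opposing starters are matched up independently of quality (both are simplifying assumptions, and p is a symbol introduced here):

```latex
% Share of games in which BOTH starters belong to the selected group:
P(\text{both selected}) \approx p \cdot p = p^{2}
% Top 10%:  p = 0.10 \Rightarrow p^{2} = 0.01   (1% of games)
% Top 1%:   p = 0.01 \Rightarrow p^{2} = 0.0001 (0.01% of games)
```

Because the best pitchers make far more starts than average, their real share p of starts is larger than their share of the pitcher list, so the true figures would run somewhat higher.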

If you limited the study to games in which both pitchers had 250 or more major league starts, that would get you 16% of the games in the data, or about 50,000 games.   You could probably make that work, and, at that level, then you wouldn’t have to worry about putting "balance games" in for every pitcher, which would make the process a lot easier.   If you limited the study to games in which both pitchers made 200 major league starts, that would get you about 100,000 games, I think.   Come to think of it, that’s the answer; that’s what I should have done. 
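Restricting the data to games between two long-career starters is a simple filter. The sketch below assumes each game is stored as a (game_id, starter_a, starter_b) record and counts career starts from the data itself; the function name and the toy records are hypothetical, and the toy example uses a threshold of 2 rather than 200 so the result is easy to check by eye.

```python
from collections import Counter


def games_between_regulars(games, min_starts=200):
    """Keep only games in which both starters have at least `min_starts` starts.

    games: list of (game_id, starter_a, starter_b)
    """
    career_starts = Counter()
    for _, starter_a, starter_b in games:
        career_starts[starter_a] += 1
        career_starts[starter_b] += 1

    return [
        game for game in games
        if career_starts[game[1]] >= min_starts
        and career_starts[game[2]] >= min_starts
    ]


# Made-up data: P and Q each start twice, R and S once.
toy_games = [(1, "P", "Q"), (2, "P", "R"), (3, "Q", "S")]
kept = games_between_regulars(toy_games, min_starts=2)
print(kept)
# prints: [(1, 'P', 'Q')]
```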

But I didn’t.   If I had done THAT, the study might very well have foundered on some other rock.  Most studies do.   Thanks for reading. 

 
 

COMMENTS (4 Comments, most recent shown first)

bjames
StatsGuru
From 1920 to current day there are 52746 games in which both pitchers eventually made at least 200 starts. From the Day by Day Database at BaseballMusings.com.

Right, but my study was 1921 to 2018, as stated in the article, and is missing a few games and uses both pitchers from those games that it uses. So we've got basically the same number, I think.
12:43 AM Sep 15th
 
StatsGuru
From 1920 to current day there are 52746 games in which both pitchers eventually made at least 200 starts. From the Day by Day Database at BaseballMusings.com.

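-- Count games in which both starters have at least 200 career starts
select count(*) as Games
from
  (select PG.GameID
   from PGInfo as PG
   where PG.GamesStarted = 1
     and PG.PlayerID in
       -- pitchers with 200 or more career starts
       (select PG.PlayerID
        from PGInfo as PG
        group by PG.PlayerID
        having sum(PG.GamesStarted) >= 200)
   group by PG.GameID
   -- keep only games where both starters qualify
   having count(*) = 2) as PGC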
7:01 PM Sep 14th
 
shthar
I feel like Kevin Bacon is involved somewhere.


3:14 PM Sep 14th
 
bjames
I have written a follow-up study to this one. It has been posted, and will become visible to the members at 4:00 this afternoon.
12:57 AM Sep 14th
 
 
©2020 Be Jolly, Inc. All Rights Reserved.