The Greatest of Games

By Bill James

June 21, 2012

I. The Record Game

I was very into Matt Cain’s perfect game, watched the last three innings of it on my television machine, called my wife to watch the end of it, e-mailed my son in San Francisco to make sure he was watching. The morning after the game it was reported—erroneously—that Cain’s 101 Game Score was the highest Game Score in 50 years, and then I was pretty much bombarded by questions like this in "Hey, Bill":

Matt Cain posted a 101 in his perfecto tonight. Happ logged a 10. That's a 91-point spread -- I wonder if that's a record?

Hey Bill, your Game Score made the USA Today. Unfortunately you had four and-a-half fewer column inches than Rihanna so maybe all you need is a blonde wig. Anyway, they mentioned that Cain's Perfect Game had a 101 Game Score but Kerry Wood still had the record with a 105. It went on to (and I think this is lost in the shuffle) mention that Cain's game came against a retooled Stros lineup while Wood threw that game against a 102 win Stros team. If you had to pick, is that your most dominant game ever? I was watching it that day and that's as unhittable as I've ever seen anyone. They need to factor in the fact that Cain did this at a park that is just second to Yosemite in terms of size.

OK, well... first, Kerry Wood’s game is not exactly the record, but it is the record for a 9-inning game. The highest Game Score in the last 50 years—although it is barely within the last 50 years—is a very remarkable game that absolutely nobody seems to remember.

On June 6, 1964, the Yankees and Angels met in Dodger Stadium, at that time being shared by the two Los Angeles teams (although owned by the Dodgers). Jim Bouton started for the Yankees, Dean Chance for the Angels. Bouton was coming off a sterling 1963 season (21-7, 2.53 ERA); Chance was an 18-game loser in 1963 but pitching well. In his previous start, four days earlier against Boston, Chance had pitched a 2-hit shutout, striking out 15 batters—a Game Score of 94—and two starts before that, May 24, he had pitched a 3-hit shutout against the Yankees in Yankee Stadium.

Chance had a no-hitter through six innings. He walked Tom Tresh in the 7th inning and Roger Maris singled, but he dodged the rocket with a double play and a strikeout. That was the only hit he gave up through nine innings.

Bouton was good enough to keep the game alive; he gave up 7 hits and 3 walks through nine innings, but—in part because of bad base running—the Angels did not score. Chance retired the Yankees 1-2-3 in the 10th; Bouton gave up another single in the bottom of the 10th. The Yankees went 1-2-3 in the 11th; the Angels wasted a walk in the bottom of the 11th. Through 11 innings Chance had a 1-hit shutout with ten strikeouts. Each pitcher surrendered a single in the 12th. In the 13th Chance gave up another single; Bouton, a double.

For the 14th inning Bouton was replaced by a reliever; Chance retired the Yankees 1-2-3, and then left the game. In 14 innings he had given up 3 hits, no runs, 2 walks, and had struck out 12. As soon as he was out of the game the Yankees scored and won the game, 2-0 in 15 innings.

The Game Score for Joe Oeschger, when he pitched 26 innings one afternoon, was 153, a feat beyond the understanding of modern fans. But in the last 60 years, Dean Chance against the Yankees on June 6, 1964, had the highest Game Score on record—116. 14 innings, 3 hits, 12 strikeouts, no runs.
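For readers who want to check the arithmetic, here is a minimal sketch. The article never spells out the Game Score formula, so this assumes the standard published definition: start at 50, add one point per out, two per inning completed after the fourth, and one per strikeout; subtract two per hit, four per earned run, two per unearned run, and one per walk. The function name and argument names are illustrative only.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Starting pitcher Game Score, under the standard definition (assumed here)."""
    full_innings = outs // 3
    score = 50
    score += outs                          # +1 per out recorded
    score += 2 * max(0, full_innings - 4)  # +2 per inning completed after the 4th
    score += strikeouts                    # +1 per strikeout
    score -= 2 * hits                      # -2 per hit allowed
    score -= 4 * earned_runs               # -4 per earned run
    score -= 2 * unearned_runs             # -2 per unearned run
    score -= walks                         # -1 per walk
    return score

# Dean Chance, June 6, 1964: 14 innings, 3 hits, no runs, 2 walks, 12 strikeouts
chance = game_score(outs=42, hits=3, earned_runs=0, unearned_runs=0,
                    walks=2, strikeouts=12)
print(chance)
```

With Chance's line as given above, this reproduces the 116.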

Chance’s game has no easy hook, no easy way to remember it, like Wood having 20 strikeouts, or the 16-inning Marichal-against-Spahn game that everybody seems to remember, with the two Hall of Famers, or the Harvey Haddix 12-inning perfect game; there is no one thing about it that makes it stick in the public’s mind—yet Chance’s performance is every bit as impressive as Marichal’s, or Haddix’, or Wood’s.

Chance threw another shutout on June 23 of that season, then threw three consecutive shutouts in July. He pitched a 2-hitter against the Yankees on July 28—not a shutout—then pitched two consecutive two-hit shutouts in August. He shut out the Yankees twice more in September (one of those a 2-hitter) and shut out Minnesota on September 25 for his 20th win. Altogether he threw eleven shutouts that season, a count which does not include the 14 shutout innings in this game. Chance pitched 50 innings in 1964 against the Yankees—the American League champions with a 99-63 record—and posted a 0.18 ERA in those 50 innings. He gave them 14 hits and one run in 50 innings. The one run was a solo home run by Mickey Mantle.

II. The Widest Splits

Let’s double back to this question, posted by Woverstrap on June 15:

Matt Cain posted a 101 Game Score in his perfecto tonight. Happ logged a 10. That's a 91-point spread -- I wonder if that's a record?

A search of my own data found that the largest difference between two pitchers in a game occurred not when one pitcher pitched extremely well, but when a pitcher pitched unusually poorly. On August 3, 1998, Mike Oquist gave up 16 hits and 14 runs, all of them earned, in a five-inning start. That makes a Game Score of negative 21—the lowest in the majors in the last 60 years. His opponent, El Duque, posted a Game Score of 83 (in the top 3% of all Game Scores), for a margin between the two of 104 points.

A reader (jwilt) pointed to the Ty Cobb strike game in 1912. In 1912 Ty Cobb was suspended for going into the stands to have a fist-to-face conference with a disrespectful fan. The American League suspended Cobb, and the rest of the Tigers walked out in support of the Redneck Peach. On May 18, with the real Tigers team on strike, the Tigers fielded a team of amateur players and coaches, which lost 24-2. The Tigers’ starting pitcher, Allan Travers, pitched a complete game, posting a Game Score of negative 52. Whatever the other starting pitcher did—which neither jwilt nor I know—it would probably be enough to create a very wide gap between the two starting pitchers. That’s like the Joe Oeschger game; it’s a different era, and all kinds of weird things could and did happen in them days.

But this introduces a more serious question, one that I have long wanted to research. Suppose that one starting pitcher posts a Game Score of 60, and the other of 50. What are the odds that the team whose starting pitcher is +10 will win the game?

75.3%. There’s a simple pattern that predicts the data: each point of Game Score is 25 points of winning percentage. A four-point advantage in Starting Pitcher Game Score gives the pitcher’s team a 100-point advantage in winning percentage. It’s really neat how well that works out:

Advantage	Won	Lost	Winning Pct
10	2594	852	.753
9	2574	1001	.720
8	2552	1116	.696
7	2555	1261	.670
6	2404	1271	.654
5	2359	1500	.611
4	2283	1480	.607
3	2189	1592	.579
2	2176	1688	.563
1	2019	1801	.529
0	1901	1901	.500
-1	1801	2019	.471
-2	1688	2176	.437
-3	1592	2189	.421
-4	1480	2283	.393
-5	1500	2359	.389
-6	1271	2404	.346
-7	1261	2555	.330
-8	1116	2552	.304
-9	1001	2574	.280
-10	852	2594	.247
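The fit between the rule of thumb and the table can be checked directly. The sketch below uses the won-lost counts from the table (non-negative advantages only, since the negative rows are mirror images) and compares each observed winning percentage with .500 plus 25 points per Game Score point. The names `ROWS`, `rule_of_thumb` and `observed` are made up for illustration.

```python
# (advantage, wins, losses) copied from the table above
ROWS = [
    (10, 2594, 852), (9, 2574, 1001), (8, 2552, 1116), (7, 2555, 1261),
    (6, 2404, 1271), (5, 2359, 1500), (4, 2283, 1480), (3, 2189, 1592),
    (2, 2176, 1688), (1, 2019, 1801), (0, 1901, 1901),
]

def rule_of_thumb(advantage):
    """.500 plus 25 points (.025) of winning percentage per Game Score point."""
    return 0.500 + 0.025 * advantage

def observed(wins, losses):
    """Actual winning percentage from the won-lost counts."""
    return wins / (wins + losses)

# Signed miss of the rule at each advantage level
deviations = {adv: rule_of_thumb(adv) - observed(w, l) for adv, w, l in ROWS}
worst = max(deviations, key=lambda adv: abs(deviations[adv]))

for adv, w, l in ROWS:
    print(f"{adv:>3}  rule {rule_of_thumb(adv):.3f}  observed {observed(w, l):.3f}")
print("largest miss at advantage", worst)
```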

Obviously this simple rule of thumb can’t extend out indefinitely, or, if one pitcher had a 30-point advantage in Game Score, his team would have an expected winning percentage of 1.250. But here’s another neat fact. In my data (115,000 games) there are 2,387 games in which the difference between the two starting pitchers is 55 points or more. The won-lost record of the team with the better starting pitcher performance, at that level, is 2,387-0.

The most lop-sided game in which the team with the better starting pitching performance didn’t win occurred May 28, 1990, in Pittsburgh. Tim Belcher started for the Dodgers. Belcher pitched eight innings of 1-hit ball, posting a Game Score of 82. Bob Walk gave up 9 hits, 5 runs in four and two-thirds innings for Pittsburgh, Game Score of 28. The Dodgers took a 5-1 lead into the bottom of the ninth.

Tommy Lasorda, however, replaced Belcher with a pitcher named Pat Perry, who I honestly don’t remember. Perry gave up a single and a double, and threw a wild pitch. It was 5-2. Perry walked Van Slyke; Barry Bonds hit a pop fly that dropped in shallow center; it was 5-3, nobody out. Perry got one out but then Sid Bream singled. Jay Howell replaced Perry on the mound for the Dodgers. Howell walked Don Slaught, loading the bases. The Dodgers still had a 93% chance to win the game. Jose Lind doubled to right field, tying the score, and Slaught scored from first when Hubie Brooks’ throw to third went wild—Hubie’s second error of the game. Pirates win, 6-5—despite a 54-point difference in the performance of the starting pitchers.

Here’s a full list of the winning percentages of teams, by the Game Score advantage of the starting pitchers:

Advantage	Winning Pct
55 or more	1.000
54	.997
53	1.000
52	.998
51	.998
50	.998
49	.998
48	.999
47	1.000
46	.995
45	.997
44	.995
43	.995
42	.996
41	.988
40	.994
39	.990
38	.993
37	.985
36	.985
35	.991
34	.979
33	.973
32	.973
31	.966
30	.959
29	.967
28	.963
27	.955
26	.942
25	.940
24	.932
23	.925
22	.923
21	.909
20	.896
19	.880
18	.876
17	.861
16	.834
15	.835
14	.815
13	.795
12	.786
11	.771
10	.753
9	.720
8	.696
7	.670
6	.654
5	.611
4	.607
3	.579
2	.563
1	.529
0	.500

III. The List

I would have a hard time arguing that Kerry Wood’s game—or Matt Cain’s game—was more impressive than what Dean Chance did to the Yankees on June 6, 1964. On the other hand, if we list the greatest games of the last 60 years simply by the highest Game Scores, we get a list of extra-inning performances—

1) Chance against the Yankees (116),

2) Tom Cheney’s famous 21-strikeout game in 1962 (115),

3) 15 innings of shutout ball by Chris Short, 1965, with 18 strikeouts (114),

4 tie) A 16-inning game by Johnny Antonelli in 1955, in which Antonelli gave up only six hits but one run (112),

4 tie) 16 shutout innings by Gaylord Perry in 1967 (112),

4 tie) 15 shutout innings by Rob Gardner against Chris Short in 1965 (112),

4 tie) 16 shutout innings by Juan Marichal in 1963, in the game in which Marichal defeated Spahn 1-0 in 16 innings (112).

The Harvey Haddix game scores at 107. I would certainly not try to tell you that any of these games is not a remarkable pitching performance, or even that any of them is less remarkable in its own way than the Kerry Wood or Matt Cain games.

But when I draw up a list of the greatest pitching performances in the last 60 years, I don’t want a list of 1950s and 1960s games in which some pitcher was left in the game for 14 innings. I made an arbitrary decision, then, that on my list of top ten pitching performances of the last 60 years, I would include only three extra-inning games, and those three will rank 10th, 7th, and 4th.

The Game Score system scores how pitchers have performed without regard to the park, the league norms, or the quality of the opposing team. Of course those are legitimate issues. OK, how do we integrate those into the system?

I figured "Expected Game Score" for every game in my data, based on

1) The league norm for Game Scores in that League and Season,

2) Modified by the Park Adjustment,

3) Modified by the Quality of the Opposing Offense.

The highest Expected Game Score... this is a bizarre fact. The highest Expected Game Score was for the San Francisco Giants pitchers in 1960, when they were facing the Philadelphia Phillies lineup in Candlestick Park; not 1965 or 1968, as I expected, but 1960.

The Philadelphia offensive lineup in 1960 was remarkably bad. They scored only 546 runs in 154 games—69 fewer runs than any other major league team that season—but they did this despite playing in by far the best hitting park in the majors. The park run factor was 119. Park-adjusted, the 546 runs were more like 500.

An average pitcher, facing that team, could be expected to have a pretty good game. Candlestick Park that year had a park run factor of 74. The average Game Score in the National League in 1960 was 51.44. When you adjust that upward for the weakness of the Phillies’ offense, adjust it upward again because of the pitcher-friendly environment, and add a point and a half because the Phillies were on the road, you get a figure of 69.55.

When I saw that number, I thought I had to have made a calculation error. You remember how good Pedro Martinez was in 1999, when he went 23-4 with 313 strikeouts and 37 walks? His average game score that season was almost exactly that. I didn’t believe that any combination of opposing offense and bad hitter’s park could make an average pitcher into Pedro Martinez, but it seems to hold up. The Phillies played 11 games in Candlestick in 1960. The average Game Score of the 11 San Francisco pitchers was 71.64—even higher in fact than it should have been in theory.

On May 11, 1960, Sam Jones pitched a 2-hit shutout against the Phillies, striking out 11, Game Score 91. The next day, May 12, Jack Sanford posted an identical box score line—9 innings, 2 hits, no runs, 11 strikeouts, 3 walks, Game Score 91. In July, Juan Marichal made his major league debut in San Francisco against Philadelphia. He pitched a 1-hit shutout, Game Score 96. In the last Philly-in-San Francisco game of 1960, Sam Jones struck out 14 batters. The totals for the 11 San Francisco starters in those games: 7 wins, 2 losses, 78 strikeouts, 21 walks, 1.43 ERA.

The second most-favorable situation for a pitcher was Dodger pitchers against the Mets in Dodger Stadium in 1965; expected average Game Score, 68.47, actual, 71.89.

The worst situation for a pitcher was for a pitcher facing the San Francisco Giants’ offense in Coors Field in 1999. It’s the same thing on the other end: The Giants had a really good offense, which scored 872 runs in 1999 despite playing in a very poor hitter’s park. When you put them in a great hitter’s park—Coors Field—the expected Game Score of an opposing starting pitcher is 32.66. The actual average: 28.67. The ERA of Rockies starters against the Giants that year in Coors Field was 10.08.

OK, having established that the Expected Game Score system works, we can re-rate the performances of starting pitchers by comparing the actual to the expected Game Scores. We divide the actual Game Score by the expected, and multiply by 50. When we do that, we have a list of the greatest games pitched in the last 60 years.
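The adjustment just described (actual divided by expected, times 50) is simple enough to sketch. The function name is illustrative. The worked case pairs two figures quoted earlier, Marichal's debut Game Score of 96 and the 69.55 Expected Game Score for Giants pitchers facing the Phillies in Candlestick in 1960, on the assumption that the 69.55 figure applies to that particular game.

```python
def adjusted_game_score(actual, expected):
    """Actual Game Score divided by Expected Game Score, scaled so that
    a performance exactly matching expectation comes out at 50."""
    return 50.0 * actual / expected

# Marichal's 1960 debut (Game Score 96) measured against the 69.55
# Expected Game Score for Giants pitchers vs. the Phillies in Candlestick.
print(round(adjusted_game_score(96, 69.55), 1))

# The same raw 96 against a neutral expectation of 50 is left unchanged.
print(round(adjusted_game_score(96, 50.0), 1))
```

The same raw score is marked down heavily in the most pitcher-friendly setting in the data, which is why great games in Coors Field rise toward the top of the list that follows.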

These, then, are the ten greatest games by a pitcher in the major leagues in the last sixty years:

10. Johnny Antonelli against the Cincinnati Reds in the Polo Grounds, May 1, 1955. Antonelli pitched 16 innings, giving up 6 hits and 1 run, and finally winning the game 2-1.

9. Pedro Martinez against the Rockies in Colorado, July 29, 1997. 13 strikeouts in a shutout. Our system likes great games pitched in Coors Field.

8. Pedro Martinez against the Devil Rays in Tampa Bay, August 29, 2000. Pedro hit the first batter of the game, Gerald Williams, with a pitch. Williams charged the mound and knocked Pedro down. Williams was ejected from the game, triggering a beanball war in which three Red Sox players were hit by the pitch, and eight players and coaches were ejected from the game. Pedro, ignoring the chaos or feeding off of it, took a no-hitter into the ninth inning, having allowed no one on base after Williams. A ninth-inning single by John Flaherty denied him a no-hitter. He struck out 13.

7. Tom Cheney against the Baltimore Orioles in Baltimore, September 12, 1962. Cheney struck out 21 batters in 16 innings.

6. September 18, 1996, Roger Clemens’ second 20-strikeout game, against the Tigers in Detroit.

5. July 7, 2007; Erik Bedard pitched a 2-hit shutout against the Rangers at The Ballpark in Arlington, striking out 15 and walking no one.

4. Chris Short against the Mets in Shea Stadium, October 2, 1965. Short pitched 15 shutout innings, striking out 18. The game ended in a 0-0 tie.

3. Pat Rapp’s one-hitter in Coors Field, September 17, 1995. Raw Game Score: 91; Adjusted Game Score, 102.8.

2. Hideo Nomo’s no-hitter in Coors Field, September 17, 1996 (exactly one year after Rapp’s one-hitter, and one day before the Roger Clemens game which was number 6 on our list). Raw Game Score, 91; Adjusted Game Score, 105.7.

1. The Kerry Wood game. Raw Game Score, 105; Adjusted Game Score: 107.1.

IV. The Most Dominant Pitchers

When you establish an objective answer to one question, that puts you in a position to pursue the next logical question, so...

Let’s say that Kerry Wood’s game is the most impressive game ever, considering the team he was pitching against and everything. OK, but that’s the only great game that Wood ever pitched. Who pitched the most impressive list of great games? Suppose that we give a pitcher 250 points for pitching the #1 game on the list, 249 points for #2, 248 points for #3, etc. Who piles up the most points?

11. Kevin Appier, 423 points. I confess, it’s a ten-man list and I stretched it to eleven because it was The Ape. The Royals were terrible most of his career, and people never realized how good Appier was. He pitched a lot of great games, and landed three on the list of the top 250.

10. Frank Tanana, 3 games, 446 points.

9. Kevin Millwood, 3 games, 453 points. There has to be one "What?" on the list; otherwise you don’t learn anything. Millwood had 3 games among the top 250—August 28, 1998 against the Cardinals (ten innings of 2-hit, shutout ball), a no-hitter against the Giants, April 27, 2003, and eight innings of 1-hit shutout ball against the Angels, May 9, 2005. The three games rank 63rd, 67th, and 170th on our list of 250 games—a total of 453 points.

8. Roy Halladay, 8 games, 651 points.

7. Curt Schilling, 6 games, 707 points.

6. Juan Marichal, 4 games, 822 points.

5. Roger Clemens, 6 games, 826 points. Clemens says that he never threw a no-hitter at any level of competition—majors, minors, college, high school or little league.

4. Mike Mussina, 6 games, 968 points. The second surprise, Mussina being this high on the list, but he was a great pitcher.

3. Randy Johnson, 12 games, 1101 points.

2. Nolan Ryan, 7 games, 1239 points.

1. Pedro Martinez, 12 games, 1965 points.

I had expected Nolan Ryan to be #1 on the list, but Pedro is not a problem. Jerry Remy was a teammate of Nolan Ryan’s for several years when Nolan was at his best, and also was the Red Sox broadcaster when Pedro had his greatest seasons. Recently his broadcast partner asked him who the greatest pitcher he ever saw was, when he was on his game, assuming that the answer would be Ryan—but Remy said that it was Pedro; at his best, Pedro dominated the game like nobody else. Our silly little system here agrees.

Our system doesn’t much like the 1960s superstars, other than Marichal. Koufax, Gibson and Drysdale appear on the 250-great games list only once each. It’s kind of a reverse bias, I guess; the system considers a 14-strikeout no-hitter to be a greater game in the steroid era than it does in the 1960s, when 14-strikeout no-hitters were thrown every week. They weren’t, of course, but... it was a pitcher’s era. Remy said that Pedro was the greatest he ever saw in part because Pedro was working in the heart of the steroid era, when batters were hitting .360 and driving in 160 runs a year. Our system sees it the same way; not saying it is right, but that’s the way this system sees it.
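The point scheme used for this section's list (250 points for the #1 game, 249 for #2, and so on down to 1 point for #250) can be sketched in a few lines. The function names are illustrative, and the only input used is the set of Millwood rankings quoted in the list.

```python
LIST_SIZE = 250

def points_for_rank(rank):
    """250 points for the #1 game, 249 for #2, ... 1 point for #250."""
    if not 1 <= rank <= LIST_SIZE:
        raise ValueError("rank must be between 1 and 250")
    return LIST_SIZE + 1 - rank

def pitcher_points(ranks):
    """Total points for a pitcher, given the ranks of his games on the list."""
    return sum(points_for_rank(r) for r in ranks)

# Kevin Millwood's three games rank 63rd, 67th and 170th
millwood = pitcher_points([63, 67, 170])
print(millwood)  # 188 + 184 + 81
```

This reproduces Millwood's 453 points.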

V. The Greatest Seasons

If we can evaluate every game by this method, then we can evaluate every season. There are certain limitations or flaws to this method, many of which you might see, and, lest I be accused of misrepresentin’, let me point out another that many of you would miss. If the system has a small bias, such as a bias against a pitcher working in a pitcher’s park, or a bias against a control pitcher as opposed to a power pitcher, that bias is a relatively small concern when you are evaluating single games. A bias becomes a much bigger concern when you are evaluating groups of games, because the bias adds up and adds up, like accumulating interest. It marks down the pitcher in the pitcher’s park a little bit every game, and it adds up.

We evaluate the pitcher’s season game by game. If the Expected Game Score in consideration of the league, the opponent and the park is 58 and the pitcher’s Game Score is 60, that would be +2. If his Game Score is 56, that would be -2.
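As a rough sketch of the season calculation, assuming the season figure is simply the sum of these game-by-game differences (which is how the +803 for Martinez is described below), with an illustrative function name and input format:

```python
def season_advantage(games):
    """Sum of (actual - expected) Game Score over a pitcher's starts.

    `games` is an iterable of (actual, expected) pairs, one per start.
    """
    return sum(actual - expected for actual, expected in games)

# The two single-game cases from the text: expected 58, actual 60 and 56
print(season_advantage([(60, 58)]))            # the +2 game
print(season_advantage([(56, 58)]))            # the -2 game
print(season_advantage([(60, 58), (56, 58)]))  # the two starts cancel out
```

Because the expectation is attached to each start rather than to the season, two teammates with different draws of parks and opponents are measured against different baselines.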

There are thousands and thousands of ways that pitcher/seasons can be evaluated—I personally have invented hundreds—and there are no doubt flaws and limitations to this method, but let me point out that it also has very important virtues. First, to evaluate a starting pitcher by his average Game Score is not a bad approach; just on that level, you get a pretty true read on his season. But this approach also evaluates a pitcher’s park on a game-by-game basis, which is an important step forward. We normally assume that the park effect is the same for every pitcher on the team, but this is not always a safe assumption. It is entirely possible that, just by the luck of the draw, one Dodger pitcher has pitched 18 times in Dodger Stadium and has not pitched at all in Coors Field, while his teammate with the same number of starts has pitched 14 times in Dodger Stadium but twice in Coors Field. Their park effects are not really the same—but almost all analytical systems will assume that they are the same.

Second, the system systematically adjusts for the teams the pitcher pitches against. I remember debating Bob Gibson vs. Juan Marichal with a friend in college; the friend was a Cardinals fan, and I was a Marichal man. My friend insisted that Marichal piled up wins by pitching against the weak sisters in the league, the Mets and the Astros, while Gibson pitched more against the better teams. As often turns out to be the case, the truth is the exact opposite; Marichal actually pitched more against strong teams (proportionally) than Gibson did—but we don’t generally know, because most analytical evaluations of a pitcher’s season never look at who the pitcher pitched against. An approach which systematically looks at these issues has a significant built-in advantage.

OK, let’s ask a few questions.

1) What is the best season in the last 60 years, by this method?

By this method, the greatest season by a pitcher in my data was by Pedro Martinez in 2000, when Martinez had a 1.74 ERA and an ERA+ of 291. Martinez’ season scores at +803 points by this method—meaning that his Game Scores total up to 803 points more than expected, given who he was pitching against and where he was pitching.

2) What are the greatest seasons in the last 60 years?

Rank	Year	First	Last	GS	Advantage
1	2000	Pedro	Martinez	29	803
2	1999	Randy	Johnson	35	708
3	1999	Pedro	Martinez	29	705
4	2001	Randy	Johnson	34	681
5	1997	Pedro	Martinez	31	642
6	1997	Roger	Clemens	34	635
7	2002	Randy	Johnson	35	630
8	1965	Sandy	Koufax	41	628
9	1986	Mike	Scott	37	606
10	2000	Randy	Johnson	35	603

3) What are the worst seasons in the last 60 years?

Rank	Year	First	Last	GS	Advantage
1	1973	Steve	Blass	18	-338
2	1971	Carl	Morton	34	-324
3	2005	Jose	Lima	32	-323
4	2008	Livan	Hernandez	31	-307
5	1952	Bob	Feller	30	-306
6	2009	Jeff	Suppan	30	-305
7	2010	Ryan	Rowland-Smith	20	-292
8	1969	Tony	Cloninger	34	-288
9	2009	Manny	Parra	27	-287
10	1957	Chuck	Stobbs	31	-287

Most of these pitchers have ugly records; Chuck Stobbs in 1957 was 8-20 with a 5.36 ERA in a league with a 3.79 ERA. And... something I never knew until I looked that up: Chuck Stobbs, like Willie Mays Aikens and Mickey Mantle and Lary Sorensen and Larry Doby Johnson and Stan Javier, was named after another baseball star. His actual name is Charles Klein Stobbs.

4) Does the "right" man usually win the Cy Young Award, by this approach?

In recent years the Cy Young Award winner is usually the pitcher who is seen by this method as most deserving of the award—and, when they are not the same, the disagreements are always small. The Cy Young Award winners since 2009, as nominated by this method, would be:

2011 AL Justin Verlander

2011 NL Clayton Kershaw

2010 AL Felix Hernandez

2010 NL Roy Halladay

2009 AL Zack Greinke

2009 NL Tim Lincecum

All of these pitchers did in fact win the award, as this method says that they should have. In 2008 we have a discrepancy in one league; Cliff Lee won the award when this method says that it should have been Halladay, but that’s a minor discrepancy. Halladay finished second in the voting, and Lee finishes second in our analysis.

But in the early years of the voting, 1956-1962, this is anything but true. In that era there isn’t anyone who won the award and deserved it by this method, and many of the winners are not even good candidates for the award. I’ll look at the specifics of some of those awards later.

From 1956 to 1985 there were 49 Cy Young Awards, of which only 14 were won by the pitcher who would have been nominated for the award by this method (29%). But since 1986 there have been 52 Cy Young Awards, of which 29 have been won by the pitcher favored by this method (56%). Of course, we are only considering starting pitchers as candidates for the Award.

5) Improbable pitchers who might actually have been the best in the league.

Yes, Alex Trebek, I know that is not in the form of a question. Camilo Pascual in 1961 was 15-16 with a 3.46 ERA—but is shown by this method to have been the best pitcher in baseball (1. Pascual, 2. Koufax, 3. Juan Pizarro, 4. Jim O’Toole, 5. Whitey Ford.) The park factor in Pascual’s home park was 118, and he concentrated his innings (to an extent) against the strongest teams in the league, pitching 71 innings against New York and Detroit, the league’s two best teams, but only 34 innings against Kansas City (61-100) and Cleveland (78-83).

In 1982 Floyd Bannister was 12-13 with a 3.42 ERA, but may actually have been the best pitcher in the American League (1. Bannister, 2. Dave Stieb, 3. Jim Clancy, 4. Jim Palmer, 5. Dennis Eckersley.) His ERA was only 0.06 higher than the Cy Young winner (Vuckovich), and Bannister was the most sought-after free agent of the 1982-1983 off season. Four of the league’s five best pitchers in 1982 (Bannister, Stieb, Clancy and Eckersley) all pitched in extreme hitters’ parks, and all suffered from very poor offensive support, allowing the Cy Young Award to slip to an undeserving candidate. The park factor in Seattle (Bannister) was 120; in Milwaukee (Vuckovich) it was 85.

In 1996 Roger Clemens was 10-13 with a 3.63 ERA, and was criticized by his General Manager for being out of shape and not ready to pitch. This method shows Clemens to have been still the best pitcher in the American League (1. Clemens, 2. Pat Hentgen, who won the Cy Young Award, 3. Juan Guzman, 4. Kevin Appier, 5. Ken Hill.) The American League ERA in 1996 was 4.99, and the Park Factor in Boston was 113. Clemens’ raw ERA was the sixth-best in the American League.

There actually are strong parallels between Clemens in 1996 and Josh Beckett this year. Clemens in 1996 was a great pitcher, but people misrepresented things, took things out of context, and failed to appreciate what they had. The same this year with Josh Beckett; he is still actually pitching great, but there’s a smokescreen of bad press that is obscuring the performance.

Observation: great pitchers "should" win significantly more Cy Young Awards, looked at by this method, than they actually do.

Of the 101 Cy Young Awards, 48 have been won by pitchers who won multiple Cy Young Awards. If this method were used to determine the Cy Young winner, 73 awards would have been won by pitchers winning multiple awards.

The only pitchers to have won more than three Cy Young Awards are Roger Clemens (7), Randy Johnson (5), Steve Carlton (4) and Greg Maddux (4). If this method were used to determine the Award, Clemens would have won it 11 times, Johnson 6 times, Tom Seaver, Greg Maddux and Pedro Martinez 5 times each, and Nolan Ryan 4 times.

There are seven pitchers in history who never won a Cy Young Award, but who would have won multiple Cy Young Awards by this method: Nolan Ryan (4), Dave Stieb (3), Mario Soto (2), Jason Schmidt (2), Phil Niekro (2), Jim Bunning (2) and Camilo Pascual (2). Pascual’s would have been two-league awards, requiring that he be the best pitcher not only in his league, but in the majors. Bunning, in addition to being (arguably) the best pitcher in the majors in 1957 and the best in the National League in 1967, was also the best in the American League in 1960, despite being saddled with an 11-14 won-lost record in that season.

6) Who are some of the worst Cy Young selections ever, by this approach?

Well, there are the obvious ones (Steve Stone in 1980, Pete Vuckovich in 1982) which almost all of you already know about, so there’s not much really to be said about those. This method sees Steve Stone as the #13 starting pitcher in the American League in 1980, with Mike Norris being the best, and Vuckovich in 1982 as the #33 starting pitcher in the league. Some of the less-obvious sharp disagreements include:

1978 National League, Award going to Gaylord Perry. Gaylord actually deserved two Cy Young Awards (1972 and 1974) and was a strong candidate in many other seasons. But in 1978 he went 21-6 with a 2.73 ERA, and was given a Cy Young Award that resulted mostly from his having extremely good offensive support in an extremely good pitchers’ park. In 1978 Gaylord was 21-6, 2.73; in 1982 he was 10-12 with a 4.40 ERA for the Mariners—but this method shows him to have been essentially the same pitcher in 1982 (+59 in 32 starts) that he was in 1978 (+66 in 37 starts). The better candidates for the 1978 National League award were 1. Phil Niekro, 2. J. R. Richard, 3. Tom Seaver, 4. Steve Carlton, and 5. Craig Swan.

1967 National League Award going to Mike McCormick (first National League Award). We see McCormick as the 18th-best starting pitcher in the league, with the top five being 1. Jim Bunning, 2. Gaylord Perry, 3. Dick Hughes, 4. Ferguson Jenkins, 5. Gary Nolan.

1990 American League Award going to Bob Welch, who was credited with 27 wins. This one risks being obvious, so let me make a less obvious point. Welch’s season demonstrates that a modern pitcher could win 30 games, if everything broke right for him. Welch was credited with 27 wins—and he wasn’t even really a top-flight pitcher (+93 points.) If Verlander or Kershaw or Stephen Strasburg had a season with this kind of offensive support, he could be credited with 30 wins.

I believe that those five (McCormick, Perry, Stone, Vuckovich and Welch) are the only Cy Young Awards won by starting pitchers who were less than +100 in our system. No Cy Young Award has ever been won by a pitcher who was in fact less than an average pitcher.

7) What are some of the most interesting Cy Young insights, by this method?

Apologies for the vague, self-serving question. There are two other Cy Young races that seem to me particularly interesting in light of this method:

a) The very first Cy Young Award, to Don Newcombe in 1956. Newcombe went 27-7, walked only 46 men in 268 innings, and won the first Cy Young Award with 10 votes, over Sal Maglie (4), Whitey Ford (1) and Warren Spahn (1). Maglie, just 13-5, got Cy Young votes because he joined the Dodgers in mid-season with the Dodgers languishing in 3rd place, moved into the rotation, and went 11-2 down the stretch as the Dodgers pulled ahead. Don Newcombe was not only the Cy Young winner, but also the MVP.

Different methods see the race in different ways. Win Shares sees the most-deserving Cy Young candidate as Early Wynn (28 Win Shares), then Newcombe (27), Johnny Antonelli (25), Herb Score (25), Spahn (24), Tom Brewer (24), Bob Lemon (23), Whitey Ford (22) and Frank Lary (22). The Season Scores method sees the most-deserving candidate as Newcombe (340), then Wynn (301), Spahn (300), Score (290), Ford (283), and Lew Burdette (270).

Our new method picks Herb Score out of this list of fairly even candidates, and pitches him to the front of the list... solidly to the front of the list:

Year	First	Last	GS	Advantage
1956	Herb	Score	33	495
1956	Early	Wynn	35	363
1956	Whitey	Ford	30	275
1956	Billy	Pierce	33	257
1956	Don	Newcombe	36	257
1956	Jack	Harshman	30	254
1956	Warren	Spahn	29	231
1956	Tom	Brewer	31	227
1956	Bob	Friend	30	221
1956	Sal	Maglie	26	202

This method throws Score 238 points ahead of Newcombe—ahead almost two-to-one—and 132 points ahead of Early Wynn. This seems to come out of nowhere, and it poses the question: Have we been missing something here? Or is this system just a little bit nutty in this case?

I have also used Wynn and Score, in other contexts, to illustrate other points. Wynn and Score were teammates that year, which seems to take several issues out of play, and had the same won-lost record (20-9) and similar ERAs (2.53 for Score, 2.72 for Wynn.) While they are similar or the same on so many scales, they are different in other ways, such as along the power/finesse continuum.

Score. . .most of you already know this, but I have to make sure people can keep up. Score, although ignored in the Cy Young voting, was hardly an obscure player; he was a first-order sensation at the time. From 1947 through 1954, no major league pitcher had struck out 200 batters. You could lead the league with 180, even 160. Then along came Score, 21 years old in 1955, 22 years old in 1956, who struck out 245 batters in 1955, 263 in 1956. Almost everybody projected him as a Hall of Fame pitcher, once he refined his control a little bit. He was the next Bob Feller.

It’s difficult to explain how out of context Score is. Since 1956, totals of strikeouts have gone up, and up, and up; most of you know this. Thus, when you arrange pitchers along a power/finesse continuum (by strikeouts and walks), the list of "power" pitchers is entirely dominated by pitchers from the last fifteen years, except that, even 55 years later, Herb Score is still number one on the list. He got hurt the next year, in one of baseball’s most dramatic on-field injuries, became the Cleveland broadcaster, and is still very much remembered today, even though he won only 55 major league games, losing 46.

The obvious "anti-Score, pro-Newcombe" argument is:

1) This system is derivative of Game Scores, which are biased in favor of Power Pitchers, and

2) Newcombe was perhaps the greatest hitting pitcher of all time. This system denies him credit for his contributions with the bat.

The second point is true, but not really all that meaningful. Newcombe hit just .234 in 1956, 2 homers, 16 RBI, 12 walks, .654 OPS, whereas Score hit .184 with 1 homer, 8 RBI, 9 walks, 11 runs scored, .513 OPS. Newcombe was a great hitter and Score was a poor hitter, but if you focus just on the 1956 season, there really isn’t that much difference between them as hitters. . .five runs or thereabouts.

As to the "power pitcher" bias; first, I do not concede that Game Scores are biased in favor of a power pitcher. It is my belief that Game Scores have not an irrational preference for strikeouts as opposed to other outs, but rather, an entirely rational and justified preference for power pitchers.

But set that aside, and let’s assume that there is a prejudice in favor of the power pitcher here. Score had 263 strikeouts, true, whereas Newcombe had only 139, but Score also had 129 walks, whereas Newcombe had only 46. Game Scores credit a pitcher with one point for a strikeout, take away one point for a walk. Score is +134, strikeouts over walks; Newcombe is +93. That’s only a 41-point advantage for Score. Score’s edge on Newcombe, based on actual versus expected Game Scores, is 238 points. Something else is going on here, something unexpected.
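For readers who want to check that arithmetic, here is a minimal Python sketch. It uses only the strikeout and walk totals and the Advantage figures quoted in this article. The names `season_lines` and `k_minus_bb` are made up for the illustration, and calling the leftover "unexplained" is a rough reading on my part, not a piece of the method itself.

```python
# Strikeout and walk totals for 1956, as quoted in the text.
season_lines = {
    "Herb Score": {"strikeouts": 263, "walks": 129},
    "Don Newcombe": {"strikeouts": 139, "walks": 46},
}


def k_minus_bb(line):
    # Game Score gives +1 per strikeout and -1 per walk, so over a season
    # the net contribution of those two items is strikeouts minus walks.
    return line["strikeouts"] - line["walks"]


net = {name: k_minus_bb(line) for name, line in season_lines.items()}

# Score's edge on Newcombe from strikeouts and walks alone.
k_bb_edge = net["Herb Score"] - net["Don Newcombe"]

# Score's edge on Newcombe in the 1956 Advantage table (495 vs. 257).
total_edge = 495 - 257

# Rough leftover: points that strikeouts minus walks cannot account for.
unexplained = total_edge - k_bb_edge

print(net, k_bb_edge, total_edge, unexplained)
```

The leftover comes to 197 of the 238 points, which is the part that the opposition and run-context data below have to explain.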

Let’s get into the data. Don Newcombe in 1956 was 27-7 with a 3.06 ERA. Against teams with less-than-.500 records, he was 23-5 with a 2.65 ERA. Against teams with a better-than-.500 record, he was 4-2 with a 4.65 ERA.

Score, on the other hand, was 9-3 with a 3.14 ERA against sub-.500 teams, whereas he was 11-6 with a 2.25 ERA against .500+ teams. Big, big difference in who he pitched against.

This data is skewed against Newcombe, because there were only three .500+ teams in the National League and Newcombe played for the best of them, whereas there were five .500+ teams in the American League, the Indians and four others. Still, against the three worst teams in the American League, Score started 14 times and was 9-3. Against the three worst teams in the National League, Newcombe started 17 times and was 16-2 (one win in relief.)

Now let’s look at the run context. The Park Effect in Cleveland was 94; in Brooklyn, 103. But the American League ERA was 4.16; the National League ERA was 3.77. If you park-adjust the league ERAs, you get about 4.04 for the Indians, 3.82 for the Dodgers. Newcombe’s ERA was about 76 points better-than-context. Score’s was 151 points better.
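The adjustment can be sketched in a few lines of Python. The text does not spell out the formula, so the half-weighting below (a pitcher makes roughly half of his starts in his home park) is an assumption, and the function name is hypothetical. Under that assumption the sketch lands on the Cleveland figure and within a hundredth of a run of the Brooklyn figure.

```python
def park_adjusted_league_era(league_era, park_effect):
    # ASSUMPTION: only half of the park's deviation from 100 is applied,
    # because about half of a pitcher's starts come on the road.
    return league_era * (1 + (park_effect / 100 - 1) / 2)


# League ERAs and Park Effects as quoted in the text.
cleveland_context = park_adjusted_league_era(4.16, 94)
brooklyn_context = park_adjusted_league_era(3.77, 103)

# Score's 2.53 ERA against his context, in hundredths of a run ("points").
score_points_better = round((cleveland_context - 2.53) * 100)

print(round(cleveland_context, 2), round(brooklyn_context, 2), score_points_better)
```

Newcombe's margin is figured the same way, subtracting his ERA from the Brooklyn context.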

If we compare Score to his teammate Early Wynn, who had the same won-lost record. . .Wynn was 12-0 against teams with losing records, with a 1.65 ERA (Score was 9-3, 3.14). But against winning teams, Wynn was just 8-9 with a 3.49 ERA; Score (as mentioned) was 11-6 with a 2.25 ERA.

The average expected Game Score for Don Newcombe, in view of who he pitched against, the league and park norms, was 51.8. For Score, it was 49.3. So it turns out that, agree or disagree, there are some valid reasons why this method likes Score in 1956 better than it does Newcombe or Wynn. I kind of think, given a 1956 Cy Young ballot and this information, I might have to vote for Score.
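One way to put those expected figures next to the season totals is to work per start. The sketch below is Python and rests on an assumption that should be stated loudly: that a pitcher's season Advantage is the sum, over his starts, of actual Game Score minus expected Game Score. If that holds, the Advantage and games-started columns from the 1956 table imply an average actual Game Score for each man. The dictionary layout and function names are invented for the illustration.

```python
# Starts and season Advantage from the 1956 table; expected averages from the text.
pitchers = {
    "Herb Score": {"starts": 33, "advantage": 495, "expected_avg": 49.3},
    "Don Newcombe": {"starts": 36, "advantage": 257, "expected_avg": 51.8},
}


def per_start_advantage(p):
    # Average number of points above expectation in each start.
    return p["advantage"] / p["starts"]


def implied_actual_avg(p):
    # ASSUMPTION: Advantage = sum over starts of (actual - expected),
    # so the average actual Game Score is expected plus per-start advantage.
    return p["expected_avg"] + per_start_advantage(p)


for name, p in pitchers.items():
    print(f"{name}: {per_start_advantage(p):+.1f} per start, "
          f"implied average Game Score {implied_actual_avg(p):.1f}")
```

On that reading Score beat his expectation by 15 points a start, a little more than twice Newcombe's margin of about 7.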

b) The 1962 Award, which went to Don Drysdale. Our system thinks it should have gone to Bob Gibson.

Younger readers will think "Bob Gibson. . .so what; Gibson was always great." Those of you who are old enough to remember it will remember that Bob Gibson in 1962 was just 15-13, which was the best year he had had up to then, and people in 1962 still thought of Gibson as just a hard-throwing young pitcher who might be good once he harnessed his control.

Gibson was actually older than Drysdale, but six years later in getting established as a major league star. Our system does not dislike Drysdale; our system actually thinks that Drysdale should have won the Cy Young Award both in 1960 and in 1964, and also that he was the best starting pitcher in the National League in 1957. In 1962, however, our method sees Drysdale as the #5 candidate for the Cy Young Award, behind Gibson (405 points), Hank Aguirre (333), Sandy Koufax (310), and Camilo Pascual (272). Drysdale is at 269.

But this one, actually, is very straightforward compared to the 1956 contest. Drysdale pitched more innings than Gibson, and Drysdale had a 2.83 ERA, but in a pitchers’ park (Park Effect 82), whereas Gibson had essentially the same ERA (2.85) but in a hitters’ park (115). Traditional, pre-sabermetric evaluation rested on their won-lost records (25-9 for Drysdale, 15-13 for Gibson). Modern evaluation writes that off to offensive support, and looks more to the run context as the key to value. Both Drysdale and Gibson were outstanding hitters, but in 1962 Gibson hit much better than Drysdale, so that would push Gibson further ahead, if we integrated that into our analysis.

VI. The Greatest Careers

1. By this method, the top 30 pitchers within my data are as follows:

Rank	First	Last	Career Advantage
1	Roger	Clemens	7574
2	Randy	Johnson	6796
3	Nolan	Ryan	5561
4	Pedro	Martinez	5369
5	Tom	Seaver	5144
6	Bert	Blyleven	4295
7	Greg	Maddux	4290
8	Curt	Schilling	4235
9	Bob	Gibson	4057
10	Steve	Carlton	4042
11	Ferguson	Jenkins	3437
12	Mike	Mussina	3421
13	Gaylord	Perry	3303
14	John	Smoltz	3249
15	David	Cone	3126
16	Roy	Halladay	3039
17	Sandy	Koufax	3037
18	Jim	Palmer	3031
19	Phil	Niekro	2883
20	Kevin	Brown	2743
21	Johan	Santana	2658
22	Don	Sutton	2586
23	Luis	Tiant	2550
24	Bret	Saberhagen	2414
25	C.C.	Sabathia	2384
26	Jim	Bunning	2365
27	Juan	Marichal	2358
28	Dave	Stieb	2337
29	Don	Drysdale	2220
30	Kevin	Appier	2197

2. This method essentially draws a Hall of Fame line at about +2500 points. The weakest Hall of Famer for whom we have full-career data, by this method, is my childhood hero, Catfish Hunter (+1306). Drysdale, Marichal and Bunning are under 2500 points, but the system may discriminate to an extent against the pitchers of the 1960s (which is fine, since evaluation by raw statistics discriminates strongly in their favor.) Eckersley is under 2500 as a starter, but actually, not all that far under; if you put in his relief career he appears to be pretty solid. Luis Tiant is over 2500 points and not in the Hall of Fame, but that’s just a historical oversight; he should be in.

3. Recently retired pitchers. . .Roger Clemens, Randy Johnson, Pedro, Maddux, Schilling, Mussina and Smoltz are all over the Hall of Fame line, as are the less-obvious names David Cone and Kevin Brown. Saberhagen, Stieb, Appier and Guidry are not ridiculous candidates, but short. Chuck Finley, Jack Morris, Tom Glavine, Dwight Gooden, Mark Langston, and Frank Viola are behind them.

4. Active candidates. . .Roy Halladay and Johan Santana are both over the +2500 line, although I will remind you that this is a zero-sum method, so pitchers go up AND down. Sabathia, Oswalt and Hudson are short but in good shape, just a couple of years away or less (not sure if Oswalt is coming back or not.) Andy Pettitte is a substantial distance away, at 1,640. Verlander is burning it up, at 1,604; Beckett is not Hall-of-Fame qualified at 1,594, but is in a strong position. Tim Lincecum and Felix Hernandez were in super shape until they started pitching the way they are.
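The +2500 line from point 2 can be applied mechanically to the bottom of the career table. The Python sketch below does that for ranks 20 through 30 only, using the Career Advantage figures listed above. The line is described in the text as approximate, so a hard cutoff is a simplification, and `shortfall` is a hypothetical helper name.

```python
HALL_OF_FAME_LINE = 2500  # the approximate line named in the text

# Career Advantage for ranks 20-30 of the table above.
near_the_line = {
    "Kevin Brown": 2743, "Johan Santana": 2658, "Don Sutton": 2586,
    "Luis Tiant": 2550, "Bret Saberhagen": 2414, "C.C. Sabathia": 2384,
    "Jim Bunning": 2365, "Juan Marichal": 2358, "Dave Stieb": 2337,
    "Don Drysdale": 2220, "Kevin Appier": 2197,
}


def shortfall(points, line=HALL_OF_FAME_LINE):
    """Points still needed to reach the line (0 if already at or over it)."""
    return max(0, line - points)


over = [name for name, pts in near_the_line.items() if pts >= HALL_OF_FAME_LINE]
short = {name: shortfall(pts) for name, pts in near_the_line.items()
         if pts < HALL_OF_FAME_LINE}

print("Over the line:", over)
for name, gap in short.items():
    print(f"{name}: {gap} points short")
```

Because the method is zero-sum, a shortfall for an active pitcher like Sabathia can shrink or grow from one season to the next.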

Thanks for reading. If it wasn’t for you guys, I wouldn’t get to do this stuff for a living.

COMMENTS (20 Comments, most recent shown first)

KaiserD2
I enjoyed the piece very much, although I would have appreciated having the definition of a "game score" up front. I do however have a couple of quibbles about the methodology.
To begin with, Bill, you've gotten awfully interested in various "situational" issues in recent years, and I think their importance is overrated. Wins against weak teams count just as much in the standings as wins against strong teams. The 1954 Indians won 111 games while going 22-22 against the two best teams in the league. In any case, total won-loss records are so fickle, as we all know, that when you break them down into subsets they must become too fickle to take very seriously.
But secondly, this method rather cleverly slides around a critical change in baseball which affects the value of an individual pitcher: the drastic decline in innings pitched. In my opinion, win shares show that no pitcher of this era is nearly as valuable to his team as the great pitchers of the 1960s were to theirs, because they pitched well over 300 innings. This measurement appears to measure the ability to turn in a dominant performance, and properly gives more credit to the ability to do so against a better-hitting team or in a better-hitting era. But the role of individual pitchers in winning a pennant has, in my opinion, definitely declined.
7:56 PM Jul 3rd

toonarmy
Hey Bill,

As a Brit I don't really have any subjective opinions on the rankings, but I can see that this is great, great work. Did you consider also ranking on a per-game or per-start basis?
11:56 AM Jul 1st

sprox
This article is going to cause the Jack Morris Hall-of-Fame supporters to bring out their pitchforks and torches ... well it would if they were to actually read the article - so I think you're safe.

I'm guessing your analysis does not take into account "Fear" and "Ability to win baseball games"
10:02 PM Jun 22nd

flyingfish
Very nice piece; I'll have to go back and read it again, at least once. You know, I've been telling my baseball-watching friends for years that Pedro is certainly the greatest pitcher I have ever watched, and I've watched some great ones. There were a few years in there with the Red Sox where he dominated the league as nobody else I've seen did; I think his ERA was a full run below the next-best guy once or twice, which is amazing.
3:05 PM Jun 22nd

Florko
I would be interested in this same system for playoff pitchers
1:12 PM Jun 22nd

raincheck
This method picks up on what made Nolan Ryan seem like such a great pitcher. I used to love to go see him pitch in Anaheim. Sometimes he flamed out because of his wildness, but you had a very high percentage chance of seeing a dominating performance. A great value for your ticket price, if not necessarily for management.
12:36 PM Jun 22nd

Steven Goldleaf
Just a guess that Stobbs was named after Chuck Klein, and not after a family member or friend named "Klein"? If his official name were "Chuck Klein Stobbs" (as Mantle's real name is "Mickey" and Willie Mays Aikens' is "Willie") then I'd be more immediately convinced. Fabulous article.
11:19 AM Jun 22nd

ajmilner
Fantastic article, Bill.

Regarding the Tigers-A's 5/18/1912 game: Jack Coombs started that day for the A's. He threw three scoreless innings (he would be replaced by Boardwalk Brown and, later, 18-year old Herb Pennock, who had made his MLB debut just four days earlier) without allowing a baserunner and striking out three.* This would give Coombs a game score of 62, thus making a Game Score differential of 114.

* Info courtesy of the boxscore in the 5/19/12 Philadelphia Inquirer; the 1860-1922 Inquirer is available through the Free Library of Philadelphia's website.
8:44 PM Jun 21st

FPITAGNO
quite simply, the best article ive ever read on pitchers ive watched in my lifetime.....thank you for this work
8:00 PM Jun 21st

renny
Bill this article is like a veteran pitcher completely in control of every pitch, totally at one with his work with a balance of experience and skill, not spectacular like Ryan or Johnson but methodically merciless like Maddux....
5:03 PM Jun 21st

schwarze
Bill,

I understand that Greg Maddux was not a strikeout pitcher, but I thought his 1994 or 1995 seasons would at least make the top 10. Where did Maddux's best seasons fall?
4:13 PM Jun 21st

nettles9
Wonderful work, Bill. Wonderful.
3:59 PM Jun 21st

bobfiore
I'm a Dodger fan going back to the Walter Alston era and I don't remember Pat Perry.
3:06 PM Jun 21st

wovenstrap
Fascinating that Mike Scott's 1986 season pops up on that list -- in the Astrodome! You'd have thought that the correction would work the other way.
2:15 PM Jun 21st

chill
Great great great article.

I may have missed something, but on the list of all time greatest careers "that are in your database" does your database not include people like Walter Johnson, Lefty Grove, or did they just not rack up the high game scores because the game was different, with lower strikeouts?

If it's the latter, the list takes on an added interest, since the steady increase in strikeouts parallels the overall upward slope of the quality of play. Your list therefore provides a set of answers to the eternal question: if you could pick only one pitcher to win a game for you, who would it be?

Answer: you don't bring Walter Johnson forward in time. You don't even put your money on the 1960s Koufax or Gibson. The greatest pitchers ever pitched in the last few decades, by and large. And here's how they rank.
2:09 PM Jun 21st

jdw
Really fun piece, Bill.

The "Improbable pitchers who might actually have been the best in the league" section is something that almost deserves a longer amplified piece on its own, perhaps not just the best (i.e. #1) but candidate for it (Top 3).
1:31 PM Jun 21st

johnvgps
One of the things I was surprised by was that no pitcher had struck out more than 200 batters in a season between 1947-1954. Bob Feller immediately came to mind, but although he led the league in 1947 and 1948, it was with "just" 196 and 164 Ks.

Feller led the league in Ks for seven straight seasons if you count his 1941 and 1946 seasons as consecutive, with totals of 240, 246, 261, 260, 348(!), 196, and 164.

I'd like to see who the pre-1956 Cy Young award winners would have been by this method.
11:24 AM Jun 21st

Robinsong
Bill - Were you as surprised as me by the rating on Ryan's career?
10:42 AM Jun 21st

Robinsong
Thanks, Bill. One of the great articles; I love correcting Game Scores for context. I think we underappreciate this recently retired generation of pitchers; particularly since they will be compared with such all-time greats, Smoltz, Mussina, Brown, Cone, and Schilling will have a tough time getting in. We also overvalue saves.
10:40 AM Jun 21st

pgaskill
> If it wasn’t for you guys, I wouldn’t get to do this stuff for a living.

You deserve a raise. (Well, you've deserved one since the 1970s, but you know what I mean.)
9:44 AM Jun 21st
