1. The Bad Warren Spahn
A quick observation about Warren Spahn: Did you ever notice that his worst seasons are actually his best seasons? Between 1949 and 1963, Spahn had only three seasons in which he didn’t win 20 games: 1952, 1955 and 1962. In 1955 he went 17-14 with a 3.26 ERA in 246 innings; he was third in the National League in Innings Pitched and fourth in ERA. . .well, OK, that’s a terrible season; there is no way to defend that one. But the other two "bad" seasons, 1952 and 1962, are not only as good as his other seasons, they’re arguably better. In 1952 he was 14-19, but had easily the best strikeout to walk ratio of his career, 183 to 73. Well, "easily". . .his ratio was almost as good in 1956, but with fewer strikeouts and walks. I have learned from Tom Tango that the most meaningful indicator is the strikeout to walk margin—183 to 73, a margin of 110. His best in any other season was 80. He pitched 290 innings, his fifth-best total (although he also pitched exactly 290 in two other seasons). He gave up only 19 homers, fewer than he had given up in any of the three previous seasons. His ERA, 2.98, was the same as it had been the previous season, when he had won 22 games.
That’s 1952, but what got me started on this was 1962. Spahn won 20 games in 1956, 1957, 1958, 1959, 1960, 1961, and 1963, usually going about 21-11. In 1962 he went just 18-14. But the 1962 season is not only as good as the other seasons; it’s actually better. Lee Sinins’ Complete Baseball Encyclopedia has a summary column called "RSAA", or Runs Saved Against Average. It is park- and league-adjusted. Spahn is credited with 24 RSAA in 1962—more than he had in any other season after 1956.
The reason I started this, though, was a different study, my own study; I am merely citing Sinins to establish that I might not be crazy. I have done a very careful study of every start in my database (241,536 starts from 1952 to 2013), evaluating every start based on
1) The Season,
2) The Park,
3) The Opposition, and
4) The pitcher’s performance.
That is, whereas almost all evaluations of pitchers are based on season totals, implicitly assuming that the quality of opposition faced by one pitcher is the same as the quality of opposition faced by another, this evaluation looks at the teams the pitcher was matched up against. The conclusion of this approach is that Spahn did in fact pitch better in 1962 than in any other season in my data. In 34 starts in 1962 Spahn had 5 starts which score at "10" on a 10-point scale, 7 which score at "9", 3 which score at "8", and 6 which score at "7". 21 of 34 starts score at 7 or higher on a 10-point scale; only 8 score at 4 or lower. His average performance evaluation in 34 starts was 6.50—the highest of his career, in the data that I have—and he pitched well in 24 out of 34 starts, the highest percentage of his career.
2. Pedro Ramos, 1960
Pedro Ramos in 1960 finished 11-18 with a 3.45 ERA, about the same won-lost record that he had in 1958 (14-18), 1959 (13-19) or 1961 (11-20), although his ERA was better in 1960. According to this study, however, Ramos was actually the second most-valuable pitcher in the American League in 1960, behind another pitcher with a losing record, Jim Bunning. Bunning was 11-14 but with a great ERA and strikeout/walk ratio, and I've argued many times before that Bunning was the best pitcher in the league despite his 11-14 record. The argument about Ramos is new.
Ramos in 1960 made 36 starts, which led the American League. But what you wouldn’t know is who he started against. The four best offenses in the league in 1960 (park-adjusted) were the Yankees, the White Sox, the Orioles and the Indians; the four worst offenses were the Kansas City A’s, the Tigers, the Red Sox and the Senators. Ramos started 23 times against the "good" offenses, 13 times against the weak offenses. If you weight the won-lost record of each team by the number of starts Ramos made against them, the average quality of his opposition is .531. When you take that into account. . .he’s very nearly the best pitcher in the league.
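The .531 figure is a start-weighted average. Each opponent's winning percentage counts once for every start the pitcher made against that team. The Python sketch below shows only that arithmetic. The function name and the sample teams and numbers are hypothetical and chosen for illustration; they are not Ramos's actual 1960 splits.

```python
from typing import Mapping, Tuple


def weighted_opposition(starts_vs: Mapping[str, Tuple[int, float]]) -> float:
    """Return the start-weighted winning percentage of a pitcher's opponents.

    starts_vs maps an opponent to (starts made against it, its winning pct).
    """
    total_starts = sum(starts for starts, _ in starts_vs.values())
    if total_starts == 0:
        raise ValueError("no starts to weight")
    weighted_sum = sum(starts * pct for starts, pct in starts_vs.values())
    return weighted_sum / total_starts


# Hypothetical opponents and numbers, for illustration only.
example = {"Team A": (6, 0.630), "Team B": (4, 0.450), "Team C": (2, 0.400)}
print(f"{weighted_opposition(example):.3f}")
```

An opponent faced six times counts six times as heavily as one faced once. This is how a schedule tilted toward the Yankees, White Sox, Orioles and Indians pushes the figure above .500.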
3. Explaining My Study in Very General Terms
I called this the 10 Levels Study, but there were actually 11 levels. I sorted every start by every pitcher into one of 11 groups: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, or 0. There were the same number of starts at every level, so an average start was exactly 5.000000.
Not only was the average start at 5.00000, but an average start in every season was very near to 5.00. An average start in 1968, when the National League ERA was 2.98, would be very close to 5.00, and an average start in 2000, when the league ERA was 4.64, would be close to 5.00, because I adjusted for the runs scored in the league. An average start in Fenway Park would be about 5.00, and an average start in a pitcher’s park would be about 5.00, since I adjusted for the ballpark. And an average start against the Big Red Machine in 1976, or against the Yankees in 1998, would be about 5.00, and an average start against the 1962 Mets would be about 5.00, because I adjusted for the quality of the opposition’s offense.
Then I counted the number of times that each pitcher pitched at a "10" level, a "9" level, etc. There were 21,958 starts in each group, except that the "10" level and the "0" level only had 21,957, because 241,536 is not evenly divisible by 11.
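Every level holds the same number of starts, so the 0-to-10 scale amounts to ranking all adjusted starts and cutting them into eleven equal-count bins. Below is a minimal Python sketch. It assumes that ties are broken arbitrarily by sort order, since the text does not say how ties were handled. `level_sizes` and `assign_levels` are hypothetical names, not the author's code.

```python
def level_sizes(n_starts: int, n_levels: int = 11) -> list[int]:
    """Split n_starts into n_levels groups whose sizes differ by at most one.

    Leftover starts go to the interior levels, so the two extreme levels
    are the ones that come up short, as in the 241,536-start data set.
    """
    base, extra = divmod(n_starts, n_levels)
    first = (n_levels - extra) // 2  # first level that receives an extra start
    return [base + (1 if first <= lvl < first + extra else 0)
            for lvl in range(n_levels)]


def assign_levels(scores: list[float], n_levels: int = 11) -> list[int]:
    """Rank scores from worst to best and give each one a level 0..n_levels-1."""
    sizes = level_sizes(len(scores), n_levels)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    levels = [0] * len(scores)
    pos = 0
    for lvl, size in enumerate(sizes):
        for i in order[pos:pos + size]:
            levels[i] = lvl
        pos += size
    return levels


sizes = level_sizes(241_536)
mean = sum(lvl * n for lvl, n in enumerate(sizes)) / sum(sizes)
print(sizes, mean)
```

With 241,536 starts, 241,536 = 11 × 21,957 + 9. Nine levels therefore get 21,958 starts, and levels 0 and 10 get 21,957. Because the two short levels sit symmetrically at the ends, the mean level comes out at exactly 5.0.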
4. I Never Knew That
This is what I had never understood, until doing this study. . .and it is absolutely amazing that I never understood this, because it is an extremely fundamental truth about the game, which I had somehow unaccountably missed up until this point. Dominant pitchers almost never actually have bad games. I never knew that. Guys like Koufax, Carlton, Gibson, Pedro, the Big Unit, Gooden when he was good. . .they almost never actually have bad games. They lose sometimes, because sometimes they run up against another pitcher having an equally good day, and sometimes they give up a few runs because they may be pitching against a good team in a good hitter's park or something. But in terms of just having a bad day. . .they almost never do. Their Good Game/Bad Game percentage is actually very close to 1.000.
Suppose we count every start that scores at "6" or above as a "Good Game", and every start at "4" or below as a "Bad Game", and every start at "5" as a No Decision. An average Good Game percentage is .500. Randy Johnson in 1997 was 25-0. He did have four no-decisions—four games that scored at "5" on a 10-point scale—but no actual bad games.
Randy was the only pitcher who was perfect in more than 12 starts, but Pedro Martinez in various seasons was 27-1, 27-1, 28-3 and 24-4. Randy was 31-1 in 2001. Bob Gibson was 30-1 in 1968 and 30-3 in 1969. Greg Maddux in different seasons was 24-2, 23-2 and 25-4. Another pitcher, who I will discuss later, was 34-1 in one season; in other seasons he was 35-5 and 21-3. Sandy Koufax in his big four seasons was 32-8, 23-2, 35-5 and 34-5. Sometimes these guys lose, but it’s not because they don’t pitch well. They pitch well every time out.
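The Good Game/Bad Game records above follow a simple rule. A start at 6 or above counts as a Good Game, a start at 4 or below counts as a Bad Game, and a 5 is a no-decision that stays out of the percentage. The Python sketch below uses a hypothetical function name. The sample season is constructed only to match the shape described for Johnson in 1997 (25 good starts, 4 at exactly 5, none bad); the individual levels are made up.

```python
import math


def good_bad_record(levels: list[int]) -> tuple[int, int, int, float]:
    """Return (good, bad, neutral, good-game pct) for a list of 0-10 levels.

    6 or above is a Good Game, 4 or below a Bad Game, and 5 is treated
    like a no-decision, so it is left out of the percentage.
    """
    good = sum(1 for lv in levels if lv >= 6)
    bad = sum(1 for lv in levels if lv <= 4)
    neutral = len(levels) - good - bad
    decided = good + bad
    pct = good / decided if decided else math.nan
    return good, bad, neutral, pct


# Hypothetical 29-start season: 25 starts at 6 or better, 4 at exactly 5.
season = [8] * 25 + [5] * 4
print(good_bad_record(season))
```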
5. Starting on Two Days' Rest
While I was doing this, I started wondering about the disappearance of pitchers starting on short rest. . .when exactly did that stop being an option?
Since I had the data in front of me, I decided to count the number of times a pitcher started on two days' rest or less. (Wilbur Wood once famously started both games of a doubleheader, and Al Santorini, less famously, also did that, on May 26, 1971; he faced just one batter in the first game.)
Anyway, I counted pitchers starting on two days' rest or less. This happens most often, I discovered, after a pitcher has been knocked out early in the previous start. Also, I discovered that it is very difficult to determine exactly what constitutes being knocked out early in your previous start.
Let's start with a simple, unadorned count of how many pitchers started on two days' rest or less, by season:
| Decade | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 195- | | | 70 | 71 | 62 | 52 | 79 | 64 | 58 | 43 |
| 196- | 56 | 62 | 64 | 47 | 58 | 64 | 61 | 51 | 39 | 57 |
| 197- | 44 | 44 | 81 | 95 | 26 | 36 | 31 | 32 | 26 | 13 |
| 198- | 14 | 5 | 19 | 7 | 8 | 12 | 12 | 8 | 7 | 5 |
| 199- | 16 | 6 | 3 | 8 | 4 | 10 | 5 | 5 | 5 | 3 |
| 200- | 7 | 4 | 3 | 3 | 3 | 3 | 2 | 1 | 2 | 2 |
| 201- | 1 | 3 | 2 | | | | | | | |
The surge in these numbers in 1972-1973. . .that's the Chuck Tanner blip. I had kind of forgotten that entire episode; I was in the Army when it happened, and not able to follow the season with my usual energy. But Chuck Tanner with the 1972-73 Chicago White Sox went to essentially a three-man rotation, starting Wilbur Wood, Stan Bahnsen and Tom Bradley frequently on two days' rest (Wood, Bahnsen and Steve Stone in 1973). Bradley started 8 times on two days' rest in 1972, none in 1973; Bahnsen started 8 times on two days' rest in 1972 and 8 times in 1973, and Wilbur Wood started 14 times on two days' rest in 1971, 25 times in 1972 and 19 times in 1973. Steve Stone in 1973 started six times on short rest.
It wasn't all Tanner; it was in the air. Starting pitchers on short rest had been on the way out since my data set begins in 1952, but the practice came back for a couple of years. Paul Splittorff started four times on short rest in 1973, Phil Niekro 9 times; Mickey Lolich did it 3 times in 1971, 2 in 1972, and 3 in 1973. Bert Blyleven started 3 times on short rest in 1972.
That was the last gasp of that strategy; it died quickly after that. Above I gave you the counts of pitchers starting on two days' rest (or less), but about half of those are pitchers who were knocked out early in their previous start. . .not Wilbur Wood, obviously, but many of the others. These are the counts of pitchers starting on two days' rest or less after having faced 15 or more batters in the previous start:
| Decade | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 195- | | | 46 | 46 | 34 | 29 | 42 | 31 | 28 | 20 |
| 196- | 26 | 28 | 36 | 23 | 26 | 26 | 33 | 29 | 23 | 26 |
| 197- | 23 | 31 | 63 | 78 | 15 | 19 | 16 | 11 | 8 | 3 |
| 198- | 7 | 1 | 6 | 3 | 3 | 5 | 4 | 1 | 2 | 2 |
| 199- | 6 | 2 | 1 | 2 | 2 | 3 | 2 | 2 | 1 | 1 |
| 200- | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 201- | 0 | 0 | 0 | | | | | | | |
If you work the data, you can see that in the 1950s, 45% of pitchers who started on two days' rest or less had been knocked out early (fewer than 15 batters faced) in their previous starts. In the 1960s this increased to 51%. In the 1970s, because of the Chuck Tanner blip, it decreased to 38%. Tanner was not starting pitchers on two days' rest after they were knocked out early; he was doing it because he had very few good pitchers. But in the 1980s, 65% of pitchers who started on short rest had been knocked out early in their previous start; in the 1990s, 66%. Since 2001, every pitcher who has started on two days' rest has been knocked out early in the previous start.
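The decade percentages come from comparing the two season-by-season tables in this section. For each decade, the knocked-out-early share is the total of all short-rest starts minus the starts that followed an outing of 15 or more batters, divided by the total. A short Python sketch of that arithmetic, using the counts from the two tables (the function name is hypothetical):

```python
# Season counts by decade, in year order, taken from the two tables.
# all_short: every start on two days' rest or less.
# after_long: the subset whose previous start lasted 15 or more batters.
all_short = {
    "1950s": [70, 71, 62, 52, 79, 64, 58, 43],  # data begins in 1952
    "1960s": [56, 62, 64, 47, 58, 64, 61, 51, 39, 57],
    "1970s": [44, 44, 81, 95, 26, 36, 31, 32, 26, 13],
    "1980s": [14, 5, 19, 7, 8, 12, 12, 8, 7, 5],
    "1990s": [16, 6, 3, 8, 4, 10, 5, 5, 5, 3],
    "2000s": [7, 4, 3, 3, 3, 3, 2, 1, 2, 2],
}
after_long = {
    "1950s": [46, 46, 34, 29, 42, 31, 28, 20],
    "1960s": [26, 28, 36, 23, 26, 26, 33, 29, 23, 26],
    "1970s": [23, 31, 63, 78, 15, 19, 16, 11, 8, 3],
    "1980s": [7, 1, 6, 3, 3, 5, 4, 1, 2, 2],
    "1990s": [6, 2, 1, 2, 2, 3, 2, 2, 1, 1],
    "2000s": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}


def knocked_out_share(all_starts: list[int], long_prev: list[int]) -> float:
    """Share of short-rest starts that followed a start of fewer than 15 batters."""
    total = sum(all_starts)
    return (total - sum(long_prev)) / total


shares = {d: round(100 * knocked_out_share(all_short[d], after_long[d]))
          for d in all_short}
print(shares)
```

The 1950s through 1990s reproduce the 45, 51, 38, 65 and 66 percent figures. For the 2000s, 29 of 30 short-rest starts followed a short outing, and the single exception came in 2000.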
6. Stieb and Spahn
It is true, but sad, that wins play a huge role in deciding who gets into the Hall of Fame. That's why Dave Stieb, for instance, who was more valuable in his four or five best seasons tha(n) Warren Spahn was, will never get anywhere near the Hall.
David Kaiser
(Commenting on Dave Fleming article, Defending the Win)
Well. . .I would sort of two-thirds agree with that; maybe three-fourths. Dave Stieb, according to this study (and in concert with many previous studies) was probably the best starting pitcher in the American League in three consecutive seasons, and may have deserved as many as four Cy Young Awards. He won none. He was the best starting pitcher in the American League in 1983; the Cy Young Award went to LaMarr Hoyt, at that time not yet a criminal. He was the best starting pitcher in the league in 1984; the Cy Young Award—and the MVP—went to a reliever, Willie Hernandez. He was the best starting pitcher in the American League in 1985; the Cy Young Award went to Bret Saberhagen.
My method does not show Stieb as the best pitcher in the American League in 1982; it ranks him third, behind Floyd Bannister and Rick Sutcliffe. Still, Stieb may have deserved the Cy Young Award that year, too; certainly he was better than Ugly Pete, who won the award, and he does rank first by other methods, and good methods. In 1981 Stieb was probably the fourth-best starting pitcher in the American League, behind Steve McCatty, Dave Righetti and Jack Morris; the Cy Young Award—and the MVP—went to a reliever, Rollie Fingers. After 1985 Stieb was not the same, although he was about the 6th-best starting pitcher in the American League in 1990.
I have to agree, then, that Stieb was the best starting pitcher in the American League in his era, and I have to agree that he was denied Cy Young Awards that he deserved because of his won-lost records. I will also agree generally that the influence of won-lost records on award voting has been excessive and pernicious; in that debate I switch teams more often than Edwin Jackson, but at the moment I am on the team opposed to Won-Lost records.
I also have to agree, reluctantly, that Stieb was more dominant in his best years than Warren Spahn was, at least in the data that I have. The best season that Spahn had, in my data, was 1962. I have several different indicators of overall value coming out of this study, but Stieb beats Spahn’s best seasons by every indicator in 1983, 1984 and 1985, and by some indicators in 1982 as well. Stieb was in fact more dominant than Spahn, at least the Warren Spahn covered in my data.
Stieb was more dominant than Spahn, but not much more dominant than Spahn; he was better than Spahn by a thin margin, whereas Spahn was outstanding for more than twice as long. Stieb was more dominant than Spahn, but his level of dominance does not approach that of many other pitchers in the data—Seaver, Clemens, Gibson, Pedro, Big Unit, Carlton, etc.
But as to the suggestion that Stieb has been denied Hall of Fame consideration because of his Won-Lost records. . .absolutely not. Dave Stieb does not remotely approach the standard of a Hall of Fame pitcher, and it would be absurd for him to be considered for that honor. Stieb was the best pitcher in the American League at a time when the American League was desperately short of outstanding pitchers.
Stieb made 412 starts in his career, which is a short career for a Hall of Fame candidate. There are Hall of Famers in that range and below, like Koufax and Dazzy Vance, but those are pitchers who were much more dominant than Stieb was. Pedro Martinez and Roy Halladay are deserving Hall of Famers in about that number of starts. But in his 412 starts, Stieb is extremely comparable to four other pitchers who had about the same number of starts: Larry Jackson, Kevin Appier, Mark Langston and Steve Rogers. Let’s look first at games started:
Mark Langston, 428
Larry Jackson, 420
Dave Stieb, 412
Kevin Appier, 402
Steve Rogers, 393
Larry Jackson actually made 429 starts, not 420; I am missing data for nine starts from early in his career. Anyway, let’s break those down into "Good Starts" and "Bad Starts". Stieb in his career had 231 "Good Starts"—considering where he was pitching and who he was pitching against—153 "Bad Starts", and 28 starts which are kind of neutral. All of these other pitchers had similar data:
| First | Last | Good | Poor | Good Start Pct |
|---|---|---|---|---|
| Kevin | Appier | 221 | 142 | .609 |
| Mark | Langston | 234 | 153 | .605 |
| Dave | Stieb | 231 | 153 | .602 |
| Steve | Rogers | 222 | 153 | .592 |
| Larry | Jackson | 217 | 151 | .590 |
That chart reduces to three columns the data that I took great pains to produce in eleven columns, so let’s deal with all eleven. In the chart below the "10s" are the outstanding starts, and the "0s" are the extremely poor starts:
| First | Last | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kevin | Appier | 59 | 49 | 42 | 35 | 36 | 39 | 34 | 43 | 22 | 14 | 29 |
| Mark | Langston | 65 | 47 | 45 | 39 | 38 | 41 | 30 | 31 | 35 | 24 | 33 |
| Dave | Stieb | 56 | 60 | 46 | 36 | 33 | 28 | 26 | 50 | 24 | 36 | 17 |
| Steve | Rogers | 61 | 39 | 52 | 42 | 28 | 18 | 45 | 35 | 28 | 24 | 21 |
| Larry | Jackson | 61 | 45 | 41 | 34 | 36 | 52 | 26 | 39 | 26 | 26 | 34 |
That chart in that form doesn't show you very much; it is merely a necessary step toward the analysis that will follow. Stieb had no more outstanding starts than these other pitchers, although he did have fewer truly terrible starts. Let's look at the average start value for these pitchers:
| First | Last | Count | Total | Average |
|---|---|---|---|---|
| Steve | Rogers | 393 | 2294 | 5.84 |
| Kevin | Appier | 402 | 2346 | 5.84 |
| Dave | Stieb | 412 | 2396 | 5.82 |
| Mark | Langston | 428 | 2446 | 5.71 |
| Larry | Jackson | 420 | 2356 | 5.61 |
No indication there that Stieb is better than the other guys. To the best of my knowledge that’s actually a good way to look at the data. Each step up in this data represents about the same gain in wins for your team as every other step up; therefore a simple average is a valid way to process the data, although it is not the only option. Another way to look at the data is to compare the players to the average, which is 5.0000:
| First | Last | Count | Total | Average | Plus |
|---|---|---|---|---|---|
| Kevin | Appier | 402 | 2346 | 5.84 | 336 |
| Dave | Stieb | 412 | 2396 | 5.82 | 336 |
| Steve | Rogers | 393 | 2294 | 5.84 | 329 |
| Mark | Langston | 428 | 2446 | 5.71 | 306 |
| Larry | Jackson | 420 | 2356 | 5.61 | 256 |
Appier and Stieb are each 336 "steps" above an average pitcher, in their careers. We could also do Performance Above Replacement, assuming that a Replacement-Level pitcher would generally deliver about a "3":
| First | Last | Count | Total | Average | Plus | Above Replacement |
|---|---|---|---|---|---|---|
| Mark | Langston | 428 | 2446 | 5.71 | 306 | 1162 |
| Dave | Stieb | 412 | 2396 | 5.82 | 336 | 1160 |
| Kevin | Appier | 402 | 2346 | 5.84 | 336 | 1140 |
| Steve | Rogers | 393 | 2294 | 5.84 | 329 | 1115 |
| Larry | Jackson | 420 | 2356 | 5.61 | 256 | 1096 |
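Every number in the Count, Total, Average, Plus and Above Replacement charts, and in the Good/Poor split, can be rebuilt from a pitcher's eleven level counts. Total is the sum of each level times its number of starts. Plus is Total minus 5 per start, and Above Replacement is Total minus 3 per start. The Python sketch below applies this to Stieb's and Appier's rows from the eleven-column chart; the function name is hypothetical.

```python
def career_summary(counts_10_to_0: list[int], average: int = 5,
                   replacement: int = 3) -> dict:
    """Summarize a pitcher from his start counts at levels 10, 9, ..., 0."""
    by_level = dict(zip(range(10, -1, -1), counts_10_to_0))  # chart column order
    count = sum(by_level.values())
    total = sum(lvl * n for lvl, n in by_level.items())
    good = sum(n for lvl, n in by_level.items() if lvl >= 6)
    poor = sum(n for lvl, n in by_level.items() if lvl <= 4)
    return {
        "count": count,
        "total": total,
        "average": round(total / count, 2),
        "plus": total - average * count,                 # steps above a 5.00 pitcher
        "above_replacement": total - replacement * count,  # steps above a "3"
        "good": good,
        "poor": poor,
        "good_pct": round(good / (good + poor), 3),      # 5s are left out
    }


stieb = career_summary([56, 60, 46, 36, 33, 28, 26, 50, 24, 36, 17])
appier = career_summary([59, 49, 42, 35, 36, 39, 34, 43, 22, 14, 29])
print(stieb)
print(appier)
```

Both rows reproduce the charts exactly. This is a useful consistency check that the eleven-column data and the summary tables agree.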
Dave Stieb pitched most of his career for very good teams; he actually pitched for much better teams than these other four pitchers. Stieb came up in 1979; the Blue Jays got to about .500 in 1982, and were always over .500 after 1982. Appier spent all of his best years pitching for Kansas City Royals teams that were almost universally miserable. Rogers, like Stieb, came up with an expansion team, but Rogers pitched for losing teams for the first six years of his major league career. Larry Jackson pitched for the Cardinals while they were in a down phase (they had losing records in 1955, 1956, 1958 and 1959) and, when the Cardinals started to improve, was traded to the Cubs, where he pitched for losing teams in 1964, 1965 and a 100-loss team in 1966. Langston came up with the Mariners in 1984, and pitched for five years (1984-1988) for Seattle teams that were never within 10 games of .500, pitched one year for a .500 team in Montreal, then pitched the second half of his career with Angels teams that were usually under .500 and never very far above .500. (They were under .500 in 1990, 1992, 1993, 1994 and 1996. They were exactly .500 in 1991, and a few games over .500 in 1995 and 1997.)
Dave Stieb, as a Hall of Fame candidate, is no better than Kevin Appier, Mark Langston, Steve Rogers or Larry Jackson, and there are other pitchers in that class—pitchers less perfectly comparable to Stieb, but still generally comparable to Stieb. That group includes Doc Gooden, John Candelaria, Jimmy Key, Orel Hershiser, Andy Benes, Javier Vazquez, Milt Pappas, Fernando Valenzuela and David Wells. These are the Good Start/Poor Start ratios for those pitchers:
| First | Last | Count | Good | Poor | Good Start Pct |
|---|---|---|---|---|---|
| John | Candelaria | 356 | 200 | 120 | .625 |
| Dwight | Gooden | 410 | 234 | 141 | .624 |
| Jimmy | Key | 389 | 220 | 138 | .615 |
| Orel | Hershiser | 466 | 262 | 165 | .614 |
| Andy | Benes | 387 | 210 | 137 | .605 |
| Dave | Stieb | 412 | 231 | 153 | .602 |
| Javier | Vazquez | 443 | 231 | 175 | .569 |
| Milt | Pappas | 462 | 241 | 185 | .566 |
| Fernando | Valenzuela | 424 | 213 | 170 | .556 |
| David | Wells | 489 | 232 | 205 | .531 |
OK, Javier Vazquez is a special case; there are unique problems with his career, so let’s take him out of the group so that we don’t get hung up arguing about him. This chart gives the Average Performance Levels, the .500 Plus numbers, and the "Above Replacement Level" numbers for this group:
| First | Last | Average | Above .500 | Above Replacement |
|---|---|---|---|---|
| Dwight | Gooden | 5.84 | 344 | 1164 |
| Dave | Stieb | 5.82 | 336 | 1160 |
| John | Candelaria | 5.72 | 257 | 969 |
| Jimmy | Key | 5.67 | 262 | 1040 |
| Andy | Benes | 5.59 | 230 | 1004 |
| Orel | Hershiser | 5.57 | 266 | 1198 |
| Milt | Pappas | 5.46 | 214 | 1138 |
| Fernando | Valenzuela | 5.38 | 163 | 1011 |
| David | Wells | 5.31 | 150 | 1128 |
But there is another class of pitchers who are distinctly above these pitchers, and who aren’t going to go into the Hall of Fame, either. That class includes David Cone, Kevin Brown, Bret Saberhagen, Luis Tiant, Mickey Lolich, Chuck Finley and Vida Blue. Let’s begin by looking at their Good Start/Poor Start ratios:
| First | Last | Starts | Good | Poor | Good Start Pct |
|---|---|---|---|---|---|
| David | Cone | 419 | 260 | 120 | .684 |
| Kevin | Brown | 476 | 286 | 145 | .664 |
| Bret | Saberhagen | 371 | 211 | 122 | .634 |
| Luis | Tiant | 484 | 275 | 170 | .618 |
| Dave | Stieb | 412 | 231 | 153 | .602 |
| Mickey | Lolich | 496 | 276 | 183 | .601 |
| Chuck | Finley | 467 | 249 | 168 | .597 |
| Vida | Blue | 473 | 254 | 179 | .587 |
Cone, Brown and Tiant made more starts than Stieb and with a higher percentage of effective starts. Saberhagen made 10% fewer starts but with a significantly better percentage of good starts, while Lolich, Finley and Blue made significantly more starts with a percentage only slightly lower. Let’s look at the averages:
| First | Last | Average |
|---|---|---|
| David | Cone | 6.24 |
| Kevin | Brown | 6.04 |
| Bret | Saberhagen | 6.03 |
| Luis | Tiant | 5.89 |
| Dave | Stieb | 5.82 |
| Mickey | Lolich | 5.78 |
| Vida | Blue | 5.70 |
| Chuck | Finley | 5.68 |
The margin above average:
| First | Last | Above .500 |
|---|---|---|
| David | Cone | 518 |
| Kevin | Brown | 495 |
| Luis | Tiant | 432 |
| Mickey | Lolich | 387 |
| Bret | Saberhagen | 382 |
| Dave | Stieb | 336 |
| Vida | Blue | 333 |
| Chuck | Finley | 316 |
And the margin above replacement level:
| First | Last | Above Replacement |
|---|---|---|
| Kevin | Brown | 1447 |
| Luis | Tiant | 1400 |
| Mickey | Lolich | 1379 |
| David | Cone | 1356 |
| Vida | Blue | 1279 |
| Chuck | Finley | 1250 |
| Dave | Stieb | 1160 |
| Bret | Saberhagen | 1124 |
It was suggested in Dave Fleming’s recent article that David Cone pitched disproportionately against sub-.500 teams, but this method fully and absolutely takes that issue off the table, by adjusting each start for the quality of the competition. I would vote for any of this last group of pitchers for the Hall of Fame before I would even consider voting for Dave Stieb.
7. More About How the Method Works
Each start by every starting pitcher is evaluated essentially by the Game Score, and by a Compensation Score that gives the pitcher more credit for pitching in a high-ERA league than in a low-ERA league, more credit for pitching in a hitter’s park than in a pitcher’s park, and more credit for pitching against a strong offense than for pitching against a weak offense. The sum of these adjustments is called the Compensation, and the total of the Game Score and the Compensation Score is called the Package Score.
There’s a little more to it than that; in some cases I didn’t trust the Game Score system, either. There are a couple of "outs" or "dodges" in the system; I’ll explain those later. First, let me explain the Compensation Scores. To begin with, I had to study (again) how Game Scores vary with the Run Environment. What is the normal Game Score in a 5.50 Run Environment? What is it in a 2.75 Run Environment?
Well, that's not the first set of questions I had to deal with. The first set of questions that I had to deal with was, "What is the run-scoring level for each league? What is the Park Factor for each Park? What is the (Park-Adjusted) strength of each team's offense?" I had to find or create answers to each of those questions, then I had to put those answers into the data file that had the records of each game, then I had to combine the League Run Level and the Park Adjustment into a single number so that I could evaluate the "Run Context" for every game. Once I had done that, then I could figure the normal Game Score for every Run Context.
My conclusion: A Game Score of 50.000 is the norm at a Run Context Level of 4.68 Runs Per Game.
For each run per game above or below 4.68, the average Game Score moves 3.5 points in the opposite direction. In other words, at a run context of 3.7 runs, the average Game Score is 53.5; at 5.7 runs per game, the average Game Score is 46.5.
The expected Game Score at any run level can be approximated by the formula (10 minus Expected Runs) * 3.50 + 31.5. In other words, if the expected Run Level is 7.00, the expected Game Score is 42. If the expected Run Level is 6.00, the expected Game Score is 45.5. If the expected Run Level is 5.00, the expected Game Score is 49. If the expected Run Level is 4.00, the expected Game Score is 52.5. If the expected Run Level is 3.00, the expected Game Score is 56.0. If the pitcher pitches better than expected, that's a Good Start.
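The rule of thumb for the expected Game Score is a straight line: 3.5 points per run, running opposite to the run level. The first function below implements it directly. The second function, `package_score`, is an assumption and not a formula stated in the text. It treats the Compensation as 50 minus the expected Game Score, which is one plausible reading of "Game Score plus Compensation." Both names are hypothetical.

```python
import math


def expected_game_score(run_context: float) -> float:
    """Approximate average Game Score at a given runs-per-game context.

    Rule of thumb from the text: (10 - runs) * 3.5 + 31.5.
    """
    return (10.0 - run_context) * 3.5 + 31.5


def package_score(game_score: float, run_context: float) -> float:
    """ASSUMPTION: Compensation = 50 - expected Game Score for the context.

    Under this reading, a start in a high-scoring context earns extra credit
    and a start in a low-scoring context loses some.
    """
    compensation = 50.0 - expected_game_score(run_context)
    return game_score + compensation


for runs in (7.0, 6.0, 5.0, 4.0, 3.0):
    print(runs, expected_game_score(runs))
```

At 4.68 runs, the line gives about 50.1 rather than exactly 50. That small gap is consistent with the text calling the formula an approximation.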
Expected Run Levels in the study go as high as 10.7 runs, and as low as 2.045 runs. That 10.7 runs sounds crazy, but. . .do the math. That's the number for Coors Field in the year 2000, facing the San Francisco Giants. The National League averaged 5.004 runs per game in 2000. The Park Factor for Colorado was 166, meaning that the Rockies scored and allowed 66% more runs per game at home than they did on the road.
We have to modify that figure slightly to recognize that the Rockies were the only National League team which did not have Coors Field among their "road" venues, so the functional park adjustment for Colorado is not 1.660, but 1.587; it actually expanded offense relative to the league average not by 66% but by 59%. Still, that expands the offensive context for Colorado, 2000, to 7.944 runs per team per game.
The best offense in the league was the San Francisco Giants, led by Barry Bonds and the MVP, Jeff Kent. Three regulars on that team had an OPS over 1.000 (Bonds, Kent and Ellis Burks), and they played in what was almost the worst hitter's park in the league, with a park factor of 83. Park-adjusted, the Giants' offense was about 36% better than the league average; let's call it 35.88%. So if you expand 7.944 runs per game by 35.88%, you've got 10.794 runs per game.
I see that my data file has them at "only" 10.67 runs per game, so I must have made some other little adjustment somewhere that I have now forgotten. In real life, the Giants scored only 8.29 runs per game in Coors Field in 2000, but then, they LOST six of the seven games. Let me repeat that: the Giants in 2000 scored 8.29 runs per game in Coors Field, and lost six out of seven games. The Rockies outscored them by four runs per game, at 12.29.
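The Coors Field figure is built by multiplication. Start with the league's runs per game, scale it by the functional park factor, and then scale again by the opponent's park-adjusted offense. A minimal Python sketch using the numbers quoted for Colorado and San Francisco in 2000 (the function name is hypothetical):

```python
def run_context(league_rpg: float, park_factor: float,
                opponent_offense: float) -> float:
    """Expected runs per game: league level x park factor x opponent offense."""
    return league_rpg * park_factor * opponent_offense


# Figures quoted in the text: 2000 NL at 5.004 R/G, functional Coors factor
# 1.587, Giants' offense 35.88% above league average.
park_only = run_context(5.004, 1.587, 1.0)
vs_giants = run_context(5.004, 1.587, 1.3588)
print(f"{park_only:.3f} {vs_giants:.3f}")
```

This gives about 7.941 and 10.79. Both are a hair below the 7.944 and 10.794 in the text, because the park factor is quoted rounded to 1.587.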
If you pitch a 2-hit shutout, that’s a great game in any environment; if you give up 7 runs in 4 innings, that’s a lousy game in any environment. Between the extremes, though, the Run Environment plays a very significant role in how the performance is evaluated.
One game in 11 is evaluated by this system as a "10", meaning that, under the circumstances, this pitcher pitched a hell of a game. 21,957 games are evaluated as "Ten" performances. Remember now that I am not talking about what is normal or typical; I am talking now about what is extremely atypical, extremely unusual.
The worst Game Score that was evaluated as a "10", in this study, was a game pitched by Pedro Astacio of the Rockies in Coors Field on May 10, 1999, not against the Giants but against the Mets, also a good-hitting team in 1999. Astacio pitched 8 innings, giving up 7 hits, 3 runs (3 earned) and 5 walks, with 7 strikeouts. That's a Game Score of 58, and a Game Score of 58 would ordinarily be evaluated as a "6" or a "7". But facing that team, in that park, in that season, that's a tremendous effort. The Rockies won the game 10 to 3.
On the other side of the coin, Ken Forsch of the Houston Astros in 1971 had a game in which he went 8 innings and gave up 9 hits, 2 runs (2 earned) and 2 walks, with 5 strikeouts. Superficially that is a better game than Astacio's, with a Game Score of 59, but in this study it is marked down as a "5", a neutral start under the circumstances. The National League average in 1971 was less than four runs per game (3.91). The Astrodome was by far the best pitcher's park in the league, reducing offense by another 17 or 18%. The opponent, the Padres, had a terrible offense, losing 100 games and being last in the National League in runs scored by more than 50. Under those circumstances, a Game Score of 59 is a very humdrum performance.
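The text quotes Game Scores without spelling out the formula. The Python sketch below uses the standard published Game Score weights: start at 50, add 1 per out, 2 per completed inning after the fourth and 1 per strikeout, and subtract 2 per hit, 4 per earned run, 2 per unearned run and 1 per walk. Under those weights, the Astacio and Forsch lines come out to the 58 and 59 quoted above. The function name is hypothetical.

```python
def game_score(outs: int, hits: int, earned_runs: int, unearned_runs: int,
               walks: int, strikeouts: int) -> int:
    """Standard Game Score: start at 50, then add or subtract per event."""
    innings_after_fourth = max(0, outs // 3 - 4)  # completed innings beyond the 4th
    return (50 + outs + 2 * innings_after_fourth + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks)


# 8 innings = 24 outs in both games.
astacio = game_score(outs=24, hits=7, earned_runs=3, unearned_runs=0,
                     walks=5, strikeouts=7)
forsch = game_score(outs=24, hits=9, earned_runs=2, unearned_runs=0,
                    walks=2, strikeouts=5)
print(astacio, forsch)
```

The raw scores are nearly identical. The run context, not the box-score line, is what separates a "10" from a "5."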
Two more things. I told you that, in this study, I didn't always trust the Game Score system. I modified the results above in two ways. First, if a pitcher pitched nine or more innings in a game and allowed no runs, earned or unearned, that start was always scored as a "10", no matter what the other details were.
I believe in the Game Score system; that is to say, I believe it is appropriate to look at the details of the performance, in asking how well a pitcher has pitched, as well as looking at the bottom line. But a shutout in baseball is an absolute, and absolutes are not relative. If you give up no runs in a game, you absolutely cannot lose. If you pitch 9 innings in a game, you absolutely cannot be expected to do more. In a 9-inning game, you can’t do better than pitch a shutout; you can do it more impressively, but you can’t do any better.
On a zero-to-ten scale, I just would not be comfortable scoring a shutout as anything other than a "10", no matter what the other details were. The least impressive shutout in my data was a game pitched by Dave Freisleben of the San Diego Padres on May 29, 1976. Freisleben gave up 10 hits in that game, walked 4 batters and struck out only 3. The game was pitched in a league in which the ERA was low (3.50) and in a park in which the Park Run Factor was 76, which is extremely low. The game was pitched against the San Francisco Giants, a poor-hitting team in 1976. Altogether. . .not that impressive.
Still, the man pitched a shutout. You can’t do any better. This rule—that a shutout has to be scored as a "10"—has little practical effect in our system, because
a) The overwhelming majority of complete game shutouts (88%) would be scored as "10s" anyway, even without this rule, and
b) Those which wouldn’t be scored as 10s would almost all be scored as "9s".
Getting rid of this rule would have very limited effect on how pitchers were evaluated; it would merely make small changes in how 4/10ths of one percent of the games were evaluated. I just wouldn’t be comfortable not giving a pitcher who pitched a shutout a solid A.
The other little retreat from Game Scores has to do with pitchers who are knocked out of the game early. In the Game Score system, a pitcher starts the game at "50", and goes up or down depending on whether he gets outs or gives up hits, walks and runs. Given the general goal of the Game Score system, to mirror the impact of the pitcher’s performance on the won-lost expectation of his team, that seems like the right approach.
For this particular purpose, it seems problematic. If a pitcher takes the mound, retires the first batter, and then has to come out of the game, that's a Game Score of 51. If he walks the first batter and then has to come out of the game, that's a Game Score of 49.
But when a team puts a pitcher on the mound, they're counting on him to get some outs. If he doesn't get anybody out, that's not an "average" performance; that's a disappointing turn of events.
I decided, in this study, to mark down pitchers who are knocked out of the game in the first five innings, by subtracting one point for each out they fall short of 15. The pitcher who retires the first batter and then comes out of the game, who otherwise would be at 51, which might be a Level 5 game, now goes down to 37, which would be more likely to be a Level 3 game. The pitcher who walks the first batter and then comes out drops from 49 to 34.
Again, not a huge impact on the system. This rule affects many more games than the other rule. The "shutout" rule affects 1,078 games; the "short outing" rule affects 49,177 games, although it probably changes the rank of only a small percentage of those. Most of the pitchers who score very poorly by Game Score were knocked out of the game early. When we discount their performance, we are also discounting most of the pitchers who would score about the same, and then we rank them against one another, so the net effect is very limited, much like a double foul in basketball. Still, I'm not comfortable with the possibility of a pitcher being credited with a good game when he may not have gotten anyone out. I put in this rule to eliminate that possibility.
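The two overrides can be sketched in a few lines of Python. One rule marks down a short outing before ranking; the other forces a shutout to a "10" after ranking. The order of application and the function names are assumptions made for illustration. The text states the rules but not where they sit in the pipeline. Nine innings is taken as 27 outs.

```python
def short_outing_adjust(game_score: int, outs: int, target_outs: int = 15) -> int:
    """Subtract one point per out short of 15 (knocked out within five innings)."""
    return game_score - max(0, target_outs - outs)


def is_forced_ten(outs: int, runs_allowed: int) -> bool:
    """Nine or more innings (27+ outs) with no runs, earned or unearned."""
    return outs >= 27 and runs_allowed == 0


def final_level(ranked_level: int, outs: int, runs_allowed: int) -> int:
    """ASSUMED order: apply the shutout override after the 0-10 ranking."""
    return 10 if is_forced_ten(outs, runs_allowed) else ranked_level


# The 51 -> 37 and 49 -> 34 examples from the text.
print(short_outing_adjust(51, 1), short_outing_adjust(49, 0))
```

A start that reaches 15 outs is untouched by the first rule. A shutout keeps its "10" even when its ranked level would have been a "9."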